Oracle Solaris Cluster Generic Data Service (GDS) Guide


Updated: July 2014, E48652-01
 
 

Additional ORCL.gds Extension Properties

The ORCL.gds resource type includes extension properties that affect how a resource of this type behaves. For the examples that follow, ensure that the resource group myrg has been created. If you need to create the resource group, use the following command:

# clresourcegroup create -p pathprefix=/opt/ORCLscgds/demo myrg

Child_mon_level Property


Note -  If you use Oracle Solaris Cluster administration commands, you can use the Child_mon_level property. If you use Agent Builder, you cannot use this property.

This property provides control over the processes that are monitored through the Process Monitor Facility (PMF). It denotes the level up to which forked child processes are monitored, and works like the –C argument to the pmfadm command. See the pmfadm(1M) man page.

Omitting this property, or setting it to the default value of -1, has the same effect as omitting the –C option on the pmfadm command. The result is that all children and their descendants are monitored.
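For example, to monitor only the original processes and their first-level children (the equivalent in effect of pmfadm –C 1), the property could be set as in the following sketch; myrs is an illustrative resource name:

```shell
# Hedged sketch: restrict PMF monitoring to the original processes and
# their first-level children, mirroring pmfadm -C 1. The resource name
# myrs is illustrative.
clresource set -p Child_mon_level=1 myrs
```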

Debug_gds Property

The Debug_gds extension property is set to FALSE by default. This property is used by Oracle Solaris Cluster development and support, and can be useful for understanding the various call sequences that occur within GDSv2. If Debug_gds=FALSE is set, no GDSv2 internal debug messages are sent to the system-log. Conversely, if Debug_gds=TRUE is set, all internal debug messages are sent to the system-log.

Perform the following steps to send debug messages to the system-log:

  1. Send all GDSv2 internal debug messages to the system-log.

    # clresource set -p debug_gds=TRUE myrs

  2. (Optional) To set Debug_gds as a per-node extension property, you can set it for one node or set different values for each node.

    # clresource set -p debug_gds=false myrs
    # clresource set -p "debug_gds{node1}"=true myrs
    # clresource show -p debug_gds myrs
    === Resources ===                              
    Resource:                                      myrs
      --- Standard and extension properties ---    
      Debug_gds{node1}:                            TRUE
      Debug_gds{node2}:                            FALSE
        Class:                                         extension
        Description:                                   Debug GDS code only
        Per-node:                                      True
        Type:                                          boolean

Debug_level Property

The Debug_level extension property is set to 0 by default. This property is part of the housekeeping KSH functions that provide trace and debug message support. To use Debug_level, your method_command script must source /opt/ORCLscgds/lib/gds_functions and call the debug_message() function at least once within the script. The ${DEBUG} variable can then be invoked to react to the Debug_level extension property.

The /opt/ORCLscgds/demo/demo_start script contains an example:

# . /opt/ORCLscgds/lib/gds_functions

get_opts "$@"
debug_message "Script: demo_start - Begin"
${DEBUG}

Use these guidelines to understand how Debug_level works:

  • Setting Debug_level=0 does not produce any trace output or debug messages.

  • Setting Debug_level=1 does not produce any trace output; however, reduced debug messages are written to the system-log.

  • Setting Debug_level=2 produces trace output and all debug messages are written to the system-log.
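The level semantics above can be sketched with ordinary KSH constructs. This is an illustrative re-implementation, not the shipped gds_functions code: DEBUG_LEVEL stands in for the Debug_level property, and messages go to stdout rather than the system-log.

```shell
#!/usr/bin/ksh
# Illustrative stand-ins for the gds_functions behavior described above.
DEBUG_LEVEL=2                 # stands in for the Debug_level property

debug_message() {
    # Levels 1 and 2 emit debug messages (here to stdout, not the system-log)
    if [ "$DEBUG_LEVEL" -ge 1 ]; then
        echo "debug_message - $1"
    fi
}

# Level 2 additionally enables shell tracing via the ${DEBUG} idiom
if [ "$DEBUG_LEVEL" -eq 2 ]; then
    DEBUG="set -x"
else
    DEBUG=":"
fi

debug_message "Script: demo_start - Begin"
${DEBUG}
```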


    Note -  To enable debug messages to be written to the system-log, the /etc/syslog.conf file must be amended and the SMF system-log service restarted. For example: *.err;kern.debug;daemon.debug;mail.crit /var/adm/messages.

Perform the following steps to set up trace and debug messages:

  1. Set the debug level for myrs.

    # clresource set -p Debug_level=2 myrs

    node1 - RESOURCE=myrs 
    node1 - RESOURCEGROUP=myrg 
    node1 - RESOURCETYPE=ORCL.gds:1 
    node1 - OPERATION=update 
    node1 - Debug_level=2 
    node2 - RESOURCE=myrs 
    node2 - RESOURCEGROUP=myrg 
    node2 - RESOURCETYPE=ORCL.gds:1 
    node2 - OPERATION=update 
    node2 - Debug_level=2

    Trace information is written to the console when the resource is enabled and disabled. Debug messages are written to the system-log. For example:

    Sep  4 07:28:43 node1 SC[ORCL.gds:1,myrg,myrs]: [ID 382926
         daemon.debug] debug_message - Script: demo_start - Begin
    Sep  4 07:28:43 node1 SC[ORCL.gds:1,myrg,myrs]: [ID 382926 daemon.debug]
         debug_message - Script: demo_start - hostname is lh1
    Sep  4 07:28:43 node1 SC[ORCL.gds:1,myrg,myrs]: [ID 382926 daemon.debug]
         debug_message - Script: demo_start - End (0)
  2. (Optional) To set Debug_level as a per-node extension property, you can set it for one node or set different values for each node.

    # clrs set -p "debug_level{node1}"=2 -p "debug_level{node2}"=0 myrs

    node1 - RESOURCE=myrs 
    node1 - RESOURCEGROUP=myrg 
    node1 - RESOURCETYPE=ORCL.gds:1 
    node1 - OPERATION=update 

Interpose_logical_hostname Property

The Interpose_logical_hostname extension property is empty ("") by default. This property determines whether a logical hostname is returned whenever a system call is made to retrieve the hostname. Without interposing, when the physical node name is node1 and a hostname(1) command is issued, node1 is returned.

However, assume you have a logical hostname, lh1, which is plumbed and available on node1. By interposing all system calls to retrieve the hostname, it is then possible to return lh1 when a hostname(1) command is issued. Interposing a logical hostname within GDSv2 requires that a value be set for the Interpose_logical_hostname property. You must also define symbolic links on each Oracle Solaris Cluster node.

Perform the following steps to define symbolic links on each cluster node so that GDSv2 can interpose the logical hostname from a secure library:

  1. For each cluster node, create a symbolic link.

    # ln -s /usr/cluster/lib/libschost.so.1 /usr/lib/secure/libschost.so.1

  2. For each AMD64 cluster node, create a symbolic link.

    # ln -s /usr/cluster/lib/amd64/libschost.so.1 /usr/lib/secure/64/libschost.so.1

  3. For each SPARC cluster node, create a symbolic link.

    # ln -s /usr/cluster/lib/sparcv9/libschost.so.1 /usr/lib/secure/64/libschost.so.1

After the Interpose_logical_hostname is set and the symbolic links are defined, the Interpose_logical_hostname value can be returned to your method_command whenever a system call is made to retrieve the hostname:

  • If PMF_managed=TRUE is set, then the Interpose_logical_hostname is automatically available to your Start_command and Probe_command.

  • If PMF_managed=FALSE is set, then the GDSv2 function interpose_logical_hostname() is available to retrieve the Interpose_logical_hostname value.

The GDSv2 function interpose_logical_hostname() can also be used by method_command entries other than the Start_command and Probe_command.
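As a sketch of what interposition makes available, a method script might consult the SC_LHOSTNAME environment variable, falling back to the physical hostname when it is not set. This is illustrative only; in practice SC_LHOSTNAME is exported for you, as shown in the pmfadm output later in this section:

```shell
#!/usr/bin/ksh
# Illustrative sketch: prefer the interposed logical hostname when
# SC_LHOSTNAME is set, otherwise use the physical node name.
SC_LHOSTNAME=lh1              # exported by GDSv2 in practice; set here for the sketch
hn=${SC_LHOSTNAME:-$(uname -n)}
echo "service binds to $hn"
```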

Perform the following steps to retrieve the hostname.

  1. Disable or delete the resource myrs.

    1. Disable the resource myrs.

      # clresource disable myrs

    2. Delete the resource myrs.

      # clresource delete myrs

  2. Create the resource.

    # clresource create -g myrg -t ORCL.gds \
    -p Start_command="%RG_PATHPREFIX/demo_start -R %RS_NAME -G %RG_NAME -T %RT_NAME" \
    -p Stop_command="%RG_PATHPREFIX/demo_stop -R %RS_NAME -G %RG_NAME -T %RT_NAME" \
    -p Probe_command="%RG_PATHPREFIX/demo_probe -R %RS_NAME -G %RG_NAME -T %RT_NAME" \
    -p Validate_command="%RG_PATHPREFIX/demo_validate -R %RS_NAME -G %RG_NAME \
    -T %RT_NAME" -d myrs
  3. Interpose the logical hostname value of lh1.


    Note -  Ensure that a logical hostname is plumbed and available. See the clreslogicalhostname(1CL) man page for more information about creating a logical host.

    # clresource set -p PMF_managed=true -p interpose_logical_hostname=lh1 myrs

    If PMF_managed=TRUE is set, appropriate environment variables are set to interpose the Interpose_logical_hostname value after the resource is enabled.

  4. Enable the myrs resource.

    # clresource enable myrs

  5. Determine the environment variables.

    # pmfadm -l ""

    STATUS myrg,myrs,0.mon
    pmfadm -c myrg,myrs,0.mon -n 4 -t 2 /bin/ksh -c '/opt/ORCLscgds/bin/gds_probe -R myrs -T ORCL.gds -G myrg'
            environment:
                    LD_PRELOAD_32=/usr/lib/secure/libschost.so.1
                    LD_PRELOAD_64=/usr/lib/secure/64/libschost.so.1
                    SC_LHOSTNAME=lh1
            retries: 0
            owner: root
            monitor children: all
            pids: 4363
    STATUS myrg,myrs,0.svc
    pmfadm -c myrg,myrs,0.svc -a /usr/cluster/lib/sc/scds_pmf_action_script /bin/ksh -c 
            '/usr/cluster/bin/hatimerun -t 299 /opt/ORCLscgds/demo/demo_start -R myrs -G myrg -T ORCL.gds ;
            echo $? > /var/cluster/run/tempgna4xi'
            environment:
                    LD_PRELOAD_32=/usr/lib/secure/libschost.so.1
                    LD_PRELOAD_64=/usr/lib/secure/64/libschost.so.1
                    SC_LHOSTNAME=lh1
            retries: 0
            owner: root
            monitor children: all
            pids: 4313
    # 

If PMF_managed=FALSE is set, then the GDSv2 function interpose_logical_hostname() can be used to retrieve the Interpose_logical_hostname value.

An example of the GDSv2 function interpose_logical_hostname() is found in the /opt/ORCLscgds/demo/demo_start script. After Interpose_logical_hostname=lh1 has been set for a resource, the following standalone program can also be used to set appropriate environment variables:

# /opt/ORCLscgds/bin/gds_libschost -R myrs -G myrg -T ORCL.gds:1

LD_PRELOAD_32=/usr/lib/secure/libschost.so.1
LD_PRELOAD_64=/usr/lib/secure/64/libschost.so.1
SC_LHOSTNAME=lh1

The GDSv2 function interpose_logical_hostname() uses the standalone program previously described in the /opt/ORCLscgds/demo/demo_start script.

Num_probe_timeouts Property

The Num_probe_timeouts extension property is set to 2 by default. This property determines when a complete failure should be returned by GDSv2.

The discussion of the Timeout_delay property alludes to a complete failure whenever the Probe_command suffers a timeout. In this context, if the Probe_command suffers a timeout, the GDSv2 probe counts that as a failure. With Num_probe_timeouts=2, a single timeout is treated as a partial failure.

However, if the Probe_command suffers two successive timeouts, then that failure is treated as a complete failure. If Num_probe_timeouts=5 is set, then five successive Probe_command timeouts must occur before a complete failure is returned by GDSv2. Likewise, if Num_probe_timeouts=1 is set, then just one Probe_command timeout causes GDSv2 to return a complete failure.
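The successive-timeout rule can be sketched as a simple counter. The probe results below are illustrative (ok for a completed probe, timeout for a probe that timed out):

```shell
#!/usr/bin/ksh
# Sketch of Num_probe_timeouts accounting: only *successive* timeouts
# accumulate toward a complete failure; any completed probe resets the count.
NUM_PROBE_TIMEOUTS=2                  # default property value
consecutive=0
status=partial_failure
for result in ok timeout timeout ok; do
    if [ "$result" = timeout ]; then
        consecutive=$((consecutive + 1))
        if [ "$consecutive" -ge "$NUM_PROBE_TIMEOUTS" ]; then
            status=complete_failure
        fi
    else
        consecutive=0
    fi
done
echo "$status"
```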

When a complete failure is returned by GDSv2, the RGM queries the Failover_mode property to determine what action to take.

PMF_managed Property

The PMF_managed extension property is set to TRUE by default.

When this property is TRUE, the GDSv2 software ensures that the application is started under the control of the PMF. Consequently, when PMF_managed=FALSE is set, GDSv2 will not start the application under the control of the PMF.

Typically, an application that is under the control of the PMF must leave at least one process running after the application has been started. However, with PMF_managed=FALSE, it is possible to have an application that does not leave behind at least one process. For example, the application could simply create a file or amend another application's configuration and subsequently end without leaving behind at least one process.


Note -  If PMF_managed=FALSE is set, then the Stop_command property is also required.

Perform the following steps to create a file for an application:


Note -  The purpose of creating a file using a GDSv2 resource is simply to show that the myrs resource can be brought online without leaving behind at least one process. This feature can be quite powerful if the myrs resource is used as a dependent resource for other resources (for example, where you want the myrs resource to do something before other dependent resources are brought online).
  1. Ensure that the file does not exist and disable or delete the GDSv2 resource myrs.

    1. Verify that the file does not exist.

      # ls -l /var/tmp/myrs
      /var/tmp/myrs: No such file or directory
    2. Disable the resource myrs.

      # clresource disable myrs
    3. Delete the resource myrs.

      # clresource delete myrs
  2. Create the resource myrs.

    # clresource create -g myrg -t ORCL.gds \
    -p Start_command="/bin/touch /var/tmp/myrs" \
    -p Stop_command="/bin/rm -f /var/tmp/myrs" \
    -p PMF_managed=false -d myrs
  3. Enable the resource myrs, check its status, and verify that the file exists.

    # clresource enable myrs

    # clresource status myrs

    === Cluster Resources ===
    Resource Name       Node Name     State        Status Message
    -------------       ---------     -----        --------------
    myrs                node1         Online       Online
                        node2         Offline      Offline

    # ls -l /var/tmp/myrs

    -rw-r--r--   1 root    root    0 Sep  2 04:07 /var/tmp/myrs
  4. Disable the resource and verify that the file no longer exists.

    # clresource disable myrs

    # ls -l /var/tmp/myrs

    /var/tmp/myrs: No such file or directory

Probe_command Property

The Probe_command is not a required property. If set, this command must be a UNIX command with arguments that can be passed directly to a shell.

If Probe_command is set, then the GDSv2 probe will execute that command at intervals determined by the Thorough_probe_interval property and for the duration of the Probe_timeout property.

If Probe_command is not set and the default PMF_managed=TRUE is set, then an internal GDSv2 probe is used. This probe checks the application PMF tag to provide a faster application restart using PMF if all the application processes fail.

GDSv2 passes the following options and arguments to the Probe_command:

-R rs -G rg -T rt 'gds_start | gds_probe'

The /opt/ORCLscgds/lib/gds_functions file provides the helper function get_opts() to process these options and make their arguments available as uppercase KSH global variables.

The last argument, 'gds_start | gds_probe', is provided so that you can code different behavior within the Probe_command when the resource is being started or after the resource has been started and is now online.

See the /opt/ORCLscgds/demo/demo_probe file for an example that captures the last argument into the method variable. That variable can then be used to perform any appropriate conditional processing. Following is a snippet of code from demo_probe:

#!/usr/bin/ksh
eval typeset -r method=\$$#

The Probe_command should return one of the following exit statuses, which is then processed by the GDSv2 probe:

0

Success. The application is working correctly.

100

Complete failure. The application is not working.

201

Immediate failover.
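A Probe_command skeleton returning these statuses might look like the following sketch; check_app is a hypothetical placeholder for a real health check of the application:

```shell
#!/usr/bin/ksh
# Hedged sketch of a Probe_command using the exit statuses listed above.
# check_app is a hypothetical placeholder for a real health check.
check_app() { return 0; }

probe() {
    if check_app; then
        return 0      # success - the application is working correctly
    else
        # return 201 here instead to request an immediate failover
        return 100    # complete failure - the application is not working
    fi
}

probe
rc=$?
echo "probe exit status: $rc"
```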

The RGM responds to a Complete failure or Immediate failover by checking the Failover_mode property. By default, Failover_mode=SOFT is set. See the r_properties(5) man page for more information.

With Failover_mode=SOFT, if a Complete failure is returned, GDSv2 will request a restart of the resource up to a maximum of the Retry_count property value within the time specified by the Retry_interval property.

If the number of restarts exceeds the value of Retry_count within the time specified by Retry_interval, GDSv2 will request a failover of the resource's group to another node.

With Failover_mode=SOFT, if an Immediate failover is returned, GDSv2 will request an immediate failover of the resource's group to another node.

It is also possible for the Probe_command to return cumulative failures to the GDSv2 probe as follows:

<100

Cumulative failure. The application is not completely working or not completely failed.

GDSv2 can process consecutive failures within the Retry_interval. For example, if the Probe_command returns 25 on consecutive occasions within the default Retry_interval of 370 seconds, then as soon as the cumulative failure reaches 100, a complete failure is declared. GDSv2 then responds to a complete failure as described above.
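The cumulative accounting can be sketched as follows; the probe return values are illustrative:

```shell
#!/usr/bin/ksh
# Sketch of cumulative-failure accounting: probe returns below 100 are
# summed, and reaching 100 within Retry_interval is a complete failure.
cumulative=0
verdict=partial_failure
for rc in 25 25 25 25; do
    cumulative=$((cumulative + rc))
    if [ "$cumulative" -ge 100 ]; then
        verdict=complete_failure
    fi
done
echo "$verdict after cumulative score $cumulative"
```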

Start_exit_on_error Property

The Start_exit_on_error extension property is set to FALSE by default.

When this property is FALSE, the GDSv2 software attempts to continuously start the application within the Start_timeout period if the application fails to start.

When the Start_exit_on_error property is set to TRUE, the GDSv2 software will not attempt to continually start the application within the Start_timeout period.

This can be advantageous if the application is expected to start immediately on the first attempt. Consequently, if the application fails to start on the first attempt, a Start_failed error occurs, without waiting for the Start_timeout period to expire.


Note -  The RGM reacts to a Start_failed error by checking the Failover_mode property. Consequently, if the default Failover_mode=SOFT is set, then the RGM attempts to fail over the resource group to another Oracle Solaris Cluster node.
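The two start behaviors can be sketched as follows. The try_start function always fails here to keep the sketch deterministic, and MAX_ATTEMPTS stands in for the Start_timeout bound:

```shell
#!/usr/bin/ksh
# Sketch of Start_exit_on_error: with FALSE the start is retried until
# Start_timeout (approximated by MAX_ATTEMPTS here); with TRUE the first
# failure immediately produces a Start_failed error.
START_EXIT_ON_ERROR=true
MAX_ATTEMPTS=3                        # stand-in for the Start_timeout bound
attempts=0
outcome=started
try_start() { return 1; }             # illustrative: the start always fails

while :; do
    attempts=$((attempts + 1))
    try_start && break
    if [ "$START_EXIT_ON_ERROR" = true ]; then
        outcome=start_failed
        break
    fi
    if [ "$attempts" -ge "$MAX_ATTEMPTS" ]; then
        outcome=start_timeout
        break
    fi
done
echo "$outcome after $attempts attempt(s)"
```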

Perform the following steps to attempt to start an application:


Note -  The Start_command string below succeeds only on node2; on node1 it fails. The purpose of this example is to demonstrate the behavior of the Start_exit_on_error property.
  1. Disable or delete the resource myrs.

    1. Disable the resource myrs.

      # clresource disable myrs
    2. Delete the resource myrs.

      # clresource delete myrs
  2. Set the Start_exit_on_error property.

    # clresource create -g myrg -t ORCL.gds \ 
    -p Start_command="/bin/uname -n | /bin/grep node2" -p Start_exit_on_error=TRUE \
    -p Stop_command=/bin/true -p PMF_managed=false \
    -d myrs
  3. Enable the resource.

    # clresource enable myrs

    clrs:  (C748634) Resource group myrg failed to start on chosen node and might
    fail over to other node(s)

    # clresource status myrs

    === Cluster Resources ===
    Resource Name       Node Name      State        Status Message
    -------------       ---------      -----        --------------
    myrs                node1          Offline      Offline
                        node2          Online       Online

Note -  The Start_command="/bin/uname -n | /bin/grep node2" will only be successful on node2. The system-log on node1 contains the following:
Sep 2 04:59:45 node1 SC[,ORCL.gds:1,myrg,myrs,gds_start]:
     [ID 186822 daemon.error] /bin/uname -n | /bin/grep node2 has failed rc=1
Sep 2 04:59:45 node1 SC[,ORCL.gds:1,myrg,myrs,gds_start]:
     [ID 475178 daemon.notice]  Start_exit_on_error=true has been set. The
     resource will enter a start failed state.

However, the RGM reacts to a Start_failed error by querying the Failover_mode setting. Consequently, because Failover_mode=SOFT was set, the resource group failed over to node2, where the Start_command was successful. Because PMF_managed=FALSE was also set, a Stop_command is required; in this scenario, it is acceptable to make the STOP action a no-op by using Stop_command=/bin/true.

Stop_exit_on_error Property

The Stop_exit_on_error extension property is set to FALSE by default.

If Stop_exit_on_error=TRUE, Stop_command, and PMF_managed=TRUE were all set, then if the Stop_command property returns a non-zero exit status, the resource immediately enters a Stop_failed state. The GDSv2 software stops monitoring the process IDs running under the PMF tag; however, the PMF tag will still exist. Some application process IDs might still be running under the PMF tag, but the PMF does not monitor those process IDs.

Consequently, setting the Stop_exit_on_error=TRUE property is only useful when you also have the PMF_managed=TRUE property set. In this scenario, Stop_exit_on_error=TRUE prevents the PMF from sending the Stop_signal to the process IDs running under the PMF tag. This might be useful to determine why the Stop_command property failed to stop the application (for example, before the GDSv2 application cleans up the process IDs running under the PMF tag).
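The effect described above can be sketched as follows; stop_app is a hypothetical stand-in for the Stop_command:

```shell
#!/usr/bin/ksh
# Sketch of Stop_exit_on_error: a failing Stop_command either leaves the
# PMF tag and its process IDs in place (TRUE) or is followed by the usual
# GDSv2 cleanup of those process IDs (FALSE).
STOP_EXIT_ON_ERROR=true
stop_app() { return 1; }              # illustrative: the stop always fails

if stop_app; then
    action="stopped cleanly"
elif [ "$STOP_EXIT_ON_ERROR" = true ]; then
    action="Stop_failed: PMF tag and pids left in place"
else
    action="sending Stop_signal to pids under the PMF tag"
fi
echo "$action"
```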

For example, perform the following steps to stop the application:

  1. Disable or delete the resource myrs.

    1. Disable the resource myrs.

      # clresource disable myrs
    2. Delete the resource myrs.

      # clresource delete myrs
  2. Create the resource and set the Stop_exit_on_error=TRUE property.

    # clresource create -g myrg -t ORCL.gds \
    -p Start_command="/bin/sleep 1800 &" \
    -p Stop_command="/bin/false" \
    -p Stop_exit_on_error=true \
    -d myrs
  3. Enable the resource and check its status.

    # clresource enable myrs

    # clresource status myrs

    === Cluster Resources ===
    Resource Name       Node Name      State        Status Message
    -------------       ---------      -----        --------------
    myrs                node1          Online       Online - Service is online.
                        node2          Offline      Offline
  4. Disable the resource.

    # clresource disable myrs

    resource group in ERROR_STOP_FAILED state requires operator attention
  5. Check the status of the resource.

    # clresource status myrs

    === Cluster Resources ===
    Resource Name       Node Name      State        Status Message
    -------------       ---------      -----        --------------
    myrs                node1          Stop_failed  Faulted
                        node2          Offline      Offline
  6. Display the PMF tag for the myrs resource.

    # pmfadm -l myrg,myrs,0.svc
    pmfadm -c myrg,myrs,0.svc -a /usr/cluster/lib/sc/scds_pmf_action_script \
    /bin/ksh -c \
    '/usr/cluster/bin/hatimerun -t 299 /bin/sleep 1800 &; echo $? > \
    /var/cluster/run/temp3PaWJC' 
         retries:  0
         owner:  root
         monitor children: all
         pids:  14624 14626

When the myrs resource is disabled, the Stop_command is executed. However, Stop_command=/bin/false was set, thereby inducing a Stop_failed error. Because Stop_exit_on_error=TRUE was set, the GDSv2 software exits immediately with a Stop_failed error and does not attempt to clean up the process IDs running under the PMF tag.

The system-log on node1 also contains the following information:

Sep  2 06:11:41 node1 SC[,ORCL.gds:1,myrg,myrs,gds_stop]:
     [ID 186822 daemon.error] /bin/false  has failed rc=255
Sep  2 06:11:41 node1 SC[,ORCL.gds:1,myrg,myrs,gds_stop]: [ID 943012
     daemon.error] Stop_exit_on_error=true has been set. The resource will enter
     a stop failed state.
Sep  2 06:11:41 node1 Cluster.RGM.global.rgmd: [ID 938318 daemon.error]
     Method <gds_stop> failed on resource <myrs> in resource group <myrg>
     [exit code<1>, time used: 0% if timeout <300 seconds>] 

Stop_signal Property

This property specifies a value that identifies the signal to stop an application through the PMF. See the signal.h(3HEAD) man page for a list of the integer values that you can specify. The default value is 15 (SIGTERM).

Timeout_delay Property

The Timeout_delay extension property is set to FALSE by default. This extension property affects the GDSv2 probing algorithm and attempts to prevent a Probe_command timeout when the system is under a heavy load.


Note -  The Probe_command is executed periodically by the GDSv2 program, gds_probe, to determine if the application is healthy. When the system is under a heavy load, the Probe_command might be stuck waiting to execute as other higher-priority workload is executing. For example, if Probe_timeout=30 and Timeout_delay=FALSE are set and the system is under a heavy load, the Probe_command could suffer a probe timeout.

When this probe timeout occurs, the GDSv2 software is unable to tell if the application is healthy and might determine that a complete failure has occurred. If a complete failure is declared, the RGM queries the Failover_mode property to determine what action to take. However, if Probe_timeout=30 and Timeout_delay=TRUE are set and the system is under load, the timer for Probe_timeout will be delayed until the Probe_command is actually executing (rather than just being scheduled to execute).

The GDSv2 probe executes the Probe_command under a timeout clock and uses the fork(2) and exec(2) calls to execute the Probe_command as a new process. On a heavily loaded system, there can be seconds of delay from the time that the child process is forked until the time that the child process is executing the Probe_command.

If Timeout_delay=FALSE is set, the timeout clock is started as soon as the child process is forked.

If Timeout_delay=TRUE is set, the timeout clock is started only when the child process has started to execute.

There are advantages to both settings and you should consider the impact of setting Timeout_delay.

If the system is heavily loaded you might want a probe timeout to occur so that the RGM can attempt an application recovery by querying the Failover_mode property. In this case, on a heavily loaded system setting Timeout_delay=FALSE would be appropriate and is the default setting.

If the system is heavily loaded and you want to guarantee that the timeout clock is started only when the Probe_command has started to execute, then setting Timeout_delay=TRUE would be appropriate. However, this does not guarantee that a probe timeout will not occur; the timeout clock is merely delayed until the Probe_command has started to execute. If the Probe_command still struggles to complete after the timeout clock has started, a probe timeout can still occur.

If a probe timeout occurs, a failure is returned to GDSv2. By default, Num_probe_timeouts=2 is set, meaning that two consecutive probe timeouts result in a complete failure. When a complete failure is returned by GDSv2, the RGM queries the Failover_mode property to determine what action to take.

There is no practical example to actively demonstrate Timeout_delay.

Wait_for_online Property

The Wait_for_online extension property is set to TRUE by default.

When this property is TRUE, the GDSv2 software executes the Probe_command within the START method for the duration of Start_timeout when the resource is being enabled.


Note -  If the Probe_command is not set and PMF_managed=TRUE is set, a dummy probe is used for the Probe_command. This dummy probe simply checks if the associated PMF tag exists.

When the resource is being started (enabled), if the Probe_command returns a zero exit status, the application is deemed to be available and the resource then enters an Online state. If Wait_for_online=FALSE is set, the GDSv2 software does not attempt to execute the Probe_command within the START method. Instead, if the Start_command exits with a zero exit status, then the resource enters an Online state. Otherwise, the resource enters a Start_failed state.

The RGM queries the Failover_mode property to determine what action to take from a Start_failed state. This information can be useful when you do not want to wait for the Probe_command to declare a zero return code before the resource enters an Online state.
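The Wait_for_online loop can be sketched deterministically as follows. The real GDSv2 probe is retried every two seconds up to Start_timeout; here the "application" becomes ready on the third attempt, and a small attempt cap stands in for the timeout:

```shell
#!/usr/bin/ksh
# Sketch of the Wait_for_online loop: retry the probe until it returns
# zero or the Start_timeout stand-in (MAX_ATTEMPTS) expires.
flagfile=/tmp/wfo_demo.$$
rm -f "$flagfile"
probe() { [ -f "$flagfile" ]; }       # illustrative probe
MAX_ATTEMPTS=10

attempt=0
until probe; do
    attempt=$((attempt + 1))
    if [ "$attempt" -eq 3 ]; then
        : > "$flagfile"               # application becomes ready
    fi
    if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
        break
    fi
    # the real probe loop sleeps 2 seconds between attempts
done
if probe; then state=Online; else state=Start_failed; fi
echo "$state after $attempt probe attempt(s)"
rm -f "$flagfile"
```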

Perform the following steps to simulate an application that takes more than 10 seconds to start:

  1. Disable or delete the resource myrs.

    1. Disable the resource myrs.

      # clresource disable myrs
    2. Delete the resource myrs.

      # clresource delete myrs
  2. Create the following scripts on each Oracle Solaris Cluster node.

    # cat /var/tmp/start

    #!/usr/bin/ksh
    /var/tmp/start_child &
    exit 0

    # cat /var/tmp/start_child

    #!/usr/bin/ksh
    sleep 10
    /bin/touch /var/tmp/myrs
    exit 0

    # cat /var/tmp/probe

    #!/usr/bin/ksh
    if [[ -f /var/tmp/myrs ]]; then
         exit 0
    else 
         exit 100
    fi

    Note -  Create each of these scripts in this procedure on each Oracle Solaris Cluster node. Ensure that these scripts can be executed.

    The example above shows that /var/tmp/start executes a background job, /var/tmp/start_child, which sleeps for 10 seconds and then touches /var/tmp/myrs. The Start_command=/var/tmp/start should then exit with a zero exit status.


    Note -  The purpose of /var/tmp/start and /var/tmp/start_child is to simulate an application that takes some time to start (10 seconds in this example). The /var/tmp/probe script checks whether the application is running and is used by the Probe_command below.
  3. Create the myrs resource.

    # clresource create -g myrg -t ORCL.gds \
    -p Start_command=/var/tmp/start \
    -p Stop_command="/bin/rm -f /var/tmp/myrs" \
    -p Probe_command=/var/tmp/probe \
    -p PMF_managed=false \
    -d myrs
  4. Enable the resource and check its status.

    # time clresource enable myrs

    real   0m10.45s
    user   0m0.07s
    sys    0m0.03s 

    # clresource status myrs

    === Cluster Resources ===
    Resource Name   Node Name    State    Status Message
    -------------   ---------    -----    --------------
    myrs            node1        Online   Online - Service is online.
                    node2        Offline  Offline

The following example shows how the myrs resource is created with Wait_for_online=FALSE and immediately enters an Online state. However, the resource status is Degraded because the Probe_command has not yet returned a zero exit status.

Perform the following steps to immediately put a resource into an online state and then into a degraded state:

  1. Disable the resource myrs.

    # clresource disable myrs

  2. Delete the resource myrs.

    # clresource delete myrs

  3. Create the myrs resource.

    # clresource create -g myrg -t ORCL.gds \
    -p Start_command=/var/tmp/start \
    -p Stop_command="/bin/rm -f /var/tmp/myrs" \
    -p Probe_command=/var/tmp/probe \
    -p PMF_managed=false \
    -p Wait_for_online=false -d myrs
  4. Enable the resource and check its status.

    # time clresource enable myrs

    real    0m0.32s
    user    0m0.07s
    sys     0m0.03s

    # clresource status myrs

    === Cluster Resources ===
    Resource Name       Node Name      State        Status Message
    -------------       ---------      -----        --------------
    myrs                node1          Online       Degraded - Service is degraded.
                        node2          Offline      Offline

    After 60 seconds, check the status of the resource again.

    # clresource status myrs

    The Probe_command is executed periodically. After the Thorough_probe_interval (60 seconds), the Probe_command is executed again. This time the probe is successful and the resource status changes to Online.

    === Cluster Resources ===
    Resource Name       Node Name      State        Status Message
    -------------       ---------      -----        --------------
    myrs                node1          Online       Online - Service is online.
                        node2          Offline      Offline

Wait_probe_limit Property

The Wait_probe_limit extension property is set to 0 by default.

This extension property is used when Wait_for_online=TRUE is set. See Wait_for_online Property for more information.

When Wait_for_online=TRUE is set, GDSv2 executes the Probe_command within the START method for the duration of Start_timeout or until the Probe_command returns a zero exit status. The Probe_command is attempted every two seconds.

By default, Start_timeout=300 is set, so the Probe_command could be attempted many times before it succeeds.
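
The retry behavior described above can be sketched as a simple loop. This is an illustration of the documented behavior, not the actual GDSv2 source; PROBE and LIMIT stand in for the Probe_command and Wait_probe_limit properties, and the RGM separately enforces Start_timeout.

```shell
#!/bin/ksh
# Sketch of the Wait_for_online probe loop inside the START method.
PROBE=/var/tmp/probe    # stands in for Probe_command
LIMIT=0                 # stands in for Wait_probe_limit (0 = no limit)
attempts=0
while :; do
    if $PROBE; then
        exit 0          # zero exit status: the resource is online
    fi
    attempts=$((attempts + 1))
    if [ "$LIMIT" -gt 0 ] && [ "$attempts" -ge "$LIMIT" ]; then
        exit 1          # START fails after LIMIT probe attempts
    fi
    sleep 2             # the Probe_command is attempted every two seconds
done
```

With LIMIT greater than zero, the START method gives up after roughly LIMIT multiplied by two seconds; with LIMIT=0, only the RGM's Start_timeout ends the attempts.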

Three possible scenarios could occur:

  • Wait_probe_limit=0 – The Probe_command is attempted for the duration of Start_timeout or until it returns a zero exit status. If the probe never succeeds, attempts continue until the RGM declares a START timeout.

  • Wait_probe_limit=1 – The Probe_command is attempted only once during Wait_for_online processing. Similarly, if Wait_probe_limit=8 is set, the Probe_command is attempted up to eight times during Wait_for_online processing.

  • Wait_probe_limit=2 – The following procedure illustrates a simple example of Wait_probe_limit=2. The same scripts are used here as in the Wait_for_online=TRUE example in the Wait_for_online Property section. In that example, with the default Wait_for_online=TRUE set, the clresource enable myrs command took approximately 10 seconds to complete. In the example below, with Wait_probe_limit=2 set, the clresource enable myrs command takes approximately four seconds to complete.

Perform the following steps to attempt several times to start the resource:

  1. Disable the resource myrs.

    # clresource disable myrs

  2. Delete the resource myrs.

    # clresource delete myrs

  3. Create the myrs resource in the myrg resource group.

    # clresource create -g myrg -t ORCL.gds \
    -p Start_command=/var/tmp/start \
    -p Stop_command="/bin/rm -f /var/tmp/myrs" \
    -p Probe_command=/var/tmp/probe \
    -p PMF_managed=false \
    -p Wait_probe_limit=2 \
    -d myrs
  4. Enable the resource and check its status.

    # time clresource enable myrs

    clrs:  (C748634) Resource group myrg failed to start on chosen node
    and might fail over to other node(s)
    real    0m4.795s
    user    0m0.075s
    sys     0m0.035s

    Check the resource status.

    # clresource status myrs

    === Cluster Resources ===
    Resource Name      Node Name      State         Status Message
    -------------      ---------      -----         --------------
    myrs               node1          Offline       Offline
                       node2          Starting      Unknown - Starting

    Recheck the resource status.

    # clresource status myrs

    === Cluster Resources ===
    Resource Name      Node Name      State         Status Message
    -------------      ---------      -----         --------------
    myrs               node1          Online        Online - Service is online.
                       node2          Offline       Offline

In the preceding procedure, the resource myrs is enabled but the START method fails after approximately four seconds (Wait_probe_limit=2 was set and the Probe_command is attempted every two seconds, so two attempts take about four seconds). The Probe_command did not return a zero exit status within those two attempts, so the GDSv2 software returned a START failure and the RGM declared a Start_failed state.

Because Failover_mode=SOFT is set by default, the RGM then failed over the resource group from node1 to node2 (the first clresource status myrs output shows the resource myrs starting on node2). When starting on node2, the Probe_command again failed to return a zero exit status within the two Wait_probe_limit attempts, so the GDSv2 software again returned a START failure and the RGM declared a Start_failed state. Because of the Failover_mode=SOFT setting, a failover of the resource group from node2 back to node1 was then attempted.


Note -  The same scripts were used here as in the Wait_for_online=TRUE example in Wait_for_online Property. As such, the /var/tmp/start script executes the /var/tmp/start_child script in the background. That script sleeps for 10 seconds before touching the file (/var/tmp/myrs) that Probe_command is checking.

The first attempt to enable resource myrs on node1 took approximately four seconds, and even though you cannot see it on the terminal, the first attempt to enable resource myrs on node2 also took approximately four seconds. With the second attempt to start resource myrs on node1, /var/tmp/start_child had already consumed approximately eight seconds of its 10-second sleep. Consequently, with Wait_probe_limit=2 set, the second attempt to start the resource myrs was successful and the resource entered an Online state.
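
Based on the note above, the three demo scripts could be sketched as follows. These are assumed shapes for illustration; the actual scripts in /opt/ORCLscgds/demo may differ in detail.

```shell
# /var/tmp/start (sketch): launch the child in the background and
# return immediately, so the START method itself completes quickly.
/var/tmp/start_child &

# /var/tmp/start_child (sketch): simulate a service that needs
# 10 seconds to come up before creating the file the probe checks.
sleep 10
touch /var/tmp/myrs

# /var/tmp/probe (sketch): return a zero exit status only once the
# marker file exists.
[ -f /var/tmp/myrs ]
```

Because the probe succeeds only after the child's 10-second sleep, each four-second START attempt (two probes, two seconds apart) fails until enough wall-clock time has accumulated across attempts.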

The system-log on node1 and node2 contains the following messages:

Sep  3 00:44:13 node1 SC[,ORCL.gds:1,myrg,myrs,gds_start]: [ID 496934 daemon.notice]
     wait_probe_limit=2 is set, resource will enter a start failed state.
Sep  3 00:44:17 node2 SC[,ORCL.gds:1,myrg,myrs,gds_start]: [ID 496934 daemon.notice]
     wait_probe_limit=2 is set, resource will enter a start failed state.