Sun Cluster 3.1 Data Services Developer's Guide

Start Method

The RGM invokes the Start method on a cluster node when the resource group containing the data service resource is brought online on that node or when the resource group is already online and the resource is enabled. In the sample application, the Start method activates the in.named (DNS) daemon on that node.

This section describes the major pieces of the Start method for the sample application. It does not describe functionality common to all callback methods, such as the parse_args function and obtaining the syslog facility, which are described in Providing Common Functionality to All Methods.

For the complete listing of the Start method, see Start Method Code Listing.

Start Overview

Before attempting to launch DNS, the Start method in the sample data service verifies that the configuration directory and configuration file (named.conf) are accessible. Information in named.conf is essential to the successful operation of DNS.

This callback method uses the process monitor facility (pmfadm) to start the DNS daemon (in.named). If DNS crashes or fails to start, the PMF attempts to start it a prescribed number of times during a specified interval. The number of retries and the interval are specified by properties in the data service's RTR file.

Verifying the Configuration

In order to operate, DNS requires information from the named.conf file in the configuration directory. Therefore, the Start method performs some sanity checks to verify that the directory and file are accessible before attempting to launch DNS.

The Confdir extension property provides the path to the configuration directory. The property itself is defined in the RTR file. However, the cluster administrator specifies the actual location when configuring the data service.

In the sample data service, the Start method retrieves the location of the configuration directory using the scha_resource_get(1HA) command.


Note –

Because Confdir is an extension property, scha_resource_get returns both the type and value. The awk(1) command retrieves just the value and places it in a shell variable, CONFIG_DIR.



# find the value of Confdir set by the cluster administrator at the time of
# adding the resource.
config_info=`scha_resource_get -O Extension -R $RESOURCE_NAME \
-G $RESOURCEGROUP_NAME Confdir`

# scha_resource_get returns the "type" as well as the "value" for the
# extension properties. Get only the value of the extension property 
CONFIG_DIR=`echo $config_info | awk '{print $2}'`
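
The parsing step can be tried without a cluster. The following is a minimal sketch that assumes scha_resource_get prints the property type on one line and the value on the next; fake_scha_resource_get and the path /etc/named are hypothetical stand-ins used only for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for scha_resource_get: prints the property type
# on one line and the value on the next (assumed output format).
fake_scha_resource_get() {
    printf '%s\n%s\n' "STRING" "/etc/named"
}

config_info=`fake_scha_resource_get`

# The unquoted expansion lets the shell collapse the newline into a space,
# so awk sees "STRING /etc/named" as a single record and prints field 2.
CONFIG_DIR=`echo $config_info | awk '{print $2}'`

echo "Confdir value: $CONFIG_DIR"
```

Because awk splits on white space, this approach returns only the first word of the value, so it suits a path that contains no spaces.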

The Start method then uses the value of CONFIG_DIR to verify that the directory is accessible. If it is not accessible, Start logs an error message and exits with error status. See Start Exit Status.


# Check if $CONFIG_DIR is accessible.
if [ ! -d "$CONFIG_DIR" ]; then
   logger -p ${SYSLOG_FACILITY}.err \
         -t [$SYSLOG_TAG] \
         "${ARGV0} Directory $CONFIG_DIR is missing or not mounted"
   exit 1
fi

Before starting the application daemon, this method performs a final check to verify that the named.conf file is present. If it is not present, Start logs an error message and exits with error status.


# Change to the $CONFIG_DIR directory in case there are relative
# pathnames in the data files.
cd "$CONFIG_DIR"

# Check that the named.conf file is present in the $CONFIG_DIR directory
if [ ! -s named.conf ]; then
   logger -p ${SYSLOG_FACILITY}.err \
         -t [$SYSLOG_TAG] \
         "${ARGV0} File $CONFIG_DIR/named.conf is missing or empty"
   exit 1
fi
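
The two checks can also be exercised together outside the cluster. The sketch below is not part of the sample data service: check_config is a hypothetical helper, and the return codes 1 and 2 are chosen here only to distinguish the two failure cases.

```shell
#!/bin/sh
# check_config is a hypothetical helper, not part of the sample data service.
# Returns 0 if the directory exists and holds a non-empty named.conf,
# 1 if the directory is missing, 2 if the file is missing or empty.
check_config() {
    dir=$1
    [ -d "$dir" ] || return 1
    [ -s "$dir/named.conf" ] || return 2
    return 0
}

# Exercise the helper against a scratch directory.
scratch=`mktemp -d`

rc_missing_dir=0
check_config "$scratch/missing" || rc_missing_dir=$?

rc_missing_file=0
check_config "$scratch" || rc_missing_file=$?

: > "$scratch/named.conf"          # create an empty named.conf
rc_empty_file=0
check_config "$scratch" || rc_empty_file=$?

echo 'options { directory "/var/named"; };' > "$scratch/named.conf"
rc_ok=0
check_config "$scratch" || rc_ok=$?

rm -rf "$scratch"
echo "$rc_missing_dir $rc_missing_file $rc_empty_file $rc_ok"
```

Quoting the directory name matters in both tests: with an empty, unquoted variable, [ ! -d ] tests only whether the string -d is non-empty, so the check would pass silently.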

Starting the Application

This method uses the Process Monitor Facility (pmfadm) to launch the application. The pmfadm command allows you to set the number of times to restart the application during a specified time frame. The RTR file contains two properties for this purpose: Retry_count, which specifies the number of times to attempt restarting an application, and Retry_interval, which specifies the time period over which to do so.

The Start method retrieves the values of Retry_count and Retry_interval using the scha_resource_get command and stores their values in shell variables. It then passes these values to pmfadm using the -n and -t options.


# Get the value for retry count from the RTR file.
RETRY_CNT=`scha_resource_get -O Retry_Count -R $RESOURCE_NAME \
-G $RESOURCEGROUP_NAME`
# Get the value for retry interval from the RTR file. This value is in seconds
# and must be converted to minutes for passing to pmfadm. Note that the 
# conversion rounds up; for example, 50 seconds rounds up to 1 minute.
((RETRY_INTRVAL = (`scha_resource_get -O Retry_Interval -R $RESOURCE_NAME \
-G $RESOURCEGROUP_NAME` + 59) / 60))

# Start the in.named daemon under the control of PMF. Let it crash and restart
# up to $RETRY_CNT times in a period of $RETRY_INTRVAL minutes; if it crashes
# more often than that, PMF will cease trying to restart it.
# If there is a process already registered under the tag
# <$PMF_TAG>, then PMF sends out an alert message that the
# process is already running.
pmfadm -c $PMF_TAG -n $RETRY_CNT -t $RETRY_INTRVAL \
    /usr/sbin/in.named -c named.conf

# Log a message indicating that HA-DNS has been started.
if [ $? -eq 0 ]; then
   logger -p ${SYSLOG_FACILITY}.info \
         -t [$SYSLOG_TAG] \
         "${ARGV0} HA-DNS successfully started"
fi
exit 0
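
The comment in the listing describes a seconds-to-minutes conversion that rounds up. Plain integer division truncates, so one way to obtain the rounding is to add 59 before dividing. The following sketch shows only the arithmetic; secs_to_mins_ceil is a hypothetical helper name, not part of the sample data service.

```shell
#!/bin/sh
# secs_to_mins_ceil is a hypothetical helper used only for illustration.
# Adding 59 before the integer division makes any partial minute count
# as a whole minute.
secs_to_mins_ceil() {
    echo $(( ($1 + 59) / 60 ))
}

# Show the conversion for a few sample Retry_interval values.
for secs in 50 60 61 300; do
    echo "$secs seconds -> `secs_to_mins_ceil $secs` minute(s)"
done
```

Without the added 59, a Retry_interval below 60 seconds would convert to 0 minutes.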

Start Exit Status

A Start method should not exit with success until the underlying application is actually running and available, particularly if other data services are dependent on it. One way to confirm this is to probe the application before exiting the Start method. For a complex application, such as a database, be certain to set the value for the Start_timeout property in the RTR file sufficiently high to allow time for the application to initialize and perform crash recovery.


Note –

Because DNS, the application resource in the sample data service, launches quickly, the sample data service does not poll to verify that it is running before exiting with success.
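
For an application that starts more slowly, a Start method might poll before it exits. The following is a minimal sketch of such a loop, not code from the sample data service: wait_for_service and demo_probe are hypothetical names, and demo_probe merely simulates an application that becomes available on the third probe.

```shell
#!/bin/sh
# wait_for_service is a hypothetical helper. It runs a probe command up to
# max_tries times, sleeping between attempts, and returns 0 as soon as
# the probe succeeds or 1 if every attempt fails.
wait_for_service() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    return 1
}

# Stand-in probe for demonstration: succeeds on the third call. A real
# probe would query the application itself.
counter_file=`mktemp`
echo 0 > "$counter_file"
demo_probe() {
    n=`cat "$counter_file"`
    n=$((n + 1))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}

if wait_for_service 5 0 demo_probe; then
    probe_result=started
else
    probe_result=failed
fi
attempts=`cat "$counter_file"`
rm -f "$counter_file"
echo "$probe_result after $attempts attempts"
```

In a real Start method, the number of attempts multiplied by the delay should stay below Start_timeout, and the method would exit with a nonzero status if the probe never succeeds.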


If this method fails to start DNS and exits with failure status, the RGM checks the Failover_mode property, which determines how to react. The sample data service does not explicitly set the Failover_mode property, so this property has the default value NONE (unless the cluster administrator has overridden the default and specified a different value). In this case, the RGM takes no action other than to set the state of the data service. User intervention is required to restart on the same node or fail over to a different node.