Sun Cluster 3.1 Data Services Developer's Guide

Appendix B Sample Data Service Code Listings

This appendix provides the complete code for each method in the sample data service. It also lists the contents of the resource type registration file.

This appendix includes the following code listings.

Resource Type Registration File Listing

The resource type registration (RTR) file contains resource and resource type property declarations that define the initial configuration of the data service at the time the cluster administrator registers the data service.


Example B–1 SUNW.Sample RTR File

#
# Copyright (c) 1998-2000 by Sun Microsystems, Inc.
# All rights reserved.
#
# Registration information for Domain Name Service (DNS)
#
 
 
#pragma ident   "@(#)SUNW.sample   1.1   00/05/24 SMI"

RESOURCE_TYPE = "sample";
VENDOR_ID = SUNW;
RT_DESCRIPTION = "Domain Name Service on Sun Cluster";

RT_VERSION = "1.0";
API_VERSION = 2;    
FAILOVER = TRUE;
 
RT_BASEDIR=/opt/SUNWsample/bin;
PKGLIST = SUNWsample;
 
START              = dns_svc_start;
STOP               = dns_svc_stop;
 
VALIDATE           = dns_validate;
UPDATE             = dns_update;
 
MONITOR_START      = dns_monitor_start;
MONITOR_STOP       = dns_monitor_stop;
MONITOR_CHECK      = dns_monitor_check;
 
# A list of bracketed resource property declarations follows the
# resource-type declarations. The property-name declaration must be
# the first attribute after the open curly bracket of each entry.
#

# The <method>_timeout properties set the value in seconds after which
# the RGM concludes invocation of the method has failed.

# The MIN value for all method timeouts is set to 60 seconds. This
# prevents administrators from setting shorter timeouts, which do not
# improve switchover/failover performance, and can lead to undesired
# RGM actions (false failovers, node reboot, or moving the resource group
# to ERROR_STOP_FAILED state, requiring operator intervention). Setting
# too-short method timeouts leads to a *decrease* in overall availability
# of the data service.
{
        PROPERTY = Start_timeout;
        MIN=60;
        DEFAULT=300;
}
{
        PROPERTY = Stop_timeout;
        MIN=60;
        DEFAULT=300;
}
{
        PROPERTY = Validate_timeout;
        MIN=60;
        DEFAULT=300;
}
{
        PROPERTY = Update_timeout;
        MIN=60;
        DEFAULT=300;
}
{
        PROPERTY = Monitor_Start_timeout;
        MIN=60;
        DEFAULT=300;
}
{
        PROPERTY = Monitor_Stop_timeout;
        MIN=60;
        DEFAULT=300;
}
{
        PROPERTY = Thorough_Probe_Interval;
        MIN=1;
        MAX=3600;
        DEFAULT=60;
        TUNABLE = ANYTIME;
}
 
# The number of retries to be done within a certain period before concluding
# that the application cannot be successfully started on this node.
{
        PROPERTY = Retry_Count;
        MIN=0;
        MAX=10;
        DEFAULT=2;
        TUNABLE = ANYTIME; 
}

# Set Retry_Interval as a multiple of 60 since it is converted from seconds
# to minutes, rounding up. For example, a value of 50 (seconds)
# is converted to 1 minute. Use this property to time the number of
# retries (Retry_Count).
{
        PROPERTY = Retry_Interval;
        MIN=60;
        MAX=3600;
        DEFAULT=300;
        TUNABLE = ANYTIME;
}
 
{
        PROPERTY = Network_resources_used;
        TUNABLE = AT_CREATION;
        DEFAULT = "";
}
 
#
# Extension Properties
#
 
# The cluster administrator must set the value of this property to point to
# the directory that contains the configuration files used by the application.
# For this application, DNS, specify the path of the DNS configuration file on
# PXFS (typically named.conf).
{
        PROPERTY = Confdir;
        EXTENSION;
        STRING;
        TUNABLE = AT_CREATION;
        DESCRIPTION = "The Configuration Directory Path";
}
 
# Time out value in seconds before declaring the probe as failed.
{
        PROPERTY = Probe_timeout;
        EXTENSION;
        INT;
        DEFAULT = 30;
        TUNABLE = ANYTIME;
        DESCRIPTION = "Time out value for the probe (seconds)";
}
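
The Retry_Interval comment in the RTR file describes a seconds-to-minutes conversion that rounds up. The following is a minimal sketch of that arithmetic in shell. The function name secs_to_min_roundup is hypothetical and is not part of the sample data service.

```shell
# Hypothetical helper (not part of the sample data service):
# convert a duration in seconds to whole minutes, rounding up.
secs_to_min_roundup() {
    echo $(( ($1 + 59) / 60 ))
}

secs_to_min_roundup 50     # a partial minute counts as one full minute
secs_to_min_roundup 300    # an exact multiple of 60 needs no rounding
```

Choosing a Retry_Interval that is a multiple of 60 avoids any difference between the configured value and the value that results from the conversion.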

Start Method Code Listing

The RGM invokes the Start method on a cluster node when the resource group containing the data service resource is brought online on that node or when the resource is enabled. In the sample application, the Start method activates the in.named (DNS) daemon on that node.


Example B–2 dns_svc_start Method

#!/bin/ksh
#
# Start Method for HA-DNS.
#
# This method starts the data service under the control of PMF. Before
# starting the in.named process for DNS, it performs some sanity checks.
# The PMF tag for the data service is $RESOURCE_NAME.named. PMF tries to
# start the service a specified number of times (Retry_count) and if the
# number of attempts exceeds this value within a specified interval
# (Retry_interval) PMF reports a failure to start the service. Retry_count
# and Retry_interval are both properties of the resource set in the RTR file.


#pragma ident   "@(#)dns_svc_start   1.1   00/05/24 SMI"

###############################################################################
# Parse program arguments.
#
function parse_args # [args ...]
{
        typeset opt

        while getopts 'R:G:T:' opt
        do
                case "$opt" in
                R)
                        # Name of the DNS resource.
                        RESOURCE_NAME=$OPTARG
                        ;;
                G)
                        # Name of the resource group in which the
                        # resource is configured.
                        RESOURCEGROUP_NAME=$OPTARG
                        ;;
                T)
                        # Name of the resource type.
                        RESOURCETYPE_NAME=$OPTARG
                        ;;

                *)
                    logger -p ${SYSLOG_FACILITY}.err \
                    -t [$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME] \
                    "ERROR: Option $OPTARG unknown"
                     exit 1
                     ;;

                esac
        done

}




###############################################################################
# MAIN
#
##############################################################################

export PATH=/bin:/usr/bin:/usr/cluster/bin:/usr/sbin:/usr/proc/bin:$PATH

# Obtain the syslog facility to use to log messages.
SYSLOG_FACILITY=`scha_cluster_get -O SYSLOG_FACILITY`

# Parse the arguments that have been passed to this method
parse_args "$@"

PMF_TAG=$RESOURCE_NAME.named
SYSLOG_TAG=$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME

# Get the value of the Confdir property of the resource in order to start
# DNS. Using the resource name and the resource group entered, find the
# value of Confdir set by the cluster administrator when adding the resource.
config_info=`scha_resource_get -O Extension -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME Confdir`
# scha_resource_get returns the "type" as well as the "value" for the
# extension properties. Get only the value of the extension property.
CONFIG_DIR=`echo $config_info | awk '{print $2}'`

# Check if $CONFIG_DIR is accessible.
if [ ! -d $CONFIG_DIR ]; then
   logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
       "${ARGV0} Directory $CONFIG_DIR missing or not mounted"
   exit 1
fi

# Change to the $CONFIG_DIR directory in case there are relative
# path names in the data files.
cd $CONFIG_DIR

# Check that the named.conf file is present in the $CONFIG_DIR directory.
if [ ! -s named.conf ]; then
   logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
       "${ARGV0} File $CONFIG_DIR/named.conf is missing or empty"
   exit 1
fi

# Get the value for Retry_count from the RTR file.
RETRY_CNT=`scha_resource_get -O Retry_Count -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME`

# Get the value for Retry_interval from the RTR file. Convert this value,
# which is in seconds, to minutes for passing to pmfadm. Note that this is a
# conversion with round-up, for example, 50 seconds rounds up to one minute.
RETRY_INTRVAL_SECS=`scha_resource_get -O Retry_Interval -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME`
((RETRY_INTRVAL = (RETRY_INTRVAL_SECS + 59) / 60))

# Start the in.named daemon under the control of PMF. Let it crash and restart
# up to $RETRY_CNT times in a period of $RETRY_INTRVAL minutes; if it crashes
# more often than that, PMF will cease trying to restart it. If there is a
# process already registered under the tag <$PMF_TAG>, then
# PMF sends out an alert message that the process is already running.
echo "Retry interval is $RETRY_INTRVAL"
pmfadm -c $PMF_TAG.named -n $RETRY_CNT -t $RETRY_INTRVAL \
    /usr/sbin/in.named -c named.conf

# Log a message indicating that HA-DNS has been started.
if [ $? -eq 0 ]; then
   logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
           "${ARGV0} HA-DNS successfully started"
fi
exit 0
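
The Start method extracts the Confdir value from output that carries both the property type and the property value. The following sketch shows that extraction in isolation. The stub function fake_extension_get, the two-line output layout, and the path /global/dns are assumptions made for illustration; they stand in for a real scha_resource_get call on a cluster node.

```shell
# Hypothetical stub standing in for:
#   scha_resource_get -O Extension -R <resource> -G <group> Confdir
# Assumption: the type is printed first, then the value.
fake_extension_get() {
    printf 'STRING\n/global/dns\n'
}

config_info=`fake_extension_get`

# Leaving $config_info unquoted lets the shell collapse the newline into
# a single space, so awk sees one line whose second field is the value.
CONFIG_DIR=`echo $config_info | awk '{print $2}'`
echo "$CONFIG_DIR"
```

Because the value is taken as the second whitespace-separated field, this approach assumes a directory path that contains no spaces.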
 

Stop Method Code Listing

The Stop method is invoked on a cluster node when the resource group containing the HA-DNS resource is brought offline on that node or the resource is disabled. This method stops the in.named (DNS) daemon on that node.


Example B–3 dns_svc_stop Method

#!/bin/ksh
#
# Stop method for HA-DNS
#
# Stop the data service using PMF. If the service is not running, the
# method exits with status 0, because returning any other value puts the
# resource in STOP_FAILED state.


#pragma ident   "@(#)dns_svc_stop   1.1   00/05/24 SMI"

###############################################################################
# Parse program arguments.
#
function parse_args # [args ...]
{
        typeset opt

        while getopts 'R:G:T:' opt
        do
                case "$opt" in
                R)
                        # Name of the DNS resource.
                        RESOURCE_NAME=$OPTARG
                        ;;
                G)
                        # Name of the resource group in which the
                        # resource is configured.
                        RESOURCEGROUP_NAME=$OPTARG
                        ;;
                T)
                        # Name of the resource type.
                        RESOURCETYPE_NAME=$OPTARG
                        ;;

                *)
                    logger -p ${SYSLOG_FACILITY}.err \
                    -t [$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME] \
                    "ERROR: Option $OPTARG unknown"
                     exit 1
                     ;;

                esac
        done

}


###############################################################################
# MAIN
#
##############################################################################

export PATH=/bin:/usr/bin:/usr/cluster/bin:/usr/sbin:/usr/proc/bin:$PATH

# Obtain the syslog facility to use to log messages.
SYSLOG_FACILITY=`scha_cluster_get -O SYSLOG_FACILITY`

# Parse the arguments that have been passed to this method
parse_args "$@"

PMF_TAG=$RESOURCE_NAME.named
SYSLOG_TAG=$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME

# Obtain the Stop_timeout value from the RTR file.
STOP_TIMEOUT=`scha_resource_get -O STOP_TIMEOUT -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME`

# Attempt to stop the data service in an orderly manner using a SIGTERM
# signal through PMF. Wait for up to 80% of the Stop_timeout value to
# see if SIGTERM is successful in stopping the data service. If not, send
# SIGKILL to stop the data service. Use up to 15% of the Stop_timeout value
# to see if SIGKILL is successful. If not, there is a failure and the method
# exits with non-zero status. The remaining 5% of the Stop_timeout is for
# other uses.
((SMOOTH_TIMEOUT=$STOP_TIMEOUT * 80/100))

((HARD_TIMEOUT=$STOP_TIMEOUT * 15/100))

# See if in.named is running, and if so, kill it. 
if pmfadm -q $PMF_TAG.named; then 
   # Send a SIGTERM signal to the data service and wait for 80% of the
   # total timeout value.
   pmfadm -s $PMF_TAG.named -w $SMOOTH_TIMEOUT TERM
   if [ $? -ne 0 ]; then
      logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
          "${ARGV0} Failed to stop HA-DNS with SIGTERM; Retry with SIGKILL"

      # Since the data service did not stop with a SIGTERM signal, use
      # SIGKILL now and wait for another 15% of the total timeout value.
      pmfadm -s $PMF_TAG.named -w $HARD_TIMEOUT KILL
      if [ $? -ne 0 ]; then
          logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
              "${ARGV0} Failed to stop HA-DNS; Exiting UNSUCCESSFUL"

          exit 1
      fi
   fi
else 
   # The data service is not running as of now. Log a message and 
   # exit success.
   logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
           "HA-DNS is not started"

   # Even if HA-DNS is not running, exit success to avoid putting 
   # the data service in STOP_FAILED State.

   exit 0

fi

# Successfully stopped DNS. Log a message and exit success.
logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
    "HA-DNS successfully stopped"
exit 0
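
The Stop method divides Stop_timeout into an 80% budget for SIGTERM and a 15% budget for SIGKILL. The following sketch reproduces only that integer arithmetic so that the resulting values can be checked for a given timeout. The function name split_stop_timeout is hypothetical.

```shell
# Hypothetical helper: print the SIGTERM and SIGKILL wait budgets
# (in seconds) for a given Stop_timeout, using integer arithmetic.
split_stop_timeout() {
    stop_timeout=$1
    smooth_timeout=$(( stop_timeout * 80 / 100 ))
    hard_timeout=$(( stop_timeout * 15 / 100 ))
    echo "$smooth_timeout $hard_timeout"
}

# With the default Stop_timeout of 300 seconds from the RTR file:
split_stop_timeout 300
```

Multiplying before dividing keeps the integer result accurate; dividing first would truncate small timeouts toward zero.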

gettime Utility Code Listing

The gettime utility is a C program used by the PROBE program to track the elapsed time between restarts of the probe. You must compile this program and place it in the same directory as the callback methods, that is, the directory pointed to by the RT_basedir property.


Example B–4 gettime.c utility program

/*
 * This utility program, used by the probe method of the data service, tracks
 * the elapsed time in seconds from a known reference point (epoch point). It
 * must be compiled and placed in the same directory as the data service
 * callback methods (RT_basedir).
 */

#pragma ident   "@(#)gettime.c   1.1   00/05/24 SMI"

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

int
main(void)
{
    /* Print the current time as seconds since the epoch. */
    printf("%ld\n", (long)time(NULL));
    exit(0);
}

PROBE Program Code Listing

The PROBE program checks the availability of the data service using nslookup(1M) commands. The Monitor_start callback method launches this program and the Monitor_stop callback method stops it.
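
The probe treats a reply as valid only if it came from the HA-DNS server itself, which it determines from the Server: line of the nslookup output. The following sketch shows that extraction on a captured reply. The sample reply text, the host name dns-lh.example.com, and the function name replying_server are assumptions for illustration; actual nslookup output varies between versions.

```shell
# Hypothetical sample of an nslookup reply (format is an assumption).
sample_reply='Server:  dns-lh.example.com
Address:  192.0.2.10

Name:    dns-lh.example.com
Address:  192.0.2.10'

# Read an nslookup reply on standard input and print the short host
# name (text before the first dot) of the server that answered.
replying_server() {
    awk '$1 == "Server:" { print $2 }' | awk -F. '{ print $1 }'
}

SERVER=`echo "$sample_reply" | replying_server`
echo "$SERVER"
```

An empty result means that no Server: line was found, which the probe counts as a failure.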


Example B–5 dns_probe Program

#!/bin/ksh
#pragma ident   "@(#)dns_probe   1.1   00/04/19 SMI"
#
# Probe method for HA-DNS.
#
# This program checks the availability of the data service using nslookup,
# which queries the DNS server to look for the DNS server itself. If the
# server does not respond or if the query is replied to by some other server,
# then the probe concludes that there is some problem with the data service
# and fails the service over to another node in the cluster. Probing is done
# at a specific interval set by THOROUGH_PROBE_INTERVAL in the RTR file.

#pragma ident   "@(#)dns_probe   1.1   00/05/24 SMI"


###############################################################################
# Parse program arguments.
#
function parse_args # [args ...]
{
        typeset opt

        while getopts 'R:G:T:' opt
        do
                case "$opt" in
                R)
                        # Name of the DNS resource.
                        RESOURCE_NAME=$OPTARG
                        ;;
                G)
                        # Name of the resource group in which the
                        # resource is configured.
                        RESOURCEGROUP_NAME=$OPTARG
                        ;;
                T)
                        # Name of the resource type.
                        RESOURCETYPE_NAME=$OPTARG
                        ;;

                *)
                    logger -p ${SYSLOG_FACILITY}.err \
                    -t [$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME] \
                    "ERROR: Option $OPTARG unknown"
                     exit 1
                     ;;

                esac
        done

}


###############################################################################
# restart_service ()
#
# This function tries to restart the data service by calling the Stop method
# followed by the Start method of the dataservice. If the dataservice has
# already died and no tag is registered for the dataservice under PMF,
# then this function fails the service over to another node in the cluster.
#
function restart_service
{

        # To restart the dataservice, first, verify that the 
        # dataservice itself is still registered under PMF.
        pmfadm -q $PMF_TAG
        if [[ $? -eq 0 ]]; then
                # Since the TAG for the dataservice is still registered under
                # PMF, first stop the dataservice and start it back up again.

                # Obtain the Stop method name and the STOP_TIMEOUT value for
                # this resource.
                STOP_TIMEOUT=`scha_resource_get -O STOP_TIMEOUT \
                        -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME`
                STOP_METHOD=`scha_resource_get -O STOP \
                        -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME`
                hatimerun -t $STOP_TIMEOUT $RT_BASEDIR/$STOP_METHOD \
                        -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME \
                        -T $RESOURCETYPE_NAME

                if [[ $? -ne 0 ]]; then
                        logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
                                "${ARGV0} Stop method failed."
                        return 1
                fi

                # Obtain the Start method name and the START_TIMEOUT value for
                # this resource.
                START_TIMEOUT=`scha_resource_get -O START_TIMEOUT \
                        -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME`
                START_METHOD=`scha_resource_get -O START \
                        -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME`
                hatimerun -t $START_TIMEOUT $RT_BASEDIR/$START_METHOD \
                        -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME \
                        -T $RESOURCETYPE_NAME

                if [[ $? -ne 0 ]]; then
                        logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
                                "${ARGV0} Start method failed."
                        return 1
                fi


        else
                # The absence of the TAG for the dataservice 
                # implies that the dataservice has already
                # exceeded the maximum retries allowed under PMF. 
                # Therefore, do not attempt to restart the
                # dataservice again, but try to failover
                # to another node in the cluster.
                scha_control -O GIVEOVER -G $RESOURCEGROUP_NAME \
                        -R $RESOURCE_NAME
        fi

        return 0
}




###############################################################################
# decide_restart_or_failover ()
#
# This function decides the action to be taken upon the failure of a probe:
# restart the data service locally or fail over to another node in the cluster.
#
function decide_restart_or_failover
{
   
   # Check if this is the first restart attempt.
   if [ $retries -eq 0 ]; then
         # This is the first failure. Note the time of 
         # this first attempt. 
         start_time=`$RT_BASEDIR/gettime`
         retries=`expr $retries + 1`
         # Because this is the first failure, attempt to restart
         # the data service.
         restart_service
         if [ $? -ne 0 ]; then
            logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
                "${ARGV0} Failed to restart data service."
            exit 1
         fi
   else
      # This is not the first failure
      current_time=`$RT_BASEDIR/gettime`
      time_diff=`expr $current_time - $start_time`
      if [ $time_diff -ge $RETRY_INTERVAL ]; then
         # This failure happened after the time window
         # elapsed, so reset the retries counter,
         # slide the window, and do a retry.
         retries=1
         start_time=$current_time
         # Because the previous failure occurred more than 
         # Retry_interval ago, attempt to restart the data service.
         restart_service
         if [ $? -ne 0 ]; then
            logger -p ${SYSLOG_FACILITY}.err \
                -t [$SYSLOG_TAG] \
                "${ARGV0} Failed to restart HA-DNS."
            exit 1
         fi
      elif [ $retries -ge $RETRY_COUNT ]; then
         # Still within the time window,
         # and the retry counter expired, so fail over.
         retries=0
         scha_control -O GIVEOVER -G $RESOURCEGROUP_NAME \
             -R $RESOURCE_NAME
         if [ $? -ne 0 ]; then
            logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
                "${ARGV0} Failover attempt failed."
            exit 1
         fi
      else
         # Still within the time window,
         # and the retry counter has not expired,
         # so do another retry.
         retries=`expr $retries + 1`
         restart_service
         if [ $? -ne 0 ]; then
            logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
                "${ARGV0} Failed to restart HA-DNS."
            exit 1
         fi
      fi
   fi
}


###############################################################################
# MAIN
###############################################################################

export PATH=/bin:/usr/bin:/usr/cluster/bin:/usr/sbin:/usr/proc/bin:$PATH

# Obtain the syslog facility to use to log messages.
SYSLOG_FACILITY=`scha_cluster_get -O SYSLOG_FACILITY`

# Parse the arguments that have been passed to this method
parse_args "$@"

PMF_TAG=$RESOURCE_NAME.named
SYSLOG_TAG=$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME

# The interval at which probing is to be done is set in the system defined
# property THOROUGH_PROBE_INTERVAL. Obtain the value of this property with
# scha_resource_get
PROBE_INTERVAL=`scha_resource_get -O THOROUGH_PROBE_INTERVAL \
        -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME`

# Obtain the timeout value allowed for the probe, which is set in the
# PROBE_TIMEOUT extension property in the RTR file. The default timeout for
# nslookup is 1.5 minutes.
probe_timeout_info=`scha_resource_get -O Extension -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME Probe_timeout`
PROBE_TIMEOUT=`echo $probe_timeout_info | awk '{print $2}'`

# Identify the server on which DNS is serving by obtaining the value
# of the NETWORK_RESOURCES_USED property of the resource.
DNS_HOST=`scha_resource_get -O NETWORK_RESOURCES_USED -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME`

# Get the retry count value from the system defined property Retry_count
RETRY_COUNT=`scha_resource_get -O RETRY_COUNT -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME`

# Get the retry interval value from the system defined property Retry_interval
RETRY_INTERVAL=`scha_resource_get -O RETRY_INTERVAL -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME`

# Obtain the full path for the gettime utility from the 
# RT_basedir property of the resource type.
RT_BASEDIR=`scha_resource_get -O RT_BASEDIR -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME`

# The probe runs in an infinite loop, trying nslookup commands. 
# Set up a temporary file for the nslookup replies.
DNSPROBEFILE=/tmp/.$RESOURCE_NAME.probe
probefail=0
retries=0

while :
do
   # The interval at which the probe needs to run is specified in the
   # property THOROUGH_PROBE_INTERVAL. Therefore, set the probe to sleep for a
   # duration of <THOROUGH_PROBE_INTERVAL>
   sleep $PROBE_INTERVAL

   # Run the probe, which queries the IP address on
   # which DNS is serving.
   hatimerun -t $PROBE_TIMEOUT /usr/sbin/nslookup $DNS_HOST $DNS_HOST \
           > $DNSPROBEFILE 2>&1

   retcode=$?
   # Reset the failure flag so that an earlier failed probe does not
   # carry over into this iteration.
   probefail=0
   if [ $retcode -ne 0 ]; then
           probefail=1
   fi

   # Make sure that the reply to nslookup command comes from the HA-DNS
   # server and not from another name server listed in the
   # /etc/resolv.conf file.
   if [ $probefail -eq 0 ]; then
      # Get the name of the server that replied to the nslookup query.
      SERVER=`awk '$1=="Server:" { print $2 }' \
              $DNSPROBEFILE | awk -F. '{ print $1 }'`
      if [ -z "$SERVER" ]; then
              probefail=1
      else
              if [ $SERVER != $DNS_HOST ]; then
                      probefail=1
              fi
      fi
   fi

   # If the probefail variable is not set to 0, either the nslookup command
   # timed out or the reply to the query came from another server
   # (specified in the /etc/resolv.conf file). In either case, the DNS server is
   # not responding and the method calls decide_restart_or_failover,
   # which evaluates whether to restart the data service or to fail it over
   # to another node.

   if [ $probefail -ne 0 ]; then
         decide_restart_or_failover
   else
         logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
             "${ARGV0} Probe for resource HA-DNS successful"
   fi
done
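
The restart-or-failover decision in the probe depends on three inputs: the number of restarts already attempted, the time elapsed since the current window began, and the Retry_count and Retry_interval limits. The following sketch isolates that decision as a function with no side effects so that each branch can be exercised without a cluster. The function name decide_action and its argument order are hypothetical; the probe itself also updates its counters and calls restart_service or scha_control.

```shell
# Hypothetical pure function mirroring the branches of
# decide_restart_or_failover. Arguments:
#   $1 retries attempted so far in the current window
#   $2 seconds elapsed since the window started
#   $3 Retry_interval in seconds
#   $4 Retry_count
decide_action() {
    retries=$1
    elapsed=$2
    retry_interval=$3
    retry_count=$4

    if [ "$retries" -eq 0 ]; then
        echo restart      # first failure: open a window and restart
    elif [ "$elapsed" -ge "$retry_interval" ]; then
        echo restart      # window expired: slide it and restart
    elif [ "$retries" -ge "$retry_count" ]; then
        echo failover     # retry budget used up inside the window
    else
        echo restart      # still inside the window and within budget
    fi
}

# Second failure within 100 seconds, with the RTR defaults of
# Retry_count=2 and Retry_interval=300:
decide_action 2 100 300 2
```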

Monitor_start Method Code Listing

This method starts the PROBE program for the data service.


Example B–6 dns_monitor_start Method

#!/bin/ksh
#
# Monitor start Method for HA-DNS.
#
# This method starts the monitor (probe) for the data service under the
# control of PMF. The monitor is a process that probes the data service
# at periodic intervals and if there is a problem restarts it on the same node
# or fails it over to another node in the cluster. The PMF tag for the
# monitor is $RESOURCE_NAME.monitor.


#pragma ident   "@(#)dns_monitor_start   1.1   00/05/24 SMI"

###############################################################################
# Parse program arguments.
#
function parse_args # [args ...]
{
        typeset opt

        while getopts 'R:G:T:' opt
        do
                case "$opt" in
                R)
                        # Name of the DNS resource.
                        RESOURCE_NAME=$OPTARG
                        ;;
                G)
                        # Name of the resource group in which the
                        # resource is configured.
                        RESOURCEGROUP_NAME=$OPTARG
                        ;;
                T)
                        # Name of the resource type.
                        RESOURCETYPE_NAME=$OPTARG
                        ;;

                *)
                    logger -p ${SYSLOG_FACILITY}.err \
                    -t [$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME] \
                    "ERROR: Option $OPTARG unknown"
                     exit 1
                     ;;
                esac
        done

}



###############################################################################
# MAIN
#
##############################################################################

export PATH=/bin:/usr/bin:/usr/cluster/bin:/usr/sbin:/usr/proc/bin:$PATH

# Obtain the syslog facility to use to log messages.
SYSLOG_FACILITY=`scha_cluster_get -O SYSLOG_FACILITY`

# Parse the arguments that have been passed to this method
parse_args "$@"

PMF_TAG=$RESOURCE_NAME.monitor
SYSLOG_TAG=$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME

# Find where the probe method resides by obtaining the value of the
# RT_BASEDIR property of the data service.
RT_BASEDIR=`scha_resource_get -O RT_BASEDIR -R $RESOURCE_NAME \
        -G $RESOURCEGROUP_NAME`

# Start the probe for the data service under PMF. Use the infinite retries
# option to start the probe. Pass the resource name, group, and type to the
# probe method.
pmfadm -c $PMF_TAG.monitor -n -1 -t -1 \
    $RT_BASEDIR/dns_probe -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME \
    -T $RESOURCETYPE_NAME

# Log a message indicating that the monitor for HA-DNS has been started.
if [ $? -eq 0 ]; then
   logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
           "${ARGV0} Monitor for HA-DNS successfully started"
fi
exit 0
 

Monitor_stop Method Code Listing

This method stops the PROBE program for the data service.


Example B–7 dns_monitor_stop Method

#!/bin/ksh
#
# Monitor stop method for HA-DNS
#
# Stops the monitor that is running using PMF.


#pragma ident   "@(#)dns_monitor_stop   1.1   00/05/24 SMI"


###############################################################################
# Parse program arguments.
#
function parse_args # [args ...]
{
        typeset opt

        while getopts 'R:G:T:' opt
        do
                case "$opt" in
                R)
                        # Name of the DNS resource.
                        RESOURCE_NAME=$OPTARG
                        ;;
                G)
                        # Name of the resource group in which the
                        # resource is configured.
                        RESOURCEGROUP_NAME=$OPTARG
                        ;;
                T)
                        # Name of the resource type.
                        RESOURCETYPE_NAME=$OPTARG
                        ;;

                *)
                    logger -p ${SYSLOG_FACILITY}.err \
                    -t [$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME] \
                    "ERROR: Option $OPTARG unknown"
                     exit 1
                     ;;

                esac
        done

}


###############################################################################
# MAIN
#
##############################################################################

export PATH=/bin:/usr/bin:/usr/cluster/bin:/usr/sbin:/usr/proc/bin:$PATH

# Obtain the syslog facility to use to log messages.
SYSLOG_FACILITY=`scha_cluster_get -O SYSLOG_FACILITY`

# Parse the arguments that have been passed to this method
parse_args "$@"

PMF_TAG=$RESOURCE_NAME.monitor
SYSLOG_TAG=$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME

# See if the monitor is running, and if so, kill it. 
if pmfadm -q $PMF_TAG.monitor; then 
   pmfadm -s $PMF_TAG.monitor KILL
   if [ $? -ne 0 ]; then 
      logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
          "${ARGV0} Could not stop monitor for resource " \
          $RESOURCE_NAME
      exit 1
   else
      # Could successfully stop the monitor. Log a message.
      logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
          "${ARGV0} Monitor for resource " $RESOURCE_NAME \
          " successfully stopped"
   fi
fi

exit 0

Monitor_check Method Code Listing

This method verifies the existence of the directory pointed to by the Confdir property. The RGM calls Monitor_check whenever the PROBE method fails the data service over to a new node and also to check nodes that are potential masters.


Example B–8 dns_monitor_check Method

#!/bin/ksh
#
# Monitor check  Method for DNS.
#
# The RGM calls this method whenever the fault monitor fails the data service
# over to a new node. Monitor_check calls the Validate method to verify
# that the configuration directory and files are available on the new node.


#pragma ident   "@(#)dns_monitor_check 1.1   00/05/24 SMI"

###############################################################################
# Parse program arguments.
#
function parse_args # [args ...]
{
   typeset opt

   while getopts 'R:G:T:' opt
   do
      case "$opt" in

      R)
      # Name of the DNS resource.
      RESOURCE_NAME=$OPTARG
      ;;

      G)
      # Name of the resource group in which the resource is
      # configured.
      RESOURCEGROUP_NAME=$OPTARG
      ;;

      T)
      # Name of the resource type.
      RESOURCETYPE_NAME=$OPTARG
      ;;

      *)
      logger -p ${SYSLOG_FACILITY}.err \
      -t [$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME] \
      "ERROR: Option $OPTARG unknown"
      exit 1
      ;;

      esac
   done

}

###############################################################################
# MAIN
##############################################################################

export PATH=/bin:/usr/bin:/usr/cluster/bin:/usr/sbin:/usr/proc/bin:$PATH

# Obtain the syslog facility to use to log messages.
SYSLOG_FACILITY=`scha_cluster_get -O SYSLOG_FACILITY`

# Parse the arguments that have been passed to this method.
parse_args "$@"

PMF_TAG=$RESOURCE_NAME.named
SYSLOG_TAG=$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME

# Obtain the full path for the Validate method from
# the RT_BASEDIR property of the resource type.
RT_BASEDIR=`scha_resource_get -O RT_BASEDIR -R $RESOURCE_NAME \
   -G $RESOURCEGROUP_NAME`

# Obtain the name of the Validate method for this resource.
VALIDATE_METHOD=`scha_resource_get -O VALIDATE \
   -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME`

# Obtain the value of the Confdir property in order to pass it to the
# Validate method. Use the resource name and the resource group entered to
# obtain the Confdir value set at the time of adding the resource.
config_info=`scha_resource_get -O Extension -R $RESOURCE_NAME \
   -G $RESOURCEGROUP_NAME Confdir`

# scha_resource_get returns the type as well as the value for extension
# properties. Use awk to get only the value of the extension property.
CONFIG_DIR=`echo $config_info | awk '{print $2}'`

# Call the Validate method so that the data service can be failed over
# successfully to the new node.
$RT_BASEDIR/$VALIDATE_METHOD -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME \
   -T $RESOURCETYPE_NAME -x Confdir=$CONFIG_DIR

# Log a message indicating that monitor check was successful.
if [ $? -eq 0 ]; then
   logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
      "${ARGV0} Monitor check for DNS successful."
   exit 0
else
   logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
      "${ARGV0} Monitor check for DNS not successful."
   exit 1
fi
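
The awk step in the listing depends on scha_resource_get printing the property type before the value for extension properties. The following sketch reproduces that parsing away from a cluster. The function fake_scha_resource_get and the path /etc/named are hypothetical stand-ins, and the two-line output format is an assumption based on the comment in the listing, not a statement of the command's full behavior.

```shell
#!/bin/sh
# Hypothetical stand-in for `scha_resource_get -O Extension ... Confdir`.
# Assumption: the real command prints the property type, then the value.
fake_scha_resource_get() {
    printf 'STRING\n/etc/named\n'
}

# Capture both output lines, as the listing does with backquotes.
config_info=`fake_scha_resource_get`

# The unquoted $config_info is split into words, so echo joins the two
# lines with a single space. awk then prints the second field, the value.
CONFIG_DIR=`echo $config_info | awk '{print $2}'`

echo "Confdir value: $CONFIG_DIR"
```

With the stand-in above, the script prints "Confdir value: /etc/named". Because the value is taken as the second white-space-separated field, a directory name that contains white space would be truncated by this approach.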

Validate Method Code Listing

This method verifies the existence of the directory pointed to by the Confdir property. The RGM calls this method when the data service is created and when data service properties are updated by the cluster administrator. The Monitor_check method calls this method whenever the fault monitor fails the data service over to a new node.


Example B–9 dns_validate Method

#!/bin/ksh
#
# Validate method for HA-DNS.
# This method validates the Confdir property of the resource. The Validate
# method is called in two scenarios: when the resource is being created and
# when a resource property is being updated. When the resource is being
# created, this method is called with the -c flag and all the system-defined
# and extension properties are passed as command-line arguments. When a resource
# property is being updated, the Validate method is called with the -u flag,
# and only the property/value pair of the property being updated is passed as a
# command-line argument.
#
# ex: When the resource is being created command args will be
#
# dns_validate -c -R <..> -G <...> -T <..> -r <sysdef-prop=value>...
#       -x <extension-prop=value>.... -g <resourcegroup-prop=value>....
#
# when the resource property is being updated
#
# dns_validate -u -R <..> -G <...> -T <..> -r <sys-prop_being_updated=value>
#   OR
# dns_validate -u -R <..> -G <...> -T <..> -x <extn-prop_being_updated=value>
#


#pragma ident   "@(#)dns_validate   1.1   00/05/24 SMI"

###############################################################################
# Parse program arguments.
#
function parse_args # [args ...]
{
   typeset opt

   while getopts 'cur:x:g:R:T:G:' opt
   do
                case "$opt" in
                R)
                        # Name of the DNS resource.
                        RESOURCE_NAME=$OPTARG
                        ;;
                G)
                        # Name of the resource group in which the resource is
                        # configured.
                        RESOURCEGROUP_NAME=$OPTARG
                        ;;
                T)
                        # Name of the resource type.
                        RESOURCETYPE_NAME=$OPTARG
                        ;;

                r)      
                        # The method is not accessing any system-defined
                        # properties, so this is a no-op.
                        ;;

                g)
                        # The method is not accessing any resource group 
                        # properties, so this is a no-op.
                        ;;

                c)
                        # Indicates the Validate method is being called while
                        # creating the resource, so this flag is a no-op.
                        ;;

                u)
                        # Indicates the updating of a property when the 
                        # resource already exists. If the update is to the
                        # Confdir property then Confdir should appear in the
                        # command-line arguments. If it does not, the method must
                        # look for it specifically using scha_resource_get.
                        UPDATE_PROPERTY=1
                        ;;

                x)
                        # Extension property list. Separate the property and
                        # value pairs using "=" as the separator.
                        PROPERTY=`echo $OPTARG | awk -F= '{print $1}'`
                        VAL=`echo $OPTARG | awk -F= '{print $2}'`

                        # If the Confdir extension property is found on the
                        # command line, note its value.
                        if [ "$PROPERTY" = "Confdir" ];
                        then
                                CONFDIR=$VAL
                                CONFDIR_FOUND=1
                        fi
                        ;;

                *)
                        logger -p ${SYSLOG_FACILITY}.err \
                        -t [$SYSLOG_TAG] \
                        "ERROR: Option $OPTARG unknown"
                        exit 1
                        ;;
                esac
   done
}

###############################################################################
# MAIN
#
##############################################################################

export PATH=/bin:/usr/bin:/usr/cluster/bin:/usr/sbin:/usr/proc/bin:$PATH

# Obtain the syslog facility to use to log messages.
SYSLOG_FACILITY=`scha_cluster_get -O SYSLOG_FACILITY`

# Set the value of CONFDIR to null. Later, this method retrieves the value
# of the Confdir property from the command line or using scha_resource_get.
CONFDIR=""
UPDATE_PROPERTY=0
CONFDIR_FOUND=0

# Parse the arguments that have been passed to this method.
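parse_args "$@"

# Build the tag used in the syslog messages below, as the other methods do.
SYSLOG_TAG=$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME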

# If the Validate method is being called due to the updating of properties,
# try to retrieve the value of the Confdir extension property from the command
# line. If it is not there, obtain the value of Confdir using scha_resource_get.
if (( UPDATE_PROPERTY == 1 )) && (( CONFDIR_FOUND == 0 )); then
   config_info=`scha_resource_get -O Extension -R $RESOURCE_NAME \
       -G $RESOURCEGROUP_NAME Confdir`
   CONFDIR=`echo $config_info | awk '{print $2}'`
fi

# Verify that the Confdir property has a value. If it does not, log the
# failure and exit with status 1.
if [[ -z $CONFDIR ]]; then
   logger -p ${SYSLOG_FACILITY}.err \
       "${ARGV0} Validate method for resource $RESOURCE_NAME failed"
   exit 1
fi

# Now validate the actual Confdir property value. 

# Check if $CONFDIR is accessible.
if [ ! -d $CONFDIR ]; then
        logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
            "${ARGV0} Directory $CONFDIR missing or not mounted"
        exit 1
fi

# Check that the named.conf file is present in the Confdir directory.
if [ ! -s $CONFDIR/named.conf ]; then
        logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
            "${ARGV0} File $CONFDIR/named.conf is missing or empty"
        exit 1
fi

# Log a message indicating that the Validate method was successful.
logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
   "${ARGV0} Validate method for resource $RESOURCE_NAME completed successfully"

exit 0
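
The argument handling in the listing can be tried away from a cluster. The following sketch keeps the variable names from the listing but splits each property/value pair with shell parameter expansion instead of awk, which is a substitution made here for illustration. The resource name dns-rs, the group name dns-rg, and the path /etc/named are hypothetical.

```shell
#!/bin/sh
# Sketch of the Validate argument handling. Variable names mirror the listing.
CONFDIR=""
CONFDIR_FOUND=0
UPDATE_PROPERTY=0

parse_args() {
    OPTIND=1    # restart option parsing on each call
    while getopts 'cur:x:g:R:T:G:' opt
    do
        case "$opt" in
        u)  # A property of an existing resource is being updated.
            UPDATE_PROPERTY=1
            ;;
        x)  # Split "name=value" at the first "=".
            PROPERTY=${OPTARG%%=*}
            VAL=${OPTARG#*=}
            if [ "$PROPERTY" = "Confdir" ]; then
                CONFDIR=$VAL
                CONFDIR_FOUND=1
            fi
            ;;
        c|r|g|R|T|G)
            # Accepted, but not needed by this sketch.
            ;;
        *)  return 1
            ;;
        esac
    done
}

# Hypothetical update call that changes only the Confdir property.
parse_args -u -R dns-rs -G dns-rg -T SUNW.sample -x Confdir=/etc/named
echo "update=$UPDATE_PROPERTY found=$CONFDIR_FOUND confdir=$CONFDIR"
```

The call prints "update=1 found=1 confdir=/etc/named". Splitting at the first "=" keeps any later "=" characters in the value, whereas awk -F= with $2 would drop everything after a second "=".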

Update Method Code Listing

The RGM calls the Update method to notify a running resource that its properties have been changed.


Example B–10 dns_update Method

#!/bin/ksh
#
# Update method for HA-DNS.
#
# The actual updates to properties are done by the RGM. Updates affect only
# the fault monitor so this method must restart the fault monitor.


#pragma ident   "@(#)dns_update   1.1   00/05/24 SMI"

###############################################################################
# Parse program arguments.
#
function parse_args # [args ...]
{
        typeset opt

        while getopts 'R:G:T:' opt
        do
                case "$opt" in
                R)
                        # Name of the DNS resource.
                        RESOURCE_NAME=$OPTARG
                        ;;
                G)
                        # Name of the resource group in which the resource is
                        # configured.
                        RESOURCEGROUP_NAME=$OPTARG
                        ;;
                T)
                        # Name of the resource type.
                        RESOURCETYPE_NAME=$OPTARG
                        ;;

                *)
                        logger -p ${SYSLOG_FACILITY}.err \
                        -t [$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME] \
                        "ERROR: Option $OPTARG unknown"
                        exit 1
                        ;;

                esac
        done

}




###############################################################################
# MAIN
#
##############################################################################

export PATH=/bin:/usr/bin:/usr/cluster/bin:/usr/sbin:/usr/proc/bin:$PATH

# Obtain the syslog facility to use to log messages.
SYSLOG_FACILITY=`scha_cluster_get -O SYSLOG_FACILITY`

# Parse the arguments that have been passed to this method
parse_args "$@"

PMF_TAG=$RESOURCE_NAME.monitor
SYSLOG_TAG=$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME

# Find where the probe method resides by obtaining the value of the
# RT_BASEDIR property of the resource.
RT_BASEDIR=`scha_resource_get -O RT_BASEDIR -R $RESOURCE_NAME \
   -G $RESOURCEGROUP_NAME`

# When the Update method is called, the RGM updates the value of the property
# being updated. This method must check if the fault monitor (probe)
# is running, and if so, kill it and then restart it.
if pmfadm -q $PMF_TAG.monitor; then

   # Kill the monitor that is running already.
   pmfadm -s $PMF_TAG.monitor TERM
   if [ $? -ne 0 ]; then
      logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
         "${ARGV0} Could not stop the monitor"
      exit 1
   else
      # The monitor was stopped successfully. Log a message.
      logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
         "Monitor for HA-DNS successfully stopped"
   fi

   # Restart the monitor.
   pmfadm -c $PMF_TAG.monitor -n -1 -t -1 $RT_BASEDIR/dns_probe \
      -R $RESOURCE_NAME -G $RESOURCEGROUP_NAME -T $RESOURCETYPE_NAME
   if [ $? -ne 0 ]; then
      logger -p ${SYSLOG_FACILITY}.err -t [$SYSLOG_TAG] \
         "${ARGV0} Could not restart monitor for HA-DNS"
      exit 1
   else
      logger -p ${SYSLOG_FACILITY}.info -t [$SYSLOG_TAG] \
         "Monitor for HA-DNS successfully restarted"
   fi
fi
exit 0
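
The query, stop, and restart sequence in the listing can be exercised on a machine that has no cluster software. In the following sketch, the pmfadm shell function is a hypothetical stand-in that only records the option and tag it receives and always reports success. It does not model the behavior of the real command. The function restart_monitor and the tag dns-rs.monitor are also made up for this illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for pmfadm so the control flow can run off-cluster.
# It records the option and tag of each call and always returns success.
CALLS=""
pmfadm() {
    CALLS="${CALLS}[$1 $2]"
    return 0
}

# Restart the monitor registered under the given tag, if one is running.
restart_monitor() {
    tag=$1
    probe=$2

    # Nothing to do if no monitor is registered under this tag.
    pmfadm -q "$tag" || return 0

    # Stop the running monitor, then start a fresh one.
    pmfadm -s "$tag" TERM || return 1
    pmfadm -c "$tag" -n -1 -t -1 "$probe" || return 1
}

restart_monitor dns-rs.monitor /opt/SUNWsample/bin/dns_probe
status=$?
echo "status=$status calls=$CALLS"
```

With the stand-in reporting success for every call, the script prints "status=0 calls=[-q dns-rs.monitor][-s dns-rs.monitor][-c dns-rs.monitor]", which shows that the monitor is stopped before it is started again. Changing the stand-in to return a nonzero status for -q would show the early return that leaves a stopped monitor alone.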