Sun Cluster 2.2 API Developer's Guide

2.2 Setting Up the Sample Application

The in.named data service uses only one logical host, even when the cluster has more than one. The method implementations determine dynamically which logical host is in use. For example, if HA-in.named uses the hahost1 logical host, the in.named data is placed on the hahost1 diskset.

An administrator can place the boot file (the file named by the -b option of in.named) on any file system in the diskset that has enough space. However, the HA-in.named method implementations need a fixed starting point from which to find the boot file. The sample application places this starting point in the administrative file system, in the configuration file hainnamed.config under the hainnamed subdirectory. This file contains a single directory name: the directory, elsewhere on the logical host's multihosted disks, where the data actually resides. The configuration file therefore provides one level of indirection.

For our hahost1 logical host, the path name for the file hainnamed.config is:

/hahost1/hainnamed/hainnamed.config

In general, the path name for an arbitrary logical host would be:

/loghost/hainnamed/hainnamed.config

The HA-in.named methods determine dynamically which logical host HA-in.named uses by testing each logical host for the presence or absence of this configuration file.
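A minimal sketch of that search, written as a standalone sh function so it can be tried without a cluster. It assumes, hypothetically, that each logical host's administrative file system is mounted at ROOT/logical-host, mirroring the /hahost1 example; the real methods obtain the prefix with `haget -f pathprefix`. The function name find_innamed_lh and its ROOT argument are illustrative only.

```shell
# Hypothetical helper: print the first logical host whose administrative
# file system contains hainnamed/hainnamed.config.
#   $1    directory under which each logical host's administrative file
#         system is assumed to be mounted (stand-in for haget -f pathprefix)
#   $2..  logical host names to test, in order
# Prints the matching logical host and returns 0; returns 1 if none match.
find_innamed_lh() {
    root=$1
    shift
    for lh in "$@"; do
        if [ -f "$root/$lh/hainnamed/hainnamed.config" ]; then
            echo "$lh"
            return 0
        fi
    done
    return 1
}
```

Testing for a plain file, rather than a directory, matches the start_net method's check of hainnamed.config.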

For example, if file systems A1 through A5 reside on the hahost1 diskset, and the administrator chooses to locate the HA-in.named data in the directory /hahost1/A1/hainnamed, then the hainnamed.config file must contain that directory name.
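As a sketch of the one-time setup an administrator might perform for this example (on the physical host that currently masters hahost1, where its file systems are mounted), the function below creates the data directory and the configuration file that points to it. The function name and its root-prefix parameter are hypothetical additions so the steps can be rehearsed in a scratch directory; on a real cluster the prefix would be empty.

```shell
# Hypothetical helper: create the HA-in.named indirection for the
# example layout used in this section.
#   $1  root prefix ("" on a real cluster; a scratch directory for testing)
setup_hainnamed_config() {
    root=$1
    admin_dir="$root/hahost1/hainnamed"    # in the administrative file system
    data_dir="$root/hahost1/A1/hainnamed"  # where the data actually lives
    mkdir -p "$admin_dir" "$data_dir" || return 1
    # The configuration file holds a single directory name.
    echo "$data_dir" > "$admin_dir/hainnamed.config"
}
```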

In the /hahost1/A1/hainnamed directory, the administrator must create a named.boot file for in.named. (See the in.named(1M) man page for information about the contents of the named.boot file.) The administrator updates the in.named database by editing the named.boot file in this directory, just as he or she would edit the /etc/named.boot file in a non-HA in.named configuration. See "2.2.8 Administering HA-in.named: Updating the Database" for additional discussion of administration and updates.

2.2.1 Basic Functionality of the `in.named` Method Implementations

Consider the basic functionality of the HA-in.named method implementations. The start method is not registered for HA-in.named; all the work is done in the start_net method. Similarly, the stop method is not registered, and all the work is done in the stop_net method. The start_net method starts the in.named daemon, and the stop_net method stops it by sending it the TERM signal (SIGTERM).

The Sun Cluster API requires each method to be idempotent; that is, calling a method repeatedly must have the same effect as calling it once. HA-in.named achieves idempotency by having each method test whether its work has already been done: start_net tests whether the in.named daemon is already running, and stop_net tests whether it is already stopped.
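The test-before-acting pattern can be sketched without the process monitor. In this hypothetical stand-in, a PID file replaces `pmfadm -q`, and `kill -0` checks whether the recorded process is still alive, so calling ensure_running a second time leaves the first process in place instead of starting another. The function names and the PID-file approach are illustrative assumptions, not how HA-in.named itself tracks the daemon.

```shell
# Hypothetical idempotent start: run "$@" in the background only if the
# process recorded in PID file $1 is not already alive.
ensure_running() {
    pidfile=$1
    shift
    if [ -s "$pidfile" ] && kill -0 `cat "$pidfile"` 2>/dev/null; then
        return 0                      # already running: nothing to do
    fi
    "$@" &
    echo $! > "$pidfile"
}

# Hypothetical idempotent stop: send TERM only if the process is alive,
# then forget it. Calling this again is harmless.
ensure_stopped() {
    pidfile=$1
    if [ -s "$pidfile" ] && kill -0 `cat "$pidfile"` 2>/dev/null; then
        kill -TERM `cat "$pidfile"` || return 1
    fi
    rm -f "$pidfile"
}
```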

The Sun Cluster process monitor facility consists of two components: the pmfadm(1M) command and the rpc.pmfd(1M) process monitor daemon. In the sample application, the pmfadm(1M) command is used to start and kill the in.named daemon and to query whether it is already running. See the pmfadm(1M) and rpc.pmfd(1M) man pages for details.

The HA-in.named method implementations use the haget(1M) utility to extract information about the Sun Cluster configuration. (See the haget(1M) man page for details.) Because the code runs unattended, the method implementations log their error messages through syslog(3). They use the same syslog facility as Sun Cluster, which they obtain by running haget(1M) with the option -f syslog_facility.
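The logging convention can be wrapped in a small helper. This sketch assumes ARGV0 and SYSLOG_FACILITY have already been set as in the sample methods (the latter from `haget -f syslog_facility`); the name log_err is hypothetical. It uses only the standard `logger -p facility.level message` form that the sample methods already use.

```shell
# Hypothetical helper: log an error through syslog, tagged with the
# method's name. The caller must set ARGV0 and SYSLOG_FACILITY first, e.g.
#   ARGV0=`basename $0`
#   SYSLOG_FACILITY=`haget -f syslog_facility`
log_err() {
    logger -p "${SYSLOG_FACILITY}.err" "${ARGV0}: $*"
}
```

A call such as `log_err "pmfadm -c of in.named failed"` could then replace each multi-line logger invocation in the methods.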

2.2.2 `start_net` Method for the `in.named` Data Service

The following is a sample start_net method for the in.named data service.

#! /bin/sh
#
# Copyright 13 Apr 1996 Sun Microsystems, Inc. All Rights Reserved.
#
#ident	"@(#)innamed_start_net.sh	1.1	96/04/13 SMI"
#
# HA-in.named start_net method
#
# Argument 1: comma-separated list of the logical hosts that this
# physical host currently masters.

ARGV0=`basename $0`
SYSLOG_FACILITY=`haget -f syslog_facility`
MASTERED_LOGICAL_HOSTS="$1"

if [ -z "$MASTERED_LOGICAL_HOSTS" ]; then
	# This physical host does not currently master any logical hosts.
	exit 0
fi

# Replace commas with spaces to form an sh word list.
MASTERED_LOGICAL_HOSTS="`echo $MASTERED_LOGICAL_HOSTS | tr ',' ' '`"

# Dynamically search the list of logical hosts that this physical
# host currently masters, to see if one of them is the logical host
# that HA-in.named uses.
MYLH=
for LH in $MASTERED_LOGICAL_HOSTS ; do
	# Map the logical hostname to its administrative file system name.
	PATHPREFIX_FS=`haget -f pathprefix $LH`
	CONFIG="${PATHPREFIX_FS}/hainnamed/hainnamed.config"
	if [ -f "$CONFIG" ]; then
		MYLH=$LH
		break
	fi
done

if [ -z "$MYLH" ]; then
	# This host does not currently master the logical host
	# that HA-in.named uses.
	exit 0
fi

# This host currently masters the logical host that HA-in.named uses,
# $MYLH. If in.named is already running, exit: it must have been
# started during an earlier cluster reconfiguration, when this physical
# host first took over mastery of $MYLH. The pmfadm query succeeds only
# if in.named is already running under the process monitor.
if pmfadm -q hainnamed >/dev/null 2>&1 ; then
	exit 0
fi

HA_INNAMED_DIR=`cat "$CONFIG"`
if [ ! -d "$HA_INNAMED_DIR" ]; then
	logger -p ${SYSLOG_FACILITY}.err \
	    "${ARGV0}: directory $HA_INNAMED_DIR missing or not mounted"
	exit 1
fi

# cd to HA_INNAMED_DIR because named.boot contains the names of other
# files; this allows all of those names to be relative to the current
# directory.
cd "$HA_INNAMED_DIR" || exit 1
if [ ! -s named.boot ]; then
	logger -p ${SYSLOG_FACILITY}.err \
	    "${ARGV0}: file $HA_INNAMED_DIR/named.boot is missing or empty"
	exit 1
fi

# Run the in.named daemon under the control of the Sun Cluster process
# monitor facility. Let it crash and restart up to 4 times an hour
# (-n 4 -t 60); if it crashes more often than that, the process monitor
# daemon stops trying to restart it.
pmfadm -c hainnamed -n 4 -t 60 /usr/sbin/in.named -b named.boot
if [ $? -ne 0 ]; then
	logger -p ${SYSLOG_FACILITY}.err \
	    "${ARGV0}: pmfadm -c of in.named failed"
	exit 1
fi
exit 0
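Sun Cluster passes each method its logical host lists as comma-separated strings: the start_net method reads the hosts that this node masters from its first argument, and the stop_net method reads the hosts that it does not master from its second. The conversion to an sh word list can be checked in isolation with this hypothetical helper, which performs the same echo and tr step as the methods.

```shell
# Hypothetical helper: turn a comma-separated logical host list, as
# passed to the methods, into a space-separated sh word list.
to_word_list() {
    echo "$1" | tr ',' ' '
}
```

For example, ``for LH in `to_word_list "$1"`; do ... done`` would visit hahost1 and then hahost2 when the first argument is hahost1,hahost2.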

2.2.3 `stop_net` Method for the `in.named` Data Service

The following is a sample stop_net method for the in.named data service.

#! /bin/sh
#
# Copyright 13 Apr 1996 Sun Microsystems, Inc. All Rights Reserved.
#
#ident	"@(#)innamed_stop_net.sh	1.1	96/04/13 SMI"
#
# HA-in.named stop_net method
#
# Argument 2: comma-separated list of the logical hosts that this
# physical host should not master.

ARGV0=`basename $0`
SYSLOG_FACILITY=`haget -f syslog_facility`
NOT_MASTERED_LOGICAL_HOSTS="$2"

if [ -z "$NOT_MASTERED_LOGICAL_HOSTS" ]; then
	# This physical host currently masters all logical hosts.
	exit 0
fi

# Replace commas with spaces to form an sh word list.
NOT_MASTERED_LOGICAL_HOSTS="`echo $NOT_MASTERED_LOGICAL_HOSTS | tr ',' ' '`"

# Dynamically search the list of logical hosts that this physical
# host should not master, to see if one of them is the logical host
# that HA-in.named uses. There are two cases to consider:
# (1) This physical host gave up mastery of that logical host during
#     some earlier cluster reconfiguration. In that case, the
#     administrative file system for the logical host is no longer
#     mounted, so the <administrative file system>/hainnamed directory
#     does not exist. This method has no work to do, because the work
#     was done during the earlier reconfiguration in which this
#     physical host first gave up mastery of the logical host.
# (2) This cluster reconfiguration is the one in which this physical
#     host is giving up mastery of the logical host. In that case, the
#     administrative file system is still mounted when the stop_net
#     method is called, and the <administrative file system>/hainnamed
#     directory exists.
MYLH=
for LH in $NOT_MASTERED_LOGICAL_HOSTS ; do
	# Map the logical hostname to its administrative file system name.
	PATHPREFIX_FS=`haget -f pathprefix $LH`
	CONFIGDIR="${PATHPREFIX_FS}/hainnamed"
	if [ -d "$CONFIGDIR" ]; then
		MYLH=$LH
		break
	fi
done

if [ -z "$MYLH" ]; then
	# This host is not giving up mastery of the HA-in.named logical
	# host during this cluster reconfiguration.
	exit 0
fi

# This host is giving up mastery of the HA-in.named logical host, $MYLH,
# during this cluster reconfiguration.
#
# If in.named is running, tell the process monitor to kill it. If it is
# not running, then either it was killed during an earlier
# reconfiguration, when this physical host first gave up mastery of the
# logical host, or this physical host has not mastered the logical host
# since it last rebooted.
if pmfadm -q hainnamed >/dev/null 2>&1 ; then
	pmfadm -s hainnamed TERM
	if [ $? -ne 0 ]; then
		logger -p ${SYSLOG_FACILITY}.err \
		    "${ARGV0}: pmfadm -s of in.named failed"
		exit 1
	fi
fi
exit 0

2.2.4 `abort_net` Method for the `in.named` Data Service

The abort method is not registered for the HA-in.named example. The abort_net method uses the same code as the stop_net method; when HA-in.named is registered with Sun Cluster by the hareg(1M) utility, the abort_net registration points to the code used by stop_net.

2.2.5 Setting Timeout Values for `in.named` Methods

When you register your data service with hareg(1M), you can specify timeout values for the methods you have created, such as start_net, stop_net, and fm_start. However, each method timeout must be less than half the timeout for logical host takeover during cluster reconfiguration. The default logical host takeover timeout is 180 seconds, so if you set any method timeout above 90 seconds, you must increase the takeover timeout; otherwise, your methods will time out.
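The rule can be expressed as a quick arithmetic check. This sketch, with hypothetical function names, reads the requirement literally as "method timeout strictly less than half the takeover timeout", which is the same as requiring twice the method timeout to be less than the takeover timeout. It uses expr so that it also runs under an old Bourne shell.

```shell
# Hypothetical check: succeed if a method timeout (seconds) is less than
# half of the logical host takeover timeout (seconds, default 180).
method_timeout_ok() {
    method=$1
    takeover=${2:-180}
    [ `expr 2 '*' "$method"` -lt "$takeover" ]
}

# Hypothetical helper: smallest whole-second takeover timeout that
# satisfies the rule for a given method timeout.
min_takeover_timeout() {
    expr 2 '*' "$1" + 1
}
```

With the default 180-second takeover timeout, a 60-second method timeout passes, while a 120-second one needs a takeover timeout of at least 241 seconds.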

You can increase the logical host takeover timeout values with the scconf(1M) command. Refer to the scconf(1M) man page and to the section on configuring timeouts for cluster transition steps in Chapter 3 of the Sun Cluster 2.2 System Administration Guide for details.

2.2.6 Improving the `in.named` Methods

Consider some possible improvements to the start_net and stop_net methods for HA-in.named. The methods would benefit from better error detection and handling. For example, they could verify that the /usr/sbin/in.named binary exists, is executable, and is non-empty, and log an error message if it is not. Before running cat(1) on hainnamed.config, they could verify that the file exists, has the correct permissions, and is non-empty.
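The extra checks might look like the following sketch. check_innamed_prereqs is a hypothetical function; it takes the daemon path and the configuration file path as arguments so it can be exercised with scratch files, and it only reports problems, leaving the exit decision to the caller. Messages go to standard error here; in a method they would go to syslog as described earlier. "Correct permissions" is interpreted, as an assumption, as "readable".

```shell
# Hypothetical pre-flight checks for the start_net method.
#   $1  path of the in.named binary (normally /usr/sbin/in.named)
#   $2  path of hainnamed.config
# Returns 0 if all checks pass, 1 otherwise.
check_innamed_prereqs() {
    daemon=$1
    config=$2
    status=0
    if [ ! -f "$daemon" ] || [ ! -x "$daemon" ] || [ ! -s "$daemon" ]; then
        echo "$daemon is missing, not executable, or empty" >&2
        status=1
    fi
    if [ ! -f "$config" ] || [ ! -r "$config" ] || [ ! -s "$config" ]; then
        echo "$config is missing, unreadable, or empty" >&2
        status=1
    fi
    return $status
}
```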

The methods could also test for the existence of the non-HA in.named boot file /etc/named.boot. If that file exists, it is unclear whether this host is meant to run non-HA in.named or HA-in.named; only one can run at a time. The code can treat this case as a severe configuration error, log appropriate messages, and neither start nor kill in.named.
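A sketch of that guard follows. check_no_local_named and its optional path argument are hypothetical; the argument exists only so the check can be tried against a scratch file. A method would call it before doing any other work and, if it fails, log the error and exit without starting or killing in.named.

```shell
# Hypothetical guard: refuse to manage HA-in.named when a non-HA
# in.named configuration is present on this host.
#   $1  path to check (defaults to /etc/named.boot)
check_no_local_named() {
    local_boot=${1:-/etc/named.boot}
    if [ -f "$local_boot" ]; then
        echo "$local_boot exists: non-HA in.named may be configured;" \
             "not starting or stopping HA-in.named" >&2
        return 1
    fi
    return 0
}
```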

2.2.7 DNS Clients

In Solaris, a host that is a client of DNS has an /etc/resolv.conf file. The file lists name server hosts to contact for DNS service. The name server hosts are listed as IP addresses rather than host names. More than one host IP address might be listed.

Network clients of HA-in.named list the IP address of the logical host (for example, hahost1) in their /etc/resolv.conf files.

During some periods, a physical host does not master the logical host that HA-in.named uses, yet it must still be able to act as a client of HA-in.named. To allow this, add the IP address of the logical host to the /etc/resolv.conf file on every physical host in the cluster.
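A minimal sketch of adding the logical host's address to a resolver file on each physical host, without duplicating the entry if the step is repeated. The function name is hypothetical, the file path is a parameter so the step can be rehearsed on a copy of /etc/resolv.conf, and the code assumes a POSIX grep that supports -F and -x. Only an exact `nameserver address` line counts as a duplicate; variants with extra whitespace are not detected.

```shell
# Hypothetical helper: append "nameserver ADDR" to a resolv.conf-style
# file unless an identical line is already present.
#   $1  IP address of the logical host
#   $2  resolver file (defaults to /etc/resolv.conf)
add_nameserver() {
    addr=$1
    file=${2:-/etc/resolv.conf}
    if [ -f "$file" ] && grep -F -x "nameserver $addr" "$file" >/dev/null; then
        return 0                      # entry already present
    fi
    echo "nameserver $addr" >> "$file"
}
```

For example, running `add_nameserver 192.0.2.10` on each physical host would add the entry once; 192.0.2.10 is a documentation placeholder, not hahost1's real address.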

2.2.8 Administering `HA-in.named`: Updating the Database

Administration of HA-in.named resembles that of non-HA in.named. To update the in.named database, log in to the server rather than editing the files over NFS; granting root NFS access to the file system where the in.named data files are stored is a security risk. For HA-in.named, log in to the physical host that currently masters the logical host that HA-in.named is configured to use. Use the hastat(1M) utility to determine which physical host masters which logical hosts.

You update HA-in.named by editing its data files. Do this in a way that leaves the data files well-formed even if the system crashes suddenly. For example, after logging in, cd to the directory where the HA-in.named data is stored (in this example, /hahost1/A1/hainnamed). Then edit a temporary copy of the data file and, when you are finished, move the copy onto the real data file name. For example:

% cd /hahost1/A1/hainnamed
% cp named.boot named.boot.new
% vi named.boot.new
% sync
% mv named.boot.new named.boot

As explained in the in.named(1M) man page, you can then use the kill(1) command to send a SIGHUP signal to the in.named daemon, causing it to re-read its files.
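For scripted updates, the rename step and the SIGHUP can be combined as in the following sketch. install_and_reload is hypothetical: it assumes the new file has already been written in the same directory as the live file (so that mv is a rename within one file system) and that the caller supplies the in.named process ID, for example as found with ps(1).

```shell
# Hypothetical helper: replace a data file with a prepared copy, then
# ask the daemon to re-read its files.
#   $1  prepared copy (for example named.boot.new), same directory as $2
#   $2  live data file (for example named.boot)
#   $3  process ID of the in.named daemon
install_and_reload() {
    new=$1
    live=$2
    pid=$3
    if [ ! -s "$new" ]; then
        echo "$new is missing or empty" >&2
        return 1
    fi
    sync
    mv "$new" "$live" || return 1
    kill -HUP "$pid"
}
```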

2.2.9 Documenting `HA-in.named`

You must document the installation and configuration of the highly available data service. This documentation must explain how to configure any administrative files that live in the administrative file system, and how to install the data service's data on one or more of the logical host's file systems or raw partitions. You should also document administration history and updates for the HA version of your data service.
