Sun Cluster 2.2 API Developer's Guide

Fault Monitoring Methods for the in.named Data Service

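Sun Cluster lets the author of an HA data service write fault monitoring methods for that service. For example, you can write a fault monitor for in.named that periodically queries in.named with nslookup(1M). If a query times out even though a very long timeout value is used, the fault monitor concludes that the in.named daemon is hung and must be killed and restarted.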

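Fault monitoring runs only on the physical host where in.named is running, that is, the host that masters the logical host used by in.named. It does not run on physical hosts that do not master that logical host.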

The fault monitor is started by the FM_START method and stopped by the FM_STOP method. No FM_INIT method is needed, so HA-in.named does not register FM_INIT when hareg(1M) is called.

The following example shows the FM_START method for the in.named data service.


#! /bin/sh
 # Copyright 26 Oct 1996 Sun Microsystems, Inc.  All Rights Reserved.
 #ident "@(#)innamed_fm_start.sh  1.1  96/04/13 SMI"
 # HA in.named fm_start method
 # Called back by Sun Cluster as the FM_START method for HA in.named.
 #
 ARGV0=`basename $0`
 SYSLOG_FACILITY=`haget -f syslog_facility`

 MASTERED_LOGICAL_HOSTS="$1"
 if [ -z "$MASTERED_LOGICAL_HOSTS" ]; then
 		# This physical host does not currently master any logical hosts.
 		exit 0
 fi

 # Replace comma with space to form an sh word list:
 MASTERED_LOGICAL_HOSTS="`echo $MASTERED_LOGICAL_HOSTS | tr ',' ' '`"

 # Dynamically search the list of logical hosts which this physical
 # host currently masters, to see if one of them is the logical host
 # that HA-in.named uses.

 MYLH=
 for LH in $MASTERED_LOGICAL_HOSTS ; do
 	# Map logical hostname to administrative file system name:
 	PATHPREFIX_FS=`haget -f pathprefix $LH`
 	CONFIG="${PATHPREFIX_FS}/hainnamed/hainnamed.config"

 	if [ -f $CONFIG ]; then
 			MYLH=$LH
 			break
 	fi
 done
 if [ -z "$MYLH" ]; then
 	# This host does not currently master the logical host
 	# that HA-in.named uses.
 	exit 0
 fi

 # This host currently masters the logical host that HA in.named uses,
 # $MYLH.
 # Create an asynchronous process to periodically probe the in.named
 # daemon, under the control of the process monitor facility.
 # The asynchronous probe is in its own shell script:
 #     hainnamed_fmprobe
 # The asynchronous process will be terminated by the FM_STOP method.
 pmfadm -c hainnamedfm hainnamed_fmprobe $MYLH
 exit 0
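
The host-selection loop in the FM_START method can be tried outside a cluster with a small stand-in. The sketch below is illustrative only: pathprefix_of is a hypothetical replacement for `haget -f pathprefix`, and the ADMIN_ROOT variable and the find_service_host name are assumptions made for this example, not part of Sun Cluster.

```shell
#!/bin/sh
# Hypothetical stand-in for `haget -f pathprefix HOST`: this sketch assumes
# each logical host's administrative file system is at $ADMIN_ROOT/HOST.
pathprefix_of() {
    echo "${ADMIN_ROOT}/$1"
}

# find_service_host COMMA_SEPARATED_HOSTS
# Prints the first logical host that holds the HA-in.named config file.
# Returns 1 if none of the listed hosts holds it (or the list is empty).
find_service_host() {
    # Turn "a,b,c" into the sh word list "a b c".
    host_list=$(echo "$1" | tr ',' ' ')
    for lh in $host_list; do
        config="$(pathprefix_of "$lh")/hainnamed/hainnamed.config"
        if [ -f "$config" ]; then
            echo "$lh"
            return 0
        fi
    done
    return 1
}
```

A caller would treat a non-zero return the same way the FM_START method treats an empty MYLH: this physical host has nothing to monitor, so it exits quietly with status 0.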

The following example shows the FM_STOP method for the in.named data service.


#! /bin/sh
 #
 # Copyright 26 Oct 1996 Sun Microsystems, Inc.  All Rights Reserved.
 #
 #ident "@(#)innamed_fm_stop.sh  1.1  96/04/13 SMI"
 #
 # HA in.named fm_stop method
 #
 # Called back by Sun Cluster as the FM_STOP method for HA in.named.
 #
 # Stop the asynchronous fault monitoring process that was created
 # earlier under the control of pmfd.
 #
 # Ignore errors when calling pmfadm just in case the hainnamed_fmprobe
 # is already not running.  Reasons for it being already not running
 # include the fact that it is started only on the physical host that
 # currently masters the logical host, the fact that FM_STOP can be
 # called even though FM_START has not been called, and the fact
 # that it may have died an early death all by itself.
 pmfadm -s hainnamedfm TERM >/dev/null 2>&1
 exit 0
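
The FM_STOP method above always exits 0, because "nothing to stop" is a normal outcome. The same tolerant-stop pattern can be sketched without pmfadm. In the sketch below, stop_probe and its PID file argument are hypothetical and stand in for `pmfadm -s`; they are not part of Sun Cluster.

```shell
#!/bin/sh
# stop_probe PIDFILE
# Hypothetical stand-in for `pmfadm -s`: sends TERM to the process recorded
# in PIDFILE, if there is one, and always reports success.
stop_probe() {
    pidfile=$1
    if [ -f "$pidfile" ]; then
        read probe_pid < "$pidfile" || :
        # The process may already be gone; that is not an error here.
        kill -TERM "$probe_pid" >/dev/null 2>&1 || :
        rm -f "$pidfile"
    fi
    return 0
}
```

Calling stop_probe twice in a row, or on a host where the probe was never started, succeeds both times, which matches the situations listed in the comments of the FM_STOP method.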

The following example shows hainnamed_fmprobe, the probe script for the in.named data service. The FM_START method starts this script under the control of the process monitor facility.


#! /bin/sh
 #
 # Copyright 26 Oct 1996 Sun Microsystems, Inc.  All Rights Reserved.
 #
 #ident "@(#)hainnamed_fmprobe.sh  1.1  96/04/13 SMI"
 #
 # Usage: hainnamed_fmprobe logical_host
 #
 # Periodically probes the in.named running on the logical_host.
 # If the probe times out, then this script will query the pmfd to
 # see if the pmfd is still running in.named:
 # (i) if so, this script assumes that in.named is hung and
 # sends a KILL signal to the in.named process, causing it to
 # die.  pmfd will restart in.named provided it has not used
 # up its ration of restarts per time period.
 # (ii) if not, this script will assume that in.named has exhausted
 # its ration of restarts.  This script will call hactl -g to give up
 # mastery of the logical host to some other new master physical host.
 #
 ARGV0=`basename $0`
 LOGICAL_HOST="$1"
 SYSLOG_FACILITY=`haget -f syslog_facility`
 PROBE_INTERVAL_SECS=60
 MIN_PROBE_SECS=`hactl -f min_probe_timeout_secs`
 PROBE_TIMEOUT_SECS=`expr $MIN_PROBE_SECS + 180`
 CLUSTER_KEY=`hactl -f cluster_key`
 NSLOOKUP=/usr/sbin/nslookup
 if [ ! -x $NSLOOKUP  -o  ! -s $NSLOOKUP ]; then
 	logger -p ${SYSLOG_FACILITY}.err \
 		"${ARGV0}: $NSLOOKUP does not exist or is not executable"
 	exit 1
 fi

 while true; do
 	# Call nslookup under a timeout, using hatimerun.
 	# The -norecurse option tells in.named not to consult
 	# other name service instances on other hosts beyond the
 	# one on $LOGICAL_HOST.
 	# The -retry=10000 is telling nslookup to take forever
 	# retrying: this means that for a hung server, nslookup
 	# will never itself give up, rather, the timeout on hatimerun
 	# will expire first.
 	hatimerun -t $PROBE_TIMEOUT_SECS \
 		$NSLOOKUP -norecurse -retry=10000 $LOGICAL_HOST $LOGICAL_HOST
 	if [ $? -ne 99 ]; then
 			sleep $PROBE_INTERVAL_SECS
 			continue
 	fi

 	# Here when the timeout occurred.
 	logger -p ${SYSLOG_FACILITY}.err \
 		"${ARGV0}: nslookup of in.named on $LOGICAL_HOST timed-out"
 	if pmfadm -q hainnamed; then
 			# The in.named process exists.  Kill it on the
 			# assumption that it is hung.  Sleep a short time,
 			# and if hainnamed still exists in the pmfd, assume
 			# that pmfd is restarting it (it has not yet used
 			# up its ration of restarts per time interval.)
 			logger -p ${SYSLOG_FACILITY}.err \
 				"${ARGV0}: KILLing hung in.named"
 			pmfadm -k hainnamed KILL
 			sleep 30
 			if pmfadm -q hainnamed; then
 					continue
 			fi
 	fi
 	# Here when pmfadm -q says that hainnamed no longer
 	# exists in pmfd.  Assume that the ration of restarts
 	# was exhausted.  Also assume that something is amiss
 	# that moving to a new master could improve.
 	logger -p ${SYSLOG_FACILITY}.err \
 		"${ARGV0}: in.named restarted too many times, not restarting"
 	logger -p ${SYSLOG_FACILITY}.err \
 		"${ARGV0}: giving up mastery of $LOGICAL_HOST"
 	hactl -g -s hainnamed -k $CLUSTER_KEY -l $LOGICAL_HOST
 done
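
The probe depends on hatimerun to bound how long nslookup may run, and it treats exit status 99 as "timed out". On a machine without Sun Cluster, the same idea can be approximated in portable sh. The sketch below is an assumption-laden illustration, not the hatimerun implementation: run_with_timeout is a hypothetical helper, it relies on mktemp being available, and it returns 99 on expiry only to mirror the value the probe script tests for.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARG...]
# Hypothetical helper, not part of Sun Cluster. Runs COMMAND and returns its
# exit status, or 99 if COMMAND had to be killed after SECONDS seconds.
run_with_timeout() {
    limit_secs=$1
    shift
    # Empty marker file; the watchdog writes to it only if the limit expires.
    flag=$(mktemp) || return 1

    "$@" &
    cmd_pid=$!

    # Watchdog: after the limit, record the expiry and kill the command.
    (
        sleep "$limit_secs"
        echo expired > "$flag"
        kill -KILL "$cmd_pid"
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    status=0
    wait "$cmd_pid" 2>/dev/null || status=$?

    # The command is gone; stop the watchdog if it is still waiting.
    kill "$watchdog_pid" 2>/dev/null || :
    wait "$watchdog_pid" 2>/dev/null || :

    if [ -s "$flag" ]; then
        status=99
    fi
    rm -f "$flag"
    return "$status"
}
```

With this helper, a call such as `run_with_timeout 240 nslookup ...` could be followed by the same `if [ $? -ne 99 ]` test that the probe loop uses. The marker file is what separates a genuine timeout from a command that merely failed, so a non-zero status other than 99 still means the server answered within the limit.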