When a heartbeat is lost, you can configure the Sun Cluster Geographic Edition software to send email notification or to execute an action script. You configure loss of heartbeat notification by using the optional Notification_emailaddrs and Notification_actioncmd properties.
Heartbeat-loss notification occurs if the heartbeat still fails after the interval you configure with the Query_interval property of the heartbeat. The heartbeat monitor sends out a heartbeat request to the responder on the logical host every Query_interval period. If no response is received within the Query_interval, an internal count is incremented. If the count reaches the number that is specified in the heartbeat.retries property, the heartbeat is deemed to have failed.
For example, suppose you use the default Query_interval of 120 seconds and the default heartbeat.retries of 3. The heartbeat-lost event is then sent at most 10 minutes after the last heartbeat response from the partner cluster, calculated as follows:
120 sec (delay since last query) + 3*120 sec (wait for normal response) + 120 sec (wait for retry response) = 600 sec
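The same arithmetic can be sketched as a small shell function for other settings. The function name max_heartbeat_delay is hypothetical, and applying the formula to non-default values is an assumption based on the example above, not a documented guarantee.

```shell
#!/bin/sh
# Hypothetical helper: estimates the maximum number of seconds between the
# last heartbeat response and the heartbeat-lost event.
# Assumption: the formula from the default-value example applies generally.
#   $1 = Query_interval in seconds
#   $2 = heartbeat.retries
max_heartbeat_delay() {
    interval=$1
    retries=$2
    # delay since last query + wait for normal responses + wait for retry response
    expr "$interval" + "$retries" \* "$interval" + "$interval"
}

# Default values from the example: 120 + 3*120 + 120
max_heartbeat_delay 120 3
```

With the default values, the function prints 600, which is the 10-minute maximum described above.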
Additional delays can occur between the generation of the heartbeat-lost system event and the triggering of the heartbeat-loss notification. If you configure email notification, you might experience some further delay in the delivery of the email.
A heartbeat loss event does not necessarily indicate that the remote cluster has crashed.
The following sections describe how to configure the heartbeat-loss notification properties and how to create a custom action script that the Sun Cluster Geographic Edition software executes after a heartbeat-loss event.
You can configure loss of heartbeat notification by using two partnership properties, Notification_emailaddrs and Notification_actioncmd. You set these properties by using the geops command.
You can set these properties on the default heartbeat during partnership creation. For more information, see How to Create a Partnership. You can also modify these properties by using the procedure that is described in How to Modify the Heartbeat Properties.
If you want to be notified of heartbeat loss by email, set the Notification_emailaddrs property. You can specify a list of email addresses, separated by commas. If you want to use email notification, the cluster nodes must be configured as email clients. For more information about configuring mail services, see the Solaris System Administration Guide: Network Services.
If you want a command to be executed in response to heartbeat loss, set the Notification_actioncmd property.
The following example specifies notification email addresses and a custom notification script for the existing partnership, paris-newyork-ps:
phys-paris-1# geops set-prop \
-p Notification_emailaddrs=ops@paris.com,ops@newyork.com \
-p Notification_actioncmd=/opt/hb_action.sh paris-newyork-ps
You can create an action shell script that is executed when the local cluster detects a loss of heartbeat with the partner cluster. The script is executed with root permissions, so the file must have root ownership and execution permissions.
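The ownership and permission requirement can be checked before you set the Notification_actioncmd property. The following is a minimal sketch. The helper name check_action_script is hypothetical, and the optional second argument exists only so that the check can be tried out by a non-root user.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file exists, is executable,
# and is owned by the expected UID (default 0, that is, root).
#   $1 = path to the action script
#   $2 = expected owner UID (optional, defaults to 0)
check_action_script() {
    file=$1
    want_uid=${2:-0}
    [ -f "$file" ] || return 1
    [ -x "$file" ] || return 1
    # The third field of "ls -ln" output is the numeric owner ID.
    owner=`ls -ln "$file" | awk '{print $3}'`
    [ "$owner" = "$want_uid" ]
}

# Typical preparation, run as root (the path is taken from the earlier example):
#   chown root /opt/hb_action.sh
#   chmod 700 /opt/hb_action.sh
#   check_action_script /opt/hb_action.sh && echo "script is ready"
```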
If you have configured the Notification_actioncmd property, the action command is executed with arguments that provide information about the event in the following command line:
# custom-action-command-path -c local-cluster-name -r remote-cluster-name -e 1 \
-n node-name -t time
custom-action-command-path
    Specifies a path to the action command you have created

-c local-cluster-name
    Specifies the name of the local cluster

-r remote-cluster-name
    Specifies the name of the remote partner cluster

-e 1
    Specifies that HBLOST=1, meaning that a heartbeat-loss event has occurred

-n node-name
    Specifies the name of the cluster node that sent the heartbeat-loss event notification

-t time
    Specifies the time of the heartbeat-loss event as the number of milliseconds since January 1, 1970, 00:00:00 GMT
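Because the -t value is in milliseconds, an action script that compares it with second-based timestamps needs to convert it first. The following is a minimal sketch. The function name ms_to_seconds and the sample timestamp are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: converts a millisecond timestamp (the -t argument)
# to whole seconds by dropping the last three digits. This avoids shell
# arithmetic on large numbers. Assumes the input has at least four digits.
ms_to_seconds() {
    echo "$1" | sed 's/[0-9][0-9][0-9]$//'
}

# Illustrative value only
ms_to_seconds 1700000000123
```

For the sample value, the function prints 1700000000.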
You can use this script to perform an automatic takeover on the secondary cluster. However, such an automated action is risky. If the heartbeat loss notification is caused by a total loss of all heartbeat connectivity on both the primary and secondary clusters, such an automated action could lead to a situation where two primary clusters exist.
This example shows how a notification action shell script parses the event information that is provided on the command line.
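#!/bin/sh
# Parse the event arguments that are passed to the action command.
USAGE="Usage: $0 -c local-cluster -r remote-cluster -e event -n node -t time"

while getopts c:r:e:n:t: opt
do
    case $opt in
    c) LOCAL_CLUSTER=$OPTARG;;
    r) PARTNER_CLUSTER=$OPTARG;;
    e) HB_EVENT=$OPTARG;;
    n) EVENT_NODE=$OPTARG;;
    t) EVENT_TIME=$OPTARG;;
    *) echo "$USAGE" >&2
       exit 2;;
    esac
done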
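Given the risk of automated takeover described earlier, one conservative use of the parsed values is to record the event and leave the takeover decision to an administrator. The following is a minimal sketch of that approach. The function names, the message format, the syslog priority, and the cluster names in the usage line are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: builds a one-line summary of a heartbeat-loss event.
#   $1 = local cluster, $2 = partner cluster, $3 = node, $4 = time in ms
format_hb_message() {
    echo "heartbeat lost: local=$1 partner=$2 node=$3 time_ms=$4"
}

# Hypothetical helper: writes the summary to syslog. It performs no takeover.
log_hb_event() {
    msg=`format_hb_message "$@"`
    logger -p daemon.warning "$msg"
}

# Illustrative values only. In an action script, pass the parsed variables:
#   log_hb_event "$LOCAL_CLUSTER" "$PARTNER_CLUSTER" "$EVENT_NODE" "$EVENT_TIME"
format_hb_message cluster-paris cluster-newyork phys-paris-1 1700000000123
```

Keeping the message formatting separate from the logger call makes it possible to check the output without writing to the system log.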