Sun Cluster 2.2 System Administration Guide

Configuring Timeouts for Cluster Transition Steps

Sun Cluster has configurable timeouts for the cluster transition steps where logical hosts of the HA framework are taken over and given up as cluster membership changes. Adapt these timeouts as needed to effectively handle configurations consisting of large numbers of data services on each node. It is impractical to have constant timeout values for a wide variety of configurations, unless the timeouts are set to a very large default value.

There are essentially two considerations when tuning timeouts:

Number of logical hosts per cluster node
Number of data services on a logical host

The correct values for a particular installation are difficult to estimate in advance and are best arrived at by trial and error. As a guideline, use the cluster console messages that mark the beginning and end of each cluster transition step. They indicate how long each step takes to execute.

The timeouts must account for the worst-case scenario. When you configure cluster timeouts, take into consideration the maximum number of logical hosts that a cluster node can potentially master at any time.

For example, in an N+1 configuration, the standby node can potentially master all the logical hosts of the other cluster nodes. In this case, the reconfiguration timeouts must be large enough to accommodate the time needed to master all of the logical hosts configured in the cluster.
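The worst-case reasoning above can be sketched as a small shell calculation. This is only an illustration, not part of Sun Cluster: the function name estimate_step_timeout and the 50 percent default safety margin are hypothetical, and the sketch assumes that the time to master logical hosts adds up across hosts. Take the per-host time from your own console message observations.

```shell
#!/bin/sh
# Hypothetical helper (not a Sun Cluster command): estimate a worst-case
# transition step timeout, in seconds.
#   $1 = maximum number of logical hosts one node may master
#        (in an N+1 configuration, all logical hosts in the cluster)
#   $2 = longest observed time, in seconds, to master one logical host
#   $3 = safety margin in percent (assumed default: 50)
estimate_step_timeout() {
    max_hosts=$1
    secs_per_host=$2
    margin_pct=${3:-50}
    # Assumption: mastering times are additive across logical hosts.
    base=$((max_hosts * secs_per_host))
    echo $((base + base * margin_pct / 100))
}

# Example: a standby node that may master 4 logical hosts,
# each observed to take up to 90 seconds.
estimate_step_timeout 4 90
```

Treat the result as a starting point for trial and error rather than a final value.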

How to Adjust Cluster Timeouts

Adjust the cluster reconfiguration timeouts by using the scconf -T command.

For example, to change the configurable transition step timeout values to 500 seconds, run the following command on all cluster nodes.
# scconf clustername -T 500
The default value for each of these steps is 720 seconds. Use the scconf -p command to see the current timeout values.

Within the reconfiguration steps, the time taken to master a single logical host can vary depending on how many data services are configured on each logical host. If there is insufficient time to master a logical host (that is, if the loghost_timeout parameter is too small), messages similar to the following appear on the console:
ID[SUNWcluster.ccd.ccdd.5001]: error freeze cmd = command /opt/SUNWcluster/bin/loghost_sync timed out.
The cluster framework makes a "best effort" to bring the system to a consistent state by attempting to give up the logical host. If this is not successful, the node may abort from the cluster to prevent inconsistencies.

Use the -l option of the scconf command to adjust the loghost_timeout parameter.

The default is 180 seconds.

Note -
The reconfiguration step timeouts must never be less than the loghost_timeout value. If you attempt such a setting, an error results and the cluster configuration file is not modified. Both scconf -T and scconf -l verify this requirement. In addition, a warning is printed if either of these timeouts is set to 100 seconds or less.
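
The rules in the Note above can be checked before you run scconf. The following sketch is a hypothetical pre-check, not part of Sun Cluster: the function name check_timeouts and its message text are assumptions chosen for illustration. It only mirrors the two documented rules and does not read or modify the cluster configuration.

```shell
#!/bin/sh
# Hypothetical pre-check (not part of scconf) that mirrors the Note:
#   $1 = proposed transition step timeout, in seconds
#   $2 = proposed loghost_timeout, in seconds
# Returns 1 if the step timeout is less than loghost_timeout.
# Prints a warning if either value is 100 seconds or less.
check_timeouts() {
    step=$1
    loghost=$2
    if [ "$step" -lt "$loghost" ]; then
        echo "error: step timeout $step is less than loghost_timeout $loghost"
        return 1
    fi
    if [ "$step" -le 100 ] || [ "$loghost" -le 100 ]; then
        echo "warning: a timeout is set to 100 seconds or less"
    fi
    return 0
}

# The documented defaults (720 and 180 seconds) pass without any message.
check_timeouts 720 180
```

Running such a check first lets you confirm a pair of values on paper before issuing scconf -T or scconf -l on every node.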