Sun Cluster 3.0 Data Services Developers' Guide

Starting and Stopping a Resource

A resource type implementation requires, at a minimum, a START method and a STOP method. The RGM calls a resource type's method functions or programs at the appropriate times and on the appropriate nodes to bring resource groups offline and online. For example, after the crash of a cluster node, the RGM moves any resource groups mastered by that node onto a new node. You must implement a START method to provide the RGM with a way of restarting each resource on the surviving host node.

A START method must not return until the resource has been started and is available on the local node. Make sure that resource types requiring a long initialization period have sufficiently long timeouts on their START methods: set default and minimum values for the Start_timeout property in the resource type registration file.
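The requirement that START return only once the resource is available can be sketched as a polling loop. The Python below is an illustration only, not Sun Cluster API code: launch, probe, wait_until_ready, and start_method are hypothetical names, and the nonzero failure status is an assumption. The internal deadline is assumed to be chosen somewhat shorter than Start_timeout so that the method can report failure itself.

```python
import time


def wait_until_ready(probe, timeout_s, interval_s=1.0):
    """Poll probe() until it returns True or timeout_s elapses.

    probe is a hypothetical callable that checks whether the resource
    is actually serving requests on the local node.
    Returns True if the resource became available, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def start_method(launch, probe, timeout_s, interval_s=1.0):
    """Hypothetical START method body.

    Launches the resource, then blocks until it is available.
    Returns 0 on success; the nonzero failure value is an assumption.
    """
    launch()
    return 0 if wait_until_ready(probe, timeout_s, interval_s) else 1
```

The point of the design is that launching the process is not treated as success: the method returns only after the probe confirms availability or the deadline passes.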

You must implement a STOP method for situations in which the RGM takes a resource group offline. For example, suppose a resource group is taken offline on Node1 and back online on Node2. While taking the resource group offline, the RGM calls the STOP method on resources in the group to stop all activity on Node1. After the STOP methods for all resources have completed on Node1, the RGM brings the resource group back online on Node2.

A STOP method must not return until the resource has completely stopped all its activity on the local node and has completely shut down. The safest implementation of a STOP method terminates all processes on the local node that are related to the resource. Resource types that take a long time to shut down should have sufficiently long timeouts on their STOP methods; set the Stop_timeout property in the resource type registration file.
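One way to terminate all processes related to a resource is to request an orderly shutdown first and force termination only if that fails. The sketch below assumes the process IDs of the resource are already known (how they are found is outside this example) and uses only standard POSIX signals through Python's os.kill. The function names and the kill parameter, which exists so the logic can be exercised without real processes, are hypothetical.

```python
import os
import signal
import time


def _alive(pid, kill):
    """Return True if pid still exists (signal 0 only checks existence)."""
    try:
        kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def _signal_all(pids, sig, kill):
    """Send sig to every pid, ignoring processes that are already gone."""
    for pid in pids:
        try:
            kill(pid, sig)
        except ProcessLookupError:
            pass


def _wait_gone(pids, timeout_s, poll_s, kill):
    """Wait up to timeout_s; return the pids that are still alive."""
    deadline = time.monotonic() + timeout_s
    remaining = [p for p in pids if _alive(p, kill)]
    while remaining and time.monotonic() < deadline:
        time.sleep(poll_s)
        remaining = [p for p in remaining if _alive(p, kill)]
    return remaining


def stop_processes(pids, grace_s, kill_wait_s=5.0, poll_s=0.5, kill=os.kill):
    """Send SIGTERM, wait grace_s, then SIGKILL any survivors.

    Returns True only when every process is confirmed gone.
    """
    _signal_all(pids, signal.SIGTERM, kill)
    remaining = _wait_gone(pids, grace_s, poll_s, kill)
    if remaining:
        _signal_all(remaining, signal.SIGKILL, kill)
        remaining = _wait_gone(remaining, kill_wait_s, poll_s, kill)
    return not remaining
```

In this sketch the total of grace_s and kill_wait_s is assumed to be kept below Stop_timeout, so that the method finishes its escalation before its timeout expires.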

Failure or timeout of a STOP method causes the resource group to enter an error state that requires operator intervention. To avoid this state, the STOP and MONITOR_STOP method implementations should attempt to recover from all possible error conditions. Ideally, these methods exit with status 0 (success), having stopped all activity of the resource and its monitor on the local node.
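The error-recovery advice can be sketched as a wrapper that tolerates failures of individual cleanup steps and bases the exit status only on whether the resource is finally confirmed stopped. This is an illustrative Python sketch under assumptions, not a Sun Cluster interface: run_stop, steps, and confirm_stopped are hypothetical names, and the nonzero failure status is an assumption.

```python
def run_stop(steps, confirm_stopped, log=print):
    """Run every cleanup step, then report whether the resource is stopped.

    steps: hypothetical list of callables (for example, shut down the
        application, remove a lock file). A failing step is logged and
        the remaining steps still run.
    confirm_stopped: hypothetical callable returning True when no
        activity of the resource remains on the local node.
    Returns 0 (success) if the resource is confirmed stopped, else 1.
    """
    for step in steps:
        try:
            step()
        except Exception as exc:
            name = getattr(step, "__name__", repr(step))
            log(f"stop step {name} failed, continuing: {exc}")
    try:
        stopped = bool(confirm_stopped())
    except Exception as exc:
        log(f"could not confirm stop: {exc}")
        stopped = False
    return 0 if stopped else 1
```

A method program built this way would pass the result to sys.exit. A resource that was already stopped when the method ran is confirmed stopped and therefore yields status 0, so a repeated call does not by itself produce a failure.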