Sun Cluster 3.1 10/03 Data Services Developer's Guide

Controlling an Application

Callback methods enable the RGM to take control of the underlying resource (application) whenever nodes are in the process of joining or leaving the cluster.

Starting and Stopping a Resource

A resource type implementation requires, at a minimum, a Start method and a Stop method. The RGM calls a resource type's method programs at appropriate times and on the appropriate nodes for bringing resource groups offline and online. For example, after the crash of a cluster node, the RGM moves any resource groups mastered by that node onto a new node. You must implement a Start method to provide the RGM with a way of restarting each resource on the surviving host node.

A Start method must not return until the resource has been started and is available on the local node. Be certain that resource types requiring a long initialization period have sufficiently long timeouts set on their Start methods. To do so, set default and minimum values for the Start_timeout property in the resource type registration file.
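The "do not return until available" rule can be illustrated with a minimal Python sketch. It is not taken from this guide: the names start_and_wait, launch, and is_available are hypothetical, and the local deadline exists only so that the sketch terminates; it does not replace the Start_timeout property.

```python
import time


def start_and_wait(launch, is_available, timeout_s, poll_interval_s=1.0):
    """Launch the application, then block until it reports available.

    launch and is_available are hypothetical callables supplied by the
    caller. Returns 0 once the application is available, or 1 if the
    local deadline passes first.
    """
    launch()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_available():
            return 0  # started and available: safe to return
        time.sleep(poll_interval_s)
    return 1  # never became available within the deadline
```

The point of the sketch is only the ordering: the function returns success after the availability check passes, never immediately after launching.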

You must implement a Stop method for situations in which the RGM takes a resource group offline. For example, suppose a resource group is taken offline on Node1 and back online on Node2. While taking the resource group offline, the RGM calls the Stop method on resources in the group to stop all activity on Node1. After the Stop methods for all resources have completed on Node1, the RGM brings the resource group back online on Node2.

A Stop method must not return until the resource has completely stopped all its activity on the local node and has completely shut down. The safest implementation of a Stop method would terminate all processes on the local node related to the resource. Resource types requiring a long time to shut down should have sufficiently long timeouts set on their Stop methods. Set the Stop_timeout property in the resource type registration file.

Failure or timeout of a Stop method causes the resource group to enter an error state that requires operator intervention. To avoid this state, the Stop and Monitor_stop method implementations should attempt to recover from all possible error conditions. Ideally, these methods should exit with status 0 (success), having successfully stopped all activity of the resource and its monitor on the local node.
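As a rough illustration of a Stop method that does not return until the process is gone, the following Python sketch asks a process to terminate, waits for a grace period, and then kills it. The function name stop_resource and the use of a subprocess.Popen handle are assumptions made for the example; how a real method locates its processes is outside the scope of this sketch.

```python
import subprocess


def stop_resource(proc, grace_s=10.0):
    """Stop a child process and return only after it has exited.

    proc is a subprocess.Popen handle (an assumption of this sketch).
    Returns 0 in every case where the process is no longer running.
    """
    if proc.poll() is not None:
        return 0  # already stopped: nothing to do
    proc.terminate()  # polite request first
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate if the grace period expires
        proc.wait()
    return 0
```

Calling the function a second time on the same handle also returns 0, which matches the goal of recovering from the "already stopped" condition rather than reporting it as a failure.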

Deciding Which Start and Stop Methods to Use

This section provides some tips about when to use the Start and Stop methods versus using the Prenet_start and Postnet_stop methods. You must have in-depth knowledge of both the client and the data service's client-server networking protocol to decide which methods to use.

Services that use network address resources might require that start or stop steps be done in a particular order that is relative to the logical hostname address configuration. The optional callback methods Prenet_start and Postnet_stop allow a resource type implementation to do special start-up and shutdown actions before and after network addresses in the same resource group are configured up or configured down.

The RGM calls methods that plumb (but do not configure up) the network addresses before calling the data service's Prenet_start method. The RGM calls methods that unplumb the network addresses after calling the data service's Postnet_stop method. When the RGM takes a resource group online, the sequence is as follows:

  1. Plumb network addresses.

  2. Call data service's Prenet_start method (if any).

  3. Configure network addresses up.

  4. Call data service's Start method (if any).

The reverse happens when the RGM takes a resource group offline:

  1. Call data service's Stop method (if any).

  2. Configure network addresses down.

  3. Call data service's Postnet_stop method (if any).

  4. Unplumb network addresses.
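The two sequences above can be modeled with a small Python sketch. It only reproduces the ordering described in this section; the function names and step labels are hypothetical, and the boolean flags stand for whether the resource type implements each optional method.

```python
def online_sequence(has_prenet_start=True, has_start=True):
    """Return the ordered steps for bringing a resource group online."""
    steps = ["plumb network addresses"]
    if has_prenet_start:
        steps.append("Prenet_start")
    steps.append("configure network addresses up")
    if has_start:
        steps.append("Start")
    return steps


def offline_sequence(has_stop=True, has_postnet_stop=True):
    """Return the ordered steps for taking a resource group offline."""
    steps = []
    if has_stop:
        steps.append("Stop")
    steps.append("configure network addresses down")
    if has_postnet_stop:
        steps.append("Postnet_stop")
    steps.append("unplumb network addresses")
    return steps
```

For example, online_sequence(has_prenet_start=False) yields the plumb, configure up, and Start steps in that order, which shows that Start always runs with the addresses already configured up.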

When deciding whether to use the Start, Stop, Prenet_start, or Postnet_stop methods, first consider the server side. When bringing online a resource group containing both data service application resources and network address resources, the RGM calls methods to configure up the network addresses before it calls the data service resource Start methods. Therefore, if a data service requires network addresses to be configured up at the time it starts, use the Start method to start the data service.

Likewise, when bringing offline a resource group that contains both data service resources and network address resources, the RGM calls methods to configure down the network addresses after it calls the data service resource Stop methods. Therefore, if a data service requires network addresses to be configured up at the time it stops, use the Stop method to stop the data service.

For example, to start or stop a data service, you might have to invoke the data service's administrative utilities or libraries. Sometimes, the data service has administrative utilities or libraries that use a client-server networking interface to perform the administration. That is, an administrative utility makes a call to the server daemon, so the network address might need to be up to use the administrative utility or library. Use the Start and Stop methods in this scenario.

If the data service requires that the network addresses be configured down at the time it starts and stops, use the Prenet_start and Postnet_stop methods to start and stop the data service. Consider whether your client software will respond differently depending on whether the network address or the data service comes online first after a cluster reconfiguration (either scha_control() with the SCHA_GIVEOVER argument or a switchover with scswitch). For example, the client implementation might do minimal retries, giving up soon after determining that the data service port is not available.

If the data service does not require the network address to be configured up when it starts, start it before the network interface is configured up. This ensures that the data service is able to respond immediately to client requests as soon as the network address has been configured up, and clients are less likely to stop retrying. In this scenario, use the Prenet_start method rather than the Start method to start the data service.

If you use the Postnet_stop method, the data service resource is still up when the network address is configured down, because the RGM invokes Postnet_stop only after the network address is configured down. As a result, the data service's TCP or UDP service port, or its RPC program number, always appears to be available to clients on the network, except when the network address also is not responding.

The decision to use the Start and Stop methods versus the Prenet_start and Postnet_stop methods, or to use both, must take the requirements and behavior of both the server and client into account.
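The server-side part of this decision can be condensed into a rule of thumb, sketched below in Python. This is a deliberate simplification: the function name and parameters are hypothetical, it considers only whether the service needs the network address configured up when it starts or stops, and it ignores the client-side behavior that the text says must also be weighed.

```python
def choose_methods(needs_address_up_at_start, needs_address_up_at_stop):
    """Suggest (start_method, stop_method) from server-side needs only.

    A service that needs the address up when it starts uses Start;
    otherwise it can start earlier, in Prenet_start. A service that
    needs the address up when it stops uses Stop; otherwise it can
    stop later, in Postnet_stop.
    """
    start_method = "Start" if needs_address_up_at_start else "Prenet_start"
    stop_method = "Stop" if needs_address_up_at_stop else "Postnet_stop"
    return start_method, stop_method
```

A service whose administrative utility talks to the server daemon over the network would correspond to choose_methods(True, True), that is, Start and Stop.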

Init, Fini, and Boot Methods

Three optional methods, Init, Fini, and Boot, enable the RGM to execute initialization and termination code on a resource. The RGM invokes the Init method to perform a one-time initialization of the resource when the resource becomes managed: either when the resource group it is in is switched from an unmanaged to a managed state, or when the resource is created in a resource group that is already managed.

The RGM invokes the Fini method to clean up after the resource when the resource becomes unmanaged: either when the resource group it is in is switched to an unmanaged state, or when the resource is deleted from a managed resource group. The cleanup must be idempotent; that is, if the cleanup has already been done, Fini exits 0 (success).

The RGM invokes the Boot method on nodes that have newly joined the cluster, that is, have been booted or rebooted.

The Boot method normally performs the same initialization as Init. This initialization must be idempotent, that is, if the resource has already been initialized on the local node, Boot and Init exit 0 (success).
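The idempotency requirement for Init, Boot, and Fini can be sketched in Python as follows. The example assumes, purely for illustration, that initialization means creating a per-resource state directory; the function names and the state_dir parameter are hypothetical and are not part of the Sun Cluster API.

```python
import os
import shutil


def init_resource(state_dir):
    """Create the state directory; succeed if it already exists."""
    os.makedirs(state_dir, exist_ok=True)
    return 0


def boot_resource(state_dir):
    """Boot performs the same initialization as Init in this sketch."""
    return init_resource(state_dir)


def fini_resource(state_dir):
    """Remove the state directory; succeed if it is already gone."""
    if not os.path.isdir(state_dir):
        return 0  # cleanup already done
    shutil.rmtree(state_dir)
    return 0
```

Running any of these functions twice in a row returns 0 both times and leaves the same state behind, which is the behavior the text requires.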