Sun Cluster Data Services Developer's Guide for Solaris OS

Controlling an Application

Callback methods enable the RGM to take control of the underlying resource (that is, the application), for example, when a node or zone joins or leaves the cluster.

Starting and Stopping a Resource

A resource type implementation requires, at a minimum, a Start method and a Stop method. The RGM calls a resource type's method programs at the appropriate times and on the appropriate nodes or zones to bring resource groups offline and online. For example, after the crash of a cluster node or zone, the RGM moves any resource groups that are mastered by that node or zone onto a new node or zone. In this case, you must implement a Start method to provide the RGM with, among other things, a way of restarting each resource on the surviving host node or zone.

A Start method must not return until the resource has been started and is available on the local node or zone. Be certain that resource types that require a long initialization period have sufficiently long timeouts set on their Start methods. To ensure sufficient timeouts, set the default and minimum values for the Start_timeout property in the RTR file.
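The rule that a Start method must not return until the resource is available can be sketched as a polling loop. This is a minimal illustration in Python, not Sun Cluster API code: the function name wait_until_available, the probe callable, and the timeout and interval parameters are all assumptions made for the example. In a real cluster the RGM enforces Start_timeout itself.

```python
import time
from typing import Callable


def wait_until_available(probe: Callable[[], bool],
                         timeout_s: float,
                         interval_s: float = 0.5) -> int:
    """Poll `probe` until it reports True or `timeout_s` elapses.

    Returns 0 when the resource is available and 1 on timeout,
    mirroring a method's exit status. `probe` is a hypothetical
    check, for example an attempt to connect to the service port.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return 0  # resource is up: only now may Start return
        if time.monotonic() >= deadline:
            return 1  # gave up: report failure instead of hanging
        time.sleep(interval_s)
```

A Start method built this way would launch the application and then call wait_until_available with a deadline shorter than the Start_timeout value in the RTR file.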

You must implement a Stop method for situations in which the RGM takes a resource group offline. For example, suppose a resource group is taken offline in ZoneA on Node1 and brought back online in ZoneB on Node2. While taking the resource group offline, the RGM calls the Stop method on resources in the resource group to stop all activity in ZoneA on Node1. After the Stop methods for all resources have completed in ZoneA on Node1, the RGM brings the resource group back online in ZoneB on Node2.

A Stop method must not return until the resource has completely stopped all its activity on the local node or zone and has completely shut down. The safest implementation of a Stop method terminates all processes on the local node or zone that are related to the resource. Resource types that require a long time to shut down need sufficiently long timeouts set on their Stop methods. Set the Stop_timeout property in the RTR file.

Failure or timeout of a Stop method causes the resource group to enter an error state that requires the cluster administrator to intervene. To avoid this state, the Stop and Monitor_stop method implementations must attempt to recover from all possible error conditions. Ideally, these methods should exit with a 0 (success) exit status, having successfully stopped all activity of the resource and its monitor on the local node or zone.
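One common way to meet these requirements is to ask the process to terminate, wait for a grace period, and then kill it if it is still running. The following Python sketch illustrates that pattern only; it is not taken from the Sun Cluster libraries. The names stop_process, spawn_sleeper, and grace_s are assumptions, and the sleeping child is a hypothetical stand-in for the application.

```python
import subprocess
import sys


def spawn_sleeper() -> subprocess.Popen:
    """Hypothetical stand-in for the application: a child that sleeps."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])


def stop_process(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Stop `proc` and return only after it has exited.

    Requests termination first. If the process is still alive after
    `grace_s` seconds, it is killed. Returns 0 once the process is gone.
    """
    if proc.poll() is not None:
        return 0  # already stopped: still a success
    proc.terminate()  # polite request (SIGTERM on POSIX systems)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate so the method cannot hang
        proc.wait()
    return 0
```

The grace period would need to be shorter than the Stop_timeout value so that the escalation happens before the RGM declares the method timed out.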

Deciding Which `Start` and `Stop` Methods to Use

This section provides some tips about when to use the Start and Stop methods as opposed to using the Prenet_start and Postnet_stop methods. You must have in-depth knowledge of both the client and the data service's client-server networking protocol to decide the correct methods to use.

Services that use network address resources might require that start or stop steps be done in a particular order. This order must be relative to the logical host name address configuration. The optional callback methods Prenet_start and Postnet_stop enable a resource type implementation to perform special startup and shutdown operations before and after network addresses in the same resource group are configured to go up or configured to go down.

The RGM calls methods that plumb the network addresses (but do not configure network addresses to go up) before calling the data service's Prenet_start method. The RGM calls methods that unplumb the network addresses after calling the data service's Postnet_stop methods.

The sequence is as follows when the RGM brings a resource group online:

1. Plumb network addresses.
2. Call the data service's Prenet_start method (if any).
3. Configure network addresses to go up.
4. Call the data service's Start method (if any).

The reverse happens when the RGM takes a resource group offline:

1. Call the data service's Stop method (if any).
2. Configure network addresses to go down.
3. Call the data service's Postnet_stop method (if any).
4. Unplumb network addresses.
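The two sequences above can be modeled as ordered lists of steps. This Python sketch only illustrates the ordering. The function names and the step strings are assumptions made for the example, and the set passed in names the callback methods that a hypothetical resource type implements.

```python
from typing import List, Set


def online_sequence(implemented: Set[str]) -> List[str]:
    """Steps, in order, for bringing a resource group online."""
    steps = ["plumb network addresses"]
    if "Prenet_start" in implemented:
        steps.append("call Prenet_start")
    steps.append("configure network addresses up")
    if "Start" in implemented:
        steps.append("call Start")
    return steps


def offline_sequence(implemented: Set[str]) -> List[str]:
    """Steps, in order, for taking a resource group offline."""
    steps = []
    if "Stop" in implemented:
        steps.append("call Stop")
    steps.append("configure network addresses down")
    if "Postnet_stop" in implemented:
        steps.append("call Postnet_stop")
    steps.append("unplumb network addresses")
    return steps
```

For a resource type that implements only Start and Stop, the model shows that Start runs after the addresses are up and Stop runs before the addresses go down.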

When deciding whether to use the Start, Stop, Prenet_start, or Postnet_stop methods, first consider the server side. When bringing online a resource group that contains both data service application resources and network address resources, the RGM calls methods to configure the network addresses to go up before it calls the data service resource Start methods. Therefore, if a data service requires network addresses to be configured to go up at the time it starts, use the Start method to start the data service.

Likewise, when bringing offline a resource group that contains both data service resources and network address resources, the RGM calls methods to configure the network addresses to go down after it calls the data service resource Stop methods. Therefore, if a data service requires network addresses to be configured to go up at the time it stops, use the Stop method to stop the data service.

For example, to start or stop a data service, you might have to run the data service's administrative utilities or libraries. Sometimes, the data service has administrative utilities or libraries that use a client-server networking interface to perform the administration. That is, an administrative utility makes a call to the server daemon, so the network address might need to be up to use the administrative utility or library. Use the Start and Stop methods in this scenario.

If the data service requires that the network addresses be configured to go down at the time it starts and stops, use the Prenet_start and Postnet_stop methods to start and stop the data service. Consider whether your client software responds differently, depending on whether the network address or the data service comes online first after a cluster reconfiguration (either scha_control() with the SCHA_GIVEOVER argument or a switchover with the clnode evacuate command). For example, the client implementation might perform only a minimal number of retries, giving up soon after determining that the data service port is not available.

If the data service does not require the network address to be configured to go up when it starts, start the data service before the network interface is configured to go up. Starting the data service in this way ensures that the data service is able to respond immediately to client requests as soon as the network address has been configured to go up. As a result, clients are less likely to stop retrying. In this scenario, use the Prenet_start method rather than the Start method to start the data service.

If you use the Postnet_stop method, the data service resource is still up at the point the network address is configured to go down. The Postnet_stop method is run only after the network address is configured to go down. As a result, the data service's TCP or UDP service port, or its RPC program number, always appears to be available to clients on the network, except when the network address is also not responding.

Note –

If you install an RPC service in the cluster, the service must not use the following program numbers: 100141, 100142, and 100248. These numbers are reserved for the Sun Cluster daemons rgmd_receptionist, fed, and pmfd, respectively. If the RPC service that you install uses one of these program numbers, change the program number of that RPC service.
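A check for this collision is easy to script. The following Python sketch uses the three reserved numbers from the note. The table name and the reserved_owner function are assumptions made for the example.

```python
from typing import Optional

# Program numbers reserved for Sun Cluster daemons, as listed in the note.
RESERVED_RPC_PROGRAMS = {
    100141: "rgmd_receptionist",
    100142: "fed",
    100248: "pmfd",
}


def reserved_owner(program_number: int) -> Optional[str]:
    """Return the daemon that owns `program_number`, or None if it is free."""
    return RESERVED_RPC_PROGRAMS.get(program_number)
```

A result other than None means that the RPC service needs a different program number.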

The decision to use the Start and Stop methods as opposed to the Prenet_start and Postnet_stop methods, or to use both, must take into account the requirements and behavior of both the server and client.
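The server-side part of this decision can be reduced to a simplified rule. The Python sketch below covers only that part and ignores client behavior, which must still be weighed separately. The function names are assumptions, and treating a service that is indifferent to the address state at stop time as a Postnet_stop candidate is also an assumption made for the example.

```python
def choose_start_method(needs_address_up: bool) -> str:
    """Pick the method that starts the data service.

    Start runs after the network addresses are configured up.
    Prenet_start runs before, so the service is already listening
    when the addresses come up.
    """
    return "Start" if needs_address_up else "Prenet_start"


def choose_stop_method(needs_address_up: bool) -> str:
    """Pick the method that stops the data service.

    Stop runs while the network addresses are still up.
    Postnet_stop runs after the addresses are configured down.
    """
    return "Stop" if needs_address_up else "Postnet_stop"
```

For example, a service whose administrative utility contacts the server daemon over the network would pass True to both functions and get Start and Stop.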

Using the Optional `Init`, `Fini`, and `Boot` Methods

Three optional methods, Init, Fini, and Boot, enable the RGM to execute initialization and termination code on a resource.

Using the `Init` Method

The RGM executes the Init method to perform a one-time initialization of the resource when the resource becomes managed as a result of one of the following conditions:

The resource group in which the resource is located is switched from an unmanaged to a managed state.
The resource is created in a resource group that is already managed.

Using the `Fini` Method

The RGM executes the Fini method to clean up after a resource when that resource is no longer managed by the RGM. The Fini method usually undoes any initializations that were performed by the Init method.

The RGM executes Fini on the node or zone where the resource becomes unmanaged when the following situations arise:

The resource group that contains the resource is switched to an unmanaged state. In this case, the RGM executes the Fini method on all nodes and zones in the node list.
The resource is deleted from a managed resource group. In this case, the RGM executes the Fini method on all nodes and zones in the node list.
A node or zone is deleted from the node list of the resource group that contains the resource. In this case, the RGM executes the Fini method on only the deleted node or zone.

A “node list” is either the resource group's Nodelist or the resource type's Installed_nodes list, depending on the setting of the resource type's Init_nodes property. The Init_nodes property can be set to RG_nodelist or RT_installed_nodes. For most resource types, Init_nodes is set to RG_nodelist, the default. In this case, both the Init and Fini methods are executed on the nodes and zones that are specified in the resource group's Nodelist.
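The effect of the Init_nodes setting can be shown as a small selection function. This Python sketch is a model only. The function name and parameters are assumptions, and the lists stand in for the values of the Nodelist and Installed_nodes properties.

```python
from typing import List, Sequence


def effective_node_list(rg_nodelist: Sequence[str],
                        rt_installed_nodes: Sequence[str],
                        init_nodes: str = "RG_nodelist") -> List[str]:
    """Return the nodes on which Init and Fini would be executed."""
    if init_nodes == "RG_nodelist":
        return list(rg_nodelist)
    if init_nodes == "RT_installed_nodes":
        return list(rt_installed_nodes)
    raise ValueError(f"unknown Init_nodes value: {init_nodes!r}")
```

With the default setting, only the resource group's Nodelist matters, even if the resource type is installed on more nodes.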

The type of initialization that the Init method performs defines the type of cleanup that the Fini method that you implement needs to perform, as follows:

Cleanup of node-specific configuration.
Cleanup of cluster-wide configuration.

Guidelines for Implementing a `Fini` Method

The Fini method that you implement needs to determine whether to perform only cleanup of node-specific configuration or cleanup of both node-specific and cluster-wide configuration.

When a resource becomes unmanaged on only a particular node or zone, the Fini method can clean up local, node-specific configuration. However, the Fini method must not clean up global, cluster-wide configuration, because the resource remains managed on other nodes. If the resource becomes unmanaged cluster-wide, the Fini method can perform cleanup of both node-specific and global configuration. Your Fini method code can distinguish these two cases by determining whether the resource group's node list contains the local node or zone on which your Fini method is executing.

If the local node or zone appears in the resource group's node list, the resource is being deleted or is moving to an unmanaged state. The resource is no longer active on any node or zone. In this case, your Fini method needs to clean up any node-specific configuration on the local node as well as cluster-wide configuration.

If the local node or zone does not appear in the resource group's node list, your Fini method can clean up node-specific configuration on the local node or zone. However, your Fini method must not clean up cluster-wide configuration. In this case, the resource remains active on other nodes or zones.

You must also code the Fini method so that it is idempotent. In other words, even if the Fini method has cleaned up a resource during a previous execution, subsequent calls to the Fini method exit successfully.
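The two guidelines above, choosing the cleanup scope from the node list and keeping the cleanup idempotent, can be sketched as follows. This Python example is illustrative and does not call the Sun Cluster API. The function names are assumptions, and the file path stands in for whatever node-specific configuration a hypothetical Init method created.

```python
from pathlib import Path
from typing import Iterable, Set


def fini_cleanup_scope(local_node: str,
                       rg_node_list: Iterable[str]) -> Set[str]:
    """Decide what a Fini method may clean up on `local_node`.

    If the local node is still in the node list, the resource is being
    deleted or unmanaged cluster-wide, so both scopes apply. Otherwise
    only the local node was removed, so only node-specific cleanup applies.
    """
    if local_node in set(rg_node_list):
        return {"node", "cluster"}
    return {"node"}


def remove_if_present(path: Path) -> None:
    """Idempotent cleanup: succeeds even if `path` is already gone."""
    path.unlink(missing_ok=True)
```

Calling remove_if_present a second time on the same path does nothing and raises no error, which is the behavior that an idempotent Fini method needs.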

Using the `Boot` Method

The RGM executes the Boot method on nodes or zones that join the cluster, that is, that have just been booted or rebooted.

The Boot method normally performs the same initialization as Init. You must code the Boot method so that it is idempotent. In other words, even if the Boot method has initialized the resource during a previous execution, subsequent calls to the Boot method exit successfully.
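Because Boot normally repeats the work of Init, one approach is to share a single idempotent routine between the two methods. The Python sketch below assumes that the initialization consists of creating a state directory. The function name and the directory are hypothetical.

```python
from pathlib import Path


def init_or_boot(state_dir: Path) -> int:
    """Create the resource's state directory if it does not exist yet.

    Safe to call from both Init and Boot, and safe to call repeatedly:
    an existing directory is not treated as an error. Returns 0 (success).
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    return 0
```

A second call after a reboot finds the directory in place and still exits successfully.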
