Sun Cluster 3.1 Data Services Developer's Guide

Chapter 7 Designing Resource Types

This chapter explains how the DSDL is typically used to design and implement resource types. It focuses on designing a resource type to validate the resource configuration and to start, stop, and monitor the resource. It then describes how to implement the resource type callback methods with the DSDL. See also the rt_callbacks(1HA) man page.

A resource type developer needs access to the resource's property settings to complete these tasks. The DSDL utility scds_initialize() gives the programmer a uniform way to access the resource properties. It is designed to be called at the beginning of each callback method. It retrieves all the properties of a resource from the cluster framework and makes them available to the scds_get_*() family of functions.

The RTR File

The Resource Type Registration (RTR) file is an essential component of a resource type. It describes the resource type to Sun Cluster, including which properties the implementation needs, the data types and default values of those properties, the file system paths of the resource type's callback methods, and the settings of various system-defined properties.

The sample RTR file shipped with the DSDL should suffice for most resource type implementations once some basic elements, such as the resource type name and the pathnames of the callback methods, have been edited. If the implementation needs a new property, declare it as an extension property in the RTR file and access it with the DSDL scds_get_ext_property() utility.

The `Validate` Method

The RGM calls the Validate method of a resource type implementation in two situations: 1) when a new resource of the resource type is being created, and 2) when a property of the resource or of its resource group is being updated. The two cases are distinguished by the command-line option passed to the Validate method: -c for creation or -u for update.
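The following C sketch shows one way a program could tell the two cases apart by scanning its arguments for -c or -u. It is illustrative only: the enum and function names are hypothetical, and it assumes (as described in rt_callbacks(1HA)) that -R, -T, -G, -r, -x, and -g each take a value that must be skipped. With the DSDL you normally let scds_initialize() parse the command line instead.

```c
#include <string.h>

/* Hypothetical names; not part of the DSDL. */
enum validate_mode { VALIDATE_UNKNOWN, VALIDATE_CREATE, VALIDATE_UPDATE };

/* Options assumed to take a value argument (see the lead-in). */
static int
takes_value(const char *opt)
{
    static const char *const valued[] = {"-R", "-T", "-G", "-r", "-x", "-g"};
    size_t i;

    for (i = 0; i < sizeof (valued) / sizeof (valued[0]); i++)
        if (strcmp(opt, valued[i]) == 0)
            return (1);
    return (0);
}

/* Report whether the RGM invoked Validate for creation (-c) or update (-u). */
enum validate_mode
get_validate_mode(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-c") == 0)
            return (VALIDATE_CREATE);
        if (strcmp(argv[i], "-u") == 0)
            return (VALIDATE_UPDATE);
        if (takes_value(argv[i]))
            i++;    /* skip the value, e.g. a resource named "-c" */
    }
    return (VALIDATE_UNKNOWN);
}
```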

The Validate method is called on each node in a set of nodes defined by the value of the resource type property INIT_NODES. If INIT_NODES is set to RG_PRIMARIES, Validate is called on each node that can host (be a primary of) the resource group containing the resource. If INIT_NODES is set to RT_INSTALLED_NODES, Validate is called on each node where the resource type software is installed, typically all nodes in the cluster. The default value of INIT_NODES is RG_PRIMARIES (see rt_reg(4)).

When the Validate method is called, the RGM has not yet created the resource (for a creation callback) or has not yet applied the updated property values (for an update callback). The purpose of the Validate callback method is to check that the proposed resource settings, as specified by the proposed property values, are acceptable to the resource type.

Note –

If you are using local file systems managed by HAStoragePlus, use scds_hasp_check() to check the state of the HAStoragePlus resource. This information is obtained from the state (online or otherwise) of all SUNW.HAStoragePlus(1) resources that the resource depends on through its Resource_dependencies or Resource_dependencies_weak system properties. See scds_hasp_check(3HA) for a complete list of the status codes returned by scds_hasp_check().

The DSDL function scds_initialize() handles the creation and update cases as follows:

In the case of resource creation, scds_initialize() parses the proposed resource properties passed on the command line. The proposed values are thus available to the resource type developer as if the resource were already created in the system.
In the case of a resource or resource group update, the proposed values of the properties being updated by the administrator are read from the command line, and the remaining properties (whose values are not being updated) are read from Sun Cluster through the Resource Management API. A resource type developer using the DSDL need not be concerned with these housekeeping tasks and can validate a resource as if all of its properties were available.
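The merge that scds_initialize() performs during an update can be pictured with the hypothetical sketch below: a proposed value from the command line takes precedence, and any property that is not being updated falls back to its current value. The structure and function names are assumptions, and property names are matched exactly for simplicity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical property record; the DSDL keeps its own internal form. */
struct prop {
    const char *name;
    const char *value;
};

/* Return the value of name from a table of n entries, or NULL. */
static const char *
find_prop(const struct prop *tab, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return (tab[i].value);
    return (NULL);
}

/*
 * Effective value during an update: a proposed value from the command
 * line wins; otherwise use the value currently stored in the cluster.
 */
const char *
effective_prop(const struct prop *proposed, size_t nproposed,
    const struct prop *current, size_t ncurrent, const char *name)
{
    const char *v = find_prop(proposed, nproposed, name);

    return (v != NULL ? v : find_prop(current, ncurrent, name));
}
```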

Suppose the function that validates a resource's properties is called svc_validate(), and that it uses the scds_get_*() family of functions to look up the properties it needs to check. Assuming that svc_validate() returns 0 for an acceptable resource setting, the Validate method of the resource type can be written as the following code fragment:

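#include <rgm/libdsdev.h>   /* DSDL interfaces */

int svc_validate(scds_handle_t handle);   /* implemented by the developer */

int
main(int argc, char *argv[])
{
    scds_handle_t handle;
    int rc;

    /* Read the proposed and current property values. */
    if (scds_initialize(&handle, argc, argv) != SCHA_ERR_NOERR) {
        return (1);   /* Initialization Error */
    }
    rc = svc_validate(handle);
    scds_close(&handle);   /* Release the resources held by the handle */
    return (rc);
}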

The validation function should also log the reason why validation failed. Leaving out that detail (see the next chapter for a more realistic validation routine), a simple svc_validate() routine can be implemented as follows:

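#include <sys/stat.h>   /* stat() */

int
svc_validate(scds_handle_t handle)
{
    scha_str_array_t *confdirs;
    struct stat statbuf;

    confdirs = scds_get_confdir_list(handle);
    /* Reject the setting if Confdir_list is empty. */
    if (confdirs == NULL || confdirs->array_cnt == 0) {
        return (1);   /* Invalid resource property setting */
    }
    /* The first configuration directory must exist. */
    if (stat(confdirs->str_array[0], &statbuf) == -1) {
        return (1);   /* Invalid resource property setting */
    }
    return (0);   /* Acceptable setting */
}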

The resource type developer therefore needs to implement only the svc_validate() routine. A typical check for a resource type implementation is to ensure that an application configuration file named app.conf exists under the Confdir_list property. This can be implemented conveniently with a stat() system call on the appropriate pathname derived from the Confdir_list property.
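As a sketch of that check, the hypothetical helper below joins a configuration directory and a file name such as app.conf and uses stat() to confirm that the result is a regular file. A svc_validate() routine could call it with the first Confdir_list entry instead of calling stat() on the directory itself. The helper's name and its fixed 1024-byte path buffer are assumptions.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return 1 if confdir/filename exists and is a
 * regular file, else 0. confdir would come from Confdir_list.
 */
int
conf_file_exists(const char *confdir, const char *filename)
{
    char path[1024];    /* assumed large enough for cluster paths */
    struct stat sb;
    int n;

    n = snprintf(path, sizeof (path), "%s/%s", confdir, filename);
    if (n < 0 || (size_t)n >= sizeof (path))
        return (0);     /* pathname too long: treat as invalid */
    if (stat(path, &sb) == -1)
        return (0);     /* missing or inaccessible */
    return (S_ISREG(sb.st_mode) ? 1 : 0);
}
```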

The `Start` Method

The RGM calls the Start callback method of a resource type implementation on a chosen cluster node to start the resource. The resource group name, the resource name, and the resource type name are passed on the command line. The Start method is expected to perform the actions needed to start a data service resource on the cluster node. Typically this involves retrieving the resource properties, locating the application-specific executables and/or configuration files, and launching the application with the appropriate command-line arguments.

With the DSDL, the resource configuration is already retrieved by the scds_initialize() utility. The startup action for the application can be contained in a routine svc_start(). Another routine, svc_wait(), can be called to verify that the application actually starts. The simplified code for the Start method becomes:

int
main(int argc, char *argv[])
{
    scds_handle_t handle;
    int rc;

    if (scds_initialize(&handle, argc, argv) != SCHA_ERR_NOERR) {
        return (1);   /* Initialization Error */
    }
    if (svc_validate(handle) != 0) {
        rc = 1;   /* Invalid settings */
    } else if (svc_start(handle) != 0) {
        rc = 1;   /* Start failed */
    } else {
        rc = svc_wait(handle);   /* Verify that the application started */
    }
    scds_close(&handle);   /* Release the handle on every path */
    return (rc);
}

This Start method implementation calls svc_validate() to validate the resource configuration. If validation fails, either the resource configuration and the application configuration do not match, or this cluster node currently has a system problem. For example, a global file system needed by the resource might not currently be available on this cluster node. In that case it is futile to attempt to start the resource on this node; it is better to let the RGM attempt to start the resource on a different node. Note, however, that this approach assumes svc_validate() is sufficiently conservative, checking only for resources on the cluster node that the application absolutely needs. Otherwise the resource might fail to start on all cluster nodes and end up in the START_FAILED state. See scswitch(1M) and the Sun Cluster Data Services Guide for an explanation of this state.

The svc_start() routine must return 0 when the resource starts successfully on the node and non-zero if it encounters a problem. If this routine fails, the RGM attempts to start the resource on a different cluster node.
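The text leaves the implementation of svc_wait() open. One plausible approach, sketched below with hypothetical names, is to probe the application repeatedly until it answers or a retry budget runs out, returning 0 or 1 in the same convention as svc_start(). In a real resource type the budget would have to fit within the Start method's timeout (Start_timeout), and the probe would more likely be a service check than the example file-existence probe included here.

```c
#include <unistd.h>

/*
 * Hypothetical svc_wait()-style helper: call probe(arg) up to
 * max_attempts times, sleeping interval_secs between attempts.
 * Returns 0 as soon as a probe returns 0, and 1 if none does.
 */
int
wait_for_service(int (*probe)(void *), void *arg, int max_attempts,
    unsigned int interval_secs)
{
    int attempt;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        if (probe(arg) == 0)
            return (0);
        if (attempt + 1 < max_attempts)
            (void) sleep(interval_secs);
    }
    return (1);
}

/* Example probe (hypothetical): succeed once the path in arg exists. */
int
file_exists_probe(void *arg)
{
    return (access((const char *)arg, F_OK) == 0 ? 0 : 1);
}
```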

To leverage the DSDL as much as possible, the svc_start() routine can use the scds_pmf_start() utility to start the application under the Process Management Facility (PMF). This utility also leverages the failure callback action feature of PMF (see the -a action flag in pmfadm(1M)) to implement process failure detection.

The `Stop` Method

The RGM calls the Stop callback method of a resource type implementation on a cluster node to stop the application. The callback semantics of the Stop method impose the following requirements:

The Stop method must be idempotent because the Stop method can be called by the RGM even if the Start method did not complete successfully on the node. Thus the Stop method must succeed (exit zero) even if the application is not currently running on the cluster node and there is no work for it to do.
If the Stop method of the resource type fails (exits non-zero) on a cluster node, the resource being stopped ends up in the STOP_FAILED state. Depending on the Failover_mode setting of the resource, this might lead the RGM to hard-reboot the cluster node. It is therefore important to design the Stop method so that it tries very hard to stop the application, even by killing it abruptly (for example, with SIGKILL) if it otherwise fails to terminate. The Stop method must also do so in a timely fashion, because the framework treats expiry of Stop_timeout as a stop failure and puts the resource in the STOP_FAILED state.

The DSDL utility scds_pmf_stop() should suffice for most applications. It first attempts a soft stop of the application (via SIGTERM), assuming the application was started under PMF via scds_pmf_start(), and then delivers SIGKILL to the process. See PMF Functions for details about this utility.
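The escalation policy described above (SIGTERM, a bounded wait, then SIGKILL, and success when there is nothing to stop) can be sketched in plain POSIX C. This is not how scds_pmf_stop() is implemented: it assumes the caller knows the application's process ID, which a PMF-managed application does not require, and the function names and timeouts are hypothetical. Whatever values are chosen, the grace period plus the SIGKILL wait should stay well inside Stop_timeout.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define KILL_WAIT_SECS 5    /* hypothetical wait after SIGKILL */

/* Return 1 if pid no longer exists, reaping it if it is our child. */
static int
process_gone(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);

    if (r == pid)
        return (1);     /* our child exited and has now been reaped */
    if (r == -1 && errno == ECHILD)     /* not our child: probe it */
        return (kill(pid, 0) == -1 && errno == ESRCH);
    return (0);         /* still running */
}

/* Poll once a second for up to secs seconds; return 1 once pid is gone. */
static int
wait_gone(pid_t pid, unsigned int secs)
{
    unsigned int i;

    for (i = 0; i <= secs; i++) {
        if (process_gone(pid))
            return (1);
        if (i < secs)
            (void) sleep(1);
    }
    return (0);
}

/*
 * Hypothetical stop routine: SIGTERM, up to grace_secs to exit, then
 * SIGKILL. Returns 0 if the process is gone afterwards, including when
 * it was not running at all (idempotent), and 1 only if it survives
 * SIGKILL or pid is invalid.
 */
int
stop_process(pid_t pid, unsigned int grace_secs)
{
    if (pid <= 0)
        return (1);     /* kill() would target a process group */
    if (process_gone(pid))
        return (0);     /* nothing to stop: Stop must still succeed */

    (void) kill(pid, SIGTERM);      /* ask politely first */
    if (wait_gone(pid, grace_secs))
        return (0);

    (void) kill(pid, SIGKILL);      /* cannot be caught or ignored */
    return (wait_gone(pid, KILL_WAIT_SECS) ? 0 : 1);
}
```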

Following the model of the code used so far, and assuming that the application-specific routine that stops the application is called svc_stop(), the Stop method can be implemented as follows. (Whether svc_stop() uses scds_pmf_stop() is beside the point here; that depends on whether the Start method started the application under PMF.)

if (scds_initialize(&handle, argc, argv) != SCHA_ERR_NOERR) {
    return (1);   /* Initialization Error */
}
rc = svc_stop(handle);   /* rc is an int declared in main() */
scds_close(&handle);
return (rc);

The svc_validate() routine is not used in the implementation of the Stop method because, even if the system currently has a problem, the Stop method should attempt to stop the application on this node.

The `Monitor_start` Method

The RGM calls the Monitor_start method to start a fault monitor for the resource. Fault monitors monitor the health of the application being managed by the resource. Resource type implementations typically implement a fault monitor as a separate daemon which runs in the background. The Monitor_start callback method is used to launch this daemon with the appropriate arguments.

Because the monitor daemon itself is prone to failure (for example, it could die and leave the application unmonitored), you should use the PMF to start it. The DSDL utility scds_pmf_start() has built-in support for starting fault monitors. It takes the pathname of the monitor daemon program relative to RT_basedir, the location of the resource type callback method implementations. It uses the Monitor_retry_interval and Monitor_retry_count extension properties managed by the DSDL to prevent unlimited restarts of the daemon. It also imposes on the monitor daemon the same command-line syntax defined for all callback methods (that is, -R resource -G resource_group -T resource_type), even though the RGM never calls the monitor daemon directly. This allows the monitor daemon implementation to use scds_initialize() to set up its own environment. The main effort lies in designing the monitor daemon itself.
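To make the calling convention concrete, the hypothetical helper below assembles the pathname (RT_basedir joined with the monitor program's relative name) and the -R/-G/-T argument vector that a monitor daemon receives. scds_pmf_start() does this for you; the sketch only illustrates the resulting command line, and its name and parameters are assumptions.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the pathname and argument vector for a
 * monitor daemon found at rt_basedir/prog, passing the callback-style
 * -R/-G/-T options. argv must have room for 8 entries. Returns 0 on
 * success, -1 if the pathname does not fit in path (of size len).
 */
int
build_monitor_argv(char *path, size_t len, char *argv[8],
    const char *rt_basedir, const char *prog,
    char *rs, char *rg, char *rt)
{
    int n = snprintf(path, len, "%s/%s", rt_basedir, prog);

    if (n < 0 || (size_t)n >= len)
        return (-1);
    argv[0] = path;
    argv[1] = "-R";
    argv[2] = rs;       /* resource name */
    argv[3] = "-G";
    argv[4] = rg;       /* resource group name */
    argv[5] = "-T";
    argv[6] = rt;       /* resource type name */
    argv[7] = NULL;     /* execv() requires a NULL terminator */
    return (0);
}
```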

The `Monitor_stop` Method

The RGM calls the Monitor_stop method to stop the fault monitor daemon that was started via the Monitor_start method. Failure of this callback method is treated in exactly the same fashion as failure of the Stop method; therefore the Monitor_stop method must be idempotent and robust like the Stop method.

If you use the scds_pmf_start() utility to start the fault monitor daemon, use the scds_pmf_stop() utility to stop it.

The `Monitor_check` Method

The RGM invokes the Monitor_check callback method of a resource on a node to determine whether that cluster node is capable of mastering the resource, that is, whether the applications managed by the resource can run successfully on the node. Typically this involves making sure that all the system resources needed by the application are available on the cluster node. As discussed in The Validate Method, the developer-implemented svc_validate() routine is intended to check at least that.

Depending on the specific application being managed by the resource type implementation, the Monitor_check method can be written to perform additional tasks. The Monitor_check method must be implemented so that it does not conflict with other methods running concurrently. Developers using the DSDL should have the Monitor_check method call the svc_validate() routine written to implement application-specific validation of resource properties.

The `Update` Method

The RGM calls the Update method of a resource type implementation to apply any changes that were made by the system administrator to the configuration of the active resource. The Update method is only called on nodes (if any) where the resource is currently online.

The changes that have just been made to the resource configuration are guaranteed to be acceptable to the resource type implementation because the RGM runs the Validate method of the resource type before it runs the Update method. The Validate method is called before the resource or resource group properties are changed and the Validate method can veto the proposed changes. The Update method is called after the changes have been applied to give the active (online) resource the opportunity to take notice of the new settings.

A resource type developer must decide carefully which properties are to be dynamically updatable and mark those with the TUNABLE=ANYTIME setting in the RTR file. Typically, any property used by the fault monitor daemon of a resource type implementation can be made dynamically updatable, provided that the Update method implementation at least restarts the monitor daemon.

Possible candidates are

Thorough_Probe_Interval

Retry_Count

Retry_Interval

Monitor_retry_count

Monitor_retry_interval

Probe_timeout

These properties affect how the fault monitor daemon checks the health of the service, how often it does so, what history interval it uses to track errors, and what restart thresholds PMF imposes on it. The DSDL provides the scds_pmf_restart() utility to implement updates of these properties.

If a resource type developer needs to make a resource property dynamically updatable and modifying that property might affect the running application, the developer must implement the appropriate actions so that updates to that property are correctly applied to any running instances of the application. The DSDL currently provides no facility for this. Unlike Validate, the Update method is not passed the modified properties on the command line.

The `Init`, `Fini`, and `Boot` Methods

These are one-time action methods as defined by the Resource Management API specifications. The sample implementation included with the DSDL does not illustrate their use. However, all the facilities in the DSDL are available to these methods as well, should a resource type developer need them. Typically, the Init and Boot methods of a resource type implementation are identical, both performing the same one-time action. The Fini method typically undoes the action of the Init or Boot method.
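For instance, a resource type whose one-time action is to create a per-node state directory might implement Init/Boot and Fini along the lines of this hypothetical sketch; the action itself and the function names are assumptions. Because Init and Boot share the same action, it succeeds if the directory already exists, and Fini succeeds if the directory is already gone, so either method can safely run after the other.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical Init/Boot action: create a per-node state directory.
 * An existing directory (from an earlier Init or Boot) is accepted.
 */
int
rt_init_action(const char *dir)
{
    struct stat sb;

    if (mkdir(dir, 0755) == 0)
        return (0);
    if (errno == EEXIST && stat(dir, &sb) == 0 && S_ISDIR(sb.st_mode))
        return (0);     /* already done: accept only a directory */
    return (1);
}

/* Hypothetical Fini action: undo rt_init_action(); already gone is fine. */
int
rt_fini_action(const char *dir)
{
    if (rmdir(dir) == 0 || errno == ENOENT)
        return (0);
    return (1);
}
```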

Designing the Fault Monitor Daemon

Resource type implementations using the DSDL typically have a fault monitor daemon with the following responsibilities.

Periodically monitoring the health of the application being managed. This aspect of a monitor daemon is heavily application dependent and can vary widely from one resource type to another. The DSDL has some built-in utility functions that perform health checks for simple TCP-based services. Health checks for applications that implement ASCII-based protocols such as HTTP, NNTP, IMAP, and POP3 can be built using these utilities.

Keeping track of the problems encountered by the application, using the resource properties Retry_interval and Retry_count. When the application fails completely, deciding whether the PMF action script should restart the service or whether failures have accumulated so rapidly that a failover should be considered. The DSDL utilities scds_fm_action() and scds_fm_sleep() are intended to help implement this mechanism.

Taking appropriate actions (typically either restarting the application or attempting a failover of the containing resource group). The DSDL utility scds_fm_action() implements such an algorithm. It computes the current accumulation of probe failures in the past Retry_interval seconds for this purpose.

Updating the resource state so that application health state is available to the scstat command as well as to the cluster management GUI.
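The DSDL's TCP helpers are described in their own man pages. As an independent illustration of the simplest kind of health check, the sketch below only tests whether a TCP connection to an IPv4 address and port completes within a timeout, using plain POSIX sockets with a non-blocking connect() and poll(). The function name is hypothetical, and a real probe for an ASCII protocol would go on to send a request and check the reply.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical probe: return 0 if a TCP connection to ip:port (a dotted
 * IPv4 address) completes within timeout_ms milliseconds, 1 otherwise.
 */
int
tcp_probe(const char *ip, int port, int timeout_ms)
{
    struct sockaddr_in sa;
    struct pollfd pfd;
    int fd, err = 0, rc = 1;
    socklen_t len = sizeof (err);

    memset(&sa, 0, sizeof (sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return (1);     /* not a valid IPv4 address */

    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        return (1);
    /* Non-blocking connect so the probe cannot hang past its timeout. */
    (void) fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    if (connect(fd, (struct sockaddr *)&sa, sizeof (sa)) == 0) {
        rc = 0;         /* connected immediately */
    } else if (errno == EINPROGRESS) {
        pfd.fd = fd;
        pfd.events = POLLOUT;
        /* Writable means the connect finished; SO_ERROR says how. */
        if (poll(&pfd, 1, timeout_ms) == 1 &&
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
            err == 0)
            rc = 0;
    }
    (void) close(fd);
    return (rc);
}
```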

The DSDL utilities are designed so that the main loop of the fault monitor daemon can be represented by the code shown later in this section.

For fault monitors implemented using the DSDL, the following points apply:

The detection of application process death by scds_fm_sleep() is fairly rapid because PMF delivers the process death notification asynchronously. Compared with a fault monitor that wakes up only periodically to check service health and only then finds the application dead, fault detection time is reduced significantly, which increases the availability of the service.
If the RGM rejects an attempt to fail over the service via the scha_control(3HA) API, scds_fm_action() resets (forgets) its current failure history. The reason is that the failure history is already above Retry_count: if the monitor daemon woke up in the next iteration and its health check of the service failed again, it would again call scha_control(), which would probably be rejected again because the situation that caused the previous rejection still holds. Resetting the history ensures that the fault monitor at least attempts to correct the situation locally (for example, by restarting the application) in the next iteration.
scds_fm_action() does not reset the application failure history after restart failures, because in that case you typically want to try scha_control() soon if the situation does not correct itself.
The utility scds_fm_action() updates the resource status to SCHA_RSSTATUS_OK, SCHA_RSSTATUS_DEGRADED, or SCHA_RSSTATUS_FAULTED depending on the failure history. This status is thus available to cluster system management.
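The restart-versus-failover bookkeeping described above can be illustrated with a deliberately simplified sliding window. This is not the scds_fm_action() algorithm (which, among other things, also updates the resource status); it is a hypothetical sketch in which each failed probe is timestamped, failures older than Retry_interval seconds are discarded, the monitor restarts while the count stays within Retry_count, and the history is cleared when the RGM rejects a failover.

```c
#include <time.h>

#define MAX_FAILURES 64     /* hypothetical cap on remembered failures */

enum fm_action { FM_NONE, FM_RESTART, FM_FAILOVER };

struct fail_history {
    time_t when[MAX_FAILURES];
    int count;
};

/* Drop failures that happened retry_interval or more seconds ago. */
static void
prune(struct fail_history *h, time_t now, int retry_interval)
{
    int i, kept = 0;

    for (i = 0; i < h->count; i++)
        if (now - h->when[i] < retry_interval)
            h->when[kept++] = h->when[i];
    h->count = kept;
}

/*
 * Record one probe result and choose an action: restart while the
 * failures inside the window are within retry_count, fail over beyond.
 */
enum fm_action
fm_decide(struct fail_history *h, int probe_failed, time_t now,
    int retry_count, int retry_interval)
{
    prune(h, now, retry_interval);
    if (!probe_failed)
        return (FM_NONE);
    if (h->count < MAX_FAILURES)
        h->when[h->count++] = now;
    return (h->count <= retry_count ? FM_RESTART : FM_FAILOVER);
}

/* Call when the RGM rejects a failover, so the next failure restarts. */
void
fm_forget(struct fail_history *h)
{
    h->count = 0;
}
```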

In most cases, the application-specific health check can be implemented in a separate stand-alone routine (for example, svc_probe()) and integrated with the following generic main loop:

/*
 * scds_handle, netaddr (the network addresses to probe), timeout,
 * hostname, port, ip, probe_result, ht1, ht2, and dt are declared
 * and initialized before the loop.
 */
for (;;) {
    /*
     * Sleep for a duration of Thorough_probe_interval between
     * successive probes.
     */
    (void) scds_fm_sleep(scds_handle,
        scds_get_rs_thorough_probe_interval(scds_handle));

    /*
     * Now probe all the IP addresses we use. Loop over
     * 1. all network resources we use, and
     * 2. all IP addresses in a given resource.
     * For each IP address probed, compute the failure history.
     */
    probe_result = 0;
    /*
     * Iterate through all the network addresses to get each
     * IP address to use for calling svc_probe().
     */
    for (ip = 0; ip < netaddr->num_netaddrs; ip++) {
        /*
         * Grab the hostname and port on which the health has
         * to be monitored. HA-XFS supports only one port, so
         * the port value is taken from the first entry in the
         * array of ports.
         */
        hostname = netaddr->netaddrs[ip].hostname;
        port = netaddr->netaddrs[ip].port_proto.port;

        ht1 = gethrtime();   /* Latch probe start time */
        probe_result = svc_probe(scds_handle,
            hostname, port, timeout);

        /* Latch probe end time and convert to milliseconds. */
        ht2 = gethrtime();
        dt = (ulong_t)((ht2 - ht1) / 1e6);

        /*
         * Update the service probe history, compute the
         * failure history, and take action if needed.
         */
        (void) scds_fm_action(scds_handle,
            probe_result, (long)dt);
    }   /* Each net resource */
}   /* Keep probing forever */
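On systems without gethrtime(), the probe-duration arithmetic in the loop above can be reproduced with POSIX clock_gettime() and CLOCK_MONOTONIC. The helper below is a hypothetical sketch that converts two such readings into elapsed milliseconds, the unit the loop passes to scds_fm_action().

```c
#include <time.h>

/*
 * Hypothetical helper: milliseconds elapsed between two timespec
 * readings (for example, from CLOCK_MONOTONIC), truncated toward zero.
 */
long
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    long long ns;

    /* Work in nanoseconds so a borrow from tv_sec is handled exactly. */
    ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL +
        (end->tv_nsec - start->tv_nsec);
    return ((long)(ns / 1000000LL));
}
```

For example, call clock_gettime(CLOCK_MONOTONIC, &t0) before svc_probe() and clock_gettime(CLOCK_MONOTONIC, &t1) after it, then pass elapsed_ms(&t0, &t1) where the loop passes dt.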
