Sun Cluster Data Services Developer's Guide for Solaris OS

Chapter 6 Data Service Development Library

This chapter provides an overview of the application programming interfaces that constitute the Data Service Development Library (DSDL). The DSDL is implemented in the libdsdev.so library and is included in the Sun Cluster package.

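This chapter covers the following topics:

DSDL Overview
Managing Configuration Properties
Starting and Stopping a Data Service
Implementing a Fault Monitor
Accessing Network Address Information
Debugging the Resource Type Implementation
Enabling Highly Available Local File Systems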

DSDL Overview

The DSDL API is layered on top of the Resource Management Application Programming Interface (RMAPI). As such, the DSDL API does not supersede the RMAPI but rather encapsulates and extends the RMAPI functionality. The DSDL simplifies data service development by providing predetermined solutions to specific Sun Cluster integration issues. Consequently, you can devote the majority of development time to the high availability and scalability issues that are intrinsic to your application. You spend less time integrating the application startup, shutdown, and monitor procedures with Sun Cluster.

Managing Configuration Properties

All callback methods require access to the configuration properties. The DSDL supports access to properties in these ways:

Initializing the environment
Providing a set of convenience functions to retrieve property values

The scds_initialize() function, which must be called at the beginning of each callback method, does the following:

Checks and processes the command-line arguments (argc and argv[]) that the RGM passes to the callback method, obviating the need for you to write a command-line parsing function.
Sets up internal data structures for use by other DSDL functions. For example, the convenience functions that retrieve property values from the RGM store the values in these structures. Likewise, values from the command line, which take precedence over values retrieved from the RGM, are stored in these data structures.
Initializes the logging environment and validates fault monitor probe settings.

Note –

For the Validate method, scds_initialize() parses the property values that are passed on the command line, obviating the need to write a parse function for Validate.

The DSDL provides sets of functions to retrieve resource type, resource, and resource group properties as well as commonly used extension properties. These functions standardize access to properties by using the following conventions:

Each function takes only a handle argument (returned by scds_initialize()).
Each function corresponds to a particular property. The return value type of the function matches the type of the property value that it retrieves.
Functions do not return errors because the return values have been precomputed by scds_initialize(). Functions retrieve values from the RGM unless a new value is passed on the command line.
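
The precedence rule described above (a command-line value overrides the value held by the RGM) can be sketched in plain C. This is only an illustration, not the DSDL implementation: the types prop_entry_t and prop_handle_t, the functions apply_cli_overrides() and get_prop(), and the simplified name=value argument format are all hypothetical and are not part of libdsdev.so.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical property record: one value from the RGM, one optional override. */
typedef struct {
    const char *name;
    const char *rgm_value;  /* value as stored by the RGM */
    const char *cli_value;  /* value from the command line, or NULL */
} prop_entry_t;

/* Hypothetical handle that stands in for the structures set up at initialization. */
typedef struct {
    prop_entry_t *props;
    size_t count;
} prop_handle_t;

/* Record every "name=value" argument as an override for a known property. */
void apply_cli_overrides(prop_handle_t *h, int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        char *eq = strchr(argv[i], '=');
        if (eq == NULL)
            continue;  /* not a name=value argument */
        size_t name_len = (size_t)(eq - argv[i]);
        for (size_t j = 0; j < h->count; j++) {
            if (strlen(h->props[j].name) == name_len &&
                strncmp(h->props[j].name, argv[i], name_len) == 0)
                h->props[j].cli_value = eq + 1;
        }
    }
}

/* Lookup takes only the handle and a name; the override wins if present. */
const char *get_prop(const prop_handle_t *h, const char *name)
{
    for (size_t j = 0; j < h->count; j++) {
        if (strcmp(h->props[j].name, name) == 0)
            return h->props[j].cli_value != NULL
                ? h->props[j].cli_value : h->props[j].rgm_value;
    }
    return NULL;  /* unknown property */
}
```

Because all values are resolved once, before any lookup, a lookup function in this sketch has no error path other than an unknown name, which mirrors the convention that the DSDL retrieval functions do not return errors.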

Starting and Stopping a Data Service

A Start method performs the actions that are required to start a data service on a cluster node. Typically, these actions include retrieving the resource properties, locating application-specific executable and configuration files, and starting the application with the correct command-line arguments.

The scds_initialize() function retrieves the resource configuration. The Start method can use property convenience functions to retrieve values for specific properties, such as Confdir_list, that identify the configuration directories and files for the application to start.

A Start method can call scds_pmf_start() to start an application under control of the Process Monitor Facility (PMF). The PMF enables you to specify the level of monitoring to apply to the process and provides the ability to restart the process in case of failure. See xfnts_start Method for an example of a Start method that is implemented with the DSDL.

A Stop method must be idempotent so that the Stop method exits with success even if it is called on a node when the application is not running. If the Stop method fails, the resource that is being stopped is set to the STOP_FAILED state, which can cause the cluster to perform a hard reboot.

To avoid putting the resource in the STOP_FAILED state, the Stop method must make every effort to stop the resource. The scds_pmf_stop() function provides a phased attempt to stop the resource. This function first attempts to stop the resource by using a SIGTERM signal, and if this fails, uses a SIGKILL signal. See the scds_pmf_stop(3HA) man page for more information.
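
The phased approach can be illustrated with standard POSIX calls. The following is a minimal sketch of the idea only, not the implementation of scds_pmf_stop(): the functions spawn_child(), wait_for_exit(), and phased_stop() are hypothetical, and the sketch assumes that the process being stopped is a direct child of the caller, whereas a real data service runs under the PMF.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical test helper: fork a child that sleeps, optionally ignoring SIGTERM. */
pid_t spawn_child(int ignore_term)
{
    /* Set the disposition before fork() so the child inherits it without a race. */
    void (*old)(int) = signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
    pid_t pid = fork();
    if (pid == 0) {
        sleep(30);
        _exit(0);
    }
    signal(SIGTERM, old);  /* restore the parent's own disposition */
    return pid;
}

/* Poll every 10 ms for up to wait_ms; returns 1 once the child is gone, else 0. */
static int wait_for_exit(pid_t pid, int wait_ms)
{
    struct timespec tick = { 0, 10 * 1000 * 1000 };
    for (int waited = 0; waited <= wait_ms; waited += 10) {
        if (waitpid(pid, NULL, WNOHANG) != 0)
            return 1;  /* reaped, or no such child */
        nanosleep(&tick, NULL);
    }
    return 0;
}

/*
 * Returns 0 if the process exited after SIGTERM or was not running,
 * 1 if SIGKILL was needed, and -1 on any other error.
 */
int phased_stop(pid_t pid, int term_wait_ms)
{
    if (kill(pid, SIGTERM) != 0)
        return (errno == ESRCH) ? 0 : -1;  /* not running: still a success */
    if (wait_for_exit(pid, term_wait_ms))
        return 0;
    if (kill(pid, SIGKILL) != 0)
        return (errno == ESRCH) ? 0 : -1;
    (void) waitpid(pid, NULL, 0);  /* reap the killed child */
    return 1;
}
```

Treating "no such process" as success is what makes the sketch idempotent: calling phased_stop() again after the process has already been stopped returns 0 rather than an error.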

Implementing a Fault Monitor

The DSDL absorbs much of the complexity of implementing a fault monitor by providing a predetermined model. A Monitor_start method starts the fault monitor, under the control of the PMF, when the resource starts on a node. The fault monitor runs in a loop as long as the resource is running on the node. The high-level logic of a DSDL fault monitor is as follows:

The scds_fm_sleep() function uses the Thorough_probe_interval property to determine the amount of time between probes. Any application process failures that are detected by the PMF during this interval lead to a restart of the resource.
The probe itself returns a value that indicates the severity of failures, from 0 (no failure) to 100 (complete failure).
The probe return value is sent to the scds_fm_action() function, which maintains a cumulative failure history within the interval of the Retry_interval property.
The scds_fm_action() function determines what to do in the event of a failure, as follows:
- If the cumulative failure is below 100, do nothing.
- If the cumulative failure reaches 100 (complete failure), restart the data service. If Retry_interval is exceeded, reset the history.
- If the number of restarts exceeds the value of the Retry_count property, within the time specified by Retry_interval, fail over the data service.
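
The decision rules above can be sketched as a small state machine. This is an illustration under simplifying assumptions, not the DSDL code: the names fm_state_t, fm_action_t, and fm_decide() are hypothetical, time is passed in as a plain number of seconds, and the history is modeled as one fixed window that is reset when Retry_interval has elapsed, which might differ from how the library tracks its history.

```c
/* Hypothetical outcome of one monitoring decision. */
typedef enum { ACTION_NONE, ACTION_RESTART, ACTION_FAILOVER } fm_action_t;

/* Hypothetical monitor state; the first two fields mirror resource properties. */
typedef struct {
    int  retry_count;     /* Retry_count */
    long retry_interval;  /* Retry_interval, in seconds */
    long window_start;    /* time at which the current history began */
    int  cumulative;      /* accumulated failure severity, 0 to 100 */
    int  restarts;        /* restarts performed in the current window */
} fm_state_t;

/* Decide what to do with one probe result (severity 0 to 100) at time now. */
fm_action_t fm_decide(fm_state_t *s, int severity, long now)
{
    /* Retry_interval exceeded: forget the old history. */
    if (now - s->window_start > s->retry_interval) {
        s->window_start = now;
        s->cumulative = 0;
        s->restarts = 0;
    }
    if (severity <= 0)
        return ACTION_NONE;

    s->cumulative += severity;
    if (s->cumulative < 100)
        return ACTION_NONE;  /* partial failures only so far */

    /* Cumulative failure reached 100: a restart is due. */
    s->cumulative = 0;
    s->restarts++;
    if (s->restarts > s->retry_count)
        return ACTION_FAILOVER;  /* too many restarts within the window */
    return ACTION_RESTART;
}
```

For example, with Retry_count set to 2, two partial failures of severity 50 trigger the first restart, a further complete failure triggers the second, and the next complete failure within the same Retry_interval results in a failover.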

Accessing Network Address Information

The DSDL provides convenience functions to return network address information for resources and resource groups. For example, the scds_get_netaddr_list() function retrieves the network address resources that are used by a resource, enabling a fault monitor to probe the application.

The DSDL also provides a set of functions for TCP-based monitoring. Typically, these functions establish a simple socket connection to a service, read and write data to the service, and disconnect from the service. The result of the probe can be sent to the DSDL scds_fm_action() function to determine the action to take.

See xfnts_validate Method for an example of TCP-based fault monitoring.
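
To show what a simple connect-style probe involves, the following sketch uses only standard POSIX socket calls rather than the DSDL TCP functions. The function tcp_probe() and its scoring are hypothetical: the sketch reports 0 when a connection is established within the timeout and 100 otherwise, and it assumes an IPv4 address in dotted-decimal form. A real probe would typically also exchange data with the service.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if a TCP connection succeeds within timeout_ms, otherwise 100. */
int tcp_probe(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in addr;
    struct pollfd pfd;
    int fd, rc, err = 0;
    socklen_t len = sizeof(err);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return 100;  /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 100;

    /* Non-blocking connect so that the probe cannot hang past the timeout. */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    if (rc < 0 && errno == EINPROGRESS) {
        pfd.fd = fd;
        pfd.events = POLLOUT;
        /* Wait for the connect to finish, then read its final status. */
        if (poll(&pfd, 1, timeout_ms) == 1 &&
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
            rc = 0;
        else
            rc = -1;
    }
    close(fd);  /* disconnect from the service */
    return (rc == 0) ? 0 : 100;
}
```

The returned value uses the same 0 to 100 scale as the probe results described for the fault monitor, so it could be passed on to whatever code decides between doing nothing, restarting, and failing over.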

Debugging the Resource Type Implementation

The DSDL has built-in features to help you debug your data service.

The DSDL utility scds_syslog_debug() provides a basic framework for adding debugging statements to the resource type implementation. The debugging level (a number from 1 through 9) can be set dynamically for each resource type implementation on each cluster node. All resource type callback methods read a file named /var/cluster/rgm/rt/rtname/loglevel, which contains only an integer from 1 through 9. The DSDL function scds_initialize() reads this file and sets the debug level internally to the specified level. The default debug level, 0, specifies that the data service is not to log debugging messages.
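
The file-based convention can be sketched as follows. This is an illustration only, not the code that scds_initialize() runs: the function read_debug_level() is hypothetical, and the sketch assumes that a missing, unreadable, or out-of-range file should fall back to the default level of 0.

```c
#include <stdio.h>

/*
 * Read a debug level from a loglevel-style file.
 * Returns the level (1 through 9), or 0 if the file is missing or invalid.
 */
int read_debug_level(const char *path)
{
    FILE *fp = fopen(path, "r");
    int level = 0;

    if (fp == NULL)
        return 0;  /* no file: debugging messages stay off */
    if (fscanf(fp, "%d", &level) != 1 || level < 1 || level > 9)
        level = 0;  /* not an integer from 1 through 9 */
    fclose(fp);
    return level;
}
```

Because the level is read from a file each time a callback method starts, an administrator can change the amount of debugging output on one node without modifying or re-registering the resource type.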

The scds_syslog_debug() function uses the facility that is returned by the scha_cluster_getlogfacility() function at a priority of LOG_DEBUG. You can configure these debug messages in the /etc/syslog.conf file.

You can turn some debugging messages into information messages for regular operation of the resource type (perhaps at LOG_INFO priority) by using the scds_syslog() function. Note that the sample DSDL application in Chapter 8, Sample DSDL Resource Type Implementation, makes liberal use of the scds_syslog_debug() and scds_syslog() functions.

Enabling Highly Available Local File Systems

You can use the HAStoragePlus resource type to make a local file system highly available within a Sun Cluster environment. The local file system partitions must be located on global disk groups. Affinity switchovers must be enabled, and the Sun Cluster environment must be configured for failover. This setup enables the cluster administrator to make any file system that is located on multihost disks accessible from any host that is directly connected to those multihost disks. Using a highly available local file system is strongly recommended for some I/O intensive data services. Enabling Highly Available Local File Systems in Sun Cluster Data Services Planning and Administration Guide for Solaris OS contains information about configuring the HAStoragePlus resource type.