This chapter provides an overview of the application programming interfaces constituting the Data Service Development Library, or DSDL. The DSDL is implemented in the libdsdev.so library and is included in the Sun Cluster package.
The DSDL API is layered on top of the RMAPI. As such, it does not supersede the RMAPI but rather encapsulates and extends the RMAPI functionality. The DSDL simplifies data service development by providing predetermined solutions to specific Sun Cluster integration issues. Consequently, you can devote the majority of development time to the high availability and scalability issues intrinsic to your application, and avoid spending a large amount of time on integrating the application startup, shutdown, and monitor procedures with Sun Cluster.
Initializing the environment
Providing a set of convenience functions to retrieve property values
Checks and processes the command-line arguments (argc and argv) the RGM passes to the callback method, obviating the need for you to write a command-line parsing function.
Sets up internal data structures for use by other DSDL functions. For example, the convenience functions that retrieve property values from the RGM store the values in these structures. Likewise, values from the command line, which take precedence over values retrieved from the RGM, are stored in these data structures.
For the Validate method, scds_initialize parses the property values that are passed on the command line, obviating the need to write a parse function for Validate.
The scds_initialize function also initializes the logging environment and validates fault monitor probe settings.
The DSDL provides sets of functions to retrieve resource, resource type, and resource group properties as well as commonly-used extension properties. These functions standardize access to properties by using the following conventions.
Each function takes only a handle argument (returned by scds_initialize).
Each function corresponds to a particular property. The return value type of the function matches the type of the property value it retrieves.
Functions do not return errors because scds_initialize has already computed the values. The values come from the RGM unless a new value is passed on the command line, in which case the command-line value is used.
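The conventions above can be illustrated with a small stand-alone sketch. It does not use the DSDL itself: demo_handle_t, demo_initialize, and the two getters are hypothetical names, and the "Name=value" argument format is an assumption made only to keep the example short. The sketch shows values resolved once at initialization, command-line values overriding the configured ones, and getters that take only a handle and cannot fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a DSDL handle: every value is resolved once,
 * at initialization, so later lookups have nothing left to fail on. */
typedef struct {
    int retry_count;
    int thorough_probe_interval;
} demo_handle_t;

/* Hypothetical initializer. It starts from the values the resource
 * manager would report, then lets command-line arguments of the assumed
 * form "Name=value" override them. */
void demo_initialize(demo_handle_t *h, int rgm_retry_count,
                     int rgm_probe_interval, int argc, char *argv[])
{
    assert(h != NULL);
    h->retry_count = rgm_retry_count;
    h->thorough_probe_interval = rgm_probe_interval;

    for (int i = 1; i < argc; i++) {
        int value;
        if (sscanf(argv[i], "Retry_count=%d", &value) == 1)
            h->retry_count = value;
        else if (sscanf(argv[i], "Thorough_probe_interval=%d", &value) == 1)
            h->thorough_probe_interval = value;
    }
}

/* One getter per property; the return type matches the property type. */
int demo_get_retry_count(const demo_handle_t *h)
{
    return h->retry_count;
}

int demo_get_thorough_probe_interval(const demo_handle_t *h)
{
    return h->thorough_probe_interval;
}
```

Because all parsing happens in the initializer, a callback method written against this pattern needs no error handling at the point where a property is read.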
A Start method is expected to perform the actions required to start a data service on a cluster node. Typically, this includes retrieving the resource properties, locating application-specific executables and configuration files, and launching the application with the appropriate command-line arguments.
The scds_initialize function retrieves the resource configuration. The Start method can use property convenience functions to retrieve values for specific properties, such as Confdir_list, that identify the configuration directories and files for the application to launch.
A Start method can call scds_pmf_start to launch an application under control of the Process Monitor Facility (PMF). PMF enables you to specify the level of monitoring to apply to the process and provides the ability to restart the process in case of failure. See xfnts_start Method for an example of a Start method implemented with the DSDL.
A Stop method must be idempotent such that it exits with success even if it is called on a node when the application is not running. If the Stop method fails, the resource being stopped is set to the STOP_FAILED state, which can lead to a hard reboot of the cluster.
To avoid putting the resource in the STOP_FAILED state, the Stop method must make every effort to stop the resource. The scds_pmf_stop function provides a phased attempt to stop the resource. It first attempts to stop the resource by using a SIGTERM signal and, if this fails, uses a SIGKILL signal. See scds_pmf_stop(3HA) for details.
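The two-phase idea can be sketched with plain POSIX calls. This is an illustration of the pattern only, not a description of how scds_pmf_stop or PMF is implemented: phased_stop and wait_for_exit are hypothetical names, the target must be a child of the calling process so that waitpid can reap it, and the 10 ms polling step is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll until child `pid` exits or roughly `timeout_ms` elapses.
 * Returns 1 if the child was reaped, 0 on timeout, -1 on error. */
static int wait_for_exit(pid_t pid, int timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    for (int waited = 0; waited <= timeout_ms; waited += 10) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }
    return 0;
}

/* Hypothetical two-phase stop for a child process.
 * Returns SIGTERM if the child exited during the polite phase,
 * SIGKILL if it had to be killed, 0 if there was nothing to stop,
 * and -1 on any other error. */
int phased_stop(pid_t pid, int term_wait_ms)
{
    assert(pid > 0 && pid != getpid());

    if (kill(pid, SIGTERM) != 0)
        return errno == ESRCH ? 0 : -1; /* already gone: still a success */

    int r = wait_for_exit(pid, term_wait_ms);
    if (r == 1)
        return SIGTERM;
    if (r < 0)
        return -1;

    /* The polite phase timed out, so escalate. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    return waitpid(pid, NULL, 0) == pid ? SIGKILL : -1;
}
```

Treating "no such process" as success mirrors the idempotency requirement on Stop methods: stopping something that is not running is not a failure.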
The DSDL absorbs much of the complexity of implementing a fault monitor by providing a predetermined model. A Monitor_start method launches the fault monitor, under the control of PMF, when the resource starts on a node. The fault monitor runs in a loop as long as the resource is running on the node. The high-level logic of a DSDL fault monitor is as follows.
The scds_fm_sleep function uses the Thorough_probe_interval property to determine the amount of time between probes. Any application process failures that PMF detects during this interval lead to a restart of the resource.
The probe itself returns a value indicating the severity of failures, from 0 (no failure) to 100 (complete failure).
The probe return value is sent to the scds_fm_action function, which maintains a cumulative failure history within the interval of the Retry_interval property.
The scds_fm_action function determines what to do in the event of failure, as follows.
If the cumulative failure is below 100, do nothing.
If the cumulative failure reaches 100 (complete failure), restart the data service. If Retry_interval is exceeded, reset the history.
If the number of restarts exceeds the value of the Retry_count property within the time specified by Retry_interval, fail over the data service.
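The decision rules above can be modeled in a few lines of C. This is a simplified sketch, not the DSDL implementation: fm_history_t, fm_action_t, and fm_decide are hypothetical names, the history uses a single fixed window that restarts once Retry_interval has elapsed (an assumption, since the text does not say how the window slides), and the cumulative count is cleared after each restart decision.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FM_NOTHING, FM_RESTART, FM_FAILOVER } fm_action_t;

/* Hypothetical failure history for one resource. Times are in seconds. */
typedef struct {
    int  retry_count;    /* value of Retry_count */
    long retry_interval; /* value of Retry_interval */
    long window_start;   /* when the current history window began */
    int  cumulative;     /* summed probe results in this window */
    int  restarts;       /* restarts decided in this window */
} fm_history_t;

/* Feed one probe result (0 = no failure, 100 = complete failure) into
 * the history and return the action the rules call for. */
fm_action_t fm_decide(fm_history_t *h, int probe_result, long now)
{
    assert(h != NULL && probe_result >= 0 && probe_result <= 100);

    /* Retry_interval exceeded: reset the history. */
    if (now - h->window_start > h->retry_interval) {
        h->window_start = now;
        h->cumulative = 0;
        h->restarts = 0;
    }

    h->cumulative += probe_result;
    if (h->cumulative < 100)
        return FM_NOTHING;

    /* Cumulative failure reached 100: count a restart, or fail over
     * once the restarts in this window exceed Retry_count. */
    h->cumulative = 0;
    h->restarts++;
    return h->restarts > h->retry_count ? FM_FAILOVER : FM_RESTART;
}
```

With Retry_count set to 2, two partial failures of 50 trigger the first restart, a third restart request inside the same window becomes a failover, and a failure that arrives after the window has expired is treated as a first restart again.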
The DSDL provides convenience functions to return network address information for resources and resource groups. For example, the scds_get_netaddr_list function retrieves the network-address resources used by a resource, enabling a fault monitor to probe the application.
The DSDL also provides a set of functions for TCP-based monitoring. Typically, these functions establish a simple socket connection to a service, read and write data to the service, and then disconnect from the service. The result of the probe can be sent to the DSDL scds_fm_action function to determine the action to take.
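A TCP probe of this kind can be sketched with ordinary POSIX sockets. The example does not use the DSDL TCP functions: tcp_connect_probe is a hypothetical name, it handles IPv4 only, it checks just the connect and disconnect steps without reading or writing data, and it uses a blocking connect with no timeout, which a real probe would need to bound. It returns 0 or 100 so that the result fits the probe scale described earlier.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: returns 0 if a TCP connection to ipv4:port can be
 * established, 100 (complete failure) otherwise. */
int tcp_connect_probe(const char *ipv4, unsigned short port)
{
    assert(ipv4 != NULL);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return 100; /* not a valid dotted-quad address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 100;

    /* Blocking connect; a production probe would apply a timeout. */
    int rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    close(fd); /* disconnect regardless of the outcome */

    return rc == 0 ? 0 : 100;
}
```

A fault monitor built on this sketch would pass the return value to its failure-history logic rather than acting on a single result directly.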
See xfnts_validate Method for an example of TCP-based fault monitoring.
The DSDL utility scds_syslog_debug() provides a basic framework for adding debugging statements to the resource type implementation. The debugging level (a number between 1 and 9) can be dynamically set per resource type implementation per cluster node. A file named /var/cluster/rgm/rt/rtname/loglevel, which contains only an integer between 1 and 9, is read by all resource type callback methods. The DSDL routine scds_initialize() reads this file and sets the debug level internally to the specified level. The default debug level, 0, specifies that the data service logs no debugging messages.
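Reading such a file amounts to parsing one small integer. The sketch below is not the code that scds_initialize() runs: read_debug_level is a hypothetical name, and falling back to level 0 when the file is missing, unparsable, or out of range is an assumption based on 0 being the documented default.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the debug level stored in `path`.
 * Assumption: a missing, unparsable, or out-of-range file yields the
 * default level 0, which means no debugging messages. */
int read_debug_level(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;

    int level = 0;
    if (fscanf(fp, "%d", &level) != 1 || level < 1 || level > 9)
        level = 0;

    fclose(fp);
    return level;
}
```

For example, a file containing only the digit 5 yields level 5, while an absent file yields 0.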
The scds_syslog_debug() function uses the facility returned by the scha_cluster_getlogfacility() function at a priority of LOG_DEBUG. You can configure these debug messages in /etc/syslog.conf.
You can turn some debugging messages into informational messages for regular operation of the resource type (perhaps at LOG_INFO priority) by using the scds_syslog utility. The sample DSDL application in Chapter 8, Sample DSDL Resource Type Implementation, makes liberal use of the scds_syslog_debug and scds_syslog functions.
You can use the HAStoragePlus resource type to make a local file system highly available within a Sun Cluster environment. The local file system partitions must be located on global disk groups. Affinity switchovers must be enabled and the Sun Cluster environment must be configured for failover. This setup enables the user to make any file system on multi-host disks accessible from any host directly connected to those multi-host disks. Using a highly available local file system is strongly recommended for some I/O-intensive data services. “Enabling Highly Available Local File Systems” in Sun Cluster 3.1 Data Service Planning and Administration Guide contains information about configuring the HAStoragePlus resource type.