Generic Data Service Concepts

The GDS is a mechanism for making simple network-aware and non-network-aware applications highly available or scalable by plugging them into the Oracle Solaris Cluster Resource Group Management (RGM) framework. This mechanism does not require you to code a data service, which you typically must do to make an application highly available or scalable.

Note - You can install and configure this data service to run in either the global zone or a zone cluster. For updated information about supported configurations of this data service, see the Oracle Solaris Cluster 4 Compatibility Guide.

The GDS is a single, precompiled data service. You cannot modify the precompiled data service or its components, which are the callback method (rt_callbacks) implementations and the resource type registration file (rt_reg).

This section covers the following topics:

Precompiled Resource Type
Advantages and Disadvantages of Using the GDS
Ways to Create a Service That Uses the GDS
How the GDS Logs Events
Required GDS Properties
Optional GDS Properties

Precompiled Resource Type

The generic data service resource type SUNW.gds is included in the ha-cluster/ha-service/gds package. The ha-cluster/ha-service/gds package includes the following files:

# pkg contents ha-cluster/ha-service/gds

PATH
/opt/SUNWscgds
/opt/SUNWscgds/bin
/opt/SUNWscgds/bin/gds_monitor_check
/opt/SUNWscgds/bin/gds_monitor_start
/opt/SUNWscgds/bin/gds_monitor_stop
/opt/SUNWscgds/bin/gds_probe
/opt/SUNWscgds/bin/gds_svc_start
/opt/SUNWscgds/bin/gds_svc_stop
/opt/SUNWscgds/bin/gds_update
/opt/SUNWscgds/bin/gds_validate
/opt/SUNWscgds/etc
/opt/SUNWscgds/etc/SUNW.gds
/opt/cluster
/opt/cluster/lib
/opt/cluster/lib/rgm
/opt/cluster/lib/rgm/rtreg
/opt/cluster/lib/rgm/rtreg/SUNW.gds

Advantages and Disadvantages of Using the GDS

Using the GDS has the following advantages over using either the Agent Builder source code (see the scdscreate(1HA) man page) or Oracle Solaris Cluster administration commands:

The GDS is easy to use.
The GDS and its methods are precompiled and therefore cannot be modified.
You can use Agent Builder to generate scripts for your application. These scripts are put in an Oracle Solaris package that can be reused across multiple clusters.

While using the GDS has many advantages, the GDS is not the mechanism to use in these instances:

When more control is required than is available with the precompiled resource type, such as when you need to add extension properties or change default values
When the source code needs to be modified to add special functions

Ways to Create a Service That Uses the GDS

There are two ways to create a service that uses the GDS:

Agent Builder
Oracle Solaris Cluster administration commands

GDS and Agent Builder

Use Agent Builder and select GDS as the type of generated source code. The user input is used to generate a set of scripts that configure resources for the given application. For more information, see Chapter 3, Using Agent Builder to Create a Service That Uses GDS or GDSv2.

GDS and Oracle Solaris Cluster Administration Commands

This method uses the precompiled data service code in ha-cluster/ha-service/gds. However, the cluster administrator must use Oracle Solaris Cluster administration commands to create and configure the resource. See the clresource(1CL) man page.
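As a rough sketch of this method, the following script prints the sequence of commands that a cluster administrator might issue to create a failover GDS resource. The resource group name, resource names, port, and start script path are hypothetical placeholders. The script only echoes the commands (a dry run) so that the plan can be reviewed on a machine without cluster software; check the man pages for each command before running anything like this on a cluster.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands that create a
# failover GDS resource. All names and paths below are hypothetical.
RG=app-rg                      # resource group (assumed name)
LH=app-lh                      # logical hostname resource (assumed name)
RS=app-rs                      # GDS resource (assumed name)
START=/opt/myapp/bin/start     # assumed path of the start script
PORT=8080/tcp                  # assumed port on which the application listens

gds_create_plan() {
    echo "clresourcetype register SUNW.gds"
    echo "clresourcegroup create $RG"
    echo "clreslogicalhostname create -g $RG $LH"
    echo "clresource create -g $RG -t SUNW.gds -p Start_command=$START -p Port_list=$PORT -p Resource_dependencies=$LH $RS"
    echo "clresourcegroup online -M $RG"
}

gds_create_plan
```

Printing the plan first makes it easier to check the property values before they are applied to a live cluster.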

Selecting the Method to Use to Create a GDS-Based Service

A significant amount of typing is required to issue Oracle Solaris Cluster commands. For example, see How to Use Oracle Solaris Cluster Administration Commands to Create a Highly Available Service That Uses the GDS and How to Use Oracle Solaris Cluster Administration Commands to Create a Scalable Service That Uses the GDS.

Using the GDS with Agent Builder simplifies the process because Agent Builder generates the scripts that issue the scrgadm and scswitch commands for you.

How the GDS Logs Events

The GDS enables you to log relevant information that is passed from the GDS to the scripts that the GDS starts. This information includes the status of the start, probe, validate, and stop methods as well as property variables. You can use this information to diagnose problems or errors in your scripts, or apply it to other purposes.

Use the Log_level property (see Log_level Property) to specify the level, or type, of messages that the GDS logs. You can specify NONE, INFO, or ERR.

GDS Log Files

The following two GDS log files are placed in the /var/cluster/logs/DS/resource-group-name/resource-name directory:

The start_stop_log.txt, which contains messages that are generated by resource start and stop methods
The probe_log.txt, which contains messages that are generated by the resource monitor

The following example shows the types of information that start_stop_log.txt contains:

06/12/2006 12:38:05 phys-node-1 START-INFO> Start succeeded. [/home/brianx/sc/start_cmd]
06/12/2006 12:42:11 phys-node-1 STOP-INFO> Successfully stopped the application

The following example shows the types of information that probe_log.txt contains:

06/12/2006 12:38:15 phys-node-1 PROBE-INFO> The GDS monitor (gds_probe) has been started
06/12/2006 12:39:15 phys-node-1 PROBE-INFO> The probe result is 0
06/12/2006 12:40:15 phys-node-1 PROBE-INFO> The probe result is 0
06/12/2006 12:41:15 phys-node-1 PROBE-INFO> The probe result is 0
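When diagnosing a problem, it can help to pull only the failed probes out of probe_log.txt. The following is a minimal sketch that assumes the line layout shown in the example above (date, time, node name, PROBE-INFO> tag, message); the function name probe_failures is hypothetical and other message formats are not handled.

```shell
#!/bin/sh
# Print the date, time, and probe status of every non-zero probe result.
# Assumes lines of the form:
#   date time node PROBE-INFO> The probe result is N
probe_failures() {
    # $1 = path to a probe_log.txt file
    awk '/PROBE-INFO> The probe result is / && $NF != 0 { print $1, $2, $NF }' "$1"
}

# Usage (the directory components are placeholders):
#   probe_failures /var/cluster/logs/DS/resource-group-name/resource-name/probe_log.txt
```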

Required GDS Properties

This section describes the required GDS properties.

`Port_list` Property

The Port_list property identifies the list of ports on which the application listens. You specify the Port_list property in the start script that Agent Builder creates or with the clresource command.

Whether you must specify this property depends on whether your application is network aware. If your application is network aware (the Network_aware property is set to TRUE, which is the default), you must provide both the Start_command extension property and the Port_list property. If your application is non-network aware (the Network_aware property is set to FALSE), you must provide only the Start_command extension property; in that case, the Port_list property is optional.

`Start_command` Property

The start command, which you specify with the Start_command extension property, starts the application. This command must be a UNIX command with arguments that can be passed directly to a shell to start the application.

If your application is network aware, you must provide both the Start_command extension property and the Port_list property. If your application is non-network aware, you must provide only the Start_command extension property.
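The rule for required properties can be sketched as a small check. This is an illustration only, not part of the GDS; the function name check_required and the example path and port are hypothetical.

```shell
#!/bin/sh
# check_required NETWORK_AWARE START_COMMAND PORT_LIST
# Prints "ok" and returns 0 if the required properties are present;
# otherwise prints the name of the missing property and returns 1.
check_required() {
    network_aware=$1; start_command=$2; port_list=$3
    if [ -z "$start_command" ]; then
        echo "missing Start_command"; return 1     # always required
    fi
    if [ "$network_aware" = "TRUE" ] && [ -z "$port_list" ]; then
        echo "missing Port_list"; return 1         # required when network aware
    fi
    echo "ok"
}

check_required TRUE  /opt/myapp/bin/start ""  || true   # prints "missing Port_list"
check_required TRUE  /opt/myapp/bin/start 8080/tcp      # prints "ok"
check_required FALSE /opt/myapp/bin/start ""            # prints "ok"
```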

Optional GDS Properties

Optional GDS properties include both standard properties and extension properties. Standard properties are a standard set of properties that are provided by Oracle Solaris Cluster. Properties that are defined in the RTR file are called extension properties.

Optional GDS properties include:

Child_mon_level extension property (used only with administration commands)
Failover_enabled extension property
Log_level extension property
Monitor_retry_count extension property
Monitor_retry_interval extension property
Network_aware extension property
Probe_command extension property
Probe_timeout extension property
Resource_dependencies property
Start_timeout property
Stop_command extension property
Stop_signal extension property
Stop_timeout property
Timeout_threshold property
Validate_command extension property
Validate_timeout property

`Child_mon_level` Property

Note - If you use Oracle Solaris Cluster administration commands, you can use the Child_mon_level property. If you use Agent Builder, you cannot use this property.

This property provides control over the processes that are monitored through the Process Monitor Facility (PMF). This property denotes the level up to which the forked child processes are monitored. This property works like the -C argument to the pmfadm command. See the pmfadm(1M) man page.

Omitting this property, or setting it to the default value of -1, has the same effect as omitting the -C option on the pmfadm command. That is, all children and their descendants are monitored.

`Failover_enabled` Property

This property controls the failover behavior of the resource. If this extension property is set to TRUE, the application fails over when the number of restarts exceeds the Retry_count within the Retry_interval number of seconds.

If this property is set to FALSE, the application does not restart or fail over to another node when the number of restarts exceeds the Retry_count within the Retry_interval number of seconds.

You can use this property to prevent the application resource from initiating a failover of the resource group. The default value for this property is TRUE.

Note - In the future, use the Failover_mode property in place of the Failover_enabled extension property, because Failover_mode better controls failover behavior. For more information, see the descriptions of the LOG_ONLY and RESTART_ONLY values for Failover_mode in the r_properties(5) man page.
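The effect of Failover_enabled described in this section can be sketched as a decision function. This is an illustration of the documented behavior, not the GDS implementation. The function name next_action is hypothetical, and the first branch (restart locally while the limit has not been exceeded) is an assumption.

```shell
#!/bin/sh
# next_action RESTARTS RETRY_COUNT FAILOVER_ENABLED
#   RESTARTS = number of restarts counted within Retry_interval seconds
next_action() {
    restarts=$1; retry_count=$2; failover_enabled=$3
    if [ "$restarts" -le "$retry_count" ]; then
        echo "restart"     # assumption: limit not exceeded, restart locally
    elif [ "$failover_enabled" = "TRUE" ]; then
        echo "failover"    # limit exceeded and failover is allowed
    else
        echo "none"        # limit exceeded: neither restart nor fail over
    fi
}

next_action 3 2 TRUE    # prints "failover"
next_action 3 2 FALSE   # prints "none"
```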

`Log_level` Property

This property specifies the level, or type, of diagnostic messages that are logged by the GDS. You can specify NONE, INFO, or ERR for this property. When you specify NONE, diagnostic messages are not logged by the GDS. When you specify INFO, only informational messages are logged. When you specify ERR, only error messages are logged. By default, the GDS does not log diagnostic messages (NONE).

`Monitor_retry_count` Property

This property specifies the number of times that the process monitor facility (PMF) restarts the fault monitor during the time window that the Monitor_retry_interval property specifies. This property refers to restarts of the fault monitor itself rather than to the resource. The system-defined properties Retry_interval and Retry_count control restarting of the resource.

`Monitor_retry_interval` Property

This property specifies the time (in minutes) over which failures of the fault monitor are counted. If the number of times that the fault monitor fails exceeds the value that is specified in the extension property Monitor_retry_count within this period, the PMF does not restart the fault monitor.

`Network_aware` Property

This property specifies whether your application uses the network. By default, the GDS assumes that your application is network aware, that is, uses the network (Network_aware is set to TRUE).

`Probe_command` Property

This property specifies the probe command that periodically checks the health of a given application. This command must be a UNIX command with arguments that can be passed directly to a shell to probe the application. The probe command returns with an exit status of 0 if the application is running correctly.

The exit status of the probe command is used to determine the severity of the application's failure. This exit status, called the probe status, must be an integer between 0 (for success) and 100 (for complete failure). The probe status can also be the special value 201, which causes the application to fail over immediately unless Failover_enabled is set to FALSE. The GDS probing algorithm uses the probe status to determine whether to restart the application locally or fail it over. See the scds_fm_action(3HA) man page for more information.
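A probe command can be as small as a script that maps one health check to a probe status. The following sketch treats the application as healthy if the process recorded in a PID file is still alive. The PID file convention, its path, and the function name probe_app are assumptions for illustration; a real probe would normally test the service itself.

```shell
#!/bin/sh
# Hypothetical probe: 0 = healthy, 100 = complete failure.
probe_app() {
    pidfile=$1
    [ -r "$pidfile" ] || return 100            # no PID file: complete failure
    pid=$(cat "$pidfile")
    kill -0 "$pid" 2>/dev/null || return 100   # process is gone
    return 0                                   # process is alive
}

# In a real probe script the last line would be something like:
#   probe_app /var/run/myapp.pid; exit $?
```

Values between 0 and 100 could be returned for partial failures.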

If the probe command is omitted, the GDS provides its own simple probe. This probe connects to the application on the set of IP addresses that is derived from the output of the scds_get_netaddr_list() function. This includes all network resources on which the GDS resource declares a resource dependency. If there are no such resources, it includes all network resources configured in the same resource group as the GDS resource. See the scds_get_netaddr_list(3HA) man page for more information.

If the connection succeeds, the probe disconnects immediately. If both the connect and disconnect operations succeed, the application is deemed to be running well.

Note - The probe that is provided with the GDS is only intended to be a simple substitute for the fully functioning application-specific probe.

`Probe_timeout` Property

This property specifies the timeout value for the probe command. See Probe_command Property for additional information. The default for Probe_timeout is 30 seconds.

`Resource_dependencies` Property

This property specifies a list of resources in the same group or in different groups upon which this resource has a strong dependency. This resource cannot be started if the start of any resource in the list fails. If this resource and one of the resources in the list start at the same time, the RGM waits until the resource in the list starts before the RGM starts this resource. If the resource in this resource's Resource_dependencies list does not start (for example, if the resource group for the resource in the list remains offline or if the resource in the list is in a Start_failed state), this resource also remains offline. If this resource remains offline because of a dependency on a resource in a different resource group that fails to start, this resource's group enters a Pending_online_blocked state.

To specify the scope of a dependency, append the qualifier {ANY_NODE}, {FROM_RG_AFFINITIES}, {LOCAL_NODE}, or @nodename, including the braces ({ }) or at-sign (@), to the resource name when you specify this property.

See Resource_dependencies in the r_properties(5) man page for details about resource dependencies.

`Start_timeout` Property

This property specifies the start timeout for the start command. See Start_command Property for additional information. The default for Start_timeout is 300 seconds.

`Stop_command` Property

This property specifies the command that stops the application. The command must return only after the application has been completely stopped. This command must be a complete UNIX command that can be passed directly to a shell to stop the application.

If the Stop_command extension property is provided, the GDS stop method starts the stop command with 80 percent of the stop timeout. Regardless of the outcome of starting the stop command, the GDS stop method sends SIGKILL with 15 percent of the stop timeout. The remaining 5 percent of the time is reserved for housekeeping overhead.
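As a worked example of this split, the following sketch computes the three shares for a given Stop_timeout. The function name stop_budget is hypothetical, and the use of integer division (with the remainder assigned to housekeeping) is an assumption about rounding.

```shell
#!/bin/sh
# Split Stop_timeout (in seconds) into the shares described above.
stop_budget() {
    timeout=$1
    cmd=$(( timeout * 80 / 100 ))         # share for the stop command
    sigkill=$(( timeout * 15 / 100 ))     # share for the SIGKILL phase
    rest=$(( timeout - cmd - sigkill ))   # remainder for housekeeping
    echo "stop_command=${cmd}s sigkill=${sigkill}s housekeeping=${rest}s"
}

stop_budget 300   # prints: stop_command=240s sigkill=45s housekeeping=15s
```

With the default Stop_timeout of 300 seconds, the stop command therefore has 240 seconds to complete.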

If the stop command is omitted, the GDS tries to stop the application by using the signal specified in Stop_signal.

`Stop_signal` Property

This property specifies a value that identifies the signal to stop an application through the PMF. See the signal(3HEAD) man page for a list of the integer values that you can specify. The default value is 15 (SIGTERM).

`Stop_timeout` Property

This property specifies the timeout for the stop command. See Stop_command Property for additional information. The default for Stop_timeout is 300 seconds.

`Timeout_threshold` Property

This property specifies the percentage of a timeout period after which a notification is sent that the timeout limit is almost reached.
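For example, the elapsed time at which the notification would be sent can be computed as follows. The threshold value of 80 is only an example, not a documented default, and the function name threshold_seconds is hypothetical.

```shell
#!/bin/sh
# threshold_seconds TIMEOUT THRESHOLD_PERCENT
# Prints the elapsed time (in seconds) at which the notification is due.
threshold_seconds() {
    timeout=$1; threshold=$2
    echo $(( timeout * threshold / 100 ))
}

threshold_seconds 300 80   # prints 240
```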

`Validate_command` Property

This property specifies the absolute path to a command to invoke to validate the application. If you do not provide an absolute path, the application is not validated.
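A validate command is typically a short script that checks the preconditions for starting the application. The following sketch assumes that an exit status of 0 means that validation succeeded and that a nonzero status means that it failed. The function name validate_app and the specific checks are hypothetical.

```shell
#!/bin/sh
# Hypothetical validation: succeed only if the start script is
# executable and the application's configuration file is readable.
validate_app() {
    start_script=$1; config_file=$2
    [ -x "$start_script" ] || { echo "not executable: $start_script" >&2; return 1; }
    [ -r "$config_file" ]  || { echo "not readable: $config_file" >&2; return 1; }
    return 0
}

# In a real validate script the last line would be something like:
#   validate_app /opt/myapp/bin/start /etc/myapp.conf; exit $?
```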

`Validate_timeout` Property

This property specifies the timeout for the validate command. See Validate_command Property for additional information. The default for Validate_timeout is 300 seconds.

Oracle® Solaris Cluster Generic Data Service (GDS) Guide