JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Oracle Solaris Cluster Data Services Developer's Guide     Oracle Solaris Cluster 4.1
search filter icon
search icon

Document Information

Preface

1.  Overview of Resource Management

2.  Developing a Data Service

Analyzing the Application for Suitability

Determining the Interface to Use

Setting Up the Development Environment for Writing a Data Service

How to Set Up the Development Environment

Transferring a Data Service to a Cluster

Setting Standard Properties

Cluster Properties

Resource Type Properties

Resource Properties

Resource Group Properties

Resource Property Attributes

Node List Properties

Setting Resource and Resource Type Properties

Declaring Resource Type Properties

Declaring Resource Properties

Declaring Extension Properties

Implementing Callback Methods

Accessing Resource and Resource Group Property Information

Idempotence of Methods

How Methods Are Invoked in Zones

Generic Data Service

Controlling an Application

Starting and Stopping a Resource

Using Start and Stop Methods

Deciding Which Start and Stop Methods to Use

Using the Optional Init, Fini, and Boot Methods

Using the Init Method

Using the Fini Method

Guidelines for Implementing a Fini Method

Using the Boot Method

Monitoring a Resource

Implementing Monitors and Methods That Execute Exclusively in the Global Zone

Adding Message Logging to a Resource

Providing Process Management

Providing Administrative Support for a Resource

Implementing a Failover Resource

Implementing a Scalable Resource

Validation Checks for Scalable Services

Writing and Testing Data Services

Using TCP Keep-Alives to Protect the Server

Testing HA Data Services

Coordinating Dependencies Between Resources

Legal RGM Names

RGM Legal Names

Rules for Names Except Resource Type Names

Format of Resource Type Names

RGM Values

3.  Resource Management API Reference

4.  Modifying a Resource Type

5.  Sample Data Service

6.  Data Service Development Library

7.  Designing Resource Types

8.  Sample DSDL Resource Type Implementation

9.  Oracle Solaris Cluster Agent Builder

10.  Generic Data Service

11.  DSDL API Functions

12.  Cluster Reconfiguration Notification Protocol

13.  Security for Data Services

A.  Sample Data Service Code Listings

B.  DSDL Sample Resource Type Code Listings

C.  Requirements for Non-Cluster-Aware Applications

D.  Document Type Definitions for the CRNP

E.  CrnpClient.java Application

Index

Writing and Testing Data Services

This section describes how to write and test a data service. Topics that are covered include using TCP keep-alives to protect the server, testing highly available data services, and coordinating dependencies between resources.

Using TCP Keep-Alives to Protect the Server

On the server side, using TCP keep-alives protects the server from wasting system resources for a down (or network-partitioned) client. If these resources are not cleaned up in a server that stays up long enough, the wasted resources eventually grow without bound as clients crash and reboot.

If the client-server communication uses a TCP stream, both the client and the server should enable the TCP keep-alive mechanism. This provision applies even in the non-HA, single-server case.

Other connection-oriented protocols might also have a keep-alive mechanism.

On the client side, using TCP keep-alives enables the client to be notified when a network address resource has failed over or switched over from one physical host to another physical host. That transfer of the network address resource breaks the TCP connection. However, unless the client has enabled the keep-alive, it does not necessarily learn of the connection break if the connection happens to be quiescent at the time.

For example, suppose the client is waiting for a response from the server to a long-running request, and the client's request message has already arrived at the server and has been acknowledged at the TCP layer. In this situation, the client's TCP module has no need to keep retransmitting the request. Also, the client application is blocked, waiting for a response to the request.

Where possible, in addition to using the TCP keep-alive mechanism, the client application also must perform its own periodic keep-alive at its level. The TCP keep-alive mechanism is not perfect in all possible boundary cases. Using an application-level keep-alive typically requires that the client-server protocol support a null operation or at least an efficient read-only operation, such as a status operation.

Testing HA Data Services

This section provides suggestions about how to test a data service implementation in a highly-available environment. The test cases are suggestions and are not exhaustive. You need access to a test-bed Oracle Solaris Cluster configuration so that the testing work does not affect production machines.

Test that your HA data service behaves correctly in all cases where a resource group is moved between physical hosts. These cases include system crashes and the use of the clnode command. Test that client machines continue to get service after these events.

Test the idempotence of the methods. For example, replace each method temporarily with a short shell script that calls the original method two or more times.

Coordinating Dependencies Between Resources

Sometimes one client-server data service makes requests on another client-server data service while fulfilling a request for a client. For example, data service A depends on data service B if, for A to provide its service, B must provide its service. Oracle Solaris Cluster provides for this requirement by permitting resource dependencies to be configured within a resource group. The dependencies affect the order in which Oracle Solaris Cluster starts and stops data services. See the r_properties(5) man page.

If resources of your resource type depend on resources of another type, you need to instruct the cluster administrator to configure the resources and resource groups correctly. As an alternative, provide scripts or tools to correctly configure them.

Decide whether to use explicit resource dependencies, or omit them and poll for the availability of other data services in your HA data service's code. If the dependent and depended-on resource can run on different nodes, configure them in separate resource groups. In this case, polling is required because configuring resource dependencies across groups is not possible.

Some data services store no data directly themselves. Instead, they depend on another back-end data service to store all their data. Such a data service translates all read and update requests into calls on the back-end data service. For example, consider a hypothetical client-server appointment calendar service that keeps all of its data in an SQL database, such as Oracle. The appointment calendar service uses its own client-server network protocol. For example, it might have defined its protocol using an RPC specification language, such as ONC RPC.

In the Oracle Solaris Cluster environment, you can use HA for Oracle to make the back-end Oracle database highly available. Then, you can write simple methods for starting and stopping the appointment calendar daemon. The cluster administrator registers the appointment calendar resource type with Oracle Solaris Cluster software.

If the HA for Oracle resource is to run on a different node than the appointment calendar resource, the cluster administrator configures them into two separate resource groups. The cluster administrator consequently makes the appointment calendar resource dependent on the HA for Oracle resource.

The cluster administrator makes the resources dependent by doing either of the following:

The calendar data service daemon, after it has been started, might poll while waiting for the Oracle database to become available. The calendar resource type's Start method usually returns success in this case. If the Start method blocks indefinitely, however, this method moves its resource group into a busy state. This busy state prevents any further state changes, such as edits, failovers, or switchovers on the resource group. If the calendar resource's Start method times out or exits with a nonzero status, its timing out or nonzero exit status might cause the resource group to ping-pong between two or more nodes while the Oracle database remains unavailable.