Sun Cluster 2.2 API Developer's Guide

1.4 Data Service Requirements

This section presents the requirements that a data service must meet to participate in the Sun Cluster Data Service API.

1.4.1 Client-Server Environment

Sun Cluster is designed for client-server networking environments. Sun Cluster cannot operate in time-sharing environments in which applications are run on a server that is accessed through telnet or rlogin. Such models typically have no inherent ability to recover from a server crash.

1.4.2 Crash Tolerance

The data service must be crash-tolerant. This means that the data service's daemon processes must be relatively stateless, in that they write all updates to disk synchronously, so that no update acknowledged to a client exists only in memory when the host crashes.
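
As an illustration, the following Python sketch shows one way a daemon could make each update durable before acknowledging it. The function name durable_append and the one-record-per-call layout are assumptions made for this example and are not part of the Sun Cluster API; a real data service would apply the same idea in its own language and file format.

```python
import os


def durable_append(path, record):
    """Append record (bytes) to path and return only after it is on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the operating system to flush the file to stable storage
        # before the caller acknowledges the update.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A daemon written this way would reply to the client only after durable_append returns.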

When a physical host that masters a logical host crashes and a new physical host takes over, Sun Cluster calls the start method of each data service. The start method triggers any crash recovery of the on-disk data. For example, if the data service uses logging techniques, the start method should cause the data service to carry out crash recovery using the log.
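
The following is a minimal sketch of log-based recovery of the kind a start method might trigger. It assumes a hypothetical log of key=value lines in which a final line without a terminating newline is treated as a write interrupted by the crash and is discarded. The function name recover_from_log and the record format are illustrative assumptions; the actual recovery procedure depends on the data service.

```python
def recover_from_log(log_path):
    """Rebuild in-memory state by replaying every complete log record."""
    state = {}
    try:
        with open(log_path, "rb") as log:
            data = log.read()
    except FileNotFoundError:
        # No log yet: nothing to recover.
        return state

    # Everything after the last newline is a partially written record.
    complete, _, _torn = data.rpartition(b"\n")
    for line in complete.split(b"\n"):
        key, sep, value = line.partition(b"=")
        if sep:  # skip blank or malformed lines
            state[key.decode("utf-8")] = value.decode("utf-8")
    return state
```

Later records override earlier ones for the same key, so replaying the log in order yields the last committed value.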

1.4.3 Multihosted Data

The logical host's disksets are multihosted so that when a physical host crashes, one of the surviving hosts can access the disk. For a data service to be highly available, its data must be highly available, and thus its data must reside on the logical host's diskset.

A data service might have command-line switches or configuration files pointing to the location of the data files. If the data service uses hard-wired path names, it might be possible to change the path name to a symbolic link that points to a file in the logical host's diskset, without changing the data service code. See Appendix A, Using Symbolic Links for Multihosted Data Placement, for a more detailed discussion about using symbolic links.
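
The symbolic link approach can be sketched as follows. The helper name redirect_to_diskset and any paths passed to it are hypothetical; in practice the system administrator would usually create the link once with ln -s, and Appendix A remains the authoritative discussion.

```python
import os


def redirect_to_diskset(hardwired_path, diskset_path):
    """Make hardwired_path a symbolic link to diskset_path."""
    if os.path.islink(hardwired_path):
        # Replace a stale link left over from an earlier configuration.
        os.remove(hardwired_path)
    elif os.path.exists(hardwired_path):
        # Refuse to clobber real data; it must be moved to the diskset first.
        raise FileExistsError(hardwired_path + " exists and is not a link")
    os.symlink(diskset_path, hardwired_path)
```

The data service continues to open the hard-wired path name, but the data itself resides on the logical host's diskset.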

In the worst case, the data service's code must be modified to provide some mechanism for pointing to the actual data location. You can do this by implementing additional command-line switches.
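
A command-line switch of this kind might look like the following sketch. The switch name --data-dir and the default directory are assumptions chosen for the example, not names used by any particular data service.

```python
import argparse


def parse_args(argv):
    """Parse the command line of a hypothetical data service daemon."""
    parser = argparse.ArgumentParser(description="example data service daemon")
    parser.add_argument(
        "--data-dir",
        default="/var/exampled",  # hypothetical local default
        help="directory holding the data files; on a cluster this would "
             "point into the logical host's diskset",
    )
    return parser.parse_args(argv)
```

The start method could then launch the daemon with the switch set to a directory on the logical host's diskset.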

Sun Cluster supports the use of UFS, VxFS, and raw partitions on the logical host's diskset. When installing and configuring Sun Cluster, the system administrator must specify which disk resources to use for UFS or VxFS file systems and which for raw partitions. Typically, raw partitions are used only by database servers and multimedia servers.

1.4.4 Host Names

You must determine whether the data service ever needs to know the host name of the server on which it is running. If so, the data service might need to be modified to use the host name of the logical host, rather than that of the physical host. Recall that the Sun Cluster concept of "logical host" involves having a physical host "impersonate" a logical host's host name and IP address.

Occasionally, in the client-server protocol for a data service, the server returns its own host name to the client as part of the contents of a message to the client. For such protocols, the client could be depending on this returned host name as the host name to use when contacting the server. For the returned host name to be usable after a takeover or switchover, the host name should be that of the logical host, not the physical host. In this case, you must modify the data service code to return the logical host name to the client.
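
The change can be as small as the following sketch, in which the server takes the host name it advertises from its configuration instead of asking the operating system. The function names, the logical_host_name parameter, and the greeting format are all hypothetical and stand in for whatever message the real protocol defines.

```python
import socket


def advertised_host_name(logical_host_name=None):
    """Return the name clients should use to contact this service."""
    # Prefer the configured logical host name. The physical host name
    # is only a fallback for running outside a cluster.
    return logical_host_name or socket.gethostname()


def build_greeting(logical_host_name=None):
    """Build a made-up protocol greeting that carries the server's name."""
    name = advertised_host_name(logical_host_name)
    return "220 {} service ready\r\n".format(name).encode("ascii")
```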

1.4.5 Multihomed Hosts

The term multihomed host describes a host that is on more than one public network. Such a host has multiple host names and IP addresses; it has one host name/IP address pair for each network. Sun Cluster is designed to permit a host to appear on any number of networks, including just one (the non-multihomed case). Just as the physical host has multiple host name/IP address pairs, each logical host has multiple host name/IP address pairs, one for each public network. By convention, one of the host names in the set of pairs is the same name as that of the logical host itself. When Sun Cluster moves a logical host from one physical host to another, the complete set of host name/IP address pairs for that logical host is moved.

For each Sun Cluster logical host, the set of host name/IP address pairs is part of the Sun Cluster configuration data and is specified by the system administrator when Sun Cluster is first installed and configured. The Sun Cluster Data Service API contains facilities for querying the set of pairs, specifically, the names_on_subnets field described in the hads(3HA) and haget(1M) man pages.

Most off-the-shelf data service daemons that have been written for Solaris already handle multihomed hosts properly. Many data services do all their network communication by binding to the Solaris wildcard address INADDR_ANY. A socket bound this way accepts traffic for every IP address configured on the machine, on all of its network interfaces, including addresses that are configured after the daemon has started. A data service daemon that uses INADDR_ANY therefore generally does not have to be changed to handle the Sun Cluster logical host's IP addresses.
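
For illustration, a wildcard bind looks like the following in Python, where an empty host string in the address tuple stands for INADDR_ANY. The function name listen_on_all_addresses is an assumption made for this sketch.

```python
import socket


def listen_on_all_addresses(port, backlog=5):
    """Return a TCP listening socket bound to the wildcard address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        # "" is Python's spelling of INADDR_ANY for IPv4 sockets.
        sock.bind(("", port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```

A daemon that listens this way needs no rebinding when the machine acquires or relinquishes a logical host's IP address.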

1.4.6 Binding to INADDR_ANY Versus Binding to Specific IP Addresses

Even in the non-multihomed case, the Sun Cluster logical host concept allows the machine to have more than one IP address. It has one for its own physical host and one additional IP address for each logical host it currently masters. When a machine becomes the master of a logical host, it dynamically acquires an additional IP address. When it gives up mastery of a logical host, it dynamically relinquishes an IP address.

Some data services cannot work properly using only INADDR_ANY. These data services must dynamically change the set of IP addresses to which they are bound as a logical host is mastered or unmastered. The start and stop methods provide the hooks for Sun Cluster to inform the data service that a logical host has appeared or disappeared. One strategy for such a data service to accomplish the rebinding is for its stop and start methods to kill and restart the data service's daemons.
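
For a daemon that binds to specific addresses, the change in the bound set can be pictured with the following sketch. It is not the kill-and-restart strategy described above but an in-process variant, shown only to make the idea of rebinding concrete. The function name update_listeners and the dictionary of listeners are assumptions, and how a running daemon learns the new address list from its start and stop methods is left open.

```python
import socket


def update_listeners(listeners, addresses, port, backlog=5):
    """Make listeners (a dict of address -> socket) match addresses."""
    wanted = set(addresses)

    # Stop using addresses that this machine no longer holds.
    for addr in list(listeners):
        if addr not in wanted:
            listeners.pop(addr).close()

    # Start listening on newly acquired addresses.
    for addr in addresses:
        if addr not in listeners:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                sock.bind((addr, port))
                sock.listen(backlog)
            except OSError:
                sock.close()
                raise
            listeners[addr] = sock
    return listeners
```

Calling the function with the addresses of the physical host plus those of every logical host currently mastered keeps the bound set in step with the cluster configuration.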

During cluster reconfiguration, there is a relationship between the order in which data service methods are called and the time when the logical host's network addresses are configured by Sun Cluster. See the hareg(1M) man page for details about this relationship.

By the time the data service's stop method returns, the data service should have stopped using the logical host's IP addresses. Similarly, by the time the start_net method returns, the data service should have started to use the logical host's IP addresses. If the data service uses INADDR_ANY rather than binding to individual IP addresses, no special handling is required. If the data service's stop and start methods accomplish their work by killing and restarting the data service's daemons, then the data service stops and starts using the network addresses at the appropriate times.

1.4.7 Client Retry

To a network client, a takeover or switchover appears to be a crash of the logical host followed by a fast reboot. Ideally, the client application and the client-server protocol are structured to do some amount of retrying. If the application and protocol already handle the case of a single server crashing and rebooting, then they also will handle the case of the logical host being taken over or switched over. Some applications might elect to retry endlessly. More sophisticated applications notify the user that a long retry is in progress and allow the user to choose whether or not to continue.
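
A bounded retry loop on the client side might be sketched as follows. The function names, the default number of attempts, and the fixed delay between attempts are assumptions chosen for the example; an application could equally retry endlessly or ask the user whether to continue.

```python
import socket
import time


def call_with_retry(operation, attempts=5, delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            # Connection refused, reset, timed out, and similar errors
            # are what a client sees while the logical host is moving.
            if attempt == attempts - 1:
                raise
            sleep(delay)


def connect_with_retry(host, port, timeout=5.0, **retry_options):
    """Open a TCP connection to the logical host, retrying on failure."""
    return call_with_retry(
        lambda: socket.create_connection((host, port), timeout=timeout),
        **retry_options
    )
```

Because the client addresses the logical host by name, the same call succeeds once another physical host has taken over that name and its IP address.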