Sun Cluster Concepts Guide for Solaris OS

High-Availability Framework

The Sun Cluster software makes all components on the “path” between users and data highly available, including network interfaces, the applications themselves, the file system, and the multihost devices. In general, a cluster component is highly available if it survives any single (software or hardware) failure in the system.

The following table shows the kinds of Sun Cluster component failures (both hardware and software) and the kinds of recovery that are built into the high-availability framework.

Table 3–1 Levels of Sun Cluster Failure Detection and Recovery

Failed Cluster Component    Software Recovery                       Hardware Recovery
------------------------    -----------------                       -----------------
Data service                HA API, HA framework                    Not applicable
Public network adapter      IP network multipathing                 Multiple public network adapter cards
Cluster file system         Primary and secondary replicas          Multihost devices
Mirrored multihost device   Volume management (Solaris Volume       Hardware RAID-5 (for example,
                            Manager and VERITAS Volume Manager)     Sun StorEdge A3x00)
Global device               Primary and secondary replicas          Multiple paths to the device,
                                                                    cluster transport junctions
Private network             HA transport software                   Multiple private hardware-independent
                                                                    networks
Node                        CMM, failfast driver                    Multiple nodes
Zone                        HA API, HA framework                    Not applicable

Sun Cluster software's high-availability framework detects a node or zone failure quickly and creates a new equivalent server for the framework resources on a remaining node or zone in the cluster. At no time are all framework resources unavailable. Framework resources that are unaffected by a crashed node or zone are fully available during recovery. Furthermore, framework resources of the failed node or zone become available as soon as they are recovered. A recovered framework resource does not have to wait for all other framework resources to complete their recovery.

Most highly available framework resources are recovered transparently to the applications (data services) that use them. The semantics of framework resource access are fully preserved across node or zone failure; the applications cannot detect that the framework resource server has been moved to another node. Failure of a single node is completely transparent to programs on the remaining nodes that use the files, devices, and disk volumes attached to the failed node, provided that an alternative hardware path to the disks exists from another node. An example is the use of multihost devices that have ports to multiple nodes.

Zone Membership

Sun Cluster software also tracks zone membership by detecting when a zone boots up or halts. These changes also trigger a reconfiguration. A reconfiguration can redistribute cluster resources among the nodes and zones in the cluster.

Cluster Membership Monitor

To ensure that data is kept safe from corruption, all nodes must reach a consistent agreement on the cluster membership. When necessary, the CMM coordinates a cluster reconfiguration of cluster services (applications) in response to a failure.

The CMM receives information about connectivity to other nodes from the cluster transport layer. The CMM uses the cluster interconnect to exchange state information during a reconfiguration.

After detecting a change in cluster membership, the CMM performs a synchronized configuration of the cluster. In a synchronized configuration, cluster resources might be redistributed, based on the new membership of the cluster.
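The following sketch is not part of Sun Cluster and only approximates, at the command level, the kind of connectivity information that the transport layer reports to the CMM. It assumes a four-node cluster that uses the default private hostnames of the form clusternodeN-priv.

#!/bin/sh
# Illustrative only: probe each peer's private hostname to approximate
# the connectivity view that the transport layer gives the CMM.
# Assumes default private hostnames for a four-node cluster.
REACHABLE=""
for peer in clusternode1-priv clusternode2-priv clusternode3-priv clusternode4-priv
do
    # Solaris ping syntax: ping host timeout
    if /usr/sbin/ping $peer 2 > /dev/null 2>&1; then
        REACHABLE="$REACHABLE $peer"
    fi
done
echo "Nodes reachable over the private interconnect:$REACHABLE"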

See About Failure Fencing for more information about how the cluster protects itself from partitioning into multiple separate clusters.

Failfast Mechanism

The failfast mechanism detects a critical problem in either the global zone or in a non-global zone on a node. The action that Sun Cluster takes when failfast detects a problem depends on whether the problem occurs in the global zone or a non-global zone.

If the critical problem is located in the global zone, Sun Cluster forcibly shuts down the node. Sun Cluster then removes the node from cluster membership.

If the critical problem is located in a non-global zone, Sun Cluster reboots that non-global zone.

If a node loses connectivity with other nodes, the node attempts to form a cluster with the nodes with which communication is possible. If that set of nodes does not form a quorum, Sun Cluster software halts the node and “fences” the node from shared storage. See About Failure Fencing for details about this use of failfast.
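As a hedged illustration of the quorum arithmetic only (Sun Cluster performs this check internally), the following sketch compares the votes held by the reachable partition against the strict majority needed to form a cluster. The vote counts are assumed values for a hypothetical four-node configuration with one quorum device.

#!/bin/sh
# Hedged sketch of the quorum decision; vote counts are assumptions.
TOTAL_VOTES=5                        # 4 node votes + 1 quorum device (assumed)
REACHABLE_VOTES=$1                   # votes held by this node and reachable peers
NEEDED=`expr $TOTAL_VOTES / 2 + 1`   # strict majority of the total votes

if [ "$REACHABLE_VOTES" -ge "$NEEDED" ]; then
    echo "Quorum attained: this partition can form the cluster."
else
    echo "No quorum: this node must halt and be fenced from shared storage."
fi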

If one or more cluster-specific daemons die, Sun Cluster software declares that a critical problem has occurred. Sun Cluster software runs cluster-specific daemons in both the global zone and in non-global zones. If the daemon died in the global zone, Sun Cluster shuts down the node and removes it from cluster membership; if the daemon died in a non-global zone, Sun Cluster reboots that zone.
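The failfast mechanism itself is implemented in the kernel; the following user-level loop is only a hedged approximation of the behavior just described. The daemon name pmfd and the zone name zone4 are taken from the console examples that follow, and the halt and zoneadm reboot commands stand in for the forced panic and zone reboot that Sun Cluster performs.

#!/bin/sh
# Hedged approximation of failfast; the real mechanism is a kernel driver.
DAEMON=pmfd          # example daemon name
ZONE=zone4           # example zone name; use "global" for the global zone

while sleep 10
do
    # pgrep -z restricts the match to processes in the named zone.
    if pgrep -x -z $ZONE $DAEMON > /dev/null; then
        continue
    fi
    echo "Failfast: \"$DAEMON\" died in zone \"$ZONE\"" >&2
    if [ "$ZONE" = "global" ]; then
        halt                      # stands in for the forced node panic
    else
        zoneadm -z $ZONE reboot   # reboot only the affected non-global zone
    fi
done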

When a cluster-specific daemon that runs in a non-global zone fails, a message similar to the following is displayed on the console.


cl_runtime: NOTICE: Failfast: Aborting because "pmfd" died in zone "zone4" (zone id 3)
35 seconds ago.

When a cluster-specific daemon that runs in the global zone fails and the node panics, a message similar to the following is displayed on the console.


panic[cpu1]/thread=2a10007fcc0: Failfast: Aborting because "pmfd" died in zone "global" (zone id 0)
35 seconds ago.
409b8 cl_runtime:__0FZsc_syslog_msg_log_no_argsPviTCPCcTB+48 (70f900, 30, 70df54, 407acc, 0)
%l0-7: 1006c80 000000a 000000a 10093bc 406d3c80 7110340 0000000 4001fbf0

After the panic, the node might reboot and attempt to rejoin the cluster. Alternatively, if the cluster is composed of SPARC based systems, the node might remain at the OpenBoot PROM (OBP) prompt. The next action of the node is determined by the setting of the auto-boot? parameter. You can set auto-boot? with the eeprom command from the Solaris command line, or with the setenv command at the OpenBoot PROM ok prompt. See the eeprom(1M) man page.
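For example, the first two commands run from the Solaris command line (the parameter must be quoted so that the shell does not interpret the ? character), and the third is the equivalent setting at the OpenBoot PROM ok prompt.

# eeprom "auto-boot?"               (display the current value)
# eeprom "auto-boot?=true"          (boot automatically after a panic)

ok setenv auto-boot? true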

Cluster Configuration Repository (CCR)

The CCR uses a two-phase commit algorithm for updates: An update must be successfully completed on all cluster members or the update is rolled back. The CCR uses the cluster interconnect to apply the distributed updates.
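The following shell sketch illustrates only the two-phase pattern, not the CCR implementation itself: a coordinating node stages a new copy of a configuration file on every member, commits on all members only if every stage operation succeeded, and otherwise rolls back. The node names and file path are hypothetical.

#!/bin/sh
# Hedged two-phase commit sketch; not the actual CCR code path.
NODES="phys-schost-1 phys-schost-2 phys-schost-3"   # hypothetical members
FILE=/var/cluster/example_table                     # hypothetical file

# Phase 1: prepare - stage the update on every member.
for n in $NODES
do
    scp new_table $n:$FILE.new || PREPARE_FAILED=yes
done

# Phase 2: commit everywhere, or roll back everywhere.
for n in $NODES
do
    if [ -z "$PREPARE_FAILED" ]; then
        ssh $n "mv $FILE.new $FILE"      # commit the staged copy
    else
        ssh $n "rm -f $FILE.new"         # roll the staged copy back
    fi
done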


Caution –

Although the CCR consists of text files, never edit the CCR files yourself. Each file contains a checksum record to ensure consistency between nodes. Updating CCR files yourself can cause a node or the entire cluster to stop functioning.


The CCR relies on the CMM to guarantee that a cluster is running only when quorum is established. The CCR is responsible for verifying data consistency across the cluster, performing recovery as necessary, and facilitating updates to the data.
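As a hedged example of the kind of consistency verification involved (a read-only check, not a supported procedure), the following commands compare the checksum of one CCR table across the cluster members. The node names and the file path under /etc/cluster/ccr are assumptions for illustration.

# Read-only comparison; node names and file path are assumptions.
for n in phys-schost-1 phys-schost-2 phys-schost-3
do
    echo "$n: \c"
    ssh $n "cksum /etc/cluster/ccr/infrastructure"
done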