Sun Cluster Overview for Solaris OS

Chapter 3 Sun Cluster Architecture

Sun Cluster architecture permits a group of systems to be deployed, managed, and viewed as a single, large system.

This chapter contains the following sections:

Sun Cluster Hardware Environment

The following hardware components make up a cluster:

Figure 3–1 illustrates how the hardware components work with each other.

Figure 3–1 Sun Cluster Hardware Components

Illustration: A two-node cluster with public and private networks,
interconnect hardware, local and multihost disks, console, and clients.

Sun Cluster Software Environment

To function as a cluster member, a node must have the following software installed:

Figure 3–2 shows a high-level view of the software components that work together to create the Sun Cluster software environment.

Figure 3–2 Sun Cluster Software Architecture

Illustration: Sun Cluster software components such as the RGM,
CMM, CCR, volume managers, and the cluster file system.

Cluster Membership Monitor

To ensure that data is safe from corruption, all nodes must reach a consistent agreement on the cluster membership. When necessary, the CMM coordinates a cluster reconfiguration of cluster services in response to a failure.

The CMM receives information about connectivity to other nodes from the cluster transport layer. The CMM uses the cluster interconnect to exchange state information during a reconfiguration.

After detecting a change in cluster membership, the CMM performs a synchronized configuration of the cluster. In this configuration, cluster resources might be redistributed, based on the new membership of the cluster.

The CMM runs entirely in the kernel.

Cluster Configuration Repository (CCR)

The CCR relies on the CMM to guarantee that a cluster is running only when quorum is established. The CCR is responsible for verifying data consistency across the cluster, performing recovery as necessary, and facilitating updates to the data.

Cluster File Systems

A cluster file system is a proxy between the following:

Cluster file systems are dependent on global devices (disks, tapes, CD-ROMs). The global devices can be accessed from any node in the cluster through the same file name (for example, /dev/global/). That node does not need a physical connection to the storage device. You can use a global device the same as a regular device, that is, you can create a file system on a global device by using newfs or mkfs.

The cluster file system has the following features:

Scalable Data Services

The primary goal of cluster networking is to provide scalability for data services. Scalability means that as the load offered to a service increases, a data service can maintain a constant response time to this increased workload as new nodes are added to the cluster and new server instances are run. A good example of a scalable data service is a web service. Typically, a scalable data service is composed of several instances, each of which runs on different nodes of the cluster. Together, these instances behave as a single service for a remote client of that service and implement the functionality of the service. A scalable web service with several httpd daemons that run on different nodes can have any daemon serve a client request. The daemon that serves the request depends on a load-balancing policy. The reply to the client appears to come from the service, not the particular daemon that serviced the request, thus preserving the single-service appearance.

The following figure depicts the scalable service architecture.

Figure 3–3 Scalable Data Service Architecture

Illustration: A data service request running on multiple nodes.

The nodes that are not hosting the global interface (proxy nodes) have the shared address hosted on their loopback interfaces. Packets that are coming into the global interface are distributed to other cluster nodes, based on configurable load-balancing policies. The possible load-balancing policies are described next.

Load-Balancing Policies

Load balancing improves performance of the scalable service, both in response time and in throughput.

Two classes of scalable data services exist: pure and sticky. A pure service is one where any instance can respond to client requests. A sticky service has the cluster balancing the load for requests to the node. Those requests are not redirected to other instances.

A pure service uses a weighted load-balancing policy. Under this load-balancing policy, client requests are by default uniformly distributed over the server instances in the cluster. For example, in a three-node cluster where each node has the weight of 1, each node services one-third of the requests from any client on behalf of that service. Weights can be changed at any time through the scrgadm(1M) command interface or through the SunPlex Manager GUI.

A sticky service has two types: ordinary sticky and wildcard sticky. Sticky services allow concurrent application-level sessions over multiple TCP connections to share in-state memory (application session state).

Ordinary sticky services permit a client to share state between multiple concurrent TCP connections. The client is said to be “sticky” toward the server instance listening on a single port. The client is guaranteed that all requests go to the same server instance, if that instance remains up and accessible and the load balancing policy is not changed while the service is online.

Wildcard sticky services use dynamically assigned port numbers, but still expect client requests to go to the same node. The client is “sticky wildcard” over ports toward the same IP address.

Multihost Disk Storage

Sun Cluster software makes disks highly available by utilizing multihost disk storage, which can be connected to more than one node at a time. Volume management software can be used to arrange these disks into shared storage that is mastered by a cluster node. The disks are then configured to move to another node if a failure occurs. The use of multihosted disks in Sun Cluster systems provides a variety of benefits, including the following:

Cluster Interconnect

All nodes must be connected by the cluster interconnect through at least two redundant physically independent networks, or paths, to avoid a single point of failure. While two interconnects are required for redundancy, up to six can be used to spread traffic to avoid bottlenecks and improve redundancy and scalability. The Sun Cluster interconnect uses Fast Ethernet, Gigabit-Ethernet, InfiniBand, Sun Fire Link, or the Scalable Coherent Interface (SCI, IEEE 1596-1992), enabling high-performance cluster-private communications.

In clustered environments, high-speed, low-latency interconnects and protocols for internode communications are essential. The SCI interconnect in Sun Cluster systems offers improved performance over typical network interface cards (NICs). Sun Cluster uses the Remote Shared Memory (RSMTM) interface for internode communication across a Sun Fire Link network. RSM is a Sun messaging interface that is highly efficient for remote memory operations.

The RSM Reliable Datagram Transport (RSMRDT) driver consists of a driver that is built on top of the RSM API and a library that exports the RSMRDT-API interface. The driver provides enhanced Oracle Parallel Server/Real Application Clusters performance. The driver also enhances load-balancing and high-availability (HA) functions by providing them directly inside the driver, making them available to the clients transparently.

The cluster interconnect consists of the following hardware components:

Figure 3–4 shows how the three components are connected.

Figure 3–4 Cluster Interconnect

Illustration: Two nodes connected by a transport adapter, cables,
and a transport junction.

IP Network Multipathing Groups

Public network adapters are organized into IP multipathing groups (multipathing groups). Each multipathing group has one or more public network adapters. Each adapter in a multipathing group can be active, or you can configure standby interfaces that are inactive unless a failover occurs.

Multipathing groups provide the foundation for logical hostname and shared address resources. The same multipathing group on a node can host any number of logical hostname or shared address resources. To monitor public network connectivity of cluster nodes, you can create multipathing.

For more information about logical hostname and shared address resources, see the Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

Public Network Interfaces

Clients connect to the cluster through the public network interfaces. Each network adapter card can connect to one or more public networks, depending on whether the card has multiple hardware interfaces. You can set up nodes to include multiple public network interface cards that are configured so that multiple cards are active, and serve as failover backups for one another. If one of the adapters fails, the Solaris Internet Protocol (IP) network multipathing software on Sun Cluster is called to fail over the defective interface to another adapter in the group.