Sun Cluster 3.1 Concepts Guide

Chapter 2 Key Concepts for Hardware Service Providers

This chapter describes the key concepts related to the hardware components of a SunPlex system configuration. The topics covered include:

The SunPlex System Hardware Components

This information is directed primarily toward hardware service providers. These concepts can help service providers understand the relationships between the hardware components before they install, configure, or service cluster hardware. Cluster system administrators might also find this information useful as background to installing, configuring, and administering cluster software.

A cluster is composed of several hardware components including:

Cluster nodes with local disks (unshared)
Multihost storage (disks shared between nodes)
Removable media (tapes and CD-ROM)
Cluster interconnect
Public network interfaces
Client systems
Administrative console
Console access devices

The SunPlex system enables you to combine these components into a variety of configurations, described in Sun Cluster Topology Examples.

The following figure shows a sample cluster configuration.

Figure 2–1 Sample Two-Node Cluster Configuration

Illustration's purpose is to show a 2-node cluster with public/private networks, interconnect hardware, local/multihost disks, console, and clients.

Cluster Nodes

A cluster node is a machine running both the Solaris operating environment and Sun Cluster software, and is either a current member of the cluster (a cluster member), or a potential member. The Sun Cluster software enables you to have from two to eight nodes in a cluster. See Sun Cluster Topology Examples for the supported node configurations.

Cluster nodes are generally attached to one or more multihost disks. Nodes not attached to multihost disks use the cluster file system to access the multihost disks. For example, one scalable services configuration allows nodes to service requests without being directly attached to multihost disks.

In addition, nodes in parallel database configurations share concurrent access to all the disks. See Multihost Disks and Chapter 3, Key Concepts for Administration and Application Development for more information on parallel database configurations.

All nodes in the cluster are grouped under a common name, known as the cluster name. The cluster name is used to access and manage the cluster.

Public network adapters attach nodes to the public networks, providing client access to the cluster.

Cluster members communicate with the other nodes in the cluster through one or more physically independent networks. This set of physically independent networks is referred to as the cluster interconnect.

Every node in the cluster is aware when another node joins or leaves the cluster. Additionally, every node in the cluster is aware of the resources that are running locally as well as the resources that are running on the other cluster nodes.

Nodes in the same cluster should have similar processing, memory, and I/O capability to enable failover to occur without significant degradation in performance. Because of the possibility of failover, every node must have enough excess capacity to take on the workload of all nodes for which it is a backup or secondary.
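The capacity rule above reduces to simple arithmetic. The following Python sketch is illustrative only and is not part of Sun Cluster. The node names, the load and capacity figures, and the backup_for mapping are hypothetical, and load is assumed to be expressed in one comparable unit across nodes.

```python
def nodes_lacking_headroom(capacity, load, backup_for):
    """Return nodes that could not absorb the load of every node they back up.

    capacity   -- dict: node name -> total capacity
    load       -- dict: node name -> current workload
    backup_for -- dict: node name -> list of nodes it is a backup for
    """
    short = []
    for node, primaries in backup_for.items():
        # Worst case: the node keeps its own work and takes over all primaries.
        needed = load[node] + sum(load[p] for p in primaries)
        if needed > capacity[node]:
            short.append(node)
    return sorted(short)


# Hypothetical two-node cluster in which each node backs up the other.
capacity = {"node1": 100, "node2": 100}
load = {"node1": 30, "node2": 50}
backup_for = {"node1": ["node2"], "node2": ["node1"]}
print(nodes_lacking_headroom(capacity, load, backup_for))
```

With these figures, either node would carry a combined load of 80 against a capacity of 100 after a failover, so no node is reported as short of capacity.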

Each node boots its own individual root (/) file system.

Software Components for Cluster Hardware Members

To function as a cluster member, the following software must be installed:

Solaris operating environment
Sun Cluster software
Data service application
Volume management (Solaris Volume Manager™ or VERITAS Volume Manager)

An exception is a configuration that uses hardware redundant array of independent disks (RAID). This configuration may not require a software volume manager such as Solaris Volume Manager or VERITAS Volume Manager.

See the Sun Cluster 3.1 System Administration Guide for information on how to install the Solaris operating environment, Sun Cluster, and volume management software.

See the Sun Cluster 3.1 Data Service Collection for information on how to install and configure data services.

See Chapter 3, Key Concepts for Administration and Application Development for conceptual information on the preceding software components.

The following figure provides a high-level view of the software components that work together to create the Sun Cluster software environment.

Figure 2–2 High-Level Relationship of Sun Cluster Software Components


See Chapter 4, Frequently Asked Questions for questions and answers about cluster members.

Multihost Disks

Sun Cluster requires multihost disk storage: disks that can be connected to more than one node at a time. In the Sun Cluster environment, multihost storage makes disks highly available.

Multihost disks have the following characteristics.

They can tolerate single node failures.
They store application data and can also store application binaries and configuration files.
They protect against node failures. If client requests are accessing the data through one node and it fails, the requests are switched over to use another node that has a direct connection to the same disks.
They are either accessed globally through a primary node that “masters” the disks, or by direct concurrent access through local paths. Currently, the only application that uses direct concurrent access is Oracle Parallel Server (OPS).
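The switchover behavior in the list above can be pictured with a small model. This Python sketch is a simplification under stated assumptions, not the Sun Cluster selection logic: the function name, the node names, and the rule of taking the first eligible node are all hypothetical.

```python
def pick_failover_node(failed_node, attached_nodes, cluster_members):
    """Return a surviving cluster member directly connected to the disks, or None.

    failed_node     -- name of the node that failed
    attached_nodes  -- nodes with a direct connection to the multihost disks,
                       in order of preference (an assumption of this sketch)
    cluster_members -- set of nodes that are currently cluster members
    """
    for node in attached_nodes:
        if node != failed_node and node in cluster_members:
            return node
    return None


# Hypothetical cluster: node1 and node2 share the disks, node3 does not.
print(pick_failover_node("node1", ["node1", "node2"], {"node1", "node2", "node3"}))
```

If no other member has a direct connection to the same disks, the function returns None, which corresponds to the data becoming unavailable.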

A volume manager provides for mirrored or RAID-5 configurations for data redundancy of the multihost disks. Currently, Sun Cluster supports Solaris Volume Manager™ and VERITAS Volume Manager as volume managers, and the RDAC RAID-5 hardware controller on several hardware RAID platforms.

Combining multihost disks with disk mirroring and striping protects against both node failure and individual disk failure.

See Chapter 4, Frequently Asked Questions for questions and answers about multihost storage.

Multi-Initiator SCSI

This section applies only to SCSI storage devices and not to Fibre Channel storage used for the multihost disks.

In a standalone server, the server node controls the SCSI bus activities by way of the SCSI host adapter circuit connecting this server to a particular SCSI bus. This SCSI host adapter circuit is referred to as the SCSI initiator. This circuit initiates all bus activities for this SCSI bus. The default SCSI address of SCSI host adapters in Sun systems is 7.

Cluster configurations share storage between multiple server nodes, using multihost disks. When the cluster storage consists of single-ended or differential SCSI devices, the configuration is referred to as multi-initiator SCSI. As this terminology implies, more than one SCSI initiator exists on the SCSI bus.

The SCSI specification requires that each device on a SCSI bus has a unique SCSI address. (The host adapter is also a device on the SCSI bus.) The default hardware configuration in a multi-initiator environment results in a conflict because all SCSI host adapters default to 7.

To resolve this conflict, on each SCSI bus, leave one of the SCSI host adapters with the SCSI address of 7, and set the other host adapters to unused SCSI addresses. Proper planning dictates that these “unused” SCSI addresses include both currently and eventually unused addresses. For example, an address that is unused now might be needed later when you add storage by installing new drives into empty drive slots, so do not assign such an address to a host adapter. In most configurations, the available SCSI address for a second host adapter is 6.

You can change the selected SCSI addresses for these host adapters by setting the scsi-initiator-id Open Boot PROM (OBP) property. You can set this property globally for a node or on a per-host-adapter basis. Instructions for setting a unique scsi-initiator-id for each SCSI host adapter are included in the chapter for each disk enclosure in the Sun Cluster 3.1 Hardware Collection.
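The address-selection rule described in this section can be sketched as a planning step. The Python example below only computes a candidate plan; it does not set any OBP property. It assumes a bus with addresses 0 through 7, and the adapter names and reserved addresses are hypothetical.

```python
DEFAULT_INITIATOR_ID = 7


def assign_initiator_ids(adapters, reserved):
    """Plan a unique SCSI address for each host adapter on one SCSI bus.

    adapters -- host adapter names; the first one keeps the default address 7
    reserved -- addresses used by drives now or set aside for future drives
    Assumes an 8-address bus (addresses 0 through 7).
    """
    if not adapters:
        return {}
    if DEFAULT_INITIATOR_ID in reserved:
        raise ValueError("address 7 must stay free for the first host adapter")
    # Try the remaining addresses from 6 downward, skipping reserved ones.
    free = [addr for addr in range(6, -1, -1) if addr not in reserved]
    if len(adapters) - 1 > len(free):
        raise ValueError("not enough unused SCSI addresses on this bus")
    plan = {adapters[0]: DEFAULT_INITIATOR_ID}
    plan.update(zip(adapters[1:], free))
    return plan


# Hypothetical bus: drive addresses 0-3 are in use or reserved for empty slots.
print(assign_initiator_ids(["node1-hba", "node2-hba"], {0, 1, 2, 3}))
```

Passing the addresses of empty drive slots in reserved is what keeps the plan valid when storage is added later.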

Local Disks

Local disks are the disks that are connected to only a single node. They are, therefore, not protected against node failure (not highly available). However, all disks, including local disks, are included in the global namespace and are configured as global devices. Therefore, the disks themselves are visible from all cluster nodes.

You can make the file systems on local disks available to other nodes by putting them under a global mount point. If the node that currently has one of these global file systems mounted fails, all nodes lose access to that file system. Using a volume manager lets you mirror these disks so that a disk failure cannot cause these file systems to become inaccessible, but volume managers do not protect against node failure.

See the section Global Devices for more information about global devices.

Removable Media

Removable media such as tape drives and CD-ROM drives are supported in a cluster. In general, you install, configure, and service these devices in the same way as in a non-clustered environment. These devices are configured as global devices in Sun Cluster, so each device can be accessed from any node in the cluster. Refer to the Sun Cluster 3.x Hardware Administration Collection for information on installing and configuring removable media.

See the section Global Devices for more information about global devices.

Cluster Interconnect

The cluster interconnect is the physical configuration of devices used to transfer cluster-private communications and data service communications between cluster nodes. Because the interconnect is used extensively for cluster-private communications, it can limit performance.

Only cluster nodes can be connected to the cluster interconnect. The Sun Cluster security model assumes that only cluster nodes have physical access to the cluster interconnect.

All nodes must be connected by the cluster interconnect through at least two redundant physically independent networks, or paths, to avoid a single point of failure. You can have several physically independent networks (two to six) between any two nodes. The cluster interconnect consists of three hardware components: adapters, junctions, and cables.
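The redundancy requirement above can be checked mechanically for a planned configuration. The following Python sketch is an illustration, not a Sun Cluster tool. It assumes each private network is described simply as the set of nodes attached to it; the node and network names are hypothetical.

```python
from itertools import combinations


def check_interconnect(nodes, networks):
    """Return node pairs whose count of shared private networks is outside 2..6.

    nodes    -- iterable of node names
    networks -- dict: network name -> set of nodes attached to that network
    """
    problems = {}
    for a, b in combinations(sorted(nodes), 2):
        shared = sum(1 for members in networks.values()
                     if a in members and b in members)
        if not 2 <= shared <= 6:
            problems[(a, b)] = shared
    return problems


# Hypothetical three-node plan in which node3 was cabled to only one network.
networks = {
    "private1": {"node1", "node2", "node3"},
    "private2": {"node1", "node2"},
}
print(check_interconnect({"node1", "node2", "node3"}, networks))
```

In this plan, node3 shares only one network with each of the other nodes, so both pairs that involve node3 are reported as having a single point of failure.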

The following list describes each of these hardware components.

Adapters: The network interface cards that reside in each cluster node. Their names are constructed from a device name immediately followed by a physical-unit number, for example, qfe2. Some adapters have only one physical network connection, but others, like the qfe card, have multiple physical connections. Some also contain both network interfaces and storage interfaces.

A network adapter with multiple interfaces could become a single point of failure if the entire adapter fails. For maximum availability, plan your cluster so that the only path between two nodes does not depend on a single network adapter.
Junctions: The switches that reside outside of the cluster nodes. They perform pass-through and switching functions to enable you to connect more than two nodes together. In a two-node cluster, you do not need junctions because the nodes can be directly connected to each other through redundant physical cables connected to redundant adapters on each node. Configurations with more than two nodes generally require junctions.
Cables: The physical connections that go either between two network adapters or between an adapter and a junction.
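The adapter naming convention described in the list above (a device name immediately followed by a physical-unit number, such as qfe2) can be illustrated with a short parser. This Python sketch is an assumption-labeled example only; the function name is hypothetical, and the rule of treating the trailing digits as the unit number is an interpretation of the convention stated in the text.

```python
import re

# Device name: everything up to the last non-digit; unit: the trailing digits.
_ADAPTER_NAME = re.compile(r"^(?P<device>.*\D)(?P<unit>\d+)$")


def split_adapter_name(name):
    """Split an adapter name such as 'qfe2' into ('qfe', 2)."""
    match = _ADAPTER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a device name followed by a unit number: {name!r}")
    return match.group("device"), int(match.group("unit"))


print(split_adapter_name("qfe2"))
```

Note that the unit number alone does not tell you which physical card an interface is on, so a check for adapter-level single points of failure still needs the hardware layout.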

See Chapter 4, Frequently Asked Questions for questions and answers about the cluster interconnect.

Public Network Interfaces

Clients connect to the cluster through the public network interfaces. Each network adapter card can connect to one or more public networks, depending on whether the card has multiple hardware interfaces. You can set up nodes to include multiple public network interface cards that are configured so that multiple cards are active and serve as failover backups for one another. If one of the adapters fails, IP Network Multipathing software is called to fail over the defective interface to another adapter in the group.
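The adapter failover described above can be pictured with a toy model. The Python sketch below is not the IP Network Multipathing implementation and uses none of its interfaces; the data layout, the function name, the adapter names, and the addresses (taken from a documentation-only range) are hypothetical.

```python
def fail_over(group, failed_adapter):
    """Move the addresses of a failed adapter to a working adapter in the group.

    group -- dict: adapter name -> {"ok": bool, "addresses": list of IP strings}
    Returns the adapter that took over, or None if no working adapter remains.
    """
    group[failed_adapter]["ok"] = False
    for name, state in group.items():
        if state["ok"]:
            # The surviving adapter now also hosts the failed adapter's addresses.
            state["addresses"].extend(group[failed_adapter]["addresses"])
            group[failed_adapter]["addresses"] = []
            return name
    return None


# Hypothetical group of two active adapters that back each other up.
group = {
    "qfe0": {"ok": True, "addresses": ["192.0.2.10"]},
    "qfe1": {"ok": True, "addresses": ["192.0.2.11"]},
}
print(fail_over(group, "qfe0"), group["qfe1"]["addresses"])
```

Because both adapters are active before the failure, the surviving adapter ends up serving its own address plus the one it inherited.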

No special hardware considerations relate to clustering for the public network interfaces.

See Chapter 4, Frequently Asked Questions for questions and answers about public networks.

Client Systems

Client systems include workstations or other servers that access the cluster over the public network. Client-side programs use data or other services provided by server-side applications running on the cluster.

Client systems are not highly available. Data and applications on the cluster are highly available.

See Chapter 4, Frequently Asked Questions for questions and answers about client systems.

Console Access Devices

You must have console access to all cluster nodes. To gain console access, use the terminal concentrator purchased with your cluster hardware, the System Service Processor (SSP) on Sun Enterprise E10000™ servers, the system controller on Sun Fire™ servers, or another device that can access ttya on each node.

Only one supported terminal concentrator is available from Sun and use of the supported Sun terminal concentrator is optional. The terminal concentrator enables access to /dev/console on each node by using a TCP/IP network. The result is console-level access for each node from a remote workstation anywhere on the network.

The System Service Processor (SSP) provides console access for Sun Enterprise E10000 servers. The SSP is a machine on an Ethernet network that is configured to support the Sun Enterprise E10000 server. The SSP is the administrative console for the Sun Enterprise E10000 server. Using the Sun Enterprise E10000 Network Console feature, any workstation in the network can open a host console session.

Other console access methods include other terminal concentrators, tip(1) serial port access from another node, and dumb terminals. You can use Sun™ keyboards and monitors, or other serial port devices if your hardware service provider supports them.

Administrative Console

You can use a dedicated SPARCstation™ system, known as the administrative console, to administer the active cluster. Usually, you install and run administrative tool software, such as the Cluster Control Panel (CCP) and the Sun Cluster module for the Sun Management Center™ product, on the administrative console. Using cconsole under the CCP enables you to connect to more than one node console at a time. For more information on using the CCP, see the Sun Cluster 3.1 System Administration Guide.

The administrative console is not a cluster node. You use the administrative console for remote access to the cluster nodes, either over the public network, or optionally through a network-based terminal concentrator. If your cluster consists of the Sun Enterprise E10000 platform, you must have the ability to log in from the administrative console to the System Service Processor (SSP) and connect by using the netcon(1M) command.

Typically, you configure nodes without monitors. Then, you access the node's console through a telnet session from the administrative console, which is connected to a terminal concentrator, and from the terminal concentrator to the node's serial port. (In the case of a Sun Enterprise E10000 server, you connect from the System Service Processor.) See Console Access Devices for more information.

Sun Cluster does not require a dedicated administrative console, but using one provides these benefits:

Enables centralized cluster management by grouping console and management tools on the same machine

Provides potentially quicker problem resolution by your hardware service provider

See Chapter 4, Frequently Asked Questions for questions and answers about the administrative console.

Sun Cluster Topology Examples

A topology is the connection scheme that connects the cluster nodes to the storage platforms used in the cluster.

Sun Cluster supports the following topologies:

Clustered pairs
Pair+N
N+1 (star)

The following sections include sample diagrams of each topology.

Clustered Pairs Topology

A clustered pairs topology is two or more pairs of nodes operating under a single cluster administrative framework. In this configuration, failover occurs only between a pair. However, all nodes are connected by the cluster interconnect and operate under Sun Cluster software control. You might use this topology to run a parallel database application on one pair and a failover or scalable application on another pair.

Using the cluster file system, you could also have a two-pair configuration where more than two nodes run a scalable service or parallel database even though not all of the nodes are directly connected to the disks that store the application data.

The following figure illustrates a clustered pair configuration.

Figure 2–3 Clustered Pairs Topology

Pair+N Topology

The pair+N topology includes a pair of nodes directly connected to shared storage and an additional set of nodes that have no direct connection themselves and instead use the cluster interconnect to access shared storage.

The following figure illustrates a pair+N topology where two of the four nodes (Node 3 and Node 4) use the cluster interconnect to access the storage. This configuration can be expanded to include additional nodes that do not have direct access to the shared storage.

Figure 2–4 Pair+N Topology

N+1 (Star) Topology

An N+1 topology includes some number of primary nodes and one secondary node. You do not have to configure the primary nodes and secondary node identically. The primary nodes actively provide application services. The secondary node need not be idle while waiting for a primary to fail.

The secondary node is the only node in the configuration that is physically connected to all the multihost storage.

If a failure occurs on a primary, Sun Cluster fails over the resources to the secondary, where the resources function until they are switched back (either automatically or manually) to the primary.

The secondary must always have enough excess CPU capacity to handle the load if one of the primaries fails.
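The sizing rule for the secondary can be written as a one-line check. This Python sketch is illustrative only; the function name and the load figures are hypothetical, and CPU load is assumed to be expressed in the same unit on every node.

```python
def secondary_can_cover(secondary_capacity, secondary_load, primary_loads):
    """Check that the secondary can absorb the failure of any single primary.

    secondary_capacity -- total CPU capacity of the secondary node
    secondary_load     -- work the secondary already runs (it need not be idle)
    primary_loads      -- dict: primary node name -> its current CPU load
    """
    if not primary_loads:
        return secondary_load <= secondary_capacity
    # Only one primary is assumed to fail at a time, so size for the heaviest.
    worst_case = secondary_load + max(primary_loads.values())
    return worst_case <= secondary_capacity


# Hypothetical N+1 cluster with two primaries and a lightly loaded secondary.
print(secondary_can_cover(100, 20, {"primary1": 50, "primary2": 70}))
```

The check sizes for one primary failure at a time, which matches the wording above; covering simultaneous failures would require summing the loads instead of taking the maximum.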

The following figure illustrates an N+1 configuration.