In campus clustering, nodes or groups of nodes are located in separate rooms, sometimes several kilometers apart. In addition to providing the usual benefits of using a Sun cluster, correctly designed campus clusters can generally survive the loss of any single room and continue to provide their services.
This chapter introduces the basic concepts of campus clustering and provides some configuration and setup examples. The following topics are covered:
This chapter does not explain clustering, provide information about clustering administration, or furnish details about hardware installation and configuration. For conceptual and administrative information, see your Sun Cluster concepts documentation and your Sun Cluster system administration documentation, respectively.
When designing your campus cluster, all of the requirements for a standard cluster still apply. Plan your cluster to eliminate any single point of failure in nodes, cluster interconnect, data storage, and public network. Just as in the standard cluster, a campus cluster requires redundant connections and switches. Disk multipathing helps ensure that each node can access each shared storage device. These concerns are universal for Sun Cluster.
After you have a valid cluster plan, follow the requirements in this section to ensure a correct campus cluster. To achieve maximum benefits from your campus cluster, consider implementing the Guidelines for Designing a Campus Cluster.
This chapter describes ways to design your campus cluster using fully tested and supported hardware components and transport technologies. You can also design your campus cluster according to Sun Cluster's specification, regardless of the components used.
To build a specifications-based campus cluster, contact your Sun representative, who will assist you with the design and implementation of your specific configuration. This process ensures that the configuration that you implement complies with the specification guidelines, is interoperable, and is supportable.
Your campus cluster must observe all requirements and limitations of the technologies that you choose to use. Determining Campus Cluster Connection Technologies provides a list of tested technologies and their known limitations.
When planning your cluster interconnect, remember that campus clustering requires redundant network connections.
A campus cluster must include at least two rooms using two independent SANs to connect to the shared storage. See Figure 7–1 for an illustration of this configuration.
If you are using Oracle Real Application Clusters (RAC), all nodes that support Oracle RAC must be fully connected to the shared storage devices. Also, all rooms of a specifications-based campus cluster must be fully connected to the shared storage devices.
See Quorum in Clusters With Four Rooms or More for a description of a campus cluster with both direct and indirect storage connections.
Your campus cluster must use SAN-supported storage devices for shared storage. When planning the cluster, ensure that it adheres to the SAN requirements for all storage connections. See the SAN Solutions documentation site for information about SAN requirements.
Sun Cluster software supports two methods of data replication: host-based replication and storage-based replication. Host-based data replication can mirror a campus cluster's shared data. If one room of the cluster is lost, another room must be able to provide access to the data. Therefore, mirroring between shared disks must always be performed across rooms, rather than within rooms. Both copies of the data should never be located in a single room. Host-based data replication can be a less expensive solution because it uses locally attached disks and does not require special storage arrays.
An alternative to host-based replication is storage-based replication, which moves the work of data replication off the cluster nodes and onto the storage device. Storage-based data replication can simplify the infrastructure required, which can be useful in campus cluster configurations.
For more information on both types of data replication and supported software, see Chapter 4, Data Replication Approaches, in Sun Cluster System Administration Guide for Solaris OS.
You must use a quorum device for a two-node cluster. For larger clusters, a quorum device is optional. These are standard cluster requirements.
On Sun Cluster 3.2 only, a quorum device can be a storage device or a quorum server.
In addition, you can configure quorum devices to ensure that specific rooms can form a cluster in the event of a failure. For guidelines about where to locate your quorum device, see Deciding How to Use Quorum Devices.
If you use Solaris Volume Manager as your volume manager for shared device groups, carefully plan the distribution of your replicas. In two-room configurations, all disk sets should be configured with an additional replica in the room that houses the cluster quorum device.
For example, in three-room, two-node configurations, a single room houses both the quorum device and at least one extra disk that is configured in each of the disk sets. Each disk set should have extra replicas in the third room.
You can use a quorum disk for these replicas.
Refer to your Solaris Volume Manager documentation for details about configuring disk set replicas.
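The replica-placement advice above can be reasoned about with a small model. The following Python sketch (an illustrative simplification, not Solaris Volume Manager itself; the room names and replica counts are hypothetical) checks whether the rooms that survive the loss of one room still hold a majority of a disk set's state database replicas:

```python
def replicas_survive(replicas_per_room, lost_room):
    """Check whether a disk set keeps a replica majority after losing a room.

    Solaris Volume Manager requires more than half of a disk set's state
    database replicas to be available; this is a simplified model of that rule.
    """
    total = sum(replicas_per_room.values())
    surviving = total - replicas_per_room.get(lost_room, 0)
    return surviving * 2 > total

# Hypothetical three-room, two-node layout: the third room holds the
# extra replica that accompanies the quorum device.
layout = {"room1": 2, "room2": 2, "room3": 1}

for room in layout:
    print(room, replicas_survive(layout, room))
```

With the extra replica in the third room, the loss of any single room leaves a majority; without it (two replicas in each of two rooms), losing either room leaves exactly half, which is not a majority.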
In planning a campus cluster, your goal is to build a cluster that can at least survive the loss of a room and continue to provide services. The concept of a room must shape your planning of redundant connectivity, storage replication, and quorum. Use the following guidelines to assist in managing these design considerations.
The concept of a room, or location, adds a layer of complexity to the task of designing a campus cluster. Think of a room as a functionally independent hardware grouping, such as a node and its attendant storage, or a quorum device that is physically separated from any nodes. Each room is separated from other rooms to increase the likelihood of failover and redundancy in case of accident or failure. The definition of a room therefore depends on the type of failure to safeguard against, as described in the following table.
Table 7–1 Definitions of “Room”
| Failure Scenario | Sample Definitions of “Room” |
| --- | --- |
| Power-line failure | Isolated and independent power supplies |
| Minor accidents, furniture collapse, water seepage | Different parts of a physical room |
| Small fire, fire sprinklers starting | Different physical areas (for example, sprinkler zone) |
| Structural failure, building-wide fire | Different buildings |
| Large-scale natural disaster (for example, earthquake or flood) | Different corporate campuses up to several kilometers apart |
Sun Cluster does support two-room campus clusters. These clusters are valid and might offer nominal insurance against disasters. However, consider adding a small third room, possibly even a secure closet or vault (with a separate power supply and correct cabling), to contain the quorum device or a third server.
Whenever a two-room campus cluster loses a room, it has only a 50 percent chance of remaining available. If the room with the fewest quorum votes is the surviving room, the surviving nodes cannot form a cluster. In this case, your cluster requires manual intervention from your Sun service provider before it can become available.
The advantage of a three-room or larger cluster is that, if any one of the three rooms is lost, automatic failover can be achieved. Only a correctly configured three-room or larger campus cluster can guarantee system availability if an entire room is lost (assuming no other failures).
A three-room campus cluster configuration supports up to eight nodes. Three rooms enable you to arrange your nodes and quorum device so that your campus cluster can reliably survive the loss of a single room and still provide cluster services. Mediators are also supported for three-room campus clusters that use Solaris Volume Manager or multi-owner Solaris Volume Manager. The following example configurations all follow the campus cluster requirements and the design guidelines described in this chapter.
Figure 7–1 shows a three-room, two-node campus cluster. In this arrangement, two rooms each contain a single node and an equal number of disk arrays to mirror shared data. The third room contains at least one disk subsystem, attached to both nodes and configured with a quorum device.
Figure 7–2 shows an alternative three-room, two-node campus cluster.
Figure 7–3 shows a three-room, three-node cluster. In this arrangement, two rooms each contain one node and an equal number of disk arrays. The third room contains a small server, which eliminates the need for a storage array to be configured as a quorum device.
Mediators for three-room campus clusters that use Solaris Volume Manager or multi-owner Solaris Volume Manager are supported. The third mediator host exists outside the campus cluster and does not need to be attached to the shared storage. See Solaris Volume Manager Three-Mediator Support for more information.
These examples illustrate general configurations and are not intended to indicate required or recommended setups. For simplicity, the diagrams and explanations concentrate only on features that are unique to understanding campus clustering. For example, public-network Ethernet connections are not shown.
In the configuration that is shown in the following figure, if at least two rooms are up and communicating, recovery is automatic. Only three-room or larger configurations can guarantee that the loss of any one room can be handled automatically.
In the configuration shown in the following figure, one room contains one node and shared storage. A second room contains a cluster node only. The third room contains shared storage only. A LUN or disk of the storage device in the third room is configured as a quorum device.
This configuration provides the reliability of a three-room cluster with minimum hardware requirements. This campus cluster can survive the loss of any single room without requiring manual intervention.
In the configuration that is shown in the preceding figure, a server acts as the quorum vote in the third room. This server does not necessarily support data services. Instead, it replaces a storage device as the quorum device.
Sun Cluster software supports mediators for three-room campus cluster configurations that use Solaris Volume Manager or multi-owner Solaris Volume Manager for Sun Cluster. A two-room (two-node) campus cluster can work with a third mediator host outside the cluster. The third mediator host does not have to be attached to the shared storage that contains the disk set for which the host is a mediator.
The mediator host uses Solaris Volume Manager to facilitate automatic recovery for a two-room campus cluster by tracking which mirrored half of the storage is the most up to date. The third mediator then provides mediator quorum to allow Solaris Volume Manager to recover from a destroyed room.
Use the following guidelines to configure dual-string mediators:
- A disk set can have up to three mediator hosts.
- The mediator host no longer needs to be part of the cluster.
- Mediators that are configured for disk sets must meet the existing two-string disk set criteria.
- The entire campus cluster can have more than two nodes.
- An N+1 cluster and other topologies are permitted.
To add the third mediator host, follow the instructions in How to Add Mediator Hosts in Sun Cluster Software Installation Guide for Solaris OS. See the appropriate documentation for Sun Cluster 3.1 or Sun Cluster 3.2 software.
When adding quorum devices to your campus cluster, your goal should be to balance the number of quorum votes in each room. No single room should have a much larger number of votes than the other rooms because loss of that room can bring the entire cluster down.
For campus clusters with more than three rooms and three nodes, quorum devices are optional. Whether you use quorum devices in such a cluster, and where you place them, depends on your assessment of the following:
- Your particular cluster topology
- The specific characteristics of the rooms involved
- Resiliency requirements for your cluster
As with two-room clusters, locate the quorum device in a room that you determine is more likely to survive any failure scenario. Alternatively, you can locate the quorum device in the room that you want to be able to form a cluster in the event of a failure. Use your understanding of your particular cluster requirements to balance these two criteria.
Refer to your Sun Cluster concepts documentation for general information about quorum devices and how they affect clusters that experience failures. If you decide to use one or more quorum devices, consider the following recommended approach:
1. For each room, total the quorum votes (nodes) for that room.
2. Define a quorum device in the room that contains the lowest number of votes and that contains a fully connected shared storage device.
3. When your campus cluster contains more than two nodes, do not define a quorum device if each room contains the same number of nodes.
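The recommended approach above can be sketched as a small calculation. This Python fragment is an illustrative model only (the room names are hypothetical, and it ignores the requirement that the chosen room contain fully connected shared storage); it totals node votes per room and suggests where a quorum device would go:

```python
def quorum_device_room(nodes_per_room):
    """Apply the recommended approach: skip the quorum device when the
    cluster has more than two nodes and every room has the same node
    count; otherwise pick the room with the fewest node votes.

    Simplification: does not check that the chosen room contains a fully
    connected shared storage device.
    """
    total_nodes = sum(nodes_per_room.values())
    counts = set(nodes_per_room.values())
    if total_nodes > 2 and len(counts) == 1:
        return None  # do not define a quorum device
    return min(nodes_per_room, key=nodes_per_room.get)

# Three-room, two-node cluster: the empty third room gets the device.
print(quorum_device_room({"room1": 1, "room2": 1, "room3": 0}))  # room3

# Three rooms with one node each: no quorum device is defined.
print(quorum_device_room({"room1": 1, "room2": 1, "room3": 1}))  # None
```

A two-node cluster always yields a room, which is consistent with the standard requirement that two-node clusters must use a quorum device.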
The following sections discuss quorum devices in various sizes of campus clusters.
The following figure illustrates a four-node campus cluster with fully connected storage. Each node is in a separate room. Two rooms also contain the shared storage devices, with data mirrored between them.
Note that the quorum devices are marked optional in the illustration. This cluster does not require a quorum device. With no quorum devices, the cluster can still survive the loss of any single room.
Consider the effect of adding Quorum Device A. Because the cluster contains four nodes, each with a single quorum vote, the quorum device receives three votes. Four votes (one node and the quorum device, or all four nodes) are required to form the cluster. This configuration is not optimal, because the loss of Room 1 brings down the cluster. The cluster is not available after the loss of that single room.
If you then add Quorum Device B, both Room 1 and Room 2 have four votes. Six votes are required to form the cluster. This configuration is clearly better, as the cluster can survive the random loss of any single room.
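The vote arithmetic in this example can be checked mechanically. In the following Python sketch (a simplified model using the vote counts from the example, where a quorum device connected to all four nodes carries three votes), we test whether the cluster retains a vote majority after losing each room:

```python
def surviving_votes_ok(room_votes, lost_room):
    """Return True if the remaining rooms hold a majority of all votes.

    room_votes maps each room to its vote count: node votes plus the
    votes of any quorum device located in that room.
    """
    total = sum(room_votes.values())
    needed = total // 2 + 1                  # majority of configured votes
    surviving = total - room_votes[lost_room]
    return surviving >= needed

# Four nodes (1 vote each) in four rooms; Quorum Device A (3 votes) in
# Room 1 and Quorum Device B (3 votes) in Room 2, as in the example.
votes = {"room1": 1 + 3, "room2": 1 + 3, "room3": 1, "room4": 1}

print(all(surviving_votes_ok(votes, room) for room in votes))  # True
```

Removing Quorum Device B reproduces the problem described above: with only Quorum Device A, the loss of Room 1 removes four of seven votes, so the surviving three votes cannot reach the required majority of four.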
In Figure 7–4, the cluster interconnect is not shown.
Consider the optional I/O connection between Room 1 and Room 4. Although fully connected storage is preferable for reasons of redundancy and reliability, fully redundant connections might not always be possible in campus clusters. Geography might not accommodate a particular connection, or the project's budget might not cover the additional fiber.
In such a case, you can design a campus cluster with indirect access between some nodes and the storage. In Figure 7–4, if the optional I/O connection is omitted, Node 4 must access the storage indirectly.
In three-room, two-node campus clusters, you should use the third room for the quorum device (Figure 7–1) or a server (Figure 7–3). Isolating the quorum device gives your cluster a better chance to maintain availability after the loss of one room. If at least one node and the quorum device remain operational, the cluster can continue to operate.
In two-room configurations, the quorum device occupies the same room as one or more nodes. Place the quorum device in the room that is more likely to survive a failure scenario if all cluster transport and disk connectivity are lost between rooms. If only cluster transport is lost, the node that shares a room with the quorum device is not necessarily the node that reserves the quorum device first. For more information about quorum and quorum devices, see the Sun Cluster concepts documentation.
This section lists example technologies for the private cluster interconnect and for the data paths and their various distance limits. In some cases, it is possible to extend these limits. For more information, ask your Sun representative.
The following table lists example node-to-node link technologies and their limitations.
Table 7–2 Campus Cluster Interconnect Technologies and Distance Limits
| Link Technology | Maximum Distance | Comments |
| --- | --- | --- |
| 100 Mbps Ethernet | 100 meters per segment | Unshielded twisted pair (UTP) |
| 1000 Mbps Ethernet | 100 meters per segment | UTP |
| 1000 Mbps Ethernet | 260 meters per segment | 62.5/125 micron multimode fiber (MMF) |
| 1000 Mbps Ethernet | 550 meters per segment | 50/125 micron MMF |
| 1000 Mbps Ethernet (FC) | 10 kilometers at 1 Gbps | 9/125 micron single-mode fiber (SMF) |
| DWDM | 200 kilometers and up | |
| Other | Consult your Sun representative | |
Always check your vendor documentation for technology-specific requirements and limitations.
The following table lists example link technologies for the cluster data paths and the distance limits for a single interswitch link (ISL).
Table 7–3 ISL Limits
| Link Technology | Maximum Distance | Comments |
| --- | --- | --- |
| FC short-wave gigabit interface converter (GBIC) | 500 meters at 1 Gbps | 50/125 micron MMF |
| FC long-wave GBIC | 10 kilometers at 1 Gbps | 9/125 micron SMF |
| FC short-wave small form-factor pluggable (SFP) | 300 meters at 2 Gbps | 62.5/125 micron MMF |
| FC short-wave SFP | 500 meters at 2 Gbps | 50/125 micron MMF |
| FC long-wave SFP | 10 kilometers at 2 Gbps | 9/125 micron SMF |
| DWDM | 200 kilometers and up | |
| Other | Consult your Sun representative | |
In general, installing and using interconnect, storage, and Fibre Channel (FC) hardware in a campus cluster does not differ markedly from doing so in standard cluster configurations.
The steps for installing Ethernet-based campus cluster interconnect hardware are the same as the steps for standard clusters. Refer to Installing Ethernet or InfiniBand Cluster Interconnect Hardware. When installing the media converters, consult the accompanying documentation, including requirements for fiber connections.
The guidelines for installing virtual local area networks (VLANs) as interconnect networks are the same as the guidelines for standard clusters. See Configuring VLANs as Private Interconnect Networks.
The steps for installing shared storage are the same as the steps for standard clusters. Refer to the Sun Cluster Hardware Administration Collection for Solaris OS for those steps.
Campus clusters require FC switches to mediate between multimode and single-mode fibers. The steps for configuring the settings on the FC switches are very similar to the steps for standard clusters.
If your switch supports flexibility in the buffer allocation mechanism (for example, the QLogic switch with donor ports), make certain that you allocate a sufficient number of buffers to the ports that are dedicated to interswitch links (ISLs). If your switch has a fixed number of frame buffers (or buffer credits) per port, you do not have this flexibility.
The following rules determine the number of buffers that you might need:
For 1 Gbps, calculate buffer credits as:
(length-in-km) x (0.6)
Round the result up to the next whole number. For example, a 10 km connection requires 6 buffer credits, and a 7 km connection requires 5 buffer credits.
For 2 Gbps, calculate buffer credits as:
(length-in-km) x (1.2)
Round the result up to the next whole number. For example, a 10 km connection requires 12 buffer credits, while a 7 km connection requires 9 buffer credits.
For greater speeds or for more details, refer to your switch documentation for information about computing buffer credits.
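The two rules above can be expressed as a short helper. This Python sketch is illustrative only (always confirm the result against your switch documentation); it computes the minimum number of buffer credits for a given ISL length and link speed:

```python
import math

# Credits required per kilometer, from the rules above, keyed by link
# speed in Gbps. Other speeds require the switch documentation.
CREDITS_PER_KM = {1: 0.6, 2: 1.2}

def buffer_credits(length_km, gbps):
    """Compute buffer credits for an ISL, rounding up to a whole number."""
    if gbps not in CREDITS_PER_KM:
        raise ValueError("consult your switch documentation for this speed")
    return math.ceil(length_km * CREDITS_PER_KM[gbps])

print(buffer_credits(10, 1))  # 6
print(buffer_credits(7, 1))   # 5
print(buffer_credits(10, 2))  # 12
print(buffer_credits(7, 2))   # 9
```

The printed values match the worked examples given above for 1 Gbps and 2 Gbps links.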
While detailing all of the configurations that are possible in campus clustering is beyond the scope of this document, the following illustrations depict variations on the configurations that were previously shown.
- Three-room campus cluster with a multipathing solution implemented (Figure 7–5)
- Two-room campus cluster with a multipathing solution implemented (Figure 7–6)
- Two-room campus cluster without a multipathing solution implemented (Figure 7–7)
Figure 7–6 shows a two-room campus cluster that uses partner pairs of storage devices and four FC switches, with a multipathing solution implemented. The four switches are added to the cluster for greater redundancy and potentially better I/O throughput. Other possible configurations that you could implement include using Sun StorEdge T3 partner groups or Sun StorEdge 9910/9960 arrays with Sun StorEdge Traffic Manager software installed.
For information about Traffic Manager software for the Solaris 9 OS, see the Sun StorEdge Traffic Manager Installation and Configuration Guide at http://dlc.sun.com/pdf/817-3674-12/817-3674-12.pdf. For information about Solaris I/O multipathing software for the Solaris 10 OS, see the Solaris Fibre Channel Storage Configuration and Multipathing Support Guide.
The configuration in the following figure could be implemented by using Sun StorEdge T3 or T3+ arrays in single-controller configurations, rather than partner groups.