Sun Cluster 2.2 Software Installation Guide

Configuration Rules for Improved Reliability

The rules discussed in this section help ensure that your Sun Cluster configuration is highly available. These rules also help determine the appropriate hardware for your configuration.

Configuring redundant hardware is not always possible--some configurations might contain only one system board--but some single points of failure can still be eliminated easily with hardware options. For example, in an Ultra Enterprise(TM) 2 Cluster with two SPARCstorage Arrays, one private network can be connected to a Sun Quad FastEthernet(TM) Controller card (SQEC), while the other private network can be connected to the on-board interface.

Mirroring Guidelines

Unless you are using a RAID5 configuration, all multihost disks must be mirrored in Sun Cluster configurations. This enables the configuration to tolerate single-disk failures.
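As an illustration of such mirroring under Solstice DiskSuite, a multihost disk can be mirrored by building two single-slice concatenations and attaching them as submirrors. This is a sketch only; the metadevice names (d10, d20, d0) and slice names (c1t0d0s0, c2t0d0s0) are placeholders, not values from this guide.

```shell
# Create two single-slice concatenations, placing each submirror
# on a disk in a different multihost expansion unit (slice names
# here are examples only).
metainit d10 1 1 c1t0d0s0
metainit d20 1 1 c2t0d0s0

# Create a one-way mirror using the first submirror.
metainit d0 -m d10

# Attach the second submirror; DiskSuite resyncs it online.
metattach d0 d20
```

Placing the two submirrors on disks reached through different controllers keeps a single controller failure from taking down both halves of the mirror.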

Consider these points when mirroring multihost disks:

Mirroring Root (/)

For maximum availability, you should mirror root (/), /usr, /var, /opt, and swap on the local disks. Under VERITAS Volume Manager, this means encapsulating the root disk and mirroring the generated subdisks. However, mirroring the root disk is not a requirement of Sun Cluster.
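The encapsulate-and-mirror procedure under VERITAS Volume Manager can be sketched as follows. This is an outline under assumptions, not a complete procedure; the disk media name "disk01" is a placeholder, and your volume manager documentation remains the authoritative reference.

```shell
# Encapsulate the boot disk using the vxdiskadm menu
# ("Encapsulate one or more disks"), then reboot when prompted
# so the root file systems are converted to volumes.
vxdiskadm

# After the encapsulation reboot, mirror the root volumes onto
# a second local disk ("disk01" is a placeholder media name).
/etc/vx/bin/vxrootmir disk01
```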

You should consider the risks, complexity, cost, and service time for the various alternatives concerning the root disk. There is not one answer for all configurations. You might want to consider your local Enterprise Services representative's preferred solution when deciding whether to mirror root.

Refer to your volume manager documentation for instructions on mirroring root.

Consider the following issues when deciding whether to mirror the root file system.

Note, however, that nothing in the Sun Cluster software guarantees an immediate takeover when the root disk fails. In fact, the takeover might not occur at all. For example, suppose some sectors of a disk are bad, and all of them lie in the user data portion of a file that is crucial to some data service. The data service will start getting I/O errors, but the Sun Cluster node will stay up.

At a later point, the primary root disk might return to service (perhaps after a power cycle, or after transient I/O errors clear), and subsequent boots are performed using the primary root disk specified in the OpenBoot(TM) PROM boot-device field. Note that no Solstice DiskSuite resync has occurred--a resync requires a manual step when the drive is returned to service.
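The manual step mentioned above can be sketched with Solstice DiskSuite commands; the metadevice name d0 and slice c0t0d0s0 are illustrative placeholders.

```shell
# Check which submirror components were marked in error
# while the primary root disk was out of service.
metastat d0

# Re-enable the errored component in place; DiskSuite then
# resyncs it from the surviving submirror.
metareplace -e d0 c0t0d0s0
```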

In this situation, no manual repair task was performed--the drive simply started working "well enough" to boot.

Any changes made to files on the secondary (mirror) root device while the primary was out of service would not be reflected on the primary root device at boot time, leaving the submirrors inconsistent (a stale submirror). For example, changes to /etc/system would be lost; some Solstice DiskSuite administrative commands might have changed /etc/system while the primary root device was out of service.

The boot program does not know whether it is booting from a mirror or an underlying physical device, and the mirroring becomes active part way through the boot process (after the metadevices are loaded). Before this point the system is vulnerable to stale submirror problems.
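For reference, when root (/) is placed under a Solstice DiskSuite metadevice, the change is recorded in /etc/system and /etc/vfstab so the kernel can find the root metadevice early in boot. The fragment below is a representative sketch, not copied from any particular system; the metadevice name d0 is a placeholder.

```
* /etc/system entry added when root (/) is a metadevice:
rootdev:/pseudo/md@0:0,0,blk

* /etc/vfstab entry for a mirrored root (d0 is illustrative):
/dev/md/dsk/d0  /dev/md/rdsk/d0  /  ufs  1  no  -
```

Until the md driver is loaded and these entries take effect, the system is reading only the underlying physical device, which is why stale submirror problems can occur early in boot.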

Solstice DiskSuite Mirroring Alternatives

Consider the following alternatives when deciding whether to mirror root (/) file systems under Solstice DiskSuite. The issues mentioned in this section are not applicable to VERITAS Volume Manager configurations.