This section provides guidelines for planning volume management of your cluster configuration.
Sun Cluster uses volume manager software to group disks into disk device groups that can then be administered as one unit. Sun Cluster supports Solstice DiskSuite software and VERITAS Volume Manager (VxVM). You can use only one volume manager within a single cluster configuration. Refer to your volume manager documentation and to either Appendix A, Configuring Solstice DiskSuite Software or Appendix B, Configuring VERITAS Volume Manager for instructions on configuring the volume manager software. For more information about volume management in a cluster configuration, refer to Sun Cluster 3.0 Concepts.
Add this planning information to the "Disk Device Group Configurations Worksheet" and the "Volume Manager Configurations Worksheet" in Sun Cluster 3.0 Release Notes and, if applicable, to the "Metadevices Worksheet (Solstice DiskSuite)" in the same document.
Consider the following general guidelines when configuring your disks.
Mirrored multihost disks - You must mirror all multihost disks across disk expansion units. See "Mirroring Multihost Disks" for guidelines on mirroring multihost disks.
Mirrored root - Mirroring the root disk ensures high availability, but such mirroring is not required. See "Mirroring Guidelines" for guidelines on deciding whether to mirror the root disk.
Unique naming - On any cluster node, if a local Solstice DiskSuite metadevice or VxVM volume is used as the device on which the /global/.devices/node@nodeid file system is mounted, the name of that metadevice or volume must be unique throughout the cluster.
Node lists - To ensure high availability of a disk device group, make its node list of potential masters and its failback policy identical to those of any associated resource group. Or, if a scalable resource group uses more nodes than its associated disk device group, make the scalable resource group's node list a superset of the disk device group's node list. Refer to the resource group planning information in Sun Cluster 3.0 Data Services Installation and Configuration Guide for information about node lists.
Multiported disks - You must connect, or port, all disks used to construct a device group within the cluster to all of the nodes configured in the node list for that device group. Solstice DiskSuite software can check this automatically at the time that disks are added to a diskset. However, configured VxVM disk groups have no association with any particular set of nodes. In addition, when you use the clustering software to register Solstice DiskSuite disksets, VxVM disk groups, or individual sets of global devices as global device groups, only limited connectivity checking can be performed (see the example that follows these guidelines).
Hot spare disks - You can use hot spare disks to increase availability, but they are not required.
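To make the node-list and connectivity guidelines concrete, the following is a minimal sketch of registering a VxVM disk group as a disk device group with an explicit node list. The disk group and node names (webdg, phys-schost-1, phys-schost-2) are placeholders, and the exact scconf options should be confirmed against the scconf(1M) man page for your release.

```
# Placeholder names; confirm the syntax against scconf(1M) for your release.
# Register the VxVM disk group webdg as a disk device group whose
# potential masters are phys-schost-1 and phys-schost-2.
scconf -a -D type=vxvm,name=webdg,nodelist=phys-schost-1:phys-schost-2
```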
Refer to your volume manager documentation for disk layout recommendations and any additional restrictions.
Consider the following points when planning Solstice DiskSuite configurations.
Mediators - Each diskset configured with exactly two disk strings and mastered by exactly two nodes must have Solstice DiskSuite mediators configured for it. A disk string consists of a disk enclosure, its physical disks, the cables from the enclosure to the node or nodes, and the interface adapter cards. You must configure each such diskset with exactly two nodes acting as mediator hosts, you must use the same two nodes for all disksets that require mediators, and those two nodes must master those disksets. Mediators cannot be configured for disksets that do not meet the two-string and two-host requirements. See the mediator(7) man page for details.
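For example, mediator hosts might be added to a diskset as follows. This is a minimal sketch; the diskset and host names are placeholders, and the exact syntax should be confirmed against the metaset(1M) and mediator(7) man pages.

```
# Placeholder diskset and host names; verify against metaset(1M).
# Add the same two nodes as mediator hosts for the diskset dg-schost-1.
metaset -s dg-schost-1 -a -m phys-schost-1
metaset -s dg-schost-1 -a -m phys-schost-2
```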
/kernel/drv/md.conf settings - All metadevices used by each diskset are created in advance, at reconfiguration boot time, based on configuration parameters found in the /kernel/drv/md.conf file. The fields in the md.conf file are described in the Solstice DiskSuite documentation. You must modify the nmd and md_nsets fields as follows to support a Sun Cluster configuration.
nmd - The nmd field defines the number of metadevices created for each diskset. You must set the value of nmd to the predicted largest number of metadevices used by any one of the disksets in the cluster. For example, if a cluster uses 10 metadevices in its first 15 disksets, but 1000 metadevices in the 16th diskset, you must set the value of nmd to at least 1000. The maximum number of metadevices allowed per diskset is 8192.
md_nsets - The md_nsets field defines the total number of disksets that can be created for a system to meet the needs of the entire cluster. You must set the value of md_nsets to the expected number of disksets in the cluster, plus one to allow Solstice DiskSuite software to manage the private disks on the local host (that is, those metadevices that are not in the local diskset). The maximum number of disksets allowed per cluster is 32.
Set these fields at installation time to allow for all predicted future expansion of the cluster. Increasing these values after the cluster is in production is time consuming because it requires a reconfiguration reboot for each node. Raising these values later also increases the possibility of inadequate space allocation in the root (/) file system to create all of the requested devices.
All cluster nodes must have identical /kernel/drv/md.conf files, regardless of the number of disksets served by each node. Failure to follow this guideline can result in serious Solstice DiskSuite errors and possible loss of data.
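For example, a cluster that expects no more than 16 disksets and no more than 1024 metadevices in any one diskset might use an entry such as the following, identical in the /kernel/drv/md.conf file on every node. The values are illustrative only, and the other lines of the file are not shown.

```
# /kernel/drv/md.conf (fragment) -- example values only, identical on all nodes
# nmd: largest number of metadevices expected in any one diskset
# md_nsets: expected number of disksets (16) plus one for the local host
name="md" parent="pseudo" nmd=1024 md_nsets=17;
```

As noted above, changes to these values take effect at a reconfiguration reboot.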
Consider the following points when planning VERITAS Volume Manager (VxVM) configurations.
Root disk group - You must create a default root disk group (rootdg) on each node. The rootdg disk group can be created on the following disks.
The root disk, which must be encapsulated
One or more local non-root disks, which can be encapsulated or initialized
A combination of root and local non-root disks
The rootdg disk group must be local to the node.
Encapsulation - Disks to be encapsulated must have two disk-slice table entries free.
Number of volumes - Estimate the maximum number of volumes any given disk device group will use at the time the disk device group is created.
If the number of volumes is less than 1000, you can use default minor numbering.
If the number of volumes is 1000 or greater, you must carefully plan the way in which minor numbers are assigned to disk device group volumes. No two disk device groups can have overlapping minor-number assignments (an example follows these points).
Dirty Region Logging - Using Dirty Region Logging (DRL) is highly recommended but not required. DRL decreases volume recovery time after a node failure, but it might decrease I/O throughput.
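The following sketch illustrates the minor-numbering and DRL points with VxVM commands. The disk group, volume, size, and base minor number are placeholder values, and the options should be verified against the vxassist(1M) and vxdg(1M) man pages for your VxVM release.

```
# Placeholder names and values; verify against vxassist(1M) and vxdg(1M).
# Create a 2-Gbyte mirrored volume with a dirty region log (DRL):
vxassist -g webdg make webvol 2g layout=mirror,log

# Give this disk group its own block of minor numbers so that its volumes
# cannot overlap the minor numbers of any other disk device group:
vxdg reminor webdg 16000
```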
Logging is required for cluster file systems. Sun Cluster supports the following logging file systems.
Solstice DiskSuite trans-metadevice UNIX file system (UFS) logging
Solaris UFS logging
For information about Solstice DiskSuite trans-metadevice UFS logging, refer to your Solstice DiskSuite documentation. For information about Solaris UFS logging, refer to the mount_ufs(1M) man page and Solaris Transition Guide.
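For example, Solaris UFS logging is typically enabled for a cluster file system through the logging mount option in the /etc/vfstab entry, as in the following sketch. The metadevice path and mount point are placeholders; see the mount_ufs(1M) man page for the option itself.

```
# /etc/vfstab entry (one line) -- placeholder device and mount point
# device to mount        device to fsck            mount point  FS   pass  boot  options
/dev/md/webds/dsk/d100   /dev/md/webds/rdsk/d100   /global/web  ufs  2     yes   global,logging
```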
The following table lists the logging file systems supported by each volume manager.
Table 1-4 Supported File-System Logging Matrix
| Volume Manager | Supported File-System Logging |
| --- | --- |
| Solstice DiskSuite | Solstice DiskSuite trans-metadevice UFS logging, Solaris UFS logging |
| VERITAS Volume Manager | Solaris UFS logging |
Consider the following points when choosing between Solaris UFS logging and Solstice DiskSuite trans-metadevice UFS logging for your Solstice DiskSuite volume manager.
Solaris UFS log size - Solaris UFS logging always allocates the log from free space on the UFS file system, and it sizes the log according to the size of the file system.
On file systems less than 1 Gbyte, the log occupies 1 Mbyte.
On file systems of 1 Gbyte or greater, the log occupies 1 Mbyte per Gbyte of file-system space, up to a maximum of 64 Mbytes.
Log metadevice - With Solstice DiskSuite trans-metadevice UFS logging, the logging is provided by a trans metadevice. The log device is yet another metadevice, which you can mirror and stripe. Furthermore, with Solstice DiskSuite software you can create a logging file system of up to 1 Tbyte.
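For example, once a master (data) metadevice d20 and a log metadevice d30 exist, a trans metadevice that provides UFS logging might be created as follows. This is a minimal sketch with placeholder metadevice names; in a shared diskset each command would also take the -s setname option, and the syntax should be verified against the metainit(1M) man page.

```
# Placeholder metadevice names; d20 and d30 can themselves be mirrors or stripes.
# Create trans metadevice d10 with master (data) device d20 and log device d30.
metainit d10 -t d20 d30

# Build a UFS file system on the trans metadevice.
newfs /dev/md/rdsk/d10
```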
This section provides guidelines for planning the mirroring of your cluster configuration.
Mirroring all multihost disks in a Sun Cluster configuration enables the configuration to tolerate single-disk failures. Sun Cluster software requires that you mirror all multihost disks across disk expansion units.
Consider the following points when mirroring multihost disks.
Separate disk expansion units - Each submirror of a given mirror or plex should reside in a different multihost disk expansion unit (see the example that follows these points).
Disk space - Mirroring doubles the amount of necessary disk space.
Three-way mirroring - Solstice DiskSuite software and VERITAS Volume Manager (VxVM) support three-way mirroring. However, Sun Cluster requires only two-way mirroring.
Number of metadevices - Under Solstice DiskSuite software, mirrors consist of other metadevices such as concatenations or stripes. Large configurations might contain a large number of metadevices. For example, seven metadevices are created for each logging UFS file system.
Differing disk sizes - If you mirror to a disk of a different size, your mirror capacity is limited to the size of the smallest submirror or plex.
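The following sketch illustrates the separate-expansion-unit guideline with Solstice DiskSuite. The diskset, DID device, and metadevice names are placeholders, the DID devices d3 and d13 are assumed to reside in different disk expansion units, and the syntax should be verified against the metainit(1M) and metattach(1M) man pages.

```
# Placeholder diskset, DID device, and metadevice names; verify against metainit(1M).
# The DID devices d3 and d13 are assumed to be in different disk expansion units.
metainit -s dg-schost-1 d11 1 1 /dev/did/rdsk/d3s0
metainit -s dg-schost-1 d12 1 1 /dev/did/rdsk/d13s0

# Create mirror d10 from submirror d11, then attach the second submirror d12.
metainit -s dg-schost-1 d10 -m d11
metattach -s dg-schost-1 d10 d12
```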
For more information about multihost disks, refer to Sun Cluster 3.0 Concepts.
For maximum availability, you should mirror root (/), /usr, /var, /opt, and swap on the local disks. Under VxVM, you encapsulate the root disk and mirror the generated subdisks. However, mirroring the root disk is not a requirement of Sun Cluster.
Before deciding whether to mirror the root disk, consider the risks, complexity, cost, and service time for the various alternatives concerning the root disk. There is no single mirroring strategy that works for all configurations. You might want to consider your local Enterprise Services representative's preferred solution when deciding whether to mirror root.
Refer to your volume manager documentation and to either Appendix A, Configuring Solstice DiskSuite Software or Appendix B, Configuring VERITAS Volume Manager for instructions on mirroring the root disk.
Consider the following issues when deciding whether to mirror the root disk.
Complexity - Mirroring the root disk adds complexity to system administration and complicates booting in single-user mode.
Backups - Regardless of whether you mirror the root disk, you should also perform regular backups of root. Mirroring alone does not protect against administrative errors. Only a backup plan enables you to restore files that have been accidentally altered or deleted.
Quorum - Under Solstice DiskSuite software, in failure scenarios in which metadevice state database quorum is lost, you cannot reboot the system until maintenance is performed. Refer to the Solstice DiskSuite documentation for information about the metadevice state database and state database replicas.
Separate controllers - For highest availability, mirror the root disk on a separate controller.
Boot disk - You can set up the mirror to be a bootable root disk so that you can boot from the mirror if the primary boot disk fails.
Secondary root disk - With a mirrored root disk, the primary root disk can fail and work can continue on the secondary (mirror) root disk. At a later point the primary root disk might return to service, perhaps after a power cycle or after transient I/O errors clear, and subsequent boots are then performed by using the primary root disk specified in the OpenBoot™ PROM boot-device field. In this situation no manual repair task has been performed; the drive has simply started working well enough to boot. Note that a Solstice DiskSuite resync does occur; the resync requires a manual step when the drive is returned to service.
If changes were made to any files on the secondary (mirror) root disk, they would not be reflected on the primary root disk during boot time (causing a stale submirror). For example, changes to the /etc/system file would be lost. Some Solstice DiskSuite administrative commands might have changed the /etc/system file while the primary root disk was out of service.
The boot program does not check whether it is booting from a mirror or an underlying physical device, and the mirroring becomes active partway through the boot process (after the metadevices are loaded). Before this point, the system is vulnerable to stale submirror problems.