Add this planning information to the "Disk Device Group Configurations Worksheet" and the "Volume Manager Configurations Worksheet" in the Sun Cluster 3.0 U1 Release Notes. For Solstice DiskSuite, also add this planning information to the "Metadevices Worksheet (Solstice DiskSuite)."
This section provides guidelines for planning volume management of your cluster configuration.
Sun Cluster uses volume manager software to group disks into disk device groups that can then be administered as one unit. Sun Cluster supports Solstice DiskSuite software and VERITAS Volume Manager (VxVM).
If you use Solstice DiskSuite software, you must install it on all nodes of the cluster, regardless of whether you use VxVM on some nodes to manage disks.
If you use VxVM and enable the VxVM cluster feature, you must install and license VxVM on all nodes of the cluster.
If you use VxVM and do not enable the VxVM cluster feature, you need to install and license VxVM only on those nodes that are attached to storage devices that VxVM will manage.
If you install both Solstice DiskSuite software and VxVM on a node, you must use Solstice DiskSuite software to manage disks local to each node (such as the root disk) and you must use VxVM to manage all shared disks.
See your volume manager documentation and "Installing and Configuring Solstice DiskSuite Software" or "Installing and Configuring VxVM Software" for instructions on how to install and configure the volume manager software. For more information about volume management in a cluster configuration, see Sun Cluster 3.0 U1 Concepts.
Consider the following general guidelines when configuring your disks.
Mirrored multihost disks - You must mirror all multihost disks across disk expansion units. See "Mirroring Multihost Disks" for guidelines on mirroring multihost disks.
Mirrored root - Mirroring the root disk ensures high availability, but such mirroring is not required. See "Mirroring Guidelines" for guidelines on deciding whether to mirror the root disk.
Unique naming - On any cluster node, if a local Solstice DiskSuite metadevice or VxVM volume is used as the device on which the /global/.devices/node@nodeid file system is mounted, the name of that metadevice or volume must be unique throughout the cluster.
Node lists - To ensure high availability of a disk device group, make its node lists of potential masters and its failback policy identical to any associated resource group. Or, if a scalable resource group uses more nodes than its associated disk device group, make the scalable resource group's node list a superset of the disk device group's node list. See the resource group planning information in the Sun Cluster 3.0 U1 Data Services Installation and Configuration Guide for information about node lists.
Multiported disks - You must connect, or port, all disks used to construct a device group to all of the nodes configured in that device group's node list. Solstice DiskSuite software can check this automatically when disks are added to a diskset. However, configured VxVM disk groups have no association with any particular set of nodes. In addition, when you use the clustering software to register Solstice DiskSuite disksets, VxVM disk groups, or individual sets of global devices as global device groups, only limited connectivity checking is possible.
Hot spare disks - You can use hot spare disks to increase availability, but they are not required.
See your volume manager documentation for disk layout recommendations and any additional restrictions.
Consider the following points when you plan Solstice DiskSuite configurations.
Local metadevice names - Each local metadevice name must be unique throughout the cluster and cannot be the same as any device ID (DID) name.
Mediators - Each diskset configured with exactly two disk strings and mastered by exactly two nodes must have Solstice DiskSuite mediators configured for the diskset. A disk string consists of a disk enclosure, its physical disks, cables from the enclosure to the node(s), and the interface adapter cards. You must configure each diskset with exactly two nodes acting as mediator hosts. You must use the same two nodes for all disksets requiring mediators and those two nodes must master those disksets. Mediators cannot be configured for disksets that do not meet the two-string and two-host requirements. See the mediator(7) man page for details.
/kernel/drv/md.conf settings - All metadevices used by each diskset are created in advance, at reconfiguration boot time, based on configuration parameters found in the /kernel/drv/md.conf file. The fields in the md.conf file are described in the Solstice DiskSuite documentation. You must modify the nmd and md_nsets fields as follows to support a Sun Cluster configuration.
nmd - The nmd field defines the number of metadevices created for each diskset. You must set the value of nmd to the predicted largest number of metadevices used by any one of the disksets in the cluster. For example, if a cluster uses 10 metadevices in its first 15 disksets, but 1000 metadevices in the 16th diskset, you must set the value of nmd to at least 1000. Also, the value of nmd must be large enough to ensure that there are enough numbers for each DID name and each local metadevice name to be unique throughout the cluster. The maximum number of metadevices allowed per diskset is 8192. The default number of metadevices per diskset is 128.
md_nsets - The md_nsets field defines the total number of disksets that can be created for a system to meet the needs of the entire cluster. You must set the value of md_nsets to the expected number of disksets in the cluster, plus one to allow Solstice DiskSuite software to manage the private disks on the local host (that is, those metadevices that are not in the local diskset). The maximum number of disksets allowed per cluster is 32. The default number of disksets is 4.
Set these fields at installation time to allow for all predicted future expansion of the cluster. Increasing these values after the cluster is in production is time consuming because it requires a reconfiguration reboot for each node. Raising these values later also increases the possibility of inadequate space allocation in the root (/) file system to create all of the requested devices.
All cluster nodes must have identical /kernel/drv/md.conf files, regardless of the number of disksets served by each node. Failure to follow this guideline can result in serious Solstice DiskSuite errors and possible loss of data.
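As an illustration of these settings, a cluster expected to grow to 16 disksets, with no more than 2048 metadevices in any one diskset, might use an md.conf entry like the following. The values shown are hypothetical; the line format follows the driver.conf(4) syntax used by the md driver.

```
# /kernel/drv/md.conf (excerpt; nmd and md_nsets values are illustrative)
# md_nsets = 16 expected disksets + 1 for the local host's private metadevices
name="md" parent="pseudo" nmd=2048 md_nsets=17;
```

Remember that the identical file must be placed on every cluster node and that each node requires a reconfiguration reboot for the change to take effect.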
Consider the following points when you plan VERITAS Volume Manager (VxVM) configurations.
Root disk group - You must create a default root disk group (rootdg) on each node. The rootdg disk group can be created on the following disks.
The root disk, which must be encapsulated
One or more local non-root disks, which can be encapsulated or initialized
A combination of root and local non-root disks
The rootdg disk group must be local to the node.
Encapsulation - Disks to be encapsulated must have two disk-slice table entries free.
Number of volumes - Estimate the maximum number of volumes any given disk device group will use at the time the disk device group is created.
If the number of volumes is less than 1000, you can use default minor numbering.
If the number of volumes is 1000 or greater, you must carefully plan the way in which minor numbers are assigned to disk device group volumes. No two disk device groups can have overlapping minor number assignments.
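One simple planning approach, sketched below, is to space the base minor numbers of the disk device groups by the largest expected volume count, so the ranges can never overlap. The group names and figures are hypothetical; the actual assignment is made with VxVM commands such as vxdg reminor.

```shell
# Sketch: derive non-overlapping base minor numbers for disk device groups
# by spacing them with the largest expected number of volumes per group.
max_volumes=1000   # largest volume count expected in any one group
base=33000         # hypothetical starting base minor number
for group in dg1 dg2 dg3; do
  echo "$group: base minor $base"
  base=$((base + max_volumes))
done
```

With these figures, dg1 starts at minor 33000, dg2 at 34000, and dg3 at 35000, so no two groups can collide.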
Dirty Region Logging - Using Dirty Region Logging (DRL) is highly recommended but not required. Using DRL decreases volume recovery time after a node failure. Using DRL might decrease I/O throughput.
Logging is required for cluster file systems. Sun Cluster supports the following logging file systems.
Solaris UFS logging
Solstice DiskSuite trans-metadevice UNIX file system (UFS) logging
For information about Solstice DiskSuite trans metadevice UFS logging, see your Solstice DiskSuite documentation. For information about Solaris UFS logging, see the mount_ufs(1M) man page.
The following table lists the logging file systems supported by each volume manager.
Table 1-4 Supported File System Logging Matrix

| Volume Manager | Supported File System Logging |
|---|---|
| Solstice DiskSuite | Solaris UFS logging, Solstice DiskSuite trans metadevice UFS logging |
| VERITAS Volume Manager | Solaris UFS logging |
Consider the following points when choosing between Solaris UFS logging and Solstice DiskSuite trans metadevice UFS logging for your Solstice DiskSuite volume manager.
Solaris UFS log size - Solaris UFS logging always allocates the log from free space on the UFS file system. The log size depends on the size of the file system.
On file systems less than 1 Gbyte, the log occupies 1 Mbyte.
On file systems 1 Gbyte or greater, the log occupies 1 Mbyte per Gbyte on the file system, up to a maximum of 64 Mbytes.
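The sizing rule above can be sketched as a small calculation. This helper function is illustrative only, not part of any Sun product.

```shell
# Illustrative helper: Solaris UFS log size in Mbytes for a file system
# of the given size in Gbytes (1 Mbyte minimum, 1 Mbyte per Gbyte, 64 Mbyte cap).
ufs_log_size_mb() {
  fs_gb=$1
  if [ "$fs_gb" -lt 1 ]; then
    echo 1
  elif [ "$fs_gb" -gt 64 ]; then
    echo 64
  else
    echo "$fs_gb"
  fi
}

ufs_log_size_mb 0     # prints 1  (file system under 1 Gbyte)
ufs_log_size_mb 10    # prints 10
ufs_log_size_mb 128   # prints 64 (capped)
```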
Log metadevice - A Solstice DiskSuite trans metadevice manages UFS logging. The logging device component of a trans metadevice is a metadevice that you can mirror and stripe. You can create a maximum 1-Gbyte log size, although 64 Mbytes is sufficient for most file systems. The minimum log size is 1 Mbyte. See your Solstice DiskSuite documentation for information about logging with trans metadevices.
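For example, a trans metadevice with a mirrored master and a mirrored log might be described by md.tab entries like the following. All metadevice names and disk slices here are hypothetical; see the md.tab(4) man page and your Solstice DiskSuite documentation for the exact syntax.

```
# md.tab excerpt: trans metadevice d10 with mirrored master and mirrored log
d10 -t d11 d12        # trans: master d11, log d12
d11 -m d21            # master is a one-way mirror (attach a second submirror later)
d21 1 1 c1t0d0s0
d12 -m d22            # log is a one-way mirror (attach a second submirror later)
d22 1 1 c2t0d0s6
```

Placing the log on a mirrored metadevice, as shown, protects the log itself against a single-disk failure.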
This section provides guidelines for planning the mirroring of your cluster configuration.
Mirroring all multihost disks in a Sun Cluster configuration enables the configuration to tolerate single-disk failures. Sun Cluster software requires that you mirror all multihost disks across disk expansion units.
Consider the following points when mirroring multihost disks.
Separate disk expansion units - Each submirror of a given mirror or plex should reside in a different multihost disk expansion unit.
Disk space - Mirroring doubles the amount of necessary disk space.
Three-way mirroring - Solstice DiskSuite software and VERITAS Volume Manager (VxVM) support three-way mirroring. However, Sun Cluster requires only two-way mirroring.
Number of metadevices - Under Solstice DiskSuite software, mirrors consist of other metadevices such as concatenations or stripes. Large configurations might contain a large number of metadevices. For example, seven metadevices are created for each logging UFS file system.
Differing disk sizes - If you mirror to a disk of a different size, your mirror capacity is limited to the size of the smallest submirror or plex.
For more information about multihost disks, see Sun Cluster 3.0 U1 Concepts.
Add this planning information to the "Local File System Layout Worksheet" in the Sun Cluster 3.0 U1 Release Notes.
For maximum availability, you should mirror root (/), /usr, /var, /opt, and swap on the local disks. Under VxVM, you encapsulate the root disk and mirror the generated subdisks. However, mirroring the root disk is not a requirement of Sun Cluster.
Before deciding whether to mirror the root disk, consider the risks, complexity, cost, and service time for the various alternatives concerning the root disk. There is no single mirroring strategy that works for all configurations. You might want to consider your local Enterprise Services representative's preferred solution when deciding whether to mirror root.
See your volume manager documentation and "Installing and Configuring Solstice DiskSuite Software" or "Installing and Configuring VxVM Software" for instructions on mirroring the root disk.
Consider the following issues and guidelines when deciding whether to mirror the root disk.
Complexity - Mirroring the root disk adds complexity to system administration and complicates booting in single-user mode.
Backups - Regardless of whether or not you mirror the root disk, you also should perform regular backups of root. Mirroring alone does not protect against administrative errors. Only a backup plan enables you to restore files that have been accidentally altered or deleted.
Quorum devices - Do not use a disk configured as a quorum device to mirror a root disk.
Quorum - Under Solstice DiskSuite software, in failure scenarios in which metadevice state database quorum is lost, you cannot reboot the system until maintenance is performed. See the Solstice DiskSuite documentation for information about the metadevice state database and state database replicas.
Separate controllers - Highest availability includes mirroring the root disk on a separate controller.
Boot disk - You can set up the mirror to be a bootable root disk so that you can boot from the mirror if the primary boot disk fails.
Secondary root disk - With a mirrored root disk, the primary root disk can fail and work can continue on the secondary (mirror) root disk. At a later point, the primary root disk might return to service (perhaps after a power cycle or transient I/O errors), and subsequent boots are then performed by using the primary root disk specified in the OpenBoot(TM) PROM boot-device field. In this situation no manual repair task occurs, but the drive starts working well enough to boot. Note that a Solstice DiskSuite resync does occur. A resync requires a manual step when the drive is returned to service.
If changes were made to any files on the secondary (mirror) root disk, they would not be reflected on the primary root disk during boot time (causing a stale submirror). For example, changes to the /etc/system file would be lost. Some Solstice DiskSuite administrative commands might have changed the /etc/system file while the primary root disk was out of service.
The boot program does not check whether it is booting from a mirror or from an underlying physical device, and the mirroring becomes active only partway through the boot process, after the metadevices are loaded. Before that point, the system is vulnerable to stale-submirror problems.