Sun Cluster Software Installation Guide for Solaris OS

Planning Volume Management

Add this planning information to the Disk Device Group Configurations Worksheet and the Volume-Manager Configurations Worksheet. For Solstice DiskSuite/Solaris Volume Manager, also add this planning information to the Metadevices Worksheet (Solstice DiskSuite/Solaris Volume Manager).

This section provides guidelines for planning volume management of your cluster configuration.

Sun Cluster software uses volume-manager software to group disks into disk device groups, which can then be administered as one unit. Sun Cluster software supports Solstice DiskSuite/Solaris Volume Manager software and VERITAS Volume Manager (VxVM) software that you install or use in the following ways.

Table 1–5 Supported Use of Volume Managers with Sun Cluster Software


Volume-Manager Software	Requirements
Solstice DiskSuite/Solaris Volume Manager	You must install Solstice DiskSuite/Solaris Volume Manager software on all nodes of the cluster, regardless of whether you use VxVM on some nodes to manage disks.
SPARC: VxVM with the cluster feature	You must install and license VxVM with the cluster feature on all nodes of the cluster.
SPARC: VxVM without the cluster feature	You are only required to install and license VxVM on those nodes that are attached to storage devices which VxVM manages.
SPARC: Both Solstice DiskSuite/Solaris Volume Manager and VxVM	If you install both volume managers on the same node, you must use Solstice DiskSuite/Solaris Volume Manager software to manage disks that are local to each node. Local disks include the root disk. Use VxVM to manage all shared disks.

See your volume-manager documentation and Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software or SPARC: Installing and Configuring VxVM Software for instructions on how to install and configure the volume-manager software. For more information about volume management in a cluster configuration, see the Sun Cluster Concepts Guide for Solaris OS.
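The VxVM rows of Table 1–5 can be read as a simple rule for deciding which nodes need VxVM installed and licensed. The following Python sketch only illustrates that rule. The function name, the node names, and the storage names are hypothetical and are not part of Sun Cluster or VxVM software.

```python
def nodes_requiring_vxvm(attached_storage, vxvm_managed, cluster_feature):
    """Return the set of nodes on which VxVM must be installed and licensed.

    attached_storage: dict that maps a node name to the set of storage
                      devices attached to that node (hypothetical names)
    vxvm_managed:     set of storage devices that VxVM manages
    cluster_feature:  True if VxVM is used with the cluster feature
    """
    if cluster_feature:
        # With the cluster feature, every cluster node needs VxVM.
        return set(attached_storage)
    # Without the cluster feature, only nodes that are attached to
    # VxVM-managed storage need VxVM.
    return {node for node, devices in attached_storage.items()
            if devices & vxvm_managed}


storage = {"node1": {"array1"}, "node2": {"array1", "array2"}, "node3": {"array2"}}
print(sorted(nodes_requiring_vxvm(storage, {"array1"}, cluster_feature=False)))
```

Solstice DiskSuite/Solaris Volume Manager software needs no such calculation, because the table requires it on all nodes of the cluster.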

Guidelines for Volume-Manager Software

Consider the following general guidelines when you configure your disks with volume-manager software:

Mirrored multihost disks – You must mirror all multihost disks across disk expansion units. See Guidelines for Mirroring Multihost Disks for guidelines on mirroring multihost disks. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to disks.
Mirrored root – Mirroring the root disk ensures high availability, but such mirroring is not required. See Mirroring Guidelines for guidelines about deciding whether to mirror the root disk.
Unique naming – You might have local Solstice DiskSuite metadevices, local Solaris Volume Manager volumes, or VxVM volumes that are used as devices on which the /global/.devices/node@nodeid file systems are mounted. If so, the name of each local metadevice or local volume must be unique throughout the cluster.
Node lists – To ensure high availability of a disk device group, make its node lists of potential masters and its failback policy identical to any associated resource group. Or, if a scalable resource group uses more nodes than its associated disk device group, make the scalable resource group's node list a superset of the disk device group's node list. See the resource group planning information in the Sun Cluster Data Service Planning and Administration Guide for Solaris OS for information about node lists.
Multiported disks – You must connect, or port, all disks used to construct a device group within the cluster to all of the nodes that are configured in the node list for that device group. Solstice DiskSuite/Solaris Volume Manager software can automatically check for this connection at the time that disks are added to a diskset. However, configured VxVM disk groups do not have an association to any particular set of nodes.
Hot spare disks – You can use hot spare disks to increase availability, but hot spare disks are not required.

See your volume-manager documentation for disk layout recommendations and any additional restrictions.
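The node-list guideline above can be checked on paper before you configure anything. The following Python sketch is one possible way to express the check. The function and its parameters are hypothetical planning aids, not a Sun Cluster interface. Treating the order of the node list as significant is an assumption of this sketch, and the sketch does not compare failback policies in the scalable case because the guideline does not state a rule for that case.

```python
def node_lists_consistent(dg_nodes, rg_nodes, dg_failback, rg_failback,
                          scalable=False):
    """Return True if a disk device group and its associated resource
    group follow the node-list guideline.

    dg_nodes, rg_nodes:       lists of node names (potential masters)
    dg_failback, rg_failback: failback policy of each group
    scalable:                 True if the resource group is scalable
    """
    if scalable and len(set(rg_nodes)) > len(set(dg_nodes)):
        # A scalable resource group that uses more nodes must list
        # every node of the disk device group.
        return set(rg_nodes) >= set(dg_nodes)
    # Otherwise the node lists and the failback policies must match.
    # Assumption: the lists are compared in order.
    return list(dg_nodes) == list(rg_nodes) and dg_failback == rg_failback


print(node_lists_consistent(["node1", "node2"], ["node1", "node2"], True, True))
print(node_lists_consistent(["node1", "node2"], ["node1", "node2", "node3"],
                            False, False, scalable=True))
```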

Guidelines for Solstice DiskSuite/Solaris Volume Manager Software

Consider the following points when you plan Solstice DiskSuite/Solaris Volume Manager configurations:

Local metadevice names or volume names – The name of each local Solstice DiskSuite metadevice or Solaris Volume Manager volume must be unique throughout the cluster. Also, the name cannot be the same as any device-ID name.
Dual-string mediators – Each diskset configured with exactly two disk strings and mastered by exactly two nodes must have Solstice DiskSuite/Solaris Volume Manager mediators configured for the diskset. A disk string consists of a disk enclosure, its physical disks, cables from the enclosure to the node(s), and the interface adapter cards. Observe the following rules to configure dual-string mediators:
- You must configure each diskset with exactly two nodes that act as mediator hosts.
- You must use the same two nodes for all disksets that require mediators. Those two nodes must master those disksets.
- Mediators cannot be configured for disksets that do not meet the two-string and two-host requirements.
See the mediator(7D) man page for details.
/kernel/drv/md.conf settings – All Solstice DiskSuite metadevices or Solaris Volume Manager volumes used by each diskset are created in advance, at reconfiguration boot time. This reconfiguration is based on the configuration parameters that exist in the /kernel/drv/md.conf file.

Caution –
All cluster nodes must have identical /kernel/drv/md.conf files, regardless of the number of disksets that are served by each node. Failure to follow this guideline can result in serious Solstice DiskSuite/Solaris Volume Manager errors and possible loss of data.

You must modify the nmd and md_nsets fields as follows to support a Sun Cluster configuration:
- md_nsets – The md_nsets field defines the total number of disksets that can be created for a system to meet the needs of the entire cluster. Set the value of md_nsets to the expected number of disksets in the cluster plus one additional diskset. Solstice DiskSuite/Solaris Volume Manager software uses the additional diskset to manage the private disks on the local host. The private disks are those metadevices or volumes that are not in the local diskset.
  
  The maximum number of disksets that are allowed per cluster is 32. This number allows for 31 disksets for general use plus one diskset for private disk management. The default value of md_nsets is 4.
- nmd – The nmd field defines the number of metadevices or volumes that are created for each diskset. Set the value of nmd to the predicted highest value of metadevice or volume name that is used by any one of the disksets in the cluster. For example, if a cluster uses 10 metadevices or volumes in its first 15 disksets, but 1000 metadevices or volumes in the 16th diskset, set the value of nmd to at least 1000. Also, the value of nmd must be large enough to ensure that enough numbers exist for each device-ID name. The number must also be large enough to ensure that each local metadevice name or local volume name can be unique throughout the cluster.
  
  The highest allowed value of a metadevice or volume name per diskset is 8192. The default value of nmd is 128.
Set these fields at installation time to allow for all predicted future expansion of the cluster. Increasing the value of these fields after the cluster is in production is time-consuming, because the change requires a reconfiguration reboot of each node. Raising these values later also increases the possibility of inadequate space allocation in the root (/) file system to create all of the requested devices.

At the same time, keep the value of the nmd field and the md_nsets field as low as possible. Memory structures exist for all possible devices as determined by nmd and md_nsets, even if you have not created those devices. For optimal performance, keep the value of nmd and md_nsets only slightly higher than the number of metadevices or volumes you plan to use.

See “System and Startup Files” in Solstice DiskSuite 4.2.1 Reference Guide or “System Files and Startup Files” in Solaris Volume Manager Administration Guide for more information about the md.conf file.
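The sizing rules for the md_nsets and nmd fields can be worked through as simple arithmetic. The following Python sketch applies the rules and limits stated above to the example of a cluster with 16 disksets. The function name and its inputs are hypothetical and serve only as a planning aid. The sketch does not read or write the /kernel/drv/md.conf file.

```python
MAX_MD_NSETS = 32  # maximum disksets per cluster, including the private one
MAX_NMD = 8192     # highest allowed metadevice or volume name per diskset


def plan_md_conf(expected_disksets, highest_name_per_diskset):
    """Return the (md_nsets, nmd) values to plan for.

    expected_disksets:        number of disksets expected in the cluster
    highest_name_per_diskset: predicted highest metadevice or volume
                              name value for each diskset
    """
    # One additional diskset is needed for private disk management.
    md_nsets = expected_disksets + 1
    if md_nsets > MAX_MD_NSETS:
        raise ValueError("at most 31 disksets are available for general use")
    # nmd must cover the diskset with the highest name value.
    nmd = max(highest_name_per_diskset)
    if nmd > MAX_NMD:
        raise ValueError("the highest allowed name value is 8192")
    return md_nsets, nmd


# 15 disksets that use names up to 10, and a 16th that uses names up to 1000.
print(plan_md_conf(16, [10] * 15 + [1000]))
```

Whatever values you plan, the resulting md.conf file must be identical on all cluster nodes, as the Caution above states.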

SPARC: Guidelines for VERITAS Volume Manager Software

Consider the following points when you plan VERITAS Volume Manager (VxVM) configurations.

Enclosure-Based Naming – Enclosure-Based Naming is a feature that was introduced in VxVM version 3.2. If you use Enclosure-Based Naming of devices, ensure that you use consistent device names on all cluster nodes that share the same storage. VxVM does not coordinate these names, so the administrator must ensure that VxVM assigns the same names to the same devices from different nodes. Failure to assign consistent names does not interfere with correct cluster behavior. However, inconsistent names greatly complicate cluster administration and greatly increase the possibility of configuration errors, potentially leading to loss of data.
Root disk group – You must create a default root disk group on each node. The root disk group can be created on the following disks:
- The root disk, which must be encapsulated
- One or more local nonroot disks, which you can encapsulate or initialize
- A combination of root and local nonroot disks
The root disk group must be local to the node.
Encapsulation – Disks to be encapsulated must have two disk-slice table entries free.
Number of volumes – Estimate the maximum number of volumes any given disk device group can use at the time the disk device group is created.
- If the number of volumes is less than 1000, you can use default minor numbering.
- If the number of volumes is 1000 or greater, you must carefully plan the way in which minor numbers are assigned to disk device group volumes. No two disk device groups can have overlapping minor number assignments.
Dirty Region Logging – Using Dirty Region Logging (DRL) decreases volume recovery time after a node failure. Using DRL might decrease I/O throughput.
Dynamic Multipathing (DMP) – The use of DMP alone to manage multiple I/O paths per node to the shared storage is not supported. The use of DMP is supported only in the following configurations:
- A single I/O path per node to the cluster's shared storage.
- A supported multipathing solution, such as Sun Traffic Manager, EMC PowerPath, or Hitachi HDLM, that manages multiple I/O paths per node to the shared cluster storage.
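For the Number of volumes guideline above, a planned minor-number layout can be checked for overlaps before any disk device group is created. The following Python sketch assumes that each disk device group is assigned one contiguous, inclusive range of minor numbers. The group names and ranges are hypothetical, and the sketch does not query VxVM.

```python
from itertools import combinations


def find_minor_overlaps(assignments):
    """Return pairs of disk device groups whose minor ranges overlap.

    assignments: dict that maps a group name to a (first, last) tuple
                 of minor numbers, both inclusive
    """
    overlaps = []
    for (name_a, (lo_a, hi_a)), (name_b, (lo_b, hi_b)) in combinations(
            sorted(assignments.items()), 2):
        # Two inclusive ranges overlap when each one starts before
        # the other one ends.
        if lo_a <= hi_b and lo_b <= hi_a:
            overlaps.append((name_a, name_b))
    return overlaps


planned = {"dg1": (1000, 1999), "dg2": (2000, 3499), "dg3": (3000, 3999)}
print(find_minor_overlaps(planned))
```

An empty result means that no two planned ranges collide.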

File-System Logging

Logging is required for cluster file systems. Sun Cluster software supports the following choices of file-system logging:

Solaris UFS logging – See the mount_ufs(1M) man page for more information.
Solstice DiskSuite trans-metadevice logging or Solaris Volume Manager transactional-volume logging – See “Creating DiskSuite Objects” in Solstice DiskSuite 4.2.1 User's Guide or “Transactional Volumes (Overview)” in Solaris Volume Manager Administration Guide for more information.
SPARC: VERITAS File System (VxFS) logging – See the mount_vxfs man page provided with VxFS software for more information.

The following table lists the file-system logging supported by each volume manager.

Table 1–6 Supported File System Logging Matrix


Volume Manager	Supported File System Logging
Solstice DiskSuite/Solaris Volume Manager	Solaris UFS logging, Solstice DiskSuite trans-metadevice logging or Solaris Volume Manager transactional-volume logging, VxFS logging
SPARC: VERITAS Volume Manager	Solaris UFS logging, VxFS logging

Consider the following points when you choose between Solaris UFS logging and Solstice DiskSuite trans-metadevice logging/Solaris Volume Manager transactional-volume logging:

Solaris Volume Manager transactional-volume logging (formerly Solstice DiskSuite trans-metadevice logging) is scheduled to be removed from the Solaris operating environment in an upcoming Solaris release. Solaris UFS logging provides the same capabilities but superior performance, as well as lower system administration requirements and overhead.
Solaris UFS log size – Solaris UFS logging always allocates the log from free space on the UFS file system. The size of the log depends on the size of the file system.
- On file systems less than 1 Gbyte, the log occupies 1 Mbyte.
- On file systems 1 Gbyte or greater, the log occupies 1 Mbyte per Gbyte on the file system, to a maximum of 64 Mbytes.
Log metadevice/transactional volume – A Solstice DiskSuite trans metadevice or Solaris Volume Manager transactional volume manages UFS logging. The logging device component of a trans metadevice or transactional volume is a metadevice or volume that you can mirror and stripe. You can create a maximum 1-Gbyte log size, although 64 Mbytes is sufficient for most file systems. The minimum log size is 1 Mbyte.
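The Solaris UFS log-size rule above reduces to a short calculation. The following Python sketch estimates the space that the log takes from a file system of a given size. The function name is hypothetical. The text does not say how fractional Gbyte sizes are rounded, so the sketch leaves fractional results unrounded.

```python
def ufs_log_size_mbytes(fs_size_gbytes):
    """Estimate the Solaris UFS log size in Mbytes for a file system
    of the given size in Gbytes."""
    if fs_size_gbytes < 1:
        # File systems smaller than 1 Gbyte get a 1-Mbyte log.
        return 1
    # 1 Mbyte of log per Gbyte of file system, to a maximum of 64 Mbytes.
    return min(fs_size_gbytes, 64)


for size in (0.5, 1, 10, 200):
    print(size, "Gbytes ->", ufs_log_size_mbytes(size), "Mbytes")
```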

Mirroring Guidelines

This section provides guidelines for planning the mirroring of your cluster configuration.

Guidelines for Mirroring Multihost Disks

Mirroring all multihost disks in a Sun Cluster configuration enables the configuration to tolerate single-disk failures. Sun Cluster software requires that you mirror all multihost disks across disk expansion units. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to disks.

Consider the following points when you mirror multihost disks.

Separate disk expansion units – Each submirror of a given mirror or plex should reside in a different multihost disk-expansion unit.
Disk space – Mirroring doubles the amount of necessary disk space.
Three-way mirroring – Solstice DiskSuite/Solaris Volume Manager software and VERITAS Volume Manager (VxVM) support three-way mirroring. However, Sun Cluster software requires only two-way mirroring.
Number of metadevices or volumes – Under Solstice DiskSuite/Solaris Volume Manager software, mirrors consist of other Solstice DiskSuite metadevices or Solaris Volume Manager volumes such as concatenations or stripes. Large configurations might contain a large number of metadevices or volumes.
Differing disk sizes – If you mirror to a disk of a different size, your mirror capacity is limited to the size of the smallest submirror or plex.

For more information about multihost disks, see “Multihost Disk Storage” in Sun Cluster Overview for Solaris OS and Sun Cluster Concepts Guide for Solaris OS.
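Several of the points above are small calculations that can be made while you fill in the planning worksheets. The following Python sketch shows the disk-space, capacity, and placement rules. The function names, the sizes, and the expansion-unit names are hypothetical.

```python
def raw_space_needed(usable_space, ways=2):
    """Raw disk space needed for a mirror. Two-way mirroring doubles
    the necessary disk space."""
    return usable_space * ways


def mirror_capacity(submirror_sizes):
    """Usable capacity of a mirror, limited by the smallest
    submirror or plex."""
    return min(submirror_sizes)


def on_separate_units(expansion_units):
    """Return True if every submirror or plex of one mirror resides
    in a different disk expansion unit."""
    return len(set(expansion_units)) == len(expansion_units)


print(raw_space_needed(36))               # space for 36 units of mirrored data
print(mirror_capacity([36, 73]))          # a 36 unit and a 73 unit submirror
print(on_separate_units(["unitA", "unitB"]))
```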

Guidelines for Mirroring the Root Disk

Add this planning information to the Local File System Layout Worksheet.

For maximum availability, mirror root (/), /usr, /var, /opt, and swap on the local disks. Under VxVM, you encapsulate the root disk and mirror the generated subdisks. However, Sun Cluster software does not require that you mirror the root disk.

Before you decide whether to mirror the root disk, consider the risks, complexity, cost, and service time for the various alternatives that concern the root disk. No single mirroring strategy works for all configurations. You might want to consider your local Sun service representative's preferred solution when you decide whether to mirror root.

Consider the following points when you decide whether to mirror the root disk.

Boot disk – You can set up the mirror to be a bootable root disk. You can then boot from the mirror if the primary boot disk fails.
Complexity – Mirroring the root disk adds complexity to system administration. Mirroring the root disk also complicates booting in single-user mode.
Backups – Regardless of whether you mirror the root disk, you also should perform regular backups of root. Mirroring alone does not protect against administrative errors. Only a backup plan enables you to restore files that have been accidentally altered or deleted.
Quorum devices – Do not use a disk that was configured as a quorum device to mirror a root disk.
Quorum – Under Solstice DiskSuite/Solaris Volume Manager software, in failure scenarios in which state database quorum is lost, you cannot reboot the system until maintenance is performed. See your Solstice DiskSuite/Solaris Volume Manager documentation for information about the state database and state database replicas.
Separate controllers – For highest availability, mirror the root disk on a separate controller.
Secondary root disk – With a mirrored root disk, the primary root disk can fail but work can continue on the secondary (mirror) root disk. Later, the primary root disk might return to service, for example, after a power cycle or transient I/O errors. Subsequent boots are then performed by using the primary root disk that is specified for the eeprom(1M) boot-device parameter. In this situation, no manual repair task occurs, but the drive starts working well enough to boot. With Solstice DiskSuite/Solaris Volume Manager, a resync does occur. A resync requires a manual step when the drive is returned to service.

If changes were made to any files on the secondary (mirror) root disk, they would not be reflected on the primary root disk during boot time. This condition would cause a stale submirror. For example, changes to the /etc/system file would be lost. With Solstice DiskSuite/Solaris Volume Manager, some administrative commands might have changed the /etc/system file while the primary root disk was out of service.

The boot program does not check whether the system is booting from a mirror or from an underlying physical device. The mirroring becomes active partway through the boot process, after the metadevices or volumes are loaded. Before this point, the system is therefore vulnerable to stale submirror problems.