Add this planning information to the Disk Device Group Configurations Worksheet and the Volume-Manager Configurations Worksheet. For Solstice DiskSuite or Solaris Volume Manager, also add this planning information to the Metadevices Worksheet (Solstice DiskSuite or Solaris Volume Manager).
This section provides guidelines for planning volume management of your cluster configuration.
Sun Cluster software uses volume-manager software to group disks into disk device groups which can then be administered as one unit. Sun Cluster software supports Solstice DiskSuite or Solaris Volume Manager software and VERITAS Volume Manager (VxVM) software that you install or use in the following ways.
Table 1–4 Supported Use of Volume Managers With Sun Cluster Software
| Volume-Manager Software | Requirements |
|---|---|
| Solstice DiskSuite or Solaris Volume Manager | You must install Solstice DiskSuite or Solaris Volume Manager software on all nodes of the cluster, regardless of whether you use VxVM on some nodes to manage disks. |
| SPARC: VxVM with the cluster feature | You must install and license VxVM with the cluster feature on all nodes of the cluster. |
| SPARC: VxVM without the cluster feature | You are required to install and license VxVM only on those nodes that are attached to storage devices that VxVM manages. |
| SPARC: Both Solstice DiskSuite or Solaris Volume Manager and VxVM | If you install both volume managers on the same node, you must use Solstice DiskSuite or Solaris Volume Manager software to manage disks that are local to each node. Local disks include the root disk. Use VxVM to manage all shared disks. |
See your volume-manager documentation and Installing and Configuring Solstice DiskSuite or Solaris Volume Manager Software or SPARC: Installing and Configuring VxVM Software for instructions about how to install and configure the volume-manager software. For more information about volume management in a cluster configuration, see the Sun Cluster Concepts Guide for Solaris OS.
Consider the following general guidelines when you configure your disks with volume-manager software:
Software RAID – Sun Cluster software does not support software RAID 5.
Mirrored multihost disks – You must mirror all multihost disks across disk expansion units. See Guidelines for Mirroring Multihost Disks for guidelines on mirroring multihost disks. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to devices.
Mirrored root – Mirroring the root disk ensures high availability, but such mirroring is not required. See Mirroring Guidelines for guidelines about deciding whether to mirror the root disk.
Unique naming – You might have local Solstice DiskSuite metadevices, local Solaris Volume Manager volumes, or VxVM volumes that are used as devices on which the /global/.devices/node@nodeid file systems are mounted. If so, the name of each local metadevice or local volume on which a /global/.devices/node@nodeid file system is to be mounted must be unique throughout the cluster.
Node lists – To ensure high availability of a disk device group, make its node lists of potential masters and its failback policy identical to any associated resource group. Or, if a scalable resource group uses more nodes than its associated disk device group, make the scalable resource group's node list a superset of the disk device group's node list. See the resource group planning information in the Sun Cluster Data Services Planning and Administration Guide for Solaris OS for information about node lists.
Multihost disks – You must connect, or port, all devices that are used to construct a device group to all of the nodes that are configured in the node list for that device group. Solstice DiskSuite or Solaris Volume Manager software can automatically check for this connection at the time that devices are added to a disk set. However, configured VxVM disk groups do not have an association to any particular set of nodes.
Hot spare disks – You can use hot spare disks to increase availability, but hot spare disks are not required.
See your volume-manager documentation for disk layout recommendations and any additional restrictions.
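The node-list guideline above can be checked on paper before you configure anything. The following Python function is a minimal sketch of that check, not part of Sun Cluster software. The function name and its arguments are hypothetical, and the sketch assumes that "identical" means the same nodes in the same order. It does not compare failback policies.

```python
def node_lists_consistent(device_group_nodes, resource_group_nodes, scalable=False):
    """Return True if the planned node lists follow the node-list guideline.

    device_group_nodes: planned node list of the disk device group.
    resource_group_nodes: planned node list of the associated resource group.
    scalable: True if the resource group is a scalable resource group.
    """
    # Assumption of this sketch: identical means same nodes, same order.
    if list(device_group_nodes) == list(resource_group_nodes):
        return True
    # A scalable resource group may use more nodes than the device group,
    # as long as its node list contains every device-group node.
    return scalable and set(resource_group_nodes) >= set(device_group_nodes)


if __name__ == "__main__":
    print(node_lists_consistent(["phys-1", "phys-2"], ["phys-1", "phys-2"]))
    print(node_lists_consistent(["phys-1", "phys-2"],
                                ["phys-1", "phys-2", "phys-3"], scalable=True))
```

The node names in the usage lines are placeholders only.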
Consider the following points when you plan Solstice DiskSuite or Solaris Volume Manager configurations:
Local metadevice names or volume names – The name of each local Solstice DiskSuite metadevice or Solaris Volume Manager volume on which a global-devices file system, /global/.devices/node@nodeid, is mounted must be unique throughout the cluster. Also, the name cannot be the same as any device-ID name.
Dual-string mediators – Each disk set configured with exactly two disk strings and mastered by exactly two nodes must have Solstice DiskSuite or Solaris Volume Manager mediators configured for the disk set. A disk string consists of a disk enclosure, its physical disks, cables from the enclosure to the node(s), and the interface adapter cards. Observe the following rules to configure dual-string mediators:
You must configure each disk set with exactly two nodes that act as mediator hosts.
You must use the same two nodes for all disk sets that require mediators. Those two nodes must master those disk sets.
Mediators cannot be configured for disk sets that do not meet the two-string and two-host requirements.
See the mediator(7D) man page for details.
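The dual-string mediator rules above can be expressed as a planning check. The following Python sketch is illustrative only. The dictionary layout (`strings`, `hosts`, `mediators`) is a hypothetical way to describe a planned disk set and is not a Solstice DiskSuite or Solaris Volume Manager data format.

```python
def check_mediators(disk_sets):
    """Return a list of rule violations for the planned disk sets.

    disk_sets maps a disk set name to a dict with these hypothetical keys:
      strings   - number of disk strings in the disk set
      hosts     - set of nodes that master the disk set
      mediators - set of planned mediator hosts (empty if none)
    """
    problems = []
    shared_pair = None  # mediator hosts used by the first disk set that needs them
    for name, ds in sorted(disk_sets.items()):
        hosts = frozenset(ds["hosts"])
        mediators = frozenset(ds["mediators"])
        if ds["strings"] == 2 and len(hosts) == 2:
            # Exactly two strings and two hosts: mediators are required.
            if len(mediators) != 2:
                problems.append(f"{name}: needs exactly two mediator hosts")
                continue
            if mediators != hosts:
                problems.append(f"{name}: mediator hosts must master the disk set")
            if shared_pair is None:
                shared_pair = mediators
            elif mediators != shared_pair:
                problems.append(f"{name}: mediator hosts differ from other disk sets")
        elif mediators:
            # Any other layout: mediators cannot be configured.
            problems.append(f"{name}: mediators are not allowed for this layout")
    return problems
```

An empty result means that the plan satisfies the rules as this sketch interprets them. The mediator(7D) man page remains the authority.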
/kernel/drv/md.conf settings – All Solstice DiskSuite metadevices or Solaris 9 Solaris Volume Manager volumes used by each disk set are created in advance, at reconfiguration boot time. This reconfiguration is based on the configuration parameters that exist in the /kernel/drv/md.conf file.
With the Solaris 10 release, Solaris Volume Manager has been enhanced to configure volumes dynamically. You no longer need to edit the nmd and the md_nsets parameters in the /kernel/drv/md.conf file. New volumes are dynamically created, as needed.
You must modify the nmd and md_nsets fields as follows to support a Sun Cluster configuration on the Solaris 8 or Solaris 9 OS:
All cluster nodes must have identical /kernel/drv/md.conf files, regardless of the number of disk sets that are served by each node. Failure to follow this guideline can result in serious Solstice DiskSuite or Solaris Volume Manager errors and possible loss of data.
md_nsets – The md_nsets field defines the total number of disk sets that can be created for a system to meet the needs of the entire cluster. Set the value of md_nsets to the expected number of disk sets in the cluster plus one additional disk set. Solstice DiskSuite or Solaris Volume Manager software uses the additional disk set to manage the private disks on the local host.
The maximum number of disk sets that are allowed per cluster is 32. This number allows for 31 disk sets for general use plus one disk set for private disk management. The default value of md_nsets is 4.
nmd – The nmd field defines the highest predicted value of any metadevice or volume name that will exist in the cluster. For example, if the highest value of the metadevice or volume names that are used in the first 15 disk sets of a cluster is 10, but the highest value of the metadevice or volume in the 16th disk set is 1000, set the value of nmd to at least 1000. Also, the value of nmd must be large enough to ensure that enough numbers exist for each device-ID name. The number must also be large enough to ensure that each local metadevice name or local volume name can be unique throughout the cluster.
The highest allowed value of a metadevice or volume name per disk set is 8192. The default value of nmd is 128.
Set these fields at installation time to allow for all predicted future expansion of the cluster. Increasing the value of these fields after the cluster is in production is time consuming, because the change requires a reconfiguration reboot of each node. Raising these values later also increases the possibility of inadequate space allocation in the root (/) file system to create all of the requested devices.
At the same time, keep the value of the nmd field and the md_nsets field as low as possible. Memory structures exist for all possible devices as determined by nmd and md_nsets, even if you have not created those devices. For optimal performance, keep the value of nmd and md_nsets only slightly higher than the number of metadevices or volumes you plan to use.
See System and Startup Files in Solstice DiskSuite 4.2.1 Reference Guide (Solaris 8) or System Files and Startup Files in Solaris Volume Manager Administration Guide (Solaris 9 or Solaris 10) for more information about the md.conf file.
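The sizing rules for md_nsets and nmd can be worked through as arithmetic. The following Python sketch only applies the limits and defaults stated above and does not read or write the /kernel/drv/md.conf file. The function name and arguments are hypothetical. As an assumption of this sketch, a value is never lowered below its default. The sketch also does not account for device-ID names.

```python
# Limits and defaults quoted in the text above.
MAX_MD_NSETS = 32      # 31 disk sets for general use plus 1 for private disks
MAX_NMD = 8192         # highest allowed metadevice or volume name per disk set
DEFAULT_MD_NSETS = 4
DEFAULT_NMD = 128


def plan_md_conf(expected_disk_sets, highest_name_per_set):
    """Return (md_nsets, nmd) for the planned cluster.

    expected_disk_sets: disk sets expected in the whole cluster, including growth.
    highest_name_per_set: highest metadevice or volume number planned in each
        disk set, for example 1000 for a volume named d1000.
    """
    # One additional disk set manages the private disks on the local host.
    md_nsets = expected_disk_sets + 1
    if md_nsets > MAX_MD_NSETS:
        raise ValueError("a cluster allows at most 31 general-use disk sets")
    # nmd must cover the highest name used in any disk set.
    nmd = max(highest_name_per_set, default=0)
    if nmd > MAX_NMD:
        raise ValueError("metadevice or volume names cannot exceed 8192")
    # Assumption of this sketch: keep the default when it is already sufficient.
    return max(md_nsets, DEFAULT_MD_NSETS), max(nmd, DEFAULT_NMD)
```

For the example in the text, 15 disk sets whose highest name is 10 and a 16th disk set whose highest name is 1000, `plan_md_conf(16, [10] * 15 + [1000])` yields md_nsets of 17 and nmd of 1000. Every cluster node would then use the same pair of values.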
Consider the following points when you plan VERITAS Volume Manager (VxVM) configurations.
Enclosure-Based Naming – If you use Enclosure-Based Naming of devices, ensure that you use consistent device names on all cluster nodes that share the same storage. VxVM does not coordinate these names, so the administrator must ensure that VxVM assigns the same names to the same devices from different nodes. Failure to assign consistent names does not interfere with correct cluster behavior. However, inconsistent names greatly complicate cluster administration and greatly increase the possibility of configuration errors, potentially leading to loss of data.
Root disk group – As of VxVM 4.0, the creation of a root disk group is optional.
A root disk group can be created on the following disks:
The root disk, which must be encapsulated
One or more local nonroot disks, which you can encapsulate or initialize
A combination of root and local nonroot disks
The root disk group must be local to the node.
Simple root disk groups – Simple root disk groups (rootdg created on a single slice of the root disk) are not supported as disk types with VxVM on Sun Cluster software. This is a general VxVM software restriction.
Encapsulation – Disks to be encapsulated must have two disk-slice table entries free.
Number of volumes – Estimate the maximum number of volumes any given disk device group can use at the time the disk device group is created.
If the number of volumes is less than 1000, you can use default minor numbering.
If the number of volumes is 1000 or greater, you must carefully plan the way in which minor numbers are assigned to disk device group volumes. No two disk device groups can have overlapping minor number assignments.
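The minor-number guideline above amounts to checking that planned ranges do not overlap. The following Python sketch models each disk device group as a base minor number plus a maximum volume count. That model and all names in the code are hypothetical planning aids, not a VxVM interface.

```python
from itertools import combinations

# Threshold from the text: below this, default minor numbering is sufficient.
DEFAULT_NUMBERING_LIMIT = 1000


def needs_minor_number_plan(max_volumes):
    """Return True if a disk device group needs planned minor numbers."""
    return max_volumes >= DEFAULT_NUMBERING_LIMIT


def overlapping_groups(ranges):
    """Return pairs of disk device groups whose minor-number ranges overlap.

    ranges maps a group name to (base_minor, max_volumes). A group is assumed
    to occupy minor numbers base_minor through base_minor + max_volumes - 1.
    """
    overlaps = []
    for (a, (a_base, a_count)), (b, (b_base, b_count)) in combinations(
        sorted(ranges.items()), 2
    ):
        # Two half-open ranges overlap when each starts before the other ends.
        if a_base < b_base + b_count and b_base < a_base + a_count:
            overlaps.append((a, b))
    return overlaps
```

A plan such as `{"dg1": (1000, 1000), "dg2": (2000, 1000)}` produces no overlaps, because the first group ends at minor number 1999.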
Dirty Region Logging – The use of Dirty Region Logging (DRL) decreases volume recovery time after a node failure. Using DRL might decrease I/O throughput.
Dynamic Multipathing (DMP) – The use of DMP alone to manage multiple I/O paths per node to the shared storage is not supported. The use of DMP is supported only in the following configurations:
A single I/O path per node to the cluster's shared storage.
A supported multipathing solution, such as Sun Traffic Manager, EMC PowerPath, or Hitachi HDLM, that manages multiple I/O paths per node to the shared cluster storage.
See your VxVM installation documentation for additional information.
Logging is required for UFS and VxFS cluster file systems. This requirement does not apply to QFS shared file systems. Sun Cluster software supports the following choices of file-system logging:
Solaris UFS logging – See the mount_ufs(1M) man page for more information.
Solstice DiskSuite trans-metadevice logging or Solaris Volume Manager transactional-volume logging – See Chapter 2, Creating DiskSuite Objects, in Solstice DiskSuite 4.2.1 User’s Guide or Transactional Volumes (Overview) in Solaris Volume Manager Administration Guide for more information. Transactional volumes are no longer valid as of the Solaris 10 release of Solaris Volume Manager.
SPARC: VERITAS File System (VxFS) logging – See the mount_vxfs man page provided with VxFS software for more information.
The following table lists the file-system logging supported by each volume manager.
Table 1–5 Supported File System Logging Matrix
Consider the following points when you choose between Solaris UFS logging and Solstice DiskSuite trans-metadevice logging or Solaris Volume Manager transactional-volume logging for UFS cluster file systems:
Solaris Volume Manager transactional-volume logging (formerly Solstice DiskSuite trans-metadevice logging) is scheduled to be removed from the Solaris OS in an upcoming Solaris release. Solaris UFS logging provides the same capabilities but superior performance, as well as lower system administration requirements and overhead.
Solaris UFS log size – Solaris UFS logging always allocates the log from free space on the UFS file system. The size of the log depends on the size of the file system.
On file systems less than 1 Gbyte, the log occupies 1 Mbyte.
On file systems 1 Gbyte or greater, the log occupies 1 Mbyte per Gbyte on the file system, to a maximum of 64 Mbytes.
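The UFS log-size rule above can be written as a small calculation, which is useful when you estimate free space on a planned file system. The following Python sketch is illustrative. The function name is hypothetical, and rounding a fractional size down to whole Gbytes is an assumption, because the text does not state how partial Gbytes are treated.

```python
def ufs_log_size_mbytes(fs_size_gbytes):
    """Estimate the Solaris UFS log size, in Mbytes, for a file system size."""
    if fs_size_gbytes <= 0:
        raise ValueError("file system size must be positive")
    if fs_size_gbytes < 1:
        # File systems smaller than 1 Gbyte get a 1-Mbyte log.
        return 1
    # 1 Mbyte per Gbyte, capped at 64 Mbytes.
    # Assumption: a fractional size is rounded down to whole Gbytes.
    return min(int(fs_size_gbytes), 64)
```

Under these assumptions, a 10-Gbyte file system gets a 10-Mbyte log, and any file system of 64 Gbytes or more gets the 64-Mbyte maximum.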
Log metadevice/transactional volume – A Solstice DiskSuite trans metadevice or Solaris Volume Manager transactional volume manages UFS logging. The logging device component of a trans metadevice or transactional volume is a metadevice or volume that you can mirror and stripe. You can create a maximum 1-Gbyte log size, although 64 Mbytes is sufficient for most file systems. The minimum log size is 1 Mbyte.
This section provides guidelines for planning the mirroring of your cluster configuration.
Mirroring all multihost disks in a Sun Cluster configuration enables the configuration to tolerate single-device failures. Sun Cluster software requires that you mirror all multihost disks across expansion units. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to devices.
Consider the following points when you mirror multihost disks:
Separate disk expansion units – Each submirror of a given mirror or plex should reside in a different multihost expansion unit.
Disk space – Mirroring doubles the amount of necessary disk space.
Three-way mirroring – Solstice DiskSuite or Solaris Volume Manager software and VERITAS Volume Manager (VxVM) software support three-way mirroring. However, Sun Cluster software requires only two-way mirroring.
Differing device sizes – If you mirror to a device of a different size, your mirror capacity is limited to the size of the smallest submirror or plex.
For more information about multihost disks, see Multihost Disk Storage in Sun Cluster Overview for Solaris OS and Sun Cluster Concepts Guide for Solaris OS.
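The disk-space and device-size points above reduce to simple arithmetic. The following Python sketch is a planning aid only. The function name is hypothetical, and sizes are assumed to be in Gbytes.

```python
def mirror_plan(submirror_sizes_gbytes):
    """Return (usable_gbytes, raw_gbytes) for one mirror.

    submirror_sizes_gbytes: size of each submirror or plex in the mirror.
    """
    if len(submirror_sizes_gbytes) < 2:
        raise ValueError("a mirror needs at least two submirrors or plexes")
    # Capacity is limited to the smallest submirror or plex.
    usable = min(submirror_sizes_gbytes)
    # Every submirror consumes disk space, so two-way mirroring of equal
    # devices doubles the space needed.
    raw = sum(submirror_sizes_gbytes)
    return usable, raw
```

For example, mirroring a 36-Gbyte device to a 72-Gbyte device consumes 108 Gbytes of raw space but provides only 36 Gbytes of usable capacity.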
Add this planning information to the Local File System Layout Worksheet.
For maximum availability, mirror root (/), /usr, /var, /opt, and swap on the local disks. Under VxVM, you encapsulate the root disk and mirror the generated subdisks. However, Sun Cluster software does not require that you mirror the root disk.
Before you decide whether to mirror the root disk, consider the risks, complexity, cost, and service time for the various alternatives that concern the root disk. No single mirroring strategy works for all configurations. You might want to consider your local Sun service representative's preferred solution when you decide whether to mirror root.
See your volume-manager documentation and Installing and Configuring Solstice DiskSuite or Solaris Volume Manager Software or SPARC: Installing and Configuring VxVM Software for instructions about how to mirror the root disk.
Consider the following points when you decide whether to mirror the root disk.
Boot disk – You can set up the mirror to be a bootable root disk. You can then boot from the mirror if the primary boot disk fails.
Complexity – Mirroring the root disk adds complexity to system administration. It also complicates booting in single-user mode.
Backups – Regardless of whether you mirror the root disk, you also should perform regular backups of root. Mirroring alone does not protect against administrative errors. Only a backup plan enables you to restore files that have been accidentally altered or deleted.
Quorum devices – Do not use a disk that was configured as a quorum device to mirror a root disk.
Quorum – Under Solstice DiskSuite or Solaris Volume Manager software, in failure scenarios in which state database quorum is lost, you cannot reboot the system until maintenance is performed. See your Solstice DiskSuite or Solaris Volume Manager documentation for information about the state database and state database replicas.
Separate controllers – Highest availability includes mirroring the root disk on a separate controller.
Secondary root disk – With a mirrored root disk, the primary root disk can fail but work can continue on the secondary (mirror) root disk. Later, the primary root disk might return to service, for example, after a power cycle or transient I/O errors. Subsequent boots are then performed by using the primary root disk that is specified for the eeprom(1M) boot-device parameter. In this situation, no manual repair task occurs, but the drive starts working well enough to boot. With Solstice DiskSuite or Solaris Volume Manager software, a resync does occur. A resync requires a manual step when the drive is returned to service.
If changes were made to any files on the secondary (mirror) root disk, they would not be reflected on the primary root disk during boot time. This condition would cause a stale submirror. For example, changes to the /etc/system file would be lost. With Solstice DiskSuite or Solaris Volume Manager software, some administrative commands might have changed the /etc/system file while the primary root disk was out of service.
The boot program does not check whether the system is booting from a mirror or from an underlying physical device. The mirroring becomes active partway through the boot process, after the metadevices or volumes are loaded. Before this point, the system is therefore vulnerable to stale submirror problems.