This chapter provides planning information and guidelines for installing a Sun Cluster configuration.
The following table shows where to find instructions for various Sun Cluster software installation tasks and the order in which you should perform them.
Table 1–1 Location of Sun Cluster Software Installation Task Information

| Task | For Instructions, Go To … |
|---|---|
| Set up cluster hardware. | Sun Cluster 3.x Hardware Administration Manual; documentation shipped with your server and storage devices |
| Plan cluster software installation. | This chapter; “Sun Cluster Installation and Configuration Worksheets” in Sun Cluster 3.1 Release Notes |
| Install a new cluster or add nodes to an existing cluster: install the Solaris operating environment, Cluster Control Panel (optional), SunPlex Manager (optional), cluster framework, and data service software packages. | |
| Install and configure Solstice DiskSuite/Solaris Volume Manager software. | Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software; Solstice DiskSuite/Solaris Volume Manager documentation |
| Install and configure VERITAS Volume Manager (VxVM) software. | Installing and Configuring VxVM Software; VxVM documentation |
| Configure cluster framework software and optionally install and configure Sun Management Center. | |
| Plan, install, and configure resource groups and data services. | Sun Cluster 3.1 Data Service Planning and Administration Guide |
| Develop custom data services. | |
| Upgrade from Sun Cluster 3.0 to Sun Cluster 3.1 software (Solaris operating environment, cluster framework, data services, and volume manager software). | Upgrading From Sun Cluster 3.0 to Sun Cluster 3.1 Software; Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software or Installing and Configuring VxVM Software; volume manager documentation |
This section provides guidelines for planning Solaris software installation in a cluster configuration. For more information about Solaris software, see the Solaris installation documentation.
You can install Solaris software from a local CD-ROM or from a network installation server by using the JumpStart™ installation method. In addition, Sun Cluster software provides a custom method for installing both the Solaris operating environment and Sun Cluster software by using JumpStart. If you are installing several cluster nodes, consider a network installation.
See How to Install Solaris and Sun Cluster Software (JumpStart) for details about the scinstall JumpStart installation method. See the Solaris installation documentation for details about standard Solaris installation methods.
Sun Cluster 3.1 software requires at least the Solaris End User System Support software group. However, other components of your cluster configuration might have their own Solaris software requirements as well. Consider the following information when you decide which Solaris software group you will install.
See your server documentation for any Solaris software requirements. For example, Sun Enterprise 10000 servers require the Entire Distribution + OEM software group.
If you install the Solaris 8 10/01 operating environment and intend to use SCI-PCI adapters or the Remote Shared Memory Application Programming Interface (RSMAPI), ensure that you install the RSMAPI software packages (SUNWrsm, SUNWrsmx, SUNWrsmo, and SUNWrsmox). The Solaris Developer System Support software group or higher includes these packages. If you install the End User System Support software group, use the pkgadd(1M) command to install these RSMAPI packages before you install Sun Cluster software. See the Solaris 8 10/01 section (3RSM) man pages for information about using the RSMAPI.
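For example, you might run pkgadd as follows. This is a sketch only: the CD-ROM path is an assumption, so adjust it for your installation media or network install image.

```
# Install the RSMAPI packages before Sun Cluster software (CD path assumed)
pkgadd -d /cdrom/cdrom0/Solaris_8/Product SUNWrsm SUNWrsmx SUNWrsmo SUNWrsmox
```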
You might need to install other Solaris software packages that are not part of the End User System Support software group, for example, the Apache HTTP server packages. Third-party software, such as ORACLE®, might also require additional Solaris packages. See your third-party documentation for any Solaris software requirements.
Add this information to “Local File Systems With Mirrored Root Worksheet” in Sun Cluster 3.1 Release Notes or “Local File Systems with Non-Mirrored Root Worksheet” in Sun Cluster 3.1 Release Notes.
When you install the Solaris operating environment, ensure that you create the required Sun Cluster partitions and that all partitions meet minimum space requirements.
swap – The amount of swap space allocated for Solaris and Sun Cluster software combined must be no less than 750 Mbytes. For best results, add at least 512 Mbytes for Sun Cluster software to the amount required by the Solaris operating environment. In addition, allocate any additional amount required by applications that will run on the cluster node.
/globaldevices – Create a 512-Mbyte file system that will be used by the scinstall(1M) utility for global devices.
Volume manager – Create a 20-Mbyte partition for volume manager use on a slice at the end of the disk (slice 7). If your cluster uses VERITAS Volume Manager (VxVM) and you intend to encapsulate the root disk, you need two unused slices available for use by VxVM.
To meet these requirements, you must customize the partitioning if you are performing interactive installation of the Solaris operating environment.
See the following guidelines for additional partition planning information.
As with any other system running the Solaris operating environment, you can configure the root (/), /var, /usr, and /opt directories as separate file systems, or you can include all the directories in the root (/) file system. The following describes the software contents of the root (/), /var, /usr, and /opt directories in a Sun Cluster configuration. Consider this information when you plan your partitioning scheme.
root (/) – The Sun Cluster software itself occupies less than 40 Mbytes of space in the root (/) file system. Solstice DiskSuite/Solaris Volume Manager software requires less than 5 Mbytes, and VxVM software requires less than 15 Mbytes. For best results, you need to configure ample additional space and inode capacity for the creation of both block special devices and character special devices used by either Solstice DiskSuite/Solaris Volume Manager or VxVM software, especially if a large number of shared disks are in the cluster. Therefore, add at least 100 Mbytes to the amount of space you would normally allocate for your root (/) file system.
/var – The Sun Cluster software occupies a negligible amount of space in the /var file system at installation time. However, you need to set aside ample space for log files. Also, more messages might be logged on a clustered node than would be found on a typical standalone server. Therefore, allow at least 100 Mbytes for the /var file system.
/usr – Sun Cluster software occupies less than 25 Mbytes of space in the /usr file system. Solstice DiskSuite/Solaris Volume Manager and VxVM software each require less than 15 Mbytes.
/opt – Sun Cluster framework software uses less than 2 Mbytes in the /opt file system. However, each Sun Cluster data service might use between 1 Mbyte and 5 Mbytes. Solstice DiskSuite/Solaris Volume Manager software does not use any space in the /opt file system. VxVM software can use over 40 Mbytes if all of its packages and tools are installed. In addition, most database and applications software is installed in the /opt file system. If you use Sun Management Center software to monitor the cluster, you need an additional 25 Mbytes of space on each node to support the Sun Management Center agent and Sun Cluster module packages.
The amount of swap space allocated for Solaris and Sun Cluster software combined must be no less than 750 Mbytes. For best results, add at least 512 Mbytes for Sun Cluster software to the amount required by the Solaris operating environment. In addition, allocate additional swap space for any third-party applications you install on the node that also have swap requirements. See your third-party application documentation for any swap requirements.
Sun Cluster software requires that you set aside a special file system on one of the local disks for use in managing global devices. This file system must be separate, as it will later be mounted as a cluster file system. Name this file system /globaldevices, which is the default name recognized by the scinstall(1M) command. The scinstall command later renames the file system /global/.devices/node@nodeid, where nodeid represents the number assigned to a node when it becomes a cluster member, and the original /globaldevices mount point is removed. The /globaldevices file system must have ample space and inode capacity for creating both block special devices and character special devices, especially if a large number of disks are in the cluster. A file system size of 512 Mbytes should be more than enough for most cluster configurations.
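For example, if slice 3 of the boot disk holds this file system, you might create and register it as follows. The device name is an assumption for illustration; use the slice you actually reserved.

```
# Create the file system (device name is an assumption)
newfs /dev/rdsk/c0t0d0s3

# /etc/vfstab entry so that the file system mounts at boot:
/dev/dsk/c0t0d0s3  /dev/rdsk/c0t0d0s3  /globaldevices  ufs  2  yes  -
```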
If you use Solstice DiskSuite/Solaris Volume Manager software, you must set aside a slice on the root disk for use in creating the state database replica. Specifically, set aside a slice for this purpose on each local disk. If you have only one local disk on a node, you might need to create three state database replicas in the same slice for Solstice DiskSuite/Solaris Volume Manager software to function properly. See the Solstice DiskSuite/Solaris Volume Manager documentation for more information.
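As a sketch, to place three replicas on the reserved slice of a node's only local disk (the slice name is an assumption):

```
# Create three state database replicas on slice 7 (-f forces initial creation)
metadb -a -f -c 3 c0t0d0s7
```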
If you use VxVM and you intend to encapsulate the root disk, you need two unused slices available for use by VxVM, as well as some additional unassigned free space at either the beginning or the end of the disk. See the VxVM documentation for more information about root disk encapsulation.
Table 1–2 shows a partitioning scheme for a cluster node that has less than 750 Mbytes of physical memory. This node will be installed with the Solaris End User System Support software group, Sun Cluster software, and the Sun Cluster HA for NFS data service. The last slice on the disk, slice 7, is allocated with a small amount of space for volume manager use.
This layout allows for the use of either Solstice DiskSuite/Solaris Volume Manager software or VxVM. If you use Solstice DiskSuite/Solaris Volume Manager software, you use slice 7 for the state database replica. If you use VxVM, you later free slice 7 by assigning it a zero length. This layout provides the necessary two free slices, 4 and 7, and it provides for unused space at the end of the disk.
Table 1–2 Example File System Allocation

| Slice | Contents | Allocation | Description |
|---|---|---|---|
| 0 | / | 6.75 Gbytes | Remaining free space on the disk after allocating space to slices 1 through 7. Used for the Solaris operating environment software, Sun Cluster software, data services software, volume manager software, Sun Management Center agent and Sun Cluster module agent packages, root file systems, and database and application software. |
| 1 | swap | 1 Gbyte | 512 Mbytes for the Solaris operating environment software. 512 Mbytes for Sun Cluster software. |
| 2 | overlap | 8.43 Gbytes | The entire disk. |
| 3 | /globaldevices | 512 Mbytes | The Sun Cluster software later assigns this slice a different mount point and mounts it as a cluster file system. |
| 4 | unused | - | Available as a free slice for encapsulating the root disk under VxVM. |
| 5 | unused | - | - |
| 6 | unused | - | - |
| 7 | volume manager | 20 Mbytes | Used by Solstice DiskSuite/Solaris Volume Manager software for the state database replica, or used by VxVM for installation after you free the slice. |
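If you use the scinstall JumpStart installation method, a custom JumpStart profile can express this layout. The following sketch is illustrative only: the slice assignments and sizes are assumptions taken from Table 1–2, and you should adjust them for your disks and memory.

```
# Example JumpStart profile implementing the Table 1-2 layout
# (sizes in Mbytes; slice assignments are assumptions for illustration)
install_type    initial_install
system_type     standalone
partitioning    explicit
cluster         SUNWCuser                  # End User System Support group
filesys         rootdisk.s1  1024  swap    # 512 MB Solaris + 512 MB Sun Cluster
filesys         rootdisk.s3  512   /globaldevices
filesys         rootdisk.s7  20            # volume manager slice
filesys         rootdisk.s0  free  /       # remaining space for root (/)
```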
This section provides guidelines for planning and preparing for Sun Cluster software installation. For detailed information about Sun Cluster components, see the Sun Cluster 3.1 Concepts Guide.
Ensure that you have any necessary license certificates available before you begin software installation. Sun Cluster software does not require a license certificate, but each node installed with Sun Cluster software must be covered under your Sun Cluster software license agreement.
For licensing requirements for volume manager software and applications software, see the installation documentation for those products.
After installing each software product, you must also install any required patches. For information about current required patches, see “Patches and Required Firmware Levels” in Sun Cluster 3.1 Release Notes or consult your Sun service provider. See “Patching Sun Cluster Software and Firmware” in Sun Cluster 3.1 System Administration Guide for general guidelines and procedures for applying patches.
You must set up a number of IP addresses for various Sun Cluster components, depending on your cluster configuration. Each node in the cluster configuration must have at least one public network connection to the same set of public subnets.
The following table lists the components that need IP addresses assigned to them. Add these IP addresses to any naming services used. Also add these IP addresses to the local /etc/inet/hosts file on each cluster node after you install Solaris software.
Table 1–3 Sun Cluster Components That Use IP Addresses
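For example, entries in each node's /etc/inet/hosts file might look like the following. All hostnames and addresses here are illustrative assumptions.

```
# /etc/inet/hosts excerpt (hostnames and addresses are assumptions)
192.168.100.41   phys-schost-1     # cluster node 1
192.168.100.42   phys-schost-2     # cluster node 2
192.168.100.50   schost-nfs-lh     # logical address used by a data service
192.168.100.60   admincon          # administrative console
```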
You must have console access to all cluster nodes. If you install Cluster Control Panel software on your administrative console, you must provide the hostname of the console-access device used to communicate with the cluster nodes. A terminal concentrator can be used to communicate between the administrative console and the cluster node consoles. A Sun Enterprise 10000 server uses a System Service Processor (SSP) instead of a terminal concentrator. A Sun Fire™ server uses a system controller. For more information about console access, see the Sun Cluster 3.1 Concepts Guide.
Each data service resource group that uses a logical address must have a hostname specified for each public network from which the logical address can be accessed. See the Sun Cluster 3.1 Data Service Planning and Administration Guide for information and “Sun Cluster Installation and Configuration Worksheets” in Sun Cluster 3.1 Data Service Release Notes for worksheets for planning resource groups. For more information about data services and resources, also see the Sun Cluster 3.1 Concepts Guide.
This section provides guidelines for the Sun Cluster components that you configure during installation.
Add this planning information to the “Cluster and Node Names Worksheet” in Sun Cluster 3.1 Release Notes.
You specify a name for the cluster during Sun Cluster installation. The cluster name should be unique throughout the enterprise.
Add this planning information to the “Cluster and Node Names Worksheet” in Sun Cluster 3.1 Release Notes. Information for most other worksheets is grouped by node name.
The node name is the name you assign to a machine when you install the Solaris operating environment. During Sun Cluster installation, you specify the names of all nodes that you are installing as a cluster.
Add this planning information to the “Cluster and Node Names Worksheet” in Sun Cluster 3.1 Release Notes.
Sun Cluster software uses the private network for internal communication between nodes. Sun Cluster requires at least two connections to the cluster interconnect on the private network. You specify the private network address and netmask when you install Sun Cluster software on the first node of the cluster. You can either accept the default private network address (172.16.0.0) and netmask (255.255.0.0) or type different choices if the default network address is already in use elsewhere in the enterprise.
After you have successfully installed the node as a cluster member, you cannot change the private network address and netmask.
If you specify a private network address other than the default, it must meet the following requirements:
Use zeroes for the last two octets of the address.
Follow the guidelines in RFC 1597 for network address assignments.
See “Planning Your TCP/IP Network” in System Administration Guide, Volume 3 (Solaris 8) or “Planning Your TCP/IP Network (Task)” in System Administration Guide: IP Services (Solaris 9) for instructions on how to contact the InterNIC to obtain copies of RFCs.
If you specify a netmask other than the default, it must meet the following requirements:
Minimally mask all bits given in the private network address
Have no “holes”
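For example, the following netmasks do and do not satisfy these requirements for the default private network address (illustrative values only):

```
# Private network address 172.16.0.0
# Valid:   255.255.0.0     (the default; masks all address bits, contiguous)
# Valid:   255.255.248.0   (masks more bits than the address requires, no holes)
# Invalid: 255.255.0.255   (contains a "hole" of unmasked bits)
# Invalid: 255.0.0.0       (does not mask the 16 in 172.16.0.0)
```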
Add this planning information to the “Cluster and Node Names Worksheet” in Sun Cluster 3.1 Release Notes.
The private hostname is the name used for internode communication over the private network interface. Private hostnames are automatically created during Sun Cluster installation and follow the naming convention clusternodenodeid-priv, where nodeid is the internal node ID number. During Sun Cluster installation, this node ID number is automatically assigned to each node when it becomes a cluster member. After installation, you can rename private hostnames by using the scsetup(1M) utility.
Add this planning information to the “Cluster Interconnect Worksheet” in Sun Cluster 3.1 Release Notes.
The cluster interconnects provide the hardware pathways for private network communication between cluster nodes. Each interconnect consists of a cable connected between two transport adapters, a transport adapter and a transport junction, or two transport junctions. During Sun Cluster installation, you specify the following configuration information for two cluster interconnects.
Transport adapters – For the transport adapters, such as ports on network interfaces, specify the transport adapter names and transport type. If your configuration is a two-node cluster, you also specify whether your interconnect is direct connected (adapter to adapter) or uses a transport junction. If your two-node cluster is direct connected, you can still specify a transport junction for the interconnect. If you specify a transport junction, it will be easier to add another node to the cluster in the future.
Transport junctions – If you use transport junctions, such as a network switch, specify a transport junction name for each interconnect. You can use the default name switchN, where N is a number automatically assigned during installation, or create other names.
Also specify the junction port name, or accept the default name. The default port name is the same as the internal node ID number of the node that hosts the adapter end of the cable. However, you cannot use the default port name for certain adapter types, such as SCI.
Clusters with three or more nodes must use transport junctions. Direct connection between cluster nodes is supported only for two-node clusters.
You can configure additional private network connections after installation by using the scsetup(1M) utility.
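For reference, the scsetup utility drives the scconf(1M) command. The following non-interactive sketch adds interconnect components; the adapter, node, and junction names are assumptions.

```
# Add a transport adapter on one node (adapter and node names assumed)
scconf -a -A trtype=dlpi,name=qfe1,node=phys-schost-1

# Add a transport junction (network switch)
scconf -a -B type=switch,name=switch1

# Cable the adapter to port 1 of the junction
scconf -a -m endpoint=phys-schost-1:qfe1,endpoint=switch1@1
```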
For more information about the cluster interconnect, see the Sun Cluster 3.1 Concepts Guide.
Add this planning information to the “Public Networks Worksheet” in Sun Cluster 3.1 Release Notes.
Public networks communicate outside the cluster. Consider the following points when you plan your public network configuration.
Public networks and the private network (cluster interconnect) must use separate adapters.
You must have at least one public network that is connected to all cluster nodes.
You can have as many additional public network connections as your hardware configuration allows.
The local-mac-address? variable must use the default value true for Ethernet adapters. Sun Cluster 3.1 software does not support a local-mac-address? value of false for Ethernet adapters. This requirement is a change from Sun Cluster 3.0, which did require a local-mac-address? value of false.
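You can check and, if necessary, set this OpenBoot PROM variable from Solaris with the eeprom(1M) command, for example:

```
# Display the current setting
eeprom 'local-mac-address?'

# Set the variable to true (takes effect at the next boot)
eeprom 'local-mac-address?=true'
```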
See also IP Network Multipathing Groups for guidelines on planning public network adapter backup groups. For more information about public network interfaces, see the Sun Cluster 3.1 Concepts Guide.
Add this planning information to the “Disk Device Groups Worksheet” in Sun Cluster 3.1 Release Notes.
You must configure all volume manager disk groups as Sun Cluster disk device groups. This configuration enables a secondary node to host multihost disks if the primary node fails. Consider the following points when you plan disk device groups.
Failover – You can configure multiported disks and properly configured volume manager devices as failover devices. Proper configuration of a volume manager device includes multiported disks and correct setup of the volume manager itself so that multiple nodes can host the exported device. You cannot configure tape drives, CD-ROMs, or single-ported disks as failover devices.
Mirroring – You must mirror the disks to protect the data from disk failure. See Mirroring Guidelines for additional guidelines. See Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software or Installing and Configuring VxVM Software and your volume manager documentation for instructions on mirroring.
For more information about disk device groups, see the Sun Cluster 3.1 Concepts Guide.
Add this planning information to the “Public Networks Worksheet” in Sun Cluster 3.1 Release Notes.
Internet Protocol (IP) Network Multipathing groups, which replace Network Adapter Failover (NAFO) groups, provide public network adapter monitoring and failover, and are the foundation for a network address resource. If a multipathing group is configured with two or more adapters and an adapter fails, all of the addresses on the failed adapter fail over to another adapter in the multipathing group. In this way, the multipathing group adapters maintain public network connectivity to the subnet to which the adapters in the multipathing group connect.
Consider the following points when you plan your multipathing groups.
Each public network adapter must belong to a multipathing group.
The local-mac-address? variable must have a value of true for Ethernet adapters. This is a change from the requirement for Sun Cluster 3.0 software.
You must configure a test IP address for each multipathing group adapter, as shown in the sketch after this list.
Test IP addresses for all adapters in the same multipathing group must belong to a single IP subnet.
Test IP addresses must not be used by normal applications because they are not highly available.
There are no requirements or restrictions for the name of a multipathing group.
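The following /etc/hostname.qfe0 file is a minimal sketch of one adapter in a multipathing group. The adapter name, the hostnames phys-schost-1 and phys-schost-1-test (which must already be in /etc/inet/hosts), and the group name sc_ipmp0 are all assumptions.

```
# /etc/hostname.qfe0 (adapter, hostnames, and group name are assumptions)
phys-schost-1 group sc_ipmp0 up addif phys-schost-1-test deprecated -failover netmask + broadcast + up
```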
For more information about IP Network Multipathing, see “Deploying Network Multipathing” in IP Network Multipathing Administration Guide or “Administering Network Multipathing (Task)” in System Administration Guide: IP Services.
Sun Cluster configurations use quorum devices to maintain data and resource integrity. If the cluster temporarily loses connection to a node, the quorum device prevents amnesia or split-brain problems when the cluster node attempts to rejoin the cluster. You assign quorum devices by using the scsetup(1M) utility.
Consider the following points when you plan quorum devices.
Minimum – A two-node cluster must have at least one shared disk assigned as a quorum device. For other topologies, quorum devices are optional.
Odd-number rule – If more than one quorum device is configured in a two-node cluster, or in a pair of nodes directly connected to the quorum device, configure an odd number of quorum devices so that the quorum devices have completely independent failure pathways.
Connection – Do not connect a quorum device to more than two nodes.
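For reference, the scsetup utility prompts you through quorum assignment and runs the scconf(1M) command underneath. A minimal sketch, assuming that DID device d12 is a disk shared by both nodes:

```
# Assign global device d12 as a quorum device (device name assumed)
scconf -a -q globaldev=d12
```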
For more information about quorum, see the Sun Cluster 3.1 Concepts Guide.
This section provides guidelines for planning global devices and cluster file systems. For more information about global devices and cluster file systems, see the Sun Cluster 3.1 Concepts Guide.
Sun Cluster software does not require any specific disk layout or file system size. Consider the following points when you plan your global device and cluster file system layout.
Mirroring – You must mirror all global devices for the global device to be considered highly available. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to disks.
Disks – When you mirror, lay out file systems so that they are mirrored across disk arrays.
Availability – You must physically connect a global device to more than one node in the cluster for the global device to be considered highly available. A global device with multiple physical connections can tolerate a single-node failure. A global device with only one physical connection is supported, but the global device becomes inaccessible from other nodes if the node with the connection is down.
Consider the following points when you plan mount points for cluster file systems.
Mount point location – Create mount points in the /global directory, unless prohibited by other software products. Using the /global directory enables you to easily distinguish cluster file systems, which are globally available, from local file systems.
VxFS mount requirement – Globally mount and unmount a VxFS file system from the primary node (the node that masters the disk on which the VxFS file system resides) to ensure that the operation succeeds. A VxFS file system mount or unmount operation that is performed from a secondary node might fail.
Nesting mount points – Normally, you should not nest the mount points for cluster file systems. For example, do not set up one file system mounted on /global/a and another file system mounted on /global/a/b. Ignoring this rule can cause availability and node boot order problems if the parent mount point is not present when the system attempts to mount a child of that file system. The only exception to this rule is if the devices for the two file systems have the same physical node connectivity (for example, different slices on the same disk).
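For example, an /etc/vfstab entry for a globally mounted, logged UFS cluster file system might look like the following. The diskset and volume names are assumptions.

```
#device to mount         device to fsck            mount point  FS type  fsck pass  mount at boot  mount options
/dev/md/nfsset/dsk/d100  /dev/md/nfsset/rdsk/d100  /global/nfs  ufs      2          yes            global,logging
```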
Add this planning information to the “Disk Device Groups Worksheet” in Sun Cluster 3.1 Release Notes and the “Volume Manager Configurations Worksheet” in Sun Cluster 3.1 Release Notes. For Solstice DiskSuite/Solaris Volume Manager, also add this planning information to the “Metadevices Worksheet (Solstice DiskSuite/Solaris Volume Manager)” in Sun Cluster 3.1 Release Notes.
This section provides guidelines for planning volume management of your cluster configuration.
Sun Cluster uses volume manager software to group disks into disk device groups that can then be administered as one unit. Sun Cluster supports Solstice DiskSuite/Solaris Volume Manager software and VERITAS Volume Manager (VxVM).
If you use Solstice DiskSuite/Solaris Volume Manager software, you must install it on all nodes of the cluster, regardless of whether you use VxVM on some nodes to manage disks.
If you use VxVM and enable the VxVM cluster feature, you must install and license VxVM on all nodes of the cluster.
If you use VxVM and do not enable the VxVM cluster feature, you need to install and license VxVM only on those nodes that are attached to storage devices that VxVM will manage.
If you install both Solstice DiskSuite/Solaris Volume Manager software and VxVM on a node, you must use Solstice DiskSuite/Solaris Volume Manager software to manage disks local to each node (such as the root disk) and you must use VxVM to manage all shared disks.
See your volume manager documentation and Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software or Installing and Configuring VxVM Software for instructions on how to install and configure the volume manager software. For more information about volume management in a cluster configuration, see the Sun Cluster 3.1 Concepts Guide.
Consider the following general guidelines when configuring your disks.
Mirrored multihost disks – You must mirror all multihost disks across disk expansion units. See Mirroring Multihost Disks for guidelines on mirroring multihost disks. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to disks.
Mirrored root – Mirroring the root disk ensures high availability, but such mirroring is not required. See Mirroring Guidelines for guidelines on deciding whether to mirror the root disk.
Unique naming – On any cluster node, if a local Solstice DiskSuite metadevice or a local Solaris Volume Manager or VxVM volume is used as the device on which the /global/.devices/node@nodeid file system is mounted, the name of that metadevice or volume must be unique throughout the cluster.
Node lists – To ensure high availability of a disk device group, make its node lists of potential masters and its failback policy identical to any associated resource group. Or, if a scalable resource group uses more nodes than its associated disk device group, make the scalable resource group's node list a superset of the disk device group's node list. See the resource group planning information in the Sun Cluster 3.1 Data Service Planning and Administration Guide for information about node lists.
Multiported disks – You must connect, or port, all disks used to construct a device group within the cluster to all of the nodes configured in the node list for that device group. Solstice DiskSuite/Solaris Volume Manager software can automatically check for this when disks are added to a diskset. However, configured VxVM disk groups have no association with any particular set of nodes.
Hot spare disks – You can use hot spare disks to increase availability, but they are not required.
See your volume manager documentation for disk layout recommendations and any additional restrictions.
Consider the following points when you plan Solstice DiskSuite/Solaris Volume Manager configurations.
Local metadevice or volume names – The name of each local Solstice DiskSuite metadevice or Solaris Volume Manager volume must be unique throughout the cluster and cannot be the same as any device ID (DID) name.
Mediators – Each diskset configured with exactly two disk strings and mastered by exactly two nodes must have Solstice DiskSuite/Solaris Volume Manager mediators configured for the diskset. A disk string consists of a disk enclosure, its physical disks, cables from the enclosure to the node(s), and the interface adapter cards. You must configure each diskset with exactly two nodes acting as mediator hosts. You must use the same two nodes for all disksets requiring mediators and those two nodes must master those disksets. Mediators cannot be configured for disksets that do not meet the two-string and two-host requirements. See the mediator(7D) man page for details.
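As a sketch, to configure the two nodes that master a diskset as its mediator hosts (the diskset and node names are assumptions):

```
# Add both mediator hosts to the diskset (names assumed)
metaset -s nfsset -a -m phys-schost-1 phys-schost-2
```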
/kernel/drv/md.conf settings – All Solstice DiskSuite metadevices or Solaris Volume Manager volumes used by each diskset are created in advance, at reconfiguration boot time, based on configuration parameters found in the /kernel/drv/md.conf file.
You must modify the nmd and md_nsets fields as follows to support a Sun Cluster configuration:
md_nsets – The md_nsets field defines the total number of disksets that can be created for a system to meet the needs of the entire cluster. You must set the value of md_nsets to the expected number of disksets in the cluster, plus one to allow Solstice DiskSuite/Solaris Volume Manager software to manage the private disks on the local host (that is, those metadevices or volumes that are not in the local diskset).
The maximum number of disksets allowed per cluster is 32 (31 disksets for general use plus one for private disk management). The default value of md_nsets is 4.
nmd – The nmd field defines the number of metadevices or volumes created for each diskset. You must set the value of nmd to the highest metadevice or volume number that you predict will be used by any one of the disksets in the cluster. For example, if a cluster uses 10 metadevices or volumes in its first 15 disksets, but 1000 metadevices or volumes in the 16th diskset, you must set the value of nmd to at least 1000. Also, the value of nmd must be large enough to ensure that there are enough numbers for each DID name and for each local metadevice or volume name to be unique throughout the cluster.
The highest allowed value of a metadevice or volume name per diskset is 8192. The default value of nmd is 128.
Set these fields at installation time to allow for all predicted future expansion of the cluster. Increasing the value of these fields after the cluster is in production is time consuming because it requires a reconfiguration reboot for each node. Raising these values later also increases the possibility of inadequate space allocation in the root (/) file system to create all of the requested devices.
At the same time, keep the value of the nmd field and the md_nsets field as low as possible. Memory structures exist for all possible devices as determined by nmd and md_nsets, even if you have not created those devices. For optimal performance, keep the value of nmd and md_nsets only slightly higher than the number of metadevices or volumes you will use.
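For example, a cluster expected to grow to 15 disksets with metadevice or volume names up to d1023 might use the following /kernel/drv/md.conf line. The values are illustrative assumptions.

```
# /kernel/drv/md.conf excerpt: 15 disksets + 1 for local disks, names up to d1023
name="md" parent="pseudo" nmd=1024 md_nsets=16;
```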
All cluster nodes must have identical /kernel/drv/md.conf files, regardless of the number of disksets served by each node. Failure to follow this guideline can result in serious Solstice DiskSuite/Solaris Volume Manager errors and possible loss of data.
See “System and Startup Files” in Solstice DiskSuite 4.2.1 Reference Guide or “System Files and Startup Files” in Solaris Volume Manager Administration Guide for more information about the md.conf file.
Consider the following points when you plan VERITAS Volume Manager (VxVM) configurations.
Enclosure-based naming – If you use Enclosure-Based Naming of devices (a feature introduced in VxVM version 3.2), ensure that you use consistent device names on all cluster nodes that share the same storage. VxVM does not coordinate these names, so the administrator must ensure that VxVM assigns the same names to the same devices from different nodes. While failure to assign consistent names does not interfere with correct cluster behavior, it greatly complicates cluster administration and greatly increases the possibility of configuration errors, potentially leading to loss of data.
Root disk group – You must create a default root disk group (rootdg) on each node. The rootdg disk group can be created on the following disks:
The root disk, which must be encapsulated
One or more local non-root disks, which can be encapsulated or initialized
A combination of root and local non-root disks
The rootdg disk group must be local to the node.
Encapsulation – Disks to be encapsulated must have two disk-slice table entries free.
Number of volumes – Estimate the maximum number of volumes any given disk device group will use at the time the disk device group is created.
If the number of volumes is less than 1000, you can use default minor numbering.
If the number of volumes is 1000 or greater, you must carefully plan the way in which minor numbers are assigned to disk device group volumes. No two disk device groups can have overlapping minor number assignments.
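If two disk device groups would otherwise collide, you can re-base the minor numbers of one group, for example with the vxdg reminor operation. The disk group name and base minor number below are assumptions; check your VxVM documentation for the exact form supported by your release.

```
# Re-base minor numbers for disk group dg1 to start at 5000 (values assumed)
vxdg reminor dg1 5000
```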
Dirty Region Logging – Using Dirty Region Logging (DRL) is highly recommended but not required. Using DRL decreases volume recovery time after a node failure. Using DRL might decrease I/O throughput.
Logging is required for cluster file systems. Sun Cluster software supports the following logging file systems:
Solaris UFS logging—See the mount_ufs(1M) man page for more information.
Solstice DiskSuite trans-metadevice logging or Solaris Volume Manager transactional-volume logging—See “Creating DiskSuite Objects” in Solstice DiskSuite 4.2.1 User's Guide or “Transactional Volumes (Overview)” in Solaris Volume Manager Administration Guide for more information.
VERITAS File System (VxFS) logging—See the mount_vxfs man page provided with VxFS software for more information.
The following table lists the logging file systems supported by each volume manager.
Table 1–4 Supported File System Logging Matrix
| Volume Manager | Supported File System Logging |
|---|---|
| Solstice DiskSuite/Solaris Volume Manager | Solaris UFS logging; Solstice DiskSuite trans-metadevice logging or Solaris Volume Manager transactional-volume logging; VxFS logging |
| VERITAS Volume Manager | Solaris UFS logging; VxFS logging |
Consider the following points when you choose between Solaris UFS logging and trans-metadevice logging.
Transactional volumes are scheduled to be removed from the Solaris operating environment in an upcoming Solaris release. Solaris UFS logging, available since the Solaris 8 release, provides the same capabilities but superior performance, as well as lower system administration requirements and overhead.
Solaris UFS log size – Solaris UFS logging always allocates the log from free space on the UFS file system, with the log size based on the size of the file system.
On file systems smaller than 1 Gbyte, the log occupies 1 Mbyte.
On file systems of 1 Gbyte or greater, the log occupies 1 Mbyte per Gbyte of file system, up to a maximum of 64 Mbytes.
Log metadevice – A Solstice DiskSuite trans metadevice or Solaris Volume Manager transactional volume manages UFS logging. The logging device component of a trans metadevice or transactional volume is a metadevice or volume that you can mirror and stripe. You can create a maximum 1-Gbyte log size, although 64 Mbytes is sufficient for most file systems. The minimum log size is 1 Mbyte.
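A minimal Solstice DiskSuite sketch of a trans metadevice follows, assuming the master and log metadevices are built on slices c1t0d0s0 and c1t0d0s6 (slice names are assumptions; in practice you would mirror both components, as described in the mirroring guidelines below):

```
# Build the master and log metadevices (slice names are assumptions)
metainit d101 1 1 c1t0d0s0     # master: holds the UFS file system
metainit d102 1 1 c1t0d0s6     # log: 64 Mbytes is sufficient for most file systems
# Combine them into a trans metadevice
metainit d100 -t d101 d102
```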
This section provides guidelines for planning the mirroring of your cluster configuration.
Mirroring all multihost disks in a Sun Cluster configuration enables the configuration to tolerate single-disk failures. Sun Cluster software requires that you mirror all multihost disks across disk expansion units. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to disks.
Consider the following points when you mirror multihost disks.
Separate disk expansion units – Each submirror of a given mirror or plex should reside in a different multihost disk expansion unit.
Disk space – Mirroring doubles the amount of necessary disk space.
Three-way mirroring – Solstice DiskSuite/Solaris Volume Manager software and VERITAS Volume Manager (VxVM) support three-way mirroring. However, Sun Cluster requires only two-way mirroring.
Number of metadevices or volumes – Under Solstice DiskSuite/Solaris Volume Manager software, mirrors consist of other Solstice DiskSuite metadevices or Solaris Volume Manager volumes such as concatenations or stripes. Large configurations might contain a large number of metadevices or volumes.
Differing disk sizes – If you mirror to a disk of a different size, your mirror capacity is limited to the size of the smallest submirror or plex.
For more information about multihost disks, see the Sun Cluster 3.1 Concepts Guide.
Add this planning information to the “Local File Systems With Mirrored Root Worksheet” in Sun Cluster 3.1 Release Notes.
For maximum availability, you should mirror root (/), /usr, /var, /opt, and swap on the local disks. Under VxVM, you encapsulate the root disk and mirror the generated subdisks. However, Sun Cluster software does not require that you mirror the root disk.
Before you decide whether to mirror the root disk, consider the risks, complexity, cost, and service time for the various alternatives concerning the root disk. No single mirroring strategy works for all configurations. You might want to consider your local Sun service representative's preferred solution when you decide whether to mirror root.
See your volume manager documentation and Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software or Installing and Configuring VxVM Software for instructions on how to mirror the root disk.
Consider the following points when you decide whether to mirror the root disk.
Boot disk – You can set up the mirror to be a bootable root disk so that you can boot from the mirror if the primary boot disk fails.
Complexity – Mirroring the root disk adds complexity to system administration and complicates booting in single-user mode.
Backups – Regardless of whether or not you mirror the root disk, you also should perform regular backups of root. Mirroring alone does not protect against administrative errors. Only a backup plan enables you to restore files that have been accidentally altered or deleted.
Quorum devices – Do not use a disk configured as a quorum device to mirror a root disk.
Quorum – Under Solstice DiskSuite/Solaris Volume Manager software, in failure scenarios in which state database quorum is lost, you cannot reboot the system until maintenance is performed. See the Solstice DiskSuite/Solaris Volume Manager documentation for information about the state database and state database replicas.
Separate controllers – Highest availability includes mirroring the root disk on a separate controller.
Secondary root disk – With a mirrored root disk, the primary root disk can fail but work can continue on the secondary (mirror) root disk. At a later point, the primary root disk might return to service (for example, after a power cycle or transient I/O errors), and subsequent boots are then performed by using the primary root disk specified in the OpenBoot™ PROM boot-device field. In this situation, no manual repair task has occurred; the drive simply starts working well enough to boot. Note that a Solstice DiskSuite/Solaris Volume Manager resync does still occur. A resync requires a manual step when the drive is returned to service.
If changes were made to any files on the secondary (mirror) root disk, they would not be reflected on the primary root disk during boot time (causing a stale submirror). For example, changes to the /etc/system file would be lost. Some Solstice DiskSuite/Solaris Volume Manager administrative commands might have changed the /etc/system file while the primary root disk was out of service.
The boot program does not check whether the system is booting from a mirror or an underlying physical device, and the mirroring becomes active partway through the boot process (after the metadevices or volumes are loaded). Before this point, the system is vulnerable to stale submirror problems.