This chapter provides planning information and guidelines for installing a Sun Cluster configuration.
The following table shows where to find instructions for various installation tasks for Sun Cluster software installation and the order in which you should perform the tasks.
Table 1–1 Sun Cluster Software Installation Task Information
| Task | Instructions |
|---|---|
| Set up cluster hardware. | |
| Plan cluster software installation. | |
| Install a new cluster or add nodes to an existing cluster. | |
| Install and configure Solstice DiskSuite™/Solaris Volume Manager software. | |
| Install and configure VERITAS Volume Manager (VxVM) software. | |
| Configure cluster framework software and optionally install and configure the Sun Cluster module to Sun Management Center. | |
| Plan, install, and configure resource groups and data services. | Sun Cluster 3.1 Data Service Planning and Administration Guide |
| Develop custom data services. | |
| Upgrade to Sun Cluster 3.1 10/03 software. | |
This section provides guidelines for planning Solaris software installation in a cluster configuration. For more information about Solaris software, see your Solaris installation documentation.
You can install Solaris software from a local CD-ROM or from a network installation server by using the JumpStart™ installation method. In addition, Sun Cluster software provides a custom method for installing both the Solaris operating environment and Sun Cluster software by using the JumpStart installation method. If you are installing several cluster nodes, consider a network installation.
See How to Install Solaris and Sun Cluster Software (JumpStart) for details about the scinstall JumpStart installation method. See your Solaris installation documentation for details about standard Solaris installation methods.
The following Solaris operating-environment features are not supported in a Sun Cluster configuration:
Solaris interface groups are not supported in a Sun Cluster configuration. The Solaris interface groups feature is disabled by default during Solaris software installation. Do not re-enable Solaris interface groups. See the ifconfig(1M) man page for more information about Solaris interface groups.
Automatic power-saving shutdown is not supported in Sun Cluster configurations and should not be enabled. See the pmconfig(1M) and power.conf(4) man pages for more information.
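To keep automatic shutdown disabled, verify that the autoshutdown entry in /etc/power.conf specifies the noshutdown behavior. The following excerpt is a minimal sketch; the idle-time and time-window values are examples only:

```
# /etc/power.conf (excerpt)
# Fields: idle-time(min)  start-time  finish-time  behavior
# The noshutdown behavior prevents automatic power-saving shutdown.
autoshutdown    30    9:00    9:00    noshutdown
```

After you edit the file, run pmconfig so that the power-management framework rereads the configuration.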
Sun Cluster 3.1 software requires at least the Solaris End User System Support software group. However, other components of your cluster configuration might have their own Solaris software requirements as well. Consider the following information when you decide which Solaris software group you are installing.
Check your server documentation for any Solaris software requirements. For example, Sun Enterprise 10000 servers require the Entire Distribution + OEM software group.
If you intend to use SCI-PCI adapters or the Remote Shared Memory Application Programming Interface (RSMAPI), ensure that you install the RSMAPI software packages (SUNWrsm, SUNWrsmx, SUNWrsmo, and SUNWrsmox). The RSMAPI software packages are included in only some Solaris software groups. For example, the Solaris Developer System Support software group includes the RSMAPI software packages but the End User System Support software group does not.
If the software group that you are installing does not include the RSMAPI software packages, install the RSMAPI software packages manually before you install Sun Cluster software. Use the pkgadd(1M) command to manually install the software packages. See the Solaris 8 Section (3RSM) man pages for information about using the RSMAPI.
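For example, the following sketch installs the packages from locally mounted Solaris media. The CD-ROM mount point and Product directory are assumptions; substitute the path for your installation media:

```
# Install the RSMAPI packages (media path is an example only)
pkgadd -d /cdrom/cdrom0/Solaris_8/Product SUNWrsm SUNWrsmx SUNWrsmo SUNWrsmox
```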
You might need to install other Solaris software packages that are not part of the End User System Support software group. The Apache HTTP server packages are one example. Third-party software, such as ORACLE®, might also require additional Solaris software packages. See your third-party documentation for any Solaris software requirements.
Add this information to the appropriate Local File System Layout Worksheet.
When you install the Solaris operating environment, ensure that you create the required Sun Cluster partitions and that all partitions meet minimum space requirements.
swap – The combined amount of swap space that is allocated for Solaris and Sun Cluster software must be no less than 750 Mbytes. For best results, add at least 512 Mbytes for Sun Cluster software to the amount that is required by the Solaris operating environment. In addition, allocate any additional swap amount that is required by applications that are to run on the cluster node.
/globaldevices – Create a 512-Mbyte file system that is to be used by the scinstall(1M) utility for global devices.
Volume manager – Create a 20-Mbyte partition on a slice at the end of the disk (slice 7) for volume manager use. If your cluster uses VERITAS Volume Manager (VxVM) and you intend to encapsulate the root disk, you need to have two unused slices available for use by VxVM.
To meet these requirements, you must customize the partitioning if you are performing interactive installation of the Solaris operating environment.
The following guidelines provide additional partition planning information.
As with any other system running the Solaris operating environment, you can configure the root (/), /var, /usr, and /opt directories as separate file systems. Or, you can include all the directories in the root (/) file system. The following describes the software contents of the root (/), /var, /usr, and /opt directories in a Sun Cluster configuration. Consider this information when you plan your partitioning scheme.
root (/) – The Sun Cluster software itself occupies less than 40 Mbytes of space in the root (/) file system. Solstice DiskSuite/Solaris Volume Manager software requires less than 5 Mbytes, and VxVM software requires less than 15 Mbytes. To configure ample additional space and inode capacity, add at least 100 Mbytes to the amount of space you would normally allocate for your root (/) file system. This space is used for the creation of both block special devices and character special devices used by either Solstice DiskSuite/Solaris Volume Manager or VxVM software. You especially need to allocate this extra space if a large number of shared disks are in the cluster.
/var – The Sun Cluster software occupies a negligible amount of space in the /var file system at installation time. However, you need to set aside ample space for log files. Also, more messages might be logged on a clustered node than would be found on a typical standalone server. Therefore, allow at least 100 Mbytes for the /var file system.
/usr – Sun Cluster software occupies less than 25 Mbytes of space in the /usr file system. Solstice DiskSuite/Solaris Volume Manager and VxVM software each require less than 15 Mbytes.
/opt – Sun Cluster framework software uses less than 2 Mbytes in the /opt file system. However, each Sun Cluster data service might use between 1 Mbyte and 5 Mbytes. Solstice DiskSuite/Solaris Volume Manager software does not use any space in the /opt file system. VxVM software can use over 40 Mbytes if all of its packages and tools are installed.
In addition, most database and applications software is installed in the /opt file system. If you use Sun Management Center software to monitor the cluster, you need an additional 25 Mbytes of space on each node to support the Sun Management Center agent and Sun Cluster module packages.
Sun Cluster software requires you to set aside a special file system on one of the local disks for use in managing global devices. This file system is later mounted as a cluster file system. Name this file system /globaldevices, which is the default name that is recognized by the scinstall(1M) command.
The scinstall command later renames the file system /global/.devices/node@nodeid, where nodeid represents the number that is assigned to a node when it becomes a cluster member. The original /globaldevices mount point is removed.
The /globaldevices file system must have ample space and ample inode capacity for creating both block special devices and character special devices. This guideline is especially important if a large number of disks are in the cluster. A file system size of 512 Mbytes should suffice for most cluster configurations.
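As an illustration, a /globaldevices file system that is created on slice 3 might have an /etc/vfstab entry like the following sketch. The device name c0t0d0s3 is hypothetical; use the slice that you actually allocate:

```
# /etc/vfstab entry (device names are examples only)
/dev/dsk/c0t0d0s3   /dev/rdsk/c0t0d0s3   /globaldevices   ufs   2   yes   -
```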
If you use Solstice DiskSuite/Solaris Volume Manager software, you must set aside a slice on the root disk for use in creating the state database replica. Specifically, set aside a slice for this purpose on each local disk. But, if you only have one local disk on a node, you might need to create three state database replicas in the same slice for Solstice DiskSuite/Solaris Volume Manager software to function properly. See your Solstice DiskSuite/Solaris Volume Manager documentation for more information.
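For example, on a node with only one local disk, the following sketch creates three state database replicas in a single slice, assuming that slice 7 was reserved for volume-manager use:

```
# Create three state database replicas in slice 7 (device name is an example)
metadb -a -f -c 3 c0t0d0s7
```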
If you use VxVM and you intend to encapsulate the root disk, you need to have two unused slices that are available for use by VxVM. Additionally, you need to have some additional unassigned free space at either the beginning or the end of the disk. See your VxVM documentation for more information about root disk encapsulation.
Table 1–2 shows a partitioning scheme for a cluster node that has less than 750 Mbytes of physical memory. This scheme assumes installation of the Solaris operating environment End User System Support software group, Sun Cluster software, and the Sun Cluster HA for NFS data service. The last slice on the disk, slice 7, is allocated with a small amount of space for volume-manager use.
This layout allows for the use of either Solstice DiskSuite/Solaris Volume Manager software or VxVM. If you use Solstice DiskSuite/Solaris Volume Manager software, you use slice 7 for the state database replica. If you use VxVM, you later free slice 7 by assigning the slice a zero length. This layout provides the necessary two free slices, 4 and 7, and also provides unused space at the end of the disk.
Table 1–2 Example File-System Allocation
| Slice | Contents | Allocation | Description |
|---|---|---|---|
| 0 | / | 6.75 Gbytes | Remaining free space on the disk after allocating space to slices 1 through 7. Used for Solaris operating environment software, Sun Cluster software, data-services software, volume-manager software, Sun Management Center agent and Sun Cluster module agent packages, root file systems, and database and application software. |
| 1 | swap | 1 Gbyte | 512 Mbytes for Solaris operating environment software. 512 Mbytes for Sun Cluster software. |
| 2 | overlap | 8.43 Gbytes | The entire disk. |
| 3 | /globaldevices | 512 Mbytes | The Sun Cluster software later assigns this slice a different mount point and mounts the slice as a cluster file system. |
| 4 | unused | - | Available as a free slice for encapsulating the root disk under VxVM. |
| 5 | unused | - | - |
| 6 | unused | - | - |
| 7 | volume manager | 20 Mbytes | Used by Solstice DiskSuite/Solaris Volume Manager software for the state database replica, or used by VxVM for installation after you free the slice. |
This section provides guidelines for planning and preparing cluster components for Sun Cluster software installation.
For detailed information about Sun Cluster components, see the Sun Cluster 3.1 10/03 Concepts Guide.
Ensure that you have available all necessary license certificates before you begin software installation. Sun Cluster software does not require a license certificate, but each node installed with Sun Cluster software must be covered under your Sun Cluster software license agreement.
For licensing requirements for volume-manager software and applications software, see the installation documentation for those products.
After installing each software product, you must also install any required patches.
For information about current required patches, see “Patches and Required Firmware Levels” in Sun Cluster 3.1 10/03 Release Notes or consult your Sun service provider.
For general guidelines and procedures for applying patches, see “Patching Sun Cluster Software and Firmware” in Sun Cluster 3.1 10/03 System Administration Guide.
You must set up a number of IP addresses for various Sun Cluster components, depending on your cluster configuration. Each node in the cluster configuration must have at least one public network connection to the same set of public subnets.
The following table lists the components that need IP addresses assigned. Add these IP addresses to any naming services that are used. Also add these IP addresses to the local /etc/inet/hosts file on each cluster node after you install Solaris software.
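The following /etc/inet/hosts excerpt is a sketch of such entries; all hostnames and addresses shown are examples only:

```
# /etc/inet/hosts (excerpt) -- names and addresses are examples only
192.168.10.11   phys-schost-1      # cluster node 1
192.168.10.12   phys-schost-2      # cluster node 2
192.168.10.20   schost-nfs-lh      # logical address for a data service
192.168.10.30   admincon           # administrative console
192.168.10.40   tc0                # console-access device
```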
For more information about IP addresses, see System Administration Guide, Volume 3 (Solaris 8) or System Administration Guide: IP Services (Solaris 9).
For more information about test IP addresses to support IP Network Multipathing, see IP Network Multipathing Administration Guide.
| Component | Number of IP Addresses Needed |
|---|---|
| Administrative console | 1 per subnet |
| Cluster nodes | 1 per node, per subnet |
| Domain console network interface | 1 per domain |
| Console-access device | 1 |
| Logical addresses | 1 per logical host resource, per subnet |
You must have console access to all cluster nodes. If you install Cluster Control Panel software on your administrative console, you must provide the hostname of the console-access device that is used to communicate with the cluster nodes.
A terminal concentrator is used to communicate between the administrative console and the cluster node consoles.
A Sun Enterprise 10000 server uses a System Service Processor (SSP) instead of a terminal concentrator.
A Sun Fire™ server uses a system controller instead of a terminal concentrator.
For more information about console access, see the Sun Cluster 3.1 10/03 Concepts Guide.
Each data-service resource group that uses a logical address must have a hostname specified for each public network from which the logical address can be accessed.
For more information, see the Sun Cluster 3.1 Data Service Planning and Administration Guide.
For additional information about data services and resources, also see the Sun Cluster 3.1 10/03 Concepts Guide.
This section provides guidelines for the Sun Cluster components that you configure during installation.
Add this planning information to the Cluster and Node Names Worksheet.
Specify a name for the cluster during Sun Cluster installation. The cluster name should be unique throughout the enterprise.
Add this planning information to the Cluster and Node Names Worksheet. Information for most other worksheets is grouped by node name.
The node name is the name that you assign to a machine when you install the Solaris operating environment. During Sun Cluster installation, you specify the names of all nodes that you are installing as a cluster. In single-node cluster installations, the default node name is the same as the cluster name.
Add this planning information to the Cluster and Node Names Worksheet.
You do not need to configure a private network for a single-node cluster.
Sun Cluster software uses the private network for internal communication between nodes. A Sun Cluster configuration requires at least two connections to the cluster interconnect on the private network. You specify the private network address and netmask when you install Sun Cluster software on the first node of the cluster. You can either accept the default private network address (172.16.0.0) and netmask (255.255.0.0) or type different choices if the default network address is already in use elsewhere in the enterprise.
After you have successfully installed the node as a cluster member, you cannot change the private network address and netmask.
If you specify a private network address other than the default, the address must meet the following requirements:
Use zeroes for the last two octets of the address.
Follow the guidelines in RFC 1597 for network address assignments.
You can contact the InterNIC to obtain copies of RFCs. See “Planning Your TCP/IP Network” in System Administration Guide, Volume 3 (Solaris 8) or “Planning Your TCP/IP Network (Task)” in System Administration Guide: IP Services (Solaris 9) for instructions.
If you specify a netmask other than the default, the netmask must minimally mask all bits that are given in the private network address.
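For example, the following choices illustrate these requirements. All addresses are hypothetical:

```
Default:     network 172.16.0.0   netmask 255.255.0.0
Acceptable:  network 10.200.0.0   netmask 255.255.0.0   (RFC 1597 range, last two octets zero)
Invalid:     network 172.16.5.0                         (third octet is not zero)
```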
Add this planning information to the Cluster and Node Names Worksheet.
The private hostname is the name that is used for internode communication over the private-network interface. Private hostnames are automatically created during Sun Cluster installation. These private hostnames follow the naming convention clusternodeN-priv, where N is the internal node ID number. During Sun Cluster installation, the node ID number is automatically assigned to each node when the node becomes a cluster member. After installation, you can rename private hostnames by using the scsetup(1M) utility.
Add this planning information to the Cluster Interconnect Worksheet.
You do not need to configure a cluster interconnect for a single-node cluster. However, if you anticipate eventually adding nodes to a single-node cluster configuration, you might want to configure the cluster interconnect for future use.
The cluster interconnects provide the hardware pathways for private network communication between cluster nodes. Each interconnect consists of a cable that is connected in one of the following ways:
Between two transport adapters
Between a transport adapter and a transport junction
Between two transport junctions
During Sun Cluster installation, you specify the following configuration information for two cluster interconnects:
Transport adapters – For the transport adapters, such as ports on network interfaces, specify the transport adapter names and transport type. If your configuration is a two-node cluster, you also specify whether your interconnect is direct connected (adapter to adapter) or uses a transport junction. If your two-node cluster is direct connected, you can still specify a transport junction for the interconnect.
If you specify a transport junction, you can more easily add another node to the cluster in the future.
See the scconf_trans_adap_*(1M) family of man pages for information about a specific transport adapter.
Transport junctions – If you use transport junctions, such as a network switch, specify a transport junction name for each interconnect. You can use the default name switchN, where N is a number that is automatically assigned during installation, or create another name.
Also specify the junction port name or accept the default name. The default port name is the same as the internal node ID number of the node that hosts the adapter end of the cable. However, you cannot use the default port name for certain adapter types, such as SCI-PCI.
Clusters with three or more nodes must use transport junctions. Direct connection between cluster nodes is supported only for two-node clusters.
You can configure additional private-network connections after installation by using the scsetup(1M) utility.
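The following commands are a sketch of adding an interconnect after installation with the scconf(1M) command, assuming a hypothetical adapter qfe1 on node phys-schost-1 and a junction named switch2. Verify the exact option syntax against the scconf man page for your release:

```
# Add a transport adapter, a transport junction, and the cable between them
# (node, adapter, and junction names are examples only)
scconf -a -A trtype=dlpi,name=qfe1,node=phys-schost-1
scconf -a -B type=switch,name=switch2
scconf -a -m endpoint=phys-schost-1:qfe1,endpoint=switch2
```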
For more information about the cluster interconnect, see the Sun Cluster 3.1 10/03 Concepts Guide.
Add this planning information to the Public Networks Worksheet.
Public networks communicate outside the cluster. Consider the following points when you plan your public network configuration.
Public networks and the private network (cluster interconnect) must use separate adapters.
You must have at least one public network that is connected to all cluster nodes.
You can have as many additional public network connections as your hardware configuration allows.
The local-mac-address? variable must use the default value true for Ethernet adapters. Sun Cluster 3.1 software does not support a local-mac-address? value of false for Ethernet adapters. This requirement is a change from Sun Cluster 3.0, which required a local-mac-address? value of false.
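You can check and, if necessary, set this OpenBoot PROM variable with the eeprom(1M) command, as in the following sketch:

```
# Display the current setting
eeprom "local-mac-address?"
# Set the value that Sun Cluster 3.1 software requires
eeprom "local-mac-address?=true"
```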
See IP Network Multipathing Groups for guidelines on planning public-network-adapter backup groups. For more information about public network interfaces, see the Sun Cluster 3.1 10/03 Concepts Guide.
Add this planning information to the Disk Device Group Configurations Worksheet.
You must configure all volume-manager disk groups as Sun Cluster disk device groups. This configuration enables a secondary node to host multihost disks if the primary node fails. Consider the following points when you plan disk device groups.
Failover – You can configure multiported disks and properly configured volume-manager devices as failover devices. Proper configuration of a volume-manager device includes multiported disks and correct setup of the volume manager itself. This configuration ensures that multiple nodes can host the exported device. You cannot configure tape drives, CD-ROMs, or single-ported disks as failover devices.
Mirroring – You must mirror the disks to protect the data from disk failure. See Mirroring Guidelines for additional guidelines. See Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software or Installing and Configuring VxVM Software and your volume-manager documentation for instructions on mirroring.
For more information about disk device groups, see the Sun Cluster 3.1 10/03 Concepts Guide.
Add this planning information to the Public Networks Worksheet.
Internet Protocol (IP) Network Multipathing groups, which replace Network Adapter Failover (NAFO) groups, provide public network adapter monitoring and failover, and are the foundation for a network-address resource. A multipathing group provides high availability when the multipathing group is configured with two or more adapters. If one adapter fails, all of the addresses on the failed adapter fail over to another adapter in the multipathing group. In this way, the multipathing-group adapters maintain public-network connectivity to the subnet to which the adapters in the multipathing group connect.
Consider the following points when you plan your multipathing groups.
Each public network adapter must belong to a multipathing group.
For multipathing groups that contain two or more adapters, you must configure a test IP address for each adapter in the group. If a multipathing group contains only one adapter, you do not need to configure a test IP address. A sketch of a test-address configuration follows this list.
Test IP addresses for all adapters in the same multipathing group must belong to a single IP subnet.
Test IP addresses must not be used by normal applications because the test IP addresses are not highly available.
In the /etc/default/mpathd file, do not change the value of TRACK_INTERFACES_ONLY_WITH_GROUPS from yes to no.
The name of a multipathing group has no requirements or restrictions.
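The following /etc/hostname.qfe0 sketch shows one way to place an adapter in a multipathing group and configure its test address on Solaris 8. The adapter, group name, and hostnames are examples only, and the hostnames must be defined in /etc/inet/hosts:

```
# /etc/hostname.qfe0 (adapter, group, and hostnames are examples only)
phys-schost-1 netmask + broadcast + group sc_ipmp0 up
addif phys-schost-1-test deprecated -failover netmask + broadcast + up
```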
For more information about IP Network Multipathing, see “Deploying Network Multipathing” in IP Network Multipathing Administration Guide (Solaris 8) or “Administering Network Multipathing (Task)” in System Administration Guide: IP Services (Solaris 9).
Sun Cluster configurations use quorum devices to maintain data and resource integrity. If the cluster temporarily loses connection to a node, the quorum device prevents amnesia or split-brain problems when the cluster node attempts to rejoin the cluster. You assign quorum devices by using the scsetup(1M) utility.
You do not need to configure quorum devices for a single-node cluster.
Consider the following points when you plan quorum devices.
Minimum – A two-node cluster must have at least one shared disk assigned as a quorum device. For other topologies, quorum devices are optional.
Odd-number rule – If more than one quorum device is configured in a two-node cluster, or in a pair of nodes directly connected to the quorum device, configure an odd number of quorum devices. This configuration ensures that the quorum devices have completely independent failure pathways.
Connection – You must connect a quorum device to at least two nodes.
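For example, after the cluster is established, you might assign a shared disk as a quorum device either through the scsetup menus or directly with the scconf(1M) command, as in the following sketch. The global device name d12 is hypothetical:

```
# Assign global device d12 as a quorum device (device name is an example)
scconf -a -q globaldev=d12
```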
For more information about quorum devices, see the Sun Cluster 3.1 10/03 Concepts Guide.
This section provides the following guidelines for planning global devices and for planning cluster file systems:
For more information about global devices and about cluster file systems, see the Sun Cluster 3.1 10/03 Concepts Guide.
Sun Cluster software does not require any specific disk layout or file system size. Consider the following points when you plan your layout for global devices and for cluster file systems.
Mirroring – You must mirror all global devices for the global device to be considered highly available. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to disks.
Disks – When you mirror, lay out file systems so that the file systems are mirrored across disk arrays.
Availability – You must physically connect a global device to more than one node in the cluster for the global device to be considered highly available. A global device with multiple physical connections can tolerate a single-node failure. A global device with only one physical connection is supported, but the global device becomes inaccessible from other nodes if the node with the connection is down.
Consider the following points when you plan mount points for cluster file systems.
Mount-point location – Create mount points for cluster file systems in the /global directory, unless you are prohibited by other software products. By using the /global directory, you can more easily distinguish cluster file systems, which are globally available, from local file systems.
The following VxFS features are not supported in a Sun Cluster 3.1 configuration.
Quick I/O
Snapshots
Storage checkpoints
convosync (Convert O_SYNC)
mincache
qlog, delaylog, tmplog
VERITAS Cluster File System (CFS), which requires the VERITAS cluster feature and VERITAS Cluster Server (VCS)
Cache advisories can be used, but the effect is observed on the given node only.
All other VxFS features and options that are supported in a cluster configuration are supported by Sun Cluster 3.1 software. See VxFS documentation for details about VxFS options that are supported in a cluster configuration.
VxFS mount requirement – Globally mount and unmount a VxFS file system from the primary node. The primary node is the node that masters the disk on which the VxFS file system resides. This method ensures that the mount or unmount operation succeeds. A VxFS file-system mount or unmount operation that is performed from a secondary node might fail.
Nesting mount points – Normally, you should not nest the mount points for cluster file systems. For example, do not set up one file system that is mounted on /global/a and another file system that is mounted on /global/a/b. Ignoring this rule can cause availability and node boot-order problems. These problems occur if the parent mount point is not present when the system attempts to mount a child of that file system. The only exception to this rule is if the devices for the two file systems have the same physical node connectivity. An example is different slices on the same disk.
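As an illustration, a cluster file system that is mounted under /global might have an /etc/vfstab entry like the following sketch on every node. The diskset and metadevice names are hypothetical:

```
# /etc/vfstab entry for a cluster file system (device names are examples only)
/dev/md/nfsset/dsk/d30  /dev/md/nfsset/rdsk/d30  /global/nfs  ufs  2  yes  global,logging
```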
Add this planning information to the Disk Device Group Configurations Worksheet and the Volume Manager Configurations Worksheet. For Solstice DiskSuite/Solaris Volume Manager, also add this planning information to the Metadevices Worksheet (Solstice DiskSuite/Solaris Volume Manager).
This section provides guidelines for planning volume management of your cluster configuration.
Sun Cluster software uses volume-manager software to group disks into disk device groups, which can then be administered as one unit. Sun Cluster software supports Solstice DiskSuite/Solaris Volume Manager software and VERITAS Volume Manager (VxVM) software that you install or use in the following ways.
Table 1–4 Supported Use of Volume Managers with Sun Cluster Software
| Volume-Manager Software | Requirements |
|---|---|
| Solstice DiskSuite/Solaris Volume Manager | You must install Solstice DiskSuite/Solaris Volume Manager software on all nodes of the cluster, regardless of whether you use VxVM on some nodes to manage disks. |
| VxVM with the cluster feature | You must install and license VxVM with the cluster feature on all nodes of the cluster. |
| VxVM without the cluster feature | You are only required to install and license VxVM on those nodes that are attached to storage devices which VxVM manages. |
| Both Solstice DiskSuite/Solaris Volume Manager and VxVM | If you install both volume managers on the same node, you must use Solstice DiskSuite/Solaris Volume Manager software to manage disks that are local to each node. Local disks include the root disk. Use VxVM to manage all shared disks. |
See your volume-manager documentation and Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software or Installing and Configuring VxVM Software for instructions on how to install and configure the volume-manager software. For more information about volume management in a cluster configuration, see the Sun Cluster 3.1 10/03 Concepts Guide.
Consider the following general guidelines when you configure your disks with volume-manager software:
Mirrored multihost disks – You must mirror all multihost disks across disk expansion units. See Guidelines for Mirroring Multihost Disks for guidelines on mirroring multihost disks. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to disks.
Mirrored root – Mirroring the root disk ensures high availability, but such mirroring is not required. See Mirroring Guidelines for guidelines about deciding whether to mirror the root disk.
Unique naming – You might have local Solstice DiskSuite metadevices, local Solaris Volume Manager volumes, or VxVM volumes that are used as devices on which the /global/.devices/node@nodeid file systems are mounted. If so, the name of each local metadevice or local volume must be unique throughout the cluster.
Node lists – To ensure high availability of a disk device group, make its node lists of potential masters and its failback policy identical to any associated resource group. Or, if a scalable resource group uses more nodes than its associated disk device group, make the scalable resource group's node list a superset of the disk device group's node list. See the resource group planning information in the Sun Cluster 3.1 Data Service Planning and Administration Guide for information about node lists.
Multiported disks – You must connect, or port, all disks used to construct a device group within the cluster to all of the nodes that are configured in the node list for that device group. Solstice DiskSuite/Solaris Volume Manager software can automatically check for this connection at the time that disks are added to a diskset. However, configured VxVM disk groups do not have an association to any particular set of nodes.
Hot spare disks – You can use hot spare disks to increase availability, but hot spare disks are not required.
See your volume-manager documentation for disk layout recommendations and any additional restrictions.
Consider the following points when you plan Solstice DiskSuite/Solaris Volume Manager configurations:
Local metadevice names or volume names – The name of each local Solstice DiskSuite metadevice or Solaris Volume Manager volume must be unique throughout the cluster. Also, the name cannot be the same as any device-ID name.
Mediators – Each diskset configured with exactly two disk strings and mastered by exactly two nodes must have Solstice DiskSuite/Solaris Volume Manager mediators configured for the diskset. A disk string consists of a disk enclosure, its physical disks, cables from the enclosure to the node(s), and the interface adapter cards. Observe the following rules to configure mediators:
You must configure each diskset with exactly two nodes that act as mediator hosts.
You must use the same two nodes for all disksets that require mediators. Those two nodes must master those disksets.
Mediators cannot be configured for disksets that do not meet the two-string and two-host requirements.
See the mediator(7D) man page for details.
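For example, the following sketch configures the two nodes that master a diskset as its mediator hosts. The diskset and node names are examples only:

```
# Add the two mastering nodes as mediator hosts for the diskset
metaset -s nfsset -a -m phys-schost-1 phys-schost-2
```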
/kernel/drv/md.conf settings – All Solstice DiskSuite metadevices or Solaris Volume Manager volumes used by each diskset are created in advance, at reconfiguration boot time. This reconfiguration is based on the configuration parameters that exist in the /kernel/drv/md.conf file.
All cluster nodes must have identical /kernel/drv/md.conf files, regardless of the number of disksets that are served by each node. Failure to follow this guideline can result in serious Solstice DiskSuite/Solaris Volume Manager errors and possible loss of data.
You must modify the nmd and md_nsets fields as follows to support a Sun Cluster configuration:
md_nsets – The md_nsets field defines the total number of disksets that can be created for a system to meet the needs of the entire cluster. Set the value of md_nsets to the expected number of disksets in the cluster plus one additional diskset. Solstice DiskSuite/Solaris Volume Manager software uses the additional diskset to manage the private disks on the local host. The private disks are those metadevices or volumes that are not in the local diskset.
The maximum number of disksets that are allowed per cluster is 32. This number allows for 31 disksets for general use plus one diskset for private disk management. The default value of md_nsets is 4.
nmd – The nmd field defines the number of metadevices or volumes that are created for each diskset. Set the value of nmd to the predicted highest value of metadevice or volume name that is used by any one of the disksets in the cluster. For example, if a cluster uses 10 metadevices or volumes in its first 15 disksets, but 1000 metadevices or volumes in the 16th diskset, set the value of nmd to at least 1000. Also, the value of nmd must be large enough to ensure that enough numbers exist for each device-ID name. The number must also be large enough to ensure that each local metadevice name or local volume name can be unique throughout the cluster.
The highest allowed value of a metadevice or volume name per diskset is 8192. The default value of nmd is 128.
Set these fields at installation time to allow for all predicted future expansion of the cluster. Increasing the value of these fields after the cluster is in production is time consuming, because the change requires a reconfiguration reboot for each node. Raising these values later also increases the possibility of inadequate space allocation in the root (/) file system to create all of the requested devices.
At the same time, keep the values of the nmd and md_nsets fields as low as possible. Memory structures exist for all possible devices as determined by nmd and md_nsets, even if you have not created those devices. For optimal performance, keep the values of nmd and md_nsets only slightly higher than the number of metadevices or volumes that you plan to use.
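For example, a cluster that expects at most 16 disksets and metadevice or volume names no higher than d1023 might use the following /kernel/drv/md.conf sketch on every node. The values shown are examples only:

```
# /kernel/drv/md.conf (excerpt) -- values are examples only
# md_nsets = 16 expected disksets + 1 for private disk management
# nmd = 1024 covers metadevice or volume names up to d1023
name="md" parent="pseudo" nmd=1024 md_nsets=17;
```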
See “System and Startup Files” in Solstice DiskSuite 4.2.1 Reference Guide or “System Files and Startup Files” in Solaris Volume Manager Administration Guide for more information about the md.conf file.
Consider the following points when you plan VERITAS Volume Manager (VxVM) configurations.
Enclosure-Based Naming – Enclosure-Based Naming is a feature that was introduced in VxVM version 3.2. If you use Enclosure-Based Naming of devices, ensure that you use consistent device names on all cluster nodes that share the same storage. VxVM does not coordinate these names, so the administrator must ensure that VxVM assigns the same names to the same devices from different nodes. Failure to assign consistent names does not interfere with correct cluster behavior. However, inconsistent names greatly complicate cluster administration and greatly increase the possibility of configuration errors, potentially leading to loss of data.
Root-disk group – You must create a default root-disk group (rootdg) on each node. The rootdg disk group can be created on the following disks:
The root disk, which must be encapsulated
One or more local nonroot disks, which you can encapsulate or initialize
A combination of root and local nonroot disks
The rootdg disk group must be local to the node.
Encapsulation – Disks to be encapsulated must have two disk-slice table entries free.
Number of volumes – At the time a disk device group is created, estimate the maximum number of volumes that the disk device group will use.
If the number of volumes is less than 1000, you can use default minor numbering.
If the number of volumes is 1000 or greater, you must carefully plan the way in which minor numbers are assigned to disk device group volumes. No two disk device groups can have overlapping minor number assignments. See the sketch after this list for one way to renumber a disk group.
Dirty Region Logging – Using Dirty Region Logging (DRL) decreases volume recovery time after a node failure. Using DRL might decrease I/O throughput.
Dynamic Multipathing (DMP) – DMP is not supported on Sun Cluster configurations. If you use VxVM in a configuration with multiple paths per node, then you must use another multipathing solution, such as Sun StorEdge Traffic Manager or EMC PowerPath. However, having DMP enabled on systems with only a single path per node poses no problems.
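The following command is a sketch of reassigning the base minor number of a disk group to avoid overlapping assignments, assuming the vxdg reminor operation that VxVM provides. The disk group name and base minor number are examples only:

```
# Renumber disk group dg2 so that its volume minor numbers start at 2000
# (disk group name and base minor number are examples only)
vxdg reminor dg2 2000
```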
Logging is required for cluster file systems. Sun Cluster software supports the following choices of file-system logging:
Solaris UFS logging – See the mount_ufs(1M) man page for more information.
Solstice DiskSuite trans-metadevice logging or Solaris Volume Manager transactional-volume logging – See “Creating DiskSuite Objects” in Solstice DiskSuite 4.2.1 User's Guide or “Transactional Volumes (Overview)” in Solaris Volume Manager Administration Guide for more information.
VERITAS File System (VxFS) logging – See the mount_vxfs man page provided with VxFS software for more information.
The following table lists the file-system logging supported by each volume manager.
Table 1–5 Supported File System Logging Matrix
Volume Manager |
Supported File System Logging |
---|---|
Solstice DiskSuite/Solaris Volume Manager |
Solaris UFS logging, Solstice DiskSuite trans-metadevice logging or Solaris Volume Manager transactional-volume logging, VxFS logging |
VERITAS Volume Manager |
Solaris UFS logging, VxFS logging |
Consider the following points when you choose between Solaris UFS logging and Solstice DiskSuite trans-metadevice logging/Solaris Volume Manager transactional-volume logging:
Solaris Volume Manager transactional-volume logging (formerly Solstice DiskSuite trans-metadevice logging) is scheduled to be removed from the Solaris operating environment in an upcoming Solaris release. Solaris UFS logging provides the same capabilities but superior performance, as well as lower system administration requirements and overhead.
Solaris UFS log size – Solaris UFS logging always allocates the log by using free space on the UFS file system. The log size depends on the size of the file system.
On file systems less than 1 Gbyte, the log occupies 1 Mbyte.
On file systems 1 Gbyte or greater, the log occupies 1 Mbyte per Gbyte of file system, to a maximum of 64 Mbytes.
Log metadevice/transactional volume – A Solstice DiskSuite trans metadevice or Solaris Volume Manager transactional volume manages UFS logging. The logging device component of a trans metadevice or transactional volume is a metadevice or volume that you can mirror and stripe. You can create a maximum 1-Gbyte log size, although 64 Mbytes is sufficient for most file systems. The minimum log size is 1 Mbyte.
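If you use trans-metadevice logging despite the planned removal that is noted above, the following sketch shows the general form of creating a trans metadevice. The names d10, d11, and d12 are hypothetical:

```
# Create trans metadevice d10 with master device d11 and log device d12
# (metadevice names are examples only)
metainit d10 -t d11 d12
```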
This section provides guidelines for planning the mirroring of your cluster configuration.
Mirroring all multihost disks in a Sun Cluster configuration enables the configuration to tolerate single-disk failures. Sun Cluster software requires that you mirror all multihost disks across disk expansion units. You do not need to use software mirroring if the storage device provides hardware RAID as well as redundant paths to disks.
Consider the following points when you mirror multihost disks.
Separate disk expansion units – Each submirror of a given mirror or plex should reside in a different multihost disk-expansion unit. A sketch of this layout follows this list.
Disk space – Mirroring doubles the amount of necessary disk space.
Three-way mirroring – Solstice DiskSuite/Solaris Volume Manager software and VERITAS Volume Manager (VxVM) support three-way mirroring. However, Sun Cluster software requires only two-way mirroring.
Number of metadevices or volumes – Under Solstice DiskSuite/Solaris Volume Manager software, mirrors consist of other Solstice DiskSuite metadevices or Solaris Volume Manager volumes such as concatenations or stripes. Large configurations might contain a large number of metadevices or volumes.
Differing disk sizes – If you mirror to a disk of a different size, your mirror capacity is limited to the size of the smallest submirror or plex.
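The following Solstice DiskSuite/Solaris Volume Manager sketch builds a two-way mirror whose submirrors use disks on different controllers, which illustrates the separate-expansion-units guideline. All metadevice and disk names are examples only:

```
# Build submirrors from disks in different expansion units (names are examples)
metainit d20 1 1 c1t0d0s0      # submirror on controller c1
metainit d21 1 1 c2t0d0s0      # submirror on controller c2
metainit d2 -m d20             # create the mirror with the first submirror
metattach d2 d21               # attach the second submirror
```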
For more information about multihost disks, see the Sun Cluster 3.1 10/03 Concepts Guide.
Add this planning information to the Local File System Layout Worksheet.
For maximum availability, mirror root (/), /usr, /var, /opt, and swap on the local disks. Under VxVM, you encapsulate the root disk and mirror the generated subdisks. However, Sun Cluster software does not require that you mirror the root disk.
Before you decide whether to mirror the root disk, consider the risks, complexity, cost, and service time for the various alternatives that concern the root disk. No single mirroring strategy works for all configurations. You might want to consider your local Sun service representative's preferred solution when you decide whether to mirror root.
See your volume-manager documentation and Installing and Configuring Solstice DiskSuite/Solaris Volume Manager Software or Installing and Configuring VxVM Software for instructions on how to mirror the root disk.
Consider the following points when you decide whether to mirror the root disk.
Boot disk – You can set up the mirror to be a bootable root disk. You can then boot from the mirror if the primary boot disk fails.
Complexity – Mirroring the root disk adds complexity to system administration and complicates booting in single-user mode.
Backups – Regardless of whether you mirror the root disk, you also should perform regular backups of root. Mirroring alone does not protect against administrative errors. Only a backup plan enables you to restore files that have been accidentally altered or deleted.
Quorum devices – Do not use a disk that was configured as a quorum device to mirror a root disk.
Quorum – Under Solstice DiskSuite/Solaris Volume Manager software, in failure scenarios in which state database quorum is lost, you cannot reboot the system until maintenance is performed. See your Solstice DiskSuite/Solaris Volume Manager documentation for information about the state database and state database replicas.
Separate controllers – Highest availability includes mirroring the root disk on a separate controller.
Secondary root disk – With a mirrored root disk, the primary root disk can fail but work can continue on the secondary (mirror) root disk. Later, the primary root disk might return to service, for example, after a power cycle or transient I/O errors. Subsequent boots are then performed by using the primary root disk that is specified in the OpenBoot™ PROM boot-device field. In this situation, no manual repair task occurs, but the drive starts working well enough to boot. With Solstice DiskSuite/Solaris Volume Manager, a resync does occur. A resync requires a manual step when the drive is returned to service.
If changes were made to any files on the secondary (mirror) root disk, they would not be reflected on the primary root disk during boot time. This condition would cause a stale submirror. For example, changes to the /etc/system file would be lost. With Solstice DiskSuite/Solaris Volume Manager, some administrative commands might have changed the /etc/system file while the primary root disk was out of service.
The boot program does not check whether the system is booting from a mirror or from an underlying physical device. The mirroring becomes active partway through the boot process, after the metadevices or volumes are loaded. Before this point, the system is therefore vulnerable to stale submirror problems.