Sun Cluster does not require any specific disk layout or file system size. The requirements for the file system hierarchy depend on the volume management software you are using.
Regardless of your volume management software, Sun Cluster requires at least one file system per disk group to serve as the HA administrative file system. This administrative file system should always be mounted on /logicalhost, and must be a minimum of 10 Mbytes. It is used to store private directories containing data service configuration information.
For clusters using Solstice DiskSuite, you need to create a metadevice to contain the HA administrative file system. The HA administrative file system should be configured the same as your other multihost file systems, that is, it should be mirrored and set up as a trans device.
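As a sketch only, the following commands illustrate one way to build such a mirrored trans metadevice in a diskset named hahost1, using slices 4 and 5 of the first drive on each of the first two controllers (the layout shown later in Table 2-3). The diskset name, metadevice numbers, and disk names are examples; substitute the values from your own configuration.

```
# Illustrative only: mirrored trans (logging) metadevice for the
# HA administrative file system in diskset "hahost1".
metainit -s hahost1 d11 1 1 c1t0d0s4    # UFS master submirror, controller 1
metainit -s hahost1 d12 1 1 c2t0d0s4    # UFS master submirror, controller 2
metainit -s hahost1 d10 -m d11          # master mirror
metattach -s hahost1 d10 d12            # attach second master submirror

metainit -s hahost1 d14 1 1 c1t0d0s5    # UFS log submirror, controller 1
metainit -s hahost1 d15 1 1 c2t0d0s5    # UFS log submirror, controller 2
metainit -s hahost1 d13 -m d14          # log mirror
metattach -s hahost1 d13 d15            # attach second log submirror

metainit -s hahost1 d1 -t d10 d13       # trans device: master + log
newfs /dev/md/hahost1/rdsk/d1           # build the HA administrative file system
```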
For clusters using VxVM, Sun Cluster creates the HA administrative file system on a volume named dg-stat where dg is the name of the disk group in which the volume is created. dg is usually the first disk group in the list of disk groups specified when defining a logical host.
Consider these points when planning file system size and disk layout:
When mirroring, lay out disks so that they are mirrored across disk controllers.
Partitioning or subdividing all similar disks identically simplifies administration.
Solstice DiskSuite software requires some additional space on the multihost disks and imposes some restrictions on its use. For example, if you are using UNIX file system (UFS) logging under Solstice DiskSuite, one to two percent of each multihost disk must be reserved for metadevice state database replicas and UFS logging. Refer to Appendix B, Configuring Solstice DiskSuite, and to the Solstice DiskSuite documentation for specific guidelines and restrictions.
All metadevices used by each shared diskset are created in advance, at reconfiguration boot time, based on settings found in the md.conf file. The fields in the md.conf file are described in the Solstice DiskSuite documentation. The two fields used in the Sun Cluster configuration are md_nsets and nmd. The md_nsets field defines the number of disksets, and the nmd field defines the number of metadevices to create for each diskset. Set these fields at install time to allow for all predicted future expansion of the cluster, as shown in the example following this discussion.
Extending the Solstice DiskSuite configuration after the cluster is in production is time-consuming because it requires a reconfiguration reboot for each node and always carries the risk that there will not be enough space allocated in the root (/) file system to create all of the requested devices.
The value of md_nsets must be set to the expected number of logical hosts in the cluster, plus one to allow Solstice DiskSuite to manage the private disks on the local host (that is, the metadevices in the local diskset that are not part of any shared diskset).
The value of nmd must be set to the largest number of metadevices predicted for any one diskset in the cluster. For example, if a cluster uses 10 metadevices in each of its first 15 disksets but 1000 metadevices in the 16th diskset, nmd must be set to at least 1000.
All cluster nodes (or cluster pairs in the cluster pair topology) must have identical md.conf files, regardless of the number of logical hosts served by each node. Failure to follow this guideline can result in serious Solstice DiskSuite errors and possible loss of data.
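As an illustration of these two settings, the fragment below shows an md.conf entry sized for a four-logical-host cluster (md_nsets=5) in which no diskset is expected to need more than 1000 metadevices. The values, and the assumption that md.conf lives in /kernel/drv, are examples only; take the exact syntax from your Solstice DiskSuite documentation.

```
# /kernel/drv/md.conf (illustrative values only)
# md_nsets = number of logical hosts (disksets) plus one for the local set
# nmd      = largest number of metadevices expected in any single diskset
name="md" parent="pseudo" nmd=1000 md_nsets=5;
```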
Consider these points when planning your Solstice DiskSuite file system layout:
The HA administrative file system cannot be grown using growfs(1M).
You must create mount points for other file systems at the /logicalhost level.
Your application might dictate a file system hierarchy and naming convention. Sun Cluster imposes no restrictions on file system naming, as long as the names do not conflict with directories required by the data services.
Use the partitioning scheme described in Table 2-2 for the majority of drives.
In general, if UFS logs are created, the default size for Slice 6 should be 1 percent of the size of the largest multihost disk found on the system.
Slice 2, which overlaps Slices 6 and 0, is used for raw devices when there are no UFS logs.
In addition, the first drive on each of the first two controllers in each of the disksets should be partitioned as described in Table 2-3.
Each disk group has an HA administrative file system associated with it. This file system is not NFS-shared; it is used for data service-specific state or configuration information.
Slice 7 is always reserved for use by Solstice DiskSuite as the first or last 2 Mbytes on each multihost disk.
Table 2-2 Multihost Disk Partitioning

| Slice | Description |
|---|---|
| 7 | 2 Mbytes, reserved for Solstice DiskSuite |
| 6 | UFS logs |
| 0 | Remainder of the disk |
| 2 | Overlaps Slices 6 and 0 |
Table 2-3 Multihost Disk Partitioning for the First Drive on the First Two Controllers

| Slice | Description |
|---|---|
| 7 | 2 Mbytes, reserved for Solstice DiskSuite |
| 5 | 2 Mbytes, UFS log for HA administrative file systems |
| 4 | 9 Mbytes, UFS master for HA administrative file systems |
| 6 | UFS logs |
| 0 | Remainder of the disk |
| 2 | Overlaps Slices 6 and 0 |
You can create UNIX File System (UFS) or VERITAS File System (VxFS) file systems in the disk groups of logical hosts. When a logical host is mastered on a cluster node, the file systems associated with the disk groups of the logical host are mounted on the specified mount points of the mastering node.
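Purely as a sketch with illustrative device and volume names, file systems might be created on the raw devices of a logical host's disk group as follows:

```
# Illustrative only: build file systems on volumes in disk group "dg1".
newfs /dev/vx/rdsk/dg1/vol01            # UFS file system
mkfs -F vxfs /dev/vx/rdsk/dg1/vol02     # VxFS file system
```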
When you reconfigure logical hosts, Sun Cluster must check the file systems before mounting them by running the fsck command. Although fsck checks UFS file systems in non-interactive parallel mode, the check still takes time and slows the reconfiguration process. VxFS drastically reduces the file system check time, especially if the configuration contains large file systems (greater than 500 Mbytes) used for data services.
When setting up mirrored volumes, always add a Dirty Region Log (DRL) to decrease volume recovery time in the event of a system crash. When mirrored volumes are used in clusters, a DRL must be assigned to volumes larger than 500 Mbytes.
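A hedged example of creating such a mirrored volume with a DRL using vxassist follows; the disk group, volume name, and size are illustrative, and the exact layout keywords can vary with your VxVM release.

```
# Illustrative only: mirrored 2-Gbyte volume with a dirty region log.
vxassist -g dg1 make vol01 2g layout=mirror,log

# A DRL can also be added to an existing mirrored volume.
vxassist -g dg1 addlog vol01
```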
With VxVM, it is important to estimate the maximum number of volumes that will be used by any given disk group at the time the disk group is created. If the number is less than 1000, default minor numbering can be used. Otherwise, you must carefully plan the way in which minor numbers are assigned to disk group volumes. It is important that no two disk groups shared by the same nodes have overlapping minor number assignments.
As long as default numbering can be used and all disk groups are currently imported, it is not necessary to use the minor option to the vxdg init command at disk group creation time. Otherwise, the minor option must be used to prevent overlapping volume minor number assignments. It is possible to modify the minor numbering later, but doing so might require a reboot and a re-import of the disk group. Refer to the vxdg(1M) man page for details.
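As a sketch only, non-overlapping minor number ranges might be assigned when the disk groups are created; the disk group names, disk media names, device names, and base minor numbers below are examples, so check the vxdg(1M) man page for the exact syntax supported by your VxVM release.

```
# Illustrative only: give each shared disk group its own base minor number
# so that volume device numbers never overlap when imported on one node.
vxdg init dg1 minor=33000 dg1d01=c1t0d0s2
vxdg init dg2 minor=34000 dg2d01=c2t0d0s2
```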
The /etc/vfstab file contains the mount points of file systems residing on local devices. For a multihost file system used for a logical host, all the nodes that can potentially master the logical host should possess the mount information.
The mount information for a logical host's file system is kept in a separate file on each node, named /etc/opt/SUNWcluster/conf/hanfs/vfstab.logicalhost. The format of this file is identical to the /etc/vfstab file for ease of maintenance, though not all fields are used.
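The following sketch shows what entries in such a file might look like for a logical host named hahost1 whose first VxVM disk group is dg1; the volume names, file system types, and field values are illustrative only.

```
# /etc/opt/SUNWcluster/conf/hanfs/vfstab.hahost1 (illustrative entries)
# device to mount          device to fsck             mount point    FS   pass  boot  options
/dev/vx/dsk/dg1/dg1-stat   /dev/vx/rdsk/dg1/dg1-stat  /hahost1       ufs  2     no    -
/dev/vx/dsk/dg1/vol01      /dev/vx/rdsk/dg1/vol01     /hahost1/data  ufs  2     no    -
```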
You must keep the vfstab.logicalhost file consistent among all nodes of the cluster. Use the rcp command or file transfer protocol (FTP) to copy the file to the other nodes of the cluster. Alternatively, edit all of the files simultaneously by using crlogin or ctelnet.
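For example (the hostname and logical host name are illustrative), after editing the file on one node it could be copied to a second node with:

```
rcp /etc/opt/SUNWcluster/conf/hanfs/vfstab.hahost1 \
    phys-host2:/etc/opt/SUNWcluster/conf/hanfs/vfstab.hahost1
```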
The same file system cannot be mounted by more than one node at the same time, because a file system can be mounted only if the corresponding disk group has been imported by the node. The consistency and uniqueness of the disk group imports and logical host mastery is enforced by the cluster framework logical host reconfiguration sequence.
Sun Cluster supports booting from a private or shared disk inside a SPARCstorage Array (SSA).
Consider these points when using a boot disk in an SSA:
Make sure that each cluster node's boot disk is on a different SSA. If nodes share a single SSA for their boot disks, the loss of a single controller would bring down all nodes.
For VxVM configurations, do not configure a boot disk and a quorum device on the same tray. This is especially true for a two-node cluster. If you place both on the same tray, the cluster loses one of its nodes as well as its quorum device when you remove the tray. If for any reason a reconfiguration happens on the surviving node during this time, the cluster is lost. If a controller containing the boot disk and the quorum disk becomes faulty, the node that has its boot disk on the bad controller inevitably hangs or crashes, causing the other node to reconfigure and abort, since the quorum device is inaccessible. (This is a likely scenario in a minimal configuration consisting of two SSAs with boot disks and no root disk mirroring.)
Mirroring the boot disks in a bootable SSA configuration is recommended. However, there is an impact on software upgrades: neither Solaris nor the volume manager software can be upgraded while the root disk is mirrored. In such configurations, perform upgrades carefully to avoid corrupting the root file system. Refer to "Mirroring Root (/)" for additional information on mirroring the root file system.