Sun Cluster 2.2 Software Installation Guide

2.2.8 Planning Your File System Layout on the Multihost Disks

Sun Cluster does not require any specific disk layout or file system size. The requirements for the file system hierarchy are dependent on the volume management software you are using.

Regardless of your volume management software, Sun Cluster requires at least one file system per disk group to serve as the HA administrative file system. This administrative file system is generally mounted on /logicalhost, and must be a minimum of 10 Mbytes. It is used to store private directories containing data service configuration information.

For clusters using Solstice DiskSuite, you need to create a metadevice to contain the HA administrative file system. The HA administrative file system should be configured the same as your other multihost file systems, that is, it should be mirrored and set up as a trans device.
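For illustration, the following md.tab fragment is a minimal sketch of one such setup; the diskset name hahost1, the metadevice numbers, and the disk names are hypothetical, and the slices follow the layout shown later in Table 2-4:

    # Trans device for the HA administrative file system
    hahost1/d10 -t hahost1/d11 hahost1/d14
    # Mirrored UFS master (9-Mbyte slice 5 on each of two controllers)
    hahost1/d11 -m hahost1/d12
    hahost1/d12 1 1 c1t0d0s5
    hahost1/d13 1 1 c2t0d0s5
    # Mirrored UFS log (2-Mbyte slice 6 on each of two controllers)
    hahost1/d14 -m hahost1/d15
    hahost1/d15 1 1 c1t0d0s6
    hahost1/d16 1 1 c2t0d0s6

After metainit creates these one-way mirrors, the second submirrors (d13 and d16) would be attached with metattach to complete the mirroring.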

For clusters using SSVM or CVM, Sun Cluster creates the HA administrative file system on a volume named dg-stat where dg is the name of the disk group in which the volume is created. dg is usually the first disk group in the list of disk groups specified when defining a logical host.

Consider the points in the following subsections when planning file system size and disk layout.

2.2.8.1 File Systems With Solstice DiskSuite

Solstice DiskSuite software requires some additional space on the multihost disks and imposes some restrictions on its use. For example, if you are using UNIX file system (UFS) logging under Solstice DiskSuite, one to two percent of each multihost disk must be reserved for metadevice state database replicas and UFS logging. Refer to Appendix B, Configuring Solstice DiskSuite, and to the Solstice DiskSuite documentation for specific guidelines and restrictions.

All metadevices used by each shared diskset are created in advance, at reconfiguration boot time, based on settings in the md.conf file. The fields in the md.conf file are described in the Solstice DiskSuite documentation. The two fields used in the Sun Cluster configuration are md_nsets and nmd. The md_nsets field defines the number of disksets, and the nmd field defines the number of metadevices to create for each diskset. You should set these fields at install time to allow for all predicted future expansion of the cluster.

Extending the Solstice DiskSuite configuration after the cluster is in production is time consuming because it requires a reconfiguration reboot for each node and always carries the risk that there will not be enough space allocated in the root (/) file system to create all of the requested devices.

The value of md_nsets must be set to the expected number of logical hosts in the cluster, plus one to allow Solstice DiskSuite to manage the private disks on the local host (that is, those metadevices that are not in a shared diskset).

The value of nmd must be set to the predicted largest number of metadevices used by any one of the disksets in the cluster. For example, if a cluster uses 10 metadevices in its first 15 disksets, but 1000 metadevices in the 16th diskset, nmd must be set to at least 1000.
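As a sketch, suppose a cluster is expected to grow to four logical hosts, with no diskset ever using more than 300 metadevices; these values are hypothetical. The matching line in the /kernel/drv/md.conf file on every node would read:

    name="md" parent="pseudo" nmd=300 md_nsets=5;

Here md_nsets is 5 (four disksets plus one for the local diskset) and nmd is sized to the largest diskset.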


Caution -

All cluster nodes (or cluster pairs in the cluster pair topology) must have identical md.conf files, regardless of the number of logical hosts served by each node. Failure to follow this guideline can result in serious Solstice DiskSuite errors and possible loss of data.
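One simple way to verify this, as a sketch, is to compare a checksum of the file taken on each node:

    # cksum /kernel/drv/md.conf

The checksum and size reported must match on every node.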


Consider these points when planning your Solstice DiskSuite file system layout. In general, each multihost disk should be partitioned as described in Table 2-3.

Table 2-3 Solstice DiskSuite Disk Partitioning

Slice   Description
7       2 Mbytes, reserved for Solstice DiskSuite
6       UFS logs
0       Remainder of the disk
2       Overlaps Slices 6 and 0


Note -

The overlap of Slices 6 and 0 by Slice 2 is used for raw devices where there are no UFS logs.


In addition, the first drive on each of the first two controllers in each of the disksets should be partitioned as described in Table 2-4.

Table 2-4 Multihost Disk Partitioning for the First Drive on the First Two Controllers

Slice   Description
7       2 Mbytes, reserved for Solstice DiskSuite
6       2 Mbytes, UFS log for HA administrative file systems
5       9 Mbytes, UFS master for HA administrative file systems
4       UFS logs
0       Remainder of the disk
2       Overlaps Slices 6 and 0

Slice 7 is always reserved for use by Solstice DiskSuite as the first or last 2 Mbytes on each multihost disk.

2.2.8.2 File Systems With VERITAS VxFS

You can create UNIX File System (UFS) or Veritas File System (VxFS) file systems in the disk groups of logical hosts. When a logical host is mastered on a cluster node, the file systems associated with the disk groups of the logical host are mounted on the specified mount points of the mastering node.
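As a sketch, with a hypothetical disk group hahost1 and volume vol01, a UFS file system would be created on the raw volume device with:

    # newfs /dev/vx/rdsk/hahost1/vol01

and a VxFS file system with:

    # mkfs -F vxfs /dev/vx/rdsk/hahost1/vol01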

When you reconfigure logical hosts, Sun Cluster must check the file systems before mounting them by running the fsck command. Although fsck checks UFS file systems in non-interactive parallel mode, the checks still take time, and this slows the reconfiguration process. VxFS drastically reduces the file system check time, especially if the configuration contains large file systems (greater than 500 Mbytes) used for data services.

When setting up mirrored volumes, always add a Dirty Region Log (DRL) to decrease volume recovery time in the event of a system crash. When mirrored volumes are used in clusters, DRL must be assigned for volumes greater than 500 Mbytes.
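For example, given a hypothetical disk group hahost1 containing a mirrored volume vol01, a log can be added with vxassist (for a mirrored volume, addlog adds a DRL by default):

    # vxassist -g hahost1 addlog vol01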

With SSVM and CVM, it is important to estimate the maximum number of volumes that will be used by any given disk group at the time the disk group is created. If the number is less than 1000, default minor numbering can be used. Otherwise, you must carefully plan the way in which minor numbers are assigned to disk group volumes. It is important that no two disk groups shared by the same nodes have overlapping minor number assignments.

As long as default numbering can be used and all disk groups are currently imported, it is not necessary to use the minor option to the vxdg init command at disk group creation time. Otherwise, the minor option must be used to prevent overlapping volume minor number assignments. It is possible to modify the minor numbering later, but doing so might require you to reboot and import the disk group again. Refer to the vxdg(1M) man page for details.
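As an illustrative sketch (the disk group name, base minor number, and disk name are all hypothetical; see the vxdg(1M) man page for the exact syntax on your release):

    # vxdg init hahost2 minor=2000 hahost201=c2t0d0s2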

2.2.8.3 Mount Information

The /etc/vfstab file contains the mount points of file systems residing on local devices. For a multihost file system used for a logical host, all the nodes that can potentially master the logical host should possess the mount information.

The mount information for a logical host's file system is kept in a separate file on each node, named /etc/opt/SUNWcluster/conf/hanfs/vfstab.logicalhost. The format of this file is identical to the /etc/vfstab file for ease of maintenance, though not all fields are used.
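For example, for a hypothetical SSVM-based logical host hahost1, the entry for the HA administrative file system in /etc/opt/SUNWcluster/conf/hanfs/vfstab.hahost1 might read (the ufs file system type here is an assumption):

    /dev/vx/dsk/hahost1/hahost1-stat /dev/vx/rdsk/hahost1/hahost1-stat /hahost1 ufs - no -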


Note -

You must keep the vfstab.logicalhost file consistent among all nodes of the cluster. Use the rcp command or file transfer protocol (FTP) to copy the file to the other nodes of the cluster. Alternatively, edit the file on all nodes simultaneously by using crlogin or ctelnet.
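For example (the node and logical host names are hypothetical):

    # rcp /etc/opt/SUNWcluster/conf/hanfs/vfstab.hahost1 \
          phys-host2:/etc/opt/SUNWcluster/conf/hanfs/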


The same file system cannot be mounted by more than one node at the same time, because a file system can be mounted only if the corresponding disk group has been imported by the node. The consistency and uniqueness of disk group imports and logical host mastery are enforced by the cluster framework's logical host reconfiguration sequence.

2.2.8.4 Booting From a SPARCstorage Array

Sun Cluster supports booting from a private or shared disk inside a SPARCstorage Array.

Consider these points when using a boot disk in an SSA: