This chapter provides conceptual information about disk sets. For information about performing related tasks, see Chapter 20, Disk Sets (Tasks).
A shared disk set, or simply disk set, is a set of disk drives that contain volumes and hot spares and that can be shared, exclusively but not at the same time, by multiple hosts. Additionally, disk sets provide a separate namespace within which Solaris Volume Manager volumes can be managed.
A disk set supports data redundancy and data availability. If one host fails, another host can take over the failed host's disk set. (This type of configuration is known as a failover configuration.) Although each host can control the set of disks, only one host can control it at a time.
Disk sets are supported on both SPARC based and x86 based platforms.
Disk sets are intended, in part, for use with Sun Cluster, Solstice HA (High Availability), or another supported third-party HA framework. Solaris Volume Manager by itself does not provide all the functionality necessary to implement a failover configuration.
In addition to the shared disk set, each host has a local disk set. The local disk set consists of all of the disks on a host that are not in a shared disk set. A local disk set belongs exclusively to a specific host. The local disk set contains the state database for that specific host's configuration.
Volumes and hot spare pools in a shared disk set must be built on drives from within that disk set. Once you have created a volume within the disk set, you can use the volume just as you would a physical slice. However, disk sets do not support mounting file systems from the /etc/vfstab file.
A file system that resides on a volume in a disk set cannot be mounted automatically at boot with the /etc/vfstab file. The necessary disk set RPC daemons (rpc.metad and rpc.metamhd) do not start early enough in the boot process to permit this. Additionally, the ownership of a disk set is lost during a reboot.
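Because /etc/vfstab cannot be used, such a file system must be mounted manually after the disk set has been taken. A minimal sketch, assuming a hypothetical disk set named blue, a volume d0, and a mount point /export/data (none of these names come from the source):

```shell
# Take ownership of the disk set first; mounting is only possible
# while this host owns the set. Names here are illustrative.
metaset -s blue -t
mount /dev/md/blue/dsk/d0 /export/data
```

An HA framework such as Sun Cluster typically automates this take-and-mount sequence as part of failover.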
Similarly, volumes and hot spare pools in the local disk set can consist only of drives from within the local disk set.
When you add disks to a disk set, Solaris Volume Manager automatically creates the state database replicas on the disk set. When a drive is accepted into a disk set, Solaris Volume Manager might repartition the drive so that the state database replica for the disk set can be placed on the drive (see Automatic Disk Partitioning).
Unlike local disk set administration, you do not need to manually create or delete disk set state databases. Solaris Volume Manager places one state database replica (on slice 7) on each drive across all drives in the disk set, up to a maximum of 50 total replicas in the disk set.
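The commands involved can be sketched as follows, assuming a hypothetical disk set named blue and drives c1t6d0 and c2t6d0; metaset adds the drives (repartitioning them as needed), and metadb -s displays the replicas that were placed on them:

```shell
# Add two drives to the disk set "blue"; Solaris Volume Manager
# repartitions each drive as needed and places a state database
# replica on its slice 7. Set and drive names are illustrative.
metaset -s blue -a c1t6d0 c2t6d0

# Display the state database replicas that were created for the set.
metadb -s blue
```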
Although disk sets are supported in single-host configurations, they are often not appropriate for “local” (not dual-connected) use. Two common exceptions are the use of disk sets to provide a more manageable namespace for logical volumes, and the use of disk sets to more easily manage storage on a Storage Area Network (SAN) fabric (see Scenario—Disk Sets).
When you add a new disk to a disk set, Solaris Volume Manager checks the disk format and, if necessary, repartitions the disk to ensure that the disk has an appropriately configured slice 7 with adequate space for a state database replica. The precise size of slice 7 depends on the disk geometry, but it will be no less than 4 Mbytes, and probably closer to 6 Mbytes (depending on where the cylinder boundaries lie).
The minimum size for slice 7 might change in the future, based on a variety of factors, including the size of the state database replica and the information to be stored in the state database replica.
For use in disk sets, disks must have a slice 7 that meets these criteria:
Starts at sector 0
Includes enough space for the disk label and state database replicas
Cannot be mounted
Does not overlap with any other slices, including slice 2
After you add a drive to a disk set, you may repartition it as necessary, provided that slice 7 is not altered in any way.
The minimum size for slice 7 varies with disk geometry, but it is always equal to or greater than 4 Mbytes.
The following output from the prtvtoc command shows a disk before it is added to a disk set.
[root@lexicon:apps]$ prtvtoc /dev/rdsk/c1t6d0s0
* /dev/rdsk/c1t6d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     133 sectors/track
*      27 tracks/cylinder
*    3591 sectors/cylinder
*    4926 cylinders
*    4924 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0   4111695   4111694
       1      3    01    4111695   1235304   5346998
       2      5    01          0  17682084  17682083
       3      0    00    5346999   4197879   9544877
       4      0    00    9544878   4197879  13742756
       5      0    00   13742757   3939327  17682083
If you have disk sets that you upgraded from Solstice DiskSuite software, the default state database replica size on those sets will be 1034 blocks, not the 8192 block size from Solaris Volume Manager. Also, slice 7 on the disks that were added under Solstice DiskSuite will be correspondingly smaller than slice 7 on disks that were added under Solaris Volume Manager.
After you add the disk to a disk set, the output of prtvtoc looks like the following:
[root@lexicon:apps]$ prtvtoc /dev/rdsk/c1t6d0s0
* /dev/rdsk/c1t6d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     133 sectors/track
*      27 tracks/cylinder
*    3591 sectors/cylinder
*    4926 cylinders
*    4924 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      0    00      10773  17671311  17682083
       7      0    01          0     10773     10772
[root@lexicon:apps]$
Disk set component names are similar to other Solaris Volume Manager component names, but the disk set name is part of the name.
Volume path names include the disk set name after /dev/md/ and before the actual volume name in the path.
The following table shows some example disk set volume names.
Table 19–1 Example Volume Names
/dev/md/blue/dsk/d0      Block volume d0 in disk set blue
/dev/md/blue/dsk/d1      Block volume d1 in disk set blue
/dev/md/blue/rdsk/d126   Raw volume d126 in disk set blue
/dev/md/blue/rdsk/d127   Raw volume d127 in disk set blue
Figure 19–1 shows an example configuration that uses two disk sets.
In this configuration, Host A and Host B share disk sets A and B. They each have their own local disk set, which is not shared. If Host A fails, Host B can take over control of Host A's shared disk set (Disk set A). Likewise, if Host B fails, Host A can take control of Host B's shared disk set (Disk set B).
When working with disk sets, consider the points in the following sections, Background Information for Disk Sets and Administering Disk Sets.
Solaris Volume Manager must be configured on each host that will be connected to the disk set.
Each host must have its local state database set up before you can create disk sets.
To create and work with a disk set in a clustering environment, root must be a member of Group 14, or the /.rhosts file must contain an entry for the other host name (on each host).
To perform maintenance on a disk set, a host must be the owner of the disk set or have reserved the disk set. (A host takes implicit ownership of the disk set by putting the first drives into the set.)
You cannot add a drive that is in use to a disk set. Before you add a drive, make sure it is not currently being used for a file system, database, or any other application.
Do not add a drive with existing data that you want to preserve to a disk set. The process of adding the disk to the disk set repartitions the disk and destroys existing data.
All disks that you plan to share between hosts in the disk set must be connected to each host and must have the exact same path, driver, and name on each host. Specifically, a shared disk drive must be seen on both hosts at the same device number (c#t#d#). If the numbers are not the same on both hosts, you will see the message “drive c#t#d# is not common with host xxx” when attempting to add drives to the disk set. The shared disks must use the same driver name (ssd). See How to Add Drives to a Disk Set for more information on setting up shared disk drives in a disk set.
The default total number of disk sets on a system is 4. You can increase this value up to 32 by editing the /kernel/drv/md.conf file, as described in How to Increase the Number of Default Disk Sets. The number of shared disk sets is always one less than the md_nsets value, because the local set is included in md_nsets.
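The relevant md.conf entry might look like the following; this is a sketch, and the nmd value shown is illustrative. A reconfiguration reboot is required for the change to take effect, and because the local set counts toward md_nsets, a value of 32 allows 31 shared disk sets:

```
# /kernel/drv/md.conf (excerpt)
# md_nsets counts the local set, so 32 permits 31 shared disk sets.
name="md" parent="pseudo" nmd=128 md_nsets=32;
```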
Unlike local volume administration, it is not necessary to create or delete state database replicas manually on the disk set. Solaris Volume Manager tries to balance a reasonable number of replicas across all drives in a disk set.
When drives are removed from a disk set, Solaris Volume Manager rebalances the state database replicas across the remaining drives. Later, if necessary, you can change the replica layout with the metadb command.
Disk sets can be created and configured by using the Solaris Volume Manager command-line interface (the metaset command) or the Enhanced Storage tool within the Solaris Management Console.
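From the command line, a two-host disk set could be created as follows; the set and host names are illustrative, not from the source:

```shell
# Create the shared disk set "blue" and add both hosts to it.
# The host running the command takes implicit ownership once
# the first drives are added.
metaset -s blue -a -h host1 host2
```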
After drives are added to a disk set, the disk set can be reserved (or taken) and released by hosts in the disk set. When a disk set is reserved by a host, the other host in the disk set cannot access the data on the drives in the disk set. To perform maintenance on a disk set, a host must be the owner of the disk set or have reserved the disk set. A host takes implicit ownership of the disk set by putting the first drives into the set.
Before a host can use drives in a disk set, the host must reserve the disk set. There are two methods of reserving a disk set:
Safely - When you safely reserve a disk set, Solaris Volume Manager attempts to take the disk set, and the other host attempts to release the disk set. The release (and therefore the reservation) might fail.
Forcibly - When you forcibly reserve a disk set, Solaris Volume Manager reserves the disk set whether or not another host currently has the set reserved. This method is generally used when a host in the disk set is down or not communicating. All disks within the disk set are taken over. The state database is read in on the host performing the reservation and the shared volumes configured in the disk set become accessible. If the other host had the disk set reserved at this point, it would panic due to reservation loss.
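The two reservation methods correspond to the metaset -t command, with -f added for a forcible take. A sketch using the example set blue:

```shell
# Safe take: fails if the other host cannot release the set.
metaset -s blue -t

# Forcible take: use only when the other host is down or not
# communicating; that host will panic if it still held the set.
metaset -s blue -t -f
```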
Normally, two hosts in a disk set cooperate with each other to ensure that drives in a disk set are reserved by only one host at a time. A normal situation is defined as both hosts being up and communicating with each other.
If a drive has been determined unexpectedly not to be reserved (perhaps because another host using the disk set forcibly took the drive), the host will panic. This behavior helps to minimize data loss which would occur if two hosts were to simultaneously access the same drive.
For more information about taking or reserving a disk set, see How to Take a Disk Set.
Releasing a disk set can be useful when you perform maintenance on the physical drives in the disk set. When a disk set is released, it cannot be accessed by the host. If both hosts in a disk set release the set, neither host in the disk set can access the drives in the disk set.
For more information about releasing a disk set, see How to Release a Disk Set.
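A release is performed with metaset -r, and running metaset with only the set name shows current ownership. A sketch for the example set blue:

```shell
# Release the disk set so maintenance can proceed, or so the
# other host can take it.
metaset -s blue -r

# Display the status and current owner of the set.
metaset -s blue
```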
The following example, drawing on the sample system shown in Chapter 4, Configuring and Using Solaris Volume Manager (Scenario), describes how disk sets can be used to manage storage that resides on a SAN (Storage Area Network) fabric.
Assume that the sample system has an additional controller that connects to a fiber switch and SAN storage. Storage on the SAN fabric is not available to the system as early in the boot process as other devices, such as SCSI and IDE disks, so Solaris Volume Manager would report logical volumes on the fabric as unavailable at boot. However, by adding the storage to a disk set, and then using the disk set tools to manage the storage, this problem with boot-time availability is avoided. In addition, the fabric-attached storage can be easily managed in a separate namespace, controlled by the disk set, apart from the local storage.