This chapter provides conceptual information about disk sets. For information about performing related tasks, see Chapter 21, Disk Sets (Tasks).
This chapter includes the following information:
A disk set is a set of physical storage volumes that contain logical volumes and hot spares. Volumes and hot spare pools must be built on drives from within that disk set. Once you have created a volume within the disk set, you can use the volume just as you would use a physical slice. You can use the volume to create and to mount a file system and to store data.
Disk sets are supported on both SPARC and x86 based platforms.
This section discusses the different types of disk sets available in Solaris Volume Manager.
Each host has a local disk set. The local disk set consists of all the disks on a host that are not in a named disk set. A local disk set belongs exclusively to a specific host. The local disk set contains the state database for that specific host's configuration. Volumes and hot spare pools in the local disk set consist only of drives from within the local disk set.
In addition to local disk sets, hosts can participate in named disk sets. A named disk set is any disk set that is not in the local disk set. You can implement the following types of named disk sets to manage volumes, depending on the configuration of your system:
A shared disk set can be shared by multiple hosts. Although a shared disk set is visible from all the participating hosts, only the owner of the disk set can access it. Each host can control a shared disk set, but only one host can control it at a time. Additionally, shared disk sets provide a distinct namespace within which volumes are managed.
A shared disk set supports data redundancy and data availability. If one host fails, another host can take over the failed host's disk set. This type of configuration is known as a failover configuration.
Shared disk sets are intended, in part, for use with Sun Cluster or another supported High Availability (HA) framework. Solaris Volume Manager by itself does not provide all the functionality necessary to implement a failover configuration.
Before the autotake feature became available in the Solaris 9 4/04 release, Solaris Volume Manager did not support the automatic mounting of file systems on disk sets through the /etc/vfstab file. Solaris Volume Manager required the system administrator to manually issue a disk set take command by using the metaset -s setname -t command before the file systems on the disk set could be accessed.
With the autotake feature enabled, you can set a disk set to be automatically taken by a host at boot time. This feature allows you to define the mount options in the /etc/vfstab file for file systems on volumes in the enabled disk set.
Only single-host disk sets support the autotake feature. The autotake feature requires that the disk set not be shared with any other hosts; if you attempt to enable the autotake feature on a shared disk set, the metaset -A command fails. However, after the other hosts are removed from the disk set, the autotake feature can be enabled on the now single-host disk set. Similarly, other hosts cannot be added to an autotake disk set. However, if the autotake feature is disabled, additional hosts can then be added to the disk set.
In a Sun Cluster environment, the autotake feature is disabled. Sun Cluster handles the take and release of a disk set.
For more information on the autotake feature, see the -A option description in metaset(1M).
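For example, the following minimal sketch shows how the feature might be enabled and used. The disk set name (blue), volume (d10), and mount point are illustrative, not taken from this chapter:

# metaset -s blue -A enable

With the feature enabled, a corresponding /etc/vfstab entry could then mount the volume's file system at boot:

/dev/md/blue/dsk/d10  /dev/md/blue/rdsk/d10  /export/blue  ufs  2  yes  -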
Starting with the Solaris 9 9/04 release, Solaris Volume Manager can manage storage in a Sun Cluster environment using multi-owner disk sets. Multi-owner disk sets allow multiple nodes in a cluster to share the ownership of disk sets and to simultaneously access the shared disks. All disks and volumes in a multi-owner disk set can be directly accessed by all the nodes in a cluster. Each multi-owner disk set contains a list of the nodes that have been added to the disk set. Consequently, each multi-owner disk set within a cluster configuration can have a different (and sometimes overlapping) set of nodes.
Each multi-owner disk set has a master node. The function of the master node is to manage and update the state database replica changes. Because each disk set has its own master node, multiple masters can exist simultaneously within a cluster. A master node is chosen in one of two ways: the first node to add a disk to the disk set becomes the master of that disk set, or, if the master node panics and fails, the node with the lowest node ID becomes the new master node.
Multi-owner disk sets work with Sun Cluster and with applications such as Oracle9i Real Application Clusters. For information about compatible releases of Sun Cluster, see http://wwws.sun.com/software/cluster. For more information on Solaris Volume Manager for Sun Cluster, see Chapter 4, Solaris Volume Manager for Sun Cluster (Overview).
Unlike local disk set administration, you do not need to manually create or delete state database replicas for named disk sets. Solaris Volume Manager places one state database replica (on slice 7) on each disk in the disk set, up to a maximum of 50 replicas per disk set.
When you add disks to a disk set, Solaris Volume Manager automatically creates the state database replicas on the disk set. When a disk is accepted into a disk set, Solaris Volume Manager might repartition the disk so that the state database replica for the disk set can be placed on the disk (see Automatic Disk Partitioning).
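As a minimal sketch, assuming a disk set named blue and a disk c1t6d0 (both names are illustrative), adding the disk triggers the automatic repartitioning and replica creation, and the metadb command then lists the replicas in the set:

# metaset -s blue -a c1t6d0
# metadb -s blue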
A file system that resides on a volume in a named disk set is not mounted automatically at boot time with the /etc/vfstab file unless the disk set is an autotake enabled disk set. The necessary Solaris Volume Manager RPC daemons (rpc.metad, rpc.metamedd, and rpc.metamhd) do not start early enough in the boot process.
Do not disable the Solaris Volume Manager RPC daemons in the /etc/inetd.conf file. They are configured to start by default. These daemons must remain enabled to allow Solaris Volume Manager to use its full functionality.
Additionally, when a system is rebooted, the ownership of a named disk set is lost unless the disk set is an autotake enabled disk set. For more information on the autotake feature, see Autotake Disk Sets.
Although disk sets are supported in single-host configurations, they are often not appropriate for “local” (not dual-connected) use. Two common exceptions are the use of disk sets to provide a more manageable namespace for logical volumes, and to more easily manage storage on a Storage Area Network (SAN) fabric (see Scenario—Disk Sets).
Disk sets can be created and configured by using the Solaris Volume Manager command-line interface (the metaset command) or the Enhanced Storage tool within the Solaris Management Console.
After disks are added to a shared disk set, the disk set can be reserved (or taken) and released by hosts in the disk set. When a disk set is reserved by a host, the other hosts in the disk set can read but cannot write data on the disks in the disk set. To perform maintenance on a disk set, a host must be the owner of the disk set or have reserved the disk set. A host takes implicit ownership of the disk set by putting the first disk into the set.
Disk sets, including disk sets created on a different system, can be imported into existing Solaris Volume Manager configurations using the metaimport command.
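For example, a shared disk set could be created and populated with a command sequence like the following sketch; the set name (blue), host names, and disk names are illustrative:

# metaset -s blue -a -h host1 host2
# metaset -s blue -a c1t6d0 c2t6d0
# metaset -s blue

The first command creates the disk set and adds both hosts, the second adds the disks, and the final command (with no other options) displays the set's hosts, current owner, and drives.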
Before a host can use the disks in a disk set, the host must reserve the disk set. There are two methods of reserving a disk set:
Safely - Before another host can safely reserve a disk set, the host that currently has the disk set reserved must release it. If a host attempts to take the disk set before the other host has released it, the reservation fails.
Forcibly - When you forcibly reserve a disk set, Solaris Volume Manager reserves the disk set whether or not another host currently has the set reserved. This method is generally used when a host in the disk set is down or not communicating. All disks within the disk set are taken over. The state database is read in on the host performing the reservation and the shared volumes configured in the disk set become accessible. If the other host had the disk set reserved at this point, it would panic due to reservation loss.
Normally, two hosts in a disk set cooperate with each other to ensure that the disks in a disk set are reserved by only one host at a time. A normal situation is defined as both hosts being up and communicating with each other.
If a disk has been determined unexpectedly not to be reserved (perhaps because another host using the disk set forcibly took the disk), the host will panic. This behavior helps to minimize data loss which would occur if two hosts were to simultaneously access the same disk.
For more information about taking or reserving a disk set, see How to Take a Disk Set.
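For example, assuming a disk set named blue, the two reservation methods correspond to the following commands. The forced take should be used with care, because it can cause a host that still holds the reservation to panic:

# metaset -s blue -t
# metaset -s blue -t -f

The first command takes the disk set safely; the second takes it forcibly.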
Releasing a disk set can be useful when you perform maintenance on the physical disks in the disk set. When a disk set is released, it cannot be accessed by the host. If both hosts in a disk set release the set, neither host in the disk set can access the disks in the disk set.
For more information about releasing a disk set, see How to Release a Disk Set.
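For example, the owning host could release the illustrative disk set blue as follows:

# metaset -s blue -r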
The metaimport command enables you to import disk sets into existing Solaris Volume Manager configurations that have device ID support in the disk set. You can also use the metaimport command to report on disk sets that are available for import.
The metaimport command does not import a disk in a disk set if the disk does not contain a volume or a state database replica. When you import a disk set to another system, you might find that a disk is missing from the disk set. This scenario occurs if a volume or a state database replica has not been added to the disk or has been deleted from the disk. For example, a maximum of 50 state database replicas are allowed per Solaris Volume Manager disk set. If you have 60 disks in a disk set, the 10 disks that do not contain a state database replica must each contain a volume in order to be imported with the disk set.
For tasks associated with importing a disk set, see Importing Disk Sets.
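As a sketch, assuming the disk to be imported is c1t6d0 and the new set name is newblue (both illustrative), the report and import steps could look like this:

# metaimport -r
# metaimport -s newblue c1t6d0

The first command reports the disk sets available for import; the second imports the disk set that includes c1t6d0 under the name newblue.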
When you add a new disk to a disk set, Solaris Volume Manager checks the disk format and, if necessary, repartitions the disk to ensure that the disk has an appropriately configured slice 7 with adequate space for a state database replica. The precise size of slice 7 depends on the disk geometry, but it will be no less than 4 Mbytes, and probably closer to 6 Mbytes (depending on where the cylinder boundaries lie).
The minimum size for slice 7 will likely change in the future, based on a variety of factors, including the size of the state database replica and the information to be stored in the state database replica.
For use in disk sets, disks must have a slice 7 that meets these criteria:
Starts at sector 0
Includes enough space for disk label and state database replicas
Cannot be mounted
Does not overlap with any other slices, including slice 2
If the existing partition table does not meet these criteria, Solaris Volume Manager will repartition the disk. A small portion of each disk is reserved in slice 7 for use by Solaris Volume Manager. The remainder of the space on each disk is placed into slice 0. Any existing data on the disks is lost by repartitioning.
After you add a disk to a disk set, you may repartition it as necessary, with the exception that slice 7 is not altered in any way.
The minimum size for slice 7 is variable, based on disk geometry, but is always equal to or greater than 4 Mbytes.
The following output from the prtvtoc command shows a disk before it is added to a disk set.
[root@lexicon:apps]$ prtvtoc /dev/rdsk/c1t6d0s0
* /dev/rdsk/c1t6d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     133 sectors/track
*      27 tracks/cylinder
*    3591 sectors/cylinder
*    4926 cylinders
*    4924 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First      Sector     Last
* Partition  Tag  Flags    Sector      Count     Sector   Mount Directory
       0      2    00           0    4111695    4111694
       1      3    01     4111695    1235304    5346998
       2      5    01           0   17682084   17682083
       3      0    00     5346999    4197879    9544877
       4      0    00     9544878    4197879   13742756
       5      0    00    13742757    3939327   17682083
If you have disk sets that you upgraded from Solstice DiskSuite software, the default state database replica size on those sets will be 1034 blocks, not the 8192 block size from Solaris Volume Manager. Also, slice 7 on the disks that were added under Solstice DiskSuite will be correspondingly smaller than slice 7 on disks that were added under Solaris Volume Manager.
After you add the disk to a disk set, the output of prtvtoc looks like the following:
[root@lexicon:apps]$ prtvtoc /dev/rdsk/c1t6d0s0
* /dev/rdsk/c1t6d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     133 sectors/track
*      27 tracks/cylinder
*    3591 sectors/cylinder
*    4926 cylinders
*    4924 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First      Sector     Last
* Partition  Tag  Flags    Sector      Count     Sector   Mount Directory
       0      0    00       10773   17671311   17682083
       7      0    01           0      10773      10772
[root@lexicon:apps]$
If disks you add to a disk set have acceptable slice 7s (that start at cylinder 0 and that have sufficient space for the state database replica), they will not be reformatted.
Disk set component names are similar to other Solaris Volume Manager component names, but the disk set name is part of the name.
Volume path names include the disk set name after /dev/md/ and before the actual volume name in the path.
The following table shows some example disk set volume names.
/dev/md/blue/dsk/d0      Block volume d0 in disk set blue
/dev/md/blue/dsk/d1      Block volume d1 in disk set blue
/dev/md/blue/rdsk/d126   Raw volume d126 in disk set blue
/dev/md/blue/rdsk/d127   Raw volume d127 in disk set blue
Similarly, hot spare pools have the disk set name as part of the hot spare name.
Figure 20–1 shows an example configuration that uses two disk sets.
In this configuration, Host A and Host B share disk sets red and blue. They each have their own local disk set, which is not shared. If Host A fails, Host B can take over control of Host A's shared disk set (Disk set red). Likewise, if Host B fails, Host A can take control of Host B's shared disk set (Disk set blue).
When working with disk sets, consider the following guidelines:
Solaris Volume Manager must be configured on each host that will be connected to the disk set.
Each host must have its local state database set up before you can create disk sets.
To create and work with a disk set in a clustering environment, root must be a member of Group 14 on all hosts, or the /.rhosts file on each host must contain an entry for the other host names associated with the disk set.
To perform maintenance on a disk set, a host must be the owner of the disk set or have reserved the disk set. A host takes implicit ownership of the disk set by putting the first disk into the disk set.
You cannot add a disk to a disk set that is in use for a file system, database or any other application. Before you add a disk, make sure that it is not currently being used.
Do not add to a disk set a disk containing existing data that you want to preserve. The process of adding the disk to the disk set repartitions the disk and destroys existing data.
The default total number of disk sets permitted on a system is 4. You can increase this value up to 32 by editing the /kernel/drv/md.conf file, as described in How to Increase the Number of Default Disk Sets. The number of shared disk sets is always one less than the md_nsets value, because the local disk set is included in md_nsets.
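For example, to raise the limit to 32 disk sets, the md_nsets value in /kernel/drv/md.conf would be changed and the system rebooted with a reconfiguration reboot. The property line below only illustrates the file's general format; check the existing file on your system before editing it:

name="md" parent="pseudo" nmd=128 md_nsets=32;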
Unlike local volume administration, it is not necessary to manually create or delete state database replicas on the disk set. Solaris Volume Manager tries to balance a reasonable number of state database replicas across all disks in a disk set.
When disks are added to a disk set, Solaris Volume Manager rebalances the state database replicas across the disks in the disk set. Later, if necessary, you can change the replica layout with the metadb command.
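As a minimal sketch, assuming the disk set blue and disks c1t6d0 and c2t6d0 (illustrative names), you could display the replica layout and then move a replica from one disk's slice 7 to another's:

# metadb -s blue
# metadb -s blue -d c1t6d0s7
# metadb -s blue -a c2t6d0s7

The first command displays the replicas in the disk set, the second deletes the replica on c1t6d0, and the third adds a replica on slice 7 of c2t6d0.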
In previous versions of Solaris Volume Manager, all of the disks that you planned to share between hosts in the disk set had to be connected to each host. They also had to have the exact same path, driver, and name on each host. Specifically, a shared disk drive had to be seen by both hosts in the same location (/dev/rdsk/c#t#d#). In addition, the shared disks had to use the same driver name (ssd).
In the current Solaris OS release, systems that have different views of commonly accessible storage can nonconcurrently share access to a disk set. With the introduction of device ID support for disk sets, Solaris Volume Manager automatically tracks disk movement within named disk sets.
Device ID support for disk sets is not supported in a Sun Cluster environment.
When you upgrade to the latest Solaris OS, you need to take the disk set once in order to enable disk tracking. For more information on taking a disk set, see How to Take a Disk Set.
If the autotake feature is not enabled, you have to take each disk set manually. If this feature is enabled, this step is done automatically when the system is rebooted. For more information on the autotake feature, see Autotake Disk Sets.
This expanded device ID support also enables you to import disk sets, even disk sets that were created on different systems. For more information on importing disk sets, see Importing a Disk Set.
The following example, drawing on the sample system shown in Chapter 5, Configuring and Using Solaris Volume Manager (Scenario), describes how disk sets should be used to manage storage that resides on a SAN (Storage Area Network) fabric.
Assume that the sample system has an additional controller that connects to a fiber switch and SAN storage. Storage on the SAN fabric is not available to the system as early in the boot process as other devices, such as SCSI and IDE disks, so Solaris Volume Manager would report logical volumes on the fabric as unavailable at boot. However, by adding the storage to a disk set and then using the disk set tools to manage the storage, this problem with boot time availability is avoided. In addition, the fabric-attached storage can be easily managed within a disk set-controlled namespace that is separate from the local storage.