This chapter explains the overall structure of DiskSuite. Use the following table to proceed directly to the section that provides the information you need.
DiskSuite is a software product that enables you to manage large numbers of disks and the data on those disks. Although there are many ways to use DiskSuite, most tasks include:
Increasing storage capacity
Increasing data availability
In some instances, DiskSuite can also improve I/O performance.
DiskSuite uses virtual disks to manage physical disks and their associated data. In DiskSuite, a virtual disk is called a metadevice.
From an application's point of view, a metadevice is functionally identical to a physical disk. DiskSuite converts I/O requests directed at a metadevice into I/O requests to the underlying member disks.
DiskSuite's metadevices are built from slices (disk partitions). An easy way to build metadevices is to use the graphical user interface, DiskSuite Tool, that comes with DiskSuite. DiskSuite Tool presents you with a view of all the slices available to you. By dragging slices onto metadevice objects, you can quickly assign slices to metadevices. You can also build and modify metadevices using DiskSuite's command line utilities.
If, for example, you want to create more storage capacity, you could use DiskSuite to make the system treat a collection of many small slices as one larger slice or device. After you have created a large metadevice from these slices, you can immediately begin using it just as any "real" slice or device.
For a more detailed discussion of metadevices, see "Metadevices".
DiskSuite can increase the reliability and availability of data by using mirrors (copied data) and RAID5 metadevices. DiskSuite's hot spares can provide another level of data availability for mirrors and RAID5 metadevices.
Once you have set up your configuration, you can use DiskSuite Tool to report on its operation. You can also use DiskSuite's SNMP trap generating daemon to work with a network monitoring console to automatically receive DiskSuite error messages.
DiskSuite Tool is a graphical user interface for setting up and administering a DiskSuite configuration. The command to start DiskSuite Tool is:
# metatool &
DiskSuite Tool provides a graphical view of DiskSuite objects: metadevices, hot spare pools, and the MetaDB object for the metadevice state database. DiskSuite Tool uses drag and drop manipulation of DiskSuite objects, enabling you to quickly configure your disks or change an existing configuration.
DiskSuite Tool provides graphical views of both physical devices and metadevices, helping simplify storage administration. You can also use DiskSuite Tool to perform tasks specific to administering SPARCstorage Arrays.
However, DiskSuite Tool cannot perform all DiskSuite administration tasks. You must use the command line interface for some operations (for example, creating and administering disksets).
To learn more about using DiskSuite Tool, refer to Chapter 4, DiskSuite Tool.
Listed here are all the commands you can use to administer DiskSuite. For more detailed information, see the man pages.
Table 1-1 Command Line Interface Commands
DiskSuite Command | Description
---|---
growfs(1M) | Expands a UFS file system in a non-destructive fashion.
mdlogd(1M) | The mdlogd daemon and mdlogd.cf configuration file enable DiskSuite to send generic SNMP trap messages.
metaclear(1M) | Deletes active metadevices and hot spare pools.
metadb(1M) | Creates and deletes state database replicas.
metadetach(1M) | Detaches a metadevice from a mirror, or a logging device from a trans metadevice.
metahs(1M) | Manages hot spares and hot spare pools.
metainit(1M) | Configures metadevices.
metaoffline(1M) | Places submirrors offline.
metaonline(1M) | Places submirrors online.
metaparam(1M) | Modifies metadevice parameters.
metarename(1M) | Renames and switches metadevice names.
metareplace(1M) | Replaces slices of submirrors and RAID5 metadevices.
metaroot(1M) | Sets up system files for mirroring root (/).
metaset(1M) | Administers disksets.
metastat(1M) | Displays status for metadevices or hot spare pools.
metasync(1M) | Resyncs metadevices during reboot.
metatool(1M) | Runs the DiskSuite Tool graphical user interface.
metattach(1M) | Attaches a metadevice to a mirror, or a logging device to a trans metadevice.
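As an illustration (not taken from the manual's examples), a short command line session might create, examine, and remove a simple metadevice. The metadevice name d0 and the slice names are hypothetical:

```shell
# metainit d0 2 1 c0t1d0s2 1 c0t2d0s2
# metastat d0
# metaclear d0
```

Here metainit(1M) builds d0 as a concatenation of two one-slice stripes, metastat(1M) reports its status, and metaclear(1M) deletes it again.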
The three basic types of objects that you create with DiskSuite are metadevices, state database replicas, and hot spare pools. Table 1-2 gives an overview of these DiskSuite objects.
Table 1-2 Summary of DiskSuite Objects
DiskSuite Object | What Is It? | Why Use It? | For More Information, Go To ...
---|---|---|---
Metadevice | A group of physical slices that appear to the system as a single, logical device | To increase storage capacity and increase data availability. | "Metadevices"
Metadevice state database (state database replicas) | A database that stores information on disk about the state of your DiskSuite configuration | DiskSuite cannot operate until you have created the metadevice state database replicas. | "Metadevice State Database and State Database Replicas"
Hot spare pool | A collection of slices (hot spares) reserved to be automatically substituted in case of slice failure in either a submirror or RAID5 metadevice | To increase data availability for mirrors and RAID5 metadevices. | "Hot Spare Pools"
DiskSuite Tool, DiskSuite's graphical user interface, also refers to the graphical representation of metadevices, the metadevice state database, and hot spare pools as "objects."
A metadevice is a name for a group of physical slices that appear to the system as a single, logical device. Metadevices are actually pseudo, or virtual, devices in standard UNIX terms.
You create a metadevice by using concatenation, striping, mirroring, RAID level 5, or UFS logging. Thus, the types of metadevices you can create are concatenations, stripes, concatenated stripes, mirrors, RAID5 metadevices, and trans metadevices.
DiskSuite uses a special driver, called the metadisk driver, to coordinate I/O to and from physical devices and metadevices, enabling applications to treat a metadevice like a physical device. This type of driver is also called a logical, or pseudo, driver.
You can use either the DiskSuite Tool graphical user interface or the command line utilities to create and administer metadevices.
Table 1-3 summarizes the types of metadevices:
Table 1-3 Types of Metadevices
Metadevice | Description
---|---
Simple metadevice | Can be used directly, or as the basic building blocks for mirrors and trans devices. There are three types of simple metadevices: stripes, concatenations, and concatenated stripes. Simple metadevices consist only of physical slices. By themselves, simple metadevices do not provide data redundancy.
Mirror | Replicates data by maintaining multiple copies. A mirror is composed of one or more simple metadevices called submirrors.
RAID5 metadevice | Replicates data by using parity information. In the case of missing data, the missing data can be regenerated using available data and the parity information. A RAID5 metadevice is composed of slices. One slice's worth of space is allocated to parity information, but it is distributed across all slices in the RAID5 metadevice.
Trans metadevice | Used to log a UFS file system. A trans metadevice is composed of a master device and a logging device. Both of these devices can be a slice, simple metadevice, mirror, or RAID5 metadevice. The master device contains the UFS file system.
You use metadevices to increase storage capacity and data availability. In some instances, metadevices can also increase I/O performance. Functionally, metadevices behave the same way as slices. Because metadevices look like slices, they are transparent to end users, applications, and file systems. Like physical devices, metadevices are accessed through block or raw device names. The metadevice name changes, depending on whether the block or raw device is used. See "Metadevice Conventions" for details about metadevice names.
You can use most file system commands (mount(1M), umount(1M), ufsdump(1M), ufsrestore(1M), and so forth) on metadevices. You cannot use the format(1M) command, however. You can read, write, and copy files to and from a metadevice, as long as you have a file system mounted on the metadevice.
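For example, assuming a metadevice d0 already exists, you could place a UFS file system on it and mount it (the mount point here is hypothetical):

```shell
# newfs /dev/md/rdsk/d0
# mount /dev/md/dsk/d0 /export/data
```

As with physical slices, newfs(1M) operates on the raw device, while mount(1M) takes the block device.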
SPARC and x86 systems can create metadevices on the following disk drives:
SPARC - IPI, SCSI devices, and SPARCstorage Array drives
x86 - SCSI and IDE devices
Metadevice names begin with the letter "d" followed by a number (for example, d0 as shown in Table 1-4).
What are the default metadevice names?
DiskSuite has 128 default metadevice names, numbered d0 through d127. Table 1-4 shows some example metadevice names.
Table 1-4 Example Metadevice Names

Metadevice Name | Description
---|---
/dev/md/dsk/d0 | Block metadevice d0
/dev/md/dsk/d1 | Block metadevice d1
/dev/md/rdsk/d126 | Raw metadevice d126
/dev/md/rdsk/d127 | Raw metadevice d127
Can metadevice names be abbreviated?
Yes. Instead of specifying the full metadevice name, such as /dev/md/dsk/d1, you can use d1. You can use either the command line interface or DiskSuite Tool to name metadevices.
What is the maximum number of metadevices possible?
1024 (though the default number of metadevices is 128). You can increase the number of default metadevices by editing the /kernel/drv/md.conf file. See "System and Startup Files" for more information on this file.
Where are metadevice names stored?
Like physical slices, metadevices have logical names which appear in the file system. Logical metadevice names have entries in /dev/md/dsk (for block devices) and /dev/md/rdsk (for raw devices).
Can metadevices be renamed?
Yes. DiskSuite enables you to rename a metadevice at any time, as long as the name being used is not in use by another metadevice, and as long as the metadevice itself is not in use. For a file system, make sure it is not mounted or being used as swap. Other applications using the raw device, such as a database, should have their own way of stopping access to the data.
You can use either DiskSuite Tool (via a metadevice's Information window) or the command line (the metarename(1M) command) to rename metadevices.
The metarename(1M) command with the -x option can "switch" metadevices that have a parent-child relationship. Refer to Solstice DiskSuite 4.2.1 User's Guide for procedures to rename and switch metadevices.
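For instance, renaming a metadevice from the command line might look like the following (d10, d100, and the mount point are hypothetical; the file system must not be in use during the rename):

```shell
# umount /files
# metarename d10 d100
# mount /files
```

After the rename, /etc/vfstab (or the application using the device) must refer to the new metadevice name before the file system is remounted.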
Figure 1-1 shows a metadevice "containing" two slices, one each from Disk A and Disk B. An application or UFS will treat the metadevice as if it were one physical disk. Adding more slices to the metadevice will increase its capacity.
A metadevice state database (often simply called the state database) is a database that stores information on disk about the state of your DiskSuite configuration. The metadevice state database records and tracks changes made to your configuration. DiskSuite automatically updates the metadevice state database when a configuration or state change occurs. Creating a new metadevice is an example of a configuration change. A submirror failure is an example of a state change.
The metadevice state database is actually a collection of multiple, replicated database copies. Each copy, referred to as a state database replica, ensures that the data in the database is always valid. Having copies of the metadevice state database protects against data loss from single points-of-failure. The metadevice state database tracks the location and status of all known state database replicas.
DiskSuite cannot operate until you have created the metadevice state database and its state database replicas. It is necessary that a DiskSuite configuration have an operating metadevice state database.
When you set up your configuration, you have two choices for the location of state database replicas. You can place the state database replicas on dedicated slices. Or you can place the state database replicas on slices that will later become part of metadevices. DiskSuite recognizes when a slice contains a state database replica, and automatically skips over the portion of the slice reserved for the replica if the slice is used in a metadevice. The part of a slice reserved for the state database replica should not be used for any other purpose.
You can keep more than one copy of a metadevice state database on one slice, though you may make the system more vulnerable to a single point-of-failure by doing so.
The state database replicas ensure that the data in the metadevice state database is always valid. When the metadevice state database is updated, each state database replica is also updated. The updates take place one at a time (to protect against corrupting all updates if the system crashes).
If your system loses a state database replica, DiskSuite must figure out which state database replicas still contain non-corrupted data. DiskSuite determines this information by a majority consensus algorithm. This algorithm requires that a majority (half + 1) of the state database replicas be available before any of them are considered non-corrupt. It is because of this majority consensus algorithm that you must create at least three state database replicas when you set up your disk configuration. A consensus can be reached as long as at least two of the three state database replicas are available.
To protect data, DiskSuite will not function unless at least half of all state database replicas are available. The majority consensus algorithm, therefore, ensures against stale or corrupt data.
The majority consensus algorithm guarantees the following:
The system will stay running if at least half of the state database replicas are available.
The system will panic if fewer than half of the state database replicas are available.
The system will not reboot without a majority (half + 1) of the total state database replicas.
When the number of state database replicas is odd, DiskSuite computes the majority by dividing the number in half, rounding down to the nearest integer, then adding 1 (one). For example, on a system with seven replicas, the majority would be four (seven divided by two is three and one-half, rounded down is three, plus one is four).
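The arithmetic above can be checked with a short shell sketch (illustrative only, not part of DiskSuite):

```shell
replicas=7
majority=$(( replicas / 2 + 1 ))   # half, rounded down, plus one
echo "$majority"
```

With seven replicas this prints 4, matching the example in the text.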
During booting, DiskSuite ignores corrupted state database replicas. In some cases DiskSuite tries to rewrite state database replicas that are bad. Otherwise they are ignored until you repair them. If a state database replica becomes bad because its underlying slice encountered an error, you will need to repair or replace the slice and then enable the replica.
If all state database replicas are lost, you could, in theory, lose all data that is stored on your disks. For this reason, it is good practice to create enough state database replicas on separate drives and across controllers to prevent catastrophic failure. It is also wise to save your initial DiskSuite configuration information, as well as your disk partition information.
Refer to Solstice DiskSuite 4.2.1 User's Guide for information on adding additional state database replicas to the system, and on recovering when state database replicas are lost.
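As a hypothetical example, the initial state database replicas might be created on three dedicated slices on separate disks (slice names are illustrative; -a adds replicas, -f forces creation of the very first ones, and -i reports their status):

```shell
# metadb -a -f c0t0d0s3 c1t0d0s3 c2t0d0s3
# metadb -i
```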
What is the size of a state database replica?
By default, 517 Kbytes or 1034 disk blocks of a slice. Because your disk slices are probably not that small, you may want to resize a slice to hold just the state database replica. (See Solstice DiskSuite 4.2.1 User's Guide for more information on resizing a slice.)
What is the minimum number of state database replicas required?
Three (3), preferably spread out across at least three disks (to avoid a single point-of-failure). DiskSuite does not operate with less than a majority.
What is the maximum number of state database replicas possible?
50.
Where are state database replicas created?
You can create state database replicas on slices not in use.
You cannot create state database replicas on existing file systems, root (/), /usr, or swap. If necessary, you can create a new slice (provided a slice name is available) by allocating space from swap, and put state database replicas on that new slice. See Solstice DiskSuite 4.2.1 User's Guide for more information.
Can I create a state database replica on a slice that will be part of a metadevice?
Yes, but you must create it before adding the slice to the metadevice. You can also create a state database replica on a logging device. DiskSuite reserves the starting part of the slice for the state database replica.
Can I place more than one state database replica on a single disk drive?
In general, it is best to distribute state database replicas across slices, drives, and controllers, to avoid single points-of-failure.
If you have two disks, create two state database replicas on each disk. Then, if one disk fails, the two remaining replicas (exactly half of the four) are enough to keep the system running.
What happens if a slice that contains a state database replica becomes errored?
The rest of your configuration should remain in operation. DiskSuite finds a good state database (as long as there are at least half + 1 valid state database replicas).
What happens when state database replicas are repaired?
When you manually repair or enable state database replicas, DiskSuite updates them with valid data.
A hot spare pool is a collection of slices (hot spares) reserved by DiskSuite to be automatically substituted in case of a slice failure in either a submirror or RAID5 metadevice. Hot spares provide increased data availability for mirrors and RAID5 metadevices. You can create a hot spare pool with either DiskSuite Tool or the command line interface.
When errors occur, DiskSuite checks the hot spare pool for the first available hot spare whose size is equal to or greater than the size of the slice being replaced. If found, DiskSuite automatically resyncs the data. If a slice of adequate size is not found in the list of hot spares, the submirror or RAID5 metadevice that failed is considered errored. For more information, see Chapter 3, Hot Spare Pools.
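A hot spare pool might be set up from the command line as follows (the pool name hsp001, the slice, and the metadevice d12 are hypothetical):

```shell
# metahs -a hsp001 c2t1d0s2
# metaparam -h hsp001 d12
```

Here metahs(1M) adds the slice to pool hsp001, and metaparam(1M) associates the pool with an existing submirror or RAID5 metadevice d12.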
DiskSuite enables you to expand a metadevice by adding additional slices.
Mounted or unmounted UFS file systems contained within a metadevice can be expanded without having to halt or back up your system. (Nevertheless, backing up your data is always a good idea.) After the metadevice is expanded, you grow the file system with the growfs(1M) command.
After a file system is expanded, it cannot be decreased. Decreasing the size of a file system is a UFS limitation.
Applications and databases using the raw metadevice must have their own method to "grow" the added space so that the application or database can recognize it. DiskSuite does not provide this capability.
You can expand the disk space in metadevices in the following ways:
Adding a slice to a stripe or concatenation.
Adding multiple slices to a stripe or concatenation.
Adding a slice or multiple slices to all submirrors of a mirror.
You can use either DiskSuite Tool or the command line interface to add a slice to an existing metadevice.
When using DiskSuite Tool to expand a metadevice that contains a UFS file system, the growfs(1M) command is run automatically. If you use the command line to expand the metadevice, you must manually run the growfs(1M) command.
The growfs(1M) command expands a UFS file system without loss of service or data. However, write-access to the metadevice is suspended while the growfs(1M) command is running. You can expand the file system to the size of the slice or the metadevice that contains the file system.
The file system can be expanded to use only part of the additional disk space by using the -s size option to the growfs(1M) command.
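Putting these steps together, expanding a concatenation d8 that holds the /files file system might look like this from the command line (metadevice, slice, and mount point names are hypothetical):

```shell
# metattach d8 c0t2d0s2
# growfs -M /files /dev/md/rdsk/d8
```

metattach(1M) adds the slice to the metadevice; growfs(1M) then grows the mounted file system, with -M specifying its mount point.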
When expanding a mirror, space is added to the mirror's underlying submirrors. Likewise, when expanding a trans metadevice, space is added to the master device. The growfs(1M) command is then run on the mirror or the trans metadevice, respectively. The general rule is that space is added to the underlying device(s), and the growfs(1M) command is run on the top-level device.
This section explains the files necessary for DiskSuite to operate correctly. For the most part, you do not have to worry about these files because DiskSuite accesses (updates) them automatically (with the exception of md.tab).
mddb.cf: A file that records the locations of state database replicas. When state database replica locations change, DiskSuite makes an entry in the mddb.cf file that records the locations of all state databases. Similar information is entered into the /etc/system file.
md.tab: An input file that you can use along with the command line utilities metainit(1M), metadb(1M), and metahs(1M) to create metadevices, state database replicas, or hot spares. A metadevice, group of state database replicas, or hot spare may have an entry in this file.
The configuration information in the /etc/lvm/md.tab file may differ from the current metadevices, hot spares, and state database replicas in use. It is only used at metadevice creation time, not to recapture the DiskSuite configuration at boot.
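An md.tab entry consists of the same arguments you would pass to metainit(1M), metadb(1M), or metahs(1M) for one object. A hypothetical fragment (metadevice, pool, and slice names are illustrative):

```
# Concatenation d0 built from two one-slice stripes
d0 2 1 c0t1d0s2 1 c0t2d0s2
# Hot spare pool hsp001 containing one slice
hsp001 c2t1d0s2
```

Running metainit d0 would then create d0 from its md.tab entry; metainit -a activates every entry in the file.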
md.cf: A backup file of a "local" diskset's configuration. DiskSuite provides the md.cf file for recovery. When you change the DiskSuite configuration, DiskSuite automatically updates the md.cf file (except for hot sparing).
You should not directly edit either the mddb.cf or md.cf files.
md.conf: DiskSuite uses this configuration file (/kernel/drv/md.conf) at startup. You can edit two fields in this file: nmd, which sets the number of metadevices that the configuration can support, and md_nsets, which is the number of disksets. The default value for nmd is 128, which can be increased to 1024. The default value for md_nsets is 4, which can be increased to 32. The total number of shared disksets is always one less than the md_nsets value, because the local set is included in md_nsets.
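For example, to raise the limits you might edit /kernel/drv/md.conf as follows and then reboot. The nmd and md_nsets values here are hypothetical choices within the documented maximums; the name and parent properties are left as installed:

```
name="md" parent="pseudo" nmd=256 md_nsets=8;
```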
mdlogd.cf: DiskSuite uses this file to control the behavior of the mdlogd SNMP trap generating daemon. It is an editable ASCII file that specifies where the SNMP trap data should be sent when the DiskSuite driver detects a specified condition.
DiskSuite also uses startup scripts, run automatically at boot, to reload the metadevice configuration and to resync metadevices.
For more information on DiskSuite system files, refer to the man pages.
A shared diskset, or simply diskset, is a set of shared disk drives containing metadevices and hot spares that can be shared by two hosts, exclusively but not at the same time. Currently, disksets are supported only on SPARCstorage Array disks.
A diskset provides for data redundancy and availability. If one host fails, the other host can take over the failed host's diskset. (This type of configuration is known as a failover configuration.)
For more information, see Chapter 5, Disksets.