Solaris Volume Manager Administration Guide

Chapter 10 RAID-1 (Mirror) Volumes (Overview)

This chapter explains essential Solaris Volume Manager concepts related to mirrors and submirrors. For information about performing related tasks, see Chapter 11, RAID-1 (Mirror) Volumes (Tasks).

This chapter contains the following information:

  - Overview of RAID-1 (Mirror) Volumes
  - RAID-1 Volume (Mirror) Resynchronization
  - Creating and Maintaining RAID-1 Volumes
  - Understanding Submirror Status to Determine Maintenance Actions
  - The Effect of Booting Into Single-User Mode on RAID-1 Volumes
  - Scenario—RAID-1 Volumes (Mirrors)

Overview of RAID-1 (Mirror) Volumes

A RAID-1 volume, or mirror, is a volume that maintains identical copies of the data in RAID-0 (stripe or concatenation) volumes. The RAID-0 volumes that are mirrored are called submirrors. Mirroring requires an investment in disks. You need at least twice as much disk space as the amount of data you have to mirror. Because Solaris Volume Manager must write to all submirrors, mirroring can also increase the amount of time it takes for write requests to be written to disk.

After you configure a mirror, the mirror can be used just like a physical slice.
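For illustration, a minimal sketch, assuming a mirror named d20 already exists (a hypothetical volume name): you can place a UFS file system on the mirror and mount it exactly as you would a slice, using the mirror's block and raw device paths:

    # newfs /dev/md/rdsk/d20
    # mount /dev/md/dsk/d20 /mnt

Volumes appear under /dev/md/dsk (block) and /dev/md/rdsk (raw) rather than /dev/dsk and /dev/rdsk. Otherwise, commands that accept a slice accept the mirror.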

You can mirror any file system, including existing file systems. These file systems include root (/), swap, and /usr. You can also use a mirror for any application, such as a database.


Tip –

Use Solaris Volume Manager's hot spare feature with mirrors to keep data safe and available. For information on hot spares, see Chapter 16, Hot Spare Pools (Overview) and Chapter 17, Hot Spare Pools (Tasks).


Overview of Submirrors

A mirror is composed of one or more RAID-0 volumes (stripes or concatenations) called submirrors.

A mirror can consist of up to four submirrors. However, two-way mirrors usually provide sufficient data redundancy for most applications and are less expensive in terms of disk drive costs. A third submirror enables you to make online backups without losing data redundancy while one submirror is offline for the backup.

If you take a submirror “offline,” the mirror stops reading and writing to the submirror. At this point, you could access the submirror itself, for example, to perform a backup. However, the submirror is in a read-only state. While a submirror is offline, Solaris Volume Manager keeps track of all writes to the mirror. When the submirror is brought back online, only the portions of the mirror that were written while the submirror was offline (the resynchronization regions) are resynchronized. Submirrors can also be taken offline to troubleshoot or repair physical devices that have errors.
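For example, a sketch of an offline backup (volume and device names hypothetical): take submirror d22 offline, back it up from its raw device, then bring it back online so that an optimized resynchronization catches it up:

    # metaoffline d20 d22
    # ufsdump 0uf /dev/rmt/0 /dev/md/rdsk/d22
    # metaonline d20 d22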

Submirrors can be attached to or detached from a mirror at any time, though at least one submirror must remain attached at all times.

Normally, you create a mirror with only a single submirror and then attach a second submirror after the mirror is created.
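A minimal sketch of this approach, assuming hypothetical volume names (d20 through d22) and disk slices (c0t0d0s0 and c1t0d0s0):

    # metainit d21 1 1 c0t0d0s0
    # metainit d22 1 1 c1t0d0s0
    # metainit d20 -m d21
    # metattach d20 d22

The first two commands create single-slice RAID-0 volumes to serve as submirrors. The third command creates the one-way mirror d20, and metattach adds the second submirror, which triggers a full resynchronization.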

Scenario—RAID-1 (Mirror) Volume

Figure 10–1 illustrates a mirror, d20. The mirror is made of two volumes (submirrors) d21 and d22.

Solaris Volume Manager makes duplicate copies of the data on multiple physical disks, and presents one virtual disk to the application, d20 in the example. All disk writes are duplicated. Disk reads come from one of the underlying submirrors. The total capacity of mirror d20 is the size of the smallest of the submirrors (if they are not of equal size).

Figure 10–1 RAID-1 (Mirror) Example

Diagram shows how two RAID-0 volumes are used together as a RAID-1 (mirror) volume to provide redundant storage.

Providing RAID-1+0 and RAID-0+1

Solaris Volume Manager supports both RAID-1+0 and RAID-0+1 redundancy. RAID-1+0 redundancy constitutes a configuration of mirrors that are then striped. RAID-0+1 redundancy constitutes a configuration of stripes that are then mirrored. The Solaris Volume Manager interface makes it appear that all RAID-1 devices are strictly RAID-0+1. However, Solaris Volume Manager recognizes the underlying components and mirrors each individually, when possible.


Note –

Solaris Volume Manager cannot always provide RAID-1+0 functionality. However, where both submirrors are identical to each other and are composed of disk slices (and not soft partitions), RAID-1+0 is possible.


Consider a RAID-0+1 implementation with a two-way mirror that consists of three striped slices. Without Solaris Volume Manager, a single slice failure could fail one side of the mirror. Assuming that no hot spares are in use, a second slice failure would fail the mirror. Using Solaris Volume Manager, up to three slices could potentially fail without failing the mirror. The mirror does not fail because each of the three striped slices is individually mirrored to its counterpart on the other half of the mirror.

Figure 10–2 illustrates how a RAID-1 volume can experience the loss of a slice, yet the RAID-1+0 implementation prevents data loss.

Figure 10–2 RAID-1+0 Example

Diagram shows how three of six total slices in a RAID-1 volume can potentially fail without data loss because of the RAID-1+0 implementation.

The RAID-1 volume consists of two submirrors. Each submirror consists of three identical physical disks that have the same interlace value. A failure of three disks, A, B, and F, is tolerated because the entire logical block range of the mirror is still contained on at least one good disk. All of the volume's data remains available.

However, if disks A and D fail, a portion of the mirror's data is no longer available on any disk. Access to these logical blocks fails, but access to portions of the mirror where data is available still succeeds. In this situation, the mirror acts like a single disk that has developed bad blocks. The damaged portions are unavailable, but the remaining portions are available.

RAID-1 Volume (Mirror) Resynchronization

RAID-1 volume (mirror) resynchronization is the process of copying data from one submirror to another submirror when one of the following occurs:

  - A new submirror is attached to the mirror (full resynchronization)
  - The system reboots after a crash, or a submirror that was taken offline is brought back online (optimized resynchronization)
  - A slice within a submirror is replaced (partial resynchronization)

While the resynchronization takes place, the mirror remains readable and writable by users.

A mirror resynchronization ensures proper mirror operation by maintaining all submirrors with identical data, with the exception of writes in progress.


Note –

A mirror resynchronization should not be bypassed. You do not need to manually initiate a mirror resynchronization. This process occurs automatically.


Full Resynchronization

When a new submirror is attached (added) to a mirror, all the data from another submirror in the mirror is automatically written to the newly attached submirror. Once the mirror resynchronization is done, the new submirror is readable. A submirror remains attached to a mirror until it is detached.

If the system crashes while a resynchronization is in progress, the resynchronization is restarted when the system finishes rebooting.

Optimized Resynchronization

During a reboot following a system failure, or when a submirror that was offline is brought back online, Solaris Volume Manager performs an optimized mirror resynchronization. The metadisk driver tracks submirror regions, which enables it to know which regions might be out-of-sync after a failure. An optimized mirror resynchronization is performed only on the out-of-sync regions. You can specify the order in which mirrors are resynchronized during reboot, and you can omit a mirror resynchronization entirely by setting the mirror's pass number to zero. For tasks associated with changing a pass number, see Example 11–15.


Caution –

A pass number of zero should only be used on mirrors that are mounted as read-only.


Partial Resynchronization

Following the replacement of a slice within a submirror, Solaris Volume Manager performs a partial mirror resynchronization of data. Solaris Volume Manager copies the data from the remaining good slices of another submirror to the replaced slice.

Canceling and Resuming Resynchronization With the metasync Command

The resynchronization process affects both system performance and the user's ability to perform tasks. For example, a resynchronization impacts the I/O performance and response time of a system. Additionally, while a resynchronization is in progress, a disk set cannot be released from a host. Similarly, if a volume is attached by mistake, it cannot be detached until the resynchronization has completed. Because of situations such as these, allowing a resynchronization process to run to completion is not always advantageous.

The metasync -c volume command cancels the resynchronization process on a given volume. Canceling a resynchronization process works as follows:

A canceled resynchronization process can be resumed manually from the point that it stopped by issuing the metasync volume command.
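A brief sketch, using a hypothetical mirror d20:

    # metasync -c d20
    # metasync d20

The first command cancels the resynchronization in progress. The second command resumes it from the point at which it stopped.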

For the tasks associated with canceling and resuming resynchronization processes using the metasync command, see How to Cancel a Volume Resynchronization Process and How to Resume a Volume Resynchronization Process.

Creating and Maintaining RAID-1 Volumes

This section provides guidelines that can assist you in creating mirrors. It also provides performance guidelines for the mirrors that you create.

Configuration Guidelines for RAID-1 Volumes

Performance Guidelines for RAID-1 Volumes

About RAID-1 Volume Options

The following options are available to optimize mirror performance:

  - Mirror read policy
  - Mirror write policy
  - The order in which mirrors are resynchronized during a reboot (pass number)

You can define mirror options when you initially create the mirror. You can also change mirror options after a mirror has been set up and is running. For tasks related to changing these options, see How to Change RAID-1 Volume Options.

RAID-1 Volume Read-and-Write Policies

Solaris Volume Manager enables different read-and-write policies to be configured for a RAID-1 volume. Properly set read-and-write policies can improve performance for a given configuration.

Table 10–1 RAID-1 Volume Read Policies

Read Policy 

Description 

Round-Robin (Default) 

Attempts to balance the load across the submirrors. All reads are made in a round-robin order (one after another) from all submirrors in a mirror. 

Geometric 

Enables reads to be divided among submirrors on the basis of a logical disk block address. For example, with a two-way submirror, the disk space on the mirror is divided into two equally-sized logical address ranges. Reads from one submirror are restricted to one half of the logical range. Reads from the other submirror are restricted to the other half. The geometric read policy effectively reduces the seek time that is necessary for reads. The performance gained by this read policy depends on the system I/O load and the access patterns of the applications. 

First 

Directs all reads to the first submirror. This policy should be used only when the device or devices that comprise the first submirror are substantially faster than the devices of the second submirror. 

Table 10–2 RAID-1 Volume Write Policies

Write Policy 

Description 

Parallel (Default) 

Replicates each write to the mirror and dispatches it to all of the submirrors simultaneously.

Serial 

Performs writes to submirrors serially: the write to the first submirror must complete before the write to the next submirror is started. This policy is provided in case a submirror becomes unreadable, for example, due to a power failure.
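Both policies can be changed on an existing mirror with the metaparam command. For example, assuming a mirror named d20:

    # metaparam -r geometric d20
    # metaparam -w serial d20

The -r option accepts roundrobin, geometric, or first; the -w option accepts parallel or serial.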

Pass Number

The pass number, a number in the range 0–9, determines the order in which a particular mirror is resynchronized during a system reboot. The default pass number is 1. The lower pass numbers are resynchronized first. If zero is used, the mirror resynchronization is skipped. A pass number of zero should be used only for mirrors that are mounted as read-only. Mirrors with the same pass number are resynchronized at the same time.
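For example, to skip resynchronization of a read-only mirror at boot by setting its pass number to zero with the metaparam command (mirror name hypothetical):

    # metaparam -p 0 d20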

Understanding Submirror Status to Determine Maintenance Actions

The metastat command of Solaris Volume Manager reports status information on RAID-1 volumes and submirrors. The status information helps you to determine whether maintenance action is required on a RAID-1 volume. The following table explains the submirror states shown when you run the metastat command on a RAID-1 volume.

Table 10–3 Submirror States

State 

Meaning 

Okay 

The submirror has no errors and is functioning correctly. 

Resyncing 

The submirror is actively being resynchronized. An error has occurred and has been corrected, the submirror has just been brought back online, or a new submirror has been added. 

Resync canceled 

The resynchronization process on the submirror has been canceled using the metasync command.

Needs Maintenance 

A slice (or slices) in the submirror has encountered an I/O error or an open error. All reads and writes to and from this slice in the submirror have been discontinued. 

Additionally, for each slice in a submirror, the metastat command shows the following:

Device

Indicates the device name of the slice in the stripe

Start Block

Indicates the block on which the slice begins

Dbase

Indicates if the slice contains a state database replica

State

Indicates the state of the slice

Hot Spare

Indicates that a slice is being used as a hot spare for a failed slice
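The following illustrates metastat output for a hypothetical mirror d20; the exact fields and values vary by configuration and Solaris release:

    # metastat d20
    d20: Mirror
        Submirror 0: d21
          State: Okay
        Submirror 1: d22
          State: Resyncing
        Resync in progress: 15 % done
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 4194828 blocks

    d21: Submirror of d20
        State: Okay
        Size: 4194828 blocks
        Stripe 0:
            Device      Start Block  Dbase  State  Hot Spare
            c0t0d0s0    0            No     Okay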

The submirror state only provides general information on the status of the submirror. The slice state is perhaps the most important information to review when you are troubleshooting mirror errors. If the submirror reports a “Needs Maintenance” state, you must refer to the slice state for more information.

You take a different recovery action depending on whether the slice is in the “Maintenance” state or in the “Last Erred” state. If you only have slices in the “Maintenance” state, they can be repaired in any order. If you have slices both in the “Maintenance” state and in the “Last Erred” state, you must fix the slices in the “Maintenance” state first. Once the slices in the “Maintenance” state have been fixed, fix the slices in the “Last Erred” state. For more information, see Overview of Replacing and Enabling Components in RAID-1 and RAID-5 Volumes.

The following table explains the slice states for submirrors and possible actions to take.

Table 10–4 Submirror Slice States

State 

Meaning 

Action 

Okay 

The slice has no errors and is functioning correctly. 

None. 

Resyncing 

The slice is actively being resynchronized. An error has occurred and been corrected, the submirror has just been brought back online, or a new submirror has been added. 

If desired, monitor the submirror status until the resynchronization is done. 

Maintenance 

The slice has encountered an I/O error or an open error. All reads and writes to and from this component have been discontinued. 

Enable or replace the failed slice. See How to Enable a Slice in a Submirror, or How to Replace a Slice in a Submirror. The metastat command shows an invoke recovery message with the appropriate metareplace command to run. You can also use the metareplace -e command.

Last Erred 

The slice has encountered an I/O error or an open error. However, the data is not replicated elsewhere due to another slice failure. I/O is still performed on the slice. If I/O errors result, the mirror I/O fails. 

First, enable or replace slices in the “Maintenance” state. See How to Enable a Slice in a Submirror, or How to Replace a Slice in a Submirror. Usually, this error results in some data loss, so validate the mirror after it is fixed. For a file system, use the fsck command, then check the data. An application or database must have its own method of validating the device.
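As a sketch of the two recovery commands referenced above (volume and slice names hypothetical):

    # metareplace -e d20 c0t0d0s0
    # metareplace d20 c0t0d0s0 c0t2d0s0

The first form re-enables the failed slice in place, which is appropriate after a transient error. The second form replaces the failed slice with a new slice. In either case, Solaris Volume Manager then resynchronizes the repaired slice.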

The Effect of Booting Into Single-User Mode on RAID-1 Volumes

Sometimes, you may need to boot a system with mirrors for root (/), /usr, and swap, the so-called “boot” file systems, into single-user mode (by using the boot -s command). In this case, these mirrors and possibly all mirrors on the system will appear in the “Needs Maintenance” state when viewed with the metastat command. Furthermore, if writes occur to these slices, the metastat command shows an increase in dirty regions on the mirrors.

This situation appears to be potentially dangerous. However, the metasync -r command, which normally runs during boot to resynchronize mirrors, is interrupted when the system is booted into single-user mode. Once the system is rebooted, the metasync -r command will run and resynchronize all mirrors.

If this situation is a concern, you can run the metasync -r command manually.
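For example, from the single-user shell:

    # metasync -r

This command resynchronizes the mirrors in pass-number order, just as the boot process normally would.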

Scenario—RAID-1 Volumes (Mirrors)

RAID-1 volumes provide a means of constructing redundant volumes. Thus, when a partial or complete failure of one of the underlying RAID-0 volumes occurs, there is no data loss or interruption of access to the file systems. The following example, drawing on the scenario explained in Chapter 5, Configuring and Using Solaris Volume Manager (Scenario) and continued in Scenario—RAID-0 Volumes, describes how RAID-1 volumes can provide redundant storage.

As described in Scenario—RAID-0 Volumes, the sample system has two RAID-0 volumes. Each volume is approximately 27 Gbytes in size and spans three disks. By creating a RAID-1 volume to mirror these two RAID-0 volumes, you gain a fully redundant, resilient storage space.

Within this RAID-1 volume, the failure of either disk controller does not interrupt access to the volume. Similarly, failure of up to three individual disks might be tolerated without access interruption.

To provide additional protection against problems that could interrupt access, use hot spares, as described in Chapter 16, Hot Spare Pools (Overview). Specifically, see How Hot Spares Work.