Solaris Volume Manager Administration Guide

Overview of Replacing and Enabling Components in RAID 1 and RAID 5 Volumes

Solaris Volume Manager can replace and enable components within RAID 1 (mirror) and RAID 5 volumes.

In Solaris Volume Manager terms, replacing a component is a way to substitute an available component on the system for a selected component in a submirror or RAID 5 volume. You can think of this process as logical replacement, as opposed to physically replacing the component. (See Replacing a Component With Another Available Component.)

Enabling a component means “activating” it, that is, substituting the component with itself (the component name stays the same). See Enabling a Component.


Note –

When recovering from disk errors, scan /var/adm/messages to see what kind of errors occurred. If the errors are transitory and the disks themselves do not have problems, try enabling the failed components. You can also use the format command to test a disk.
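As a minimal sketch of the log scan described in this note, the following helper prints the lines of the messages file that mention a given disk. The function name show_disk_messages and the disk name in the usage comment are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print log lines that mention a given disk.
# The log file defaults to /var/adm/messages, as named in the note above.
# Usage: show_disk_messages c1t4d0 [logfile]
show_disk_messages() {
    disk=$1
    logfile=${2:-/var/adm/messages}
    # -i ignores case so that upper- and lowercase device names both match.
    grep -i "$disk" "$logfile"
}
```

If the matching lines suggest that the errors were transitory, enabling the component is a reasonable first step.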


Enabling a Component

You can enable a component when any of the following conditions exist:


Note –

Always check for state database replicas and hot spares on the drive being replaced. Delete any state database replica shown to be in error before you replace the disk. After you enable the component, re-create the replicas at the same size. Treat hot spares in the same manner.
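The sequence in this note can be sketched with the metadb and metareplace commands. The wrapper function names and every volume and slice name in the usage comments are hypothetical. Check metadb -i output on your own system before deleting anything.

```shell
# Hypothetical wrappers around the Solaris Volume Manager commands.
# All volume and slice names in the usage comments are examples only.

# List state database replicas with their status flags.
list_replicas() {
    metadb -i
}

# Delete the replicas on a slice before the disk is replaced.
# Usage: delete_replicas c1t4d0s7
delete_replicas() {
    metadb -d "$1"
}

# Enable a component in place (the component name stays the same).
# Usage: enable_component d11 c1t4d0s2
enable_component() {
    metareplace -e "$1" "$2"
}

# Re-create replicas on the slice: -a adds, -c gives the replica count.
# Usage: recreate_replicas 2 c1t4d0s7
recreate_replicas() {
    metadb -a -c "$1" "$2"
}
```

If the original replicas used a non-default size, the -l option of metadb can be added to recreate_replicas so that the new replicas match.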


Replacing a Component With Another Available Component

You use the metareplace command when you replace or swap an existing component with a different component that is available and not in use on the system.
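As a rough illustration, a replacement followed by a status check might look like the following. The function name replace_component and the volume and slice names in the usage comment are hypothetical. The new component is assumed to be available and not in use.

```shell
# Hypothetical wrapper: replace a component, then display the volume state.
# Usage: replace_component d6 c0t2d0s2 c0t4d0s2
#   $1 = mirror or RAID 5 volume
#   $2 = component to be replaced
#   $3 = available component that takes its place
replace_component() {
    # Show the volume status only if the replacement command succeeded.
    metareplace "$1" "$2" "$3" && metastat "$1"
}
```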

You can use this command when any of the following conditions exist:

Maintenance and Last Erred States

When a component in a mirror or RAID 5 volume experiences errors, Solaris Volume Manager puts the component in the “Maintenance” state. No further reads or writes are performed to a component in the “Maintenance” state. Subsequent errors on other components in the same volume are handled differently, depending on the type of volume. A RAID 1 volume might be able to tolerate many components in the “Maintenance” state and still be read from and written to. A RAID 5 volume, by definition, can only tolerate a single component in the “Maintenance” state.

When a component in a RAID 1 or RAID 5 volume experiences errors and there are no redundant components to read from, the component is put into the “Last Erred” state. For example, in a RAID 5 volume, after one component goes into the “Maintenance” state, no redundancy remains, so the next component to fail goes into the “Last Erred” state. When either a mirror or RAID 5 volume has a component in the “Last Erred” state, I/O is still attempted to the component marked “Last Erred.” This happens because a “Last Erred” component contains the last good copy of data from Solaris Volume Manager's point of view. With a component in the “Last Erred” state, the volume behaves like a normal device (disk) and returns I/O errors to an application. Usually, at this point some data has been lost.

Always replace components in the “Maintenance” state first, followed by those in the “Last Erred” state. After a component is replaced and resynchronized, use the metastat command to verify its state, then validate the data to make sure it is good.

Mirrors – If components are in the “Maintenance” state, no data has been lost. You can safely replace or enable the components in any order. If a component is in the “Last Erred” state, you cannot replace it until you first replace all the other mirrored components in the “Maintenance” state. Replacing or enabling a component in the “Last Erred” state usually means that some data has been lost. Be sure to validate the data on the mirror after you repair it.

RAID 5 Volumes – A RAID 5 volume can tolerate a single component failure. You can safely replace a single component in the “Maintenance” state without losing data. If an error on another component occurs, it is put into the “Last Erred” state. At this point, the RAID 5 volume is a read-only device. You need to perform some type of error recovery so that the state of the RAID 5 volume is stable and the possibility of data loss is reduced. If a RAID 5 volume reaches a “Last Erred” state, there is a good chance it has lost data. Be sure to validate the data on the RAID 5 volume after you repair it.
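The ordering rule above (components in the “Maintenance” state first, then those in the “Last Erred” state) can be sketched as a small filter. The input format is an assumption made for this example and is not metastat output: each line contains a state followed by a component name. The function name and the component names in the usage comment are hypothetical.

```shell
# Hypothetical helper: print components in the order they should be repaired.
# Input (assumed format, one entry per line): "<state> <component>", e.g.
#   Maintenance c1t4d0s2
#   Last Erred c0t2d0s2
# Components in any other state are ignored.
order_for_replacement() {
    awk '
        # "Maintenance" components are printed immediately, so they come first.
        $1 == "Maintenance"              { print $NF }
        # "Last Erred" components are held back until all input is read.
        $1 == "Last" && $2 == "Erred"    { held[++n] = $NF }
        END { for (i = 1; i <= n; i++) print held[i] }
    '
}
```

After each component is replaced and resynchronized, running metastat on the volume shows whether the component has returned to a healthy state.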

Background Information For Replacing and Enabling Slices in RAID 1 and RAID 5 Volumes

When you replace components in a mirror or a RAID 5 volume, follow these guidelines:


Note –

A submirror or RAID 5 volume might be using a hot spare in place of a failed component. When that failed component is enabled or replaced by using the procedures in this section, the hot spare is marked “Available” in the hot spare pool, and is ready for use.
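To confirm that a hot spare has returned to the “Available” state after a repair, the status of its hot spare pool can be displayed with metahs -i. The function name and the pool name hsp001 in the usage comment are hypothetical.

```shell
# Hypothetical wrapper: display the status of a hot spare pool.
# Usage: show_hot_spare_pool hsp001
show_hot_spare_pool() {
    metahs -i "$1"
}
```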