Solaris Volume Manager Administration Guide

Maintaining RAID 5 Volumes

How to Check the Status of RAID 5 Volumes

To check status on a RAID 5 volume, use one of the following methods:

- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node and view the status of the volumes. Choose a volume, then choose Action->Properties to see more detailed information. For more information, see the online help.
- Use the metastat command.

For each slice in the RAID 5 volume, the metastat command shows the following:
- “Device” (device name of the slice in the stripe)
- “Start Block” on which the slice begins
- “Dbase” to show if the slice contains a state database replica
- “State” of the slice
- “Hot Spare” to show the slice that is being used as a hot spare for a failed slice

Example—Viewing RAID 5 Volume Status

Here is sample RAID 5 volume output from the metastat command.

# metastat
d10: RAID
    State: Okay        
    Interlace: 32 blocks
    Size: 10080 blocks
Original device:
    Size: 10496 blocks
        Device              Start Block  Dbase State        Hot Spare
        c0t0d0s1                 330     No    Okay        
        c1t2d0s1                 330     No    Okay        
        c2t3d0s1                 330     No    Okay

The metastat command output identifies the volume as a RAID 5 volume. For each slice in the volume, it shows the name of the slice, the block on which the slice begins, whether the slice contains a state database replica, the state of the slice, and whether a hot spare has taken over for the slice. In this example, none of the slices contains a state database replica, all of the slices are in the “Okay” state, and no hot spare is in use.
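When a volume has many slices, the slice table can be filtered with a short script. The following is a minimal sketch and not part of Solaris Volume Manager: the function name failed_slices is hypothetical, and the script assumes that slice rows begin with a cNtNdNsN device name and use the column order shown in the sample output.

```shell
#!/bin/sh
# failed_slices (hypothetical helper): read metastat-style output on stdin and
# print "device state" for every slice whose State column is not "Okay".
# Assumes the column order Device, Start Block, Dbase, State, Hot Spare.
failed_slices() {
    awk '
    $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $4 != "Okay" {
        state = $4
        # "Last Erred" is split across two fields by awk
        if ($4 == "Last" && $5 == "Erred") state = "Last Erred"
        print $1, state
    }'
}

# Example run against captured output rather than a live system.
# Prints: c0t14d0s6 Maintenance
failed_slices <<'EOF'
d1: RAID
    State: Needs Maintenance
        Device              Start Block  Dbase State        Hot Spare
        c0t9d0s6                 330     No    Okay
        c0t14d0s6                330     No    Maintenance
EOF
```

On a live system, the input would come from the metastat command, for example by piping metastat volume-name into failed_slices. On some Solaris releases /usr/bin/awk is the older awk, so nawk might be the safer choice.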

RAID 5 Volume Status Information

The following table explains RAID 5 volume states.

Table 14–1 RAID 5 States


State	Meaning
Initializing	Slices are in the process of having all disk blocks zeroed. This process is necessary due to the nature of RAID 5 volumes with respect to data and parity interlace striping. Once the state changes to “Okay,” the initialization process is complete and you are able to open the device. Up to this point, applications receive error messages.
Okay	The device is ready for use and is currently free from errors.
Maintenance	A slice has been marked as failed due to I/O or open errors that were encountered during a read or write operation.

The slice state is perhaps the most important information when you are troubleshooting RAID 5 volume errors. The RAID 5 volume state only provides general status information, such as “Okay” or “Needs Maintenance.” If the RAID 5 volume reports a “Needs Maintenance” state, refer to the slice state. The recovery action depends on whether the slice is in the “Maintenance” state or the “Last Erred” state. If you only have a slice in the “Maintenance” state, it can be repaired without loss of data. If you have a slice in the “Maintenance” state and a slice in the “Last Erred” state, data has probably been corrupted. You must fix the slice in the “Maintenance” state first, then the “Last Erred” slice. See Overview of Replacing and Enabling Components in RAID 1 and RAID 5 Volumes.

The following table explains the slice states for a RAID 5 volume and possible actions to take.

Table 14–2 RAID 5 Slice States


State	Meaning	Action
Initializing	Slices are in the process of having all disk blocks zeroed. This process is necessary due to the nature of RAID 5 volumes with respect to data and parity interlace striping.	Normally none. If an I/O error occurs during this process, the device goes into the “Maintenance” state. If the initialization fails, the volume is in the “Initialization Failed” state, and the slice is in the “Maintenance” state. If this happens, clear the volume and re-create it.
Okay	The device is ready for use and is currently free from errors.	None. Slices can be added or replaced, if necessary.
Resyncing	The slice is actively being resynchronized. An error has occurred and been corrected, a slice has been enabled, or a slice has been added.	If desired, monitor the RAID 5 volume status until the resynchronization is done.
Maintenance	A single slice has been marked as failed due to I/O or open errors that were encountered during a read or write operation.	Enable or replace the failed slice. See How to Enable a Component in a RAID 5 Volume, or How to Replace a Component in a RAID 5 Volume. The `metastat` command will show an `invoke` recovery message with the appropriate action to take with the `metareplace` command.
Maintenance/ Last Erred	Multiple slices have encountered errors. The state of the failed slices is either “Maintenance” or “Last Erred.” In this state, no I/O is attempted on the slice that is in the “Maintenance” state, but I/O is attempted to the slice marked “Last Erred” with the outcome being the overall status of the I/O request.	Enable or replace the failed slices. See How to Enable a Component in a RAID 5 Volume, or How to Replace a Component in a RAID 5 Volume. The `metastat` command will show an `invoke` recovery message with the appropriate action to take with the `metareplace` command, which must be run with the `-f` flag. This state indicates that data might be fabricated due to multiple failed slices.
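The repair order described for the “Maintenance” and “Last Erred” states can be sketched as a small filter over metastat output. This is an illustration only: the function name recovery_order and its output labels are hypothetical, and the script assumes the slice table layout shown in the earlier sample output.

```shell
#!/bin/sh
# recovery_order (hypothetical helper): read metastat-style output on stdin and
# list failed slices in the order they should be repaired:
# slices in "Maintenance" first, then slices in "Last Erred".
recovery_order() {
    awk '
    $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
        if ($4 == "Maintenance") maint = maint " " $1
        else if ($4 == "Last" && $5 == "Erred") last = last " " $1
    }
    END {
        if (maint == "" && last == "") print "no failed slices"
        if (maint != "") print "first:" maint
        # Per the table, a Last Erred slice needs metareplace with -f
        if (last != "") print "then:" last
    }'
}

# Example: the Last Erred slice appears first in the table,
# but the Maintenance slice is still reported first.
# Prints:
#   first: c0t3d0s1
#   then: c0t2d0s1
recovery_order <<'EOF'
        Device              Start Block  Dbase State        Hot Spare
        c0t1d0s1                 330     No    Okay
        c0t2d0s1                 330     No    Last Erred
        c0t3d0s1                 330     No    Maintenance
EOF
```

The sketch only reports an order. It does not run metareplace, because a volume with a “Last Erred” slice might already hold fabricated data and should be reviewed by an administrator.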

Note –

RAID 5 volume initialization or resynchronization cannot be interrupted.

How to Expand a RAID 5 Volume

Make sure that you have a current backup of all data and that you have root access.

Read Background Information for Creating RAID 5 Volumes.

To attach additional components to a RAID 5 volume, use one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose Attach Component and follow the instructions. For more information, see the online help.
- Use the following form of the metattach command:
  metattach volume-name name-of-component-to-add
  - volume-name is the name for the volume to expand.
  - name-of-component-to-add specifies the name of the component to attach to the RAID 5 volume.
  See the metattach(1M) man page for more information.
Note –
In general, attaching components is a short-term solution for a RAID 5 volume that is running out of space. For performance reasons, it is best to have a “pure” RAID 5 volume.

Example—Adding a Component to a RAID 5 Volume

# metattach d2 c2t1d0s2
d2: column is attached

This example shows the addition of slice c2t1d0s2 to an existing RAID 5 volume named d2.

Where to Go From Here

For a UFS, run the growfs command on the RAID 5 volume. See Volume and Disk Space Expansion.

An application, such as a database, that uses the raw volume must have its own way of recognizing and using the added space.
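For a UFS, the attach and grow steps can be written out together. The following sketch only prints the commands so that they can be reviewed before they are run. The function name expand_commands and the mount point /export/data are hypothetical, and the raw device path assumes that the volume belongs to the local disk set.

```shell
#!/bin/sh
# expand_commands (hypothetical helper): print, without running them, the
# commands for attaching a slice and then growing a UFS on the volume.
# Assumes the volume is in the local disk set, so its raw device is
# /dev/md/rdsk/<volume>, and that the file system is mounted at $3.
expand_commands() {
    volume=$1 component=$2 mount_point=$3
    printf 'metattach %s %s\n' "$volume" "$component"
    printf 'growfs -M %s /dev/md/rdsk/%s\n' "$mount_point" "$volume"
}

# /export/data is an illustrative mount point, not taken from the example.
# Prints:
#   metattach d2 c2t1d0s2
#   growfs -M /export/data /dev/md/rdsk/d2
expand_commands d2 c2t1d0s2 /export/data
```

See the growfs(1M) man page before running the second command on a mounted file system.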

How to Enable a Component in a RAID 5 Volume

Make sure that you have a current backup of all data and that you have root access.

To enable a failed component in a RAID 5 volume, use one of the following methods:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose the failed component. Click Enable Component and follow the instructions. For more information, see the online help.
- Use the following form of the metareplace command:
  metareplace -e volume-name component-name
  - -e specifies that the failed component is replaced with a component at the same location (perhaps after physically replacing a disk).
  - volume-name is the name of the volume with a failed component.
  - component-name specifies the name of the component to replace.
  metareplace automatically starts resynchronizing the new component with the rest of the RAID 5 volume.

Example—Enabling a Component in a RAID 5 Volume

# metareplace -e d20 c2t0d0s2

In this example, the RAID 5 volume d20 has a slice, c2t0d0s2, which had a soft error. The metareplace command with the -e option enables the slice.

Note –

If a disk drive is defective, you can replace it with another available disk (and its slices) on the system, as documented in How to Replace a Component in a RAID 5 Volume. Alternatively, you can repair or replace the disk, label it, and run the metareplace command with the -e option.

How to Replace a Component in a RAID 5 Volume

This task replaces a failed slice of a RAID 5 volume in which only one slice has failed.

Caution –

Replacing a failed slice when multiple slices are in error might cause data to be fabricated. The integrity of the data in this instance would be questionable.

Make sure that you have a current backup of all data and that you have root access.

Use one of the following methods to determine which slice of the RAID 5 volume needs to be replaced:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then view the status of the individual components. For more information, see the online help.
- Use the metastat command.
Look for the keyword “Maintenance” to identify the failed slice.
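When a volume needs maintenance, the metastat output also contains an Invoke line that names the suggested recovery command. The sketch below pulls that line out of captured output. The function name suggested_recovery is hypothetical, and the script assumes the Invoke line format shown in the sample metastat output in this section.

```shell
#!/bin/sh
# suggested_recovery (hypothetical helper): read metastat-style output on stdin
# and print the command text that follows the "Invoke:" label, if any.
suggested_recovery() {
    awk '$1 == "Invoke:" {
        # Strip leading whitespace and the label, keep the rest of the line
        sub(/^[ \t]*Invoke:[ \t]*/, "")
        print
    }'
}

# Example run against captured output.
# Prints: metareplace d1 c0t14d0s6 <new device>
suggested_recovery <<'EOF'
d1: RAID
    State: Needs Maintenance
    Invoke: metareplace d1 c0t14d0s6 <new device>
    Interlace: 32 blocks
EOF
```

The printed text still contains the <new device> placeholder, so it is a prompt for the administrator and not a command that can be run unchanged.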

Use one of the following methods to replace the failed slice with another slice:
- From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose the failed component. Click Replace Component and follow the instructions. For more information, see the online help.
- Use the following form of the metareplace command:
  metareplace volume-name failed-component new-component
  - volume-name is the name of the volume with a failed component.
  - failed-component specifies the name of the component to replace.
  - new-component specifies the name of the component to add to the volume in place of the failed component.
  See the metareplace(1M) man page for more information.

To verify the status of the replacement slice, use one of the methods described earlier in this procedure for identifying the failed slice.

The state of the replaced slice should be “Resyncing” or “Okay.”

Example—Replacing a RAID 5 Component

# metastat d1
d1: RAID
    State: Needs Maintenance
    Invoke: metareplace d1 c0t14d0s6 <new device>
    Interlace: 32 blocks
    Size: 8087040 blocks
Original device:
    Size: 8087520 blocks
	Device              Start Block  Dbase State        Hot Spare
	c0t9d0s6                 330     No    Okay        
	c0t13d0s6                330     No    Okay        
	c0t10d0s6                330     No    Okay        
	c0t11d0s6                330     No    Okay        
	c0t12d0s6                330     No    Okay        
	c0t14d0s6                330     No    Maintenance
 
# metareplace d1 c0t14d0s6 c0t4d0s6
d1: device c0t14d0s6 is replaced with c0t4d0s6
# metastat d1
d1: RAID
    State: Resyncing
    Resync in progress: 98% done
    Interlace: 32 blocks
    Size: 8087040 blocks
Original device:
    Size: 8087520 blocks
	Device              Start Block  Dbase State        Hot Spare
	c0t9d0s6                 330     No    Okay        
	c0t13d0s6                330     No    Okay        
	c0t10d0s6                330     No    Okay        
	c0t11d0s6                330     No    Okay        
	c0t12d0s6                330     No    Okay
	c0t4d0s6                 330     No    Resyncing

In this example, the metastat command displays the action to take to recover from the failed slice in the d1 RAID 5 volume. After an available slice is located, the metareplace command is run, specifying the failed slice first, then the replacement slice. (If no other slices are available, run the metareplace command with the -e option to attempt to recover from possible soft errors by resynchronizing the failed device.) If multiple errors exist, the slice in the “Maintenance” state must be replaced or enabled first. Then the slice in the “Last Erred” state can be repaired. After the metareplace command completes, the metastat command is used to monitor the progress of the resynchronization. During the replacement, the state of the volume and of the new slice is “Resyncing.” You can continue to use the volume while it is in this state.
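Monitoring the resynchronization can be scripted by reading the “Resync in progress” line. The following is a sketch under stated assumptions: the function names resync_percent and wait_for_resync are hypothetical, the line format is taken from the sample output above, and the 60-second polling interval is arbitrary.

```shell
#!/bin/sh
# resync_percent (hypothetical helper): read metastat-style output on stdin and
# print the percentage from a "Resync in progress: NN% done" line, if present.
resync_percent() {
    awk '/Resync in progress:/ {
        pct = $4
        sub(/%$/, "", pct)
        print pct
    }'
}

# wait_for_resync (hypothetical): poll metastat until no resync line remains.
# Needs a system with Solaris Volume Manager; it is defined here but not called.
wait_for_resync() {
    volume=$1
    while pct=$(metastat "$volume" | resync_percent) && [ -n "$pct" ]; do
        echo "$volume: resync ${pct}% done"
        sleep 60
    done
}

# Example run against captured output.
# Prints: 98
resync_percent <<'EOF'
d1: RAID
    State: Resyncing
    Resync in progress: 98% done
    Interlace: 32 blocks
EOF
```

Because the volume stays usable while it is in the “Resyncing” state, a loop of this kind is only for reporting. Nothing needs to wait for it before I/O resumes.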

Note –

You can use the metareplace command on non-failed devices to change a disk slice or other component. This procedure can be useful for tuning the performance of RAID 5 volumes.