To check status on a RAID 5 volume, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node and view the status of the volumes. Choose a volume, then choose Action->Properties to see more detailed information. For more information, see the online help.
Use the metastat command.
For each slice in the RAID 5 volume, the metastat command shows the following:
“Device” (device name of the slice in the stripe)
“Start Block” on which the slice begins
“Dbase” to show if the slice contains a state database replica
“State” of the slice
“Hot Spare” to show the slice being used to hot spare a failed slice
Here is sample RAID 5 volume output from the metastat command.
    # metastat
    d10: RAID
        State: Okay
        Interlace: 32 blocks
        Size: 10080 blocks
    Original device:
        Size: 10496 blocks
            Device              Start Block  Dbase State        Hot Spare
            c0t0d0s1                330      No    Okay
            c1t2d0s1                330      No    Okay
            c2t3d0s1                330      No    Okay
The metastat command output identifies the volume as a RAID 5 volume. For each slice in the RAID 5 volume, it shows the name of the slice in the stripe, the block on which the slice begins, an indicator that none of these slices contain a state database replica, that all the slices are okay, and that none of the slices are hot spare replacements for a failed slice.
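The per-slice rows in this output are regular enough to parse in a script. The following is a minimal sketch, not part of Solaris Volume Manager: `list_slice_states` is a hypothetical helper that reads metastat output on standard input and prints each slice with its state. It assumes slice rows begin with a `cNtNdNsN` device name and that the state is the fourth field, as in the sample above. On Solaris, `nawk` may be a safer choice than the default `awk`.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command): reads metastat output on stdin
# and prints "<device> <state>" for each slice row.
list_slice_states() {
    awk '
        # Slice rows are assumed to start with a cNtNdNsN device name.
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
            # Fields: device, start block, dbase flag, then the state.
            state = $4
            # "Last Erred" is a two-word state.
            if ($5 == "Erred") state = state " " $5
            print $1, state
        }
    '
}

# Sample input copied from the metastat output shown above.
sample='d10: RAID
    State: Okay
    Interlace: 32 blocks
    Size: 10080 blocks
Original device:
    Size: 10496 blocks
        Device              Start Block  Dbase State        Hot Spare
        c0t0d0s1                330      No    Okay
        c1t2d0s1                330      No    Okay
        c2t3d0s1                330      No    Okay'

printf '%s\n' "$sample" | list_slice_states
```

For the sample input, this prints one line per slice, such as `c0t0d0s1 Okay`. On a live system the same function could be fed from `metastat d10` instead of the sample text.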
The following table explains RAID 5 volume states.
Table 14–1 RAID 5 States
State | Meaning |
---|---|
Initializing | Slices are in the process of having all disk blocks zeroed. This process is necessary due to the nature of RAID 5 volumes with respect to data and parity interlace striping. Once the state changes to “Okay,” the initialization process is complete and you are able to open the device. Up to this point, applications receive error messages. |
Okay | The device is ready for use and is currently free from errors. |
Maintenance | A slice has been marked as failed due to I/O or open errors that were encountered during a read or write operation. |
The slice state is perhaps the most important information when you are troubleshooting RAID 5 volume errors. The RAID 5 volume state only provides general status information, such as “Okay” or “Needs Maintenance.” If the RAID 5 volume reports a “Needs Maintenance” state, refer to the slice state. The recovery action differs depending on whether the slice is in the “Maintenance” or “Last Erred” state. If you only have a slice in the “Maintenance” state, it can be repaired without loss of data. If you have a slice in the “Maintenance” state and a slice in the “Last Erred” state, data has probably been corrupted. You must fix the slice in the “Maintenance” state first, then the “Last Erred” slice. See Overview of Replacing and Enabling Components in RAID 1 and RAID 5 Volumes.
The following table explains the slice states for a RAID 5 volume and possible actions to take.
Table 14–2 RAID 5 Slice States
State | Meaning | Action |
---|---|---|
Initializing | Slices are in the process of having all disk blocks zeroed. This process is necessary due to the nature of RAID 5 volumes with respect to data and parity interlace striping. | Normally none. If an I/O error occurs during this process, the device goes into the “Maintenance” state. If the initialization fails, the volume is in the “Initialization Failed” state, and the slice is in the “Maintenance” state. If this happens, clear the volume and re-create it. |
Okay | The device is ready for use and is currently free from errors. | None. Slices can be added or replaced, if necessary. |
Resyncing | The slice is actively being resynchronized. An error has occurred and been corrected, a slice has been enabled, or a slice has been added. | If desired, monitor the RAID 5 volume status until the resynchronization is done. |
Maintenance | A single slice has been marked as failed due to I/O or open errors that were encountered during a read or write operation. | Enable or replace the failed slice. See How to Enable a Component in a RAID 5 Volume, or How to Replace a Component in a RAID 5 Volume. The metastat command will show an invoke recovery message with the appropriate action to take with the metareplace command. |
Maintenance/ Last Erred | Multiple slices have encountered errors. The state of the failed slices is either “Maintenance” or “Last Erred.” In this state, no I/O is attempted on the slice that is in the “Maintenance” state, but I/O is attempted to the slice marked “Last Erred” with the outcome being the overall status of the I/O request. | Enable or replace the failed slices. See How to Enable a Component in a RAID 5 Volume, or How to Replace a Component in a RAID 5 Volume. The metastat command will show an invoke recovery message with the appropriate action to take with the metareplace command, which must be run with the -f flag. This state indicates that data might be fabricated due to multiple failed slices. |
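The ordering rule for failed slices (repair “Maintenance” first, then “Last Erred”) can be expressed as a small decision script. This is only a sketch under stated assumptions: `recovery_hint` is a hypothetical helper, it expects metastat output for a single RAID 5 volume on standard input, and it assumes the state is the fourth field of each slice row. It prints advice only and does not run metareplace.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command): reads metastat output for one
# RAID 5 volume on stdin and prints which slices to repair, and in what order.
recovery_hint() {
    awk '
        # Slice rows are assumed to start with a cNtNdNsN device name.
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
            if ($4 == "Maintenance") maint = maint " " $1
            else if ($4 == "Last" && $5 == "Erred") erred = erred " " $1
        }
        END {
            if (maint == "" && erred == "") { print "no failed slices"; exit }
            # Only "Maintenance" slices: repairable without data loss.
            if (erred == "") { print "enable or replace:" maint; exit }
            # Mixed case: "Maintenance" slices come first, then "Last Erred".
            # The table above says metareplace needs -f in this state.
            if (maint != "") print "first (Maintenance):" maint
            print "then (Last Erred):" erred
        }
    '
}

# Example with two illustrative slice rows, one of them failed.
printf '%s\n' \
    'c0t9d0s6   330  No  Okay' \
    'c0t14d0s6  330  No  Maintenance' | recovery_hint
```

The example input has a single slice in the “Maintenance” state, so the function reports `enable or replace: c0t14d0s6`. The actual recovery command should still be taken from the invoke line that metastat prints.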
RAID 5 volume initialization or resynchronization cannot be interrupted.
Make sure that you have a current backup of all data and that you have root access.
To attach additional components to a RAID 5 volume, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose Attach Component and follow the instructions. For more information, see the online help.
Use the following form of the metattach command:
    metattach volume-name name-of-component-to-add
volume-name is the name for the volume to expand.
name-of-component-to-add specifies the name of the component to attach to the RAID 5 volume.
See the metattach(1M) man page for more information.
In general, attaching components is a short-term solution to a RAID 5 volume that is running out of space. For performance reasons, it is best to have a “pure” RAID 5 volume.
    # metattach d2 c2t1d0s2
    d2: column is attached
This example shows the addition of slice c2t1d0s2 to an existing RAID 5 volume named d2.
For a UFS, run the growfs command on the RAID 5 volume. See Volume and Disk Space Expansion.
An application, such as a database, that uses the raw volume must have its own way of growing the added space.
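The attach-then-grow sequence for a UFS can be sketched as follows. `print_expand_steps` is a hypothetical helper that only prints the commands rather than running them, so they can be reviewed first. The mount point `/files` is an assumption for illustration, and the `growfs -M mount-point raw-device` form assumes the file system is mounted; check the growfs(1M) man page before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands for attaching a
# component to a RAID 5 volume and then growing a mounted UFS on it.
print_expand_steps() {
    volume=$1        # volume to expand, for example d2
    component=$2     # slice to attach, for example c2t1d0s2
    mount_point=$3   # where the UFS is mounted (assumed for illustration)
    echo "metattach $volume $component"
    echo "growfs -M $mount_point /dev/md/rdsk/$volume"
}

# Volume and slice names are from the example above; /files is hypothetical.
print_expand_steps d2 c2t1d0s2 /files
```

Printing the commands instead of executing them keeps the sketch safe to run on any system. Run the printed commands as root only after confirming that a current backup exists.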
Make sure that you have a current backup of all data and that you have root access.
To enable a failed component in a RAID 5 volume, use one of the following methods:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose the failed component. Click Enable Component and follow the instructions. For more information, see the online help.
Use the following form of the metareplace command:
    metareplace -e volume-name component-name
-e specifies to replace the failed component with a component at the same location (perhaps after physically replacing a disk).
volume-name is the name of the volume with a failed component.
component-name specifies the name of the component to replace.
metareplace automatically starts resynchronizing the new component with the rest of the RAID 5 volume.
    # metareplace -e d20 c2t0d0s2
In this example, the RAID 5 volume d20 has a slice, c2t0d0s2, which had a soft error. The metareplace command with the -e option enables the slice.
If a disk drive is defective, you can replace it with another available disk (and its slices) on the system, as documented in How to Replace a Component in a RAID 5 Volume. Alternatively, you can repair or replace the disk, label it, and run the metareplace command with the -e option.
This task replaces a failed slice of a RAID 5 volume in which only one slice has failed.
Replacing a failed slice when multiple slices are in error might cause data to be fabricated. The integrity of the data in this instance would be questionable.
Make sure that you have a current backup of all data and that you have root access.
Use one of the following methods to determine which slice of the RAID 5 volume needs to be replaced:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then view the status of the individual components. For more information, see the online help.
Use the metastat command.
Look for the keyword “Maintenance” to identify the failed slice.
Use one of the following methods to replace the failed slice with another slice:
From the Enhanced Storage tool within the Solaris Management Console, open the Volumes node, then open the RAID 5 volume. Choose the Components pane, then choose the failed component. Click Replace Component and follow the instructions. For more information, see the online help.
Use the following form of the metareplace command:
    metareplace volume-name failed-component new-component
volume-name is the name of the volume with a failed component.
failed-component specifies the name of the component to replace.
new-component specifies the name of the component to add to the volume in place of the failed component.
See the metareplace(1M) man page for more information.
To verify the status of the replacement slice, use one of the methods described in Step 2.
The state of the replaced slice should be “Resyncing” or “Okay”.
    # metastat d1
    d1: RAID
        State: Needs Maintenance
        Invoke: metareplace d1 c0t14d0s6 <new device>
        Interlace: 32 blocks
        Size: 8087040 blocks
    Original device:
        Size: 8087520 blocks
            Device              Start Block  Dbase State        Hot Spare
            c0t9d0s6                330      No    Okay
            c0t13d0s6               330      No    Okay
            c0t10d0s6               330      No    Okay
            c0t11d0s6               330      No    Okay
            c0t12d0s6               330      No    Okay
            c0t14d0s6               330      No    Maintenance
    # metareplace d1 c0t14d0s6 c0t4d0s6
    d1: device c0t14d0s6 is replaced with c0t4d0s6
    # metastat d1
    d1: RAID
        State: Resyncing
        Resync in progress: 98% done
        Interlace: 32 blocks
        Size: 8087040 blocks
    Original device:
        Size: 8087520 blocks
            Device              Start Block  Dbase State        Hot Spare
            c0t9d0s6                330      No    Okay
            c0t13d0s6               330      No    Okay
            c0t10d0s6               330      No    Okay
            c0t11d0s6               330      No    Okay
            c0t12d0s6               330      No    Okay
            c0t4d0s6                330      No    Resyncing
In this example, the metastat command displays the action to take to recover from the failed slice in the d1 RAID 5 volume. After locating an available slice, the metareplace command is run, specifying the failed slice first, then the replacement slice. (If no other slices are available, run the metareplace command with the -e option to attempt to recover from possible soft errors by resynchronizing the failed device.) If multiple errors exist, the slice in the “Maintenance” state must first be replaced or enabled. Then the slice in the “Last Erred” state can be repaired. After the metareplace command completes, the metastat command is used to monitor the progress of the resynchronization. During the replacement, the state of the volume and the new slice is “Resyncing.” You can continue to use the volume while it is in this state.
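Monitoring the resynchronization can be scripted by extracting the percentage from the “Resync in progress” line. The following is a sketch, assuming the line has the form shown in the example output; `resync_percent` is a hypothetical helper, not an SVM command. It prints nothing when no resynchronization line is present.

```shell
#!/bin/sh
# Hypothetical helper: reads metastat output on stdin and prints the resync
# percentage, or nothing if no "Resync in progress" line is found.
resync_percent() {
    sed -n 's/.*Resync in progress: *\([0-9][0-9]*\).*/\1/p'
}

# Demo with the progress line from the example output above.
printf '%s\n' '    Resync in progress: 98% done' | resync_percent   # prints 98

# Possible use on a live system (assumes metastat is on PATH and the volume
# is named d1); left commented out so that this sketch runs anywhere:
# while pct=$(metastat d1 | resync_percent) && [ -n "$pct" ]; do
#     echo "d1 resync: ${pct}% done"
#     sleep 60
# done
```

The polling loop stops as soon as metastat no longer reports a resynchronization in progress, at which point the slice state should be checked for “Okay”.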
You can use the metareplace command on non-failed devices to change a disk slice or other component. This procedure can be useful for tuning the performance of RAID 5 volumes.