How to Check the Status of Metadevices and Hot Spare Pools (Command Line) (Solstice DiskSuite 4.2.1 User's Guide)

Make sure you have met the prerequisites ("Prerequisites for Maintaining DiskSuite Objects"). Use the metastat(1M) command to view metadevice or hot spare pool status. Refer to the metastat(1M) man pages for more information.

Use the following sections to interpret the command line output and to decide which actions to take.

Note - Refer to Table 3-2 for an explanation of DiskSuite's general status keywords.

Stripe and Concatenation Status (Command Line)

DiskSuite does not report a state change for a concatenation or a stripe, unless the concatenation or stripe is used as a submirror. Refer to "Stripe and Concatenation Status (DiskSuite Tool)" for more information.

Mirror and Submirror Status (Command Line)

Running metastat(1M) on a mirror displays the state of each submirror, the pass number, the read option, the write option, and the total size of the mirror in blocks. Refer to "How to Change a Mirror's Options (Command Line)" to change a mirror's pass number, read option, or write option.

Here is sample mirror output from metastat.

# metastat
d0: Mirror
    Submirror 0: d1
      State: Okay        
    Submirror 1: d2
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 5600 blocks
 
d1: Submirror of d0
    State: Okay        
    Size: 5600 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t2d0s7                   0     No    Okay        
 
...

For each submirror in the mirror, metastat shows the state, an "invoke" line if there is an error, the assigned hot spare pool (if any), size in blocks, and information about each slice in the submirror.
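
The per-submirror state in the mirror summary can also be collected by a script. The following is a minimal sketch, assuming the output layout matches the sample above; the function name `submirror_states` is hypothetical and is not part of DiskSuite. It reads text that was already captured from metastat and does not run the command itself.

```python
import re

def submirror_states(metastat_text):
    """Map each submirror name to the state shown in the mirror's summary block."""
    states = {}
    pending = None  # submirror whose "State:" line is expected next
    for line in metastat_text.splitlines():
        text = line.strip()
        header = re.match(r"Submirror \d+: (\S+)$", text)
        if header:
            pending = header.group(1)
        elif pending and text.startswith("State:"):
            states[pending] = text[len("State:"):].strip()
            pending = None
        elif text:
            # Any other non-blank line ends the "Submirror N:" entry.
            pending = None
    return states
```

Lines such as "d1: Submirror of d0" are ignored here, because only the "Submirror N:" entries inside the mirror block are matched.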

Table 3-8 explains submirror states.

Table 3-8 Submirror States (Command Line)


State	Meaning
Okay	The submirror has no errors and is functioning correctly.
Resyncing	The submirror is actively being resynced. An error has occurred and been corrected, the submirror has just been brought back online, or a new submirror has been added.
Needs Maintenance	A slice (or slices) in the submirror has encountered an I/O error or an open error. All reads and writes to and from this slice in the submirror have been discontinued.

Additionally, for each stripe in a submirror, metastat shows the "Device" (device name of the slice in the stripe); "Start Block" on which the slice begins; "Dbase" to show if the slice contains a state database replica; "State" of the slice; and "Hot Spare" to show the slice being used to hot spare a failed slice.
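
The slice rows described above can be split into their columns with a short script. This is a sketch only, under two assumptions: the rows follow the column order of the sample output, and the "Dbase" column reads either "Yes" or "No" (the samples in this section only show "No"). The names `SLICE_STATES`, `ROW`, and `parse_slice_rows` are hypothetical.

```python
import re

# Slice states named in this section. "Last Erred" contains a space, so a
# row cannot simply be split on whitespace.
SLICE_STATES = ("Okay", "Resyncing", "Initializing", "Maintenance", "Last Erred")

ROW = re.compile(
    r"^\s+(?P<device>\S+)\s+(?P<start>\d+)\s+(?P<dbase>Yes|No)\s+"
    r"(?P<state>" + "|".join(SLICE_STATES) + r")"
    r"(?:\s+(?P<hot_spare>\S+))?\s*$"
)

def parse_slice_rows(metastat_text):
    """Return one dict per slice row; header and summary lines are skipped."""
    rows = []
    for line in metastat_text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            row["start"] = int(row["start"])  # start block as a number
            rows.append(row)
    return rows
```

The `hot_spare` entry is `None` when the "Hot Spare" column is empty.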

The slice state is perhaps the most important information when troubleshooting mirror errors. The submirror state only provides general status information, such as "Okay" or "Needs Maintenance." If the submirror reports a "Needs Maintenance" state, refer to the slice state. The recovery action differs depending on whether the slice is in the "Maintenance" or "Last Erred" state. If you only have slices in the "Maintenance" state, they can be repaired in any order. If you have slices in the "Maintenance" state and a slice in the "Last Erred" state, you must fix the slices in the "Maintenance" state first, then the "Last Erred" slice. Refer to "Overview of Replacing and Enabling Slices in Mirrors and RAID5 Metadevices".
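
The repair ordering rule above can be expressed as a small helper. This is an illustrative sketch, not a DiskSuite utility; the function name `repair_order` and the `(device, state)` input format are assumptions.

```python
def repair_order(slice_states):
    """Order errored slices: "Maintenance" slices first, then "Last Erred".

    slice_states is a list of (device, state) pairs. Slices in any other
    state are left out because they need no repair.
    """
    priority = {"Maintenance": 0, "Last Erred": 1}
    errored = [(dev, st) for dev, st in slice_states if st in priority]
    # sorted() is stable, so slices in the same state keep their input order,
    # which is acceptable because "Maintenance" slices may be fixed in any order.
    return [dev for dev, st in sorted(errored, key=lambda pair: priority[pair[1]])]
```

For example, given one "Last Erred" slice, one "Okay" slice, and one "Maintenance" slice, the helper lists the "Maintenance" slice before the "Last Erred" slice and omits the healthy one.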

Table 3-9 explains the slice states for submirrors and possible actions to take.

Table 3-9 Submirror Slice States (Command Line)


State	Meaning	Action
Okay	The slice has no errors and is functioning correctly.	None.
Resyncing	The slice is actively being resynced. An error has occurred and been corrected, the submirror has just been brought back online, or a new submirror has been added.	If desired, monitor the submirror status until the resync is done.
Maintenance	The slice has encountered an I/O error or an open error. All reads and writes to and from this slice have been discontinued.	Enable or replace the errored slice. See "How to Enable a Slice in a Submirror (Command Line)", or "How to Replace a Slice in a Submirror (Command Line)". Note: The `metastat(1M)` command will show an `invoke` recovery message with the appropriate action to take with the `metareplace(1M)` command. You can also use the `metareplace` `-e` command.
Last Erred	The slice has encountered an I/O error or an open error. However, the data is not replicated elsewhere due to another slice failure. I/O is still performed on the slice. If I/O errors result, the mirror I/O will fail.	First, enable or replace slices in the "Maintenance" state. See "How to Enable a Slice in a Submirror (Command Line)", or "How to Replace a Slice in a Submirror (Command Line)". Usually, this error results in some data loss, so validate the mirror after it is fixed. For a file system, use the `fsck(1M)` command to validate the "metadata" then check the user-data. An application or database must have its own method of validating the metadata.

RAID5 Metadevice Status (Command Line)

Running the metastat(1M) command on a RAID5 metadevice shows the status of the metadevice. Additionally, for each slice in the RAID5 metadevice, metastat shows the "Device" (device name of the slice in the stripe); "Start Block" on which the slice begins; "Dbase" to show if the slice contains a state database replica; "State" of the slice; and "Hot Spare" to show the slice being used to hot spare a failed slice.

Here is sample RAID5 metadevice output from metastat.

# metastat
d10: RAID
    State: Okay        
    Interlace: 32 blocks
    Size: 10080 blocks
Original device:
    Size: 10496 blocks
        Device              Start Block  Dbase State        Hot Spare
        c0t0d0s1                 330     No    Okay        
        c1t2d0s1                 330     No    Okay        
        c2t3d0s1                 330     No    Okay

Table 3-10 explains RAID5 metadevice states.

Table 3-10 RAID5 States (Command Line)


State	Meaning
Initializing	Slices are in the process of having all disk blocks zeroed. This is necessary due to the nature of RAID5 metadevices with respect to data and parity interlace striping. Once the state changes to "Okay," the initialization process is complete and you are able to open the device. Up to this point, applications receive error messages.
Okay	The device is ready for use and is currently free from errors.
Maintenance	A single slice has been marked as errored due to I/O or open errors encountered during a read or write operation.

The slice state is perhaps the most important information when troubleshooting RAID5 metadevice errors. The RAID5 state only provides general status information, such as "Okay" or "Needs Maintenance." If the RAID5 metadevice reports a "Needs Maintenance" state, refer to the slice state. The recovery action differs depending on whether the slice is in the "Maintenance" or "Last Erred" state. If you only have a slice in the "Maintenance" state, it can be repaired without loss of data. If you have a slice in the "Maintenance" state and a slice in the "Last Erred" state, data has probably been corrupted. You must fix the slice in the "Maintenance" state first, then the "Last Erred" slice. Refer to "Overview of Replacing and Enabling Slices in Mirrors and RAID5 Metadevices".
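
The distinction drawn above, between a repair without data loss and probable corruption, can be sketched as a simple check on the slice states. The function name `raid5_assessment` and its return strings are hypothetical and only restate the rule in this section.

```python
def raid5_assessment(slice_states):
    """Summarize a RAID5 metadevice from a list of its slice states."""
    if "Last Erred" in slice_states:
        # An additional slice failed while another was already errored.
        return "data probably corrupted; fix Maintenance first, then Last Erred"
    if "Maintenance" in slice_states:
        return "repairable without loss of data"
    return "no errored slices"
```

The check for "Last Erred" comes first because that state changes the outcome even when a "Maintenance" slice is also present.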

Table 3-11 explains the slice states for a RAID5 metadevice and possible actions to take.

Table 3-11 RAID5 Slice States (Command Line)


State	Meaning	Action
Initializing	Slices are in the process of having all disk blocks zeroed. This is necessary due to the nature of RAID5 metadevices with respect to data and parity interlace striping.	Normally none. If an I/O error occurs during this process, the device goes into the "Maintenance" state. If the initialization fails, the metadevice is in the "Initialization Failed" state and the slice is in the "Maintenance" state. If this happens, clear the metadevice and recreate it.
Okay	The device is ready for use and is currently free from errors.	None. Slices may be added or replaced, if necessary.
Resyncing	The slice is actively being resynced. An error has occurred and been corrected, a slice has been enabled, or a slice has been added.	If desired, monitor the RAID5 metadevice status until the resync is done.
Maintenance	A single slice has been marked as errored due to I/O or open errors encountered during a read or write operation.	Enable or replace the errored slice. See "How to Enable a Slice in a RAID5 Metadevice (Command Line)", or "How to Replace a RAID5 Slice (Command Line)". Note: The `metastat(1M)` command will show an `invoke` recovery message with the appropriate action to take with the `metareplace(1M)` command.
Maintenance/ Last Erred	Multiple slices have encountered errors. The state of the errored slices is either "Maintenance" or "Last Erred." In this state, no I/O is attempted on the slice that is in the "Maintenance" state, but I/O is attempted to the slice marked "Last Erred" with the outcome being the overall status of the I/O request.	Enable or replace the errored slices. See "How to Enable a Slice in a RAID5 Metadevice (Command Line)", or "How to Replace a RAID5 Slice (Command Line)". Note: The `metastat(1M)` command will show an `invoke` recovery message with the appropriate action to take with the `metareplace(1M)` command, which must be run with the `-f` flag. This indicates that data might be fabricated due to multiple errored slices.

Trans Metadevice Status (Command Line)

Running the metastat(1M) command on a trans metadevice shows the status of the metadevice.

Here is sample trans metadevice output from metastat:

# metastat
d20: Trans
    State: Okay        
    Size: 102816 blocks
    Master Device: c0t3d0s4
    Logging Device: c0t2d0s3
 
        Master Device       Start Block  Dbase
        c0t3d0s4                   0     No  
 
c0t2d0s3: Logging device for d20
    State: Okay        
    Size: 5350 blocks
 
        Logging Device      Start Block  Dbase
        c0t2d0s3                 250     No

The metastat command also shows master and logging devices. For each device, the following information is displayed: the "Device" (device name of the slice or metadevice); "Start Block" on which the device begins; "Dbase" to show if the device contains a state database replica; and for the logging device, the "State."

Table 3-12 explains trans metadevice states and possible actions to take.

Table 3-12 Trans Metadevice States (Command Line)


State	Meaning	Action
Okay	The device is functioning properly. If mounted, the file system is logging and will not be checked at boot.	None.
Attaching	The logging device will be attached to the trans metadevice when the trans is closed or unmounted. When this occurs, the device is transitioned to the Okay state.	Refer to the `metattach(1M)` man page.
Detached	The trans metadevice does not have a logging device. All benefits from UFS logging are disabled.	`fsck(1M)` automatically checks the device at boot time. Refer to the `metadetach(1M)` man page.
Detaching	The logging device will be detached from the trans metadevice when the trans is closed or unmounted. When this occurs, the device transitions to the Detached state.	Refer to the `metadetach(1M)` man page.
Hard Error	A device error or file system panic has occurred while the device was in use. An I/O error is returned for every read or write until the device is closed or unmounted. The first open causes the device to transition to the Error state.	Fix the trans metadevice. See "How to Recover a Trans Metadevice With a File System Panic (Command Line)", or "How to Recover a Trans Metadevice With Hard Errors (Command Line)".
Error	The device can be read and written. The file system can be mounted read-only. However, an I/O error is returned for every read or write that actually gets a device error. The device does not transition back to the Hard Error state, even when a later device error or file system panic occurs.	Fix the trans metadevice. See "How to Recover a Trans Metadevice With a File System Panic (Command Line)", or "How to Recover a Trans Metadevice With Hard Errors (Command Line)". Successfully completing `fsck(1M)` or `newfs(1M)` transitions the device into the Okay state. When the device is in the Hard Error or Error state, `fsck` automatically checks and repairs the file system at boot time. `newfs` destroys whatever data may be on the device.
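
The state transitions described in Table 3-12 can be summarized as a lookup table. This is a sketch for illustration; the event names (`close_or_unmount`, `first_open`, `fsck_or_newfs_success`) are hypothetical labels chosen here and are not DiskSuite terms.

```python
# (current state, event) -> next state, as described in Table 3-12.
# Event names are hypothetical labels, not DiskSuite keywords.
TRANS_TRANSITIONS = {
    ("Attaching", "close_or_unmount"): "Okay",
    ("Detaching", "close_or_unmount"): "Detached",
    ("Hard Error", "first_open"): "Error",
    ("Error", "fsck_or_newfs_success"): "Okay",
}

def next_trans_state(state, event):
    """Return the next trans metadevice state, or the same state if the
    table describes no transition for this pair."""
    return TRANS_TRANSITIONS.get((state, event), state)
```

Because unknown pairs leave the state unchanged, a later device error in the Error state keeps the device in Error, which matches the table's statement that it does not return to Hard Error.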

Hot Spare Pool and Hot Spare Status (Command Line)

Running the metastat(1M) command on a hot spare pool shows the status of the hot spare pool and its hot spares.

Here is sample hot spare pool output from metastat.

# metastat hsp001
hsp001: 1 hot spare
        c1t3d0s2                Available       16800 blocks

Table 3-13 explains hot spare pool states and possible actions to take.

Table 3-13 Hot Spare Pool States (Command Line)


State	Meaning	Action
Available	The hot spares are running and ready to accept data, but are not currently being written to or read from.	None.
In-use	Hot spares are currently being written to and read from.	Diagnose how the hot spares are being used. Then repair the slice in the metadevice for which the hot spare is being used.
Attention	There is a problem with a hot spare or hot spare pool, but there is no immediate danger of losing data. This status is also displayed if the hot spare pool contains no hot spares, if all the hot spares are in use, or if any hot spare is broken.	Diagnose how the hot spares are being used or why they are broken. You can add more hot spares to the hot spare pool if desired.
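
Hot spare pool output can be read by a script in the same way. The following sketch assumes the layout of the hsp001 sample above (a header line followed by one indented line per hot spare); the function name `parse_hot_spare_pool` is hypothetical.

```python
import re

HEADER = re.compile(r"^(\S+): \d+ hot spares?\s*$")
# The status is matched lazily so that a multi-word status would also fit.
SPARE = re.compile(r"^\s+(\S+)\s+(.+?)\s+(\d+) blocks\s*$")

def parse_hot_spare_pool(metastat_text):
    """Return (pool_name, [(device, status, size_in_blocks), ...])."""
    pool = None
    spares = []
    for line in metastat_text.splitlines():
        header = HEADER.match(line)
        if header:
            pool = header.group(1)
            continue
        spare = SPARE.match(line)
        if spare:
            spares.append((spare.group(1), spare.group(2), int(spare.group(3))))
    return pool, spares
```

Counting the entries whose status is "Available" then shows how many hot spares are still free to replace a failed slice.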