Solstice DiskSuite 4.2.1 User's Guide

How to Check the Status of Metadevices and Hot Spare Pools (DiskSuite Tool)

Use this procedure to view and interpret metadevice and hot spare pool status information.

  1. Make sure you have met the prerequisites ("Prerequisites for Maintaining DiskSuite Objects").

  2. Check the status of a metadevice or hot spare pool by displaying the object's Information window.

    For other ways of checking status, see "Using DiskSuite Tool to Check Status".

  3. Refer to Table 3-2 for explanations of the status keywords used by metadevices and hot spare pools.

    Table 3-2 General Status Keywords

    OK
      Meaning: The metadevice or hot spare pool has no errors and is functioning correctly.
      Used by: All metadevice types and hot spare pools

    Attention
      Meaning: The metadevice or hot spare pool has a problem, but there is no immediate danger of losing data.
      Used by: All metadevice types and hot spare pools

    Urgent
      Meaning: The metadevice is only one failure away from losing data.
      Used by: Mirrors/submirrors, RAID5 metadevices, and trans metadevices

    Critical
      Meaning: Data has potentially been corrupted. For example, all submirrors in a mirror have errors, or a RAID5 metadevice has errors on more than one slice. Template objects, except the hot spare pool template, also show a Critical status if the metadevice configuration is invalid.
      Used by: Mirrors/submirrors, RAID5 metadevices, trans metadevices, and all template objects


    Note -

    If the fan fails on a SPARCstorage Array, all metadevices and slices on that SPARCstorage Array are marked "Critical."


  4. Use the following sections to find the status keywords for a specific type of DiskSuite object and the possible actions to take.

Stripe and Concatenation Status (DiskSuite Tool)

DiskSuite does not report a state change for a concatenation or stripe that experiences errors, unless the concatenation or stripe is used as a submirror. If there is a slice error or other device problem, DiskSuite returns an error to the requesting application and writes a message to the console, such as:


WARNING: md d4: read error on /dev/dsk/c1t3d0s6
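If you collect console output in a log file, a short script can flag these messages for follow-up. The following Python sketch extracts the metadevice name and the failing slice from a line like the one above. It assumes every such message follows the exact layout of this single example ("WARNING: md <metadevice>: <read|write> error on <slice>"); because this guide shows only one message, treat the pattern, including the "write" variant, as an assumption. The names parse_md_warning and MdWarning are hypothetical and are not part of DiskSuite.

```python
import re
from typing import NamedTuple, Optional

# Pattern modeled on the one example message in this guide:
#   WARNING: md d4: read error on /dev/dsk/c1t3d0s6
# Accepting "write" as well as "read" is an assumption.
_WARNING_RE = re.compile(
    r"^WARNING: md (?P<metadevice>d\d+): (?P<op>read|write) error on (?P<slice>/dev/dsk/\S+)$"
)


class MdWarning(NamedTuple):
    metadevice: str  # metadevice name, for example "d4"
    operation: str   # "read" or "write"
    slice_path: str  # path of the slice that reported the error


def parse_md_warning(line: str) -> Optional[MdWarning]:
    """Return the parsed fields, or None if the line does not match the assumed layout."""
    match = _WARNING_RE.match(line.strip())
    if match is None:
        return None
    return MdWarning(match["metadevice"], match["op"], match["slice"])


if __name__ == "__main__":
    print(parse_md_warning("WARNING: md d4: read error on /dev/dsk/c1t3d0s6"))
```

Lines that do not match return None, so the function can be run over a whole console log without extra filtering.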

Note -

DiskSuite can send SNMP trap data (alerts), such as the message above, to any network management console capable of receiving SNMP messages. For more information, refer to "How to Configure DiskSuite SNMP Support (Command Line)".


Because concatenations and stripes do not contain replicated data, recovering from a slice error on one of these simple metadevices requires replacing the physical disk, recreating the metadevice, and restoring the data from backup. Refer to "How to Recreate a Stripe or Concatenation After Slice Failure (DiskSuite Tool)" or "How to Recreate a Stripe or Concatenation After Slice Failure (Command Line)".

Mirror and Submirror Status (DiskSuite Tool)

A Mirror object has two kinds of Status fields: one for the mirror device itself, and one for each submirror. The Status field for a mirror, as explained in Table 3-3, gives a high-level status.

Table 3-3 Mirror Status Keywords

OK
  Meaning: The mirror has no errors and is functioning correctly.

Attention
  Meaning: A submirror has a problem, but there is no immediate danger of losing data. Either there are still two copies of the data (the mirror is a three-way mirror and only one submirror has failed), or a hot spare has been put into use.

Urgent
  Meaning: The mirror contains only a single good submirror, providing only one copy of the data. The mirror is only one failure away from losing data.

Critical
  Meaning: All submirrors have errors and data has potentially been corrupted.
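The mirror-level keywords depend mainly on how many good submirrors remain. The following Python sketch approximates Table 3-3 from three inputs: the number of submirrors, how many of them currently have errors, and whether a hot spare is in use. It is a simplified model, not how DiskSuite Tool computes the status; in particular, it assumes a submirror whose failed slice has been covered by a hot spare is not counted as errored. The function name mirror_status is hypothetical.

```python
def mirror_status(submirrors: int, errored: int, hot_spare_in_use: bool = False) -> str:
    """Approximate the mirror-level keyword from Table 3-3.

    submirrors:       number of submirrors in the mirror (one to three)
    errored:          number of submirrors that currently have errors
    hot_spare_in_use: True if a hot spare has taken over for a failed slice
    """
    if submirrors < 1 or not 0 <= errored <= submirrors:
        raise ValueError("need at least one submirror and 0 <= errored <= submirrors")
    good = submirrors - errored
    if good == 0:
        return "Critical"   # all submirrors have errors
    if errored == 0:
        # No submirror errors; a hot spare in use still calls for attention.
        return "Attention" if hot_spare_in_use else "OK"
    if good == 1:
        return "Urgent"     # only one good copy of the data remains
    return "Attention"      # for example, one failed submirror in a three-way mirror
```

For example, a two-way mirror with one errored submirror yields "Urgent", while a three-way mirror with one errored submirror yields "Attention", matching the two-copies rule in the table.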

Table 3-4 explains the keywords in the Status fields of submirrors, and possible actions to take.

Table 3-4 Submirror Status Keywords

OK
  Meaning: The submirror has no errors and is functioning correctly.
  Action: None.

Resyncing
  Meaning: The submirror is actively being resynced.
  Action: None. A resync follows when an error has occurred and been corrected, when the submirror has just been brought back online, or when a new submirror has been added.

Component Resyncing
  Meaning: A slice in the submirror is actively being resynced.
  Action: None. Either a hot spare slice or another slice has replaced an errored slice in the submirror.

Attaching
  Meaning: The submirror is being attached.
  Action: None.

Attached (resyncing)
  Meaning: The entire submirror is being resynced after the attach occurred.
  Action: None.

Online (scheduled)
  Meaning: The submirror will be brought online the next time you click Commit.
  Action: Click the Commit button to bring the submirror online.

Offline (scheduled)
  Meaning: The submirror will be brought offline the next time you click Commit.
  Action: Click the Commit button to take the submirror offline.

Offlined
  Meaning: The submirror is offline.
  Action: When appropriate, for example after performing maintenance, bring the submirror back online. See "How to Place a Submirror Offline and Online (DiskSuite Tool)".

Maintenance
  Meaning: The submirror has an error.
  Action: Repair the submirror. You can fix submirrors in the "Maintenance" state in any order. See "How to Enable a Slice in a Submirror (DiskSuite Tool)" or "How to Replace a Slice in a Submirror (DiskSuite Tool)".

Last Erred
  Meaning: The submirror has errors, and data for the mirror has potentially been corrupted.
  Action: Fix submirrors in the "Maintenance" state first, then fix the submirror in the "Last Erred" state. See "How to Enable a Slice in a Submirror (DiskSuite Tool)" or "How to Replace a Slice in a Submirror (DiskSuite Tool)". After fixing the error, validate the data.
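The repair rule in Table 3-4 is an ordering: submirrors in the "Maintenance" state first, in any order, and the "Last Erred" submirror only after those. The following Python sketch turns a map of submirror names to Status keywords into a repair list under that rule. The function name repair_order and the submirror names in the example (d11, d12, d13) are hypothetical.

```python
from typing import Dict, List

# Repair priority taken from Table 3-4: "Maintenance" submirrors first,
# in any order, and the "Last Erred" submirror only after those.
_REPAIR_PRIORITY = {"Maintenance": 0, "Last Erred": 1}


def repair_order(states: Dict[str, str]) -> List[str]:
    """Return the submirrors that need repair, in the order to fix them.

    states maps a submirror name (for example "d11") to its Status keyword.
    Submirrors in any other state are left out because Table 3-4 lists no
    repair for them.
    """
    needs_repair = [name for name, state in states.items() if state in _REPAIR_PRIORITY]
    # sorted() is stable, so "Maintenance" submirrors keep their input order.
    return sorted(needs_repair, key=lambda name: _REPAIR_PRIORITY[states[name]])


if __name__ == "__main__":
    print(repair_order({"d11": "Last Erred", "d12": "Maintenance", "d13": "OK"}))
```

With the example input, d12 comes before d11, because the "Last Erred" submirror holds the only remaining copy of some data and should be repaired last.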


Note -

DiskSuite does not retain state and hot spare information for simple metadevices that are not submirrors.


RAID5 Metadevice Status (DiskSuite Tool)

Table 3-5 explains the keywords in the Status fields of RAID5 objects, and possible actions to take.

Table 3-5 RAID5 Status Keywords

OK
  Meaning: The RAID5 metadevice has no errors and is functioning correctly.
  Action: None.

Attached/initialize (resyncing)
  Meaning: The RAID5 metadevice is being resynced after an attach occurred, or after being created.
  Action: Normally none. During the initialization of a new RAID5 metadevice, if an I/O error occurs, the device goes into the "Maintenance" state. If the initialization fails, the metadevice is in the "Initialization Failed" state and the slice is in the "Maintenance" state. If this happens, clear the metadevice and recreate it.

Attention
  Meaning: There is a problem with the RAID5 metadevice, but there is no immediate danger of losing data.
  Action: Continue to monitor the status of the device.

Urgent
  Meaning: The RAID5 metadevice has a slice error and you are only one failure away from losing data.
  Action: Fix the errored slice. See "How to Enable a Slice in a RAID5 Metadevice (DiskSuite Tool)" or "How to Replace a RAID5 Slice (DiskSuite Tool)".

Critical
  Meaning: The RAID5 metadevice has more than one slice with an error. Data has potentially been corrupted.
  Action: To fix the errored slices, see "How to Enable a Slice in a RAID5 Metadevice (DiskSuite Tool)" or "How to Replace a RAID5 Slice (DiskSuite Tool)". You may need to restore data from backup.
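The Urgent and Critical rows follow from how RAID5 parity works: parity can rebuild the data of one failed slice, but not of two. The following Python sketch maps the number of errored slices to those keywords. It is a deliberately narrow model: it does not cover the "Attention" or "Attached/initialize (resyncing)" states, and the function name raid5_status is hypothetical.

```python
def raid5_status(errored_slices: int) -> str:
    """Map the number of errored slices in a RAID5 metadevice to a Table 3-5 keyword.

    Parity lets a RAID5 metadevice rebuild the data of one failed slice, so a
    single error leaves it one failure away from data loss, while a second
    error means data may already be corrupted. The "Attention" and resyncing
    states are not modeled here.
    """
    if errored_slices < 0:
        raise ValueError("errored_slices cannot be negative")
    if errored_slices == 0:
        return "OK"
    if errored_slices == 1:
        return "Urgent"
    return "Critical"
```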

Trans Metadevice Status (DiskSuite Tool)

Table 3-6 explains the keywords in the Status fields of Trans Metadevice objects, and possible actions to take.

Table 3-6 Trans Metadevice Status Keywords

OK
  Meaning: The device is functioning properly. If mounted, the file system is logging and will not be checked by fsck at boot.
  Action: None.

Detach Log (in progress)
  Meaning: The trans metadevice log will be detached when the trans metadevice is unmounted or at the next reboot.
  Action: None.

Detach Log (scheduled)
  Meaning: The trans metadevice log will be detached the next time you click the Commit button.
  Action: Click Commit to detach the log. The detach takes place at the next reboot, or when the file system is unmounted and remounted.

Attention
  Meaning: There is a problem with the trans metadevice, but there is no immediate danger of losing data.
  Action: Continue to monitor the status of the trans metadevice.

Urgent
  Meaning: There is a problem with the trans metadevice and it is only one failure away from losing data. This state can exist only if the trans metadevice contains a RAID5 metadevice or mirror.
  Action: Fix the errored mirror or RAID5 master device. See "Overview of Replacing and Enabling Slices in Mirrors and RAID5 Metadevices".

Critical (log missing)
  Meaning: The trans metadevice does not have a logging device attached.
  Action: Attach a logging device. Logging for the file system cannot start until a logging device is attached.

Critical (log hard error)
  Meaning: A device error or file system panic has occurred while the device was in use. An I/O error is returned for every read or write until the device is closed or unmounted. The first open causes the device to transition to the Error state.
  Action: Fix the trans metadevice. See "How to Recover a Trans Metadevice With a File System Panic (Command Line)" or "How to Recover a Trans Metadevice With Hard Errors (Command Line)".

Critical (error)
  Meaning: The device can be read and written. The file system can be mounted read-only. However, an I/O error is returned for every read or write that actually gets a device error. The device does not transition back to the Hard Error state, even when a later device error or file system panic occurs.
  Action: Fix the trans metadevice. See "How to Recover a Trans Metadevice With a File System Panic (Command Line)" or "How to Recover a Trans Metadevice With Hard Errors (Command Line)". Successfully completing fsck(1M) or newfs(1M) transitions the device into the Okay state. When the device is in the Hard Error or Error state, fsck automatically checks and repairs the file system at boot time. Note that newfs destroys any data on the device.
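The two "Critical" error rows describe a small state machine: a device error or panic moves a working device to Hard Error, the first open after it is closed or unmounted moves it to Error, it never returns from Error to Hard Error, and a successful fsck or newfs returns it to Okay. The following Python sketch models only those transitions and the I/O error behavior described above. The state names come from the table; the event names ("device_error", "panic", "reopen", "fsck_ok", "newfs") and the functions next_state and read_write_fails are hypothetical.

```python
# States named after the command-line terms used in Table 3-6.
OKAY = "Okay"
HARD_ERROR = "Hard Error"
ERROR = "Error"

# (current state, event) -> next state, following the "Critical (log hard
# error)" and "Critical (error)" rows. Event names are hypothetical.
_TRANSITIONS = {
    (OKAY, "device_error"): HARD_ERROR,
    (OKAY, "panic"): HARD_ERROR,
    (HARD_ERROR, "reopen"): ERROR,   # first open after close or unmount
    (ERROR, "device_error"): ERROR,  # never goes back to Hard Error
    (ERROR, "panic"): ERROR,
    (HARD_ERROR, "fsck_ok"): OKAY,   # successful fsck
    (ERROR, "fsck_ok"): OKAY,
    (HARD_ERROR, "newfs"): OKAY,     # newfs destroys any data on the device
    (ERROR, "newfs"): OKAY,
}


def next_state(state: str, event: str) -> str:
    """Return the state after event; events not listed leave the state unchanged."""
    return _TRANSITIONS.get((state, event), state)


def read_write_fails(state: str, device_failed: bool) -> bool:
    """Whether a read or write returns an I/O error in one of the two error states."""
    if state == HARD_ERROR:
        return True            # every read or write fails until close or unmount
    if state == ERROR:
        return device_failed   # only I/O that actually gets a device error fails
    raise ValueError("only defined for the Hard Error and Error states")
```

The key asymmetry the sketch captures is that Hard Error fails all I/O, while Error fails only the I/O that actually hits a device error.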

Hot Spare Pool and Hot Spare Status (DiskSuite Tool)

Table 3-7 explains the keywords in the Status fields of Hot Spare Pool objects, and possible actions to take.

Table 3-7 Hot Spare Pool Status Keywords

OK
  Meaning: The hot spares are running and ready to accept data, but are not currently being written to or read from.
  Action: None.

In-use
  Meaning: Hot spares are currently being written to and read from.
  Action: Diagnose how the hot spares are being used. Then repair the slice in the metadevice for which the hot spare is being used.

Attention
  Meaning: There is a problem with a hot spare or hot spare pool, but there is no immediate danger of losing data. This status is also displayed if there are no hot spares in the hot spare pool, if all the hot spares are in use, or if any are broken.
  Action: Diagnose how the hot spares are being used or why they are broken. You can add more hot spares to the hot spare pool if necessary.
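The pool-level keyword summarizes the state of every hot spare in the pool. The following Python sketch derives it from a list of per-spare states. The per-spare state names ("available", "in-use", "broken") are hypothetical labels for this example, and giving "Attention" precedence over "In-use" is an assumption based on the Attention row, which covers an empty pool, a pool whose spares are all in use, and a pool with any broken spare.

```python
from typing import List

# Hypothetical per-spare states for this sketch; Table 3-7 lists only the
# pool-level keywords.
AVAILABLE = "available"
IN_USE = "in-use"
BROKEN = "broken"


def hot_spare_pool_status(spares: List[str]) -> str:
    """Derive a Table 3-7 keyword from the states of the spares in one pool."""
    if not spares:
        return "Attention"   # the pool has no hot spares
    if BROKEN in spares:
        return "Attention"   # at least one hot spare is broken
    if all(state == IN_USE for state in spares):
        return "Attention"   # no spare is left to take over another failure
    if IN_USE in spares:
        return "In-use"      # assumed to rank below Attention
    return "OK"
```

For example, a pool with one spare in use and one available reports "In-use", but once the second spare is also in use the pool reports "Attention", since it can no longer cover a further slice failure.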