Solstice DiskSuite 4.2.1 User's Guide

Overview of Replacing and Enabling Slices in Mirrors and RAID5 Metadevices

DiskSuite can replace and enable slices within mirrors and RAID5 metadevices.

In DiskSuite terms, replacing a slice is a way to substitute an available slice on the system for a selected slice in a submirror or RAID5 metadevice. You can think of this as a "metareplace," as opposed to physically replacing the slice. Enabling a slice means to "activate" or substitute a slice with itself (that is, the slice name is the same).

The following describes the two methods you can use and when you would use them.


Note -

When recovering from disk errors, scan /var/adm/messages to see what kind of errors occurred. If the errors are of a transitory nature and the disks themselves do not have problems, try enabling the errored slices. You can also use the format(1M) command to test a disk.


Enabling a Slice

You can enable a slice when:

  1. DiskSuite cannot access the physical drive. This may have occurred, for example, due to a power loss or a loose drive cable. In this case, DiskSuite puts the slices in the "Maintenance" state. You need to make sure the drive is accessible (restore power, recable, and so on), and then enable the slices in the metadevices.

  2. You suspect that a physical drive is having transitory problems that are not disk-related. You might be able to fix a slice in the "Maintenance" state by simply enabling it. If this does not fix the problem, then you need to either physically replace the disk drive and enable the slice, or "metareplace" the slice with another available slice on the system.

    When you physically replace a drive, be sure to partition it the same as the old drive. After the drive has been physically replaced and partitioned like the old one, the task to enable the errored slice(s) is the same as for the first condition described above.
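
As a minimal sketch of the enable step, the following Python helper assembles the command line for the metareplace(1M) command with its -e option. It only builds the argument list and does not run anything. The helper name, the mirror name d0, and the slice name c1t3d0s2 are hypothetical and chosen for illustration.

```python
def enable_slice_command(metadevice, slice_name):
    """Build the argument list for enabling a slice in place.

    Assumes the metareplace(1M) form: metareplace -e <metadevice> <slice>.
    The helper itself is illustrative and is not part of DiskSuite.
    """
    if not metadevice or not slice_name:
        raise ValueError("metadevice and slice name are both required")
    return ["metareplace", "-e", metadevice, slice_name]


# Hypothetical names: mirror d0, errored slice c1t3d0s2.
print(" ".join(enable_slice_command("d0", "c1t3d0s2")))
# prints: metareplace -e d0 c1t3d0s2
```

Run the resulting command only after the drive is accessible again or has been physically replaced and partitioned like the old one.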


Note -

Always check for state database replicas and hot spares on the drive being replaced. Any state database replica shown to be in error should be deleted before replacing the disk and added back (making sure the size is the same) before enabling the slice. You should treat hot spares in the same manner.


Replacing a Slice with Another Available Slice

You use the DiskSuite "metareplace" slice feature when replacing or swapping an existing slice with a different slice that is available and not in use on the system.

You can use this method when:

  1. A disk drive has problems, and you don't have a replacement drive but you do have available slices elsewhere on the system. (You might want to do this if a replacement is absolutely necessary but you don't want to shut down the system.)

  2. You are seeing soft errors. Physical disks may report soft errors even though DiskSuite shows the mirror/submirror or RAID5 metadevice in the "Okay" state. Replacing the slice in question with another available slice enables you to perform preventative maintenance and potentially prevent hard errors from occurring.

  3. You want to do performance tuning. For example, by using DiskSuite Tool's performance monitor, you see that a particular slice in a RAID5 metadevice is experiencing a high load average, even though it's in the "Okay" state. To balance the load on the metadevice, you can replace that slice with one from a disk that is less utilized. This type of replacement can be performed online without interrupting service to the metadevice.
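
For any of the cases above, the replacement corresponds to the metareplace(1M) command given a metadevice, the slice to be replaced, and the new slice. The Python sketch below only assembles that command line; the helper name and the device names (d45, c3t0d0s2, c5t0d0s2) are hypothetical, and the helper does not check that the new slice is actually unused.

```python
def replace_slice_command(metadevice, old_slice, new_slice):
    """Build the argument list for replacing one slice with another.

    Assumes the metareplace(1M) form:
        metareplace <metadevice> <old-slice> <new-slice>
    The helper itself is illustrative and is not part of DiskSuite.
    """
    if old_slice == new_slice:
        # Substituting a slice with itself is "enabling", not replacing.
        raise ValueError("old and new slice are the same; enable the slice instead")
    return ["metareplace", metadevice, old_slice, new_slice]


# Hypothetical names: RAID5 metadevice d45, busy slice c3t0d0s2,
# replacement slice c5t0d0s2 on a less utilized disk.
print(" ".join(replace_slice_command("d45", "c3t0d0s2", "c5t0d0s2")))
# prints: metareplace d45 c3t0d0s2 c5t0d0s2
```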


Note -

DiskSuite Tool enables you to replace an entire submirror if necessary. To do so, you create a new submirror (Concat/Stripe object) and drag it on top of the submirror to be replaced. This task is documented in "How to Replace a Submirror (DiskSuite Tool)".


Maintenance vs. Last Erred States

When a slice in a mirror or RAID5 metadevice experiences errors, DiskSuite puts the slice in the "Maintenance" state. No further reads or writes are performed to a slice in the "Maintenance" state. Subsequent errors on other slices in the same metadevice are handled differently, depending on the type of metadevice. A mirror may be able to tolerate many slices in the "Maintenance" state and still be read from and written to. A RAID5 metadevice, by definition, can tolerate only a single slice in the "Maintenance" state.

When either a mirror or RAID5 metadevice has a slice in the "Last Erred" state, I/O is still attempted to the slice marked "Last Erred." This is because a "Last Erred" slice contains the last good copy of data from DiskSuite's point of view. With a slice in the "Last Erred" state, the metadevice behaves like a normal device (disk) and returns I/O errors to an application. Usually, at this point some data has been lost.

Always replace slices in the "Maintenance" state first, followed by those in the "Last Erred" state. After a slice is replaced and resynced, use the metastat(1M) command to verify its state, then validate the data to make sure it is good.
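
The ordering rule above can be sketched as a small Python function. This is an illustration only: the function name, the dictionary layout, and the slice names are assumptions, and the states would in practice be read from metastat(1M) output by hand or by a separate tool.

```python
# Lower number means "repair earlier". Slices in any other state are skipped.
REPAIR_PRIORITY = {"Maintenance": 0, "Last Erred": 1}


def repair_order(slice_states):
    """Return the errored slices in the order they should be repaired.

    slice_states maps a slice name to its state string. Slices in the
    "Maintenance" state come first, followed by "Last Erred" slices.
    """
    errored = [
        (name, state)
        for name, state in slice_states.items()
        if state in REPAIR_PRIORITY
    ]
    # sort() is stable, so slices in the same state keep their input order.
    errored.sort(key=lambda item: REPAIR_PRIORITY[item[1]])
    return [name for name, _ in errored]


# Hypothetical slice names and states.
states = {
    "c0t1d0s2": "Last Erred",
    "c0t2d0s2": "Okay",
    "c0t3d0s2": "Maintenance",
}
print(repair_order(states))
# prints: ['c0t3d0s2', 'c0t1d0s2']
```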

Mirrors: If slices are in the "Maintenance" state, no data has been lost. You can safely replace or enable the slices in any order. If a slice is in the "Last Erred" state, you cannot replace it until you first replace all the other mirrored slices in the "Maintenance" state. Replacing or enabling a slice in the "Last Erred" state usually means that some data has been lost. Be sure to validate the data on the mirror after repairing it.

RAID5 Metadevices: A RAID5 metadevice can tolerate a single slice failure. You can safely replace a single slice in the "Maintenance" state without losing data. If an error on another slice occurs, it is put into the "Last Erred" state. At this point, the RAID5 metadevice is a read-only device; you need to perform some type of error recovery so that the state of the RAID5 metadevice is non-errored and the possibility of data loss is reduced. If a RAID5 metadevice reaches a "Last Erred" state, there is a good chance it has lost data. Be sure to validate the data on the RAID5 metadevice after repairing it.
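
The RAID5 behavior described above can be modeled in a few lines of Python. This is a simplified sketch of the rule as stated in this section (first errored slice goes to "Maintenance", an error on another slice goes to "Last Erred"), not DiskSuite's actual implementation. The function name and slice names are hypothetical.

```python
ERRORED_STATES = ("Maintenance", "Last Erred")


def raid5_record_error(states, failed_slice):
    """Return a copy of states after an error is seen on failed_slice.

    If no other slice is already errored, the failed slice is marked
    "Maintenance". If another slice is already errored, the failed slice
    is marked "Last Erred", and data may have been lost.
    """
    other_errored = any(
        state in ERRORED_STATES
        for name, state in states.items()
        if name != failed_slice
    )
    updated = dict(states)
    updated[failed_slice] = "Last Erred" if other_errored else "Maintenance"
    return updated


# Hypothetical three-slice RAID5 metadevice.
states = {"c1t0d0s2": "Okay", "c2t0d0s2": "Okay", "c3t0d0s2": "Okay"}
states = raid5_record_error(states, "c1t0d0s2")  # first failure
states = raid5_record_error(states, "c2t0d0s2")  # failure on another slice
print(states["c1t0d0s2"], "/", states["c2t0d0s2"])
# prints: Maintenance / Last Erred
```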