How to Replace a Failed SCSI Disk (Command Line) (Solstice DiskSuite 4.2.1 User's Guide)

How to Replace a Failed SCSI Disk (Command Line)

The high-level steps to replace a SCSI disk that is not part of a SPARCstorage Array are:

Identifying the disk that needs replacing
Deleting any metadevice state database replicas that are on the problem disk
Deleting hot spares marked "Available" that are on the problem disk
Locating and detaching submirrors that use slices on the problem disk
Halting the system and booting to single-user mode
Physically replacing the disk
Repartitioning the new disk
Adding metadevice state database replicas that were deleted
Performing one of the following, depending on how the slice that failed was used:

For a simple slice: use normal recovery procedures
For a stripe or concatenation: newfs entire metadevice, restore from backup
For a mirror: reattach detached submirrors
For a RAID5 metadevice: resync (enable) affected slices
For a trans metadevice: run fsck(1M)
Adding hot spares that were deleted to hot spare pools

Identify the disk to be replaced by examining /var/adm/messages and metastat output.

Locate any local metadevice state database replicas that may have been placed on the problem disk. Use the metadb command to find the replicas.

Errors may be reported for the replicas located on the failed disk. In this example, c0t1d0 is the problem device.

# metadb
   flags       first blk        block count
  a m     u        16               1034            /dev/dsk/c0t0d0s4
  a       u        1050             1034            /dev/dsk/c0t0d0s4
  a       u        2084             1034            /dev/dsk/c0t0d0s4
  W   pc luo       16               1034            /dev/dsk/c0t1d0s4
  W   pc luo       1050             1034            /dev/dsk/c0t1d0s4
  W   pc luo       2084             1034            /dev/dsk/c0t1d0s4

The output above shows three state database replicas on Slice 4 of each of the local disks, c0t0d0 and c0t1d0. The W in the flags field of the c0t1d0s4 slice indicates that the device has write errors. Three replicas on the c0t0d0s4 slice are still good.
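When many replicas are listed, scanning the flags column by eye is error prone. The following is a minimal sketch, not part of DiskSuite, that reads metadb output and prints each slice whose replicas carry an error flag. The function name failed_replica_slices is hypothetical, and the set of error flags (W, M, D, F, S, R) is an assumption to verify against metadb(1M) on your release.

```shell
#!/bin/sh
# failed_replica_slices: read metadb output on stdin and print every slice
# that has at least one replica with an error flag.
# Assumption: error flags are the uppercase letters W, M, D, F, S and R.
failed_replica_slices() {
    awk '$NF ~ /^\/dev\// {
        flags = ""
        # The last three fields are first blk, block count and the device;
        # everything before them belongs to the flags column.
        for (i = 1; i <= NF - 3; i++) flags = flags $i
        if (flags ~ /[WMDFSR]/) print $NF
    }' | sort -u
}

# Sample input: the metadb output shown above.
failed_replica_slices <<'EOF'
   flags       first blk        block count
  a m     u        16               1034            /dev/dsk/c0t0d0s4
  a       u        1050             1034            /dev/dsk/c0t0d0s4
  a       u        2084             1034            /dev/dsk/c0t0d0s4
  W   pc luo       16               1034            /dev/dsk/c0t1d0s4
  W   pc luo       1050             1034            /dev/dsk/c0t1d0s4
  W   pc luo       2084             1034            /dev/dsk/c0t1d0s4
EOF
```

For the sample input this prints /dev/dsk/c0t1d0s4. On a live system the function would be fed directly, as in `metadb | failed_replica_slices`.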

Caution -

If, after deleting the bad state database replicas, you are left with three or fewer, add more state database replicas before continuing. This will ensure that your system reboots correctly.

Record the slice name where the replicas reside and the number of replicas, then delete the state database replicas.

The number of replicas is obtained by counting the number of appearances of a slice in metadb output in Step 2. In this example, the three state database replicas that exist on c0t1d0s4 are deleted.
# metadb -d c0t1d0s4
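The count to record, and the check described in the Caution above, can be scripted. This is a sketch under stated assumptions: replica_counts is a hypothetical helper, it expects metadb output on standard input, and it treats every listed replica on other slices as usable without looking at the flags.

```shell
#!/bin/sh
# replica_counts SLICE: read metadb output on stdin and print two numbers,
# the replicas on SLICE and the replicas on all other slices.
# The exit status is 1 when three or fewer replicas would remain after
# deleting the ones on SLICE, and 0 otherwise.
replica_counts() {
    awk -v slice="$1" '
    $NF ~ /^\/dev\// {
        n = split($NF, parts, "/")      # last path element is the slice name
        if (parts[n] == slice) on++; else other++
    }
    END {
        print on + 0, other + 0
        exit (other + 0 <= 3)
    }'
}

# Possible use on a live system (not run here):
#   metadb | replica_counts c0t1d0s4 || echo "add replicas on another disk first"
```

The first number is the value to record for later, when the replicas are added back to the new disk.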

Locate any submirrors using slices on the problem disk and detach them.

The metastat command can show the affected mirrors. In this example, one submirror, d10, is also using c0t1d0s4. The mirror is d20.
# metadetach d20 d10
d20: submirror d10 is detached
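On a system with many metadevices, the mirror and submirror pairs can be derived from the configuration instead of being read off by hand. The sketch below assumes that `metastat -p` is available and prints one md.tab-style definition per line (for example `d20 -m d0 d10 1` for a mirror and `d10 1 1 c0t1d0s4` for a one-slice stripe). The function name detach_commands and the sample definitions in the usage note are illustrative only. It prints the metadetach commands and does not run them.

```shell
#!/bin/sh
# detach_commands DISK: read md.tab-style definitions on stdin and print a
# metadetach command for every submirror that has a component on DISK.
detach_commands() {
    awk -v disk="$1" '
    { line[NR] = $0 }
    $2 != "-m" {
        # Not a mirror definition: remember the metadevice if any of its
        # components is a slice of DISK (with or without a /dev/dsk/ prefix).
        for (i = 2; i <= NF; i++)
            if ($i ~ ("(^|/)" disk "s[0-9]+$")) uses[$1] = 1
    }
    END {
        # Second pass: report mirrors that list one of those metadevices.
        for (n = 1; n <= NR; n++) {
            k = split(line[n], f, " ")
            if (f[2] != "-m") continue
            for (i = 3; i <= k; i++)
                if (f[i] in uses) print "metadetach", f[1], f[i]
        }
    }'
}

# Possible use on a live system (not run here):
#   metastat -p | detach_commands c0t1d0
```

Given the definitions `d20 -m d0 d10 1`, `d0 1 1 c0t0d0s0` and `d10 1 1 c0t1d0s4`, the function prints `metadetach d20 d10`, which matches the command used in this example.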

Delete hot spares on the problem disk.

# metahs -d hsp000 c0t1d0s6
hsp000: Hotspare is deleted

Halt the system and boot to single-user mode.
# halt
...
ok boot -s
...

Physically replace the problem disk.

Repartition the new disk.

Use the format(1M) command or the fmthard(1M) command to partition the disk with the same slice information as the failed disk.
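One common way to reproduce the slice layout is to save the label with prtvtoc(1M) and write it back with fmthard -s. The sketch below only prints the two commands so that they can be reviewed before being run as root. It assumes that slice 2 is the conventional whole-disk slice and that a label was saved while the old disk was still readable (or taken from an identically partitioned disk). The function name vtoc_commands and the file path in the usage note are hypothetical.

```shell
#!/bin/sh
# vtoc_commands DISK VTOC_FILE: print (do not run) the commands that save a
# disk label to VTOC_FILE and later write it to the replacement disk.
# Assumption: slice 2 covers the whole disk, as is conventional.
vtoc_commands() {
    disk=$1
    vtoc_file=$2
    # Run this one before the disk is removed, while its label is readable.
    echo "prtvtoc /dev/rdsk/${disk}s2 > $vtoc_file"
    # Run this one after the new disk is installed.
    echo "fmthard -s $vtoc_file /dev/rdsk/${disk}s2"
}
```

For example, `vtoc_commands c0t1d0 /var/tmp/c0t1d0.vtoc` prints the prtvtoc and fmthard command lines for the problem disk in this procedure.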

If you deleted replicas in Step 3, add the same number back to the appropriate slice.

In this example, /dev/dsk/c0t1d0s4 is used.
# metadb -a -c 3 c0t1d0s4

The remaining recovery work depends on how the failed slice was used. Use the following table to decide what to do next.

Table 7-2 SCSI Disk Replacement Decision Table


Type of Device	Do the Following ...
Slice	Use normal data recovery procedures.
Unmirrored Stripe or Concatenation	If the stripe/concat is used for a file system, run `newfs(1M)`, mount the file system, then restore data from backup. If the stripe/concat is used by an application that accesses the raw device, that application must have its own recovery procedures.
Mirror (Submirror)	Run `metattach(1M)` to reattach a detached submirror.
RAID5 metadevice	Run `metareplace(1M)` to re-enable the slice. This causes the resyncs to start.
Trans metadevice	Run `fsck(1M)` to repair the trans metadevice.
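
The table can also be expressed as a small lookup, which may help when the procedure is scripted for several metadevices. This is a sketch only: next_step and its type keywords (slice, stripe, mirror, raid5, trans) are invented for illustration, the /dev/md/rdsk path is the usual raw metadevice location, and the function prints a suggested command instead of running it.

```shell
#!/bin/sh
# next_step TYPE METADEVICE [COMPONENT]: print the follow-up action from
# Table 7-2. For a mirror, COMPONENT is the detached submirror; for a RAID5
# metadevice, it is the slice to re-enable.
next_step() {
    type=$1
    md=$2
    comp=$3
    case "$type" in
        slice)  echo "# use normal data recovery procedures" ;;
        stripe) echo "newfs /dev/md/rdsk/$md   # then mount and restore from backup" ;;
        mirror) echo "metattach $md $comp" ;;
        raid5)  echo "metareplace -e $md $comp" ;;
        trans)  echo "fsck /dev/md/rdsk/$md" ;;
        *)      echo "unknown device type: $type" >&2; return 1 ;;
    esac
}
```

With the names from this example, `next_step mirror d20 d10` prints `metattach d20 d10`.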

Replace hot spares that were deleted, and add them to the appropriate hot spare pool(s).
# metahs -a hsp000 c0t1d0s6
hsp000: Hotspare is added

Validate the data.

Check the user/application data on all metadevices. You may have to run an application-level consistency checker or use some other method to check the data.