When power is lost to one SPARCstorage Array, the following occurs:
I/O operations to the DiskSuite objects will generate errors.
Errors are reported at the slice level rather than the drive level.
Errors are not reported until I/O operations are made to the disk.
Hot spare activity may be initiated if affected devices have assigned hot spares.
You must monitor the configuration for these events using the metastat(1M) command as explained in "Checking Status of DiskSuite Objects".
You may need to perform the following after power is restored:
Identify errored devices with metastat
Enable errored submirrors or RAID5 metadevices
Delete/recreate affected state database replicas
After power is restored, use the metastat command to identify the errored devices.
# metastat
...
d10: Trans
    State: Okay
    Size: 11423440 blocks
    Master Device: d20
    Logging Device: d15

d20: Mirror
    Submirror 0: d30
      State: Needs maintenance
    Submirror 1: d40
      State: Okay
...
d30: Submirror of d20
    State: Needs maintenance
...
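On a large configuration, scanning the full metastat listing by eye can be tedious. The following is a minimal sketch, not part of DiskSuite, that reads metastat-style text on standard input and prints each metadevice whose state is "Needs maintenance". The function name list_errored is hypothetical, and the parsing assumes the output layout shown above.

```shell
#!/bin/sh
# Sketch only: list_errored is a hypothetical helper, not a DiskSuite command.
# It assumes the metastat layout shown above, where a State line belongs to
# the most recently named metadevice or submirror.
list_errored() {
    awk '
        # Header line such as "d20: Mirror" names the current metadevice.
        /^d[0-9]+:/ { name = $1; sub(/:$/, "", name); next }
        # "Submirror 0: d30" means the next State line describes d30.
        /^[ \t]*Submirror [0-9]+:/ { name = $NF; next }
        # Print each errored metadevice once.
        /State: Needs maintenance/ && name != "" && !(name in seen) {
            seen[name] = 1
            print name
        }
    '
}

# Possible usage on a live system (run as root):
#   metastat | list_errored
```

Because the function only reads standard input, it can also be tried against metastat output saved to a file.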
Return errored devices to service using the metareplace command:
# metareplace -e metadevice slice
The -e option places the failed slice in the "Available" state and resyncs it.
Slices that have been replaced by a hot spare should be the last devices replaced using the metareplace command. If the hot spare is replaced first, it could replace another errored slice in a submirror as soon as it becomes available.
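The ordering rule above can be sketched as a small planning helper. This is an illustration only: plan_metareplace is a hypothetical function, and its input format (one line per errored slice giving the metadevice, the slice, and "yes" or "no" for whether a hot spare has replaced it) is an assumption made for this example. It prints the metareplace commands rather than running them, so the order can be reviewed first.

```shell
#!/bin/sh
# Sketch only: plan_metareplace and its three-column input are hypothetical.
# Input lines:  <metadevice> <slice> <yes|no>   (yes = replaced by a hot spare)
plan_metareplace() {
    input=$(cat)
    # First pass: slices that are not covered by a hot spare.
    printf '%s\n' "$input" | awk '$3 == "no"  { print "metareplace -e " $1 " " $2 }'
    # Second pass: slices replaced by a hot spare go last.
    printf '%s\n' "$input" | awk '$3 == "yes" { print "metareplace -e " $1 " " $2 }'
}

# Example with hypothetical slice names:
printf 'd30 c1t0d0s2 yes\nd30 c1t1d0s2 no\n' | plan_metareplace
```

With the sample input, the command for c1t1d0s2 is printed before the command for the hot-spared slice c1t0d0s2.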
A resync can be performed on only one slice of a submirror (metadevice) at a time. If all slices of a submirror were affected by the power outage, each slice must be replaced separately. It takes approximately 10 minutes for a resync to be performed on a 1.05-Gbyte disk.
Depending on the number of submirrors and the number of slices in these submirrors, the resync actions can require a considerable amount of time. A single submirror that is made up of 30 1.05-Gbyte drives might take about five hours to complete. A more realistic configuration made up of five-slice submirrors might take only 50 minutes to complete.
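The estimates above follow from resyncing one slice at a time at roughly 10 minutes per 1.05-Gbyte disk. As a rough sketch of that arithmetic (the function name is hypothetical, and the 10-minute figure is the approximation given in the text, not a measured value):

```shell
#!/bin/sh
# Sketch only: approximate resync time when slices are resynced one at a time.
# MINUTES_PER_SLICE is the approximate figure quoted for a 1.05-Gbyte disk.
MINUTES_PER_SLICE=10

estimate_resync_minutes() {
    # $1 = number of slices in the submirror
    echo $(( $1 * MINUTES_PER_SLICE ))
}

estimate_resync_minutes 30   # 300 minutes, about five hours
estimate_resync_minutes 5    # 50 minutes
```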
After the loss of power, all state database replicas on the affected SPARCstorage Array chassis will enter an errored state. While these will be reclaimed at the next reboot, you may want to manually return them to service by first deleting and then adding them back.
# metadb -d slice
# metadb -a slice
Make sure you add back the same number of state database replicas that were deleted from each slice. A single metadb -d command can delete multiple state database replicas, but adding them back may take several invocations of metadb -a. When a slice needs more than one replica, all of its replicas must be added in a single metadb invocation using the -c flag. Refer to the metadb(1M) man page for more information.
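As an illustration of the delete-then-add sequence, the sketch below prints the commands for one slice, using -c when the slice held more than one replica. The helper name replica_commands and the slice name c1t0d0s7 are hypothetical. The commands are only printed, so they can be checked against the actual configuration before being run.

```shell
#!/bin/sh
# Sketch only: replica_commands is a hypothetical helper that prints,
# but does not run, the metadb commands for one slice.
replica_commands() {
    slice=$1    # slice that holds the errored replicas
    count=$2    # number of replicas that were on that slice
    echo "metadb -d $slice"
    if [ "$count" -gt 1 ]; then
        # Multiple replicas on one slice must be added in one invocation.
        echo "metadb -a -c $count $slice"
    else
        echo "metadb -a $slice"
    fi
}

# Example with a hypothetical slice that held three replicas:
replica_commands c1t0d0s7 3
```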
Because state database replica recovery is not automatic, it is safest to manually perform the recovery immediately after the SPARCstorage Array returns to service. Otherwise, a new failure may cause a majority of state database replicas to be out of service and cause a kernel panic. This is the expected behavior of DiskSuite when too few state database replicas are available.
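The risk described above can be expressed as a simple count. The sketch below assumes you have already counted the total and errored replicas by hand (for example, from metadb output); the function name is hypothetical, and the test mirrors only the condition stated in the text, a majority of replicas out of service.

```shell
#!/bin/sh
# Sketch only: replica_majority_lost is a hypothetical helper.
# Succeeds (exit status 0) when more than half of the replicas are errored.
replica_majority_lost() {
    total=$1     # total number of state database replicas
    errored=$2   # number currently in an errored state
    [ $(( errored * 2 )) -gt "$total" ]
}

# Example: six replicas, three on the array that lost power.
if replica_majority_lost 6 3; then
    echo "majority of replicas out of service"
else
    echo "majority not lost, but one more failure would lose it"
fi
```

In this example exactly half of the replicas are errored, so a single further failure would put a majority out of service.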