If the state database replica quorum is not met, for example because of a drive failure, the system cannot be rebooted into multiuser mode. In DiskSuite terms, the state database has gone "stale." This task explains how to recover.
The high-level steps in this task are:
Deleting the stale state database replicas and rebooting
Repairing the problem disk
Adding back the state database replica(s)
In the following example, a disk containing two replicas has gone bad. This leaves the system with only two good replicas, and the system cannot reboot.
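The quorum arithmetic behind this example can be sketched in a few lines of shell. This is only an illustration: the function name boot_quorum_met is hypothetical, and the sketch assumes the majority rule (more than half of all replicas, that is, half plus one) for booting into multiuser mode.

```shell
# Hypothetical helper: succeed (exit 0) when enough replicas are good to
# boot into multiuser mode. Assumes the majority rule: more than half of
# all replicas, that is total/2 + 1 with integer division.
boot_quorum_met() {
    total=$1
    good=$2
    needed=$(( total / 2 + 1 ))
    [ "$good" -ge "$needed" ]
}

# The example in this task: four replicas in total, two still good.
if boot_quorum_met 4 2; then
    echo "quorum met"
else
    echo "quorum not met"
fi
```

With four replicas, three must be good, so two good replicas are not enough and the sketch reports that the quorum is not met.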
Boot the machine to determine which state database replicas are down.
    ok boot
    ...
    Hostname: demo
    metainit: demo: stale databases
    Insufficient metadevice database replicas located.
    Use metadb to delete databases which are broken.
    Ignore any "Read-only file system" error messages.
    Reboot the system when finished to reload the metadevice
    database. After reboot, repair any broken database replicas
    which were deleted.
    Type Ctrl-d to proceed with normal startup,
    (or give root password for system maintenance): <root-password>
    Entering System Maintenance Mode
    SunOS Release 5.5 Version Generic [UNIX(R) System V Release 4.0]
Use the metadb(1M) command to look at the metadevice state database and see which state database replicas are not available.
    # metadb -i
            flags           first blk       block count
         a m  p  lu         16              1034            /dev/dsk/c0t3d0s3
         a    p  l          1050            1034            /dev/dsk/c0t3d0s3
          M   p             unknown         unknown         /dev/dsk/c1t2d0s3
          M   p             unknown         unknown         /dev/dsk/c1t2d0s3
    ...
The system can no longer detect state database replicas on slice /dev/dsk/c1t2d0s3, which is part of the failed disk. The metadb command flags the replicas on this slice as having a problem with the master blocks.
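When there are many replicas, picking out the failed slices by eye is error-prone. The following is a minimal sketch, not part of DiskSuite, that filters metadb -i style output for replicas carrying an error flag. The function name stale_replica_slices is hypothetical. The sketch assumes that the last three whitespace-separated fields of each replica line are the first block, the block count, and the device path, and that the error flags are the uppercase letters W, M, D, F, S, and R.

```shell
# Hypothetical helper: list slices whose replicas carry an error flag.
# Reads "metadb -i" style output on stdin.
# Assumptions: the last three fields of a replica line are first blk,
# block count, and device path; error flags are W, M, D, F, S, or R.
stale_replica_slices() {
    awk '
        $NF ~ /^\/dev\// {
            # Every field before the last three is a flag column.
            for (i = 1; i <= NF - 3; i++) {
                if ($i ~ /[WMDFSR]/) { print $NF; break }
            }
        }
    ' | sort -u
}

# Sample input copied from the metadb -i output in this example.
stale_replica_slices <<'EOF'
        flags           first blk       block count
     a m  p  lu         16              1034            /dev/dsk/c0t3d0s3
     a    p  l          1050            1034            /dev/dsk/c0t3d0s3
      M   p             unknown         unknown         /dev/dsk/c1t2d0s3
      M   p             unknown         unknown         /dev/dsk/c1t2d0s3
EOF
```

For the sample input, the sketch prints /dev/dsk/c1t2d0s3 once, because both replicas on that slice are flagged M. On a live system the input would come from metadb -i itself.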
Delete the state database replicas on the bad disk using the -d option to the metadb(1M) command.
At this point, the root (/) file system is read-only. You can ignore the mddb.cf error messages:
    # metadb -d -f c1t2d0s3
    metadb: demo: /etc/lvm/mddb.cf.new: Read-only file system
Verify that the replicas were deleted.
    # metadb -i
            flags           first blk       block count
         a m  p  lu         16              1034            /dev/dsk/c0t3d0s3
         a    p  l          1050            1034            /dev/dsk/c0t3d0s3
Reboot.
Once you have a replacement disk, halt the system, replace the failed disk, and reboot the system again. Use the format(1M) command or the fmthard(1M) command to partition the disk as it was before the failure.
    # halt
    ...
    ok boot
    ...
    # format /dev/rdsk/c1t2d0s0
    ...
Use the metadb(1M) command to add back the state database replicas and to verify that the state database replicas are correct.
    # metadb -a -c 2 c1t2d0s3
    # metadb
            flags           first blk       block count
         a m  p  luo        16              1034            /dev/dsk/c0t3d0s3
         a    p  luo        1050            1034            /dev/dsk/c0t3d0s3
         a       u          16              1034            /dev/dsk/c1t2d0s3
         a       u          1050            1034            /dev/dsk/c1t2d0s3
The metadb command with the -c 2 option adds two state database replicas to the same slice.
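As a quick check that both replicas were created, the metadb output can be counted per slice. The following is a minimal sketch under stated assumptions: the function name replicas_on_slice is hypothetical, and the device path is assumed to be the last field of each replica line.

```shell
# Hypothetical helper: count the replicas listed for one slice.
# Reads "metadb" style output on stdin; the slice name (for example
# c1t2d0s3) is the first argument. Assumes the device path is the last
# field on each replica line.
replicas_on_slice() {
    slice=$1
    awk -v s="$slice" '
        {
            # Compare only the final path component with the slice name.
            n = split($NF, p, "/")
            if (p[n] == s) c++
        }
        END { print c + 0 }
    '
}

# Sample input modeled on the metadb output in this example.
replicas_on_slice c1t2d0s3 <<'EOF'
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t3d0s3
     a    p  luo        1050            1034            /dev/dsk/c0t3d0s3
     a       u          16              1034            /dev/dsk/c1t2d0s3
     a       u          1050            1034            /dev/dsk/c1t2d0s3
EOF
```

For the sample input, the sketch prints 2, which matches the count passed to the -c option.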