How to Recover From Insufficient State Database Replicas (Command Line) (Solstice DiskSuite 4.2.1 User's Guide)

If the state database replica quorum is not met, for example because of a drive failure, the system cannot be rebooted. In DiskSuite terms, the state database has gone "stale." This task explains how to recover.

The high-level steps in this task are:

Deleting the stale state database replicas and rebooting
Repairing the problem disk
Adding back the state database replica(s)

Example -- Recovering From Stale State Database Replicas

In the following example, a disk containing two replicas has gone bad. This leaves the system with only two good replicas out of four, which is exactly half rather than a majority, so the system cannot reboot.
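The arithmetic behind this example can be sketched in a few lines of Python. This is an illustration only, not part of DiskSuite: the function name can_boot is hypothetical, and the rule it encodes (a reboot needs more than half of all replicas, that is, half plus one) is an assumption based on how the replica quorum is usually described; this task does not state the rule explicitly.

```python
def can_boot(total_replicas: int, good_replicas: int) -> bool:
    """Return True if the good replicas form a strict majority.

    Assumption (not stated in this task): a reboot needs at least
    half + 1 of all configured state database replicas.
    """
    if total_replicas <= 0 or not 0 <= good_replicas <= total_replicas:
        raise ValueError("replica counts out of range")
    return good_replicas >= total_replicas // 2 + 1


# The example in this task: four replicas, two lost with the failed disk.
print(can_boot(4, 2))  # False: two of four is exactly half, not a majority

# After the two stale replicas are deleted, only the two good ones remain.
print(can_boot(2, 2))  # True
```

Under that assumption, deleting the stale replicas is what restores the quorum: the total drops from four to two, and both remaining replicas are good.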

Boot the machine to determine which state database replicas are down.

ok boot
...
Hostname: demo
metainit: demo: stale databases
 
Insufficient metadevice database replicas located.
 
Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice
database.
After reboot, repair any broken database replicas which were
deleted.
 
Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance): <root-password>
Entering System Maintenance Mode
 
SunOS Release 5.5 Version Generic [UNIX(R) System V Release 4.0]

Use the metadb(1M) command to look at the metadevice state database and see which state database replicas are not available.

# metadb -i
   flags      first blk      block count
    a m  p  lu    16                1034                  /dev/dsk/c0t3d0s3
    a   p  l      1050              1034                  /dev/dsk/c0t3d0s3
    M  p        unknown      unknown                      /dev/dsk/c1t2d0s3
    M  p        unknown      unknown                      /dev/dsk/c1t2d0s3
...

The system can no longer detect state database replicas on slice /dev/dsk/c1t2d0s3, which is part of the failed disk. The metadb command flags the replicas on this slice as having a problem with the master blocks.
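If the listing is long, picking out the affected slices can be scripted. The following Python sketch scans text in the layout shown above and reports every slice that has a replica flagged with an uppercase M. It assumes that each replica line ends with the first block, the block count, and the device path, in that order; the function name and the embedded SAMPLE text are hypothetical, and real metadb output may be laid out differently.

```python
SAMPLE = """\
   flags      first blk      block count
    a m  p  lu    16                1034                  /dev/dsk/c0t3d0s3
    a   p  l      1050              1034                  /dev/dsk/c0t3d0s3
    M  p        unknown      unknown                      /dev/dsk/c1t2d0s3
    M  p        unknown      unknown                      /dev/dsk/c1t2d0s3
"""


def slices_with_master_block_errors(metadb_output: str) -> list:
    """Return slices that have at least one replica flagged 'M'."""
    bad = []
    for line in metadb_output.splitlines():
        tokens = line.split()
        # Assumed layout: <flags...> <first blk> <block count> <device>.
        # Header lines and "..." do not end in a /dev/ path and are skipped.
        if len(tokens) < 4 or not tokens[-1].startswith("/dev/"):
            continue
        flags = "".join(tokens[:-3])
        device = tokens[-1]
        # Case matters: uppercase 'M' marks a master block problem,
        # while lowercase 'm' appears on a healthy replica above.
        if "M" in flags and device not in bad:
            bad.append(device)
    return bad


print(slices_with_master_block_errors(SAMPLE))  # ['/dev/dsk/c1t2d0s3']
```

Each slice is reported once even when it holds several bad replicas, because the deletion in the next step is done per slice.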

Delete the state database replicas on the bad disk using the -d option to the metadb(1M) command.

At this point, the root (/) file system is read-only. You can ignore the mddb.cf error messages:

# metadb -d -f c1t2d0s3
metadb: demo: /etc/lvm/mddb.cf.new: Read-only file system

Verify that the replicas were deleted.

# metadb -i
    flags        first blk       block count
     a m  p  lu         16              1034            /dev/dsk/c0t3d0s3
     a    p  l          1050            1034            /dev/dsk/c0t3d0s3

Reboot.

Once you have a replacement disk, halt the system, replace the failed disk, and reboot the system again. Use the format(1M) command or the fmthard(1M) command to partition the disk as it was before the failure.

# halt
...
ok boot
...
# format /dev/rdsk/c1t2d0s0
...

Use the metadb(1M) command to add back the state database replicas and to determine that the state database replicas are correct.

# metadb -a -c 2 c1t2d0s3
# metadb
   flags        first blk  block count
  a m  p  luo      16           1034         /dev/dsk/c0t3d0s3
  a    p  luo      1050         1034         /dev/dsk/c0t3d0s3
  a       u        16           1034         /dev/dsk/c1t2d0s3
  a       u        1050         1034         /dev/dsk/c1t2d0s3

The metadb command with the -c 2 option adds two state database replicas to the same slice.
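One way to confirm the result is to count the replicas reported for each slice. The Python sketch below does this for text in the layout of the listing above. It assumes that every replica line ends with a device path beginning with /dev/; the function name and the embedded SAMPLE text are hypothetical and stand in for output captured from the metadb command.

```python
from collections import Counter

SAMPLE = """\
   flags        first blk  block count
  a m  p  luo      16           1034         /dev/dsk/c0t3d0s3
  a    p  luo      1050         1034         /dev/dsk/c0t3d0s3
  a       u        16           1034         /dev/dsk/c1t2d0s3
  a       u        1050         1034         /dev/dsk/c1t2d0s3
"""


def replicas_per_slice(metadb_output: str) -> Counter:
    """Count how many replica lines are listed for each slice."""
    counts = Counter()
    for line in metadb_output.splitlines():
        tokens = line.split()
        # Only lines that end in a device path describe a replica.
        if tokens and tokens[-1].startswith("/dev/"):
            counts[tokens[-1]] += 1
    return counts


counts = replicas_per_slice(SAMPLE)
print(counts["/dev/dsk/c1t2d0s3"])  # 2, matching the -c 2 option
print(sum(counts.values()))         # 4 replicas in total
```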