Solstice DiskSuite 4.2.1 User's Guide

How to Recover From a Boot Device Failure (Command Line)

If you have a root (/) mirror and your boot device fails, you'll need to set up an alternate boot device.

The high-level steps in this task are:

Example -- Recovering From a Boot Device Failure

In the following example, the boot device containing two of the six state database replicas and the root (/), swap, and /usr submirrors fails.

Initially, when the boot device fails, you'll see a message similar to the following. This message may differ among various architectures.


Rebooting with command:
Boot device: /iommu/sbus/dma@f,81000/esp@f,80000/sd@3,0   File and args: kadb
kadb: kernel/unix
The selected SCSI device is not responding
Can't open boot device
...

When you see this message, note the device. Then, follow these steps:

  1. Boot from another root (/) submirror.

    Since only two of the six state database replicas in this example are in error, you can still boot. If this were not the case, you would need to delete the stale state database replicas in single-user mode. This procedure is described in "How to Recover From Insufficient State Database Replicas (Command Line)".

    When you created the mirror for the root (/) file system, you should have recorded the alternate boot device as part of that procedure. In this example, disk2 is that alternate boot device.


    ok boot disk2
    ...
    SunOS Release 5.5 Version Generic [UNIX(R) System V Release 4.0]
    Copyright (c) 1983-1995, Sun Microsystems, Inc.
     
    Hostname: demo
    ...
    demo console login: root
    Password: <root-password>
    Last login: Wed Dec 16 13:15:42 on console
    SunOS Release 5.1 Version Generic [UNIX(R) System V Release 4.0]
    ...
  2. Use the metadb(1M) command to determine that two state database replicas have failed.


    # metadb
           flags         first blk    block count
        M     p          unknown      unknown      /dev/dsk/c0t3d0s3
        M     p          unknown      unknown      /dev/dsk/c0t3d0s3
        a m  p  luo      16           1034         /dev/dsk/c0t2d0s3
        a    p  luo      1050         1034         /dev/dsk/c0t2d0s3
        a    p  luo      16           1034         /dev/dsk/c0t1d0s3
        a    p  luo      1050         1034         /dev/dsk/c0t1d0s3

    The system can no longer detect state database replicas on slice /dev/dsk/c0t3d0s3, which is part of the failed disk.

  3. Use the metastat(1M) command to determine that half of the root (/), swap, and /usr mirrors have failed.


    # metastat
    d0: Mirror
        Submirror 0: d10
          State: Needs maintenance
        Submirror 1: d20
          State: Okay
    ...
     
    d10: Submirror of d0
        State: Needs maintenance
        Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
        Size: 47628 blocks
        Stripe 0:
    	Device              Start Block  Dbase State        Hot Spare
    	/dev/dsk/c0t3d0s0          0     No    Maintenance 
     
    d20: Submirror of d0
        State: Okay
        Size: 47628 blocks
        Stripe 0:
    	Device              Start Block  Dbase State        Hot Spare
    	/dev/dsk/c0t2d0s0          0     No    Okay  
     
    d1: Mirror
        Submirror 0: d11
          State: Needs maintenance
        Submirror 1: d21
          State: Okay
    ...
     
    d11: Submirror of d1
        State: Needs maintenance
        Invoke: "metareplace d1 /dev/dsk/c0t3d0s1 <new device>"
        Size: 69660 blocks
        Stripe 0:
    	Device              Start Block  Dbase State        Hot Spare
    	/dev/dsk/c0t3d0s1          0     No    Maintenance 
     
    d21: Submirror of d1
        State: Okay
        Size: 69660 blocks
        Stripe 0:
    	Device              Start Block  Dbase State        Hot Spare
    	/dev/dsk/c0t2d0s1          0     No    Okay        
     
    d2: Mirror
        Submirror 0: d12
          State: Needs maintenance
        Submirror 1: d22
          State: Okay
    ...
     
    d2: Mirror
        Submirror 0: d12
          State: Needs maintenance
        Submirror 1: d22
          State: Okay
    ...
     
    d12: Submirror of d2
        State: Needs maintenance
        Invoke: "metareplace d2 /dev/dsk/c0t3d0s6 <new device>"
        Size: 286740 blocks
        Stripe 0:
    	Device              Start Block  Dbase State        Hot Spare
    	/dev/dsk/c0t3d0s6          0     No    Maintenance 
     
     
    d22: Submirror of d2
        State: Okay
        Size: 286740 blocks
        Stripe 0:
    	Device              Start Block  Dbase State        Hot Spare
    	/dev/dsk/c0t2d0s6          0     No    Okay  

    In this example, the metastat shows that following submirrors need maintenance:

    • Submirror d10, device c0t3d0s0

    • Submirror d11, device c0t3d0s1

    • Submirror d12, device c0t3d0s6

  4. Halt the system, repair the disk, and use the format(1M) command or the fmthard(1M) command, to partition the disk as it was before the failure.


    # halt
    ...
    Halted
    ...
    ok boot
    ...
    # format /dev/rdsk/c0t3d0s0
    
  5. Reboot.

    Note that you must reboot from the other half of the root (/) mirror. You should have recorded the alternate boot device when you created the mirror.


    # halt
    ...
    ok boot disk2
    
  6. To delete the failed state database replicas and then add them back, use the metadb(1M) command.


    # metadb
           flags         first blk    block count
        M     p          unknown      unknown      /dev/dsk/c0t3d0s3
        M     p          unknown      unknown      /dev/dsk/c0t3d0s3
        a m  p  luo      16           1034         /dev/dsk/c0t2d0s3
        a    p  luo      1050         1034         /dev/dsk/c0t2d0s3
        a    p  luo      16           1034         /dev/dsk/c0t1d0s3
        a    p  luo      1050         1034         /dev/dsk/c0t1d0s3
    # metadb -d c0t3d0s3
    # metadb -c 2 -a c0t3d0s3
    # metadb
           flags         first blk    block count
         a m  p  luo     16           1034         /dev/dsk/c0t2d0s3
         a    p  luo     1050         1034         /dev/dsk/c0t2d0s3
         a    p  luo     16           1034         /dev/dsk/c0t1d0s3
         a    p  luo     1050         1034         /dev/dsk/c0t1d0s3
         a        u      16           1034         /dev/dsk/c0t3d0s3
         a        u      1050         1034         /dev/dsk/c0t3d0s3
  7. Use the metareplace(1M) command to re-enable the submirrors.


    # metareplace -e d0 c0t3d0s0
    Device /dev/dsk/c0t3d0s0 is enabled
     
    # metareplace -e d1 c0t3d0s1
    Device /dev/dsk/c0t3d0s1 is enabled
     
    # metareplace -e d2 c0t3d0s6
    Device /dev/dsk/c0t3d0s6 is enabled

    After some time, the resyncs will complete. You can now return to booting from the original device.