Solaris Volume Manager Administration Guide

Recovering From Boot Problems

Because Solaris Volume Manager enables you to mirror the root (/), swap, and /usr directories, special problems can arise when you boot the system, either through hardware failures or operator error. The tasks in this section provide solutions to such potential problems.

The following table describes these problems and points you to the appropriate solution.

Table 26–1 Common Solaris Volume Manager Boot Problems


Reason for the Boot Problem	Instructions
The `/etc/vfstab` file contains incorrect information.	How to Recover From Improper `/etc/vfstab` Entries
There are not enough state database replicas.	How to Recover From Insufficient State Database Replicas
A boot device (disk) has failed.	How to Recover From a Boot Device Failure
The boot mirror has failed.

Background Information for Boot Problems

If Solaris Volume Manager takes a volume offline due to errors, unmount all file systems on the disk where the failure occurred. Because each disk slice is independent, multiple file systems can be mounted on a single disk. If the software has encountered a failure, other slices on the same disk will likely experience failures soon. File systems mounted directly on disk slices do not have the protection of Solaris Volume Manager error handling, and leaving such file systems mounted can leave you vulnerable to crashing the system and losing data.
Minimize the amount of time you run with submirrors disabled or offline. During resynchronization and online backup intervals, the full protection of mirroring is gone.

How to Recover From Improper `/etc/vfstab` Entries

If you have made an incorrect entry in the /etc/vfstab file, for example, when mirroring root (/), the system at first appears to boot properly, then fails. To remedy this situation, you need to edit the /etc/vfstab file while in single-user mode.

The high-level steps to recover from improper /etc/vfstab file entries are as follows:

Booting the system to single-user mode
Running the fsck command on the mirror volume
Remounting file system read-write
Optional: running the metaroot command for a root (/) mirror
Verifying that the /etc/vfstab file correctly references the volume for the file system entry
Rebooting

Recovering the root (`/`) RAID 1 (Mirror) Volume

In the following example, root (/) is mirrored with a two-way mirror, d0. The root (/) entry in the /etc/vfstab file has somehow reverted to the original slice of the file system, but the information in the /etc/system file still shows booting to be from the mirror d0. The most likely reason is that the metaroot command was not used to maintain the /etc/system and /etc/vfstab files, or an old copy of the /etc/vfstab file was copied back.

The incorrect /etc/vfstab file would look something like the following:

#device        device          mount          FS      fsck   mount    mount
#to mount      to fsck         point          type    pass   at boot  options
#
/dev/dsk/c0t3d0s0 /dev/rdsk/c0t3d0s0  /       ufs      1     no       -
/dev/dsk/c0t3d0s1 -                   -       swap     -     no       -
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6  /usr    ufs      2     no       -
#
/proc             -                  /proc    proc     -     no       -
swap              -                  /tmp     tmpfs    -     yes      -

Because of the errors, you automatically go into single-user mode when the system is booted:

ok boot
...
configuring network interfaces: hme0.
Hostname: lexicon
mount: /dev/dsk/c0t3d0s0 is not this fstype.
setmnt: Cannot open /etc/mnttab for writing

INIT: Cannot create /var/adm/utmp or /var/adm/utmpx

INIT: failed write of utmpx entry:"  "

INIT: failed write of utmpx entry:"  "

INIT: SINGLE USER MODE

Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance): <root-password>

At this point, root (/) and /usr are mounted read-only. Follow these steps:

Steps

Run the fsck command on the root (/) mirror.

Note –

Be careful to use the correct volume for root.

# fsck /dev/md/rdsk/d0
** /dev/md/rdsk/d0
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2274 files, 11815 used, 10302 free (158 frags, 1268 blocks,
0.7% fragmentation)

Remount root (/) read/write so you can edit the /etc/vfstab file.

# mount -o rw,remount /dev/md/dsk/d0 /
mount: warning: cannot lock temp file </etc/.mnt.lock>

Run the metaroot command.
# metaroot d0
This command edits the /etc/system and /etc/vfstab files to specify that the root (/) file system is now on volume d0.

Verify that the /etc/vfstab file contains the correct volume entries.

The root (/) entry in the /etc/vfstab file should appear as follows so that the entry for the file system correctly references the RAID 1 volume:

#device           device              mount    FS      fsck   mount   mount
#to mount         to fsck             point    type    pass   at boot options
#
/dev/md/dsk/d0    /dev/md/rdsk/d0     /        ufs     1      no      -
/dev/dsk/c0t3d0s1 -                   -        swap    -      no      -
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6  /usr     ufs     2      no      -
#
/proc             -                  /proc     proc    -      no      -
swap              -                  /tmp      tmpfs   -      yes     -

Reboot the system.

The system returns to normal operation.
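Before rebooting, the root (/) entry can also be checked mechanically. The following is a minimal sketch, not part of Solaris Volume Manager: the function names root_vfstab_device and root_is_on_volume are hypothetical helpers, and the check assumes that the mount point is the third whitespace-separated field of each non-comment line, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helpers (not SVM commands) for inspecting a vfstab-style file.

# Print the "device to mount" field of the root (/) entry in the given file.
root_vfstab_device() {
    # Skip comment lines; the mount point is the third field.
    awk '$1 !~ /^#/ && $3 == "/" { print $1; exit }' "$1"
}

# Succeed only if the root (/) entry references a volume under /dev/md/dsk.
root_is_on_volume() {
    case "$(root_vfstab_device "$1")" in
        /dev/md/dsk/*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

For example, `root_is_on_volume /etc/vfstab || echo "root entry does not reference a volume"` would flag the incorrect file shown earlier, where the root entry is /dev/dsk/c0t3d0s0.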

How to Recover From a Boot Device Failure

If you have a root (/) mirror and your boot device fails, you'll need to set up an alternate boot device.

The high-level steps in this task are as follows:

Booting from the alternate root (/) submirror
Determining the errored state database replicas and volumes
Repairing the failed disk
Restoring state database and volumes to their original state

In the following example, the boot device, which contains two of the six state database replicas and the root (/), swap, and /usr submirrors, fails.

Initially, when the boot device fails, you'll see a message similar to the following. This message might differ among various architectures.

Rebooting with command:
Boot device: /iommu/sbus/dma@f,81000/esp@f,80000/sd@3,0   
The selected SCSI device is not responding
Can't open boot device
...

When you see this message, note the device. Then, follow these steps:

Steps

Boot from another root (/) submirror.

Since only two of the six state database replicas in this example are in error, you can still boot. If this were not the case, you would need to delete the inaccessible state database replicas in single-user mode. This procedure is described in How to Recover From Insufficient State Database Replicas.

When you created the mirror for the root (/) file system, you should have recorded the alternate boot device as part of that procedure. In this example, disk2 is that alternate boot device.

ok boot disk2
SunOS Release 5.9 Version s81_51 64-bit
Copyright 1983-2001 Sun Microsystems, Inc.  All rights reserved.
Hostname: demo
...
demo console login: root
Password: <root-password>
Dec 16 12:22:09 lexicon login: ROOT LOGIN /dev/console
Last login: Wed Dec 12 10:55:16 on console
Sun Microsystems Inc.   SunOS 5.9       s81_51  May 2002
...

Determine that two state database replicas have failed by using the metadb command.

# metadb
       flags         first blk    block count
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    a m  p  luo      16           1034         /dev/dsk/c0t2d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t2d0s3
    a    p  luo      16           1034         /dev/dsk/c0t1d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t1d0s3

The system can no longer detect state database replicas on slice /dev/dsk/c0t3d0s3, which is part of the failed disk.
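The replica arithmetic behind "you can still boot" can be sketched as a small filter over metadb output. This is an illustration only: replica_quorum is a hypothetical helper, it assumes that unusable replicas show "unknown" in the first blk column as in the listing above, and it assumes the usual rule that more than half of the replicas must be usable to boot into multiuser mode.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command): reads `metadb` output on stdin,
# prints a summary, and exits 0 only if more than half the replicas are usable.
replica_quorum() {
    awk '
        $NF ~ /^\/dev\// {                 # replica lines end with a device path
            total++
            # Assumption: a failed replica reports "unknown" as its first block.
            if ($(NF-2) == "unknown") bad++
        }
        END {
            good = total - bad
            printf "total=%d failed=%d usable=%d\n", total, bad, good
            exit (good * 2 > total ? 0 : 1)
        }'
}
```

Run as `metadb | replica_quorum`. For the listing above, it prints `total=6 failed=2 usable=4` and succeeds, because four usable replicas out of six is a majority.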

Determine that half of the root (/), swap, and /usr mirrors have failed by using the metastat command.

# metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
...
 
d10: Submirror of d0
    State: Needs maintenance
    Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
    Size: 47628 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t3d0s0          0     No    Maintenance 
 
d20: Submirror of d0
    State: Okay
    Size: 47628 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t2d0s0          0     No    Okay  
 
d1: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Okay
...
 
d11: Submirror of d1
    State: Needs maintenance
    Invoke: "metareplace d1 /dev/dsk/c0t3d0s1 <new device>"
    Size: 69660 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t3d0s1          0     No    Maintenance 
 
d21: Submirror of d1
    State: Okay
    Size: 69660 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t2d0s1          0     No    Okay        
 
d2: Mirror
    Submirror 0: d12
      State: Needs maintenance
    Submirror 1: d22
      State: Okay
...
 
d12: Submirror of d2
    State: Needs maintenance
    Invoke: "metareplace d2 /dev/dsk/c0t3d0s6 <new device>"
    Size: 286740 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t3d0s6          0     No    Maintenance 
 
 
d22: Submirror of d2
    State: Okay
    Size: 286740 blocks
    Stripe 0:
	Device              Start Block  Dbase State        Hot Spare
	/dev/dsk/c0t2d0s6          0     No    Okay

In this example, the metastat command shows that the following submirrors need maintenance:

Submirror d10, device c0t3d0s0
Submirror d11, device c0t3d0s1
Submirror d12, device c0t3d0s6
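On a system with many volumes, the same list can be pulled out of the metastat output with a short filter. The following is a sketch under stated assumptions: slices_needing_maintenance is a hypothetical helper, and it relies on the column layout shown above, where a component line starts with a /dev path and carries its state in the fourth field.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command): reads `metastat` output on stdin
# and prints each component slice whose State column reads "Maintenance".
slices_needing_maintenance() {
    # Component lines look like: <device> <start block> <dbase> <state> [hot spare]
    awk '$1 ~ /^\/dev\// && $4 == "Maintenance" { print $1 }'
}
```

Run as `metastat | slices_needing_maintenance`. The "Invoke:" lines are ignored because their first field is not a device path.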

Halt the system, replace the disk, and use the format command or the fmthard command to partition the disk as it was before the failure.

Tip –

If the new disk is identical to the existing disk (the intact side of the mirror in this example), use prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2 to quickly format the new disk (c0t3d0 in this example).

# halt
...
Halted
...
ok boot
...
# format /dev/rdsk/c0t3d0s0

Reboot.

Note that you must reboot from the other half of the root (/) mirror. You should have recorded the alternate boot device when you created the mirror.
# halt
...
ok boot disk2

To delete the failed state database replicas and then add them back, use the metadb command.

# metadb
       flags         first blk    block count
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    M     p          unknown      unknown      /dev/dsk/c0t3d0s3
    a m  p  luo      16           1034         /dev/dsk/c0t2d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t2d0s3
    a    p  luo      16           1034         /dev/dsk/c0t1d0s3
    a    p  luo      1050         1034         /dev/dsk/c0t1d0s3
# metadb -d c0t3d0s3
# metadb -c 2 -a c0t3d0s3
# metadb
       flags         first blk    block count
     a m  p  luo     16           1034         /dev/dsk/c0t2d0s3
     a    p  luo     1050         1034         /dev/dsk/c0t2d0s3
     a    p  luo     16           1034         /dev/dsk/c0t1d0s3
     a    p  luo     1050         1034         /dev/dsk/c0t1d0s3
     a        u      16           1034         /dev/dsk/c0t3d0s3
     a        u      1050         1034         /dev/dsk/c0t3d0s3

Re-enable the submirrors by using the metareplace command.

# metareplace -e d0 c0t3d0s0
Device /dev/dsk/c0t3d0s0 is enabled
 
# metareplace -e d1 c0t3d0s1
Device /dev/dsk/c0t3d0s1 is enabled
 
# metareplace -e d2 c0t3d0s6
Device /dev/dsk/c0t3d0s6 is enabled

After some time, the resynchronization completes. You can then return to booting from the original device.
