Solaris Volume Manager Administration Guide

Recovering Transactional Volumes When Errors Occur

How to Recover a Transactional Volume With a Panic

    For file systems that the fsck command cannot repair automatically, run the fsck command manually on each transactional volume whose file systems share the affected log device.

Example—Recovering a Transactional Volume


# fsck /dev/md/rdsk/trans

Only after all of the affected transactional volumes have been checked and successfully repaired will the fsck command reset the state of the failed transactional volume to “Okay.”
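
To find which transactional volumes share a given log device, the metastat output can be filtered. The following is a minimal sketch, not part of Solaris Volume Manager. The function name trans_sharing_log is hypothetical, and the sketch assumes that metastat output follows the layout shown in this guide's examples: a "dN: Trans" header line followed by an indented "Logging Device:" line.

```shell
# Hypothetical helper: print the names of transactional volumes whose
# logging device matches the name given as the first argument.
# Reads metastat output on standard input, so it can also be tried on
# saved output. Assumes the "dN: Trans" / "Logging Device: <dev>" layout.
trans_sharing_log() {
    awk -v dev="$1" '
        # Header of a transactional volume: remember its name.
        /^[^ \t]+: Trans/ { split($1, a, ":"); name = a[1]; next }
        # Any other unindented line starts a different component.
        /^[^ \t]/ { name = "" }
        # Indented "Logging Device:" line that belongs to the current volume.
        /^[ \t]+Logging Device:/ && name != "" && $3 == dev { print name }
    '
}
```

With this function defined, a command such as "metastat | trans_sharing_log c0t0d0s6" would list the volumes to check, and the fsck command could then be run on /dev/md/rdsk/ followed by each name. On Solaris, nawk or /usr/xpg4/bin/awk might be needed in place of awk for the -v option.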

How to Recover a Transactional Volume With Hard Errors

Use this procedure to transition a transactional volume to the “Okay” state.

See How to Check the State of Transactional Volumes to check the status of a transactional volume.

If either the master device or log device encounters errors while processing logged data, the device transitions from the “Okay” state to the “Hard Error” state. If the device is in the “Hard Error” or “Error” state, either a device error or panic occurred. Recovery from both scenarios is the same.


Note –

If a log device is shared, a failure in any slice of a transactional volume causes all slices or volumes that are associated with that transactional volume to switch to a failed state.


  1. Check Prerequisites for Creating Solaris Volume Manager Components.

  2. Read Background Information for Transactional Volumes.

  3. Run the lockfs command to determine which file systems are locked.


    # lockfs
    

    Affected file systems are listed with a lock type of hard. Every file system that shares the same log device is hard locked.

  4. Unmount the affected file system(s).

    You can unmount locked file systems even if they were in use when the error occurred. If the affected processes try to access an open file or directory on the hard locked or unmounted file system, an error is returned.

  5. (Optional) Back up any accessible data.

    Before you attempt to fix the device error, you might want to recover as much data as possible. If your backup procedure requires a mounted file system (as the tar command and the cpio command do), you can mount the file system read-only. If your backup procedure does not require a mounted file system (as with the dump command or the volcopy command), you can access the transactional volume directly.

  6. Fix the device error.

    At this point, any attempt to open or mount the transactional volume for read-write access starts rolling all accessible data on the log device to the appropriate master devices. Any data that cannot be read or written is discarded. However, if you open or mount the transactional volume for read-only access, the log is simply rescanned and not rolled forward to the master devices, and the error is not fixed. In other words, all data on the master device and log device remains unchanged until the first read-write open or mount.

  7. Run the fsck command to repair the file system, or the newfs command if you need to restore data.

    Run the fsck command on all of the transactional volumes that share the same log device. When all transactional volumes have been repaired by the fsck command, they then revert to the “Okay” state.

    The newfs command will also transition the file system back to the “Okay” state, but the command will destroy all of the data on the file system. The newfs command is generally used when you plan to restore file systems from backup.

    The fsck or newfs command must be run on all of the transactional volumes that share the same log device before these devices revert to the “Okay” state.

  8. Run the metastat command to verify that the state of the affected devices has reverted to “Okay.”
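
The check in step 3 can be scripted by filtering the lockfs output for the hard lock type. The following is a minimal sketch under stated assumptions: the function name hard_locked is hypothetical, and the lockfs output is assumed to use the three-column layout of Filesystem, Locktype, and Comment.

```shell
# Hypothetical helper: print the mount point of every file system that
# lockfs reports with a lock type of "hard".
# Reads lockfs output on standard input, so it can be tried on saved output.
# Assumed columns: Filesystem, Locktype, Comment.
hard_locked() {
    # The header line is skipped naturally because its second field
    # is "Locktype", not "hard".
    awk '$2 == "hard" { print $1 }'
}
```

With this function defined, "lockfs | hard_locked" would print one mount point per line, which gives the list of file systems to unmount in step 4.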

Example—Logging Device Error


# metastat d5
d5: Trans
    State: Hard Error  
    Size: 10080 blocks
    Master Device: d4
    Logging Device: c0t0d0s6
 
d4: Mirror
    State: Okay
...
c0t0d0s6: Logging device for d5
    State: Hard Error
    Size: 5350 blocks
...
# fsck /dev/md/rdsk/d5
** /dev/md/rdsk/d5
** Last Mounted on /fs1
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
WARNING: md: log device: /dev/dsk/c0t0d0s6 changed state to Okay
4 files, 11 used, 4452 free (20 frags, 554 blocks, 0.4% fragmentation)
# metastat d5
d5: Trans
    State: Okay
    Size: 10080 blocks
    Master Device: d4
    Logging Device: c0t0d0s6
 
d4: Mirror
    State: Okay
...
 
c0t0d0s6: Logging device for d5
    State: Okay
...

This example shows a transactional volume, d5, which has a log device in the “Hard Error” state, being fixed. You must run the fsck command on the transactional volume itself, which transitions the state of the transactional volume to “Okay.” The metastat command confirms that the state is “Okay.”
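
The final verification can also be scripted. The following is a rough sketch, assuming metastat output in the layout shown in the example above: an unindented "name: type" header followed by an indented "State:" line. The function names metastat_states and all_okay are hypothetical. Output that lists nested components, such as submirrors with their own indented State lines, would need a more careful parser.

```shell
# Hypothetical helper: print "name: state" for each component in
# metastat output read from standard input.
metastat_states() {
    awk '
        # Unindented "name: ..." header line: remember the component name.
        /^[^ \t].*:/ { split($0, a, ":"); name = a[1] }
        # Indented State line: strip the label and trailing blanks, then print.
        /^[ \t]+State:/ {
            sub(/^[ \t]+State:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            print name ": " $0
        }
    '
}

# Hypothetical helper: return success only if every reported state is "Okay".
all_okay() {
    metastat_states | awk -F': ' '$2 != "Okay" { bad = 1 } END { exit bad + 0 }'
}
```

With these functions defined, "metastat d5 | all_okay" would return a zero exit status only when every listed state is “Okay.” On Solaris, nawk or /usr/xpg4/bin/awk might be needed in place of awk for the sub function.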