Solstice DiskSuite 4.2.1 User's Guide

Repairing Trans Metadevice Problems

Because a trans metadevice is a "layered" metadevice, consisting of a master device and logging device, and because the logging device can be shared among file systems, repairing an errored trans metadevice requires special recovery tasks.

Any device errors or file system panics must be dealt with using the command line utilities.

File System Panics

If a file system detects any internal inconsistencies while it is in use, it panics the system. If the file system is set up for UFS logging, it notifies the trans metadevice that it needs to be checked at reboot. The trans metadevice transitions itself to the "Hard Error" state. All other trans metadevices sharing the same logging device also go into the "Hard Error" state.

At reboot, fsck checks and repairs the file system and transitions the file system back to the "Okay" state. fsck does this for all trans metadevices listed in the /etc/vfstab file for the affected logging device.

Trans Metadevice Errors

Device errors can cause data loss. Read errors occurring on a logging device can cause significant data loss. For this reason, it is strongly recommended that you mirror the logging device.

If a device error occurs on either the master device or the logging device while the trans metadevice is processing logged data, the device transitions from the "Okay" state to the "Hard Error" state. If the device is in either the "Hard Error" or "Error" state, then either a device error or a file system panic has occurred.

Note -

Any devices sharing the errored logging device also go into the "Error" state.

How to Recover a Trans Metadevice With a File System Panic (Command Line)

For file systems that fsck cannot repair automatically at reboot, run fsck manually on each trans metadevice whose file systems share the affected logging device.

Example -- Recovering a Trans Metadevice

# fsck /dev/md/rdsk/trans

Only after all of the affected trans metadevices have been checked and successfully repaired will fsck reset the state of the errored trans metadevice to "Okay."

How to Recover a Trans Metadevice With Hard Errors (Command Line)

Use this procedure to transition a trans metadevice to the "Okay" state.

Refer to "How to Check the Status of Metadevices and Hot Spare Pools (Command Line)" to check the status of a trans metadevice.

If either the master device or the logging device encounters errors while processing logged data, the device transitions from the "Okay" state to the "Hard Error" state. If the device is in the "Hard Error" or "Error" state, either a device error or a file system panic has occurred. Recovery from both scenarios is the same.

Note -

If a log (logging device) is shared, a failure in any of the slices in a trans metadevice will result in all slices or metadevices associated with the trans metadevice switching to an errored state.

The high-level steps in this procedure are:

Unmounting the affected file system(s)
Backing up any accessible data
Fixing the device error
Repairing the file system (fsck(1M) or newfs(1M))

After checking the prerequisites ("Prerequisites for Maintaining DiskSuite Objects") and the preliminary information ("Repairing Trans Metadevice Problems"), run the lockfs(1M) command to determine which file systems are locked.
# lockfs
Affected file systems will be listed with a lock type of hard. Every file system sharing the same logging device will be hard locked.
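
To pick the affected mount points out of the lockfs listing, a small filter can help. The following is a minimal sketch, not part of DiskSuite. It assumes that lockfs prints a header line followed by one line per file system, with the mount point in the first column and the lock type in the second. The function name hard_locked and the sample listing are hypothetical.

```shell
# hard_locked: read a lockfs listing on standard input and print the
# mount point of every file system whose lock type is "hard".
# Assumption: column 1 is the mount point, column 2 is the lock type.
hard_locked() {
    awk '$1 != "Filesystem" && $2 == "hard" { print $1 }'
}

# Hypothetical listing, for illustration only.
sample_listing='Filesystem           Locktype   Comment
/                    unlock
/fs1                 hard
/fs2                 hard'

printf '%s\n' "$sample_listing" | hard_locked
```

On a live system the same filter would be fed from the command itself, as in "lockfs | hard_locked".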

Unmount the affected file system(s).

You can unmount locked file systems even if they were in use when the error occurred. If the affected processes try to access an opened file or directory on the hard locked or unmounted file system, an EIO error is returned.

[Optional] Back up any accessible data.

Before attempting to fix the device error, you may want to recover as much data as possible. If your backup procedure (such as tar or cpio) requires a mounted file system, you can mount the file system read-only. If your backup procedure (such as dump or volcopy) does not require a mounted file system, you can access the trans metadevice directly.
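
The two backup approaches can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: the function backup_plan is hypothetical, it only prints the commands rather than running them, /dev/rmt/0 is an assumed tape device, and ufsdump(1M) is assumed to be the dump utility in use.

```shell
# backup_plan: print (do not run) backup commands for a trans metadevice.
#   $1 = mode: "mounted" (tar needs a mounted file system) or "raw"
#   $2 = trans metadevice name, for example d5
#   $3 = mount point (mounted mode only)
# /dev/rmt/0 is an assumed tape device; substitute your own target.
backup_plan() {
    mode=$1 dev=$2 mnt=$3
    case $mode in
    mounted)
        # Read-only mount: the log is rescanned but not rolled forward.
        echo "mount -F ufs -o ro /dev/md/dsk/$dev $mnt"
        echo "cd $mnt && tar cf /dev/rmt/0 ."
        echo "umount $mnt"
        ;;
    raw)
        # Read the raw trans metadevice directly, no mount needed.
        echo "ufsdump 0f /dev/rmt/0 /dev/md/rdsk/$dev"
        ;;
    *)
        echo "usage: backup_plan mounted|raw device [mountpoint]" >&2
        return 1
        ;;
    esac
}

backup_plan mounted d5 /fs1
backup_plan raw d5
```

Review the printed commands before running any of them by hand. In particular, confirm that the mount is read-only, because the first read/write open or mount starts rolling the log.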

Fix the device error.

At this point, any attempt to open or mount the trans metadevice for read/write access starts rolling all accessible data on the logging device to the appropriate master device(s). Any data that cannot be read or written is discarded. However, if you open or mount the trans metadevice for read-only access, the log is simply rescanned and not rolled forward to the master device(s), and the error is not fixed. In other words, all of the data on the master and logging devices remains unchanged until the first read/write open or mount.

Run fsck(1M) to repair the file system, or newfs(1M) if you plan to restore the data from backup.

Run fsck on all of the trans metadevices sharing the same logging device. When all of these trans metadevices have been repaired by fsck, they then revert to the "Okay" state.

The newfs(1M) command will also transition the file system back to the "Okay" state, but will destroy all of the data on the file system. newfs(1M) is generally used when you plan to restore file systems from backup.

The fsck(1M) or newfs(1M) command must be run on all of the trans metadevices sharing the same logging device before these devices revert to the "Okay" state.
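
Because every trans metadevice sharing the logging device must be checked, a loop keeps any from being missed. The following is a minimal sketch. The function fsck_shared and the DRY_RUN variable are hypothetical, and d5 and d8 stand in for the names of your own trans metadevices. By default the commands are only printed.

```shell
# fsck_shared: check every trans metadevice that shares one logging device.
# Arguments are trans metadevice names (for example: d5 d8).
# With DRY_RUN=1 (the default here) the commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}

fsck_shared() {
    for dev in "$@"; do
        cmd="fsck /dev/md/rdsk/$dev"
        if [ "$DRY_RUN" = 1 ]; then
            echo "$cmd"
        else
            # Stop at the first device that fsck could not repair.
            $cmd || return 1
        fi
    done
}

# d5 and d8 are hypothetical trans metadevices sharing one logging device.
fsck_shared d5 d8
```

Setting DRY_RUN=0 runs the commands instead of printing them. Do this only after the file systems are unmounted and the device error has been fixed.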

Run the metastat(1M) command to verify that the state of the affected devices has reverted to "Okay."

Example -- Logging Device Error

# metastat d5
d5: Trans
    State: Hard Error  
    Size: 10080 blocks
    Master Device: d4
    Logging Device: c0t0d0s6
 
d4: Mirror
    State: Okay
...
c0t0d0s6: Logging device for d5
    State: Hard Error
    Size: 5350 blocks
...
# fsck /dev/md/rdsk/d5
** /dev/md/rdsk/d5
** Last Mounted on /fs1
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
WARNING: md: logging device: /dev/dsk/c0t0d0s6 changed state to Okay
4 files, 11 used, 4452 free (20 frags, 554 blocks, 0.4% fragmentation)
# metastat d5
d5: Trans
    State: Okay
    Size: 10080 blocks
    Master Device: d4
    Logging Device: c0t0d0s6
 
d4: Mirror
    State: Okay
...
 
c0t0d0s6: Logging device for d5
    State: Okay
...

This example fixes a trans metadevice, d5, which has a logging device in the "Hard Error" state. You must run fsck on the trans metadevice itself. This transitions the state of the trans metadevice to "Okay." The metastat command confirms that the state is "Okay."
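
The final verification can also be scripted. The following is a minimal sketch that scans metastat output of the form shown in this example and reports every device whose state is not "Okay". The function name not_okay is hypothetical, and the parsing assumes that device names start in the first column and that "State:" lines are indented, as in the listing above.

```shell
# not_okay: read metastat output on standard input and print one line,
# "name: state", for each device whose State is anything but "Okay".
not_okay() {
    awk '
        # Unindented line such as "d5: Trans": remember the device name.
        /^[^ \t].*:/ { name = $1; sub(/:$/, "", name) }
        # Indented "State:" line: extract the state text and compare.
        /^[ \t]+State:/ {
            state = $0
            sub(/^[ \t]+State:[ \t]*/, "", state)
            sub(/[ \t]+$/, "", state)
            if (state != "Okay") print name ": " state
        }
    '
}

# Sample input taken from the first metastat listing in this example.
sample='d5: Trans
    State: Hard Error
    Size: 10080 blocks
    Master Device: d4
    Logging Device: c0t0d0s6

d4: Mirror
    State: Okay

c0t0d0s6: Logging device for d5
    State: Hard Error
    Size: 5350 blocks'

printf '%s\n' "$sample" | not_okay
```

On a live system the filter would be fed from the command, as in "metastat d5 | not_okay". No output means that every reported state is "Okay."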