Solaris ZFS Administration Guide

Repairing Damaged Data

The following sections describe how to identify the type of data corruption and how to repair the data, if possible.

ZFS uses checksumming, redundancy, and self-healing data to minimize the chances of data corruption. Nonetheless, data corruption can occur if the pool isn't redundant, if corruption occurred while the pool was degraded, or an unlikely series of events conspired to corrupt multiple copies of a piece of data. Regardless of the source, the result is the same: The data is corrupted and therefore no longer accessible. The action taken depends on the type of data being corrupted, and its relative value. Two basic types of data can be corrupted:

Data is verified during normal operation as well as through scrubbing. For more information about how to verify the integrity of pool data, see Checking ZFS File System Integrity.

Identifying the Type of Data Corruption

By default, the zpool status command shows only that corruption has occurred, but not where this corruption occurred. For example:


# zpool status
   pool: monkey
state: ONLINE
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         monkey      ONLINE       0     0     0
           c1t1d0s6  ONLINE       0     0     0
           c1t1d0s7  ONLINE       0     0     0

errors: 8 data errors, use '-v' for a list 

Each error would indicate only that an error occurred at a given point in time. Each error is not necessarily still present on the system. Under normal circumstances, this situation is true. Certain temporary outages might result in data corruption that is automatically repaired once the outage ends. A complete scrub of the pool is guaranteed to examine every active block in the pool, so the error log is reset whenever a scrub finishes. If you determine that the errors are no longer present, and you don't want to wait for a scrub to complete, reset all errors in the pool by using the zpool online command.

If the data corruption is in pool-wide metadata, the output is slightly different. For example:


# zpool status -v morpheus
  pool: morpheus
    id: 1422736890544688191
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-72
config:

        morpheus    FAULTED   corrupted data
          c1t10d0   ONLINE

In the case of pool-wide corruption, the pool is placed into the FAULTED state, because the pool cannot possibly provide the needed redundancy level.

Repairing a Corrupted File or Directory

If a file or directory is corrupted, the system might still be able to function depending on the type of corruption. Any damage is effectively unrecoverable if no good copies of the data exist anywhere on the system. If the data is valuable, you have no choice but to restore the affected data from backup. Even so, you might be able to recover from this corruption without restoring the entire pool.

If the damage is within a file data block, then the file can safely be removed, thereby clearing the error from the system. Use the zpool status -v command to display a list of filenames with persistent errors. For example:


# zpool status -v
   pool: monkey
state: ONLINE
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         monkey      ONLINE       0     0     0
           c1t1d0s6  ONLINE       0     0     0
           c1t1d0s7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files: 

/monkey/a.txt
/monkey/bananas/b.txt
/monkey/sub/dir/d.txt
monkey/ghost/e.txt
/monkey/ghost/boo/f.txt

A list of filenames with persistent errors might be described as follows:

If the corruption is within a directory or a file's metadata, the only choice is to move the file elsewhere. You can safely move any file or directory to a less convenient location, allowing the original object to be restored in place.

Repairing ZFS Storage Pool-Wide Damage

If the damage is in pool metadata that damage prevents the pool from being opened or imported, then the following options are available: