Skip Navigation Links | |
Exit Print View | |
Oracle Solaris ZFS Administration Guide Oracle Solaris 10 1/13 Information Library |
1. Oracle Solaris ZFS File System (Introduction)
2. Getting Started With Oracle Solaris ZFS
3. Managing Oracle Solaris ZFS Storage Pools
4. Installing and Booting an Oracle Solaris ZFS Root File System
5. Managing Oracle Solaris ZFS File Systems
6. Working With Oracle Solaris ZFS Snapshots and Clones
7. Using ACLs and Attributes to Protect Oracle Solaris ZFS Files
8. Oracle Solaris ZFS Delegated Administration
9. Oracle Solaris ZFS Advanced Topics
10. Oracle Solaris ZFS Troubleshooting and Pool Recovery
Resolving General Hardware Problems
Identifying Hardware and Device Faults
System Reporting of ZFS Error Messages
Identifying Problems With ZFS Storage Pools
Determining If Problems Exist in a ZFS Storage Pool
Overall Pool Status Information
ZFS Storage Pool Configuration Information
ZFS Storage Pool Scrubbing Status
Resolving ZFS Storage Device Problems
Resolving a Missing or Removed Device
Physically Reattaching a Device
Notifying ZFS of Device Availability
Replacing or Repairing a Damaged Device
Determining the Type of Device Failure
Clearing Transient Device Errors
Replacing a Device in a ZFS Storage Pool
Determining If a Device Can Be Replaced
Devices That Cannot be Replaced
Replacing a Device in a ZFS Storage Pool
Resolving ZFS File System Problems
Resolving Data Problems in a ZFS Storage Pool
Checking ZFS File System Integrity
Controlling ZFS Data Scrubbing
ZFS Data Scrubbing and Resilvering
ZFS File System Space Reporting
ZFS Storage Pool Space Reporting
Identifying the Type of Data Corruption
Repairing a Corrupted File or Directory
Repairing a Damaged ZFS Configuration
Repairing an Unbootable System
11. Recommended Oracle Solaris ZFS Practices
Examples of data problems include the following:
Transient I/O errors due to a bad disk or controller
On-disk data corruption due to cosmic rays
Driver bugs resulting in data being transferred to or from the wrong location
A user overwriting portions of the physical device by accident
In some cases, these errors are transient, such as a random I/O error while the controller is having problems. In other cases, the damage is permanent, such as on-disk corruption. Even still, whether the damage is permanent does not necessarily indicate that the error is likely to occur again. For example, if you accidentally overwrite part of a disk, no type of hardware failure has occurred, and the device does not need to be replaced. Identifying the exact problem with a device is not an easy task and is covered in more detail in a later section.
No fsck utility equivalent exists for ZFS. This utility has traditionally served two purposes, those of file system repair and file system validation.
With traditional file systems, the way in which data is written is inherently vulnerable to unexpected failure causing file system inconsistencies. Because a traditional file system is not transactional, unreferenced blocks, bad link counts, or other inconsistent file system structures are possible. The addition of journaling does solve some of these problems, but can introduce additional problems when the log cannot be rolled back. The only way for inconsistent data to exist on disk in a ZFS configuration is through hardware failure (in which case the pool should have been redundant) or when a bug exists in the ZFS software.
The fsck utility repairs known problems specific to UFS file systems. Most ZFS storage pool problems are generally related to failing hardware or power failures. Many problems can be avoided by using redundant pools. If your pool is damaged due to failing hardware or a power outage, see Repairing ZFS Storage Pool-Wide Damage.
If your pool is not redundant, the risk that file system corruption can render some or all of your data inaccessible is always present.
In addition to performing file system repair, the fsck utility validates that the data on disk has no problems. Traditionally, this task requires unmounting the file system and running the fsck utility, possibly taking the system to single-user mode in the process. This scenario results in downtime that is proportional to the size of the file system being checked. Instead of requiring an explicit utility to perform the necessary checking, ZFS provides a mechanism to perform routine checking of all inconsistencies. This feature, known as scrubbing, is commonly used in memory and other systems as a method of detecting and preventing errors before they result in a hardware or software failure.
Whenever ZFS encounters an error, either through scrubbing or when accessing a file on demand, the error is logged internally so that you can obtain a quick overview of all known errors within the pool.
The simplest way to check data integrity is to initiate an explicit scrubbing of all data within the pool. This operation traverses all the data in the pool once and verifies that all blocks can be read. Scrubbing proceeds as fast as the devices allow, though the priority of any I/O remains below that of normal operations. This operation might negatively impact performance, though the pool's data should remain usable and nearly as responsive while the scrubbing occurs. To initiate an explicit scrub, use the zpool scrub command. For example:
# zpool scrub tank
The status of the current scrubbing operation can be displayed by using the zpool status command. For example:
# zpool status -v tank pool: tank state: ONLINE scrub: scrub completed after 0h7m with 0 errors on Tue Tue Feb 2 12:54:00 2010 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 errors: No known data errors
Only one active scrubbing operation per pool can occur at one time.
You can stop a scrubbing operation that is in progress by using the -s option. For example:
# zpool scrub -s tank
In most cases, a scrubbing operation to ensure data integrity should continue to completion. Stop a scrubbing operation at your own discretion if system performance is impacted by the operation.
Performing routine scrubbing guarantees continuous I/O to all disks on the system. Routine scrubbing has the side effect of preventing power management from placing idle disks in low-power mode. If the system is generally performing I/O all the time, or if power consumption is not a concern, then this issue can safely be ignored.
For more information about interpreting zpool status output, see Querying ZFS Storage Pool Status.
When a device is replaced, a resilvering operation is initiated to move data from the good copies to the new device. This action is a form of disk scrubbing. Therefore, only one such action can occur at a given time in the pool. If a scrubbing operation is in progress, a resilvering operation suspends the current scrubbing and restarts it after the resilvering is completed.
For more information about resilvering, see Viewing Resilvering Status.
Data corruption occurs when one or more device errors (indicating one or more missing or damaged devices) affects a top-level virtual device. For example, one half of a mirror can experience thousands of device errors without ever causing data corruption. If an error is encountered on the other side of the mirror in the exact same location, corrupted data is the result.
Data corruption is always permanent and requires special consideration during repair. Even if the underlying devices are repaired or replaced, the original data is lost forever. Most often, this scenario requires restoring data from backups. Data errors are recorded as they are encountered, and they can be controlled through routine pool scrubbing as explained in the following section. When a corrupted block is removed, the next scrubbing pass recognizes that the corruption is no longer present and removes any trace of the error from the system.
Review the following sections if you are unsure how ZFS reports file system and pool space accounting. Also review ZFS Disk Space Accounting.
The zpool list and zfs list commands are better than the previous df and du commands for determining your available pool and file system space. With the legacy commands, you cannot easily discern between pool and file system space, nor do the legacy commands account for space that is consumed by descendent file systems or snapshots.
For example, the following root pool (rpool) has 5.46 GB allocated and 68.5 GB free.
# zpool list rpool NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT rpool 74G 5.46G 68.5G 7% 1.00x ONLINE -
If you compare the pool space accounting with the file system space accounting by reviewing the USED column of your individual file systems, you can see that the pool space that is reported in ALLOC is accounted for in the file systems' USED total. For example:
# zfs list -r rpool NAME USED AVAIL REFER MOUNTPOINT rpool 5.41G 67.4G 74.5K /rpool rpool/ROOT 3.37G 67.4G 31K legacy rpool/ROOT/solaris 3.37G 67.4G 3.07G / rpool/ROOT/solaris/var 302M 67.4G 214M /var rpool/dump 1.01G 67.5G 1000M - rpool/export 97.5K 67.4G 32K /rpool/export rpool/export/home 65.5K 67.4G 32K /rpool/export/home rpool/export/home/admin 33.5K 67.4G 33.5K /rpool/export/home/admin rpool/swap 1.03G 67.5G 1.00G -
The SIZE value that is reported by the zpool list command is generally the amount of physical disk space in the pool, but varies depending on the pool's redundancy level. See the examples below. The zfs list command lists the usable space that is available to file systems, which is disk space minus ZFS pool redundancy metadata overhead, if any.
Non-redundant storage pool – When a pool is created with one 136-GB disk, the zpool list command reports SIZE and initial FREE values as 136 GB. The initial AVAIL space reported by the zfs list command is 134 GB, due to a small amount of pool metadata overhead. For example:
# zpool create tank c0t6d0 # zpool list tank NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT tank 136G 95.5K 136G 0% 1.00x ONLINE - # zfs list tank NAME USED AVAIL REFER MOUNTPOINT tank 72K 134G 21K /tank
Mirrored storage pool – When a pool is created with two 136-GB disks, zpool list command reports SIZE as 136 GB and initial FREE value as 136 GB. This reporting is referred to as the deflated space value. The initial AVAIL space reported by the zfs list command is 134 GB, due to a small amount of pool metadata overhead. For example:
# zpool create tank mirror c0t6d0 c0t7d0 # zpool list tank NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT tank 136G 95.5K 136G 0% 1.00x ONLINE - # zfs list tank NAME USED AVAIL REFER MOUNTPOINT tank 72K 134G 21K /tank
RAID-Z storage pool – When a raidz2 pool is created with three 136-GB disks, the zpool list commands reports SIZE as 408 GB and initial FREE value as 408 GB. This reporting is referred to as the inflated disk space value, which includes redundancy overhead, such as parity information. The initial AVAIL space reported by the zfs list command is 133 GB, due to the pool redundancy overhead. The space discrepancy between the zpool list and the zfs list output for a RAID-Z pool is because zpool list reports the inflated pool space.
# zpool create tank raidz2 c0t6d0 c0t7d0 c0t8d0 # zpool list tank NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT tank 408G 286K 408G 0% 1.00x ONLINE - # zfs list tank NAME USED AVAIL REFER MOUNTPOINT tank 73.2K 133G 20.9K /tank
The following sections describe how to identify the type of data corruption and how to repair the data, if possible.
ZFS uses checksums, redundancy, and self-healing data to minimize the risk of data corruption. Nonetheless, data corruption can occur if a pool isn't redundant, if corruption occurred while a pool was degraded, or an unlikely series of events conspired to corrupt multiple copies of a piece of data. Regardless of the source, the result is the same: The data is corrupted and therefore no longer accessible. The action taken depends on the type of data being corrupted and its relative value. Two basic types of data can be corrupted:
Pool metadata – ZFS requires a certain amount of data to be parsed to open a pool and access datasets. If this data is corrupted, the entire pool or portions of the dataset hierarchy will become unavailable.
Object data – In this case, the corruption is within a specific file or directory. This problem might result in a portion of the file or directory being inaccessible, or this problem might cause the object to be broken altogether.
Data is verified during normal operations as well as through a scrubbing. For information about how to verify the integrity of pool data, see Checking ZFS File System Integrity.
By default, the zpool status command shows only that corruption has occurred, but not where this corruption occurred. For example:
# zpool status monkey pool: monkey state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed after 0h0m with 8 errors on Tue Jul 13 13:17:32 2010 config: NAME STATE READ WRITE CKSUM monkey ONLINE 8 0 0 c1t1d0 ONLINE 2 0 0 c2t5d0 ONLINE 6 0 0 errors: 8 data errors, use '-v' for a list
Each error indicates only that an error occurred at a given point in time. Each error is not necessarily still present on the system. Under normal circumstances, this is the case. Certain temporary outages might result in data corruption that is automatically repaired after the outage ends. A complete scrub of the pool is guaranteed to examine every active block in the pool, so the error log is reset whenever a scrub finishes. If you determine that the errors are no longer present, and you don't want to wait for a scrub to complete, reset all errors in the pool by using the zpool online command.
If the data corruption is in pool-wide metadata, the output is slightly different. For example:
# zpool status -v morpheus pool: morpheus id: 1422736890544688191 state: FAULTED status: The pool metadata is corrupted. action: The pool cannot be imported due to damaged devices or data. see: http://www.sun.com/msg/ZFS-8000-72 config: morpheus FAULTED corrupted data c1t10d0 ONLINE
In the case of pool-wide corruption, the pool is placed into the FAULTED state because the pool cannot provide the required redundancy level.
If a file or directory is corrupted, the system might still function, depending on the type of corruption. Any damage is effectively unrecoverable if no good copies of the data exist on the system. If the data is valuable, you must restore the affected data from backup. Even so, you might be able to recover from this corruption without restoring the entire pool.
If the damage is within a file data block, then the file can be safely removed, thereby clearing the error from the system. Use the zpool status -v command to display a list of file names with persistent errors. For example:
# zpool status -v pool: monkey state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed after 0h0m with 8 errors on Tue Jul 13 13:17:32 2010 config: NAME STATE READ WRITE CKSUM monkey ONLINE 8 0 0 c1t1d0 ONLINE 2 0 0 c2t5d0 ONLINE 6 0 0 errors: Permanent errors have been detected in the following files: /monkey/a.txt /monkey/bananas/b.txt /monkey/sub/dir/d.txt monkey/ghost/e.txt /monkey/ghost/boo/f.txt
The list of file names with persistent errors might be described as follows:
If the full path to the file is found and the dataset is mounted, the full path to the file is displayed. For example:
/monkey/a.txt
If the full path to the file is found, but the dataset is not mounted, then the dataset name with no preceding slash (/), followed by the path within the dataset to the file, is displayed. For example:
monkey/ghost/e.txt
If the object number to a file path cannot be successfully translated, either due to an error or because the object doesn't have a real file path associated with it, as is the case for a dnode_t, then the dataset name followed by the object's number is displayed. For example:
monkey/dnode:<0x0>
If an object in the metaobject set (MOS) is corrupted, then a special tag of <metadata>, followed by the object number, is displayed.
If the corruption is within a directory or a file's metadata, the only choice is to move the file elsewhere. You can safely move any file or directory to a less convenient location, allowing the original object to be restored in its place.
If a damaged file system has corrupted data with multiple block references, such as snapshots, the zpool status -v command cannot display all corrupted data paths. The current zpool status reporting of corrupted data is limited by the amount of metadata corruption and if any blocks have been reused after the zpool status command is executed. Deduplicated blocks makes reporting all corrupted data even more complicated.
If you have corrupted data and the zpool status -v command identifies that snapshot data is impacted, then considering running the following command to identify additional corrupted paths:
If the damage is in pool metadata and that damage prevents the pool from being opened or imported, then the following options are available to you:
You can attempt to recover the pool by using the zpool clear -F command or the zpool import -F command. These commands attempt to roll back the last few pool transactions to an operational state. You can use the zpool status command to review a damaged pool and the recommended recovery steps. For example:
# zpool status pool: tpool state: FAULTED status: The pool metadata is corrupted and the pool cannot be opened. action: Recovery is possible, but will result in some data loss. Returning the pool to its state as of Wed Jul 14 11:44:10 2010 should correct the problem. Approximately 5 seconds of data must be discarded, irreversibly. Recovery can be attempted by executing 'zpool clear -F tpool'. A scrub of the pool is strongly recommended after recovery. see: http://www.sun.com/msg/ZFS-8000-72 scrub: none requested config: NAME STATE READ WRITE CKSUM tpool FAULTED 0 0 1 corrupted data c1t1d0 ONLINE 0 0 2 c1t3d0 ONLINE 0 0 4
The recovery process as described in the preceding output is to use the following command:
# zpool clear -F tpool
If you attempt to import a damaged storage pool, you will see messages similar to the following:
# zpool import tpool cannot import 'tpool': I/O error Recovery is possible, but will result in some data loss. Returning the pool to its state as of Wed Jul 14 11:44:10 2010 should correct the problem. Approximately 5 seconds of data must be discarded, irreversibly. Recovery can be attempted by executing 'zpool import -F tpool'. A scrub of the pool is strongly recommended after recovery.
The recovery process as described in the preceding output is to use the following command:
# zpool import -F tpool Pool tpool returned to its state as of Wed Jul 14 11:44:10 2010. Discarded approximately 5 seconds of transactions
If the damaged pool is in the zpool.cache file, the problem is discovered when the system is booted, and the damaged pool is reported in the zpool status command. If the pool isn't in the zpool.cache file, it won't successfully import or open and you will see the damaged pool messages when you attempt to import the pool.
You can import a damaged pool in read-only mode. This method enables you to import the pool so that you can access the data. For example:
# zpool import -o readonly=on tpool
For more information about importing a pool read-only, see Importing a Pool in Read-Only Mode.
You can import a pool with a missing log device by using the zpool import -m command. For more information, see Importing a Pool With a Missing Log Device.
If the pool cannot be recovered by either pool recovery method, you must restore the pool and all its data from a backup copy. The mechanism you use varies widely depending on the pool configuration and backup strategy. First, save the configuration as displayed by the zpool status command so that you can re-create it after the pool is destroyed. Then, use the zpool destroy -f command to destroy the pool.
Also, keep a file describing the layout of the datasets and the various locally set properties somewhere safe, as this information will become inaccessible if the pool is ever rendered inaccessible. With the pool configuration and dataset layout, you can reconstruct your complete configuration after destroying the pool. The data can then be populated by using whatever backup or restoration strategy you use.