This chapter describes how to identify and recover from ZFS failures. Information for preventing failures is provided as well.
The following sections are provided in this chapter:
As a combined file system and volume manager, ZFS can exhibit many different failures. This chapter begins by outlining the various failures, then discusses how to identify them on a running system. This chapter concludes by discussing how to repair the problems. ZFS can encounter three basic types of errors:
Note that a single pool can experience all three errors, so a complete repair procedure involves finding and correcting one error, proceeding to the next error, and so on.
If a device is completely removed from the system, ZFS detects that the device cannot be opened and places it in the REMOVED state. Depending on the data replication level of the pool, this removal might or might not result in the entire pool becoming unavailable. If one disk in a mirrored or RAID-Z device is removed, the pool continues to be accessible. A pool might become FAULTED, which means no data is accessible until the device is reattached, under the following conditions:
If all components of a mirror are removed
If more than one device in a RAID-Z (raidz1) device is removed
If top-level device is removed in a single-disk configuration
Transient I/O errors due to a bad disk or controller
On-disk data corruption due to cosmic rays
Driver bugs resulting in data being transferred to or from the wrong location
A user overwriting portions of the physical device by accident
In some cases, these errors are transient, such as a random I/O error while the controller is having problems. In other cases, the damage is permanent, such as on-disk corruption. Even still, whether the damage is permanent does not necessarily indicate that the error is likely to occur again. For example, if an administrator accidentally overwrites part of a disk, no type of hardware failure has occurred, and the device does not need to be replaced. Identifying the exact problem with a device is not an easy task and is covered in more detail in a later section.
Data corruption occurs when one or more device errors (indicating one or more missing or damaged devices) affects a top-level virtual device. For example, one half of a mirror can experience thousands of device errors without ever causing data corruption. If an error is encountered on the other side of the mirror in the exact same location, corrupted data is the result.
Data corruption is always permanent and requires special consideration during repair. Even if the underlying devices are repaired or replaced, the original data is lost forever. Most often, this scenario requires restoring data from backups. Data errors are recorded as they are encountered, and they can be controlled through routine pool scrubbing as explained in the following section. When a corrupted block is removed, the next scrubbing pass recognizes that the corruption is no longer present and removes any trace of the error from the system.
No fsck utility equivalent exists for ZFS. This utility has traditionally served two purposes, those of file system repair and file system validation.
With traditional file systems, the way in which data is written is inherently vulnerable to unexpected failure causing file system inconsistencies. Because a traditional file system is not transactional, unreferenced blocks, bad link counts, or other inconsistent file system structures are possible. The addition of journaling does solve some of these problems, but can introduce additional problems when the log cannot be rolled back. The only way for inconsistent data to exist on disk in a ZFS configuration is through hardware failure (in which case the pool should have been redundant) or when a bug exists in the ZFS software.
The fsck utility repairs known problems specific to UFS file systems. Most ZFS storage pool problems are generally related to failing hardware or power failures. Many problems can be avoided by using redundant pools. If your pool is damaged due to failing hardware or a power outage, see Repairing ZFS Storage Pool-Wide Damage.
If your pool is not redundant, the risk that file system corruption can render some or all of your data inaccessible is always present.
In addition to performing file system repair, the fsck utility validates that the data on disk has no problems. Traditionally, this task requires unmounting the file system and running the fsck utility, possibly taking the system to single-user mode in the process. This scenario results in downtime that is proportional to the size of the file system being checked. Instead of requiring an explicit utility to perform the necessary checking, ZFS provides a mechanism to perform routine checking of all inconsistencies. This feature, known as scrubbing, is commonly used in memory and other systems as a method of detecting and preventing errors before they result in a hardware or software failure.
Whenever ZFS encounters an error, either through scrubbing or when accessing a file on demand, the error is logged internally so that you can obtain quick overview of all known errors within the pool.
The simplest way to check data integrity is to initiate an explicit scrubbing of all data within the pool. This operation traverses all the data in the pool once and verifies that all blocks can be read. Scrubbing proceeds as fast as the devices allow, though the priority of any I/O remains below that of normal operations. This operation might negatively impact performance, though the pool's data should remain usable and nearly as responsive while the scrubbing occurs. To initiate an explicit scrub, use the zpool scrub command. For example:
# zpool scrub tank
The status of the current scrubbing operation can be displayed by using the zpool status command. For example:
# zpool status -v tank pool: tank state: ONLINE scrub: scrub completed after 0h7m with 0 errors on Tue Tue Feb 2 12:54:00 2010 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 errors: No known data errors
Only one active scrubbing operation per pool can occur at one time.
You can stop a scrubbing operation that is in progress by using the -s option. For example:
# zpool scrub -s tank
In most cases, a scrubing operation to ensure data integrity should continue to completion. Stop a scrubbing operation at your own discretion if system performance is impacted by the operation.
Performing routine scrubbing guarantees continuous I/O to all disks on the system. Routine scrubbing has the side effect of preventing power management from placing idle disks in low-power mode. If the system is generally performing I/O all the time, or if power consumption is not a concern, then this issue can safely be ignored.
For more information about interpreting zpool status output, see Querying ZFS Storage Pool Status.
When a device is replaced, a resilvering operation is initiated to move data from the good copies to the new device. This action is a form of disk scrubbing. Therefore, only one such action can occur at a given time in the pool. If a scrubbing operation is in progress, a resilvering operation suspends the current scrubbing and restarts it after the resilvering is completed.
For more information about resilvering, see Viewing Resilvering Status.
The following sections describe how to identify and resolve problems with your ZFS file systems or storage pools:
You can use the following features to identify problems with your ZFS configuration:
Detailed ZFS storage pool information can be displayed by using the zpool status command.
Pool and device failures are reported through ZFS/FMA diagnostic messages.
Previous ZFS commands that modified pool state information can be displayed by using the zpool history command.
Most ZFS troubleshooting involves the zpool status command. This command analyzes the various failures in a system and identifies the most severe problem, presenting you with a suggested action and a link to a knowledge article for more information. Note that the command only identifies a single problem with a pool, though multiple problems can exist. For example, data corruption errors generally imply that one of the devices has failed, but replacing the failed device might not resolve all of the data corruption problems.
In addition, a ZFS diagnostic engine diagnoses and reports pool failures and device failures. Checksum, I/O, device, and pool errors associated with these failures are also reported. ZFS failures as reported by fmd are displayed on the console as well as the system messages file. In most cases, the fmd message directs you to the zpool status command for further recovery instructions.
The basic recovery process is as follows:
If appropriate, use the zpool history command to identify the ZFS commands that preceded the error scenario. For example:
# zpool history tank History for 'tank': 2010-07-15.12:06:50 zpool create tank mirror c0t1d0 c0t2d0 c0t3d0 2010-07-15.12:06:58 zfs create tank/erick 2010-07-15.12:07:01 zfs set checksum=off tank/erick
In this output, note that checksums are disabled for the tank/erick file system. This configuration is not recommended.
Identify the errors through the fmd messages that are displayed on the system console or in the /var/adm/messages file.
Find further repair instructions by using the zpool status -x command.
Repair the failures, which involves the following steps:
Replacing the faulted or missing device and bring it online.
Restoring the faulted configuration or corrupted data from a backup.
Verifying the recovery by using the zpool status -x command.
Backing up your restored configuration, if applicable.
This section describes how to interpret zpool status output in order to diagnose the type of failures that can occur. Although most of the work is performed automatically by the command, it is important to understand exactly what problems are being identified in order to diagnose the failure. Subsequent sections describe how to repair the various problems that you might encounter.
The easiest way to determine if any known problems exist on a system is to use the zpool status -x command. This command describes only pools that are exhibiting problems. If no unhealthy pools exist on the system, then the command displays the following:
# zpool status -x all pools are healthy
For more information about command-line options to the zpool status command, see Querying ZFS Storage Pool Status.
The complete zpool status output looks similar to the following:
# zpool status tank # zpool status tank pool: tank state: DEGRADED status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-2Q scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 UNAVAIL 0 0 0 cannot open errors: No known data errors
This output is described next:
Identifies the name of the pool.
Indicates the current health of the pool. This information refers only to the ability of the pool to provide the necessary replication level.
Describes what is wrong with the pool. This field is omitted if no errors are found.
A recommended action for repairing the errors. This field is omitted if no errors are found.
Refers to a knowledge article containing detailed repair information. Online articles are updated more often than this guide can be updated. So, always reference them for the most up-to-date repair procedures. This field is omitted if no errors are found.
Identifies the current status of a scrub operation, which might include the date and time that the last scrub was completed, a scrub is in progress, or if no scrub was requested.
Identifies known data errors or the absence of known data errors.
The config field in the zpool status output describes the configuration of the devices in the pool, as well as their state and any errors generated from the devices. The state can be one of the following: ONLINE, FAULTED, DEGRADED, UNAVAIL, or OFFLINE. If the state is anything but ONLINE, the fault tolerance of the pool has been compromised.
The second section of the configuration output displays error statistics. These errors are divided into three categories:
READ – I/O errors that occurred while issuing a read request
WRITE – I/O errors that occurred while issuing a write request
CKSUM – Checksum errors, meaning that the device returned corrupted data as the result of a read request
These errors can be used to determine if the damage is permanent. A small number of I/O errors might indicate a temporary outage, while a large number might indicate a permanent problem with the device. These errors do not necessarily correspond to data corruption as interpreted by applications. If the device is in a redundant configuration, the devices might show uncorrectable errors, while no errors appear at the mirror or RAID-Z device level. In such cases, ZFS successfully retrieved the good data and attempted to heal the damaged data from existing replicas.
For more information about interpreting these errors, see Determining the Type of Device Failure.
Finally, additional auxiliary information is displayed in the last column of the zpool status output. This information expands on the state field, aiding in the diagnosis of failures. If a device is FAULTED, this field indicates whether the device is inaccessible or whether the data on the device is corrupted. If the device is undergoing resilvering, this field displays the current progress.
For information about monitoring resilvering progress, see Viewing Resilvering Status.
The scrub section of the zpool status output describes the current status of any explicit scrubbing operations. This information is distinct from whether any errors are detected on the system, though this information can be used to determine the accuracy of the data corruption error reporting. If the last scrub ended recently, most likely, any known data corruption has been discovered.
Scrub completion messages persist across system reboots.
For more information about the data scrubbing and how to interpret this information, see Checking ZFS File System Integrity.
The zpool status command also shows whether any known errors are associated with the pool. These errors might have been found during data scrubbing or during normal operation. ZFS maintains a persistent log of all data errors associated with a pool. This log is rotated whenever a complete scrub of the system finishes.
Data corruption errors are always fatal. Their presence indicates that at least one application experienced an I/O error due to corrupt data within the pool. Device errors within a redundant pool do not result in data corruption and are not recorded as part of this log. By default, only the number of errors found is displayed. A complete list of errors and their specifics can be found by using the zpool status -v option. For example:
# zpool status -v pool: tank state: UNAVAIL status: One or more devices are faulted in response to IO failures. action: Make sure the affected devices are connected, then run 'zpool clear'. see: http://www.sun.com/msg/ZFS-8000-HC scrub: scrub completed after 0h0m with 0 errors on Tue Feb 2 13:08:42 2010 config: NAME STATE READ WRITE CKSUM tank UNAVAIL 0 0 0 insufficient replicas c1t0d0 ONLINE 0 0 0 c1t1d0 UNAVAIL 4 1 0 cannot open errors: Permanent errors have been detected in the following files: /tank/data/aaa /tank/data/bbb /tank/data/ccc
A similar message is also displayed by fmd on the system console and the /var/adm/messages file. These messages can also be tracked by using the fmdump command.
For more information about interpreting data corruption errors, see Identifying the Type of Data Corruption.
Device state transition – If a device becomes FAULTED, ZFS logs a message indicating that the fault tolerance of the pool might be compromised. A similar message is sent if the device is later brought online, restoring the pool to health.
Data corruption – If any data corruption is detected, ZFS logs a message describing when and where the corruption was detected. This message is only logged the first time it is detected. Subsequent accesses do not generate a message.
Pool failures and device failures – If a pool failure or a device failure occurs, the fault manager daemon reports these errors through syslog messages as well as the fmdump command.
If ZFS detects a device error and automatically recovers from it, no notification occurs. Such errors do not constitute a failure in the pool redundancy or in data integrity. Moreover, such errors are typically the result of a driver problem accompanied by its own set of error messages.
ZFS maintains a cache of active pools and their configuration in the root file system. If this cache file is corrupted or somehow becomes out of sync with configuration information that is stored on disk, the pool can no longer be opened. ZFS tries to avoid this situation, though arbitrary corruption is always possible given the qualities of the underlying storage. This situation typically results in a pool disappearing from the system when it should otherwise be available. This situation can also manifest as a partial configuration that is missing an unknown number of top-level virtual devices. In either case, the configuration can be recovered by exporting the pool (if it is visible at all) and re-importing it.
For information about importing and exporting pools, see Migrating ZFS Storage Pools.
If a device cannot be opened, it displays the UNAVAIL state in the zpool status output. This state means that ZFS was unable to open the device when the pool was first accessed, or the device has since become unavailable. If the device causes a top-level virtual device to be unavailable, then nothing in the pool can be accessed. Otherwise, the fault tolerance of the pool might be compromised. In either case, the device just needs to be reattached to the system to restore normal operations.
For example, you might see a message similar to the following from fmd after a device failure:
SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major EVENT-TIME: Thu Jun 24 10:42:36 PDT 2010 PLATFORM: SUNW,Sun-Fire-T200, CSN: -, HOSTNAME: neo2 SOURCE: zfs-diagnosis, REV: 1.0 EVENT-ID: a1fb66d0-cc51-cd14-a835-961c15696fcb DESC: The number of I/O errors associated with a ZFS device exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information. AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available. IMPACT: Fault tolerance of the pool may be compromised. REC-ACTION: Run 'zpool status -x' and replace the bad device.
To view more detailed information about the device problem and the resolution, use the zpool status -x command. For example:
# zpool status -x pool: tank state: DEGRADED status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-2Q scrub: scrub completed after 0h0m with 0 errors on Tue Feb 2 13:15:20 2010 config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 UNAVAIL 0 0 0 cannot open errors: No known data errors
You can see from this output that the missing c1t1d0 device is not functioning. If you determine that the device is faulty, replace it.
Then, use the zpool online command to bring online the replaced device. For example:
# zpool online tank c1t1d0
As a last step, confirm that the pool with the replaced device is healthy. For example:
# zpool status -x tank pool 'tank' is healthy
Exactly how a missing device is reattached depends on the device in question. If the device is a network-attached drive, connectivity to the network should be restored. If the device is a USB device or other removable media, it should be reattached to the system. If the device is a local disk, a controller might have failed such that the device is no longer visible to the system. In this case, the controller should be replaced, at which point the disks will again be available. Other problems can exist and depend on the type of hardware and its configuration. If a drive fails and it is no longer visible to the system, the device should be treated as a damaged device. Follow the procedures in Replacing or Repairing a Damaged Device.
After a device is reattached to the system, ZFS might or might not automatically detect its availability. If the pool was previously faulted, or the system was rebooted as part of the attach procedure, then ZFS automatically rescans all devices when it tries to open the pool. If the pool was degraded and the device was replaced while the system was running, you must notify ZFS that the device is now available and ready to be reopened by using the zpool online command. For example:
# zpool online tank c0t1d0
For more information about bringing devices online, see Bringing a Device Online.
This section describes how to determine device failure types, clear transient errors, and replacing a device.
Bit rot – Over time, random events such as magnetic influences and cosmic rays can cause bits stored on disk to flip. These events are relatively rare but common enough to cause potential data corruption in large or long-running systems.
Misdirected reads or writes – Firmware bugs or hardware faults can cause reads or writes of entire blocks to reference the incorrect location on disk. These errors are typically transient, though a large number of them might indicate a faulty drive.
Administrator error – Administrators can unknowingly overwrite portions of a disk with bad data (such as copying /dev/zero over portions of the disk) that cause permanent corruption on disk. These errors are always transient.
Temporary outage– A disk might become unavailable for a period of time, causing I/Os to fail. This situation is typically associated with network-attached devices, though local disks can experience temporary outages as well. These errors might or might not be transient.
Bad or flaky hardware – This situation is a catch-all for the various problems that faulty hardware exhibits, including consistent I/O errors, faulty transports causing random corruption, or any number of failures. These errors are typically permanent.
Offline device – If a device is offline, it is assumed that the administrator placed the device in this state because it is faulty. The administrator who placed the device in this state can determine if this assumption is accurate.
Determining exactly what is wrong with a device can be a difficult process. The first step is to examine the error counts in the zpool status output. For example:
# zpool status -v tpool pool: tpool state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed after 0h0m with 2 errors on Tue Jul 13 11:08:37 2010 config: NAME STATE READ WRITE CKSUM tpool ONLINE 2 0 0 c1t1d0 ONLINE 2 0 0 c1t3d0 ONLINE 0 0 0 errors: Permanent errors have been detected in the following files: /tpool/words
The errors are divided into I/O errors and checksum errors, both of which might indicate the possible failure type. Typical operation predicts a very small number of errors (just a few over long periods of time). If you are seeing a large number of errors, then this situation probably indicates impending or complete device failure. However, an administrator error can also result in large error counts. The other source of information is the syslog system log. If the log shows a large number of SCSI or Fibre Channel driver messages, then this situation probably indicates serious hardware problems. If no syslog messages are generated, then the damage is likely transient.
The goal is to answer the following question:
Is another error likely to occur on this device?
Errors that happen only once are considered transient and do not indicate potential failure. Errors that are persistent or severe enough to indicate potential hardware failure are considered fatal. The act of determining the type of error is beyond the scope of any automated software currently available with ZFS, and so much must be done manually by you, the administrator. After determination is made, the appropriate action can be taken. Either clear the transient errors or replace the device due to fatal errors. These repair procedures are described in the next sections.
Even if the device errors are considered transient, they still might have caused uncorrectable data errors within the pool. These errors require special repair procedures, even if the underlying device is deemed healthy or otherwise repaired. For more information about repairing data errors, see Repairing Damaged Data.
If the device errors are deemed transient, in that they are unlikely to affect the future health of the device, they can be safely cleared to indicate that no fatal error occurred. To clear error counters for RAID-Z or mirrored devices, use the zpool clear command. For example:
# zpool clear tank c1t1d0
This syntax clears any device errors and clears any data error counts associated with the device.
To clear all errors associated with the virtual devices in a pool, and to clear any data error counts associated with the pool, use the following syntax:
# zpool clear tank
For more information about clearing pool errors, see Clearing Storage Pool Device Errors.
If device damage is permanent or future permanent damage is likely, the device must be replaced. Whether the device can be replaced depends on the configuration.
For a device to be replaced, the pool must be in the ONLINE state. The device must be part of a redundant configuration, or it must be healthy (in the ONLINE state). If the device is part of a redundant configuration, sufficient replicas from which to retrieve good data must exist. If two disks in a four-way mirror are faulted, then either disk can be replaced because healthy replicas are available. However, if two disks in a four-way RAID-Z (raidz1) virtual device are faulted, then neither disk can be replaced because insufficient replicas from which to retrieve data exist. If the device is damaged but otherwise online, it can be replaced as long as the pool is not in the FAULTED state. However, any corrupted data on the device is copied to the new device, unless sufficient replicas with good data exist.
In the following configuration, the c1t1d0 disk can be replaced, and any data in the pool is copied from the healthy replica, c1t0d0:
mirror DEGRADED c1t0d0 ONLINE c1t1d0 FAULTED
The c1t0d0 disk can also be replaced, though no self-healing of data can take place because no good replica is available.
In the following configuration, neither faulted disk can be replaced. The ONLINE disks cannot be replaced either because the pool itself is faulted.
raidz FAULTED c1t0d0 ONLINE c2t0d0 FAULTED c3t0d0 FAULTED c4t0d0 ONLINE
In the following configuration, either top-level disk can be replaced, though any bad data present on the disk is copied to the new disk.
c1t0d0 ONLINE c1t1d0 ONLINE
If either disk is faulted, then no replacement can be performed because the pool itself is faulted.
If the loss of a device causes the pool to become faulted or the device contains too many data errors in a non-redundant configuration, then the device cannot be safely replaced. Without sufficient redundancy, no good data with which to heal the damaged device exists. In this case, the only option is to destroy the pool and re-create the configuration, and then to restore your data from a backup copy.
For more information about restoring an entire pool, see Repairing ZFS Storage Pool-Wide Damage.
After you have determined that a device can be replaced, use the zpool replace command to replace the device. If you are replacing the damaged device with different device, use syntax similar to the following:
# zpool replace tank c1t1d0 c2t0d0
This command migrates data to the new device from the damaged device or from other devices in the pool if it is in a redundant configuration. When the command is finished, it detaches the damaged device from the configuration, at which point the device can be removed from the system. If you have already removed the device and replaced it with a new device in the same location, use the single device form of the command. For example:
# zpool replace tank c1t1d0
This command takes an unformatted disk, formats it appropriately, and then resilvers data from the rest of the configuration.
For more information about the zpool replace command, see Replacing Devices in a Storage Pool.
The following example shows how to replace a device (c1t3d0) in a mirrored storage pool tank on Oracle's Sun Fire x4500 system. To replace the disk c1t3d0 with a new disk at the same location (c1t3d0), then you must unconfigure the disk before you attempt to replace it. The basic steps follow:
Take offline the disk (c1t3d0)to be replaced. You cannot unconfigure a disk that is currently being used.
Use the cfgadm command to identify the disk (c1t3d0) to be unconfigured and unconfigure it. The pool will be degraded with the offline disk in this mirrored configuration, but the pool will continue to be available.
Physically replace the disk (c1t3d0). Ensure that the blue Ready to Remove LED is illuminated before you physically remove the faulted drive.
Reconfigure the disk (c1t3d0).
Bring the new disk (c1t3d0) online.
Run the zpool replace command to replace the disk (c1t3d0).
If you had previously set the pool property autoreplace to on, then any new device, found in the same physical location as a device that previously belonged to the pool is automatically formatted and replaced without using the zpool replace command. This feature might not be supported on all hardware.
If a failed disk is automatically replaced with a hot spare, you might need to detach the hot spare after the failed disk is replaced. For example, if c2t4d0 is still an active hot spare after the failed disk is replaced, then detach it.
# zpool detach tank c2t4d0
The following example walks through the steps to replace a disk in a ZFS storage pool.
# zpool offline tank c1t3d0 # cfgadm | grep c1t3d0 sata1/3::dsk/c1t3d0 disk connected configured ok # cfgadm -c unconfigure sata1/3 Unconfigure the device at: /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:3 This operation will suspend activity on the SATA device Continue (yes/no)? yes # cfgadm | grep sata1/3 sata1/3 disk connected unconfigured ok <Physically replace the failed disk c1t3d0> # cfgadm -c configure sata1/3 # cfgadm | grep sata1/3 sata1/3::dsk/c1t3d0 disk connected configured ok # zpool online tank c1t3d0 # zpool replace tank c1t3d0 # zpool status tank pool: tank state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Tue Feb 2 13:17:32 2010 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c0t3d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 errors: No known data errors
Note that the preceding zpool output might show both the new and old disks under a replacing heading. For example:
replacing DEGRADED 0 0 0 c1t3d0s0/o FAULTED 0 0 0 c1t3d0 ONLINE 0 0 0
This text means that the replacement process is in progress and the new disk is being resilvered.
If you are going to replace a disk (c1t3d0) with another disk (c4t3d0), then you only need to run the zpool replace command. For example:
# zpool replace tank c1t3d0 c4t3d0 # zpool status pool: tank state: DEGRADED scrub: resilver completed after 0h0m with 0 errors on Tue Feb 2 13:35:41 2010 config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror-0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror-2 DEGRADED 0 0 0 c0t3d0 ONLINE 0 0 0 replacing DEGRADED 0 0 0 c1t3d0 OFFLINE 0 0 0 c4t3d0 ONLINE 0 0 0 errors: No known data errors
You might need to run the zpool status command several times until the disk replacement is completed.
# zpool status tank pool: tank state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Tue Feb 2 13:35:41 2010 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c0t3d0 ONLINE 0 0 0 c4t3d0 ONLINE 0 0 0
The following example shows how to recover from a failed log device (c0t5d0) in the storage pool (pool). The basic steps follow:
Review the zpool status -x output and FMA diagnostic message, described here:
Physically replace the failed log device.
Bring the new log device online.
Clear the pool's error condition.
# zpool status -x pool: pool state: FAULTED status: One or more of the intent logs could not be read. Waiting for adminstrator intervention to fix the faulted pool. action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'. scrub: none requested config: NAME STATE READ WRITE CKSUM pool FAULTED 0 0 0 bad intent log mirror ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c0t4d0 ONLINE 0 0 0 logs FAULTED 0 0 0 bad intent log c0t5d0 UNAVAIL 0 0 0 cannot open <Physically replace the failed log device> # zpool online pool c0t5d0 # zpool clear pool
# zpool status -x pool: pool state: FAULTED status: One or more of the intent logs could not be read. Waiting for adminstrator intervention to fix the faulted pool. action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'. scrub: none requested config: NAME STATE READ WRITE CKSUM pool FAULTED 0 0 0 bad intent log mirror-0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c0t4d0 ONLINE 0 0 0 logs FAULTED 0 0 0 bad intent log c0t5d0 UNAVAIL 0 0 0 cannot open <Physically replace the failed log device> # zpool online pool c0t5d0 # zpool clear pool
The process of replacing a device can take an extended period of time, depending on the size of the device and the amount of data in the pool. The process of moving data from one device to another device is known as resilvering and can be monitored by using the zpool status command.
Traditional file systems resilver data at the block level. Because ZFS eliminates the artificial layering of the volume manager, it can perform resilvering in a much more powerful and controlled manner. The two main advantages of this feature are as follows:
ZFS only resilvers the minimum amount of necessary data. In the case of a short outage (as opposed to a complete device replacement), the entire disk can be resilvered in a matter of minutes or seconds. When an entire disk is replaced, the resilvering process takes time proportional to the amount of data used on disk. Replacing a 500-GB disk can take seconds if a pool has only a few gigabytes of used disk space.
Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without any need for manual intervention.
To view the resilvering process, use the zpool status command. For example:
# zpool status tank pool: tank state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scrub: resilver in progress for 0h0m, 22.60% done, 0h1m to go config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 replacing-0 DEGRADED 0 0 0 c1t0d0 UNAVAIL 0 0 0 cannot open c2t0d0 ONLINE 0 0 0 85.0M resilvered c1t1d0 ONLINE 0 0 0 errors: No known data errors
In this example, the disk c1t0d0 is being replaced by c2t0d0. This event is observed in the status output by the presence of the replacing virtual device in the configuration. This device is not real, nor is it possible for you to create a pool by using it. The purpose of this device is solely to display the resilvering progress and to identify which device is being replaced.
Note that any pool currently undergoing resilvering is placed in the ONLINE or DEGRADED state because the pool cannot provide the desired level of redundancy until the resilvering process is completed. Resilvering proceeds as fast as possible, though the I/O is always scheduled with a lower priority than user-requested I/O, to minimize impact on the system. After the resilvering is completed, the configuration reverts to the new, complete, configuration. For example:
# zpool status tank pool: tank state: ONLINE scrub: resilver completed after 0h1m with 0 errors on Tue Feb 2 13:54:30 2010 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 377M resilvered c1t1d0 ONLINE 0 0 0 errors: No known data errors
The pool is once again ONLINE, and the original failed disk (c1t0d0) has been removed from the configuration.
The following sections describe how to identify the type of data corruption and how to repair the data, if possible.
ZFS uses checksums, redundancy, and self-healing data to minimize the risk of data corruption. Nonetheless, data corruption can occur if a pool isn't redundant, if corruption occurred while a pool was degraded, or an unlikely series of events conspired to corrupt multiple copies of a piece of data. Regardless of the source, the result is the same: The data is corrupted and therefore no longer accessible. The action taken depends on the type of data being corrupted and its relative value. Two basic types of data can be corrupted:
Pool metadata – ZFS requires a certain amount of data to be parsed to open a pool and access datasets. If this data is corrupted, the entire pool or portions of the dataset hierarchy will become unavailable.
Object data – In this case, the corruption is within a specific file or directory. This problem might result in a portion of the file or directory being inaccessible, or this problem might cause the object to be broken altogether.
Data is verified during normal operations as well as through a scrubbing. For information about how to verify the integrity of pool data, see Checking ZFS File System Integrity.
# zpool status monkey pool: monkey state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed after 0h0m with 8 errors on Tue Jul 13 13:17:32 2010 config: NAME STATE READ WRITE CKSUM monkey ONLINE 8 0 0 c1t1d0 ONLINE 2 0 0 c2t5d0 ONLINE 6 0 0 errors: 8 data errors, use '-v' for a list
Each error indicates only that an error occurred at a given point in time. Each error is not necessarily still present on the system. Under normal circumstances, this is the case. Certain temporary outages might result in data corruption that is automatically repaired after the outage ends. A complete scrub of the pool is guaranteed to examine every active block in the pool, so the error log is reset whenever a scrub finishes. If you determine that the errors are no longer present, and you don't want to wait for a scrub to complete, reset all errors in the pool by using the zpool online command.
If the data corruption is in pool-wide metadata, the output is slightly different. For example:
# zpool status -v morpheus pool: morpheus id: 1422736890544688191 state: FAULTED status: The pool metadata is corrupted. action: The pool cannot be imported due to damaged devices or data. see: http://www.sun.com/msg/ZFS-8000-72 config: morpheus FAULTED corrupted data c1t10d0 ONLINE
In the case of pool-wide corruption, the pool is placed into the FAULTED state because the pool cannot provide the required redundancy level.
If a file or directory is corrupted, the system might still function, depending on the type of corruption. Any damage is effectively unrecoverable if no good copies of the data exist on the system. If the data is valuable, you must restore the affected data from backup. Even so, you might be able to recover from this corruption without restoring the entire pool.
If the damage is within a file data block, then the file can be safely removed, thereby clearing the error from the system. Use the zpool status -v command to display a list of file names with persistent errors. For example:
# zpool status -v pool: monkey state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed after 0h0m with 8 errors on Tue Jul 13 13:17:32 2010 config: NAME STATE READ WRITE CKSUM monkey ONLINE 8 0 0 c1t1d0 ONLINE 2 0 0 c2t5d0 ONLINE 6 0 0 errors: Permanent errors have been detected in the following files: /monkey/a.txt /monkey/bananas/b.txt /monkey/sub/dir/d.txt monkey/ghost/e.txt /monkey/ghost/boo/f.txt
The list of file names with persistent errors might be described as follows:
If the full path to the file is found and the dataset is mounted, the full path to the file is displayed. For example:
If the full path to the file is found, but the dataset is not mounted, then the dataset name with no preceding slash (/), followed by the path within the dataset to the file, is displayed. For example:
If the object number to a file path cannot be successfully translated, either due to an error or because the object doesn't have a real file path associated with it, as is the case for a dnode_t, then the dataset name followed by the object's number is displayed. For example:
If an object in the metaobject set (MOS) is corrupted, then a special tag of <metadata>, followed by the object number, is displayed.
If the corruption is within a directory or a file's metadata, the only choice is to move the file elsewhere. You can safely move any file or directory to a less convenient location, allowing the original object to be restored in its place.
If the damage is in pool metadata and that damage prevents the pool from being opened or imported, then the following options are available:
Attempt to recover the pool by using the zpool clear -F command or the zpool import -F command. These commands attempt to roll back the last few pool transactions to an operational state. You can use the zpool status command to review a damaged pool and the recommended recovery steps. For example:
# zpool status pool: tpool state: FAULTED status: The pool metadata is corrupted and the pool cannot be opened. action: Recovery is possible, but will result in some data loss. Returning the pool to its state as of Wed Jul 14 11:44:10 2010 should correct the problem. Approximately 5 seconds of data must be discarded, irreversibly. Recovery can be attempted by executing 'zpool clear -F tpool'. A scrub of the pool is strongly recommended after recovery. see: http://www.sun.com/msg/ZFS-8000-72 scrub: none requested config: NAME STATE READ WRITE CKSUM tpool FAULTED 0 0 1 corrupted data c1t1d0 ONLINE 0 0 2 c1t3d0 ONLINE 0 0 4
The recovery process as described above is to use the following command:
# zpool clear -F tpool
If you attempt to import a damaged storage pool, you will see messages similar to the following:
# zpool import tpool cannot import 'tpool': I/O error Recovery is possible, but will result in some data loss. Returning the pool to its state as of Wed Jul 14 11:44:10 2010 should correct the problem. Approximately 5 seconds of data must be discarded, irreversibly. Recovery can be attempted by executing 'zpool import -F tpool'. A scrub of the pool is strongly recommended after recovery.
The recovery process as described above is to use the following command:
# zpool import -F tpool Pool tpool returned to its state as of Wed Jul 14 11:44:10 2010. Discarded approximately 5 seconds of transactions
If the damaged pool is in the zpool.cache file, the problem is discovered when the system is booted, and the damaged pool is reported in the zpool status command. If the pool isn't in the zpool.cache file, it won't successfully import or open and you'll see the damaged pool messages when you attempt to import the pool.
If the pool cannot be recovered by the pool recovery method described above, you must restore the pool and all its data from a backup copy. The mechanism you use varies widely depending on the pool configuration and backup strategy. First, save the configuration as displayed by the zpool status command so that you can recreate it after the pool is destroyed. Then, use the zpool destroy -f command to destroy the pool. Also, keep a file describing the layout of the datasets and the various locally set properties somewhere safe, as this information will become inaccessible if the pool is ever rendered inaccessible. With the pool configuration and dataset layout, you can reconstruct your complete configuration after destroying the pool. The data can then be populated by using whatever backup or restoration strategy you use.
ZFS is designed to be robust and stable despite errors. Even so, software bugs or certain unexpected problems might cause the system to panic when a pool is accessed. As part of the boot process, each pool must be opened, which means that such failures will cause a system to enter into a panic-reboot loop. To recover from this situation, ZFS must be informed not to look for any pools on startup.
ZFS maintains an internal cache of available pools and their configurations in /etc/zfs/zpool.cache. The location and contents of this file are private and are subject to change. If the system becomes unbootable, boot to the milestone none by using the -m milestone=none boot option. After the system is up, remount your root file system as writable and then rename or move the /etc/zfs/zpool.cache file to another location. These actions cause ZFS to forget that any pools exist on the system, preventing it from trying to access the unhealthy pool causing the problem. You can then proceed to a normal system state by issuing the svcadm milestone all command. You can use a similar process when booting from an alternate root to perform repairs.
After the system is up, you can attempt to import the pool by using the zpool import command. However, doing so will likely cause the same error that occurred during boot, because the command uses the same mechanism to access pools. If multiple pools exist on the system, do the following:
Rename or move the zpool.cache file to another location as discussed in the preceding text.
Determine which pool might have problems by using the fmdump -eV command to display the pools with reported fatal errors.
Import the pools one by one, skipping the pools that are having problems, as described in the fmdump output.