This chapter describes how to identify and recover from ZFS failure modes. Information for preventing failures is provided as well.
The following sections are provided in this chapter:
As a combined file system and volume manager, ZFS can exhibit many different failure modes. This chapter begins by outlining the various failure modes, then discusses how to identify them on a running system. This chapter concludes by discussing how to repair the problems. ZFS can encounter three basic types of errors:
Note that a single pool can experience all three errors, so a complete repair procedure involves finding and correcting one error, proceeding to the next error, and so on.
If a device is completely removed from the system, ZFS detects that the device cannot be opened and places it in the UNAVAIL state. Depending on the data replication level of the pool, this might or might not result in the entire pool becoming unavailable. If one disk in a mirrored or RAID-Z device is removed, the pool continues to be accessible. If all components of a mirror are removed, if more than one device in a RAID-Z (raidz1) device is removed, or if a single-disk, top-level device is removed, the pool becomes FAULTED. No data is accessible until the device is reattached.
Transient I/O errors due to a bad disk or controller
On-disk data corruption due to cosmic rays
Driver bugs resulting in data being transferred to or from the wrong location
Another user overwriting portions of the physical device by accident
In some cases, these errors are transient, such as a random I/O error while the controller is having problems. In other cases, the damage is permanent, such as on-disk corruption. Even still, whether the damage is permanent does not necessarily indicate that the error is likely to occur again. For example, if an administrator accidentally overwrites part of a disk, no type of hardware failure has occurred, and the device need not be replaced. Identifying exactly what went wrong with a device is not an easy task and is covered in more detail in a later section.
Data corruption occurs when one or more device errors (indicating missing or damaged devices) affects a top-level virtual device. For example, one half of a mirror can experience thousands of device errors without ever causing data corruption. If an error is encountered on the other side of the mirror in the exact same location, corrupted data will be the result.
Data corruption is always permanent and requires special consideration during repair. Even if the underlying devices are repaired or replaced, the original data is lost forever. Most often this scenario requires restoring data from backups. Data errors are recorded as they are encountered, and can be controlled through routine disk scrubbing as explained in the following section. When a corrupted block is removed, the next scrubbing pass recognizes that the corruption is no longer present and removes any trace of the error from the system.
No fsck utility equivalent exists for ZFS. This utility has traditionally served two purposes, file system repair and file system validation.
With traditional file systems, the way in which data is written is inherently vulnerable to unexpected failure causing file system inconsistencies. Because a traditional file system is not transactional, unreferenced blocks, bad link counts, or other inconsistent file system structures are possible. The addition of journaling does solve some of these problems, but can introduce additional problems when the log cannot be rolled back. With ZFS, none of these problems exist. The only way for inconsistent data to exist on disk is through hardware failure (in which case the pool should have been redundant) or a bug exists in the ZFS software.
Given that the fsck utility is designed to repair known pathologies specific to individual file systems, writing such a utility for a file system with no known pathologies is impossible. Future experience might prove that certain data corruption problems are common enough and simple enough such that a repair utility can be developed, but these problems can always be avoided by using redundant pools.
If your pool is not redundant, the chance that file system corruption can render some or all of your data inaccessible is always present.
In addition to file system repair, the fsck utility validates that the data on disk has no problems. Traditionally, this task is done by unmounting the file system and running the fsck utility, possibly taking the system to single-user mode in the process. This scenario results in downtime that is proportional to the size of the file system being checked. Instead of requiring an explicit utility to perform the necessary checking, ZFS provides a mechanism to perform routine checking of all inconsistencies. This functionality, known as scrubbing, is commonly used in memory and other systems as a method of detecting and preventing errors before they result in hardware or software failure.
The simplest way to check your data integrity is to initiate an explicit scrubbing of all data within the pool. This operation traverses all the data in the pool once and verifies that all blocks can be read. Scrubbing proceeds as fast as the devices allow, though the priority of any I/O remains below that of normal operations. This operation might negatively impact performance, though the file system should remain usable and nearly as responsive while the scrubbing occurs. To initiate an explicit scrub, use the zpool scrub command. For example:
# zpool scrub tank
The status of the current scrub can be displayed in the zpool status output. For example:
# zpool status -v tank pool: tank state: ONLINE scrub: scrub completed after 0h7m with 0 errors on Tue Sep 1 09:20:52 2009 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror ONLINE 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 errors: No known data errors
Only one active scrubbing operation per pool can occur at one time.
You can stop a scrub that is in progress by using the -s option. For example:
# zpool scrub -s tank
In most cases, a scrub operation, to ensure data integrity, should continue to completion. Stop a scrub at your own discretion if system performance is impacted by a scrub operation.
Performing routine scrubbing also guarantees continuous I/O to all disks on the system. Routine scrubbing has the side effect of preventing power management from placing idle disks in low-power mode. If the system is generally performing I/O all the time, or if power consumption is not a concern, then this issue can safely be ignored.
For more information about interpreting zpool status output, see Querying ZFS Storage Pool Status.
When a device is replaced, a resilvering operation is initiated to move data from the good copies to the new device. This action is a form of disk scrubbing. Therefore, only one such action can happen at a given time in the pool. If a scrubbing operation is in progress, a resilvering operation suspends the current scrubbing, and restarts it after the resilvering is complete.
For more information about resilvering, see Viewing Resilvering Status.
The following sections describe how to identify problems in your ZFS file systems or storage pools.
You can use the following features to identify problems with your ZFS configuration:
Detailed ZFS storage pool information with the zpool status command
Pool and device failures are reported with ZFS/FMA diagnostic messages
Previous ZFS commands that modified pool state information can be displayed with the zpool history command
Most ZFS troubleshooting is centered around the zpool status command. This command analyzes the various failures in the system and identifies the most severe problem, presenting you with a suggested action and a link to a knowledge article for more information. Note that the command only identifies a single problem with the pool, though multiple problems can exist. For example, data corruption errors generally imply that one of the devices has failed, but replacing the failed device might not resolve all of the data corruption problems.
In addition, a ZFS diagnostic engine is provided to diagnose and report pool failures and device failures. Checksum, I/O, device, and pool errors associated with pool or device failures are also reported. ZFS failures as reported by fmd are displayed on the console as well as the system messages file. In most cases, the fmd message directs you to the zpool status command for further recovery instructions.
The basic recovery process is as follows:
If appropriate, use the zpool history command to identify the previous ZFS commands that led up to the error scenario. For example:
# zpool history tank History for 'tank': 2009-09-01.09:26:15 zpool create tank mirror c0t1d0 c0t2d0 c0t3d0 2009-09-01.09:26:34 zfs create tank/erick 2009-09-01.09:26:41 zfs set checksum=off tank/erick
Notice in the above output that checksums are disabled for the tank/erick file system. This configuration is not recommended.
Identify the errors through the fmd messages that are displayed on the system console or in the /var/adm/messages files.
Find further repair instructions in the zpool status -x command.
Repair the failures, such as:
Replace the faulted or missing device and bring it online.
Restore the faulted configuration or corrupted data from a backup.
Verify the recovery by using the zpool status -x command.
Back up your restored configuration, if applicable.
This section describes how to interpret zpool status output in order to diagnose the type of failure and directs you to one of the following sections on how to repair the problem. While most of the work is performed automatically by the command, it is important to understand exactly what problems are being identified in order to diagnose the type of failure.
The easiest way to determine if any known problems exist on the system is to use the zpool status -x command. This command describes only pools exhibiting problems. If no bad pools exist on the system, then the command displays a simple message, as follows:
# zpool status -x all pools are healthy
For more information about command-line options to the zpool status command, see Querying ZFS Storage Pool Status.
The complete zpool status output looks similar to the following:
# zpool status tank pool: tank state: DEGRADED status: One or more devices has been taken offline by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror DEGRADED 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 OFFLINE 0 0 0 errors: No known data errors
This output is divided into several sections:
The name of the pool.
The current health of the pool. This information refers only to the ability of the pool to provide the necessary replication level.
A description of what is wrong with the pool. This field is omitted if no problems are found.
A recommended action for repairing the errors. This field is an abbreviated form directing the user to one of the following sections. This field is omitted if no problems are found.
A reference to a knowledge article containing detailed repair information. Online articles are updated more often than this guide can be updated, and should always be referenced for the most up-to-date repair procedures. This field is omitted if no problems are found.
Identifies the current status of a scrub operation, which might include the date and time that the last scrub was completed, a scrub in progress, or if no scrubbing was requested.
Identifies known data errors or the absence of known data errors.
The config field in the zpool status output describes the configuration layout of the devices comprising the pool, as well as their state and any errors generated from the devices. The state can be one of the following: ONLINE, FAULTED, DEGRADED, UNAVAILABLE, or OFFLINE. If the state is anything but ONLINE, the fault tolerance of the pool has been compromised.
The second section of the configuration output displays error statistics. These errors are divided into three categories:
READ – I/O errors occurred while issuing a read request.
WRITE – I/O errors occurred while issuing a write request.
CKSUM – Checksum errors. The device returned corrupted data as the result of a read request.
These errors can be used to determine if the damage is permanent. A small number of I/O errors might indicate a temporary outage, while a large number might indicate a permanent problem with the device. These errors do not necessarily correspond to data corruption as interpreted by applications. If the device is in a redundant configuration, the disk devices might show uncorrectable errors, while no errors appear at the mirror or RAID-Z device level. If this scenario is the case, then ZFS successfully retrieved the good data and attempted to heal the damaged data from existing replicas.
For more information about interpreting these errors to determine device failure, see Determining the Type of Device Failure.
Finally, additional auxiliary information is displayed in the last column of the zpool status output. This information expands on the state field, aiding in diagnosis of failure modes. If a device is FAULTED, this field indicates whether the device is inaccessible or whether the data on the device is corrupted. If the device is undergoing resilvering, this field displays the current progress.
For more information about monitoring resilvering progress, see Viewing Resilvering Status.
The third section of the zpool status output describes the current status of any explicit scrubs. This information is distinct from whether any errors are detected on the system, though this information can be used to determine the accuracy of the data corruption error reporting. If the last scrub ended recently, most likely, any known data corruption has been discovered.
For more information about data scrubbing and how to interpret this information, see Checking ZFS File System Integrity.
The zpool status command also shows whether any known errors are associated with the pool. These errors might have been found during disk scrubbing or during normal operation. ZFS maintains a persistent log of all data errors associated with the pool. This log is rotated whenever a complete scrub of the system finishes.
Data corruption errors are always fatal. Their presence indicates that at least one application experienced an I/O error due to corrupt data within the pool. Device errors within a redundant pool do not result in data corruption and are not recorded as part of this log. By default, only the number of errors found is displayed. A complete list of errors and their specifics can be found by using the zpool status -v option. For example:
# zpool status -v pool: tank state: UNAVAIL status: One or more devices are faulted in response to IO failures. action: Make sure the affected devices are connected, then run 'zpool clear'. see: http://www.sun.com/msg/ZFS-8000-HC scrub: scrub completed after 0h0m with 0 errors on Tue Sep 1 09:51:01 2009 config: NAME STATE READ WRITE CKSUM tank UNAVAIL 0 0 0 insufficient replicas c1t0d0 ONLINE 0 0 0 c1t1d0 UNAVAIL 4 1 0 cannot open errors: Permanent errors have been detected in the following files: /tank/data/aaa /tank/data/bbb /tank/data/ccc
A similar message is also displayed by fmd on the system console and the /var/adm/messages file. These messages can also be tracked by using the fmdump command.
For more information about interpreting data corruption errors, see Identifying the Type of Data Corruption.
In addition to persistently keeping track of errors within the pool, ZFS also displays syslog messages when events of interest occur. The following scenarios generate events to notify the administrator:
Device state transition – If a device becomes FAULTED, ZFS logs a message indicating that the fault tolerance of the pool might be compromised. A similar message is sent if the device is later brought online, restoring the pool to health.
Data corruption – If any data corruption is detected, ZFS logs a message describing when and where the corruption was detected. This message is only logged the first time it is detected. Subsequent accesses do not generate a message.
Pool failures and device failures – If a pool failure or device failure occurs, the fault manager daemon reports these errors through syslog messages as well as the fmdump command.
If ZFS detects a device error and automatically recovers from it, no notification occurs. Such errors do not constitute a failure in the pool redundancy or data integrity. Moreover, such errors are typically the result of a driver problem accompanied by its own set of error messages.
ZFS maintains a cache of active pools and their configuration on the root file system. If this file is corrupted or somehow becomes out of sync with what is stored on disk, the pool can no longer be opened. ZFS tries to avoid this situation, though arbitrary corruption is always possible given the qualities of the underlying storage. This situation typically results in a pool disappearing from the system when it should otherwise be available. This situation can also manifest itself as a partial configuration that is missing an unknown number of top-level virtual devices. In either case, the configuration can be recovered by exporting the pool (if it is visible at all), and re-importing it.
For more information about importing and exporting pools, see Migrating ZFS Storage Pools.
If a device cannot be opened, it displays as UNAVAIL in the zpool status output. This status means that ZFS was unable to open the device when the pool was first accessed, or the device has since become unavailable. If the device causes a top-level virtual device to be unavailable, then nothing in the pool can be accessed. Otherwise, the fault tolerance of the pool might be compromised. In either case, the device simply needs to be reattached to the system to restore normal operation.
For example, you might see a message similar to the following from fmd after a device failure:
SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major EVENT-TIME: Tue Sep 1 09:36:46 MDT 2009 PLATFORM: SUNW,Sun-Fire-T200, CSN: -, HOSTNAME: neo SOURCE: zfs-diagnosis, REV: 1.0 EVENT-ID: a1fb66d0-cc51-cd14-a835-961c15696fed DESC: The number of I/O errors associated with a ZFS device exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information. AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available. IMPACT: Fault tolerance of the pool may be compromised. REC-ACTION: Run 'zpool status -x' and replace the bad device.
The next step is to use the zpool status -x command to view more detailed information about the device problem and the resolution. For example:
# zpool status -x pool: tank state: DEGRADED status: One or more devices are faulted in response to IO failures. action: Make sure the affected devices are connected, then run 'zpool clear'. See: http://www.sun.com/msg/ZFS-8000-HC scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror DEGRADED 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 UNAVAIL 0 0 0 cannot open errors: No known data errors
You can see from this output that the missing device c1t1d0 is not functioning. If you determine that the drive is faulty, replace the device.
Then, use the zpool online command to online the replaced device. For example:
# zpool online tank c1t1d0
Confirm that the pool with the replaced device is healthy.
# zpool status -x tank pool 'tank' is healthy
Exactly how a missing device is reattached depends on the device in question. If the device is a network-attached drive, connectivity to the network should be restored. If the device is a USB or other removable media, it should be reattached to the system. If the device is a local disk, a controller might have failed such that the device is no longer visible to the system. In this case, the controller should be replaced at which point the disks will again be available. Other pathologies can exist and depend on the type of hardware and its configuration. If a drive fails and it is no longer visible to the system, the device should be treated as a damaged device. Follow the procedures outlined in Replacing or Repairing a Damaged Device.
Once a device is reattached to the system, ZFS might or might not automatically detect its availability. If the pool was previously faulted, or the system was rebooted as part of the attach procedure, then ZFS automatically rescans all devices when it tries to open the pool. If the pool was degraded and the device was replaced while the system was up, you must notify ZFS that the device is now available and ready to be reopened by using the zpool online command. For example:
# zpool online tank c0t1d0
For more information about bringing devices online, see Bringing a Device Online.
This section describes how to determine device failure types, clear transient errors, and replacing a device.
Bit rot – Over time, random events, such as magnetic influences and cosmic rays, can cause bits stored on disk to flip in unpredictable events. These events are relatively rare but common enough to cause potential data corruption in large or long-running systems.
Misdirected reads or writes – Firmware bugs or hardware faults can cause reads or writes of entire blocks to reference the incorrect location on disk. These errors are typically transient, though a large number might indicate a faulty drive.
Administrator error – Administrators can unknowingly overwrite portions of the disk with bad data (such as copying /dev/zero over portions of the disk) that cause permanent corruption on disk. These errors are always transient.
Temporary outage– A disk might become unavailable for a period of time, causing I/Os to fail. This situation is typically associated with network-attached devices, though local disks can experience temporary outages as well. These errors might or might not be transient.
Bad or flaky hardware – This situation is a catch-all for the various problems that bad hardware exhibits. This could be consistent I/O errors, faulty transports causing random corruption, or any number of failures. These errors are typically permanent.
Offlined device – If a device is offline, it is assumed that the administrator placed the device in this state because it is presumed faulty. The administrator who placed the device in this state can determine if this assumption is accurate.
Determining exactly what is wrong can be a difficult process. The first step is to examine the error counts in the zpool status output as follows:
# zpool status -v pool
The errors are divided into I/O errors and checksum errors, both of which might indicate the possible failure type. Typical operation predicts a very small number of errors (just a few over long periods of time). If you are seeing large numbers of errors, then this situation probably indicates impending or complete device failure. However, the pathology for administrator error can result in large error counts. The other source of information is the system log. If the log shows a large number of SCSI or fibre channel driver messages, then this situation probably indicates serious hardware problems. If no syslog messages are generated, then the damage is likely transient.
The goal is to answer the following question:
Is another error likely to occur on this device?
Errors that happen only once are considered transient, and do not indicate potential failure. Errors that are persistent or severe enough to indicate potential hardware failure are considered “fatal.” The act of determining the type of error is beyond the scope of any automated software currently available with ZFS, and so much must be done manually by you, the administrator. Once the determination is made, the appropriate action can be taken. Either clear the transient errors or replace the device due to fatal errors. These repair procedures are described in the next sections.
Even if the device errors are considered transient, it still may have caused uncorrectable data errors within the pool. These errors require special repair procedures, even if the underlying device is deemed healthy or otherwise repaired. For more information on repairing data errors, see Repairing Damaged Data.
If the device errors are deemed transient, in that they are unlikely to effect the future health of the device, then the device errors can be safely cleared to indicate that no fatal error occurred. To clear error counters for RAID-Z or mirrored devices, use the zpool clear command. For example:
# zpool clear tank c1t1d0
This syntax clears any errors associated with the device and clears any data error counts associated with the device.
To clear all errors associated with the virtual devices in the pool, and clear any data error counts associated with the pool, use the following syntax:
# zpool clear tank
For more information about clearing pool errors, see Clearing Storage Pool Device Errors.
If device damage is permanent or future permanent damage is likely, the device must be replaced. Whether the device can be replaced depends on the configuration.
For a device to be replaced, the pool must be in the ONLINE state. The device must be part of a redundant configuration, or it must be healthy (in the ONLINE state). If the disk is part of a redundant configuration, sufficient replicas from which to retrieve good data must exist. If two disks in a four-way mirror are faulted, then either disk can be replaced because healthy replicas are available. However, if two disks in a four-way RAID-Z (raidz1) device are faulted, then neither disk can be replaced because not enough replicas from which to retrieve data exist. If the device is damaged but otherwise online, it can be replaced as long as the pool is not in the FAULTED state. However, any bad data on the device is copied to the new device unless there are sufficient replicas with good data.
In the following configuration, the disk c1t1d0 can be replaced, and any data in the pool is copied from the good replica, c1t0d0.
mirror DEGRADED c1t0d0 ONLINE c1t1d0 FAULTED
The disk c1t0d0 can also be replaced, though no self-healing of data can take place because no good replica is available.
In the following configuration, neither of the faulted disks can be replaced. The ONLINE disks cannot be replaced either, because the pool itself is faulted.
raidz FAULTED c1t0d0 ONLINE c2t0d0 FAULTED c3t0d0 FAULTED c4t0d0 ONLINE
In the following configuration, either top-level disk can be replaced, though any bad data present on the disk is copied to the new disk.
c1t0d0 ONLINE c1t1d0 ONLINE
If either disk were faulted, then no replacement could be performed because the pool itself would be faulted.
If the loss of a device causes the pool to become faulted, or the device contains too many data errors in a non-redundant configuration, then the device cannot safely be replaced. Without sufficient redundancy, no good data with which to heal the damaged device exists. In this case, the only option is to destroy the pool and re-create the configuration, restoring your data in the process.
For more information about restoring an entire pool, see Repairing ZFS Storage Pool-Wide Damage.
Once you have determined that a device can be replaced, use the zpool replace command to replace the device. If you are replacing the damaged device with another different device, use the following command:
# zpool replace tank c1t1d0 c2t0d0
This command begins migrating data to the new device from the damaged device, or other devices in the pool if it is in a redundant configuration. When the command is finished, it detaches the damaged device from the configuration, at which point the device can be removed from the system. If you have already removed the device and replaced it with a new device in the same location, use the single device form of the command. For example:
# zpool replace tank c1t1d0
This command takes an unformatted disk, formats it appropriately, and then begins resilvering data from the rest of the configuration.
For more information about the zpool replace command, see Replacing Devices in a Storage Pool.
The following example shows how to replace a device (c1t3d0) in the mirrored storage pool tank on a Sun Fire x4500 system. If you are going to replace the disk c1t3d0 with a new disk at the same location (c1t3d0), then unconfigure the disk before you attempt to replace it. The basic steps are as follows:
Offline the disk to be replaced first. You cannot unconfigure a disk that is currently being used.
Identify the disk (c1t3d0) to be unconfigured and unconfigure it. The pool will be degraded with the disk offlined in this mirrored configuration but the pool will continue to be available.
Physically replace the disk (c1t3d0). Make sure that the blue "Ready to Remove" LED is illuminated before you physically remove the faulted drive.
Reconfigure the disk (c1t3d0).
Bring the disk (c1t3d0) back online.
Run the zpool replace command to replace the disk (c1t3d0).
If you had previously set the pool property autoreplace=on, then any new device, found in the same physical location as a device that previously belonged to the pool, is automatically formatted and replaced without using the zpool replace command. This feature might not be supported on all hardware.
If a failed disk is automatically replaced with a hot spare, you might need to detach the hot spare after the failed disk is replaced. For example, if c2t4d0 is still an active spare after the failed disk is replaced, then detach it.
# zpool detach tank c2t4d0
# zpool offline tank c1t3d0 # cfgadm | grep c1t3d0 sata1/3::dsk/c1t3d0 disk connected configured ok # cfgadm -c unconfigure sata1/3 Unconfigure the device at: /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:3 This operation will suspend activity on the SATA device Continue (yes/no)? yes # cfgadm | grep sata1/3 sata1/3 disk connected unconfigured ok <Replace the physical disk c1t3d0> # cfgadm -c configure sata1/3 # cfgadm | grep sata1/3 sata1/3::dsk/c1t3d0 disk connected configured ok # zpool online tank c1t3d0 # zpool replace tank c1t3d0 # zpool status pool: tank state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Tue Apr 22 14:44:46 2008 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c0t3d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 errors: No known data errors
Note that the preceding zpool output might show both the new and old disks under a replacing heading. For example:
replacing DEGRADED 0 0 0 c1t3d0s0/o FAULTED 0 0 0 c1t3d0 ONLINE 0 0 0
This text means that the replacement process is in progress and the new disk is being resilvered.
If you are going to replace a disk (c1t3d0) with another disk (c4t3d0), then you only need to run the zpool replace command. For example:
# zpool replace tank c1t3d0 c4t3d0 # zpool status pool: tank state: DEGRADED scrub: resilver completed after 0h0m with 0 errors on Tue Apr 22 14:54:50 2008 config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror DEGRADED 0 0 0 c0t3d0 ONLINE 0 0 0 replacing DEGRADED 0 0 0 c1t3d0 OFFLINE 0 0 0 c4t3d0 ONLINE 0 0 0 errors: No known data errors
You might have to run the zpool status command several times until the disk replacement is complete.
# zpool status tank pool: tank state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Tue Apr 22 14:54:50 2008 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c0t3d0 ONLINE 0 0 0 c4t3d0 ONLINE 0 0 0
The following example shows to recover from a failed log device c0t5d0 in the storage pool, pool. The basic steps are as follows:
Review the zpool status -x output and FMA diagnostic message, described here:
Physically replace the failed log device.
Bring the log device online.
Clear the pool's error condition.
# zpool status -x pool: pool state: FAULTED status: One or more of the intent logs could not be read. Waiting for adminstrator intervention to fix the faulted pool. action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'. scrub: none requested config: NAME STATE READ WRITE CKSUM pool FAULTED 0 0 0 bad intent log mirror ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c0t4d0 ONLINE 0 0 0 logs FAULTED 0 0 0 bad intent log c0t5d0 UNAVAIL 0 0 0 cannot open <Physically replace the failed log device> # zpool online pool c0t5d0 # zpool clear pool
The process of replacing a drive can take an extended period of time, depending on the size of the drive and the amount of data in the pool. The process of moving data from one device to another device is known as resilvering, and can be monitored by using the zpool status command.
Traditional file systems resilver data at the block level. Because ZFS eliminates the artificial layering of the volume manager, it can perform resilvering in a much more powerful and controlled manner. The two main advantages of this feature are as follows:
ZFS only resilvers the minimum amount of necessary data. In the case of a short outage (as opposed to a complete device replacement), the entire disk can be resilvered in a matter of minutes or seconds, rather than resilvering the entire disk, or complicating matters with “dirty region” logging that some volume managers support. When an entire disk is replaced, the resilvering process takes time proportional to the amount of data used on disk. Replacing a 500-Gbyte disk can take seconds if only a few gigabytes of used space is in the pool.
Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without any need for manual intervention.
To view the resilvering process, use the zpool status command. For example:
# zpool status tank pool: tank state: ONLINE status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scrub: resilver in progress for 0h2m, 16.43% done, 0h13m to go config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror DEGRADED 0 0 0 replacing DEGRADED 0 0 0 c1t0d0 ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0
In this example, the disk c1t0d0 is being replaced by c2t0d0. This event is observed in the status output by the presence of the replacing virtual device in the configuration. This device is not real, nor is it possible for you to create a pool by using this virtual device type. The purpose of this device is solely to display the resilvering progress, and to identify exactly which device is being replaced.
Note that any pool currently undergoing resilvering is placed in the ONLINE or DEGRADED state, because the pool cannot provide the desired level of redundancy until the resilvering process is complete. Resilvering proceeds as fast as possible, though the I/O is always scheduled with a lower priority than user-requested I/O, to minimize impact on the system. Once the resilvering is complete, the configuration reverts to the new, complete, configuration. For example:
# zpool status tank pool: tank state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Tue Sep 1 10:55:54 2009 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 errors: No known data errors
The pool is once again ONLINE, and the original bad disk (c1t0d0) has been removed from the configuration.
The following sections describe how to identify the type of data corruption and how to repair the data, if possible.
ZFS uses checksumming, redundancy, and self-healing data to minimize the chances of data corruption. Nonetheless, data corruption can occur if the pool isn't redundant, if corruption occurred while the pool was degraded, or an unlikely series of events conspired to corrupt multiple copies of a piece of data. Regardless of the source, the result is the same: The data is corrupted and therefore no longer accessible. The action taken depends on the type of data being corrupted, and its relative value. Two basic types of data can be corrupted:
Pool metadata – ZFS requires a certain amount of data to be parsed to open a pool and access datasets. If this data is corrupted, the entire pool or complete portions of the dataset hierarchy will become unavailable.
Object data – In this case, the corruption is within a specific file or directory. This problem might result in a portion of the file or directory being inaccessible, or this problem might cause the object to be broken altogether.
Data is verified during normal operation as well as through scrubbing. For more information about how to verify the integrity of pool data, see Checking ZFS File System Integrity.
# zpool status pool: monkey state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: none requested config: NAME STATE READ WRITE CKSUM monkey ONLINE 0 0 0 c1t1d0s6 ONLINE 0 0 0 c1t1d0s7 ONLINE 0 0 0 errors: 8 data errors, use '-v' for a list
Each error would indicate only that an error occurred at a given point in time. Each error is not necessarily still present on the system. Under normal circumstances, this situation is true. Certain temporary outages might result in data corruption that is automatically repaired once the outage ends. A complete scrub of the pool is guaranteed to examine every active block in the pool, so the error log is reset whenever a scrub finishes. If you determine that the errors are no longer present, and you don't want to wait for a scrub to complete, reset all errors in the pool by using the zpool online command.
If the data corruption is in pool-wide metadata, the output is slightly different. For example:
# zpool status -v morpheus pool: morpheus id: 1422736890544688191 state: FAULTED status: The pool metadata is corrupted. action: The pool cannot be imported due to damaged devices or data. see: http://www.sun.com/msg/ZFS-8000-72 config: morpheus FAULTED corrupted data c1t10d0 ONLINE
In the case of pool-wide corruption, the pool is placed into the FAULTED state, because the pool cannot possibly provide the needed redundancy level.
If a file or directory is corrupted, the system might still be able to function depending on the type of corruption. Any damage is effectively unrecoverable if no good copies of the data exist anywhere on the system. If the data is valuable, you have no choice but to restore the affected data from backup. Even so, you might be able to recover from this corruption without restoring the entire pool.
If the damage is within a file data block, then the file can safely be removed, thereby clearing the error from the system. Use the zpool status -v command to display a list of filenames with persistent errors. For example:
# zpool status -v pool: monkey state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: none requested config: NAME STATE READ WRITE CKSUM monkey ONLINE 0 0 0 c1t1d0s6 ONLINE 0 0 0 c1t1d0s7 ONLINE 0 0 0 errors: Permanent errors have been detected in the following files: /monkey/a.txt /monkey/bananas/b.txt /monkey/sub/dir/d.txt monkey/ghost/e.txt /monkey/ghost/boo/f.txt
A list of filenames with persistent errors might be described as follows:
If the full path to the file is found and the dataset is mounted, the full path to the file is displayed. For example:
If the full path to the file is found, but the dataset is not mounted, then the dataset name with no preceding slash (/), followed by the path within the dataset to the file, is displayed. For example:
If the object number to a file path cannot be successfully translated, either due to an error or because the object doesn't have a real file path associated with it, as is the case for a dnode_t, then the dataset name followed by the object's number is displayed. For example:
If an object in the meta-object set (MOS) is corrupted, then a special tag of <metadata>, followed by the object number, is displayed.
If the corruption is within a directory or a file's metadata, the only choice is to move the file elsewhere. You can safely move any file or directory to a less convenient location, allowing the original object to be restored in place.
If the damage is in pool metadata that damage prevents the pool from being opened or imported, then the following options are available:
Attempt to recover the pool by using the zpool clear -F command or zpool import -F command. These commands attempt to roll back the last few pool transactions to an operational state. You can use the zpool status command to review a damaged pool and the recommended. recovery steps. For example:
# zpool status pool: dbpool state: FAULTED status: The pool metadata is corrupted and the pool cannot be opened. action: Recovery is possible, but will result in some data loss. Returning the pool to its state as of Sat Nov 07 09:53:20 2009 should correct the problem. Approximately 49 seconds of data must be discarded, irreversibly. Recovery can be attempted by executing 'zpool clear -F dbpool'. A scrub of the pool is strongly recommended after recovery. see: http://www.sun.com/msg/ZFS-8000-72 scrub: none requested config: NAME STATE READ WRITE CKSUM dbpool FAULTED 0 0 1 corrupted data c1t1d0 ONLINE 0 0 2 c1t3d0 ONLINE 0 0 4
The recovery process as described above is to use the following command:
# zpool clear -F dbpool
If you attempt to import a damaged storage pool, you will see messages similar to the following:
# zpool import dpool cannot import 'dbpool': I/O error Recovery is possible, but will result in some data loss. Returning the pool to its state as of Sat Nov 07 09:53:20 2009 should correct the problem. Approximately 49 seconds of data must be discarded, irreversibly. Recovery can be attempted by executing 'zpool import -F dbpool'. A scrub of the pool is strongly recommended after recovery.
The recovery process as described above is to use the following command:
# zpool import -F dbpool Pool dbpool returned to its state as of Sat Nov 07 09:53:20 2009. Discarded approximately 49 seconds of transactions
If the damaged pool is in the zpool.cache file, the problem is discovered when the system is booted, and the damaged pool is reported in the zpool status command. If the pool isn't in the zpool.cache file, it won't successfully import or open and you'll see the damaged pool messages when you attempt to import the pool.
If the pool cannot be recovered by the pool recovery method described above, you must restore the pool and all its data from backup. The mechanism you use varies widely by the pool configuration and backup strategy. First, save the configuration as displayed by zpool status so that you can recreate it once the pool is destroyed. Then, use zpool destroy -f to destroy the pool. Also, keep a file describing the layout of the datasets and the various locally set properties somewhere safe, as this information will become inaccessible if the pool is ever rendered inaccessible. With the pool configuration and dataset layout, you can reconstruct your complete configuration after destroying the pool. The data can then be populated by using whatever backup or restoration strategy you use.
ZFS is designed to be robust and stable despite errors. Even so, software bugs or certain unexpected pathologies might cause the system to panic when a pool is accessed. As part of the boot process, each pool must be opened, which means that such failures will cause a system to enter into a panic-reboot loop. In order to recover from this situation, ZFS must be informed not to look for any pools on startup.
ZFS maintains an internal cache of available pools and their configurations in /etc/zfs/zpool.cache. The location and contents of this file are private and are subject to change. If the system becomes unbootable, boot to the milestone none by using the -m milestone=none boot option. Once the system is up, remount your root file system as writable and then rename or move the /etc/zfs/zpool.cache file to another location. These actions cause ZFS to forget that any pools exist on the system, preventing it from trying to access the bad pool causing the problem. You can then proceed to a normal system state by issuing the svcadm milestone all command. You can use a similar process when booting from an alternate root to perform repairs.
Once the system is up, you can attempt to import the pool by using the zpool import command. However, doing so will likely cause the same error that occurred during boot, because the command uses the same mechanism to access pools. If multiple pools exist on the system, do the following:
Rename or move the zpool.cache file to another location as discussed above.
Determine which pool might have issues by using the fmdump -eV command to display the pools with reported fatal errors.
Import the pools one-by-one, skipping the pools that are having issues, as described in the fmdump output.