This section describes how to determine device failure types, clear transient errors, and replacing a device.
Bit rot – Over time, random events such as magnetic influences and cosmic rays can cause bits stored on disk to flip. These events are relatively rare but common enough to cause potential data corruption in large or long-running systems.
Misdirected reads or writes – Firmware bugs or hardware faults can cause reads or writes of entire blocks to reference the incorrect location on disk. These errors are typically transient, though a large number of them might indicate a faulty drive.
Administrator error – Administrators can unknowingly overwrite portions of a disk with bad data (such as copying /dev/zero over portions of the disk) that cause permanent corruption on disk. These errors are always transient.
Temporary outage– A disk might become unavailable for a period of time, causing I/Os to fail. This situation is typically associated with network-attached devices, though local disks can experience temporary outages as well. These errors might or might not be transient.
Bad or flaky hardware – This situation is a catch-all for the various problems that faulty hardware exhibits, including consistent I/O errors, faulty transports causing random corruption, or any number of failures. These errors are typically permanent.
Offline device – If a device is offline, it is assumed that the administrator placed the device in this state because it is faulty. The administrator who placed the device in this state can determine if this assumption is accurate.
Determining exactly what is wrong with a device can be a difficult process. The first step is to examine the error counts in the zpool status output. For example:
# zpool status -v tpool pool: tpool state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed after 0h0m with 2 errors on Tue Jul 13 11:08:37 2010 config: NAME STATE READ WRITE CKSUM tpool ONLINE 2 0 0 c1t1d0 ONLINE 2 0 0 c1t3d0 ONLINE 0 0 0 errors: Permanent errors have been detected in the following files: /tpool/words
The errors are divided into I/O errors and checksum errors, both of which might indicate the possible failure type. Typical operation predicts a very small number of errors (just a few over long periods of time). If you are seeing a large number of errors, then this situation probably indicates impending or complete device failure. However, an administrator error can also result in large error counts. The other source of information is the syslog system log. If the log shows a large number of SCSI or Fibre Channel driver messages, then this situation probably indicates serious hardware problems. If no syslog messages are generated, then the damage is likely transient.
The goal is to answer the following question:
Is another error likely to occur on this device?
Errors that happen only once are considered transient and do not indicate potential failure. Errors that are persistent or severe enough to indicate potential hardware failure are considered fatal. The act of determining the type of error is beyond the scope of any automated software currently available with ZFS, and so much must be done manually by you, the administrator. After determination is made, the appropriate action can be taken. Either clear the transient errors or replace the device due to fatal errors. These repair procedures are described in the next sections.
Even if the device errors are considered transient, they still might have caused uncorrectable data errors within the pool. These errors require special repair procedures, even if the underlying device is deemed healthy or otherwise repaired. For more information about repairing data errors, see Repairing Damaged Data.
If the device errors are deemed transient, in that they are unlikely to affect the future health of the device, they can be safely cleared to indicate that no fatal error occurred. To clear error counters for RAID-Z or mirrored devices, use the zpool clear command. For example:
# zpool clear tank c1t1d0
This syntax clears any device errors and clears any data error counts associated with the device.
To clear all errors associated with the virtual devices in a pool, and to clear any data error counts associated with the pool, use the following syntax:
# zpool clear tank
For more information about clearing pool errors, see Clearing Storage Pool Device Errors.
If device damage is permanent or future permanent damage is likely, the device must be replaced. Whether the device can be replaced depends on the configuration.
For a device to be replaced, the pool must be in the ONLINE state. The device must be part of a redundant configuration, or it must be healthy (in the ONLINE state). If the device is part of a redundant configuration, sufficient replicas from which to retrieve good data must exist. If two disks in a four-way mirror are faulted, then either disk can be replaced because healthy replicas are available. However, if two disks in a four-way RAID-Z (raidz1) virtual device are faulted, then neither disk can be replaced because insufficient replicas from which to retrieve data exist. If the device is damaged but otherwise online, it can be replaced as long as the pool is not in the FAULTED state. However, any corrupted data on the device is copied to the new device, unless sufficient replicas with good data exist.
In the following configuration, the c1t1d0 disk can be replaced, and any data in the pool is copied from the healthy replica, c1t0d0:
mirror DEGRADED c1t0d0 ONLINE c1t1d0 FAULTED
The c1t0d0 disk can also be replaced, though no self-healing of data can take place because no good replica is available.
In the following configuration, neither faulted disk can be replaced. The ONLINE disks cannot be replaced either because the pool itself is faulted.
raidz FAULTED c1t0d0 ONLINE c2t0d0 FAULTED c3t0d0 FAULTED c4t0d0 ONLINE
In the following configuration, either top-level disk can be replaced, though any bad data present on the disk is copied to the new disk.
c1t0d0 ONLINE c1t1d0 ONLINE
If either disk is faulted, then no replacement can be performed because the pool itself is faulted.
If the loss of a device causes the pool to become faulted or the device contains too many data errors in a non-redundant configuration, then the device cannot be safely replaced. Without sufficient redundancy, no good data with which to heal the damaged device exists. In this case, the only option is to destroy the pool and re-create the configuration, and then to restore your data from a backup copy.
For more information about restoring an entire pool, see Repairing ZFS Storage Pool-Wide Damage.
After you have determined that a device can be replaced, use the zpool replace command to replace the device. If you are replacing the damaged device with different device, use syntax similar to the following:
# zpool replace tank c1t1d0 c2t0d0
This command migrates data to the new device from the damaged device or from other devices in the pool if it is in a redundant configuration. When the command is finished, it detaches the damaged device from the configuration, at which point the device can be removed from the system. If you have already removed the device and replaced it with a new device in the same location, use the single device form of the command. For example:
# zpool replace tank c1t1d0
This command takes an unformatted disk, formats it appropriately, and then resilvers data from the rest of the configuration.
For more information about the zpool replace command, see Replacing Devices in a Storage Pool.
Example 11-1 Replacing a Device in a ZFS Storage Pool
The following example shows how to replace a device (c1t3d0) in a mirrored storage pool tank on Oracle's Sun Fire x4500 system. To replace the disk c1t3d0 with a new disk at the same location (c1t3d0), then you must unconfigure the disk before you attempt to replace it. The basic steps follow:
Take offline the disk (c1t3d0)to be replaced. You cannot unconfigure a disk that is currently being used.
Use the cfgadm command to identify the disk (c1t3d0) to be unconfigured and unconfigure it. The pool will be degraded with the offline disk in this mirrored configuration, but the pool will continue to be available.
Physically replace the disk (c1t3d0). Ensure that the blue Ready to Remove LED is illuminated before you physically remove the faulted drive.
Reconfigure the disk (c1t3d0).
Bring the new disk (c1t3d0) online.
Run the zpool replace command to replace the disk (c1t3d0).
Note - If you had previously set the pool property autoreplace to on, then any new device, found in the same physical location as a device that previously belonged to the pool is automatically formatted and replaced without using the zpool replace command. This feature might not be supported on all hardware.
If a failed disk is automatically replaced with a hot spare, you might need to detach the hot spare after the failed disk is replaced. For example, if c2t4d0 is still an active hot spare after the failed disk is replaced, then detach it.
# zpool detach tank c2t4d0
The following example walks through the steps to replace a disk in a ZFS storage pool.
# zpool offline tank c1t3d0 # cfgadm | grep c1t3d0 sata1/3::dsk/c1t3d0 disk connected configured ok # cfgadm -c unconfigure sata1/3 Unconfigure the device at: /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:3 This operation will suspend activity on the SATA device Continue (yes/no)? yes # cfgadm | grep sata1/3 sata1/3 disk connected unconfigured ok <Physically replace the failed disk c1t3d0> # cfgadm -c configure sata1/3 # cfgadm | grep sata1/3 sata1/3::dsk/c1t3d0 disk connected configured ok # zpool online tank c1t3d0 # zpool replace tank c1t3d0 # zpool status tank pool: tank state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Tue Feb 2 13:17:32 2010 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c0t3d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 errors: No known data errors
Note that the preceding zpool output might show both the new and old disks under a replacing heading. For example:
replacing DEGRADED 0 0 0 c1t3d0s0/o FAULTED 0 0 0 c1t3d0 ONLINE 0 0 0
This text means that the replacement process is in progress and the new disk is being resilvered.
If you are going to replace a disk (c1t3d0) with another disk (c4t3d0), then you only need to run the zpool replace command. For example:
# zpool replace tank c1t3d0 c4t3d0 # zpool status pool: tank state: DEGRADED scrub: resilver completed after 0h0m with 0 errors on Tue Feb 2 13:35:41 2010 config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror-0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror-2 DEGRADED 0 0 0 c0t3d0 ONLINE 0 0 0 replacing DEGRADED 0 0 0 c1t3d0 OFFLINE 0 0 0 c4t3d0 ONLINE 0 0 0 errors: No known data errors
You might need to run the zpool status command several times until the disk replacement is completed.
# zpool status tank pool: tank state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Tue Feb 2 13:35:41 2010 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c1t1d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c0t3d0 ONLINE 0 0 0 c4t3d0 ONLINE 0 0 0
Example 11-2 Replacing a Failed Log Device
The following example shows how to recover from a failed log device (c0t5d0) in the storage pool (pool). The basic steps follow:
Review the zpool status -x output and FMA diagnostic message, described here:
Physically replace the failed log device.
Bring the new log device online.
Clear the pool's error condition.
# zpool status -x pool: pool state: FAULTED status: One or more of the intent logs could not be read. Waiting for adminstrator intervention to fix the faulted pool. action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'. scrub: none requested config: NAME STATE READ WRITE CKSUM pool FAULTED 0 0 0 bad intent log mirror ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c0t4d0 ONLINE 0 0 0 logs FAULTED 0 0 0 bad intent log c0t5d0 UNAVAIL 0 0 0 cannot open <Physically replace the failed log device> # zpool online pool c0t5d0 # zpool clear pool
# zpool status -x pool: pool state: FAULTED status: One or more of the intent logs could not be read. Waiting for adminstrator intervention to fix the faulted pool. action: Either restore the affected device(s) and run 'zpool online', or ignore the intent log records by running 'zpool clear'. scrub: none requested config: NAME STATE READ WRITE CKSUM pool FAULTED 0 0 0 bad intent log mirror-0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c0t4d0 ONLINE 0 0 0 logs FAULTED 0 0 0 bad intent log c0t5d0 UNAVAIL 0 0 0 cannot open <Physically replace the failed log device> # zpool online pool c0t5d0 # zpool clear pool
The process of replacing a device can take an extended period of time, depending on the size of the device and the amount of data in the pool. The process of moving data from one device to another device is known as resilvering and can be monitored by using the zpool status command.
Traditional file systems resilver data at the block level. Because ZFS eliminates the artificial layering of the volume manager, it can perform resilvering in a much more powerful and controlled manner. The two main advantages of this feature are as follows:
ZFS only resilvers the minimum amount of necessary data. In the case of a short outage (as opposed to a complete device replacement), the entire disk can be resilvered in a matter of minutes or seconds. When an entire disk is replaced, the resilvering process takes time proportional to the amount of data used on disk. Replacing a 500-GB disk can take seconds if a pool has only a few gigabytes of used disk space.
Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without any need for manual intervention.
To view the resilvering process, use the zpool status command. For example:
# zpool status tank pool: tank state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scrub: resilver in progress for 0h0m, 22.60% done, 0h1m to go config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror-0 DEGRADED 0 0 0 replacing-0 DEGRADED 0 0 0 c1t0d0 UNAVAIL 0 0 0 cannot open c2t0d0 ONLINE 0 0 0 85.0M resilvered c1t1d0 ONLINE 0 0 0 errors: No known data errors
In this example, the disk c1t0d0 is being replaced by c2t0d0. This event is observed in the status output by the presence of the replacing virtual device in the configuration. This device is not real, nor is it possible for you to create a pool by using it. The purpose of this device is solely to display the resilvering progress and to identify which device is being replaced.
Note that any pool currently undergoing resilvering is placed in the ONLINE or DEGRADED state because the pool cannot provide the desired level of redundancy until the resilvering process is completed. Resilvering proceeds as fast as possible, though the I/O is always scheduled with a lower priority than user-requested I/O, to minimize impact on the system. After the resilvering is completed, the configuration reverts to the new, complete, configuration. For example:
# zpool status tank pool: tank state: ONLINE scrub: resilver completed after 0h1m with 0 errors on Tue Feb 2 13:54:30 2010 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 377M resilvered c1t1d0 ONLINE 0 0 0 errors: No known data errors
The pool is once again ONLINE, and the original failed disk (c1t0d0) has been removed from the configuration.