ZFS provides an integrated method of examining pool and device health. The health of a pool is determined from the state of all its devices. This state information is displayed by using the zpool status command. In addition, potential pool and device failures are reported by fmd and are displayed on the system console and logged in the /var/adm/messages file. This section describes how to determine pool and device health. This chapter does not document how to repair or recover from unhealthy pools. For more information on troubleshooting and data recovery, see Chapter 11, ZFS Troubleshooting and Pool Recovery.
Each device can fall into one of the following states:
The device or virtual device is in normal working order. While some transient errors might still occur, the device is otherwise in working order.
The virtual device has experienced failure but is still able to function. This state is most common when a mirror or RAID-Z device has lost one or more constituent devices. The fault tolerance of the pool might be compromised, as a subsequent fault in another device might be unrecoverable.
The device or virtual device is completely inaccessible. This status typically indicates total failure of the device, such that ZFS is incapable of sending or receiving data from it. If a top-level virtual device is in this state, then the pool is completely inaccessible.
The device has been explicitly taken offline by the administrator.
The device or virtual device cannot be opened. In some cases, pools with UNAVAIL devices appear in DEGRADED mode. If a top-level virtual device is unavailable, then nothing in the pool can be accessed.
The device was physically removed while the system was running. Device removal detection is hardware-dependent and might not be supported on all platforms.
The health of a pool is determined from the health of all its top-level virtual devices. If all virtual devices are ONLINE, then the pool is also ONLINE. If any one of the virtual devices is DEGRADED or UNAVAIL, then the pool is also DEGRADED. If a top-level virtual device is FAULTED or OFFLINE, then the pool is also FAULTED. A pool in the faulted state is completely inaccessible. No data can be recovered until the necessary devices are attached or repaired. A pool in the degraded state continues to run, but you might not achieve the same level of data redundancy or data throughput than if the pool were online.
# zpool status -x all pools are healthy
Specific pools can be examined by specifying a pool name to the command. Any pool that is not in the ONLINE state should be investigated for potential problems, as described in the next section.
You can request a more detailed health summary by using the -v option. For example:
# zpool status -v tank pool: tank state: DEGRADED status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-2Q scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror DEGRADED 0 0 0 c1t0d0 FAULTED 0 0 0 cannot open c1t1d0 ONLINE 0 0 0 errors: No known data errors
This output displays a complete description of why the pool is in its current state, including a readable description of the problem and a link to a knowledge article for more information. Each knowledge article provides up-to-date information on the best way to recover from your current problem. Using the detailed configuration information, you should be able to determine which device is damaged and how to repair the pool.
In the above example, the faulted device should be replaced. After the device is replaced, use the zpool online command to bring the device back online. For example:
# zpool online tank c1t0d0 Bringing device c1t0d0 online # zpool status -x all pools are healthy
If the autoreplace property is on, you might not have to online the replaced device.
If a pool has an offlined device, the command output identifies the problem pool. For example:
# zpool status -x pool: tank state: DEGRADED status: One or more devices has been taken offline by the adminstrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scrub: none requested config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 mirror DEGRADED 0 0 0 c1t0d0 ONLINE 0 0 0 c1t1d0 OFFLINE 0 0 0 errors: No known data errors
The READ and WRITE columns provides a count of I/O errors seen on the device, while the CKSUM column provides a count of uncorrectable checksum errors that occurred on the device. Both of these error counts likely indicate potential device failure, and some corrective action is needed. If non-zero errors are reported for a top-level virtual device, portions of your data might have become inaccessible.
The errors: field identifies any known data errors.
In the example output above, the offlined device is not causing data errors.
For more information about diagnosing and repairing faulted pools and data, see Chapter 11, ZFS Troubleshooting and Pool Recovery.