The hot spares feature enables you to identify disks that could be used to replace a failed or faulted device in a storage pool. Designating a device as a hot spare means that the device is not an active device in the pool, but if an active device in the pool fails, the hot spare automatically replaces the failed device.
Devices can be designated as hot spares in the following ways:
When the pool is created with the zpool create command.
After the pool is created with the zpool add command.
The following example shows how to designate devices as hot spares when the pool is created:
# zpool create zeepool mirror c0t5000C500335F95E3d0 c0t5000C500335F907Fd0 mirror c0t5000C500335BD117d0 c0t5000C500335DC60Fd0 spare c0t5000C500335E106Bd0 c0t5000C500335FC3E7d0
# zpool status zeepool
  pool: zeepool
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        zeepool                    ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000C500335F95E3d0  ONLINE       0     0     0
            c0t5000C500335F907Fd0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c0t5000C500335BD117d0  ONLINE       0     0     0
            c0t5000C500335DC60Fd0  ONLINE       0     0     0
        spares
          c0t5000C500335E106Bd0    AVAIL
          c0t5000C500335FC3E7d0    AVAIL

errors: No known data errors
The following example shows how to designate hot spares by adding them to a pool after the pool is created:
# zpool add zeepool spare c0t5000C500335E106Bd0 c0t5000C500335FC3E7d0
# zpool status zeepool
  pool: zeepool
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        zeepool                    ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000C500335F95E3d0  ONLINE       0     0     0
            c0t5000C500335F907Fd0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c0t5000C500335BD117d0  ONLINE       0     0     0
            c0t5000C500335DC60Fd0  ONLINE       0     0     0
        spares
          c0t5000C500335E106Bd0    AVAIL
          c0t5000C500335FC3E7d0    AVAIL

errors: No known data errors
Hot spares can be removed from a storage pool by using the zpool remove command. For example:
# zpool remove zeepool c0t5000C500335FC3E7d0
# zpool status zeepool
  pool: zeepool
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        zeepool                    ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000C500335F95E3d0  ONLINE       0     0     0
            c0t5000C500335F907Fd0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c0t5000C500335BD117d0  ONLINE       0     0     0
            c0t5000C500335DC60Fd0  ONLINE       0     0     0
        spares
          c0t5000C500335E106Bd0    AVAIL

errors: No known data errors
A hot spare cannot be removed if it is currently used by a storage pool.
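One way to respect this restriction in a script is to read the spare's state from the zpool status output before attempting the removal. The following is a minimal sketch, assuming the status layout shown in the examples in this section; spare_state is a hypothetical helper, not a zpool subcommand.

```shell
#!/bin/sh
# spare_state DEVICE
# Reads `zpool status` output on standard input and prints the state
# (for example AVAIL or INUSE) listed for DEVICE in the spares section.
# Hypothetical helper; it assumes the column layout shown in this section.
spare_state() {
    awk -v dev="$1" '
        $1 == "spares"  { in_spares = 1; next }   # spares section starts here
        $1 == "errors:" { in_spares = 0 }         # and ends at the errors line
        in_spares && $1 == dev { print $2; exit }
    '
}

# Usage sketch (needs a live pool, so it is left commented out):
# state=$(zpool status zeepool | spare_state c0t5000C500335FC3E7d0)
# if [ "$state" = "AVAIL" ]; then
#     zpool remove zeepool c0t5000C500335FC3E7d0
# else
#     echo "spare is ${state:-not configured}; not removing" >&2
# fi
```

Only the spares section is inspected because a spare that is in use also appears a second time, as an ONLINE device, inside the virtual device it is covering.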
Consider the following when using ZFS hot spares:
Currently, the zpool remove command can only be used to remove hot spares, cache devices, and log devices.
A hot spare should be equal to or larger than the largest disk in the pool. Adding a smaller disk as a spare to a pool is allowed. However, when the smaller spare disk is activated, either automatically or with the zpool replace command, the operation fails with an error similar to the following:
cannot replace disk3 with disk4: device is too small
You cannot share a spare across systems.
You cannot configure multiple systems to share a spare, even if the disk is visible to all of those systems. If a disk is configured as a spare shared among several pools, a single system must control all of those pools.
If you share a spare between two data pools on the same system, you must coordinate the use of the spare between the two pools. For example, pool A has the spare in use and pool A is exported. Pool B could unknowingly use the spare while pool A is exported. When pool A is imported, data corruption could occur because both pools are using the same disk. Be aware of such edge cases: even when a disk is legitimately shared as a spare among several pools, conditions can arise that cause problems for those pools.
Do not share a spare between a root pool and a data pool.
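The size consideration above can be checked before a spare is added. The following is a minimal sketch, assuming the sizes are already known as plain byte counts (how they are obtained is left open); spare_fits is a hypothetical helper and the sizes in the usage lines are illustrative values only.

```shell
#!/bin/sh
# spare_fits SPARE_BYTES DISK_BYTES...
# Succeeds (exit status 0) when the spare is at least as large as the
# largest disk listed. Hypothetical helper; all sizes are byte counts.
spare_fits() {
    spare=$1
    shift
    largest=0
    for size in "$@"; do
        # Track the largest pool disk seen so far.
        [ "$size" -gt "$largest" ] && largest=$size
    done
    [ "$spare" -ge "$largest" ]
}

# Illustrative sizes: a smaller spare checked against two larger pool disks.
if spare_fits 500107862016 1000204886016 1000204886016; then
    echo "spare is large enough"
else
    echo "spare is too small for the largest disk"
fi
```

With these values the check fails, which corresponds to the case where activating the spare would later end in the "device is too small" error.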
Hot spares are activated in the following ways:
Manual replacement – You replace a failed device in a storage pool with a hot spare by using the zpool replace command.
Automatic replacement – When a fault is detected, an FMA agent examines the pool to determine if it has any available hot spares. If so, it replaces the faulted device with an available spare.
If a hot spare that is currently in use fails, the FMA agent detaches the spare and thereby cancels the replacement. The agent then attempts to replace the device with another hot spare, if one is available. This feature is currently limited by the fact that the ZFS diagnostic engine only generates faults when a device disappears from the system.
If you physically replace a failed device that currently has an active spare, you can return the replaced device to service by using the zpool detach command to detach the spare. If you set the autoreplace pool property to on, the spare is automatically detached and returned to the spare pool when the new device is inserted and the online operation completes.
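The manual replacement and the autoreplace setting described above might be issued as follows. This is a sketch only: the pool and device names are taken from the examples in this section, and run is a hypothetical wrapper that prints each command and executes it only when DRY_RUN=0, so the sequence can be reviewed on a system that does not have the pool.

```shell
#!/bin/sh
# run COMMAND [ARG...]
# Hypothetical wrapper: prints the command, and executes it only when
# DRY_RUN is set to 0. The default is a dry run.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

pool=zeepool
failed=c0t5000C500335DC60Fd0
spare=c0t5000C500335E106Bd0

# Manual replacement: put the hot spare in place of the failed device.
run zpool replace "$pool" "$failed" "$spare"

# Let ZFS detach the spare automatically once a new device is inserted
# in place of the failed one, then confirm the property value.
run zpool set autoreplace=on "$pool"
run zpool get autoreplace "$pool"
```

Run without DRY_RUN=0, the script only echoes the three zpool commands, each prefixed with "+".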
An UNAVAIL device is automatically replaced if a hot spare is available. For example:
# zpool status -x
  pool: zeepool
 state: DEGRADED
status: One or more devices are unavailable in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or 'fmadm repaired', or replace the device
        with 'zpool replace'.
        Run 'zpool status -v' to see device specific details.
  scan: resilvered 3.15G in 0h0m with 0 errors on Thu Jun 21 16:46:19 2012
config:

        NAME                         STATE     READ WRITE CKSUM
        zeepool                      ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c0t5000C500335F95E3d0    ONLINE       0     0     0
            c0t5000C500335F907Fd0    ONLINE       0     0     0
          mirror-1                   DEGRADED     0     0     0
            c0t5000C500335BD117d0    ONLINE       0     0     0
            spare-1                  DEGRADED   449     0     0
              c0t5000C500335DC60Fd0  UNAVAIL      0     0     0
              c0t5000C500335E106Bd0  ONLINE       0     0     0
        spares
          c0t5000C500335E106Bd0      INUSE

errors: No known data errors
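To see which device triggered the replacement, the config section of this output can be filtered for devices in the UNAVAIL or FAULTED state. The following is a minimal sketch, assuming the column layout shown above; unavail_devices is a hypothetical helper, not part of ZFS.

```shell
#!/bin/sh
# unavail_devices
# Reads `zpool status` output on standard input and prints the name of
# every device in the config section whose state is UNAVAIL or FAULTED.
# Hypothetical helper; it assumes the layout shown in this section.
unavail_devices() {
    awk '
        $1 == "config:" { in_config = 1; next }   # device table starts here
        $1 == "errors:" { in_config = 0 }         # and ends at the errors line
        in_config && ($2 == "UNAVAIL" || $2 == "FAULTED") { print $1 }
    '
}

# Usage sketch (needs a live pool, so it is left commented out):
# zpool status zeepool | unavail_devices
```

Restricting the match to the config section keeps the word "unavailable" in the status text from being mistaken for a device state.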
Currently, you can deactivate a hot spare in the following ways:
By removing the hot spare from the storage pool.
By detaching a hot spare after a failed disk is physically replaced. See Example 3–8.
By temporarily or permanently swapping in another hot spare. See Example 3–9.
In this example, the failed disk (c0t5000C500335DC60Fd0) is physically replaced and ZFS is notified by using the zpool replace command.
# zpool replace zeepool c0t5000C500335DC60Fd0
# zpool status zeepool
  pool: zeepool
 state: ONLINE
  scan: resilvered 3.15G in 0h0m with 0 errors on Thu Jun 21 16:53:43 2012
config:

        NAME                       STATE     READ WRITE CKSUM
        zeepool                    ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000C500335F95E3d0  ONLINE       0     0     0
            c0t5000C500335F907Fd0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c0t5000C500335BD117d0  ONLINE       0     0     0
            c0t5000C500335DC60Fd0  ONLINE       0     0     0
        spares
          c0t5000C500335E106Bd0    AVAIL
If necessary, you can use the zpool detach command to return the hot spare back to the spare pool. For example:
# zpool detach zeepool c0t5000C500335E106Bd0
Example 3–9 Detaching a Failed Disk and Using the Hot Spare
If you want to replace a failed disk by temporarily or permanently swapping in the hot spare that is currently replacing it, then detach the original (failed) disk. If the failed disk is eventually replaced, then you can add it back to the storage pool as a spare. For example:
# zpool status zeepool
  pool: zeepool
 state: DEGRADED
status: One or more devices are unavailable in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or 'fmadm repaired', or replace the device
        with 'zpool replace'.
        Run 'zpool status -v' to see device specific details.
  scan: scrub in progress since Thu Jun 21 17:01:49 2012
    1.07G scanned out of 6.29G at 220M/s, 0h0m to go
    0 repaired, 17.05% done
config:

        NAME                       STATE     READ WRITE CKSUM
        zeepool                    ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000C500335F95E3d0  ONLINE       0     0     0
            c0t5000C500335F907Fd0  ONLINE       0     0     0
          mirror-1                 DEGRADED     0     0     0
            c0t5000C500335BD117d0  ONLINE       0     0     0
            c0t5000C500335DC60Fd0  UNAVAIL      0     0     0
        spares
          c0t5000C500335E106Bd0    AVAIL

errors: No known data errors
# zpool detach zeepool c0t5000C500335DC60Fd0
# zpool status zeepool
  pool: zeepool
 state: ONLINE
  scan: resilvered 3.15G in 0h0m with 0 errors on Thu Jun 21 17:02:35 2012
config:

        NAME                       STATE     READ WRITE CKSUM
        zeepool                    ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000C500335F95E3d0  ONLINE       0     0     0
            c0t5000C500335F907Fd0  ONLINE       0     0     0
          mirror-1                 DEGRADED     0     0     0
            c0t5000C500335BD117d0  ONLINE       0     0     0
            c0t5000C500335E106Bd0  ONLINE       0     0     0

errors: No known data errors
(Original failed disk c0t5000C500335DC60Fd0 is physically replaced)
# zpool add zeepool spare c0t5000C500335DC60Fd0
# zpool status zeepool
  pool: zeepool
 state: ONLINE
  scan: resilvered 3.15G in 0h0m with 0 errors on Thu Jun 21 17:02:35 2012
config:

        NAME                       STATE     READ WRITE CKSUM
        zeepool                    ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000C500335F95E3d0  ONLINE       0     0     0
            c0t5000C500335F907Fd0  ONLINE       0     0     0
          mirror-1                 DEGRADED     0     0     0
            c0t5000C500335BD117d0  ONLINE       0     0     0
            c0t5000C500335E106Bd0  ONLINE       0     0     0
        spares
          c0t5000C500335DC60Fd0    AVAIL

errors: No known data errors
After a disk is replaced and the spare is detached, let FMA know that the disk is repaired.
# fmadm faulty
# fmadm repaired zfs://pool=name/vdev=guid
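The resource identifier passed to fmadm repaired follows the pattern zfs://pool=name/vdev=guid. The following is a small sketch that fills in that pattern and prints the resulting command for review instead of running it. It assumes the pool name and vdev GUID are copied from the fmadm faulty output; zfs_fmri is a hypothetical helper, and the GUID shown is a made-up placeholder.

```shell
#!/bin/sh
# zfs_fmri POOL VDEV_GUID
# Prints a resource identifier in the zfs://pool=name/vdev=guid form.
# Hypothetical helper; it only formats the string.
zfs_fmri() {
    printf 'zfs://pool=%s/vdev=%s\n' "$1" "$2"
}

# 1234567890 is a placeholder, not a real vdev GUID.
fmri=$(zfs_fmri zeepool 1234567890)

# Print the command for review rather than executing it.
echo "fmadm repaired $fmri"
```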