Managing ZFS File Systems in Oracle® Solaris 11.2

Updated: December 2014

Designating Hot Spares in Your Storage Pool

The hot spares feature enables you to identify disks that could be used to replace a failed or faulted device in a storage pool. Designating a device as a hot spare means that the device is not an active device in the pool, but if an active device in the pool fails, the hot spare automatically replaces the failed device.

Devices can be designated as hot spares in the following ways:

  • When the pool is created with the zpool create command.

  • After the pool is created with the zpool add command.

The following example shows how to designate devices as hot spares when the pool is created:

# zpool create zeepool mirror c0t5000C500335F95E3d0 c0t5000C500335F907Fd0 \
mirror c0t5000C500335BD117d0 c0t5000C500335DC60Fd0 spare c0t5000C500335E106Bd0 c0t5000C500335FC3E7d0
# zpool status zeepool
pool: zeepool
state: ONLINE
scan: none requested
config:

NAME                          STATE     READ  WRITE  CKSUM
zeepool                       ONLINE       0      0      0
   mirror-0                   ONLINE       0      0      0
      c0t5000C500335F95E3d0   ONLINE       0      0      0
      c0t5000C500335F907Fd0   ONLINE       0      0      0
   mirror-1                   ONLINE       0      0      0
      c0t5000C500335BD117d0   ONLINE       0      0      0
      c0t5000C500335DC60Fd0   ONLINE       0      0      0
   spares
      c0t5000C500335E106Bd0    AVAIL
      c0t5000C500335FC3E7d0    AVAIL

errors: No known data errors
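
To check the spares from a script rather than by reading the status display, you might extract the spares section from the zpool status output. The following is a minimal sketch, assuming a POSIX shell with awk and assuming the layout shown above (a line containing only spares, then one line per spare, ending at a blank line). The function name list_spares is hypothetical and is not part of ZFS.

```sh
# Hypothetical helper (not part of ZFS): read `zpool status` output on
# stdin and print "device state" for each entry in the spares section.
# Assumes the layout shown above: a line containing only "spares",
# one line per spare device, and a blank line after the last spare.
list_spares() {
    awk '
        NF == 1 && $1 == "spares" { in_spares = 1; next }  # section starts
        in_spares && NF == 0      { in_spares = 0; next }  # blank line ends it
        in_spares                 { print $1, $2 }         # device and state
    '
}
```

For the zeepool example above, zpool status zeepool | list_spares would print each of the two spare devices followed by AVAIL.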

The following example shows how to designate hot spares by adding them to an existing pool, here a zeepool that was created with the two mirrors but without spares:

# zpool add zeepool spare c0t5000C500335E106Bd0 c0t5000C500335FC3E7d0
# zpool status zeepool
pool: zeepool
state: ONLINE
scan: none requested
config:

NAME                          STATE     READ  WRITE  CKSUM
zeepool                       ONLINE       0      0      0
   mirror-0                   ONLINE       0      0      0
      c0t5000C500335F95E3d0   ONLINE       0      0      0
      c0t5000C500335F907Fd0   ONLINE       0      0      0
   mirror-1                   ONLINE       0      0      0
      c0t5000C500335BD117d0   ONLINE       0      0      0
      c0t5000C500335DC60Fd0   ONLINE       0      0      0
   spares
      c0t5000C500335E106Bd0    AVAIL
      c0t5000C500335FC3E7d0    AVAIL

errors: No known data errors
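
To make a script that designates spares safe to run more than once, you might skip any disk that already appears in the pool's status output before calling zpool add. This is a sketch under the assumption that a disk already in the pool appears as the first field of some line of the zpool status output; add_spares_if_absent is a hypothetical helper name.

```sh
# Hypothetical helper (not part of ZFS): add the named disks to POOL as
# hot spares, skipping any disk that already appears in `zpool status`.
# Usage: add_spares_if_absent POOL DISK...
add_spares_if_absent() (
    pool=$1
    shift
    status=$(zpool status "$pool") || exit 1
    to_add=
    for disk in "$@"; do
        # A disk counts as present if it is the first field of any status line.
        if printf '%s\n' "$status" |
            awk -v d="$disk" '$1 == d { found = 1 } END { exit !found }'
        then
            echo "skipping $disk: already in $pool" >&2
        else
            to_add="$to_add $disk"
        fi
    done
    [ -n "$to_add" ] || exit 0    # nothing left to add
    # Unquoted on purpose: split the list back into separate disk names.
    zpool add "$pool" spare $to_add
)
```

For example, add_spares_if_absent zeepool c0t5000C500335E106Bd0 c0t5000C500335FC3E7d0 issues the same zpool add command as above on a pool without spares, and does nothing on a pool that already lists both disks.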

Hot spares can be removed from a storage pool by using the zpool remove command. For example:

# zpool remove zeepool c0t5000C500335FC3E7d0
# zpool status zeepool
pool: zeepool
state: ONLINE
scan: none requested
config:

NAME                          STATE     READ  WRITE  CKSUM
zeepool                       ONLINE       0      0      0
   mirror-0                   ONLINE       0      0      0
      c0t5000C500335F95E3d0   ONLINE       0      0      0
      c0t5000C500335F907Fd0   ONLINE       0      0      0
   mirror-1                   ONLINE       0      0      0
      c0t5000C500335BD117d0   ONLINE       0      0      0
      c0t5000C500335DC60Fd0   ONLINE       0      0      0
   spares
      c0t5000C500335E106Bd0    AVAIL

errors: No known data errors

A hot spare cannot be removed if it is currently used by a storage pool.
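
A script can check a spare's state before calling zpool remove. The following sketch assumes, as in the output above, that an available spare is listed with the state AVAIL and that a disk name is the first field of its status line; remove_spare_if_avail is a hypothetical helper, not a ZFS command.

```sh
# Hypothetical helper (not part of ZFS): remove DISK from POOL's spares
# only if `zpool status` lists it as AVAIL (that is, not in use).
# Usage: remove_spare_if_avail POOL DISK
remove_spare_if_avail() (
    pool=$1
    disk=$2
    # An available spare appears in the spares section as "DISK AVAIL".
    if zpool status "$pool" |
        awk -v d="$disk" '$1 == d && $2 == "AVAIL" { ok = 1 } END { exit !ok }'
    then
        zpool remove "$pool" "$disk"
    else
        echo "not removing $disk: not listed as an AVAIL spare in $pool" >&2
        exit 1
    fi
)
```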

Consider the following when using ZFS hot spares:

  • Currently, the zpool remove command can only be used to remove hot spares, cache devices, and log devices.

  • A hot spare must be equal to or larger than the largest disk in the pool to be usable. ZFS allows you to add a smaller disk as a spare, but when the smaller spare is activated, either automatically or with the zpool replace command, the operation fails with an error similar to the following:

    cannot replace disk3 with disk4: device is too small

  • You cannot share a spare across systems.

    Do not configure multiple systems to share a spare, even if the disk is visible to all of them. If a disk is configured as a spare for several pools, all of those pools must be controlled by a single system.

  • If you share a spare between two data pools on the same system, you must coordinate the use of the spare between the two pools. For example, suppose pool A has the spare in use and pool A is exported. While pool A is exported, pool B could unknowingly use the same spare. When pool A is imported, data corruption could occur because both pools are using the same disk. Be aware of edge cases like this one: even when a disk is correctly configured as a shared spare for several pools, conditions might exist that cause problems for those pools.

  • Do not share a spare between a root pool and a data pool.
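
The size rule above can be checked before a disk is added as a spare. This sketch only compares numbers: it assumes the caller has already obtained each device's size in bytes by some other means, and spare_size_ok is a hypothetical helper name.

```sh
# Hypothetical helper (not part of ZFS): succeed only if the spare is at
# least as large as the largest device in the pool. All sizes are in
# bytes and are supplied by the caller; nothing is queried here.
# Usage: spare_size_ok SPARE_BYTES POOL_DISK_BYTES...
spare_size_ok() (
    spare_bytes=$1
    shift
    largest=0
    for size in "$@"; do
        if [ "$size" -gt "$largest" ]; then
            largest=$size
        fi
    done
    if [ "$spare_bytes" -lt "$largest" ]; then
        echo "spare too small: $spare_bytes < $largest bytes" >&2
        exit 1
    fi
)
```

Checking up front reports the problem when the spare is added, instead of as a "device is too small" error when the spare is activated.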

Activating and Deactivating Hot Spares in Your Storage Pool

Hot spares are activated in the following ways:

  • Manual replacement – You replace a failed device in a storage pool with a hot spare by using the zpool replace command.

  • Automatic replacement – When a fault is detected, an FMA agent examines the pool to determine if it has any available hot spares. If so, it replaces the faulted device with an available spare.

    If a hot spare that is currently in use fails, the FMA agent detaches the spare and thereby cancels the replacement. The agent then attempts to replace the device with another hot spare, if one is available. This feature is currently limited by the fact that the ZFS diagnostic engine only generates faults when a device disappears from the system.

    If you physically replace a failed device while a spare is active in its place, you can reactivate the original device by using the zpool detach command to detach the spare. If you set the autoreplace pool property to on, the spare is automatically detached and returned to the spare pool when the new device is inserted and the online operation completes.
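
The manual replacement and the autoreplace property described above correspond to the commands below. They are wrapped in hypothetical shell functions only so that the pool and device names are parameters; the function names are illustrative, while zpool replace pool old-device new-device and zpool set autoreplace=on pool are the standard command forms.

```sh
# Hypothetical wrapper: manually activate SPARE in place of FAILED.
# Usage: activate_spare POOL FAILED SPARE
activate_spare() {
    zpool replace "$1" "$2" "$3"
}

# Hypothetical wrapper: set the autoreplace pool property to on, so that
# the spare is detached automatically after a new device is inserted in
# place of the failed one and the online operation completes.
# Usage: enable_autoreplace POOL
enable_autoreplace() {
    zpool set autoreplace=on "$1"
}
```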

An UNAVAIL device is automatically replaced if a hot spare is available. For example:

# zpool status -x
pool: zeepool
state: DEGRADED
status: One or more devices are unavailable in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or 'fmadm repaired', or replace the device
with 'zpool replace'.
Run 'zpool status -v' to see device specific details.
scan: resilvered 3.15G in 0h0m with 0 errors on Thu Jun 21 16:46:19 2012
config:


NAME                          STATE     READ  WRITE  CKSUM
zeepool                       ONLINE       0      0      0
   mirror-0                   ONLINE       0      0      0
      c0t5000C500335F95E3d0   ONLINE       0      0      0
      c0t5000C500335F907Fd0   ONLINE       0      0      0
   mirror-1                   DEGRADED     0      0      0
      c0t5000C500335BD117d0   ONLINE       0      0      0
   spare-1                    DEGRADED   449      0      0
      c0t5000C500335DC60Fd0   UNAVAIL      0      0      0
      c0t5000C500335E106Bd0   ONLINE       0      0      0
   spares
      c0t5000C500335E106Bd0      INUSE

errors: No known data errors
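
To find the device that a spare is standing in for, a script might list the devices whose state is UNAVAIL (or FAULTED) in the zpool status output. A minimal sketch, assuming the column layout shown above (name first, state second); list_failed is a hypothetical helper name.

```sh
# Hypothetical helper (not part of ZFS): read `zpool status` output on
# stdin and print "name state" for each configuration line whose STATE
# column is UNAVAIL or FAULTED. Header lines such as "state: UNAVAIL"
# are skipped because their first field ends in a colon.
list_failed() {
    awk '$1 !~ /:$/ && ($2 == "UNAVAIL" || $2 == "FAULTED") { print $1, $2 }'
}
```

Applied to the output above, zpool status -x | list_failed would report c0t5000C500335DC60Fd0 as UNAVAIL.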

Currently, you can deactivate a hot spare in the following ways:

  • By removing the hot spare from the storage pool.

  • By detaching a hot spare after a failed disk is physically replaced. See Example 3-8.

  • By temporarily or permanently swapping in another hot spare. See Example 3-9.

Example 3-8  Detaching a Hot Spare After the Failed Disk Is Replaced

In this example, the failed disk (c0t5000C500335DC60Fd0) is physically replaced and ZFS is notified by using the zpool replace command.

# zpool replace zeepool c0t5000C500335DC60Fd0
# zpool status zeepool
pool: zeepool
state: ONLINE
scan: resilvered 3.15G in 0h0m with 0 errors on Thu Jun 21 16:53:43 2012
config:

NAME                          STATE     READ  WRITE  CKSUM
zeepool                       ONLINE       0      0      0
   mirror-0                   ONLINE       0      0      0
      c0t5000C500335F95E3d0   ONLINE       0      0      0
      c0t5000C500335F907Fd0   ONLINE       0      0      0
   mirror-1                   ONLINE       0      0      0
      c0t5000C500335BD117d0   ONLINE       0      0      0
      c0t5000C500335DC60Fd0   ONLINE       0      0      0
   spares
      c0t5000C500335E106Bd0    AVAIL   

If necessary (for example, if the spare is still listed as INUSE after resilvering completes), you can use the zpool detach command to return the hot spare to the spare pool. For example:

# zpool detach zeepool c0t5000C500335E106Bd0
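
In the status output above, the spare is already listed as AVAIL, so no detach is needed. A script can make this decision by detaching the spare only if it is still listed as INUSE. This sketch assumes that resilvering onto the new disk has completed before it runs and that an in-use spare appears in the spares section with the state INUSE; detach_spare_if_inuse is a hypothetical helper.

```sh
# Hypothetical helper (not part of ZFS): return SPARE to the spare pool
# with zpool detach, but only if `zpool status` still lists it as INUSE.
# Run it only after resilvering onto the replacement disk has completed.
# Usage: detach_spare_if_inuse POOL SPARE
detach_spare_if_inuse() (
    pool=$1
    spare=$2
    # Match the spares-section line, not the ONLINE line under spare-N.
    state=$(zpool status "$pool" |
        awk -v d="$spare" '$1 == d && ($2 == "INUSE" || $2 == "AVAIL") { print $2; exit }')
    case $state in
        INUSE) zpool detach "$pool" "$spare" ;;
        AVAIL) echo "$spare is already available; nothing to do" >&2 ;;
        *)     echo "$spare is not listed as a spare in $pool" >&2; exit 1 ;;
    esac
)
```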

Example 3-9  Detaching a Failed Disk and Using the Hot Spare

To keep the hot spare that is currently standing in for a failed disk, either temporarily or permanently, detach the original (failed) disk. After the failed disk is physically replaced, you can add it back to the storage pool as a spare. For example:

# zpool status zeepool
pool: zeepool
state: DEGRADED
status: One or more devices are unavailable in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or 'fmadm repaired', or replace the device
with 'zpool replace'.
Run 'zpool status -v' to see device specific details.
scan: scrub in progress since Thu Jun 21 17:01:49 2012
1.07G scanned out of 6.29G at 220M/s, 0h0m to go
0 repaired, 17.05% done
config:
NAME                          STATE     READ  WRITE  CKSUM
zeepool                       ONLINE       0      0      0
   mirror-0                   ONLINE       0      0      0
      c0t5000C500335F95E3d0   ONLINE       0      0      0
      c0t5000C500335F907Fd0   ONLINE       0      0      0
   mirror-1                   DEGRADED     0      0      0
      c0t5000C500335BD117d0   ONLINE       0      0      0
      c0t5000C500335DC60Fd0   UNAVAIL      0      0      0
   spares
      c0t5000C500335E106Bd0    AVAIL

errors: No known data errors
# zpool detach zeepool c0t5000C500335DC60Fd0
# zpool status zeepool
pool: zeepool
state: ONLINE
scan: resilvered 3.15G in 0h0m with 0 errors on Thu Jun 21 17:02:35 2012
config:

NAME                          STATE     READ  WRITE  CKSUM
zeepool                       ONLINE       0      0      0
   mirror-0                   ONLINE       0      0      0
      c0t5000C500335F95E3d0   ONLINE       0      0      0
      c0t5000C500335F907Fd0   ONLINE       0      0      0
   mirror-1                   DEGRADED     0      0      0
      c0t5000C500335BD117d0   ONLINE       0      0      0
      c0t5000C500335E106Bd0   ONLINE       0      0      0

errors: No known data errors
(Original failed disk c0t5000C500335DC60Fd0 is physically replaced)
# zpool add zeepool spare c0t5000C500335DC60Fd0
# zpool status zeepool
pool: zeepool
state: ONLINE
scan: resilvered 3.15G in 0h0m with 0 errors on Thu Jun 21 17:02:35 2012
config:

NAME                          STATE     READ  WRITE  CKSUM
zeepool                       ONLINE       0      0      0
   mirror-0                   ONLINE       0      0      0
      c0t5000C500335F95E3d0   ONLINE       0      0      0
      c0t5000C500335F907Fd0   ONLINE       0      0      0
   mirror-1                   DEGRADED     0      0      0
      c0t5000C500335BD117d0   ONLINE       0      0      0
      c0t5000C500335E106Bd0   ONLINE       0      0      0
   spares
      c0t5000C500335DC60Fd0    AVAIL

errors: No known data errors
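
The two steps in this example can be scripted as follows. This is a sketch with hypothetical function names; as an added safety assumption, keep_spare refuses to detach a disk unless zpool status lists it as UNAVAIL or FAULTED, so that a healthy disk is not detached by mistake.

```sh
# Hypothetical helper (not part of ZFS): keep the active spare by detaching
# the failed disk, but only if `zpool status` lists that disk as UNAVAIL
# or FAULTED.
# Usage: keep_spare POOL FAILED_DISK
keep_spare() (
    pool=$1
    failed=$2
    if zpool status "$pool" |
        awk -v d="$failed" '$1 == d && ($2 == "UNAVAIL" || $2 == "FAULTED") { f = 1 } END { exit !f }'
    then
        zpool detach "$pool" "$failed"
    else
        echo "refusing to detach $failed: not UNAVAIL or FAULTED in $pool" >&2
        exit 1
    fi
)

# Hypothetical helper: after the failed disk has been physically replaced,
# add the new disk (same device name) back to the pool as a spare.
# Usage: readd_as_spare POOL DISK
readd_as_spare() {
    zpool add "$1" spare "$2"
}
```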

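After a disk is replaced and the spare is detached, notify FMA that the disk is repaired. Use the fmadm faulty command to identify the fault, then run the fmadm repaired command with the FMRI of the repaired device, where the name and guid values identify the pool and the vdev as reported by fmadm faulty: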

# fmadm faulty
# fmadm repaired zfs://pool=name/vdev=guid
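
To avoid copying the FMRI by hand, a script might pick out the ZFS vdev FMRIs from the fmadm faulty output. This is a sketch only: zfs_vdev_fmris is a hypothetical helper, and it assumes that each FMRI of the form zfs://pool=name/vdev=guid appears as a single whitespace-separated token in that output. Review each FMRI it prints before passing it to fmadm repaired.

```sh
# Hypothetical helper (not part of Solaris): read `fmadm faulty` output on
# stdin and print each distinct FMRI of the form zfs://pool=name/vdev=guid.
# Assumes each FMRI is a single whitespace-separated token.
zfs_vdev_fmris() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ "^zfs://pool=[^/]+/vdev=" && !seen[$i]++)
                print $i    # first occurrence of this FMRI only
    }'
}
```

For example, fmadm faulty | zfs_vdev_fmris lists the candidate FMRIs, each of which can then be given to fmadm repaired as shown above.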