Replacing a Device in a ZFS Storage Pool

After you have determined that a device can be replaced, use the zpool replace command to replace the device. If you are replacing the damaged device with different device, use syntax similar to the following:

$ zpool replace system1 c1t1d0 c2t0d0

This command migrates data to the new device from the damaged device or from other devices in the pool if it is in a redundant configuration. When the command is finished, it detaches the damaged device from the configuration, at which point the device can be removed from the system. If you have already removed the device and replaced it with a new device in the same location, use the single device form of the command. For example:

$ zpool replace system1 c1t1d0

This command takes an unformatted disk, formats it appropriately, and then resilvers data from the rest of the configuration.

For more information about the zpool replace command, see Replacing Devices in a Storage Pool.

Example 11-1 Replacing a SATA Disk in a ZFS Storage Pool

The following example shows how to replace a device (c1t3d0) in a mirrored storage pool system1 on a system with SATA devices. To replace the disk c1t3d0 with a new disk at the same location (c1t3d0), then you must unconfigure the disk before you attempt to replace it. If the disk to be replaced is not a SATA disk, then see Replacing Devices in a Storage Pool.

The basic steps follow:

  • Take offline the disk (c1t3d0)to be replaced. You cannot unconfigure a SATA disk that is currently being used.

  • Use the cfgadm command to identify the SATA disk (c1t3d0) to be unconfigured and unconfigure it. The pool will be degraded with the offline disk in this mirrored configuration, but the pool will continue to be available.

  • Physically replace the disk (c1t3d0). Ensure that the blue Ready to Remove LED is illuminated before you physically remove the UNAVAIL drive, if available.

  • Reconfigure the SATA disk (c1t3d0).

  • Bring the new disk (c1t3d0) online.

  • Run the zpool replace command to replace the disk (c1t3d0).

    Note:

    If you had previously set the pool property autoreplace to on, then any new device, found in the same physical location as a device that previously belonged to the pool is automatically formatted and replaced without using the zpool replace command. This feature might not be supported on all hardware.
  • If a failed disk is automatically replaced with a hot spare, you might need to detach the hot spare after the failed disk is replaced. For example, if c2t4d0 is still an active hot spare after the failed disk is replaced, then detach it.

    $ zpool detach system1 c2t4d0
  • If FMA is reporting the failed device, then you should clear the device failure.

    $ fmadm faulty
    $ fmadm repaired zfs://pool=name/vdev=guid

The following example walks through the steps to replace a disk in a ZFS storage pool.

$ zpool offline system1 c1t3d0
$ cfgadm | grep c1t3d0
sata1/3::dsk/c1t3d0            disk         connected    configured   ok
$ cfgadm -c unconfigure sata1/3
Unconfigure the device at: /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes
$ cfgadm | grep sata1/3
sata1/3                        disk         connected    unconfigured ok
<Physically replace the failed disk c1t3d0>
$ cfgadm -c configure sata1/3
$ cfgadm | grep sata1/3
sata1/3::dsk/c1t3d0            disk         connected    configured   ok
$ zpool online system1 c1t3d0
$ zpool replace system1 c1t3d0
# zpool status system1
pool: system1
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Tue Feb  2 13:17:32 2010
config:

NAME           STATE     READ WRITE CKSUM
system1        ONLINE       0     0     0
   mirror-0    ONLINE       0     0     0
      c0t1d0   ONLINE       0     0     0
      c1t1d0   ONLINE       0     0     0
   mirror-1    ONLINE       0     0     0
      c0t2d0   ONLINE       0     0     0
      c1t2d0   ONLINE       0     0     0
   mirror-2    ONLINE       0     0     0
      c0t3d0   ONLINE       0     0     0
      c1t3d0   ONLINE       0     0     0

errors: No known data errors

Note that the preceding zpool output might show both the new and old disks under a replacing heading. For example:

replacing   DEGRADED     0     0    0
c1t3d0s0/o  FAULTED      0     0    0
c1t3d0      ONLINE       0     0    0

This text means that the replacement process is in progress and the new disk is being resilvered.

If you are going to replace a disk (c1t3d0) with another disk (c4t3d0), then you only need to run the zpool replace command. For example:

$ zpool replace system1 c1t3d0 c4t3d0
$ zpool status
pool: system1
state: DEGRADED
scrub: resilver completed after 0h0m with 0 errors on Tue Feb  2 13:35:41 2010
config:

NAME           STATE     READ WRITE CKSUM
system1        DEGRADED     0     0     0
   mirror-0    ONLINE       0     0     0
      c0t1d0   ONLINE       0     0     0
      c1t1d0   ONLINE       0     0     0
   mirror-1    ONLINE       0     0     0
      c0t2d0   ONLINE       0     0     0
      c1t2d0   ONLINE       0     0     0
   mirror-2    DEGRADED     0     0     0
      c0t3d0   ONLINE       0     0     0
   replacing   DEGRADED     0     0     0
      c1t3d0   OFFLINE      0     0     0
      c4t3d0   ONLINE       0     0     0

errors: No known data errors

You might need to run the zpool status command several times until the disk replacement is completed.

$ zpool status system1
pool: system1
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Tue Feb  2 13:35:41 2010
config:

NAME          STATE     READ WRITE CKSUM
system1       ONLINE       0     0     0
   mirror-0   ONLINE       0     0     0
      c0t1d0  ONLINE       0     0     0
      c1t1d0  ONLINE       0     0     0
   mirror-1   ONLINE       0     0     0
      c0t2d0  ONLINE       0     0     0
      c1t2d0  ONLINE       0     0     0
   mirror-2   ONLINE       0     0     0
      c0t3d0  ONLINE       0     0     0
      c4t3d0  ONLINE       0     0     0

Example 11-2 Replacing a Failed Log Device

ZFS identifies intent log failures in the zpool status command output. Fault Management Architecture (FMA) reports these errors as well. Both ZFS and FMA describe how to recover from an intent log failure.

The following example shows how to recover from a failed log device (c0t5d0) in the storage pool (storpool). The basic steps follow:

  • Review the zpool status -x output and FMA diagnostic message, described in ZFS intent log read failure (Doc ID 1021625.1) in https://support.oracle.com/.

  • Physically replace the failed log device.

  • Bring the new log device online.

  • Clear the pool's error condition.

  • Clear the FMA error.

For example, if the system shuts down abruptly before synchronous write operations are committed to a pool with a separate log device, you see messages similar to the following:

$ zpool status -x
pool: storpool
state: FAULTED
status: One or more of the intent logs could not be read.
Waiting for administrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
or ignore the intent log records by running 'zpool clear'.
scrub: none requested
config:

NAME           STATE     READ WRITE CKSUM
storpool       FAULTED      0     0     0 bad intent log
   mirror-0    ONLINE       0     0     0
      c0t1d0   ONLINE       0     0     0
      c0t4d0   ONLINE       0     0     0
   logs        FAULTED      0     0     0 bad intent log
      c0t5d0   UNAVAIL      0     0     0 cannot open
<Physically replace the failed log device>
$ zpool online storpool c0t5d0
$ zpool clear storpool
$ fmadm faulty
$ fmadm repair zfs://pool=name/vdev=guid

You can resolve the log device failure in the following ways:

  • Replace or recover the log device. In this example, the log device is c0t5d0.

  • Bring the log device back online.

    $ zpool online storpool c0t5d0
  • Reset the failed log device error condition.

    $ zpool clear storpool

To recover from this error without replacing the failed log device, you can clear the error with the zpool clear command. In this scenario, the pool will operate in a degraded mode and the log records will be written to the main pool until the separate log device is replaced.

Consider using mirrored log devices to avoid the log device failure scenario.