Oracle® Solaris 11.2 Release Notes

Updated: May 2015
 
 

ZFS Should Retry or Abort an Entire Transaction When a WCE LUN Gets a Power-On-Reset (15662604)

ZFS enables the write cache on pool devices and safely handles cache flushing in the event of a system power loss. However, a power-on-reset condition can occur before cached data has been committed to stable storage.

In an environment with no single point of failure, this situation is automatically detected and corrected by ZFS the next time the data is read. Routine pool scrubs can detect and repair lost writes sooner.
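As a minimal sketch of a routine scrub, the helper below starts a scrub and then prints the pool status. The function name scrub_pool and the ZPOOL override variable are hypothetical and exist only to allow a dry run; the pool name tank is a placeholder.

```shell
# Start a scrub of one pool, then print its status, including any
# files with unrecoverable errors (-v).
# ZPOOL is a hypothetical override for dry runs; it defaults to zpool.
scrub_pool() {
    pool=$1
    "${ZPOOL:-zpool}" scrub "$pool" || return 1
    "${ZPOOL:-zpool}" status -v "$pool"
}

# Usage on a live system (requires privileges):
#   scrub_pool tank
```

The scrub runs in the background, so the status printed immediately afterward typically shows the scrub in progress. Run zpool status -v again later to see the final result.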

In an environment with a single point of failure, this problem could lead to data loss.

This problem might also occur more frequently when accessing LUNs that are exported from a clustered configuration. During cluster failover, data cached by the failing head may be lost due to a power-on-reset event that is explicitly sent by the SCSI target on the surviving head. In this situation, even pools with no single point of failure might be affected.

A symptom of this issue is clusters of persistent checksum errors. You can use the output from fmdump -eV to determine whether the checksum errors have been diagnosed as persistent. The zio_txg entry in the fmdump -eV output identifies the transaction group in which a block of data was written, and therefore indicates when the write occurred. Note that a pattern of persistent checksum errors could also be a symptom of failing devices, software, or hardware.
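One way to look for such clusters is to count how many checksum error reports share the same zio_txg value. The following is a sketch, not a diagnostic tool: the function name count_by_txg is hypothetical, and it assumes that each report in the fmdump -eV output carries a line of the form "zio_txg = value".

```shell
# Count error reports per zio_txg value, most frequent first.
# Reads fmdump -eV style text on standard input.
# Assumption: each report has a line of the form "zio_txg = <value>".
count_by_txg() {
    awk '$1 == "zio_txg" && $2 == "=" { n[$3]++ }
         END { for (t in n) print n[t], t }' | sort -rn
}

# Usage on a live system (requires privileges):
#   fmdump -eV -c ereport.fs.zfs.checksum | count_by_txg
```

Many reports that share one zio_txg value, or a few adjacent values, suggest that writes from the same period were lost. This fits the power-on-reset scenario but does not rule out failing hardware.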

Workaround: For systems that rely on LUNs exported from a cluster or systems with a single point of failure, consider disabling the write cache for devices on a system.

Perform the following steps to disable the write cache and suppress cache flushing for SCSI (sd) or FC (ssd) devices.

  1. Copy either the /kernel/drv/sd.conf file or the /kernel/drv/ssd.conf file into the /etc/driver/drv directory, depending on your storage devices.

  2. Edit either the /etc/driver/drv/sd.conf file or the /etc/driver/drv/ssd.conf file to disable the write cache and suppress cache flushing.

  3. Add the following lines, replacing the SUN COMSTAR vendor ID (VID) and product ID (PID) string with the appropriate values for your devices, as described on the sd(7D) man page.

    SPARC system:

    sd-config-list="SUN COMSTAR","disable-cache-suppress-flush";
    disable-cache-suppress-flush=1,0x40010,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1;
    

    x64 system:

    sd-config-list="SUN COMSTAR","disable-cache-suppress-flush";
    disable-cache-suppress-flush=1,0x40008,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1;
    
  4. Reboot the system and override the fast reboot option.

    # reboot -p

Note -  Applying the workaround could cause a reduction in system performance.
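The steps above can be sketched as a small shell helper that appends the workaround lines to the copied driver configuration file. The function name append_wce_workaround and its arguments are hypothetical. The property values are copied verbatim from the step 3 examples, and the "SUN COMSTAR" string must still be adjusted for your devices.

```shell
# Append the write-cache workaround lines to a copied driver.conf file.
#   $1 = target file, for example /etc/driver/drv/sd.conf
#   $2 = platform: sparc or x64
append_wce_workaround() {
    conf=$1
    case $2 in
        sparc) tuple='1,0x40010,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1' ;;
        x64)   tuple='1,0x40008,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1' ;;
        *)     echo "usage: append_wce_workaround FILE sparc|x64" >&2
               return 2 ;;
    esac
    {
        printf '%s\n' 'sd-config-list="SUN COMSTAR","disable-cache-suppress-flush";'
        printf '%s\n' "disable-cache-suppress-flush=$tuple;"
    } >> "$conf"
}

# Usage on an x64 system with sd devices (requires privileges):
#   cp /kernel/drv/sd.conf /etc/driver/drv/sd.conf
#   append_wce_workaround /etc/driver/drv/sd.conf x64
#   reboot -p
```

Because the helper appends, running it twice duplicates the entries. Review the file before rebooting.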