ZFS is designed to work with storage devices that manage a disk-level cache. ZFS commonly asks the storage device to ensure that data is safely placed on stable storage by requesting a cache flush. For JBOD storage, this works as designed and without problems. For many NVRAM-based storage arrays, a performance problem might occur if the array takes the cache flush request and actually does something with it, rather than ignoring it. Some storage arrays flush their large caches despite the fact that the NVRAM protection makes those caches as good as stable storage.
ZFS issues infrequent flushes (every 5 seconds or so) after the uberblock updates. These flushes are rare enough to be inconsequential, so no tuning is warranted here. ZFS also issues a flush every time an application requests a synchronous write (O_DSYNC, fsync, NFS commit, and so on). The application waits for this type of flush to complete, which impacts performance. Greatly so, in fact. From a performance standpoint, this neutralizes the benefits of having NVRAM-based storage.
Cache flush tuning was recently shown to help flash device performance when the devices are used as log devices. When all LUNs exposed to ZFS come from an NVRAM-protected storage array and procedures ensure that no unprotected LUNs will be added in the future, ZFS can be tuned to not issue the flush requests by setting zfs_nocacheflush. If some LUNs exposed to ZFS are not protected by NVRAM, then this tuning can lead to data loss, application-level corruption, or even pool corruption. In some NVRAM-protected storage arrays, the cache flush command is a no-op, so tuning in this situation makes no performance difference.
A recent OS change qualifies the flush request semantics so that storage devices are instructed to ignore the requests if they have the proper protection. This change requires a fix to our disk drivers and for the NVRAM device to support the updated semantics. If the NVRAM device does not recognize this improvement, use these instructions to tell the Solaris OS not to send any synchronize cache commands to the array. If you use these instructions, make sure all targeted LUNs are indeed protected by NVRAM.
Occasionally, flash and NVRAM devices do not properly advertise to the OS that they are non-volatile devices, and that caches do not need to be flushed. Cache flushing is an expensive operation. Unnecessary flushes can drastically impede performance in some cases.
Review the following zfs_nocacheflush syntax restrictions before applying the tuning entries below:
The tuning syntax below can be included in sd.conf but there must be only a single sd-config-list entry per vendor/product.
If multiple devices entries are desired, multiple pairs of vendor IDs and sd tuning strings can be specified on the same line by using the following syntax:
#              "012345670123456789012345","tuning ",
sd-config-list="|-VID1-||-----PID1-----|","param1:val1, param2:val2",
               "|-VIDN-||-----PIDN-----|","param1:val1, param3:val3";
Make sure the vendor ID (VID) string is padded to 8 characters and the Product ID (PID) string is padded to 16 characters as described in the preceding example.
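The padding rule above is easy to get wrong by hand, so a small helper can generate the entry instead. The following Python sketch is illustrative only and is not part of Solaris: the names device_id and sd_config_entry are hypothetical, and it assumes that padding means trailing spaces, as the layout in the preceding example suggests.

```python
VID_WIDTH = 8    # vendor ID field width in an sd-config-list device string
PID_WIDTH = 16   # product ID field width in an sd-config-list device string


def device_id(vid: str, pid: str) -> str:
    """Return the 24-character VID+PID string, space-padded on the right."""
    if len(vid) > VID_WIDTH:
        raise ValueError(f"vendor ID longer than {VID_WIDTH} characters: {vid!r}")
    if len(pid) > PID_WIDTH:
        raise ValueError(f"product ID longer than {PID_WIDTH} characters: {pid!r}")
    return vid.ljust(VID_WIDTH) + pid.ljust(PID_WIDTH)


def sd_config_entry(devices) -> str:
    """Build a single sd-config-list line from (vid, pid, tuning) tuples.

    tuning is a dict of parameter names to values, for example
    {"disksort": "false", "cache-nonvolatile": "true"}.
    """
    pairs = []
    for vid, pid, tuning in devices:
        params = ", ".join(f"{name}:{value}" for name, value in tuning.items())
        pairs.append(f'"{device_id(vid, pid)}","{params}"')
    # All device pairs go on one entry, terminated by a semicolon.
    return "sd-config-list=" + ", ".join(pairs) + ";"


if __name__ == "__main__":
    print(sd_config_entry([
        ("ATA", "MARVELL SD88SA02",
         {"throttle-max": "32", "disksort": "false", "cache-nonvolatile": "true"}),
    ]))
```

Generating the line rather than typing it keeps every device in one sd-config-list entry and makes an over-long vendor or product string fail loudly instead of silently shifting the fields.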
Caution - All cache sync commands are ignored by the device. Use at your own risk.
Use the format utility to run the inquiry subcommand on a LUN from the storage array. For example:
# format
. . .
Specify disk (enter its number): x
format> inquiry
Vendor: ATA
Product: Marvell
Revision: XXXX
format>
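If the inquiry output is captured to a file or pasted from a terminal, the vendor and product strings can be pulled out programmatically. The sketch below is a hypothetical helper (parse_inquiry is not a Solaris tool) and assumes the output uses the "Name: value" layout shown in the example above. Note that it strips surrounding whitespace, so the values must still be padded before they are used in sd.conf.

```python
def parse_inquiry(text: str) -> dict:
    """Extract the Vendor, Product, and Revision fields from inquiry output."""
    wanted = ("Vendor", "Product", "Revision")
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; lines without a colon are skipped.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    sample = """Specify disk (enter its number): x
format> inquiry
Vendor: ATA
Product: Marvell
Revision: XXXX
format>"""
    print(parse_inquiry(sample))
```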
Select one of the following based on your architecture:
For all devices, copy the file /kernel/drv/sd.conf to the /etc/driver/drv/sd.conf file.
For F40 flash devices, add the following entry to /kernel/drv/sd.conf. In the entry below, ensure that ATA is padded to 8 characters, and 3E128-TS2-550B01 contains 16 characters. Total string length is 24.
sd-config-list="ATA     3E128-TS2-550B01","disksort:false, cache-nonvolatile:true, physical-block-size:4096";
For F80 flash devices, add the following entry to /kernel/drv/sd.conf. Ensure that ATA is padded to 8 characters, and 2E256-TU2-510B00 contains 16 characters. Total string length is 24.
sd-config-list="ATA     2E256-TU2-510B00","disksort:false, cache-nonvolatile:true, physical-block-size:4096";
For F20 and F5100 flash devices, choose one of the following based on your architecture. In the entries below, ATA is padded to 8 characters, and MARVELL SD88SA02 contains 16 characters. The total string length is 24.
Add the following entry to /etc/driver/drv/sd.conf:
sd-config-list="ATA     MARVELL SD88SA02","throttle-max:32, disksort:false, cache-nonvolatile:true";
Carefully add whitespace to make the vendor ID (VID) 8 characters long (here ATA) and the Product ID (PID) 16 characters long (here MARVELL SD88SA02) in the sd-config-list entry, as illustrated.
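Because the padding is invisible and easily lost when an entry is copied from a web page or an email, it can help to check the line before rebooting. The following Python sketch is a hypothetical pre-flight check, not a Solaris utility. It assumes the entry sits on a single line, and it only verifies the overall 24-character length of each device string, so it cannot tell whether the vendor ID really occupies the first 8 characters.

```python
import re

# Matches each "device-id","tuning" pair of quoted strings in the entry.
PAIR_RE = re.compile(r'"([^"]*)"\s*,\s*"([^"]*)"')

EXPECTED_ID_LENGTH = 24  # 8-character VID plus 16-character PID


def check_sd_config_line(line: str) -> list:
    """Return a list of problems found in one sd-config-list line."""
    stripped = line.strip()
    if not stripped.startswith("sd-config-list="):
        return ["line does not start with sd-config-list="]
    problems = []
    if not stripped.endswith(";"):
        problems.append("missing terminating semicolon")
    pairs = PAIR_RE.findall(stripped)
    if not pairs:
        problems.append("no quoted device-id/tuning pairs found")
    for device_id, _tuning in pairs:
        if len(device_id) != EXPECTED_ID_LENGTH:
            problems.append(
                f"device ID {device_id!r} is {len(device_id)} characters, "
                f"expected {EXPECTED_ID_LENGTH}"
            )
    return problems


if __name__ == "__main__":
    # An entry whose padding was collapsed to a single space.
    collapsed = ('sd-config-list="ATA MARVELL SD88SA02",'
                 '"throttle-max:32, disksort:false, cache-nonvolatile:true";')
    for problem in check_sd_config_line(collapsed):
        print(problem)
```

An empty list means the line passed these checks; it does not confirm that the driver accepted the entry, which still has to be verified after the reboot.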
Reboot the system.
You can tune zfs_nocacheflush back to its default value (0) with no adverse effect on performance.
Confirm that the flush behavior is correct.
Use the script provided in Appendix A, System Check Script for verification.