The following sections provide recommended practices for creating and monitoring ZFS storage pools. For information about troubleshooting storage pool problems, see Oracle Solaris ZFS Troubleshooting and Pool Recovery.
Keep the system up to date with the latest Oracle Solaris updates and releases
Confirm that your controller honors cache flush commands so that you know your data is safely written, which is important before changing the pool's devices or splitting a mirrored storage pool. This is generally not a problem on Oracle/Sun hardware, but it is good practice to confirm that your hardware's cache flushing setting is enabled.
Size memory requirements to actual system workload
With a known application memory footprint, such as for a database application, you might cap the ARC size so that the application will not need to reclaim its necessary memory from the ZFS cache.
Consider deduplication memory requirements
Identify ZFS memory usage with the following command:
$ mdb -k
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     388117              1516   19%
ZFS File Data               81321               317    4%
Anon                        29928               116    1%
Exec and libs                1359                 5    0%
Page cache                   4890                19    0%
Free (cachelist)             6030                23    0%
Free (freelist)           1581183              6176   76%
Total                     2092828              8175
Physical                  2092827              8175
> $q
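To track the ZFS share of memory over time, the ::memstat output can be filtered rather than read by eye. The following is a minimal sketch, not part of the documented procedure. The function name zfs_file_data_mb is hypothetical, and the sketch assumes the row layout shown in the output above, where the MB value is the second-to-last column.

```shell
# Hypothetical helper: reads ::memstat output on standard input and prints
# the MB column of the "ZFS File Data" row.
zfs_file_data_mb() {
    awk '/^ZFS File Data/ { print $(NF - 1) }'
}

# Assumed usage on a live system (requires privileges to run mdb -k):
#   echo ::memstat | mdb -k | zfs_file_data_mb
```

For the sample output above, the helper would print 317.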
See Document 1663862.1, Memory Management Between ZFS and Applications in Oracle Solaris 11.x, in My Oracle Support (MOS) for tips on tuning the ZFS ARC cache. This document includes a script that you can use to modify the user_reserve_hint_pct memory management parameter.
Consider using ECC memory to protect against memory corruption. Silent memory corruption can potentially damage your data.
Perform regular backups – Although a pool that is created with ZFS redundancy can help reduce down time due to hardware failures, it is not immune to hardware failures, power failures, or disconnected cables. Back up your data on a regular basis. Different ways to provide copies of your data are:
Regular or daily ZFS snapshots
Weekly backups of ZFS pool data. You can use the zpool split command to create an exact duplicate of a ZFS mirrored storage pool.
Monthly backups by using an enterprise-level backup product
ZFS redundancy is recommended over hardware RAID for the following reasons:
When ZFS manages the storage redundancy, it not only detects underlying hardware issues but can also repair data inconsistencies.
For production environments, configure ZFS so that it can repair data inconsistencies. Use ZFS redundancy, such as RAID-Z, RAIDZ-2, RAIDZ-3, or mirroring. With such redundancy, faults in the underlying storage device or its connections to the host can be discovered and repaired by ZFS.
If you must use hardware RAID, present devices in JBOD mode so ZFS can manage the redundancy. In addition, follow these recommendations:
Monitor both the ZFS storage pool by using zpool status and the underlying LUNs by using your hardware RAID monitoring tools.
Promptly replace any failed devices.
Scrub your ZFS storage pools routinely, such as monthly if you are using datacenter-quality drives.
Always have good, recent backups of your important data.
Crash dumps consume more disk space, generally in the range of 1/2 to 3/4 the size of physical memory.
The following sections provide general and more specific pool practices.
Use whole disks to enable disk write cache and provide easier maintenance. Creating pools on slices adds complexity to disk management and recovery.
Use ZFS redundancy so that ZFS can repair data inconsistencies.
The following message is displayed when a non-redundant pool is created:
$ zpool create system1 c4t1d0 c4t3d0
'system1' successfully created, but with no redundancy; failure of one device will cause loss of the pool
For mirrored pools, use mirrored disk pairs
For RAID-Z pools, group 3-9 disks per VDEV
Do not mix RAID-Z and mirrored components within the same pool. These pools are harder to manage and performance might suffer.
Use hot spares to reduce down time due to hardware failures
Use similar size disks so that I/O is balanced across devices
Smaller LUNs can be expanded to larger LUNs
To keep metaslab sizes optimal, do not expand LUNs across extremely different sizes, such as from 128 MB to 2 TB
Consider creating a small root pool and larger data pools to support faster system recovery
Recommended minimum pool size is 8 GB. Although the minimum pool size is 64 MB, anything less than 8 GB makes allocating and reclaiming free pool space more difficult.
Recommended maximum pool size should comfortably fit your workload or data size. Do not try to store more data than you can routinely back up. Otherwise, your data is at risk due to some unforeseen event.
SPARC (SMI (VTOC)): Create root pools with slices by using the s* identifier. Do not use the p* identifier. In general, a system's ZFS root pool is created when the system is installed. If you are creating a second root pool or re-creating a root pool, use syntax similar to the following on a SPARC system:
$ zpool create rpool c0t1d0s0
Or, create a mirrored root pool. For example:
$ zpool create rpool mirror c0t1d0s0 c0t2d0s0
Solaris 11.1 x86 (EFI (GPT)): Create root pools with whole disks by using the d* identifier. Do not use the p* identifier. In general, a system's ZFS root pool is created when the system is installed. If you are creating a second root pool or re-creating a root pool, use syntax similar to the following:
$ zpool create rpool c0t1d0
Or, create a mirrored root pool. For example:
$ zpool create rpool mirror c0t1d0 c0t2d0
The root pool must be created as a mirrored configuration or as a single-disk configuration. Neither a RAID-Z nor a striped configuration is supported. You cannot add additional disks to create multiple mirrored top-level virtual devices by using the zpool add command, but you can expand a mirrored virtual device by using the zpool attach command.
The root pool cannot have a separate log device.
Pool properties can be set during an AI installation, but the gzip compression algorithm is not supported on root pools.
Do not rename the root pool after it is created by an initial installation. Renaming the root pool might cause an unbootable system.
Do not create a root pool on a USB stick on a production system because root pool disks are critical for continuous operation, particularly in an enterprise environment. Consider using a system's internal disks for the root pool, or at least, use the same quality disks that you would use for your non-root data. In addition, a USB stick might not be large enough to support a dump volume size that is equivalent to at least 1/2 the size of physical memory.
Rather than adding a hot spare to a root pool, consider creating a two- or a three-way mirror root pool. In addition, do not share a hot spare between a root pool and a data pool.
Do not use a VMware thinly-provisioned device for a root pool device.
Create non-root pools with whole disks by using the d* identifier. Do not use the p* identifier.
ZFS works best without any additional volume management software.
For better performance, use individual disks or at least LUNs made up of just a few disks. By providing ZFS with more visibility into the LUN setup, ZFS is able to make better I/O scheduling decisions.
Create redundant pool configurations across multiple controllers to reduce down time due to a controller failure.
Mirrored storage pools – Consume more disk space but generally perform better with small random reads.
$ zpool create system1 mirror c1d0 c2d0 mirror c3d0 c4d0
RAID-Z storage pools – Can be created with 3 parity strategies, where parity equals 1 (raidz), 2 (raidz2), or 3 (raidz3). A RAID-Z configuration maximizes disk space and generally performs well when data is written and read in large chunks (128K or more).
Consider a single-parity RAID-Z (raidz) configuration with 2 VDEVs of 3 disks (2+1) each.
$ zpool create rzpool raidz1 c1t0d0 c2t0d0 c3t0d0 raidz1 c1t1d0 c2t1d0 c3t1d0
A RAIDZ-2 configuration offers better data availability, and performs similarly to RAID-Z. RAIDZ-2 has significantly better mean time to data loss (MTTDL) than either RAID-Z or 2-way mirrors. Create a double-parity RAID-Z (raidz2) configuration at 6 disks (4+2).
$ zpool create rzpool raidz2 c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 raidz2 c0t2d0 c1t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0
A RAIDZ-3 configuration maximizes disk space and offers excellent availability because it can withstand 3 disk failures. Create a triple-parity RAID-Z (raidz3) configuration at 9 disks (6+3).
$ zpool create rzpool raidz3 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 c8t0d0
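When comparing the RAID-Z layouts above, a rough estimate of usable space can help. The following sketch is illustrative only. The function name raidz_usable_gb and the 1000 GB disk size are assumptions, and the estimate ignores metadata, padding, and reserved space, so actual usable capacity will be lower.

```shell
# Hypothetical helper: approximate usable space of a pool built from
# identical RAID-Z VDEVs, ignoring metadata and padding overhead.
# Arguments: VDEV count, disks per VDEV, parity disks per VDEV, disk size (GB).
raidz_usable_gb() {
    vdevs=$1
    disks=$2
    parity=$3
    size_gb=$4
    echo $(( $vdevs * ($disks - $parity) * $size_gb ))
}

raidz_usable_gb 2 3 1 1000   # two raidz (2+1) VDEVs
raidz_usable_gb 2 6 2 1000   # two raidz2 (4+2) VDEVs
raidz_usable_gb 1 9 3 1000   # one raidz3 (6+3) VDEV
```

With the assumed 1000 GB disks, the three calls print 4000, 8000, and 6000.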
Consider the following storage pool practices when creating a ZFS storage pool on a storage array that is connected locally or remotely.
If you create a pool on SAN devices and the network connection is slow, the pool's devices might be UNAVAIL for a period of time. You need to assess whether the network connection is appropriate for providing your data in a continuous fashion. Also, consider that if you are using SAN devices for your root pool, they might not be available as soon as the system is booted and the root pool's devices might also be UNAVAIL.
Confirm with your array vendor that the disk array is not flushing its cache after a flush write cache request is issued by ZFS.
Use whole disks, not disk slices, as storage pool devices so that Oracle Solaris ZFS activates the local small disk caches, which get flushed at appropriate times.
For best performance, create one LUN for each physical disk in the array. Using only one large LUN can cause ZFS to queue up too few read I/O operations to actually drive the storage to optimal performance. Conversely, using many small LUNs could have the effect of swamping the storage with a large number of pending read I/O operations.
A storage array that uses dynamic (or thin) provisioning software to implement virtual space allocation is not recommended for Oracle Solaris ZFS. When Oracle Solaris ZFS writes the modified data to free space, it writes to the entire LUN. The Oracle Solaris ZFS write process allocates all the virtual space from the storage array's point of view, which negates the benefit of dynamic provisioning.
Consider that dynamic provisioning software might be unnecessary when using ZFS:
You can expand a LUN in an existing ZFS storage pool and it will use the new space.
Similar behavior works when a smaller LUN is replaced with a larger LUN.
If you assess the storage needs for your pool and create the pool with smaller LUNs that equal the required storage needs, then you can always expand the LUNs to a larger size if you need more space.
Present individual devices in JBOD-mode and configure ZFS storage redundancy (mirror or RAID-Z) on this type of array so that ZFS can report and correct data inconsistencies.
Consider the following storage pool practices when creating an Oracle database.
Use a mirrored pool or hardware RAID for pools
RAID-Z pools are generally not recommended for random read workloads
Create a small separate pool with a separate log device for database redo logs
Create a small separate pool for the archive log
For more information about tuning ZFS for an Oracle database, see Tuning ZFS for Database Products in Oracle Solaris 11.4 Tunable Parameters Reference Manual.
VirtualBox is configured to ignore cache flush commands from the underlying storage by default. This means that in the event of a system crash or a hardware failure, data could be lost.
Enable cache flushing on VirtualBox by issuing the following command:
VBoxManage setextradata vm-name "VBoxInternal/Devices/type/0/LUN#n/Config/IgnoreFlush" 0
vm-name – the name of the virtual machine
type – the controller type, either piix3ide (if you're using the usual IDE virtual controller) or ahci, if you're using a SATA controller
n – the disk number
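As an illustration of how the three placeholders fit together, the following sketch prints the command for a given virtual machine instead of running it. The function name vbox_enable_flush_cmd and the VM name solaris11 are hypothetical. Only the two controller types named above are accepted.

```shell
# Hypothetical helper: prints the VBoxManage command for one disk.
# Arguments: VM name, controller type (piix3ide or ahci), disk number.
vbox_enable_flush_cmd() {
    vm=$1
    ctl=$2
    n=$3
    case $ctl in
        piix3ide|ahci) ;;
        *) echo "unsupported controller type: $ctl" >&2; return 1 ;;
    esac
    printf 'VBoxManage setextradata "%s" "VBoxInternal/Devices/%s/0/LUN#%s/Config/IgnoreFlush" 0\n' \
        "$vm" "$ctl" "$n"
}

# Example: first disk on a SATA (ahci) controller of an assumed VM name.
vbox_enable_flush_cmd solaris11 ahci 0
```

Printing the command first makes it easy to review the controller type and disk number before applying the setting to each disk in the pool.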
In general, keep pool capacity below 90% for best performance. The percentage where performance might be impacted depends greatly on workload:
If data is mostly added (write once, remove never), then it's very easy for ZFS to find new blocks. In this case, the percentage can be higher than normal; maybe up to 95%.
If data is made of large files or large blocks (such as 128K files or 1MB blocks) and the data is removed in bulk operations, the percentage can be higher than normal; maybe up to 95%.
If a large percentage (more than 50%) of the pool is made up of 8k chunks (database files, iSCSI LUNs, or many small files) that are constantly rewritten, then the 90% rule should be followed strictly.
If all of the data is small blocks that have constant rewrites, then you should monitor your pool closely once the capacity gets over 80%. The sign to watch for is increased disk IOPS to achieve the same level of client IOPS.
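The workload-dependent thresholds above can be captured in a small lookup for use in monitoring scripts. This is a sketch under stated assumptions: the workload labels (append-only, bulk-large, small-rewrite) and the function names are hypothetical, and the percentages are the approximate figures given above rather than hard limits.

```shell
# Hypothetical helper: maps a workload label to the capacity percentage at
# which closer attention is warranted. The labels are illustrative only.
capacity_limit() {
    case $1 in
        append-only|bulk-large) echo 95 ;;   # write once, or large blocks removed in bulk
        small-rewrite)          echo 80 ;;   # all small blocks with constant rewrites
        *)                      echo 90 ;;   # general rule, including 8k-heavy pools
    esac
}

# Prints "over" when the capacity percentage reaches the limit, else "ok".
capacity_check() {
    if [ "$1" -ge "$(capacity_limit "$2")" ]; then
        echo over
    else
        echo ok
    fi
}

capacity_check 92 append-only
capacity_check 92 database
capacity_check 85 small-rewrite
```

The three sample calls print ok, over, and over. Any label that the lookup does not recognize falls back to the general 90% rule.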
Mirrored pools are recommended over RAID-Z pools for random read/write workloads
Separate log devices
Recommended to improve synchronous write performance
With a high synchronous write load, prevents the fragmentation caused by writing many log blocks in the main pool
Separate cache devices are recommended to improve read performance
Scrub/resilver - A very large RAID-Z pool with lots of devices will have longer scrub and resilver times
Pool performance is slow – Use the zpool status command to rule out any hardware problems that are causing pool performance problems. If no problems show up in the zpool status command, use the fmdump command to display hardware faults or use the fmdump -eV command to review any hardware errors that have not yet resulted in a reported fault.
Make sure that pool capacity is below 90% for best performance.
Pool performance can degrade when a pool is very full and file systems are updated frequently, such as on a busy mail server. Full pools might cause a performance penalty, but no other issues. If the primary workload is immutable files, then you can keep the pool in the 95-96% utilization range. However, even with mostly static content in the 95-96% range, write, read, and resilvering performance might suffer.
Monitor pool and file system space to make sure that they are not full.
Consider using ZFS quotas and reservations to make sure file system space does not exceed 90% pool capacity.
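One way to apply the 90% guideline is a quota on the pool's top-level file system, which also limits its descendants. The sketch below only prints the zfs set command for review. The function name quota_cmd_90pct, the dataset name, and the 1000 GB usable size are assumptions, and the usable size should come from your own measurements (for example, zfs list output) rather than raw disk capacity.

```shell
# Hypothetical helper: prints a zfs set command that caps a dataset at 90%
# of the given usable size. Arguments: dataset name, usable size (GB).
quota_cmd_90pct() {
    dataset=$1
    usable_gb=$2
    echo "zfs set quota=$(( $usable_gb * 90 / 100 ))G $dataset"
}

quota_cmd_90pct system1 1000
```

For the assumed 1000 GB of usable space, the helper prints zfs set quota=900G system1.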
Monitor pool health
Monitor a redundant pool with zpool status and fmdump at least once per week
Monitor a non-redundant pool with zpool status and fmdump at least twice per week
Run zpool scrub on a regular basis to identify data integrity problems.
If you have consumer-quality drives, consider a weekly scrubbing schedule.
If you have datacenter-quality drives, consider a monthly scrubbing schedule.
You should also run a scrub prior to replacing devices or temporarily reducing a pool's redundancy to ensure that all devices are currently operational.
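The scrub schedules above could be automated with cron. The following sketch prints a crontab entry instead of installing it. The function name scrub_cron_entry, the 02:00 start time, and the choice of Sunday or the first day of the month are assumptions. Adjust them to a period of low I/O on your system.

```shell
# Hypothetical helper: prints a crontab entry that scrubs a pool at 02:00,
# weekly (Sunday) for consumer drives or monthly (day 1) for datacenter drives.
scrub_cron_entry() {
    pool=$1
    quality=$2
    case $quality in
        consumer)   echo "0 2 * * 0 /usr/sbin/zpool scrub $pool" ;;
        datacenter) echo "0 2 1 * * /usr/sbin/zpool scrub $pool" ;;
        *) echo "unknown drive quality: $quality" >&2; return 1 ;;
    esac
}

scrub_cron_entry system1 datacenter
```

The printed line can be reviewed and then added to the root crontab with crontab -e.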
Monitoring pool or device failures - Use zpool status as described below. Also use fmdump or fmdump -eV to see if any device faults or errors have occurred.
Redundant pools, monitor pool health with zpool status and fmdump on a weekly basis
Non-redundant pools, monitor pool health with zpool status and fmdump on a semiweekly basis
Pool device is UNAVAIL or OFFLINE – If a pool device is not available, then check to see if the device is listed in the format command output. If the device is not listed in the format output, then it will not be visible to ZFS.
If a pool device is UNAVAIL or OFFLINE, then this generally means that the device has failed, a cable has been disconnected, or some other hardware problem, such as a bad cable or bad controller, has caused the device to be inaccessible.
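A periodic health check can be scripted around zpool status. The sketch below is illustrative and the function names are hypothetical. It assumes that zpool status -x prints "all pools are healthy" when nothing is wrong, and that device rows in zpool status output carry the device name in the first column and its state in the second.

```shell
# Hypothetical helper: reads `zpool status -x` output on standard input and
# succeeds only when it reports that every pool is healthy.
pools_healthy() {
    grep '^all pools are healthy$' >/dev/null
}

# Hypothetical helper: reads `zpool status` output on standard input and
# prints the name of each device whose state is UNAVAIL or OFFLINE.
unavailable_devices() {
    awk '$2 == "UNAVAIL" || $2 == "OFFLINE" { print $1 }'
}

# Assumed usage from a scheduled job:
#   zpool status -x | pools_healthy || zpool status | unavailable_devices
```

Any device that the second helper reports can then be checked against the format command output as described above.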
Consider configuring the smtp-notify service to notify you when a hardware component is diagnosed as faulty. For more information, see the Notification Parameters section of smf(7) and smtp-notify(8).
By default, some notifications are set up automatically to be sent to the root user. If you add an alias for your user account as root in the /etc/aliases file, you will receive electronic mail notifications with information similar to the following:
SUNW-MSG-ID: ZFS-8000-8A, TYPE: Fault, VER: 1, SEVERITY: Critical
EVENT-TIME: Fri Jun 29 16:58:58 MDT 2012
...
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 76c2d1d1-4631-4220-dbbc-a3574b1ee807
DESC: A file or directory in pool 'pond' could not be read due to corrupt data.
AUTO-RESPONSE: No automated response will occur.
IMPACT: The file or directory is unavailable.
REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event.
Run 'zpool status -xv' and examine the list of damaged files to determine what
has been affected. Please refer to the associated reference document at
http://support.oracle.com/msg/ZFS-8000-8A for the latest service procedures and
policies regarding this diagnosis.
Monitor your storage pool space – Use the zpool list command and the zfs list command to identify how much disk space is consumed by file system data. ZFS snapshots also consume disk space, and because they might not be listed by the zfs list command, that consumption can go unnoticed. Use the zfs list -t snapshot command to identify disk space that is consumed by snapshots.
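Space monitoring can be scripted by filtering zpool list output. The following sketch assumes scripted output from zpool list -H -o name,capacity, which prints one pool per line with a percentage in the second column. The function name pools_over_limit and the default limit of 90 are assumptions.

```shell
# Hypothetical helper: reads "name capacity%" lines on standard input and
# prints the name of each pool at or above the limit (default 90).
pools_over_limit() {
    limit=${1:-90}
    tr -d '%' | while read -r name cap; do
        if [ "$cap" -ge "$limit" ]; then
            echo "$name"
        fi
    done
}

# Assumed usage on a live system:
#   zpool list -H -o name,capacity | pools_over_limit 90
```

A pool that this check reports is a candidate for a closer look with zfs list and zfs list -t snapshot to see where the space has gone.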