This section provides a list of guidelines for working with concatenations, stripes, mirrors, RAID5 metadevices, state database replicas, and file systems constructed on metadevices.
A concatenated metadevice uses less CPU time than a striped metadevice.
Concatenation works well for small random I/O.
Avoid using physical disks with different disk geometries.
Disk geometry refers to how sectors and tracks are organized for each cylinder in a disk drive. UFS organizes itself to use the disk geometry efficiently. If the slices in a concatenated metadevice have different disk geometries, DiskSuite uses the geometry of the first slice, which can reduce UFS file system efficiency.
Disk geometry differences do not matter with disks that use Zone Bit Recording (ZBR), because the amount of data on any given cylinder varies with the distance from the spindle. Most disks now use ZBR.
When constructing a concatenation, distribute slices across different controllers and busses. Cross-controller and cross-bus slice distribution can help balance the overall I/O load.
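For illustration, a two-slice concatenation spanning two controllers might be created as follows; the metadevice name d7 and the slice names are hypothetical:

# metainit d7 2 1 c0t0d0s4 1 c1t0d0s4

Here "2 1 ... 1 ..." tells metainit to build two stripes of one slice each, which is how metainit expresses a concatenation.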
Set the stripe's interlace value correctly.
The more physical disks in a striped metadevice, the greater the I/O performance. (The MTBF, however, will be reduced, so consider mirroring striped metadevices.)
Don't mix differently sized slices in a striped metadevice. A striped metadevice's usable capacity is limited by its smallest slice, because the stripe can use only that much space on each of its slices.
Avoid using physical disks with different disk geometries.
Distribute the striped metadevice across different controllers and busses.
Striping cannot be used to encapsulate existing file systems.
Striping performs well for large sequential I/O and for random I/O distributions.
Striping uses more CPU cycles than concatenation. However, it is usually worth it.
Striping does not provide any redundancy of data.
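As an illustration, a four-way stripe with a 32 Kbyte interlace might be created as follows; the metadevice name d10 and the slice names are hypothetical, and ideally each slice is on a separate controller:

# metainit d10 1 4 c0t0d0s4 c1t0d0s4 c2t0d0s4 c3t0d0s4 -i 32k

The "1 4" tells metainit to build one stripe of four slices, and -i sets the interlace value.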
Mirroring may improve read performance; write performance is always degraded.
Mirroring improves read performance only in threaded or asynchronous I/O situations; if there is just a single thread reading from the metadevice, performance will not improve.
Mirroring degrades write performance by about 15-50 percent, because two copies of the data must be written to disk to complete a single logical write. If an application is write intensive, mirroring will degrade overall performance. However, the write degradation with mirroring is substantially less than the typical RAID5 write penalty (which can be as much as 70 percent). Refer to Figure 7-1.
Note that the UNIX operating system implements a file system cache. Since read requests frequently can be satisfied from this cache, the read/write ratio for physical I/O through the file system can be significantly biased toward writing.
For example, an application I/O mix might be 80 percent reads and 20 percent writes. But, if many read requests can be satisfied from the file system cache, the physical I/O mix might be quite different--perhaps only 60 percent reads and 40 percent writes. In fact, if there is a large amount of memory to be used as a buffer cache, the physical I/O mix can even go the other direction: 80 percent reads and 20 percent writes might turn out to be 40 percent reads and 60 percent writes.
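As a sketch, a two-way mirror might be built from two single-slice submirrors on separate controllers; the metadevice names (d20, d21, d22) and slice names are hypothetical:

# metainit d21 1 1 c0t0d0s4
# metainit d22 1 1 c1t0d0s4
# metainit d20 -m d21
# metattach d20 d22

The mirror d20 is created with one submirror (d21); the second submirror (d22) is attached with metattach and is then resynchronized from d21.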
RAID5 can withstand only a single device failure.
A mirrored metadevice can withstand multiple device failures in some cases (for example, if the multiple failed devices are all on the same submirror). A RAID5 metadevice can only withstand a single device failure. Striped and concatenated metadevices cannot withstand any device failures.
RAID5 provides good read performance when there are no error conditions, but poor read performance under error conditions.
When a device fails in a RAID5 metadevice, read performance suffers because multiple I/O operations are required to regenerate the data from the data and parity on the existing drives. Mirrored metadevices do not suffer the same degradation in performance when a device fails.
RAID5 can cause poor write performance.
In a RAID5 metadevice, parity must be calculated and both data and parity must be stored for each write operation. Because of the multiple I/O operations required to do this, RAID5 write performance is generally reduced. In mirrored metadevices, the data must be written to multiple mirrors, but mirrored performance in write-intensive applications is still much better than in RAID5 metadevices.
RAID5 involves a lower hardware cost than mirroring.
RAID5 metadevices have a lower hardware cost than mirroring. Mirroring requires twice the disk storage (for a two-way mirror). In a RAID5 metadevice, the fraction of total capacity used to store parity is 1/(number of disks); for example, with five disks, one-fifth of the capacity holds parity.
RAID5 can't be used for existing file systems.
You can't encapsulate an existing file system in a RAID5 metadevice (you must back up and restore).
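For illustration, a RAID5 metadevice built from three slices on separate controllers with a 32 Kbyte interlace might look like this; the metadevice name d45 and the slice names are hypothetical:

# metainit d45 -r c2t0d0s4 c3t0d0s4 c4t0d0s4 -i 32k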
All replicas are written when the configuration changes.
Only two replicas (per mirror) are updated for mirror dirty region bitmaps.
A good average is two replicas per three mirrors.
Use two replicas per mirror for write-intensive applications.
Use two replicas per 10 mirrors for read-intensive applications.
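As a sketch, the following metadb(1M) commands create the initial replicas on two slices and then add two more replicas on a third slice; the slice names are hypothetical, and -f is needed only when creating the very first replicas:

# metadb -a -f c0t0d0s3 c1t0d0s3
# metadb -a -c 2 c2t0d0s3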
The default inode density value (-i option) for the newfs(1M) command is not optimal for large file systems. When creating a new file system with the newfs command, set the inode density to 1 inode per 8 Kbytes of file space (-i 8192), rather than the default 1 inode per 2 Kbytes. Typical files today approach 64 Kbytes or more in size, rather than the 1 Kbyte that was typical of files in 1980.
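For example, a new file system with the 8 Kbyte inode density could be created on a metadevice like this (the metadevice name is hypothetical):

# newfs -i 8192 /dev/md/rdsk/d10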
For large metadevices (greater than 8 Gbyte), it may be necessary to increase the size of a cylinder group to as many as 256 cylinders as in:
# newfs -c 256 /dev/md/rdsk/d114
(The man page in Solaris 2.3 and 2.4 incorrectly states that the maximum size is 32 cylinders.)
If possible, set your file system cluster size equal to an integral multiple of the stripe width.
For example, try the following parameters for sequential I/O:
maxcontig = 16 (16 * 8 Kbyte blocks = 128 Kbyte clusters)
interlace size = 32 Kbyte (32 Kbyte stripe unit size * 4 disks = 128 Kbyte stripe width)
A four-way stripe with a 32 Kbyte interlace value results in a 128 Kbyte stripe width, which matches the 128 Kbyte cluster size and is a good performance fit.
You can set the maxcontig parameter for a file system to control the file system I/O cluster size. This parameter specifies the maximum number of blocks, belonging to one file, that will be allocated contiguously before inserting a rotational delay.
Performance may be improved if the file system I/O cluster size is an integral multiple of the stripe width. For example, setting the maxcontig parameter to 16 results in 128 Kbyte clusters (16 blocks * 8 Kbyte file system block size).
The options to the mkfs(1M) command can be used to modify the default minfree, inode density, cylinders/cylinder group, and maxcontig settings. You can also use the tunefs(1M) command to modify the maxcontig and minfree settings.
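For example, maxcontig on an existing file system might be set to 16 with tunefs; the metadevice name is hypothetical:

# tunefs -a 16 /dev/md/rdsk/d10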
See the man pages for mkfs(1M), tunefs(1M), and newfs(1M) for more information.
Assign data to physical drives to evenly balance the I/O load among the available disk drives.
Identify the most frequently accessed data, and increase access bandwidth to that data with mirroring or striping.
Both striped metadevices and RAID5 metadevices distribute data across multiple disk drives and help balance the I/O load. In addition, mirroring can also be used to help balance the I/O load.
Use the DiskSuite Tool performance monitoring capabilities, and generic OS tools such as iostat(1M), to identify the most frequently accessed data. Once identified, the access bandwidth to this data can be increased using mirroring, striping, or RAID5.
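For example, extended device statistics can be sampled at 30-second intervals to see how I/O is distributed across the drives:

# iostat -x 30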