Solstice DiskSuite 4.2.1 Reference Guide

Chapter 7 Configuration Guidelines


This chapter describes some ways to set up your configuration. Use the following table to locate specific information in this chapter.

Configuration Planning Overview

When planning a configuration, the main point to keep in mind is that for any given application there are trade-offs in performance, availability, and hardware costs. Experimenting with the different variables is necessary to figure out what works best for your configuration.

Configuration Planning Guidelines

This section provides a list of guidelines for working with concatenations, stripes, mirrors, RAID5 metadevices, state database replicas, and file systems constructed on metadevices.

Concatenation Guidelines

Note -

Disk geometry differences do not matter with disks that use Zone Bit Recording (ZBR), because the amount of data on any given cylinder varies with the distance from the spindle. Most disks now use ZBR.

Striping Guidelines

Mirroring Guidelines

Figure 7-1 Mirror Performance Matrix


RAID5 Guidelines

State Database Replica Guidelines for Performance

File System Guidelines

General Performance Guidelines

RAID5 Metadevices and Striped Metadevices

This section compares performance issues for RAID5 metadevices and striped metadevices.

Optimizing for Random I/O and Sequential I/O

This section explains the differences between random I/O and sequential I/O, and DiskSuite strategies for optimizing your particular configuration.

Random I/O

Sequential Access I/O

Note -

Seek and rotation time are practically non-existent in the sequential case. When optimizing sequential I/O, the internal transfer rate of a disk is most important.

The most useful recommendation for the interlace size is: max-io-size / #-disks. Note that for UFS file systems, the maxcontig parameter controls the file system cluster size, which defaults to 56 Kbyte (7 blocks of 8 Kbyte). It may be useful to configure a larger cluster size for some sequential applications. For example, a maxcontig value of 12 results in 96 Kbyte file system clusters (12 * 8 Kbyte blocks = 96 Kbyte clusters). A 4-wide stripe with a 24 Kbyte interlace size results in a 96 Kbyte stripe width (4 * 24 Kbyte = 96 Kbyte), which is a good performance match.
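The cluster-size and stripe-width arithmetic above can be checked with a small sketch. It assumes 8 Kbyte file system blocks, as in the example; the function names are hypothetical helpers for illustration, not part of DiskSuite or UFS.

```python
# Hypothetical helpers for matching a UFS cluster size to a stripe width.
# Assumes 8 Kbyte file system blocks, as in the maxcontig example.

UFS_BLOCK_KBYTE = 8


def cluster_size_kbyte(maxcontig: int, block_kbyte: int = UFS_BLOCK_KBYTE) -> int:
    """Cluster size: maxcontig contiguous blocks of block_kbyte each."""
    return maxcontig * block_kbyte


def stripe_width_kbyte(num_disks: int, interlace_kbyte: int) -> int:
    """Stripe width: one interlace-sized chunk on each disk."""
    return num_disks * interlace_kbyte


cluster = cluster_size_kbyte(12)   # 12 * 8 Kbyte blocks
width = stripe_width_kbyte(4, 24)  # 4 disks * 24 Kbyte interlace
print(f"cluster={cluster} Kbyte, stripe width={width} Kbyte, match={cluster == width}")
```

This prints `cluster=96 Kbyte, stripe width=96 Kbyte, match=True`; `cluster_size_kbyte(7)` gives the 56 Kbyte default.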

Example: In sequential applications, the typical I/O size is large (greater than 128 Kbyte, often greater than 1 Mbyte). Assume an application with a typical I/O request size of 256 Kbyte, striped across 4 disk spindles. Dividing the I/O size by the number of disks gives 256 Kbyte / 4 = 64 Kbyte, so a good choice for the interlace size would be 32 to 64 Kbyte.
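The max-io-size / #-disks rule can be illustrated by counting how many spindles one typical request reaches. This is a sketch under the assumption that the request starts on a stripe boundary; both function names are hypothetical.

```python
# Hypothetical helpers illustrating the max-io-size / #-disks rule.
# Assumes the request starts on a stripe boundary.


def max_interlace_kbyte(io_size_kbyte: int, num_disks: int) -> int:
    """Largest interlace that still spreads one typical request over every disk."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return io_size_kbyte // num_disks


def disks_touched(io_size_kbyte: int, interlace_kbyte: int, num_disks: int) -> int:
    """Number of spindles an aligned request of io_size_kbyte touches."""
    chunks = (io_size_kbyte + interlace_kbyte - 1) // interlace_kbyte  # round up
    return min(chunks, num_disks)


print(max_interlace_kbyte(256, 4))  # 64
for interlace in (32, 64, 128):
    print(interlace, disks_touched(256, interlace, 4))
```

With a 32 or 64 Kbyte interlace the 256 Kbyte request uses all 4 spindles, while a 128 Kbyte interlace leaves it on only 2, which is why the interlace should not exceed the quotient.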

Number of stripes: Another way of looking at striping is to first determine the performance requirements. For example, you may need 10.4 Mbyte/sec performance for a selected application, and each disk may deliver approximately 4 Mbyte/sec. From these figures, determine how many disk spindles you need to stripe across:

10.4 Mbyte/sec / 4 Mbyte/sec = 2.6

Rounding up, 3 disks are needed.
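The calculation above is a division followed by rounding up. A minimal sketch, assuming (as the example implicitly does) that aggregate throughput scales linearly with the number of disks; `spindles_needed` is a hypothetical name.

```python
import math


def spindles_needed(required_mbyte_s: float, per_disk_mbyte_s: float) -> int:
    """Disks to stripe across, assuming throughput scales linearly with disk count."""
    if per_disk_mbyte_s <= 0:
        raise ValueError("per-disk rate must be positive")
    return math.ceil(required_mbyte_s / per_disk_mbyte_s)


print(spindles_needed(10.4, 4.0))  # 10.4 / 4 = 2.6, rounded up to 3
```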

Striping Trade-offs

To summarize the trade-offs: Striping delivers good performance, particularly for large sequential I/O and for uneven I/O distributions, but it does not provide any redundancy of data.

RAID5 Trade-offs

Write-intensive applications: Because of the read-modify-write nature of RAID5, metadevices whose I/O mix is more than about 20 percent writes should probably not be RAID5. If data protection is required, consider mirroring.

RAID5 writes will never be as fast as mirrored writes, which in turn will never be as fast as unprotected writes. The NVRAM cache on the SPARCstorage Array closes the gap between RAID5 and mirrored configurations.

Full stripe writes: RAID5 read performance is always good (unless the metadevice has suffered a disk failure and is operating in degraded mode), but write performance suffers because of the read-modify-write nature of RAID5.

In particular, when a write is smaller than a full stripe width or does not align with a stripe, multiple I/Os (a read-modify-write sequence) are required. First, the old data and parity are read into buffers. Next, the new parity is calculated with XORs: the old data is logically subtracted from the parity, and then the new data is logically added to it. The new parity and data are then written to a log. Finally, the new parity and new data are written to their stripe units.
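Because XOR is its own inverse, "subtracting" and "adding" data to the parity are the same operation. The sketch below models stripe units as equal-length byte strings (an assumption for illustration; the log write is not modeled) and checks that the incremental parity update matches parity recomputed from scratch. The helper names are hypothetical.

```python
# Sketch of the RAID5 read-modify-write parity update using XOR.
# Stripe units are modeled as equal-length byte strings (an assumption).
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("stripe units must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_new_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    without_old = xor_bytes(old_parity, old_data)  # "subtract" the old data
    return xor_bytes(without_old, new_data)        # "add" the new data


# Three data columns plus one parity column (a 4-column RAID5 stripe).
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = reduce(xor_bytes, data)

new_unit = b"\xaa\xbb"
updated_parity = rmw_new_parity(parity, data[1], new_unit)  # needs old data + old parity
data[1] = new_unit

# The incremental update agrees with recomputing parity over the whole stripe.
print(updated_parity == reduce(xor_bytes, data))  # True
```

The two reads (old data, old parity) that feed `rmw_new_parity` are the extra I/Os that make partial-stripe writes expensive.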

Full stripe width writes have the advantage of not requiring the read-modify-write sequence, so performance is not degraded as much. With full stripe writes, all new data stripe units are XORed together to generate the parity, and the new data and parity are written to a log. Then the new parity and new data are written to their stripe units in a single write.
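For a full stripe write, parity depends only on the new data, so nothing has to be read first. A minimal sketch, again modeling stripe units as equal-length byte strings; `full_stripe_parity` is a hypothetical name.

```python
from functools import reduce
from typing import Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("stripe units must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_units: Sequence[bytes]) -> bytes:
    """Parity for a full stripe write: XOR of all new data units, no reads needed."""
    if not data_units:
        raise ValueError("need at least one data unit")
    return reduce(xor_bytes, data_units)


stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]  # 3 data columns
print(full_stripe_parity(stripe).hex())  # 152a
```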

Full stripe writes are used when the I/O request aligns with the stripe and the I/O size exactly matches:

interlace_size * (num_of_columns - 1)

For example, if a RAID5 configuration is striped over 4 columns, each stripe uses 3 chunks to store data and 1 chunk to store the corresponding parity. In this case, full stripe writes are used when the I/O request starts at the beginning of a stripe and the I/O size equals interlace_size * 3. With an interlace size of 16 Kbyte, full stripe writes would be used for aligned I/O requests of 48 Kbyte.
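The full-stripe condition can be expressed as a small predicate. This is a sketch under stated assumptions: sizes are in Kbyte, and `offset` is measured in the metadevice's logical (data-only) address space, so each stripe holds `interlace * (num_columns - 1)` of data. The function names are hypothetical.

```python
def full_stripe_size(interlace: int, num_columns: int) -> int:
    """Data carried by one stripe: one interlace per column, minus the parity chunk."""
    if num_columns < 2:
        raise ValueError("RAID5 needs at least one data column plus parity")
    return interlace * (num_columns - 1)


def is_full_stripe_write(offset: int, size: int, interlace: int, num_columns: int) -> bool:
    """True if the I/O starts on a stripe boundary and is exactly one stripe of data."""
    width = full_stripe_size(interlace, num_columns)
    return offset % width == 0 and size == width


print(full_stripe_size(16, 4))              # 48
print(is_full_stripe_write(0, 48, 16, 4))   # True
print(is_full_stripe_write(96, 48, 16, 4))  # True: start of the third stripe
print(is_full_stripe_write(16, 48, 16, 4))  # False: not stripe-aligned
print(is_full_stripe_write(0, 32, 16, 4))   # False: smaller than a full stripe
```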

Performance in degraded mode: When a slice of a RAID5 metadevice fails, the parity is used to reconstruct the data; this requires reading from every column of the RAID5 metadevice. The more slices assigned to the RAID5 metadevice, the longer read and write operations (including resyncing the RAID5 metadevice) will take when I/O maps to the failed device.
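Degraded-mode reads rebuild the missing unit by XORing every surviving column, parity included, so each reconstructed read costs one read per remaining column. A minimal sketch with single-byte units; the helper names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("stripe units must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct(surviving_units: list) -> bytes:
    """Rebuild a lost unit from all surviving columns of the stripe (data and parity)."""
    return reduce(xor_bytes, surviving_units)


data = [b"\x10", b"\x20", b"\x40"]      # three data columns
parity = reduce(xor_bytes, data)        # fourth column holds parity
survivors = [data[0], data[2], parity]  # the column holding data[1] has failed
rebuilt = reconstruct(survivors)
print(rebuilt == data[1], len(survivors))  # True 3
```

Adding columns adds one more read to every reconstruction, which is why wider RAID5 metadevices suffer more in degraded mode.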

Logging Device Trade-offs

State Database Replicas

Note -

Replicas cannot be stored on the root (/), swap, or /usr slices, or on slices containing existing file systems or data.

Summary of State Database Replicas

Note -

If you create two replicas on each disk in a two-disk configuration, DiskSuite will still function if one disk fails. However, because one more than half of the total replicas must be available for the system to reboot, you will be unable to reboot in this state: only 2 of the 4 replicas remain, and 3 are required.
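The reboot rule in the note is a strict-majority check. A minimal sketch, assuming "one more than half" means a strict majority (half rounded down, plus one, for odd totals); the function names are hypothetical.

```python
def replicas_needed(total: int) -> int:
    """Minimum replicas for reboot: one more than half (strict majority)."""
    if total <= 0:
        raise ValueError("total replicas must be positive")
    return total // 2 + 1


def has_reboot_quorum(available: int, total: int) -> bool:
    return available >= replicas_needed(total)


total = 2 * 2              # two disks, two replicas each
after_failure = total - 2  # one disk, and its two replicas, lost
print(replicas_needed(total))                   # 3
print(has_reboot_quorum(total, total))          # True
print(has_reboot_quorum(after_failure, total))  # False: 2 of 4, but 3 needed
```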