StorageTek Storage Archive Manager and StorageTek QFS Software Installation and Configuration Guide
Release 5.4
E42062-02

12 Tuning I/O Characteristics for Special Needs

The basic file-system configuration steps described in the preceding chapters provide optimal, balanced performance in most situations. If you are at all uncertain about how your application behaves, you are usually better off leaving the settings described in this chapter at their default values. However, if your application makes unusually consistent or unusually large I/O requests, overall performance may benefit from tuning or changing the way in which the file system handles physical I/O.

Physical I/O is most efficient when all or most reads and writes begin and end exactly on the 512-byte boundary of a disk sector. Disk I/O can occur only in sector-sized chunks. So, when an I/O request straddles a sector boundary, the system must perform additional operations to separate the application data from unrelated data in the same sector while ensuring that the latter is not corrupted in the process. In the worst case, when writing across sectors, the file system has to read the sector, modify the sector data in memory to reflect the I/O request, and then write the sector back to disk. The additional mechanical activity alone makes such read-modify-write operations extremely costly in performance terms.
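
A quick way to check whether a given I/O request is sector-aligned is to test its starting and ending offsets against the 512-byte sector size. The following is a minimal shell sketch; the offset and length are illustrative values, not drawn from any particular application:

    # Illustrative alignment check: a 10,000-byte write at byte offset 1,000
    offset=1000
    length=10000
    echo $(( offset % 512 ))              # 488: the request starts mid-sector
    echo $(( (offset + length) % 512 ))   # 248: and ends mid-sector as well
    # Both results are 0 when a request is fully sector-aligned.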

Unfortunately, most applications need to read and write data in varied sizes that are not well aligned on sector boundaries. For this reason, like many file systems, SAM-QFS uses paged I/O by default. The file system handles immediate I/O requests from the application by reading from or writing to a data cache in Solaris paged memory. The file system asynchronously updates the cache with more efficiently sized and better aligned physical reads and writes. Whenever it reads data from disk, it can make the most of the physical I/O by anticipating upcoming reads and loading the corresponding data into cache in the same operation. Most I/O requests are thus met using data cached in virtual memory pages, with no additional physical disk activity. Paged I/O uses memory and imposes some additional load on the system CPU, but, in most cases, these costs are more than offset by greater physical I/O efficiency.

In a few cases, however, the extra overhead associated with paged I/O is not offset by its advantages. Applications that always perform well-aligned I/O, and applications that can be tuned to do so, gain nothing from page caching. Applications that perform extremely large I/Os may also gain little, because only the first and last sectors are misaligned and because the transfers may in any case be too large to be retained in cache. Finally, applications that stream telemetry data, surveillance video, or other types of real-time information risk irrecoverable data loss if writes are not immediately committed to non-volatile storage. In these cases, it may be better to use direct I/O. When direct I/O is specified, the file system transfers data between application memory and the disk device directly, bypassing the page cache.

SAM-QFS gives you considerable latitude when it comes to selecting and tuning I/O caching behavior. Once you understand the I/O characteristics of your application and have carried out the tasks described in "Tune Solaris System and Driver Parameters for Anticipated File System I/O", select your approach as follows:

Optimize Paged I/O for Larger Data Transfers

Paged I/O can be tuned to better match application and hardware characteristics. Reads to cache and writes from cache should be large enough to transfer either the average amount of data that the application transfers or the maximum amount of data that the physical storage can transfer, whichever is larger. If page caching is not tuned with both in mind, the cache is underutilized, application I/O requests require more physical I/O, and overall system performance suffers.

For example, consider the difference between an md data device implemented on a single disk volume and one implemented on a 3+1 RAID 5 volume group. If we were to handle each write request from the application by writing a single 64-kilobyte disk allocation unit (DAU) from cache to the latter, ignoring the additional bandwidth of the multiple-disk device, the RAID device would have to split the I/O into three smaller, still less efficient 21- and 22-kilobyte fragments before writing the data out to the three data disks in the group. Fulfilling 64-kilobyte application I/O requests would thus require significantly more work than it would had we used the page cache to assemble those requests into a single 3-DAU, 192-kilobyte I/O. If the application could make, or be tuned to make, I/O requests in even multiples of the device bandwidth (192, 384, or 576 kilobytes), we could cache still more data and transfer more with each physical I/O, further reducing overhead and boosting performance accordingly.
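
As a quick check of the arithmetic in this example, a short shell calculation reproduces the full-stripe write size. The DAU size and disk count are the assumptions stated above, not values read from a live configuration:

    dau_kb=64        # disk allocation unit (DAU), in kilobytes
    data_disks=3     # data spindles in the 3+1 RAID 5 group
    echo $(( dau_kb * data_disks ))   # 192: kilobytes in one full-width write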

So, identify the I/O requirements of your application and understand the I/O properties of your hardware. Then proceed as follows.

  1. Log in to the file system host as root.

    root@solaris:~# 
    
  2. Back up the operating system's /etc/vfstab file.

    root@solaris:~# cp /etc/vfstab /etc/vfstab.backup
    
  3. Open the /etc/vfstab file in a text editor, and locate the row for the file system that needs tuning.

    In this example, the file system is named qfsma:

    root@solaris:~# vi /etc/vfstab
    #File
    #Device    Device   Mount     System  fsck  Mount    Mount
    #to Mount  to fsck  Point     Type    Pass  at Boot  Options
    #--------  -------  --------  ------  ----  -------  -------------------------
    /devices   -        /devices  devfs   -     no       -
    ...
    qfsma      -        /qfsma    samfs   -     yes      ...
    
  4. In the Mount Options field for the file system, add the writebehind=n mount option, where n is a multiple of 8 kilobytes. Use a comma (no spaces) to separate mount options. Save the file and close the editor.

    The writebehind option determines how much of a given file can queue up in the page cache before the cache is flushed to disk. Setting the parameter to a higher value improves performance, because a large queue consolidates multiple small application writes into fewer, larger, more efficient physical I/Os. Setting the parameter lower better protects data, because changes are written to non-volatile storage sooner.

    The default value is 512 kilobytes (eight 64-kilobyte DAUs), which generally favors large-block, sequential I/O. But in this example, the family set contains two md disk devices with striped file allocation. The stripe width is one 64-kilobyte DAU, for a write of 128 kilobytes across the two md devices. Each md device is a 3+1 RAID 5 group. So we want to write at least 128 kilobytes to each of the three data spindles in each group, for a total write of at least 768 kilobytes across the two devices, or 96 multiples of 8 kilobytes (a quick arithmetic check appears after this procedure):

    #File
    #Device    Device   Mount     System  fsck  Mount    Mount
    #to Mount  to fsck  Point     Type    Pass  at Boot  Options
    #--------  -------  --------  ------  ----  -------  -------------------------
    /devices   -        /devices  devfs   -     no       -
    ...
    qfsma      -        /qfsma    samfs   -     yes      ...,writebehind=768
    :wq
    root@solaris:~# 
    
  5. Test the I/O performance of the file system and adjust the writebehind setting as needed.

  6. Re-open the /etc/vfstab file in a text editor. In the Mount Options field for the file system, add the readahead=n mount option, where n is a multiple of 8 kilobytes. Use a comma (no spaces) to separate mount options. Save the file and close the editor.

    The readahead option determines the amount of data that is read into cache during a single physical read. When an application appears to be reading sequentially, the file system caches upcoming blocks of file data during each physical read. A series of application read requests can then be handled from cache memory, consolidating several application read requests into a single physical I/O request.

    The default value is 1024 kilobytes (sixteen 64-kilobyte DAUs), which generally favors large-block, sequential I/O. If a database or similar application performs its own readahead, set SAM-QFS readahead to 0 to avoid conflicts. Otherwise, readahead should generally be set to cache the maximum amount of data that a single physical I/O can transfer. If the readahead setting is smaller than the amount of data that applications typically request and that devices can supply, fulfilling an application I/O request requires more physical I/Os than necessary. However, if readahead is set excessively high, it may consume enough memory to degrade overall system performance. In this example, we set readahead to 736 kilobytes (ninety-two 8-kilobyte units):

    #File
    #Device    Device   Mount     System  fsck  Mount    Mount
    #to Mount  to fsck  Point     Type    Pass  at Boot  Options
    #--------  -------  --------  ------  ----  -------  -------------------------
    /devices   -        /devices  devfs   -     no       -
    /proc      -        /proc     proc    -     no       -
    ...
    qfsma      -        /qfsma    samfs   -     yes      ...,readahead=736
    :wq
    root@solaris:~# 
    
  7. Test the I/O performance of the file system and adjust the readahead setting as needed.

    Increasing the readahead parameter improves the performance of large file transfers, but only up to a point. So test the performance of the system after resetting readahead, and then increase the value until you see no further improvement in transfer rates.
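
The sizing logic used in steps 4 and 6 can be restated as a short shell sketch. The figures are the assumptions from this example (two striped md devices, each a 3+1 RAID 5 group, 64-kilobyte DAUs); substitute the values for your own hardware:

    # writebehind: queue enough data to give every data spindle a full write
    per_spindle_kb=128   # target write per data spindle
    data_disks=3         # data spindles per 3+1 RAID 5 group
    md_devices=2         # striped md devices in the family set
    echo $(( per_spindle_kb * data_disks * md_devices ))   # 768 kilobytes

    # writebehind and readahead must both be multiples of 8 kilobytes
    echo $(( 768 % 8 )) $(( 736 % 8 ))   # 0 0: both example values qualify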

Enable Switching Between Paged and Direct I/O

You can configure SAM-QFS file systems to switch between paged and direct I/O when doing so better suits the I/O behavior of your application. You specify the sector-alignment and minimum-size characteristics of reads and writes that might benefit from direct I/O, and then set the number of consecutive qualifying reads and writes that should trigger the switch.
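
For orientation, the six mount options involved are summarized below; the defaults shown simply restate those given in the steps that follow:

    # dio_rd_form_min=n  kilobyte threshold for well-aligned reads   (default 256)
    # dio_wr_form_min=n  kilobyte threshold for well-aligned writes  (default 256)
    # dio_rd_ill_min=n   kilobyte threshold for misaligned reads     (default 0)
    # dio_wr_ill_min=n   kilobyte threshold for misaligned writes    (default 0)
    # dio_rd_consec=n    qualifying reads that trigger the switch    (default 0: disabled)
    # dio_wr_consec=n    qualifying writes that trigger the switch   (default 0: disabled)

Proceed as follows: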

  1. Log in to the file system host as root.

    root@solaris:~# 
    
  2. Back up the operating system's /etc/vfstab file.

    root@solaris:~# cp /etc/vfstab /etc/vfstab.backup
    
  3. Open the /etc/vfstab file in a text editor, and locate the row for the file system that you want to configure.

    In this example, the file system is named qfsma:

    root@solaris:~# vi /etc/vfstab
    #File     Device                      Mount
    #Device   to     Mount    System fsck at    Mount
    #to Mount fsck   Point    Type   Pass Boot  Options
    #-------- ------ -------- ------ ---- ----- ----------------------------------
    /devices  -      /devices devfs  -    no    -
    /proc     -      /proc    proc   -    no    -
    ...
    qfsma     -      /qfsma   samfs  -    yes   stripe=1
    
  4. To set a threshold size for starting direct I/O for read requests that align well with 512-byte sector boundaries, add the dio_rd_form_min=n mount option to the Mount Options field for the file system, where n is a number of kilobytes. Use a comma (no spaces) to separate mount options.

    By default, dio_rd_form_min=256 kilobytes. In the example, we know that our application does not produce consistently well-aligned reads until it requests a read of at least 512 kilobytes. So we change the threshold size for well-aligned direct reads to 512 kilobytes:

    #File     Device                      Mount
    #Device   to     Mount    System fsck at    Mount
    #to Mount fsck   Point    Type   Pass Boot  Options
    #-------- ------ -------- ------ ---- ----- ----------------------------------
    /devices  -      /devices devfs  -    no    -
    /proc     -      /proc    proc   -    no    -
    ...
    qfsma     -      /qfsma   samfs  -    yes   stripe=1,dio_rd_form_min=512
    
  5. To set a threshold size for starting direct I/O for write requests that align well with 512-byte sector boundaries, add the dio_wr_form_min=n mount option to the Mount Options field for the file system, where n is a number of kilobytes. Use a comma (no spaces) to separate mount options.

    By default, dio_wr_form_min=256 kilobytes. In the example, we know that our application does not produce consistently well-aligned writes until it requests a write of at least a megabyte. So we change the threshold size for well-aligned direct writes to 1024 kilobytes:

    #File     Device                      Mount
    #Device   to     Mount    System fsck at    Mount
    #to Mount fsck   Point    Type   Pass Boot  Options
    #-------- ------ -------- ------ ---- ----- ----------------------------------
    /devices  -      /devices devfs  -    no    -
    /proc     -      /proc    proc   -    no    -
    ...
    qfsma     -      /qfsma   samfs  -    yes   ...,dio_wr_form_min=1024
    
  6. To set a threshold size for starting direct I/O for read requests that do not align well with 512-byte sector boundaries, add the dio_rd_ill_min=n mount option to the Mount Options field for the file system, where n is a number of kilobytes. Use a comma (no spaces) to separate mount options.

    By default, dio_rd_ill_min=0 kilobytes, so direct I/O is not used for misaligned reads. In the example, we know that our application generally makes misaligned read requests for small chunks of data. Much of this data is subsequently reread. So page caching is likely to be beneficial for these reads. Switching to direct I/O would cause needless additional physical I/O and reduced performance. So we accept the default and make no changes to the vfstab file:

    #File     Device                      Mount
    #Device   to     Mount    System fsck at    Mount
    #to Mount fsck   Point    Type   Pass Boot  Options
    #-------- ------ -------- ------ ---- ----- ----------------------------------
    /devices  -      /devices devfs  -    no    -
    /proc     -      /proc    proc   -    no    -
    ...
    qfsma     -      /qfsma   samfs  -    yes   ...,dio_wr_form_min=1024
    
  7. To set a threshold size for starting direct I/O for write requests that do not align well with 512-byte sector boundaries, add the dio_wr_ill_min=n mount option to the Mount Options field for the file system, where n is a number of kilobytes. Use a comma (no spaces) to separate mount options.

    By default, dio_wr_ill_min=0 kilobytes, so direct I/O is not used for misaligned writes. Misaligned writes can be particularly costly in performance terms, because the system has to read, modify, and rewrite sectors. In this example, however, we know that our application occasionally makes large, single write requests that do not fall on sector boundaries. Since read-modify-write operations are limited to the beginning and end of a large run of sequential sectors, the benefits of direct I/O outweigh those of paged I/O. So we set dio_wr_ill_min=2048 kilobytes:

    #File     Device                      Mount
    #Device   to     Mount    System fsck at    Mount
    #to Mount fsck   Point    Type   Pass Boot  Options
    #-------- ------ -------- ------ ---- ----- ----------------------------------
    /devices  -      /devices devfs  -    no    -
    /proc     -      /proc    proc   -    no    -
    ...
    qfsma     -      /qfsma   samfs  -    yes   ...,dio_wr_ill_min=2048
    
  8. To enable direct I/O for reads, add the dio_rd_consec=n mount option to the Mount Options field, where n is the number of consecutive I/O transfers that must meet the size and alignment requirements specified above in order to trigger the switch to direct I/O. Choose a value that singles out the application operations that benefit from direct I/O. Use a comma (no spaces) to separate mount options.

    By default, dio_rd_consec=0, so I/O switching is disabled. In this example, we know that, once our application makes three successive, well-aligned reads of at least the size specified by dio_rd_form_min (512 kilobytes), it will continue to do so for long enough to make direct I/O worthwhile. The misaligned-read threshold, dio_rd_ill_min, remains at its default of 0, so enabling direct I/O will not affect misaligned read requests. So we set dio_rd_consec=3:

    #File     Device                      Mount 
    #Device   to     Mount    System fsck at    Mount
    #to Mount fsck   Point    Type   Pass Boot  Options
    #-------- ------ -------- ------ ---- ----- ----------------------------------
    /devices  -      /devices devfs  -    no    -
    /proc     -      /proc    proc   -    no    -
    ...
    qfsma     -      /qfsma   samfs  -    yes   ...,dio_rd_consec=3
    
  9. To enable direct I/O for writes, add the dio_wr_consec=n mount option to the Mount Options field, where n is the number of consecutive I/O transfers that must meet the size and alignment requirements specified above in order to trigger the switch to direct I/O. Choose a value that singles out the application operations that benefit from direct I/O. Use a comma (no spaces) to separate mount options.

    By default, dio_wr_consec=0, so I/O switching is disabled. In this example, we know that, once our application makes two successive, well-aligned writes of at least the size specified by dio_wr_form_min (1024 kilobytes), it will continue to do so for long enough to make direct I/O worthwhile. We also know that two successive, misaligned writes larger than dio_wr_ill_min (2048 kilobytes) will be large enough that the misalignment matters relatively little. So we set dio_wr_consec=2:

    #File     Device                      Mount 
    #Device   to     Mount    System fsck at    Mount
    #to Mount fsck   Point    Type   Pass Boot  Options
    #-------- ------ -------- ------ ---- ----- ----------------------------------
    /devices  -      /devices devfs  -    no    -
    /proc     -      /proc    proc   -    no    -
    ...
    qfsma     -      /qfsma   samfs  -    yes   ...,dio_wr_consec=2
    
  10. Save the vfstab file, and close the editor.

    #File     Device                      Mount 
    #Device   to     Mount    System fsck at    Mount
    #to Mount fsck   Point    Type   Pass Boot  Options
    #-------- ------ -------- ------ ---- ----- ----------------------------------
    /devices  -      /devices devfs  -    no    -
    /proc     -      /proc    proc   -    no    -
    ...
    qfsma     -      /qfsma   samfs  -    yes   ...,dio_wr_consec=2
    :wq
    root@solaris:~# 
    
  11. Mount the modified file system:

    root@solaris:~# mount /qfsma
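
Once the file system is remounted, you can confirm that the new options are active by examining the live mount options. The following is a minimal check, assuming the standard Solaris mount utilities and the qfsma mount point used in this example:

    root@solaris:~# mount -v | grep qfsma    # verbose listing includes mount options
    root@solaris:~# grep qfsma /etc/mnttab   # mnttab also records the active options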
    

Configure the File System to Use Direct I/O Exclusively

When the I/O characteristics of applications make exclusive use of direct I/O desirable, you can mount entire file systems using the forcedirectio mount option (for information on how to specify direct I/O for individual files or directories, see the SAM-QFS setfa man page).
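
For example, to request direct I/O for the files under a single directory rather than for an entire file system, you might use the setfa command's -D (direct I/O) option. This is a minimal sketch; /qfsma/video is a hypothetical directory, and you should confirm the syntax against the setfa man page for your release:

    root@solaris:~# setfa -D /qfsma/video    # files in this directory use direct I/O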

To mount a file system to use direct I/O exclusively, proceed as follows:

  1. Log in to the file system host as root.

    root@solaris:~# 
    
  2. Back up the operating system's /etc/vfstab file.

    root@solaris:~# cp /etc/vfstab /etc/vfstab.backup
    
  3. Open the /etc/vfstab file in a text editor, and locate the row for the file system where you want to use direct I/O.

    In this example, the file system is named qfsma:

    root@solaris:~# vi /etc/vfstab
    #File
    #Device    Device   Mount     System  fsck  Mount    Mount
    #to Mount  to fsck  Point     Type    Pass  at Boot  Options
    #--------  -------  --------  ------  ----  -------  -------------------------
    /devices   -        /devices  devfs   -     no       -
    /proc      -        /proc     proc    -     no       -
    ...
    qfsma      -        /qfsma    samfs   -     yes      stripe=1
    
  4. In the Mount Options field for the file system, add the forcedirectio mount option. Use a comma (no spaces) to separate mount options. Save the file, and close the editor.

    #File
    #Device    Device   Mount     System  fsck  Mount    Mount
    #to Mount  to fsck  Point     Type    Pass  at Boot  Options
    #--------  -------  --------  ------  ----  -------  -------------------------
    /devices   -        /devices  devfs   -     no       -
    /proc      -        /proc     proc    -     no       -
    ...
    qfsma      -        /qfsma    samfs   -     yes      stripe=1,forcedirectio
    :wq
    root@solaris:~# 
    
  5. Mount the modified file system:

    root@solaris:~# mount /qfsma
    

Increase the Directory Name Lookup Cache Size

The default size of the Oracle Solaris directory name lookup cache (DNLC) on the metadata server may prove inadequate when the clients of a shared file system open many files at the same time. The metadata server looks up file names on behalf of all clients, so file system performance may suffer under these conditions.

If you anticipate this kind of workload, change the value of the directory name lookup cache size parameter, ncsize, to double or triple the default. For instructions, see the Oracle Solaris Tunable Parameters Reference Manual, available in the Oracle Solaris Information Library (see the "Available Documentation" section of the Preface).
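
As a hedged sketch of the usual Solaris procedure, you would check the current value with the kernel debugger and then set the new one in the /etc/system file. The ncsize value below is purely illustrative, and changes to /etc/system take effect only after a reboot:

    root@solaris:~# echo "ncsize/D" | mdb -k   # display the current DNLC size
    root@solaris:~# vi /etc/system
    set ncsize=250000                          # illustrative: two to three times the reported value
    :wq
    root@solaris:~# init 6                     # reboot so that the new value takes effect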