7.4. Storage Performance and Tuning

7.4.1. Sizing Guidelines for Sun ZFS Storage Servers
7.4.2. About ZFS Storage Caches
7.4.3. Managing the ZIL on Oracle Solaris Platforms
7.4.4. Oracle VDI Global Settings for Storage
7.4.5. About Block Alignment

7.4.1. Sizing Guidelines for Sun ZFS Storage Servers

The recommended disk layout is RAID 10, mirrored sets in a striped set, with ZFS striping the data automatically across multiple sets. This layout is called "mirrored" on the 7000 series. While this layout uses 50% of the available disk capacity for redundancy, it is faster than RAID 5 for intense small random reads and writes, which is the typical access pattern for iSCSI.

The storage servers provide the virtual disks that are accessed by Oracle VM VirtualBox through iSCSI. Because iSCSI is a CPU-intensive protocol, the number of cores in the storage server is a decisive factor for its performance. Other important factors are the memory size (cache), the number of disks, and the available network bandwidth.

The network bandwidth requirement is very volatile and is determined by the ratio of desktops starting up (peak network bandwidth) to desktops that have cached the applications in use (average network bandwidth). Starting a virtual machine (an XP desktop) creates a network load of 150 MB, which needs to be satisfied in around 30 seconds. If many desktops are started at the same time, the requested network bandwidth may exceed 1 Gb/s, provided the CPUs of the storage can handle the load created by the iSCSI traffic. This scenario is typical for shift-work companies. In such a case, set the Pool, Cloning, or Machine State option to Running, which keeps the desktops running at all times and therefore decouples the OS boot from the login of a user. Another option is to trunk several interfaces to provide more than 1 Gb/s of bandwidth through one IP address. You can also use Jumbo Frames to speed up iSCSI connections. Jumbo Frames must be configured for all participants in the network: storage servers, VirtualBox servers, and switches. Note that Jumbo Frames are not standardized, so there is a risk of incompatibilities.

Oracle VDI, in combination with VirtualBox, uses the Sparse Volume feature of ZFS, which enables it to allocate more disk space for volumes than is physically available, as long as the actual data written does not exceed the capacity of the storage. This feature, combined with the fact that cloned desktops reuse unchanged data of their templates, results in very efficient use of the available disk space. Therefore, the calculation for disk space below is a worst-case scenario that assumes all volumes are completely filled with data that differs from the template.

  • Number of disks = number of users * user IOPS * (read ratio * read penalty + write ratio * write penalty) / disk IOPS

    See Calculating the Number of Disks for an explanation of this formula and an example.

  • Number of cores = number of virtual disks in use / 200

    Example: An x7210 storage server with 2 CPUs and 4 cores per CPU can serve up to 2 * 4 * 200 = 1600 virtual disks

  • Memory size - The more the better. The free memory can be used as a disk cache, which reduces the access time.

  • Average Network bandwidth [Mb/s] = number of virtual disks in use * 0.032 Mb/s

    Example: An x7210 storage with one Gigabit Ethernet interface can serve up to 1000 / 0.032 = 31250 virtual disks

  • Peak Network bandwidth [Mb/s] = number of virtual disks in use * 40 Mb/s

    Example: An x7210 storage with one Gigabit Ethernet interface can serve up to 1000 / 40 = 25 virtual disks

  • Disk space [GB] = number of desktops * size of the virtual disk [GB]

    Example: An x7210 storage with a capacity of 46 TB can support 46 * 1024 GB / 2 / 8 GB = 2944 8 GB disks in a RAID 10 configuration
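The sizing formulas above can be combined into a short calculator. The following is a minimal sketch; the function names are illustrative, and the figures in the usage lines are the x7210 example values from this section, to be replaced with your own hardware data.

```python
# Sketch of the sizing formulas above. The usage values are the x7210
# example figures from this section; replace them with your own data.

def max_virtual_disks_by_cores(cpus, cores_per_cpu):
    # Number of cores = number of virtual disks in use / 200
    return cpus * cores_per_cpu * 200

def max_virtual_disks_by_avg_bandwidth(bandwidth_mbps):
    # Average network bandwidth [Mb/s] = virtual disks in use * 0.032 Mb/s
    return round(bandwidth_mbps / 0.032)

def max_virtual_disks_by_peak_bandwidth(bandwidth_mbps):
    # Peak network bandwidth [Mb/s] = virtual disks in use * 40 Mb/s
    return bandwidth_mbps // 40

def max_desktops_by_disk_space(capacity_tb, virtual_disk_gb):
    # RAID 10 uses half of the raw capacity for redundancy.
    usable_gb = capacity_tb * 1024 // 2
    return usable_gb // virtual_disk_gb

print(max_virtual_disks_by_cores(2, 4))           # 1600
print(max_virtual_disks_by_avg_bandwidth(1000))   # 31250
print(max_virtual_disks_by_peak_bandwidth(1000))  # 25
print(max_desktops_by_disk_space(46, 8))          # 2944
```

As the examples show, peak bandwidth is by far the most restrictive limit, which is why decoupling desktop boot from user login matters for shift-work scenarios.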

Note

For details about how to improve desktop performance, see the sections on optimizing desktop images in Section 5.3, “Creating Desktop Images”.

Calculating the Number of Disks

The calculation for the number of disks depends on several factors, as follows:

  • Disk IOPS: The capability of the disks in terms of physical input/output operations per second (IOPS).

    The following table shows the typical disk IOPS for various disk speeds (in revolutions per minute or RPM) or disk types.

    Disk RPM or Type    Disk IOPS
    SSD                 10,000
    15,000 RPM          175
    10,000 RPM          125
    7,200 RPM           75
    5,400 RPM           50

  • User IOPS: The input/output operations per second generated by users when they use applications in their desktops.

    The user IOPS value depends largely on the applications used and how they are used. The following table shows some sample IOPS based on long-time averages for different user types and Windows platforms.

    Windows User Type            User IOPS
    Windows 7 task worker        7
    Windows 7 knowledge user     15
    Windows 7 power user         25
    Windows XP task worker       5
    Windows XP knowledge user    10
    Windows XP power user        20

  • Disk read:write IOPS ratio: This depends on the cache available to both the operating system of the desktop and, most importantly, the storage.

    For Sun ZFS storage, the more Adaptive Replacement Cache (ARC) and Second Level Adaptive Replacement Cache (L2ARC) that is available, the fewer read IOPS are performed, which frees capacity for more write IOPS. Decreasing the read IOPS also optimizes the disk head movements: write IOPS are cached and written in bursts to optimize head movements, but read requests can disrupt this optimization. Typical read:write ratios range from 40:60 to 20:80, or even lower.

  • Read and Write Penalty: The selected RAID configuration of the storage has a read and write penalty, as shown in the following table.

    RAID Type    Read Penalty    Write Penalty
    RAID0        1               1
    RAID1        1               2
    RAID10       1               2
    RAID5        1               4
    RAID6        1               6

Example 7.1. Example Calculation

The number of disks needed for 1000 Windows XP task workers using 10,000 RPM disks with a 20:80 read:write ratio in a RAID10 array is:

1000 * 5 * (0.2 * 1 + 0.8 * 2) / 125 = 72
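The calculation above can be reproduced with a short script. This is a minimal sketch; the parameter values are taken from the tables in this section, and the result is rounded up because a fraction of a disk cannot be used.

```python
import math

def number_of_disks(users, user_iops, read_ratio, write_ratio,
                    read_penalty, write_penalty, disk_iops):
    # Number of disks = number of users * user IOPS *
    #   (read ratio * read penalty + write ratio * write penalty) / disk IOPS
    total_iops = users * user_iops * (read_ratio * read_penalty +
                                      write_ratio * write_penalty)
    return math.ceil(total_iops / disk_iops)

# Example 7.1: 1000 Windows XP task workers (5 user IOPS), 20:80 read:write
# ratio, RAID10 (read penalty 1, write penalty 2), 10,000 RPM disks
# (125 disk IOPS).
print(number_of_disks(1000, 5, 0.2, 0.8, 1, 2, 125))  # 72
```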


7.4.2. About ZFS Storage Caches

This section provides a brief overview of the cache structure and performance of ZFS, and how it maps to the hardware of a Sun ZFS Storage Appliance.

Background

The Zettabyte File System (ZFS) is the underlying file system on the supported Sun ZFS storage platforms.

The Adaptive Replacement Cache (ARC) is the ZFS read cache in the main memory (DRAM).

The Second Level Adaptive Replacement Cache (L2ARC) is used to store read cache data outside of the main memory. Sun ZFS Storage Appliances use read-optimized SSDs (known as Readzillas) for the L2ARC. SSDs are slower than DRAM but still much faster than hard disks. The L2ARC allows for a very large cache, which improves the read performance.

The ZFS Intent Log (ZIL) satisfies the POSIX requirements for synchronous writes and crash recovery. It is not used for asynchronous writes. The ZFS system calls are logged by the ZIL and contain sufficient information to replay them in the event of a system crash. Sun ZFS Storage Appliances use write-optimized SSDs (known as Writezillas or Logzillas) for the ZIL. If Logzillas are not available, the hard disks are used.

The write cache is used to store data in volatile (not battery-backed) DRAM for faster writes. There are no system calls logged in the ZIL if the write cache is enabled on a Sun ZFS Storage Appliance.

Performance Considerations

Size the read cache so that it stores as much data as possible to improve performance. Maximize the ARC (DRAM) first, then add L2ARC (Readzillas).

Oracle VDI enables the write cache by default for every iSCSI volume used by Oracle VDI. This configuration is very fast and does not make use of Logzillas, as the ZIL is not used. Without ZIL, data might be at risk if a Sun ZFS Storage Appliance reboots or experiences a power loss while desktops are active. However, it does not cause corruption in ZFS itself.

Disable the write cache in Oracle VDI to minimize the risk of data loss. Without Logzillas, the ZIL is backed by the available hard disks and performance suffers noticeably. Use Logzillas to speed up the ZIL. If you have two or four Logzillas, use the striped profile to further improve performance.

To switch off the in-memory write cache, select a storage in Oracle VDI Manager, click Edit to open the Edit Storage wizard, and deselect the Cache check box. The change is applied to newly created desktops for Oracle VM VirtualBox and to newly started desktops for Microsoft Hyper-V virtualization platforms.

7.4.3. Managing the ZIL on Oracle Solaris Platforms

On Oracle Solaris 10 10/09 (and later) storage platforms, you can increase performance on the storage by disabling the ZFS Intent Log (ZIL). However, it is important to understand that the performance gains are at the expense of synchronous disk I/O and data integrity in the event of a storage failure.

Managing the ZIL - Oracle Solaris 10 9/10 (Update 9) and Earlier

On Oracle Solaris 10 9/10 (Update 9) platforms and earlier, you can disable the ZIL temporarily or permanently.

If you disable the ZIL temporarily, the ZIL is re-enabled when the system is rebooted. If you disable the ZIL permanently, it remains disabled after the system is rebooted. When you change the ZIL setting, the setting is applied to a ZFS pool only when the pool is mounted. For the new setting to take effect, the ZFS pool must be created, mounted, or imported after the setting is changed.

If you disable the ZIL permanently, the ZIL is disabled for all ZFS pools following a reboot. This can cause undesirable behavior if the system's root volume is a ZFS volume because there is no synchronous disk I/O. In this situation, it is best practice to use a storage host with at least two disks. Format the first disk using the UFS file system, and use that disk for the operating system. Format the other disks using ZFS, and use those disks as the ZFS storage. In this way, the ZIL can be disabled without affecting the performance of the operating system.

To disable the ZIL temporarily, run the following command as superuser (root):

# echo zil_disable/W0t1 | mdb -kw

To re-enable a temporarily-disabled ZIL, run the following command as superuser (root):

# echo zil_disable/W0t0 | mdb -kw

To disable the ZIL permanently, edit the /etc/system file as superuser (root) and add the following line:

set zfs:zil_disable=1

Managing the ZIL - Oracle Solaris 10 8/11 (Update 10) and Later

Starting with Oracle Solaris 10 8/11 (Update 10), the steps for disabling the ZIL changed. The ZIL is configured as a ZFS property on a dataset. This means different ZFS datasets can have different ZIL settings and so you can disable the ZIL for a storage pool without affecting the ZFS volume of the operating system.

To disable the ZIL, run the following command as superuser (root):

# zfs set sync=disabled dataset

The change takes effect immediately and the ZIL remains disabled on reboot.

To re-enable the ZIL, run the following command as superuser (root):

# zfs set sync=standard dataset

7.4.4. Oracle VDI Global Settings for Storage

This section provides information about the Oracle VDI global settings that apply to storage. Use the vda settings-getprops and vda settings-setprops commands to list and edit these settings.

Table 7.1. Storage Tuning Properties

Global Setting

Description

storage.max.commands

The number of commands executed on a storage in parallel.

The default is 10.

Changing this setting requires a restart of the Oracle VDI service.

The setting is global for an Oracle VDI installation and applies to a physical storage determined by its IP or DNS name.

The number of Oracle VDI hosts does not influence the maximum number of parallel storage actions executed by Oracle VDI on a physical storage. If you see intermittent "unresponsive storage" messages, reduce this number to reduce the storage load. Doing so impacts cloning and recycling performance.

This setting works even if the Oracle VDI Center Agent is no longer running on the host.

This setting applies only to Sun ZFS storage used with Oracle VM VirtualBox (on Oracle Solaris) and Microsoft Hyper-V desktop providers.

storage.query.size.interval

The interval in seconds at which the Oracle VDI service queries the storage for its total and available disk space.

The default is 180 seconds.

As there is only one Oracle VDI host which does this, there is typically no need to change this setting.

This setting applies to all storage types.

storage.watchdog.interval

The interval in seconds at which the Oracle VDI service queries the storage for its availability.

The default is 30 seconds.

As there is only one Oracle VDI host which does this, there is typically no need to change this setting.

This setting applies to all storage types.

storage.fast.command.duration

The time in seconds after which the Oracle VDI service considers a fast storage command to have failed.

The default is 75 seconds.

Changing this setting requires a restart of the Oracle VDI service.

The only Oracle VDI functionality which uses this command duration is the storage watchdog which periodically pings the storage for its availability.

This setting applies only to Sun ZFS storage used with Oracle VM VirtualBox (on Oracle Solaris) and Microsoft Hyper-V desktop providers.

storage.medium.command.duration

The time in seconds after which the Oracle VDI service considers a medium storage command to have failed.

The default is 1800 seconds (30 minutes).

Changing this setting requires a restart of the Oracle VDI service.

The majority of the storage commands used by Oracle VDI use this command duration.

This setting applies only to Sun ZFS storage used with Oracle VM VirtualBox (on Oracle Solaris) and Microsoft Hyper-V desktop providers.

storage.slow.command.duration

The time in seconds after which the Oracle VDI service considers a slow storage command to have failed.

The default is 10800 seconds (3 hours).

Changing this setting requires a restart of the Oracle VDI service.

Only a few complex storage scripts used by Oracle VDI use this command duration. Such scripts are not run very often, typically once per day.

This setting applies only to Sun ZFS storage used with Oracle VM VirtualBox (on Oracle Solaris) and Microsoft Hyper-V desktop providers.


The storage.max.commands setting is the setting that is most often changed. By default, Sun ZFS Storage Appliances can only execute four commands in parallel, and the remaining commands are queued. To achieve better performance, Oracle VDI intentionally overcommits the storage queue. If your storage becomes slow, for example because of a heavy load, it can take too long for queued commands to be executed, and if the commands take longer than the duration specified in the duration settings, the storage might be marked incorrectly as unresponsive. If this happens regularly, you can decrease the value of the storage.max.commands setting, but this might result in a decrease in performance when the storage is not so busy.

The interval settings rarely need to be changed because the commands are performed only by the primary host in an Oracle VDI Center. Decreasing the value of these settings results in more up-to-date information about the storage disk space and a quicker detection of unresponsive storage hosts, but also increases the load on the storage hosts. It is best to keep these settings at their defaults.

The duration settings include a good safety margin. Only change the duration settings if the storage is not able to execute the commands in the allotted time.

7.4.5. About Block Alignment

Classic hard disks have a block size of 512 bytes. Depending on the guest operating system of the virtual machine, one logical block of the guest file system can span two blocks on the storage. This is known as block misalignment. Figure 7.2 shows an example. It is best to avoid block misalignment because it doubles the I/O on the storage needed to access a block of the guest OS file system (assuming a completely random access pattern and no caching).

Figure 7.2. Examples of Misaligned and Aligned Blocks with Sun ZFS Storage

The diagram shows two examples of a virtual disk, one is aligned correctly and one is misaligned.

By default, Windows XP does not correctly align partitions and the blocks are misaligned. Usually Windows 7 and later versions do align partitions correctly and the blocks are aligned.

Checking the Block Alignment

Typically a single partition on a disk starts at disk sector 63. To check the alignment of a Windows partition, use the following command:

wmic partition get StartingOffset, Name, Index

The following is an example of the output from this command:

Index Name                   StartingOffset 
0     Disk #0, Partition #0  32256

To find the starting sector, divide the StartingOffset value by 512 (the block size of the hard disk):

32256 ÷ 512 = 63

An NTFS cluster is typically 4 kilobytes in size. So the first NTFS cluster starts at disk sector 63 and ends at disk sector 70.

Storage types that use the Zettabyte File System (ZFS) have a default block size of 8 kilobytes. So on the storage, the fourth ZFS block maps to disk sectors 48 to 63, and the fifth ZFS block maps to disk sectors 64 to 79.

Storage types that use the Oracle Cluster File System version 2 (OCFS2) have a default block size of 4 kilobytes. So on the storage, the eighth OCFS2 block maps to disk sectors 56 to 63, and the ninth OCFS2 block maps to disk sectors 64 to 71.

A misalignment occurs on both storage types because more than one block on the storage must be accessed to access the first NTFS cluster, as shown in Figure 7.2.

For a correct block alignment, the StartingOffset value must be exactly divisible by either 8192 or 4096, depending on the block size used by the file system on the storage.

In the following example, the blocks are misaligned:

wmic partition get StartingOffset, Name, Index
Index Name                   StartingOffset 
0     Disk #0, Partition #0  32256

32256 ÷ 8192 = 3.9375

32256 ÷ 4096 = 7.875

In the following example, the blocks are aligned:

wmic partition get StartingOffset, Name, Index
Index Name                   StartingOffset 
0     Disk #0, Partition #0  32768

32768 ÷ 8192 = 4

32768 ÷ 4096 = 8
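The divisibility check used in both examples can be scripted. The following is a minimal sketch; 8192 and 4096 are the default ZFS and OCFS2 block sizes discussed above, and the offsets are the two StartingOffset values from the examples.

```python
def is_aligned(starting_offset, storage_block_size):
    # A partition is aligned when its byte offset on the disk is an
    # exact multiple of the storage block size.
    return starting_offset % storage_block_size == 0

for offset in (32256, 32768):
    for block_size in (8192, 4096):  # ZFS and OCFS2 default block sizes
        print(offset, block_size, is_aligned(offset, block_size))
```

An offset of 32256 is misaligned for both storage block sizes, while 32768 is aligned for both.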

Correcting Block Alignment

On Windows 2003 SP1 and later, the diskpart.exe utility has an Align option to specify the block alignment of partitions. For Windows XP, use a third-party disk partitioning tool, such as parted, to create partitions with a defined start sector; see the example that follows. For other operating systems, refer to your system documentation for details of how to align partitions.

Example of How to Prepare a Disk for Windows XP with Correct Block Alignment

In this example, the disk utilities on a bootable live Linux system, such as Knoppix, are used to create a disk partition with the blocks aligned correctly.

  1. Create a new virtual machine.

  2. Assign the ISO image of the live Linux system to the CD/DVD-ROM drive of the virtual machine.

  3. Boot the virtual machine.

  4. Open a command shell and become root.

  5. Obtain the total number of sectors of the disk.

    Use the fdisk -ul command to obtain information about the disk.

    In the following example, the disk has 20971520 sectors:

    # fdisk -ul
    Disk /dev/sda doesn't contain a valid partition table
    
    Disk /dev/sda: 10.7 GB, 10737418240 bytes
    255 heads, 63 sectors/track, 1305 cylinders, total 20971520 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00000000
  6. Create an MS-DOS partition table on the disk.

    Use the parted disk mklabel msdos command to create the partition table.

    In the following example, a partition table is created on the /dev/sda disk:

    # parted /dev/sda mklabel msdos
  7. Create a new partition, specifying the start and end sectors of the partition.

    Use the parted disk mkpartfs primary fat32 64s end-sector command to create the partition. The end-sector is the total number of sectors of the disk minus one. For example, if the disk has 20971520 sectors, the end-sector is 20971519. Starting the partition at sector 64 places it at byte offset 64 * 512 = 32768, which is exactly divisible by both 8192 and 4096.

    Depending on the version of parted used, you might see a warning that the partition is not properly aligned for best performance. You can safely ignore this warning.

    In the following example, a partition is created on the /dev/sda disk:

    # parted /dev/sda mkpartfs primary fat32 64s 20971519s                          
  8. Check that the partition is created.

    Use the parted disk print command to check the partition.

    In the following example, the /dev/sda disk is checked for partitions:

    # parted /dev/sda print
    Model: ATA VBOX HARDDISK (scsi)
    Disk /dev/sda: 10.7GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos
    
    Number  Start   End     Size    Type     File system  Flags
     1      32.8kB  10.7GB  10.7GB  primary  fat32        lba
  9. Shut down the virtual machine and unassign the ISO image.

  10. Assign the Windows XP installation ISO image to the CD/DVD-ROM drive of the virtual machine.

  11. Boot the virtual machine and install Windows XP.

  12. When prompted, select the newly created partition.

  13. (Optional) When prompted, change the file system from FAT32 to NTFS.

  14. Complete the installation.

  15. Log in to the Windows XP guest as an administrator.

  16. Check that the StartingOffset is 32768.

    wmic partition get StartingOffset, Name, Index
    Index Name                   StartingOffset 
    0     Disk #0, Partition #0  32768