Message Store Performance Considerations (Sun Java Communications Suite 5 Deployment Planning Guide)

Sun Java Communications Suite 5 Deployment Planning Guide

Message Store Performance Considerations

Message store performance is affected by a variety of factors, including:

Disk I/O
Inbound message rate (also known as message insertion rate)
Message sizes
Use of S/MIME
Login rate (POP/IMAP/HTTP)
Transaction rate for IMAP and HTTP
Concurrent number of connections for the various protocols
Network I/O
Use of SSL

The preceding factors list the approximate order of impact to the Message Store. Most performance issues with the Message Storage arise from insufficient disk I/O capacity. Additionally, the way in which you lay out the store on the physical disks can also have a performance impact. For smaller standalone systems, it is possible to use a simple stripe of disks to provide sufficient I/O. For most larger systems, segregate the file system and provide I/O to the various parts of store.

Messaging Server Directories

Messaging Server uses six directories that receive a significant amount of input and output activity. If you require a deployment that is scalable, responsive, and resilient to variations in load, provide each of those directories with sufficient I/O bandwidth. When you provide separate file systems for these directories, each composed of multiple drives, you can more readily diagnose I/O bottlenecks and problems. Also, you can isolate the effect of storage failures and simplify the resulting recovery operations. In addition, place a seventh directory for DB snapshots on a file system separate from the active DB to preserve it in the event of a storage failure of the active DB file system.

The following table describes these directories.

Table 11–1 High Access Messaging Server Directories


High I/O Directory	Description and Defining Parameter
MTA queue directory	In this directory, many files are created, one for each message that passes through the MTA channels. After the file is sent to the next destination, the file is then deleted. The directory location is controlled by the `IMTA_QUEUE` option in the `imta_tailor` file. Before modifying the MTA queue directory, read about this option in the Sun Java System Messaging Server 6.3 Administration Reference. Default location: `/var/opt/SUNWmsgsr/queue`
Messaging Server log directory	This directory contains log files which are constantly being appended with new logging information. The number of changes will depend on the logging level set. The directory location is controlled by the `configutil` parameter `logfile..logdir`, where can be a log-generating component such as admin, default, http, imap, or pop. The MTA log files can be changed with the IMTA_LOG option in the `imta_tailor` file. Default location: `/var/opt/SUNWmsgsr/log`
Mailbox database files	These files require constant updates as well as cache synchronization. Put this directory on your fastest disk volume. These files are always located in the `/var/opt/SUNWmsgsr/store/mboxlist` directory.
Message store index files	These files contain meta information about mailboxes, messages, and users. By default, these files are stored with the message files. The `configutil` parameter `store.partition..path`, where is the name of the partition, controls the directory location. If you have the resources, put these files on your second fastest disk volume. Default location: `/var/opt/SUNWmsgsr/store/partition/primary`
Message files	These files contain the messages, one file per message. Files are frequently created, never modified, and eventually deleted. By default, they are stored in the same directory as the message store index files. The location can be controlled with the `configutil` parameter `store.partition.partition_name.messagepath`, where `partition_name` is the name of the partition. Some sites might have a single message store partition called `primary` specified by `store.partition.primary.path`. Large sites might have additional partitions that can be specified with `store.partition.partition_name.messagepath`, where `partition_name` is the name of the partition. Default location: `/var/opt/SUNWmsgsr/store/partition/primary`
Mailbox list database temporary directory	The directory used by the Message Store for all temporary files. To maximize performance, this directory should be located under the fastest file system. For Solaris, use the `configutil` command to configure the `store.dbtmpdir` variable to a directory under `tmpfs`, for example, `/tmp/mboxlist`. Default location: `/var/opt/SUNWmsgsr/store/mboxlist`

The following sections provide more detail on Messaging Server high access directories.

MTA Queue Directories

In non-LMTP environments, the MTA queue directories in the Message Store system are also heavily used. LMTP works such that inbound messages are not put in MTA queues but directly inserted into the store. This message insertion lessens the overall I/O requirements of the Message Store machines and greatly reduces use of the MTA queue directory on Message Store machines. If the system is standalone or uses the local MTA for Webmail sends, significant I/O can still result on this directory for outbound mail traffic. In a two-tiered environment using LMTP, this directory will be lightly used, if at all. In prior releases of Messaging Server, on large systems this directory set needs to be on its own stripe or volume.

MTA queue directories should usually be on their own file systems, separate from the message files in the Message Store. The Message Store has a mechanism to stop delivery and appending of messages if the disk space drops below a defined threshold. However, if both the log and queue directories are on the same file system and keep growing, you will run out of disk space and the Message Store will stop working.

Log Files Directory

The log files directory requires varying amounts of I/O depending on the level of logging that is enabled. The I/O on the logging directory, unlike all of the other high I/O requirements of the Message Store, is asynchronous. For typical deployment scenarios, do not dedicate an entire Logical Unit Number (LUN) for logging. For very large store deployments, or environments where significant logging is required, a dedicated LUN is in order.

In almost all environments, you need to protect the Message Store from loss of data. The level of loss and continuous availability that is necessary varies from simple disk protection such as RAID5, to mirroring, to routine backup, to real time replication of data, to a remote data center. Data protection also varies from the need for Automatic System Recovery (ASR) capable machines, to local HA capabilities, to automated remote site failover. These decisions impact the amount of hardware and support staff required to provide service.

mboxlist Directory

The mboxlist directory is highly I/O intensive but not very large. The mboxlist directory contains the databases that are used by the stores and their transaction logs. Because of its high I/O activity, and due to the fact that the multiple files that constitute the database cannot be split between different file systems, you should place the mboxlist directory on its own stripe or volume in large deployments. This is also the most likely cause of a loss of vertical scalability, as many procedures of the Message Store access the databases. For highly active systems, this can be a bottleneck. Bottlenecks in the I/O performance of the mboxlist directory decrease not only the raw performance and response time of the store but also impact the vertical scalability. For systems with a requirement for fast recovery from backup, place this directory on Solid State Disks (SSD) or a high performance caching array to accept the high write rate that an ongoing restore with a live service will place on the file system.

Multiple Store Partitions

The Message Store supports multiple store partitions. Place each partition on its own stripe or volume. The number of partitions that should be put on a store is determined by a number of factors. The obvious factor is the I/O requirements of the peak load on the server. By adding additional file systems as additional store partitions, you increase the available IOPS (total IOs per second) to the server for mail delivery and retrieval. In most environments, you will get more IOPS out of a larger number of smaller stripes or LUNs than a small number of larger stripes or LUNs.

With some disk arrays, it is possible to configure a set of arrays in two different ways. You can configure each array as a LUN and mount it as a file system. Or, you can configure each array as a LUN and stripe them on the server. Both are valid configurations. However, multiple store partitions (one per small array or a number of partitions on a large array striping sets of LUNs into server volumes) are easier to optimize and administer.

Raw performance, however, is usually not the overriding factor in deciding how many store partitions you want or need. In corporate environments, it is likely that you will need more space than IOPS. Again, it is possible to software stripe across LUNs and provide a single large store partition. However, multiple smaller partitions are generally easier to manage. The overriding factor of determining the appropriate number of store partitions is usually recovery time.

Recovery times for store partitions fall into a number of categories:

First of all, the fsck command can operate on multiple file systems in parallel on a crash recovery caused by power, hardware, or operating system failure. If you are using a journaling file system (highly recommended and required for any HA platform), this factor is small.
Secondly, backup and recovery procedures can be run in parallel across multiple store partitions. This parallelization is limited by the vertical scalability of the mboxlist directory as the Message Store uses a single set of databases for all of the store partitions. Store cleanup procedures (expire and purge) run in parallel with one thread of execution per store partition.
Lastly, mirror or RAID re-sync procedures are faster with smaller LUNs. There are no hard and fast rules here, but the general recommendation in most cases is that a store partition should not encompass more than 10 spindles.

The size of drive to use in a storage array is a question of the IOPS requirements versus the space requirements. For most residential ISP POP environments, use “smaller drives.” Corporate deployments with large quotas should use “larger” drives. Again, every deployment is different and needs to examine its own set of requirements.

Message Store Processor Scalability

The Message Store scales well, due to its multiprocess, multithreaded nature. The Message Store actually scales more than linearly from one to four processors. This means that a four processor system will handle more load than a set of four single processor systems. The Message Store also scales fairly linearly from four to 12 processors. From 12 to 16 processors, there is increased capacity but not a linear increase. The vertical scalability of a Message Store is more limited with the use of LMTP although the number of users that can be supported on the same size store system increases dramatically.

Setting the Mailbox Database Cache Size

Messaging Server makes frequent calls to the mailbox database. For this reason, it helps if this data is returned as quickly as possible. A portion of the mailbox database is cached to improve Message Store performance. Setting the optimal cache size can make a big difference in overall Message Store performance. You set the size of the cache with the configutil parameter store.dbcachesize.

You should use the configutil parameter store.dbtmpdir to redefine the location of the mailbox database to /tmp, that is, /tmp/mboxlist.

The mailbox database is stored in data pages. When the various daemons make calls to the database (stored, imapd, popd), the system checks to see if the desired page is stored in the cache. If it is, the data is passed to the daemon. If not, the system must write one page from the cache back to disk, and read the desired page and write it in the cache. Lowering the number of disk read/writes helps performance, so setting the cache to its optimal size is important.

If the cache is too small, the desired data will have to be retrieved from disk more frequently than necessary. If the cache is too large, dynamic memory (RAM) is wasted, and it takes longer to synchronize the disk to the cache. Of these two situations, a cache that is too small will degrade performance more than a cache that is too large.

Cache efficiency is measured by hit rate. Hit rate is the percentage of times that a database call can be handled by cache. An optimally sized cache will have a 99 percent hit rate (that is, 99 percent of the desired database pages will be returned to the daemon without having to grab pages from the disk). The goal is to set the cache so that it holds a number of pages such that the cache will be able to return at least 95 percent of the requested data. If the direct cache return is less than 95 percent, then you need to increase the cache size.

To Adjust the Mailbox Database Cache Size

Become the mailsrv user (or whatever user you set mailsrv to).

Using root or any other user for this task can cause problems with the database.

Set the LD_LIBRARY_PATH to /opt/SUNWmsgsr/lib.

Set the size of the cache with the configutil parameter store.dbcachesize.

To Monitor the Mailbox Database Cache Size

Beginning with the Messaging Server 6.3 release, use the imcheck command to measure the cache hit rate. Prior to the Messaging Server 6.3 release, use the database command db_stat.

Display the cache hit rate.

Messaging Server 7.0: Run the imcheck -s mpool command.

Messaging Server 6.3: Run the imcheck -s command.

Messaging Server 6.2: Run the db_stat command as follows. In this example, the configutil parameter store.dbtmpdir has redefined the location of the mailbox database to /tmp, that is, /tmp/mboxlist.

# /opt/SUNWmsgsr/lib/db_stat -m -h /tmp/mboxlist

2MB 513KB 604B  Total cache size.
1                   Number of caches.
2MB 520KB           Pool individual cache size.
0                   Requested pages mapped into the process’ address space.
55339               Requested pages found in the cache (99%).

Examine the cache hit rate.

In this case, the hit rate is 99 percent. This could be optimal or, more likely, it could be that the cache is too large. To test, lower the cache size until the hit rate moves to below 99 percent. When you hit 98 percent, you have optimized the DB cache size. Conversely, if see a hit rate of less than 95 percent, then you should increase the cache size with the store.dbcachesize parameter. The maximum size is the total of all the *.db files in the store/mboxlist directory. The cache size should not exceed the total size of all of the .db files under the store/mboxlist directory.

As your user base changes, the hit rate can also change. Periodically check and adjust this parameter as necessary.

This parameter has an upper limit of 2 GB imposed by the database.

Setting Disk Stripe Width

When setting disk striping, the stripe width should be about the same size as the average message passing through your system. A stripe width of 128 blocks is usually too large and has a negative performance impact. Instead, use values of 8, 16, or 32 blocks (4, 8, or 16 kilobyte message respectively).