When you design your deployment, you must decide how to configure your Messaging Server to provide optimum performance, scalability, and reliability.
Sizing is an important part of this effort. The sizing process enables you to identify what hardware and software resources are needed so that you can deliver your desired level of service or response time according to the estimated workload that your Messaging Server users generate. Sizing is an iterative effort.
This chapter introduces the basics of sizing your Messaging Server deployment to enable you to obtain the right sizing data by which you can make deployment decisions. It also provides the context and rationale for the Messaging Server sizing process.
Because each deployment has its own set of unique features, this chapter does not provide detailed sizing information for your specific site. Rather, this chapter explains what you need to consider when you architect your sizing plan. Work with your Sun technical representative for your deployment hardware and software needs.
Your peak volume is the largest concentrated number of transactions to your messaging system within a given period in a day. The volume can vary from site to site as well as across different classes of users. For example, peak volume among a certain class of managers in a medium-sized enterprise might occur from 9 to 10 in the morning, 12 to 1 in the afternoon, and 5 to 6 in the evening.
Analyzing peak volume involves three basic operations:
Determining when and for how long the peaks occur
Sizing your deployment against peak volume load assumptions
Once patterns are analyzed, choices can be made to help the system handle the load and provide the services that users demand.
Making sure that your Messaging Server deployment can support the peak volume that you have determined.
This section helps you create your usage profile to measure the amount of load that is placed on your deployment.
To create a usage profile, answer the following questions:
What is the number of users on your system?
When counting the number of users on your system, account for not only the users who have mail accounts and can log in to the mail system, but also the users with mail accounts who are currently not logged onto the system. In particular, note the difference between active and inactive users:
Active user: A user who is logged in to the mail system through a mail access protocol such as POP, IMAP, or HTTP. Depending on the type of access protocol, active users might or might not have connections to the mail server at any given time.
For example, POP users can have a mail client open, but the POP connection established by the mail client to the server is short in duration and periodic.
Active users in this discussion are not the same as mail attributes with active status, such as mailuserstatus or inetuserstatus. For more information on mail attributes, see Sun Java Communications Suite 5 Schema Reference.
Inactive user: A user with a mail account who is not currently using the mail system.
If you have a very small deployment (for example, under 300 users), you might not need to go through this process of planning a sizing strategy. Work with your Sun Client Services representative to determine your individual needs.
Specifically, note the number of concurrent, idle, and busy connections for each client access service that you support:
Concurrent connections: The number of unique TCP connections or sessions (HTTP, POP, or IMAP) that are established on your mail system at any given time.
An active user can have multiple concurrent IMAP sessions, whereas a user with a POP or Messenger Express client can only have one connection per client. Furthermore, because POP and Messenger Express connections connect to the server, retrieve data, disconnect from the server, display data, get user input, and reconnect to the mail server, it is possible for active users on POP and Messenger Express client access services not to have active connections at a given moment in time.
Idle connection: An established IMAP connection where no information is being sent between the mail client and Messaging Server, except the occasional check or noop command.
Busy connection: A connection that is in progress. An example of a busy connection is one in which the mail server is processing a command that the mail client has just sent, or is sending a response back to the mail client.
Count the number of established TCP connections by using the netstat command on UNIX platforms.
Obtain the last login and logout times for Messenger Express or for IMAP users. See the Sun Java System Messaging Server 6.3 Administration Guide for more information.
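As a rough illustration of the netstat approach, the following Python sketch counts established TCP connections per client access service from netstat-style text. The six-column, Linux-style layout, the port-to-service mapping (110 for POP, 143 for IMAP, 80 for HTTP), and the sample addresses are all assumptions made for this example. netstat output differs between UNIX platforms, so the parsing would need to be adapted to yours.

```python
from collections import Counter

# Assumed default service ports; adjust to match your deployment.
SERVICE_PORTS = {"110": "POP", "143": "IMAP", "80": "HTTP"}


def count_established(netstat_output: str) -> Counter:
    """Count ESTABLISHED TCP connections per service.

    Assumes lines of the form:
    proto recv-q send-q local-address foreign-address state
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, non-TCP lines, and connections in other states.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # The local port is the text after the last ':' of the local address.
        port = fields[3].rsplit(":", 1)[-1]
        service = SERVICE_PORTS.get(port)
        if service:
            counts[service] += 1
    return counts


# Hypothetical sample output using documentation-only IP addresses.
SAMPLE = """\
tcp 0 0 192.0.2.10:143 198.51.100.7:50112 ESTABLISHED
tcp 0 0 192.0.2.10:143 198.51.100.8:50440 ESTABLISHED
tcp 0 0 192.0.2.10:110 198.51.100.9:41002 TIME_WAIT
tcp 0 0 192.0.2.10:80 198.51.100.9:41050 ESTABLISHED
"""

if __name__ == "__main__":
    print(count_established(SAMPLE))
```

Sampling this count at regular intervals during the day gives a picture of when concurrent connections peak for each service.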
If you have a large deployment, how will you organize your users?
Some options include but are not limited to:
Placing active users and inactive users on separate machines from one another
If an inactive user becomes an active user, that user can be moved to the active user machines. This approach could require less hardware than placing inactive and active users together on the same machine.
Separating users by Class of Service
You might separate individual contributors, managers, and executives on machines that offer different mail storage space allocation for each class of service, different privileges, and specialized services.
What is the amount of storage used on each mailbox?
When you measure the amount of storage per mailbox, you should estimate real usage per mailbox, not the specified quota. Messages in trash or wastebasket folders still take up disk space and quota.
How many messages enter your messaging system from the Internet?
The number of messages should be measured in messages per second during your peak volume.
How many messages are sent by your users to:
End users on your mail system?
This number of messages is also measured in messages per second during the peak volume.
What is the distribution of messages in different size ranges?
Less than 5 Kbytes?
Between 5 Kbytes and 10 Kbytes?
Between 10 Kbytes and 100 Kbytes?
Between 100 Kbytes and 500 Kbytes?
Between 500 Kbytes and 10 Mbytes?
Greater than 10 MB?
If the distribution of message sizes is not available, use the average message size on your mail system. However, an average is less useful for sizing than a distribution across size ranges.
The size of messages is particularly important, because it affects the rate of delivery of the MTA, the rate of delivery into the Message Store, the rate of message retrieval, and processing by anti-virus or anti-spam filters.
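If you can export message sizes from your existing system, a distribution across the ranges listed above can be computed with a few lines of Python. This is a minimal sketch: the function name, the bucket labels, and the choice of 1 Kbyte = 1024 bytes are assumptions for illustration, and how you obtain the list of sizes depends on your current mail system.

```python
from bisect import bisect_right

KB = 1024  # assumption: 1 Kbyte = 1024 bytes
# Upper bounds of the size ranges listed above; the last bucket is open-ended.
BOUNDS = [5 * KB, 10 * KB, 100 * KB, 500 * KB, 10 * KB * KB]
LABELS = ["<5 KB", "5-10 KB", "10-100 KB", "100-500 KB", "500 KB-10 MB", ">10 MB"]


def size_distribution(sizes_in_bytes):
    """Return the percentage of messages that fall in each size range."""
    counts = [0] * len(LABELS)
    for size in sizes_in_bytes:
        # bisect_right finds the first bound strictly greater than size,
        # which is the index of the bucket the message belongs to.
        counts[bisect_right(BOUNDS, size)] += 1
    total = len(sizes_in_bytes)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * n / total for label, n in zip(LABELS, counts)}


if __name__ == "__main__":
    # Hypothetical message sizes in bytes.
    for label, pct in size_distribution([1000, 2000, 6000, 50_000]).items():
        print(f"{label:>14}: {pct:5.1f}%")
```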
Will you be using SSL/TLS? If yes, what percentage of users and what type of users?
For example, in a particular organization, 20 percent of IMAP connections during peak hours will enable SSL.
Do you plan on using any SSL crypto accelerator hardware?
Will you be using virus scanning or other specialized message processing and will this processing be enabled for all users?
Depending on your Messaging Server configuration, the MTA will need to scan all messages to match criteria specified in specialized processing, thus increasing load on the system.
For IMAP users, will you enforce a standard client or allow users to choose their own?
Different IMAP clients make different numbers of concurrent connections to the server. Thus, a power user with many open folders might have many concurrent connections.
Will you allow users to share folders? If so, will you allow all users or only some?
Answering these questions provides a preliminary usage profile for your deployment. You can refine your usage profile as your Messaging Server needs change.
While the following questions are not applicable to creating your usage profile, they are important to developing your sizing strategy. How you answer these questions might require you to consider additional hardware.
How much redundancy do you want in your deployment?
For example, you might consider high availability. Consider how much down time is allowed, and if you need clustering technology.
Do you need a DMZ to separate your internal and external networks? Are all users using the internal network? Or do some of them connect by using the Internet?
You might need MMP proxy servers and separate MTA layers.
What are your response time requirements? What are your throughput requirements?
What are your specific criteria for resource utilization? Can your CPUs be 80 percent busy on average? Or only at peak?
Will you have messaging servers at different geographic locations? Do you expect users’ mail to be located geographically?
Do you have archiving requirements for keeping mail messages for a certain length of time?
Do you have legal requirements to log all messages? Do you need to keep a copy of every message sent and received?
Once you establish a usage profile, compare it to sample pre-defined user bases that are described in this section. A user base is made up of the types of messaging operations that your users will perform along with a range of message sizes that your users will send and receive. Messaging users fall into one of five user bases:
The sample user bases described in this section broadly generalize user behavior. Your particular usage profile might not exactly match the user bases. You will be able to adjust these differences when you run your load simulator (as described in Using a Messaging Server Load Simulator).
A lightweight POP user base typically consists of residential dial-up users with simple messaging requirements. Each concurrent client connection sends approximately four messages per hour. These users read and delete all of their messages within a single login session. In addition, these users compose and send few messages of their own with just single recipients. Approximately 80 percent of messages are 5 Kbytes or smaller in size, and about 20 percent of messages are 10 Kbytes or larger.
A heavyweight POP user base typically consists of premium broadband users or small business accounts with more sophisticated messaging requirements than the lightweight POP user base. This group uses cable modem or DSL to access its service provider. Each concurrent client connection sends approximately six messages per hour. Messages average about two recipients per message. Sixty-five percent of messages are 5 Kbytes or smaller in size. Thirty percent of messages in this user base are between 5 and 10 Kbytes. Five percent of messages are larger than 1 Mbyte. Of these users, 85 percent delete all of their messages after reading them. However, 15 percent of users leave messages on the server through several logins before they delete them. Mail builds up in a small portion of those mailboxes. In some cases, the same message can be fetched several times from the server.
A lightweight IMAP user base represents users who enable premium broadband Internet services, including most of the advanced features of their messaging systems, such as message searching and client filters. This user base is similar to heavyweight POP with regard to message sizes, number of recipients, and number of messages sent and received by each concurrent connection. Lightweight IMAP users typically log in for hours at a time and delete most or all mail before logging out. Consequently, mail stacks up in a mailbox during a login session, but users generally do not store more than 20 to 30 messages in their mailboxes. Most inboxes contain fewer than 10 messages.
A mediumweight IMAP user base represents sophisticated enterprise users with login sessions lasting most of an eight-hour business day. These users send, receive, and keep a large amount of mail. Furthermore, these users have unlimited or very large message quotas. Their inboxes contain a large amount of mail that grows during the day, and is fully or partially purged in large spurts. They regularly file messages into folders and search for messages multiple times per hour. Each concurrent client connection sends approximately eight messages per hour. These users send messages with an average of four recipients and have the same message size mix as the Heavyweight POP and Lightweight IMAP user bases.
A mediumweight Messenger Express/Communications Express user base is similar to Mediumweight IMAP. This user base has the same message size mix as Mediumweight IMAP, Lightweight IMAP, and Heavyweight POP. And, the message delivery rates are the same as Mediumweight IMAP users.
It is likely that you will have more than one type of user base in your organization, particularly if you offer more than one client access option. Once you identify your user bases from these categories, you will test them with your usage profile and with a load simulator, described in Using a Messaging Server Load Simulator.
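To make the sample user bases easier to compare with your usage profile, it can help to write them down as data. The sketch below records the per-connection send rates and recipient counts quoted above and converts them to per-second figures for a given number of concurrent connections. The class and function names are hypothetical, and multiplying submissions by average recipients is a simplification that ignores mail arriving from the Internet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserBase:
    name: str
    msgs_per_hour: float   # messages sent per concurrent connection
    avg_recipients: float  # average recipients per message


# Figures taken from the user base descriptions in this section.
USER_BASES = {
    "lightweight_pop": UserBase("Lightweight POP", 4, 1),
    "heavyweight_pop": UserBase("Heavyweight POP", 6, 2),
    "lightweight_imap": UserBase("Lightweight IMAP", 6, 2),
    "mediumweight_imap": UserBase("Mediumweight IMAP", 8, 4),
}


def peak_rates(base: UserBase, concurrent_connections: int):
    """Return (submissions per second, recipient deliveries per second)."""
    submitted = concurrent_connections * base.msgs_per_hour / 3600.0
    return submitted, submitted * base.avg_recipients


if __name__ == "__main__":
    # Hypothetical deployment: 9000 concurrent mediumweight IMAP connections.
    sent, delivered = peak_rates(USER_BASES["mediumweight_imap"], 9000)
    print(f"{sent:.1f} submissions/s, {delivered:.1f} recipient deliveries/s")
```

Figures like these are only starting values; the load simulator is where they get adjusted to match your actual usage profile.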
To measure the performance of your Messaging Server, use your messaging user bases (described in Defining Your Messaging User Base) and your messaging usage profile (described in Creating Your Messaging Usage Profile) as inputs into a load simulator.
A load simulator creates a peak volume environment and calibrates the amount of load placed on your servers. You can determine if you need to alter your hardware, throughput, or deployment architecture to meet your expected response time, without overloading your system.
Define the user base that you want to test (for example, Lightweight IMAP).
If necessary, adjust individual parameters to best match your usage profile.
Define the hardware that will be tested.
Run the load simulator and measure the maximum number of concurrent connections on the tested hardware with the user base.
Publish your results and compare those results with production deployments.
Repeat this process using different user bases and hardware until you get the response time that is within an acceptable range for your organization under peak load conditions.
Contact Sun Client Services for recommended load simulators and support.
Once you evaluate your hardware and user base with a load simulator, you need to assess your system performance. The following topics address methods by which you can improve your overall system performance.
Make sure you have an adequate amount of physical memory on each machine in your deployment. Additional physical memory improves performance and enables the server to operate at peak volume. Without sufficient memory, Messaging Server swaps excessively and cannot operate efficiently.
Disk throughput is the amount of data that your system can transfer from memory to disk and from disk to memory. The rate at which this data can be transferred is critical to the performance of Messaging Server. To create efficiencies in your system’s disk throughput:
Consider your maintenance operations, and ensure you have enough bandwidth for backup. Backup can also affect network bandwidth particularly with remote backups. Private backup networks might be a more efficient alternative.
Carefully partition the store and separate store data items (such as tmp and db) to improve throughput efficiency.
Ensure the user base is distributed across RAID (Redundant Array of Independent Disks) environments in large deployments.
Stripe data across multiple disk spindles in order to speed up operations that retrieve data from disk.
Allocate enough CPU resources for RAID support, if RAID does not exist on your hardware.
Measure disk I/O in terms of IOPS (total I/O operations per second), not bandwidth. You need to measure the number of unique disk transactions the system can handle with a very low response time (less than 10 milliseconds).
When planning server system disk space, you need to be sure to include space for operating environment software, Messaging Server software, and message content and tracking. Be sure to use an external disk array if availability is a requirement. For most systems, external disks are required for performance because the internal system disks supply no more than four spindles.
In addition, user disk space needs to be allocated. Typically, this space is determined by your site’s policy.
Your deployment planning needs to include how you want to back up the Message Store for disaster recovery. Messaging Server supports Solstice Backup (Legato Networker), the imsbackup utility, and file system snapshot backup. You might want to store your backup media remotely. The more frequently you perform a backup, the better, as long as it doesn’t impact server operations.
The Messaging Server MTA Queue provides a transient store for messages waiting to be delivered. Messages are written to disk in a persistent manner to maintain guaranteed service delivery. If the MTA is unable to deliver the message, it will retry until it finally gives up and returns the message to the sender.
Sizing the MTA Queue disks is an important step for improving MTA performance. The MTA's performance is tied to disk I/O more directly than to any other system resource. This means that you should plan on a disk volume that consists of multiple disk spindles, which are concatenated and striped by using a disk RAID system.
End users are quickly affected by the MTA performance. As users press the SEND button on their email client, the MTA will not fully accept receipt of the message until the message has been committed to the Message Queue. Therefore, improved performance on the Message Queue results in better response times for the end-user experience.
SMTP services are considered a guaranteed message delivery service. This is an assurance to end users that the messaging server will not lose messages that the service is attempting to deliver. When you architect the design of the MTA Queue system, all effort should be made to ensure that messages will not be lost. This guarantee is usually made by implementing redundant disk systems through various RAID technologies.
The queue will grow excessively if one of the following conditions occurs:
The site has excessive network connectivity issues
The MTA configuration is holding on to messages too long
There are valid problems with those messages (not covered in this document)
The following sections address these issues.
Occasionally the MTA is unable to deliver messages due to network connectivity issues. In these cases, the messages will be stored on the queue until the next time the MTA is able to attempt to deliver (as defined by the retry interval).
Planning on disk space for these outages is based on a simple rule, the “General Rule for Message Queue Sizing:”
Determine the average number of messages per minute expected to be delivered (N).
Determine the average size, in Kbytes, of messages (S).
Determine the maximum duration, in minutes, of typical network connectivity outages (T).
Thus, the formula for estimating the Disk Queue Size is:
Disk Queue Size (Kbytes) = N x S x T
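The rule is simple enough to work by hand, but a small function makes it easy to try different assumptions. In this sketch the function name and the input figures (600 messages per minute, 20 Kbytes average size, a 120-minute outage) are hypothetical values chosen only to show the arithmetic.

```python
def disk_queue_size_kb(msgs_per_minute, avg_size_kb, outage_minutes):
    """Estimate queue disk space in Kbytes as N x S x T."""
    return msgs_per_minute * avg_size_kb * outage_minutes


if __name__ == "__main__":
    # Hypothetical inputs: N = 600 msgs/min, S = 20 Kbytes, T = 120 minutes.
    size_kb = disk_queue_size_kb(600, 20, 120)
    print(f"{size_kb} Kbytes (about {size_kb / 1024 / 1024:.2f} Gbytes)")
```

With these inputs the estimate is 1,440,000 Kbytes, or roughly 1.4 Gbytes, before any allowance for growth or for the redundancy overhead of the RAID layout you choose.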
Occasionally, the system will not be able to deliver any messages. In this state, messages reside on the message queue: the MTA sets each message aside for a period of time (the retry interval) and then reattempts delivery. This continues until the MTA gives up and returns the message to the sender. The reason a message is undeliverable is fairly unpredictable. A number of reasons such as network connectivity, busy destination server, network throttles, and so on, could explain why the message is undeliverable.
On a busy server, these temporarily stored messages can build up during periods of high volume activities. Such a build-up can potentially cause problems with disk space. To avoid these build-ups, tune the MTA to retry delivery at a faster rate.
The retry interval is set within the Channel Block configurations of the imta.cnf file. The structure of this file consists of two parts: rewrite rules and channel blocks. The channel blocks define the behavior of a particular disk queue and related processes. This discussion refers to the tcp_local channel. The tcp_local channel provides delivery to sites outside an enterprise's local network, in other words, to places over the Internet.
The retry interval setting of the tcp_local channel is initially set by the default channel block. The default channel block allows settings to be duplicated to avoid having repeated settings.
The following is the default channel block:
defaults notices 1 2 4 7 copywarnpost copysendpost postheadonly noswitchchannel immnonurgent maxjobs 7 defaulthost red.siroe.com red.siroe.com
First, the structure of the channel block consists of the channel name. In the example above, this is the default channel block, which will be applied to channels without these settings. The second part is a list of channel keywords.
The notices keyword specifies the amount of time that can elapse before message delivery notices (MDNs) are sent back to the sender. The keyword is followed by a set of numbers, which set the retry period. By default, the MTA will attempt delivery and send notices back to the sender. These notices come from “postmaster” to end-user inboxes.
In this example, the MTA will retry at a period of 1 day, 2 days, and 4 days. At 7 days, the MTA will return the message and regard the message as a failed delivery.
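To see how the numbers after notices map onto the behavior just described, the following Python sketch pulls them out of a channel block string. It follows this section's reading of the keyword (every number but the last is an intermediate point, and the last is when the message is returned). The function name is hypothetical, and this is not how the MTA itself parses imta.cnf.

```python
def parse_notices(channel_block: str):
    """Return (intermediate_days, return_day) from a channel block,
    or None if the block has no notices keyword with values."""
    tokens = channel_block.split()
    try:
        start = tokens.index("notices") + 1
    except ValueError:
        return None
    days = []
    for token in tokens[start:]:
        # The run of numbers ends at the next channel keyword.
        if not token.isdigit():
            break
        days.append(int(token))
    if not days:
        return None
    return days[:-1], days[-1]


if __name__ == "__main__":
    block = (
        "defaults notices 1 2 4 7 copywarnpost copysendpost postheadonly "
        "noswitchchannel immnonurgent maxjobs 7 defaulthost red.siroe.com red.siroe.com"
    )
    intermediate, final = parse_notices(block)
    print(f"intermediate days: {intermediate}, returned after day: {final}")
```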
In many cases, the default setting of the MTA provides adequate performance. In some cases, you need to tune the MTA to avoid potential resource exhaustion, such as running out of disk space for message queues. This is not a product limitation, but a limitation of the total Messaging Server system, which includes hardware and network resources.
In consideration of these possible disk size issues, deployments with a large number of users might want to attempt message deliveries over a much shorter period before returning undeliverable messages. If this is the case, study the documentation listed below.
Refer to the following documentation for more information.
Network throughput is the amount of data at a given time that can travel through your network between your client application and server. When a networked server is unable to respond to a client request, the client typically retransmits the request a number of times. Each retransmission introduces additional system overhead and generates more network traffic.
You can reduce the number of retransmissions by improving data integrity and system performance and by reducing network congestion:
To avoid bottlenecks, ensure that the network infrastructure can handle the load.
Partition your network. For example, use 100 Mbps Ethernet for client access and Gigabit Ethernet for the backbone.
To ensure that sufficient capacity exists for future expansion, don’t use theoretical maximum values when configuring your network.
Separate traffic flows on different network partitions to reduce collisions and to optimize bandwidth use.
For detailed information on planning your architecture, see Chapter 11, Developing a Messaging Server Architecture.
A two-tiered architecture splits the Messaging Server deployment into two layers: an access layer and a data layer. In a simplified two-tiered deployment, you might add an MMP and an MTA to the access layer. The MMP acts as a proxy for POP and IMAP mail readers, and the MTA relays transmitted mail. The data layer holds the Message Store and Directory Server. Figure 10–1 shows a simplified two-tiered architecture.
Easier maintenance than one-tiered architectures
Easier growth management and system upgrade with limited overall downtime
The goals of sizing your Message Store are to identify the maximum number of concurrent connections your store can handle and to determine the number of messages that can be delivered to the store per second.
Determine the number of store machines and concurrent connections per machine based on the figures you gather by using a load simulator. For more information on sizing tools, see Using a Messaging Server Load Simulator.
Determine the amount of storage needed for each store machine.
Use multiple store partitions or store machines, if appropriate for your backup, restoration, and file system recovery times.
Sun Client Services is often asked to specify a recommendation for the maximum number of users on a message store. Such a recommendation cannot be given without understanding:
Usage patterns (as described in Using a Messaging Server Load Simulator).
The maximum number of active users on any given piece of hardware within the deployment.
Backup, restore, and recovery times. These times increase as the size of a message store increases.
In general, separate your MTA services into inbound and outbound services. You can then size each in a similar fashion. The goal of sizing your MTAs is to determine the maximum number of messages that can be relayed per second.
From the raw performance of the inbound MTA, add SSL, virus scanning processes, and other extraordinary message processing.
With redundancy, one or more of each type of machine can still handle peak load without a substantial impact to throughput or response time.
In addition, sufficient disk capacity for network problems or non-functioning remote MTAs must be calculated for transient messages.
In addition, you must:
Add CPU or a hardware accelerator for SSL.
Add more disks for an SMTP proxy.
Add capacity for load balancing and redundancy, if appropriate.
As with inbound MTA routers, one or more of each type of machine should still handle peak load without a substantial impact to throughput or response time when you plan for redundancy in your deployment.
In a single-tiered architecture, there is no separation between access and data layers. The MTA, Message Store, and sometimes the Directory Server are installed in one layer. Figure 10–2 shows a single-tiered architecture.
Single-tiered architectures have lower up-front hardware costs than two-tiered architectures. However, if you choose a one-tiered architecture, you need to allow for significant maintenance windows.
Size your message stores like you size message stores in a Two-tiered Messaging Server Architecture.
Add CPU for SSL, if necessary.
Add more disks for the increased number of SMTP connections.
Add more disks for outbound MTA routing.
For specific instructions on sizing Messaging components in single-tiered or two-tiered architectures, contact your Sun Client Services representative.