Chapter 3 Developing a Messaging Architecture

This chapter describes how to design the architecture of your Messaging Server deployment.

Purpose of a Messaging System Architecture

A good email system architecture quickly delivers email with embedded sound, graphics, video files, HTML forms, Java applets, and desktop applications, while providing for future upgrade and scalability. At a simplistic level, the Messaging Server architecture should:

Central to an email system architecture is the Messaging Server, a collection of components used to send and deliver messages. In addition to components provided in Messaging Server, the email system also requires an LDAP server and a DNS server. Many enterprises have an existing LDAP server and database that can be used with Messaging Server. If not, Java Enterprise System provides an LDAP server (Sun Java System Directory Server). The DNS server must be in place before deploying your email system.

The remainder of this chapter describes the components of the Messaging Server used to design an efficient scalable messaging system, as well as the Messaging Server software architecture.

Several factors other than efficiency and scalability influence the Messaging Server architecture. Specifically these are:

Messaging Server Software Architecture

Figure 3-1 shows a simplified standalone view of Messaging Server. While this particular deployment is not recommended because it does not scale well, it does illustrate the individual components of the server.

Message Transfer Agent or MTA. Receives, routes, transports, and delivers mail messages using SMTP. An MTA is like an electronic mail carrier, delivering messages to an electronic mailbox or passing them on to another MTA.

Message Store. Consists of a set of components that store, retrieve, and manipulate messages for mail clients. Mail can be retrieved by POP, IMAP, or HTTP clients. POP clients download messages to the client machine for reading and storage. IMAP and HTTP clients read and store messages on the server. The Message Store is like an electronic mailbox that stores and retrieves mail for users.

LDAP directory. Stores, retrieves, and distributes mail directory information for Messaging Server. This includes user routing information, distribution lists, configuration data, and other information necessary to support delivery and access of email. The LDAP directory is a directory of mail addresses, aliases, routing information, passwords, and any other information needed by the MTA or Message Store to deliver and retrieve messages.

DNS Server. Translates domain names into IP addresses. This component needs to be present before Messaging Server is installed.

Message Path Through the Simplified Messaging Server System

Incoming messages from the Internet or local clients are received by the MTA through the Simple Mail Transfer Protocol (SMTP). If the address is internal, that is, within the Messaging Server domain, the MTA delivers the message to the Message Store. If the address is external, that is, in a domain outside of Messaging Server control, the MTA relays the message to another MTA on the Internet.
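The internal-versus-external decision described above can be sketched in a few lines of Python. This is a simplified model, not Messaging Server code: the LOCAL_DOMAINS set and the two return labels are hypothetical stand-ins for what the real MTA derives from its rewrite rules and channel table.

```python
# Hypothetical set of domains this Messaging Server treats as internal.
LOCAL_DOMAINS = {"example.com"}

def route(recipient: str) -> str:
    """Return where a simplified MTA would send a message for `recipient`."""
    _user, _, domain = recipient.rpartition("@")
    if domain.lower() in LOCAL_DOMAINS:
        return "message-store"  # internal address: deliver to the Message Store
    return "relay"              # external address: relay to another MTA on the Internet

print(route("wallaby@example.com"))  # message-store
print(route("someone@other.org"))    # relay
```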

Although it is possible to deliver mail to the /var/mail file system (UNIX systems only), which was the case in previous versions of the Messaging Server, local messages are usually delivered to the more optimized Messaging Server Message Store. Messages are then retrieved by IMAP4, POP3, or HTTP mail client programs.

The directory server stores and retrieves local user and group delivery information such as addresses, alternate mail addresses, and mailhost. When the MTA receives a message, it uses this address information to determine where and how to deliver the message.

In addition to storing messages, the Message Store uses the directory server to verify the login names and passwords of mail clients accessing their mail. The directory also stores information about quota limits, default message store type, and so on.

Outgoing messages from mail clients go directly to the MTA, which sends the message to the appropriate server on the Internet. If the address is local, the MTA sends the message to the Message Store.

New users and groups are created by adding user and group entries to the directory. Entries can be created or modified by using the User Management Utility, or by modifying the directory using LDAP.

Messaging Server components are administered by the Administration Server console. In addition, Messaging Server provides a set of command-line interfaces and configuration files. Any machine connected to a Messaging Server host can perform administrative tasks (assuming, of course, the administrator has proper access). Some of the more common administrative tasks are adding users and groups to the mail system, modifying and removing them, and configuring the operation of the MTA, directory server, and Message Store.

The Message Transfer Agent (MTA)

The MTA routes, transfers, and delivers Internet mail messages for Messaging Server. Mail flows through interfaces known as channels, which consist of a pair of channel programs and a set of configuration information. Figure 3-2 illustrates the process. You can configure channels individually and direct mail to specific channels based on the address.

Each channel consists of up to two channel programs: a slave program, which handles mail coming into the channel, and a master program, which handles mail as it leaves the channel. Each channel also has an outgoing message queue for storing messages that are destined to be sent to one or more of the interfaces associated with the channel. Channel programs perform one of two functions:

Channels are configurable by using the imta.cnf configuration text file. Through channel configuration, you can set a variety of channel keywords to control how messages are handled. Channel keywords affect performance tuning as well as reporting aspects of the system. For example, you can define multiple channels to segment traffic by groups or departments, define message size limits to limit traffic, and define delivery status notification rules according to the needs of your business. Diagnostic attributes are also configurable on a per-channel basis. The number of configuration parameters that can be set on a channel basis is large. See the Sun Java System Messaging Server Administration Guide for detailed information.

SMTP Channel. Enables TCP/IP-based message delivery and receipt. Both master and slave channels are provided.

LMTP Channel. Enables routing of messages directly from MTAs to the Message Store. These channels communicate with the Message Store over LMTP instead of SMTP.

Pipe Channel. Used for alternative message delivery programs. Enables delivery of messages to programs such as a mail sorter rather than directly to a user’s inbox.

Local Channel. Delivers mail to /var/mail. Provides for compatibility with older UNIX mail clients.

Reprocessing Channel. Useful for messages that are resubmitted.

Defragmentation Channel. Reassembles partial messages into the original complete message to support the MIME message/partial content type.

Conversion Channel. Performs body-part-by-body-part conversion on messages. Useful for rewriting addresses or re-formatting messages.
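The channel structure described earlier in this section (a slave program, a master program, and an outgoing queue between them) can be modeled with a small Python class. This is a hypothetical illustration only: the Channel class, its method names, and the max_message_bytes parameter are assumptions and do not correspond to actual imta.cnf keywords. The size check stands in for the kind of per-channel message size limit mentioned above.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Channel:
    """Toy MTA channel: a slave side that enqueues, a master side that dequeues,
    and an outgoing message queue between them."""
    name: str
    max_message_bytes: int  # hypothetical per-channel size limit
    queue: deque = field(default_factory=deque)

    def slave_accept(self, message: bytes) -> bool:
        """Slave program: accept mail coming into the channel if it fits the limit."""
        if len(message) > self.max_message_bytes:
            return False  # rejected (toy behavior)
        self.queue.append(message)
        return True

    def master_dequeue(self):
        """Master program: remove and return the next outgoing message, or None."""
        return self.queue.popleft() if self.queue else None

ch = Channel("demo", max_message_bytes=10)
print(ch.slave_accept(b"hello"))   # True
print(ch.slave_accept(b"x" * 20))  # False
print(ch.master_dequeue())         # b'hello'
```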

See the Sun Java System Messaging Server Administration Guide for more information on MTA concepts.

Direct LDAP Lookup

Prior to version 5.2, Messaging Server ran in dirsync mode. In dirsync mode, directory information about users and groups used by the MTA was accessed through a number of files and databases collectively called the directory cache. The data itself was stored in the LDAP directory, but actual information was accessed from the cache. Data in the cache was updated by the dirsync program, which monitored changes to the LDAP directory and updated the files and databases accordingly.

Starting with Messaging Server 5.2, you can configure the MTA to look up the information directly from the LDAP server. This direct lookup makes better use of LDAP by using the kind of normal query expected by an LDAP server. The direct lookup provides a more scalable, slightly faster, and more configurable relationship between the MTA and the LDAP server. The results of the LDAP queries are cached in the process, with configurable size and time, so performance is tunable. See the Sun Java System Messaging Server Administration Guide for more information.
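The paragraph above notes that LDAP query results are cached in the process with a configurable size and lifetime. A minimal Python sketch of such a cache follows. It assumes least-recently-used eviction, which is an illustrative choice; the actual eviction policy inside Messaging Server is not described here. The LookupCache class and its parameters are hypothetical, and the lookup callback stands in for a real LDAP search.

```python
import time
from collections import OrderedDict

class LookupCache:
    """Size- and time-bounded cache for directory lookup results (illustrative)."""

    def __init__(self, max_entries: int, ttl_seconds: float, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = OrderedDict()  # key -> (expiry time, cached value)

    def get(self, key, lookup):
        """Return a fresh cached value, or call lookup(key) and cache its result."""
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self._data.move_to_end(key)      # mark as most recently used
            return entry[1]
        value = lookup(key)                  # stands in for an LDAP search
        self._data[key] = (now + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict the least recently used entry
        return value
```

Raising ttl_seconds reduces load on the directory at the cost of serving slightly stale data; raising max_entries trades process memory for fewer LDAP round trips.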

With the introduction of Messaging Server 6.0, dirsync mode is neither supported nor included.

Rewrite Rules

Mail is routed to a channel based on the result of running the destination addresses through domain rewriting rules, or rewrite rules for short. Rewrite rules are used to convert addresses into true domain addresses and to determine their corresponding channels. These rules are used to rewrite addresses appearing in both the transport layer and the message header. The transport layer is the message’s envelope. It contains routing information and is invisible to the user, but is the actual information used to deliver the message to the appropriate recipient.

The rewrite rules and the table of channels cooperate to determine the disposition of each address. The result of the rewrite process is a rewritten address and a routing system, that is, the system (channel) to which the message is to be sent. Depending upon the topology of the network, the routing system might only be the first step along the path the message takes to reach its destination, or it might be the final destination system itself.

After the rewrite process has finished, a search is made for the routing system among the channel portion of the imta.cnf file. Each channel has one or more host names associated with it. The routing system name is compared against each of these names to determine to which channel to enqueue the message. A simple rewrite rule is shown here:

This rule matches addresses for the domain example.com only. Such matching addresses would be rewritten using the template $U%$D, where:

$U	Indicates the user portion or left-hand side of the address (before the @)
%	Indicates the @ sign
$D	Indicates the domain portion or right-hand side of the address (after the @)

Thus, an address of the form wallaby@thor.example.com would be rewritten to wallaby@example.com, and the message would be sent to the channel called tcp_siroe-daemon.
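The template expansion in the example above can be imitated in Python. This is a toy model under stated assumptions, not the MTA's rewrite engine: the RULES table is hypothetical, it returns the channel directly instead of performing the separate channel-table search in imta.cnf, and it assumes that a subdomain such as thor.example.com falls through to the example.com rule when no more specific rule exists, which matches how the example in the text behaves. The real matching order is documented in the Sun Java System Messaging Server Administration Guide.

```python
# Hypothetical rule table: pattern -> (template, channel).
RULES = {"example.com": ("$U%$D", "tcp_siroe-daemon")}

def expand(template: str, user: str, domain: str) -> str:
    """Expand $U (user), $D (matched domain), and % (the @ sign) in a template."""
    out, i = [], 0
    while i < len(template):
        if template.startswith("$U", i):
            out.append(user)
            i += 2
        elif template.startswith("$D", i):
            out.append(domain)
            i += 2
        elif template[i] == "%":
            out.append("@")
            i += 1
        else:
            out.append(template[i])
            i += 1
    return "".join(out)

def rewrite(address: str):
    """Return (rewritten address, channel), or None if no rule matches."""
    user, _, domain = address.rpartition("@")
    labels = domain.lower().split(".")
    # Assumption: try the full domain first, then each parent domain in turn.
    for i in range(len(labels)):
        pattern = ".".join(labels[i:])
        if pattern in RULES:
            template, channel = RULES[pattern]
            return expand(template, user, pattern), channel
    return None

print(rewrite("wallaby@thor.example.com"))  # ('wallaby@example.com', 'tcp_siroe-daemon')
```

Expanding the template token by token, rather than with chained string replacements, keeps a literal % inside the user part from being turned into an @ sign.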

Rewrite rules can perform sophisticated substitutions based on mapping tables, LDAP directory lookups, and database references. While occasionally cryptic, they are useful because they operate at a low level and impose little direct overhead on the message processing cycle. For full information on these and other features available in the rewrite process, see the Sun Java System Messaging Server Administration Guide.

The Job Controller

The job controller is a program that controls the message queues and executes the channel programs (master, also called sender or dequeue, programs) that do the actual message delivery. The job controller runs as a multithreaded process and is one of the few processes that is always present in the Messaging Server system. The channel processing jobs themselves are created by the job controller but are transient and might not be present when there is no work for them to do.

The job controller has configuration settings that determine whether at least one instance of a channel processing program is always running. In many cases, these are set so that at least one instance of the service program always exists, even when there is no immediate work to do. In other cases, an instance remains for a set period after it last did some work, even though nothing is left to do.

Slave channels, which respond to external stimuli, notify the job controller of each newly created message file. The job controller enters this information into its internal data structures and, if necessary, creates a master channel job to process the message. Creating a job might not be necessary if the job controller determines that an existing channel job can process the newly created message file. When the master channel job starts, it gets its message assignment from the job controller. When it has finished with the message, the master channel job reports the status of its processing to the job controller: either the message was successfully dequeued, or it should be rescheduled for a retry. The job controller maintains information about message priority and previous failed delivery attempts, which allows it to schedule channel jobs advantageously. The job controller also tracks the state of each job: whether it is busy or idle and, if idle, for how long. Tracking state enables the job controller to keep an optimal pool of channel jobs.
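The interaction among slave channels, the job controller, and master channel jobs can be sketched as a small Python class. This is an illustrative model, not the job controller's actual design: the class and method names are hypothetical, and the sketch omits message priorities and idle-time tracking. It shows the two decisions described above: reuse an idle job instead of creating a new one, and record failed deliveries for rescheduling.

```python
from collections import deque

class JobController:
    """Toy job controller: queues new message files and manages master jobs."""

    def __init__(self, max_jobs: int):
        self.max_jobs = max_jobs
        self.pending = deque()   # message files waiting for a master job
        self.jobs = {}           # job id -> "idle" or "busy"
        self.retry = []          # messages to reschedule after a failed attempt
        self._next_id = 0

    def notify_new_message(self, message) -> None:
        """Called by a slave channel after it creates a new message file."""
        self.pending.append(message)
        # Reuse an idle job if one exists; otherwise start one if the pool has room.
        if "idle" not in self.jobs.values() and len(self.jobs) < self.max_jobs:
            self.jobs[self._next_id] = "idle"
            self._next_id += 1

    def assign(self, job_id):
        """A master job asks for its next message; returns None if there is no work."""
        if not self.pending:
            return None
        self.jobs[job_id] = "busy"
        return self.pending.popleft()

    def report(self, job_id, message, delivered: bool) -> None:
        """A master job reports its result: dequeued, or rescheduled for retry."""
        self.jobs[job_id] = "idle"
        if not delivered:
            self.retry.append(message)
```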

Local Mail Transfer Protocol (LMTP)

As of the Sun ONE Messaging Server 6.0 release, you can configure LMTP for delivery to the Message Store in a multi-tier deployment. In these scenarios, where you are using inbound relays and back-end Message Stores, the relays become responsible for address expansion and delivery methods such as autoreply and forwarding and also for mailing list expansion. Delivery to the back-end stores historically has been over SMTP, which requires the back-end system to look up the recipient addresses in the LDAP directory again, thereby engaging the full machinery of the MTA. For speed and efficiency, the MTA can use LMTP rather than SMTP to deliver messages to the back-end store. See the Sun Java System Messaging Server Administration Guide for more information.

Note	Messaging Server’s implementation of LMTP is not a general-purpose implementation. You can only use Messaging Server’s LMTP between the Messaging Server MTA and Message Store components in a two-tier architecture. In a single-tier architecture, consisting of one machine, you cannot use LMTP.

The Message Store

The Message Store is a dedicated data store for the delivery, retrieval, and manipulation of Internet mail messages. The Message Store works with the IMAP4 and POP3 client access servers to provide flexible and easy access to messaging. The Message Store also works through the Webmail server to provide messaging capabilities to web browsers. In addition to this section, see the Sun Java System Messaging Server Administration Guide for more information.

The Message Store is organized as a set of folders or user mailboxes. The folder or mailbox is a container for messages. Each user has an INBOX where new mail arrives. Each user can also have one or more folders where mail can be stored. Folders can contain other folders arranged in a hierarchical tree. Mailboxes owned by an individual user are private folders. Private folders can be shared at the owner’s discretion with other users on the same Message Store. As of the 6.0 release, Messaging Server supports sharing folders across multiple stores.

There are two general areas in the Message Store, one for user files and another for system files. In the user area, the location of each user’s INBOX is determined by using a two-level hashing algorithm. Each user mailbox or folder is represented by another directory in its parent folder. Each message is stored as a plain text file using the MIME formatting standard. When there are many messages in a folder, the system creates hash directories for that folder. Using hash directories eases the burden on the underlying file system when there are many messages in a folder. In addition to the messages themselves, the Message Store maintains an index and cache of message header information and other frequently used data to enable clients to rapidly retrieve mailbox information and do common searches without the need to access the individual message files.
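To make the two-level hashing idea concrete, here is a Python sketch that maps a user ID to a nested directory path. The hash function (SHA-256), the directory layout, and the store root path are assumptions chosen for illustration; the Message Store’s actual algorithm is not specified here. The point is that spreading mailboxes over 256 × 256 = 65,536 leaf directories keeps any single directory small.

```python
import hashlib
from pathlib import PurePosixPath

def mailbox_dir(store_root: str, user_id: str) -> PurePosixPath:
    """Map a user ID to a two-level hashed directory (hypothetical scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    level1 = f"{digest[0]:02x}"  # first hash level: one of 256 directories
    level2 = f"{digest[1]:02x}"  # second hash level: one of 256 subdirectories
    return PurePosixPath(store_root, level1, level2, user_id)

print(mailbox_dir("/store/partition/primary", "jdoe"))
```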

A Message Store can contain many message store partitions. Each Message Store partition resides on a file system volume. As a file system fills up, you can create additional file system volumes and Message Store partitions on those volumes.

The Message Store maintains only one copy of each message per partition. This is sometimes referred to as a single-copy message store. When the Message Store receives a message addressed to multiple users or to a group or distribution list, it adds a reference to the message in each user’s INBOX. By not saving a copy of the message in each user’s INBOX, the Message Store avoids storing duplicate data. The individual message status flags (seen, read, answered, deleted, and so on) are maintained per folder for each user.
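The single-copy idea can be illustrated with a toy Python model: the message body is stored once, each recipient’s INBOX holds only a reference, and status flags are kept per user. The class and method names are hypothetical, and the model simplifies by keying flags on (user, message ID) rather than per folder as the real store does.

```python
from collections import defaultdict

class SingleCopyPartition:
    """Toy single-copy store: one stored body per message, per-user references and flags."""

    def __init__(self):
        self.bodies = {}                    # message id -> body, stored once
        self.inboxes = defaultdict(list)    # user -> list of message ids
        self.flags = defaultdict(set)       # (user, message id) -> set of flags

    def deliver(self, message_id: str, body: bytes, recipients: list) -> None:
        self.bodies.setdefault(message_id, body)   # store the body only once
        for user in recipients:
            self.inboxes[user].append(message_id)  # each INBOX gets a reference

    def set_flag(self, user: str, message_id: str, flag: str) -> None:
        self.flags[(user, message_id)].add(flag)   # flags are kept per user
```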

The system area contains information on the entire Message Store in Berkeley database format for faster access. The information in the system area can be reconstructed from the user area. Starting with Sun ONE Messaging Server 5.2, the product contains a database snapshot function. When needed, you can quickly recover the database to a known state. Messaging Server now also adds fast recovery, so that in case of database corruption, you can shut down the Message Store and bring it back immediately without having to wait for a lengthy database reconstruction.

The Message Store supports the IMAP quota extension (RFC 2087). Quota enforcement can be turned on or off. You can configure a user quota as a number of bytes or a number of messages. You can also set a threshold so that when usage reaches the threshold, a warning message is sent to the user. When the user is over quota, new messages can be held for retry during a grace period. After the grace period, messages sent to the over-quota user are returned to the sender with a non-delivery notification.
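The quota policy described above can be summarized as a small decision function. This Python sketch covers byte quotas only; the Quota fields, their default values, and the returned labels are hypothetical and are not Messaging Server configuration parameters. The over_since argument is the time the user first went over quota, or None if this message is the first to exceed it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Quota:
    limit_bytes: int
    warn_fraction: float = 0.9    # hypothetical warning threshold
    grace_seconds: int = 3600     # hypothetical grace period

def quota_decision(q: Quota, used: int, incoming: int,
                   over_since: Optional[float], now: float) -> str:
    """Return 'deliver', 'deliver+warn', 'hold', or 'bounce' for an incoming message."""
    if used + incoming <= q.limit_bytes:
        if used + incoming >= q.warn_fraction * q.limit_bytes:
            return "deliver+warn"   # deliver, and send the user a warning
        return "deliver"
    if over_since is None or now - over_since < q.grace_seconds:
        return "hold"               # keep for retry during the grace period
    return "bounce"                 # return to sender with a non-delivery notification
```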

For special applications where quota is used, but messages must be delivered regardless of the quota status of the users, there is a guaranteed message delivery channel. This channel can be used to deliver all messages regardless of quota status. Utilities are available for reporting quota usage and for sending over quota warnings.

Directory Services

Messaging Server is bundled with Sun Java System Directory Server. Directory Server is a Lightweight Directory Access Protocol (LDAP) directory service. Directory Server provides the central repository for information critical to the operation of Messaging Server. This information includes user profiles, distribution lists, and other system resources.

Directory Information Tree

The directory stores data in the form of a tree, known as the Directory Information Tree (DIT). The DIT is a hierarchical structure, with one major branch at the top of the tree and branches and subbranches below. The DIT is flexible enough to enable you to design a deployment that fits your organization’s needs. For example, you might choose to arrange the DIT according to your actual business organizational structure, or by the geographical layout of your business. You also might want to design a DIT that follows a one-to-one mapping to your DNS layers. Use care when designing your DIT, as changing it after the fact is not an easy task.
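One common way to make the DIT mirror your DNS layers is the dc= naming convention from RFC 2247, in which each DNS label becomes one relative distinguished name. The Python sketch below builds such names; the ou=People container and the function names are illustrative assumptions, not the Messaging Server default layout, and the code does not escape special characters in DN values.

```python
def domain_to_dn(domain: str) -> str:
    """Map a DNS domain to a dc-style DN suffix, one RDN per DNS label."""
    return ",".join(f"dc={label}" for label in domain.lower().split(".") if label)

def user_dn(uid: str, domain: str, ou: str = "People") -> str:
    """Build a user entry DN under the domain's subtree (no DN escaping)."""
    return f"uid={uid},ou={ou},{domain_to_dn(domain)}"

print(domain_to_dn("example.com"))           # dc=example,dc=com
print(user_dn("jdoe", "sales.example.com"))  # uid=jdoe,ou=People,dc=sales,dc=example,dc=com
```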

The DIT is also flexible enough to accommodate a wide range of administration scenarios. You can administer the DIT in either a centralized or distributed manner. In centralized administration, one authority manages the entire DIT. You would use centralized administration where the entire DIT resides on one mail server. In distributed administration, multiple authorities manage the DIT. Usually you implement distributed administration when the DIT is divided into portions, or subtrees, residing on different mail servers.

When the DIT is large, or when mail servers are geographically dispersed, consider delegating management of portions of the DIT. Typically, you assign an authority to manage each subtree of the DIT. Messaging Server enables you to manage multiple subtrees from one authority. However, for security reasons, an authority can only make changes to the subtree of the DIT that the authority owns.
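The rule that an authority can change only the subtree it owns amounts to checking whether a target DN ends with the authority’s subtree DN, compared component by component. The following Python sketch is a hypothetical illustration of that check, not how Directory Server enforces access control (which uses ACIs). It compares RDNs case-insensitively and does not handle escaped commas in DN values.

```python
def can_modify(authority_subtree: str, target_dn: str) -> bool:
    """True if target_dn lies within the subtree rooted at authority_subtree."""
    def rdns(dn: str) -> list:
        return [part.strip().lower() for part in dn.split(",")]
    root, target = rdns(authority_subtree), rdns(target_dn)
    # Compare whole RDNs so that ou=Presales does not match ou=Sales.
    return len(target) >= len(root) and target[len(target) - len(root):] == root
```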

The default schema used by Messaging Server when Identity Server is not used is different from the one used by Identity Server. Messaging Server supports both Sun Java System LDAP Schema 1 and 2, and allows for transition and migration of the schemas. See Chapter 7, "Understanding Messaging Server Schema and Provisioning Options" for more information.

Directory Replication

Directory Server supports replication, enabling a variety of configurations that provide redundancy and efficiency. Enabling replication of all or part of the DIT from one host to other hosts provides the following configuration capabilities:

For more information on directory replication, directory performance tuning, and DIT structure and design, see the Sun Java System Directory Server documentation at the following location:

Understanding the Two-tier Architecture

A two-tier messaging architecture provides the optimum design for scalability and reliability. Instead of having a single host run all the components of a messaging system, a two-tier architecture separates the components onto different machines, each of which performs a specialized function. As the load on a particular functional component increases (for example, when more message storage or more outbound relaying is needed), you can add more servers to handle the larger load.

The two-tier architecture consists of an access layer and a data layer. The access layer is the portion of the deployment that handles delivery, message access, user login, and authentication. The data layer is the portion of the deployment that holds all the data. This includes the LDAP master servers and Messaging Server machines that are configured to store user messages.

Public Access Network. The network connecting the Messaging Server to internal users and the Internet. Each deployment defines its own network requirements; however, the basic Messaging Server requirement is connectivity to end users and the Internet using standard protocols such as SMTP, POP, IMAP, and HTTP.

Private Data Network. This network provides secure connectivity between the public access network and Messaging Server data. It consists of a secure access layer and a data layer, which includes the service-wide directory, the message data center, and the personal address book (PAB) server.

LDAP directory server. Directory server used for storing and retrieving information about the user base. It stores user and group aliases, mailhost information, delivery preferences, and so on. Depending on your design requirements, there could be more than one identical directory for the system. Figure 3-3 shows a master directory and two replicas. An LDAP directory server is provided as part of the Messaging Server product. If desired, you can use data from an existing directory. In this instance, you retrieve user and group data from the existing directory and place it in a Sun Java System Directory Server directory. The data format of the existing directory must also be compliant with the Messaging Server schema.

Message Store. Holds and stores user mail. Also referred to as a “back end.” The Message Store also refers to the Message Access Components such as the IMAP server, the POP server, and the Messenger Express (Webmail) servers. Figure 3-3 shows a deployment that has two message stores. You can add more stores as needed.

Personal Address Book (PAB) Server. Stores and retrieves Messenger Express user addresses.

DNS server. Maps host names to IP addresses. The DNS server determines what host to contact when routing messages to external domains. Internally, DNS maps actual services to names of machines. The DNS server is not part of the Messaging Server product line. You must install an operating DNS server prior to installing Messaging Server.

Server Load Balancer. Balances network connections uniformly or by algorithm across multiple servers. Using load balancers, a single network address can represent a large number of servers, eliminating traffic bottlenecks, allowing management of traffic flows, and guaranteeing high service levels. Figure 3-3 has two load balancers: one balances connections to the MMPs, and the other balances connections to the MTA Outbound Relays. Load balancers are not part of the Java Enterprise System product line. You cannot use load balancers in front of the Message Store or the directory masters. You use them for connections to MMPs, MEMs, Communications Express, inbound and outbound MTAs, directory consumers, and, when the Messaging Server MTA uses the Brightmail product, Brightmail servers.

MTA Inbound Relay. MTA dedicated to accepting messages from external (Internet) sites and routing those messages to the local Message Store server. Because this is the first point of contact from the outside, the MTA inbound relay has the added responsibilities of guarding against unauthorized relaying and denial-of-service attacks, and of filtering spam.

MTA Outbound Relay. MTA that only receives mail from internal or authenticated users and routes those messages to other internal users or to external (Internet) domains. While a single machine can be an inbound relay as well as an outbound relay, in a large-scale Internet-facing deployment you should separate these functions onto two machines. This way, internal clients sending mail do not have to compete with inbound mail from external sites.

Another routing option is for outbound relays never to deliver mail internally. Instead, they treat internally bound mail from their user base as simply another routing case and forward all such messages to an inbound MTA.

Delegated Administrator Server. Provides a GUI management console for users and administrators. Delegated Administrator enables users to change passwords, set vacation mail, and so forth. Administrators are able to do more advanced administrative tasks, such as adding and deleting users. Delegated Administrator currently works only with Schema 1 implementations.

Messaging Multiplexor or Mail Message Proxy or MMP. Enables scaling of the Message Store across multiple physical machines by decoupling the specific machine that contains a user’s mailbox from its associated DNS name. Client software does not have to know the physical machine that contains its Message Store. Thus, users do not need to change the DNS name of their host message store every time their mailbox is moved to a new machine. When POP or IMAP clients request mailbox access, the proxy forwards the request to the Messaging Server system containing the requested mailbox by looking in the directory service for the location of the user’s mailbox.

Messenger Express Multiplexor. A specialized server that acts as a single point of connection to the HTTP access service for Webmail. All users connect to the single messaging proxy server, which directs them to their appropriate mailbox. As a result, an entire array of messaging servers will appear to mail users to be a single host name. While the Messaging Multiplexor (MMP) connects to POP and IMAP servers, the Messenger Express Multiplexor connects to an HTTP server. In other words, the Messenger Express Multiplexor is to Messenger Express as MMP is to POP and IMAP.
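The Server Load Balancer entry above says connections are balanced "uniformly or by algorithm." Two widely used strategies are round robin (uniform rotation) and least connections. The Python sketch below shows both; load balancers are third-party products whose actual algorithms vary, and the host names are hypothetical.

```python
import itertools

class RoundRobin:
    """Spread connections uniformly by cycling through the servers in order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

def least_connections(active: dict) -> str:
    """Pick the server with the fewest active connections (ties go to the first listed)."""
    return min(active, key=active.get)

rr = RoundRobin(["mmp1.example.com", "mmp2.example.com"])
print([rr.pick() for _ in range(3)])
print(least_connections({"mta1.example.com": 5, "mta2.example.com": 2}))  # mta2.example.com
```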

Two-tier Architecture—Messaging Data Flow

This section describes the message flow through the messaging system. How the message flow works depends upon the actual protocol and message path.

Sending Mail: Internal User to Another Internal User

Synopsis: Internal User -> Load Balancer -> MTA Outbound Relay 1 or 2 -> MTA Inbound Relay 1 or 2 -> Message Store 1 or 2

Messages addressed from one internal user to another internal user (that is, users on the same email system) first go to a load balancer. The load balancer shields the email user from the underlying site architecture and helps provide a highly available email service. The load balancer sends the connection to either MTA Outbound Relay 1 or 2. The outbound relay reads the address and determines if the message is addressed to an external user or an internal user. If it is an external user, it sends the message to the Internet. If it is an internal user, it sends it to MTA Inbound Relay 1 or 2 (or directly to the appropriate message store if so configured). The MTA Inbound Relay delivers the message to the appropriate Message Store. The Message Store receives the message and delivers it to the mailbox.

Note	An increasingly common scenario is to use LMTP to deliver mail directly from the outbound relay to the store. In a two-tier deployment, you can make this choice.

Retrieving Mail: Internal User

Synopsis: Internal User -> Load Balancer -> MMP/MEM/Communications Express Proxy Server 1 or 2 -> Message Store 1 or 2

Mail is retrieved by using POP, IMAP, or HTTP. The load balancer receives the user connection and forwards it to one of the MMP, MEM, or Communications Express servers. The user then sends the login request to the access layer machine it is connected to. The access layer machine validates the login request and password, then forwards the request, over the same protocol the user connected with, to the appropriate Message Store (1 or 2). The access layer machine then mediates the rest of the connection between the client and the server. The exception is Communications Express, which does some processing of ongoing user requests to handle part of the browser rendering.
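The access layer’s job (authenticate the user, look up where the mailbox lives, and forward the connection there) can be sketched as a directory lookup. In this Python sketch the DIRECTORY dictionary and its entries are hypothetical stand-ins for LDAP; mailHost is the attribute name Messaging Server uses for a user’s mail host, but a real proxy authenticates with an LDAP bind rather than comparing stored passwords as this toy does.

```python
# Hypothetical directory entries: uid -> attributes.
DIRECTORY = {
    "jdoe": {"mailHost": "store1.example.com", "userPassword": "secret"},
    "asmith": {"mailHost": "store2.example.com", "userPassword": "hunter2"},
}

def proxy_target(uid: str, password: str):
    """Authenticate, then return the back-end store that holds the user's mailbox."""
    entry = DIRECTORY.get(uid)
    if entry is None or entry["userPassword"] != password:
        return None             # authentication failed; do not forward
    return entry["mailHost"]    # forward the session to this Message Store

print(proxy_target("jdoe", "secret"))  # store1.example.com
```

Because clients only ever connect to the proxy, moving a mailbox to another store requires updating the directory entry, not reconfiguring the client.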

Sending Mail: Internal User to an External (Internet) User

Synopsis: Internal User -> Load Balancer -> MTA Outbound Relay 1 or 2 -> Internet

Messages addressed from an internal user to an external user (that is, users not on the same email system) go to a load balancer. The load balancer shields the email user from the underlying site architecture and helps provide a highly available email service. The load balancer sends the message to either MTA Outbound Relay 1 or 2. The outbound relay reads the address and determines if the message is addressed to an external user or an internal user. If it is an external user, it sends the message to an MTA on the Internet. If it is an internal user, it sends it to MTA Inbound Relay 1 or 2 (or directly to the appropriate message store if so configured). The MTA Inbound Relay delivers the message to the appropriate Message Store. The Message Store receives the message and delivers it to the appropriate mailbox.

Sending Mail: External (Internet) User to an Internal User

Messages addressed from an external user (from the Internet) to an internal user go to either MTA Inbound Relay 1 or 2 (a load balancer is not required). The inbound relay reads the address and determines if the message is addressed to an external user (if Internet relaying is enabled) or an internal user. If it is an external user, the inbound relay sends the message to another MTA on the Internet. If it is an internal user, the inbound relay determines by using an LDAP lookup whether to send it to Message Store 1 or 2, and delivers accordingly. The appropriate Message Store receives the message and delivers it to the appropriate mailbox.

Understanding Horizontal and Vertical Scalability

Scalability is the capacity of your deployment to accommodate growth in the use of messaging services. Scalability determines how well your system can absorb rapid growth in user population. Scalability also determines how well your system can adapt to significant changes in user behavior, for example, when a large percentage of your users want to enable SSL within a month.

This section helps you identify the features you can add to your architecture to accommodate growth on individual servers and across servers. The following topics are covered:

Planning for Horizontal Scalability

Horizontal scalability refers to the ease with which you can add more servers to your architecture. As your user population expands or as user behavior changes, you eventually exhaust the resources of your existing architecture. Careful planning helps you to determine how to appropriately scale your architecture.

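If you horizontally scale your architecture, you distribute resources across several servers. There are two methods used for horizontal scalability: spreading your user base across several servers, and spreading your resources across redundant components.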

Spreading Your User Base Across Several Servers

Distributing load across servers means dividing clients' mail evenly across several back-end Message Stores. You can divide up users alphabetically, by their Class of Service, by their department, or by their physical location, and assign each group to a specific back-end Message Store host.
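As a minimal sketch of the alphabetical approach, the following Python maps user IDs to back-end Message Store hosts by the first letter of the user ID. The host names (store1.siroe.com and so on), the letter ranges, and the function name assign_store are hypothetical. In a real deployment, the assignment is recorded in each user's LDAP entry, which the MMP or MEM consults when routing connections, rather than being computed in code.

```python
import bisect

# Hypothetical alphabetical split: each entry is (first letter of range, host).
# User IDs starting with a-h go to store1, i-p to store2, q-z to store3.
RANGES = [
    ("a", "store1.siroe.com"),
    ("i", "store2.siroe.com"),
    ("q", "store3.siroe.com"),
]
_STARTS = [start for start, _ in RANGES]


def assign_store(uid: str) -> str:
    """Return the back-end Message Store host for a user ID."""
    first = uid.strip().lower()[:1]
    if not first.isalpha():
        # Empty or non-alphabetic IDs fall back to the first store.
        return RANGES[0][1]
    # Find the last range whose starting letter is <= the user's first letter.
    index = bisect.bisect_right(_STARTS, first) - 1
    return RANGES[index][1]


if __name__ == "__main__":
    for uid in ["alice", "Ivan", "zoe", "42ops"]:
        print(uid, "->", assign_store(uid))
```

Splitting by Class of Service, department, or location works the same way; only the key used to look up the host changes.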

The Messaging Multiplexor (MMP) is a multi-threaded server that handles incoming client connections for multiple servers. The MMP accepts POP or IMAP connections, performs LDAP lookups for authentication, and then routes connections to the appropriate messaging server. For HTTP connections, you might enable the Messenger Express Multiplexor (MEM) to handle incoming client connections for multiple servers. Communications Express also acts in a similar fashion.

Often, both the MMP and the Messenger Express Multiplexor are placed on the same machine for ease of manageability. Figure 3-4 shows a sample architecture where users are spread across multiple back-end servers and a multiplexor is enabled to handle incoming client connections.

Spreading users across back-end servers simplifies user management, as long as you use the MMP or the MEM. Because each user connects to a single back-end server, where that user's mail resides, you can standardize setup across all users. This configuration also makes multiple servers easier to administer. And, as the demand for more Messaging Server hosts increases, you can add more hosts seamlessly.

Spreading Your Resources Across Redundant Components

If email is a critical part of your organization’s day-to-day operations, redundant components, like load balancers, MX records, and relays might be necessary to ensure that the messaging system remains operational.

The following figure is an example of spreading resources across redundant MTA relays. The same set of components, such as the Internet relay, inbound MTA, and outbound MTA, are used in Figure 3-4, except that in this case, there are two of each deployed.

By using redundant MTA relays, you can ensure that if one relay fails, the other is still available. Spreading resources across redundant MTA relays also enables load sharing. For example, two Internet relays share the load that a single relay previously managed. This redundancy also provides fault tolerance to the Messaging Server system. Each MTA relay should be able to perform the function of the other MTA relays.

Installing redundant network connections to servers and mail relays also provides fault tolerance for network problems. The more critical your messaging deployment is to your organization, the more important it is for you to consider fault tolerance and redundancy.

MX Records

Equal priority MX records route messages to redundant Internet relays and inbound and outbound MTAs. For example, the sending MTA will find that the MX record for siroe.com corresponds to relayA.siroe.com and relayB.siroe.com. One of these relays is chosen at random, as they have equal priority, and an SMTP connection is opened. If the first relay chosen does not respond, the mail goes to the other relay. See the following MX record example:
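As an illustrative sketch (not the original example), the following Python models how a sending MTA uses such records: hosts with the lowest preference value are tried first, and hosts that share a preference value are tried in random order, which is the behavior RFC 5321 recommends. The preference value 10 and the names MX_RECORDS, delivery_order, and deliver are assumptions for illustration; real MTAs obtain the records from DNS.

```python
import random

# Hypothetical MX records for siroe.com as (preference, host) pairs.
# Both relays share the same (assumed) preference value of 10.
MX_RECORDS = [
    (10, "relayA.siroe.com"),
    (10, "relayB.siroe.com"),
]


def delivery_order(records, rng=random):
    """Return hosts in the order a sending MTA would try them.

    Lower preference values are tried first; hosts that share a
    preference value are shuffled so that load is spread between them.
    """
    by_pref = {}
    for pref, host in records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = by_pref[pref][:]
        rng.shuffle(hosts)
        order.extend(hosts)
    return order


def deliver(records, is_up, rng=random):
    """Try each relay in turn; return the first that responds, or None."""
    for host in delivery_order(records, rng):
        if is_up(host):
            return host
    return None


if __name__ == "__main__":
    # Simulate relayA being down: the mail still reaches relayB.
    print(deliver(MX_RECORDS, lambda host: host != "relayA.siroe.com"))
```

Because the shuffle happens per delivery attempt, many senders spread their connections across both relays, and a failed relay simply causes the next host in the list to be tried.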

Relays

When Messaging Server hosts are each supporting many users, and there is a heavy load of sending SMTP mail, offload the routing task from the Messaging Server hosts by using mail relays. You can further share the load by designating different relays to handle outgoing and incoming messages.

Often, both the inbound and outbound relays are combined as a single In/Out SMTP relay host. To determine if you need one or more relay hosts, identify the inbound and outbound message traffic characteristics of the overall architecture.

Load Balancers

Load balancing distributes the load across several servers so that no single server is overwhelmed. A load balancer takes requests from clients and redirects them to an available server, using algorithms that, for example, track each server's CPU and memory usage. Load balancers are available as software that runs on a common server, as a pure external hardware solution, or as a combined hardware and software package.
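A minimal sketch of a load-based selection policy follows: among the servers that are currently available, pick the one with the lowest combined CPU and memory usage. The Backend class, the equal weighting of CPU and memory, and the host names are assumptions for illustration; commercial load balancers offer several policies (round robin, least connections, and so on) and collect the metrics themselves.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    cpu: float       # fraction of CPU in use, 0.0 to 1.0
    memory: float    # fraction of memory in use, 0.0 to 1.0
    available: bool = True


def pick_server(backends):
    """Return the available back end with the lowest combined CPU and memory load."""
    candidates = [b for b in backends if b.available]
    if not candidates:
        raise RuntimeError("no back-end server is available")
    # Weight CPU and memory equally (an assumption; real products differ).
    return min(candidates, key=lambda b: b.cpu + b.memory)


if __name__ == "__main__":
    relays = [
        Backend("relay1", cpu=0.70, memory=0.40),
        Backend("relay2", cpu=0.20, memory=0.30),
        Backend("relay3", cpu=0.05, memory=0.10, available=False),
    ]
    print(pick_server(relays).name)
```

Skipping unavailable servers is what gives the load balancer its availability benefit: a failed relay is simply never chosen.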

Planning for Vertical Scalability

Vertical scalability pertains to adding resources to individual server machines, for example, adding CPUs. Each machine is scaled to handle a certain load. In general, you might choose vertical scalability for your deployment because you have resource limitations or are unable to purchase additional hardware as your deployment grows.

Planning for High Availability

High availability is a design for your deployment that operates with a small amount of planned and unplanned downtime. Typically, a highly available configuration is a cluster that is made up of two or more loosely coupled systems. Each system maintains its own processors, memory, and operating system. Storage is shared between the systems. Special software binds the systems together and allows them to provide fully automated recovery from a single point of failure. Messaging Server provides high-availability options that support both the Sun™ Cluster services and Veritas® clustering solutions.

When you create your high availability plan, you need to weigh availability against cost. Generally, the more highly available your deployment is, the more its design and operation will cost.

High availability is an insurance against the loss of data access due to application services outages or downtime. If application services become unavailable, an organization might suffer from loss of income, customers, and other opportunities. The value of high availability to an organization is directly related to the costs of downtime. The higher the cost of downtime, the easier it is to justify the additional expense of having high availability. In addition, your organization might have service level agreements guaranteeing a certain level of availability. Not meeting availability goals can have a direct financial impact.
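When weighing availability against cost, it helps to translate an availability target, such as one stated in a service level agreement, into a yearly downtime budget. The following Python sketch does that arithmetic; it assumes a 365-day year and the function name allowed_downtime_hours is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
```

For example, 99.9 percent availability allows roughly 8.8 hours of downtime per year, while 99.99 percent allows less than an hour, which is the kind of gap that usually separates a single server from a clustered configuration.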

Performance Considerations for a Messaging Server Architecture

This section describes how to evaluate the performance characteristics of Messaging Server components to accurately develop your architecture.

Message Store Performance Considerations

The preceding factors list the approximate order of impact to the Message Store. Most performance issues with the Message Store arise from insufficient disk I/O capacity. The way in which you lay out the store on the physical disks can also affect performance. For smaller standalone systems, a simple stripe of disks can provide sufficient I/O. For most larger systems, split the store across separate file systems so that each part of the store has its own I/O capacity.

Messaging Server Directories

Messaging Server uses five directories that receive a significant amount of input and output activity. Because these directories are accessed very frequently, you can increase performance by providing each of those directories with its own disk, or even better, providing each directory with a Redundant Array of Independent Disks (RAID). The following table describes these directories.

Table 3-1 High Access Messaging Server Directories
High I/O Directory	Description and Defining Parameter
MTA queue directory	This directory holds many files, one for each message that passes through the MTA channels. After a message is sent to its next destination, its file is deleted. The directory location is controlled by the IMTA_QUEUE option in the imta_tailor file. Before modifying the MTA queue directory, read about this option in the Sun Java System Messaging Server Administration Reference. Default location: msg_svr_base/data/imta/queue
Messaging Server log directory	This directory contains log files that are constantly appended with new logging messages. The number of changes depends on the logging level set. The directory location is controlled by the configutil parameter logfile.service.logdir, where service can be a log-generating component such as admin, default, http, imap, or pop. The location of the MTA log files can be changed with the IMTA_LOG option in the imta_tailor file. Default location: msg_svr_base/data/log
Mailbox database files	These files require constant updates as well as cache synchronization. Put this directory on your fastest disk volume. These files are always located in the msg_svr_base/data/store/mboxlist directory.
Message store index files	These files contain metadata about mailboxes, messages, and users. By default, these files are stored with the message files. The configutil parameter store.partition.partition_name.path, where partition_name is the name of the partition, controls the directory location. If you have the resources, put these files on your second-fastest disk volume. Default location: msg_svr_base/data/store/partition/primary
Message files	These files contain the messages, one file per message. Files are frequently created, never modified, and eventually deleted. By default, they are stored in the same directory as the message store index files. The location can be controlled with the configutil parameter store.partition.partition_name.messagepath, where partition_name is the name of the partition. Some sites might have a single message store partition called primary, specified by store.partition.primary.path. Large sites might have additional partitions, each specified with store.partition.partition_name.path. Default location: msg_svr_base/data/store/partition/primary
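To confirm that the high I/O directories in Table 3-1 are not competing for the same file system, you can compare the device IDs that the operating system reports for each directory. The following Python sketch assumes msg_svr_base is /opt/SUNWmsgsr (a hypothetical value; substitute your own) and uses only os.stat. Note that a shared device ID means a shared file system, while separate file systems can still share underlying spindles on the same array, so this check is necessary but not sufficient.

```python
import os
from collections import defaultdict

# Hypothetical installation root; substitute your actual msg_svr_base.
MSG_SVR_BASE = "/opt/SUNWmsgsr"
HIGH_IO_DIRS = {
    "MTA queue": f"{MSG_SVR_BASE}/data/imta/queue",
    "Logs": f"{MSG_SVR_BASE}/data/log",
    "mboxlist": f"{MSG_SVR_BASE}/data/store/mboxlist",
    "Primary partition": f"{MSG_SVR_BASE}/data/store/partition/primary",
}


def shared_devices(dirs):
    """Group directory labels by the file system (device ID) they reside on.

    Returns only the groups that contain more than one directory, that is,
    high I/O directories that share a file system.
    """
    by_device = defaultdict(list)
    for label, path in dirs.items():
        by_device[os.stat(path).st_dev].append(label)
    return [sorted(labels) for labels in by_device.values() if len(labels) > 1]


if __name__ == "__main__":
    existing = {k: v for k, v in HIGH_IO_DIRS.items() if os.path.isdir(v)}
    for group in shared_devices(existing):
        print("Sharing one file system:", ", ".join(group))
```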

The following sections provide more detail on Messaging Server high access directories.

MTA Queue Directories

In non-LMTP environments, the MTA queue directories are also heavily used. With LMTP, inbound messages are not put in MTA queues but are inserted directly into the store. This lessens the overall I/O requirements of the Message Store machines and greatly reduces use of the MTA queue directory on them. If the system is standalone or uses the local MTA for Webmail sends, significant I/O can still result on this directory for outbound mail traffic. In a proper two-tier environment using LMTP, this directory will be lightly used, if at all. In prior releases of Messaging Server, this directory set needed to be on its own stripe or volume on large systems.

Log Files Directory

The log files directory requires varying amounts of I/O depending on the level of logging that is enabled. The I/O on the logging directory, unlike all of the other high I/O requirements of the Message Store, is asynchronous. For typical deployment scenarios, do not dedicate an entire LUN for logging. For very large store deployments, or environments where significant logging is required, a dedicated LUN is in order.

In almost all environments, you need to protect the Message Store from loss of data. The level of loss and continuous availability that is necessary varies from simple disk protection such as RAID5, to mirroring, to routine backup, to real-time replication of data, to a remote data center. Data protection requirements also vary from Automatic System Recovery (ASR) capable machines, to local HA capabilities, to automated remote site failover. These decisions affect the amount of hardware and support staff required to provide service.

mboxlist Directory

The mboxlist directory is highly I/O intensive but not very large. It contains the Sleepycat (Berkeley DB) databases used by the Message Store, along with their transaction logs. Because of its high I/O activity, and because it cannot be split, place the mboxlist directory on its own stripe or volume in large deployments. This directory is also the most likely cause of a loss of vertical scalability, as many Message Store procedures access the Sleepycat databases. For highly active systems, it can be a bottleneck. Bottlenecks in the I/O performance of the mboxlist directory decrease not only the raw performance and response time of the store but also its vertical scalability. For systems that require fast recovery from backup, place this directory on Solid State Disks (SSD) or a high-performance caching array so that it can absorb the high write rate that a restore running alongside live service places on the file system.

Multiple Store Partitions

The Message Store supports multiple store partitions. Place each partition on its own stripe or volume. The number of partitions to put on a store is determined by several factors. The obvious factor is the I/O requirements of the peak load on the server. By adding file systems as additional store partitions, you increase the IOPS (I/O operations per second) available to the server for mail delivery and retrieval. In most environments, you will get more IOPS out of a larger number of smaller stripes or LUNs than out of a small number of larger stripes or LUNs.

With some disk arrays, it is possible to configure a set of arrays in two different ways. You can configure each array as a LUN and mount it as a file system. Or, you can configure each array as a LUN and stripe them on the server. Both are valid configurations. However, multiple store partitions (one per small array or a number of partitions on a large array striping sets of LUNs into server volumes) are easier to optimize and administer.

Raw performance, however, is usually not the overriding factor in deciding how many store partitions you want or need. In corporate environments, it is likely that you will need more space than IOPS. Again, it is possible to software stripe across LUNs and provide a single large store partition. However, multiple smaller partitions are generally easier to manage. The overriding factor of determining the appropriate number of store partitions is usually recovery time.

First, fsck can check multiple file systems in parallel during recovery from a power, hardware, or operating system failure. If you are using a journaling file system (highly recommended, and required for any HA platform), this factor is small.

Secondly, backup and recovery procedures can be run in parallel across multiple store partitions. This parallelization is limited by the vertical scalability of the mboxlist directory as the Message Store uses a single set of databases for all of the store partitions. Store cleanup procedures (expire and purge) run in parallel with one thread of execution per store partition.

Lastly, re-mirror or RAID re-sync procedures are faster with smaller LUNs. There are no hard and fast rules here, but the general recommendation in most cases is that a store partition should not encompass more than 10 spindles.

The size of drive to use in a storage array is a question of the IOPS requirements versus the space requirements. For most residential ISP POP environments, use “smaller drives.” Corporate deployments with large quotas should use “larger” drives. (By way of comparison, a small drive in a Sun disk array would be 36 GB, a large drive would be 73 GB or greater.) Again, every deployment is different and needs to examine its own set of requirements.
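The trade-off between IOPS and space can be estimated with simple arithmetic: the number of spindles must cover whichever requirement is larger, and the spindles are then grouped into partitions of no more than 10 spindles each. The following Python sketch assumes illustrative values for per-drive IOPS and deliberately ignores RAID and mirroring overhead; the function name plan_store is hypothetical.

```python
import math


def plan_store(peak_iops, space_gb, iops_per_drive, drive_gb,
               max_spindles_per_partition=10):
    """Estimate spindles and store partitions (ignores RAID/mirroring overhead).

    peak_iops:       peak I/O operations per second the store must sustain
    space_gb:        total store space required, in GB
    iops_per_drive:  assumed sustainable IOPS of one drive
    drive_gb:        capacity of one drive, in GB
    """
    if min(peak_iops, space_gb, iops_per_drive, drive_gb) <= 0:
        raise ValueError("all inputs must be positive")
    for_iops = math.ceil(peak_iops / iops_per_drive)
    for_space = math.ceil(space_gb / drive_gb)
    spindles = max(for_iops, for_space)
    partitions = math.ceil(spindles / max_spindles_per_partition)
    return spindles, partitions


if __name__ == "__main__":
    # Same workload, small drives versus large drives (assumed 100 IOPS each).
    print(plan_store(2000, 1500, iops_per_drive=100, drive_gb=36))
    print(plan_store(2000, 1500, iops_per_drive=100, drive_gb=73))
```

With these assumed numbers, space rather than IOPS drives the spindle count, which matches the observation that corporate deployments with large quotas usually need more space than IOPS and can use larger drives.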

Message Store Scalability

The Message Store scales well because of its multiprocess, multithreaded nature. The Message Store actually scales more than linearly from one to four processors, meaning that a four-processor system will handle more load than a set of four single-processor systems. The Message Store also scales fairly linearly from four to 12 processors. From 12 to 16 processors, capacity increases, but not linearly. The vertical scalability of a Message Store is more limited with the use of LMTP, although the number of users that can be supported on the same size store system increases dramatically.

MTA Performance Considerations

MTA performance is affected by a number of factors including, but not limited to:

The MTA router is both CPU and I/O intensive. The MTA relies on two directories that can be placed on separate file systems: the queue directory and the logging directory. For a small host (four processors or fewer) functioning as an MTA router, you do not need to separate these directories onto different file systems. The queue directory is written to synchronously with fairly large writes. The logging directory receives a series of smaller asynchronous and sequential writes.

In most cases, plan for redundancy in the MTA's disk subsystem to avoid permanent loss of mail in the event of a spindle failure. (A spindle failure is by far the most likely hardware failure.) Therefore, either an external disk array or a system with many internal spindles is optimal.

MTA RAID Trade-offs

There are trade-offs between using external hardware RAID controller devices and using JBOD arrays with software mirroring. The JBOD approach is sometimes less expensive in terms of hardware purchase but always requires more rack space and power. The JBOD approach also marginally decreases server performance, because of the cost of doing the mirroring in software, and usually implies a higher maintenance cost. Software RAID5 has such an impact on performance that it is not a viable alternative. For these reasons, use RAID5 caching controller arrays if RAID5 is preferred.

MTA Scalability

The MTA router does scale linearly beyond eight processors, and like the Message Store, more than linearly from one processor to four.

MTA and High Availability

It is rarely advisable to put the MTA router under HA control, but there are exceptional circumstances where this is warranted. If you have a requirement that mail delivery happens in a short, specified time frame, even in the event of hardware failure, then the MTA must be put under HA software control. In most environments, simply increase the number of MTAs that are available by one or more over the peak load requirement. This ensures that proper traffic flow can occur even with a single MTA failure, or in very large environments, when multiple MTA routers are offline for some reason.
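The "one or more over the peak load requirement" rule can be expressed as a short calculation: divide the peak load by the capacity of one MTA, round up, and add the number of spare hosts. The following Python sketch uses messages per hour as the load unit and hypothetical throughput figures; measure the actual per-MTA capacity in your own environment.

```python
import math


def mta_count(peak_msgs_per_hour, msgs_per_hour_per_mta, spares=1):
    """MTA routers needed to carry peak load with `spares` extra hosts."""
    if msgs_per_hour_per_mta <= 0:
        raise ValueError("per-MTA capacity must be positive")
    return math.ceil(peak_msgs_per_hour / msgs_per_hour_per_mta) + spares


if __name__ == "__main__":
    # Hypothetical figures: 250,000 messages/hour at peak, 100,000 per MTA.
    print(mta_count(250_000, 100_000))            # tolerate one MTA failure
    print(mta_count(250_000, 100_000, spares=2))  # very large environment
```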

In addition, with respect to placement of MTAs, you should always deploy the MTA at your firewall.

Messaging Multiplexor (MMP) Performance Considerations

The MMP uses no disk I/O other than for logging; it is entirely CPU and network bound. Unlike the other Messaging Server components, the MMP is not multiprocess: its primary execution code runs as a single multithreaded process. Because it cannot spread its work across multiple processes, the MMP does not scale as well as the other components.

The MMP does not scale beyond four processors, and it scales less than linearly from two to four processors. Two-processor, rack-mounted machines are good candidates for MMPs.

In deployments where you choose to put other component software on the same machine as the MMP (MEM, Calendar Server front end, Communications Express Web Client, LDAP proxy, and so on), consider deploying a larger, four-processor SPARC machine. Such a configuration reduces the total number of machines that need to be managed, patched, monitored, and so forth.

MMP sizing is affected by connection rates and transaction rates. POP sizing is fairly straightforward, because POP connections are rarely idle: they connect, do some work, and disconnect. IMAP sizing is more complex, because you need to understand the login rate, the concurrency rate, and the way in which the connections are busy. The MMP is also somewhat affected by connection latency and bandwidth. Because the MMP acts as a buffer for data coming from the Message Store to the client, it handles fewer concurrent users in a dial-up environment than in a broadband environment.
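One way to see why IMAP sizing is harder than POP sizing is Little's law, a general queueing result: the average number of open sessions equals the session arrival rate multiplied by the average session duration. The following Python sketch applies it with hypothetical rates and durations; the function name is illustrative and the numbers are not measurements.

```python
def concurrent_sessions(arrivals_per_second, mean_session_seconds):
    """Little's law: average open sessions = arrival rate x mean duration."""
    if arrivals_per_second < 0 or mean_session_seconds < 0:
        raise ValueError("rates and durations must be non-negative")
    return arrivals_per_second * mean_session_seconds


if __name__ == "__main__":
    # Hypothetical POP load: many short sessions.
    print("POP:", concurrent_sessions(50, 5))
    # Hypothetical IMAP load: fewer logins, but sessions stay open for 30 minutes.
    print("IMAP:", concurrent_sessions(10, 1800))
    # Slower links stretch each session, raising concurrency at the same login rate.
    print("IMAP over dial-up:", concurrent_sessions(10, 2700))
```

The POP case produces a small, stable number of concurrent connections, while the IMAP case is dominated by long-lived sessions whose duration, and therefore the concurrency the MMP must hold, depends on user behavior and link speed.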

If you use SSL in a significant percentage of connections, install a hardware accelerator.

MMP and High Availability

Never deploy the MMP under HA control. An individual MMP has no static data. In a highly available environment, add one or more additional MMP machines so that if one or more are down there is still sufficient capacity for the peak load. If you are using Sun Fire Blade™ Server hardware, take into account the possibility that an entire Blade rack unit can go down and plan for the appropriate redundancy.

Messenger Express Multiplexor (MEM) Performance Considerations

The MEM provides a middle-tier proxy for the Webmail client. This client enables users to access mail and to compose messages through a browser. The first benefit of the MEM is that end users connect only to the MEM to access email, regardless of which back-end server stores their mail. The MEM accomplishes this by managing the HTTP session information and user profiles through the user's LDAP information. The second benefit is that all static files and LDAP authentication states are located on the Messaging Server front end. This benefit offsets some of the additional CPU requirements associated with web page rendering from the Message Store back end.

The MEM has many of the same characteristics as the MMP. The MEM will scale beyond four processors, but in most environments there is no particular value in doing so. Also, in the future, the Webmail component will be offloaded from the Message Store onto access-layer machines that run the XML rendering as Java servlets under the web server. Java servlets do not presently scale well beyond two processors. Thus, plan your hardware choice around either SPARC or Intel two-processor machines for the MEM, or plan to repurpose your current MEM hardware and replace it with smaller machines when the next-generation solution becomes available.

You can put the MMP and MEM on the same set of servers. The advantage of doing so is that if only a small number of MMPs or MEMs is required, the amount of extra hardware for redundancy is minimized. The only possible downside to co-locating the MMP and MEM on the same set of servers is that a denial of service attack on one protocol can affect the other.

Setting Disk Stripe Width

When setting up disk striping, make the stripe width about the same size as the average message passing through your system. A stripe width of 128 blocks (64 kilobytes) is usually too large and has a negative performance impact. Instead, use values of 8, 16, or 32 blocks, which suit average message sizes of 4, 8, or 16 kilobytes, respectively.
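The guideline above can be turned into a small helper that picks the recommended stripe width closest to a measured average message size. The following Python sketch assumes 512-byte blocks, which is consistent with 8 blocks equaling 4 kilobytes; the function name is hypothetical.

```python
BLOCK_BYTES = 512               # assumed disk block size
CANDIDATE_BLOCKS = (8, 16, 32)  # stripe widths recommended in the text


def stripe_width_blocks(avg_message_bytes):
    """Pick the recommended stripe width closest to the average message size.

    Ties go to the smaller width, because min() returns the first match.
    """
    if avg_message_bytes <= 0:
        raise ValueError("average message size must be positive")
    target_blocks = avg_message_bytes / BLOCK_BYTES
    return min(CANDIDATE_BLOCKS, key=lambda blocks: abs(blocks - target_blocks))


if __name__ == "__main__":
    for size in (4_096, 10_000, 16_384, 100_000):
        print(f"{size} bytes -> {stripe_width_blocks(size)} blocks")
```

Averages far above 16 kilobytes still map to 32 blocks, reflecting the advice to stay well below a 128-block stripe.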

Setting the Mailbox Database Cache Size

Messaging Server makes frequent calls to the mailbox database. For this reason, it helps if this data is returned as quickly as possible. A portion of the mailbox database is cached to improve Message Store performance. Setting the optimal cache size can make a big difference in overall Message Store performance. You set the size of the cache with the configutil parameter store.dbcachesize.

The mailbox database is stored in data pages. When the various daemons (stored, imapd, popd) make calls to the database, the system checks whether the desired page is stored in the cache. If it is, the data is passed to the daemon. If not, the system must write one page from the cache back to disk, then read the desired page from disk into the cache. Lowering the number of disk reads and writes helps performance, so setting the cache to its optimal size is important.

If the cache is too small, the desired data will have to be retrieved from disk more frequently than necessary. If the cache is too large, dynamic memory (RAM) is wasted, and it takes longer to synchronize the disk to the cache. Of these two situations, a cache that is too small will degrade performance more than a cache that is too large.

Cache efficiency is measured by hit rate: the percentage of database calls that can be handled by the cache. An optimally sized cache will have a 99 percent hit rate (that is, 99 percent of the desired database pages will be returned to the daemon without having to read pages from disk). At a minimum, size the cache so that it returns at least 95 percent of the requested pages. If the hit rate is less than 95 percent, increase the cache size.
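The read path described above behaves like a page cache with an eviction policy, so the relationship between cache size and hit rate can be explored with a small simulation. The following Python sketch assumes least-recently-used (LRU) eviction and a synthetic workload purely for illustration; it does not model the Sleepycat database's actual replacement policy, page size, or the write-back of modified pages.

```python
import random
from collections import OrderedDict


def hit_rate(page_requests, cache_pages):
    """Simulate an LRU page cache and return the fraction of requests served from cache."""
    if cache_pages <= 0:
        raise ValueError("cache must hold at least one page")
    cache = OrderedDict()
    hits = 0
    for page in page_requests:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            if len(cache) >= cache_pages:
                # Evict the least recently used page. A real cache would first
                # write the page back to disk if it had been modified.
                cache.popitem(last=False)
            cache[page] = True  # read the requested page into the cache
    return hits / len(page_requests) if page_requests else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic workload: 90 percent of requests go to 100 "hot" pages,
    # the remaining 10 percent are spread over 10,000 "cold" pages.
    workload = [
        rng.randrange(100) if rng.random() < 0.9 else 100 + rng.randrange(10_000)
        for _ in range(50_000)
    ]
    for size in (50, 100, 200, 1_000):
        print(f"{size:>5} pages: {hit_rate(workload, size):.1%} hit rate")
```

Running the loop over increasing cache sizes shows the typical shape: the hit rate climbs quickly until the frequently used pages fit, then flattens, so further memory buys little. That flattening is why a cache that is too large still reports a high hit rate.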

The Sleepycat database command db_stat can be used to measure the cache hit rate. For example:

2MB 513KB 604B  Total cache size.
1               Number of caches.
2MB 520KB       Pool individual cache size.
0               Requested pages mapped into the process’ address space.
55339           Requested pages found in the cache (99%).

In this case, the hit rate is 99 percent. This could be optimal or, more likely, it could be that the cache is too large. (A cache that is too large will always show 99 percent.) The way to test this is to lower the cache size until the hit rate moves to below 99 percent. When you hit 98 percent, you have optimized the DB cache size. Conversely, if db_stat shows a hit rate of less than 95 percent, then you should increase the cache size with store.dbcachesize.
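The rule of thumb above can be scripted around the db_stat output. The following Python sketch extracts the percentage from the "Requested pages found in the cache" line, whose format is taken from the sample output in this section (an assumption: the exact wording may differ between Berkeley DB versions), and maps it to the recommended action on store.dbcachesize. The function names are hypothetical.

```python
import re

# Matches a db_stat line such as:
#   55339           Requested pages found in the cache (99%).
FOUND_LINE = re.compile(r"Requested pages found in the cache \((\d+)%\)")


def cache_hit_percent(db_stat_output: str) -> int:
    """Extract the cache hit percentage from db_stat output."""
    match = FOUND_LINE.search(db_stat_output)
    if match is None:
        raise ValueError("hit-rate line not found in db_stat output")
    return int(match.group(1))


def tuning_advice(hit_percent: int) -> str:
    """Apply the sizing rule of thumb to a measured hit rate."""
    if hit_percent < 95:
        return "increase store.dbcachesize"
    if hit_percent >= 99:
        return "try lowering store.dbcachesize until the hit rate drops to 98 percent"
    return "cache size is acceptable"


if __name__ == "__main__":
    sample = "55339           Requested pages found in the cache (99%)."
    percent = cache_hit_percent(sample)
    print(percent, "->", tuning_advice(percent))
```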

Sun Java System Messaging Server 6 2004Q2 Deployment Planning Guide


Note	As your user base changes, the hit rate can also change. Periodically check and adjust this parameter as necessary. This parameter has an upper limit of 2 GB imposed by the Sleepycat database.