Sun Java System Messaging Server 6.3 Administration Guide

3.2 High Availability Models

Messaging Server can be deployed with several high availability models. Three of the more basic ones are asymmetric, symmetric, and N + 1 (N over 1).

Each of these models is described in greater detail in the following subsections.

Note that not every HA product supports every model. Refer to the documentation for your HA product to determine which models it supports.

3.2.1 Asymmetric

The basic asymmetric or hot standby high availability model consists of two clustered host machines or nodes. A single logical IP address and its associated host name are assigned to both nodes.

In this model, only one node is active at any given time; the backup or hot standby node remains idle most of the time. A single shared disk array between both nodes is configured and is mastered by the active or primary node. The message store partitions and Mail Transport Agent (MTA) queues reside on this shared volume.

Figure 3–1 Asymmetric High Availability Model


The preceding figure shows two physical nodes, Physical-A and Physical-B. Before failover, the active node is Physical-A. Upon failover, Physical-B becomes the active node and the shared volume is switched so that it is mastered by Physical-B. All services are stopped on Physical-A and started on Physical-B.

The advantage of this model is that the backup node is dedicated and completely reserved for the primary node. Additionally, there is no resource contention on the backup node when a failover occurs. However, the backup node stays idle most of the time, so this resource is underutilized.
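
The failover sequence described for the figure can be sketched as a small state model. The following Python is an illustration only and is not part of Messaging Server or any HA product: the class name `AsymmetricCluster` and its attributes are hypothetical, and only the node names Physical-A and Physical-B come from the figure description.

```python
class AsymmetricCluster:
    """Toy model of a two-node hot standby cluster (illustrative only)."""

    def __init__(self, primary="Physical-A", standby="Physical-B"):
        self.active = primary          # node currently providing service
        self.standby = standby         # idle backup node
        self.volume_master = primary   # node that masters the shared volume
        self.services = {primary: "running", standby: "stopped"}

    def failover(self):
        """Move service from the active node to the standby node."""
        failed, takeover = self.active, self.standby
        self.services[failed] = "stopped"     # stop all services on the failed node
        self.volume_master = takeover         # switch the shared volume
        self.services[takeover] = "running"   # start services on the new active node
        self.active, self.standby = takeover, failed


cluster = AsymmetricCluster()
cluster.failover()
print(cluster.active, cluster.volume_master, cluster.services)
```

The sketch keeps exactly one node in the "running" state at any time, which is the defining property of the asymmetric model.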

3.2.2 Symmetric

The basic symmetric or "dual services" high availability model consists of two hosting machines, each with its own logical IP address. Each logical node is associated with one physical node, and each physical node controls one disk array with two storage volumes. One volume is used for its local message store partitions and MTA queues, and the other is a mirror image of its partner's message store partitions and MTA queues.

The following figure shows the symmetric high availability model. Both nodes are active concurrently, and each node serves as a backup node for the other. Under normal conditions, each node runs only one instance of Messaging Server.

Figure 3–2 Symmetric High Availability Model

Upon failover, the services on the failing node are shut down and restarted on the backup node. At this point, the backup node is running Messaging Server for both nodes and is managing two separate volumes.
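
As a rough sketch of this behavior, the following Python models which Messaging Server instances each node runs before and after a failover. The function name `symmetric_failover`, the node names, and the instance labels are hypothetical and chosen only for illustration; a real HA product performs these steps itself.

```python
def symmetric_failover(placement, failed):
    """Return a new placement with the failed node's instances moved to its partner.

    placement maps each of the two node names to the list of instances it runs.
    """
    if len(placement) != 2 or failed not in placement:
        raise ValueError("expected exactly two nodes, one of which is the failed node")
    partner = next(node for node in placement if node != failed)
    # Copy the lists so the caller's placement is left unchanged.
    result = {node: list(instances) for node, instances in placement.items()}
    result[partner].extend(result[failed])   # restart the services on the backup node
    result[failed] = []                      # services on the failing node are shut down
    return result


normal = {"node-1": ["ms-instance-1"], "node-2": ["ms-instance-2"]}
print(symmetric_failover(normal, "node-1"))
```

After the call, the surviving node holds both instances, which is the source of the resource contention that this model accepts during a failure.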

The advantage of this model is that both nodes are active simultaneously, thus fully utilizing machine resources. However, during a failure, the backup node will have more resource contention as it runs services for Messaging Server from both nodes. Therefore, you should repair the failed node as quickly as possible and switch the servers back to their dual services state.

This model also provides a backup storage array. In the event of a disk array failure, its redundant image can be picked up by the service on its backup node.

To configure a symmetric model, you need to install shared binaries on your shared disk. Note that doing so might prevent you from performing rolling upgrades, a feature that enables you to update your system during Messaging Server patch releases. (This feature is planned for future releases.)

3.2.3 N+1 (N Over 1)

The N + 1 or "N over 1" model operates in a multi-node asymmetrical configuration. N logical host names and N shared disk arrays are required. A single backup node is reserved as a hot standby for all the other nodes. The backup node is capable of concurrently running Messaging Server from the N nodes.

The figure below illustrates the basic N + 1 high availability model.

Figure 3–3 N + 1 High Availability Model

Upon failover of one or more active nodes, the backup node picks up the failing node's responsibilities.

The advantages of the N + 1 model are that the server load can be distributed to multiple nodes and that only one backup node is necessary to sustain all the possible node failures. Thus, the machine idle ratio is 1/N as opposed to 1/1, as is the case in a single asymmetric model.
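
The idle ratio comparison can be checked with a few lines of Python. This sketch assumes that "machine idle ratio" means the number of standby nodes divided by the number of active nodes, which is how the 1/N and 1/1 figures appear to be derived; the function name `idle_ratio` is hypothetical.

```python
from fractions import Fraction


def idle_ratio(active_nodes, standby_nodes=1):
    """Ratio of idle standby nodes to active nodes (assumed definition)."""
    if active_nodes < 1:
        raise ValueError("at least one active node is required")
    return Fraction(standby_nodes, active_nodes)


print(idle_ratio(1))   # single asymmetric pair: 1
print(idle_ratio(4))   # N + 1 with N = 4: 1/4
```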

To configure an N + 1 model, you need to install binaries only on the local disks (that is, not on shared disks as with the symmetric model). However, the current Messaging Server installation and setup process forces you to put the binaries on the shared disk for any symmetric, 1+1, or N+1 asymmetrical or symmetrical HA solution.

3.2.4 Choosing a High Availability Model

The following table summarizes the advantages and disadvantages of each high availability model. Use this information to help you determine which model is right for your deployment.

Table 3–1 HA Model Comparison


Model	Advantages	Disadvantages	Recommended Users
Asymmetric	Simple configuration; backup node is 100 percent reserved	Machine resources are not fully utilized	A small service provider with plans to expand in the future
Symmetric	Better use of system resources; higher availability	Resource contention on the backup node; HA requires fully redundant disks	A small corporate deployment that can accept performance penalties in the event of a single server failure
N + 1	Load distribution; easy expansion	Management and configuration complexity	A large service provider who requires distribution with no resource constraints

3.2.5 System Down Time Calculations

The following table shows the probability that the messaging service will be unavailable on any given day because of system failure. These calculations assume that, on average, each server goes down for one day every three months (four days per year) because of either a system crash or a server hang, and that each storage device goes down for one day every 12 months. The calculations also ignore the small probability of both nodes being down simultaneously.

Table 3–2 HA Down Probabilities


Model	Server Down Time Probability
Single server (no high availability)	Pr(down) = (4 days of system down + 1 day of storage down)/365 = 1.37%
Asymmetric	Pr(down) = (0 days of system down + 1 day of storage down)/365 = 0.27%
Symmetric	Pr(down) = (0 days of system down + 0 days of storage down)/365 = (near 0)
N + 1 Asymmetric	Pr(down) = (5 hours of system down + 1 day of storage down)/(365 × N) = 0.27%/N
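
The arithmetic in the table can be reproduced with a short Python sketch. It uses only the assumptions stated above (4 server-down days and 1 storage-down day per 365-day year); the function name `pr_down` and the sample value N = 4 are hypothetical.

```python
DAYS_PER_YEAR = 365


def pr_down(system_days, storage_days, n=1):
    """Fraction of days on which the service is down, per the table's formula."""
    return (system_days + storage_days) / (DAYS_PER_YEAR * n)


n = 4  # hypothetical number of active nodes for the N + 1 row

rows = [
    ("Single server (no high availability)", pr_down(4, 1)),
    ("Asymmetric", pr_down(0, 1)),
    ("Symmetric", pr_down(0, 0)),
    ("N + 1, 5-hour term treated as negligible", pr_down(0, 1, n)),
    ("N + 1, 5-hour term included", pr_down(5 / 24, 1, n)),
]

for label, probability in rows:
    print(f"{label:<45} {probability:.2%}")
```

Note that the 0.27%/N figure in the N + 1 row corresponds to the storage term alone. If the 5 hours of system down time are included, the numerator is about 1.21 days, which gives roughly 0.33%/N.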