CHAPTER 12

SIMS Asymmetric High Availability System




A major advantage of Sun™ Internet Mail Server™ 4.0 over competing products is its superior scalability, which allows a large number of users to be hosted on a single server. Although this offers an excellent price/performance ratio, it also creates a single point of failure: one failing machine can interrupt email access for an entire user community.

To improve reliability, SIMS 4.0 can be installed on a Sun Enterprise Cluster running the Sun Cluster 2.2 software, which provides automatic fail-over when a failure occurs.

See http://www.sun.com/clusters for information on Sun Cluster.

Topics in this chapter include:

Other High Availability references in this guide
SIMS asymmetric High Availability configuration
Fail over procedure


Other HA References in this Guide

TABLE 12-1 lists other chapters in this guide that contain information on SIMS/HA. After you read this chapter and are familiar with the design of SIMS/HA, you can proceed to Chapter 7, "Installing SIMS 4.0," for instructions on installing the SIMS High Availability system.

TABLE  12-1   Chapters Containing High Availability Information
Chapter Number, Name            To Learn About...
4, Preparing to Install SIMS    How to prepare a clean system; lists of required and recommended patches.
5, Preparing to Configure       Software requirements for the SIMS High Availability system.
6, What SIMS 4.0 Installs       Lists of SIMS 4.0 packages.
7, Installing SIMS 4.0          Instructions for installing the SIMS High Availability system.
9, Post Installation Tasks      Instructions for starting the Administration Console.


SIMS Asymmetric HA Configuration

FIGURE 12-1 illustrates SIMS running on a cluster. Each node in the cluster is a complete Solaris system with its own private disk that contains the operating system and the Sun Cluster software.

FIGURE  12-1 SIMS Asymmetric HA Configuration before Fail Over

Since each node has at least one network interface connected to the public network, users can connect through this interface to read their mail messages. Each node has at least two additional private network interfaces, which connect to corresponding private network interfaces on the other members of the cluster. These are used by the Sun Cluster software on each node for system status monitoring and cluster configuration data sharing. Only one pair of private network interfaces is in use at any given time; the other is a redundant interface so that there is not a single point of failure.

Each node has a connection to the disk cluster that contains the message store, message queues, directory contents, configuration files, and SIMS binaries. While both nodes are connected to the disk cluster continuously, the volumes in the disk cluster are mounted on only one of the nodes at any given time. FIGURE 12-1 and subsequent figures in this chapter show this disk cluster as a single logical volume.

In SIMS 4.0, both the Veritas and the DiskSuite volume managers are supported for Sun Cluster. The volume manager allows a logical volume to be mirrored across multiple physical volumes, providing uninterrupted service even if a physical disk fails.
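The idea behind mirroring can be shown with a toy model. This is a minimal sketch only: the class `MirroredVolume` and its methods are hypothetical and do not reflect the Veritas or DiskSuite interfaces. Every write goes to all healthy copies (plexes), and a read is served by any surviving copy, so a single disk failure does not interrupt service.

```python
class MirroredVolume:
    """Toy RAID-1 volume: each plex holds a full copy of every block."""

    def __init__(self, copies: int = 2):
        self.plexes = [{} for _ in range(copies)]  # block number -> bytes
        self.failed = set()                         # indices of failed plexes

    def fail(self, index: int) -> None:
        self.failed.add(index)

    def _healthy(self) -> list:
        healthy = [p for i, p in enumerate(self.plexes) if i not in self.failed]
        if not healthy:
            raise OSError("all mirrors have failed")
        return healthy

    def write(self, block: int, data: bytes) -> None:
        for plex in self._healthy():  # every surviving copy is updated
            plex[block] = data

    def read(self, block: int) -> bytes:
        return self._healthy()[0][block]  # any surviving copy will do
```

A real volume manager also resynchronizes a repaired disk from its surviving mirror; the sketch leaves that out.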

As shown in FIGURE 12-1, only one of the nodes in the cluster runs SIMS at any time. This represents an Asymmetric High Availability (HA) configuration. In this configuration, all the SIMS binaries, configuration files, message queues, and message store reside on a shared disk. As a result, when a fail-over occurs the disk is unmounted from the failing system and mounted on the surviving system. FIGURE 12-2 shows the cluster after a fail-over.

After the fail-over, the logical IP address is configured on the public network interface of the surviving system. Users and mail agents on the public network always connect using the logical IP address, so reconnecting after a failure automatically connects them to the surviving system. To users, a fail-over therefore appears to be a very quick crash and reboot of a single system.

FIGURE  12-2 SIMS Asymmetric HA Configuration after Fail Over
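The asymmetric fail-over shown in FIGURE 12-1 and FIGURE 12-2 can be modeled as moving two things together: the mounted shared volume and the logical IP address. The following is a minimal sketch under that assumption; `Node`, `Cluster`, and `LOGICAL_IP` are hypothetical names, and the code calls no Sun Cluster command. It also assumes an orderly takeover in which the old node can release its resources.

```python
from dataclasses import dataclass, field

LOGICAL_IP = "192.0.2.10"  # hypothetical logical IP (documentation address range)


@dataclass
class Node:
    name: str
    mounted: set = field(default_factory=set)    # shared volumes mounted here
    addresses: set = field(default_factory=set)  # IPs on the public interface
    running_sims: bool = False


class Cluster:
    """Two-node asymmetric cluster: only one node serves SIMS at a time."""

    def __init__(self, primary: Node, standby: Node, volume: str):
        self.nodes = [primary, standby]
        self.volume = volume
        self._bring_up(primary)

    def _bring_up(self, node: Node) -> None:
        # Mount the shared volume, configure the logical IP, start SIMS.
        node.mounted.add(self.volume)
        node.addresses.add(LOGICAL_IP)
        node.running_sims = True

    def _take_down(self, node: Node) -> None:
        # Stop SIMS, remove the logical IP, unmount the shared volume.
        node.running_sims = False
        node.addresses.discard(LOGICAL_IP)
        node.mounted.discard(self.volume)

    def active(self) -> Node:
        # Exactly one node owns the logical IP, so clients always reach it.
        owners = [n for n in self.nodes if LOGICAL_IP in n.addresses]
        if len(owners) != 1:
            raise RuntimeError(f"expected one owner of {LOGICAL_IP}, found {len(owners)}")
        return owners[0]

    def fail_over(self) -> Node:
        old = self.active()
        new = next(n for n in self.nodes if n is not old)
        self._take_down(old)
        self._bring_up(new)
        return new
```

After `Cluster(Node("node-a"), Node("node-b"), "sims-vol")`, calling `fail_over()` moves both the volume and `LOGICAL_IP` to node-b, so a client that reconnects to the same address reaches node-b without any change on its side.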

In the SIMS Asymmetric HA configuration, the other node in the cluster is idle as far as SIMS is concerned. It remains a fully functional Solaris system, however, and can be used for other work, provided that procedures are in place to terminate or limit that work after a fail-over.

In this Asymmetric HA configuration, performance does not suffer after a fail-over as long as the two nodes in the cluster have similar CPU speed and memory size (recommended). The trade-off is that the standby node is unused much of the time.


Fail Over Procedure

The fail over procedure in the SIMS High Availability environment involves the following:

Linking SIMS directories to shared disks between cluster nodes
Linking Sun DS directories to shared disks between cluster nodes
Replacing stop and start scripts with HA scripts

Linking SIMS Directories to Shared Disks

The SIMS/HA configuration does not change the basic SIMS architecture; during normal operation, SIMS/HA functions and performs like a standard SIMS installation. However, the SIMS binaries, configuration files, IMTA spools, message store, and logs are all placed on disks shared between the cluster nodes. The installation process plants symbolic links from the locations listed in TABLE 12-2 to the corresponding places on the shared disk.

TABLE  12-2   SIMS Directories on Shared Disk between Cluster Nodes
Install Creates Links from...    To place on the shared disk
/opt/SUNWmail                    /shared-disk-BASEDIR/opt/SUNWmail
/etc/opt/SUNWmail                /shared-disk-BASEDIR/etc/opt/SUNWmail
/var/opt/SUNWmail                /shared-disk-BASEDIR/var/opt/SUNWmail
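The link-planting step can be sketched as follows. This is a hypothetical illustration, not the SIMS install script: `plant_links` and its parameters are invented names, `shared_base` stands in for shared-disk-BASEDIR, and `root` lets you try it in a scratch directory instead of "/". An existing local directory is moved to the shared disk before the link is created.

```python
import os
import shutil

# Relative paths from TABLE 12-2 (without the leading "/").
SIMS_DIRS = ["opt/SUNWmail", "etc/opt/SUNWmail", "var/opt/SUNWmail"]


def plant_links(root, shared_base, rel_dirs=SIMS_DIRS):
    """Link root/<rel> to shared_base/<rel>; return {link: target} for new links."""
    planted = {}
    for rel in rel_dirs:
        local = os.path.join(root, rel)
        target = os.path.join(shared_base, rel)
        if os.path.islink(local):
            continue  # already planted on an earlier run
        os.makedirs(os.path.dirname(target), exist_ok=True)
        if os.path.isdir(local):
            if os.path.exists(target):
                raise FileExistsError(f"both {local} and {target} exist")
            shutil.move(local, target)  # relocate existing contents
        else:
            os.makedirs(target, exist_ok=True)
        os.makedirs(os.path.dirname(local), exist_ok=True)
        os.symlink(target, local)
        planted[local] = target
    return planted
```

Running it a second time creates nothing, because each original path is already a symbolic link.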


Linking Sun Directory Services to Shared Disks

The Sun Directory Services 3.1 (SunDS) binaries and data are also normally installed on the shared disk. Because the SunDS packages are not completely relocatable, the SIMS install script accomplishes this by planting symbolic links from the following locations to the shared disk, as shown in TABLE 12-3.

TABLE  12-3   Sun DS Directories Mapped to Shared Disk between Cluster Nodes
Install Creates Links from...    To place on the shared disk
/opt/SUNWconn                    /shared-disk-BASEDIR/opt/SUNWconn
/etc/opt/SUNWconn                /shared-disk-BASEDIR/etc/opt/SUNWconn
/usr/opt/SUNWconn                /shared-disk-BASEDIR/usr/opt/SUNWconn
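Because these links point into the shared disk, they dangle on whichever node does not currently have it mounted. A small check like the one below can confirm that each link exists and resolves where it should. It is a hypothetical sketch: `check_links` is an invented helper, `shared_base` stands in for shared-disk-BASEDIR, and `root` allows testing on a scratch tree.

```python
import os

# Relative paths from TABLE 12-3 (without the leading "/").
SUNDS_DIRS = ["opt/SUNWconn", "etc/opt/SUNWconn", "usr/opt/SUNWconn"]


def check_links(root, shared_base, rel_dirs=SUNDS_DIRS):
    """Return a list of problems; an empty list means every link is correct."""
    problems = []
    for rel in rel_dirs:
        local = os.path.join(root, rel)
        expected = os.path.realpath(os.path.join(shared_base, rel))
        if not os.path.islink(local):
            problems.append(f"{local}: not a symbolic link")
        elif os.path.realpath(local) != expected:
            problems.append(f"{local}: resolves to {os.path.realpath(local)}, "
                            f"expected {expected}")
        elif not os.path.isdir(local):
            # Link is right but its target is absent: shared disk not mounted here?
            problems.append(f"{local}: target missing")
    return problems
```

On the node that currently owns the shared disk, the result should be an empty list; on the standby node, each entry is reported as "target missing".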


Replacing Stop and Start Scripts with HA Scripts

To ensure that starting and stopping SIMS and the directory server is controlled through the HA framework, the normal /etc/rc?.d hard links to the start and stop scripts in /etc/init.d are removed. In their place, specialized HA start and stop scripts are used.

These HA scripts are packaged in SUNWimha and are located by default in /opt/SUNWimha/clust_progs. (This path is similar to those used by other HA services.)
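Because the /etc/rc?.d entries are hard links, they can be found by comparing device and inode numbers with the script in /etc/init.d. The sketch below only lists them; it is a hypothetical helper, `rc_hard_links` is an invented name, and the script name used with it must be supplied by you (this chapter does not name the SIMS init script).

```python
import glob
import os


def rc_hard_links(init_script, etc="/etc"):
    """Return the rc?.d entries that are hard links to etc/init.d/<init_script>."""
    target = os.stat(os.path.join(etc, "init.d", init_script))
    key = (target.st_dev, target.st_ino)  # hard links share device and inode
    matches = []
    for path in sorted(glob.glob(os.path.join(etc, "rc?.d", "*"))):
        st = os.lstat(path)  # lstat: a symlink has its own inode, so it won't match
        if (st.st_dev, st.st_ino) == key:
            matches.append(path)
    return matches
```

Removing the entries would then be a matter of calling `os.unlink()` on each returned path; the script in /etc/init.d itself is left in place.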

The SIMS installation uses the hareg command to register these scripts with the HA framework and to associate them with the HA service name Sun_Internet_Mail.

A fail-over can occur either at a system administrator's request or after a failure is detected. In either case, it proceeds by calling these HA scripts to shut down SIMS and the directory server on one machine and start them up on the other machine.
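The sequence can be sketched as "stop on the old node, then start on the new node". In this minimal sketch the stop and start actions are passed in as callables, so it runs without a cluster. The component ordering (directory server started before SIMS and stopped after it) is an assumption based on SIMS requiring its directory server on the same host, and skipping the stop step when the old node has crashed is likewise an illustrative assumption.

```python
from typing import Callable

# (node, component) -> None; supplied by the caller
Action = Callable[[str, str], None]

# Assumed dependency order: the directory server must be up before SIMS.
COMPONENTS = ["directory", "sims"]


def fail_over(from_node: str, to_node: str, stop: Action, start: Action,
              from_node_alive: bool = True) -> None:
    # Stop in reverse dependency order on the node giving up the service.
    # If that node has crashed there is nothing to stop, so skip this step.
    if from_node_alive:
        for component in reversed(COMPONENTS):
            stop(from_node, component)
    # Start in dependency order on the node taking over.
    for component in COMPONENTS:
        start(to_node, component)
```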


Operating System and Platform Support

TABLE 12-4 shows the configurations that are supported in SIMS 4.0.

TABLE  12-4   Tested Platform Configurations
Operating System Release    Cluster Software    Hardware Architecture
Solaris 2.6                 Sun Cluster 2.2     SPARC
Solaris 2.7                 Sun Cluster 2.2     SPARC


Storage and File System Requirements

To ensure system availability by eliminating single points of failure, the SIMS, SunDS, or Netscape Directory Services (NSDS) binaries, configuration files, and data must be mirrored across multiple disks. Sun Cluster 2.2 includes the Sun Enterprise Volume Manager, on which SIMS/HA depends for redundant storage.

To enable SIMS to come back up quickly after a fail-over, the file systems holding SIMS writable data must recover quickly and be restored to a consistent state.

Because the base Solaris UNIX File System (UFS) does not synchronously write file metadata at fsync() time, fsck times can be long and the file system state may not be completely recovered. For example, small files that were created right before a crash might not exist after fsck and remount. SIMS/HA therefore depends on the Veritas File System (VxFS) 3.3.2 to provide a file system that recovers quickly to a consistent state.
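For context, the application-side pattern for making a newly created file durable on a POSIX system is to fsync() the file and then its parent directory, since the new name is stored in the directory. This is a generic sketch, not SIMS code; `durable_create` is a hypothetical name, and whether the directory fsync actually forces the metadata to disk is up to the file system, which is why the choice of file system matters here. Opening a directory for fsync() as shown is not supported on Windows.

```python
import os


def durable_create(path: str, data: bytes) -> None:
    """Create a new file and ask the OS to persist both its data and its name."""
    with open(path, "xb") as f:  # "x": fail if the file already exists
        f.write(data)
        f.flush()                # move Python's buffer into the kernel
        os.fsync(f.fileno())     # request that the file's data reach disk
    # The new name lives in the parent directory, so fsync that too.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```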


Note - Sun Cluster 2.2 does not include the Veritas File System (VxFS) 3.3.2; it must be purchased separately.


Directory Services Considerations

SIMS 4.0 requires that a directory server with the proper schema run on the same host. This means that the SunDS server must also be highly available. Because SunDS does not itself support high availability, SIMS/HA incorporates a mechanism to make SunDS highly available. By default, SIMS installs SunDS on the shared disk and then changes the /etc/rc?.d links so that SunDS start and stop operations are controlled by the SIMS/HA scripts.

Alternatively, you can install SIMS/HA differently. Rather than treating SunDS as part of the Sun_Internet_Mail service and switching it back and forth between the two hosts in the cluster, you could install SunDS on each host and set up SunDS replication to keep the two hosts synchronized. In this configuration, you could make the SunDS server on one host the primary and the server on the other host the secondary.

A more symmetrical approach would be to make both hosts secondaries of a primary that is not part of the cluster. In this configuration, however, a SIMS fail-over may leave the cached copy of the directory that SIMS maintains out of step with the directory on the new node (because there is no synchronization between SunDS and its replicas).

To avoid this, sites should ensure that a full dirsync is run soon after a fail-over occurs. To restore service as quickly as possible, however, the SIMS/HA interface scripts do not run a full dirsync automatically.
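The kind of mismatch a full dirsync is meant to repair can be illustrated by comparing two sets of entries. This is a hypothetical sketch only: it does not call dirsync, and it models both the SIMS directory cache and the directory on the new node as simple `{uid: attributes}` dictionaries, which is an assumption made for illustration.

```python
from typing import Dict, List

Entries = Dict[str, Dict[str, str]]  # uid -> attribute name -> value


def directory_drift(cache: Entries, directory: Entries) -> Dict[str, List[str]]:
    """Report how a cached directory copy differs from the authoritative one."""
    missing = sorted(uid for uid in directory if uid not in cache)
    stale = sorted(uid for uid in cache if uid not in directory)
    changed = sorted(uid for uid in cache
                     if uid in directory and cache[uid] != directory[uid])
    return {"missing_from_cache": missing, "stale_in_cache": stale, "changed": changed}
```

Any non-empty list in the result corresponds to users whose mail routing could be wrong until the cache is resynchronized.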

To allow sites to create directory configurations like this, SIMS/HA follows the practice established in earlier SIMS releases: if SunDS is already installed when SIMS/HA installation starts, the SIMS installation leaves the existing SunDS installation unchanged.

You have learned about the SIMS High Availability configurations in this chapter. You can now follow the instructions covered in Chapter 7, "Installing SIMS 4.0," to install your SIMS High Availability system.

See Appendix B, "Installing Netscape Directory Services for SIMS High Availability," for instructions to install NSDS 4.1 and configure it for the SIMS High Availability system.




Copyright © 1999 Sun Microsystems, Inc. All Rights Reserved.