Sun Java logo     Previous      Contents      Index      Next     

Sun logo
Sun Java System Messaging Server 6 2004Q2 Administration Guide 

Chapter 22
Monitoring the Messaging Server

In most cases, a well-planned, well-configured server will perform without extensive intervention from an administrator. As an administrator, however, it is your job to monitor the server for signs of problems. This chapter describes the monitoring of the Messaging Server. It consists of the following sections:

Troubleshooting procedures can be found in Chapter 21, "Troubleshooting the MTA".

Automatic Monitoring and Restart

Messaging Server provides a way to transparently monitor services and automatically restart them if they fail or become unresponsive (the services hangs or freeze up). It can monitor all message store, MTA, and MMP services including the IMAP, POP, HTTP, job controller, dispatcher, and MMP servers. It does not monitor other services such as ENS, SMS, LMTP or TCP/SNMP servers. (LMTP and TCP/SNMP are monitored by the job controller.) Refer to Automatic Restart of Failed or Unresponsive Services for details.

In addition, this feature generates the log file msg_svr_base/data/log/watcher, shown below, which records all server starts and stops. This is a very important file for monitoring system health.

watcher process 13425 started at Tue Oct 21 15:29:44 2003

Watched ’imapd’ process 13428 exited abnormally
Received request to restart: store imap pop http
Connecting to watcher ...
Stopping http server 13440 .... done
Stopping pop server 13431 ... done
Stopping pop server 13434 ... done
Stopping pop server 13435 ... done
Stopping pop server 13433 ... done
imap server is not running
Stopping store server 13426 .... done
Starting store server .... 13457
checking store server status ...... ready
Starting imap server ..... 13459
Starting pop server ....... 13462
Starting http server ...... 13471

Daily Monitoring Tasks

The most important tasks you should perform on a daily basis are checking postmaster mail, monitoring the log files, and setting up the stored utility. These tasks are described below.

Checking postmaster Mail

Messaging Server has a predefined administrative mailing list set up for postmaster email. Any users who are part of this mailing list will automatically receive mail addressed to postmaster.

The rules for postmaster mail are defined in RFC822, which requires every email site to accept mail addressed to a user or mailing list named postmaster and that mail sent to this address be delivered to a real person. All messages sent to postmaster@host.domain are sent to a postmaster account or mailing list.

Typically, the postmaster address is where users should send email about their mail service. As postmaster, you might receive mail from local users about server response time, from other server administrators who are encountering problems sending mail to your server, and so on. You should check postmaster mail daily.

You can also configure the server to send certain error messages to the postmaster address. For example, when the MTA cannot route or deliver a message, you can be notified via email sent to the postmaster address. You can also send exception condition warnings (low disk space, poor server response) to postmaster.

Monitoring and Maintaining the Log Files

Messaging Server creates a separate set of log files for each of the major protocols, or services, it supports: SMTP, IMAP, POP, and HTTP. These are located in msg_svr_base/data/log. You should monitor the log files on a routine basis--especially if you are having problems with the server.

Be aware that logging can impact server performance. The more verbose the logging you specify, the more disk space your log files will occupy for a given amount of time. You should define effective but realistic log rotation, expiration, and backup policies for your server. For information about defining logging policies for your server, see Chapter 20, "Logging and Log Analysis".”

Setting Up the stored Utility

The stored utility performs automatic monitoring and maintenance tasks for the server, such as:

The stored utility automatically performs cleanup and expiration operations once a day at midnight. For further information see stored.

Monitoring System Performance

This chapter focuses on Messaging Server monitoring, however, you will also need to monitor the system on which the server resides. A well-configured server cannot perform well on a poorly-tuned system, and symptoms of server failure may be an indication that the hardware is not powerful enough to serve the email load. This chapter does not provide all the details for monitoring system performance as many of these procedures are platform specific and may require that you refer to the platform specific system documentation. The following procedures are described here for performance monitoring:

Monitoring End-to-end Message Delivery Times

Email needs to be delivered on time. This may be a service agreement requirement, but also it is good policy to have mail delivered as quickly as possible. Slow end-to-end times could indicate a number of things. It may be that the server is not working properly, or that certain times of the day experience overwhelming message loads, or that the existing hardware resources are being pushed beyond their capacity.

Symptoms of Poor End-to-end Message Delivery Times

Mail takes a longer period of time to be delivered than normal.

To Monitor End-to-end Message Delivery Times

Monitoring Disk Space

Inadequate disk space is one of the most common causes of the mail server problems and failure. Without space to write to the MTA queues or to the message store, the mail server will fail. In addition, unless log files are monitored and cleaned up, they can grow uncontrollably filling up all disk space.

Disk space can be rapidly depleted when the clean up function of stored fails and deleted messages are not expunged from the message store. Other causes of running out of disk space are the MTA message queues growing too large, the message store outgrowing the available disk space, and unmonitored log files growing uncontrollably. (Note that there are a number of log files such as LDAP, MTA, and Message Access, and that each of these log files can be stored on different disks.)

Symptoms of Disk Space Problems

Different symptoms can occur depending on which disk or partition is running out of space. MTA queues can overflow and reject SMTP connections, messages might remain in the ims_master queue and not be not delivered to the message store, and log files can overflow.

To Monitor Disk Space

Depending upon the system configuration you may need to monitor various disks and partitions. For example, MTA queues may reside on one disk/partition, message stores may reside on another, and log files may reside on yet another. Each of these spaces will require monitoring and the methods to monitor these spaces may differ.

Monitoring the Message Store

It is recommended that message store disk usage not exceed 75% capacity. You can monitor message store disk usage by configuring the following alarm attributes using the configutil utility:

By setting these parameters, you can specify how often the system should monitor disk space and under what circumstances the system should send a warning. For example, if you want the system to monitor disk space every 600 seconds, specify the following command:

configutil -o alarm.diskavail.msgalarmstatinterval -v 600

If you want to receive a warning whenever available disk space falls below 20%, specify the following command:

configutil -o alarm.diskavail.msgalarmthreshold -v 20

Refer to Table 22-1 for more information on these parameters.

Monitoring the MTA Queues and Logging Space

You will need to monitor MTA queue disk and logging space disk usage.

Monitoring CPU Usage

High CPU usage is either a sign that there is not enough CPU capacity for the level of usage or some process is using up more CPU cycles than is appropriate.

Symptoms of CPU Usage Problems

Poor system response time. Slow logging in of users. Slow rate of delivery.

To Monitor CPU Usage

Monitoring CPU usage is a platform specific task. Refer to the relevant platform documentation.

Monitoring the MTA

This section consists of the following subsections:

Monitoring the Size of the Message Queues

Excessive message queue growth may indicate that messages are not being delivered, are being delayed in their delivery, or are coming in faster than the system can deliver them. This may be caused by a number of reasons such as a denial of service attack caused by huge numbers of messages flooding your system, or the Job Controller not running.

See Channel Message Queues, Messages are Not Dequeued, and MTA Messages are Not Delivered for more information on message queues.

Symptoms of Message Queue Problems

To Monitor the Size of the Message Queues

Probably the best way to monitor the message queues is to use imsimta qm. Refer to imsimta qm counters.

You can also monitor the number of files in the queue directories (msg_svr_base/data/queue/). The number of files will be site-specific, and you’ll need to build a baseline history to find out what is “too many.” This can be done by recording the size of the queue files over a two week period to get an approximate average.

Monitoring Rate of Delivery Failure

A delivery failure is a failed attempt to deliver a message to an external site. A large increase in rate of delivery failure can be a sign of a network problem such as a dead DNS server or a remote server timing out on responding to connections.

Symptoms of Rate of Delivery Failure

There are no outward symptoms. Lots of Q records will appear in to mail.log_current.

To Monitor the Rate of Delivery Failure

Delivery failures are recorded in the MTA logs with the logging entry code Q. Look at the record in the file msg_svr_base/data/log/mail.log_current. Example:

mail.log:06-Oct-2003 00:24:03.66 501d.0b.9 ims-ms    Q  5 durai.balusamy@Sun.COM rfc822;durai.balusamy@Sun.COM durai@ims-ms-daemon <00ce01c38bda$c7e2b240$6501a8c0@guindy> Mailbox is busy

Monitoring Inbound SMTP Connections

An unusual increase in the number of inbound SMTP connections from a given IP address may indicate:

Symptoms of Unauthorized SMTP Connections

To Monitor Inbound SMTP Connections

Monitoring the Dispatcher and Job Controller Processes

The Dispatcher and Job Controller Processes must be operating for MTA to work. You should have one process of each kind.

Symptoms of Dispatcher and Job Controller Processes Down

If the Dispatcher is down or does not have enough resources, SMTP connections are refused.

If the Job Controller is down, queue size will grow.

To Monitor Dispatcher and Job Controller Processes

Check to see that the processes called dispatcher and job_controller exist. See Check that the Job Controller and Dispatcher are Running.

Monitoring Message Access

This section consists of the following subsections:

Monitoring imapd, popd and httpd

These processes provide access to IMAP, POP and Webmail services. If any of these is not running or not responding, the service will not function appropriately. If the service is running, but is over loaded, monitoring will allow you to detect this and configure it more appropriately.

Symptoms of imapd, popd and httpd Problems

Connections are refused or system is too slow to connect. For example, if IMAP is not running and you try to connect to IMAP directly you will see something like this:

telnet 0 143
telnet: Unable to connect to remote host: Connection refused

If you try to connect with a client, you will get a message such as:

Client is unable to connect to the server at the location you have specified. The server may be down or busy.

To Monitor imapd, popd and httpd

Monitoring stored

stored performs a variety of important tasks such as deadlock and transaction operations of the message database, enforcing aging policies, and expunging and erasing messages stored on disk. If stored stops running, the messaging server will eventually run into problems. If stored doesn’t start when start-msg is run, no other processes will start. For more information about stored see the Sun Java System Messaging Server Administration Reference.

Symptoms of stored Problems

There are no outward symptoms.

To Monitor stored

Monitoring LDAP Directory Server

This section consists of the following subsection:

Monitoring slapd

The LDAP directory server (slapd) provides directory information for the messaging system. If slapd is down, the system will not work properly. If slapd response time is too slow, this will affect login speed and any other transaction that requires LDAP lookups.

Symptoms of slapd Problems

To Monitor slapd

Monitoring the Message Store

Messages are stored in a database. The distribution of users on disks, the size of their mailbox, and disk requirements affect the store performance. This section consists of the following subsections:

Monitoring the State of Message Store Database Locks

The state of DB-locks is held by different server processes. These database locks can affect the performance of the message store. In case of deadlocks, messages will not be getting inserted into the store at reasonable speeds and the ims-ms channel queue will grow larger as a result. There are legitimate reasons for a queue to back up, so it is useful to have a history of the queue length in order to diagnose problems.

Symptoms of Message Store Database Lock Problems

Number of transactions are accumulating and not resolving.

To Monitor Message Store Database Locks

Use the command counterutil -o db_lock

Monitoring the Number of Database Log Files in the mboxlist Directory

Database log files refer to sleepycat transaction checkpointing log files (msg_svr_base/store/mboxlist). Log file build up is a symptom of database checkpointing not happening. Log file build up can also be due to stored problems.

Symptoms of Database Log File Problems

There should be 2 or 3 log files. If there are more, it is a sign of a potentially serious problem. The message store uses a few databases for messages and quotas, and a problem with those can lead to problems for all of the mail server.

To Monitor Database Log Files

Look in the msg_svr_base/store/mboxlist directory and make sure there are only 2 or 3 files.

Utilities and Tools for Monitoring

The following tools are available in for monitoring:


immonitor-access monitors the status of the following Messaging Server components/processes: Mail Delivery (SMTP server), Message Access and Store (POP and IMAP servers), Directory Service (LDAP server) and HTTP server. This utility measures the response times of the various services and the total round trip time taken to send and retrieve a message. The Directory Service is monitored by looking up a specified user in the directory and measuring the response time. Mail Delivery is monitored by sending a message (SMTP) and the Message Access and Store is monitored by retrieving it. Monitoring the HTTP server is limited to finding out weather or not it is up and running.

For complete instructions, refer to Sun Java System Messaging Server Administration Reference.


The stored utility performs maintenance tasks on the server, but it also can do monitoring. It can periodically check the server state, disk space, service response times and, if specified, it can issue alarms in the form of email messages to the postmaster (see (more...) ).

An alarm comes in the form of an email message from stored to the postmaster warning of a specified condition. A sample email alarm sent by stored when a certain threshold is exceeded is shown below:

Subject:  ALARM: server response time in seconds of “ldap_siroe.com_389” is 10
Date:   Tue, 17 Jul 2001 16:37:08 -0700 (PDT)

Server instance: /opt/SUNWmsgsr
Alarmid: serverresponse
Instance: ldap_siroe_europa.com_389
Description: server response time in seconds
Current measured value (17/Jul/2001:16:37:08 -0700): 10
Lowest recorded value: 0
Highest recorded value: 10
Monitoring interval: 600 seconds
Alarm condition is when over threshold of 10
Number of times over threshold: 1

You can specify how often stored monitors disk and server performance, and under what circumstances it sends alarms. This is done by using the configutil command to set the alarm parameters. Table 22-1 shows useful stored parameters along with their default setting.

Table 22-1  Recommended stored Parameters 


Description (Default in parenthesis)


(localhost) Machine to which you send warning messages.


(25) The SMTP port to which to connect when sending alarm message.


(Postmaster@localhost) Whom to send alarm notice.


(Postmaster@localhost) Address of sender the alarm.


Description for disk availability alarm.


(3600) Interval in seconds between disk availability checks. Set to 0 to disable checking of disk usage.


(10) Percentage of disk space availability below which an alarm is sent.


(-1) Specifies whether the alarm is issued when disk space availability goes below threshold (-1) or above it (1).


(24). Interval in hours between subsequent repetition of disk availability alarms.


Description for servers response alarm.


(600) Interval in seconds between server response checks. Set to 0 to disable checking of server response.


(10) If server response time in seconds exceeds this value, alarm issued.


(1) Specifies whether alarm is issued when server response time is greater that (1) or less than (-1) the threshold.


(24) Interval in hours between subsequent repetition of server response alarm.


This utility provides statistics acquired from different system counters. Here is a current list of available counter objects:

# /opt/SUNWmsgsr/sbin/counterutil -l
Listing registry (/opt/SUNWmsgsr/data/counter/counter)
numobjects = 11
refcount = 1
created = 25/Sep/2003:02:04:55 -0700
modified = 02/Oct/2003:22:48:55 -0700
       entry = alarm
       entry = diskusage
       entry = serverresponse
       entry = db_lock
       entry = db_log
       entry = db_mpool
       entry = db_txn
       entry = imapstat
       entry = httpstat
       entry = popstat
       entry = cgimsg

Each entry represents a counter object and supplies a variety of useful counts for this object. In this section we will only be discussing the alarm, diskusage, serverresponse, db_lock, popstat, imapstat, and httpstat counter objects. For details on counterutil command usage, refer to the Sun Java System Messaging Server Administration Reference.

counterutil Output

counterutil has a variety of flags. A command format for this utility may be as follows:

An example of counterutil usage is as follows:

# counterutil -o imapstat -i 5 -n 10
Monitor counteroobject (imapstat)
registry /gotmail/iplanet/server5/msg-gotmail/counter/counter opened
counterobject imapstat opened

count = 1 at 972082466 rh = 0xc0990 oh = 0xc0968

global.currentStartTime [4 bytes]: 17/Oct/2000:12:44:23 -0700
global.lastConnectionTime [4 bytes]: 20/Oct/2000:15:53:37 -0700
global.maxConnections [4 bytes]: 69
global.numConnections [4 bytes]: 12480
global.numCurrentConnections [4 bytes]: 48
global.numFailedConnections [4 bytes]: 0
global.numFailedLogins [4 bytes]: 15
global.numGoodLogins [4 bytes]: 10446

Alarm Statistics Using counterutil

These alarm statistics refer to the alarms sent by stored.The alarm counter provides the following statistics:

Table 22-2  counterutil alarm Statistics




Number of times crossing threshold.


Number of warnings sent.


Current monitored valued.


Highest ever recorded value.


Lowest ever recorded value.


The last time current value was set.


The last time warning was sent.


The last time reset was performed.


The last time alarm state changed.


Warning state (yes(1) or no(0)).

IMAP, POP, and HTTP Connection Statistics Using counterutil

To get information on the number of current IMAP, POP, and HTTP connections, number of failed logins, total connections from the start time, and so forth, you can use the command counterutil -o CounterObject -i 5 -n 10.where CounterObject represents the counter object popstat, imapstat, or httpstat. The meaning of the imapstat suffixes is shown in Table 22-3. The popstat and httpstat objects provide the same information in the same format and structure.

Table 22-3  counterutil imapstat Statistics




Start time of the current IMAP server process.


The last time a new client was accepted.


Maximum number of concurrent connections handled by IMAP server.


Total number of connections served by the current IMAP server.


Current number of active connections.


Number of failed connections served by the current IMAP server.


Number of failed logins served by the current IMAP server.


Number of successful logins served by the current IMAP server.

Disk Usage Statistics Using counterutil

The command: counterutil -o diskusage generates following information:

Table 22-4  counterutil diskstat Statistics




Total space available in the disk partition.


The last time statistic was taken.


Mail partition path.


Disk partition space available percentage.


Total space in the disk partition.

Server Response Statistics

The command: counterutil -o serverresponse generates following information. This information is useful for checking if the servers are running, and how quickly they’re responding.

Table 22-5  counterutil serverresponse Statistics




Last time http server response was checked.


Response time for the http.


Last time imap server response was checked.


Response time for the imap.


Last time pop server response was checked.


Response time for the pop.


Last time ldap_host1_389 server response was checked.


Response time for the ldap_host1_389.


Last time ugldap_host2_389 server response was checked.


Response time for the ugldap_host2_389.

Log Files

Messaging server logs event records for SMTP, IMAP, POP, and HTTP. The policies for creating and managing the Messaging Server log files are customizable.

Since logging can affect the server performance, logging should be considered very carefully before the burden is put on the server. Refer to Chapter 20, "Logging and Log Analysis"” for more information.

imsimta counters

The MTA accumulates message traffic counters based upon the Mail Monitoring MIB, RFC 1566 for each of its active channels. The channel counters are intended to help indicate the trend and health of your e-mail system. Channel counters are not designed to provide an accurate accounting of message traffic. For precise accounting, instead see MTA logging as discussed in Chapter 20, "Logging and Log Analysis".

The MTA channel counters are implemented using the lightest weight mechanisms available so that they cause as little impact as possible on actual operation. Channel counters do not try harder: if an attempt to map the section fails, no information is recorded; if one of the locks in the section cannot be obtained almost immediately, no information is recorded; when a system is shut down, the information contained in the in-memory section is lost forever.

The imsimta counters -show command provides MTA channel message statistics (see below). These counters need to be examined over time noting the minimum values seen. The minimums may actually be negative for some channels. A negative value means that there were messages queued for a channel at the time that its counters were zeroed (for example, the cluster-wide database of counters created). When those messages were dequeued, the associated counters for the channel were decremented and therefore leading to a negative minimum. For such a counter, the correct “absolute” value is the current value less the minimum value that counter has ever held since being initialized.

Channel          Messages    Recipients    Blocks
-------          --------    ----------    -------
   Received       29379       79714      982252                      (1)
   Stored            61         113       -2004                      (2)
   Delivered      29369       79723      983903 (29369 first time)    (3)
   Submitted      13698       13699       18261                      (4)
   Attempted          0           0           0                      (5)
   Rejected           1          10           0                      (6)
   Failed           104         104        4681                      (7)

   Queue time/count        16425/29440 = 0.56                        (8)
   Queue first time/count  16425/29440 = 0.56                        (9)

   Total In Assocs           297637
   Total Out Assocs           28306

1) Received is the number of messages enqueued to the channel named tcp_local. That is, the messages enqueued (E records in the mail.log* file) to the tcp_local channel by any other channel.

2) Stored is the number of messages stored in the channel queue to be delivered.

3) Delivered is the number of messages which have been processed (dequeued) by the channel tcp_local. (That is, D records in the mail.log* file.) A dequeue operation may either correspond to a successful delivery (that is, an enqueue to another channel), or to a dequeue due to the message being returned to the sender. This will generally correspond to the number Received minus the number Stored.

The MTA also keeps track of how many of the messages were dequeued upon first attempt; this number is shown in parentheses.

4) Submitted is the number of messages enqueued (E records in the mail.log file) by the channel tcp_local to any other channel.

5) Attempted is the number of messages which have experienced temporary problems in dequeuing, that is, Q or Z records in the mail.log* file.

6) Rejected is the number of attempted enqueues which have been rejected, that is, J records in the mail.log* file.

7) Failed is the number of attempted dequeues which have failed, that is, R records in the mail.log* file.

8) Queue time/count is the average time-spent-in-queue for the delivered messages. This includes both the messages delivered upon the first attempt, see (9), and the messages that required additional delivery attempts (hence typically spent noticeable time waiting fallow in the queue).

9) Queue first time/count is the average time-spent-in-queue for the messages delivered upon the first attempt.

Note that the number of messages submitted can be greater than the number delivered. This is often the case, since each message the channel dequeues (delivers) will result in at least one new message enqueued (submitted) but possibly more than one. For example, if a message has two recipients reached via different channels, then two enqueues will be required. Or if a message bounces, a copy will go back to the sender and another copy may be sent to the postmaster. Usually that will be two submissions (unless both are reached through the same channel).

More generally, the connection between Submitted and Delivered varies according to type of channel. For example, in the conversion channel, a message would be enqueued by some other arbitrary channel, and then the conversion channel would process that message and enqueue it to a third channel and mark the message as dequeued from its own queue. Each individual message takes a path:

elsewhere -> conversion   E record   Received
conversion -> elsewhere   E record   Submitted
conversion                 D record   Delivered

However, for a channel such as tcp_local which is not a “pass through,” but rather has two separate pieces (slave and master), there is no connection between Submitted and Delivered. The Submitted counter has to do with the SMTP server portion of the tcp_local channel, whereas the Delivered counter has to do with the SMTP client portion of the tcp_local channel. Those are two completely separate programs, and the messages travelling through them may be completely separate.

Messages submitted to the SMTP server:

tcp_local -> elsewhere  E record    Submitted

Messages sent out to other SMTP hosts via the SMTP client:

elsewhere -> tcp_local  E record    Received
tcp_local               D record    Delivered

Channel dequeues (delivers) will result in at least one new message enqueued (submitted) but possibly more than one. For example, if a message has two recipients reached via different channels, then two enqueues will be required. Or if a message bounces, a copy will go back to the sender and another copy may be sent to the postmaster. Usually that will be reached through the same channel.

Implementation on UNIX and NT

For performance reasons, a node running the MTA keeps a cache of channel counters in memory using a shared memory section (UNIX) or shared file-mapping object (NT). As processes on the node enqueue and dequeue messages, they update the counters in this in-memory cache. If the in-memory section does not exist when a channel runs, the section will be created automatically. (The imta start command also creates the in-memory section, if it does not exist.)

The command imta counters -clear or the imta qm command counters clear may be used to reset the counters to zero.

imsimta qm counters

The imsimta qm counters utility displays MTA channel queue message counters. You must be root or inetuser to run this utility. The output fields are the same as those described in imsimta counters. See also Sun Java System Messaging Server Administration Reference for usage details.


# imsimta counters -create
# imsimta qm counters show

Channel                 Messages    Recipients     Blocks
----------------------  ----------  ----------    ----------
   Received              13077        13859         264616
   Stored                   92           91           -362
   Delivered             12985        13768         264978
   Submitted              2594         2594           3641

Every time you restart the MTA, you must run: # imsimta counters -create

MTA Monitoring Using SNMP

Messaging Server supports system monitoring through the Simple Network Management Protocol (SNMP). Using an SNMP client (sometimes called a network manager) such as Sun Net Manager or HP OpenView (not provided with this product), you can monitor certain parts of the Messaging Server. Refer to Appendix A, "SNMP Support"” for details.

imquotacheck for Mailbox Quota Checking

You can monitor mailbox quota usage and limits by using the imquotacheck utility. The imquotacheck utility generates a report that lists defined quotas and limits, and provides information on quota usage.

For example, the following command lists all user quota information:

% imquotacheck
Domain (diskquota = not set msgquota = not set) quota usage
diskquota        size(K)    %use    msgquota      msgs    %use    user
# of domains = 1
# of users = 705

no quota         50418              no quota      4392            ajonkish
no quota         5                  no quota      2               andrewt
no quota         355518             no quota      2500            aniksri

The following example shows the quota usage for user sorook:

% imquotacheck -u sorook
quota usage for user sorook
diskquota      size(K)    %use    msgquota      msgs     %use    user

no quota       1487              no quota      305              sorook

Previous      Contents      Index      Next     

Copyright 2004 Sun Microsystems, Inc. All rights reserved.