19 Monitoring the MTA

This chapter describes how to monitor the Oracle Communications Messaging Server Message Transfer Agent (MTA).

Monitoring the Size of the Message Queues

Excessive message queue growth may indicate that messages are not being delivered, are being delayed in their delivery, or are coming in faster than the system can deliver them. Reasons for this situation include a denial of service attack caused by huge numbers of messages flooding your system, or the Job Controller not running.

See "Channel Message Queues", "Messages Are Not Dequeued", and "MTA Messages Are Not Delivered" for more information on message queues.

Symptoms of Message Queue Problems

Disk space usage grows.
Users do not receive messages in a reasonable time.
Message queue sizes are abnormally high.

To Monitor the Size of the Message Queues

Probably the best way to monitor the message queues is to use imsimta counters and imsimta qm summarize.

You can also monitor the number of files in the queue directories (DataRoot/queue/). The acceptable number of files is site-specific, so you need to build a baseline history to find out what is "too many." To do this, record the number of queue files over a two-week period to get an approximate average.
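The file-count baseline described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the idea of one top-level directory per channel under the queue directory, and the factor of 2.0 used as a "too many" threshold are all hypothetical and need to be adapted to your site.

```python
import os
from collections import Counter


def count_queue_files(queue_root):
    """Return {directory_name: file_count} for each top-level directory in queue_root.

    Assumption: queue_root is the DataRoot/queue/ directory and each
    top-level directory in it holds the files of one channel queue.
    """
    counts = Counter()
    for channel in sorted(os.listdir(queue_root)):
        channel_path = os.path.join(queue_root, channel)
        if not os.path.isdir(channel_path):
            continue
        # Walk nested directories too, in case the queue is split into subdirectories.
        for _dirpath, _dirnames, filenames in os.walk(channel_path):
            counts[channel] += len(filenames)
    return dict(counts)


def above_baseline(current_count, baseline_samples, factor=2.0):
    """Return True if current_count exceeds factor times the baseline average.

    baseline_samples is a list of counts recorded earlier, for example
    once a day over two weeks. The factor is a hypothetical threshold.
    """
    average = sum(baseline_samples) / len(baseline_samples)
    return current_count > factor * average
```

Running count_queue_files on a schedule and saving the results gives you the history that above_baseline needs. Choose the factor from your own observations rather than from this sketch.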

Checking for Held Messages

If the MTA detects that a message is looping, it sidelines the message by renaming the queue message file to .HELD. For more discussion of how messages can become .HELD and what to do about them, refer to "Diagnosing and Cleaning up .HELD Messages". To see whether there are any held messages, use the imsimta qm summarize -held command described in Messaging Server Reference.
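As a file-system cross-check alongside imsimta qm summarize -held, the following Python sketch lists queue files whose names end in .HELD. It assumes that held files keep the .HELD suffix and reside somewhere under the queue directory. The function name is hypothetical, and the sketch does not replace the imsimta command.

```python
import os


def find_held_files(queue_root):
    """Return a sorted list of paths under queue_root whose file name ends in .HELD.

    Assumption: queue_root is the DataRoot/queue/ directory.
    """
    held = []
    for dirpath, _dirnames, filenames in os.walk(queue_root):
        for name in filenames:
            if name.endswith(".HELD"):
                held.append(os.path.join(dirpath, name))
    return sorted(held)
```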

Monitoring Rate of Delivery Failure

A delivery failure is a failed attempt to deliver a message to an external site. A large increase in the rate of delivery failure can be a sign of a network problem, such as a dead DNS server or a remote server timing out when responding to connections.

Symptoms of Rate of Delivery Failure

There are no outward symptoms. Large numbers of Q records appear in mail.log_current.

To Monitor the Rate of Delivery Failure

Delivery failures are recorded in the MTA logs with the logging entry code Q. Look for these records in the file DataRoot/log/mail.log_current. Example:

mail.log:06-Oct-2003 00:24:03.66 501d.0b.9 ims-ms Q 5 durai.balusamy@Sun.COM rfc822;durai.balusamy@Sun.COM durai@ims-ms-daemon <00ce01c38bda$c7e2b240$6501a8c0@guindy>Mailbox is busy
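To turn the Q records into a rate, you can count them per hour. The Python sketch below assumes that log lines are whitespace-separated as in the example above, with the date first, the time second, and the entry code appearing as a standalone field among the next few fields. The exact field layout depends on your logging options, so treat the field positions and the function name as assumptions.

```python
from collections import Counter


def count_entries_by_hour(lines, code="Q"):
    """Count log records with the given entry code, grouped by date and hour.

    Assumption: fields are whitespace-separated, the first field is the
    date (possibly prefixed with "filename:" by a search tool), the second
    is the time, and the entry code is a standalone field in positions 3 to 6.
    """
    per_hour = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        if code not in fields[2:6]:
            continue
        # Drop a leading "mail.log:" style prefix if one is present.
        date = fields[0].rsplit(":", 1)[-1]
        hour = fields[1][:2]
        per_hour[f"{date} {hour}"] += 1
    return per_hour
```

Comparing the hourly counts against a normal day for your site shows whether the failure rate has risen sharply. The same function can count other entry codes by passing a different code argument.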

Monitoring Inbound SMTP Connections

An unusual increase in the number of inbound SMTP connections from a given IP address may indicate:

An external user is trying to relay mail.
An external user is attempting a denial of service attack.

Symptoms of Unauthorized SMTP Connections

External user relaying mail: No outward symptoms.
Denial of service attack: External attempt to overload the SMTP servers with message requests.

To Monitor Inbound SMTP Connections

External user relaying mail: Look in DataRoot/log/mail.log_current for records with the logging entry code J (rejected relays). To turn on logging of remote IP addresses, run the following command:

msconfig set log_connection 1

Note that there is a slight performance trade-off when this feature is enabled.

Denial of service attack: To find out who is connecting to the SMTP servers and how many connections there are, run the netstat command and check for connections at the SMTP port (default: 25). Example:

Local Address      Remote Address        Swind Send-Q Rwind Recv-Q State
192.18.79.44.25    192.18.78.44.56035    32768      0 32768      0 CLOSE_WAIT
192.18.79.44.25    192.18.136.54.57390    8760      0 24820      0 ESTABLISHED
192.18.79.44.25    192.18.26.165.48508   33580      0 24820      0 TIME_WAIT

Note that you first need to determine the normal number of SMTP connections and their states (ESTABLISHED, CLOSE_WAIT, and so on) for your system before you can tell whether a particular reading is out of the ordinary. If you find many connections staying in the SYN_RECEIVED state, this might be caused by a broken network or a denial of service attack.

In addition, the lifetime of an SMTP server process is limited. This is controlled by the MTA Dispatcher configuration option MAX_LIFE_TIME. The default is 86,400 seconds (one day). Similarly, MAX_LIFE_CONNS specifies the maximum number of connections a server process can handle in its lifetime. If you find a particular SMTP server process that has been around for a long time, you may want to investigate.
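To compare netstat readings against your normal values, it helps to summarize the connections by state and by remote IP address. The Python sketch below assumes netstat output in the style of the example above, where each address is written as address.port, the local address comes first, and the state is the last field. Other platforms print addresses differently (for example address:port), so the parsing and the function name are assumptions.

```python
from collections import Counter


def summarize_smtp_connections(netstat_lines, smtp_port="25"):
    """Return (by_state, by_remote_ip) counters for connections to smtp_port.

    Assumption: each connection line looks like
    "local_addr.port remote_addr.port ... STATE", as in the example above.
    """
    by_state = Counter()
    by_remote_ip = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        local, remote, state = fields[0], fields[1], fields[-1]
        # Skip header lines and anything that is not in address.port form.
        if "." not in local or "." not in remote:
            continue
        local_port = local.rsplit(".", 1)[1]
        if local_port != smtp_port:
            continue
        remote_ip = remote.rsplit(".", 1)[0]
        by_state[state] += 1
        by_remote_ip[remote_ip] += 1
    return by_state, by_remote_ip
```

A remote address with an unusually high count in by_remote_ip, or a state whose count is far above your normal reading, is a candidate for closer investigation.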

Monitoring the Dispatcher and Job Controller Processes

The Dispatcher and Job Controller processes must be running for the MTA to work. You should have one process of each kind.

Symptoms of Dispatcher and Job Controller Processes Down

If the Dispatcher is down or does not have enough resources, SMTP connections are refused. If the Job Controller is down, queue size will grow.

To Monitor Dispatcher and Job Controller Processes

Check to see that the processes called dispatcher and job_controller exist. See "Check that the Job Controller and Dispatcher Are Running" for more information.
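The process check can be automated with a short Python sketch. It assumes that the ps command on your system accepts the POSIX options -e and -o comm= and that the command names it reports end in dispatcher and job_controller. The function names are hypothetical.

```python
import os
import subprocess

REQUIRED_PROCESSES = ("dispatcher", "job_controller")


def list_running_commands():
    """Return the command names of all running processes.

    Assumption: ps supports "-e -o comm=" (POSIX style) on this system.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def missing_processes(command_names, required=REQUIRED_PROCESSES):
    """Return the required process names that do not appear in command_names."""
    # Compare on the base name, because ps may report a full path.
    running = {os.path.basename(name.strip()) for name in command_names if name.strip()}
    return [name for name in required if name not in running]
```

Calling missing_processes(list_running_commands()) returns an empty list when both processes exist. Any name in the result identifies a process to check.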