This chapter describes how to monitor the Message Transfer Agent (MTA).
Excessive message queue growth may indicate that messages are not being delivered, are being delayed, or are arriving faster than the system can deliver them. Possible causes include a denial of service attack that floods your system with messages, or a Job Controller that is not running.
Disk space usage grows.
Users are not receiving messages in a reasonable time.
Message queue sizes are abnormally high.
You can also monitor the number of files in the queue directories (MessagingServer_home/data/queue/). The expected number of files is site-specific, so you need to build a baseline history to find out what is "too many." You can do this by recording the number of files in the queue directories over a two-week period to get an approximate average.
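The baseline approach described above can be sketched in Python. This is a minimal illustration, not part of the product: the function names, the assumption that each top-level directory under the queue root corresponds to one queue, and the threshold factor of 2.0 are all hypothetical choices that you would adjust for your site.

```python
import os


def count_queue_files(queue_root):
    """Return {directory_name: file_count} for each top-level directory under queue_root."""
    counts = {}
    for entry in sorted(os.listdir(queue_root)):
        queue_path = os.path.join(queue_root, entry)
        if not os.path.isdir(queue_path):
            continue
        total = 0
        # Queue directories may contain nested subdirectories, so walk the whole tree.
        for _dirpath, _dirnames, filenames in os.walk(queue_path):
            total += len(filenames)
        counts[entry] = total
    return counts


def exceeds_baseline(current_total, baseline_samples, factor=2.0):
    """Return True if current_total is more than `factor` times the baseline average.

    `factor` is an assumed threshold; choose a value that suits your site.
    """
    average = sum(baseline_samples) / len(baseline_samples)
    return current_total > factor * average
```

Run count_queue_files at a regular interval during the two-week baseline period, store the totals, and then pass them to exceeds_baseline as baseline_samples when checking a new reading.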
If the MTA detects that a message is looping, it sidelines the message by renaming the queue message file to .HELD. For more discussion of how messages can become .HELD and what to do about them, refer to "Diagnosing and Cleaning up .HELD Messages." To see whether there are any held messages, use the imsimta qm summarize -held command described in Table 8-20, "summarize Sub-Command Options".
A delivery failure is a failed attempt to deliver a message to an external site. A large increase in the rate of delivery failures can be a sign of a network problem, such as a dead DNS server or a remote server that times out when responding to connections.
There are no outward symptoms. Many Q records will appear in mail.log_current.
Delivery failures are recorded in the MTA logs with the logging entry code Q. Look at the record in the file MessagingServer_home/data/log/mail.log_current. Example:
mail.log:06-Oct-2003 00:24:03.66 501d.0b.9 ims-ms Q 5 durai.balusamy@Sun.COM rfc822;durai.balusamy@Sun.COM durai@ims-ms-daemon <00ce01c38bda$c7e2b240$6501a8c0@guindy> Mailbox is busy
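To watch for a rise in Q records, you could tally the logging entry codes in mail.log_current. The following Python sketch is illustrative only. It assumes the entry code is the first whitespace-separated token after the date and time that consists solely of one to three uppercase letters. That is a heuristic based on the example record above, not a documented log format, and the function names are hypothetical.

```python
import re
from collections import Counter

# Heuristic (assumption): an entry code is a short, all-uppercase token such as Q or J.
ENTRY_CODE = re.compile(r"^[A-Z]{1,3}$")


def entry_code(line):
    """Return the logging entry code found in one log record, or None."""
    tokens = line.split()
    # Skip the first two tokens (date and time), then take the first code-like token.
    for token in tokens[2:]:
        if ENTRY_CODE.match(token):
            return token
    return None


def count_entry_codes(lines):
    """Return a Counter mapping each entry code to the number of records carrying it."""
    counts = Counter()
    for line in lines:
        code = entry_code(line)
        if code is not None:
            counts[code] += 1
    return counts
```

Comparing count_entry_codes(log_lines)["Q"] across regular sampling intervals gives a rough delivery failure rate to hold against your site's normal level.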
An unusual increase in the number of inbound SMTP connections from a given IP address may indicate:
An external user is trying to relay mail.
An external user is attempting a denial of service attack.
External user relaying mail: No outward symptoms.
Denial of service attack: External attempt to overload the SMTP servers with message requests.
External user relaying mail: Look in MessagingServer_home/log/mail.log_current for records with the logging entry code J (rejected relays). To turn on logging of remote IP addresses, run the following command:
msconfig set log_connection 1
Note that there is a slight performance trade-off when this feature is enabled.
Denial of service attack: To find out who is connecting to the SMTP servers and how many connections there are, run the netstat command and check for connections at the SMTP port (default: 25). Example:
Local address         Remote address         Swind Send-Q Rwind Recv-Q State
22.214.171.124.25     126.96.36.199.56035    32768      0 32768      0 CLOSE_WAIT
188.8.131.52.25       184.108.40.206.57390    8760      0 24820      0 ESTABLISHED
220.127.116.11.25     18.104.22.168.48508    33580      0 24820      0 TIME_WAIT
Note that you first need to determine the normal number of SMTP connections and their states (ESTABLISHED, CLOSE_WAIT, and so on) for your system before you can tell whether a particular reading is out of the ordinary. If you find many connections staying in the SYN_RECEIVED state, this might be caused by a broken network or a denial of service attack.
In addition, the lifetime of an SMTP server process is limited. This is controlled by the MTA Dispatcher configuration option MAX_LIFE_TIME. The default is 86,400 seconds (one day). Similarly, MAX_LIFE_CONNS specifies the maximum number of connections a server process can handle in its lifetime. If you find a particular SMTP server process that has been around for a long time, you may wish to investigate.
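The netstat check can be summarized per connection state and per remote host with a short script. The Python sketch below is a hedged illustration. It assumes netstat output in the format of the example above, where the address and port are joined by a dot and the state is the last column. Other platforms print address:port instead and would need a different parser. The function name is hypothetical.

```python
from collections import Counter


def summarize_smtp_connections(netstat_lines, port="25"):
    """Return (states, remote_hosts) Counters for connections to the local SMTP port.

    Assumes each connection line looks like:
        local_addr.port  remote_addr.port  ...  STATE
    """
    states = Counter()
    remote_hosts = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        local, remote, state = fields[0], fields[1], fields[-1]
        # Keep only connections whose local end is the SMTP port; this also skips headers.
        if not local.endswith("." + port):
            continue
        states[state] += 1
        # Drop the trailing ".port" to get the remote host address.
        remote_hosts[remote.rsplit(".", 1)[0]] += 1
    return states, remote_hosts
```

A remote host with an unusually high count in remote_hosts, or a large SYN_RECEIVED count in states, is a reading to compare against your baseline.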
The Dispatcher and Job Controller processes must be running for the MTA to work. You should have one process of each kind.
If the Dispatcher is down or does not have enough resources, SMTP connections are refused. If the Job Controller is down, queue size will grow.
Check to see that the processes called dispatcher and job_controller exist. See "Check that the Job Controller and Dispatcher Are Running."
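The process check can also be automated. The following Python sketch is one possible approach, not a tool supplied with the product. It assumes the POSIX ps command is available and that the command names reported by ps end in dispatcher and job_controller. The function names are hypothetical.

```python
import os
import subprocess

REQUIRED_PROCESSES = ("dispatcher", "job_controller")


def missing_processes(process_names, required=REQUIRED_PROCESSES):
    """Return the required process names that do not appear in process_names."""
    # ps may report a full path on some systems, so compare base names only.
    running = {os.path.basename(name.strip()) for name in process_names}
    return [name for name in required if name not in running]


def list_process_names():
    """Return the command name of every running process, using POSIX ps."""
    # -e selects all processes; -o comm= prints only the command name, without a header.
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Calling missing_processes(list_process_names()) returns an empty list when both processes are present. Any name it does return is a process to investigate.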