Excessive message queue growth may indicate that messages are not being delivered, are being delayed in delivery, or are arriving faster than the system can deliver them. Possible causes include a denial of service attack that floods your system with messages, or a Job Controller that is not running.
Disk space usage grows.
Users are not receiving messages in a reasonable time.
Message queue sizes are abnormally high.
Probably the best way to monitor the message queues is with the imsimta qm and imsimta summarize commands. Refer to 27.8.6 imsimta qm counters.
You can also monitor the number of files in the queue directories (msg-svr-base/data/queue/). The normal number of files is site-specific, so you will need to build a baseline history to find out what is “too many.” You can do this by recording the number of queue files over a two-week period to get an approximate average.
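The baseline approach described above could be sketched roughly as follows. This is only an illustration, not a tool shipped with the product: the function names, the assumption that each top-level subdirectory of the queue directory can be counted separately, and the "twice the average" threshold are all hypothetical choices that you would adjust for your site.

```python
import os


def count_queue_files(queue_root):
    """Return {subdirectory_name: file_count} for each top-level
    directory under queue_root, counting files recursively.

    queue_root is assumed to be your msg-svr-base/data/queue/ path.
    """
    counts = {}
    for entry in sorted(os.listdir(queue_root)):
        path = os.path.join(queue_root, entry)
        if not os.path.isdir(path):
            continue  # ignore stray files at the top level
        total = 0
        for _dirpath, _dirnames, filenames in os.walk(path):
            total += len(filenames)
        counts[entry] = total
    return counts


def exceeds_baseline(current, history, factor=2.0):
    """Return True if current is more than `factor` times the mean of
    the recorded history. The factor of 2.0 is an arbitrary assumption."""
    if not history:
        return False  # no baseline recorded yet, nothing to compare with
    return current > factor * (sum(history) / len(history))
```

Running count_queue_files from a scheduled job and appending the totals to a file for two weeks would give the history that exceeds_baseline compares against.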
A delivery failure is a failed attempt to deliver a message to an external site. A large increase in the rate of delivery failures can be a sign of a network problem, such as a dead DNS server or a remote server that times out when responding to connections.
Delivery failures are recorded in the MTA logs with the logging entry code Q. Look at the record in the file msg-svr-base/data/log/mail.log_current. Example:
mail.log:06-Oct-2003 00:24:03.66 501d.0b.9 ims-ms Q 5 durai.balusamy@Sun.COM rfc822;durai.balusamy@Sun.COM durai@ims-ms-daemon <00ce01c38bda$c7e2b240$6501a8c0@guindy> Mailbox is busy
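To watch for a rising failure rate, the Q records could be tallied per hour with a short script like the one below. This is a sketch under assumptions: it treats the log as whitespace-separated fields laid out like the example record above (date, time, process identifier, one or two channel names, then the entry code), and the function name is hypothetical. Check the field positions against your own log before relying on it.

```python
from collections import Counter


def failures_per_hour(lines, code="Q"):
    """Count log records with the given entry code, grouped by date and hour.

    Assumes the entry code appears as its own token within the first few
    fields after the date and time, as in the example record.
    """
    per_hour = Counter()
    for line in lines:
        tokens = line.split()
        if len(tokens) < 3:
            continue  # blank or truncated line
        if code not in tokens[2:6]:
            continue  # not a record with the requested entry code
        # Drop a "filename:" prefix such as grep adds, keep the date.
        date = tokens[0].split(":")[-1]
        hour = tokens[1][:2]
        per_hour[date + " " + hour] += 1
    return per_hour
```

Passing code="J" to the same function would tally records with a different entry code in the same way.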
An external user is trying to relay mail.
An external user is attempting a denial of service attack.
External user relaying mail: Look in msg-svr-base/log/mail.log_current for records with the logging entry code J (rejected relays). To turn on logging of remote IP addresses add the following line to the option.dat file:
Note that there is a slight performance trade-off when this feature is enabled.
Local address        Remote address                                   State
126.96.36.199.25     188.8.131.52.56035     32768      0 32768      0 CLOSE_WAIT
184.108.40.206.25    220.127.116.11.57390    8760      0 24820      0 ESTABLISHED
18.104.22.168.25     22.214.171.124.48508   33580      0 24820      0 TIME_WAIT
Note that you will first need to establish the normal number of SMTP connections and their states (ESTABLISHED, CLOSE_WAIT, and so on) for your system before you can tell whether a particular reading is out of the ordinary.
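One way to collect such readings is to count the connections to the SMTP port by state each time you sample the connection list. The sketch below assumes output shaped like the listing above, where the local address ends in ".25" and the state is the last field on each line; the function name is hypothetical, and other platforms format addresses differently.

```python
from collections import Counter


def count_smtp_states(connection_listing, port="25"):
    """Count connections whose local address ends in .<port>, by state.

    Assumes the dot-separated address.port form and a trailing state
    column, as in the listing shown in this section.
    """
    states = Counter()
    for line in connection_listing.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or unrelated output
        if not fields[0].endswith("." + port):
            continue  # header line or a connection to another port
        states[fields[-1]] += 1
    return states
```

Recording these counts at regular intervals gives the per-state baseline against which an unusual reading can be compared.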
If you find many connections staying in the SYN_RECEIVED state, this might be caused by a broken network or a denial of service attack. In addition, the lifetime of an SMTP server process is limited. This is controlled by the MTA configuration variable MAX_LIFE_TIME in the dispatcher.cnf file. The default is 86,400 seconds (one day). Similarly, MAX_LIFE_CONNS specifies the maximum number of connections a server process can handle in its lifetime. If you find a particular SMTP server process that has been around for a long time, you may wish to investigate.
The Dispatcher and Job Controller processes must be running for the MTA to work. You should have one process of each kind.
If the Dispatcher is down or does not have enough resources, SMTP connections are refused.
If the Job Controller is down, queue size will grow.
Check to see that the processes called dispatcher and job_controller exist. See 26.2.4 Check that the Job Controller and Dispatcher are Running.
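The check above could be automated along these lines. This is a minimal sketch, assuming that the POSIX command `ps -e -o comm=` is available and that the processes appear under the names dispatcher and job_controller given in this section; the function names are hypothetical.

```python
import os
import subprocess

REQUIRED = ("dispatcher", "job_controller")


def list_process_names():
    """Return the output of `ps -e -o comm=`, one command name per line."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def process_counts(ps_output, required=REQUIRED):
    """Count how many running processes match each required name."""
    counts = {name: 0 for name in required}
    for line in ps_output.splitlines():
        # Some platforms print a full path, so compare the base name only.
        name = os.path.basename(line.strip())
        if name in counts:
            counts[name] += 1
    return counts


def unexpected(counts):
    """Return the names that do not have exactly one running process."""
    return sorted(name for name, n in counts.items() if n != 1)
```

Calling unexpected(process_counts(list_process_names())) returns an empty list when exactly one process of each kind is running.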