This chapter focuses on monitoring Messaging Server; however, you also need to monitor the system on which the server resides. A well-configured server cannot perform well on a poorly tuned system, and symptoms of server failure may indicate that the hardware is not powerful enough to handle the email load. This chapter does not provide all the details for monitoring system performance, because many of these procedures are platform-specific and may require that you refer to the platform-specific system documentation. The following procedures are described here for performance monitoring:
Email needs to be delivered on time. This may be a service agreement requirement, but it is also good policy to have mail delivered as quickly as possible. Slow end-to-end times could indicate a number of things: the server might not be working properly, certain times of the day might experience overwhelming message loads, or the existing hardware resources might be pushed beyond their capacity.
Mail takes longer than normal to be delivered.
Use any facility that sends a message and receives it. Compare the header times between server hops, and the times between point of origin and retrieval. See immonitor-access.
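The header comparison can be sketched in a few lines of Python. This is a minimal illustration, not part of Messaging Server: the function name hop_delays and the sample message are hypothetical, and the sketch assumes each Received header carries its timestamp after the last semicolon, as well-formed trace headers do. A header without a parseable date raises an error.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def hop_delays(raw_message):
    """Return seconds elapsed between consecutive Received headers, earliest hop first."""
    msg = message_from_string(raw_message)
    stamps = []
    for header in msg.get_all("Received", []):
        # The timestamp is the text after the last semicolon in the header.
        _, _, date_part = header.rpartition(";")
        stamps.append(parsedate_to_datetime(date_part.strip()))
    # Each server prepends its header, so the last one listed is the earliest hop.
    stamps.reverse()
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Hypothetical probe message with three hops.
SAMPLE = (
    "Received: from mta2.example.com by store.example.com; Mon, 01 Jan 2024 10:00:09 +0000\n"
    "Received: from mta1.example.com by mta2.example.com; Mon, 01 Jan 2024 10:00:04 +0000\n"
    "Received: from client.example.com by mta1.example.com; Mon, 01 Jan 2024 10:00:01 +0000\n"
    "Subject: probe\n"
    "\n"
    "test\n"
)

delays = hop_delays(SAMPLE)
print(delays)       # per-hop delays in seconds
print(sum(delays))  # end-to-end time in seconds
```

Clock differences between hosts distort the per-hop figures, so such numbers are most useful when tracked over time against a known baseline.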
Inadequate disk space is one of the most common causes of mail server problems and failures. Without space to write to the MTA queues or to the message store, the mail server will fail. In addition, unless log files are monitored and cleaned up, they can grow uncontrollably, filling up all disk space.
Message store partitions grow as new messages are delivered to the mailboxes; for example, if message store quotas are not enforced, the message store can outgrow the disk space available for a partition. Another cause of running out of disk space is the MTA message queues growing too large. A third area of concern is a problem with the log file monitoring facilities that allows the log files to grow uncontrollably. (Note that there are a number of log files, such as LDAP, MTA, and Message Access, and that each of these log files can be stored on different disks.)
Different symptoms can occur depending on which disk or partition is running out of space. MTA queues can overflow and reject SMTP connections, messages might remain in the ims_master queue and not be delivered to the message store, and log files can overflow.
If a message store partition fills up, message access daemons can fail, and message store data can be corrupted. Message store maintenance utilities such as imexpire and reconstruct can repair the damage and reduce disk usage. However, these utilities require additional disk space, and repairing a partition that has filled an entire disk can cause downtime.
Depending upon the system configuration, you may need to monitor various disks and partitions. For example, MTA queues may reside on one disk or partition, message stores may reside on another, and log files may reside on yet another. Each of these spaces requires monitoring, and the methods for monitoring them may differ.
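As a generic, platform-neutral illustration of checking several locations, the following Python sketch reports the percentage of space used on the file system behind each path. It is not a Messaging Server facility. The function names and the example path are hypothetical; substitute the directories that hold your MTA queues, message store partitions, and log files. The 75 percent default mirrors the recommendation this chapter gives for message store disk usage.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total, rounded to one decimal place."""
    return round(100.0 * used_bytes / total_bytes, 1)


def check_paths(paths, limit_percent=75.0):
    """Return {path: (percent_used, over_limit)} for the file system behind each path."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        pct = percent_used(usage.total, usage.used)
        report[path] = (pct, pct > limit_percent)
    return report


# Hypothetical example: "." stands in for a queue, store, or log directory.
for path, (pct, over) in check_paths(["."]).items():
    status = "OVER LIMIT" if over else "ok"
    print(f"{path}: {pct}% used ({status})")
```

The percentage can differ slightly from what platform tools report, because some file systems reserve blocks that ordinary users cannot allocate.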
Messaging Server provides specific methods for monitoring message store disk usage and preventing partitions from filling up all available disk space.
You can take the following steps to monitor the message store’s use of disk space:
Set parameters to monitor message store disk usage
Lock message store partitions when a disk-usage threshold is reached
For details, see the sections that follow: Monitoring the Message Store and Monitoring Message Store Partitions.
It is recommended that message store disk usage not exceed 75% capacity. You can monitor message store disk usage by configuring the following alarm attributes using the configutil utility:
By setting these parameters, you can specify how often the system should monitor disk space and under what circumstances the system should send a warning. For example, if you want the system to monitor disk space every 600 seconds, run the following command:
configutil -o alarm.diskavail.msgalarmstatinterval -v 600
If you want to receive a warning whenever available disk space falls below 20%, run the following command:
configutil -o alarm.diskavail.msgalarmthreshold -v 20
Refer to Table 23–6 for more information on these parameters.
You can stop messages from being delivered to a message store partition when the partition fills more than a specified percentage of available disk space. To do this, set two configutil parameters: one enables the feature, and the other specifies the disk-usage threshold.
With this feature, the message store daemon monitors the partition’s disk usage. As disk usage increases, the store daemon dynamically checks the partition more frequently (ranging from once every 100 minutes to once a minute).
If disk usage goes higher than the specified threshold, the store daemon:
Locks the partition. Incoming messages are held in the MTA message queue, but not delivered to the mailboxes in the message store partition.
Logs a message to the default log file.
Sends an email notification to the postmaster. (You can change the recipient of the email by setting the configutil parameter alarm.msgalarmnoticercpt.)
When disk usage falls below the threshold, the partition is unlocked, and messages are again delivered to the store.
The configutil parameters are as follows:
local.store.checkdiskusage enables the partition-monitoring feature.
Allowable values: yes, no
Default value: yes
local.store.diskusagethreshold specifies the disk-usage threshold. The value of local.store.diskusagethreshold is a percentage from 1 to 99.
Default value: 99
You should set the disk-usage threshold to a percentage low enough to give you time to repartition or assign more disk space to the local message store.
For example, suppose a partition fills up disk space at a rate of 2 percent per hour, and it takes an hour to allocate additional disk space for the local message store. In this case, you should set the disk-usage threshold to a value lower than 98 percent.
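The arithmetic behind this example can be written out as a short Python sketch. The function name max_safe_threshold is hypothetical and is used only for illustration; the fill rate and response time are values you would measure for your own site.

```python
def max_safe_threshold(fill_rate_pct_per_hour, response_hours):
    """Return the disk-usage percentage the threshold must stay below.

    While you are adding space, the partition keeps filling by
    fill_rate_pct_per_hour * response_hours percentage points, so the
    partition must be locked before usage reaches 100 minus that amount.
    """
    headroom = fill_rate_pct_per_hour * response_hours
    return 100 - headroom


# Values from the example: 2 percent per hour, one hour to add disk space.
print(max_safe_threshold(2, 1))  # 98
```

Because the result is an upper bound, choose a value below it to leave a margin for bursts in delivery rate, and keep the value within the 1 to 99 range that local.store.diskusagethreshold accepts.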
You also need to monitor disk usage for the MTA queues and for logging space.
For information on managing logging space, see Chapter 21, Managing Logging. For example, to learn how to monitor the mail.log file, see Managing MTA Message and Connection Logs.
High CPU usage is a sign either that there is not enough CPU capacity for the level of usage or that some process is using more CPU cycles than is appropriate.
Poor system response time. Slow logging in of users. Slow rate of delivery.
Monitoring CPU usage is a platform-specific task. Refer to the relevant platform documentation.
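As one rough, platform-dependent illustration, the following Python sketch reads the system load average and divides it by the number of CPUs. It assumes a UNIX-like platform, because os.getloadavg() is not available elsewhere, and the function name load_per_cpu is hypothetical. Load average counts runnable processes and is not the same as CPU utilization, so treat the result as an early signal and not as a precise measurement.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Return the load average divided by the number of CPUs (at least 1)."""
    return load_average / max(cpu_count, 1)


# os.getloadavg() exists only on UNIX-like platforms, so guard the call.
if hasattr(os, "getloadavg"):
    one_minute, five_minute, fifteen_minute = os.getloadavg()
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    print(f"1-minute load per CPU: {load_per_cpu(one_minute, cpus):.2f}")
    print(f"15-minute load per CPU: {load_per_cpu(fifteen_minute, cpus):.2f}")
```

A per-CPU value that stays well above 1.0 over the 15-minute window suggests that processes are regularly waiting for CPU time; the platform tools can then identify which processes are responsible.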