iPlanet Messaging Server 5.2 Administrator's Guide
Chapter 15 Monitoring the iPlanet Messaging Server
In most cases, a well-planned, well-configured server will perform without extensive intervention from an administrator. As an administrator, however, it is your job to monitor the server for signs of problems. This chapter describes the monitoring of the iPlanet Messaging Server. It consists of the following sections:
"Daily Monitoring Tasks"
"Monitoring System Performance"
"Monitoring LDAP Directory Server"
Troubleshooting procedures can be found in Chapter 14 "Troubleshooting the MTA."
Daily Monitoring Tasks
The most important tasks you should perform on a daily basis are checking postmaster mail, monitoring the log files, and setting up the stored utility. These tasks are described below.
Checking postmaster Mail
Messaging Server has a predefined administrative mailing list set up for postmaster email. Any users who are part of this mailing list will automatically receive mail addressed to postmaster.
The rules for postmaster mail are defined in RFC 822, which requires every email site to accept mail addressed to a user or mailing list named postmaster and that mail sent to this address be delivered to a real person. All messages sent to postmaster@host.domain are sent to a postmaster account or mailing list.
Typically, the postmaster address is where users should send email about their mail service. As postmaster, you might receive mail from local users about server response time, from other server administrators who are encountering problems sending mail to your server, and so on. You should check postmaster mail daily.
You can also configure the server to send certain error messages to the postmaster address. For example, when the MTA cannot route or deliver a message, you can be notified via email sent to the postmaster address. You can also send exception condition warnings (low disk space, poor server response) to postmaster.
Monitoring and Maintaining the Log Files
iPlanet Messaging Server creates a separate set of log files for each of the major protocols, or services, it supports: SMTP, IMAP, POP, and HTTP. You should monitor the log files on a routine basis, especially if you are having problems with the server.
Be aware that logging can impact server performance. The more verbose the logging you specify, the more disk space your log files will occupy for a given amount of time. You should define effective but realistic log rotation, expiration, and backup policies for your server. For information about defining logging policies for your server, see Chapter 13 "Logging and Log Analysis."
Setting Up the stored Utility
The stored utility performs automatic monitoring and maintenance tasks for the server, such as:
Background and daily messaging tasks.
Deadlock detection and rollback of deadlocked database transactions.
Cleanup of temporary files on startup.
Implementation of aging policies.
Periodic monitoring of server state, disk space, service response times, and so on.
The stored utility automatically performs cleanup and expiration operations once a day at midnight. For further information see "stored".
Monitoring System Performance
This chapter focuses on iPlanet Messaging Server monitoring; however, you will also need to monitor the system on which the server resides. A well-configured server cannot perform well on a poorly-tuned system, and symptoms of server failure may indicate that the hardware is not powerful enough to serve the email load. This chapter does not provide all the details for monitoring system performance, as many of these procedures are platform specific and may require that you refer to the platform-specific system documentation. The following procedures are described here for performance monitoring:
"Monitoring End-to-end Message Delivery Times"
Monitoring End-to-end Message Delivery Times
Email needs to be delivered on time. This may be a service agreement requirement, but also it is good policy to have mail delivered as quickly as possible. Slow end-to-end times could indicate a number of things. It may be that the server is not working properly, or that certain times of the day experience overwhelming message loads, or that the existing hardware resources are being pushed beyond their capacity.
Symptoms of Poor End-to-end Message Delivery Times
Mail takes a longer period of time to be delivered than normal.
To Monitor End-to-end Message Delivery Times
Use any facility that sends a message and receives it. Compare the header times between server hops, and the times between point of origin and retrieval.
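The header comparison can be automated. The following is a minimal Python sketch, assuming each hop adds a standard Received header whose timestamp follows the last semicolon; the function name hop_delays is illustrative and not part of Messaging Server.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def hop_delays(raw_message):
    """Return the delay in seconds between consecutive Received headers."""
    msg = message_from_string(raw_message)
    stamps = []
    for header in msg.get_all("Received", []):
        # Assumption: the timestamp is the text after the last semicolon.
        # A header without a parseable date raises an exception here.
        _, _, date_part = header.rpartition(";")
        stamps.append(parsedate_to_datetime(date_part.strip()))
    # Each hop prepends its header, so reverse to get the oldest hop first.
    stamps.reverse()
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

Summing the returned list gives the time between the first and last hop; a single large entry points at the hop where the message waited.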
Monitoring Disk Space
Inadequate disk space is one of the most common causes of mail server problems and failure. Without space to write to the MTA queues or to the message store, the mail server will fail. In addition, unless log files are monitored and cleaned up, they can grow uncontrollably, filling up all disk space.
Disk space can be rapidly depleted when the cleanup function of stored fails and deleted messages are not expunged from the message store. Other causes of running out of disk space are the MTA message queues growing too large, the message store outgrowing the available disk space, and unmonitored log files growing uncontrollably. (Note that there are a number of log files such as LDAP, MTA, and Message Access, and that each of these log files can be stored on different disks.)
Symptoms of Disk Space Problems
Different symptoms can occur depending on which disk or partition is running out of space. MTA queues can overflow and reject SMTP connections, messages might remain in the ims_master queue and not be delivered to the message store, and log files can overflow.
To Monitor Disk Space
Depending upon the system configuration you may need to monitor various disks and partitions. For example, MTA queues may reside on one disk/partition, message stores may reside on another, and log files may reside on yet another. Each of these spaces will require monitoring and the methods to monitor these spaces may differ.
Monitoring the Message Store
It is recommended that message store disk usage not exceed 75% capacity. You can monitor message store disk usage by configuring alarm attributes such as alarm.diskavail.msgalarmstatinterval and alarm.diskavail.msgalarmthreshold using the configutil utility.
By setting these parameters, you can specify how often the system should monitor disk space and under what circumstances the system should send a warning. For example, if you want the system to monitor disk space every 600 seconds, specify the following command:
configutil -o alarm.diskavail.msgalarmstatinterval -v 600
If you want to receive a warning whenever available disk space falls below 20%, specify the following command:
configutil -o alarm.diskavail.msgalarmthreshold -v 20
Refer to Table 15-1 for more information on these parameters.
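For partitions that the stored alarms do not cover, a similar check can be scripted. The following Python sketch mirrors the 20% availability threshold used in the example; the function names and the default threshold are illustrative assumptions, not Messaging Server settings.

```python
import shutil


def percent_available(path):
    """Return the percentage of free space on the file system holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def below_threshold(percent_free, threshold=20):
    """Return True when free space has fallen below the warning threshold."""
    return percent_free < threshold
```

A scheduled job could call below_threshold(percent_available(directory)) for each queue, store, and log directory and send mail when it returns True.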
Monitoring the MTA Queues and Logging Space
You will need to monitor MTA queue disk and logging space disk usage.
Monitoring CPU Usage
High CPU usage is either a sign that there is not enough CPU capacity for the level of usage or some process is using up more CPU cycles than is appropriate.
Symptoms of CPU Usage Problems
Poor system response time. Slow logging in of users. Slow rate of delivery.
To Monitor CPU Usage
Monitoring CPU usage is a platform specific task. Refer to the relevant platform documentation.
Monitoring the MTA
This section consists of the following subsections:
"Monitoring the Size of the Message Queues"
"Monitoring Rate of Delivery Failure"
Monitoring the Size of the Message Queues
Excessive message queue growth may indicate that messages are not being delivered, are being delayed in their delivery, or are coming in faster than the system can deliver them. This may have a number of causes, such as a denial of service attack in which huge numbers of messages flood your system, or the Job Controller not running.
See "Channel Message Queues", "Messages are Not Dequeued", and "MTA Messages are Not Delivered" for more information on message queues.
Symptoms of Message Queue Problems
To Monitor the Size of the Message Queues
Probably the best way to monitor the message queues is to use imsimta qm. Refer to "imsimta qm counters".
You can also monitor the number of files in the queue directories (/ServerRoot/msg-instance/imta/queue/). The number of files will be site-specific, and you will need to build a baseline history to find out what is "too many." This can be done by recording the number of queue files over a two-week period to get an approximate average.
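The baseline comparison can be sketched in Python as follows. This is only an illustration: the function names and the factor of 2 are assumptions, and each site must choose its own definition of "too many."

```python
import os
from statistics import mean


def count_queue_files(queue_dir):
    """Count the files under queue_dir, including all channel subdirectories."""
    total = 0
    for _root, _dirs, files in os.walk(queue_dir):
        total += len(files)
    return total


def exceeds_baseline(current, history, factor=2.0):
    """Return True when current is more than factor times the average of history."""
    if not history:
        return False  # no baseline recorded yet
    return current > factor * mean(history)
```

Recording count_queue_files() once a day for two weeks produces the history list that exceeds_baseline() compares against.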
Monitoring Rate of Delivery Failure
A delivery failure is a failed attempt to deliver a message to an external site. A large increase in rate of delivery failure can be a sign of a network problem such as a dead DNS server or a remote server timing out on responding to connections.
Symptoms of Rate of Delivery Failure
There are no outward symptoms. Many Q records will appear in mail.log_current.
To Monitor the Rate of Delivery Failure
Delivery failures are recorded in the MTA logs with the logging entry code Q. Look at the Q records in the file msg-instance/log/imta/mail.log_current.
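Counting entry codes over time makes a sudden increase easier to spot. The Python sketch below assumes that the entry code appears as a standalone, whitespace-delimited field near the start of each log record; this layout is an assumption, so check it against your own mail.log_current before relying on the counts. The function name and the max_fields limit are illustrative.

```python
from collections import Counter


def count_entry_codes(lines, codes=("Q",), max_fields=8):
    """Count log records whose leading fields contain one of the entry codes."""
    counts = Counter()
    for line in lines:
        # Only the first few fields are inspected so that addresses and
        # diagnostic text later in the record are not mistaken for a code.
        for field in line.split()[:max_fields]:
            if field in codes:
                counts[field] += 1
                break
    return counts
```

Passing codes=("Q", "J") to the same function also counts rejected enqueues.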
Monitoring Inbound SMTP Connections
An unusual increase in the number of inbound SMTP connections from a given IP address may indicate that an external user is relaying mail or that a denial of service attack is under way.
Symptoms of Unauthorized SMTP Connections
External user relaying mail: No outward symptoms.
Service denial attack: External attempt to overload the SMTP servers with message requests.
To Monitor Inbound SMTP Connections
External user relaying mail: Look in msg-instance/log/imta/mail.log_current for records with the logging entry code J (rejected relays). To turn on logging of remote IP addresses, add the following line to the option.dat file:
- log_connection=1
- Note that there is a slight performance trade-off when this feature is enabled.
Service denial attack: To find out who and how many users are connecting to the SMTP servers, you can run the command netstat and check for connections at the SMTP port (default: 25). Example:
- Local address Remote address State
192.18.79.44.25 192.18.78.44.56035 32768 0 32768 0 CLOSE_WAIT
192.18.79.44.25 192.18.136.54.57390 8760 0 24820 0 ESTABLISHED
192.18.79.44.25 192.18.26.165.48508 33580 0 24820 0 TIME_WAIT
- Note that you will first need to determine the appropriate number of SMTP connections and their states (ESTABLISHED, CLOSE_WAIT, etc.) for your system to determine if a particular reading is out of the ordinary.
- If you find many connections staying in the SYN_RECEIVED state, this might be caused by a broken network or a denial of service attack. In addition, the lifetime of an SMTP server process is limited. This is controlled by the MTA configuration variable MAX_LIFE_TIME in the dispatcher.cnf file. The default is 86,400 seconds (one day). Similarly, MAX_LIFE_CONNS specifies the maximum number of connections a server process can handle in its lifetime. If you find a particular SMTP server that has been around for a long time, you may wish to investigate.
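To build the baseline of connection counts and states, the netstat output can be summarized with a short script. This Python sketch assumes the address.port layout shown in the example above, with the state in the last column; other platforms print addresses differently, and the function name is illustrative.

```python
from collections import Counter


def smtp_connections(netstat_lines, port="25"):
    """Summarize connections to the local SMTP port by state and remote host."""
    by_state = Counter()
    by_remote = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        local, remote, state = fields[0], fields[1], fields[-1]
        # Assumption: addresses are written as host.port, as in the example.
        _host, _, local_port = local.rpartition(".")
        if local_port != port:
            continue
        remote_host, _, _remote_port = remote.rpartition(".")
        by_state[state] += 1
        by_remote[remote_host] += 1
    return by_state, by_remote
```

A remote host with an unusually high count in by_remote, or a large SYN_RECEIVED entry in by_state, is a candidate for further investigation.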
Monitoring the Dispatcher and Job Controller Processes
The Dispatcher and Job Controller processes must be operating for the MTA to work. You should have one process of each kind.
Symptoms of Dispatcher and Job Controller Processes Down
If the Dispatcher is down or does not have enough resources, SMTP connections are refused.
If the Job Controller is down, queue size will grow.
To Monitor Dispatcher and Job Controller Processes
Check to see that the processes called dispatcher and job_controller exist. See "Check That the Job Controller and Dispatcher are Running".
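The process check can be scripted. The following Python sketch assumes a POSIX ps that accepts "-e -o comm=" to print one command name per line; the function names are illustrative.

```python
import os
import subprocess


def count_processes(comm_lines, names=("dispatcher", "job_controller")):
    """Count how many listed commands match each expected process name."""
    counts = dict.fromkeys(names, 0)
    for line in comm_lines:
        # Some platforms print the full path, so compare the base name only.
        name = os.path.basename(line.strip())
        if name in counts:
            counts[name] += 1
    return counts


def list_commands():
    """Return the command name of every process, one per list entry."""
    result = subprocess.run(["ps", "-e", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling count_processes(list_commands()) should report a count of 1 for each name on a healthy system; a count of 0 means the process is down.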
Monitoring Message Access
This section consists of the following subsections:
"Monitoring imapd, popd and httpd"
Monitoring imapd, popd and httpd
These processes provide access to IMAP, POP and Webmail services. If any of these is not running or not responding, the service will not function appropriately. If the service is running but is overloaded, monitoring will allow you to detect this and configure it more appropriately.
Symptoms of imapd, popd and httpd Problems
Connections are refused or system is too slow to connect. For example, if IMAP is not running and you try to connect to IMAP directly you will see something like this:
telnet 0 143
Trying 0.0.0.0...
telnet: Unable to connect to remote host: Connection refused
If you try to connect with a client, you will get a message such as:
netscape is unable to connect to the server at the location you have specified. The server may be down or busy.
To Monitor imapd, popd and httpd
Can be monitored with SNMP.
- If you have SNMP set up, this is a very good way to monitor these processes. See Appendix A "SNMP Support." The server information is in the Network Services Monitoring MIB.
Can be checked with counterutil. See "counterutil" and the iPlanet Messaging Server Reference Manual.
Check log files.
- Look in the directory msg-instance/log/service, where service can be http, imap, or pop. In that directory you will find a number of log files. One filename is the name of the service (imap, pop, http) and the others are the name of the service plus a sequence number and a date concatenated to the service name. For example:
- imap imap.29.1010221593 imap.31.1010394412 imap.33.1010567224
- The file with just the service name is the latest log. The others are ordered by sequence number (here 29, 31, 33), and the one with the highest sequence number is the next newest. (See Chapter 13 "Logging and Log Analysis.")
- If a server was shut down you might see something like this:
- [05/Jan/2002:08:36:38 -0800] gotmail-a imapd[10275]: General Warning: iPlanet Messaging Server IMAP4 5.2 (built Dec 9 2001) shutting down
Run the platform-specific command to verify that the imapd, popd and httpd processes are running. For example, in Solaris you can use the ps command and look for imapd, popd and mshttpd. In Windows NT, you can either use the Task Manager window or the command line.
You can set alarms for specified server performance thresholds by setting the server response configuration parameters described in "Recommended stored Parameters".
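When scanning the service log files, it helps to read them from newest to oldest. The Python sketch below orders file names using the naming scheme described in this section (the bare service name first, then descending sequence number); the function name is illustrative.

```python
def order_logs(filenames, service):
    """Return the log files for service ordered from newest to oldest."""
    current = []
    rotated = []
    for name in filenames:
        parts = name.split(".")
        if name == service:
            # The file named after the service alone is the latest log.
            current.append(name)
        elif len(parts) == 3 and parts[0] == service and parts[1].isdigit():
            rotated.append((int(parts[1]), name))
    # A higher sequence number means a newer rotated log.
    rotated.sort(reverse=True)
    return current + [name for _seq, name in rotated]
```

The result can be fed to any search for messages such as the shutdown warning shown in this section.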
Monitoring stored
stored performs a variety of important tasks such as deadlock and transaction operations of the message database, enforcing aging policies, and expunging and erasing messages stored on disk. If stored stops running, the messaging server will eventually run into problems. If stored doesn't start when start-msg is run, no other processes will start. For more information about stored see the iPlanet Messaging Server Reference Manual.
Symptoms of stored Problems
There are no outward symptoms.
To Monitor stored
Check that the stored process is running. stored creates and updates a pid file in msg-instance/config called pidfile.store. The pid file shows an init state when recovering and a ready state when ready. For example:
- 231: cat pidfile.store
28250
ready
- The number on the first line is the process ID of stored.
- 232: ps -eaf | grep stored
mailsrv 28250 1 0 Jan 05 ? 8:44 /usr/iplanet/server5/bin/msg/admin/bin/stored -d
Check for log file build up in msg-instance/store/mboxlist. Note that not every log file build up is caused by direct stored problems. Log files may also build up if imapd dies or there is a database problem.
Check the timestamp on the following files in msg-instance/config:
- stored.ckp - Touched when an attempt at checkpointing is made. Should get time stamped every 1 minute.
- stored.lcu - Touched at every db log cleanup. Should get time stamped every 5 minutes.
- stored.per - Touched at every spawn of peruser db writeout. Should get time stamped every 60 minutes.
Check for stored messages in the default log file msg-instance/log/default/default.
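The pid file and timestamp checks can be combined in a small script. The Python sketch below uses the file names and intervals given in this section; the function names and the slack factor of 2 are illustrative assumptions.

```python
import os
import time

# Expected touch interval in minutes for each file, as listed in this section.
EXPECTED_MINUTES = {"stored.ckp": 1, "stored.lcu": 5, "stored.per": 60}


def read_pidfile(text):
    """Return (pid, state) from the contents of pidfile.store."""
    fields = text.split()
    pid = int(fields[0])
    state = fields[1] if len(fields) > 1 else ""
    return pid, state


def stale_files(config_dir, now=None, slack=2.0):
    """Return the files that are missing or older than slack times their interval."""
    now = time.time() if now is None else now
    stale = []
    for name, minutes in EXPECTED_MINUTES.items():
        path = os.path.join(config_dir, name)
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            stale.append(name)  # a missing file is treated as stale
            continue
        if age > slack * minutes * 60:
            stale.append(name)
    return stale
```

A state other than ready, or a non-empty list from stale_files(), suggests that stored needs attention.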
Monitoring LDAP Directory Server
This section consists of the following subsection:
"Monitoring slapd"
Monitoring slapd
The LDAP directory server (slapd) provides directory information for the messaging system. If slapd is down, the system will not work properly. If slapd response time is too slow, this will affect login speed and any other transaction that requires LDAP lookups.
Check that ns-slapd process is running.
Check slapd log files access and errors in slapd-instance/logs/
Check the ns-slapd response time while searching for a user.
Monitoring the Message Store
Messages are stored in a database. The distribution of users on disks, the size of their mailbox, and disk requirements affect the store performance. This section consists of the following subsections:
"Monitoring the State of Message Store Database Locks"
"Monitoring the Number of Database Log Files in the mboxutil Directory"
Monitoring the State of Message Store Database Locks
The state of DB-locks is held by different server processes. These database locks can affect the performance of the message store. In case of deadlocks, messages will not be inserted into the store at reasonable speeds and the ims-ms channel queue will grow larger as a result. There are legitimate reasons for a queue to back up, so it is useful to have a history of the queue length in order to diagnose problems.
Symptoms of Message Store Database Lock Problems
The number of transactions is accumulating and not resolving.
To Monitor Message Store Database Locks
Use the command counterutil -o db_lock
Monitoring the Number of Database Log Files in the mboxutil Directory
Database log files refer to Sleepycat transaction checkpointing log files (msg-instance/store/mboxlist). Log file build up is a symptom of database checkpointing not happening. Log file build up can also be due to stored problems.
Symptoms of Database Log File Problems
There should be 2 or 3 log files. If there are more, it is a sign of a potentially serious problem. The message store uses a few databases for messages and quotas, and a problem with those can lead to problems for all of the mail server.
To Monitor Database Log Files
Look in the msg-instance/store/mboxlist directory and make sure there are only 2 or 3 files.
Utilities and Tools for Monitoring
The following tools are available for monitoring:
"stored"
stored
The stored utility performs maintenance tasks on the server, but it also can do monitoring. It can periodically check the server state, disk space, and service response times and, if specified, it can issue alarms to the postmaster.
An alarm comes in the form of an email message from stored to the postmaster warning of a specified condition. A sample email alarm sent by stored when a certain threshold is exceeded is shown below:
Subject: ALARM: server response time in seconds of "ldap_siroe.com_389" is 10
Date: Tue, 17 Jul 2001 16:37:08 -0700 (PDT)
From: postmaster@siroe.com
To: postmaster@siroe.com
Server instance: /usr/iplanet/server5/msg-europa
Alarmid: serverresponse
Instance: ldap_siroe_europa.com_389
Description: server response time in seconds
Current measured value (17/Jul/2001:16:37:08 -0700): 10
Lowest recorded value: 0
Highest recorded value: 10
Monitoring interval: 600 seconds
Alarm condition is when over threshold of 10
Number of times over threshold: 1
You can specify how often stored monitors disk and server performance, and under what circumstances it sends alarms. This is done by using the configutil command to set the alarm parameters. Table 15-1 shows useful stored parameters along with their default setting.
counterutil
This utility provides statistics acquired from different system counters. Here is a current list of available counter objects:
counterutil -l
entry = alarm
entry = diskusage
entry = serverresponse
entry = db_lock
entry = db_log
entry = db_mpool
entry = db_txn
entry = popstat
entry = imapstat
entry = httpstat
entry = cgimsg
Each entry represents a counter object and supplies a variety of useful counts for this object. In this section we will only be discussing the alarm, diskusage, serverresponse, db_lock, popstat, imapstat, and httpstat counter objects. For details on counterutil command usage, refer to the iPlanet Messaging Server Reference Manual.
counterutil Output
counterutil has a variety of flags. A command format for this utility may be as follows:
- counterutil -o CounterObject -i 5 -n 10
- where,
- -o CounterObject represents the counter object alarm, diskusage, serverresponse, db_lock, popstat, imapstat, or httpstat.
- -i 5 specifies a 5 second interval.
- -n 10 represents the number of iterations (default: infinity).
An example of counterutil usage is as follows:
counterutil -o imapstat -i 5 -n 10
Monitor counteroobject (imapstat)
registry /gotmail/iplanet/server5/msg-gotmail/counter/counter opened
counterobject imapstat opened
count = 1 at 972082466 rh = 0xc0990 oh = 0xc0968
global.currentStartTime [4 bytes]: 17/Oct/2000:12:44:23 -0700
global.lastConnectionTime [4 bytes]: 20/Oct/2000:15:53:37 -0700
global.maxConnections [4 bytes]: 69
global.numConnections [4 bytes]: 12480
global.numCurrentConnections [4 bytes]: 48
global.numFailedConnections [4 bytes]: 0
global.numFailedLogins [4 bytes]: 15
global.numGoodLogins [4 bytes]: 10446
...
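To record these values over time, the output can be parsed into a dictionary. This Python sketch assumes every counter line has the form "name [N bytes]: value" as in the sample above; the function name is illustrative.

```python
import re

# Matches lines such as "global.maxConnections [4 bytes]: 69".
_COUNTER_LINE = re.compile(r"^\s*(\S+)\s+\[\d+ bytes\]:\s*(.*)$")


def parse_counters(output):
    """Return a dict mapping counter names to values from counterutil output."""
    values = {}
    for line in output.splitlines():
        match = _COUNTER_LINE.match(line)
        if not match:
            continue  # skip banner lines and the "count = ..." header
        name, raw = match.groups()
        raw = raw.strip()
        # Numeric counters become integers; timestamps stay as strings.
        values[name] = int(raw) if raw.isdigit() else raw
    return values
```

Comparing values such as global.numFailedLogins between samples shows how quickly a counter is rising.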
Alarm Statistics Using counterutil
These alarm statistics refer to the alarms sent by stored. The alarm counter provides the following statistics:
Table 15-2    counterutil alarm Statistics
Suffix
Description
IMAP, POP, and HTTP Connection Statistics Using counterutil
To get information on the number of current IMAP, POP, and HTTP connections, number of failed logins, total connections from the start time, and so forth, you can use the command counterutil -o CounterObject -i 5 -n 10, where CounterObject represents the counter object popstat, imapstat, or httpstat. The meaning of the imapstat suffixes is shown in Table 15-3. The popstat and httpstat objects provide the same information in the same format and structure.
Disk Usage Statistics Using counterutil
The command counterutil -o diskusage generates the following information:
Table 15-4    counterutil diskstat Statistics
Suffix
Description
Server Response Statistics
The command counterutil -o serverresponse generates the following information. This information is useful for checking if the servers are running, and how quickly they are responding.
Table 15-5    counterutil serverresponse Statistics
Suffix
Description
Log Files
Messaging Server logs event records for SMTP, IMAP, POP, and HTTP. The policies for creating and managing the Messaging Server log files are customizable.
Since logging can affect server performance, logging should be considered very carefully before the burden is put on the server. Refer to Chapter 13 "Logging and Log Analysis" for more information.
imsimta counters
The MTA accumulates message traffic counters based upon the Mail Monitoring MIB, RFC 1566, for each of its active channels. The channel counters are intended to help indicate the trend and health of your e-mail system. Channel counters are not designed to provide an accurate accounting of message traffic. For precise accounting, instead see MTA logging as discussed in Chapter 13 "Logging and Log Analysis."
The MTA channel counters are implemented using the lightest weight mechanisms available so that they cause as little impact as possible on actual operation. Channel counters do not try harder: if an attempt to map the section fails, no information is recorded; if one of the locks in the section cannot be obtained almost immediately, no information is recorded; when a system is shut down, the information contained in the in-memory section is lost forever.
The imsimta counters -show command provides MTA channel message statistics (see below). These counters need to be examined over time, noting the minimum values seen. The minimums may actually be negative for some channels. A negative value means that there were messages queued for a channel at the time that its counters were zeroed (for example, when the cluster-wide database of counters was created). When those messages were dequeued, the associated counters for the channel were decremented, leading to a negative minimum. For such a counter, the correct "absolute" value is the current value less the minimum value that counter has ever held since being initialized.
1) Received is the number of messages enqueued to the channel named tcp_local. That is, the messages enqueued (E records in the mail.log* file) to the tcp_local channel by any other channel.
2) Stored is the number of messages stored in the channel queue to be delivered.
3) Delivered is the number of messages which have been processed (dequeued) by the channel tcp_local. (That is, D records in the mail.log* file.) A dequeue operation may either correspond to a successful delivery (that is, an enqueue to another channel), or to a dequeue due to the message being returned to the sender. This will generally correspond to the number Received minus the number Stored.
The MTA also keeps track of how many of the messages were dequeued upon first attempt; this number is shown in parentheses.
4) Submitted is the number of messages enqueued (E records in the mail.log* file) by the channel tcp_local to any other channel.
5) Attempted is the number of messages which have experienced temporary problems in dequeuing, that is, Q or Z records in the mail.log* file.
6) Rejected is the number of attempted enqueues which have been rejected, that is, J records in the mail.log* file.
7) Failed is the number of attempted dequeues which have failed, that is, R records in the mail.log* file.
8) Queue time/count is the average time-spent-in-queue for the delivered messages. This includes both the messages delivered upon the first attempt, see (9), and the messages that required additional delivery attempts (hence typically spent noticeable time waiting fallow in the queue).
9) Queue first time/count is the average time-spent-in-queue for the messages delivered upon the first attempt.
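The "absolute" value correction for counters with a negative minimum, described earlier in this section, can be sketched in Python. The class name is illustrative, and the sketch assumes the counter held zero when it was initialized, so the tracked minimum starts at 0.

```python
class CounterTracker:
    """Track the lowest value a channel counter has held since initialization."""

    def __init__(self):
        # Assumption: the counter was zero when it was initialized.
        self.minimum = 0

    def update(self, current):
        """Record a new reading and return its corrected absolute value."""
        self.minimum = min(self.minimum, current)
        # The absolute value is the current value less the minimum ever seen.
        return current - self.minimum
```

For example, a tracker that has seen a reading of -362 reports an absolute value of 453 for a later reading of 91.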
Note that the number of messages submitted can be greater than the number delivered. This is often the case, since each message the channel dequeues (delivers) will result in at least one new message enqueued (submitted) but possibly more than one. For example, if a message has two recipients reached via different channels, then two enqueues will be required. Or if a message bounces, a copy will go back to the sender and another copy may be sent to the postmaster. Usually that will be two submissions (unless both are reached through the same channel).
More generally, the connection between Submitted and Delivered varies according to type of channel. For example, in the conversion channel, a message would be enqueued by some other arbitrary channel, and then the conversion channel would process that message and enqueue it to a third channel and mark the message as dequeued from its own queue. Each individual message takes a path:
elsewhere -> conversion E record Received
conversion -> elsewhere E record Submitted
conversion D record Delivered
However, for a channel such as tcp_local which is not a "pass through," but rather has two separate pieces (slave and master), there is no connection between Submitted and Delivered. The Submitted counter has to do with the SMTP server portion of the tcp_local channel, whereas the Delivered counter has to do with the SMTP client portion of the tcp_local channel. Those are two completely separate programs, and the messages travelling through them may be completely separate.
Messages submitted to the SMTP server:
tcp_local -> elsewhere E record Submitted
Messages sent out to other SMTP hosts via the SMTP client:
elsewhere -> tcp_local E record Received
tcp_local D record Delivered
Implementation on UNIX and NT
For performance reasons, a node running the MTA keeps a cache of channel counters in memory using a shared memory section (UNIX) or shared file-mapping object (NT). As processes on the node enqueue and dequeue messages, they update the counters in this in-memory cache. If the in-memory section does not exist when a channel runs, the section will be created automatically. (The imsimta start command also creates the in-memory section, if it does not exist.)
The command imsimta counters -clear or the imsimta qm command counters clear may be used to reset the counters to zero.
imsimta qm counters
The imsimta qm counters utility displays MTA channel queue message counters. You must be root or mailsrv to run this utility. The output fields are the same as those described in "imsimta counters". See also the iPlanet Messaging Server Reference Manual for usage details.
Channel                  Messages Recipients     Blocks
---------------------- ---------- ---------- ----------
autoreply
Received                    13077      13859     264616
Stored                         92         91       -362
Delivered                   12985      13768     264978
Submitted                    2594       2594       3641
...
4370 messages processed so far today
Your license permits an unlimited number of messages per day.
MTA Monitoring Using SNMP
iPlanet Messaging Server supports system monitoring through the Simple Network Management Protocol (SNMP). Using an SNMP client (sometimes called a network manager) such as Sun Net Manager or HP OpenView (not provided with this product), you can monitor certain parts of the iPlanet Messaging Server. Refer to Appendix A "SNMP Support" for details.
mboxutil for Mailbox Quota Checking
You can monitor mailbox quota usage and limits by using the mboxutil utility. The mboxutil utility generates a report that lists defined quotas and limits, and provides information on quota usage. Quotas and usage figures are reported in kilobytes.
For example, the following command lists all user quota information:
The following example shows the quota usage for user sorook:
Copyright © 2002 Sun Microsystems, Inc. All rights reserved.
Last Updated February 27, 2002