CHAPTER 12 |
SIMS Monitoring and Logging
|
|
|
|
|
|
|
|
| |
Monitoring enables you to manage your SIMS server more effectively. Through monitoring, problems are detected earlier so that you can take corrective actions before your mail service is affected.
SIMS provides a number of ways of monitoring the messaging components. The three primary ways are:
1. | Monitoring utilities | |
2. | SNMP support | |
3. | Log files |
In addition, there are several categories of information that you can monitor, these include:
Availability: Is a service or component operating? |
Connectivity: Is the mail service accessible? (POP, IMAP etc.) |
Response time: How is the service performing from an end user perspective? |
Processes: Are critical processes running? |
Resources: Is there sufficient disk space for the message store? Is there enough virtual memory? What is the system load? |
Security: Are there denial of service attacks in progress? Is the service subject to a SPAM attack? If so, from where? |
The monitoring capabilities built into SIMS allow you to answer these questions. This chapter explains how to monitor a SIMS based e-mail service, and how to automate the monitoring process so that you are notified as problem situations that need attention develop.
SIMS captures a lot of information about significant events in the operation of the email service. This information is logged to various log files that are listed in TABLE 12-3. The type of information that is collected can be controlled. Some components provide a configuration file where the logging detail can be set (refer to the appropriate section in the SIMS Reference Manual).
Since an active SIMS server can generate a lot of logging information, manual analysis of the log files is often impractical. For this reason, SIMS includes a number of monitoring tools that extract interesting statistics from the log files (see "SIMS Monitoring Utilities" on page 261). In addition useful analysis can be performed using Solaris shell scripts. For example, here's how to determine the number of successful IMAP logins today:
% grep -c "ims.imaccessd\[[0-9]*\]\: imap\[[0-9]*]\: Auth" /var/log/syslog
252and how to determine how many IMAP connections were lost:
% grep -c "ims.imaccessd\[[0-9]*\]\: imap\[[0-9]*]\: Connection broken" /var/log/syslog
13
he MSMA logs all imaccessd and message store utility errors and events. Log messages are in the /var/log/syslog file and prefixed with SUNWmail.ims. Note that system messages are logged every time a user checks for mail. It logs the pop3 and imap login and logout. This can cause a clutter syslog messages. To stop login/logout messages from clogging you syslog file, change the syslog logging level in /etc/syslog.conf from mail.debug to mail.notice and restart the syslog daemon after the change. See the syslog(3) man page for details.
If configured, logs various information such as protocol trace information for function calls and debug information about packets. Log files are in the /var/opt/SUNWconn/ldap/log directory and are shown in TABLE 12-1. Refer to Netscape Directory Service documentation for information on Netscape log files.
Note - Note that if you are using the Netscape Directory Service, the log files will be different. Refer to product documentation.
TABLE 12-2 dsserv.log ExampleBy default, the directory server daemon logs information about connections and operations in /var/opt/SUNWconn/ldap/log/dsserv.log. In the event that you need more diagnostic ability than is provided by SIMS default Directory configuration, you may run dsservd from a shell manually, providing additional diagnostics. See the -d and -s arguments to dsservd, documented in dsservd(8) man page. Note that running with additional diagnostics turned on will run dsservd in the foreground, and will use up disk space quickly if you have all optional diagnostic output enabled.
In addition, certain messages may be logged to the console of the machine on which dsservd is running, and/or to /var/adm/messages.
TABLE 12-2 contains an example of the information that is logged during a search operation. This example shows two interactions with the directory, the first to add an entry, and the second to search the directory for all entries that have a commonName attribute. The log includes details of the bind and unbind, and operational information.
Thu May 15 16:03: conn=9 fd=15 connection from unknown (127.0.0.1)
|
Thu May 15 16:03: conn=9 op=0 BIND dn="CN=admin,O=sun,C=us" method=128
|
Thu May 15 16:03: conn=9 op=0 RESULT err=0 tag=97 nentries=0
|
Thu May 15 16:03: conn=9 op=1 ADD dn="O=sun,C=us"
|
Thu May 15 16:03: conn=9 op=1 RESULT err=0 tag=105 nentries=0
|
Thu May 15 16:03: conn=9 op=2 UNBIND
|
Thu May 15 16:03: conn=9 op=2 fd=15 closed errno=0
|
Thu May 15 16:26: conn=10 fd=15 connection from unknown (127.0.0.1)
|
Thu May 15 16:26: conn=10 op=0 BIND dn="CN=admin,O=sun,C=us" method=128
|
Thu May 15 16:26: conn=10 op=0 RESULT err=0 tag=97 nentries=0
|
Thu May 15 16:26: conn=10 op=1 SRCH base="O=sun,C=us" scope=2 filter="(cn=*)"
|
Thu May 15 16:26: conn=10 op=1 RESULT err=0 tag=101 nentries=150
|
Thu May 15 16:26: conn=10 op=-1 fd=15 closed errno=0
|
Thu May 15 16:26: conn=10 op=2 UNBIND
|
If configured, an IMTA channel logs in each email as it enters and is removed from the channel queue. Unless otherwise indicated in the table below, IMTA log file names are located in the directory /var/opt/SUNWmail/imta/log.
TABLE 12-3 Overview of IMTA Logging Files Log File
Description
Further Information
mail.log
Cumulative IMTA events
"Diagnostics Output" on page 105,
"Message Logging" on page 107
"Logging Messages Passing Through the IMTA" on page 254
"Viewing Enqueued Messages" on page 116mail.log_current
Today's IMTA events
mail.log_yesterday
Yesterday's IMTA events
dispatcher.log-*
SMTP dispatcher
SIMS Reference Manual: log files. IMTA service dispatcher
job_controller.log-*
Job controller
SIMS Reference Manual: job controller
tcp_smtp_server.log-*
SMTP server
/var/log/syslog
prefixed with SUNWmail.imta
channel_master.log-*
Master program output (usually client) for channel
channel_slave.log-*
Output of slave program (usually server) for channel
server-name_server.log-*
Logging for server server-name
server-name_thread.log-*
Per thread errors for server-name
post.log-*
Log output for the periodic IMTA delivery job
return.log-*
Log output for the periodic IMTA message bouncer job
Channel log files are not created unless an error occurs or debugging output is enabled for the channel with the master_debug channel keyword or slave_debug channel keyword.
Each new log file is created with a unique id to avoid overwriting an earlier log written by the same channel. You can use the imta find utility to aid in finding the desired "version" of a log file. You can purge back older log files using the imta purge command.
This section points out some message logging and tracking techniques.
When the identtcp or identtcpnumeric channel keyword is set in the tcp_local channel, the IMTA attempts to use an IDENT query to identify incoming SMTP connections. If the sending system is running an IDENT server, it will return to the IMTA the SMTP sender's identity for the IMTA to insert in the Received: header the IMTA constructs. If the sending system is not running an IDENT server, the IMTA will just use the port number (port 25) and the sending system IP address or name.
With identtcpnumeric, the IMTA uses the IDENT information (if any) and the actual IP address of the sending system; with identtcp, the IMTA also attempts to translate the IP address to a system name by performing a DNS reverse lookup.Thus identtcpnumeric incurs slightly less overhead because it does not do the DNS reverse lookup, and the actual IP address might be considered somewhat more authoritative that the name resulting from a DNS query. However, using the system name as with identtcp may be considered more user-friendly.
Identifying information in Received headers can assist in detecting spoofed email and in holding the senders of such spoofed email accountable. Note that user friendly identifying information is a not insignificant feature: even a naive user may notice that a Received header in a suspicious message contains an unexpected address, for example, anonymous@SpoofersAreUs.edu, but only a fairly sophisticated user is able to pay attention to any IP addresses showing up in Received headers. So a choice between these keywords may be affected by whether you are looking to provide forewarning to users that they may have received spoofed email, or whether you merely wish to preserve the identifying information for use in investigating cases of spoofed email.
The IMTA provides facilities for logging each message as it is enqueued and dequeued. All log entries are made to the mail.log_current file in the IMTA log directory, /var/opt/SUNWmail/imta/log. Logging is controlled on a per-channel basis. The logging keyword activates logging for a particular channel while the nologging keyword disables it. Logging is enabled on all channels by default.
When logging is turned on, the cumulative mail.log file in the IMTA log directory will continue to grow and it is up to you to periodically write it to backup and delete it, or truncate it, or whatever your site prefers.
The log file is written as normal ASCII text and the format is quite simple. By default, each entry contains eight or nine fields. An example is shown in the
CODE EXAMPLE 12-1 mail.log_current File.
|
You can also view the message queue by running:
% imta cache -view <channel_name>
The imta test -rewrite command verifies that an address is properly handled by the IMTA.
If you still cannot find a lost message, check whether it's being held, by looking for *.HELD messages in the channel queues.
The fields are described in TABLE 12-4:
TABLE 12-4 Log File Fields date/time
Date and time when the entry was made.
src-channel
Channel name for the source channel.
dest-ch.
Channel name of the destination channel. For SMTP channels when LOG_CONNECTION is enabled, a minus sign (-) indicates inbound to the SMTP server, a plus sign (+) indicates outbound via the SMTP client.
Type
Type of entry. See TABLE 12-5.
Size
Size of the message. This is expressed in kilobytes by default, although this default can be changed by using the BLOCK_SIZE keyword in the IMTA option file.
Sender
Envelope From address. Note that for messages with an empty envelope From address, such as notification messages, this field will be blank.
Rcpt
Original form of the envelope To address.
smtp:250<marlow@alpha.com>
Active (current) form of the envelope To address (SMTP channels during dequeue only).
Recipient ok
Delivery status (SMTP channels only during dequeue).
The logging entry codes are described in TABLE 12-5:
In addition to the base set of logging enabled by the logging channel keyword, the IMTA has options that cause additional information to be included in the entries written to the mail.log* files. Note that logging such additional information tends to incur additional overhead.
Particularly likely to be of interest are the LOG_MESSAGE_ID, LOG_FILENAME, and LOG_CONNECTION options. Enabling LOG_MESSAGE_ID allows correlation of which entries relate to which message.
Enabling LOG_FILENAME makes it easier to immediately spot how many times delivery of a particular message file has been retried, and can be useful in understanding when the IMTA does or does not split a message to multiple recipients into separate message file copies on disk.
Enabling LOG_CONNECTION causes the IMTA to log TCP/IP connections, as well as message traffic, to the mail.log files by default; alternatively, the SEPARATE_CONNECTION_LOG option may be used to specify that connection log entries instead be written to connection.log files. That is, if SEPARATE_CONNECTION_LOG and LOG_CONNECTION are set, the connection log entries will be written to the connection.log_current file in the IMTA log directory.
Setting LOG_HEADER=1 may be of interest if you wish to save certain message headers to the mail.log* files.
Additionally, setting LOG_PROCESS=1 and LOG_USERNAME=1 on an IMTA firewall system generally results in fairly monotonous extra information being logged: the process ID of the process enqueuing a message on an IMTA firewall system would normally be that of an IMTA SMTP server process (for SMTP messages), and the user name would normally just be the user name of the user who last restarted the IMTA Service Dispatcher. Enable these options if you wish to confirm that the process IDs and user names of processes enqueuing messages are as expected.
Note - It is up to the individual sites whether or not to implement a log cleaning policy. By default, mail.log is never removed; this can potentially fill up your disk space.
Command: imta-counters
The IMTA has channel statistics counters to collect and monitor channel counters based upon the SNMP Mail Monitoring MIB, RFC 1566. These counters tabulate on a per channel basis the twelve items described in TABLE 12-6.
It is important to note that these counters generally need to be looked at over time noting the minimum values seen. The minimums may actually be negative for some channels. Such a negative value merely means that there were messages queued for a channel at the time that its counters were zeroed (e.g., the cluster-wide database of counters created). When those messages were dequeued, the associated counters for the channel were decremented therefore leading to a negative minimum. For such a counter, the correct "absolute" value is the current value less the minimum value that counter has ever held since being initialized.
The IMTA channel counters are intended for indicating the trend and health of your e-mail system. The IMTA channel counters are not designed nor intended to provide an accurate accounting of message traffic. For precise accounting, see "Logging Messages Passing Through the IMTA" on page 254. The IMTA's channel counters are implemented using the lightest weight mechanisms available, namely a shared memory section on each system.
The example below shows a sample excerpt of counters data, as might be seen using the imta counters -show utility.
Channel Messages Recipients Blocks
------- -------- ---------- ------
directory
Received 6523 9042 69694 (1)
Stored 4 4 149 (2)
Delivered 6519 9038 69545 (3)
Submitted 6811 9019 71123 (4)In this example:
1. | The "Received" value represents a count of messages coming from any channel to the channel named directory. That is, messages enqueued to the directory channel by any other channel. | |
2. | The "Stored" value represents a count of messages stored in the channel queue to be delivered. This will generally correspond to the number of entries currently stored for the channel in the IMTA queue cache database. | |
3. | The "Delivered" value represents a count of messages which have been processed (dequeued) by the channel directory; a dequeue operation may either correspond to a successful "delivery" (that is, an enqueue to another channel), or to a dequeue due to the message being returned to the sender. This will generally correspond to the number Received minus the number Stored. | |
4. | The "Submitted" value represents a count of messages which have been enqueued from the channel directory to any other channel. |
Note that in this example, the number of messages Submitted is greater than the number delivered. This is often the case, since each message the channel dequeues (delivers) will result in at least one new message enqueued (submitted) but possibly more than one. For example, if a message has two recipients reached via different channels, then two enqueues will be required. Or if a message bounces, a copy will go back to the sender and another copy may be sent to the postmaster. Usually that will be two submissions (unless both are reached through the same channel).
SIMS provides a set of command line monitoring tools (see TABLE 12-7) that provide detailed information about the status and performance of a SIMS server. For additional information refer to the man pages.
The key attributes of a SIMS mail server that determine server performance are:
Message Transport Queue Size. Messages passing through a SIMS server are stored in the IMTA message queues. The performance of the IMTA depends on the size of the queue. An extremely large Queue size can slow down the IMTA. |
Accessibility and responsiveness of core SIMS services (SMTP, IMAP, POP, LDAP). A large number of connections (POP, IMAP, SMTP. LDAP) may indicate that connections are being abnormally terminated, either because of a failure of a SIMS service or a deliberate denial of service attack. Either of these can lead to reduced performance of the SIMS server. |
System Resource Utilization. Normal SIMS operation contributes to system load and resource consumption, such as cpu, disk and virtual memory resources. To keep SIMS performing at an optimal level it is important to monitor system resources. |
To ensure high availability of the e-mail service, the key attributes of the system should be continually monitored. This will allow the system administrator to be alerted automatically whenever a potential problem develops. We recommend monitoring at least the attributes listed in TABLE 12-8.
Each monitoring utility allows thresholds to be set which define the upper bounds on the resource utilization, for example, the maximum amount of disk space used by the message store directory. When the threshold is exceeded, the utility can generate an alert or initiate the notification of a system operator via a pager (must be e-mail accessible). Establishing a reasonable set of thresholds depends on the performance characteristics of the system on which the component is running (e.g., low-end or high-end server) and the usage profile of your e-mail service. TABLE 12-8 shows some example default thresholds settings that we recommend monitoring.
1IMTA queue size - This threshold will depend on the function of the IMTA. If the IMTA is responsible for routing outbound messages into the Internet, 500 is too low. 1000-2000 messages in the mail queue due to DNS problems or remote IMTA problems that are beyond the control of the local ISP are not uncommon. This number will also likely vary based on size of the ISP. If you have more customers, the chances that a mail to an address that is having problems will increase.
For internal SMTP routers, or IMTA's whose only function it is to deliver to the local message store, 500 would be too high. Except for a broadcast delivery, if the queue gets to 500 on this type of IMTA there is a problem that needs to be diagnosed.
2IMTA HELD messages - Although any number of messages may be held in the .HELD file, you should investigate even a single message caught in this file.
3Number of IMAP, POP connections - The IMAP and POP are going to be different since IMAP is a persistent connection and higher than POP.
An alternative approach to these thresholds would be to run the monitoring commands at a time of day when the service is heavily loaded, and over a period of an hour, measure the values of these resources every 5 minutes. You can then set thresholds to the highest value plus 10%.
It is likely that you will need to tune these thresholds. You can tell if they are set too low, because the thresholds will be frequently exceeded even though the service is performing satisfactorily. On the other hand, of the service fails and you were not notified in advance, the threshold may be set too high. For example if you set the disk space threshold for the message store to 95% you might not be notified in time before the disk space is exhausted.
The monitoring utilities take snapshots of the status of the SIMS components. To provide ongoing coverage we recommend that the utilities be run periodically using a facility such as cron(1M).
The monitoring utilities use a threshold based alerting system. The administrator needs to set some threshold values and take some corrective action when they are exceeded.
For example, if the sizes of any of the IMTA queues are unusually large or there are a large number of messages in the .HELD queue, this can indicate message delivery or addressing problems. The monitoring utilities can be configured to automatically detect these scenarios and send alerts.
In this case the various SIMS components (IMTA, message access and store, directory server, admin server) are installed on the same system. So all the monitoring utilities are scheduled to run on a single system. Also let us assume that inetmail (the SIMS Postmaster) is the user to be notified when any threshold is exceeded. Note that in this example the administrator has set up a special account named 'test_user' with a password of `passwd' for the purposes of monitoring.
Scenario 1 Commands:
The alert mechanism sends e-mail alerts to inetmail and to all SIMS administrators, so if the SIMS server is having mail delivery problems the alerts may not be received. Use the -A option to specify an alternate SMTP server for the alert delivery channel. For optimal monitoring, schedule the utilities to run at an interval of 30 minutes (see the crontab(1) man page). Assuming that /opt/SUNWmail/sbin is in the PATH, the corresponding crontab entries for monitoring utilities using the thresholds listed above are as follows:
To alert inetmail when the IMTA queue size exceeds 500 messages or when the number of HELD messages exceeds 100: |
30 * * * * immonitor queue -s 500 -H 100 -r inetmail > /dev/null 2>&1 |
To alert inetmail when utilization of /var/opt/SUNWmail/ims exceeds 2 gigabytes or when the utilization of "/" exceeds 200 Megabytes OR when the number of established SMTP connections exceeds 2000. The back slash (\) before the "%" is required since "%" is a keyword for crontab. |
30 * * * * immonitor system -f /var/opt/SUNWmail/ims=2g -f /=200m -m 90\% -p smtp=2000 -r inetmail > /dev/null 2>&1 |
Alerts inetmail when the LDAP response time exceeds 5 minutes OR when mail sending time exceeds 2 minutes or when the mail retrieval time with POP/IMAP exceeds 5 minutes OR when the total round-trip time exceeds 5 minutes, use: |
30 * * * * immonitor access -u test_user -w passwd -L localhost:=300 -S localhost:=120 -I localhost:=300 -P localhost:=300 -D 300 -r inetmail > /dev/null 2>&1 |
In this scenario a pure proxy server (proxy-svr) acts as a proxy for IMAP and POP requests but not SMTP. The back-end server (msma-svr) contains the message store and supports SMTP in addition to IMAP and POP protocols. Finally a dedicated server (dir-svr) provides directory services. For additional information on proxy servers, see Appendix A, "Configuring SIMS as a Proxy Message Access Server.
Since each server isn't running the full set of SIMS services, we need to tailor the monitoring configuration on each server. In this case the services we need to monitor in the various systems are:
1. | proxy-svr: IMAP/POP access and response time, resource utilization (cpu, disk). | |
2. | msma-svr: IMAP/POP/SMTP access and response time, message round-trip delivery time, message store capacity and cpu utilization. | |
3. | dir-svr: LDAP access and response time, directory capacity and CPU utilization. |
We will assume the threshold values listed in the table below. As in the previous example, the alert mechanism sends e-mail alerts to inetmail and to all SIMS administrators, so if the SIMS server is having mail delivery problems the alerts may not be received. Use the -A option to specify an alternate SMTP server for the alert delivery channel. For optimal monitoring, schedule the utilities to run at an interval of 30 minutes (see the crontab(1)).
Thresholds and crontab entries for proxy-svr
The access times for IMAP and POP (items 1 and 2) can be checked with a single command: |
30 * * * * immonitor access -u test_user -w passwd -I proxy-svr:=120 -P proxy-svr:=150 -r inetmail > /dev/null 2>&1 |
The number of established IMAP and POP connections (items 3 and 4) can be checked with the command: |
40 * * * * immonitor system -p IMAP=500 -p POP3=250 -r inetmail > /dev/null 2>&1 |
Virtual memory usage (5) and the utilization of the /var/opt partition (6) can be measured with the following entry: |
0 * * * * immonitor system -f /var/opt=2g -m 20g -m 90\ -r inetmail > /dev/null 2>&1 |
Thresholds and crontab entries for msma-svr
SMTP / IMAP: |
20 * * * * immonitor access -u test_user -w passwd -S msma-svr:=60 -I msma-svr:=90 -k "Testing" -r inetmail > /dev/null 2>&1 |
SMTP / POP: |
40 * * * * immonitor access -u test_user -w passwd -S msma-svr:=60 -P msma-svr:=120 -k "Testing" -r inetmail > /dev/null 2>&1 |
If the IMAP or POP threshold is exceeded, messages sent using the -S option will not be deleted, so it is necessary to delete these messages periodically using another crontab entry: |
0 * * * * immonitor access -u test_user -w passwd -I msma-svr:=600 -z -k "Testing" > /dev/null 2>&1 |
Established connections for POP, IMAP and SMTP can be checked with one command: |
10 * * * * immonitor system -p POP3=200 -p IMAP=500 -p SMTP=100 -r inetmail > /dev/null 2>&1 |
Virtual memory usage and the size of the message store (/var/opt/SUNWmail/ims) can be measured with the following crontab entry: |
10 * * * * immonitor system -f /var/opt/SUNWmail/ims=2g -m 90\ -r inetmail > /dev/null 2>&1 |
Thresholds and crontab entries for dir-svr
Monitored Value
Threshold
1) LDAP bind, search time
10 seconds
2) Directory size
200 megabytes
3) Solaris: virtual memory usage
90%
LDAP: |
20 * * * * immonitor access -u test_user -L dir-svr:=10 -r inetmail > /dev/null 2>&1 |
Virtual memory usage (3) and the size of the directory (2) (/var/opt/SUNWconn/ldap) can be measured with the following crontab entry: |
10 * * * * immonitor system -f /var/opt/SUNWconn/ldap=200m -m 90\ -r inetmail > /dev/null 2>&1 |
The Sun Directory Service is SNMP enabled. It supports the X.500 Directory Monitoring MIB (rfc1567). It can therefore be monitored by any SNMP enabled management tool. Refer to Netscape Directory Service documentation for information on Netscape SNMP support.
To enable SNMP monitoring of the Sun Directory Service you should first enable the SNMP agent dsnmpserv(1m). The following describes how to enable this service:
1. | Configure the SNMP configuration file /etc/opt/SUNWconn/ldap/current/dsnmpserv.conf by running the following command: |
2. | Start the agent: |