37 Monitoring the Message Store

This information describes message store monitoring tasks. See "Managing the Message Store and Mailboxes" for conceptual information.

General Message Store Monitoring Procedures

This section outlines standard monitoring procedures for the message store. These procedures are helpful for general message store checks, testing, and standard maintenance.

Checking Hardware Space

A message store should have enough spare disk space and hardware resources. When the message store approaches the limits of its disk space or hardware capacity, problems might occur within the message store.

Inadequate disk space is one of the most common causes of mail server problems and failures. Without space to write to the message store, the mail server will fail. In addition, when the available disk space goes below a certain threshold, problems with message delivery, logging, and other functions can occur. Disk space can be depleted rapidly when the cleanup function of the stored process fails and deleted messages are not expunged from the message store.

For information on monitoring disk space, see "Monitoring Disk Space."
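As a rough illustration of the disk space check described above, the following shell sketch reports how full the file system under a given directory is. The function name check_disk_usage and the 90 percent default threshold are assumptions made for this example; they are not Messaging Server settings.

```shell
#!/bin/sh
# Report how full the file system holding a directory is and fail when
# usage exceeds a threshold. The 90% default is an illustrative
# assumption, not a documented Messaging Server limit.
check_disk_usage() (
    dir=$1
    max_pct=${2:-90}
    # df -P prints POSIX-format output; on the data line, field 5 is the
    # capacity in use (for example "42%").
    used_pct=$(df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    [ -n "$used_pct" ] || { echo "cannot read usage for $dir" >&2; exit 2; }
    echo "$dir: ${used_pct}% used"
    [ "$used_pct" -le "$max_pct" ]
)

# Example (the path is a placeholder for your message store partition):
#   check_disk_usage /path/to/store 85 || echo "message store partition is filling up"
```

The function returns 0 when usage is at or below the threshold, 1 when it is above, and 2 when the usage could not be read, so it can be called from a scheduled job.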

Checking Log Files

Check the log files to make sure the message store processes are running as configured. Oracle Communications Messaging Server creates a separate set of log files for each of the major protocols, or services, it supports: SMTP, IMAP, POP, and HTTP. You can look at the log files in the MessagingServer_home/log/ directory. You should monitor the log files on a routine basis.

Be aware that logging can impact server performance. The more verbose the logging you specify, the more disk space your log files will occupy for a given amount of time. You should define effective but realistic log rotation, expiration, and backup policies for your server. For information about defining logging policies for your server, see "Using Message Store Log Messages."

Checking User IMAP/POP/Webmail Session by Using Telemetry

Messaging Server provides a feature called telemetry that can capture a user's entire IMAP, POP or HTTP session into a file. This feature is useful for debugging client problems. For example, if users complain that their message access client is not working as expected, this feature can be used to trace the interaction between the access client and Messaging Server.

To capture a session, create a directory of the following form, where pop_or_imap_or_http is the service to capture:

MessagingServer_home/data/telemetry/pop_or_imap_or_http/userid

To capture a POP session, create the following directory:

MessagingServer_home/data/telemetry/pop/userid

To capture an IMAP session, create the following directory:

MessagingServer_home/data/telemetry/imap/userid

To capture a Webmail session, create the following directory:

MessagingServer_home/data/telemetry/http/userid

Note: userid is "uid" for default domain and "uid@domain" for hosted domains.

Note that the directory must be owned or writable by the messaging server userid.
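The directory layout above can be scripted. The following is a minimal sketch: the function name enable_telemetry is hypothetical, and the installation root is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Create the telemetry directory for one user and one service.
# Arguments: installation root, service (pop, imap, or http), userid.
enable_telemetry() (
    home=$1 service=$2 userid=$3
    case $service in
        pop|imap|http) ;;
        *) echo "service must be pop, imap, or http" >&2; exit 1 ;;
    esac
    dir="$home/data/telemetry/$service/$userid"
    mkdir -p "$dir" || exit 1
    # The directory must be owned or writable by the Messaging Server
    # user; adjust ownership or mode here if you run this as another user.
    echo "$dir"
)

# Example (the installation root is a placeholder):
#   enable_telemetry /path/to/MessagingServer_home imap uid@domain
```

The function prints the directory it created so that the caller can record it and remove it later.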

Messaging Server will create one file per session in that directory. Example output is shown below.

LOGIN redb 2003/11/26 13:03:21
>0.017>1 OK User logged in
<0.047<2 XSERVERINFO MANAGEACCOUNTURL MANAGELISTSURL MANAGEFILTERSURL
>0.003>* XSERVERINFO MANAGEACCOUNTURL {67}
http://redb@cuisine.blue.planet.com:800/bin/user/admin/bin/enduser 
MANAGELISTSURL NIL MANAGEFILTERSURL NIL
2 OK Completed
<0.046<3 select "INBOX"
>0.236>* FLAGS (\Answered flagged draft deleted \Seen $MDNSent Junk)
* OK [PERMANENTFLAGS (\Answered flag draft deleted \Seen $MDNSent Junk \*)]
* 1538 EXISTS
* 0 RECENT
* OK [UNSEEN 23]
* OK [UIDVALIDITY 1046219200]
* OK [UIDNEXT 1968]
3 OK [READ-WRITE] Completed
<0.045<4 UID fetch 1:* (FLAGS)
>0.117>* 1 FETCH (FLAGS (\Seen) UID 330)
* 2 FETCH (FLAGS (\Seen) UID 331)
* 3 FETCH (FLAGS (\Seen) UID 332)
* 4 FETCH (FLAGS (\Seen) UID 333)
* 5 FETCH (FLAGS (\Seen) UID 334)
<etc>

You can gather command telemetry that does not include end-user information by using the imap.logcommands msconfig option (or in legacy configuration local.imap.logcommands). See the Messaging Server Reference for additional information.

To disable the telemetry logging, move or remove the directory that you created.

Checking stored Processes

The stored function performs a variety of important tasks such as deadlock and transaction operations of the message database, enforcing aging policies, and expunging and erasing messages stored on disk. If stored stops running, Messaging Server will eventually run into problems. If stored does not start when start-msg is run, no other processes will start.

  • Check that the stored process is running. Run "imcheck."

  • Check for log file buildup in store_root/mboxlist.

  • Check for stored messages in the default log file MessagingServer_home/log/default/default.

  • Check that the time stamps of the following files (in directory MessagingServer_home/config/) are updated whenever one of the following functions is attempted by the stored process:

Table 37-1 stored Operations

stored Operation Function

stored.ckp

Touched when a database checkpoint was initiated. Stamped approximately every 1 minute.

stored.lcu

Touched at every database log cleanup. Time stamped approximately every 5 minutes.

stored.per

Touched at every spawn of peruser db write out. Time stamped once an hour.


For more information on the stored process, see "stored." For additional information on monitoring the stored function, see "Monitoring stored."
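The time stamp check on the stored.ckp, stored.lcu, and stored.per files can be automated. The sketch below is one possible approach: the function names are hypothetical, and the allowed ages (2, 10, and 120 minutes) are simply the documented intervals doubled, a margin chosen for this example.

```shell
#!/bin/sh
# Succeed if FILE was modified less than MAX_MINUTES ago.
# Relies on find -mmin, which GNU and BSD find both provide.
stamp_is_fresh() (
    file=$1 max_minutes=$2
    [ -n "$(find "$file" -prune -mmin "-$max_minutes" 2>/dev/null)" ]
)

# Check the three stored stamp files in a configuration directory.
# Each entry is NAME:MAX_AGE_IN_MINUTES; the ages are assumptions.
check_stored_stamps() (
    config_dir=$1 status=0
    for spec in stored.ckp:2 stored.lcu:10 stored.per:120; do
        name=${spec%%:*} limit=${spec##*:}
        if stamp_is_fresh "$config_dir/$name" "$limit"; then
            echo "OK    $name"
        else
            echo "STALE $name (not updated in the last $limit minutes)"
            status=1
        fi
    done
    exit $status
)

# Example (the path is a placeholder):
#   check_stored_stamps /path/to/MessagingServer_home/config
```

A missing file is reported as stale, which is usually the safer reading when stored has never written the stamp.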

Checking Database Log Files

Database log files are the Sleepycat (Berkeley DB) transaction checkpointing log files in the store_root/mboxlist directory. If log files accumulate, database checkpointing is not occurring. In general, there are two or three database log files at any given time. More files than that could be a sign of a problem.
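A simple count of the log files in the mboxlist directory is enough to spot accumulation. The sketch below assumes the files match the pattern log.*, which is the usual Berkeley DB naming but is not stated in this chapter; the function name and the default limit of 3 are also assumptions.

```shell
#!/bin/sh
# Count database transaction log files directly inside a directory and
# fail when there are more than a limit. The "log.*" pattern is an
# assumption; adjust it to match the files in your store.
count_db_logs() (
    dir=$1 max=${2:-3}
    count=0
    for f in "$dir"/log.*; do
        # The test also skips the unexpanded pattern when nothing matches.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count log file(s) in $dir"
    [ "$count" -le "$max" ]
)

# Example (the path is a placeholder):
#   count_db_logs /path/to/store_root/mboxlist || echo "checkpointing may have stopped"
```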

Checking User Folders

To check user folders, run the command reconstruct -r -n (recursive, no fix), which reviews user folders and reports errors. For more information on the reconstruct command, see "Repairing Mailboxes and the Mailboxes Database (reconstruct Command)."

Checking for Core Files

Core files only exist when processes have unexpectedly terminated. It is important to review these files, particularly when you see a problem in the message store. On Oracle Solaris, use coreadm to configure core file location.

Monitoring imapd, popd and httpd

These processes provide access to IMAP, POP and Webmail services. If any of these is not running or not responding, the service will not function appropriately. If the service is running but is overloaded, monitoring allows you to detect this and configure the service more appropriately.

Symptoms of imapd, popd and httpd Problems

Connections are refused or system is too slow to connect. For example, if IMAP is not running and you try to connect to IMAP directly you will see something like this:

telnet 0 143
Trying 0.0.0.0...
telnet: Unable to connect to remote host: Connection refused

If you try to connect with a client, you will get a message such as:

"Client is unable to connect to the server at the location you have specified. The server may be down or busy."

To Monitor imapd, popd and httpd

  • These processes can be monitored with SNMP. If you have SNMP set up, this is a very good way to monitor them. See "SNMP Support." The server information is in the Network Services Monitoring MIB.

  • Check log files. Look in the directory MessagingServer_home/log/service, where service is http, imap, or pop. In that directory you will find a number of log files. One file is named for the service (imap, pop, http); the names of the others consist of the service name followed by a sequence number and a date. For example:

    imap imap.29.1010221593 imap.31.1010394412 imap.33.1010567224

The file with just the service name is the latest log. The other ones are ordered by the sequence number (here 29, 31, 33) and the one with the highest sequence number is the next newest one. (See "Using Message Store Log Messages.")

If a server was shut down you might see something like this:

imap.12.1065431243:[07/Oct/2003:01:15:43 -0700] gotmail-2 imapd[20525]: General Warning: Sun Java System Messaging Server IMAP4 6.1 (built Sep 24 2003) shutting down

  • Run the platform-specific command to verify that the imapd, popd and httpd processes are running. For example, in Oracle Solaris you can use the ps command and look for imapd, popd and mshttpd.

  • You can set alarms for specified server performance thresholds by setting the server response configuration options described in "Alarm Messages."
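The log file naming described in the list above (the current log named after the service, rotated logs named service.sequence.date) lends itself to a small helper that lists the logs newest first. This is a sketch only; the function name is hypothetical and the log directory is passed in as an argument.

```shell
#!/bin/sh
# List a service's log files newest first: the current log (named after
# the service), then rotated logs by descending sequence number.
list_logs_newest_first() (
    dir=$1 service=$2
    cd "$dir" || exit 1
    if [ -f "$service" ]; then
        echo "$service"
    fi
    # Rotated names look like service.SEQUENCE.DATE; sort numerically,
    # in reverse, on the sequence field.
    ls | grep "^$service\.[0-9][0-9]*\.[0-9][0-9]*\$" | sort -t . -k 2,2nr
)

# Example (the path is a placeholder):
#   list_logs_newest_first /path/to/MessagingServer_home/log/imap imap
```

Sorting on the sequence number rather than on the file modification time keeps the order stable even if an old log file has been touched or copied.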

Monitoring stored

"stored" performs a variety of important tasks such as deadlock and transaction operations of the message database, enforcing aging policies, and expunging and erasing messages stored on disk. If stored stops running, the messaging server will eventually run into problems. If stored does not start when start-msg is run, no other processes will start. See "stored" for more information.

Symptoms of stored Problems

There are no outward symptoms.

To Monitor stored

  • Check that the stored process is running. stored creates and updates a pid file in MessagingServer_home/data/proc called store. The pid file shows an init state when recovering and a ready state when ready. For example:

    % cat store
    28250
    ready

    The number on the first line is the process ID of stored.

    % ps -eaf | grep stored
    inetuser 28250 1 0 Jan 05 ? 8:44 /opt/SUNWmsgsr/lib/stored -d
    
  • Check for log file buildup in MessagingServer_home/store/mboxlist. Note that not every log file buildup is caused directly by stored problems. Log files may also build up if imapd dies or there is a database problem.

  • Check the timestamp on the following files in MessagingServer_home/config:

    stored.ckp - Touched when attempt at checkpointing is made. Should get time stamped every 1 minute.

    stored.lcu - Touched at every db log cleanup. Should get time stamped every 5 minutes.

    stored.per - Touched at every spawn of peruser db writeout. Should get time stamped every 60 minutes.

  • Check for stored messages in the default log file MessagingServer_home/log/default/default

  • stored can be monitored with watcher and msprobe. See "Automatic Restart of Failed or Unresponsive Services" and "Monitoring Using msprobe and watcher Functions."
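The pid file check from the first item in the list above can be sketched as follows. The function name is hypothetical; the file format (process ID on the first line, state on the second) is taken from the example output.

```shell
#!/bin/sh
# Read a stored-style pid file (line 1: process ID, line 2: state) and
# succeed only if the state is "ready" and the process exists.
check_store_pidfile() (
    pidfile=$1
    [ -r "$pidfile" ] || { echo "missing pid file: $pidfile"; exit 1; }
    { read -r pid; read -r state; } < "$pidfile"
    case $pid in
        ''|*[!0-9]*) echo "unexpected pid: $pid"; exit 1 ;;
    esac
    if [ "$state" != ready ]; then
        echo "stored (pid $pid) state: ${state:-unknown}"
        exit 1
    fi
    # kill -0 sends no signal; it only tests that the process exists and
    # that the caller may signal it, so run this as the server user or root.
    kill -0 "$pid" 2>/dev/null || { echo "no process with pid $pid"; exit 1; }
    echo "stored (pid $pid) is ready"
)

# Example (the path is a placeholder):
#   check_store_pidfile /path/to/MessagingServer_home/data/proc/store
```

Checking the process as well as the state guards against a stale pid file left behind after stored has died.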

Monitoring the State of Message Store Database Locks

Database locks are held by different server processes, and they can affect the performance of the message store. When deadlocks occur, messages are not inserted into the store at a reasonable speed and the ims-ms channel queue grows larger as a result. There are also legitimate reasons for a queue to back up, so it is useful to have a history of the queue length in order to diagnose problems.

Symptoms of Message Store Database Lock Problems

The number of transactions accumulates and does not resolve.

To Monitor Message Store Database Locks

Use the command imcheck -s (formerly counterutil -o db_lock).

To Monitor Mailbox Quotas and Usage

You can monitor mailbox quota usage and limits by using the "imquotacheck" utility. The imquotacheck utility generates a report that lists defined quotas and limits, and provides information on quota usage.

For example, the following command lists all user quota information:

% imquotacheck 
-------------------------------------------------------------------------
Domain red.example.com (diskquota = not set msgquota = not set) quota usage
-------------------------------------------------------------------------
diskquota         size(K)    %use    msgquota      msgs    %use    user
# of domains = 1
# of users = 705
no quota          50418             no quota      4392             ajonk
no quota              5             no quota      2                andrt
no quota         355518             no quota      2500             ansri
 ...

The following example shows the quota usage for user sorook:

% imquotacheck -u sorook
-------------------------------------------------------------------------
quota usage for user sorook
-------------------------------------------------------------------------
diskquota      size(K)    %use    msgquota      msgs     %use    user
no quota       1487               no quota      305              sorook

To list the usage of all users whose quota exceeds the lowest threshold in the rule file:

imquotacheck

To list quota information for the domain example.com:

imquotacheck -d example.com

To send a notification to all users in accordance with the default rule file:

imquotacheck -n

To send a notification to all users in accordance with a specified rule file, myrulefile, and a specified mail template file, mytemplate.file (for more information, refer to "imquotacheck"):

imquotacheck -n -r myrulefile -t mytemplate.file

To list per-folder usage for the user user1 (the rule file is ignored):

imquotacheck -u user1 -e
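The imquotacheck report can be post-processed with standard tools. The sketch below prints the largest mailboxes from a report read on standard input. It recognizes only rows of the form shown in the examples in this section ("no quota", size, "no quota", message count, user); rows for users with a quota set may be laid out differently and are skipped. The function name is hypothetical.

```shell
#!/bin/sh
# Print "size(K) user" for the N largest mailboxes (default 10) in an
# imquotacheck report read from standard input. Only rows shaped like
# "no quota SIZE no quota MSGS USER" are recognized; this row layout is
# inferred from the sample output, not from a format specification.
top_mailboxes() {
    awk '$1 == "no" && $2 == "quota" && NF == 7 { print $3, $7 }' |
        sort -k 1,1nr | head -n "${1:-10}"
}

# Example:
#   imquotacheck | top_mailboxes 5
```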

To Monitor Message Store Database Statistics with imcheck

Use imcheck -s to monitor database statistics including logs and transactions. See "imcheck."

Gathering Message Store Counter Statistics by Using counterutil

To Get a Current List of Available Counter Objects

This utility provides statistics acquired from different system counters. (See "counterutil.")

Here is how to get a current list of available counter objects:

# counterutil -l
Listing registry (/opt/sun/comms/messaging64/data/counter/counter)
numobjects = 7
refcount = 20
created = 17/Mar/2015:14:10:03 +0000
modified = 24/Aug/2015:13:00:24 +0000
counterobjects:
  imapstat
  popstat
  alarm
  serverresponse
  diskusage
  httpstat
  mmpstat

Each entry represents a counter object and supplies a variety of useful counts for this object. In this section we will only be discussing the alarm, diskusage, serverresponse, popstat, imapstat, and httpstat counter objects. For details on counterutil command usage, refer to "counterutil."

counterutil Output

"counterutil" has a variety of flags. A command format for this utility may be as follows:

counterutil -o CounterObject -i 5 -n 10

where,

-o CounterObject specifies the counter object: alarm, diskusage, serverresponse, popstat, imapstat, or httpstat.

-i 5 specifies a 5 second interval.

-n 10 represents the number of iterations (default: infinity).

An example of counterutil usage is as follows:

# counterutil -o imapstat -i 5 -n 10 
Monitor counteroobject (imapstat) 
registry /gotmail/iplanet/server5/msg-gotmail/counter/counter opened 
counterobject imapstat opened 
count = 1 at 972082466 rh = 0xc0990 oh = 0xc0968 
global.currentStartTime [4 bytes]: 17/Oct/2000:12:44:23 -0700 
global.lastConnectionTime [4 bytes]: 20/Oct/2000:15:53:37 -0700 
global.maxConnections [4 bytes]: 69 
global.numConnections [4 bytes]: 12480 
global.numCurrentConnections [4 bytes]: 48 
global.numFailedConnections [4 bytes]: 0 
global.numFailedLogins [4 bytes]: 15 
global.numGoodLogins [4 bytes]: 10446 
...
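When counterutil output is collected by a script, a single counter can be picked out with awk. The following sketch assumes lines of the form shown in the example above (name, a size in brackets, a colon, then the value); the function name counter_value is hypothetical.

```shell
#!/bin/sh
# Print the value of one named counter from counterutil output read on
# standard input. Lines are expected to look like:
#   global.numCurrentConnections [4 bytes]: 48
counter_value() {
    awk -v key="$1" '
        $1 == key {
            sub(/^.*bytes\]: */, "")   # drop the name and size prefix
            sub(/ +$/, "")             # drop trailing spaces
            print
            exit
        }'
}

# Example:
#   counterutil -o imapstat -n 1 | counter_value global.numCurrentConnections
```

Only the first matching line is printed, so the helper is best used with a single iteration (-n 1) or on one sample of a longer capture.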

Gathering Alarm Statistics by Using counterutil

These alarm statistics refer to the alarms sent by stored. The alarm counter provides the following statistics:

Table 37-2 counterutil alarm Statistics

Suffix Description

alarm.countoverthreshold

Number of times crossing threshold.

alarm.countwarningsent

Number of warnings sent.

alarm.current

Current monitored value.

alarm.high

Highest ever recorded value.

alarm.low

Lowest ever recorded value.

alarm.timelastset

The last time current value was set.

alarm.timelastwarning

The last time warning was sent.

alarm.timereset

The last time reset was performed.

alarm.timestatechanged

The last time alarm state changed.

alarm.warningstate

Warning state (yes(1) or no(0)).


IMAP, POP, HTTP, and MMP Connection Statistics by Using counterutil

To get information on the number of current IMAP, POP, HTTP, and MMP connections, the number of failed logins, total connections since the start time, and so forth, use the command counterutil -o CounterObject -i 5 -n 10, where CounterObject is the counter object popstat, imapstat, httpstat, or mmpstat. For mmpstat, the counter names are modified to differentiate the IMAP and POP services, because the MMP proxies both. The meaning of the imapstat suffixes is shown in Table 37-3. The popstat and httpstat objects provide the same information in the same format and structure.

Table 37-3 counterutil imapstat Statistics

Suffix Description

currentStartTime

Start time of the current IMAP server process.

lastConnectionTime

The last time a new client was accepted.

maxConnections

Highest recorded number of concurrent TCP connections handled by IMAP server since the last counter reset.

numConnections

Total number of TCP connections successfully accepted by the current IMAP server. numConnections can include failed connections, but not always.

numCurrentConnections

Current number of active TCP connections.

numFailedConnections

Total number of failed TCP connections served by the current IMAP server. This number accumulates until the server is restarted or the counter is reset by "counterutil." numFailedConnections counts abnormally terminated connections, including unsuccessful accepts and connections that were accepted successfully but later encountered an error. An error message is logged when a connection fails with an expected error. You can check your IMAP log files for error messages such as the following:

Unable to accept client connection: <error message>
Socket error : <error message>

numFailedLogins

Number of failed system logins served by the current IMAP server.

numGoodLogins

Number of successful system logins served by the current IMAP server.
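The numFailedLogins and numGoodLogins counters can be combined into a failure percentage, which is often easier to trend than the raw counts. This derived figure is an illustration and is not something counterutil reports itself; the function name is hypothetical, and the input is assumed to be counterutil output in the format shown earlier in this chapter.

```shell
#!/bin/sh
# Compute failed logins as a percentage of all login attempts from
# counterutil output read on standard input.
failed_login_pct() {
    awk '
        $1 ~ /numFailedLogins$/ { failed = $NF }
        $1 ~ /numGoodLogins$/   { good = $NF }
        END {
            total = failed + good
            if (total == 0) { print "n/a"; exit }
            printf "%.2f\n", 100 * failed / total
        }'
}

# Example:
#   counterutil -o imapstat -n 1 | failed_login_pct
```

With the sample values shown earlier (15 failed and 10446 good logins), the result is 0.14.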


Disk Usage Statistics by Using counterutil

The command counterutil -o diskusage generates the following information:

Table 37-4 counterutil diskusage Statistics

Suffix Description

diskusage.availSpace

Total space available in the disk partition. The values are scaled to fit in the 4 byte counter. If you have a very large file system, the actual number will be divided by 1024 until it is small enough to fit in the 32-bit integer.

diskusage.lastStatTime

The last time statistic was taken.

diskusage.mailPartitionPath

Mail partition path.

diskusage.percentAvail

Disk partition space available percentage.

diskusage.totalSpace

Total space in the disk partition. The values are scaled to fit in the 4 byte counter. If you have a very large file system, the actual number will be divided by 1024 until it is small enough to fit in the 32-bit integer.
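The scaling rule described for availSpace and totalSpace (divide by 1024 until the value fits in the counter) can be worked through with a short sketch. The limit used here, 4294967295, assumes an unsigned 32-bit counter; the source says only "4 byte counter", so treat the exact limit as an assumption. The function name is hypothetical.

```shell
#!/bin/sh
# Divide a value by 1024 until it fits in a 32-bit counter. Prints the
# scaled value followed by the number of divisions applied.
scale_to_counter() (
    value=$1 divisions=0
    max=4294967295   # assumed unsigned 32-bit limit
    while [ "$value" -gt "$max" ]; do
        value=$((value / 1024))
        divisions=$((divisions + 1))
    done
    echo "$value $divisions"
)

# Example: 5 * 1024^4 needs two divisions before it fits.
#   scale_to_counter 5497558138880    # prints "5242880 2"
```

Because each division discards the remainder, two very large partitions of slightly different sizes can report the same scaled value.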


Server Response Statistics

The command counterutil -o serverresponse generates the following information. This information is useful for checking whether the servers are running and how quickly they are responding.

Table 37-5 counterutil serverresponse Statistics

Suffix Description

http.laststattime

Last time http server response was checked.

http.responsetime

Response time of the HTTP server.

imap.laststattime

Last time imap server response was checked.

imap.responsetime

Response time of the IMAP server.

pop.laststattime

Last time pop server response was checked.

pop.responsetime

Response time of the POP server.