Sun Java System Messaging Server 6.3 Administration Guide

A.5 SNMP Information from the Messaging Server

This section summarizes the Messaging Server information provided via SNMP. It covers the following tables: applTable, assocTable, mtaTable, mtaGroupTable, mtaGroupAssociationTable, and mtaGroupErrorTable.

For detailed information refer to the individual MIB tables in RFC 2788 and RFC 2789. Note that the RFC/MIB terminology refers to the messaging services (MTA, HTTP, etc.) as applications (appl), Messaging Server network connections as associations (assoc), and MTA channels as MTA groups (mtaGroups).

Note that on platforms where more than one instance of Messaging Server may be concurrently monitored, there may then be multiple sets of MTAs and servers in the applTable, and multiple MTAs in the other tables.

Note –

The cumulative values reported in the MIBs (e.g., total messages delivered, total IMAP connections, etc.) are reset to zero after a reboot.

Each site will have different thresholds and significant monitoring values. A good SNMP client will allow you to do trend analysis and then send alerts when sudden deviations from historical trends occur.
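As a rough illustration of the trend analysis described above, the following Python sketch turns successive readings of a cumulative counter into per-interval deltas and flags a sudden deviation from history. The function names, the three-sigma threshold, and the rule that any drop in a counter is treated as a reset to zero are assumptions made for this example; they are not part of Messaging Server or of any SNMP client.

```python
from statistics import mean, stdev


def counter_deltas(samples):
    """Convert successive readings of a cumulative counter into deltas.

    Assumption: a reading lower than the previous one means the counter
    was reset to zero (for example after a reboot), so the new reading
    itself is taken as the delta for that interval.
    """
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        deltas.append(curr if curr < prev else curr - prev)
    return deltas


def is_sudden_deviation(history, latest, n_sigma=3.0):
    """Return True when latest lies more than n_sigma standard
    deviations from the mean of history (hypothetical threshold)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > n_sigma * sigma
```

A monitoring script could feed each new delta to is_sudden_deviation together with the deltas already collected, tuning n_sigma to the site's own thresholds.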

A.5.1 applTable

The applTable provides server information. It is a one-dimensional table with one row for the MTA and an additional row for each of the following servers, if enabled: WebMail HTTP, IMAP, POP, SMTP, and SMTP Submit. This table provides version information, uptime, current operational status (up, down, congested), number of current connections, total accumulated connections, and other related data.

Below is an example of data from applTable (mib-2.27.1.1).

            
applTable:

    applName.1  = mailsrv-1  MTA on mailsrv-1.west.sesta.com      (1)
    applVersion.1 = 5.1
    applUptime.1 = 7322                         (2)
    applOperStatus.1 = up                       (3)
    applLastChange.1 = 7422                     (2)
    applInboundAssociations.1 = 5
    applOutboundAssociations.1 = 2
    applAccumulatedInboundAssociations.1 = 873
    applAccumulatedOutboundAssociations.1 = 234
    applLastInboundActivity.1 = 1054822          (2)
    applLastOutboundActivity.1 = 1054222         (2)
    applRejectedInboundAssociations.1 = 0        (4)
    applFailedOutboundAssociations.1 = 17
    applDescription.1 = Sun Java System Messaging Server 6.1
    applName.2 = mailsrv-1 HTTP WebMail svr. mailsrv-1.sesta.com (1)
    ...
    applName.3 = mailsrv-1 IMAP server on mailsrv-1.west.sesta.com
    ...
    applName.4 = mailsrv-1 POP server on mailsrv-1.west.sesta.com
    ...
    applName.5 = mailsrv-1 SMTP server on mailsrv-1.west.sesta.com
    ...
    applName.6 = mailsrv-1 SMTP Submit server on mailsrv-1.west.sesta.com
    ...

Notes:

The suffixes on the appl* variables (.1, .2, etc.) are the row numbers, applIndex. applIndex has the value 1 for the MTA, value 2 for the HTTP server, etc. Thus, in this example, the first row of the table provides data on the MTA, the second on the HTTP server, etc.

(1) The name after the equal sign is the name of the Messaging Server instance being monitored. In this example, the instance name is mailsrv-1.
(2) These are SNMP TimeStamp values and are the value of sysUpTime at the time of the event. sysUpTime, in turn, is the count of hundredths of seconds since the SNMP master agent was started.
(3) The operational status of the HTTP, IMAP, POP, SMTP, and SMTP Submit servers is determined by actually connecting to them via their configured TCP ports and performing a simple operation using the appropriate protocol (for example, a HEAD request and response for HTTP, a HELO command and response for SMTP, and so on). From this connection attempt, the status of each server is determined: up (1), down (2), or congested (4).

Note that these probes appear as normal inbound connections to the servers and contribute to the value of the applAccumulatedInboundAssociations MIB variable for each server.

For the MTA, the operational status is taken to be that of the Job Controller. If the MTA is shown to be up, then the Job Controller is up. If the MTA is shown to be down, then the Job Controller is down. This MTA operational status is independent of the status of the MTA’s Service Dispatcher. The operational status for the MTA only takes on the value of up or down. Although the Job Controller does have a concept of “congested,” it is not indicated in the MTA status.
(4) For the HTTP, IMAP, and POP servers, the applRejectedInboundAssociations MIB variable indicates the number of failed login attempts and not the number of rejected inbound connection attempts.

A.5.1.1 applTable Usage

Monitoring server status (applOperStatus) for each of the listed applications is key to monitoring each server.

If a long time has passed since the MTA’s last inbound activity, as indicated by applLastInboundActivity, then something may be broken and preventing connections. If applOperStatus=2 (down), then the monitored service is down. If applOperStatus=1 (up), then the problem may be elsewhere.
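The check described above can be sketched in Python as follows. The function name, the return strings, and the one-hour idle threshold are hypothetical; the only facts taken from this section are that applOperStatus is 2 for down and that applLastInboundActivity is a sysUpTime value in hundredths of seconds.

```python
TICKS_PER_SECOND = 100  # TimeStamp values count hundredths of a second
OPER_STATUS_DOWN = 2    # applOperStatus value for "down"


def check_mta_activity(oper_status, sys_uptime_ticks, last_inbound_ticks,
                       max_idle_seconds=3600):
    """Classify the MTA as 'down', 'idle', or 'ok'.

    max_idle_seconds is an assumed, site-specific threshold for what
    counts as "a long time" without inbound activity.
    """
    if oper_status == OPER_STATUS_DOWN:
        return "down"
    idle_seconds = (sys_uptime_ticks - last_inbound_ticks) / TICKS_PER_SECOND
    if idle_seconds > max_idle_seconds:
        # Service reports up, so the cause is probably elsewhere.
        return "idle"
    return "ok"
```

A result of "idle" corresponds to the case where applOperStatus=1 but no connections are arriving, which suggests looking outside the monitored service.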

A.5.2 assocTable

This table provides network connection information for the MTA. It is a two-dimensional table providing information about each active network connection. Connection information is not provided for other servers.

Below is an example of data from assocTable (mib-2.27.2.1).

assocTable:

    assocRemoteApplication.1.1  = 129.146.198.167        (1)
    assocApplicationProtocol.1.1 = applTCPProtoID.25     (2)
    assocApplicationType.1.1 = peerinitiator(3)          (3)
    assocDuration.1.1 = 400                              (4)
...

Notes:

In the .x.y suffix (1.1), x is the application index, applIndex, and indicates which application in the applTable is being reported on; in this case, the MTA. The y serves to enumerate each of the connections for the application being reported on.

(1) The source IP address of the remote SMTP client.
(2) This is an OID indicating the protocol being used over the network connection. applTCPProtoID indicates the TCP protocol. The .n suffix indicates the TCP port in use, and .25 indicates SMTP, which is the protocol spoken over TCP port 25.
(3) It is not possible to know if the remote SMTP client is a user agent (UA) or another MTA. As such, the subagent always reports peer-initiator; ua-initiator is never reported.
(4) This is an SNMP TimeInterval and has units of hundredths of seconds. In this example, the connection has been open for 4 seconds.
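To make the suffix and unit conventions in these notes concrete, here is a minimal Python sketch that decodes values written in the textual form used in the example above. The helper names are hypothetical, and the sketch assumes the variable names and values arrive as plain strings and integers exactly as printed here, which depends on the SNMP client in use.

```python
def parse_assoc_instance(name):
    """Split a name such as 'assocDuration.1.1' into
    (column, applIndex, assocIndex)."""
    column, appl_index, assoc_index = name.split(".")
    return column, int(appl_index), int(assoc_index)


def tcp_port_from_protocol(value):
    """Return the TCP port from a value such as 'applTCPProtoID.25',
    or None when the value does not have that form."""
    prefix = "applTCPProtoID."
    if value.startswith(prefix):
        return int(value[len(prefix):])
    return None


def duration_seconds(time_interval):
    """Convert an SNMP TimeInterval (hundredths of seconds) to seconds."""
    return time_interval / 100
```

For the example row, duration_seconds(400) gives 4.0, matching the 4 seconds stated in note (4).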

A.5.2.1 assocTable Usage

This table is used to diagnose active problems. For example, if you suddenly have 200,000 inbound connections, this table can let you know where they are coming from.
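One way to answer the "where are they coming from" question is to tally the assocRemoteApplication values by source address. The sketch below assumes the table has already been fetched into a dictionary keyed by (applIndex, assocIndex); that data layout and the function name are illustrative choices, not something Messaging Server provides.

```python
from collections import Counter


def top_remote_sources(assoc_remote, appl_index=1, limit=5):
    """Return the most frequent remote addresses for one application.

    assoc_remote maps (applIndex, assocIndex) to the
    assocRemoteApplication value (an IP address string).
    """
    counts = Counter(
        addr
        for (appl, _assoc), addr in assoc_remote.items()
        if appl == appl_index
    )
    return counts.most_common(limit)
```

The result is a list of (address, connection count) pairs, highest count first.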

A.5.3 mtaTable

This is a one-dimensional table with one row for each MTA in the applTable. Each row gives totals across all channels (referred to as groups) in that MTA for select variables from the mtaGroupTable.

Below is an example of data from mtaTable (mib-2.28.1.1).

mtaTable:

    mtaReceivedMessages.1 = 172778        
    mtaStoredMessages.1 = 19
    mtaTransmittedMessages.1 = 172815
    mtaReceivedVolume.1 = 3817744
    mtaStoredVolume.1 = 34
    mtaTransmittedVolume.1 = 3791155
    mtaReceivedRecipients.1 = 190055
    mtaStoredRecipients.1 = 21
    mtaTransmittedRecipients.1 = 3791134
    mtaSuccessfulConvertedMessages.1 = 0 (1)
    mtaFailedConvertedMessages.1 = 0
    mtaLoopsDetected.1 = 0               (2)

Notes:

The .x suffix (.1) provides the row number for this application in the applTable. In this example, .1 indicates this data is for the first application in the applTable. Thus, this is data on the MTA.

(1) Only takes on non-zero values for the conversion channel.
(2) Counts the number of .HELD message files currently stored in the MTA’s message queues.

A.5.3.1 mtaTable Usage

If mtaLoopsDetected is not zero, then there is a looping mail problem. Locate and diagnose the .HELD files in the MTA queue to resolve the problem.

If the system does virus scanning with a conversion channel and rejects infected messages, then mtaFailedConvertedMessages will give a count of infected messages in addition to other conversion failures.

A.5.4 mtaGroupTable

This two-dimensional table provides channel information for each MTA in the applTable. This information includes such data as counts of stored (that is, queued) and delivered mail messages. Monitoring the count of stored messages, mtaGroupStoredMessages, for each channel is critical: when the value becomes abnormally large, mail is backing up in your queues.

Below is an example of data from mtaGroupTable (mib-2.28.2.1).

mtaGroupTable:

    mtaGroupName.1.1 = tcp_intranet                (1)
        ...
    mtaGroupName.1.2 = ims-ms
        ...
    mtaGroupName.1.3 = tcp_local
    mtaGroupDescription.1.3 = mailsrv-1 MTA tcp_local channel
    mtaGroupReceivedMessages.1.3 = 12154
    mtaGroupRejectedMessages.1.3 = 0
    mtaGroupStoredMessages.1.3 = 2
    mtaGroupTransmittedMessages.1.3 = 12148
    mtaGroupReceivedVolume.1.3 = 622135
    mtaGroupStoredVolume.1.3 = 7
    mtaGroupTransmittedVolume.1.3 = 619853
    mtaGroupReceivedRecipients.1.3 = 33087
    mtaGroupStoredRecipients.1.3 = 2
    mtaGroupTransmittedRecipients.1.3 = 32817
    mtaGroupOldestMessageStored.1.3 = 1103
    mtaGroupInboundAssociations.1.3 = 5
    mtaGroupOutboundAssociations.1.3 = 2
    mtaGroupAccumulatedInboundAssociations.1.3 = 150262
    mtaGroupAccumulatedOutboundAssociations.1.3 = 10970
    mtaGroupLastInboundActivity.1.3 = 1054822
    mtaGroupLastOutboundActivity.1.3 = 1054222
    mtaGroupRejectedInboundAssociations.1.3 = 0
    mtaGroupFailedOutboundAssociations.1.3 = 0
    mtaGroupInboundRejectionReason.1.3 =
    mtaGroupOutboundConnectFailureReason.1.3 =
    mtaGroupScheduledRetry.1.3 = 0
    mtaGroupMailProtocol.1.3 = applTCPProtoID.25
    mtaGroupSuccessfulConvertedMessages.1.3 = 0      (2)
    mtaGroupFailedConvertedMessages.1.3 = 0
    mtaGroupCreationTime.1.3 = 0
    mtaGroupHierarchy.1.3 = 0
    mtaGroupOldestMessageId.1.3 = <01IFBV8AT8HYB4T6UA@red.iplanet.com>
    mtaGroupLoopsDetected.1.3 = 0                    (3)
    mtaGroupLastOutboundAssociationAttempt.1.3 = 1054222

Notes:

In the .x.y suffix (for example, 1.1, 1.2, 1.3), x is the application index, applIndex, and indicates which application in the applTable is being reported on; in this case, the MTA. The y serves to enumerate each of the channels in the MTA. This enumeration index, mtaGroupIndex, is also used in the mtaGroupAssociationTable and mtaGroupErrorTable tables.

(1) The name of the channel being reported on. In this case, the tcp_intranet channel.
(2) Only takes on non-zero values for the conversion channel.
(3) Counts the number of .HELD message files currently stored in this channel’s message queue.

A.5.4.1 mtaGroupTable Usage

Trend analysis on *Rejected* and *Failed* might be useful in determining potential channel problems.

A sudden jump in the ratio of mtaGroupStoredVolume to mtaGroupStoredMessages could mean that a large junk mail message is bouncing around the queues.

A large jump in mtaGroupStoredMessages could indicate unsolicited bulk email is being sent or that delivery is failing for some reason.

If the value of mtaGroupOldestMessageStored is greater than the value used for the undeliverable message notification times (notices channel keyword), this may indicate a message that cannot be processed even by bounce processing. Because bounces are done nightly, use mtaGroupOldestMessageStored > (maximum age + 24 hours) as the test.

If mtaGroupLoopsDetected is greater than 0, a mail loop has been detected.
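Two of the checks above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical. The sketch also assumes that mtaGroupOldestMessageStored is reported as an SNMP TimeInterval in hundredths of seconds and that the maximum notices age is known in days; verify both against your MIB and channel configuration before relying on it.

```python
TICKS_PER_SECOND = 100   # assumed TimeInterval unit: hundredths of a second
SECONDS_PER_DAY = 86400


def oldest_message_overdue(oldest_ticks, notices_max_age_days):
    """Return True when the oldest queued message is older than the
    maximum notices age plus 24 hours (bounces are done nightly)."""
    limit_seconds = (notices_max_age_days + 1) * SECONDS_PER_DAY
    return oldest_ticks / TICKS_PER_SECOND > limit_seconds


def volume_per_message(stored_volume, stored_messages):
    """Ratio of mtaGroupStoredVolume to mtaGroupStoredMessages,
    or 0.0 when the queue is empty."""
    if stored_messages == 0:
        return 0.0
    return stored_volume / stored_messages
```

Tracking volume_per_message over time for each channel gives the ratio whose sudden jump is described above; with the example data for tcp_local (volume 7, 2 messages) it evaluates to 3.5.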

A.5.5 mtaGroupAssociationTable

This is a three-dimensional table whose entries are indices into the assocTable. For each MTA in the applTable, there is a two-dimensional sub-table with a row for each channel in that MTA. For each channel, there is an entry for each active network connection that the channel currently has open. The value of each entry is an index into the assocTable: combined with the applIndex of the MTA in question, it identifies the assocTable row that describes a network connection held by the channel.

In simple terms, the mtaGroupAssociationTable table correlates the network connections shown in the assocTable with the responsible channels in the mtaGroupTable.

Below is an example of data from mtaGroupAssociationTable (mib-2.28.3.1).

mtaGroupAssociationTable:

    mtaGroupAssociationIndex.1.3.1 = 1    (1)
    mtaGroupAssociationIndex.1.3.2 = 2
    mtaGroupAssociationIndex.1.3.3 = 3
    mtaGroupAssociationIndex.1.3.4 = 4
    mtaGroupAssociationIndex.1.3.5 = 5
    mtaGroupAssociationIndex.1.3.6 = 6
    mtaGroupAssociationIndex.1.3.7 = 7

Notes:

In the .x.y.z suffix, x is the application index, applIndex, and indicates which application in the applTable is being reported on; in this case, the MTA. The y indicates which channel of the mtaGroupTable is being reported on. In this example, 3 indicates the tcp_local channel. The z serves to enumerate the associations open to or from the channel.

(1) The value here is an index into the assocTable. Specifically, x and this value become, respectively, the values of the applIndex and assocIndex indices into the assocTable. Or, put differently, this is saying that (ignoring the applIndex) the first row of the assocTable describes a network connection controlled by the tcp_local channel.
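The lookup described in this note amounts to a join between the two tables. The following Python sketch shows that join under the assumption that both tables have been fetched into dictionaries keyed by their index tuples; the dictionary layout and the function name are invented for illustration.

```python
def connections_for_channel(group_assoc, assoc_remote, appl_index, group_index):
    """List the remote addresses of connections held by one channel.

    group_assoc maps (applIndex, mtaGroupIndex, n) to the
    mtaGroupAssociationIndex value, which is an assocIndex.
    assoc_remote maps (applIndex, assocIndex) to the
    assocRemoteApplication value.
    """
    remotes = []
    for (appl, group, _n), assoc_index in sorted(group_assoc.items()):
        if appl == appl_index and group == group_index:
            # x (applIndex) plus the entry's value select the assocTable row.
            remotes.append(assoc_remote.get((appl, assoc_index)))
    return remotes
```

Calling it with appl_index=1 and group_index=3 would return the remote addresses of the connections controlled by the tcp_local channel in the example above.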

A.5.6 mtaGroupErrorTable

This is another three-dimensional table which gives the counts of temporary and permanent errors encountered by each channel of each MTA while attempting delivery of messages. Entries with index values of 4000000 are temporary errors while those with indices of 5000000 are permanent errors. Temporary errors result in the message being re-queued for later delivery attempts; permanent errors result in either the message being rejected or otherwise returned as undeliverable.

Below is an example of data from mtaGroupErrorTable (mib-2.28.5.1).

mtaGroupErrorTable:

    mtaGroupInboundErrorCount.1.1.4000000 = 0     (1)
    mtaGroupInboundErrorCount.1.1.5000000 = 0
    mtaGroupInternalErrorCount.1.1.4000000 = 0
    mtaGroupInternalErrorCount.1.1.5000000 = 0
    mtaGroupOutboundErrorCount.1.1.4000000 = 0
    mtaGroupOutboundErrorCount.1.1.5000000 = 0

    mtaGroupInboundErrorCount.1.2.4000000 = 0     (1)
    ...

    mtaGroupInboundErrorCount.1.3.4000000 = 0     (1)
    ...
    ...

Notes:

(1) In the .x.y.z suffix, x is the application index, applIndex, and indicates which application in the applTable is being reported on; in this case, the MTA. The y indicates which channel of the mtaGroupTable is being reported on. In this example, 1 specifies the tcp_intranet channel, 2 the ims-ms channel, and 3 the tcp_local channel. Finally, the z is either 4000000 or 5000000 and indicates, respectively, counts of temporary and permanent errors encountered while attempting message deliveries for that channel.

A.5.6.1 mtaGroupErrorTable Usage

A large jump in error count likely indicates an abnormal delivery problem. For instance, a large jump for a tcp_ channel may indicate a DNS or network problem. A large jump for the ims-ms channel may indicate a delivery problem to the message store (for example, a partition is full, a stored problem, and so on).
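To watch for such jumps, it can help to total the temporary and permanent error counts for each channel at every polling interval. The Python sketch below does this, assuming the table rows have been collected into a dictionary keyed by (column name, applIndex, mtaGroupIndex, error index); that layout and the function name are hypothetical.

```python
TEMPORARY = 4000000  # index value for temporary errors
PERMANENT = 5000000  # index value for permanent errors


def error_totals_by_channel(rows):
    """Sum inbound, internal, and outbound error counts per channel.

    rows maps (column, applIndex, mtaGroupIndex, errorIndex) to a count.
    Returns {mtaGroupIndex: {"temporary": n, "permanent": m}}.
    """
    kinds = {TEMPORARY: "temporary", PERMANENT: "permanent"}
    totals = {}
    for (_column, _appl, group, code), count in rows.items():
        kind = kinds.get(code)
        if kind is None:
            continue  # ignore any index other than the two described
        bucket = totals.setdefault(group, {"temporary": 0, "permanent": 0})
        bucket[kind] += count
    return totals
```

Comparing the totals from one poll to the next shows which channel's error count jumped and whether the errors are temporary or permanent.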