Sun Java System Messaging Server 6.3 Administration Guide

26.3 Common MTA Problems and Solutions

This sections lists common problems and solutions for MTA configuration and operation.

26.3.1 TLS Problems

If, during SMTP dialog, the STARTTLS command returns the following error:

454 4.7.1 TLS library initialization failure

and if you have certificates installed and working for pop/imap access, check the following:

Protections/ownerships of the certificates have to be set so mailsrv account can access the files
The directory where the certificates are stored need to have protections/ownerships set such that the mailsrv account can access the files within that directory.

After changing protections and installing certificates, you must run:

stop-msg dispatcher 
start-msg dispatcher

Restarting should work, but it is better to shut it down completely, install the certificates, and then start things back up.

26.3.2 Changes to Configuration Files or MTA Databases Do Not Take Effect

If changes to your configuration, mapping, conversion, security, option, or alias files are not taking effect, check to see if you have performed the following steps:

Recompile the configuration (by running imsimta cnbuild).
Restart the appropriate processes (like imsimta restart dispatcher).
Re-establish any client connections.

26.3.3 The MTA Sends Outgoing Mail but Does Not Receive Incoming Mail

Most MTA channels depend upon a slave or channel program to receive incoming messages. For some transport protocols that are supported by the MTA (like TCP/IP and UUCP), you need to make sure that the transport protocol activates the MTA slave program rather than its standard server. Replacing the native sendmail SMTP server with the MTA SMTP server is performed as a part of the Messaging Server installation.

For the multi-threaded SMTP server, the startup of the SMTP server is controlled by the Dispatcher. If the Dispatcher is configured to use a MIN_PROCS value greater than or equal to one for the SMTP service, then there should always be at least one SMTP server process running (and potentially more, according to the MAX_PROCS value for the SMTP service). The imsimta process command may be used to check for the presence of SMTP server processes. See imsimta process in Sun Java System Messaging Server 6.3 Administration Reference for more information.

26.3.4 Dispatcher (SMTP Server) Won’t Start Up

If the dispatcher won’t start up, first check the dispatcher.log-* for relevant error messages. If the log indicates problems creating or accessing the /tmp/.SUNWmsgsr.dispatcher.socket file, then verify that the /tmp protections are set to 1777. This would show up in the permissions as follows:

drwxrwxrwt 8 root sys 734 Sep 17 12:14 tmp/
.

Also do an ls -l of the .SUNWmsgsr.dispatcher.socket file and confirm the proper ownership. For example, if this is created by root, then it is inaccessible by inetmail.

Do not remove the .SUNWmsgsr.dispatcher.file and do not create it if it’s missing. The dispatcher will create the file. If protections are not set to 1777, the dispatcher will not start or restart because it won’t be able to create/access the socket file. In addition, there may be other problems occurring not related to the Messaging Server.

26.3.5 Timeouts on Incoming SMTP connections

Timeouts on incoming SMTP connections are most often related to system resources and their allocation. The following techniques can be used to identify the causes of timeouts on incoming SMTP connections:

To Identify the Causes of Timeouts on Incoming SMTP Connections

Check how many simultaneous incoming SMTP connections you allow. This is controlled by the MAX_PROCS and MAX_CONNS Dispatcher settings for the SMTP service; the number of simultaneous connections allowed is MAX_PROCS*MAX_CONNS. If you can afford the system resources, consider raising this number if it is too low for your usage.

Another technique you can use is to open a TELNET session.

In the following example, the user connects to 127.0.0.1 port 25. Once connected, 220 banner is returned. For example:
telnet 127.0.0.1 25 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is ’^]’. 220 budgie.sesta.com --Server ESMTP (Sun Java System Messaging Server 6.1 (built May 7 2001))
If you are connected and receive a 220 banner, but additional commands (like ehlo and mail from) do not illicit a response, then you should run imsimta test -rewrite to ensure that the configuration is correct.

If the response time of the 220 banner is slow, and if running the pstack command on the SMTP server shows the following iii_res* functions (these functions indicate that a name resolution lookup is being performed):
febe2c04 iii_res_send (fb7f4564, 28, fb7f4de0, 400, fb7f458c, fb7f4564) + 42c febdfdcc iii_res_query (0, fb7f4564, c, fb7f4de0, 400, 7f) + 254
then it is likely that the host has to do reverse name resolution lookups, even on a common pair like localhost/127.0.0.1. To prevent such a performance slowdown, you should reorder your host’s lookups in the /etc/nsswitch.conf file. To do so, change the following line in the /etc/nsswitch.conf file from:
hosts: dns nis [NOTFOUND=return] files
to:
hosts: files dns nis [NOTFOUND=return]
Making this change in the /etc/nsswitch.conf file can improve performance as fewer SMTP servers have to handle messages instead of multiple SMTP servers having to perform unnecessary lookups.

You can also put the slave_debug keyword on the channels handling incoming SMTP over TCP/IP mail, usually tcp_local and tcp_intranet. After doing so, review the most recent tcp_local_slave.log-uniqueid files to identify any particular characteristics of the messages that time out. For example, if incoming messages with large numbers of recipients are timing out, consider using the expandlimit keyword on the channel.

Remember that if your system is overloaded and overextended, timeouts will be difficult to avoid entirely.

26.3.6 Messages are Not Dequeued

Errors encountered during TCP/IP delivery are often transient; the MTA will generally retain messages when problems are encountered and retry them periodically. It is normal on large networks to experience periodic outages on certain hosts while other host connections work fine. To verify the problem, examine the log files for errors relating to delivery attempts. You may see error messages such as, “Fatal error from smtp_open.” Such errors are not uncommon and are usually associated with a transient network problem. To debug TCP/IP network problems, use utilities like PING, TRACEROUTE, and NSLOOKUP.

The following example shows the steps you might use to see why a message is sitting in the queue awaiting delivery to xtel.co.uk. To determine why the message is not being dequeued, you can recreate the steps the MTA uses to deliver SMTP mail on TCP/IP.

% nslookup -query=mx xtel.co.uk (Step 1)
            
Server: LOCALHOST
Address: 127.0.0.1

Non-authoritative answer:
XTEL.CO.UK  preference = 10, mail exchanger = nsfnet-relay.ac.uk (Step 2)

% telnet nsfnet-relay.ac.uk 25 (Step 3)
Trying... [128.86.8.6]
telnet: Unable to connect to remote host: Connection refused

Use the NSLOOKUP utility to see what MX records, if any, exist for this host. If no MX records exist, then you should try connecting directly to the host. If MX records do exist, then you must connect to the designated MX relays. The MTA honors MX information preferentially, unless explicitly configured not to do so. See also 12.4.3.5 TCP/IP MX Record Support.
In this example, the DNS (Domain Name Service) returned the name of the designated MX relay for xtel.co.uk. This is the host to which the MTA will actually connect. If more than one MX relay is listed, the MTA will try each MX record in succession, with the lowest preference value tried first.
If you do have connectivity to the remote host, you should check if it is accepting inbound SMTP connections by using TELNET to the SMTP server port 25.

Note –
If you use TELNET without specifying the port, you will discover that the remote host accepts normal TELNET connections. This does not indicate that it accepts SMTP connections; many systems accept regular TELNET connections but refuse SMTP connections and vice versa. Consequently, you should always do your testing against the SMTP port.

In the previous example, the remote host is refusing connections to the SMTP port. This is why the MTA fails to deliver the message. The connection may be refused due to a misconfiguration of the remote host or some sort of resource exhaustion on the remote host. In this case, nothing can be done to locally to resolve the problem. Typically, you should let the MTA continue to retry the message.

If you are running Messaging Server on a TCP/IP network that does not use DNS, you can skip the first two steps. Instead, you can use TELNET to directly access the host in question. Be careful to use the same host name that the MTA would use. Look at the relevant log file from the MTA’s last attempt to determine the host name. If you are using host files, you should make sure that the host name information is correct. It is strongly recommended that you use DNS instead of host names.

Note that if you test connectivity to a TCP/IP host and encounter no problems using interactive tests, it is quite likely that the problem has simply been resolved since the MTA last tried to deliver the message. You can re-run the imsimta submit tcp_channel on the appropriate channel to see if messages are being dequeued.

26.3.6.1 To Create a New Channel

In certain circumstances, a remote domain can break down and the volume of mail addressed to this server can be so great that the outgoing channel queue will fill up with messages that cannot be delivered. The MTA tries to redeliver these messages periodically (the frequency and number of the retries is configurable using the backoff keywords) and under normal circumstances, no action is needed. However, if too many messages get stuck in the queue, other messages may not get delivered in a timely manner because all the channel jobs are working to process the backlog of messages that cannot be delivered.

In this situation, you can reroute these messages to a new channel running in its own job controller pool. This will avoid contention for processing and allow the other channels to deliver their messages. This procedure is described below. We assume a domain called siroe.com

To Create a New Channel

Create a new channel called tcp_siroe-daemon and add a new value for the pool keyword.

Channels are created in the channel block section of /msg-svr-base/config/imta.cnf. The channel should have the same channel keywords on your regular outgoing tcp_* channel. Typically, this is the tcp_local channel, which handles all outbound (internet) traffic. Since siroe.com is out on the internet, this is the channel to emulate. The new channel may look something like this:
```
tcp_siroe smtp nomx single_sys remotehost inner allowswitchchannel     \
dentnonenumeric subdirs 20 maxjobs 7 pool SMTP_SIROE maytlsserver      \
maysaslserver saslswitchchannel tcp_auth missingrecipientpolicy 0      \    
tcp_siroe-daemon
```
Note the new keyword-value pair pool SMTP_SIROE. This specifies that messages to this channel will only use computer resources from the SMTP_SIROE pool. Note also that a blank line is required before and after the new channel.

Add two rewrite rules to the rewrite rule section of the imta.cnf file to direct email destined for siroe.com to the new channel.

The new rewrite rules look like this:
siroe.com $U%$D@tcp_siroe-daemon .siroe.com $U%$H$D@tcp_siroe-daemon
These rewrite rules will direct messages to siroe.com (including addresses like host1.siroe.com or hostA.host1.siroe.com) to the new channel whose official host name is tcp_siroe-daemon. The rewriting part of these rules, $U%$D and $U%$H$D, retain the original addresses of the messages. $U copies the user name from original address. % is the separator—the @ between the username and domain. $H copies the unmatched portion of host/domain specification at the left of dot in pattern. $D copies the portion of domain specification that matched.

Define a new job controller pool called SMTP_SIROE.

In /msg-svr-base/config/job_controller.cnf add the following:
[POOL=SMTP_SIROE] job_limit=10
This creates a message resource pool called SMTP_SIROE that allows up to 10 jobs to be simultaneously run. Be sure not to leave any blank lines between this pool definition and the others. See 8.7 The Job Controller for details on jobs and pools.

Restart the MTA.

Issue the commands: imsimta cnbuild;imsimta restart

This recompiles the configuration and restarts the job controller and dispatcher.

In this example, a large quantity of email from your internal users is destined for a particular remote site called siroe.com. For some reason, siroe.com, is temporarily unable to accept incoming SMTP connections and thus cannot deliver email. (This type of situation is not a rare occurence.)

As email destined for siroe.com comes in, the outgoing channel queue, typically tcp_local, will fill up with messages that cannot be delivered. The MTA tries to redeliver these messages periodically (the frequency and number of the retries is configurable using the backoff keywords) and under normal circumstances, no action is needed.

However, if too many messages get stuck in the queue, other messages may not get delivered in a timely manner because all the channel jobs are working to process the backlog of siroe.com messages. In this situation, you may wish reroute siroe.com messages to a new channel running in its own job controller pool (see 8.7 The Job Controller). This will allow the other channels to deliver their messages without having to contend for processing resources used by siroe.com messages. Creating a new channel to address this situation is described below.

26.3.7 MTA Messages are Not Delivered

In addition to message transport problems, there are two common problems which can result in unprocessed messages in the message queues:

The queue cache is not synchronized with the messages in the queue directories. Message files in the MTA queue subdirectories that are awaiting delivery are entered into an in-memory queue cache. When channel programs run, they consult this queue cache to determine which messages to deliver in their queues. There are circumstances where there are message files in the queue, but there is no corresponding queue cache entry.
1. To check if a particular file is in the queue cache, you can use the imsimta cache -view utility; if the file is not in the queue cache, then the queue cache needs to be synchronized.
  
  The queue cache is normally synchronized every four hours. If required, you can manually resynchronize the cache by using the command imsimta cache -sync. Once synchronized, the channel programs will process the originally unprocessed messages after new messages are processed. If you want to change the default (4 hours), you should modify the job_controller.cnf file in directory msg-svr-base/config by adding sync_time=timeperiod where timeperiod reflects how often the queue cache is synchronized. Note that the timeperiod must be greater than 30 minutes. In the following example, the queue cache synchronization is modified to 2 hours by adding the sync_time=02:00 to the global defaults section of the job_controller.cnf:
  ! VERSION=5.0 !IMTA job controller configuration file ! !Global defaults tcp_port=27442 secret=N1Y9[HzQKW slave_command=NULL sync_time=02:00
  You can run imsimta submit channel to clear out the backlog of messages after running imsimta cache -sync. It is important to note that clearing out the channel may take a long time if the backlog of messages is large (greater than 1000).
  
  For summarized queue cache information, run imsimta qm -maint dir -database -total.
2. If after synchronizing the queue cache, messages are still not being delivered, you should restart the Job Controller. To do so, use the imsimta restart job_controller command.
  
  Restarting the Job Controller will cause the message data structure to be rebuilt from the message queues on disk.
  
  Caution –
  Restarting the Job Controller is a drastic step and should only be performed after all other avenues have been thoroughly exhausted.
  
  Refer 8.7 The Job Controller for more information on the Job Controller.
Channel processing programs fail to run because they cannot create their processing log file. Check the access permissions, disk space and quotas.

26.3.8 Messages are Looping

If the MTA detects that a message is looping, that message will be sidelined as a .HELD file. See 26.3.8.1 Diagnosing and Cleaning up .HELD Messages. Certain cases can lead to message loops which the MTA can not detect.

The first step is to determine why the messages are looping. You should look at a copy of the problem message file while it is in the MTA queue area, MTA mail log entries (if you have the logging channel keyword enabled in your MTA configuration file for the channels in question) relating to the problem message, and MTA channel debug log files for the channels in question. Determining the From: and To: addresses for the problem message, seeing the Received: header lines, and seeing the message structure (type of encapsulation of the message contents), can all help pinpoint which sort of message loop case you are encountering.

Some of the more common cases include:

A postmaster address is broken.

The MTA requires that the postmaster address be a functioning address that can receive email. If a message to the postmaster is looping, check that your configuration has a proper postmaster address pointing to an account that can receive messages.
Stripping of Received: header lines is preventing the MTA from detecting the message loop.

Normal detection of message loops is based on Received: header lines. If Received: header lines are being stripped (either explicitly on the MTA system itself, or on another system like a firewall), it can interfere with proper detection of message loops. In these scenarios, check that no undesired stripping of Received: header lines is occurring. Also, check for the underlying reason why the messages are looping. Possible reasons include: a problem in the assignment of system names or a system not configured to recognize a variant of its own name, a DNS problem, a lack of authoritative addressing information on the system in question, or a user address forwarding error.
Incorrect handling of notification messages by other messaging systems are generating reencapsulated messages in response to notification messages.

Internet standards require that notification messages (reports of messages being delivered, or messages bouncing) have an empty envelope From: address to prevent message loops. However, some messaging systems do not correctly handle such notification messages. When forwarding or bouncing notification messages, these messaging systems may insert a new envelope From: address. This can then lead to message loops. The solution is to fix the messaging system that is incorrectly handling the notification messages.

26.3.8.1 Diagnosing and Cleaning up .HELD Messages

If the MTA detects a serious problem having to do with delivery of a message, the message is stored in a file with the suffix .HELD in /msg-svr-base/data/queue/channel. For example:

% ls 
ZZ0HXZ00G0EBRBCP.HELD
ZZ0HY200C0O6LGHU.HELD
ZZ0HYA006LP66O3H.HELD
ZZ0HZ7003EOQSE37.HELD

.HELD files can occur due to three major reasons:

Looping messages. The MTA detected that the messages were looping via build-up of one or another sort of Received: header lines).
User or domain status set to hold. These are messages that are, by intent of the MTA administrator, intentionally being side-lined, typically while some maintenance procedure is being performed, (for example, while moving user mailboxes).
Suspicious messages. Messages that met some suspicion thresh hold and were held for later manual inspection by the MTA administrator. Messages can be .HELD due to exceeding a configured maximum number of envelope recipients (see the holdlimit channel keyword in 12.5.9 Expansion of Multiple Addresses), due to running the imsimta qclean in Sun Java System Messaging Server 6.3 Administration Reference, clean in Sun Java System Messaging Server 6.3 Administration Reference or hold in Sun Java System Messaging Server 6.3 Administration Reference commands based on some suspicion of the message(s) in question, or due to use of a hold action in a Sieve script.

Messages .HELD Due to Looping

Messages bouncing between servers or channels are said to be looping. Typically, a message loop occurs because each server or channel thinks the other is responsible for delivery of the message. Looping messages usually have a great many *Received: header lines. The Received: header lines will illustrate the exact path of the message loop. Look carefully at the host names and any recipient address information (for example, for recipientclauses or (ORCPT recipient)comments) appearing in such header lines. One cause of such message loops is user error.

For example, an end user may set an option to forward messages on two separate mail hosts to one another. On his sesta.com account, the end-user enables mail forwarding to his varrius.com account. And, forgetting that he has enabled this setting, he sets mail forwarding on his varrius.com account to his sesta.com account.

A loop can also occur with a faulty MTA configuration. For example, MTA Host X thinks that messages for mail.sesta.com go to Host Y. However, Host Y thinks that Host X should handle messages for mail.sesta.com; as a result, Host Y returns the mail to Host X.

In these cases, the message is ignored by the MTA and no further delivery is attempted. When such a problem occurs, look at the header lines in the message to determine which server or channel is bouncing the message. Fix the entry as needed.

Another common cause of message loops is the MTA receiving a message that was addressed to the MTA host using a network name that the MTA does not recognize (has not been configured to recognize) as one of its own names. The solution is to add the additional name to the list of names that your MTA recognizes as its own. Note that the MTA's thresh holds for determining that a message is looping are configurable; see the MAX_*RECEIVED_LINES option.dat options (Option File Format and Available Options in Sun Java System Messaging Server 6.3 Administration Reference). Also note that the MTA may optionally be configured--see the HELD_SNDOPR global MTA option--to generate a syslog notice whenever a message is forced into .HELD state due to exceeding such a thresh hold. If syslog messages of Received count exceeded; message held.are present, then you know that this is occurring.

You can resend the .HELD message by running release in Sun Java System Messaging Server 6.3 Administration Reference or following these steps:

Rename the .HELD extension to any 2 digit number other than 00. For example, .HELD to .06.

Note –
Before renaming the .HELD file, be sure that the message has stopped looping.
Run imsimta cache -sync. Running this command will update the cache.
Run imsimta submit channel or imsimta run channel.

It may be necessary to perform these steps multiple times, since the message may again be marked as .HELD, because the Received: header lines accumulate. If the problem still exists, the *.HELD file will be recreated under the same channel with as before. If the problem has been addressed, the messages will be dequeued and delivered.

If you determine that the messages can simply be deleted with no attempt to deliver them, see clean in Sun Java System Messaging Server 6.3 Administration Reference in the Sun Java System Messaging Server 6.3 Administration Reference.

Messages `.HELD` Due to User or Domain `hold` Status

Messages .HELD due to a user or domain status of hold--and only messages .HELD for such a reason--will normally be stored in the hold channel's queue area. That is, .HELD message files in the hold channel's queue area can be assumed to be .HELD due to user or domain status.

Messages .HELD Due to a Suspicious Characteristic

Messages .HELD due to some suspicious characteristic will of course exhibit that characteristic. The characteristic could be anything which the site has chosen to characterize as suspicious. MTA Administrators should stay aware of these configuration choices and actions. However, if you are not the only or original administrator of this MTA, then check the MTA configuration for any configured use of the holdlimit channel keyword (12.5.9 Expansion of Multiple Addresses), any use of the $H flag in address-based *_ACCESS mapping tables in the MTA mappings file, or any use of the hold action in any system Sieve file (the system level imta.filter file, or any channel level Sieve filters configured and named via use of sourcefilter or destinationfilter channel keywords; see 12.12.4 Specifying Mailbox Filter File Location); and ask any fellow MTA administrators about any manual command line message holds (through, for instance, an imsimta qm clean command) they might have recently performed. Note also that application of a Sieve filter hold action, whether from a system Sieve filter or from users' personal Sieve filters, may optionally be logged; see the LOG_FILTER global MTA option (Option File Format and Available Options in Sun Java System Messaging Server 6.3 Administration Reference) for more information.

26.3.9 Received Message is Encoded

Messages sent by the MTA are received in an encoded format. For example:

Date: Wed, 04 Jul 2001 11:59:56 -0700 (PDT)
From: "Desdemona Vilalobos" <Desdemona@sesta.com> 
To: santosh@varrius.com
Subject: test message with 8bit data 
MIME-Version: 1.0
Content-type: TEXT/PLAIN; CHARSET=ISO-8859-1
Content-transfer-encoding: QUOTED-PRINTABLE

2=00So are the Bo=F6tes Void and the Coal Sack the same?=

These messages appear unencoded when read with the MTA decoder command imsimta decode. Refer to the Sun Java System Messaging Server Administration Reference for more information.

The SMTP protocol only allows the transmission of ASCII characters (a seven-bit character set) as set forth by RFC 821. In fact, the unnegotiated transmission of eight-bit characters is illegal via SMTP, and it is known to cause a variety of problems with some SMTP servers. For example, SMTP servers can go into compute bound loops. Messages are sent over and over again. Eight-bit characters can crash SMTP servers. Finally, eight-bit character sets can wreak havoc with browsers and mailboxes that cannot handle eight-bit data.

An SMTP client used to only have three options when handling a message containing eight-bit data: return the message to the sender as undeliverable, encode the message, or send it in direct violation of RFC 821. But with the advent of MIME and the SMTP extensions, there are now standard encodings which may be used to encode eight-bit data by using the ASCII character set.

In the previous example, the recipient received an encoded message with a MIME content type of TEXT/PLAIN. The remote SMTP server (to which the MTA SMTP client transferred the message) did not support the transfer of eight-bit data. Since the original message contained eight-bit characters, the MTA had to encode the message.

26.3.10 Server-Side Rules (SSR) Are Not Working

A filter consists of one or more conditional actions to apply to a mail message. Since the filters are stored and evaluated on the server, they are often referred to as server-side rules (SSR).

This section includes information on the following SSR topics:

26.3.10.1 Testing Your SSR Rules

To check the MTA’s user filters, use the command:

# imsimta test -rewrite -debug -filter user@domain

In the output, look for the following information:

mmc_open_url called to open ssrf:user@ims-ms
   URL with quotes stripped: ssrd: user@ims-ms
Determined to be a SSRD URL.
   Identifier: user@ims-ms-daemon
Filter successfully obtained.

In addition, you can add the slave_debug keyword to the tcp_local channel to see how a filter is applied. The results are displayed in the tcp_local_slave.log file. Be sure to add mm_debug=5 in the option.dat file in directory /msg-svr-base/config in order to get sufficient debugging information.

26.3.10.2 Common Syntax Problems

If there is a syntax problem with the filter, then look for the following message in the tcp_local_slave.log-* file:

Error parsing filter expression:...
- If the filter is good, then filter information will be at the end of the output.
- If the filter is bad, then the following error will be at the end of the output:Address list error -- 4.7.1 Filter syntax error: desdaemona@sesta.com
  
  Also, if the filter is bad, then the SMTP RCPT TO command will return a temporary error response code:
RCPT TO: user@domain 452 4.7.1 Filter syntax error

26.3.11 Slow Response After Users Press Send Email Button

If users are experiencing delays when they send messages, it may be because disk input/output is reduced due to insufficiently sized message queue disks. When users press the SEND button on their email client, the MTA will not fully accept receipt of the message until the message has been committed to the message queue. Information on message queue sizing can be found

26.3.12 Asterisks in the Local Parts of Addresses or Received Fields

The MTA now checks for 8-bit characters (instead of just ASCII characters) in the local parts of addresses as well as the received fields it constructs and replaces them with asterisks.