Oracle® Mail Administrator's Guide
10g Release 1 (10.1.1)
Part Number B14491-03
This appendix outlines identification and resolution paths for issues affecting Oracle Mail. It assumes that the system previously behaved correctly under similar load, that something then changed, and that a problem has now presented itself.
See Also:"Tuning Oracle Mail" in Oracle Collaboration Suite Administrator's Guide for tuning due to an increased load
This appendix includes the following topics:
This section discusses common problems with mail protocols, mail queues, and housekeeping.
Oracle Enterprise Manager 10g, including the Oracle Enterprise Manager 10g Grid Control Console and Oracle Enterprise Manager 10g Application Server Control Console for Collaboration Suite
An administrator can be alerted to a problem when a metric or beacon transaction exceeds a threshold. The alert can be triggered by any of the Oracle Mail metrics, by delivery time, by slow protocol transaction times, or by availability dropping below a defined level. An alert can be generated for any event that could degrade, or has substantially degraded, the quality of service.
Tip:Oracle Enterprise Manager Configuration for Oracle Collaboration Suite for information about setting thresholds
A user calls in with a complaint.
Scripts and notification tools can be set up to periodically examine log files for errors. Viewing log files is the principal tool for identifying process-related issues. Within Oracle Enterprise Manager 10g, one can view the Oracle Mail-related logs. The esd_logscan.pl script, located in the $ORACLE_HOME/oes/admin directory, is shipped to search for information in log files.
Process logs can be found in the
server-type directory. For example, List Server logs are found in the
See Also:Appendix B, "Oracle Mail Error Messages" for a list of error messages, possible causes, and actions to resolve the errors
Scripts and command-line tools
The oesmon command-line utility provides command-line access to near real-time metric data in raw format. This metric data is sampled periodically by Oracle Enterprise Manager 10g Grid Control and can be viewed in charted form using the Oracle Enterprise Manager 10g Grid Control Console. The oesmon utility directly contacts running servers and returns current values, whereas the Grid Control Console charts typically sample at a rate of once every ten minutes.
See Also:Chapter 7, "Monitoring Oracle Mail" for more information about
Oracle Mail also ships with various utility scripts located in the $ORACLE_HOME/oes/admin directory to analyze a running system. These server diagnostic scripts all have names starting with esd_. For example, the esd_mail_queue.sql script can be used to monitor the activity of the mail queues and see if there are messages stuck in the queues. Also, the esd_logscan.pl Perl script can be used to search for errors in the log files and format the information so that it is easier to read than the raw log file.
Administrators doing ad-hoc monitoring of the system, or other sources.
If a process is consuming an unusually large amount of resources, monitoring CPU usage will bring that process to the top of the list. Use a system monitoring tool such as top to view process CPU usage.
Excessive memory usage by a process will degrade the performance of all of the system's processes, in addition to the overall performance of the system. Use a system monitoring tool such as top to view process memory usage.
Every transaction in the database is written to an archive log file that is sized in advance by the database administrator. If the transactions can no longer be written to file because of space constraints, the database will hang and all end-user connections will also hang. Therefore, it is imperative that the administrator monitor disk space capacity to ensure that disk space is available for continuous operation.
Log file entries are continuously written to various file locations within the Oracle Mail directory structure. These files can become quite large and can cause some disruption if the directory gets full. Although the log files turn over based on the size to which they are set, it is good practice to monitor the disk to which log files are being written.
Use the command df -k to view the current and available disk usage.
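For unattended monitoring, the same check can be scripted. The following is a minimal Python sketch, not an Oracle-supplied tool: the function name, the 90 percent warning threshold, and the "/" path are illustrative assumptions, and the percentage is computed from total and used bytes, so it may differ slightly from the figure df reports.

```python
import shutil


def disk_usage_report(path, warn_percent=90.0):
    """Return (percent_used, is_warning) for the filesystem containing path.

    warn_percent is an assumed threshold, not an Oracle recommendation.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent_used = 100.0 * usage.used / usage.total
    return percent_used, percent_used >= warn_percent


if __name__ == "__main__":
    # "/" is a placeholder; point this at the filesystem that holds the
    # archive logs or the Oracle Mail log directories.
    pct, warn = disk_usage_report("/")
    print(f"{pct:.1f}% used{' (WARNING)' if warn else ''}")
```

Running a check like this from a scheduler for each filesystem that receives archive logs or Oracle Mail log files gives early warning before a directory fills.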
Note:This appendix only addresses issues that are specific to administrators. Troubleshooting and FAQs for end-users can be found
End-user issues are commonly identified by a user contacting a help desk. These issues range from login issues to simple functionality issues. They can be local to a single account or to a wider audience of users experiencing the same problem.
To best determine how to evaluate a problem, begin by determining whether the problem is local to the account or system-wide.
This section includes the following topics:
To best determine the nature of a mail client issue, it is important to understand which client the user is using. Some issues relate to one client but not the others. Other issues are not a product of the client itself but rather of the backend application. Determine the difference by the types of incidents that are reported.
This section addresses problems with the mail clients, including:
When a user connects to the Applications Tier using any type of mail client, the first thing that occurs is connection establishment. For this to succeed, the Applications Tier must be listening for and accepting connections. When the connection is established, the next step is authentication of the user. This requires access to the Oracle Internet Directory server. After the user is authenticated, the user's Inbox folder is retrieved, which requires access to the Oracle Collaboration Suite Database holding the user's folders and messages. A problem encountered during any of these steps will prevent the user from accessing their account.
Some of the common problems that prevent users from successfully logging in are:
User password has expired
IMAP or POP servers are not configured to connect to all Oracle Collaboration Suite Databases
Servers have reached the maximum connections to the Oracle Collaboration Suite Database or Oracle Internet Directory
Diagnosing Connection Establishment
Issues with IMAP or POP client connection establishment can be diagnosed by using telnet to connect to the Applications Tier, such as:
telnet apptier.foo.com 143
In this example, apptier.foo.com is the Applications Tier host machine and 143 is the IMAP protocol port. This should succeed and give back a banner line which indicates that the server is ready. Enter 1 logout<ENTER> to disconnect from the IMAP server and enter quit<ENTER> to disconnect from the POP server, where <ENTER> indicates pressing the Enter key.
If the attempt does not succeed, check the status of both the listener and the server. On the Applications Tier host use the commands lsnrctl status listener_es to display the status of the listener and lsnrctl services listener_es to display the services provided by the listener. Use the command opmnctl status to display the status of the Applications Tier processes and verify that either the IMAP server or POP server is started.
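The manual telnet check described above can also be scripted when several hosts or ports need to be probed. The following Python sketch is an illustration only: read_banner is a hypothetical helper name, and the sketch only reads the first line the server sends, without issuing any IMAP or POP commands.

```python
import socket


def read_banner(host, port, timeout=5.0):
    """Connect to host:port and return the first line the server sends.

    Returns None if the connection fails, times out, or no data arrives.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(1024)
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return None
    if not data:
        return None
    return data.decode("ascii", errors="replace").splitlines()[0]
```

For example, read_banner("apptier.foo.com", 143) would return the IMAP banner line if the server is accepting connections, and None otherwise, in which case the listener and server status checks above are the next step.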
Diagnosing Oracle Internet Directory and Oracle Collaboration Suite Database Issues
To see if there are issues with the Oracle Internet Directory server or the Oracle Collaboration Suite Database, check the IMAP or POP log files for errors. Specifically, ESCAPI-500 errors will be logged when a server runs out of either free Oracle Internet Directory or Oracle Collaboration Suite Database connections, or both, in the pools. Use esd_logscan.pl to search in the log files for ESCAPI-500 and also for ORA-XXXXX, which will be logged if there are database errors.
See Also:"esd_logscan.pl" for script usage information
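Where esd_logscan.pl is not at hand, the same two error families can be counted with a few lines of script. The Python sketch below is not a replacement for esd_logscan.pl and makes two assumptions: that ORA-XXXXX means "ORA-" followed by five digits, and that the codes appear literally in the log text. The function name is illustrative.

```python
import re
from collections import Counter

# ESCAPI-500 is named in the text; the five-digit ORA pattern is an assumption.
ERROR_PATTERN = re.compile(r"\b(ESCAPI-500|ORA-\d{5})\b")


def count_errors(lines):
    """Count occurrences of each matching error code in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(ERROR_PATTERN.findall(line))
    return counts


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "connection pool exhausted ESCAPI-500",
        "OCI_ERROR: ORA-20220: Folder locked",
        "normal operation",
    ]
    for code, n in count_errors(sample).items():
        print(code, n)
```

Passing an open log file to count_errors works as well, because a file object iterates line by line.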
The following section contains additional information about Oracle Internet Directory server diagnostics.
This problem is most likely to occur due to slower response time from the Oracle Internet Directory server. Run the following diagnostic tools for the Oracle Internet Directory server.
ldapbind (found in the
$ORACLE_HOME/bin): By entering the appropriate information (see following usage example), the
ldapbind command confirms communication to Oracle Internet Directory. If the connection is slow or fails to connect and all supplied parameters are correct, there may be an issue with either the Oracle Internet Directory listener (
oidmon) or its associated daemon (
$ ldap options
options are as follows:
-D binddn     bind
-w passwd     bind passwd (for simple authentication)
-h host       ldap server
-p port       port on ldap server
-W Wallet     wallet location
-P Wpasswd    wallet password
-U SSLAuth    SSL authentication mode
-E Encoding   character set
ldapcheck (found in $ORACLE_HOME/ldap/bin): This command performs a quick check of the necessary LDAP processes showing their existence (see following usage example). It displays the process name and its associated OS process id (PID).
$ ldapcheck
Checking Oracle Internet Directory processes...
Process oidmon is Alive as PID 8158
Process oidldapd is Alive as PID 8196
Process oidldapd is Alive as PID 8186
Not Running ---- Process oidrepld
Check that connections between IMAP and the Oracle Internet Directory server are not timing out or being dropped because of any network issues.
Check the CPU usage on the Oracle Collaboration Suite Applications Tier system. High usage can result in longer connection times. If the IMAP server is suddenly consuming more of the CPU resources, check the log level for the server. If there is a lot of information going to the log files, it can degrade server performance. Lower the log level to resolve this.
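When the ldapcheck output shown in the usage example earlier in this list is captured by a monitoring script, it can be summarized automatically. The Python sketch below assumes the output uses exactly the two line shapes visible in that example ("Process NAME is Alive as PID N" and "Not Running ---- Process NAME"); other ldapcheck versions may format their output differently. The function name is illustrative.

```python
import re

# Both patterns are inferred from the sample output in this appendix.
ALIVE = re.compile(r"Process (\w+) is Alive as PID (\d+)")
NOT_RUNNING = re.compile(r"Not Running -+ Process (\w+)")


def summarize_ldapcheck(output):
    """Return (alive, down): alive is a list of (name, pid), down a list of names."""
    alive = [(name, int(pid)) for name, pid in ALIVE.findall(output)]
    down = NOT_RUNNING.findall(output)
    return alive, down


if __name__ == "__main__":
    sample = (
        "Checking Oracle Internet Directory processes...\n"
        "Process oidmon is Alive as PID 8158\n"
        "Process oidldapd is Alive as PID 8196\n"
        "Not Running ---- Process oidrepld\n"
    )
    alive, down = summarize_ldapcheck(sample)
    print("alive:", alive)
    print("not running:", down)
```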
Message retrieval fetches a blob from
If any of the fetches are slow, there will be latency in message retrieval
Consider running Statspack and checking for waits on these objects
A particular user may report long download times if he is reading messages from a slower tertiary store.
A message having too many MIME parts will also increase download time. Many clients typically issue a separate request for each MIME part resulting in round-trips to the server.
Ensure that the maximum number of Oracle Collaboration Suite Database connections to the server has not been reached
Ensure the network connectivity between the client and the IMAP server, and the server and the database
Try to read the problematic message using different e-mail clients. Not all clients can display all types of rich messages. If the message can be viewed with one of the clients, use that client to move the message to a separate folder. If the log files report errors in the fetching of the message, contact Oracle support with details from the log file.
This problem typically occurs when the LDAP connection pool has been exceeded. If CPU resources are available, increase the LDAP connection pool size for the IMAP server.
This can occur when the server cannot obtain a database session to insert messages into the mail store, which happens if the number of clients sending messages simultaneously exceeds the maximum sessions configured for the inbound server. This problem is usually temporary. Check the inbound logs for one of the following error messages to verify this:
Could not get OCI service ctx
Queueing the request returns=-110
Getting service context from pool failed
Evaluate your system to determine if it requires a change in the configuration if this problem occurs frequently.
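To judge whether the problem "occurs frequently", it can help to count how often each of the three messages above appears in the inbound logs. The Python sketch below matches the messages as plain substrings, which assumes they appear verbatim in the log lines; the function name is illustrative.

```python
# The three messages listed in the text, matched as literal substrings.
SESSION_ERRORS = (
    "Could not get OCI service ctx",
    "Queueing the request returns=-110",
    "Getting service context from pool failed",
)


def session_error_counts(lines):
    """Return a dict mapping each session-exhaustion message to its line count."""
    counts = {msg: 0 for msg in SESSION_ERRORS}
    for line in lines:
        for msg in SESSION_ERRORS:
            if msg in line:
                counts[msg] += 1
    return counts
```

A count that stays near zero outside of short bursts is consistent with the temporary condition described above, while steadily rising counts suggest that the configured session limit deserves review.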
The problem could be attributed to a particular message. Check the log file to analyze the nature of the message and detect potential problems, such as
mail causing a routing loop,
mail loop due to too many mime levels,
routing address loop, and
If the problem is persistent, and all required processes (listener, database, SMTP inbound, and Oracle Internet Directory) are up and running, contact Oracle Support with details of the logs.
See Also:"User unable to log in to account" for troubleshooting information
One Inbox query goes through the ES_INSTANCE table to gather all rows of a folder. As the size increases, the number of blocks retrieved increases.
Check the database statistics and if the preceding query is an expensive query, consider analyzing the
Unlike other servers, Oracle Java Mail API (OJMA) is an API, a building block used when creating an application such as Oracle WebMail. The application using OJMA can be dependent upon many other components, such as Apache, single sign-on, LDAP, OC4J, or Oracle Web Cache. This topic only addresses how to diagnose OJMA-related issues. Refer to the documentation for other components if they are suspected when troubleshooting an issue.
User reports long response times: To determine if the response time is related to the application or OJMA, turn on OJMA tracing and monitor the time it takes to request a message from the database. If this time is unacceptable, examine the database for performance issues.
For timing information, set the following property:
The timing information is generated in the $ORACLE_HOME/opmn/logs directory in an Oracle WebMail/OC4J environment. Otherwise, the timing information is sent to standard output.
User reports long authentication or open public and shared folder times: Both of these operations are dependent on the performance of the Oracle Internet Directory server. Check the Oracle Internet Directory server for any problems.
This section addresses problems with the mail delivery, including:
Check the SMTP inbound parameter settings. If the Inbound Rules and Routing Control Relay Allowed parameter is set to False (open relay disabled), ensure that the local domains are given in the Trusted Domains SMTP Inbound Rules and Routing Control parameter.
If the system allows local users to connect to SMTP through an ISP (so that the connection comes from a foreign domain), set Relay Allowed to
Authenticated and the SMTP Inbound General Authentication parameter to either
See Also:"Oracle Mail SMTP Inbound Server" for definitions of these parameters
Usually, a delivery status notification (DSN) will be sent back to the sender, indicating either a delay or a failure in the delivery. The only exception is when the user specifically requests not to send DSNs.
A DSN will also be sent to the postmaster with the delivery failure reason. To receive the DSN, enter a valid e-mail address in the Postmaster E-mail Address SMTP Inbound server parameter. Also ensure that the Postmaster DSNs SMTP Inbound server parameter is set to
If delivery is delayed for any reason, the message will be present in the queue for up to 5 days (default queue timeout) and the reason for delivery failure will be stored in the Oracle Collaboration Suite Database. Use the esd_mail_queue.sql script to examine the Relay queue. This script will show the deferral reason stored in the Oracle Collaboration Suite Database.
When the delivery fails with a fatal error, both queue and recipient records will be cleaned by the Housekeeper server within a couple of hours and there may not be any trace of it in the database. Log files will contain information showing that the message is being requeued, such as the following:
OCI_ERROR: ORA-20220: Folder locked
OCI_ERROR: ORA-20221: User locked
Local delivery failed for user
Failed to deliver to user inbox for local users
Delivery to email@example.com failed with smtp_err=421 or
Connect failed: Connection refused for relay users
Tip:"Oracle Mail SMTP Inbound Server" for definitions of these parameters
Use the esd_find_message.sql script to locate lost messages. This script searches the database for messages given various search criteria. Once the message is found and the message ID is known, use the esd_show_message.sql script to see if the message is stuck in any of the queues. This procedure should only be used as a last resort since the esd_find_message.sql script is very expensive to run. It is better to directly examine the queues using the esd_mail_queue.sql script and visually inspect the generated report to see if there are any stuck messages addressed to the user.
See Also:"esd_find_message.sql", "esd_show_message.sql", and "esd_mail_queue.sql" for script usage information
When there is no trace of the message in the database, look for the message in the next possible hop to the destination. If the message is relayed to the Internet, there will be a log entry (at log level 16) in the log file.
This issue could be due to the user exceeding their quota. Either have the user delete old e-mail or increase the user's quota, if necessary. An administrator can check the amount of storage used by a single Oracle Mail account with the esd_list_user_folders.sql script. This script generates a report which lists all the user's folders and the size of each. The total space used is listed at the end of the report. Compare this value with the user's quota. If the SMTP Inbound server, the SMTP Outbound server, or both are configured to log at the Notification level, they will log ESSM-203 messages when an e-mail message is processed for a user that is over quota.
See Also:"Modifying E-mail User Attributes" for information on how to view a user's quota attribute
Check the local queue using the esd_mail_queue.sql script to see if e-mails are queued due to temporary problems. A message that has been in the local queue for only a short time is not generally a problem. The initial delivery attempt may have failed due to the user's Inbox folder being locked, which can occur when another process is delivering a message to the user's folder or when another server, such as IMAP, is accessing the Inbox.
Delivery failure due to folder lock is a temporary problem that typically requires no action by the administrator. E-mail will be delivered shortly after the Inbox is unlocked by the process that currently has it locked. A message that has been in the local queue for a long time may be a more serious problem.
If the user's Inbox has somehow become permanently locked, contact Oracle Support for assistance in resolving the issue. A permanently locked Inbox is a likely cause if many messages for the user are stuck in the local queue for an extended period of time. The utllockt.sql database diagnostic script may be useful in diagnosing database lock issues.
See Also:"utllockt.sql" for script usage information
The most common reason for this is the MTA queue processor going to sleep when there are no messages in the queue. The installation default is to have queue processors sleep for 2 minutes, configured by the
oidadmin to adjust this setting to edit the information in the Oracle Internet Directory server. Using
oidadmin, navigate to the settings for the SMTP Outbound server and update the value. You can decrease this interval to as little as 1 second, but keep in mind that very small values will cause the SMTP servers to poll more frequently and will increase the load on the Applications Tier CPU and also on the Oracle Collaboration Suite Database.
This can occur when the MTA gets terminated abruptly due to an internal error. When this happens, a large number of messages can remain waiting to be requeued, resulting in half an hour of recovery delay. A bad message in the queue could potentially cause the MTA to go down every half an hour (or whenever it retries) and that could cause long delays for other messages.
When the MTA goes down repeatedly, look for the oldest messages in the queue. This can be determined based upon how frequently new log directory entries get created. Under normal circumstances, messages must not be in the queue for more than an hour.
If some messages get stuck for days, contact Oracle Support.
If the log level is sufficiently high, check whether the name resolution is returning all the expected recipient addresses
Examine all the child messages created for the delivery (1 for every 1000 recipients) using the esd_find_message.sql script located in the $ORACLE_HOME/oes/admin directory, ensure that all the recipients are present, and then look for any delivery errors.
The List Server may not be asking for DSNs in some cases. In that case, no failure notifications are sent back. Otherwise, a DSN is sent to the envelope's return path (es_envelope.mailfrom) and to the postmaster address.
If the MTA goes down during the name resolution and delivery, this could also result in partial deliveries.
Check the Oracle Internet Directory Query Entry Return Limit. This number should be at least equal to the size of the distribution list.
See Also:"List Server" for more information
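When checking the child messages for a large distribution list, it helps to know how many to expect. The text gives the ratio of one child message for every 1000 recipients. The Python sketch below turns that into a count, under the assumption (not stated in the text) that a final partial batch of fewer than 1000 recipients still gets its own child message. The function name is illustrative.

```python
RECIPIENTS_PER_CHILD = 1000  # from the text: 1 child message per 1000 recipients


def expected_child_messages(recipient_count):
    """Return the expected number of child messages, rounding up (assumed)."""
    if recipient_count < 0:
        raise ValueError("recipient_count must be non-negative")
    # Integer ceiling division: a partial final batch counts as one message.
    return -(-recipient_count // RECIPIENTS_PER_CHILD)
```

Under that assumption, a list with 2500 subscribers would be expected to produce 3 child messages. Finding fewer than expected suggests that name resolution did not return all recipients.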
If the distribution list is very large, meaning the number of subscribers is large, it is possible that the List Server is still processing the list.
Check if any users have exceeded their quota and so have not yet received the mail from a distribution list.
Check if the users have suspended their subscription from the distribution list, in which case resume their subscription.
A mail loop can be defined as a message that is being sent from one account to another and back again. One way this can happen is when a user has two separate accounts, for example a Yahoo e-mail account and a corporate e-mail account. Before leaving on vacation, the user configures their Yahoo account to forward all messages to their corporate account and also sets a vacation auto-reply on their corporate account. Any mail sent to the Yahoo account is then forwarded to the corporate account. When the corporate account receives that e-mail, it sends the vacation auto-reply back to the Yahoo account, which is set to auto-forward, so the reply returns to the corporate account and the messages get caught in a mail loop.
Mail loops result in a sudden increase in the number of messages coming into the system. The queues may start backing up as a result. Use the esd_mail_queue.sql script to get details of the messages in the queue.
After identifying a user with a mail loop, delete the rule using oesrl and notify the user of the problem.
Note:When a server crashes it will be restarted by
opmn. A line similar to the following is written to
05/04/07 14:37:03  Process Crashed: email~email_smtp_out~111266499464438837~1 (662306867:5491) - Restarting
Check the timestamps on these messages. If the timestamps are close together and if there are a lot of messages like this, the processes are frequently crashing.
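The timestamp comparison can be automated. The Python sketch below parses lines shaped like the "Process Crashed" example above and flags a component that crashes several times within a time window. Several details are assumptions: the timestamp is read as yy/mm/dd, the component name is taken to be the second "~"-separated field of the process identifier, and the one-hour window and threshold of three are arbitrary illustrative values.

```python
import re
from datetime import datetime, timedelta

CRASH = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+Process Crashed: (\S+)"
)


def crash_times(lines, fmt="%y/%m/%d %H:%M:%S"):
    """Return {component: [datetime, ...]} for every 'Process Crashed' line."""
    result = {}
    for line in lines:
        match = CRASH.match(line)
        if not match:
            continue
        parts = match.group(2).split("~")
        component = parts[1] if len(parts) > 1 else parts[0]  # assumed layout
        when = datetime.strptime(match.group(1), fmt)  # yy/mm/dd is assumed
        result.setdefault(component, []).append(when)
    return result


def is_crash_looping(times, window=timedelta(hours=1), threshold=3):
    """True if at least `threshold` crashes fall within any span of `window`."""
    ordered = sorted(times)
    for i in range(len(ordered) - threshold + 1):
        if ordered[i + threshold - 1] - ordered[i] <= window:
            return True
    return False
```

Calling is_crash_looping on each list returned by crash_times gives a quick yes or no per component. The window and threshold should be tuned to what counts as "close together" on the system in question.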
Causes for this problem include:
Anti-Spam: Message is identified as spam
Anti-Virus: Attachment contains a known virus
Message Size: Message size meets or exceeds the maximum message size accepted
System rule or user rule criteria for rejection: MTA has evaluated rules stating that message should be rejected if it meets certain criteria
Check the log files for entries showing that a message was rejected
Check the message routing policy parameters in the Policy subtab of the Administration tab of the Oracle WebMail client and confirm the entries listed for rejection. If an entry matches a sender that should be allowed to send e-mail to this application, that entry should be removed or updated to allow the sender only.
See Also:"Configuring Routing Control for Incoming Mail" for more information about setting routing control policies
Check the Maximum Message Size parameter on the SMTP inbound process
Check system rules
Causes for this problem include:
Server is being spammed and there is a spike in the mail traffic
Check the log files for frequent temporary failures and check if many log directories are getting created due to the server going down. Log files can show which message is getting processed when the server goes down. Moving problematic messages to a temporary queue will clear the requeued messages that were picked up before and could not be processed.
Check for network contention or problems with DNS server
Check LDAP communication and Oracle Internet Directory for any performance problems
Check database for problems with insertion and any other performance issues with the Oracle Collaboration Suite Database
Often a single message can cause problems. These can be messages that are malformed or have a characteristic that is not acceptable, such as being in the form of spam or having a virus attachment to the message.
This section addresses problems with a single message, including:
See Also:"It takes a long time to download a mail message" for more information
Check for specific rules. Rules are kept in two places: in Oracle Internet Directory and in the user_source view on the Oracle Collaboration Suite Database associated with the user.
The Oracle Internet Directory information is stored in XML format and is what the user sees in Oracle Collaboration Suite. The user_source view contains the procedural information and is what messages are actually checked against when they come through.
The following example shows what a rule can look like in the directory and in the database.
Example A-1 Structure of a Rule
RGMUM1:UM903v2 % oesrl -p firstname.lastname@example.org
<account qualifiedName="JANE.DOE@ACME.COM" ownerType="user" id="0">
  <rulelist event="deliver">
    <rule description="OEM Alerts US" group="all" active="yes" visible="yes">
      <condition negation="no" junction="or">
        <condition negation="no" junction="and">
          <attribute tag="rfc822to"/>
          <operator caseSensitive="no" op="contains"/>
          <operand>oemalerts_us@ACME.COM</operand>
        </condition>
      </condition>
      <action>
        <command tag="moveto"/>
        <parameter>/jane.doe/INBOX/oemalerts_us</parameter>
      </action>
    </rule>
    <rule description="gmapocsDBA" group="all" active="yes" visible="yes">
      <condition negation="no" junction="and">
        <condition negation="no" junction="and">
          <attribute tag="rfc822from"/>
          <operator caseSensitive="no" op="contains"/>
          <operand>email@example.com</operand>
        </condition>
      </condition>
      <action>
        <command tag="moveto"/>
        <parameter>/jane.doe/INBOX/gmapocsDBA</parameter>
      </action>
    </rule>
    <rule description="GM Team" group="all" active="yes" visible="yes">
    ......
The first rule states that if a message comes into the user's inbox with the
oemalerts_us@ACME.COM, move it to the oemalerts_us folder, which is a subfolder of the inbox.
It appears as follows in
SQL> select text from user_source where name like 'DELIVER_19225%';

TEXT
--------------------------------------------------------------------------------
PROCEDURE deliver_19225 AS
BEGIN
IF ((UPPER(es_rule.rfc822to) LIKE '%'||UPPER('oemalerts_us@ACME.COM')||'%')) THEN
es_rule.moveto('/jane.doe/INBOX/oemalerts_us');
END IF;
All rules in
user_source have a name of
number is the user's user ID (same as the
folder_id of the INBOX). When rules are created, they also trigger the creation of a procedure.
You can query against user_source to find other rules, such as ones that someone might have set up to delete messages or to BCC an account.
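When reviewing the directory side of a user's rules, the XML shown in Example A-1 can be summarized with a short script. The Python sketch below assumes complete, well-formed XML with the element and attribute names visible in that example (rule, attribute, operand, command, parameter). It reports only the first condition and the first action of each rule, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET


def list_rules(account_xml):
    """Return one summary dict per <rule> element in an account's rule XML."""
    root = ET.fromstring(account_xml)
    rules = []
    for rule in root.iter("rule"):
        attribute = rule.find(".//attribute")  # first condition only
        operand = rule.find(".//operand")
        command = rule.find("./action/command")  # first action only
        parameter = rule.find("./action/parameter")
        rules.append({
            "description": rule.get("description"),
            "attribute": attribute.get("tag") if attribute is not None else None,
            "operand": operand.text if operand is not None else None,
            "command": command.get("tag") if command is not None else None,
            "parameter": parameter.text if parameter is not None else None,
        })
    return rules
```

Scanning the "command" values across users is one way to spot rules whose actions are unexpected, which can then be compared with the generated procedures in user_source.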
Any of the Oracle Mail applications can experience problems when the physical or virtual resources that the installation depends on are not available. When these resources are depleted, the dependent area of the application is affected.
The following categories have been identified as those that directly affect the operation of the Oracle Mail application:
This section includes the following topics:
If you can bind to a protocol server but not issue a successful command, check the log files for database related problems.
If you cannot bind, check to see if the listener is up.
See Also:"Checking the health of the e-mail protocol server listener" for instructions on checking the status of the listener
If processes will not start using the Application Server Control Console for Collaboration Suite, and there is still no access, check the log files for possible issues.
Ensure that the server's parameters are configured correctly. Incorrect configuration can prevent the server from starting.
Check for Oracle Collaboration Suite Database performance issues.
One or more user folders are locked: Some transaction is holding a lock on ES_USER records for one or more users. This leads to the IMAP server consuming more and more database connections, because requesting clients time out a request and then issue a similar request, each time causing the IMAP server to take another connection from the pool.
Use regular lock detection SQL scripts, such as utllockt.sql, or Oracle Enterprise Manager 10g to detect this situation.
The oesmon output for the database connection dump (taken a few minutes apart) will also show the same users still executing the same statements.
As a solution, end the session holding the lock and contact Oracle Support with all the information about the ended session.
Client connections can be rejected under high load, typically due to the following:
All available sessions have been consumed, so no database sessions are available to service client requests to send mail. This can be observed from the log message Getting service context from pool failed. Increasing the number of available sessions for the SMTP Inbound server instance can reduce the number of rejections. However, this increase should not push the total number of database sessions needed by the different server instances running against the database too high.
All threads up to the configured maximum have been consumed and new client requests result in failures. This can be observed from the log message No worker available. Increasing the maximum allowed threads will reduce client connect rejections. Consider increasing this limit in increments of 100.
If it is taking a long time to connect to the SMTP server, the most likely cause is either a slow network or an overloaded Oracle Collaboration Suite Applications Tier protocol server. Neither a database nor an Oracle Internet Directory connection is necessary for the initial greeting. If the Applications Tier is not overloaded, begin to trace the network request to find the location of network congestion.
Messages that are accepted for delivery are queued up into different queues based on the recipient list. The queues are defined as submit, local, relay, and list. Messages can become deferred or delayed, based on a problem that the message may be experiencing. These problems can be caused by network problems, system resource problems, or message content. However, there can also be issues with an overabundance of messages that can delay e-mail delivery.
This section addresses problems with the mail queues, including:
Not unusual but should be evaluated
Consider increasing the number of SMTP Inbound and Outbound instances
Consider increasing the number of pool connections to the LDAP and database servers
See Also:"Modifying Parameter Settings for a Specific Server Instance" for information about editing SMTP server parameter settings
Possible problematic bad message in the queue that causes the server to go down. Although very rare, it has been known to happen.
On UNIX systems, a symptom of this is core files in the log directories. A message that causes the mail servers to go down will do so repeatedly since it never gets removed from the queue. On UNIX systems look for core files with the following command:
% find $ORACLE_HOME/oes/log/um_system/smtp* -name core
If core files are found, contact Oracle Support.
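The same search can be run from a monitoring script. The Python sketch below mirrors the find command above with two differences that should be kept in mind: it reports only regular directory entries listed as files (find -name core would also match a directory named core), and the log root is passed in as an argument. The function name is illustrative.

```python
import glob
import os


def find_core_files(log_root):
    """Return paths of files named 'core' under any smtp* directory in log_root."""
    found = []
    for smtp_dir in sorted(glob.glob(os.path.join(log_root, "smtp*"))):
        for dirpath, _dirnames, filenames in os.walk(smtp_dir):
            if "core" in filenames:
                found.append(os.path.join(dirpath, "core"))
    return found


if __name__ == "__main__":
    oracle_home = os.environ.get("ORACLE_HOME")
    if oracle_home:
        # Directory layout taken from the find command in the text.
        root = os.path.join(oracle_home, "oes", "log", "um_system")
        for path in find_core_files(root):
            print(path)
```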
The incoming message rate can be higher than the processing rate, which can result in Local queue growth. Monitor the Length of Local Queue metric of the Oracle Collaboration Suite Database target using the Grid Control Console. The esd_mail_queue.sql script can also be used to examine the current contents of the Local queue.
If the count of messages in the Local queue continues to increase, you can infer that the system is not able to handle the incoming load.
Possible reasons for this are:
Lack of system resources. A slow Oracle Collaboration Suite Applications Tier, a slow database, or a slow Oracle Internet Directory are some possibilities. SMTP and IMAP have been known to spin in the past and consume all the CPU resources on the Applications Tier.
Large amount of incoming spam mail. This can be detected by looking at the sender's address in the ES_ENVELOPE table. If the envelope's MAILFROM is not like one of the local domains and is not null (<>), it has to be of external origin. Spam filters can be turned on to block those senders.
Use the esd_queue_examine.sql script located in the $ORACLE_HOME/oes/admin directory to determine if there is a large amount of spam in the incoming queue.
Insufficient SMTP Inbound and Outbound instances. If the MTAs are processing at the expected rate and the queue is still growing, increase the number of MTAs.
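Whether the Local queue count continues to increase can be judged from a handful of readings taken at regular intervals. The following is a minimal sketch and is not part of the product. It assumes that the readings (for example, values of the Length of Local Queue metric) have been saved one per line, oldest first, in a file of your choosing. The function name queue_trend and the file layout are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: classify a series of queue-length readings read from stdin.
# Assumption: one non-negative integer per line, oldest reading first.
queue_trend() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        n == 0  { first = $1 + 0; prev = first; n = 1; next }
        {
            if ($1 + 0 < prev) dips++         # the queue shrank at least once
            prev = $1 + 0
            n++
        }
        END {
            if (n < 2) { print "insufficient data"; exit }
            if (dips == 0 && prev > first) print "growing"
            else print "not steadily growing"
        }'
}
```

For example, `queue_trend < local_queue_samples.txt` prints `growing` only when no reading is lower than the one before it and the last reading exceeds the first. The file name is hypothetical.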
If the count of messages with NULL modified_date is very low, delivery must be failing for some reason. SMTP Inbound and Outbound server log messages located in the $ORACLE_HOME/oes/log/um_system/smtp_out directories, respectively, should give an indication of what is happening.
Possible reasons for local mail delivery failure are:
Target database is down; log files will show ORA-XXXXX errors.
Oracle Internet Directory is down or name resolution is failing. The log files will indicate the Oracle Internet Directory errors.
Folder Locks are a rare occurrence, unless a large amount of mail is being delivered to a small number of users. Another possibility is that one of the user folders could have been locked by a nonexistent process or a session. This should only block mail for a single user, but could result in a lot of requeues.
Check for the following:
One of the most common problems is incorrect DNS setup or slow DNS servers. A failure in DNS lookup will result in relay failure. These errors can be seen in the SMTP Outbound log located in the
Failed to connect to foreign MTA
Causes for failure include:
The remote host is refusing the connection due to reverse DNS lookup failure or a spam check failure. If the relaying MTA is not one of the MX hosts of the domain and does not have a PTR record in the DNS, the foreign host might not allow the connection. Sometimes, the relaying hosts can get blacklisted, denied connection, or both, if they are acting as open relays.
The relaying host is unable to make connections outside the Oracle Collaboration Suite installation due to firewall problems.
If the system is relaying locally to one or more local Oracle Collaboration Suite Databases, the local relay mail can get stuck when one of the databases is down or not accepting mail fast enough.
Mail loops caused by incorrect setup
Messages bouncing between the Oracle Collaboration Suite and external agents, such as spam filters and virus scanners, due to incorrect address rewriting or setup.
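The "Failed to connect to foreign MTA" messages mentioned above can be tallied per log file to see how widespread relay failures are. The following sketch assumes only that the SMTP Outbound logs are plain text files beneath a directory that you pass as an argument. The function name relay_failures is hypothetical.

```shell
#!/bin/sh
# Sketch: print "file:count" for every log file under the given directory
# that contains at least one relay connection failure.
relay_failures() {
    # /dev/null forces grep to prefix each count with the file name,
    # even when find passes only a single file.
    find "$1" -type f \
        -exec grep -c 'Failed to connect to foreign MTA' /dev/null {} + |
        awk -F: '$NF > 0'
}
```

Run it as `relay_failures log_directory`, where log_directory is the location of the SMTP Outbound logs on your system. A count that keeps rising for the same destination suggests one of the causes listed above rather than a transient network error.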
Analyze the queue to check whether the backlog is in the submit queue, and to check the density of messages with similar subjects, senders, recipients, or domains within the queue, using the
See Also: "esd_queue_examine.sql" for script usage information
A large density of messages with similar subjects, senders, recipients, or domains usually occurs for one of two reasons: the system has been spammed with unwanted mail, or there is a mail loop. Looking at the subject and headers is usually sufficient to distinguish between the two possibilities. If one needs to look further at the body of the message, one can use the
First, look at the queue using the esd_mail_queue.sql script, which lists all messages in the queue along with their message ID. Find the ID of the message of interest. Then, use the esd_show_message.sql script to look at detailed information about that message.
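The density of messages by sender domain, discussed above, can also be summarized outside the database. The following is a sketch under stated assumptions: the envelope sender addresses have already been exported to a text file, one address per line, by whatever means is available to you. The export step is not shown, and the file layout and the function name sender_domain_density are hypothetical.

```shell
#!/bin/sh
# Sketch: count queued messages per sender domain, most frequent first.
# Input (stdin): one envelope sender per line, with or without angle
# brackets. Null senders (<>) and blank lines are ignored.
sender_domain_density() {
    awk '
        { addr = tolower($1); gsub(/[<>]/, "", addr) }
        addr == "" { next }
        {
            n = split(addr, part, "@")
            if (n == 2) count[part[2]]++
        }
        END { for (d in count) print count[d], d }' |
        sort -k1,1nr -k2,2
}
```

A single external domain that dominates the output points toward spam or a mail loop. An even spread across many domains points toward ordinary load.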
The Housekeeper will perform the following sequence of events during its processing of the e-mail application:
Note: Some of these events may not apply, and most are configurable.
See Also: "Modifying Parameter Settings for a Specific Server Instance" for information about editing Housekeeper server parameter settings
Expiration of regular messages
Pruning of processed messages in queues
Pruning of expunged messages
Collection of pruned messages
Moving old messages to tertiary storage
Text index synchronization
Text index optimization
This section addresses problems with the Housekeeper, including:
Determine which housekeeping tasks the system is running behind on. The possible areas include the pruning queue and the collection queue.
If the pruning queue is very large, consider increasing the Concurrency Level parameter on the Housekeeper server instance that performs the pruning task. In most cases, a concurrency level of 4 to 6 is more than adequate to handle a significant backlog.
If the collection queue is very large, consider increasing the Concurrency Level parameter on the Housekeeper server instance that performs the collection task.
If an instance is configured to perform both pruning and collection, the increased concurrency level will apply to both tasks upon process refresh.
Check if the Frequency of Execution of Housekeeper Process parameter is set unreasonably low. A frequency of every 60 or 120 minutes is recommended.
First, check whether the Collection parameter is configured on any Housekeeper instance.
If such an instance exists and the processes associated with the instance are running, the system may experience data inconsistencies or data corruptions. The next step is to check the Housekeeper log file on the Oracle Collaboration Suite Applications Tier. If log file directories are getting created rapidly, it indicates that the server has crashed. Locate the core dump, if present, and the log file content and contact Oracle Support.
If this is not the case but there are ORA-XXXXX errors present in the log file, check the errors and see if they are fixable. If not, contact Oracle Support.
If there are no errors reported in the log file but the log file grows at a very slow rate, consider increasing the Concurrency Level parameter on the instances with the Collection parameter enabled. Also check that Log Miner Recovery is disabled in the Housekeeper server configuration debug parameters. Enabling this parameter causes collection to slow down 300%. Therefore, if Log Miner Recovery is not going to be used in the system, it is recommended that this parameter be disabled.
Check whether Oracle Text is installed and configured correctly. If an installation does not use Oracle Text and it was not activated during installation, Oracle Mail might be adversely affected. Usually, installation log files show whether Oracle Text index-related data are configured and created correctly. If not, the system is left with invalid Oracle Text indexes, which cause errors when the Housekeeper tries to delete entries from related tables. In that case, if the system is not configured to enable text search in e-mail messages, drop whichever index is reported in the log file.
First, check whether the Housekeeper instances have the Pruning parameter enabled. If the Housekeeper instances are verified to be running correctly, monitor the length of the ES_QUEUE entries and see if they drop at a reasonable rate. Pruning causes the collect queue to grow. Do not be alarmed if this behavior is observed.
Next, check server log files on the backend database $ORACLE_HOME to see if there are any errors reported. If core dumps are present, contact Oracle Support. Analyze and fix, if possible, any ORA-XXXXX errors found in the log file. Otherwise, contact Oracle Support. ORA-XXXXX errors occur very rarely in pruning logs.
The Oracle Collaboration Suite Database stores all of the e-mail messages, text indexing, and folders of every account that is authorized for access. After successful Oracle Internet Directory authentication, a user is passed to a single database connection for message access. Access can fail or be denied, and the problem can be local to a single account or global to all accounts. Complaints from the user community usually make it easy to recognize a global problem.
This section includes the following topics:
If the network suddenly goes offline, this can cause massive disruption to end-user connectivity, as well as e-mail process communication (in a distributed environment) and end-user communication, depending upon the severity of the outage.
It may be necessary to shut down the protocol processes until the network is stable. Once the network stabilizes, bring all processes back online.
SQL*Net is the protocol that enables access to the database. There is an instance of the SQL*Net listener on both the infrastructure and storage tiers.
The following problems can occur if the SQL*Net listener is down:
Users cannot connect to Oracle Internet Directory for authentication.
When authenticated with Oracle Internet Directory, the user is passed to the storage database. If the SQL*Net listener is down, users cannot access the Oracle Collaboration Suite Database.
Check the listeners on both the infrastructure and the Oracle Collaboration Suite Database locations by running the following commands:
On each system where the database resides:
$ lsnrctl status
On the Applications Tier system, if $ORACLE_HOME/network/admin/tnsnames.ora is configured there to access the databases; otherwise, on the infrastructure and Oracle Collaboration Suite Database tiers:
$ tnsping connect_string
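When several databases are involved, the tnsping check above can be repeated for each connect string. The following is a minimal sketch, assuming that tnsping is on the PATH and that it reports success with a line containing "OK (". The connect strings are supplied by you, and the function name check_connect_strings is hypothetical.

```shell
#!/bin/sh
# Sketch: run tnsping against each connect string given as an argument
# and report which ones respond. Returns nonzero if any do not respond.
check_connect_strings() {
    rc=0
    for cs in "$@"; do
        # Assumption: a successful tnsping prints a line such as "OK (10 msec)".
        if tnsping "$cs" 2>&1 | grep -q 'OK ('; then
            echo "$cs reachable"
        else
            echo "$cs UNREACHABLE"
            rc=1
        fi
    done
    return $rc
}
```

For example, `check_connect_strings infra_db mail_store_db` checks two connect strings. Both names here are placeholders for entries in your own tnsnames.ora file.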
The following symptoms could be indicative of a slow database:
Slow response times for users opening folders, reading mail, and sending mail (but not authenticating, because only Oracle Internet Directory is contacted for authentication).
Large number of database connections. If the database is slow to handle a request, the protocol servers can request a new database connection for the next unit of work that arrives. If the database is slow because of disk constraints, or some other hardware resource issue, increasing the database pool can make matters worse. The increased connections can tax a loaded database even further when a flood of new database requests comes in, each taking its own database and operating system memory. Whether the database is under stress can best be analyzed by a DBA and through tools such as Statspack or Oracle Enterprise Manager 10g.
The Housekeeper queues continue to grow and never catch up until a decrease in activity, such as over the weekend.
Mail delivery slows down.
Users occasionally see an "unable to retrieve database connection" error upon login. This causes the Applications Tier to slow down if the maximum connection pool exceeds the available memory on the system.
If archive logging is enabled, all of the transactions are saved to a file for recovery purposes.
See Also: "Oracle Mail Archive Policies" for more information
To check the current space usage:
Change directory to the designated archive log directory or partition.
From the system command prompt, execute the following command to check the available space on the current disk drive:
$ df -k
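The space check above can be scripted so that it raises a warning before the archive log partition fills. The following is a sketch only. The 90 percent default threshold and the function names are assumptions, and the directory is whichever one your database uses for archive logs.

```shell
#!/bin/sh
# Sketch: report how full the file system holding a directory is.
# df -kP produces the portable POSIX layout, in which the fifth column
# of the second line is the capacity in percent.
archive_usage_pct() {
    df -kP "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Warn when usage reaches the limit (default 90, an arbitrary assumption).
check_archive_space() {
    dir="$1"
    limit="${2:-90}"
    pct=$(archive_usage_pct "$dir")
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $dir is ${pct}% full"
        return 1
    fi
    echo "OK: $dir is ${pct}% full"
}
```

For example, `check_archive_space /path/to/archive_logs 85` could be run from a scheduled job. The path shown is a placeholder.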
A normal routine of backup should be performed and confirmed. Afterwards, the files in the directory can be purged with the exception of the current log file.
If this directory is not backed up and the directory partition reaches full capacity, then the database stops until one of two things occurs to relieve the disk space:
Old archive files are moved off this partition to another partition, or
A backup is performed of the archive files to a storage medium for future recovery purposes.
If the Oracle Mail storage tablespaces run out of extents, e-mail delivery and end-user e-mail message commits fail. Check the database alert logs for any tablespace full errors.
Table A-1 lists tablespaces upon which to focus an investigation should Oracle Mail tablespaces run out of extents.
Table A-1 Oracle Mail Storage Tablespaces
|Tablespace||Tables Contained Within|
Contains the largest tables:
Contains the smaller tables, including all of the other tables not listed. The
Contains the most frequently used tables, including
Contains the table indexes from the
Contains table indexes from the
Contains only the
To display a summary of available space of all tablespaces, execute the following SQL statement as
SQL> select tablespace_name, sum(bytes) from dba_free_space group by tablespace_name order by sum(bytes);
The command returns the following:
TABLESPACE_NAME                  SUM(BYTES)
-------------------------------------------
XDB                                  262144
EXAMPLE                              458752
USERS                                983040
ESINFREQIDX                         2490368
SYSTEM                              3211264
ESSMLTBL                            3407872
ESPERFTBL                           5046272
ESFREQTBL                           9568256
ESFREQIDX                           9633792
ESNEWS                             10223616
ESTERSTORE                         10223616
ESBIGTBL                           17039360
ESORATEXT                          20185088
ESMRLMNR                           52297728
………                                     ………
For each table within the tablespaces listed there is a NEXT_EXTENT column that has a particular size allocated, by default. As space decreases, the tablespace seeks more space to accommodate its NEXT_EXTENT setting. If there is not enough space, it fails to extend and the application begins to receive errors.
Solution: If space remaining is depleted, add another data file to the tablespace experiencing problems.
See Also: Oracle Database Administrator's Guide for details about adding datafiles to the tablespace
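The free-space summary returned by the query in this section can be scanned for tablespaces that are running low. The following sketch assumes that the query output has been saved or piped as plain text with two columns, as in the sample output. The byte threshold is chosen by you (for example, a value near the largest NEXT_EXTENT in use), and the function name low_free_tablespaces is hypothetical.

```shell
#!/bin/sh
# Sketch: print the names of tablespaces whose free space, in bytes,
# is below the threshold given as the first argument.
# Input (stdin): lines of "TABLESPACE_NAME SUM(BYTES)"; the header and
# separator lines are skipped because their second field is not numeric.
low_free_tablespaces() {
    awk -v min="$1" '
        NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 < min + 0 { print $1 }'
}
```

For example, `low_free_tablespaces 3000000 < free_space.txt` lists every tablespace with less than 3,000,000 bytes free. The file name is a placeholder.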
This section discusses various debugging strategies to aid in troubleshooting.
This section includes the following topics:
Checking the health of the e-mail protocol server listener
The listener for Oracle Mail is called listener_es, by default. Execute the following command to check the listener status:
$ lsnrctl stat listener_es
Example A-3 shows typical output from the command:
Example A-3 Status of Listener
LSNRCTL for Linux: Version 126.96.36.199.0 - Production on 06-FEB-2004 11:23:32
Copyright (c) 1991, 2001, Oracle Corporation. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=UMREG)))
STATUS of the LISTENER
------------------------
Alias                     listener_es
Version                   TNSLSNR for Linux: Version 188.8.131.52.0 - Production
Start Date                17-DEC-2003 22:41:00
Uptime                    50 days 12 hr. 42 min. 32 sec
Trace Level               off
Security                  OFF
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/v2/network/admin/listener.ora
Listener Log File         /u01/app/oracle/product/v2/network/log/listener_es.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=UMREG)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rgmum9.us.oracle.com)(PORT=25))(PRESENTATION=ESSMI))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rgmum9.us.oracle.com))(PORT=143))(PRESENTATION=IMAP))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rgmum9.us.oracle.com))(PORT=110))(PRESENTATION=POP))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rgmum9.us.oracle.com)
Services Summary...
Service "ESSMI" has 2 instance(s).
  Instance "um_system", status READY, has 1 handler(s) for this service...
  Instance "um_system", status READY, has 1 handler(s) for this service...
Service "ESSMIAMOCS" has 2 instance(s).
  Instance "um_system", status READY, has 1 handler(s) for this service...
  Instance "um_system", status READY, has 1 handler(s) for this service...
Service "IMAP" has 2 instance(s).
  Instance "um_system", status READY, has 1 handler(s) for this service...
  Instance "um_system", status READY, has 1 handler(s) for this service...
The command completed successfully
Note: In the example, instances refers to processes. There are two inbound SMTP servers connected to the listener.
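The Services Summary portion of the listener status can be condensed into one line per service, which makes a missing or unready process easier to spot. The following is a sketch that assumes the output layout shown in Example A-3. The function name listener_ready_counts is hypothetical.

```shell
#!/bin/sh
# Sketch: read lsnrctl status output on stdin and print, for each
# service, the number of instances whose status is READY.
listener_ready_counts() {
    awk '
        /^ *Service "/ {
            split($0, q, "\"")      # q[2] is the quoted service name
            svc = q[2]
            ready[svc] += 0         # list the service even if none are READY
            next
        }
        /^ *Instance "/ && /status READY/ { ready[svc]++ }
        END { for (s in ready) print s, ready[s] }' |
        sort
}
```

For example, `lsnrctl stat listener_es | listener_ready_counts` would print a line such as `ESSMI 2` for the listener in Example A-3.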
Checking memory, PGA memory, and number of processes connecting from MTAs to an Oracle Collaboration Suite Database
Occasionally, memory usage must be checked in the Oracle Collaboration Suite Databases due to various issues with program global area (PGA) memory usage within the databases. First, check to see how many connections (and what type) are coming into the database.
Connect to the database as
es_diag and run the
Note: The number of Oracle Mail server database connections is determined using the Oracle Enterprise Manager 10g Application Server Control Console, as described in Chapter 3.
To check PGA memory usage, use the following script:
set pages 9999
select s.sid, s.program, st.value
  from v$session s, v$sesstat st
 where s.sid = st.sid
   and statistic# = 20
   and s.program like 'es%'
 order by 3;
The output is similar to the following:
303 esimapds@rgmum6 (TNS V1-V3)                  3339424
164 esimapds@rgmum13 (TNS V1-V3)                 3500144
285 esimapds@rgmum13 (TNS V1-V3)                 3735304
 82 esimapds@rgmum13 (TNS V1-V3)                 4394984
125 firstname.lastname@example.org (TNS V1-V3)   7911248
The first column contains the SID, the second the program name, and the third the amount of PGA memory consumed, in bytes. In this example, there is a List Server instance from rgmum2 using about 7.5 MB of PGA memory on this database instance. If you see processes consuming more than 5 or 6 MB, they should be investigated and bounced, if necessary.
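The PGA query output can be filtered so that only sessions above the 5 to 6 MB guideline are shown. The following sketch assumes the three-column layout shown above, with the SID first and the byte count last. The default limit of 5 MB and the function name large_pga_sessions are assumptions.

```shell
#!/bin/sh
# Sketch: read the PGA query output on stdin and print the SID and the
# PGA size in MB for each session above the limit (default 5 MB).
large_pga_sessions() {
    limit_mb="${1:-5}"
    awk -v limit_mb="$limit_mb" '
        # Data rows start with a numeric SID and end with a byte count.
        $1 ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ {
            mb = $NF / 1048576
            if (mb > limit_mb) printf "%s %.1f MB\n", $1, mb
        }'
}
```

Applied to the sample output above with the default limit, the function reports only SID 125.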