A Troubleshooting Oracle Mail

This appendix describes how to identify and resolve issues affecting Oracle Mail. It assumes that the system previously behaved correctly under a similar load, that something has since changed, and that a problem has now appeared.

See Also:

"Tuning Oracle Mail" in Oracle Collaboration Suite Administrator's Guide for tuning due to an increased load

This appendix includes the following topics:

Becoming Aware of a Problem
End-User Issues
Oracle Mail Application Problems, covering common problems with mail protocols, mail queues, and housekeeping
Problems with Oracle Collaboration Suite Database
Debugging Oracle Mail

Becoming Aware of a Problem

An administrator can be alerted to problems through the following methods:

Oracle Enterprise Manager 10g, including the Oracle Enterprise Manager 10g Grid Control Console and Oracle Enterprise Manager 10g Application Server Control Console for Collaboration Suite

An administrator can be alerted when a metric or beacon transaction exceeds a threshold. Alerts can be triggered by any of the Oracle Mail metrics, such as delivery time or slow protocol transaction times, or by availability dropping below a defined level. An alert can be generated for any event that could degrade, or has substantially degraded, the quality of service.

Tip:
Oracle Enterprise Manager Configuration for Oracle Collaboration Suite for information about setting thresholds
Help desk

A user calls in with a complaint.
Log files

Scripts and notification tools can be set up to examine log files periodically for errors. Viewing log files is the principal way to identify process-related issues. Oracle Enterprise Manager 10g can display the Oracle Mail-related logs, and Oracle Mail ships the esd_logscan.pl script, located in the $ORACLE_HOME/oes/admin directory, for searching log files.

Process logs can be found in the $ORACLE_HOME/oes/log/server-type directory. For example, List Server logs are found in the $ORACLE_HOME/oes/log/list directory.

See Also:
Appendix B, "Oracle Mail Error Messages" for a list of error messages, possible causes, and actions to resolve the errors
Scripts and command-line tools

The oesmon utility provides command-line access to near real-time metric data in raw format. Oracle Enterprise Manager 10g Grid Control samples this metric data periodically and displays it in charted form in the Grid Control Console. The oesmon utility contacts running servers directly and returns current values, whereas the Grid Control Console charts typically sample once every ten minutes.

See Also:
Chapter 7, "Monitoring Oracle Mail" for more information about oesmon

Oracle Mail also ships with various utility scripts in the $ORACLE_HOME/oes/admin directory for analyzing a running system. The names of these server diagnostic scripts all start with esd_. For example, the esd_mail_queue.sql script monitors mail queue activity and shows whether messages are stuck in the queues, and the esd_logscan.pl Perl script searches the log files for errors and formats the results so that they are easier to read than the raw log files.
Others

Ad hoc monitoring of the system by administrators, or other sources.
- CPU Usage
  
  Use a system monitoring tool such as top to view process CPU usage. A process that is consuming an unusually large amount of resources will appear at the top of the list.
- Memory Usage
  
  Excessive memory usage by one process degrades the performance of the other processes and of the system as a whole. Use a system monitoring tool such as top to view process memory usage.
- Disk Usage
  
  Every transaction in the database is written to an archive log file that is sized in advance by the database administrator. If transactions can no longer be written because of space constraints, the database stops responding, and so do all end-user connections. It is therefore imperative that the administrator monitor disk space to ensure continuous operation.
  
  Log file entries are continuously written to various locations within the Oracle Mail directory structure. These files can become quite large and can cause disruption if the file system fills up. Although the log files roll over when they reach their configured size, it is good practice to monitor the disks to which log files are written.
  
  Use the command df -k to view used and available disk space.
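As a minimal sketch of how this disk-space check can be scripted instead of reading df -k output by hand, the following Python function uses the standard shutil.disk_usage call to report the fraction of free space on the file system that holds a given path. The 10% threshold and the function name are arbitrary examples, not Oracle recommendations. Choose the paths to check, such as the archive log destination or $ORACLE_HOME/oes/log, for your own installation.

```python
import shutil

# Hypothetical threshold: warn when less than 10% of the file system is free.
MIN_FREE_FRACTION = 0.10


def check_free_space(path: str, min_free_fraction: float = MIN_FREE_FRACTION):
    """Return (free_fraction, ok) for the file system containing `path`.

    shutil.disk_usage reports total and free space for the mount point
    that holds `path`, roughly the figures that df -k shows.
    """
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction >= min_free_fraction
```

For example, check_free_space("/") returns the free fraction and whether it meets the threshold. A scheduler such as cron could run a wrapper around this function periodically and notify the administrator when the check fails.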

End-User Issues

Note:

This appendix addresses only issues that are specific to administrators. Troubleshooting information and FAQs for end users are documented separately.

End-user issues are commonly identified when a user contacts a help desk. These issues range from login problems to simple functionality problems, and they can affect a single account or a wider group of users.

When evaluating a problem, begin by determining whether it is local to one account or system-wide.

This section includes the following topics:

Problems with Mail Clients
Problems with Mail Delivery
Problems with a Single Message

Problems with Mail Clients

To determine the nature of a mail client issue, first find out which client the user is running. Some issues affect one client but not others, and some are caused not by the client itself but by the back-end application. The types of incidents reported help distinguish between these cases.

This section addresses problems with the mail clients, including:

User unable to log in to account
High login time experienced
It takes a long time to download a mail message
Unable to read a particular message
During peak periods, valid users receive Invalid Username and Password error
Cannot send mail
Server says password is incorrect but password has not changed
It takes a long time to open the Inbox
Oracle WebMail client issues

User unable to log in to account

When a user connects to the Applications Tier using any type of mail client, the first thing that occurs is connection establishment. For this to succeed, the Applications Tier must be listening for and accepting connections. When the connection is established, the next step is authentication of the user. This requires access to the Oracle Internet Directory server. After the user is authenticated, the user's Inbox folder is retrieved, which requires access to the Oracle Collaboration Suite Database holding the user's folders and messages. A problem encountered during any of these steps will prevent the user from accessing their account.

Some of the common problems that prevent users from successfully logging in are:

User password has expired
IMAP or POP servers are not configured to connect to all Oracle Collaboration Suite Databases
Servers have reached the maximum connections to the Oracle Collaboration Suite Database or Oracle Internet Directory

Diagnosing Connection Establishment

Issues with IMAP or POP client connection establishment can be diagnosed by using telnet to connect to the Applications Tier, for example:

```
telnet apptier.foo.com 143
```

where apptier.foo.com is the Applications Tier host machine and 143 is the IMAP port (the default POP3 port is 110). The connection should succeed and return a banner line indicating that the server is ready. To disconnect, enter 1 logout<ENTER> for an IMAP server or quit<ENTER> for a POP server, where <ENTER> indicates pressing the Enter key.

If the attempt does not succeed, check the status of both the listener and the server. On the Applications Tier host, use the command lsnrctl status listener_es to display the status of the listener and lsnrctl services listener_es to display the services provided by the listener. Use the command opmnctl status to display the status of the Applications Tier processes, and verify that the IMAP or POP server is running.
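The telnet check can also be scripted. The following Python sketch uses only the standard socket module. It connects, reads the first line the server sends, and compares that line with the standard greeting for the protocol: "* OK" for IMAP, "+OK" for POP3, and "220" for SMTP. The function names are illustrative. The sketch closes the connection without logging out, so the server may log an abrupt disconnect.

```python
import socket
import time

# Standard "ready" greetings: IMAP4rev1 (RFC 3501), POP3 (RFC 1939), SMTP (RFC 5321).
READY_PREFIXES = {"imap": "* OK", "pop": "+OK", "smtp": "220"}


def banner_is_ready(protocol: str, banner: str) -> bool:
    """True if the first server line is the protocol's 'ready' greeting."""
    return banner.startswith(READY_PREFIXES[protocol])


def read_banner(host: str, port: int, timeout: float = 10.0):
    """Connect, read the first line the server sends, and close.

    Returns (banner_line, seconds_elapsed). Connection failures raise
    OSError subclasses such as ConnectionRefusedError or socket.timeout.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="latin-1") as reader:
            line = reader.readline().rstrip("\r\n")
    return line, time.monotonic() - start
```

For example, line, secs = read_banner("apptier.foo.com", 143) followed by banner_is_ready("imap", line) reproduces the telnet test. The elapsed time is also a rough measure of how long the greeting takes to arrive.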

Diagnosing Oracle Internet Directory and Oracle Collaboration Suite Database Issues

To check for problems with the Oracle Internet Directory server or the Oracle Collaboration Suite Database, look for errors in the IMAP or POP log files. Specifically, a server logs ESCAPI-500 errors when it runs out of free connections in its Oracle Internet Directory connection pool, its Oracle Collaboration Suite Database connection pool, or both. Use esd_logscan.pl to search the log files for ESCAPI-500, and also for ORA-XXXXX, which is logged when database errors occur.
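For scripted monitoring, the same search can be approximated without esd_logscan.pl. The following Python sketch is a minimal stand-alone alternative, not a replacement for the shipped script. It counts ESCAPI-500 and ORA-nnnnn codes in a server's log directory. It assumes that the log files are plain text and that the error codes appear literally in the lines. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# ESCAPI-500 (connection pool exhausted) or any five-digit ORA- database error.
ERROR_CODE = re.compile(r"\b(ESCAPI-500|ORA-\d{5})\b")


def count_error_codes(lines):
    """Count each ESCAPI-500 / ORA-nnnnn occurrence in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(ERROR_CODE.findall(line))
    return counts


def scan_log_dir(log_dir):
    """Total the error codes over every file below log_dir, for example the
    $ORACLE_HOME/oes/log/<server-type> directory of the IMAP or POP server."""
    counts = Counter()
    for path in Path(log_dir).rglob("*"):
        if path.is_file():
            with path.open(encoding="latin-1") as f:
                counts.update(count_error_codes(f))
    return counts
```

A steadily growing ESCAPI-500 count between runs points to pool exhaustion. Distinct ORA- codes can then be looked up in the database error documentation.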

See Also:

"esd_logscan.pl" for script usage information

The following section contains additional information about Oracle Internet Directory server diagnostics.

High login time experienced

This problem is most likely to occur due to slower response time from the Oracle Internet Directory server. Run the following diagnostic tools for the Oracle Internet Directory server.
- ldapbind (found in $ORACLE_HOME/bin): Given the appropriate information (see the following usage example), the ldapbind command confirms communication with Oracle Internet Directory. If the connection is slow or fails, and all supplied parameters are correct, there may be an issue with either the Oracle Internet Directory monitor (oidmon) or the LDAP server process (oidldapd).
```
$ ldapbind options
```
  where options are as follows:
```
-D binddn   bind DN
-w passwd   bind password (for simple authentication)
-h host     LDAP server host
-p port     port on the LDAP server
-W Wallet   wallet location
-P Wpasswd  wallet password
-U SSLAuth  SSL authentication mode
-E Encoding character set
```
- ldapcheck (found in $ORACLE_HOME/ldap/bin): This command quickly checks whether the required Oracle Internet Directory processes are running (see the following usage example). It displays each process name and its operating system process ID (PID).
```
$ ldapcheck

Checking Oracle Internet Directory processes...

Process oidmon is Alive as PID 8158
Process oidldapd is Alive as PID 8196
Process oidldapd is Alive as PID 8186
Not Running ---- Process oidrepld
```
Check that connections between IMAP and the Oracle Internet Directory server are not timing out or being dropped because of any network issues.
Check the CPU usage on the Oracle Collaboration Suite Applications Tier system, because high CPU usage can increase connection time. If the IMAP server is suddenly consuming more CPU, check its log level. Writing a large amount of information to the log files can degrade server performance, and lowering the log level resolves this.
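When ldapcheck is run from a monitoring script, its output can be parsed rather than read. The following Python sketch assumes the line formats shown in the ldapcheck example above. It treats only oidmon and oidldapd, the two processes named in the ldapbind description, as required. The function names are hypothetical.

```python
import re

# Line formats copied from the sample ldapcheck output above.
ALIVE = re.compile(r"Process (\S+) is Alive as PID (\d+)")
NOT_RUNNING = re.compile(r"Not Running -+ Process (\S+)")
REQUIRED = ("oidmon", "oidldapd")


def parse_ldapcheck(output: str):
    """Return ({name: [pids]}, [names not running]) from ldapcheck output."""
    alive, not_running = {}, []
    for line in output.splitlines():
        if m := ALIVE.search(line):
            alive.setdefault(m.group(1), []).append(int(m.group(2)))
        elif m := NOT_RUNNING.search(line):
            not_running.append(m.group(1))
    return alive, not_running


def missing_required(alive, required=REQUIRED):
    """Names of required processes that have no live PID."""
    return [name for name in required if name not in alive]
```

The output can be captured with subprocess.run(["ldapcheck"], capture_output=True, text=True).stdout. A non-empty result from missing_required points to the oidmon or oidldapd problems described above.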

It takes a long time to download a mail message

Message retrieval fetches a BLOB from the ES_BODY table.
- If any of these fetches are slow, message retrieval will be slow.
- Consider running Statspack and checking the waits on these objects.
A particular user may report long download times when reading messages from a slower tertiary store.
A message with many MIME parts also takes longer to download, because many clients issue a separate request for each MIME part, and each request requires a round trip to the server.
Ensure that the server has not reached its maximum number of Oracle Collaboration Suite Database connections.
Verify network connectivity between the client and the IMAP server, and between the server and the database.
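To estimate how many round trips a message with many MIME parts may cost, its structure can be inspected with Python's standard email package. This sketch assumes one round trip per leaf (non-multipart) part, which matches clients that request each MIME part separately. Actual client behavior varies. The sketch also reports nesting depth, which is relevant to the "too many MIME levels" condition mentioned under "Cannot send mail." The sample message is invented for illustration.

```python
from email import message_from_string
from email.message import Message


def count_leaf_parts(msg: Message) -> int:
    """Count the non-multipart (leaf) parts; roughly one fetch each for many clients."""
    return sum(1 for part in msg.walk() if not part.is_multipart())


def mime_depth(msg: Message) -> int:
    """Nesting depth of the MIME tree (a single-part message has depth 1)."""
    if not msg.is_multipart():
        return 1
    return 1 + max((mime_depth(p) for p in msg.get_payload()), default=0)


# Small two-part sample message, invented for illustration.
SAMPLE = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: sample\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "body text\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    "\n"
    "data\n"
    "--XX--\n"
)
```

For the sample, count_leaf_parts(message_from_string(SAMPLE)) counts the text part and the attachment, giving two leaf parts at a depth of two.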

Unable to read a particular message

Try to read the problematic message using different e-mail clients. Not all clients can display all types of rich messages. If the message can be viewed with one of the clients, use that client to move the message to a separate folder. If the log files report errors in the fetching of the message, contact Oracle support with details from the log file.

During peak periods, valid users receive Invalid Username and Password error

This problem typically occurs when the LDAP connection pool has been exhausted. If CPU resources are available, increase the LDAP connection pool size for the IMAP server.

Cannot send mail

This can occur when the server cannot obtain a database session to insert messages into the mail store, for example when the number of clients sending messages simultaneously exceeds the maximum sessions configured for the inbound server. This problem is usually temporary. To verify this, check the inbound logs for one of the following error messages:
- Could not get OCI service ctx
- Queueing the request returns=-110
- Getting service context from pool failed
If this problem occurs frequently, evaluate whether the system configuration needs to change.
The problem could be caused by a particular message. Check the log file to analyze the message and detect potential problems, such as a routing loop, a message that is too big, a mail loop caused by too many MIME levels, a routing address loop, or an invalid recipient.
If the problem is persistent, and all required processes (listener, database, SMTP inbound, and Oracle Internet Directory) are up and running, contact Oracle Support with details of the logs.

Server says password is incorrect but password has not changed

See Also:

"User unable to log in to account" for troubleshooting information

It takes a long time to open the Inbox

One of the Inbox queries scans the ES_INSTANCE table to gather all rows for the folder, so the number of blocks retrieved grows as the folder grows.

Check the database statistics and if the preceding query is an expensive query, consider analyzing the ES_INSTANCE table.

Oracle WebMail client issues

Unlike other servers, Oracle Java Mail API (OJMA) is an API, a building block used when creating an application such as Oracle WebMail. The application using OJMA can be dependent upon many other components, such as Apache, single sign-on, LDAP, OC4J, or Oracle Web Cache. This topic only addresses how to diagnose OJMA-related issues. Refer to the documentation for other components if they are suspected when troubleshooting an issue.

User reports long response times: To determine if the response time is related to the application or OJMA, turn on OJMA tracing and monitor the time it takes to request a message from the database. If this time is unacceptable, examine the database for performance issues.

For timing information, set the following properties:
```
oracle.mail.sdk.esmail.timing=true
oracle.mail.sdk.esmail.db_timing=true
```
In an Oracle WebMail/OC4J environment, the timing information is written to the $ORACLE_HOME/opmn/logs directory. Otherwise, it is written to standard output.
User reports long authentication or open public and shared folder times: Both of these operations are dependent on the performance of the Oracle Internet Directory server. Check the Oracle Internet Directory server for any problems.

Problems with Mail Delivery

End-user mail delivery issues can occur both when sending and when receiving e-mail.

This section addresses problems with the mail delivery, including:

User says mail delivery to Internet addresses does not occur
User did not receive a mail message that was certainly sent
User does not receive mail for 4-5 minutes but there is little database load
User receives most messages quickly, but some are delayed for hours or days
Users say they received mail from a distribution list, while others say they have not
Message is caught in a mail loop
Messages are being rejected
Message delivery is slow

User says mail delivery to Internet addresses does not occur

Check the SMTP Inbound parameter settings. If the Relay Allowed parameter (SMTP Inbound Rules and Routing Control) is set to False, which disables open relay, ensure that the local domains are listed in the Trusted Domains parameter (SMTP Inbound Rules and Routing Control).

If the system allows local users to connect to SMTP through an ISP (so that the connection comes from a foreign domain), set Relay Allowed to Authenticated and the SMTP Inbound General Authentication parameter to either Optional or Mandatory.

See Also:

"Oracle Mail SMTP Inbound Server" for definitions of these parameters

User did not receive a mail message that was certainly sent

Usually, a delivery status notification (DSN) is sent back to the sender, indicating either a delay or a failure in delivery. The only exception is when the sender has explicitly requested that no DSNs be sent.

A DSN will also be sent to the postmaster with the delivery failure reason. To receive the DSN, enter a valid e-mail address in the Postmaster E-mail Address SMTP Inbound server parameter. Also ensure that the Postmaster DSNs SMTP Inbound server parameter is set to Failures or All.

If delivery is delayed for any reason, the message will be present in the queue for up to 5 days (default queue timeout) and the reason for delivery failure will be stored in the Oracle Collaboration Suite Database. Use the esd_mail_queue.sql script to examine the Relay queue. This script will show the deferral reason stored in the Oracle Collaboration Suite Database.

See Also:

"esd_mail_queue.sql" for script usage information

When the delivery fails with a fatal error, both queue and recipient records will be cleaned by the Housekeeper server within a couple of hours and there may not be any trace of it in the database. Log files will contain information showing that the message is being requeued, such as the following:

For local users:

OCI_ERROR: ORA-20220: Folder locked
OCI_ERROR: ORA-20221: User locked
Local delivery failed for user
Failed to deliver to user inbox

For relay users:

Delivery to user@foo.com failed with smtp_err=421
Connect failed: Connection refused

Tip:

"Oracle Mail SMTP Inbound Server" for definitions of these parameters

Use the esd_find_message.sql script to locate lost messages. This script searches the database for messages given various search criteria. Once the message is found and the message ID is known, use the esd_show_message.sql script to see if the message is stuck in any of the queues. This procedure should only be used as a last resort since the esd_find_message.sql is very expensive to run. It is better to directly examine the queues using the esd_mail_queue.sql script and visually inspect the generated report to see if there are any stuck messages addressed to the user.

See Also:

"esd_find_message.sql", "esd_show_message.sql", and "esd_mail_queue.sql" for script usage information

When there is no trace of the message in the database, look for the message in the next possible hop to the destination. If the message is relayed to the Internet, there will be a log entry (at log level 16) in the log file.

This issue could be due to the user exceeding their quota. Either have the user delete old e-mail or, if necessary, increase the user's quota. An administrator can check the amount of storage used by a single Oracle Mail account with the esd_list_user_folders.sql script, which generates a report listing all of the user's folders and the size of each, with the total space used at the end. Compare this total with the user's quota. If the SMTP Inbound server, the SMTP Outbound server, or both are configured to log at the Notification level, they log ESSM-203 messages when processing an e-mail message for a user who is over quota.
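The quota comparison is simple arithmetic. In this sketch, the folder sizes would be transcribed by hand from an esd_list_user_folders.sql report and the quota taken from the user's attributes. The sketch does not parse the report. The function name, and the requirement that sizes and quota use the same unit, are assumptions of this illustration.

```python
def quota_usage(folder_sizes: dict, quota: int):
    """Return (total_used, percent_of_quota, over_quota).

    folder_sizes maps folder name -> size, in the same unit as quota
    (for example, bytes), as transcribed from an esd_list_user_folders.sql
    report. Over quota means the total strictly exceeds the quota.
    """
    used = sum(folder_sizes.values())
    percent = 100 * used / quota
    return used, percent, used > quota
```

For example, quota_usage({"INBOX": 60, "Sent": 50}, 100) reports 110 units used, or 110% of the quota, so the user is over quota.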

See Also:

"Modifying E-mail User Attributes" for information on how to view a user's quota attribute
"esd_list_user_folders.sql" for script usage information

Check the local queue using the esd_mail_queue.sql script to see if e-mails are queued due to temporary problems. A message that has been in the local queue for only a short time is not generally a problem. The initial delivery attempt may have failed due to the user's Inbox folder being locked, which can occur when another process is delivering a message to the user's folder or when another server, such as IMAP, is accessing the Inbox.

See Also:

"esd_mail_queue.sql" for script usage information

Delivery failure due to folder lock is a temporary problem that typically requires no action by the administrator. E-mail will be delivered shortly after the Inbox is unlocked by the process that currently has it locked. A message that has been in the local queue for a long time may be a more serious problem.

If the user's Inbox has somehow become permanently locked, contact Oracle Support for assistance in resolving the issue. A permanently locked Inbox is possibly the case if there are many messages for the user that are stuck in the local queue for an extended period of time. The utllockt.sql database diagnostic script may be useful in diagnosing database lock issues.

See Also:

"utllockt.sql" for script usage information

User does not receive mail for 4-5 minutes but there is little database load

The most common reason for this is the MTA queue processor going to sleep when there are no messages in the queue. The installation default is to have queue processors sleep for 2 minutes, configured by the orclMailSMTPQueuePollInterval setting.

This setting is stored in Oracle Internet Directory. Use oidadmin to change it: navigate to the settings for the SMTP Outbound server and update the value. You can decrease this interval to as little as 1 second. However, very small values cause the SMTP servers to poll more frequently, which increases the load on the Applications Tier CPU and on the Oracle Collaboration Suite Database.
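The trade-off described above can be quantified with a simplified model, sketched here in Python. The model assumes that each idle queue processor sleeps for exactly one interval between polls, measured in seconds. The unit in which orclMailSMTPQueuePollInterval is actually stored is not assumed. Under this model, the worst-case pickup delay is one full interval, and idle wakeups per hour are 3600 divided by the interval for each processor.

```python
def poll_tradeoff(interval_seconds: float, processors: int = 1):
    """Return (worst_case_pickup_delay_seconds, wakeups_per_hour).

    Simplified model: a message that arrives just after a queue processor
    goes to sleep waits up to one full interval, and each processor wakes
    3600 / interval_seconds times an hour even when its queue is empty.
    """
    return interval_seconds, processors * 3600 / interval_seconds
```

With the two-minute default, poll_tradeoff(120) gives a worst-case delay of 120 seconds and 30 wakeups an hour per processor. At 1 second, the figures are a 1-second delay and 3600 wakeups an hour per processor.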

User receives most messages quickly, but some are delayed for hours or days

This can occur when the MTA gets terminated abruptly due to an internal error. When this happens, a large number of messages can remain waiting to be requeued, resulting in half an hour of recovery delay. A bad message in the queue could potentially cause the MTA to go down every half an hour (or whenever it retries) and that could cause long delays for other messages.

Whether the MTA is going down repeatedly can be determined from how frequently new log directories are created. If it is, look for the oldest messages in the queue. Under normal circumstances, messages should not remain in the queue for more than an hour.

If some messages get stuck for days, contact Oracle Support.
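The age thresholds in this section, together with the default queue timeout described under "User did not receive a mail message that was certainly sent," can be combined into a simple triage helper. The one-hour and five-day values come from this appendix. The one-day boundary for "stuck for days" is an assumption. The message age must be obtained separately, for example from an esd_mail_queue.sql report.

```python
from datetime import timedelta

NORMAL_MAX = timedelta(hours=1)      # messages should not normally stay queued longer
STUCK_FOR_DAYS = timedelta(days=1)   # assumption: "stuck for days" starts at one day
DEFAULT_TIMEOUT = timedelta(days=5)  # default queue timeout for delayed delivery


def classify_queue_age(age: timedelta) -> str:
    """Map how long a message has been queued to a suggested next step."""
    if age <= NORMAL_MAX:
        return "normal"
    if age < STUCK_FOR_DAYS:
        return "investigate: check for repeated MTA restarts"
    if age < DEFAULT_TIMEOUT:
        return "stuck for days: contact Oracle Support"
    return "at or past the default 5-day queue timeout"
```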

Users say they received mail from a distribution list, while others say they have not

If the log level is sufficiently high, check whether the name resolution is returning all the expected recipient addresses
Use the esd_find_message.sql script, located in the $ORACLE_HOME/oes/admin directory, to examine all the child messages created for the delivery (one for every 1000 recipients). Ensure that all the recipients are present, and then look for any delivery errors.
The List Server may not be asking for DSNs in some cases. In that case, no failure notifications are sent back. Otherwise, a DSN is sent to the envelope's return path (es_envelope.mailfrom) and to the postmaster address.
If the MTA goes down during the name resolution and delivery, this could also result in partial deliveries.
Check the Oracle Internet Directory Query Entry Return Limit. This number should be at least equal to the size of the distribution list.

See Also:
"List Server" for more information
If the distribution list is very large, meaning the number of subscribers is large, it is possible that the List Server is still processing the list.
Check if any users have exceeded their quota and so have not yet received the mail from a distribution list.
Check if the users have suspended their subscription from the distribution list, in which case resume their subscription.
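The two numeric checks in this list can be written as a small sketch: how many child messages to look for, and whether the Query Entry Return Limit covers the list. The sketch assumes that a final partial batch of fewer than 1000 recipients also gets its own child message, so the count is rounded up. The function names are hypothetical.

```python
import math

RECIPIENTS_PER_CHILD = 1000  # "1 for every 1000 recipients", from the text above


def expected_child_messages(recipients: int) -> int:
    """Number of child messages to look for, assuming a partial batch gets one too."""
    return math.ceil(recipients / RECIPIENTS_PER_CHILD)


def return_limit_ok(list_size: int, query_entry_return_limit: int) -> bool:
    """The Query Entry Return Limit should be at least the distribution list size."""
    return query_entry_return_limit >= list_size
```

For a 2500-member list, this sketch expects three child messages and requires a return limit of at least 2500.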

Message is caught in a mail loop

A mail loop occurs when a message is sent from one account to another and back again, repeatedly. One way this can happen involves two separate accounts. For example, a user has a Yahoo e-mail account and a corporate e-mail account. Before going on vacation, the user configures the Yahoo account to forward all messages to the corporate account and also sets a vacation auto-reply on the corporate account. Any mail sent to the Yahoo account is then forwarded to the corporate account, which sends its vacation auto-reply back to the Yahoo account. The Yahoo account forwards that reply to the corporate account again, so the messages never reach a final destination and circulate between the two accounts in a mail loop.
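One widely used safeguard against this kind of auto-reply loop comes from RFC 3834. An automatic responder should not reply to a message whose Auto-Submitted header is present with any value other than "no," and it should mark its own replies with Auto-Submitted: auto-replied. The following Python sketch applies that check using the standard email package. It illustrates the generic convention only. This appendix does not say how Oracle Mail's vacation replies avoid loops.

```python
from email import message_from_string


def should_auto_reply(raw_message: str) -> bool:
    """Apply the RFC 3834 rule: do not auto-reply when Auto-Submitted is
    present with any value other than "no"."""
    msg = message_from_string(raw_message)
    value = msg.get("Auto-Submitted")
    if value is None:
        return True
    # The value may carry parameters, e.g. "auto-replied; owner-email=...".
    keyword = str(value).split(";", 1)[0].strip().lower()
    return keyword == "no"
```

In the vacation example, this check only helps if the forwarding account preserves the Auto-Submitted header on the messages it forwards.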

Mail loops result in a sudden increase in the number of messages coming into the system. The queues may start backing up as a result. Use the esd_mail_queue.sql script to get details of the messages in the queue.

See Also:

"esd_mail_queue.sql" for script usage information

After identifying a user with a mail loop, delete the rule using oesrl and notify the user of the problem.

See Also:

"oesrl" for information on how to delete a rule
Chapter 8, "Deploying Oracle Mail" in Oracle Collaboration Suite Deployment Guide for information about adding new processes, changing their parameters, or both, if this level of load has not been encountered before

Note:

When a server crashes, OPMN restarts it and writes a line similar to the following to $ORACLE_HOME/opmn/logs/ipm.log:

```
05/04/07 14:37:03 [4] Process Crashed: email~email_smtp_out~111266499464438837~1 (662306867:5491) - Restarting
```

Check the timestamps on these messages. If the timestamps are close together and if there are a lot of messages like this, the processes are frequently crashing.
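Instead of reading the timestamps by eye, the crash lines can be counted over a sliding time window. This Python sketch assumes the timestamp layout YY/MM/DD HH:MM:SS, inferred from the single sample line above. The ten-minute window is an arbitrary default. Adjust both for your installation.

```python
import re
from datetime import datetime

# Assumed layout "YY/MM/DD HH:MM:SS ... Process Crashed: <process> ...",
# inferred from the single sample ipm.log line above.
CRASH_LINE = re.compile(r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) .*Process Crashed: (\S+)")


def crash_events(lines):
    """Yield (timestamp, process_id) for each 'Process Crashed' line."""
    for line in lines:
        m = CRASH_LINE.match(line)
        if m:
            yield datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S"), m.group(2)


def max_crashes_in_window(lines, window_seconds: float = 600) -> int:
    """Largest number of crashes that fall within any window_seconds span."""
    times = sorted(ts for ts, _ in crash_events(lines))
    best = start = 0
    for end, ts in enumerate(times):
        # Slide the window start forward until it spans at most window_seconds.
        while (ts - times[start]).total_seconds() > window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best
```

A result of several crashes within a few minutes corresponds to the "timestamps close together" pattern described above.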

Messages are being rejected

Causes for this problem include:

Anti-Spam: Message is identified as spam
Anti-Virus: Attachment contains a known virus
Message Size: Message size meets or exceeds the maximum message size accepted
System rule or user rule criteria for rejection: MTA has evaluated rules stating that message should be rejected if it meets certain criteria

Solutions include:

Check the log files for entries showing that a message was rejected
Check the message routing policy parameters in the Policy subtab of the Administration tab of the Oracle WebMail client, and confirm the entries listed for rejection. If an entry matches a sender that should be allowed to send e-mail to this application, remove or update that entry so that the sender is allowed.

See Also:
"Configuring Routing Control for Incoming Mail" for more information about setting routing control policies
Check the Maximum Message Size parameter on the SMTP inbound process
Check system rules

Message delivery is slow

Causes for this problem include:

Server is being spammed and there is a spike in the mail traffic
Check the log files for frequent temporary failures and check if many log directories are getting created due to the server going down. Log files can show which message is getting processed when the server goes down. Moving problematic messages to a temporary queue will clear the requeued messages that were picked up before and could not be processed.
Check for network contention or problems with DNS server
Check LDAP communication and Oracle Internet Directory for any performance problems
Check database for problems with insertion and any other performance issues with the Oracle Collaboration Suite Database

Problems with a Single Message

A single message can often cause problems. Such messages may be malformed or may have an unacceptable characteristic, such as being spam or carrying an attachment that contains a virus.

This section addresses problems with a single message, including:

Read e-mail time is slow
Misconfigured rules cause mishandling of messages

Read e-mail time is slow

See Also:

"It takes a long time to download a mail message" for more information

Misconfigured rules cause mishandling of messages

Check for specific rules. Rules are kept in two places: in Oracle Internet Directory, and in the user_source view of the Oracle Collaboration Suite Database associated with the user.

The Oracle Internet Directory information is stored in XML format and is what the user sees in Oracle Collaboration Suite. The user_source view contains the procedural information, which is what messages are actually checked against when they come through.

The following example shows what a rule can look like in the directory and in the database.

Example A-1 Structure of a Rule

```
RGMUM1:UM903v2 % oesrl -p jane.doe@acme.com
<account qualifiedName="JANE.DOE@ACME.COM" ownerType="user" id="0">
  <rulelist event="deliver">
    <rule description="OEM Alerts US" group="all" active="yes" visible="yes">
      <condition negation="no" junction="or">
        <condition negation="no" junction="and">
          <attribute tag="rfc822to"/>
          <operator caseSensitive="no" op="contains"/>
          <operand>oemalerts_us@ACME.COM</operand></condition></condition>
      <action>
        <command tag="moveto"/>
        <parameter>/jane.doe/INBOX/oemalerts_us</parameter></action></rule>
    <rule description="gmapocsDBA" group="all" active="yes" visible="yes">
      <condition negation="no" junction="and">
        <condition negation="no" junction="and">
          <attribute tag="rfc822from"/>
          <operator caseSensitive="no" op="contains"/>
          <operand>amerocs@rgmdbs1.us.acme.com</operand></condition></condition>
      <action>
        <command tag="moveto"/>
        <parameter>/jane.doe/INBOX/gmapocsDBA</parameter></action></rule>
    <rule description="GM Team" group="all" active="yes" visible="yes">
    ......
```

The first rule states that if a message arriving in the user's Inbox has a To header containing the string oemalerts_us@ACME.COM (compared case-insensitively), the message is moved to the oemalerts_us folder, which is a subfolder of the Inbox.

It appears as follows in user_source:

Example A-2

```
SQL> select text from user_source where name like 'DELIVER_19225%';

TEXT
--------------------------------------------------------------------------------
PROCEDURE deliver_19225 AS
BEGIN
  IF ((UPPER(es_rule.rfc822to) LIKE '%'||UPPER('oemalerts_us@ACME.COM')||'%'))
  THEN
    es_rule.moveto('/jane.doe/INBOX/oemalerts_us');
  END IF;
```

The rule procedures in user_source are named DELIVER_number, where number is the user's user ID (the same as the folder_id of the user's INBOX). Creating rules also triggers the creation of the corresponding procedure.
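To make the rule semantics concrete, here is a hedged Python model of how the first rule from the oesrl example is evaluated. It is an illustration, not Oracle Mail's evaluator. It models only the contains operator and treats caseSensitive="no" as upper-casing both sides, which has the same effect as the UPPER(...) LIKE '%'||UPPER(...)||'%' test in the generated procedure. It also assumes that junction="or" and junction="and" combine child conditions, and that negation="yes" inverts a result. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# First rule from the oesrl example (reindented; content unchanged).
RULE_XML = """<rule description="OEM Alerts US" group="all" active="yes" visible="yes">
  <condition negation="no" junction="or">
    <condition negation="no" junction="and">
      <attribute tag="rfc822to"/>
      <operator caseSensitive="no" op="contains"/>
      <operand>oemalerts_us@ACME.COM</operand>
    </condition>
  </condition>
  <action>
    <command tag="moveto"/>
    <parameter>/jane.doe/INBOX/oemalerts_us</parameter>
  </action>
</rule>"""


def _apply_negation(cond, result: bool) -> bool:
    # Assumption: negation="yes" inverts the condition's result.
    return not result if cond.get("negation") == "yes" else result


def leaf_matches(cond, headers: dict) -> bool:
    """Evaluate a leaf condition; only the 'contains' operator is modeled."""
    tag = cond.find("attribute").get("tag")  # e.g. rfc822to
    op = cond.find("operator")
    operand = cond.find("operand").text or ""
    value = headers.get(tag, "")
    if op.get("op") != "contains":
        raise NotImplementedError(f"operator {op.get('op')!r} not modeled")
    if op.get("caseSensitive") == "no":
        # Same effect as UPPER(x) LIKE '%'||UPPER(y)||'%' in the procedure.
        value, operand = value.upper(), operand.upper()
    return _apply_negation(cond, operand in value)


def condition_matches(cond, headers: dict) -> bool:
    """Combine nested conditions with their junction ("or" / "and")."""
    children = cond.findall("condition")
    if not children:
        return leaf_matches(cond, headers)
    results = [condition_matches(child, headers) for child in children]
    combined = any(results) if cond.get("junction") == "or" else all(results)
    return _apply_negation(cond, combined)


def evaluate_rule(rule_xml: str, headers: dict):
    """Return (command, parameter) if the rule fires, otherwise None."""
    rule = ET.fromstring(rule_xml)
    if rule.get("active") != "yes":
        return None
    if not condition_matches(rule.find("condition"), headers):
        return None
    action = rule.find("action")
    return action.find("command").get("tag"), action.find("parameter").text
```

For example, evaluate_rule(RULE_XML, {"rfc822to": "OEMAlerts_US@acme.com"}) returns the moveto action for /jane.doe/INBOX/oemalerts_us, matching what the deliver_19225 procedure does.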

You can query user_source to find other rules, such as rules that someone might have set up to delete messages or to BCC another account.

Oracle Mail Application Problems

Any of the Oracle Mail applications can experience problems when a physical or virtual resource that the installation depends on is depleted. When that happens, the area of the application that relies on the resource is affected.

The following categories have been identified as those that directly affect the operation of the Oracle Mail application:

Mail protocols
Mail queues
Housekeeping

This section includes the following topics:

Problems with Mail Protocols
Problems with the Mail Queues
Problems with the Housekeeper Process

Problems with Mail Protocols

This section addresses problems with the IMAP, POP, SMTP, and NNTP protocols, including:

Complete failure of protocol servers
Too many IMAP/POP database connections
Too many SMTP Inbound client connect rejections
Connect time to SMTP server is long

Complete failure of protocol servers

If you can bind to a protocol server but not issue a successful command, check the log files for database related problems.
If you cannot bind, check to see if the listener is up.

See Also:
"Checking the health of the e-mail protocol server listener" for instructions on checking the status of the listener
If processes will not start using the Application Server Control Console for Collaboration Suite, and there is still no access, check the log files for possible issues.
Ensure that the server's parameters are configured correctly. Incorrect configuration can prevent the server from starting.

Too many IMAP/POP database connections

Check for Oracle Collaboration Suite Database performance issues.
One or more user folders are locked: a transaction is holding a lock on ES_FOLDER or ES_USER records for one or more users. As requesting clients time out and reissue similar requests, the IMAP server takes more and more connections from the pool.

Use standard lock-detection SQL scripts, such as utllockt.sql, or Oracle Enterprise Manager 10g to detect this situation. oesmon database connection dumps taken a few minutes apart will also show the same users still executing the same statements.

As a solution, end the session holding the lock and contact Oracle Support with all the information about the ended session.
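To compare two oesmon connection dumps taken a few minutes apart, the following minimal sh sketch prints the lines that appear in both. It assumes, hypothetically, that each dump has been saved to a text file in which one line identifies a user and the statement that user is executing; the function name persistent_sessions and the file names are illustrative only and are not part of Oracle Mail.

```sh
# Hypothetical helper: print lines present in both oesmon dumps.
# Each dump is assumed to hold one "user statement" entry per line.
persistent_sessions() {
  ps_first=$(mktemp) || return 1
  ps_second=$(mktemp) || { rm -f "$ps_first"; return 1; }
  # comm requires sorted input; -u also drops duplicate lines.
  sort -u "$1" > "$ps_first"
  sort -u "$2" > "$ps_second"
  # -12 suppresses lines unique to either file, leaving common lines.
  comm -12 "$ps_first" "$ps_second"
  rm -f "$ps_first" "$ps_second"
}
```

For example, persistent_sessions oesmon_1000.txt oesmon_1005.txt lists the entries that survived both snapshots; those users are candidates for the lock investigation described above.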

Too many SMTP Inbound client connect rejections

Client connections can be rejected under high load, typically due to the following:

All available sessions have been consumed, so no database sessions are available to service client requests to send mail. This can be observed from the log message Getting service context from pool failed. Increasing the number of available sessions for the SMTP Inbound server instance can reduce the number of rejections. However, make sure the increase does not push the total number of database sessions required by all server instances running against the database too high.
All threads up to the configured maximum have been consumed and new client requests result in failures. This can be observed from the log message No worker available. Increasing the maximum allowed threads will reduce client connect rejections. Consider increasing this limit in increments of 100.
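To tell which of the two causes is behind the rejections, the following sh sketch counts both log messages across the SMTP Inbound log files. The default directory follows the $ORACLE_HOME/oes/log/um_system/smtp_in location named elsewhere in this appendix; the function name count_smtp_rejections is hypothetical, and nothing is assumed about the log line layout beyond containing the message text.

```sh
# Hypothetical helper: count SMTP Inbound rejection messages.
# Usage: count_smtp_rejections [log_dir]
count_smtp_rejections() {
  csr_dir=${1:-$ORACLE_HOME/oes/log/um_system/smtp_in}
  # -r searches subdirectories; -h omits file names from matches.
  csr_pool=$(grep -r -h -- 'Getting service context from pool failed' "$csr_dir" 2>/dev/null | wc -l | tr -d ' ')
  csr_worker=$(grep -r -h -- 'No worker available' "$csr_dir" 2>/dev/null | wc -l | tr -d ' ')
  # pool_exhausted points at sessions; no_worker points at threads.
  echo "pool_exhausted=$csr_pool no_worker=$csr_worker"
}
```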

Connect time to SMTP server is long

If it is taking a long time to connect to the SMTP server, the problem is most likely either a slow network or the Oracle Collaboration Suite Applications Tier protocol server is overloaded. Neither a database nor an Oracle Internet Directory connection is necessary for the initial greeting. If the Applications Tier is not overloaded, begin to trace the network request to find the location of network congestion.

Problems with the Mail Queues

Messages that are accepted for delivery are queued into different queues based on the recipient list. The queues are defined as submit, local, relay, and list. A message can be deferred or delayed when it encounters a problem, which can be caused by network problems, system resource problems, or the message content. An overabundance of messages can also delay e-mail delivery.

This section addresses problems with the mail queues, including:

Queues are building up and not emptying
Length of local queue
Length of relay queue
Length of submit queue

Queues are building up and not emptying

Not unusual but should be evaluated
- Consider increasing the number of SMTP Inbound and Outbound instances
- Consider increasing the number of pool connections to the LDAP and database servers
  
  See Also:
  "Modifying Parameter Settings for a Specific Server Instance" for information about editing SMTP server parameter settings
A problematic message in the queue may be causing the server to go down. Although very rare, this has been known to happen.

On UNIX systems, a symptom of this is core files in the log directories. Because such a message is never removed from the queue, it brings the mail servers down repeatedly. Look for core files with the following command:
```
% find $ORACLE_HOME/oes/log/um_system/smtp* -name core
```
If core files are found, contact Oracle Support.

Length of local queue

The incoming message rate can be higher than the processing rate, which can result in Local queue growth. Monitor the Length of Local Queue metric of the Oracle Collaboration Suite Database target using the Grid Control Console. The esd_mail_queue.sql script can also be used to examine the current contents of the Local queue.

See Also:

"esd_mail_queue.sql" for script usage information

If the count of messages in the Local queue continues to increase, you can infer that the system is not able to handle the incoming load.
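One way to decide whether the Local queue continues to increase is to record its length at regular intervals and test the trend. The sh sketch below assumes a hypothetical text file with one queue-length sample per line, oldest first (for example, values noted from the Length of Local Queue metric). It reports growing only if no sample is lower than the one before it and the last sample exceeds the first; the function name queue_trend is illustrative.

```sh
# Hypothetical helper: classify a series of queue-length samples.
# Input: file with one integer sample per line, oldest first.
queue_trend() {
  awk '
    NR == 1 { first = $1 + 0; prev = first; growing = 1; next }
    {
      cur = $1 + 0
      if (cur < prev) growing = 0   # any drop breaks the upward trend
      prev = cur
    }
    END {
      if (NR < 2) { print "insufficient"; exit }
      if (growing && prev > first) print "growing"
      else print "not-growing"
    }' "$1"
}
```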

Possible reasons for this are:

Lack of system resources, a slow Oracle Collaboration Suite Applications Tier, a slow database, or a slow Oracle Internet Directory. In the past, SMTP and IMAP processes have been known to spin and consume all the CPU resources on the Applications Tier.
Large amount of incoming spam mail. This can be detected by looking at the sender's address in the ES_ENVELOPE table. If the envelope's MAILFROM does not match one of the local domains and is not null (<>), the message must be of external origin. Spam filters can be turned on to block those senders.

Run the esd_queue_examine.sql script located in the $ORACLE_HOME/oes/admin directory to determine if there is a large amount of spam in the incoming queue.
Insufficient SMTP Inbound and Outbound instances. If the MTAs are processing at the expected rate and the queue is still growing, increase the number of MTAs.
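The external-origin test described in the spam item above can be sketched in sh as follows. The function classify_sender is hypothetical: it takes an envelope MAILFROM value and a list of local domains, treats an empty or <> sender as null, and compares domains case-insensitively. Unlike a SQL LIKE pattern, this sketch requires an exact domain match, so subdomains must be listed explicitly.

```sh
# Hypothetical helper: classify an envelope MAILFROM value.
# Usage: classify_sender SENDER LOCAL_DOMAIN...
# Prints "null" for an empty or <> sender, "local" if the sender's
# domain exactly matches (case-insensitively) a listed local domain,
# and "external" otherwise.
classify_sender() {
  cs_sender=$1
  shift
  case $cs_sender in
    ''|'<>') echo null; return 0 ;;
  esac
  # Keep the text after the last @ and drop a trailing > if present.
  cs_domain=${cs_sender##*@}
  cs_domain=${cs_domain%>}
  cs_domain=$(printf '%s' "$cs_domain" | tr '[:upper:]' '[:lower:]')
  for cs_local in "$@"; do
    cs_local=$(printf '%s' "$cs_local" | tr '[:upper:]' '[:lower:]')
    if [ "$cs_domain" = "$cs_local" ]; then
      echo local
      return 0
    fi
  done
  echo external
}
```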

If the count of messages with NULL modified_date is very low, delivery must be failing for some reason. SMTP Inbound and Outbound server log messages located in the $ORACLE_HOME/oes/log/um_system/smtp_in and $ORACLE_HOME/oes/log/um_system/smtp_out directories, respectively, should give an indication of what is happening.

Possible reasons for local mail delivery failure are:

Target database is down; log files will show ORA-XXXXX errors.
Oracle Internet Directory is down or name resolution is failing. The log files will indicate the Oracle Internet Directory errors.
Folder locks are rare unless a large amount of mail is being delivered to a small number of users. Another possibility is that one of the user folders has been locked by a nonexistent process or session. This should block mail for only a single user, but it could result in many requeues.

Length of relay queue

Check for the following:

DNS problems

One of the most common problems is incorrect DNS setup or slow DNS servers. A failure in DNS lookup will result in relay failure. These errors can be seen in the SMTP Outbound log located in the $ORACLE_HOME/oes/log/um_system/smtp_out directory.
Failed to connect to foreign MTA

Causes for failure include:
- The remote host is refusing the connection due to reverse DNS lookup failure or a spam check failure. If the relaying MTA is not one of the MX hosts of the domain and does not have a PTR record in the DNS, the foreign host might not allow the connection. Sometimes, the relaying hosts can get blacklisted, denied connection, or both, if they are acting as open relays.
- The relaying host is unable to make connections outside the Oracle Collaboration Suite installation due to firewall problems.
If the system is relaying locally to one or more local Oracle Collaboration Suite Databases, the local relay mail can get stuck when one of the databases is down or not accepting mail fast enough.
Mail loops caused by incorrect setup
Messages bouncing between the Oracle Collaboration Suite and external agents, such as spam filters and virus scanners, due to incorrect address rewriting or setup.

Length of submit queue

Use the esd_queue_examine.sql script to confirm that the backlog is in the submit queue and to measure the density of messages with similar subjects, senders, recipients, or domains within the queue.

See Also:
"esd_queue_examine.sql" for script usage information
A high density of messages with similar subjects, senders, recipients, or domains usually has one of two causes: the system has been spammed with unwanted mail, or there is a mail loop. Looking at the subject and headers is usually sufficient to distinguish between the two. If you need to look further at the body of a message, use the esd_show_message.sql script.

First, look at the queue using the esd_mail_queue.sql script which lists all messages in the queue along with their message ID. Find the ID of the message of interest. Then, use the esd_show_message.sql script to look at detailed information about that message.
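To gauge the density of similar messages outside SQL*Plus, the following sh sketch counts repeated values in one column of a tab-separated export. The export format (for example, sender in column 1 and subject in column 2) and the function name top_values are assumptions for illustration; the scripts named above do not necessarily produce this format.

```sh
# Hypothetical helper: show the most frequent values in one column
# of a tab-separated export, most frequent first.
# Usage: top_values FILE [COLUMN] [COUNT]
top_values() {
  tv_file=$1
  tv_col=${2:-2}
  tv_count=${3:-10}
  # uniq -c prefixes each distinct value with its number of occurrences.
  cut -f "$tv_col" "$tv_file" | sort | uniq -c | sort -rn | head -n "$tv_count"
}
```

A single subject or sender dominating the top of the list suggests spam or a mail loop, which the subject and headers can then confirm.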

See Also:
"esd_mail_queue.sql" and "esd_show_message.sql" for script usage information

Problems with the Housekeeper Process

The Housekeeper is the daemon that cleans up unreferenced e-mail data, such as messages deleted by a user, queued messages already processed, and expired messages or folders with an expiration date.

The Housekeeper will perform the following sequence of events during its processing of the e-mail application:

Note:

Not all of these events apply to every installation, and most are configurable.

See Also:

"Modifying Parameter Settings for a Specific Server Instance" for information about editing Housekeeper server parameter settings

Expiration of regular messages
Pruning of processed messages in queues
Pruning of expunged messages
Collection of pruned messages
Moving old messages to tertiary storage
Text index synchronization
Text index optimization

This section addresses problems with the Housekeeper, including:

Housekeeper cannot keep up with cleaning up old mail
Length of collect queue is growing
Length of pruning queue

Housekeeper cannot keep up with cleaning up old mail

Determine which housekeeping tasks the system is falling behind on. The possible areas include the pruning queue and the collection queue.
If the pruning queue is very large, consider increasing the Concurrency Level parameter on the Housekeeper server instance that performs the pruning task. In most cases, a concurrency level of 4 to 6 is more than adequate to handle a significant backlog.
If the collection queue is very large, consider increasing the Concurrency Level parameter on the Housekeeper server instance that performs the collection task.

If an instance is configured to perform both pruning and collection, the increased concurrency level applies to both tasks when the process is refreshed.
Check if the Frequency of Execution of Housekeeper Process parameter is set unreasonably low. A frequency of every 60 or 120 minutes is recommended.

Length of collect queue is growing

First, check whether the Collection parameter is configured on any Housekeeper instance.

If such an instance exists and its processes are running, the system may be experiencing data inconsistencies or data corruption. The next step is to check the Housekeeper log file on the Oracle Collaboration Suite Applications Tier. If log file directories are being created rapidly, the server has crashed. Locate the core dump, if present, and the log file content, and contact Oracle Support.
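To check whether log directories are being created rapidly, the following sh sketch counts the subdirectories of a log directory modified within the last N minutes. You supply the Housekeeper log directory path yourself, since it is not named here. The sketch uses modification time as a proxy for creation time, because find cannot test creation time portably, and it relies on -mindepth, -maxdepth, and -mmin, which GNU and BSD find support but POSIX does not require. The function name recent_log_dirs is hypothetical.

```sh
# Hypothetical helper: count subdirectories of DIR modified in the
# last MINUTES minutes (default 60).
# Usage: recent_log_dirs DIR [MINUTES]
recent_log_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -mmin "-${2:-60}" | wc -l | tr -d ' '
}
```

Running it a few times and seeing the count climb quickly is consistent with a server that keeps crashing and restarting.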

If this is not the case but there are ORA-XXXXX errors in the log file, check whether the errors can be fixed. If not, contact Oracle Support.

If there are no errors reported in the log file but the log file grows at a very slow rate, consider increasing the Concurrency Level parameter on the instances with the Collection parameter enabled. Also check that Log Miner Recovery is disabled in the Housekeeper server configuration debug parameters. Enabling this parameter slows collection down by 300%. Therefore, if Log Miner Recovery is not going to be used in the system, disable this parameter.

Check whether Oracle Text is installed and configured correctly. If an installation does not use Oracle Text and it was not activated during installation, Oracle Mail might be adversely affected. Installation log files usually show whether Oracle Text index-related data were configured and created correctly. If not, the system is left with invalid Oracle Text indexes, which cause errors when the Housekeeper tries to delete entries from related tables. In that case, if the system is not configured to enable text search in e-mail messages, drop whichever index is reported in the log file.

Length of pruning queue

First, check whether the Housekeeper instances have the Pruning parameter enabled. If the Housekeeper instances are verified to be running correctly, monitor the number of ES_QUEUE entries and check that it drops at a reasonable rate. Pruning causes the collect queue to grow; this behavior is expected and is not a cause for alarm.

Next, check the server log files on the back-end database $ORACLE_HOME for reported errors. If core dumps are present, contact Oracle Support. Analyze and, if possible, fix any ORA-XXXXX errors found in the log file; otherwise, contact Oracle Support. ORA-XXXXX errors occur very rarely in pruning logs.

Problems with Oracle Collaboration Suite Database

The Oracle Collaboration Suite Database stores all of the e-mail messages, text indexes, and folders of every account that is authorized for access. After successful Oracle Internet Directory authentication, a user is given a database connection for message access. This access can fail or be denied, and the problem can be local to a single account or global to all accounts. Complaints from the user community usually make a global problem easy to recognize.

This section includes the following topics:

Oracle Collaboration Suite Database Connectivity Problems
Oracle Collaboration Suite Database Performance Problems

Oracle Collaboration Suite Database Connectivity Problems

This section addresses connectivity problems with the Oracle Collaboration Suite Database.

Network unavailable causing massive problems

If the network suddenly goes offline, then depending on the severity of the outage, it can cause massive disruption to end-user connectivity as well as to communication between e-mail processes (in a distributed environment).

It may be necessary to shut down the protocol processes until the network is stable. Once the network stabilizes, bring all processes back online.

Oracle Collaboration Suite Database Performance Problems

This section addresses performance problems with the Oracle Collaboration Suite Database, including:

SQL*Net service is unavailable
Oracle Collaboration Suite Database is slow
Archive log file directory partition is full
Oracle Mail storage tablespaces are full due to lack of extents

SQL*Net service is unavailable

SQL*Net is the protocol that enables access to the database. There is an instance of the SQL*Net listener on both the infrastructure and storage tiers.

The following problems can occur if the SQL*Net listener is down:

Users cannot connect to Oracle Internet Directory for authentication.
When authenticated with Oracle Internet Directory, the user is passed to the storage database. If the SQL*Net listener is down, users cannot access the Oracle Collaboration Suite Database.

Check the listeners on both the infrastructure and the Oracle Collaboration Suite Database locations by running the following commands:

On each system where the database resides:
```
$ lsnrctl status
```
On the Applications Tier system, if $ORACLE_HOME/network/admin/tnsnames.ora is configured there to access the databases, or otherwise on the infrastructure and Oracle Collaboration Suite Database tiers:
```
$ tnsping connect_string
```
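When several connect strings need checking, a small sh loop can run tnsping against each and summarize the results. This sketch assumes tnsping exits with a nonzero status when it cannot reach the listener; if your version behaves differently, check its output instead. The TNSPING variable is a hypothetical hook for pointing the function at another command (useful for testing), and the function name and connect string names are placeholders.

```sh
# Hypothetical helper: run tnsping against each connect string and
# print OK or FAIL for each. Returns 1 if any check failed.
# Assumes tnsping exits nonzero when the listener cannot be reached.
# TNSPING may name an alternative command (for example, in tests).
check_connect_strings() {
  ccs_failed=0
  for ccs in "$@"; do
    if ${TNSPING:-tnsping} "$ccs" >/dev/null 2>&1; then
      echo "OK   $ccs"
    else
      echo "FAIL $ccs"
      ccs_failed=1
    fi
  done
  return "$ccs_failed"
}
```

For example: check_connect_strings oid_db mail_db, where oid_db and mail_db stand in for the entries in your tnsnames.ora.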

Oracle Collaboration Suite Database is slow

The following symptoms could be indicative of a slow database:

Slow response times for users opening folders, reading mail, and sending mail. Authentication is not affected, because only Oracle Internet Directory is contacted during authentication.
Large number of database connections. If the database is slow to handle a request, the protocol servers can request a new database connection for the next unit of work that arrives. If the database is slow because of disk constraints or another hardware resource issue, increasing the database pool can make matters worse. The increased connections can tax a loaded database even further when a flood of new database requests comes in, each consuming its own database and operating system memory. Whether the database is under stress is best analyzed by a DBA using tools such as Statspack or Oracle Enterprise Manager 10g.
The Housekeeper queues continue to grow and never catch up until a decrease in activity, such as over the weekend.
Mail delivery slows down.
Users will see an unable to retrieve database connection error occasionally upon login. This causes the Applications Tier to slow down if the maximum connection pool exceeds the available memory on the system.

Archive log file directory partition is full

If archive logging is enabled, all of the transactions are saved to a file for recovery purposes.

See Also:

"Oracle Mail Archive Policies" for more information

To check the current space usage:

Change directory to the designated archive log directory or partition.
From the system command prompt, execute the following command to check the available space on the current disk drive:
```
$ df -k
```
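The space check can be scripted so that it runs periodically. The sh sketch below reads the Capacity column from df -Pk (the -P flag requests the portable one-line-per-file-system format) and warns when usage reaches a threshold. The 90 percent default and the function names are assumptions, not Oracle recommendations.

```sh
# Print the percentage of space used on the file system holding DIR.
# df -P guarantees one line per file system after the header.
percent_used() {
  df -Pk "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical helper: warn when DIR's file system is LIMIT percent
# full or more (default 90). Returns 1 on warning.
check_archive_space() {
  cas_used=$(percent_used "$1")
  cas_limit=${2:-90}
  if [ "$cas_used" -ge "$cas_limit" ]; then
    echo "WARN $1 is ${cas_used}% full"
    return 1
  fi
  echo "OK $1 is ${cas_used}% full"
}
```

For example: check_archive_space /u01/arch 85, where /u01/arch is a placeholder for your archive log directory.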

Perform and confirm routine backups of this directory. Afterward, the files in the directory can be purged, except for the current log file.

If this directory is not backed up and its partition reaches full capacity, the database stops until disk space is freed in one of two ways:

Old archive files are moved off this partition to another partition, or
A backup is performed of the archive files to a storage medium for future recovery purposes.

Oracle Mail storage tablespaces are full due to lack of extents

If the Oracle Mail storage tablespaces run out of extents, e-mail delivery and end-user e-mail message commits fail. Check the database alert logs for any tablespace full errors.

Table A-1 lists tablespaces upon which to focus an investigation should Oracle Mail tablespaces run out of extents.

Table A-1 Oracle Mail Storage Tablespaces

Tablespace	Tables Contained Within
ESBIGTBL	Contains the largest tables: `ES_BODY` (lob table), `ES_IMT_TEXT` (search text), `ES_BODY_RCOV`
ESSMLTBL	Contains the smaller tables, including all of the other tables not listed. The `ES_HEADER` table should be checked, however, because it contains address information relating to individual e-mail messages.
ESFREQTBL	Contains the most frequently used tables, including `ES_INSTANCE`, `ES_FOLDER`, `ES_QUEUE`, `ES_USER`, `ES_RECIPIENT`, and `ES_DOMAIN`.
ESFREQIDX	Contains the table indexes from the `ESFREQTBL` tablespace.
ESINFREQIDX	Contains table indexes from the `ESSMLTBL` tablespace.
ESTERSTORE	Contains only the `ES_TBODY` table, which is populated when the Housekeeper process moves messages out of the `ES_BODY` table to offload the more active disks onto less expensive disks. This tablespace is populated only if the Housekeeper process is configured with the Tertiary storage task enabled, as described in "Enabling Tertiary Storage".

To display a summary of available space of all tablespaces, execute the following SQL statement as sys or system:

```
SQL> select tablespace_name, sum(bytes) from dba_free_space group by tablespace_name order by sum(bytes);
```

The command returns the following:

```
TABLESPACE_NAME                 SUM(BYTES)
-------------------------------------------
XDB                                  262144
EXAMPLE                              458752
USERS                                983040
ESINFREQIDX                         2490368
SYSTEM                              3211264
ESSMLTBL                            3407872
ESPERFTBL                           5046272
ESFREQTBL                           9568256
ESFREQIDX                           9633792
ESNEWS                             10223616
ESTERSTORE                         10223616
ESBIGTBL                           17039360
ESORATEXT                          20185088
ESMRLMNR                           52297728
………
………
```

Each table in these tablespaces has a NEXT_EXTENT storage setting that determines the size of the next extent allocated to it. When a table needs more space, it tries to allocate an extent of that size in its tablespace. If the tablespace does not have enough free space, the table fails to extend and the application begins to receive errors.
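To spot Oracle Mail tablespaces that are running low, the output of the dba_free_space query shown earlier can be filtered. This sh sketch reads that output on standard input and prints tablespaces whose names start with ES and whose free space is below a threshold in bytes; the function name and the 5 MB default are arbitrary examples. Note that a tablespace with no free extents at all has no rows in dba_free_space, so it does not appear in the query output or in this filter.

```sh
# Hypothetical helper: read "TABLESPACE_NAME SUM(BYTES)" lines on stdin
# and print ES* tablespaces with less free space than THRESHOLD bytes.
# Usage: low_es_tablespaces [THRESHOLD]   (default 5242880, i.e. 5 MB)
low_es_tablespaces() {
  awk -v t="${1:-5242880}" '
    # Skip headers and separators: require an ES name and numeric bytes.
    $1 ~ /^ES/ && $2 ~ /^[0-9]+$/ && $2 + 0 < t + 0 { print $1, $2 }'
}
```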

Solution: If space remaining is depleted, add another data file to the tablespace experiencing problems.

See Also:

Oracle Database Administrator's Guide for details about adding datafiles to the tablespace

Debugging Oracle Mail

This section discusses various debugging strategies to aid in troubleshooting.

This section includes the following topics:

Checking the health of the e-mail protocol server listener
Checking memory, PGA memory, and number of processes connecting from MTAs to an Oracle Collaboration Suite Database

Checking the health of the e-mail protocol server listener

The listener for Oracle Mail is called listener_es, by default. Execute the following command to check the listener status:

```
$ lsnrctl stat listener_es
```

Example A-3 illustrates typical output of the command:

Example A-3 Status of Listener

```
LSNRCTL for Linux: Version 9.0.1.4.0 - Production on 06-FEB-2004 11:23:32

Copyright (c) 1991, 2001, Oracle Corporation.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=UMREG)))
STATUS of the LISTENER
------------------------
Alias                     listener_es
Version                   TNSLSNR for Linux: Version 9.0.1.4.0 - Production
Start Date                17-DEC-2003 22:41:00
Uptime                    50 days 12 hr. 42 min. 32 sec
Trace Level               off
Security                  OFF
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/v2/network/admin/listener.ora
Listener Log File         /u01/app/oracle/product/v2/network/log/listener_es.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=UMREG)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rgmum9.us.oracle.com)(PORT=25))(PRESENTATION=ESSMI))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rgmum9.us.oracle.com)(PORT=143))(PRESENTATION=IMAP))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rgmum9.us.oracle.com)(PORT=110))(PRESENTATION=POP))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rgmum9.us.oracle.com)
Services Summary...
Service "ESSMI" has 2 instance(s).
  Instance "um_system", status READY, has 1 handler(s) for this service...
  Instance "um_system", status READY, has 1 handler(s) for this service...
Service "ESSMIAMOCS" has 2 instance(s).
  Instance "um_system", status READY, has 1 handler(s) for this service...
  Instance "um_system", status READY, has 1 handler(s) for this service...
Service "IMAP" has 2 instance(s).
  Instance "um_system", status READY, has 1 handler(s) for this service...
  Instance "um_system", status READY, has 1 handler(s) for this service...
The command completed successfully
```

Note:

In the example, instances refers to processes. There are two inbound SMTP servers connected to the listener.
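To summarize the Services Summary section programmatically, the following sh sketch extracts each service name and its instance count from lsnrctl output read on standard input. It depends only on the Service "NAME" has N instance(s). line format shown in Example A-3; the function name service_instances is hypothetical.

```sh
# Hypothetical helper: print "SERVICE COUNT" for each service line in
# lsnrctl status output read on stdin.
service_instances() {
  # In a basic regex, ( and ) are literal; \. matches a literal dot.
  sed -n 's/^Service "\([^"]*\)" has \([0-9][0-9]*\) instance(s)\./\1 \2/p'
}
```

For example, lsnrctl stat listener_es | service_instances would print ESSMI 2, ESSMIAMOCS 2, and IMAP 2 for the output in Example A-3, making it easy to spot a service with fewer processes than expected.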

Checking memory, PGA memory, and number of processes connecting from MTAs to an Oracle Collaboration Suite Database

Occasionally, memory usage must be checked in the Oracle Collaboration Suite Databases due to various issues with program global area (PGA) memory usage within the databases. First, check to see how many connections (and what type) are coming into the database.

Connect to the database as es_diag and run the esd_show_sessions.sql script.

Note:

The number of Oracle Mail server database connections is determined using the Oracle Enterprise Manager 10g Application Server Control Console, as described in Chapter 3.

To check PGA memory usage, use the following script:

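```
set pages 9999

-- Match the statistic by name; statistic# values can differ between releases
select s.sid, s.program, st.value
  from v$session s, v$sesstat st, v$statname sn
 where s.sid = st.sid
   and st.statistic# = sn.statistic#
   and sn.name = 'session pga memory'
   and s.program like 'es%'
 order by 3;
```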

Output will return similar to the following:

```
   303 esimapds@rgmum6 (TNS V1-V3)                         3339424
   164 esimapds@rgmum13 (TNS V1-V3)                        3500144
   285 esimapds@rgmum13 (TNS V1-V3)                        3735304
    82 esimapds@rgmum13 (TNS V1-V3)                        4394984
   125 esls@rgmum2.us.oracle.com (TNS V1-V3)               7911248
```

The first column contains the SID, the second the program name, and the third the amount of PGA memory consumed, in bytes. In this example, a List Server instance from rgmum2 is using about 7.5 MB of PGA memory on this database instance. If you see processes consuming more than 5 or 6 MB, investigate them and bounce (restart) them if necessary.
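The 5 to 6 MB guideline can be applied to the query output with a short filter. This sh sketch reads the SID, program, and byte columns on standard input, takes the last field as the byte count (program names contain spaces), and prints sessions above a limit in MB, where 1 MB is taken as 1,048,576 bytes. The function name high_pga_sessions and its default limit of 5 are assumptions.

```sh
# Hypothetical helper: flag sessions whose PGA use exceeds LIMIT MB.
# Input lines: SID PROGRAM... BYTES (as produced by the query above).
# Usage: high_pga_sessions [LIMIT_MB]
high_pga_sessions() {
  awk -v lim="${1:-5}" '
    # Only data rows: numeric SID first and numeric byte count last.
    NF >= 3 && $1 ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ {
      mb = $NF / 1048576
      if (mb > lim + 0) printf "%s %s %.1f MB\n", $1, $2, mb
    }'
}
```

Applied to the sample output above with the default limit, only SID 125 (esls@rgmum2.us.oracle.com, about 7.5 MB) would be reported.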