16 Diagnosing and Troubleshooting Problems

This chapter describes the methods and information sources you can use for diagnosing and solving problems that you might encounter while using Oracle Traffic Director.

This chapter contains the following sections:

Roadmap for Troubleshooting Oracle Traffic Director

This section provides the sequence of tasks you can perform to diagnose and solve problems with Oracle Traffic Director.

  1. Verify whether the system configuration is correct.

    For information about the supported platforms and operating systems, see the Oracle Fusion Middleware Supported System Configurations at:

    http://www.oracle.com/technetwork/middleware/ias/downloads/fusion-certification-100350.html

  2. Look for a solution to the problem in Solutions to Common Errors.

  3. Check whether the information in Frequently Asked Questions helps you understand or solve the problem.

  4. Try to diagnose the problem.

    1. Review the messages logged in the server log. Look for messages of type WARNING, ERROR, and INCIDENT_ERROR.

      For messages of type WARNING and ERROR, try to solve the problem by following the directions, if any, in the error message.

      An INCIDENT_ERROR message indicates a serious problem caused by unknown reasons. You should contact Oracle for support.

    2. Increase the verbosity of the server log, and try to reproduce the problem.

      Oracle Traffic Director supports several log levels for the server log, as described in Server Log Levels. The default log level is NOTIFICATION:1. The least verbose log level is INCIDENT_ERROR, at which only serious error messages are logged. At the TRACE:1, TRACE:16, or TRACE:32 levels, the logs are increasingly verbose, but provide more detailed information, which can be useful for diagnosing problems.

      After increasing the log verbosity, try to reproduce the problem. When the problem occurs again, review the logged messages for pointers to its cause.

      For information about changing the server log level, see Configuring Log Preferences.

  5. Contact Oracle for support, as described in Contacting Oracle for Support.
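
The log review in step 4 can be sketched as a small shell function. This is a minimal sketch and not an Oracle-supplied tool: the function name otd_log_problems is hypothetical, and it assumes that the message type appears in square brackets, optionally followed by a numeric level (for example, [ERROR:32]), as in the log excerpts shown in this chapter.

```shell
# Print the server-log lines whose message type is WARNING, ERROR, or
# INCIDENT_ERROR.
# Assumption: the type is written in square brackets, optionally followed
# by a numeric level, for example [ERROR:32].
otd_log_problems() {
    # $1: path to the server log of the instance being diagnosed
    grep -E '\[(WARNING|ERROR|INCIDENT_ERROR)(:[0-9]+)?\]' "$1"
}
```

For example, otd_log_problems /path/to/server.log prints only the messages that need attention; the path is a placeholder for the server log of your instance.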

Troubleshooting High Availability Configuration Issues

This section provides information about the tasks you can perform to diagnose and solve problems with an Oracle Traffic Director high availability configuration.

  • The Oracle Traffic Director configuration must be deployed on two nodes. For more information, see Creating and Managing Failover Groups.

  • The router ID for each failover group has to be unique.

  • Make sure that KeepAlived is installed. In most cases, the KeepAlived software is installed by default on both of the Exalogic compute nodes (or VMs) on which the Oracle Traffic Director instances are running. To check whether KeepAlived is installed, run the following command:

    rpm -qa | grep keepalived
    

    If KeepAlived is correctly installed, an output similar to the following is displayed:

    keepalived-1.2.2-1.el5
    

    Note that if KeepAlived is not installed, the RPM can be found in the software repository.

  • For KeepAlived-specific information, check the messages logged in /var/log/messages.

  • Make sure that you provide the correct VIP address and the appropriate subnet mask (netmask) bit size when you set up the high availability configuration. Provide the number of netmask bits (for example, 24) and not the dotted-decimal netmask value (for example, 255.255.255.0). For more information, see Creating Failover Groups.
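
If you know only the dotted-decimal netmask, the bit size can be derived from it. The following is a minimal sketch; the function name netmask_to_bits is hypothetical, and the function does not verify that the mask bits are contiguous.

```shell
# Convert a dotted-decimal netmask (for example, 255.255.255.0) to its
# bit size (24). Each octet contributes the number of bits set in it.
# Note: this sketch does not check that the set bits are contiguous.
netmask_to_bits() {
    bits=0
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the netmask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        case $octet in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$bits"
}
```

For example, netmask_to_bits 255.255.240.0 prints 20, which is the value to supply as the netmask bit size.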

Solutions to Common Errors

This section provides solutions to the following problems:

Startup failure: could not bind to port

This error occurs when one or more HTTP listeners in the configuration are assigned to a TCP port number that is already in use by another process.

[ERROR:32] startup failure: could not bind to port port (Address already in use)
[ERROR:32] [OTD-10380] http-listener-1: http://host:port: Error creating socket (Address already in use)
[ERROR:32] [OTD-10376] 1 listen sockets could not be created
[ERROR:32] server initialization failed

You can find out the process that is listening on a given port by running the following command:

> netstat -npl | grep :port | grep LISTEN

If the configured HTTP listener port is being used by another process, then either free the port or change it as described in Modifying a Listener.
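
Note that a plain grep for :port also matches longer port numbers (searching for :80 also matches :8080). The following sketch filters the netstat output on the exact port instead. The function name listeners_on_port is hypothetical, and the sketch assumes the usual Linux netstat column layout, in which the fourth column is the local address and the sixth column is the state.

```shell
# Read `netstat -npl` output on standard input and print only the LISTEN
# entries whose local address ends in exactly :PORT.
# Assumption: column 4 is the local address and column 6 is the state.
listeners_on_port() {
    # $1: the port number to look for
    awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$")'
}

# Usage (run as root to see the owning process of every socket):
#   netstat -npl | listeners_on_port 8080
```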

Unable to start server with HTTP listener port 80

This error occurs if you configure an HTTP listener with a port number lower than 1024 (say, 80) and attempt to start the Oracle Traffic Director instance as a non-root user.

The following messages are written to the server log:

[ERROR:32] [OTD-10376] 1 listen sockets could not be created
[ERROR:32] [OTD-10380] http-listener-1: http://soa.example.com:80:
 Error creating socket (No access rights)

Port numbers lower than 1024 are assigned by the Internet Assigned Numbers Authority (IANA) to various services. On Linux, only the root user can bind to these ports by default.

To solve this problem, you can do one of the following:

  • Configure the Oracle Traffic Director listener with a port number of 1024 or higher (say, 8080), and create an IP packet-filtering rule to internally redirect requests received at port 80 to the configured Oracle Traffic Director port, as shown in the following examples:

    # /sbin/iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8080
    # /sbin/iptables -t nat -A PREROUTING -p udp -m udp --dport 80 -j REDIRECT --to-ports 8080
    

    Make sure that the iptables service is started by default when the server restarts by running the chkconfig command, as shown in the following example:

    # chkconfig --level 35 iptables on
    
  • If xinetd is installed in the system, create a file (named otd, for example) in the /etc/xinetd.d/ directory with the following entry:

    service otd
    {
    type = UNLISTED
    disable = no
    socket_type = stream
    protocol = tcp
    user = root
    wait = no
    port = 80
    redirect = 127.0.0.1 8080
    }
    

    This entry redirects all incoming TCP traffic received on port 80 to port 8080 on the local machine.

    For more information, see the Linux xinetd documentation.

Oracle Traffic Director consumes excessive memory at startup

When you start an Oracle Traffic Director instance, the values for certain parameters—maximum number of keep-alive connections, size of the connection queue, and maximum number of connections to origin servers—are assigned automatically based on the system's file descriptor limit.

If the file descriptor limit is very high, the auto-assigned values for undefined parameters can be needlessly high, causing Oracle Traffic Director to consume an excessive amount of memory. To avoid this problem, explicitly configure the maximum number of keep-alive connections (Tuning Keep-Alive Settings), the size of the connection queue (Tuning the Thread Pool and Connection Queue Settings), and the maximum number of connections to individual origin servers (Modifying an Origin Server).

Operating system error: Too many open files in system

This operating system error occurs in Linux when the number of allocated file descriptors reaches the limit for the system.

The following message is written to the server log:

[ERROR:16] [OTD-10546] Insufficient file descriptors for optimum configuration.

To avoid this error, increase the file descriptor limit on Linux from the default of 1024 to a reasonable number. For more information, see Tuning the File Descriptor Limit.
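
Before changing the limit, it can help to see how close the system is to it. On Linux, /proc/sys/fs/file-nr holds three numbers: the allocated file handles, the allocated but unused file handles, and the system-wide maximum. The following is a minimal sketch that summarizes them; the function name fd_usage and its output format are hypothetical.

```shell
# Summarize system-wide file descriptor usage.
# Reads one line of the form "allocated unused max" on standard input,
# which is the format of /proc/sys/fs/file-nr on Linux.
fd_usage() {
    awk '{ printf "allocated=%d max=%d used_pct=%d\n", $1, $3, ($1 * 100) / $3 }'
}

# Usage on Linux:
#   fd_usage < /proc/sys/fs/file-nr   # system-wide usage
#   ulimit -n                         # per-process limit of the current shell
```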

Unable to stop instance after changing the temporary directory

This error occurs when, after changing the temporary directory for a configuration, you deploy the change without stopping the instances, and then attempt to stop the instances later. The temporary directory is the directory (on the administration node) in which the process ID and socket information for the instances of the configuration are stored.

When this error occurs, the following message is written to the server log:

OTD-63585 An error occurred while stopping the server. For details, see the server log.

To Avoid This Error

If you change the temporary directory for a configuration, you should first stop all the instances of the configuration, deploy the changes, and then start the instances.

To Solve This Problem

Kill the Oracle Traffic Director instance, as follows:

  1. Find out the current temporary directory for the configuration by doing one of the following:

    • Run the get-config-prop command, as shown in the following example:

      tadm> get-config-prop --config=soa temp-path
      /tmp/net-test-a46e5844
      
    • Log in to the administration console, select the required configuration, and select Advanced Settings. On the resulting page, look for the Temporary Directory field.

    Note the path to the temporary directory.

  2. Find out the process ID of the running instance by running the following command:

    cat temp_dir/pid
    

    temp_dir is the full path to the temporary directory that you noted in step 1.

    Note the process ID that this command returns.

  3. Kill the process, by running the following command:

    kill pid
    

    pid is the process ID that you noted in step 2.
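
Steps 2 and 3 can be combined, as in the following minimal sketch. The function name kill_from_pid_file is hypothetical, and the sketch assumes that the pid file contains only the process ID, as the cat command in step 2 implies.

```shell
# Stop the instance whose process ID is recorded in TEMP_DIR/pid.
# $1: full path to the temporary directory noted in step 1
kill_from_pid_file() {
    pid_file="$1/pid"
    if [ ! -r "$pid_file" ]; then
        echo "cannot read $pid_file" >&2
        return 1
    fi
    pid=$(cat "$pid_file")
    # Refuse to act on anything that is not a plain positive integer.
    case $pid in
        ''|*[!0-9]*)
            echo "unexpected content in $pid_file: $pid" >&2
            return 1
            ;;
    esac
    kill "$pid"
}
```

For example, with the temporary directory shown in step 1, you would run kill_from_pid_file /tmp/net-test-a46e5844.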

Unable to restart the administration server

On Linux systems, the cron script tmpwatch, located at /etc/cron.daily/tmpwatch, is set to run every day by default. This script removes all files that are older than 240 hours (10 days) from the /tmp directories on the administration server. Hence, if the administration server runs for more than 10 days without being restarted, its default pid file is removed, and the administration server can then no longer be restarted.

To Avoid This Problem

  • Change temp-path location: In the file <otd-home>/admin-server/config/server.xml, change the temp-path value to a location where the server user has exclusive rights. For example, change it to <temp-path>/var/tmp/https-test-1234</temp-path>. In addition, make sure that the new temp-path is not being monitored by the tmpwatch script.

  • Change the cron script: Remove the value 240 /tmp from the cron script for tmpwatch. Use the -X/--exclude-pattern option to exclude a directory from being monitored by tmpwatch. For more information about using this option, see the man page for tmpwatch.
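
To check whether files in a temporary directory are already old enough to be at risk, you can list the files that have not been modified within the tmpwatch window. The following is a minimal sketch; the function name stale_files is hypothetical, it relies on the -mmin test of GNU find, and it looks at the modification time only. tmpwatch may base its decision on a different timestamp (such as the access time), so treat the result as an approximation.

```shell
# List the files under DIR that were last modified more than HOURS hours
# ago (default: 240, the tmpwatch window described in this section).
# Assumption: GNU find, which supports the -mmin test.
stale_files() {
    dir=$1
    hours=${2:-240}
    find "$dir" -type f -mmin +"$((hours * 60))"
}
```

For example, stale_files /tmp lists the files that have not been modified in the last 10 days.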

Oracle Traffic Director does not maintain session stickiness

Oracle Traffic Director can maintain session stickiness as follows:

Cookie Based Session Persistence

This is a common scenario, in which clients accept cookies from web or application servers. In this scenario, Oracle Traffic Director, while load balancing HTTP traffic, ensures session persistence by using its own cookie. This ensures that sticky requests (requests that contain an HTTP session cookie) are routed to the same back-end application server where the session cookie originated.

Oracle Traffic Director 11.1.1.5 must be explicitly configured to honor session persistence when a back-end application server uses an HTTP session cookie other than the default JSESSIONID. Oracle Traffic Director 11.1.1.6, on the other hand, honors session persistence on receiving any cookie from the origin server.

Note:

Oracle Traffic Director needs additional patches within WebLogic 10.3.x to maintain URI based session stickiness.

URI Based Session Persistence

This is not a very common scenario. In this case, cookies are disabled on clients and back-end web or application servers maintain session persistence by appending HTTP session information to the URI.

In this scenario, Oracle Traffic Director can honor session persistence if the back-end application server appends Oracle Traffic Director's JRoute cookie to the URI. Origin servers like WebLogic Server 10.3.6.2 and higher, 12.1 and higher, and GlassFish 2.0 and higher have the ability to append this JRoute cookie to the URI. Hence, Oracle Traffic Director is able to maintain URI based session persistence only with these origin servers.

Contacting Oracle for Support

If you have a service agreement with Oracle, you can contact Oracle Support (http://support.oracle.com) for help with Oracle Traffic Director problems.

Before Contacting Oracle Support

Before contacting Oracle Support, do the following:

  • Try all the appropriate diagnostics and troubleshooting guidelines described in this document (Oracle Traffic Director Administrator's Guide).

  • Check whether the problem you are facing, or a similar problem, has been discussed in the OTN Discussion Forums at http://forums.oracle.com/.

    If the information available on the forum is not sufficient to help you solve the problem, post a question on the forum. Other Oracle Traffic Director users on the forum might respond to your question.

  • To the extent possible, document the sequence of actions you performed just before the problem occurred.

  • Where possible, try to restore the original state of the system, and reproduce the problem using the documented steps. This helps to determine whether the problem is reproducible or an intermittent issue.

  • If the issue can be reproduced, try to narrow down the steps for reproducing the problem. Problems that can be reproduced with small test cases are typically easier to diagnose than problems that need large test cases.

    Narrowing down the steps for reproducing problems enables Oracle Support to provide solutions for potential problems faster.

Information You Should Provide to Oracle Support

When you contact Oracle for support, provide the following information.

  • The release number of Oracle Traffic Director.

    To find out the release number, run the following command:

    > $ORACLE_HOME/bin/tadm --version
    Oracle Traffic Director 11.1.1.7.0 Administration Command Line B01/14/2013 09:08
    
  • A brief description of the problem, including the actions you performed just before the problem occurred.

  • If you need support with using the administration interfaces, the name of the command-line subcommand or the title of the administration-console screen for which you require help.

  • Zip file containing the configuration files for the configuration in which you encountered the error.

    INSTANCE_HOME/admin-server/config-store/config_name/current.zip
    
  • Zip file containing the configuration files for the last error-free configuration.

    INSTANCE_HOME/admin-server/config-store/config_name/backup/date_time.zip
    
  • The latest server and access log files.

    Note:

    When you send files to Oracle Support, remember to provide the MD5 checksum value for each file, so that Oracle Support personnel can verify the integrity of the files before using them for troubleshooting the problem.
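
The checksums mentioned in the note can be generated with the md5sum utility, which is commonly available on Linux. The following is a minimal sketch; the function name write_md5_manifest and the manifest file are hypothetical conveniences and not something that Oracle Support requires.

```shell
# Write one MD5 checksum line per file to MANIFEST, in the format that
# `md5sum -c MANIFEST` can verify later.
# $1: manifest file to create; remaining arguments: the files to checksum
write_md5_manifest() {
    manifest=$1
    shift
    md5sum "$@" > "$manifest"
}
```

For example, write_md5_manifest checksums.md5 current.zip server.log records the checksums of the two files. Running md5sum -c checksums.md5 from the same directory then confirms that the files are unchanged.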