Oracle® Fusion Applications Administrator's Guide
11g Release 1 (11.1.1.5)

Part Number E14496-02

24 Troubleshooting Performance, Scalability, Reliability

This chapter describes performance, scalability, and reliability issues that you might encounter and explains how to solve them.

This chapter includes the following topics:

24.1 Introduction to Troubleshooting Performance, Reliability, and Scalability

This section provides guidelines and a process for using the information in this chapter. Using the following guidelines and process will focus and minimize the time you spend resolving problems.

Guidelines

When using the information in this chapter, Oracle recommends:

Process

Follow the process outlined in Table 24-1 when using the information in this guide. If the information in a particular section does not resolve your problem, proceed to the next step in this process.

Table 24-1 Process for Resolving Performance Issues

Step 1

  Section to Use:

    Chapter 9 and Chapter 10 for locating key metrics

    Chapter 11 for diagnosing Java applications in the middle tier

    Chapter 12 for monitoring and tuning the Oracle database

  Purpose:

    Collect symptoms about the performance problem to determine if the problem is related to the following:

      • Response time or throughput

      • Widespread or limited to specific users and flows

    Determine what changed since the system was last performing well.

Step 2

  Section to Use:

    Section 24.2 through Section 24.10

  Purpose:

    Use Section 24.2 if the problem is widespread. Otherwise, review the problem description in Section 24.3 through Section 24.10 to see if there is a match.

    These sections describe:

      • Possible causes of the problems

      • Solution procedures corresponding to each of the possible causes

Step 4

  Section to Use:

    Section 24.11

  Purpose:

    Use My Oracle Support to get additional troubleshooting information about Oracle Fusion Applications or performance, scalability, and reliability. My Oracle Support provides access to several useful troubleshooting resources, including Knowledge Base articles and Community Forums and Discussions.

Step 5

  Section to Use:

    Section 24.11

  Purpose:

    Log a service request if the information in this chapter and My Oracle Support does not resolve your problem. You can log a service request using My Oracle Support at https://support.oracle.com.


In addition to this process, you can use Automatic Database Diagnostic Monitor (ADDM) reports to determine whether database cache sizes need to be increased. For more information, see the "Automatic Database Performance Monitoring" chapter in the Oracle Database 2 Day + Performance Tuning Guide.

24.2 Overall Slowness

Problem

When many users report slowness across many business flows, you need to determine if the cause is the host, components, heap usage, web sessions, or too many users accessing the application at one time.

Solution

To resolve this problem, perform the following steps:

  1. From Grid Control, check host health.

    For example, you can view the host CPU usage from the Performance Summary page. To access this page from Grid Control:

    1. From the home page, click the Targets tab.

    2. From the Targets tab, click the Hosts secondary tab.

      The Hosts page displays the overall status of all the computers in the environment.

    3. From the Search list, search for a specific host.

    4. Click a specific host name to monitor the performance.

      The Host home page displays.

    5. Click the Performance tab.

  2. Check if any key components are down:

  3. Check heap usage on the EM Performance Summary page.

  4. If the heap is constantly close to 100 percent, search for the string OutOfMemoryError in the Oracle WebLogic Server server_name.out file in the following directories:

    (UNIX) DOMAIN_HOME/servers/server_name/logs
    (Windows) DOMAIN_HOME\servers\server_name\logs
    
  5. If there are OutOfMemoryErrors, a heap dump is generated in the directory specified by the -DHeapDumpPath parameter in the Oracle WebLogic Server startup JVM options. Submit the heap dump to Oracle Support for further analysis of what is retaining memory.

  6. Use Fusion Applications Control to view log messages, and check whether many messages are being logged and many incidents are being raised. See Chapter 13 and Chapter 17 for information about Oracle Fusion Applications log files, and see the "Managing Log Files and Diagnostic Data" chapter in Oracle Fusion Middleware Administrator's Guide for information about Oracle Fusion Middleware log files.

  7. Address the source of the errors and incidents, and verify the log level is set to SEVERE.

  8. Determine the number of web sessions in Fusion Applications Control or Grid Control. This number can fluctuate depending on the flow and heap size. Therefore, monitoring the trend can help you find spikes.

    For Fusion Applications Control:

    • View the number of active sessions for a specific Oracle Fusion application product:

      1. From the navigation pane, expand the product family, then Products, and then select the product.

      2. In the Product home page, locate the Servers section.

      3. View the Active Sessions metric.

    • View the number of active sessions for an Oracle WebLogic Server domain:

      1. From the navigation pane, expand the farm and then WebLogic Domain.

        You can similarly check the active sessions for an Oracle WebLogic Server cluster or Managed Server from the navigation pane.

      2. Select a domain.

      3. In the WebLogic Domain home page, in the table on the left-hand side of the page, view the Active Sessions column.

    For Grid Control:

    1. Click the Targets tab.

    2. Click the Middleware secondary tab.

    3. From the Search list, select Oracle WebLogic Server Domain, and then click Go.

      You can similarly check the active sessions for an Oracle WebLogic Server cluster or Managed Server by selecting Oracle WebLogic Server Cluster or Oracle WebLogic Managed Server.

    4. Click on a domain.

      The WebLogic Server Domain home page displays.

    5. In the table on the left-hand side of the page, view the Active Sessions column.

    6. In the Product home page, locate the Servers section.

    7. View the Active Sessions metric.

  9. If there is a spike in the number of web sessions, find out what is generating the additional load and what tests are being run around that time. By default, the session timeout is set to 15 minutes.

  10. Check data source health and see if it is running out of connections.

    • For Fusion Applications Control, see Section 10.4.1.

    • For Grid Control, check the Server Datasource metrics, as described in Section 10.5.2.

  11. Review current execution stacks for the Oracle WebLogic Server threads. There are several ways to perform this step:

    Oracle WebLogic Server Administration Console:

    1. In the Domain Structure, expand Environment and then Servers.

    2. In the Summary of Servers page, click the server from the table.

    3. In the Settings for server_name page, click the Monitoring tab, then click the Threads subtab.

    4. Click Dump Thread Stacks.

    5. Click Save.

    For Grid Control:

    1. Click the Targets tab.

    2. Click the Middleware secondary tab.

    3. From the Search list, select Oracle WebLogic Managed Server, and then click Go.


    4. Click on a specific server having problems.

      The WebLogic Server home page displays.

    5. From the WebLogic Server menu, choose JVM Diagnostics > Threads > Real-Time Analysis.

    6. In the JVMs section, click a thread in the upper section to show details in the Threads section.

  12. If the threads are not blocked, follow the instructions in Section 11.3 to review the top Java methods and Section 11.4 to review the top SQL using JVM diagnostics.

  13. Extract a JFR recording and review the timing breakdown of slow requests. See Section 24.4.
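
The log search in Step 4 can be scripted when several Managed Servers need to be checked. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes the log is a readable text file at the UNIX location given in Step 4; the function names are hypothetical.

```python
from pathlib import Path


def server_out_path(domain_home, server_name):
    """Build DOMAIN_HOME/servers/server_name/logs/server_name.out."""
    return Path(domain_home) / "servers" / server_name / "logs" / f"{server_name}.out"


def find_oom_lines(log_path):
    """Return (line_number, text) pairs for lines that mention OutOfMemoryError."""
    hits = []
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if "OutOfMemoryError" in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, find_oom_lines(server_out_path(domain_home, "ProductManagementServer_1")) returns an empty list when the log contains no out-of-memory errors. The substring OutOfMemoryError is used so that both the Java class name and its plural form in prose are matched.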

24.3 Stuck Threads

Problem

Stuck threads may result if the server is close to running out of memory. In that case, all requests are expected to slow down. To resolve an out-of-memory issue, see Section 24.5.

If a request is taking longer than 10 minutes, the stuck thread is reported to Oracle WebLogic Server server_name.out in the following directories:

(UNIX) DOMAIN_HOME/servers/server_name/logs
(Windows) DOMAIN_HOME\servers\server_name\logs

For example:

<Mar 4, 2011 7:44:08 AM PST> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "600" seconds working on the request "weblogic.servlet.internal.ServletRequestImpl@18986012[
GET /productManagement/faces/PimDashboardUiShellPage?_afrLoop=1398820150000&_afrWindowMode=0&_adf.ctrl-state=a44e7uxcc_13 HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: fr
UA-CPU: x86
...
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds
. Stack trace:
Thread-164 "[STUCK] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
    jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
    jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
    java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
    java.net.SocketInputStream.read(SocketInputStream.java:107)
...

In this example, the request has been running longer than the configured 600 seconds. Here is the associated stack trace showing the thread is stuck:

Thread-164 "[STUCK] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
java.net.SocketInputStream.read(SocketInputStream.java:107)
...
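
To list every stuck thread reported in a large server_name.out file, the BEA-000337 messages can be extracted with a regular expression. The following Python sketch derives its pattern only from the example message above, so it is an assumption that other Oracle WebLogic Server versions format the message the same way. The function name is hypothetical.

```python
import re

# Pattern based on the BEA-000337 example message shown above.
STUCK_PATTERN = re.compile(
    r"<BEA-000337> <\[STUCK\] ExecuteThread: '(?P<thread>\d+)' "
    r"for queue: '(?P<queue>[^']+)' "
    r'has been busy for "(?P<seconds>\d+)" seconds'
)


def find_stuck_threads(log_text):
    """Return one dict per stuck-thread message found in the log text."""
    return [
        {
            "thread": match.group("thread"),
            "queue": match.group("queue"),
            "seconds": int(match.group("seconds")),
        }
        for match in STUCK_PATTERN.finditer(log_text)
    ]
```

Applied to the example message, the function reports thread 19 on the weblogic.kernel.Default (self-tuning) queue with a busy time of 600 seconds.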

Solution

If the stack shows the thread is waiting for a response from another server, check the status of the other server and see whether it has performance problems before proceeding with this solution.

To determine what the stuck thread was doing prior to becoming stuck, perform the following steps:

  1. Look at the next few log messages in server_name.out for a message indicating an incident has been created. For example:

    <Mar 4, 2011 7:44:10 AM PST> <Alert> <Diagnostics> <BEA-320016> <Creating diagnostic image in DOMAIN_HOME/servers/ProductManagementServer_1/adr/diag/ofm/SCMDomain/ProductManagementServer_1/incident/incdir_394 with a lockout minute period of 1.>

    The above message may not appear after every reported stuck thread. It is printed at most four times an hour. If the message does not appear, manually look for the incident directory by checking the readme file in the subdirectories under the following directories:

    (UNIX) DOMAIN_HOME/servers/server_name/adr/diag/ofm/domain_name/server_name/incident
    (Windows) DOMAIN_HOME\servers\server_name\adr\diag\ofm\domain_name\server_name\incident
    

    The incident directory contains a WLDF diagnostic image, which contains the JFR recording, and a file containing the thread dump.

    For more information about diagnosing incidents, see the "Diagnosing Problems" chapter in the Oracle Fusion Middleware Administrator's Guide.

  2. Review the thread dump to see the call stack of the thread. If the thread is blocked waiting for a lock, check what the thread holding the lock is doing.

  3. If the call stack involves executing JDBC calls, go to Grid Control and check the top activity around that time window to see if there is a session with a matching module and action. See Section 11.4.

  4. Review the JRockit flight recording file JRockitFlightRecorder.jfr for more details. You also need the ECID of the request, which is recorded in the readme.txt file of the incident directory and in the Oracle WebLogic Server log.

  5. Perform the tasks in Section 24.4.
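
When the BEA-320016 message does not appear, the manual search of the incident directory in Step 1 can be sketched as follows. This Python example only lists the incident subdirectories, newest first, with any readme files they contain. It does not parse the readme contents, because their layout is not described here. The function name is hypothetical.

```python
from pathlib import Path


def list_incidents(incident_root):
    """Return (directory_name, readme_file_names) pairs, newest directory first."""
    incidents = []
    for entry in Path(incident_root).iterdir():
        if not entry.is_dir():
            continue
        # Collect files whose names start with "readme", for example readme.txt.
        readmes = sorted(
            item.name
            for item in entry.iterdir()
            if item.is_file() and item.name.lower().startswith("readme")
        )
        incidents.append((entry.stat().st_mtime, entry.name, readmes))
    # Sort by modification time so the most recent incident comes first.
    incidents.sort(reverse=True)
    return [(name, readmes) for _, name, readmes in incidents]
```

Pass the incident directory shown in Step 1 as incident_root, then open the readme file of the incident whose time is closest to the stuck thread message.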

24.4 View Detailed Timing Of a Request Using a JRockit Flight Recorder (JFR) File

Problem

Certain requests are slow and you need to find out where the time is spent.

Solution

The JRockit Flight Recorder (JFR) file contains a record of various events that consume time, and can be used to help understand why a request is taking time.

To resolve this problem, create a JFR file:

  1. Extract a JFR file from an Oracle WebLogic Server instance by running the following command:

    (UNIX) JROCKIT_HOME/bin/jrcmd jrockit_pid dump_flightrecording recording=1 copy_to_file=path compress_copy=true
    (Windows) JROCKIT_HOME\bin\jrcmd.exe jrockit_pid dump_flightrecording recording=1 copy_to_file=path compress_copy=true
    

    See the "Running Diagnostic Commands" chapter in the Oracle JRockit JDK Tools Guide for more information about the jrcmd command-line tool.

  2. To view the file, start the JRockit Mission Control Client from the following directories:

    (UNIX) JAVA_HOME/bin/jrmc
    (Windows) JAVA_HOME\bin\jrmc.exe
    
  3. Choose File > Open File to select the JFR file.

  4. Locate the slowest requests or investigate a specific request:

    To locate the slowest requests:

    1. In the JRockitFlightRecorder.jfr page, click the Events icon.

    2. Click the Log tab at the bottom of the page.

    3. In the Event Type navigation pane on the left, locate Dynamic Monitoring System and then HttpRequest.

    4. Click HTTP request; de-select all the other event types.

    5. In the Log tab, in the Event Log section, click the Duration column to sort the duration in descending order.

      Each row corresponds to an HTTP request and the Duration column shows the response time for that request.

    6. Click the row in the table to view the attributes of the requests.

    7. In the Event Attributes section, note the start time and the thread that serviced the request.

    To investigate a specific request:

    1. Find the Execution Context Identifier (ECID) of that request.

      If the request is related to an incident triggered by a STUCK thread, the incident readme.txt file will contain the ECID.

      Alternatively, you can search the Oracle WebLogic Server HTTP access.log for requests from specific users. See the "Viewing and Searching Log Files" section in the Oracle Fusion Middleware Administrator's Guide.

    2. In the JRockit Mission Control Client, in the JRockitFlightRecorder.jfr page, choose the WebLogic icon, and then

      If the WebLogic icon is not available, choose Help > Install Plugins to download the Oracle WebLogic Server plug-in.

    3. Click the ECIDs tab at the bottom of the page.

    4. In the ECIDs section, from the Filter Column list, select ECID.

    5. Enter the ECID in the search box and press Enter.

    6. In the results table, highlight the row with the matching ECID and right-click to bring up the menu.

    7. Choose Operative Set > Clear, and then Operative Set > Add matching ECID > ECID to add the ECID to the operative set.

      This enables users to view only events associated with the operative set.

    8. Click the Events icon.

    9. In the Event Type navigation pane on the left, locate Dynamic Monitoring System and then HttpRequest.

    10. Click HTTP request; de-select all the other event types.

    11. In the Event Log section, click Show Only Operative Set.

      Each row corresponds to the request with the matching ECID.

    12. Click the row in the table to view the attributes of the requests.

    13. Note the start time and the thread that serviced the request.


  5. Once the start time and the thread that serviced the request are identified, in the Log tab, drag the time selector at the top of the screen to include only the time window for the duration of the request.

  6. In the Event Log section, perform the following search:

    1. Deselect Show Only Operative Set.

    2. Enter the thread name in the search box.

    3. From the Filter Column list, select Thread.

    4. Press Enter.

  7. In the Event Type navigation pane on the left, click the events of interest. Typically, these events are located under nodes Dynamic Monitoring System, Java Application, and WebLogic > JDBC.

    The selected events appear in the table in the Event Log section.

  8. Click the Start Time column to sort by the time when these events occur, or click the Duration column to view the events that took longest.

    The JDBC Statement Execute events correspond to SQL execution. If there are slow SQL statements, the event details give the SQL text. These events do not have call stacks.

  9. To see the call stack for slow SQL statements, view the Socket Read event that happens right after the JDBC Statement Execute event.

    This event corresponds to Oracle WebLogic Server waiting for the SQL results to return, and it has callstack in the event details.

  10. Review the callstacks for long Java Blocked and Java Wait events to see if the cause can be identified. See the "Analyzing Flight Recorder Data in JRockit Mission Control" section in the Oracle Fusion Middleware Configuring and Using the Diagnostics Framework for Oracle WebLogic Server.

  11. If more details are needed to compare with what is captured in the default recording, and the user can reproduce the slowness, start an explicit recording. See the "Starting an Explicit Recording" section in the Oracle JRockit Flight Recorder Run Time Guide.
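
If the extraction in Step 1 needs to run repeatedly, for example on several Managed Servers, the jrcmd call can be wrapped in a script. The following Python sketch builds the same command line shown in Step 1 for UNIX. The function names and the way the process ID is obtained are assumptions, and the script must run as a user that is allowed to attach to the JRockit process.

```python
import subprocess
from pathlib import Path


def build_dump_command(jrockit_home, pid, output_file, recording=1):
    """Build the jrcmd dump_flightrecording command line from Step 1."""
    jrcmd = Path(jrockit_home) / "bin" / "jrcmd"
    return [
        str(jrcmd),
        str(pid),
        "dump_flightrecording",
        f"recording={recording}",
        f"copy_to_file={output_file}",
        "compress_copy=true",
    ]


def dump_flight_recording(jrockit_home, pid, output_file):
    """Run jrcmd and return its standard output; raises if jrcmd fails."""
    result = subprocess.run(
        build_dump_command(jrockit_home, pid, output_file),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect jrcmd output instead of printing it
        text=True,
    )
    return result.stdout
```

Keeping the command construction separate from the execution makes it possible to log or review the exact command before it is run against a production server.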

24.5 Memory Leaks and Heap Usage Pressure

Problem

Application performance degrades over time, heap usage and garbage collection activity increase over time, and sometimes OutOfMemoryErrors are seen. There could be memory leaks in the application, which cause the amount of free memory in the JVM to decrease continuously.

Solution

To solve this problem, perform the following:

  1. Review the server_name.out file for OutOfMemoryErrors, which indicate that a heap dump file has been written. The server_name.out file is located in the following directories:

    (UNIX) DOMAIN_HOME/servers/server_name/logs
    (Windows) DOMAIN_HOME\servers\server_name\logs
    
  2. Restart the Managed Server.

    See the following documentation resources to learn more about other methods for starting and stopping the Managed Servers:

    If the problem persists, proceed to Step 3.

  3. Open the heap dump file with a heap-dump analysis tool that can handle binary HPROF format, such as Eclipse Memory Analyzer.

  4. Review what objects and classes are retaining most memory. Send the heap dump file to Oracle Support for further analysis.

  5. Sometimes it may be necessary to take several heap dumps to see what objects or classes are consuming and increasing the amount of memory.

    To take heap dumps on demand, use the jrcmd command-line tool. See the "Running Diagnostic Commands" chapter in the Oracle JRockit JDK Tools Guide. Many heap dump analysis tools, such as Eclipse Memory Analyzer, enable you to compare two heap dumps to identify memory growth areas.

    Heap dumps provide information on why memory is retained. Sometimes it is necessary to know how memory is allocated to further resolve the issue. For these cases, proceed to Step 6.

  6. Use the JRockit Memory Leak Detector tool that is part of JRockit Mission Control Client to understand how memory is allocated.

    For more information, see the JRockit Mission Control online help.

24.6 Connection Usage

Problem

The connection usage on the Oracle Database is high, or there is an Oracle process on the database host consuming a high amount of CPU.

Solution

To find the source of the connection causing the high CPU usage on the database:

  1. Oracle Fusion Applications sets values on a number of v$session attributes to indicate how the connection is being used. When looking at a connection consuming high CPU on the database, or when trying to understand which connections are used for which processes, inspect the value of these attributes as follows:

    Attribute in v$session: Value Being Set

    Process: Data Source Name (for example, ApplicationDB)

    Program: Oracle WebLogic Server Domain plus the Managed Server name, prefixed by DS (for example, DS/FinancialDomain/AccountsReceivableServer_1)

    Module:

      • Oracle Application Development Framework: ADF BC application module name

      • Oracle Enterprise Scheduler:

        • Java job type: Class name, except oracle.apps

        • PLSQL: the package and procedure name (for example, mypkg.myproc)

        • Other jobs: Static: Executable name

      • Oracle BI Publisher: Name of the report

    Action:

      • Oracle Application Development Framework: jspx name

      • Oracle Enterprise Scheduler: Job definition name

      • Oracle BI Publisher, if request is submitted:

        • Oracle Enterprise Scheduler: Oracle Enterprise Scheduler job definition name

        • Oracle BI Publisher Scheduler Job: Oracle BI Publisher job name submitted by the user

        • Oracle BI Publisher online: Static string BIP:Online

        • Oracle BI Publisher Web services: Name of the web services

    Client_Identifier: Application User Name

  2. If error messages about connection pool capacity being reached also appear in the Oracle WebLogic Server logs, use the solution for connection leaks described in Section 24.7.
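
Once the v$session rows have been retrieved, the Program attribute can be split to see which Managed Server owns each connection. The following Python sketch assumes the rows were exported as dictionaries with a "program" key, and that the value follows the DS/domain/server form shown in the example above. Both function names are hypothetical.

```python
from collections import Counter


def parse_program(program):
    """Split a value such as DS/FinancialDomain/AccountsReceivableServer_1."""
    parts = program.split("/")
    # Only values with the DS prefix and exactly two further parts are recognized.
    if len(parts) != 3 or parts[0] != "DS":
        return None
    return {"domain": parts[1], "server": parts[2]}


def count_by_server(rows):
    """Count sessions per (domain, server) pair; unrecognized programs are skipped."""
    counts = Counter()
    for row in rows:
        parsed = parse_program(row.get("program") or "")
        if parsed is not None:
            counts[(parsed["domain"], parsed["server"])] += 1
    return counts
```

Sessions opened by other clients have a Program value without the DS prefix and are ignored, so the counts reflect only connections from Oracle WebLogic Server data sources.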

24.7 Connection Leaks

Problem

There are errors in the log, and the error messages indicate that the connection pool size has been reached.

Solution

To resolve this problem:

  1. When the data source is at maximum capacity and there are errors during connection reservation requests, there may be connection leaks in the code.

  2. Enable JDBC profiling from the Oracle WebLogic Server Administration Console:

    1. In the Domain Structure, expand Services and then Data Sources.

    2. Click the data source that needs to be profiled, for example, ApplicationDB.

    3. In the Settings page, click the Configuration tab, then click the Diagnostics subtab.

    4. Check the profiles that need to be collected (PROFILE_TYPE_CONN_USAGE_STR).

    5. Click Save.

  3. Configure the diagnostic archive where the profiling data is saved from the Oracle WebLogic Server Administration Console:

    1. In the Domain Structure, expand Services, Diagnostics, and then Archives.

    2. Click the server where you want to make changes (archives are stored for each server).

    3. In the Settings page, you can change the archive location, its size, and how data is retired.

    4. Check the profiles that need to be collected (PROFILE_TYPE_CONN_USAGE_STR).

    5. Click Save.

  4. To retrieve profiling data, use the sample code (http://download.oracle.com/docs/cd/E15051_01/wls/docs103/wldf_configuring/access_diag_data.html#wp1100898), with changes to the URL, username and password in the initialize method.

  5. Run the sample code as a standalone program.

  6. The program will capture the stack trace for each request for a connection from that data source. Inspect the callers to see the suspicious stack. This sample program requires connecting to a live Oracle WebLogic Server instance.

    The diagnostic archive file under the archive location can also be provided to Oracle Support for further analysis.

    Oracle WebLogic Server does not report a leak unless the inactive connection timeout setting of the connection pool is set to a positive value. This cannot be done for Oracle Fusion Applications, because it breaks functionality.

24.8 Slow Requests Using SQL trace

Problem

A user reports that a specific operation is slow and the slowness is reproducible. Slow database operations are suspected, but the top activity reports did not provide sufficient information to resolve the problem.

Solution

To resolve this problem:

  1. Enable SQL trace for the user session. See Section 12.2.4.6.

  2. Ask the user to rerun the problematic flow, then collect and review the SQL trace files.

24.9 Excessive Activation

Problem

When response time suddenly increases with rising user count, even though there is no memory pressure, it is possible that the reference pool size for key application modules needs to be increased. If there is a JFR recording to review, and you observe many events containing callstacks containing the activateState method, you should also try adjusting the reference pool size.

Solution

To adjust the reference pool size from Fusion Applications Control:

  1. Review the number of web sessions from Performance Summary pages:

    For Fusion Applications Control:

    1. From the navigation pane, expand the farm and then Application Deployments.

    2. From the Application Deployments page, select the application.

    3. From the Application Deployments menu, choose ADF > ADF Performance.

      The ADF Performance page displays.

    4. Click the Application Module Pools tab.

    5. Sort the requests in descending order.

    6. For the top 10 or so application modules, click the application module name to view the Activations count.

    For Grid Control:

    1. Click the Targets tab.

    2. Click the Middleware secondary tab.

    3. From the Search list, select Oracle WebLogic Server Domain, and then click Go.

    4. Click on a domain.

      The WebLogic Server Domain home page displays.

    5. In the table on the right-hand side of the page, expand the Application Deployments node.

    6. Click the target application.

      The Application Deployment page displays.

    7. From the Application Deployments menu, choose ADF > ADF Performance.

      The ADF Performance page displays.

    8. In the Application Module Pools table, from the View list, select Total Requests, and once selected, from the Total Requests column, click Sort Descending.

    9. For the top 10 or so application modules, click to see the details of each one.

    10. After selecting an application module, on the Requests graph, from the Select metric to display in chart list, select Passivation and Activation to add to the graph.

      If the activation count is close to the passivation count and is constantly above 0, follow Step 2 to adjust the reference pool size.

  2. If the activation count constantly increases, increase the application module reference pool size from Fusion Applications Control:

    1. From the Application Deployments menu, choose ADF > Configure ADF Business Components.

      The ADF Configuration BC Configurations page displays.

    2. From the Application Modules section, click the application module of interest. On the left-hand side, select the local configuration by selecting a name that ends in Local.

    3. Click the Pooling and Scalability tab, and change the Reference Pool Size parameter.
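
The check described at the end of Step 1 can be applied to metric values exported from the Passivation and Activation graph. The following Python sketch is one possible reading of that rule, not an Oracle-defined formula. The closeness threshold of 0.8 and the function name are assumptions; tune the threshold for your own environment.

```python
def activation_is_sustained(samples, closeness=0.8):
    """Check (activations, passivations) samples against the Step 1 rule.

    Returns True only if, in every sample, the activation count is above 0
    and the smaller of the two counts is at least `closeness` times the larger.
    """
    if not samples:
        return False
    for activations, passivations in samples:
        if activations <= 0:
            return False
        larger = max(activations, passivations)
        smaller = min(activations, passivations)
        # larger is always above 0 here because activations is above 0.
        if smaller / larger < closeness:
            return False
    return True
```

A True result for one of the top application modules suggests reviewing its Reference Pool Size as described in Step 2. A single interval with no activations returns False, which matches the requirement that the count be constantly above 0.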

24.10 Slow Oracle Enterprise Scheduler Jobs of SQL Type

Problem

When the user submits a SQL job type, the job remains in a RUNNING state for too long.

Solution

To resolve this problem, perform the following steps:

  1. Use Fusion Applications Control to find the database session ID that was used to process the job:

    1. Search for the request, as described in Section 5.7.2.

    2. On the Request Details page, in the Request Properties section, next to the Execution Type field, click the eyeglasses icon.

      The Spawned Process Details dialog displays, showing the database session ID that was used to process the job.

    3. Take note of the value in the Session Id field, and then click OK.

  2. Use Grid Control to ensure the request processor and request dispatcher are running:

    1. Run an Active Session History (ASH) report for the session within the relevant time window to inspect top SQLs and top wait events. See the "Resolving Transient Performance Problems" section in the Oracle Database 2 Day + Performance Tuning Guide.

    2. Identify time-consuming SQL statements and tune them following normal SQL tuning procedures. See Section 11.4.

24.11 Using My Oracle Support for Additional Troubleshooting Information

You can use My Oracle Support (formerly MetaLink) to help resolve Oracle Fusion Applications problems. My Oracle Support contains several useful troubleshooting resources, such as:

Note:

You can also use My Oracle Support to log a service request.

You can access My Oracle Support at https://support.oracle.com.