Oracle® Fusion Applications Administrator's Guide 11g Release 1 (11.1.1.5) Part Number E14496-02 |
This chapter describes performance, scalability, and reliability issues that you might encounter and explains how to solve them.
This chapter includes the following topics:
Introduction to Troubleshooting Performance, Reliability, and Scalability
View Detailed Timing Of a Request Using a JRockit Flight Recorder (JFR) File
Using My Oracle Support for Additional Troubleshooting Information
This section provides guidelines and a process for using the information in this chapter. Using the following guidelines and process will focus and minimize the time you spend resolving problems.
Guidelines
When using the information in this chapter, Oracle recommends:
After performing any of the solution procedures in this chapter, immediately retrying the failed task that led you to this troubleshooting information. If the task still fails when you retry it, perform a different solution procedure in this chapter and then try the failed task again. Repeat this process until you resolve the problem.
Making notes about the solution procedures you perform, symptoms you see, and data you collect while troubleshooting. If you cannot resolve the problem using the information in this chapter and you must log a service request, the notes you make will expedite the process of solving the problem.
Process
Follow the process outlined in Table 24-1 when using the information in this guide. If the information in a particular section does not resolve your problem, proceed to the next step in this process.
Table 24-1 Process for Resolving Performance Issues
Step | Section to Use | Purpose |
---|---|---|
1 | Chapter 9 and Chapter 10 for locating key metrics; Chapter 11 for diagnosing Java applications in the middle tier; Chapter 12 for monitoring and tuning the Oracle database | Collect symptoms about the performance problem to narrow down what it is related to. Determine what changed since the system was last performing well. |
2 | Section 24.2 through Section 24.10 | Use Section 24.2 if the problem is widespread. Otherwise, review the problem descriptions in Section 24.3 through Section 24.10 to see if there is a match. |
4 | | Use My Oracle Support to get additional troubleshooting information about Oracle Fusion Applications or performance, scalability, and reliability. My Oracle Support provides access to several useful troubleshooting resources, including Knowledge Base articles and Community Forums and Discussions. |
5 | | Log a service request if the information in this chapter and My Oracle Support does not resolve your problem. You can log a service request using My Oracle Support. |
In addition to this process, to determine whether database cache sizes need to be increased, use Automatic Database Diagnostic Monitor (ADDM) reports. For more information, see the "Automatic Database Performance Monitoring" chapter in the Oracle Database 2 Day + Performance Tuning Guide.
Problem
When many users report slowness across many business flows, you need to determine if the cause is the host, components, heap usage, web sessions, or too many users accessing the application at one time.
Solution
To resolve this problem, perform the following steps:
From Grid Control, check host health.
For example, you can view the host CPU usage from the Performance Summary page. To access this page from Grid Control:
From the home page, click the Targets tab.
From the Targets tab, click the Hosts secondary tab.
The Hosts page displays the overall status of all the computers in the environment.
From the Search list, search for a specific host.
Click a specific host name to monitor the performance.
The Host home page displays.
Click the Performance tab.
Check if any key components are down:
For Fusion Applications Control, see Section 10.2.1 and Section 10.2.2.
For Grid Control, see Section 2.7.
Check heap usage in the EM Performance Summary page.
For Fusion Applications Control, see "Monitoring the Oracle Fusion Applications Middle Tier" chapter in the Oracle Fusion Middleware Administrator's Guide and Table 10-6.
For Grid Control, see Table 10-9.
If the heap is constantly close to 100 percent, then search for OutOfMemoryError messages in the Oracle WebLogic Server server_name.out file in the following directories:
(UNIX) DOMAIN_HOME/servers/server_name/logs
(Windows) DOMAIN_HOME\servers\server_name\logs
If there are OutOfMemoryError messages, a heap dump would have been generated in the directory specified by the -DHeapDumpPath parameter from the Oracle WebLogic Server startup JVM option. Submit the heap dump to Oracle Support for further analysis of what is retaining memory.
Use Fusion Applications Control to view log messages, and see if a large number of messages are being logged and incidents are being raised. See Chapter 13 and Chapter 17 for information about Oracle Fusion Applications log files, and see the "Managing Log Files and Diagnostic Data" chapter in the Oracle Fusion Middleware Administrator's Guide for information about Oracle Fusion Middleware log files.
Address the source of the errors and incidents, and verify the log level is set to SEVERE.
Determine the number of web sessions in Fusion Applications Control or Grid Control. This number can fluctuate depending on the flow and heap size. Therefore, monitoring the trend can help you find spikes.
For Fusion Applications Control:
View the number of active sessions for a specific Oracle Fusion application product:
From the navigation pane, expand the product family, then Products, and then select the product.
In the Product home page, locate the Servers section.
View the Active Sessions metric.
View the number of active sessions for an Oracle WebLogic Server domain:
From the navigation pane, expand the farm and then WebLogic Domain.
You can similarly check the active sessions for an Oracle WebLogic Server cluster or Managed Server from the navigation pane.
Select a domain.
In the WebLogic Domain home page, in the table on the left-hand side of the page, view the Active Sessions column.
For Grid Control:
Click the Targets tab.
Click the Middleware secondary tab.
From the Search list, select Oracle WebLogic Server Domain, and then click Go.
You can similarly check the active sessions for an Oracle WebLogic Server cluster or Managed Server by selecting Oracle WebLogic Server Cluster or Oracle WebLogic Managed Server.
Click on a domain.
The WebLogic Server Domain home page displays.
In the table on the left-hand side of the page, view the Active Sessions column.
In the Product home page, locate the Servers section.
View the Active Sessions metric.
If there is a spike in the number of web sessions, find out what is generating the additional load and what tests are being run around that time. By default, the session timeout is set to 15 minutes.
Check data source health and see if the data source is running out of connections.
For Fusion Applications Control, see Section 10.4.1.
For Grid Control, check the Server Datasource metrics, as described in Section 10.5.2.
Review current execution stacks for the Oracle WebLogic Server threads. There are several ways to perform this step:
Oracle WebLogic Server Administration Console:
In the Domain Structure, expand Environment and then Servers.
In the Summary of Servers page, click the server from the table.
In the Settings for server_name page, click the Monitoring tab, then click the Threads subtab.
Click Dump Thread Stacks.
Click Save.
For Grid Control:
Click the Targets tab.
Click the Middleware secondary tab.
From the Search list, select Oracle WebLogic Managed Server, and then click Go.
You can similarly check the active sessions for an Oracle WebLogic Server cluster or Managed Server by selecting Oracle WebLogic Server Cluster or Oracle WebLogic Managed Server.
Click on a specific server having problems.
The WebLogic Server home page displays.
From the WebLogic Server menu, choose JVM Diagnostics > Threads > Real-Time Analysis.
In the JVMs section, click a thread in the upper section to show its details in the Threads section.
If the threads are not blocked, follow the instructions in Section 11.3 to review the top Java methods and Section 11.4 to review the top SQL using JVM diagnostics.
Extract a JFR recording and review the timing breakdown of slow requests. See Section 24.4.
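The log check in the heap usage step above can be sketched in Python. This is a minimal illustration and not an Oracle-supplied tool: the function names, the example domain path, and the sample log text are assumptions made for this sketch. Only the DOMAIN_HOME/servers/server_name/logs location and the server_name.out file name come from this section.

```python
from pathlib import PurePosixPath


def server_out_log(domain_home: str, server_name: str) -> PurePosixPath:
    """Build the UNIX path of the server_name.out file described above."""
    return PurePosixPath(domain_home) / "servers" / server_name / "logs" / f"{server_name}.out"


def find_oom_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that mention OutOfMemoryError."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if "OutOfMemoryError" in line:
            hits.append((number, line.strip()))
    return hits


# Hypothetical log excerpt, invented for illustration only.
SAMPLE_LOG = """\
<Mar 4, 2011 7:40:01 AM PST> <Notice> <WebLogicServer> <Server started>
java.lang.OutOfMemoryError: Java heap space
<Mar 4, 2011 7:41:13 AM PST> <Info> <Health> <Free memory is low>
"""

if __name__ == "__main__":
    # The domain path and server name below are example values.
    print(server_out_log("/u01/domains/SCMDomain", "ProductManagementServer_1"))
    for number, line in find_oom_lines(SAMPLE_LOG):
        print(number, line)
```

In practice you would read the real server_name.out file and pass its contents to find_oom_lines; a non-empty result suggests looking for a heap dump as described in the heap usage step.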
Problem
Stuck threads may result if the server is close to running out of memory. In that case, all requests should slow down. To resolve an out-of-memory issue, see Section 24.5.
If a request is taking longer than 10 minutes, the stuck thread is reported to the Oracle WebLogic Server server_name.out file in the following directories:
(UNIX) DOMAIN_HOME/servers/server_name/logs
(Windows) DOMAIN_HOME\servers\server_name\logs
For example:
<Mar 4, 2011 7:44:08 AM PST> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "600" seconds working on the request "weblogic.servlet.internal.ServletRequestImpl@18986012[
GET /productManagement/faces/PimDashboardUiShellPage?_afrLoop=1398820150000&_afrWindowMode=0&_adf.ctrl-state=a44e7uxcc_13 HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: fr
UA-CPU: x86
...
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds . Stack trace:
Thread-164 "[STUCK] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
    jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
    jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
    java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
    java.net.SocketInputStream.read(SocketInputStream.java:107)
    ...
In this example, the request has been running longer than the configured 600 seconds. Here is the associated stack trace showing the thread is stuck:
Thread-164 "[STUCK] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
    jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
    jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
    java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
    java.net.SocketInputStream.read(SocketInputStream.java:107)
    ...
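When a log contains many stuck-thread reports, it can help to pull the key fields out of each BEA-000337 message. The following Python sketch assumes the message wording matches the example above; the function name, the dictionary keys, and the abridged sample string are illustrative choices, not part of any Oracle tool.

```python
import re
from typing import Optional

# Pattern based on the wording of the BEA-000337 example message above.
STUCK_RE = re.compile(
    r"\[STUCK\] ExecuteThread: '(?P<thread>\d+)' for queue: '(?P<queue>[^']+)' "
    r'has been busy for "(?P<busy>\d+)" seconds'
    r'.*?\(StuckThreadMaxTime\) of "(?P<limit>\d+)" seconds',
    re.DOTALL,
)


def parse_stuck_thread(message: str) -> Optional[dict]:
    """Extract thread number, queue, busy time, and configured limit."""
    match = STUCK_RE.search(message)
    if match is None:
        return None
    return {
        "thread": int(match["thread"]),
        "queue": match["queue"],
        "busy_seconds": int(match["busy"]),
        "limit_seconds": int(match["limit"]),
    }


# Abridged copy of the example message; the request details are elided.
SAMPLE = (
    "<Mar 4, 2011 7:44:08 AM PST> <Error> <WebLogicServer> <BEA-000337> "
    "<[STUCK] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)' "
    'has been busy for "600" seconds working on the request "...", '
    'which is more than the configured time (StuckThreadMaxTime) of "600" seconds .'
)

if __name__ == "__main__":
    print(parse_stuck_thread(SAMPLE))
```

The thread number is useful later when filtering the thread dump or the JFR event log for the same thread.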
Solution
If the stack shows the thread is waiting for a response from another server, check the status of the other server and see whether it has performance problems before proceeding with this solution.
To determine what the stuck thread was doing prior to becoming stuck, perform the following steps:
Look at the next few log messages in server_name.out for a message indicating an incident has been created. For example:
<Mar 4, 2011 7:44:10 AM PST> <Alert> <Diagnostics> <BEA-320016> <Creating diagnostic image in DOMAIN_HOME/servers/ProductManagementServer_1/adr/diag/ofm/SCMDomain/ProductManagementServer_1/incident/incdir_394 with a lockout minute period of 1.>
The above message may not appear after every reported stuck thread, because it is printed at most four times an hour. If the message does not appear, manually look for the incident directory by checking the readme file in the subdirectories under the following directories:
(UNIX) DOMAIN_HOME/servers/server_name/adr/diag/ofm/domain_name/server_name/incident
(Windows) DOMAIN_HOME\servers\server_name\adr\diag\ofm\domain_name\server_name\incident
The incident directory contains a WLDF diagnostic image, which contains the JFR recording, and a file containing the thread dump.
For more information about diagnosing incidents, see the "Diagnosing Problems" chapter in the Oracle Fusion Middleware Administrator's Guide.
Review the thread dump to see the call stack of the thread. If the thread is blocked waiting for a lock, check what the thread holding the lock is doing.
If the call stack involves executing JDBC calls, go to Grid Control, check the top activity around that time window, and see if there is a session with a matching module and action. See Section 11.4.
Review the JRockit flight recording file JRockitFlightRecorder.jfr for more details. You will also need the ECID of the request, which is recorded in the readme.txt file of the incident directory and in the Oracle WebLogic Server log.
Perform the tasks in Section 24.4.
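The manual search for incident directories described above can be sketched as a short Python helper. It assumes each incident subdirectory holds a file named readme.txt, as mentioned for the ECID lookup; the function name and the temporary directory layout in the demo are invented for illustration.

```python
import tempfile
from pathlib import Path


def find_incident_dirs(incident_root: Path) -> list[Path]:
    """Return incident subdirectories that contain a readme.txt file, sorted by name."""
    found = []
    for child in sorted(incident_root.iterdir()):
        if child.is_dir() and (child / "readme.txt").is_file():
            found.append(child)
    return found


if __name__ == "__main__":
    # Demo on a throwaway directory tree; a real run would point at
    # DOMAIN_HOME/servers/server_name/adr/diag/ofm/domain_name/server_name/incident.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "incdir_394").mkdir()
        (root / "incdir_394" / "readme.txt").write_text("example incident\n")
        (root / "incdir_395").mkdir()  # no readme.txt, so it is skipped
        for incident in find_incident_dirs(root):
            print(incident.name)
```

Each returned directory can then be opened to read its readme file and to locate the diagnostic image and thread dump.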
Problem
Certain requests are slow, and you need to find out where the time is spent.
Solution
The JRockit Flight Recorder (JFR) file contains a record of various events that consume time, and it can help you understand why a request is taking a long time.
To resolve this problem, create a JFR file:
Extract a JFR file from an Oracle WebLogic Server server by running the following command:
(UNIX) JROCKIT_HOME/bin/jrcmd jrockit_pid dump_flightrecording recording=1 copy_to_file=path compress_copy=true
(Windows) JROCKIT_HOME\bin\jrcmd.exe jrockit_pid dump_flightrecording recording=1 copy_to_file=path compress_copy=true
See the "Running Diagnostic Commands" chapter in the Oracle JRockit JDK Tools Guide for more information about the jrcmd command-line tool.
To view the file, start the JRockit Mission Control Client from the following directories:
(UNIX) JAVA_HOME/bin/jrmc
(Windows) JAVA_HOME\bin\jrmc.exe
Choose File > Open File to select the JFR file.
Locate the slowest requests or investigate a specific request:
To locate the slowest requests: | To investigate a specific request: |
---|---|
|
|
Once the start time and the thread that serviced the request are identified, in the Logs tab, drag the time selector at the top of the screen to include only the time window for the duration of the request.
In the Event Log section, perform the following search:
Deselect Show Only Operative Set.
Enter the thread name in the search box.
From the Filter Column list, select Thread.
Press Enter.
In the Event Type navigation pane on the left, click the events of interest. Typically, these events are located under nodes Dynamic Monitoring System, Java Application, and WebLogic > JDBC.
The selected events appear in the table in the Event Log section.
Click the Start Time column to sort by the time when these events occur, or click the Duration column to view the events that took longest.
The JDBC Statement Execute events correspond to SQL execution. If there are slow SQL statements, the event details give the SQL text. These events do not have callstacks.
To see the callstack for a slow SQL statement, view the Socket Read event that happens right after the JDBC Statement Execute event.
This event corresponds to Oracle WebLogic Server waiting for the SQL results to return, and it has a callstack in the event details.
Review the callstacks for long Java Blocked and Java Wait events to see if the cause can be identified. See the "Analyzing Flight Recorder Data in JRockit Mission Control" section in the Oracle Fusion Middleware Configuring and Using the Diagnostics Framework for Oracle WebLogic Server.
If more details are needed to compare with what is captured in the default recording, and the user can reproduce the slowness, start an explicit recording. See the "Starting an Explicit Recording" section in the Oracle JRockit Flight Recorder Run Time Guide.
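If you script the extraction step, the jrcmd invocation shown earlier in this section can be assembled programmatically. The Python sketch below only builds the UNIX argument list and does not run it; the function name and the example values for JROCKIT_HOME, the process ID, and the output path are assumptions. The arguments themselves are the ones given in the command above.

```python
from pathlib import PurePosixPath


def build_jfr_dump_command(jrockit_home: str, jrockit_pid: int,
                           copy_to_file: str, recording: int = 1) -> list[str]:
    """Assemble the jrcmd dump_flightrecording command from this section."""
    jrcmd = PurePosixPath(jrockit_home) / "bin" / "jrcmd"
    return [
        str(jrcmd),
        str(jrockit_pid),
        "dump_flightrecording",
        f"recording={recording}",
        f"copy_to_file={copy_to_file}",
        "compress_copy=true",
    ]


if __name__ == "__main__":
    # Example values only; substitute the real JRockit home, PID, and path.
    command = build_jfr_dump_command("/opt/jrockit", 12345, "/tmp/server1.jfr.gz")
    print(" ".join(command))
```

Returning a list rather than a single string keeps the arguments separate, which is the safer form to hand to a process launcher such as subprocess.run.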
Problem
Application performance degrades over time, heap usage and garbage collection activity increase over time, and sometimes OutOfMemoryError messages are seen. There could be memory leaks in the application, which cause the amount of free memory in the JVM to continuously decrease.
Solution
To solve this problem, perform the following:
Review the server_name.out file for OutOfMemoryError messages, which indicate that a heap dump file has been written. The server_name.out file is located in the following directories:
(UNIX) DOMAIN_HOME/servers/server_name/logs
(Windows) DOMAIN_HOME\servers\server_name\logs
Restart the Managed Server.
See the following documentation resources to learn more about other methods for starting and stopping the Managed Servers:
"Starting Managed Servers with a Startup Script" section in Oracle Fusion Middleware Managing Server Startup and Shutdown for Oracle WebLogic Server
"Starting Managed Servers with the java weblogic.Server Command" section in the Oracle Fusion Middleware Managing Server Startup and Shutdown for Oracle WebLogic Server
"Starting and Stopping Managed Servers Using Fusion Middleware Control" section in Oracle Fusion Middleware Managing Server Startup and Shutdown for Oracle WebLogic Server
"Start and Stop Servers" and various startup and shutdown procedures in the Cluster section of the Administration Console Online Help.
If the problem persists, proceed to Step 3.
Open the heap dump file with a heap dump analysis tool that can handle the binary HPROF format, such as Eclipse Memory Analyzer.
Review what objects and classes are retaining most memory. Send the heap dump file to Oracle Support for further analysis.
Sometimes it may be necessary to take several heap dumps to see what objects or classes are consuming and increasing the amount of memory.
To take heap dumps on demand, use the jrcmd command-line tool. See the "Running Diagnostic Commands" chapter in the Oracle JRockit JDK Tools Guide. Many heap dump analysis tools, such as Eclipse Memory Analyzer, enable you to compare two heap dumps to identify memory growth areas.
Heap dumps provide information on why memory is retained. Sometimes it is necessary to know how memory is allocated to further resolve the issue. For these cases, proceed to Step 6.
Use the JRockit Memory Leak Detector tool that is part of JRockit Mission Control Client to understand how memory is allocated.
For more information, see the JRockit Mission Control online help.
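The idea of comparing two heap dumps to find memory growth areas can be illustrated with a small Python sketch. It assumes you have already exported, from your analysis tool, a per-class total of retained bytes for each dump; the function name, the dictionary input format, and the class names and sizes in the demo are all hypothetical.

```python
def memory_growth(before: dict[str, int], after: dict[str, int],
                  top: int = 3) -> list[tuple[str, int]]:
    """Rank classes by growth in retained bytes between two snapshots."""
    growth = {name: size - before.get(name, 0) for name, size in after.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    # Keep only classes that actually grew.
    return [(name, delta) for name, delta in ranked[:top] if delta > 0]


if __name__ == "__main__":
    # Hypothetical per-class retained sizes in bytes from two heap dumps.
    first_dump = {"java.util.HashMap$Entry": 10_000, "char[]": 50_000,
                  "com.example.SessionCache": 5_000}
    second_dump = {"java.util.HashMap$Entry": 12_000, "char[]": 49_000,
                   "com.example.SessionCache": 45_000}
    for name, delta in memory_growth(first_dump, second_dump):
        print(f"{name}: +{delta} bytes")
```

A class that keeps rising to the top across several dump pairs is a reasonable first candidate when reviewing what is retaining memory.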
Problem
The connection usage on the Oracle Database is high, or there is an Oracle process on the database host consuming a high amount of CPU.
Solution
To find the source of the connection causing the high CPU usage on the database:
Oracle Fusion Applications sets values on a number of v$session attributes to indicate how the connection is being used. When looking at a connection consuming high CPU on the database, or when trying to understand which connections are used for which processes, inspect the values of these attributes as follows:
Attribute in v$session | Value Being Set |
---|---|
Process | Data Source Name (for example, ApplicationDB) |
Program | Oracle WebLogic Server domain plus the Managed Server name, prefixed by DS (for example, DS/FinancialDomain/AccountsReceivableServer_1) |
Module | Oracle Application Development Framework: ADF BC application module name
Oracle Enterprise Scheduler:
Oracle BI Publisher: Name of the report |
Action | Oracle Application Development Framework: jspx name
Oracle Enterprise Scheduler: Job definition name
Oracle BI Publisher, if request is submitted: |
Client_Identifier | Application User Name |
If error messages about connection pool capacity being reached are also seen in the Oracle WebLogic Server logs, use the solution for connection leaks described in Section 24.7.
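As a rough illustration of how the attributes in the table above can be used, the following Python sketch holds a query text for one session and splits a Program value into its domain and Managed Server parts. The query, the bind variable name, and the function and class names are assumptions for this sketch; the DS/domain/server format comes from the table.

```python
from typing import NamedTuple, Optional

# Illustrative query for one session; :sid is a bind variable you supply.
SESSION_QUERY = (
    "SELECT sid, process, program, module, action, client_identifier "
    "FROM v$session WHERE sid = :sid"
)


class ProgramInfo(NamedTuple):
    domain: str
    managed_server: str


def parse_program(program: str) -> Optional[ProgramInfo]:
    """Split a Program value of the form DS/<domain>/<managed server>."""
    parts = program.strip().split("/")
    if len(parts) != 3 or parts[0] != "DS" or not all(parts[1:]):
        return None  # not a value set in the format described above
    return ProgramInfo(domain=parts[1], managed_server=parts[2])


if __name__ == "__main__":
    print(SESSION_QUERY)
    print(parse_program("DS/FinancialDomain/AccountsReceivableServer_1"))
```

Knowing the domain and Managed Server tells you which server_name.out file and which data source to examine next.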
Problem
Errors appear in the log, and the error message indicates that the connection pool size has been reached.
Solution
To resolve this problem:
When the data source is at maximum capacity and there are errors during connection reservation requests, there may be connection leaks in the code.
Enable JDBC profiling from the Oracle WebLogic Server Administration Console:
In the Domain Structure, expand Services and then Data Sources.
Click the data source that needs to be profiled, for example, ApplicationDB.
In the Settings page, click the Configuration tab, then click the Diagnostics subtab.
Check the profiles that need to be collected (PROFILE_TYPE_CONN_USAGE_STR).
Click Save.
Configure the diagnostic archive where the profiling data is saved from the Oracle WebLogic Server Administration Console:
In the Domain Structure, expand Services, Diagnostics, and then Archives.
Click the server where you want to make changes (archives are stored for each server).
In the Settings page, you can change the archive location, its size, and how data is retired.
Check the profiles that need to be collected (PROFILE_TYPE_CONN_USAGE_STR).
Click Save.
To retrieve profiling data, use the sample code (http://download.oracle.com/docs/cd/E15051_01/wls/docs103/wldf_configuring/access_diag_data.html#wp1100898), with changes to the URL, username, and password in the initialize method.
Run the sample code as a standalone program.
The program will capture the stack trace for each request for a connection from that data source. Inspect the callers to see the suspicious stack. This sample program requires connecting to a live Oracle WebLogic Server instance.
The diagnostic archive file under the archive location can also be provided to Oracle Support for further analysis.
Oracle WebLogic Server will not report a leak unless the inactive connection timeout setting of the connection pool is set to a positive value. This cannot be done for Oracle Fusion Applications, as it will break functionality.
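Once the stack traces for connection requests have been collected, grouping identical traces makes suspicious callers easier to spot. The Python sketch below assumes the traces have already been extracted as plain strings, one per connection request; the record format produced by the profiling data is not described in this section, and the function name and the sample frames are invented.

```python
from collections import Counter


def top_callers(stack_traces: list[str], top: int = 3) -> list[tuple[str, int]]:
    """Count how often each distinct stack trace requested a connection."""
    counts = Counter(trace.strip() for trace in stack_traces if trace.strip())
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical stack traces, invented for illustration only.
    order_stack = "com.example.OrderDao.load\ncom.example.OrderService.find"
    audit_stack = "com.example.AuditDao.write\ncom.example.AuditService.log"
    traces = [order_stack, order_stack, audit_stack, order_stack]
    for trace, count in top_callers(traces):
        print(count, trace.splitlines()[0])
```

A stack that requests far more connections than the rest is a reasonable place to start checking whether connections are always released.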
Problem
A user reports that a specific operation is slow, the slowness is reproducible, and slow database operations are suspected, but the top activity reports did not provide sufficient information to resolve the problem.
Solution
To resolve this problem:
Enable SQL trace for the user session. See Section 12.2.4.6.
Ask the user to rerun the problematic flow, and then collect and review the SQL trace files.
Problem
When response time suddenly increases with rising user count, even though there is no memory pressure, it is possible that the reference pool size for key application modules needs to be increased. If there is a JFR recording to review, and you observe many events whose callstacks contain the activateState method, you should also try adjusting the reference pool size.
Solution
To adjust the reference pool size from Fusion Applications Control:
Review the number of web sessions from Performance Summary pages:
For Fusion Applications Control:
From the navigation pane, expand the farm and then Application Deployments.
From the Applications Deployments page, select the application.
From the Application Deployments menu, choose ADF > ADF Performance.
The ADF Performance page displays.
Click the Application Module Pools tab.
Sort the requests in descending order.
For the top 10 or so application modules, click the application module name to view the Activations count.
For Grid Control:
Click the Targets tab.
Click the Middleware secondary tab.
From the Search list, select Oracle WebLogic Server Domain, and then click Go.
Click on a domain.
The WebLogic Server Domain home page displays.
In the table on the right-hand side of the page, expand the Application Deployments node.
Click the target application.
The Application Deployment page displays.
From the Application Deployments menu, choose ADF > ADF Performance.
The ADF Performance page displays.
In the Application Module Pools table, from the View list, select Total Requests, and once selected, from the Total Requests column, click Sort Descending.
For the top 10 or so application modules, click to see the details of each one.
After selecting an application module, on the Requests graph, from the Select metric to display in chart list, select Passivation and Activation to add to the graph.
If the activation count is close to the passivation count and is constantly above 0, then follow Step 2 to adjust the reference pool size.
If the activation count constantly increases, increase the application module reference pool size from Fusion Applications Control:
From the Application Deployments menu, choose ADF > Configure ADF Business Components.
The ADF Configuration BC Configurations page displays.
From the Application Modules section, click the application module of interest. On the left-hand side, select the local configuration by choosing a name that ends in Local.
Click the Pooling and Scalability tab, and change the Reference Pool Size parameter.
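The rule of thumb in Step 1 (activation count close to passivation and constantly above 0) can be written down as a small check. This Python sketch is only one possible reading of that rule: the function name, the use of per-interval samples, and the 20 percent tolerance for "close to" are assumptions, not values from Oracle documentation.

```python
def should_increase_pool(activations: list[int], passivations: list[int],
                         tolerance: float = 0.2) -> bool:
    """Return True if activations stay above 0 and close to passivations."""
    if not activations or len(activations) != len(passivations):
        return False
    for activated, passivated in zip(activations, passivations):
        if activated <= 0:
            return False  # activation is not constantly above 0
        if abs(activated - passivated) > tolerance * max(activated, passivated):
            return False  # activation is not close to passivation
    return True


if __name__ == "__main__":
    # Hypothetical samples read from the Passivation and Activation graph.
    print(should_increase_pool([10, 12, 15], [11, 12, 14]))
    print(should_increase_pool([0, 0, 1], [5, 6, 7]))
```

If the check holds for the busiest application modules, increasing the Reference Pool Size as described in Step 2 is the adjustment this section suggests trying.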
Problem
When the user submits a SQL job type, the job remains in a RUNNING state for too long.
Solution
To resolve this problem, perform the following steps:
Use Fusion Applications Control to find the database session ID that was used to process the job:
Search for the request, as described in Section 5.7.2.
On the Request Details page, in the Request Properties section, next to the Execution Type field, click the eyeglasses icon.
The Spawned Process Details dialog displays, showing the database session ID that was used to process the job.
Take note of the value in the Session Id field, and then click OK.
Use Grid Control to ensure the request processor and request dispatcher are running:
Run an Active Session History (ASH) report for the session within the relevant time window to inspect top SQLs and top wait events. See the "Resolving Transient Performance Problems" section in the Oracle Database 2 Day + Performance Tuning Guide.
Identify time-consuming SQL statements and tune them following normal SQL tuning procedures. See Section 11.4.
You can use My Oracle Support (formerly MetaLink) to help resolve Oracle Fusion Applications problems. My Oracle Support contains several useful troubleshooting resources, such as:
Knowledge base articles
Community forums and discussions
Patches and upgrades
Certification information
Note:
You can also use My Oracle Support to log a service request. You can access My Oracle Support at https://support.oracle.com.