13 Diagnosing Problems
- About the Diagnostic Framework
Oracle Fusion Middleware includes a Diagnostic Framework, which aids in detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors, such as those caused by code bugs, metadata corruption, customer data corruption, deadlocked threads, and inconsistent state.
- How the Diagnostic Framework Works
The Diagnostic Framework is active in each server and provides automatic error detection through predefined configured rules. Oracle Fusion Middleware components and applications automatically benefit from this always-on checking.
- Configuring the Diagnostic Framework
You can configure some settings for the Diagnostic Framework, including custom diagnostic rules and problem suppression. In addition, you can configure WLDF Policies and Actions to create an incident.
- Investigating, Reporting, and Solving a Problem
You can use WLST and ADRCI commands and the Remote Diagnostic Agent (RDA) to investigate and report a problem (critical error), and in some cases, resolve the problem.
About the Diagnostic Framework
Oracle Fusion Middleware includes a Diagnostic Framework, which aids in detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors, such as those caused by code bugs, metadata corruption, customer data corruption, deadlocked threads, and inconsistent state.
When a critical error occurs, the Diagnostic Framework assigns it an incident number, and diagnostic data for the error (such as log files) are immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR), where it can later be retrieved by incident number and analyzed.
The goals of the Diagnostic Framework are:
-
First-failure diagnosis
-
Limiting damage and interruptions after a problem is detected
-
Reducing problem diagnostic time
-
Reducing problem resolution time
-
Simplifying customer interaction with Oracle Support
The Diagnostic Framework includes the following technologies:
-
Automatic capture of diagnostic data upon first failure: For critical errors, the ability to capture error information at first failure greatly increases the chance of a quick problem resolution and reduced downtime. The Diagnostic Framework automatically collects diagnostics, such as thread dumps, DMS metric dumps, and WebLogic Diagnostics Framework (WLDF) server image dumps. Such diagnostic data is similar to the data collected by airplane "black box" flight recorders. When a problem is detected, alerts are generated and the fault diagnosability infrastructure is activated to capture and store diagnostic data. The data is stored in a file-based repository and is accessible with command-line utilities.
-
Standardized log formats: Standardized log formats (using the ODL log file format) across all Oracle Fusion Middleware components allow administrators and Oracle Support personnel to use a single set of tools for problem analysis. Problems are more easily diagnosed, and downtime is reduced.
-
Diagnostic rules: Each component defines diagnostic rules that are used to evaluate whether a given log message should result in an incident being created and which dumps should be executed. The diagnostic rules also indicate whether an individual dump should be created synchronously or asynchronously.
In addition, you can define custom rules that apply to a domain, a server, or an application in a domain or server.
-
Incident detection log filter: The incident detection log filter implements the java.util.logging filter. It inspects each log message to see if an incident should be created, basing its decision on the diagnostic rules for components and applications.
-
Incident packaging service (IPS) and incident packages: The IPS enables you to automatically and easily gather the diagnostic data—log files, dumps, reports, and more—pertaining to a critical error that has a corresponding incident, and package the data into a zip file for transmission to Oracle Support. All diagnostic data relating to a critical error that has been detected by the Diagnostics Framework is captured and stored as an incident in ADR. The incident packaging service identifies the required files automatically and adds them to the zip file.
Before creating the zip file, the IPS first collects diagnostic data into an intermediate logical structure called an incident package. Packages are stored in the Automatic Diagnostic Repository. If you choose to, you can access this intermediate logical structure, view and modify its contents, add or remove additional diagnostic data at any time, and when you are ready, create the zip file from the package and upload it to Oracle Support.
-
Integration with WebLogic Diagnostics Framework (WLDF): The Oracle Fusion Middleware Diagnostics Framework integrates with some features of WebLogic Diagnostics Framework (WLDF), including the capturing of WebLogic Server images on detection of critical errors. WLDF is a monitoring and diagnostic framework that defines and implements a set of services that run within WebLogic Server processes and participate in the standard server life cycle. Using WLDF, you can create, collect, analyze, archive, and access diagnostic data generated by a running server and the applications deployed within its containers. This data provides insight into the run-time performance of servers and applications and enables you to isolate and diagnose faults when they occur.
Oracle Fusion Middleware Diagnostics Framework integrates with the following components of WLDF:
-
WLDF Policies and Actions, which watches specific logs and metrics for specified conditions and sends a notification when a condition is met. There are several types of notifications, including JMX notification and a notification to create a Diagnostic Image. Oracle Fusion Middleware Diagnostics Framework integrates with the WLDF Policies and Actions component to create incidents.
-
Diagnostic Image Capture, which gathers the most common sources of the key server state used in diagnosing problems. It packages that state into a single artifact, the Diagnostic Image. With Oracle Fusion Middleware Diagnostics Framework, it writes the artifact to ADR.
For more information about WLDF, see What Is the WebLogic Diagnostics Framework? in Configuring and Using the Diagnostics Framework for Oracle WebLogic Server.
-
About Incidents and Problems
To facilitate diagnosis and resolution of critical errors, the Diagnostic Framework introduces two concepts for Oracle Fusion Middleware: problems and incidents.
A problem is a critical error. Critical errors manifest as internal errors or other severe errors. Problems are tracked in the ADR. Each problem has a problem key, which is a text string that describes the problem. It includes an error code (in the format XXX-nnnnn) and in some cases, other error-specific values.
An incident is a single occurrence of a problem. When a problem (critical error) occurs multiple times, an incident is created for each occurrence. Incidents are timestamped and tracked in the ADR. Each incident is identified by a numeric incident ID, which is unique within the ADR home. When an incident occurs, the Diagnostic Framework:
-
Gathers first-failure diagnostic data about the incident in the form of dump files (incident dumps).
-
Stores the incident dumps in an ADR subdirectory created for that incident.
-
Registers the incident dumps with the incident in ADR.
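The relationship between problems and incidents can be sketched as a small data model. This is only an illustration, not the Diagnostic Framework's own representation: the `Incident` class, the `group_by_problem` helper, and the sample problem keys are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Incident:
    incident_id: int      # numeric ID, unique within one ADR home
    problem_key: str      # text string that identifies the problem
    created: datetime     # incidents are timestamped


def group_by_problem(incidents):
    """Return {problem_key: [incident_id, ...]} in creation order."""
    problems = defaultdict(list)
    for inc in sorted(incidents, key=lambda i: i.created):
        problems[inc.problem_key].append(inc.incident_id)
    return dict(problems)


# Hypothetical sample data: one problem that occurred twice, one that occurred once.
incidents = [
    Incident(1, "MDS-50500 [MANUAL]", datetime(2017, 3, 28, 11, 5)),
    Incident(2, "DFW-99997", datetime(2017, 3, 28, 11, 7)),
    Incident(3, "MDS-50500 [MANUAL]", datetime(2017, 3, 28, 11, 9)),
]
problems = group_by_problem(incidents)
```

Each key in `problems` stands for one problem, and each ID in its list stands for one occurrence of that problem.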
Incident Flood Control
It is conceivable that a problem could generate dozens or perhaps hundreds of incidents in a short period of time. This would generate too much diagnostic data, which would consume too much space in the ADR and could possibly slow down your efforts to diagnose and resolve the problem. For these reasons, the Diagnostic Framework applies flood control to incident generation after certain thresholds are reached. A flood-controlled incident is an incident that is not recorded in the ADR. Instead, the Diagnostic Framework writes a message at the WARNING level to the log file and returns an oracle.dfw.incident.Incident object. Flood-controlled incidents provide a way of informing you that a critical error is ongoing, without overloading the system with diagnostic data.
By default, if more than 5 incidents with the same problem key occur within 60 minutes, subsequent incidents with the same problem key are flood controlled. You can change this value using MBeans, as described in Configuring the Diagnostic Framework.
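The default thresholds can be illustrated with a sliding-window counter. This is a minimal sketch under stated assumptions, not the framework's implementation: the `FloodControl` class is hypothetical, timestamps are plain seconds, and it assumes that only recorded incidents count toward the threshold and that the window slides rather than resets.

```python
from collections import defaultdict, deque


class FloodControl:
    def __init__(self, incident_count=5, timeout_minutes=60):
        self.incident_count = incident_count
        self.timeout_seconds = timeout_minutes * 60
        self._recorded = defaultdict(deque)  # problem key -> timestamps (seconds)

    def should_record(self, problem_key, now):
        """Return True to record the incident, False to flood-control it."""
        window = self._recorded[problem_key]
        # Forget recorded incidents that have fallen out of the time window.
        while window and now - window[0] >= self.timeout_seconds:
            window.popleft()
        if len(window) >= self.incident_count:
            return False  # a real system would log a WARNING here instead
        window.append(now)
        return True


fc = FloodControl()
# Seven incidents with the same problem key, one minute apart.
results = [fc.should_record("MDS-50500", minute * 60) for minute in range(7)]
```

With the defaults, the first five incidents are recorded and the next two are flood controlled. Incidents with a different problem key are counted separately.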
Diagnostic Framework Components
Note:
To use the Diagnostic Framework, in particular the Automatic Diagnostic Repository, the Managed Servers must have Oracle JRF applied. The following directory will exist for each Managed Server if Oracle JRF has been applied:
DOMAIN_HOME/servers/server_name/adr
If the directory does not exist, take one of the following steps:
-
Apply Oracle JRF, as described in Applying Oracle JRF Template to a Managed Server or Cluster.
-
If Oracle JRF has been applied, restart the servers, making sure that the Node Manager property startScriptEnabled is set to true, as described in Configuring Node Manager to Start Managed Servers.
The following topics describe the key components of the Diagnostic Framework:
- Automatic Diagnostic Repository
- Diagnostic Dumps
- Diagnostic Framework Management MBeans
- WLST Commands for Diagnostic Framework
- ADRCI Command-Line Utility
Automatic Diagnostic Repository
The Automatic Diagnostic Repository (ADR) is a file-based hierarchical repository for Oracle Fusion Middleware diagnostic data, such as traces and dumps. The Oracle Fusion Middleware components store all incident data in the ADR. Each Oracle WebLogic Server stores diagnostic data in subdirectories of its own home directory within the ADR. For example, each Managed Server and Administration Server has an ADR home directory.
The ADR root directory is known as ADR base. By default, the ADR base is located in the following directory:
DOMAIN_HOME/servers/server_name/adr
Within ADR base, there can be multiple ADR homes, where each ADR home is the root directory for all incident data for a particular instance of Oracle WebLogic Server. The following path shows the location of the ADR home:
ADR_BASE/diag/ofm/domain_name/server_name
Figure 13-1 illustrates the directory hierarchy of the ADR home for an Oracle WebLogic Server instance.
Figure 13-1 ADR Directory Structure for Oracle Fusion Middleware
The subdirectories in the ADR home contain the following information:
-
alert: The XML-formatted alert log.
-
incident: A directory that can contain multiple subdirectories, where each subdirectory is named for a particular incident. The subdirectories are named incdir_n, with n representing the number of the incident. Each subdirectory contains information and diagnostic dumps pertaining only to that incident.
-
(others): Other subdirectories of ADR home, which store incident packages and other information.
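As a rough illustration of this layout, the following sketch lists the incident numbers found under an ADR home by scanning for incdir_n subdirectories. The `incident_ids` helper is hypothetical, and the demonstration runs against a throwaway directory tree rather than a real ADR home.

```python
import re
import tempfile
from pathlib import Path

INCDIR = re.compile(r"incdir_(\d+)")


def incident_ids(adr_home):
    """Return the incident numbers found under ADR_HOME/incident, sorted."""
    incident_root = Path(adr_home) / "incident"
    if not incident_root.is_dir():
        return []
    ids = []
    for child in incident_root.iterdir():
        match = INCDIR.fullmatch(child.name)
        if child.is_dir() and match:
            ids.append(int(match.group(1)))
    return sorted(ids)


# Demonstration against a temporary directory that mimics an ADR home.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    for n in (12, 3):
        (home / "incident" / f"incdir_{n}").mkdir(parents=True)
    (home / "alert").mkdir()
    found = incident_ids(home)
```

For day-to-day work, the WLST commands and ADRCI described later in this chapter are the supported way to list incidents. The sketch only shows how the directory names map to incident numbers.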
Note:
ADR uses the domain name as the Product ID and the server name as the Instance ID when it packages an incident. However, if either name is more than 30 characters, ADR truncates the name. In addition, dollar sign ($) and space characters are replaced with underscores.
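The default paths and the naming rule in the note can be sketched as follows. The helper names and the sample domain path are hypothetical, and the sketch assumes that the default ADR base is in use.

```python
from pathlib import PurePosixPath


def adr_home(domain_home, domain_name, server_name):
    """Default ADR home: ADR_BASE/diag/ofm/domain_name/server_name."""
    adr_base = PurePosixPath(domain_home) / "servers" / server_name / "adr"
    return adr_base / "diag" / "ofm" / domain_name / server_name


def packaging_id(name, max_len=30):
    """Replace '$' and spaces with underscores, then truncate to 30 characters."""
    cleaned = name.replace("$", "_").replace(" ", "_")
    return cleaned[:max_len]


# Hypothetical domain location, for illustration only.
home = adr_home("/u01/domains/base_domain", "base_domain", "wls_server_1")
product_id = packaging_id("my $domain")
```

Because each replacement is one character for one character, the order of replacement and truncation does not change the result.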
Diagnostic Dumps
A diagnostic dump captures and dumps specific diagnostic information when an incident is created (automatic) or on the request of an administrator (manual). When executed as part of incident creation, the dump is included with the set of incident diagnostics data. Examples of diagnostic dumps include a JVM thread dump, JVM class histogram dump, and DMS metric dump. For a list of diagnostic dumps, see Table 13-7.
Diagnostic Framework Management MBeans
The Diagnostic Framework provides MBeans that you can use to configure the Diagnostic Framework. For example, you can enable or disable flood control and you can configure how many incidents with the same problem key can occur within a specified time period. For information about using the management MBeans to configure the Diagnostic Framework, see Configuring the Diagnostic Framework.
You can also use the MBeans to query and create incidents, discover the list of available diagnostic dump types, and execute individual diagnostic dumps.
WLST Commands for Diagnostic Framework
The Diagnostic Framework provides WLST commands that you can use to view information about problems and incidents, create incidents, execute specific dumps, and query the set of diagnostic dump types. For more information, see:
-
Diagnostic Framework Custom WLST Commands in the WLST Command Reference for Infrastructure Components
ADRCI Command-Line Utility
The ADR Command Interpreter (ADRCI) is a utility that enables you to investigate problems, and package and upload first-failure diagnostic data to Oracle Support, all within a command-line environment. ADRCI also enables you to view the names of the dump files in the ADR, and to view the alert log with XML tags stripped, with and without content filtering.
ADRCI is installed in the following directory:
(UNIX) ORACLE_HOME/oracle_common/adr
(Windows) ORACLE_HOME\oracle_common\adr
See the following topics for information about using the ADRCI command-line utility:
-
Packaging an Incident for information on packaging an incident.
-
Purging Incidents for information on purging incidents.
See Also:
-
ADRCI: ADR Command Interpreter in Oracle Database Utilities
-
Managing Diagnostic Data in the Oracle Database Administrator's Guide
How the Diagnostic Framework Works
The Diagnostic Framework is active in each server and provides automatic error detection through predefined configured rules. Oracle Fusion Middleware components and applications automatically benefit from this always-on checking.
Incidents are detected or created in the following ways:
-
By the incident detection log filter, which is automatically configured to detect critical errors.
-
By the WLDF Policies and Actions component. The Diagnostics Framework listens for a predefined notification type and creates incidents when it receives such notifications.
For information about configuring WLDF Policies and Actions, see Configuring WLDF Policies and Actions for the Diagnostic Framework.
-
Programmatic incident creation. Some components create incidents directly.
Figure 13-2 shows the interaction among the incident log detector, the WLDF Diagnostic Image MBean, ADR, and component or application dumps when the incident log detector detects an incident.
Figure 13-2 Incident Creation Generated by Incident Log Detector
The steps represented in Figure 13-2 are:
-
The incident detection log filter is initialized with component and application diagnostic rules.
-
An application or component logs a message using the java.util.logging API.
-
The ODL log handler passes the message to the incident detection log filter.
-
The incident detection log filter inspects the log message to see if an incident should be created, basing its decision on the diagnostic rules for the component. If the diagnostic rule indicates that an incident should be created, it creates an incident in the ADR.
-
The ODL log handler writes the log message to the log file, and returns control to the application.
When an incident is created, a message, similar to the following, is written to the log file:
[2017-03-28T11:05:34.603-07:00] [wls_server_1] [NOTIFICATION] [DFW-40101] [oracle.dfw.incident] [tid: [ACTIVE].ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId: weblogic] [ecid: 66217af9-247f-4344-94a9-14f90e75a586-000e093f,0] An incident has been signalled with the incident facts: [problemKey=MDS-50500 [MANUAL] incidentSource=MANUAL incidentTime=Fri March 28 11:05:34 PDT 2017 errorMessage=MDS-50500 executionContextId=null]
-
The Diagnostic Framework executes the diagnostic dumps that are indicated by the diagnostic rules for the component.
-
The Diagnostic Framework writes the dumps to ADR, in the directory created for the incident.
-
The Diagnostic Framework invokes the WLDF Diagnostic Image MBean requesting that a Diagnostic Image be created in ADR.
-
WLDF writes the Diagnostic Image to ADR.
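The log-filter flow in the steps above can be mimicked with Python's logging module as a stand-in for java.util.logging. This is an analogy only: the `IncidentDetectionFilter` class, the single severity-based rule, and the in-memory incident list are hypothetical, and dump execution is left out.

```python
import io
import logging


class IncidentDetectionFilter(logging.Filter):
    """Inspects each record and never blocks it from being written."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules      # list of predicates over a LogRecord
        self.incidents = []     # stand-in for incidents stored in ADR

    def filter(self, record):
        if any(rule(record) for rule in self.rules):
            incident_id = len(self.incidents) + 1
            self.incidents.append((incident_id, record.getMessage()))
        return True  # the handler still writes the message to the log


# The handler owns the filter, as the ODL log handler does in the steps above.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
detector = IncidentDetectionFilter([lambda r: r.levelno >= logging.CRITICAL])
handler.addFilter(detector)

logger = logging.getLogger("dfw_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("routine message")
logger.critical("MDS-50500 metadata store failure")
```

The filter runs before the handler writes the record, so the incident is created first and the message still reaches the log, which matches the order of the steps.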
Figure 13-3 shows the interaction among the incident notification listener, the WLDF Policies and Actions system, and the WLDF Diagnostic Image MBean when an incident is detected by the WLDF Policies and Actions system.
Figure 13-3 Incident Creation Generated by WLDF Policies and Actions
The steps represented in Figure 13-3 are:
-
The incident notification listener is initialized with component and application diagnostic rules.
-
Oracle Fusion Middleware Diagnostic Framework registers a JMX notification listener with WLDF. The listener listens for events from the WLDF Policies and Actions system. It processes only notifications of type oracle.dfw.wldfnotification.
-
Something in the system causes the configured WLDF policy to be triggered, causing a notification to be sent to the incident notification listener. The notification includes event information describing the data that caused the policy to trigger.
-
The Diagnostic Framework creates an incident in ADR.
-
The Diagnostic Framework executes the diagnostic dumps that are indicated by the diagnostic rules.
-
The Diagnostic Framework writes the dumps to ADR, in the directory created for the incident.
-
The Diagnostic Framework invokes the WLDF Diagnostic Image MBean requesting that a Diagnostic Image be created in ADR.
-
WLDF writes the Diagnostic Image to ADR.
Configuring the Diagnostic Framework
You can configure some settings for the Diagnostic Framework, including custom diagnostic rules and problem suppression. In addition, you can configure WLDF Policies and Actions to create an incident.
The following topics describe how to configure the Diagnostic Framework:
- Configuring Diagnostic Framework Settings
- Configuring Custom Diagnostic Rules
- Configuring Problem Suppression
- Retrieving Problem Key Filters
- Configuring WLDF Policies and Actions for the Diagnostic Framework
Oracle Fusion Middleware configures a WLDF Diagnostics Module that contains a set of Policy and Action rules (previously known as Watch and Notification rules) for detecting a specific set of critical errors and creating an incident for each occurrence of those errors. You can configure those Policies to create an incident.
Configuring Diagnostic Framework Settings
You can configure the following settings:
-
Enabling or disabling the detection of incidents through the log files
-
Enabling or disabling flood control and setting parameters for flood control
You configure these settings by using the Diagnostic Framework MBean DiagnosticConfig. The following shows the MBean's ObjectName:
oracle.dfw:type=oracle.dfw.jmx.DiagnosticsConfigMBean,name=DiagnosticsConfig
Table 13-1 shows the attributes for the DiagnosticConfig MBean and a description of each parameter.
Table 13-1 DiagnosticConfig MBean Attributes for Diagnostic Framework
Attribute | Description |
---|---|
DumpSamplingIdleWhenHealthy | Determines whether dump sampling is active when the system is healthy. By default, this is set to true, which means that dump sampling is not active until an incident occurs. |
DumpSamplingMinimumHealthyPeriod | The amount of time in seconds that the dump sampling is active after an incident occurs. The default is 259200 seconds (72 hours). |
floodControlEnabled | Enables or disables flood control. Note that flood control does not apply to manually created incidents. |
floodControlIncidentCount | Sets the number of incidents with the same problem key that can be created within the time period, specified by floodControlIncidentTimeoutPeriod, before they are controlled by flood control. The default is 5. When flood control is enabled, if the number of incidents with the same problem key exceeds this count, no incidents are created, but the Diagnostic Framework writes a message at the WARNING level to the log file. |
floodControlIncidentTimeoutPeriod | Sets the time period in which the number of incidents, as specified by floodControlIncidentCount, with the same problem key can be created before they are controlled by flood control. The default is 60 minutes. |
incidentCreationEnabled | Enables or disables incident creation. |
logDetectionEnabled | Enables or disables the detection of incidents through the log files. |
maxTotalIncidentSize | Sets the maximum total size that is allocated for all incidents. When the limit is reached, the oldest incidents are purged until the space used by all incidents is less than the amount specified by this parameter. The default is 500 MB. The limit may be exceeded during the creation of an incident, but when the incident creation completes, the oldest incidents are purged. |
reservedMemoryKB | The amount of reserved memory that is released when OutOfMemoryError is detected. When the Diagnostic Framework starts, it allocates 512 KB of memory for its own private use. When the Diagnostic Framework detects that an OutOfMemoryError has occurred in the server, it frees that block of memory and proceeds to create the incident. The default is 512 KB. |
uncaughtExceptionDetectionEnabled | Enables the Java-based uncaught exception handler. When it is enabled and an uncaught exception is detected, an incident is created. |
useExternalCommands | Indicates whether external JVM commands should be used to perform thread dumps. |
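The purge behavior described for maxTotalIncidentSize can be sketched as an oldest-first loop. This is an illustration under assumptions: the `purge_oldest` helper and the tuple layout are hypothetical, sizes are in bytes, and 1 MB is taken to be 1024 * 1024 bytes.

```python
MB = 1024 * 1024  # assumption: the documented MB is a binary megabyte


def purge_oldest(incidents, max_total_bytes):
    """incidents: list of (incident_id, created, size_bytes) tuples.

    Returns (kept, purged_ids). The oldest incidents are purged first until
    the total size is less than max_total_bytes.
    """
    kept = sorted(incidents, key=lambda inc: inc[1])  # oldest first
    total = sum(size for _, _, size in kept)
    purged = []
    while kept and total >= max_total_bytes:
        incident_id, _, size = kept.pop(0)
        total -= size
        purged.append(incident_id)
    return kept, purged


# Three hypothetical incidents of 200 MB each against the default 500 MB limit.
stored = [(1, 100, 200 * MB), (2, 200, 200 * MB), (3, 300, 200 * MB)]
kept, purged = purge_oldest(stored, 500 * MB)
```

Here the total of 600 MB reaches the limit, so the oldest incident is purged and the remaining 400 MB is below the limit.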
The following example shows how to configure these settings using the Fusion Middleware Control System MBean Browser:
Configuring Custom Diagnostic Rules
You can configure custom diagnostic rules that apply to a domain, a server, or an application in a domain or server.
You create the custom diagnostic rules by creating an .xml file with a particular format, which is shown in the example later in this section. You must save the file to one of the following locations:
-
For rules that apply to the entire domain:
DOMAIN_HOME/config/fmwconfig/dfw
-
For rules that apply to a particular server:
DOMAIN_HOME/config/fmwconfig/servers/server_name/dfw
The file name must use the following format:
name.xml
appname#name.xml
In the format, appname is the name of the application to which the rule applies. The appname must be the exact name of the deployed application. name is the name of the rule you specify. If you do not specify appname, the rules apply to the entire server. For example, the following rule applies to the application myApp:
myApp#custom_rule.xml
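The naming convention and the two save locations can be captured in a short sketch. The helper names `parse_rule_file` and `rule_dir` are hypothetical, and the sketch assumes that the application name is everything before the first # character.

```python
from pathlib import PurePosixPath


def parse_rule_file(file_name):
    """Split 'appname#name.xml' or 'name.xml' into (appname, name)."""
    if not file_name.endswith(".xml"):
        raise ValueError("custom rule files must end in .xml")
    stem = file_name[: -len(".xml")]
    appname, sep, name = stem.partition("#")
    if not sep:
        return None, stem  # no appname: the rules are not application-specific
    return appname, name


def rule_dir(domain_home, server_name=None):
    """Directory for domain-wide rules, or for one server's rules."""
    base = PurePosixPath(domain_home) / "config" / "fmwconfig"
    if server_name is None:
        return base / "dfw"
    return base / "servers" / server_name / "dfw"
```

For example, `parse_rule_file("myApp#custom_rule.xml")` yields the application name myApp and the rule name custom_rule, while a file name without # yields no application name.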
The custom diagnostic rules file can contain the following types of elements to define the rule:
-
Log detection conditions, which are optional
You can define a set of conditions, in the logDetectionConditions element, to check for in the diagnostic logs applicable to the server or to the specified application against which the rules are registered. When a log message matching the condition is detected, an incident is created, capturing diagnostics that help identify the problem. By default, all INCIDENT_ERROR messages are detected and an incident is created for them. In addition, specific components may have configured rules to detect specific messages.
The following example shows a fragment of a custom diagnostic rules file that defines four log detection conditions. If one or more of the conditions are true, an incident is created.
<?xml version="1.0" encoding="UTF-8"?>
<diagnosticRules xmlns="http://www.oracle.com/DFW/DiagnosticsFrameworkRules"
    xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
  <logDetectionConditions>
    <condition messageSeverity="INCIDENT_ERROR"/>
    <condition messageSeverity="ERROR" component="jrfServer_admin"/>
    <condition messageSeverity="ERROR" module="test.servletA"/>
    <condition messageId="FMW-40300"/>
  </logDetectionConditions>
See Table 13-2 for a description of the conditions you can use.
-
Processing rules
You can define processing rules that are evaluated when either the server or application rules are involved in incident creation. For example, if the application MyApp is involved in incident creation, any rules associated with the MyApp application are evaluated. In all cases, server-wide rules are evaluated regardless of the application.
Processing rules consist of two parts:
-
Default actions, which are optional. If they are present, they are always executed during incident creation. The actions are a list of diagnostic dumps to execute, along with optional arguments.
The following shows an example set of default actions:
<defaultActions>
  <dumpAction name="odl.logs">
    <argument name="timestamp" value="INCIDENT_TIME" valueType="fact"/>
  </dumpAction>
  <dumpAction name="dms.metrics"/>
</defaultActions>
See Table 13-3 for a description of the optional arguments that you can use.
-
Condition-based actions, which are executed only if the condition evaluates to true. Each <rule> element consists of a name attribute, along with a child <ruleCondition> element and a child <ruleActions> element. The <ruleActions> element contains one or more dumpAction elements. See Table 13-4 for a list of the <ruleCondition> element attributes.
If multiple <condition> elements are specified in a single <rule> element, the dumpAction is executed only if all conditions evaluate to true.
The following shows an example of a condition-based action rule. If the MESSAGE_ID is DFW-99997, the condition evaluates to true and the jvm.classhistogram dump is executed.
<processingRules>
  <rule name="OOME">
    <ruleCondition>
      <condition name="MESSAGE_ID" value="DFW-99997"/>
    </ruleCondition>
    <ruleActions>
      <dumpAction name="jvm.classhistogram"/>
    </ruleActions>
  </rule>
</processingRules>
-
Table 13-2 describes the attributes you can use to create the log detection conditions:
Table 13-2 Conditions for the LogDetectionConditions Element
Condition | Description |
---|---|
messageSeverity | The log level at which the message was logged. (The MESSAGE_LEVEL field for ODL log files.) For example, INCIDENT_ERROR, ERROR. |
messageId | The ID of the message. (The MESSAGE_ID field for ODL log files.) For example, DFW-99997. |
component | The component name. (The COMPONENT_ID field for ODL log files.) |
module | The name of the module that originated the message. (The MODULE_ID field for ODL log files.) |
See Table 12-1 for a description of the ODL log file fields.
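To make the matching semantics concrete, the following sketch parses a complete version of the log detection example and checks messages against it. It is not the framework's parser: the helper names are hypothetical, the closing diagnosticRules tag is added so that the fragment is well formed, and the sketch assumes that every attribute of a single condition must match while any one matching condition is enough.

```python
import xml.etree.ElementTree as ET

NS = {"dfw": "http://www.oracle.com/DFW/DiagnosticsFrameworkRules"}

RULES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<diagnosticRules xmlns="http://www.oracle.com/DFW/DiagnosticsFrameworkRules">
  <logDetectionConditions>
    <condition messageSeverity="INCIDENT_ERROR"/>
    <condition messageSeverity="ERROR" component="jrfServer_admin"/>
    <condition messageSeverity="ERROR" module="test.servletA"/>
    <condition messageId="FMW-40300"/>
  </logDetectionConditions>
</diagnosticRules>
"""


def load_conditions(xml_text):
    """Return each <condition> element as a dict of its attributes."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    path = "dfw:logDetectionConditions/dfw:condition"
    return [dict(cond.attrib) for cond in root.findall(path, NS)]


def creates_incident(message, conditions):
    """message: dict keyed like the condition attributes (messageSeverity, ...)."""
    return any(
        all(message.get(key) == value for key, value in cond.items())
        for cond in conditions
    )


conditions = load_conditions(RULES_XML)
```

With these rules, an ERROR message from module test.servletA creates an incident, while an ERROR message from another module and component does not.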
Table 13-3 describes the optional arguments that you can use for the <defaultActions> element.
Table 13-3 Optional Arguments for the defaultActions Element
Argument | Description |
---|---|
name | The name of the argument. |
value | The value of the argument. |
type | The type of argument. Valid values are: |
Table 13-4 shows the <ruleCondition> element attributes.
Table 13-4 Attributes for the ruleCondition Element
Attribute | Description |
---|---|
name | The name of the attribute. Valid values depend on the valueType: |
operator | The operator. Valid values are EQ, EQNoCase, NE, Contains, StartsWith, EndsWith, LT, GT, LE, GE. The default is EQ. The values are case sensitive. |
value | The literal value to compare. |
datatype | The data type. Valid values are String or Integer. The default is String. The values are case sensitive. |
valueType | The type of argument: |
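The operators and data types in Table 13-4 can be sketched as a small evaluator. This is an interpretation, not the framework's logic: the `evaluate` and `rule_fires` helpers are hypothetical, facts are passed as a plain dict, a missing fact is treated as a non-match, and the text-only operators are rejected for Integer values.

```python
import operator

OPERATORS = {
    "EQ": operator.eq,
    "EQNoCase": lambda a, b: a.lower() == b.lower(),
    "NE": operator.ne,
    "Contains": lambda a, b: b in a,
    "StartsWith": lambda a, b: a.startswith(b),
    "EndsWith": lambda a, b: a.endswith(b),
    "LT": operator.lt,
    "GT": operator.gt,
    "LE": operator.le,
    "GE": operator.ge,
}
TEXT_ONLY = {"EQNoCase", "Contains", "StartsWith", "EndsWith"}


def evaluate(condition, facts):
    """condition: dict with name, value, and optional operator and datatype."""
    op = condition.get("operator", "EQ")            # documented default
    datatype = condition.get("datatype", "String")  # documented default
    if condition["name"] not in facts:
        return False
    actual, expected = facts[condition["name"]], condition["value"]
    if datatype == "Integer":
        if op in TEXT_ONLY:
            raise ValueError(f"{op} is not defined for Integer in this sketch")
        actual, expected = int(actual), int(expected)
    else:
        actual, expected = str(actual), str(expected)
    return OPERATORS[op](actual, expected)


def rule_fires(conditions, facts):
    """All conditions in a rule must evaluate to true."""
    return all(evaluate(cond, facts) for cond in conditions)
```

The datatype matters for ordered comparisons: as strings, "9" sorts after "10", but as integers 9 is less than 10. The fact name COUNT used to check this is made up for the illustration.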
To create and load a custom diagnostic rule:
Configuring Problem Suppression
In certain situations, you may want to suppress the creation of incidents based on a particular problem key. For example, in a development environment, when you are developing a servlet, you may generate a high number of uncaught exceptions as you refine the code. This results in the creation of unnecessary incidents.
The Diagnostic Framework allows you to configure problem suppression filters so that problems that match the filter criteria do not result in the creation of an incident.
When you configure a problem suppression filter, you use a regular expression that represents a pattern that you want to match. The regular expression is matched using the java.util.regex class. For example:
-
The following regular expression matches any incident with a problem key that starts with MDS-5000.
MDS-5000.*
-
The following regular expression matches any problem with the text OutOfMemory. Because the regular expression is case-sensitive, it will not match problems with the text outofmemory.
.*OutOfMemory.*
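The two patterns can be tried out with a quick sketch. Python's re module stands in for java.util.regex here, and these simple patterns behave the same way in both. The `is_suppressed` helper and the sample problem keys are hypothetical, and the sketch assumes that a filter must match the whole problem key.

```python
import re


def is_suppressed(problem_key, filters):
    """Return True if any filter pattern matches the whole problem key."""
    return any(re.fullmatch(pattern, problem_key) for pattern in filters)


filters = ["MDS-5000.*", ".*OutOfMemory.*"]

# Hypothetical problem keys, for illustration only.
checks = {
    "MDS-50001 [MANUAL]": is_suppressed("MDS-50001 [MANUAL]", filters),
    "DFW-99998 [java.lang.OutOfMemoryError]": is_suppressed(
        "DFW-99998 [java.lang.OutOfMemoryError]", filters
    ),
    "DFW-99998 [outofmemory]": is_suppressed("DFW-99998 [outofmemory]", filters),
    "MDS-50500": is_suppressed("MDS-50500", filters),
}
```

The first two keys are suppressed. The third is not, because matching is case-sensitive, and the fourth is not, because MDS-50500 does not start with MDS-5000.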
You can add and remove filters and get a list of filters or the detail of one filter using the DiagnosticConfig MBean.
Table 13-5 shows the operations and attribute for configuring problem suppression filters and a description of each.
Table 13-5 DiagnosticConfig MBean Operations and Attributes for Problem Suppression Filters
Operations and Attribute | Description |
---|---|
Operation: addProblemKeyFilter(filter_pattern) | Adds a new problem suppression filter. You pass it the regular expression that represents a pattern that you want to match. For example: addProblemKeyFilter(".*OutOfMemory.*") |
Attribute: getProblemKeyFilters() | Returns a list of the configured problem suppression filters. For example: getProblemKeyFilters() |
Operation: getProblemKeyFilter(filterID) | Returns the filter pattern associated with the specified ID. For example: getProblemKeyFilter(id). To find the ID, use the getProblemKeyFilters() attribute. |
Operation: removeProblemKeyFilter(filterID) | Removes the filter pattern associated with the given filter ID. For example: removeProblemKeyFilter(id) |
To configure a problem suppression filter:
You can delete the filters using the removeProblemKeyFilter operation.
Retrieving Problem Key Filters
You can retrieve a specific filter, passing the ID of the filter to the getProblemKeyFilter operation.
Alternatively, you can retrieve a list of the filters using the getProblemKeyFilters attribute:
Configuring WLDF Policies and Actions for the Diagnostic Framework
Oracle Fusion Middleware configures a WLDF Diagnostics Module that contains a set of Policy and Action rules (previously known as Watch and Notification rules) for detecting a specific set of critical errors and creating an incident for each occurrence of those errors. You can configure those Policies to create an incident.
The WLDF Diagnostics Module is called Module-FMWDFW and contains the following set of Policy conditions:
Name | Description |
---|---|
Deadlock | Two or more Java threads have circular lock chains among their Java Monitor object usage. |
StuckThread | An Oracle WebLogic Server ExecuteThread that is blocked or busy for more than the time specified by the Oracle WebLogic Server StuckThreadMaxTime parameter. |
UncheckedException | This category includes all unchecked exceptions, that is, RuntimeExceptions and Errors caught by the Oracle WebLogic Server ExecuteThread, such as NullPointerException, StackOverflowError, or OutOfMemoryError. |
The Diagnostic Module also includes a configured WLDF JMX Notification, FMWDFW-notification, of type oracle.dfw.wldfnotification. You can reuse this WLDF JMX Notification for your own WLDF Policy conditions to create an incident:
-
In Fusion Middleware Control, from the WebLogic Domain menu, expand Diagnostics and select Diagnostic Modules.
The Diagnostic Modules page is displayed.
-
Click Module-FMWDFW.
The Module-FMWDFW page is displayed.
-
Select the Configuration tab, then the Policies and Actions tab. The following figure shows the Policies section:
-
Select the Policies tab and click Create.
The Create a Diagnostic Policy page is displayed.
-
For Policy Name, enter a name for the policy.
You can enter any name. Alternatively, you can use the following format to force the Diagnostic Framework to use a custom message ID:
message-id#[application_name]#any_text
The message ID consists of a prefix of 1 to 6 characters and a number of 1 to 6 digits. The application name is optional. For example:
WLS-40500#My_Policy_Name
The following example uses the application name testapp:
WLS-40501#testapp#My_Policy_Name
The Diagnostic Framework uses the message ID as the incident message ID in constructing the incident problem key.
-
For Rule Type, select a type, for example, Server Log or Smart Rule based. Smart Rules are pre-configured rules, such as Cluster Low Average Throughput.
-
Click Next.
-
The next pages depend on the Policy type:
-
For Smart Rule based:
-
Select a rule and click Next.
-
Provide values for the parameters and click Next.
-
If you intend the policy to be scheduled on particular days of the week or month, for Start Time, provide values for Hour, Minute, and Second. Then, select AM or PM.
This option is used only when you select Specific days of the week or Specific days of the month from the Repeat field.
-
For Repeat, select the number of times it will be run within a specified time. For example, select Every N minutes.
-
If you intend the policy to be run at specified intervals, such as every five minutes, for Frequency, enter a value, such as 5. If you selected Every N minutes for Repeat, then the schedule would be every 5 minutes. Click Next.
-
Select an alarm type and click Next.
-
For Scaling Actions, select either Scale Up Action or Scale Down Action.
-
For Diagnostic Actions, move an action from the Available column to the Chosen column.
-
-
For Calendar Based:
-
If you intend the policy to be scheduled on particular days of the week or month, for Start Time, provide values for Hour, Minute, and Second. Then, select AM or PM.
This option is used only when you select Specific days of the week or Specific days of the month from the Repeat field.
-
For Repeat, select the number of times it will be run within a specified time. For example, select Every N minutes.
-
If you intend the policy to be run at specified intervals, such as every five minutes, for Frequency, enter a value, such as 5. If you selected Every N minutes for Repeat, then the schedule would be every 5 minutes. Click Next.
-
If you have a dynamic cluster, for Scaling Actions, select either Scale Up Action or Scale Down Action.
-
For Diagnostic Actions, move an action from the Available column to the Chosen column.
-
-
For Collected Metrics:
-
Select Smart Rule or Expression. Click Next.
If you select Smart Rule, provide values for the parameters.
If you select Expression, enter an expression.
-
Click Next.
-
If you intend the policy to be scheduled on particular days of the week or month, for Start Time, provide values for Hour, Minute, and Second. Then, select AM or PM.
This option is used only when you select Specific days of the week or Specific days of the month from the Repeat field.
-
For Repeat, select the number of times it will be run within a specified time. For example, select Every N minutes.
-
If you intend the policy to be run at specified intervals, such as every five minutes, for Frequency, enter a value, such as 5. If you selected Every N minutes for Repeat, then the schedule would be every 5 minutes. Click Next.
-
Select an alarm type and click Next.
-
For Scaling Actions, select either Scale Up Action or Scale Down Action.
-
For Diagnostic Actions, move an action from the Available column to the Chosen column.
-
-
For Domain Log:
-
Add expressions to create the rule for your policy. Then, click Next.
-
Select an alarm type and click Next.
-
If you have a dynamic cluster, for Scaling Actions, select either Scale Up Action or Scale Down Action.
-
For Diagnostic Actions, move an action from the Available column to the Chosen column.
-
-
For Event Data:
-
Add expressions to create the rule for your policy. Then, click Next.
-
Select an alarm type and click Next.
-
If you have a dynamic cluster, for Scaling Actions, select either Scale Up Action or Scale Down Action.
-
For Diagnostic Actions, move an action from the Available column to the Chosen column.
-
-
For Server Log:
-
Add expressions to create the rule for your policy. For example, you can construct the expression (SEVERITY = 'Error') and (MSGID = 'BEA-000337'). Then, click Next.
-
Select an alarm type and click Next.
-
If you have a dynamic cluster, for Scaling Actions, select either Scale Up Action or Scale Down Action.
-
For Diagnostic Actions, move an action from the Available column to the Chosen column.
-
-
-
Click Create.
For more information on creating policies, see Create policies for a diagnostic system module in Oracle WebLogic Server Administration Console Online Help.
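The policy name format described in the steps above (message-id#[application_name]#any_text) can be checked before you type it into the console. The following is a minimal sketch in plain Python, not part of the Diagnostic Framework. It assumes that the message ID prefix consists of letters, that the prefix and number are joined by a hyphen (as in WLS-40500), and that the free text contains no # character. The function name parse_policy_name is hypothetical.

```python
import re

# Assumed shape of a message ID: a 1-6 letter prefix, a hyphen, and a
# 1-6 digit number. This is inferred from the examples WLS-40500 and
# WLS-40501, not taken from a formal specification.
POLICY_NAME = re.compile(
    r"^(?P<message_id>[A-Za-z]{1,6}-\d{1,6})"  # message-id
    r"(?:#(?P<app_name>[^#]+))?"                # optional application name
    r"#(?P<text>[^#]+)$"                        # any_text (assumed to contain no '#')
)


def parse_policy_name(name):
    """Return (message_id, app_name, text), or None for a plain policy name."""
    match = POLICY_NAME.match(name)
    if match is None:
        return None
    return match.group("message_id"), match.group("app_name"), match.group("text")


print(parse_policy_name("WLS-40500#My_Policy_Name"))
print(parse_policy_name("WLS-40501#testapp#My_Policy_Name"))
```

A name that does not match the pattern, such as My_Policy_Name, returns None, which corresponds to the case where you enter any name and no custom message ID is used.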
Investigating, Reporting, and Solving a Problem
You can use WLST and ADRCI commands and the Remote Diagnostic Agent (RDA) to investigate and report a problem (critical error), and in some cases, resolve the problem.
The section begins with a roadmap that summarizes the typical set of tasks that you must perform. It describes the following topics:
- Roadmap—Investigating, Reporting, and Resolving a Problem
- Viewing Problems and Incidents
- Analyzing Specific Problem Keys
- Working with Diagnostic Dumps
- Configuring and Using Diagnostic Dump Sampling
- Managing Incidents
- Generating an RDA Report
Roadmap—Investigating, Reporting, and Resolving a Problem
Typically, investigating, reporting, and resolving a problem begins with a critical error. This section provides an overview of that workflow.
Figure 13-4 illustrates the tasks that you complete to investigate, report, and resolve a problem.
Figure 13-4 Flow for Investigating a Problem
The following describes the workflow illustrated in Figure 13-4:
-
You notice that the system, component, or application is not functioning as expected. For example, you notice that there is a performance problem or users have reported that the application that they are trying to access is reporting errors.
-
Check to see if a problem and an incident have been created that may be related to the symptoms you are observing:
-
View the set of problems by using the WLST listProblems command, as described in Viewing Problems.
-
If a problem has been created, list the incidents related to the specific problem using the listIncidents command, as described in Viewing Incidents.
-
-
If an incident has not been created, go to Step 4. If an incident has been created, go to Step 5.
-
If you do not see any incidents listed that are related to your problem, you can create an incident manually using the createIncident command to capture diagnostics for the problem.
Consider creating an incident when you encounter an issue, such as software failure or performance problem, and you want to gather more diagnostic data. You can view the log files and the messages in the files. If there is a specific message that you believe is related to the issue you are seeing, you can use the message ID in the createIncident command.
See Creating an Incident Manually for more information about creating an incident.
-
View the details of the specific incident using the showIncident command, as described in Viewing Incidents. This command lists information about the incident, including the related message ID, the time of the incident, the ECID, and the files generated by the incident.
-
Use the getIncidentFile command to view the contents of files for the incident, as described in Viewing Incidents. The contents may provide information to guide you to the source of the problem and help in resolving it.
-
If the contents of the files for the incident do not help you to resolve the problem, you can execute additional dumps to view detailed diagnostics. For example, if you are experiencing performance problems, execute the dms.metrics dump. See Working with Diagnostic Dumps for information about the dumps available and how to execute them.
-
If you still cannot resolve the problem, package the incident, along with the RDA report, and send them to Oracle Support. See Packaging an Incident and Generating an RDA Report for information about packaging incidents and generating RDA reports.
Viewing Problems and Incidents
You can view the set of problems, the list of incidents, and the details of a particular incident using the WLST command-line utility, as described in the following topics:
- Viewing Problems
- Viewing Incidents
- Querying Incidents
Viewing Problems
You can view the set of problems by executing the WLST listProblems command, using the following format:
listProblems([adrHome] [,server])
The listProblems command lists the problems in the ADR home. Each problem has a unique ID:
listProblems()
Problem Id     Problem Key
         1     BEA-101020 [HTTP]
Viewing Incidents
You can list all available incidents or the incidents related to a specific problem by executing the WLST listIncidents command, using the following format:
listIncidents([id], [ADRHome])
For example, to see the list of all incidents, use the following command:
listIncidents()
Incident Id     Incident Time                    Problem Key
          2     Fri Apr 28 11:05:59 PDT 2017     MDS-50500 [MANUAL]
          1     Fri Apr 28 11:02:22 PDT 2017     MDS-50500 [MANUAL]
To view the incidents related to a specific problem, use the following command:
listIncidents(id='1')
Incident Id     Incident Time                    Problem Key
          2     Fri Apr 28 11:05:59 PDT 2017     MDS-50500 [MANUAL]
          1     Fri Apr 28 11:02:22 PDT 2017     MDS-50500 [MANUAL]
To view the details of a particular incident, use the WLST showIncident command, using the following format:
showIncident(id, [adrHome] [,server])
For example, to see the details of incident 1, use the following command:
showIncident(id='1')
Incident Id: 1
Problem Id: 1
Problem Key: MDS-50500 [MANUAL]
Incident Time: Fri Apr 28 11:02:22 PDT 2017
Error Message Id: MDS-50500
Execution Context:
Flood Controlled: false
Dump Files :
   readme.txt
   jvm_threads10_i1.txt
   dms_metrics11_i1.txt
   dfw_samplingArchive13_i1.JVMThreadDump.txt
   dfw_samplingArchive13_i1.readme.txt
   odl_logs14_i1.txt
To view the contents of a file in the incident, use the WLST getIncidentFile command, using the following format:
getIncidentFile(id, name [,outputFile] [,adrHome] [,server])
For example, to view the contents of the file odl_logs14_i1.txt, use the following command:
getIncidentFile(id='1', name='odl_logs14_i1.txt',outputFile='/tmp/odl_logs4_i1_dmp.output')
The command writes the output to the file odl_logs4_i1_dmp.output.
Querying Incidents
While the listIncidents command shows you the incidents related to a particular problem ID, or for a particular server, it does not allow you to restrict the list further. The WLST queryIncidents command lets you query for the value of particular attributes across one or more servers, or all servers in a domain. For example, you can query by the time of incident creation or the ECID.
An expression contains an incident attribute, an operator, and a string, in the following format:
attribute operator "string"
You can combine query expressions with the Boolean operators AND or OR, and group them by parentheses ().
The following incident attributes are supported:
-
TIMESTAMP: Incident creation time. You can use the from and to operators to specify a time range. The date format is YYYY-MM-DD HH:MM.
-
ECID: Execution Context ID
-
PROBLEM_KEY: Problem Key
-
MSG_FACILITY: The error message facility, such as ORA or OHS.
-
MSG_NUMBER: The error message ID, such as 600.
Custom incident attributes are also supported. For example, TRACEID, APP, URI, and DSID are supported. In addition, the context values, as shown in the incident readme.txt file, are supported. For example, DFW_APP_NAME and DFW_USER_NAME are supported.
The following operators are supported:
-
equals
-
notEqual
-
startsWith
-
endsWith
-
contains
-
isNull
-
notNull
For example, you can query all incidents in all servers in the domain for the ECID f19wAgN000001:
queryIncidents(query="ECID equals f19wAgN000001")
The following example queries all incidents that occurred between March 1, 2017 and March 15, 2017, for the server wls_server_1:
queryIncidents(query="TIMESTAMP from '2017-03-01 00:00' AND TIMESTAMP to '2017-03-15 00:00'", servers=["wls_server_1"])
For the complete syntax for this command, see queryIncidents in the WLST Command Reference for Infrastructure Components.
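Query strings are easy to mistype, so it can help to assemble them from parts. The following is a minimal sketch in plain Python that builds the string passed as the query argument of queryIncidents. The helper names expr, all_of, and any_of are hypothetical. The sketch assumes that values can be wrapped in single quotes (as in the TIMESTAMP example above) and that isNull and notNull take no value; check both assumptions against the WLST Command Reference.

```python
# Operators listed in this section. Treating isNull and notNull as taking
# no value is an assumption, not something the section states.
UNARY_OPERATORS = {"isNull", "notNull"}
BINARY_OPERATORS = {"equals", "notEqual", "startsWith", "endsWith",
                    "contains", "from", "to"}


def expr(attribute, operator, value=None):
    """Build one 'attribute operator value' expression."""
    if operator in UNARY_OPERATORS:
        return "%s %s" % (attribute, operator)
    if operator not in BINARY_OPERATORS:
        raise ValueError("unsupported operator: %s" % operator)
    if operator in ("from", "to") and attribute != "TIMESTAMP":
        raise ValueError("from and to apply only to TIMESTAMP")
    return "%s %s '%s'" % (attribute, operator, value)


def all_of(*expressions):
    """Combine expressions with AND, grouped by parentheses."""
    return "(" + " AND ".join(expressions) + ")"


def any_of(*expressions):
    """Combine expressions with OR, grouped by parentheses."""
    return "(" + " OR ".join(expressions) + ")"


query = all_of(expr("TIMESTAMP", "from", "2017-03-01 00:00"),
               expr("TIMESTAMP", "to", "2017-03-15 00:00"))
print(query)
```

In a WLST session, the resulting string could then be passed as queryIncidents(query=query, servers=["wls_server_1"]).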
Analyzing Specific Problem Keys
The Diagnostic Framework provides a set of well-defined problem keys for unhandled exceptions. These exceptions are either detected through the existing WLDF Policy "UncheckedException" or through the Diagnostic Framework java.lang.Thread.UncaughtExceptionHandler handler. Previously, the Diagnostic Framework generated problem keys with different formats for the same type of issues. Table 13-6 describes these problem keys and how to use them to investigate a problem.
Table 13-6 Uncaught Exception Problem Keys
Exception | Problem Key | Description |
---|---|---|
java.lang.OutOfMemoryError | DFW-99997 [java.lang.OutOfMemoryError] | Used by all java.lang.OutOfMemoryError incidents. With each incident of this type, a jvm.classhistogram dump is executed. The dump captures statistics about the instances of classes that have been loaded and the counts of associated Objects. Review the contents of this dump for a good starting point for understanding what has been loaded into the JVM's memory. In addition, the dms.metrics dump records statistics about the overall JVM memory. |
java.sql.SQLException | DFW-99996 [ora-code\|java.sql.SQLException][package.class.method][app-name] | Used for all exceptions of type java.sql.SQLException, including its subclasses. The Diagnostic Framework attempts to extract the Oracle error code from the exception error message, and if it is successful, uses that in the problem key. If not, it uses the exception name. Review the text associated with the exception to get more details, such as the operation that could not be performed on the database. In addition, you can review the SQL error code details for additional information. |
All others | DFW-99998 [exception-name][package.class.name][app-name] | Used by all other types of exceptions, such as java.lang.NullPointerException, java.io.IOException, java.lang.StringIndexOutOfBoundsException, that are not handled in a unique way. Review the text associated with the exception to get more details, such as the reason for the failure. The source line in the problem key is a best-attempt indicator of the location of the failure. |
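When you scan many incidents, it can be useful to split a problem key into its message ID and its bracketed parts. The following is a minimal sketch in plain Python, based only on the key layouts shown in Table 13-6. The function name split_problem_key is hypothetical, and the class, method, and application values in the second example are invented for illustration.

```python
import re

# Message IDs that Table 13-6 associates with uncaught exceptions.
UNCAUGHT_EXCEPTION_IDS = {
    "DFW-99997": "java.lang.OutOfMemoryError",
    "DFW-99996": "java.sql.SQLException",
    "DFW-99998": "all other exceptions",
}


def split_problem_key(problem_key):
    """Split 'ID [a][b][c]' into ('ID', ['a', 'b', 'c'])."""
    message_id, _, rest = problem_key.partition(" ")
    return message_id, re.findall(r"\[([^\]]*)\]", rest)


print(split_problem_key("DFW-99997 [java.lang.OutOfMemoryError]"))

# Hypothetical key: the class, method, and application names are made up.
message_id, parts = split_problem_key(
    "DFW-99998 [java.lang.NullPointerException][com.example.Foo.bar][myapp]")
print(UNCAUGHT_EXCEPTION_IDS.get(message_id), parts)
```

Keys that do not use a DFW message ID, such as MDS-50500 [MANUAL], split the same way but are not found in the lookup table.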
Working with Diagnostic Dumps
If you suspect a problem, you can make use of the built-in diagnostic dumps to report detailed diagnostics that can help diagnose the problem. Diagnostic dumps provide a means to output and record diagnostics data which serve as valuable information when diagnosing issues with Oracle Fusion Middleware components, applications, and infrastructure. The output from these dumps is intended to be used by customers and Oracle Support to diagnose issues with Oracle Fusion Middleware.
Diagnostic dumps are executed in the following ways:
-
Manually, using WLST commands, as described in the following sections
For example, if your Java EE application is hanging and you suspect a deadlock, you could use the jvm.threads dump to obtain the set of threads.
-
Automatically, when the Diagnostic Framework detects a critical error and creates an incident or when the administrator creates an incident
Listing Diagnostic Dumps
You can find a list of diagnostic dumps that are available for a Managed Server by executing the WLST listDumps command, using the following format:
listDumps([appName] [,server])
For example, to list the available dumps for wls_server1:
listDumps(server='wls_server1')
Location changed to domainRuntime tree. This is a read-only tree with DomainMBean as the root.
For more help, use help(domainRuntime)
dfw.samplingArchive
dms.configuration
dms.ecidctx
dms.metrics
http.requests
jvm.classhistogram
jvm.threads
mds.MDSInstancesDump
odl.activeLogConfig
odl.logs
odl.quicktrace
opss.diagTest
opss.identityStoreUserRoleApiConfig
opss.securityContext
wls.image
Use the command describeDump(name=<dumpName>) for help on a specific dump.
Table 13-7 lists the diagnostic dump actions that are defined by Oracle Fusion Middleware and their descriptions.
Table 13-7 Diagnostic Dump Actions
Dump Action | Description |
---|---|
dms.ecidctx | The data associated with a specific Execution Context ID (ECID), if specified. Otherwise, the data associated with all available ECIDs. |
dms.metrics | Dynamic Monitoring Service (DMS) metrics. For information about these metrics, see About Dynamic Monitoring Service (DMS) in Tuning Performance. |
http.requests | A summary of the currently active HTTP requests. |
jvm.classhistogram | A JVM class histogram, the output of which varies depending on the JVM vendor. |
jvm.flightRecording | The active JRockit Flight Recorder recording. |
jvm.threads | Summary statistics about the threads running in a JVM as well as performing a full thread dump. |
mds.MDSInstancesDump | Information about each MDS instance in the current JVM. |
odl.activeLogConfig | The active Java logging configuration. |
odl.logs | Contents of diagnostic logs, correlated by ECID or time range. |
odl.quicktrace | Quick trace messages. |
wls.image | The WLDF server image dump. |
In addition, Oracle SOA Suite provides diagnostic dumps, as described in Diagnosing Problems with SOA Composite Applications in Administering Oracle SOA Suite and Oracle Business Process Management Suite.
Viewing a Description of a Diagnostic Dump
You can view a description of a particular dump, including the syntax for executing the dump, by using the WLST describeDump command. You specify the name of the dump in which you are interested. For example, to view a description of the dms.metrics dump, use the following command:
describeDump(name='dms.metrics')
Name: dms.metrics
Description: Dumps DMS (Dynamic Monitoring Service) metrics.
Run Mode: asynchronous
Mandatory Arguments:
Optional Arguments:
    Name       Type     Description
    dump       STRING   How much to dump
    servers    STRING   Server names
    names      STRING   Name of DMS noun or metric
    format     STRING   Format of the dump output; raw or xml
    nountypes  STRING   Type of DMS noun
Executing Dumps
If you detect a problem and want to gather additional diagnostic data, you can invoke the executeDump command for a specified dump. Each dump may have mandatory or optional arguments, or both. To view the arguments for a particular dump and how to specify them, use the describeDump command, as described in Viewing a Description of a Diagnostic Dump.
The following example executes the dump with the name dms.metrics for the incident with ID 1 and writes the output to the file dumpout.txt:
executeDump(name='dms.metrics', outputFile='/tmp/dumpout.txt', id='1')
Dump file dms_metrics1_i1.dmp added to incident 1
The command adds the dump output to incident 1. If you execute the showIncident command for incident 1, the output includes dms_metrics1_i1.dmp.
Configuring and Using Diagnostic Dump Sampling
Diagnostic dump sampling captures the output of diagnostic dumps at specified intervals. By sampling at regular intervals, diagnostic dump sampling can help to reveal issues such as slow running web requests, and where work is being performed in those requests.
This section contains the following topics:
- About Diagnostic Dump Sampling
- Configuring Dump Sampling
- Listing Dump Samplings
- Retrieving the Dump Sampling Output
About Diagnostic Dump Sampling
All diagnostic dump samplings are performed in the background, at specified intervals. By default, jvm.threads and jvm.classhistogram dumps are configured for sampling. However, they are not active until an incident is generated. Then, they remain active for 72 hours, by default.
You can modify the settings for the default dump samplings and you can create new sampling definitions for the dump actions listed in Table 13-7 and for any application-specific dumps. You can configure multiple sampling definitions for the same diagnostic dump, specifying different settings, such as sampling interval or server.
For each diagnostic dump sampling, the Diagnostic Framework stores the specified number of samples. When that limit is reached, the oldest sample is purged. All samples are purged when the server shuts down.
Table 13-8 shows the settings of the dump samplings that are configured by default.
Table 13-8 Default Diagnostic Dump Samplings Configuration
Dump Name | Sampling Interval | Maximum Samples Stored |
---|---|---|
jvm.threads | 60 seconds | 10 |
jvm.classhistogram | 30 minutes | 5 |
The Diagnostic Framework triggers the retrieval of the dump samples whenever an incident is created (through error detection or manual incident creation). In addition, you can retrieve the contents of the dump samples, as described in Retrieving the Dump Sampling Output.
You can retrieve the dump sample archives in either text or zip files:
-
Text: By default, the diagnostic dump samples are concatenated into a single archive file, in text format. An ASCII header and footer are wrapped around each sample in the archive file. The header contains a timestamp and the name of the diagnostic dump that produced the sample. Both the header and footer contain the number of the samples in the archive and the number of the particular sample. For example:
$$$=== BEGIN OF Diagnostic Dump - jvm.classhistogram (Archive #0 1_of_2) ===$$$
Fri Apr 28:00:00 PDT 2017
<text of dump sampling>
$$$=== END OF Diagnostic Dump - jvm.classhistogram (Archive #0 1_of_2) ===$$$
-
Zip: You can configure diagnostic dump samplings to return a zip file instead of a concatenated file. The zip file contains all available dump sample files. This format supports any diagnostic dumps whose outputs are in binary format not suitable for concatenation, as well as for dumps that generate output in text format. This format also reduces the size of the archive containing the samples.
The following example shows the contents of a zip file:
unzip -l jvm_dump.zip
Archive:  jvm_dump.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
   508780  04-28-17 07:25   dfw_samplingArchive1065570966467923683.JVMThreadDump.dmp
      840  04-28-17 07:25   dfw_samplingArchive7749640004639161119.readme.txt
 --------                   -------
   509620                   2 files
In addition to a text or zip file, when you retrieve a dump sample, the Diagnostic Framework generates a readme file. The readme file either lists the line numbers for each dump sample in the archive (for text format) or the individual sample file names (for zip format). It also lists the timestamp for each sample and the index for the archive.
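Because the text archive wraps every sample in a BEGIN and END marker, the individual samples can be recovered with a short script. The following is a minimal sketch in plain Python. It assumes that each marker sits on its own line and follows the layout of the example marker lines in this section exactly. The function name split_samples is hypothetical, and the timestamp line that follows the BEGIN marker is kept as part of the sample body.

```python
import re

# Marker layout copied from the example in this section. That every marker
# occupies a whole line is an assumption.
MARKER = re.compile(
    r"^\$\$\$=== (BEGIN|END) OF Diagnostic Dump - (\S+) "
    r"\(Archive #(\d+) (\d+)_of_(\d+)\) ===\$\$\$$"
)


def split_samples(archive_text):
    """Return a list of (dump_name, sample_number, total_samples, body)."""
    samples, current, body = [], None, []
    for line in archive_text.splitlines():
        match = MARKER.match(line.strip())
        if match and match.group(1) == "BEGIN":
            # Start of a sample: remember its name and position.
            current = (match.group(2), int(match.group(4)), int(match.group(5)))
            body = []
        elif match and match.group(1) == "END" and current is not None:
            # End of a sample: store it together with the collected lines.
            samples.append(current + ("\n".join(body),))
            current = None
        elif current is not None:
            body.append(line)
    return samples
```

Lines that fall outside any BEGIN and END pair are ignored, so a partially written archive yields only its complete samples.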
The dump sample files are named using the following format:
dfw_dumpArchivennn.Sampling_Name.{txt|zip}
In the format, nnn is a unique number assigned by the Diagnostic Framework.
The following is an example of the name of a dump sample file for JVMThreadDump:
dfw_dumpArchive17394218037.JVMThreadDump.txt
The readme files are named using the following format:
dfw_dumpArchivennn.readme.txt
In the format, nnn is a unique number assigned by the Diagnostic Framework.
All samplings are scheduled to begin at the next nearest interval, corresponding to the frequency. For example, if a sampling is configured at 12:05:13 PM and the frequency is 5 seconds, the sample will be collected at 12:05:15 PM. This ensures that the collection of a series of samplings with the same frequency will occur at the same time. It also aligns all samples across machines, assuming their system clocks are synchronized.
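The alignment rule can be expressed as rounding the configuration time up to the next multiple of the sampling frequency. The following is a minimal sketch in plain Python that reproduces the example above. It assumes that intervals are counted from midnight of the same day and that a time falling exactly on a boundary is used as is; neither detail is stated in this section. The function name next_sample_time is hypothetical.

```python
from datetime import datetime, timedelta


def next_sample_time(configured_at, interval_seconds):
    """Round configured_at up to the next multiple of interval_seconds.

    Multiples are counted from midnight of the same day (an assumption).
    """
    midnight = configured_at.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (configured_at - midnight).total_seconds()
    intervals = -(-elapsed // interval_seconds)  # ceiling division
    return midnight + timedelta(seconds=intervals * interval_seconds)


# A sampling configured at 12:05:13 PM with a 5-second frequency.
print(next_sample_time(datetime(2017, 4, 28, 12, 5, 13), 5))
```

Two servers that run this calculation with the same frequency arrive at the same collection times, which is the alignment effect described above, provided their clocks agree.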
Note:
You must be connected to the Administration Server to execute the WLST dump sampling commands.
Configuring Dump Sampling
You can create additional dump samplings, update existing dump samplings, remove dump samplings and enable or disable dump sampling, as described in the following topics:
- Activating the Default Samples
- Creating Dump Samplings
- Modifying Dump Sampling Settings
- Removing Dump Samplings
- Enabling or Disabling All Dump Sampling
Activating the Default Samples
By default the jvm.threads and jvm.classhistogram dumps are not activated until an incident occurs. Then, they are active for 72 hours, by default.
You can change the behavior so that the dumps are active even if an incident has not occurred by setting the value of the MBean DumpSamplingIdleWhenHealthy to false.
To change the amount of time used for determining the system's health, change the value of the DumpSamplingMinimumHealthyPeriod MBean.
For information about changing the value of the Diagnostic Framework MBeans, see Configuring Diagnostic Framework Settings.
Creating Dump Samplings
You can create dump samplings for any dump listed in Table 13-7 and for any application-specific dumps. To create dump samplings, use the WLST command addDumpSample. The addDumpSample command uses the following syntax:
addDumpSample(sampleName="sample_name", diagnosticDumpName="dump_name",
[appName="application_name",] samplingInterval=num_seconds,
rotationCount=num_samples, [dumpedImplicitly={true|false},]
[toAppend={true|false},] [args={"arg_name" : "value"},]
[server="server_name"])
For example, to create a dump sampling for the http.requests dump, setting the sampling interval to 300 seconds and the rotation count to 10 samples, for the server wls_server1:
addDumpSample(sampleName="HTTPSampling", diagnosticDumpName="http.requests", samplingInterval=300, rotationCount=10, server="wls_server1")
HTTPSampling is added
For complete syntax, see addDumpSample in the WLST Command Reference for Infrastructure Components.
Modifying Dump Sampling Settings
You can change the settings of existing dump samplings by using the WLST command updateDumpSample. The updateDumpSample command uses the following syntax:
updateDumpSample(sampleName="sample_name", [appName="application_name",] samplingInterval=num_seconds, rotationCount=num_samplings, [dumpedImplicitly={true|false},] [toAppend={true|false},] [args={"arg_name" : "value"},] [server="server_name"])
For example, to modify the dump sampling HTTPSampling, changing the sampling interval to 200 and the rotation count to 5:
updateDumpSample(sampleName="HTTPSampling", samplingInterval=200, rotationCount=5, server="wls_server1")
HTTPSampling is updated
For complete syntax, see updateDumpSample in the WLST Command Reference for Infrastructure Components.
Removing Dump Samplings
You can remove existing dump samplings using the WLST command removeDumpSample. The removeDumpSample command uses the following syntax:
removeDumpSample(sampleName="sample_name", [server="server_name"])
For example, to remove the dump sampling HTTPSampling:
removeDumpSample(sampleName="HTTPSampling", server="wls_server1")
Removed HTTPSampling
For complete syntax, see removeDumpSample in the WLST Command Reference for Infrastructure Components.
Enabling or Disabling All Dump Sampling
You can enable or disable all dump sampling using the WLST command enableDumpSampling. This command affects all configured dump samplings. The enableDumpSampling command uses the following syntax:
enableDumpSampling(enable={true|false}, [server="server_name"])
Note that the server parameter is valid only if you are connected to the Administration Server. If you do not specify the server parameter, the command applies to the Administration Server.
For example, to disable dump sampling for the Administration Server:
enableDumpSampling(enable=false)
Dump sampling disabled
To determine if dump sampling is enabled or disabled, use the WLST command isDumpSamplingEnabled. The isDumpSamplingEnabled command uses the following format:
isDumpSamplingEnabled([server="server_name"])
For complete syntax, see enableDumpSampling and isDumpSamplingEnabled in the WLST Command Reference for Infrastructure Components.
Listing Dump Samplings
You can list dump samplings using the WLST command listDumpSamples. You can list all dump samplings, a specified dump sampling, or all dump samplings associated with a specified server. The listDumpSamples command uses the following syntax:
listDumpSamples([sampleName="sample_name",] [server="server_name"])
For example, to list all dump samplings associated with the server wls_server1:
listDumpSamples(server="wls_server1")
Name : JVMThreadDump
Dump Name : jvm.threads
Application Name :
Sampling Interval : 30
Rotation Count : 20
Dump Implicitly : true
Append Samples : true
Dump Arguments : context=true, timing=true, progressive=true, depth=20, threshold=30000
Name : JavaClassHistogram
Dump Name : jvm.classhistogram
Application Name :
Sampling Interval : 1800
Rotation Count : 5
Dump Implicitly : false
Append Samples : true
Dump Arguments :
For complete syntax, see listDumpSamples in the WLST Command Reference for Infrastructure Components.
Retrieving the Dump Sampling Output
To retrieve the output of dump samples, you can use the WLST executeDump command or the WLST getSamplingArchives command, as described in the following topics:
- Retrieving Dump Samples Using the executeDump Command
- Retrieving Dump Samples Using the getSamplingArchives Command
Retrieving Dump Samples Using the executeDump Command
You can retrieve dump samples using the WLST executeDump command, specifying the dfw.samplingArchive dump. This command collects all default sample archives and any dump samples that are specified with the parameter dumpedImplicitly=true from a temporary location and concatenates them into a single file. The command also returns a readme file, with details of the dump samples.
When you use the executeDump command, you use the following syntax:
executeDump(name="dfw.samplingArchive", outputFile="filename")
For the outputFile parameter, you can specify a text file or a zip file. If you specify a zip file, you must use the argument zipOutput=true.
For any dump sampling that is configured with the parameter dumpedImplicitly=false, you must specify the optional dfw.samplingArchive argument sampleName to collect the contents of those dump samples. For example:
executeDump(name='dfw.samplingArchive', args={'sampleName' : 'JavaClassHistogram'})
For complete syntax for this command, see executeDump in the WLST Command Reference for Infrastructure Components.
Retrieving Dump Samples Using the getSamplingArchives Command
You can retrieve dump samples using the WLST getSamplingArchives command. This command collects all dump samples in a zip file containing the individual dump sample files and a readme file. This method is particularly useful in dealing with binary format dumps.
The getSamplingArchives command uses the following syntax:
getSamplingArchives([sampleName="sample_name",] outputFile="filename" [, server="server_name"])
For example, to retrieve the dump samples for the JavaClassHistogram sampling, use the following command:
getSamplingArchives(sampleName="JavaClassHistogram", outputFile="/tmp/sampling.zip")
The following shows the contents of the zip file:
unzip -l /tmp/sampling.zip
Archive:  /tmp/sampling.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
  6241768  04-28-17 11:19   dfw_samplingArchive8680976839106379444.JavaClassHistogram.dmp
      552  04-28-17 11:19   dfw_samplingArchive7861027727509995202.readme.txt
 --------                   -------
  6242320                   2 files
For complete syntax, see getSamplingArchives in the WLST Command Reference for Infrastructure Components.
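The archive can also be inspected programmatically instead of with unzip -l. The following is a minimal sketch in plain Python, run outside WLST, that assumes only that the file written by getSamplingArchives is a standard zip archive. The function name list_sampling_archive and the rule for recognizing the readme entry (a name ending in .readme.txt, as in the listing above) are illustrative assumptions, not part of the Diagnostic Framework.

```python
import zipfile


def list_sampling_archive(path):
    """Return a list of (name, size_in_bytes, is_readme) for each zip entry."""
    entries = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            # Assumption: the readme entry ends in ".readme.txt", as in the
            # sample listing; every other entry is treated as a dump sample.
            is_readme = info.filename.endswith(".readme.txt")
            entries.append((info.filename, info.file_size, is_readme))
    return entries
```

For the example above, list_sampling_archive("/tmp/sampling.zip") would return one tuple for the .dmp file and one for the readme file, which makes it straightforward to pick out binary dumps for further processing.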
Managing Incidents
The Diagnostic Framework stores incidents, whether they are created automatically or manually. Oracle Fusion Middleware provides tools to help you process incident reports and package those incidents to send to Oracle Support. The following topics describe how to manage incidents:
- Creating an Incident Manually
- Creating an Aggregated Incident
- Packaging an Incident
- Purging Incidents
Creating an Incident Manually
System-generated problems—critical errors generated internally—are automatically added to the Automatic Diagnostic Repository (ADR). You can gather additional diagnostic data on these problems, upload diagnostic data to Oracle Support, and in some cases, resolve the problems, all with the workflow that is explained in Investigating, Reporting, and Solving a Problem.
Consider creating an incident manually when you encounter an issue, such as a software failure or a performance problem, and you want to gather more diagnostic data, but the Diagnostic Framework has not automatically created an incident.
You use the WLST command createIncident to create an incident manually. You can specify an incident based on time, a message ID, an impact area, or an ECID. Then, you can inspect the content of the incident or send it to Oracle Support for further analysis.
For example, to manually create an incident based on a message ID:
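As a minimal sketch of such a call, the following Python helper wraps createIncident for a single message ID. It assumes that createIncident accepts the keyword parameters messageId, description, and server; these names are assumptions and should be checked against the WLST Command Reference for Infrastructure Components. The helper name create_incident_for_message and the message ID MDS-50500 used in the note below are hypothetical.

```python
def create_incident_for_message(create_incident, message_id,
                                description=None, server=None):
    """Invoke the supplied createIncident command for one message ID.

    create_incident is the WLST createIncident command (or a stand-in when
    testing). The keyword names messageId, description, and server are
    assumptions; verify them in the WLST Command Reference.
    """
    if not message_id:
        raise ValueError("message_id is required")
    kwargs = {"messageId": message_id}
    # Only pass optional parameters that the caller actually supplied.
    if description is not None:
        kwargs["description"] = description
    if server is not None:
        kwargs["server"] = server
    return create_incident(**kwargs)
```

In a connected WLST session, this could be called as create_incident_for_message(createIncident, "MDS-50500", description="sample incident", server="wls_server1"). Passing the command in as an argument keeps the helper usable outside WLST, for example with a stub function.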
Creating an Aggregated Incident
If you have several incidents and want to combine them into a single incident, you can use the WLST createAggregatedIncident command. For example, if you used selective tracing, the resulting incidents containing the trace data may be generated on multiple servers. With the createAggregatedIncident command, you can generate an aggregated incident that meets criteria you specify. The original incidents are untouched. That is, the aggregated incident contains a copy of the incident files from the queried incidents.
The aggregated incidents are created on the Administration Server host, but they can contain incidents from one or more servers or all servers in the domain.
You construct a query using an expression that contains an incident attribute, an operator, and a string, in the following format:
attribute operator "string"
You can combine query expressions with the Boolean operators AND or OR, and group them by parentheses ().
For information about the supported attributes and operators, see createAggregatedIncident in the WLST Command Reference for Infrastructure Components.
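The expression format described above can be assembled with a small helper. This is a minimal sketch in Python, not part of WLST: the function names expr and combine are hypothetical, and the sketch always wraps the value in quotation marks, following the stated attribute operator "string" format. The attribute ODL_TRACE_ID and the operator equals are taken from the examples in this section; other attributes and operators must be checked in the command reference.

```python
def expr(attribute, operator, value):
    """Build one expression in the form: attribute operator "string"."""
    text = str(value)
    if '"' in text:
        # Escaping rules for embedded quotes are not covered here,
        # so this sketch rejects such values rather than guessing.
        raise ValueError("value must not contain a double quote")
    return '%s %s "%s"' % (attribute, operator, text)


def combine(boolean_op, *expressions):
    """Join expressions with AND or OR and group them in parentheses."""
    if boolean_op not in ("AND", "OR"):
        raise ValueError("boolean_op must be AND or OR")
    if not expressions:
        raise ValueError("at least one expression is required")
    if len(expressions) == 1:
        return expressions[0]
    return "(" + (" %s " % boolean_op).join(expressions) + ")"


# Hypothetical query matching either of two trace IDs.
query = combine("OR",
                expr("ODL_TRACE_ID", "equals", "123456"),
                expr("ODL_TRACE_ID", "equals", "654321"))
# query == '(ODL_TRACE_ID equals "123456" OR ODL_TRACE_ID equals "654321")'
```

The resulting string would be passed as the query argument of createAggregatedIncident. Because the string contains double quotation marks, it is simplest to pass it as a variable rather than retyping it inside a double-quoted literal.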
Each aggregated incident will contain a zip file for each incident returned from the query, as well as descriptive text detailing the query used and the details of each incident.
For example, to create an aggregated incident for all incidents that contain the ODL_TRACE_ID of 123456 on the server wls_server1:
createAggregatedIncident(query="ODL_TRACE_ID equals 123456", servers="wls_server1")
Incident 55 created, containing the following incidents:
Server wls_server1
Incident Id   Problem Key               Incident Time
15            TRACE [123456] [MANUAL]   Mon Apr 17 11:22:12 EDT 2017
To create an aggregated incident for all incidents that contain the ODL_TRACE_ID of 123456 on all servers in the domain:
createAggregatedIncident(query="ODL_TRACE_ID equals 123456")
Incident 55 created, containing the following incidents:
Server wls_server1, wls_server2
Incident Id   Problem Key               Incident Time
15            TRACE [123456] [MANUAL]   Mon Apr 17 11:22:12 EDT 2017
Packaging an Incident
You can package the incident to facilitate sending the information to Oracle Support by using the ADR Command Interpreter (ADRCI). The ADRCI utility enables you to investigate and report problems in a command-line environment. With ADRCI, you can package incident and problem information into a zip file for transmission to Oracle Support.
The ADRCI command-line utility is located in the following directory:
(UNIX) ORACLE_HOME/oracle_common/adr
(Windows) ORACLE_HOME\oracle_common\adr
Packaging an incident involves a three-step process:
-
Create a logical package.
The package is denoted as logical because it exists only as metadata in the ADR. It has no content until you generate a physical package from the logical package. The logical package is assigned a package number, and you refer to it by that number in subsequent commands.
You can create the logical package as an empty package, or as a package based on an incident number, a problem number, a problem key, or a time interval. If you create the package as an empty package, you can add diagnostic information to it in step 2.
Creating a package based on an incident means including diagnostic data, such as dumps, for that incident. Creating a package based on a problem number or problem key means including in the package diagnostic data for incidents that reference that problem number or problem key. Creating a package based on a time interval means including diagnostic data on incidents that occurred in the time interval.
-
Add diagnostic information to the package.
If you created a logical package based on an incident number, a problem number, a problem key, or a time interval, this step is optional. You can add additional incidents to the package or you can add any file within the ADR to the package. If you created an empty package, you must use ADRCI commands to add incidents or files to the package.
-
Generate the physical package.
When you submit the command to generate the physical package, ADRCI gathers all required diagnostic files and adds them to a zip file in a designated directory. You can generate a complete zip file or an incremental zip file. An incremental file contains all the diagnostic files that were added or changed since the last zip file was created for the same logical package. You can create incremental files only after you create a complete file, and you can create as many incremental files as you want. Each zip file is assigned a sequence number so that the files can be analyzed in the correct order.
Zip files are named according to the following format:
packageName_mode_sequence.zip
In the format:
- packageName consists of a portion of the problem key followed by a timestamp.
- mode is either COM or INC, for complete or incremental.
- sequence is an integer.
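The naming format above can be parsed to put a set of zip files into analysis order. The following is a minimal sketch in Python; the function names are hypothetical, and it assumes that packageName may itself contain underscores, so the name is split from the right.

```python
def parse_package_zip_name(filename):
    """Split packageName_mode_sequence.zip into (packageName, mode, sequence)."""
    if not filename.endswith(".zip"):
        raise ValueError("not a zip file name: %s" % filename)
    stem = filename[:-len(".zip")]
    # packageName may contain underscores, so take the last two fields.
    parts = stem.rsplit("_", 2)
    if len(parts) != 3 or not parts[0]:
        raise ValueError("unexpected name format: %s" % filename)
    package_name, mode, sequence = parts
    if mode not in ("COM", "INC"):
        raise ValueError("mode must be COM or INC: %s" % filename)
    if not sequence.isdigit():
        raise ValueError("sequence must be an integer: %s" % filename)
    return package_name, mode, int(sequence)


def sort_for_analysis(filenames):
    """Order zip files by their sequence number."""
    return sorted(filenames, key=lambda name: parse_package_zip_name(name)[2])
```

For example, with the hypothetical names pkg_20170417_INC_2.zip and pkg_20170417_COM_1.zip, sort_for_analysis returns the complete file first, followed by the incremental file.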
For example, to package an incident, take the following steps:
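The three-step process described above can be sketched as a set of ADRCI command strings. This sketch assumes the IPS command forms documented for ADRCI in Oracle Database Utilities (ips create package, ips add incident, and ips generate package). The Python function names, incident numbers, package number, and output directory are hypothetical. Because ADRCI assigns the package number when it creates the logical package, steps 2 and 3 take that number as input.

```python
def create_package_command(incident_id):
    """Step 1: create a logical package based on an incident number."""
    return "ips create package incident %d" % incident_id


def add_incident_command(incident_id, package_id):
    """Step 2 (optional): add another incident to the logical package."""
    return "ips add incident %d package %d" % (incident_id, package_id)


def generate_package_command(package_id, output_dir, incremental=False):
    """Step 3: generate the physical package as a zip file in output_dir."""
    # A complete file must exist before an incremental file can be created.
    mode = "incremental" if incremental else "complete"
    return "ips generate package %d in %s %s" % (package_id, output_dir, mode)
```

For example, create_package_command(15) produces the command for step 1. After ADRCI reports the package number (say, 1), add_incident_command(16, 1) and generate_package_command(1, "/tmp") produce the commands for steps 2 and 3.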
Purging Incidents
By default, incidents are purged when the total size of all incidents exceeds 500 MB. You can use the maxTotalIncidentSize MBean parameter to change this value, as described in Configuring Diagnostic Framework Settings.
You can manually purge incidents using the ADRCI purge command. You can purge based on an ID or range of IDs, the age of the incident, or the type of incident. For example, to purge incidents that are older than 60 minutes, use the following command:
purge -age 60
See the ADRCI: ADR Command Interpreter chapter of Oracle Database Utilities.
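Because the -age value is expressed in minutes, longer retention periods need to be converted. The following is a minimal sketch in Python; the function name purge_by_age_command is hypothetical, and it builds only the purge -age form shown above.

```python
from datetime import timedelta


def purge_by_age_command(max_age):
    """Build an ADRCI purge command for a maximum age given as a timedelta."""
    minutes = int(max_age.total_seconds() // 60)
    if minutes <= 0:
        raise ValueError("max_age must be at least one minute")
    return "purge -age %d" % minutes
```

For example, purge_by_age_command(timedelta(days=7)) returns purge -age 10080, and purge_by_age_command(timedelta(minutes=60)) returns the command shown above.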
Generating an RDA Report
You can use the Remote Diagnostic Agent (RDA), a command-line diagnostic tool, to provide a comprehensive picture of your environment. Additionally, RDA can provide recommendations on various topics, such as configuration and security. This aids you and Oracle Support in resolving issues.
RDA is a set of command line diagnostic scripts that are executed by an engine written in the Perl programming language. RDA is used to gather detailed information about an Oracle environment; the data gathered is in turn used to aid in problem diagnosis. The output is also useful for seeing the overall system configuration.
RDA is designed to be as unobtrusive as possible; it does not modify systems in any way. A security filter is provided if required.
RDA collects information that is useful for troubleshooting issues in the following areas:
-
Installation and configuration
-
Performance
-
ORA-600, ORA-7445, ORA-3113, and ORA-4031 errors
-
Upgrade, migration, and linking
-
Oracle Database
-
Oracle Fusion Middleware
To run RDA, execute the following:
(UNIX) ORACLE_HOME/oracle_common/rda/rda.sh
(Windows) ORACLE_HOME\oracle_common\rda\rda.cmd
The following shows a part of the output:
./rda.sh
-------------------------------------------------------------------------------
S000INI: Initializes the Data Collection
-------------------------------------------------------------------------------
RDA uses the output file prefix to identify all files belonging to the same
data collection. The prefix must start with a letter and must contain only
alphanumeric characters.
Enter the prefix to be used for all the generated files
Hit 'Return' to accept the default (RDA)
>

Enter the directory used for all the files to be generated
Hit 'Return' to accept the default (/scratch/oracle1/Oracle/Middleware/Oracle_Home/oracle_common/rda/output)
>

Do you want to keep report packages from previous runs (Y/N)?
Hit 'Return' to accept the default (N)
>

Enter the Oracle home to be used for data analysis
Hit 'Return' to accept the default (/scratch/oracle1/Oracle/Middleware/Oracle_Home)
For more information about RDA, see the readme file, which is located at:
(UNIX) ORACLE_HOME/oracle_common/rda/README_Unix.txt
(Windows) ORACLE_HOME\oracle_common\rda\README_Windows.txt