|Oracle® Fusion Middleware Configuring and Using the Diagnostics Framework for Oracle WebLogic Server
11g Release 1 (10.3.3)
Part Number E13714-02
WLDF provides specific integration points with JRockit Flight Recorder. WebLogic Server events can optionally be propagated to the Flight Recorder for inclusion in a common data set for runtime or post-incident analysis. The Flight Recording data is also included in WLDF diagnostic image captures, enabling you to capture flight recording snapshots based on WLDF watch rules. This full set of functionality enables you to capture and analyze runtime system information for both the JVM and the Fusion Middleware components running on it, in a single view.
The following sections provide an overview of WLDF integration with JRockit Flight Recorder, and explain common usage scenarios that show how this integration can provide for a comprehensive performance analysis and diagnostic foundation for production systems based on WebLogic Server:
The version of Oracle JRockit that is available in the WebLogic Server installation program includes a component called JRockit Flight Recorder (JFR), a performance monitoring and profiling tool. JFR records diagnostic information on a continuous basis, making it always available, even in the wake of catastrophic failure such as a system crash.
JFR maintains a buffer of diagnostics and profiling data, called a flight recording or a JFR file, that you can access whenever you need it. The flight recording functions in a manner similar to an aircraft "black box" in which new data is continuously added and older data is stripped out, as shown in Figure 3-1.
Figure 3-1 Circular Flight Recording Buffer
For details about how the JFR flight recording works, see "Flight Recorder Data Flow" in Oracle JRockit Flight Recorder Run Time Guide.
The JFR file can be analyzed at any time to examine the details of system execution flow that occurred leading up to an event. The data contained in the JFR file includes events from the JVM and from any other event producer, such as WebLogic Server and Oracle Dynamic Monitoring System (DMS).
The amount of additional processing overhead that results when you enable JFR recording, and also configure WLDF to generate WebLogic Server diagnostics to be captured by JFR, is minimal. This makes it ideal to be used on a full time basis, especially in production environments where it adds the greatest value.
JFR provides the following key benefits:
Designed to run continuously — When JFR is configured to run full-time, with both JVM and WLDF events captured in the flight recording, diagnostic data is always available at the time an event occurs, including a system crash. This ensures that a record of diagnostic data leading up to the event is available, allowing you to diagnose the event without having to recreate it.
Comprehensive data — JFR combines data generated by the JRockit Runtime Analyzer and the JRockit Latency Analysis Tool and presents it in one place.
Integration with event providers — JRockit includes a set of APIs that allow the JFR to monitor additional system components, including WebLogic Server, Oracle Dynamic Monitoring System (DMS), and other Oracle products.
For more information about JFR, see "Introduction to the Flight Recorder" in Oracle JRockit Flight Recorder Run Time Guide.
The key features provided by WLDF to leverage integration with JFR include the following:
WLDF diagnostic data captured in JFR flight recording
WLDF can be configured to generate diagnostic data about WebLogic Server events that is captured in the JFR flight recording. Captured events include those from components such as: web applications; EJBs; JDBC, JTA, and JMS resources; resource adapters; and WebLogic Web Services.
WLDF diagnostic volume control
The ability to generate WebLogic Server event data for the Flight Recording is controlled by the WLDF diagnostic volume configuration. This control also determines the amount of WebLogic Server event data that is captured by JFR, and can be adjusted to include more, or less, data for each WebLogic Server event that is generated. For more information, see Configuring WLDF Diagnostic Volume.
By default, the WLDF diagnostic volume is set to
Off; however, this may be changed to
Low in a future release of WebLogic Server. For more information, see What's New in Oracle WebLogic Server.
The WLDF diagnostic volume setting does not affect explicitly configured diagnostic modules.
Automatic throttling of generated events under load
As processing load rises on a given WebLogic Server instance, WLDF automatically begins throttling the number of incoming WebLogic Server requests that are selected for event generation and recording into the JFR file. The degree of throttling is adjusted continuously as system load rises and falls.
Throttling provides three key benefits:
The overhead of capturing events generated by WLDF for JFR remains minimized, which is especially important when systems are under load.
The time interval encompassed in the JFR flight recording buffer is maximized, giving you a better historical record of data.
Throttling has the effect of sampling incoming WebLogic Server requests, maintaining high performance while still providing an accurate overall view of system activity under load.
Note:Throttling affects only the Flight Recording data that is captured by WLDF. It does not affect data captured by other event producers, such as the JVM.
WLDF diagnostic image capture support for JFR files
WLDF diagnostic image capture automatically includes the JFR file, if one has been generated by Flight Recorder. The JFR file includes data generated by all active event producers, including WebLogic Server. An image captured using the Watch and Notification component may contain the JFR file, if available.
WLST commands for downloading the contents of diagnostic image captures
WLST includes a set of commands for downloading the contents of diagnostic image captures, described in WLST Online Commands for Downloading Diagnostics Image Captures. Although these commands are generally useful for listing, copying, and downloading all entries contained in the diagnostic image capture, they can also be used for obtaining the JFR file, if available. Once obtained from the diagnostic image capture, the JFR file can be viewed in JRockit Mission Control.
This section summarizes three common business cases where using the JRockit Flight Recorder can help you resolve important diagnostic issues:
For more information about these scenarios, see "Flight Recorder Use Cases" in Oracle JRockit Flight Recorder Run Time Guide.
When a "catastrophic" failure occurs, the content of the Flight Recorder buffer can be made available for post-failure analysis in a manner analogous to the use of an aircraft's black box. Examples of such failures include a JVM crash or an out-of-memory error (OOME) resulting in an application terminating.
When these situations arise, the flight recording contains the following information, which can be helpful in determining the cause of the failure:
JVM core dump, including metadata about the Flight Recorder configuration at the time of the crash. Furthermore, if the Flight Recorder was running in persistent storage mode the data buffer file might contain a certain amount of data.
WebLogic Server events, captured by WLDF, that preceded the failure.
To have a Flight Recorder dump generated and captured in a file when a thread terminates due to an unhandled exception, you can specify the
-XX:FlightRecordingDumpOnUnhandledException command line option for JRockit. The dump file is written to the location defined by the option. (By default, this option is disabled.) For information about using this command, see the Oracle JRockit Command-Line Reference.
When running in persistent mode, the JFR uses a combination of memory and disk to store its buffer. The most recent data is stored in memory and is flushed out to disk as it “ages”. In this mode, the on-disk data will be available even after a power failure or similar catastrophic event; only the most recent data will be unavailable (for example, the data that had not yet been flushed to disk). The text dump file will contain metadata about the Flight Recorder configuration at the time of the crash, including the path to the data buffer file when applicable. For more information about persistent mode, see "Operating Modes" in Oracle JRockit Flight Recorder Run Time Guide.
Profiling involves capturing data beginning at a specific point in time so that, later, you can analyze the events that were generated after that point. In contrast to RADAR, described in the following section, profiling involves analyzing the diagnostic data generated after a particular event occurs, as opposed to the data that precedes it.
Profiling with JRockit Flight Recorder optimizes the ability to perform deep analysis of lock contention and causes of latency.
RADAR is the examination of diagnostic data generated during run time when a particular event occurs for the purposes of understanding the system activity that preceded the event; for example, system activity occurring moments before a serious error message is generated. By using the diagnostic capabilities available in WLDF in conjunction with JRockit Flight Recorder, you can capture a large amount of system-wide diagnostic data the moment a problem occurs. You can then leverage the capabilities of JRockit Mission Control to quickly correlate that event with other system activity and process execution data within the "snapshot in time" that the JFR file provides, enabling you to quickly isolate likely causes of the problem.
One WLDF feature whose usage with JRockit Flight Recorder makes for a powerful RADAR capability is image notification, which allows you to create a diagnostic image capture automatically in response to a particular event or error condition. A diagnostic image capture, which created as the result of an image notification, automatically includes the JFR file. The JFR file can then be extracted from the diagnostic image capture and examined immediately in JRockit Mission Control or stored for later analysis. Image notification, used when WLDF data is captured by JRockit Flight Recorder, is particularly well suited for this sort of real-time diagnosis of intermittent problems.
Image notification is part of the Watch and Notifications system in WLDF. To set up image notification, you create one or more individual watch rules. A watch rule includes a logical expression that uses the WLDF query language to specify the event for the watch to detect. For example, the following log event watch rule expression detects the server log message with severity level
Critical and ID
(SEVERITY = 'Critical') AND (MSGID = 'BEA-149618')
Watch rules can monitor any of the following:
Harvestable runtime MBean instances in the local runtime MBean server
A harvester watch can trigger an image notification if runtime MBean attributes detect a performance issue, such as high memory utilization rates or problems with open socket connections to the server.
Messages published to the server log
A log watch can trigger an image notification if a specific message, severity level, or string is issued.
Event generated the WLDF Instrumentation component
An event watch can trigger an image notification if an instrumentation service generates a particular event.
For more information, see the following topics:
The following sections explain how to obtain the JFR file from the diagnostic image capture and provide an example of using JRockit Mission Control to examine the WebLogic Server events contained in the JFR file:
The diagnostic image capture itself is a single ZIP file that contains individual images produced by the different server subsystems. If the JFR file is available, it is included in the diagnostic image as the file
A diagnostic image capture can be generated on-demand — for example, from the WebLogic Server Administration Console, WLST, or a JMX application — or it can be generated as the result of an image notification. For information about how to generate a diagnostic image captures and configure the location in which they are created, see "Configure and capture diagnostic images" in Oracle WebLogic Server Administration Console Help.
To view the contents of the JFR file, you first need to extract it from the diagnostic image capture as described in Chapter 5, "Configuring and Capturing Diagnostic Images." Once you have extracted the JFR file, you can view its contents in JRockit Mission Control.
For an example WLST script that retrieves the JFR file from a diagnostic image file and saves it to a local directory, see Example: Retrieving a JFR File from a Diagnostic Image Capture.
You use JRockit Mission Control to examine the contents of the Flight Recorder file after it has been extracted from the diagnostic image capture. The following sections highlight some of the capabilities of JRockit Mission Control's graphical user interface, which provides a lot of tooling support for drilling down into the diagnostic data generated not only by WLDF for WebLogic Server events, but also from all other available event producers, including the JRockit JVM:
For complete details about the JRockit Mission Control interface, see the Oracle JRockit Mission Control Online Help. See also Introduction to JRockit Mission Control Client.
JRockit Mission Control includes the JRockit Flight Recorder graphical user interface, which allows users who are running a Flight Recorder-compliant version of Oracle JRockit to view the JVM's recordings, current recording settings, and runtime parameters. The JFR interface includes the Events Type View, which gives you direct access to event information that has been recorded in the JFR file, such as event producers and types, event logging and graphing, event by thread, event stack traces, and event histograms.
The Overview tab in the JFR interface is useful for analyzing a system's general health because it can reveal behavior that might indicate bottlenecks or other sources of poor system performance. Figure 3-2 shows an example of the Overview tab in the Events Type View.
Note the following regarding the information shown in Figure 3-2:
The Events Type View is available by selecting the Events tab group icon.
The name of the Flight Recorder file appears at the top of the Overview tab. Note that the JFR is always named
JRockitFlightRecorder.jfr, it is useful to rename it descriptively after downloading it from the diagnostic image capture.
The Event Types Browser, on the left side, is a tree that shows the available event types in a recording. It works in conjunction with the Events tab group to provide a means to select events or groups of events in a recording that might be of interest to you and to obtain more granular information on them.
As you select and deselect entries in the Event Types Browser, the information displayed in the Overview tab is filtered dynamically. For example, by selecting only WebLogic Server, event data from all non-WebLogic event producers is filtered out.
The range navigator, which is the graph displayed below the Overview tab title, is a timeline that shows all events in a recording that pertain to the data displayed on the selected tab. A set of buttons are available for adjusting the range of data that is displayed, which can simplify the process of drilling down into the details of Flight Recorder data.
The Producers section identifies each event producer that generated the data that is displayed. Metrics are included for each producer, indicating the volume of event activity generated by each as a proportion of the total set of event data displayed.
The Event Types section lists all events represented in the Overview tab, along with key metric data about each event.
Figure 3-2 Example Overview Page of JRockit Flight Recorder File in JRockit Mission Control
This section shows an example of the steps that a developer or support engineer might use to identify the event activity associated with a particular request in a Web application hosted on WebLogic Server. This example is not meant to recommend a specific way to diagnose performance problems, but simply shows how the JFR graphical user interface can be used to greatly simplify the process of locating and analyzing performance issues.
The following examples are shown in this section:
When you start JRockit Mission Control and open a JFR file, you can use the Event Types View to quickly select the specific events you want to analyze. As you select and deselect items in the Event Types Browser (which is available in the Event Types View), the information displayed in the JFR graphical user interface is updated instantly to show information about only the selected event types.
Figure 3-3 shows the Event Types Browser with only servlet event types selected.
Figure 3-3 Event Types Browser
To view details about the events logged by one or more event types, select the Log tab, which is available at the bottom of the JFR graphical user interface. An example of the Log tab for servlet event types is shown in Figure 3-4.
Figure 3-4 Servlet Event Log
When using the Log tab, you can view details about events as follows:
You can click on individual column heads in the Event Log table to modify the sort order of the events. For example, by clicking the Duration column, you can quickly identify the events that took the longest time to execute.
When you select an event in the Event Log table, details about that event are displayed in the Event Attributes table. For example, Figure 3-4 shows the following attributes:
Event start, end, and duration times
User ID of person who issued the request on the servlet
Method, class name, and URI of invoked servlet
Execution context ID (ECID)
Different event types have different attributes. For example, if this were a JDBC event, you could scroll among the attributes to see the SQL statement, the JDBC connection pool used, and the stack from which it was called. The interface makes it easy to scan for unexpected behavior that can be analyzed in deeper detail.
The JFR graphical user interface in JRockit Mission Control allows you to analyze the run-time trail of system activity that occurs as the result of a particular event. In this example, the run-time trail is analyzed by first defining an operative set. An operative set is any set of events that you choose to work in JRockit Mission Control.
In the example shown in this section, an operative set is created for the events that have the same execution context ID (ECID) attribute as the servlet invocation event selected in the Event Log table, shown in Figure 3-4. The operative set is then analyzed to see the execution flow that resulted from that servlet invocation. (Note that this operative set could be expanded to include events that match on different attributes as well; for example, events containing a specific SQL statement but not necessarily the same ECID.)
Figure 3-5 Operative Set Defined by Execution Context ID (ECID)
This operative set is defined by right-clicking the desired event in the Event Log, and then selecting Operative Set > Add matching ECID > ecid. See Figure 3-6.
Figure 3-6 Defining an Operative Set by Matching ECID
The operative set is then displayed by selecting Show Only Operative Set above the event log table, shown in Figure 3-7. Note how the operative set is indicated in the range navigator.
Figure 3-7 Displaying an Operative Set
The run-time trail of execution flow that results from the request that generated the servlet invocation event can be viewed by including additional event types. For example, Figure 3-8 shows the operative set when all WebLogic Server event types are added, using the Event Type Browser, and listing the events in chronological order. (You can sort the events chronologically by selecting the Start Time column head.)
Figure 3-8 Adding all WebLogic Server Events to Operative Set
In this example, note a portion of the execution flow shown in the Event Log:
The servlet URI is invoked.
The servlet uses an EJB, which requires access to the database.
A JDBC connection is obtained and a transaction is started.
The operative set can be further analyzed by constraining the time interval of the execution flow and adding correlated events from additional producers. By constraining the time interval for displayed events, you can add events to the Event Log that occurred simultaneously with the operative set. This allows you to see additional details about the execution context that can help diagnose performance issues.
The time interval can be constrained by using the range selection bars in the range navigator. You can grab these bars with your pointer and drag them inward or outward to change the range of events displayed in the Event Log. The range selection bars are activated when you hover your pointer over either end of the navigator, as shown in Figure 3-9.
Figure 3-9 Range Navigator Selection Bars
Events from additional producers, such as the JRockit JVM, can be selected in the Event Types Browser. Note that JVM events do not have ECID attributes, so they cannot be included among the WLDF events in the operative set. So to view the JVM events, you need to de-select Show Only Operative Set.
At this point the events that are displayed in the Event Log are those that occurred during the selected time interval but not correlated otherwise. Figure 3-10 shows drilling down into JDBC activity by selecting only JDBC events and JVM socket events. The Event Log is updated and listed in chronological order to show the socket activity that occurred simultaneously to the flow of the JDBC events in the selected time interval.
Figure 3-10 Adding JVM Events to JDBC Event Log