11 Performance Monitoring

Oracle Real-Time Decisions includes a robust performance monitoring system for observing the behavior of Inline Services. Performance Monitoring parameters are set, and a snapshot view of some of the common counters can be observed, through Fusion Middleware Control. A chronological view can be obtained by enabling the performance monitor. Once enabled, a comma-separated value (CSV) file is produced that can be used to observe behavior over time.

Caution:

The CSV file grows without limit, so enable the performance monitor only for active troubleshooting.

11.1 Setting Performance Monitoring Parameters

The performance monitoring parameters are set using the SDManagementCluster > Members > Properties > PerformanceMonitoring MBean. You can access this MBean using Fusion Middleware Control; see Chapter 13, "Managing Oracle Real-Time Decisions" for more information.

Table 11-1 describes the properties governing performance monitoring.

Table 11-1 Performance Monitoring Properties

Property Name Description

DSPerfCounterEnabled

Enables the writing of DS performance counters. This property should not be enabled indefinitely, because the file grows without limit.

DSPerfCounterAppend

If true, performance data is appended to the existing file, if any. If false, any existing file is overwritten when the server restarts.

DSPerfCounterLogFile

The tab-separated CSV file into which DS performance counts are periodically appended. If MS Excel is available, ds_perf.xls, supplied in the etc directory of the installation, provides a convenient view. See the first row of ds_perf.xls for instructions on linking ds_perf.xls to ds_perf.csv as a datasource.

DSPerfCounterLogInterval

The update interval in milliseconds for DS performance counts.


11.2 Viewing Common Performance Monitoring Snapshot Values

A snapshot of some of the performance counters is available for viewing through the SDManagementCluster > Members > Decision Service MBean. Press the F5 key to refresh the values.

Performance monitoring does not have to be enabled to use this view.

11.3 CSV File Contents

This section describes the fields of the CSV file containing performance counters.

Table 11-2 Fields of CSV File With Performance Counters

Field Name Description

Date/Time

The time of day at which the current row of counters was appended to the file. Millisecond precision is available to facilitate correlations with messages in the server's log file.

Max Allowable Running Requests

The maximum number of Inline Service requests that can run concurrently.

The value is derived from configuration settings. It should be chosen to minimize the operating system's thread scheduling overhead, and hence provide maximum throughput for a busy system.

The value can be set manually, by setting a non-zero value in either the cluster-wide configuration property, SDManagementCluster > Properties > Misc > IntegrationPointMaxConcurrentJobs, or in the server-specific property, SDManagementCluster > Members > Properties > Misc > IntegrationPointMaxConcurrentJobs.

The preferred value is chosen by setting the property to zero, in which case the value is calculated according to the following formula:

NumCPUs * Math.ceil(1/(1-DSRequestIOFactor)) + 5

The formula uses these terms:

  • NumCPUs: Server-specific configuration property SDManagementCluster > Members > Properties > Misc > NumCPUs. Use the number of physical CPUs in the machine.

  • Math.ceil: Means "round up to the next higher integer value."

  • DSRequestIOFactor: Server-specific configuration property SDManagementCluster > Members > Properties > Misc > IntegrationPointRequestIOFactor. The fraction of time Integration Point requests spend doing input/output operations, or otherwise waiting for systems external to this virtual machine. The default value is 0.5.
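The formula above can be checked by hand with a few lines of Python. This is only a sketch for estimating the value outside the server: the function and parameter names are illustrative, not part of Oracle RTD, and the input validation is an added assumption.

```python
import math


def max_allowable_running_requests(num_cpus: int, request_io_factor: float = 0.5) -> int:
    """Evaluate NumCPUs * Math.ceil(1/(1-DSRequestIOFactor)) + 5.

    The name and the range checks are illustrative; only the arithmetic
    comes from the documented formula.
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    if not 0.0 <= request_io_factor < 1.0:
        raise ValueError("request_io_factor must be in the range [0, 1)")
    return num_cpus * math.ceil(1.0 / (1.0 - request_io_factor)) + 5


# With the default I/O factor of 0.5, each CPU is assumed to serve two requests.
print(max_allowable_running_requests(4))        # 4 * 2 + 5 = 13
print(max_allowable_running_requests(8, 0.75))  # 8 * 4 + 5 = 37
```

A higher I/O factor means requests spend more time waiting on external systems, so more of them can run concurrently per CPU.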

Peak Requests Running

The largest number of requests that have been running at the same time since the server was started.

Max Requests Running

The largest number of requests that have been running at the same time during the current logging interval.

Requests Running

The number of Inline Service requests that are currently running. This value will always be less than or equal to the Max Allowable Running Requests value.

Request Queue Capacity

The configured maximum number of requests that can wait at the same time in this server to run. This is the value of the cluster-wide property SDManagementCluster > Properties > Misc > IntegrationPointQueueSize, or the server-specific property, SDManagementCluster > Members > Properties > Misc > IntegrationPointQueueSize.

When a request arrives and the request queue is full, the request is rejected and a Server Too Busy error is logged in the server.

The property should be set to a value slightly less than the number of concurrent HTTP requests (threads) supported by the Web server; otherwise, the request queue could never fill up, because the requests would be rejected first by the Web server.

Peak Queue Length

The largest number of Inline Service requests that have been waiting at the same time to run in this server since the server started. This will always be less than or equal to Request Queue Capacity.

Max Queue Length

The largest number of Inline Service requests that have been waiting at the same time to run in this server during the current logging interval. This will always be less than or equal to Request Queue Capacity.

Requests Waiting (Queue Length)

The number of Inline Service requests that are currently waiting to run.

Requests When Queue Full, Total

The total number of requests that have arrived while the server's request queue was full. Each of these requests was rejected with a Server Too Busy error.

Requests Queued, Total

The total number of Inline Service requests that were required to wait to run until other requests finished running.

If all requests are being queued, the system is very busy.

Requests Seen, Total

The total number of Inline Service requests for this server.

Requests In System

The current number of Inline Service requests being processed by this server. The number includes those waiting to run, and those already running.

Timed Out Requests, Total

The total number of requests that have failed to finish running before their guaranteed service level timeout, as specified by cluster-wide property SDManagementCluster > Properties > Misc > IntegrationPointGuaranteedRequestTimeout.

This count includes all timed-out requests since the server was started.

If this number is growing but the number of queued requests is not growing, this is an indication that the Inline Service logic handling the request is too slow to satisfy the response time guarantee, even on an idle system. One or more Integration Point requests must be optimized, or the response time guarantee must be increased.
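The diagnostic rule described above can be applied to two consecutive rows of the counter file. The sketch below assumes each row is a dictionary keyed by the field names in this table, and it takes "Requests Queued, Total" as the measure of queued requests; both choices are assumptions, and the function name is hypothetical.

```python
def slow_logic_suspected(prev_row: dict, curr_row: dict) -> bool:
    """Return True when timeouts grew between two rows but queued requests did not.

    Assumes the column names match the field names in Table 11-2 and that
    "Requests Queued, Total" is the relevant queue counter.
    """
    timeouts_grew = (
        int(curr_row["Timed Out Requests, Total"])
        > int(prev_row["Timed Out Requests, Total"])
    )
    queued_grew = (
        int(curr_row["Requests Queued, Total"])
        > int(prev_row["Requests Queued, Total"])
    )
    return timeouts_grew and not queued_grew


# Made-up counter values for illustration only.
earlier = {"Timed Out Requests, Total": "10", "Requests Queued, Total": "40"}
later = {"Timed Out Requests, Total": "25", "Requests Queued, Total": "40"}
print(slow_logic_suspected(earlier, later))
```

A True result suggests looking at the Integration Point logic or the response time guarantee rather than at server load.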

Timed Out Requests

The number of requests that failed to finish running before their guaranteed service level timeout during the current logging interval.

Timed Out While Running, Total

The total number of requests, since the server started, that started running but did not finish within their response time guarantee.

The server's processing power consumed by these requests is largely wasted, because the clients will ignore their late responses. When the system is very busy, it sometimes times out requests that are still waiting to run, thus avoiding wasting resources on them.

Timed Out While Running

The number of requests, during the current logging interval, that started running but did not finish within their response time guarantee.

The server's processing power consumed by these requests is largely wasted, because the clients will ignore their late responses. When the system is very busy, it sometimes times out requests that are still waiting to run, thus avoiding wasting resources on them.

Timed Out Requests Still Running

The number of requests that have started running, timed out, and are still running. A non-zero value could be an indication of a programming problem in one or more Integration Points.

Request Run Time, Average (ms)

The average time, in milliseconds, during the current logging interval that requests ran. Excludes wait time, if any.

Request Run Time, Max (ms)

The largest amount of time, in milliseconds, during the current logging interval, that any single request ran. Excludes wait time, if any.

Run Times < [0.1 GRT]

The number of requests that finished running during the current logging interval and ran less than 10% of the configured guaranteed response time.

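There are nine similarly formatted columns in total. Their boundaries fall at 0.10, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, and 2.0 times the guaranteed response time: one column below the lowest boundary, seven between adjacent boundaries, and one at or above the highest.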

Run Times < N and >= M

The number of requests that finished running during the current logging interval and ran less than N milliseconds and greater than or equal to M milliseconds.

Run Times >= [2.0 GRT]

The number of requests that finished running during the current logging interval and ran two or more times the configured guaranteed response time.
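To make the column layout concrete, the following sketch assigns a run time to one of the nine distribution columns, given a guaranteed response time (GRT) in milliseconds. It illustrates the "less than N and greater than or equal to M" rule only; the names are hypothetical and this is not the server's own code.

```python
from bisect import bisect_right

# Boundaries as multiples of the guaranteed response time (GRT).
GRT_FACTORS = [0.10, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 2.0]


def bucket_index(elapsed_ms: float, grt_ms: float) -> int:
    """Return 0 for "< [0.1 GRT]", 8 for ">= [2.0 GRT]", and 1..7 in between."""
    bounds = [factor * grt_ms for factor in GRT_FACTORS]
    # bisect_right counts the boundaries that are <= elapsed_ms, so a value
    # equal to a boundary falls into the higher column (">= M").
    return bisect_right(bounds, elapsed_ms)


def histogram(times_ms, grt_ms: float) -> list:
    """Count how many of the given times fall into each of the nine columns."""
    counts = [0] * (len(GRT_FACTORS) + 1)
    for elapsed in times_ms:
        counts[bucket_index(elapsed, grt_ms)] += 1
    return counts


# Made-up run times against a 1000 ms guarantee.
print(histogram([50, 300, 999, 1000, 2500], 1000))
```

The same bucketing applies to the wait time distribution columns, with wait times in place of run times.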

Request Wait Time, Average (ms)

The average time, in milliseconds, that requests waited on the request queue prior to running or timing out.

Request Wait Time, Max (ms)

The largest amount of time, in milliseconds, during the current logging interval, that any single request waited on the request queue.

Includes only those requests that finished running, or timed out before running, during the current logging interval.

Wait Times < [0.1 GRT]

The number of requests that finished running during the current logging interval, and were placed on the request queue before running, but waited there less than 10% of the configured guaranteed response time.

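There are nine similarly formatted columns in total. Their boundaries fall at 0.10, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, and 2.0 times the guaranteed response time: one column below the lowest boundary, seven between adjacent boundaries, and one at or above the highest.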

Wait Times < N and >= M

The number of requests that finished running during the current logging interval and waited on the request queue less than N milliseconds and greater than or equal to M milliseconds before running.

Wait Times >= [2.0 GRT]

The number of requests that finished running during the current logging interval and waited two or more times the configured guaranteed response time before timing out.

Sessions, Current

The number of Decision Server sessions still open in this server.

Sessions, Total

The total number of Decision Server sessions created by this server.

Stale Sessions Closed Asynchronously

The total number of Decision Server sessions that have been closed by kernel jobs, instead of by request threads.

This is usually unimportant. In a busy system, most stale sessions are closed by request threads and the kernel jobs are engaged only as the system winds down. It could be of interest to someone observing a lot of kernel-job activity (see Kernel Jobs Running, Current).

Stale Sessions Closed by Requests

The total number of Decision Server sessions that have timed out and been closed by request threads. Most sessions will be closed this way, especially on a busy server.

After processing an Inline Service request, the calling thread will be asked to close at most one stale session before returning to the caller.

Requests Forwarded, Current

The number of Inline Service requests that have been forwarded from this server to other servers, and for which no acknowledgment has yet been received to indicate that the request has been processed by the forwarded-to server.

Requests Forwarded, Peak

Largest number of Inline Service requests forwarded.

Requests Forwarded, Total

Total number of Inline Service requests forwarded.

Received Requests Forwarded, Current

The number of Inline Service requests that were forwarded from other servers to this server, and which have not yet been completely processed by this server.

Received Requests Forwarded, Peak

Largest number of received Inline Service requests forwarded.

Received Requests Forwarded, Total

Total number of received Inline Service requests forwarded.

Remote Session Keys, Current

The current number of session keys that this server knows to reference sessions hosted by other servers. If a request arrives with one of these keys, it will be forwarded to the other server.

Remote Session Keys, Total

The total number of times that session keys were registered in this server for sessions hosted by other servers. This is an aggregation of "Remote Session Keys, Current".

Kernel Jobs Running, Current

The number of maintenance activities currently running in the server. Maintenance activities include model maintenance, session timing, and timed-out request processing.

Kernel Jobs Running, Peak

The largest number of maintenance activities that have run at the same time in this server. This value will always be less than or equal to the cluster-wide property SDManagementCluster > Properties > Misc > WorkerThreadPoolSize, or the server-specific property, SDManagementCluster > Members > Properties > Misc > WorkerThreadPoolSize.

Snapshot Period (ms)

The period of time, in milliseconds, over which the server collected data before logging this row of counters.
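For analysis outside Excel, the counter file can be read with a short script. The sketch below assumes the file is tab-separated, as stated for DSPerfCounterLogFile, and that its first row holds the field names listed in Table 11-2. The sample rows, including the timestamp format, are made up for illustration.

```python
import csv
import io


def read_counters(stream):
    """Parse tab-separated counter rows into dictionaries keyed by field name."""
    return list(csv.DictReader(stream, delimiter="\t"))


# Hypothetical excerpt with only three of the columns; a real file has many more.
sample = io.StringIO(
    "Date/Time\tRequests Running\tRequests Waiting (Queue Length)\n"
    "2011-01-01 10:00:00.000\t3\t0\n"
    "2011-01-01 10:00:10.000\t5\t2\n"
)
rows = read_counters(sample)

# Find the logged row with the most requests running.
busiest = max(rows, key=lambda row: int(row["Requests Running"]))
print(busiest["Date/Time"], busiest["Requests Running"])
```

To read a real file, pass an open file object instead of the sample, for example `open(path, newline="")`, where `path` points to the file named by DSPerfCounterLogFile.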


11.4 XLS File Contents

This section describes the contents of the Microsoft Excel file, ds_perf.xls, included in the etc directory of the installation.

At the top, cell B1 contains a comment describing how to link ds_perf.xls to the tab-separated counter file as a datasource:

"To specify path to the ds_perf.csv file, place cursor in cell B2 and select "Import External Data" > "Edit Text Import" from the "Data" menu, and navigate to your {$install_directory}\log\ folder and select the ds_perf.csv file. Use default parsing settings when prompted. Data will then be automatically refreshed every 3 minutes. To change interval and other settings, select from the "Data" menu the selection "Import External Data" > "Data Range Properties""

In row 2 are the headers containing the names of each counter. All of the headers from the CSV file appear here, with values below them.

The following columns appear after the values from the CSV file, with formulas showing values calculated from the CSV values:

  • Gross Throughput (req/sec): The average rate of requests finishing during the current logging interval, in requests per second. The formula is:

    RequestsFinished / SnapshotPeriod * 1000
    
  • Net Throughput (req/sec): The average rate of requests finishing during the current logging interval, excluding requests that timed out. The formula is:

    (RequestsFinished - Timeouts) / SnapshotPeriod * 1000
    
  • Utilization (%): The percentage of the server's capacity utilized during the current logging interval. The formula is:

    (RunTimeAverage * RequestsFinished) / (MaxAllowableRunningRequests * SnapshotPeriod) * 100
    

    This value can be briefly larger than 100 when requests are finishing that started running in previous logging intervals.
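The three calculated columns can also be reproduced outside the spreadsheet. The sketch below restates the formulas as Python functions. The parameter names mirror the terms in the formulas, and how those terms map to specific CSV columns is not spelled out here, so treat that mapping as something to confirm against ds_perf.xls.

```python
def gross_throughput(requests_finished: float, snapshot_period_ms: float) -> float:
    """RequestsFinished / SnapshotPeriod * 1000, in requests per second."""
    return requests_finished / snapshot_period_ms * 1000


def net_throughput(requests_finished: float, timeouts: float,
                   snapshot_period_ms: float) -> float:
    """(RequestsFinished - Timeouts) / SnapshotPeriod * 1000, in requests per second."""
    return (requests_finished - timeouts) / snapshot_period_ms * 1000


def utilization_pct(run_time_average_ms: float, requests_finished: float,
                    max_allowable_running_requests: float,
                    snapshot_period_ms: float) -> float:
    """(RunTimeAverage * RequestsFinished) / (MaxAllowableRunningRequests * SnapshotPeriod) * 100."""
    busy_ms = run_time_average_ms * requests_finished
    capacity_ms = max_allowable_running_requests * snapshot_period_ms
    return busy_ms / capacity_ms * 100


# Made-up values: 5000 requests finished in a 10-second interval, 2500 of them timed out,
# with an average run time of 20 ms and 13 allowable concurrent requests.
print(gross_throughput(5000, 10_000))
print(net_throughput(5000, 2500, 10_000))
print(utilization_pct(20, 5000, 13, 10_000))
```

Utilization compares the total time spent running requests with the time available across all allowable concurrent requests, which is why it can exceed 100 when long-running requests finish in a later interval than they started.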

11.5 Switching Off Authentication and Authorization

By default, Oracle RTD has authentication and authorization switched on. In order to improve performance in Decision Server, you can switch off authorization and authentication.

To switch off Oracle RTD Decision Server authorization, perform the following steps:

  1. Log in to Fusion Middleware Control, as described in Section 2.1.1, "Logging into Fusion Middleware Control."

  2. In the Target Navigation Pane, under Application Deployments, right-click the OracleRTD server entry, then select System MBean Browser.

  3. In the System MBean Browser, scroll down to the Application Defined MBeans, select OracleRTD and then the server name where Oracle RTD is deployed.

  4. Select SDClusterPropertyManager, then Cluster.

  5. Set RequireIntegrationPointAuthorization to false.

You can switch off Decision Server authentication on a web service. For details, see "Web Service Security" in Oracle Fusion Middleware Platform Developer's Guide for Oracle Real-Time Decisions.