Dealing With Performance and Throughput Problems

This part of the System Administration Guide explains how to gather data that may help to diagnose performance or throughput problems and how to cooperate with Oracle to resolve these. This part of the guide assumes that the installation and configuration of the Oracle technology stack used for running Oracle Health Insurance applications are according to the instructions in the Installation Guide and that proper system management is in place.

Configuration to Influence Performance

This section summarizes parameters and configuration items that influence the performance of Oracle Health Insurance applications.

Memory Settings

The Installation Guide provides guidelines for memory settings for managed servers. Memory settings have a major impact on performance. A managed server that cannot free memory badly affects the responsiveness of the system, because it spends an increasing amount of time trying to free up memory (known as garbage collection). Use a tool like Oracle Enterprise Manager to monitor the memory usage of a managed server.

Work Manager Configuration

Oracle Health Insurance applications make use of Oracle Fusion Middleware Work Managers for managing the number of worker threads that process requests, like UI and web service requests, and tasks. A task is the equivalent of a unit of work. Oracle can help to determine the initial settings, which are usually based on the available resources.

Oracle Health Insurance System Properties

The following table lists examples of system properties that influence the performance of the system.

Table 1. Oracle Health Insurance System Properties
Category	System Property	Impact on Performance
User Interface	`ohi.ui.maxrowstoretrieve`	For ADF user interface pages only. This value affects memory usage and page load times. Set it on a per-page basis using the more specific `ohi.ui.maxrowstoretrieve.<function code>` variant.
Web Services	`ohi.ws.client.connectiontimeout`	The time in milliseconds before the attempt to connect to an outbound service times out. If it takes a long time to establish the connection, then the task may take a long time to complete. During this time, the thread sending the message is not available to perform other tasks.
Web Services	`ohi.ws.client.readtimeout`	The time in milliseconds that the client waits for the server to respond to a request. If the service is slow to respond, then the task may take a long time to complete. And again, during this time, the thread sending the message is not available to perform other tasks.
Processing	`ohi.processing.filldepth` `ohi.processing.fillthreshhold`	The `filldepth` determines the number of tasks that the system de-queues and submits for processing to the Work Manager thread pool at any point in time. The `fillthreshhold` determines when the system looks for more tasks to submit for processing. These settings go hand in hand and work in combination with the maximum number of threads set for the core Work Manager. If the `fillthreshold` value is too low, the managed server may not have enough work to keep all its worker threads busy. Increasing the value for `filldepth` fetches more tasks or work items in one go but increases the memory footprint. The Installation Guide mentions default values that provide good starting points for performance testing.
Processing	`ohi.processing.groupsize` `ohi.processing.groupsize.<0>`	The number of activities to be grouped (when applicable) into a collection of tasks that is put into the processing grid as one atomic unit. One processing node processes the complete collection. The parameter `<0>` refers to the specific activity type. If this value is too low, it may cause contention in Coherence and frequent commits to the database, which affects performance. If this value is too high, it increases the memory footprint and may cause the Java Virtual Machine (JVM) to run into memory-related issues (for example, frequent full GCs and `OutOfMemoryError`). The Installation Guide mentions default values that provide good starting points for performance testing.

The Installation Guide documents these and other system properties in more detail.

Deployment Architecture

By default (and if the load balancer is configured that way), a managed server handles user interface requests, web services requests, and task processing. If that is the case, tune the associated Work Managers properly so that a burst of web services requests does not bring down the machine or adversely affect task processing.

Additional processing capacity is possible by adding managed servers, that is, by scaling out horizontally. It is also possible to set up specialized nodes, for example, a few nodes that handle user interface requests, a few that handle task processing, and a few that handle web services requests. This may help to better control and tune the machines for the various types of requests. Other sections of this guide document how to disable task processing.

Performance Tests

Hardware, configuration, setup, and usage of Oracle Health Insurance applications differ on a per-customer basis. As a result, there is no simple recipe that leads to optimal performance and throughput. Performance testing is an important part of the implementation project, to validate and tune environments and settings.

Gathering Performance Figures

Monitor Oracle Database Performance

It is possible to monitor Oracle Database via Enterprise Manager Grid Control. It features many performance tuning and monitoring capabilities.

Oracle’s Automatic Workload Repository (AWR) is the source of information for database features like the Automatic Database Diagnostic Monitor (ADDM) and the SQL Tuning Advisor. The database takes default snapshots to populate the AWR with relevant data every hour and keeps it for 7 days.

It is essential to generate an AWR report for each Real Application Cluster (RAC) database node. Generate the AWR reports for the period in which the problem occurred and attach the AWR report for each active node in a RAC cluster to the Service Request (SR).
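When the problem can be reproduced, the reporting period can be bracketed with manual snapshots instead of waiting for the hourly ones. The following is a minimal sketch only: the function name `awr_snapshot_sql` is hypothetical, and it assumes a database account that is allowed to execute `DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT`.

```shell
# Hypothetical helper: print a small SQL*Plus script that takes one manual AWR snapshot.
# Assumption: the connecting user has execute rights on DBMS_WORKLOAD_REPOSITORY.
awr_snapshot_sql() {
  cat <<'EOF'
whenever sqlerror exit failure
exec dbms_workload_repository.create_snapshot
exit
EOF
}
```

A possible usage is to pipe the output into SQL*Plus once before and once after reproducing the problem, for example `awr_snapshot_sql | sqlplus -s "/ as sysdba"`, and then to generate the report for the interval between the two snapshots with the `awrrpt.sql` script in `$ORACLE_HOME/rdbms/admin` (or `awrrpti.sql` to select a specific RAC instance).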

Monitor a Managed Server Using Oracle Enterprise Manager (OEM)

Use the OEM to gather performance figures for the Oracle Database and the Oracle Fusion Middleware application server. It is also possible to define alarms on certain threshold values, for example, to notify the system administrator when memory consumption exceeds a certain limit.

It is also possible to use third-party tools to monitor the technology stack. In that case, make sure that the monitoring tools themselves do not unnecessarily affect the performance. For example, certain profiling tools, or certain settings of these tools, may drastically affect the performance.

Using JVM Monitoring Tools to Gather Performance Figures

Garbage Collection Statistics

The JVM provides the runtime environment in which both the WebLogic application server and the Oracle Health Insurance application run. It is good practice to capture the JVM's garbage collection statistics, for example, by including the following in a startup script:

# GC logging
TIMESTAMP=`date "+%Y-%m-%d_%H_%M_%S"`
# LOG_BASE, APP_NAME and SERVER_NAME are used to put the GC logs in separate directories. Make sure they are set
JAVA_OPTIONS="${JAVA_OPTIONS} -Xloggc:${LOG_BASE}/${APP_NAME}/gc/gc_${SERVER_NAME}_${TIMESTAMP}.log"
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps"
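The resulting log can be summarized with standard tools before attaching it to a service request. The following is a minimal sketch, assuming the JDK 8 style log format in which each collection line ends with a `[Times: user=... sys=..., real=N secs]` section. The function name `gc_pause_summary` is hypothetical.

```shell
# Hypothetical helper: summarize GC pauses from a JDK 8 style GC log read on stdin.
# Assumption: each collection line contains "real=<seconds>" (printed by -XX:+PrintGCDetails).
gc_pause_summary() {
  awk '
    match($0, /real=[0-9.]+/) {
      # Skip the 5 characters of "real=" and convert the rest to a number.
      pause = substr($0, RSTART + 5, RLENGTH - 5) + 0
      total += pause
      count++
      if (pause > max) max = pause
    }
    END {
      printf "collections=%d total_s=%.2f max_s=%.2f\n", count, total, max
    }
  '
}
```

For example, `gc_pause_summary < gc_server1.log` (file name illustrative) prints the number of collections, the total wall-clock time spent in them, and the longest single pause. A total that grows quickly relative to uptime suggests that the managed server is struggling to free memory.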

Heap Dump

In case of memory issues, the Oracle development team may request a heap dump of the managed server. Take it with the Java Memory Map (jmap) tool:

jmap -dump:live,format=b,file=heap-<system_identifier>-<date>.dump [pid]

The live option triggers a full garbage collection cycle.

It is good practice to capture a heap dump automatically in case of out-of-memory errors so that it is readily available for detailed analysis.

# LOG_BASE and APP_NAME are used to write the heap dump in a specific directory. Make sure they are set
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:+HeapDumpOnOutOfMemoryError"
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:HeapDumpPath=${LOG_BASE}/${APP_NAME}/heapdumps"
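A heap dump can only be written if the target directory exists and is writable at the moment the error occurs. The following is a minimal sketch of a check that a startup script could run first, assuming the JVM does not create missing directories itself. The function name `ensure_heapdump_dir` is hypothetical.

```shell
# Hypothetical helper: create and verify the heap dump directory before the server starts.
# LOG_BASE and APP_NAME are the same variables used for -XX:HeapDumpPath above.
ensure_heapdump_dir() {
  dir="${LOG_BASE:?LOG_BASE is not set}/${APP_NAME:?APP_NAME is not set}/heapdumps"
  mkdir -p "$dir" || return 1
  # The operating system user that runs the JVM must be able to write here.
  [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
  echo "$dir"
}
```

Heap dumps are roughly as large as the used heap, so it is worth checking the free space on that file system as well.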

Running the Java Flight Recorder

Java Flight Recorder (JFR) is a profiling and event collection framework built into the Oracle Java Development Kit. It allows Java administrators and developers to gather detailed low-level information about how the JVM and the Java application are behaving. The Java Mission Control tool can interpret Flight Recorder recordings and help the Oracle development team to diagnose performance or memory-related issues in Oracle Health Insurance applications.

To run the Flight Recorder, set the following Java options:

# Java Mission Control: enable Flight Recorder
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:+UnlockCommercialFeatures -XX:+FlightRecorder"

There is an option to specify additional command-line options, for example, to save the recording to disk and determine the max-age of the recording (defaults to 15 minutes):

-XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=${LOG_BASE}/${APP_NAME}/${SERVER_NAME}/flightrecordings,maxage=1h

Steps to Enable Java Flight Recording on Demand

To create a Java Flight Recording on demand, you need to unlock the commercial features.
1. Add the following flag to unlock commercial features:
   java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
2. Execute the following command at runtime to enable commercial features:
   jcmd <PID> VM.unlock_commercial_features
   where <PID> is the ID of the Java process.
3. Run the following command:
   jcmd <PID> JFR.start duration=<time> filename=/<your_desired_location>/<node_name>.jfr
   where <time> is the duration in seconds for running the flight recording.
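The two `jcmd` calls above can be wrapped in a small script so that they always run in the same order. The following is a minimal sketch only: the function names and the `DRY_RUN` switch are hypothetical, and the duration is passed with an explicit `s` unit suffix.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
jfr_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper around the jcmd steps described above.
# Arguments: <PID> <duration in seconds> <output directory> <node name>
start_flight_recording() {
  pid=$1 seconds=$2 out_dir=$3 node=$4
  # Unlock commercial features first; start the recording only if that succeeded.
  jfr_run jcmd "$pid" VM.unlock_commercial_features &&
    jfr_run jcmd "$pid" JFR.start "duration=${seconds}s" "filename=${out_dir}/${node}.jfr"
}
```

With `DRY_RUN=1` set, `start_flight_recording 1234 60 /tmp/jfr node1` only prints the two commands, which is a convenient way to review them before running them against a production managed server.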

Additional details on the command-line options are available in the product’s documentation.

Logging Performance Figures

ADF UI Request Handling Performance Logging

The following logger configuration enables the logging of timing information for ADF user interface requests:

<logger name="com.oracle.healthinsurance.common.service.ui.userassistance.components" level="debug" />

A log entry that records the elapsed time of an ADF UI request looks like:

DEBUG com.oracle.healthinsurance...SlaLogger - [UI] Request: userId=42;contextPath=/base;
requestURL=http://localhost:7001/base/faces/SearchClaims;page=View Claim CL0025_reprocess_2_15;time(ms)=852

The entire message is logged on one line. It is split across lines in this guide for readability.
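Once this logger is enabled, slow pages can be filtered out of the application log with standard tools. The following is a minimal sketch, assuming the single-line format shown above with fields separated by semicolons. The function name `slow_ui_requests` and the default limit of 1000 ms are hypothetical.

```shell
# Hypothetical helper: list ADF UI requests at or above a time limit (milliseconds).
# Reads application log lines on stdin; the first argument is the limit (default 1000).
slow_ui_requests() {
  awk -v limit="${1:-1000}" '
    /\[UI\] Request:/ && match($0, /time\(ms\)=[0-9]+/) {
      # Skip the 9 characters of "time(ms)=" and convert the rest to a number.
      ms = substr($0, RSTART + 9, RLENGTH - 9) + 0
      if (ms >= limit) {
        page = "unknown"
        n = split($0, fields, ";")
        for (i = 1; i <= n; i++) {
          f = fields[i]
          sub(/^ +/, "", f)                       # tolerate a space after the separator
          if (index(f, "page=") == 1) page = substr(f, 6)
        }
        printf "%d ms\t%s\n", ms, page
      }
    }
  '
}
```

For example, `slow_ui_requests 500 < application.log` (file name illustrative) prints one line per request that took 500 ms or longer, with the elapsed time and the page name.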

Connectivity

REST Service Request Handling Performance Logging

The following logger configuration enables logging of timing information for REST service requests:

<logger name="com.oracle.healthinsurance.http.jersey.filter.HttpInitFilter" level="debug" />

A log entry that records the elapsed time of a REST service request looks like:

DEBUG com.oracle.healthinsurance...HttpInitFilter - [WS] Request: contextPath=/api;
requestURL=http://localhost:7001/api/someRestService; time(ms)=58

The entire message is logged on one line. It is split across lines in this guide for readability.
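For web services, an average per endpoint is often more telling than individual requests. The following is a minimal sketch, assuming the single-line format shown above. The function name `rest_avg_by_url` is hypothetical.

```shell
# Hypothetical helper: average REST request time per requestURL.
# Reads application log lines on stdin and prints one sorted line per URL.
rest_avg_by_url() {
  awk '
    /\[WS\] Request:/ && match($0, /time\(ms\)=[0-9]+/) {
      ms = substr($0, RSTART + 9, RLENGTH - 9) + 0
      url = "unknown"
      # The URL runs up to the next semicolon or space.
      if (match($0, /requestURL=[^; ]+/)) url = substr($0, RSTART + 11, RLENGTH - 11)
      sum[url] += ms
      hits[url]++
    }
    END {
      for (url in sum)
        printf "%s count=%d avg_ms=%.0f\n", url, hits[url], sum[url] / hits[url]
    }
  ' | sort
}
```

Comparing this output between a normal period and the period in which the problem occurred helps to show whether a slowdown affects all services or only specific ones.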

Task Processing Performance

Task processing performance (over time) is a good measure or indicator of system performance. The following queries, or variants of these, may provide a good starting point.

Average Processing Time in Milliseconds Per Task Type

The following query lists the average processing time in milliseconds per task type for tasks that the system created on the current day:

select taty.name tasktype
,      round(avg((extract(day from(task.process_stop_datetime-task.process_start_datetime))*24*60*60 +
                  extract(hour from(task.process_stop_datetime-task.process_start_datetime))*60*60 +
                  extract(minute from(task.process_stop_datetime-task.process_start_datetime))*60 +
                  extract(second from(task.process_stop_datetime-task.process_start_datetime)))*1000), 0) avg_elapsed_time_ms
from   ohi_claims_owner.ohi_tasks_bv       task
join   ohi_claims_owner.ohi_task_types_bv  taty on task.taty_id = taty.id
where  task.creation_date > trunc(sysdate)
group  by taty.name
order  by 2 desc nulls last;

Logging Performance Related Service Requests

To expedite problem resolution, follow the guidelines in this section when logging performance-related service requests with the Oracle support team. The support or development engineers ask for this information if the problem is unclear or if details needed to analyze the problem properly are missing.

It is essential to always specify the exact name and version of the Oracle Health Insurance application.
Provide a clear problem description. For example, pay attention to aspects like:
- Is the problem specific to the user interface, web services, or processing?
- Were there any special circumstances, like a peak in web services traffic or in the number of objects that are essential for processing, or were there any significant changes made to the system configuration?
- List any recent changes to the environment, for example:
  - Was there a substantial increase in the number of members or Claims, Policies, or Authorizations that the system must process?
  - Does the system process more or new integration requests?
  - Was there a recent upgrade to the application?
- Was it necessary to restart components?
- Was the system responding slowly or were out-of-memory exceptions encountered?
Specify details of the environment in which the problem occurred:
- Environment name;
- The purpose of the environment, for example, production, development, integration test, performance test;
- Database setup, for example, RAC or standalone;
- Middleware domain setup:
  - The number of nodes;
  - Details about the purpose of these nodes (for example, nodes handling user interface, web service, and task processing, or using specialized nodes?).
Attach the following files to the service request:

Table 2. Logging Performance Related Service Requests
File type	Description
AWR reports	If the Oracle database runs on multiple RAC nodes, attach AWR reports for all database nodes. Be as specific as possible. For example, if the problem occurred in a specific time frame, limit the AWR report to that period. If it is possible to reproduce the problem, create a snapshot, reproduce the problem, create another snapshot, and then create an AWR report for the period between these snapshots.
Application log files	From the time in which the problem occurred. Please attach entire files, not just snippets. If possible, add timing information (see instructions elsewhere in this chapter to get these for user interfaces or web services).
Server log files	Of the specified period in which the problem occurred. Again, please attach entire files, not just snippets.
Oracle Health Insurance properties file	For the environment.
Garbage collection statistics	See instructions elsewhere in this chapter to gather garbage collection data automatically.
Heap dump	In case of out-of-memory issues. See instructions elsewhere in this chapter for automatically capturing a heap dump in case of out-of-memory exceptions.
Java Flight Recording files	See instructions elsewhere in this chapter for automatically creating these JFR files.

User Interface Performance Checklist

Please provide answers to the following questions when encountering a user interface specific performance issue:

What was the action that caused the issue? For example, searching for a Claim, clicking the view Policies link, or opening a dialog?
What was the time of the day? What other activities was the system processing around that time?
Is it possible to reproduce the problem at will (at least three times) on the same machine?
Is it possible to reproduce the problem in other environments at will?

In case the issue relates to an ADF user interface page:

How many browser tabs were open?
Does the problem reproduce after closing all other tabs?
What is running on the local workstation, for example, a virus scanner or a backup program? Alternatively, what is the CPU usage on the machine?
Does the problem reproduce after logging out of the system and starting from scratch?
Is it possible to reproduce the problem at will on other workstations?
Does the problem reproduce if all other user interface users log out and only one user logs in to reproduce?