Dealing with Performance and Throughput Problems

This part of the Operations Guide explains how to gather data that may help diagnose performance or throughput problems, and how to work with Oracle to resolve them. It assumes that the Oracle technology stack used to run Oracle Health Insurance Components applications is installed and configured in accordance with the Installation Guide and that proper systems management is in place.

Configuration to Influence Performance

This section provides an overview of parameters and configuration items that influence performance of Oracle Health Insurance Components applications.

Memory Settings

The Installation Guide provides guidelines for memory settings for managed servers. Memory settings have a major impact on performance. If a managed server cannot free memory, responsiveness degrades badly because the managed server spends an increasing amount of time trying to free up memory (known as 'garbage collection'). The memory usage of a managed server can be monitored using a tool like Oracle Enterprise Manager.

Work Manager Configuration

Oracle Health Insurance Components make use of Oracle Fusion Middleware Work Managers for managing the number of worker threads that process requests, like UI and web service requests, and tasks. A task is the equivalent of a unit of work. Oracle can help to determine the initial settings. These are usually based on available resources.

Oracle Health Insurance System Properties

The following table lists examples of system properties that influence performance of the system.

Category | System Property | Impact on Performance
--- | --- | ---
User Interface | ohi.ui.maxrowstoretrieve | For ADF UI pages only. Memory usage and page load times are impacted by this value. This can be set on a per page basis using the more specific "ohi.ui.maxrowstoretrieve.<function code>" variant.
Web Services | ohi.ws.client.connectiontimeout | The time in milliseconds before the attempt to connect to an outbound service times out. If it takes a long time to establish the connection, the task may take a long time to complete. During this time, the thread sending the message is not available to perform other tasks.
Web Services | ohi.ws.client.readtimeout | The time in milliseconds that the client will wait for the server to respond to the request. If the service is slow to respond, the task may take a long time to complete. Again, during this time, the thread sending the message is not available to perform other tasks.
Processing | ohi.processing.filldepth, ohi.processing.fillthreshhold | ohi.processing.filldepth determines the number of tasks that will be dequeued and submitted for processing to the Work Manager thread pool at any given time. ohi.processing.fillthreshhold determines when the system will look for more tasks to be submitted for processing. These go hand in hand and work in combination with the maximum number of threads that is set for the core work manager. If the fill threshold value is too low, the managed server may not have enough work to keep all its worker threads busy. Increasing the fill depth fetches more tasks or work items in one go but increases the memory footprint.

The Installation Guide mentions default values that provide good starting points to start performance testing.

These and other system properties are documented in more detail in the Installation Guide.

Deployment Architecture

By default (i.e. if the load balancer configuration is set up that way) a managed server will process UI and Web Services requests as well as handle task processing. If that is the case, make sure to properly tune the associated work managers so that a burst of Web Services requests does not bring down the machine or adversely impact task processing.

Additional processing capacity can be made available by adding managed servers, i.e. scaling out horizontally. It is also possible to set up specialized nodes, e.g. a few nodes that handle UI requests, a few that handle task processing and a few that handle Web Services requests. This may make it easier to control and tune the machines for the various types of requests. How task processing can be disabled is documented elsewhere in this guide.

Performance Tests

Hardware, configuration, set up and usage of Oracle Health Insurance Components will differ on a per customer basis. As a result, there is no simple recipe that will lead to the optimal performance and throughput. Performance testing is an important part of the implementation project, to validate and tune environments and settings.

Gathering Performance Figures

Monitor Oracle Database Performance

The Oracle Database features a number of performance tuning and monitoring capabilities and can be monitored via Enterprise Manager Grid Control.

Oracle’s Automatic Workload Repository (AWR) is the source of information for database features like the Automatic Database Diagnostic Monitor (ADDM) and the SQL Tuning Advisor. By default, snapshots that populate the AWR with relevant data are taken every hour and are retained for 8 days. Generate the AWR reports for the period in which the problem occurred and attach them to the Service Request (SR). Note: for RAC-clustered databases, make sure that an AWR report is generated and attached for each active database node.
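To limit an AWR report to the period in which a problem occurs, a manual snapshot can be taken just before and just after that period. The following is a minimal sketch, assuming that SQL*Plus is on the PATH and that the operating system user is allowed to connect as SYSDBA. The function names are hypothetical; DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT and the awrrpt.sql report script are standard parts of the Oracle Database.

```shell
# Print the SQL*Plus statement that takes a manual AWR snapshot.
awr_snapshot_sql() {
  printf '%s\n' 'exec dbms_workload_repository.create_snapshot;'
}

# Take a snapshot now.
# Assumption: an OS-authenticated SYSDBA connection is available on this host.
take_awr_snapshot() {
  awr_snapshot_sql | sqlplus -s "/ as sysdba"
}

# Typical sequence, run on the database host:
#   take_awr_snapshot        # just before the problem period
#   ... reproduce the problem ...
#   take_awr_snapshot        # just after the problem period
# Then, inside SQL*Plus, run @?/rdbms/admin/awrrpt.sql and select the two
# snapshot ids when prompted. On RAC, repeat the report for every instance.
```

Only the functions are defined when the block is sourced; nothing connects to the database until take_awr_snapshot is called.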

Monitor a Managed Server using Oracle Enterprise Manager (OEM)

OEM can be used to gather performance figures for the Oracle Database and for the Oracle Fusion Middleware Application Server. It is also possible to define alarms on certain threshold values, e.g. if the memory consumption exceeds a certain limit then a system administrator could be notified.

Third party tools may also be used to monitor the technology stack. In that case make sure that the monitoring tools themselves do not unnecessarily impact the performance. For example, use of specific (settings of) profiling tools could drastically impact the performance!

Using JVM Monitoring Tools to gather Performance Figures

Garbage Collection Statistics

The JVM provides the runtime environment in which both the WebLogic application server and the Oracle Health Insurance application run. It is good practice to capture the JVM’s garbage collection statistics, for example by including the following in a startup script:

# GC logging
TIMESTAMP=`date "+%Y-%m-%d_%H_%M_%S"`
# LOG_BASE, APP_NAME and SERVER_NAME are used to put the GC logs in separate directories. Make sure they are set
JAVA_OPTIONS="${JAVA_OPTIONS} -Xloggc:${LOG_BASE}/${APP_NAME}/gc/gc_${SERVER_NAME}_${TIMESTAMP}.log"
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps"
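Once GC logs are being written, a quick first check is how often full collections occur. The sketch below counts lines containing "Full GC" per log file. It assumes the log format produced by the options above on a HotSpot JVM, where full collections are reported with that literal text; the function name is hypothetical.

```shell
# Print "<number of full collections> <file>" for each GC log given.
# Assumption: full collections appear in the log as lines containing "Full GC".
full_gc_report() {
  for gc_log in "$@"; do
    printf '%s %s\n' "$(grep -c 'Full GC' "$gc_log")" "$gc_log"
  done
}

# Example:
#   full_gc_report "${LOG_BASE}/${APP_NAME}"/gc/gc_*.log
```

A count that keeps rising while the heap does not shrink is a reason to look at the log in more detail or to attach it to a Service Request.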

Heap Dump

In case of memory issues, Oracle development will ask you to take a heap dump of the managed server by using the jmap tool as follows:

jmap -dump:live,format=b,file=heap-<system_identifier>-<date>.dump [pid]

Note that the live option will trigger a full garbage collection cycle.
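The jmap call can be wrapped in a small helper so that dump files are named consistently. This is a sketch only: the function names are hypothetical, and looking up the process id through the -Dweblogic.Name system property assumes that the managed server was started with that property.

```shell
# Build the dump file name used in the jmap example:
# heap-<system_identifier>-<date>.dump
heap_dump_file() {
  echo "heap-$1-$2.dump"
}

# Take a heap dump of a running JVM.
# $1 = process id of the JVM, $2 = system identifier used in the file name
take_heap_dump() {
  jmap -dump:live,format=b,file="$(heap_dump_file "$2" "$(date +%Y-%m-%d_%H_%M_%S)")" "$1"
}

# Example (assumes the server was started with -Dweblogic.Name=<server name>):
#   pid=$(pgrep -f "weblogic.Name=${SERVER_NAME}")
#   take_heap_dump "$pid" "$SERVER_NAME"
```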

It is good practice to capture a heap dump in case of out of memory errors so that these are readily available for detailed analysis:

# LOG_BASE and APP_NAME are used to write the heap dump in a specific directory. Make sure they are set
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:+HeapDumpOnOutOfMemoryError"
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:HeapDumpPath=${LOG_BASE}/${APP_NAME}/heapdumps"
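The JVM generally does not create missing directories for GC logs or heap dumps, so it can help to create them in the startup script before the server is started. The following is a minimal sketch; the function name is hypothetical and the directory layout follows the LOG_BASE and APP_NAME conventions used in the snippets in this chapter.

```shell
# Create the directories that receive GC logs and heap dumps.
# $1 = LOG_BASE, $2 = APP_NAME
prepare_diag_dirs() {
  for sub in gc heapdumps; do
    mkdir -p "$1/$2/$sub" || return 1
  done
}

# Example, before the JVM is started:
#   prepare_diag_dirs "$LOG_BASE" "$APP_NAME"
```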

Running the Java Flight Recorder

Java Flight Recorder is a profiling and event collection framework built into the Oracle JDK. It allows Java administrators and developers to gather detailed low level information about how the Java Virtual Machine (JVM) and the Java application are behaving. Flight recordings can be interpreted with the Java Mission Control tool and help Oracle development to diagnose performance or memory related issues in Oracle Health Insurance Components applications.

To run the Flight Recorder set the following Java options:

# Java Mission Control: enable Flight Recorder
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:+UnlockCommercialFeatures -XX:+FlightRecorder"

Additional command line options can be specified, e.g. to save the recording to disk and determine the max age of the recording (defaults to 15 minutes):

-XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=${LOG_BASE}/${APP_NAME}/${SERVER_NAME}/flightrecordings,maxage=1h

Additional details on the command line options are available in the product’s documentation.
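When the Flight Recorder is enabled, a recording can also be started on a running JVM with the jcmd tool that ships with the JDK. The sketch below wraps the JFR.start and JFR.check diagnostic commands; the function names and the file naming scheme are hypothetical.

```shell
# Build a recording file name inside a target directory.
# $1 = directory, $2 = server name, $3 = timestamp
jfr_file() {
  echo "$1/flight_$2_$3.jfr"
}

# Start a time-limited recording on a running JVM.
# $1 = process id, $2 = duration such as 60s or 5m, $3 = output file
start_recording() {
  jcmd "$1" JFR.start duration="$2" filename="$3"
}

# List the recordings that are active in the JVM with process id $1.
check_recordings() {
  jcmd "$1" JFR.check
}

# Example:
#   start_recording "$pid" 5m "$(jfr_file /tmp "$SERVER_NAME" "$(date +%Y%m%d_%H%M%S)")"
```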

Logging Performance Figures

ADF UI Request Handling Performance Logging

The following logger configuration enables the logging of timing information for ADF User Interface requests:

<logger name="com.oracle.healthinsurance.common.service.ui.userassistance.components" level="debug" />

For each ADF UI request, the elapsed time needed to process the request is logged like this:

DEBUG com.oracle.healthinsurance...SlaLogger - [UI] Request: userId=42;contextPath=/base;
requestURL=http://localhost:7001/base/faces/SearchClaims;page=View Claim CL0025_reprocess_2_15;time(ms)=852

Note: the entire message is logged on one line but was formatted differently in this guide for readability.

REST Service Request Handling Performance Logging

The following logger configuration enables logging of timing information for REST Service requests:

<logger name="com.oracle.healthinsurance.http.jersey.filter.HttpInitFilter" level="debug" />

For each REST Service request, the elapsed time needed to process the request is logged as in the following example:

DEBUG com.oracle.healthinsurance...HttpInitFilter - [WS] Request: contextPath=/api;
requestURL=http://localhost:7001/api/someRestService; time(ms)=58

Note: the entire message is logged on one line but was formatted differently in this guide for readability.

SOAP Service Request Handling Performance Logging

The following logger configuration enables the logging of timing information for SOAP Service requests:

<logger name="com.oracle.healthinsurance.common.interaction.ws.servletfilter" level="debug" />

For each SOAP Web Service request, the elapsed time needed to process the request is logged as in the following example:

DEBUG com.oracle.healthinsurance...WebserviceContextInitFilter - [WS] Request: contextPath=/ohi-web-services;
requestURL=http://localhost:7001/ohi-web-services/someSOAPService; time(ms)=58

Note: the entire message is logged on one line but was formatted differently in this guide for readability.
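All three log formats end with a time(ms) field, so slow requests can be filtered out of an application log with standard tools. The following sketch assumes that each message is on a single line, as noted above; the function name and the threshold are hypothetical.

```shell
# Print "<elapsed ms> <request URL>" for every request slower than a threshold.
# $1 = threshold in milliseconds; log lines are read from standard input.
slow_requests() {
  awk -v limit="$1" '
    match($0, /time\(ms\)=[0-9]+/) {
      # Skip the 9 characters of "time(ms)=" to get the number.
      ms = substr($0, RSTART + 9, RLENGTH - 9) + 0
      if (ms > limit) {
        url = "-"
        # Skip the 11 characters of "requestURL=" to get the URL.
        if (match($0, /requestURL=[^; ]+/)) url = substr($0, RSTART + 11, RLENGTH - 11)
        print ms, url
      }
    }'
}

# Example: list requests that took longer than half a second.
#   slow_requests 500 < application.log
```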

Task Processing Performance

Task processing performance (over time) is a good measure or indicator for the performance of the system. The following queries, or variants of these, may provide a good starting point.

Average Processing Time in milliseconds per Task Type

The following query lists the average processing time in milliseconds per task type for tasks that were created on the current day. The day, hour, minute and second components of the elapsed interval are first added up in seconds and the total is then converted to milliseconds:

select taty.name tasktype
,      round(avg((extract(day from(task.process_stop_datetime-task.process_start_datetime))*24*60*60 +
                  extract(hour from(task.process_stop_datetime-task.process_start_datetime))*60*60 +
                  extract(minute from(task.process_stop_datetime-task.process_start_datetime))*60 +
                  extract(second from(task.process_stop_datetime-task.process_start_datetime)))*1000), 0) avg_elapsed_time_ms
from   ohi_tasks task
,      ohi_task_types_tl taty
where  task.taty_id = taty.base_table_id
and    taty.language = 'en__OHI'
and    task.creation_date > trunc(sysdate)
group  by taty.name
order  by 2 desc nulls last;

Follow the guidelines in this section when logging performance related Service Requests through Oracle Support, in order to expedite problem resolution. If the problem is not clearly explained, or the details needed to analyze it are not provided, Support or Development Engineers will ask for these first.

  • Always specify the exact name and version of the Oracle Health Insurance Components application.

  • Provide a clear problem description, paying attention to aspects like:

    • Is the problem specific to the UI, Web Services or Processing?

    • Were there any special circumstances, like a peak in Web Services traffic or in the number of objects that needed to be processed or were significant changes made to the configuration of the system?

    • List any recent changes to the environment, for example:

      • was there a substantial increase in the number of members / claims / policies / authorizations that the system must process?

      • does the system process more / new integration requests?

      • was the application recently upgraded?

    • Was it necessary to restart components?

    • Was the system responding slowly or were out of memory exceptions encountered?

  • Specify details of the environment in which the problem occurs:

    • environment name;

    • purpose of the environment, e.g.: production, development, integration test, performance test;

    • database setup, e.g.: RAC or standalone

    • middleware domain setup:

      • number of nodes;

      • details about the purpose of these nodes (e.g. nodes handling UI, Web Service and Task Processing or do you use specialized nodes)

  • Attach the following files to the Service Request:

File type | Description
--- | ---
AWR reports | If the Oracle database runs on multiple RAC nodes, attach AWR reports for all database nodes. Be as specific as possible, e.g. if the problem occurred in a specific time frame, limit the AWR report to that time period only. If possible, e.g. if the problem can be reproduced: create a snapshot, reproduce the problem, create another snapshot and subsequently create an AWR report for the period between these snapshots.
Application log files | From the time period in which the problem occurred. Please attach entire files, not just snippets. If possible, add timing information (see instructions elsewhere in this chapter to obtain this for UI or Web Services requests).
Server log files | From the time period in which the problem occurred. Again, please attach entire files, not just snippets.
OHI properties file | For the environment.
Garbage collection statistics | See instructions elsewhere in this chapter to automatically gather garbage collection data.
Heap dump | In case of out of memory issues. See instructions elsewhere in this chapter for automatically capturing a heap dump in case of out of memory exceptions.
Java Flight Recording files | See instructions elsewhere in this chapter for automatically creating these JFR files.

User Interface Performance Checklist

Please provide answers to the following questions when encountering a UI specific performance issue:

  1. What exact action was performed, for example: searched for a claim, pressed the view policies link, opened a dialog?

  2. At what time of day did the problem occur? What other activities was the system processing around that time?

  3. Can the problem be reproduced at will (at least 3 times) on the same machine?

  4. Can the problem be reproduced at will in other environments?

In case the issue is related to an ADF User Interface page:

  1. How many browser tabs were open?

  2. Does the problem reproduce when all other tabs are closed?

  3. What’s running on the local work station, e.g. virus scanners, backup programs? Alternatively: what’s the CPU usage on the machine?

  4. Does the problem reproduce after logging out from the system and starting from scratch?

  5. Can the problem be reproduced at will on other work stations?

  6. Does the problem reproduce if all other UI users log out and only one user logs in to reproduce?