Gathering Metrics

Oracle Health Insurance applications can gather the following metrics:

  • JVM memory consumption and processor metrics.

  • Application-specific metrics like timers and counters for Dynamic Logic execution or web service request handling.

The metric data is exposed in the Prometheus format. Prometheus can scrape the metric data that Oracle Health Insurance applications expose through the /prometheus endpoint.

Metric Data

The following table lists the metrics and the Oracle Health Insurance applications to which they apply:

Table 1. Metric Data
| Metric Name | Type | Description | Tags | Oracle Health Insurance Applications |
|---|---|---|---|---|
| ohi.dylo.timer | Timer | Keeps track of the time of Dynamic Logic executions. | code: Dynamic Logic code; success: true if the execution was successful, false otherwise | All |
| ohi.resource.client.timer | Timer | Keeps track of the time that requests to external REST resources take. | resource: resource name; method: HTTP method, for example GET or PUT; status: HTTP return code, for example 200 or 404 | All |
| ohi.resource.timer | Timer | Keeps track of the time to handle requests for the Oracle Health Insurance application’s HTTP API resources. | resource: resource name; method: HTTP method, for example GET or PUT; status: HTTP return code, for example 200 or 404 | All |
| ohi.task.dequeue | Counter | Counts the number of tasks that are dequeued from the task queue. | Not applicable | All |
| ohi.task.enqueue | Counter | Counts the number of tasks that are enqueued to the task queue. | Not applicable | All |
| ohi.task.timer | Timer | Times task execution. | code: task type code; status: task status, for example COMPLETED or ERRORED | All |
| ohi.exchange.timer | Timer | Keeps track of the time of exchange execution. | integration: integration code; status: exchange status, for example C (for completed) or F (for failed) | Oracle Insurance Gateway |
| jvm.gc.max.data.size.bytes | Gauge | Maximum size of long-lived heap memory pool. | None | All |
| jvm.gc.live.data.size.bytes | Gauge | Size of long-lived heap memory pool after reclamation. | None | All |
| jvm.gc.memory.allocated.bytes | Counter | Incremented for an increase in the size of the (young) heap memory pool after one garbage collection to before the next. | None | All |
| jvm.gc.memory.promoted.bytes | Counter | Count of positive increases in the size of the old generation memory pool before garbage collection to after garbage collection. | None | All |
| jvm.memory.usage.after.gc.percent | Gauge | The percentage of long-lived heap pool used after the last garbage collection event, in the range [0..1]. | area: the memory area; pool: pool name | All |
| jvm.gc.overhead.percent | Gauge | An approximation of the percent of CPU time used by garbage collection activities over the last look back period or since monitoring began, whichever is shorter, in the range [0..1]. | None | All |
| jvm.gc.concurrent.phase.time.seconds | Timer | Time spent in concurrent phase. | action: action of the garbage collector; cause: what caused the garbage collection | All |
| jvm.gc.pause.seconds | Timer | Time spent in garbage collection pause. | action: action of the garbage collector; cause: what caused the garbage collection | All |
| ohi.extract.extracts.started.count | Counter | Number of extracts started. | resourceName: the resource name to extract | All |
| ohi.extract.entities.processed.count | Counter | Number of processed (root) entities. | entity: the name of the (root) entity | All |
| ohi.activityprocessing.activities.started.count | Counter | Number of activities started. | activityType: the code of the activity type; level: the level of the activity, for example GL for Global | All |
| ohi.activityprocessing.activities.completed.timer | Timer | Duration of a completed activity. | activityType: the code of the activity type; level: the level of the activity, for example GL for Global; status: the completion status, for example CO for Completed | All |
| ohi.activityprocessing.activities.completed.noduration.count | Timer | Number of completed activities with missing start or stop time. | activityType: the code of the activity type; level: the level of the activity, for example GL for Global; status: the completion status, for example CO for Completed | All |
| ohi.persistence.query.count | Counter | Number of queries run. | entity: the name of the entity; queryName: the name of the query | All |
| ohi.persistence.query.rowcount | Counter | Number of rows retrieved. | entity: the name of the entity; queryName: the name of the query | All |
| spring.batch.job | Timer | Duration of job run. | name: the name of the job; status: the status of the job, for example SUCCESS or FAILURE | All |
| spring.batch.job.active | Long Task Timer | Currently active jobs. | name: the name of the job | All |
| spring.batch.step | Timer | Duration of step run. | name: the name of the step; job.name: the name of the job; status: the status of the job, for example SUCCESS or FAILURE | All |
| spring.batch.step.active | Long Task Timer | Currently active step. | name: the name of the active step | All |
| spring.batch.item.read | Timer | Duration of item reading. | job.name: the name of the job; step.name: the name of the step; status: the status of the job, for example SUCCESS or FAILURE | All |
| spring.batch.item.process | Timer | Duration of item processing. | job.name: the name of the job; step.name: the name of the step; status: the status of the job, for example SUCCESS or FAILURE | All |
| spring.batch.chunk.write | Timer | Duration of chunk writing. | job.name: the name of the job; step.name: the name of the step; status: the status of the job, for example SUCCESS or FAILURE | All |

Enable Metric Data Recording

By default, metrics gathering is disabled. In that case, the /prometheus endpoint returns an HTTP 200 response with no content.

To enable metrics gathering, set the system property ohi.instrumentation.gather.applicationmetrics to true.

Then, to enable the gathering of specific groups of metrics, also set the following individual system properties:

  • ohi.instrumentation.gather.jvm: Set to true to enable recording of JVM metrics only. Requires restarting the application to take effect.

  • ohi.instrumentation.gather.applicationmetrics: Set to true to enable application metrics and Spring Batch metrics. Effective immediately. There is no requirement for a restart.

  • ohi.instrumentation.gather.system: Set to true to enable system metrics. Effective immediately. There is no requirement for a restart.

  • ohi.instrumentation.gather.gc: Set to true to enable the following garbage collection metrics. Effective immediately. There is no requirement for a restart.

    • JvmGCMetrics

    • JvmHeapPressureMetrics

  • ohi.extract.extracts.started.count: Set to true to view the number of extracts started. Effective immediately. There is no requirement for a restart.

  • ohi.extract.entities.processed.count: Set to true to view the number of processed (root) entities. Effective immediately. There is no requirement for a restart.

  • ohi.instrumentation.gather.activityprocessing: Set to true to enable the following activity processing metrics. Effective immediately. There is no requirement for a restart.

    • ohi.activityprocessing.activities.started

    • ohi.activityprocessing.activities.completed.timer

  • ohi.instrumentation.gather.persistence: Set to true to enable the following persistence metrics. Effective immediately. There is no requirement for a restart.

    • ohi.persistence.query.count

    • ohi.persistence.query.rowcount

If application metrics is set to false but a group metrics property is set to true, the result is a Pause operation: new values for the group metrics are no longer captured, but the existing group metrics values remain visible and are not removed.
If a group metrics property is set to false but application metrics is set to true, the result is a Delete operation: the values for that group's metrics are deleted.
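
The Pause and Delete behavior described above can be summarized as a small decision function. This is only an illustrative sketch: the function name and the returned labels are hypothetical, the "recording" outcome for two true values is an assumption, and the case where both properties are false is not described in this section.

```python
def group_metric_state(application_metrics: bool, group_metrics: bool) -> str:
    """Return the effect on one metric group for a pair of property values."""
    if application_metrics and group_metrics:
        # Assumption: with both switches on, the group is recorded normally.
        return "recording"
    if not application_metrics and group_metrics:
        # Pause: no new values are captured, existing values stay visible.
        return "paused"
    if application_metrics and not group_metrics:
        # Delete: the values for the group are removed.
        return "deleted"
    # Both properties false: this combination is not described in the text.
    return "unspecified"


for app, group in [(True, True), (False, True), (True, False), (False, False)]:
    state = group_metric_state(app, group)
    print(f"applicationmetrics={app!s:5} group={group!s:5} -> {state}")
```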

Record Non-Oracle Health Insurance Metrics

By default, Oracle Health Insurance applications record only metrics whose name starts with the prefix ohi. (including the trailing period). Non-Oracle Health Insurance components may publish metrics that follow different naming conventions; by default, these are not recorded. To enable recording of non-Oracle Health Insurance metrics as well, set the value for system property ohi.instrumentation.filter.ohi.nameprefix to false.

Add Application Tag for All Oracle Health Insurance Metrics

An application-specific identifier for metrics is important when similar metrics from various Oracle Health Insurance applications are collected in one Prometheus instance. It allows filtering metric data on a per-application basis. There are multiple ways to add a source application tag or label to any metric:

  • Configure Prometheus to add such a common tag or label to all metrics it collects.

  • Alternatively, configure an Oracle Health Insurance application to add an application identifier as a tag to all metrics by setting system property ohi.instrumentation.common.application.tag to true.

Configuring Timers

Timers are the most memory-consuming type of meter, and their total footprint can vary significantly depending on the selected options. This section lists configuration options for timers.

Histograms and Percentiles

Assume that a percentile histogram for a timer requires approximately 8 kB of memory. That footprint applies to every unique combination of meter name and tags. By default, the timers that the Oracle Health Insurance application configures do not collect percentile distributions or histogram data.
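
To get a feel for how this footprint adds up, the following sketch multiplies the approximate per-histogram figure by the number of tag combinations. The tag cardinalities are hypothetical, and the result is an upper bound because not every combination of tag values necessarily occurs.

```python
HISTOGRAM_FOOTPRINT_KB = 8  # approximate figure stated in this section


def histogram_footprint_kb(tag_value_counts: dict, timers: int = 1) -> int:
    """Upper-bound estimate: one histogram per timer and tag combination."""
    combinations = timers
    for count in tag_value_counts.values():
        combinations *= count
    return combinations * HISTOGRAM_FOOTPRINT_KB


# Hypothetical cardinalities for the tags of ohi.resource.timer.
example = {"resource": 50, "method": 4, "status": 5}
print(histogram_footprint_kb(example), "kB")  # 50 * 4 * 5 * 8 = 8000 kB
```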

The two approaches for recording percentiles are:

  • Percentile Histograms: By enabling this option, the application accumulates values in an underlying histogram and publishes a predetermined set of buckets to Prometheus. Calculate percentiles from this histogram using the Prometheus query language. To enable recording of percentile histograms, set system property ohi.instrumentation.<timer>.histogram, where the placeholder <timer> is the name of the timer, to true.

  • Percentile Distributions: By enabling this option, the application computes a percentile approximation for each meter ID, based on the combination of name and tags, and publishes the percentile values to Prometheus. Note that this is not as flexible as using a percentile histogram because it is not possible to aggregate percentile approximations across tags. To enable recording of percentile distributions, specify the percentiles as the value for system property ohi.instrumentation.<timer>.percentiles, where the placeholder <timer> is the name of the timer. Specify percentiles as a comma-separated string. For example, to record the median, 0.75, and 0.95 percentiles for ohi.resource.timer:

ohi.instrumentation.ohi.resource.timer.percentiles=0.5,0.75,0.95
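
The naming pattern for these two properties can be sketched as follows. The helper name timer_properties is hypothetical and only assembles property names and values as described above; it does not talk to the application.

```python
def timer_properties(timer: str, histogram: bool = False, percentiles=()) -> dict:
    """Build the histogram and percentiles property entries for one timer."""
    for p in percentiles:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"percentile out of range [0, 1]: {p}")
    prefix = f"ohi.instrumentation.{timer}"
    props = {}
    if histogram:
        props[f"{prefix}.histogram"] = "true"
    if percentiles:
        # Percentiles are written as one comma-separated string.
        props[f"{prefix}.percentiles"] = ",".join(str(p) for p in percentiles)
    return props


props = timer_properties("ohi.resource.timer", percentiles=(0.5, 0.75, 0.95))
for key, value in props.items():
    print(f"{key}={value}")
```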

For example, verbatim /prometheus endpoint output for the ohi.resource.timer (for specific resource /currentproperties) without percentile distributions or histogram data:

# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.273524648
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds summary
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.399261975
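
Even without percentile data, the count and sum samples are enough to derive a mean duration. The sketch below parses the output shown above with a deliberately simplified regular expression (it does not handle escaped quotes in label values, for instance) and divides sum by count; the function name parse_samples is hypothetical.

```python
import re

SAMPLE = '''\
# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.273524648
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds summary
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.399261975
'''

LINE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_samples(SAMPLE)
values = {name: value for name, _labels, value in samples}
mean = values["ohi_resource_timer_seconds_sum"] / values["ohi_resource_timer_seconds_count"]
print(f"mean request time: {mean:.6f} s")
```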

Verbatim /prometheus endpoint output for the same ohi.resource.timer with enabled histogram data:

# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.256794629
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds histogram
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001048576",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001398101",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001747626",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002097151",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002446676",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002796201",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003145726",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003495251",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003844776",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.004194304",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.005592405",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.006990506",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.008388607",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.009786708",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.011184809",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.01258291",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.013981011",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.015379112",} 1.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.016777216",} 2.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.022369621",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.027962026",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.033554431",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.039146836",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.044739241",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.050331646",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.055924051",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.061516456",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.067108864",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.089478485",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.111848106",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.134217727",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.156587348",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.178956969",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.20132659",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.223696211",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.246065832",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.268435456",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.357913941",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.447392426",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.536870911",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.626349396",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.715827881",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.805306366",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.894784851",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.984263336",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.073741824",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.431655765",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.789569706",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.147483647",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.505397588",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.863311529",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.22122547",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.579139411",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.937053352",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="4.294967296",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="5.726623061",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="7.158278826",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="8.589934591",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="10.021590356",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="11.453246121",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="12.884901886",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="14.316557651",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="15.748213416",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="17.179869184",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="22.906492245",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="28.633115306",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="30.0",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="+Inf",} 5.0
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.391383318
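
To illustrate how a percentile can be derived from such cumulative buckets, the following sketch applies linear interpolation within the bucket that contains the requested rank, which is the general approach behind quantile estimation from Prometheus histograms. It is a simplified illustration, not the Prometheus implementation. The bucket list is a subset of the output above that keeps every bucket where the count changes, together with its predecessor.

```python
import math

# Subset of the cumulative buckets shown above: (upper bound "le", count).
BUCKETS = [
    (0.013981011, 0.0),
    (0.015379112, 1.0),
    (0.016777216, 2.0),
    (0.022369621, 3.0),
    (0.067108864, 3.0),
    (0.089478485, 4.0),
    (0.246065832, 4.0),
    (0.268435456, 5.0),
    (math.inf, 5.0),
]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in the range [0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, cumulative = buckets[index]
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    in_bucket = cumulative - below
    if in_bucket == 0:
        return upper
    # Interpolate linearly between the bucket bounds.
    return lower + (upper - lower) * (rank - below) / in_bucket


print(f"estimated median: {estimate_quantile(0.5, BUCKETS):.6f} s")
```

Because the estimate interpolates within a bucket, its precision depends on how narrow the buckets are around the requested percentile.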

And finally, verbatim /prometheus endpoint output for the same ohi.resource.timer with enabled percentile distributions:

# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.280607455
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds summary
ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.5",} 0.027262976
ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.75",} 0.087031808
ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.95",} 0.284164096
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.438916162

Note that the values for the max gauge and the percentiles are bound to a time window of two minutes. For the max gauge, for example, this means that its value is the maximum value recorded during the time window. If no new values are recorded within a time window, the max gauge resets to 0.0 when a new time window starts.

Determine Timers for Which Metrics Are Recorded and Published

For more fine-grained control, combine the following properties to configure for which timers the application collects and publishes metric data:

  • ohi.instrumentation.<timer>.regex.tagname, where placeholder <timer> is the name of the timer.

  • ohi.instrumentation.<timer>.regex, where placeholder <timer> is the name of the timer.

For a particular timer, the application checks whether the value of the tag named by the regex.tagname property matches the regular expression. If it does, metrics data for the timer is recorded and published.

For example, to collect metrics data for timer ohi.resource.timer for specific HTTP API resources (also known as IP resources) configure the following properties:

ohi.instrumentation.ohi.resource.timer.regex.tagname=resource
ohi.instrumentation.ohi.resource.timer.regex=^(?!\\/generic\\/).+

Resource paths starting with /generic/ identify generic HTTP API resources (note that generic is the resource that summarizes the generic resources). The regular expression will not match these. As a result, the application records and publishes metrics data for IP resources only.

Alternatively, to collect metrics data for timer ohi.resource.timer for generic resources only, thus ignoring timer data for IP resources, configure the following properties:

ohi.instrumentation.ohi.resource.timer.regex.tagname=resource
ohi.instrumentation.ohi.resource.timer.regex=^\\/generic\\/.+
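
The effect of the two regular expressions can be checked with a short sketch. It assumes that the doubled backslashes in the property values are escape sequences, so that the effective patterns contain single backslashes, and it uses Python's re module rather than the application's own regular expression engine; the lookahead and anchor syntax used here behaves the same in both. The resource path /generic/persons is a hypothetical example.

```python
import re

# Effective patterns after unescaping the doubled backslashes.
IP_ONLY = re.compile(r"^(?!\/generic\/).+")
GENERIC_ONLY = re.compile(r"^\/generic\/.+")


def is_published(pattern, tags, tag_name="resource"):
    """Return True if the value of the configured tag matches the pattern."""
    value = tags.get(tag_name, "")
    return pattern.match(value) is not None


for resource in ["/currentproperties", "/persons", "/generic/persons"]:
    tags = {"resource": resource, "method": "GET", "status": "200"}
    print(resource,
          "ip-only:", is_published(IP_ONLY, tags),
          "generic-only:", is_published(GENERIC_ONLY, tags))
```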

Configure ohi.resource.client.timer Resource Tag Construction

The resource tag for an ohi.resource.client.timer must point to the name of the resource and not be more specific than that. For example, for a /persons resource with the following URL:

/persons/1234

the value for the resource tag must be /persons and not the specific person resource /persons/1234.

Using overly specific values like /persons/1234 causes an explosion in the number of ohi.resource.client.timer metrics that are recorded and published. For that reason, Oracle Health Insurance applications stop after the first path segment when determining the resource tag for an ohi.resource.client.timer.

That first path segment may, for example, be a mandatory (load balancer) context root that is not specific enough to identify the actual resource name. In that case, configure the system to continue traversing the resource path until it encounters a path segment that is not in the list of path segment prefixes to ignore.

Use the system property ohi.instrumentation.resourceclienttimer.segment.prefixes to specify a comma-separated list of known segment prefixes. For example, for an external /persons resource with the following URL:

/loadbalancer-url/api/persons/1234

and an external /providers resource with the following URL:

/provider-system-api/providers/ABCD

Configure the property:

ohi.instrumentation.resourceclienttimer.segment.prefixes=loadbalancer-url,api,provider-system-api

for Oracle Health Insurance applications to determine resource tag values /loadbalancer-url/api/persons and /provider-system-api/providers respectively.
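
The traversal described in this section can be sketched as follows. This is an illustration of the documented behavior, not the application's code: the function name resource_tag is hypothetical, and it assumes that a path segment must equal a configured prefix exactly in order to be skipped.

```python
def resource_tag(path: str, ignored_prefixes=frozenset()) -> str:
    """Keep path segments up to and including the first one not in the list."""
    kept = []
    for segment in (s for s in path.split("/") if s):
        kept.append(segment)
        if segment not in ignored_prefixes:
            # First segment that is not a known prefix: stop here.
            break
    return "/" + "/".join(kept)


# Value of ohi.instrumentation.resourceclienttimer.segment.prefixes.
prefixes = frozenset("loadbalancer-url,api,provider-system-api".split(","))

print(resource_tag("/persons/1234"))
print(resource_tag("/loadbalancer-url/api/persons/1234", prefixes))
print(resource_tag("/provider-system-api/providers/ABCD", prefixes))
```

Without configured prefixes the function stops after the first segment, which matches the default behavior described above.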