Gathering Metrics

OHI applications can gather the following metrics:

  • JVM memory consumption and processor metrics

  • Application-specific metrics like timers and counters for Dynamic Logic execution or Web Service request handling.

Metric data is exposed in the Prometheus text format. Prometheus can scrape the data from the "/prometheus" endpoint that each OHI application exposes.

Metric Data

The following table lists the metrics per OHI application:

| Metric Name | Type | Description | Tags | OHI Applications |
|---|---|---|---|---|
| ohi.dylo.timer | timer | Times Dynamic Logic executions | code: dynamic logic code; success: true if the execution was successful, false otherwise | All |
| ohi.resource.client.timer | timer | Times requests to external REST resources | resource: resource name; method: HTTP method, e.g. GET or PUT; status: HTTP return code, e.g. 200 or 404 | All |
| ohi.resource.timer | timer | Times handling of requests for the OHI application’s HTTP API resources | resource: resource name; method: HTTP method, e.g. GET or PUT; status: HTTP return code, e.g. 200 or 404 | All |
| ohi.task.dequeue | counter | Counts the number of tasks that were dequeued from the task queue | Not applicable | All |
| ohi.task.enqueue | counter | Counts the number of tasks that were enqueued to the task queue | Not applicable | All |
| ohi.task.timer | timer | Times task execution | code: task type code; status: task status, e.g. COMPLETED or ERRORED | All |
| ohi.exchange.timer | timer | Times exchange execution | integration: integration code; status: exchange status, e.g. C (for completed) or F (for failed) | Oracle Insurance Gateway |

Enable Metric Data Recording

By default, metrics gathering is disabled. In that case, the "/prometheus" endpoint returns an HTTP 200 response without content.
To enable metrics gathering, set the following system properties:

  • ohi.instrumentation.gather.jvmtelemetry: set to true to enable recording of JVM telemetry. The application must be restarted for this setting to take effect.

  • ohi.instrumentation.gather.applicationmetrics: set to true to enable recording of application metrics. This setting takes effect immediately; no restart is required.

Record non-OHI Metrics

By default, OHI applications record only metrics whose name starts with the prefix "ohi.". Metrics published by non-OHI components, which follow different naming conventions, are not recorded. To record non-OHI metrics as well, set system property ohi.instrumentation.filter.ohi.nameprefix to false.
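
The name-prefix rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the OHI implementation: the function should_record, its parameter, and the example metric name "thirdparty.cache.hits" are hypothetical.

```python
OHI_PREFIX = "ohi."


def should_record(metric_name: str, filter_on_ohi_prefix: bool = True) -> bool:
    """Return True if a metric with this name would be recorded.

    filter_on_ohi_prefix stands in for the value of system property
    ohi.instrumentation.filter.ohi.nameprefix (assumed mapping).
    """
    if not filter_on_ohi_prefix:
        # Filter switched off: non-OHI metrics are recorded as well.
        return True
    return metric_name.startswith(OHI_PREFIX)


if __name__ == "__main__":
    for name in ("ohi.task.timer", "thirdparty.cache.hits"):
        print(name, should_record(name), should_record(name, False))
```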

Add Application Tag for all OHI Metrics

An application-specific identifier for metrics is important when similar metrics from various (OHI) applications are collected into one Prometheus instance, because it allows filtering metric data per application. There are multiple ways to add a source application tag or label to every metric:

  • Configure Prometheus to add such a common tag or label to all metrics it collects.

  • Alternatively, configure an OHI application to add an application identifier as a tag to all metrics by setting system property ohi.instrumentation.common.application.tag to true.

Configuring Timers

Timers are the most memory-consuming type of meter, and their total footprint can vary significantly depending on the selected options. This section lists configuration options for timers.

Histograms and Percentiles

As a rule of thumb, assume that a percentile histogram for a timer requires about 8 kB of memory. Note that this footprint applies to every unique combination of meter name and tags.
By default, the timers configured in OHI applications do not collect percentile distributions or histogram data.
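
To make the rule of thumb concrete, the following Python sketch multiplies the number of tag combinations by the approximate footprint per histogram. The tag cardinalities in the usage example are made-up numbers, and the result is an upper bound because not every combination of tag values necessarily occurs.

```python
BYTES_PER_HISTOGRAM = 8_000  # rule of thumb from the text: about 8 kB per meter ID


def estimate_histogram_memory(tag_value_counts: dict) -> int:
    """Estimate the worst-case memory (in bytes) for one timer's histograms.

    tag_value_counts maps each tag name to the number of distinct values
    expected for that tag (hypothetical input, supplied by the operator).
    """
    combinations = 1
    for count in tag_value_counts.values():
        combinations *= count
    return combinations * BYTES_PER_HISTOGRAM


if __name__ == "__main__":
    # Hypothetical cardinalities for ohi.resource.timer.
    estimate = estimate_histogram_memory({"resource": 50, "method": 4, "status": 5})
    print(f"{estimate / 1_000_000:.1f} MB")
```

With 50 resources, 4 methods and 5 status codes there are at most 1,000 combinations, which puts the estimate at roughly 8 MB for this single timer.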

The two approaches for recording percentiles are:

  • Percentile histograms: by enabling this option, the application accumulates values in an underlying histogram and publishes a predetermined set of buckets to Prometheus. Percentiles can then be calculated from this histogram with the Prometheus query language, for example with its histogram_quantile function.
    To enable recording of percentile histograms, set system property ohi.instrumentation.<timer>.histogram to true, where the placeholder <timer> is the name of the timer.

  • Percentile distributions: by enabling this option, the application computes a percentile approximation for each meter ID, that is, for each combination of name and tags, and publishes the percentile values to Prometheus. Note that this is less flexible than a percentile histogram because percentile approximations cannot be aggregated across tags.
    To enable recording of percentile distributions, specify the percentiles as the value for system property ohi.instrumentation.<timer>.percentiles, where the placeholder <timer> is the name of the timer. Percentiles must be specified as a comma-separated string of fractions. For example, set the median, 0.75 and 0.95 percentiles for the "ohi.resource.timer" as follows:

    ohi.instrumentation.ohi.resource.timer.percentiles=0.5,0.75,0.95
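
Because the timer name itself starts with "ohi.", the resulting property names contain "ohi" twice, which is easy to get wrong. The following Python sketch shows how the <timer> placeholder is substituted. The helper timer_property is hypothetical, and the check that percentiles lie between 0 and 1 is an assumption based on the example values.

```python
def timer_property(timer: str, option: str, value) -> str:
    """Build one 'name=value' line for a timer-specific instrumentation property."""
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    elif isinstance(value, (list, tuple)):
        for percentile in value:
            # Assumption: percentiles are fractions, e.g. 0.95 rather than 95.
            if not 0.0 <= percentile <= 1.0:
                raise ValueError(f"percentile out of range: {percentile}")
        rendered = ",".join(str(percentile) for percentile in value)
    else:
        rendered = str(value)
    return f"ohi.instrumentation.{timer}.{option}={rendered}"


if __name__ == "__main__":
    print(timer_property("ohi.resource.timer", "histogram", True))
    print(timer_property("ohi.resource.timer", "percentiles", [0.5, 0.75, 0.95]))
```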

Example: verbatim "/prometheus" endpoint output for the "ohi.resource.timer" (for specific resource "/currentproperties") without percentile distributions or histogram data:

# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.273524648
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds summary
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.399261975
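
Even without percentile or histogram data, the "_sum" and "_count" samples allow the mean duration to be derived. The Python sketch below parses the output shown above and divides sum by count per label set. It is a simplified parser written for this example only (it ignores optional timestamps and other details of the exposition format); the names parse_samples and mean_seconds are hypothetical.

```python
import re

SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

EXAMPLE = '''\
# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.273524648
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds summary
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.399261975
'''


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def mean_seconds(samples, base_name):
    """Return {label set: sum / count} for the timer with the given base name."""
    sums, counts = {}, {}
    for name, labels, value in samples:
        key = tuple(sorted(labels.items()))
        if name == base_name + "_sum":
            sums[key] = value
        elif name == base_name + "_count":
            counts[key] = value
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}


if __name__ == "__main__":
    print(mean_seconds(parse_samples(EXAMPLE), "ohi_resource_timer_seconds"))
```

For the five requests in the example, the mean is 0.399261975 / 5, or about 0.08 seconds.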

Verbatim "/prometheus" endpoint output for the same "ohi.resource.timer" with histogram data enabled:

# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.256794629
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds histogram
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001048576",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001398101",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001747626",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002097151",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002446676",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002796201",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003145726",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003495251",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003844776",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.004194304",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.005592405",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.006990506",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.008388607",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.009786708",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.011184809",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.01258291",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.013981011",} 0.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.015379112",} 1.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.016777216",} 2.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.022369621",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.027962026",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.033554431",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.039146836",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.044739241",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.050331646",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.055924051",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.061516456",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.067108864",} 3.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.089478485",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.111848106",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.134217727",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.156587348",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.178956969",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.20132659",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.223696211",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.246065832",} 4.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.268435456",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.357913941",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.447392426",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.536870911",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.626349396",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.715827881",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.805306366",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.894784851",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.984263336",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.073741824",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.431655765",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.789569706",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.147483647",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.505397588",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.863311529",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.22122547",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.579139411",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.937053352",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="4.294967296",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="5.726623061",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="7.158278826",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="8.589934591",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="10.021590356",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="11.453246121",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="12.884901886",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="14.316557651",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="15.748213416",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="17.179869184",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="22.906492245",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="28.633115306",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="30.0",} 5.0
ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="+Inf",} 5.0
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.391383318
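
Prometheus estimates percentiles from such cumulative buckets with its histogram_quantile function. The Python sketch below mimics the linear interpolation that function is documented to apply, to illustrate how a median could be derived from the output above. It is an approximation for illustration and not Prometheus code. The bucket list is an abridged copy of the output that keeps only the buckets adjacent to a change in count, which leaves the interpolation result unchanged. In practice, the query would typically be applied to a rate over the bucket counters rather than to the raw counts.

```python
import math

# (upper bound "le", cumulative count), abridged from the output above.
BUCKETS = [
    (0.013981011, 0.0),
    (0.015379112, 1.0),
    (0.016777216, 2.0),
    (0.022369621, 3.0),
    (0.067108864, 3.0),
    (0.089478485, 4.0),
    (0.246065832, 4.0),
    (0.268435456, 5.0),
    (math.inf, 5.0),
]


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from sorted cumulative buckets."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the requested rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[index]
    lower, previous = (0.0, 0.0) if index == 0 else buckets[index - 1]
    # Interpolate linearly within the selected bucket.
    return lower + (upper - lower) * ((rank - previous) / (count - previous))


if __name__ == "__main__":
    print(histogram_quantile(0.5, BUCKETS))
    print(histogram_quantile(0.95, BUCKETS))
```

The accuracy of such an estimate is limited by the bucket width: the median here is placed halfway between the bounds 0.016777216 and 0.022369621 because exactly one observation fell into that bucket.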

And finally, verbatim "/prometheus" endpoint output for the same "ohi.resource.timer" with percentile distributions enabled:

# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.280607455
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds summary
ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.5",} 0.027262976
ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.75",} 0.087031808
ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.95",} 0.284164096
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.438916162

Note that the values for the 'max' gauge and the percentiles are bound to a time window of 2 minutes. For the 'max' gauge, for example, this means that its value is the maximum recorded during the time window. If no new values are recorded for the length of the time window, the 'max' gauge is reset to 0.0 when a new time window starts.

Determine Timers for which Metrics are Recorded and Published

For more fine-grained control, combine the following properties to configure for which timers the application collects and publishes metric data:

  • ohi.instrumentation.<timer>.regex.tagname, where placeholder <timer> is the name of the timer.

  • ohi.instrumentation.<timer>.regex, where placeholder <timer> is the name of the timer.

For the given timer, the application checks whether the value of the tag named by the regex.tagname property matches the regular expression specified by the regex property. If it does, metrics data for the timer is recorded and published.

For example, to collect metrics data for timer "ohi.resource.timer" for specific HTTP API resources (also known as IP resources), configure the following properties:

ohi.instrumentation.ohi.resource.timer.regex.tagname=resource
ohi.instrumentation.ohi.resource.timer.regex=^(?!\\/generic\\/).+

Generic HTTP API resources are identified by resource paths starting with "/generic/" (note that "generic" is the resource that provides an overview of the generic resources). The regular expression will not match these. As a result, the application records and publishes metrics data for IP resources only.

Alternatively, to collect metrics data for timer "ohi.resource.timer" for generic resources only, thus ignoring timer data for IP resources, configure the following properties:

ohi.instrumentation.ohi.resource.timer.regex.tagname=resource
ohi.instrumentation.ohi.resource.timer.regex=^\\/generic\\/.+
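
The effect of both regular expressions can be checked with a short Python sketch. It assumes that the doubled backslashes in the property values are escaping required by the property format, so the effective patterns contain a single backslash before each slash. The function is_published and the resource names "/claims" and "/generic/claims" are hypothetical, and how the application treats a timer without the configured tag is not stated in the text (the sketch simply does not publish it).

```python
import re

# Effective patterns after un-escaping the property values (assumption).
IP_ONLY = re.compile(r"^(?!\/generic\/).+")
GENERIC_ONLY = re.compile(r"^\/generic\/.+")


def is_published(tags: dict, tag_name: str, pattern) -> bool:
    """Return True if the value of the configured tag matches the pattern."""
    value = tags.get(tag_name)
    if value is None:
        return False  # assumption: no tag value, nothing to match
    return pattern.match(value) is not None


if __name__ == "__main__":
    for resource in ("/claims", "/generic/claims", "/generic"):
        tags = {"resource": resource, "method": "GET", "status": "200"}
        print(
            resource,
            is_published(tags, "resource", IP_ONLY),
            is_published(tags, "resource", GENERIC_ONLY),
        )
```

Note that the overview resource "/generic" does not start with "/generic/", so the first pattern matches it and the second does not.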

Configure ohi.resource.client.timer Resource Tag Construction

The resource tag for an "ohi.resource.client.timer" should point to the name of the resource and not be more specific than that. For example, for a "/persons" resource with the following URL

/persons/1234

the value for the resource tag should be "/persons" and not the specific person resource "/persons/1234".

Using overly specific values like "/persons/1234" would lead to an explosion of "ohi.resource.client.timer" metrics being recorded and published. For that reason, OHI applications stop after the first path segment when determining the resource tag for an "ohi.resource.client.timer".

That first path segment may, for example, be a (load balancer) context root that should be included but is not specific enough to identify the actual resource name. In that case, configure the system to continue traversing the resource path: it moves past every path segment that appears in the list of known path segment prefixes and stops at the first segment that is not in that list.

Use the system property ohi.instrumentation.resourceclienttimer.segment.prefixes to specify a comma-separated list of known segment prefixes. For example, for an external "/persons" resource with the following URL

/loadbalancer-url/api/persons/1234

and an external "/providers" resource with the following URL

/provider-system-api/providers/ABCD

configure the property as follows:

ohi.instrumentation.resourceclienttimer.segment.prefixes=loadbalancer-url,api,provider-system-api

in order for OHI applications to determine resource tag values "/loadbalancer-url/api/persons" and "/provider-system-api/providers" respectively.
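
The traversal described in this section can be sketched in Python as follows. This is one reading of the documented behavior, not the OHI implementation: the function resource_tag is hypothetical, and it assumes that a path segment must equal a configured prefix exactly in order to be traversed.

```python
def resource_tag(path: str, known_prefixes=()) -> str:
    """Derive the resource tag value from a request path.

    Segments listed in known_prefixes are kept and traversed; the first
    segment that is not listed is kept as well, and traversal stops there.
    """
    known = set(known_prefixes)
    kept = []
    for segment in path.split("/"):
        if not segment:
            continue  # skip the empty string before the leading slash
        kept.append(segment)
        if segment not in known:
            break
    return "/" + "/".join(kept)


if __name__ == "__main__":
    prefixes = "loadbalancer-url,api,provider-system-api".split(",")
    print(resource_tag("/persons/1234"))
    print(resource_tag("/loadbalancer-url/api/persons/1234", prefixes))
    print(resource_tag("/provider-system-api/providers/ABCD", prefixes))
```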