Gathering Metrics
Oracle Health Insurance applications can gather the following metrics:
- JVM memory consumption and processor metrics.
- Application-specific metrics like timers and counters for Dynamic Logic execution or web service request handling.
The metric data is exposed in the Prometheus format. Prometheus can scrape the metrics data that Oracle Health Insurance applications expose through the /prometheus endpoint.
Metric Data
The following table lists the metrics per Oracle Health Insurance application:
| Metric Name | Type | Description | Tags | Oracle Health Insurance Applications |
|---|---|---|---|---|
| | Timer | Keeps track of the time of Dynamic Logic executions. | | All |
| | Timer | Keeps track of the time that requests to external REST resources take. | | All |
| | Timer | Keeps track of the time to handle requests for the Oracle Health Insurance application's HTTP API resources. | | All |
| | Counter | Counts the number of tasks dequeued from the task queue. | Not applicable | All |
| | Counter | Counts the number of tasks enqueued to the task queue. | Not applicable | All |
| | Timer | Times task execution. | | All |
| | Timer | Keeps track of the time of exchange execution. | | Oracle Insurance Gateway |
| | Gauge | Maximum size of long-lived heap memory pool. | None | All |
| | Gauge | Size of long-lived heap memory pool after reclamation. | None | All |
| | Counter | Incremented for an increase in the size of the (young) heap memory pool after one garbage collection to before the next. | None | All |
| | Counter | Count of positive increases in the size of the old generation memory pool before garbage collection to after garbage collection. | None | All |
| | Gauge | The percentage of long-lived heap pool used after the last garbage collection event, in the range [0..1]. | | All |
| | Gauge | An approximation of the percent of CPU time used by garbage collection activities over the last look back period or since monitoring began, whichever is shorter, in the range [0..1]. | None | All |
| | Timer | Time spent in concurrent phase. | | All |
| | Timer | Time spent in garbage collection pause. | | All |
| | Counter | Number of extracts started. | | All |
| | Counter | Number of processed (root) entities. | | All |
| | Counter | Number of activities started. | | All |
| | Timer | Duration of a completed activity. | | All |
| | Timer | Number of completed activities with missing start or stop time. | | All |
Enable Metric Data Recording
By default, metrics gathering is disabled. In that case, the /prometheus endpoint returns an HTTP 200 response with no content.
Set the following system properties to enable metrics gathering:
- ohi.instrumentation.gather.jvm: Set to true to enable recording of JVM metrics only. Requires restarting the application to take effect.
- ohi.instrumentation.gather.applicationmetrics: Set to true to enable application metrics. Effective immediately; no restart is required.
- ohi.instrumentation.gather.system: Set to true to enable system metrics. Effective immediately; no restart is required.
- ohi.instrumentation.gather.gc: Set to true to enable the following garbage collection metrics. Effective immediately; no restart is required.
  - JvmGCMetrics
  - JvmHeapPressureMetrics
- ohi.extract.extracts.started.count: Set to true to view the number of extracts started. Effective immediately; no restart is required.
- ohi.extract.entities.processed.count: Set to true to view the number of processed (root) entities. Effective immediately; no restart is required.
- ohi.instrumentation.gather.activityprocessing: Set to true to enable the following activity processing metrics. Effective immediately; no restart is required.
  - ohi.activityprocessing.activities.started
  - ohi.activityprocessing.activities.completed.timer
Record Non-Oracle Health Insurance Metrics
By default, Oracle Health Insurance applications record only metrics whose name starts with the prefix ohi. (including the trailing dot). Non-Oracle Health Insurance components may publish metrics that follow different naming conventions; those metrics are not recorded. To enable recording of non-Oracle Health Insurance metrics as well, set the value for system property ohi.instrumentation.filter.ohi.nameprefix to false.
Add Application Tag for All Oracle Health Insurance Metrics
An application-specific identifier for metrics is important when similar metrics from various Oracle Health Insurance applications are collected in one Prometheus instance. It allows filtering metric data on a per-application basis. There are multiple ways to add a source application tag or label to any metric:
- Configure Prometheus to add such a common tag or label to all metrics it collects.
- Alternatively, configure an Oracle Health Insurance application to add an application identifier as a tag to all metrics by setting system property ohi.instrumentation.common.application.tag to true.
Configuring Timers
Timers are the most memory-consuming type of meter, and their total footprint can vary significantly depending on the selected options. This section lists configuration options for timers.
Histograms and Percentiles
Assume that a percentile histogram for a timer requires approximately 8 kB of memory. That is the footprint for every combination of meter name and tags.
By default, the timers that Oracle Health Insurance applications configure do not collect percentile distributions or histogram data.
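Because the footprint applies to every combination of meter name and tags, the total grows with the product of the number of distinct tag values. The following is a rough sketch of that arithmetic. It assumes that "~8kB" means 8,000 bytes and that every tag combination actually occurs, so the result is an upper bound. The tag names come from the example output in this section; the tag value counts are hypothetical.

```python
import math

# Assumption: "~8kB" per meter ID, taken here as 8,000 bytes.
HISTOGRAM_BYTES_PER_METER_ID = 8_000


def histogram_footprint_bytes(distinct_values_per_tag):
    """Upper-bound memory estimate for one timer with a percentile histogram.

    distinct_values_per_tag maps a tag name to the number of distinct
    values observed for that tag.
    """
    combinations = math.prod(distinct_values_per_tag.values())
    return combinations * HISTOGRAM_BYTES_PER_METER_ID


# Hypothetical cardinalities for the tags of ohi.resource.timer.
estimate = histogram_footprint_bytes({"method": 4, "resource": 50, "status": 5})
print(f"{estimate / 1_000_000:.1f} MB")
```

With these hypothetical numbers there are 1,000 tag combinations, so enabling the histogram for this single timer could cost up to about 8 MB.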
The two approaches for recording percentiles are:
- Percentile Histograms: When this option is enabled, the application accumulates values in an underlying histogram and publishes a predetermined set of buckets to Prometheus. Calculate percentiles from this histogram using the Prometheus query language. To enable recording of percentile histograms, set system property ohi.instrumentation.<timer>.histogram, where the placeholder <timer> is the name of the timer, to true.
- Percentile Distributions: When this option is enabled, the application computes a percentile approximation for each meter ID, based on the set of names and tags, and publishes the percentile values to Prometheus. Note that this is not as flexible as using a percentile histogram because it is not possible to aggregate percentile approximations across tags. To enable recording of percentile distributions, specify the percentiles as the value for system property ohi.instrumentation.<timer>.percentiles, where the placeholder <timer> is the name of the timer. Specify percentiles as a comma-separated string. For example, to set the median, 0.75, and 0.95 percentiles for the ohi.resource.timer:

    ohi.instrumentation.ohi.resource.timer.percentiles=0.5,0.75,0.95
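The remark that percentile approximations cannot be aggregated across tags can be illustrated with a small, self-contained sketch. The durations below are hypothetical, and the nearest-rank percentile function is a simple stand-in, not the approximation that the application uses.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer percentage such as 95."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceiling division, 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical request durations (seconds) for two values of the resource tag.
durations_a = [0.010] * 19 + [0.500]
durations_b = [0.200] * 20

p95_a = percentile(durations_a, 95)
p95_b = percentile(durations_b, 95)

# Averaging the per-tag percentiles...
averaged = (p95_a + p95_b) / 2
# ...differs from the percentile over all observations together.
combined = percentile(durations_a + durations_b, 95)

print(p95_a, p95_b, averaged, combined)
```

The averaged value and the combined value disagree, which is why published percentile values are only meaningful per meter ID, whereas histogram buckets can be summed across tags before a percentile is derived.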
For example, verbatim /prometheus endpoint output for the ohi.resource.timer (for the specific resource /currentproperties) without percentile distributions or histogram data:

    # HELP ohi_resource_timer_seconds_max Resource timer
    # TYPE ohi_resource_timer_seconds_max gauge
    ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.273524648
    # HELP ohi_resource_timer_seconds Resource timer
    # TYPE ohi_resource_timer_seconds summary
    ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
    ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.399261975
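Even without percentiles or histograms, the count and sum samples are enough to derive a mean duration. The following is a minimal sketch that parses the output shown above and computes that mean. The parser is deliberately simplified (for example, it does not handle escaped quotes in label values) and is not a complete implementation of the Prometheus text format; the function names are illustrative.

```python
import re

SAMPLE = """\
# HELP ohi_resource_timer_seconds_max Resource timer
# TYPE ohi_resource_timer_seconds_max gauge
ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.273524648
# HELP ohi_resource_timer_seconds Resource timer
# TYPE ohi_resource_timer_seconds summary
ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.399261975
"""

# metric_name{label="value",...} value
LINE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def mean_seconds(samples, base_name):
    """Mean duration = sum of observed durations / number of observations."""
    total = sum(v for name, _, v in samples if name == base_name + "_sum")
    count = sum(v for name, _, v in samples if name == base_name + "_count")
    return total / count if count else 0.0


samples = parse_samples(SAMPLE)
print(mean_seconds(samples, "ohi_resource_timer_seconds"))
```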
Verbatim /prometheus endpoint output for the same ohi.resource.timer with histogram data enabled:

    # HELP ohi_resource_timer_seconds_max Resource timer
    # TYPE ohi_resource_timer_seconds_max gauge
    ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.256794629
    # HELP ohi_resource_timer_seconds Resource timer
    # TYPE ohi_resource_timer_seconds histogram
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001048576",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001398101",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.001747626",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002097151",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002446676",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.002796201",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003145726",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003495251",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.003844776",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.004194304",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.005592405",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.006990506",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.008388607",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.009786708",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.011184809",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.01258291",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.013981011",} 0.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.015379112",} 1.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.016777216",} 2.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.022369621",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.027962026",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.033554431",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.039146836",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.044739241",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.050331646",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.055924051",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.061516456",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.067108864",} 3.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.089478485",} 4.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.111848106",} 4.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.134217727",} 4.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.156587348",} 4.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.178956969",} 4.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.20132659",} 4.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.223696211",} 4.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.246065832",} 4.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.268435456",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.357913941",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.447392426",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.536870911",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.626349396",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.715827881",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.805306366",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.894784851",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="0.984263336",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.073741824",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.431655765",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="1.789569706",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.147483647",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.505397588",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="2.863311529",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.22122547",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.579139411",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="3.937053352",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="4.294967296",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="5.726623061",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="7.158278826",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="8.589934591",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="10.021590356",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="11.453246121",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="12.884901886",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="14.316557651",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="15.748213416",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="17.179869184",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="22.906492245",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="28.633115306",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="30.0",} 5.0
    ohi_resource_timer_seconds_bucket{method="GET",resource="/currentproperties",status="200",le="+Inf",} 5.0
    ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
    ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.391383318
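To show how a percentile can be derived from cumulative buckets such as these, here is a simplified sketch that interpolates linearly inside the bucket containing the target rank. PromQL's histogram_quantile function is the usual way to do this on the Prometheus side; the code below only illustrates the idea and is not the Prometheus implementation. The bucket list is an excerpt of the output above.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative (upper_bound, count) pairs."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank and count > lower_count:
            if math.isinf(upper_bound):
                # No finite upper bound to interpolate towards.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Excerpt of the cumulative buckets above: (le, count).
BUCKETS = [
    (0.013981011, 0.0),
    (0.015379112, 1.0),
    (0.016777216, 2.0),
    (0.022369621, 3.0),
    (0.067108864, 3.0),
    (0.089478485, 4.0),
    (0.246065832, 4.0),
    (0.268435456, 5.0),
    (math.inf, 5.0),
]

median = estimate_quantile(0.5, BUCKETS)
print(median)
```

With five observations, the median rank of 2.5 falls in the bucket between 0.016777216 and 0.022369621 seconds, so the estimate lies between those two bounds.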
And finally, verbatim /prometheus endpoint output for the same ohi.resource.timer with percentile distributions enabled:

    # HELP ohi_resource_timer_seconds_max Resource timer
    # TYPE ohi_resource_timer_seconds_max gauge
    ohi_resource_timer_seconds_max{method="GET",resource="/currentproperties",status="200",} 0.280607455
    # HELP ohi_resource_timer_seconds Resource timer
    # TYPE ohi_resource_timer_seconds summary
    ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.5",} 0.027262976
    ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.75",} 0.087031808
    ohi_resource_timer_seconds{method="GET",resource="/currentproperties",status="200",quantile="0.95",} 0.284164096
    ohi_resource_timer_seconds_count{method="GET",resource="/currentproperties",status="200",} 5.0
    ohi_resource_timer_seconds_sum{method="GET",resource="/currentproperties",status="200",} 0.438916162
Note that the values for the max gauge and the percentiles are bound to a time window of two minutes. For the max gauge, for example, this means that its value is the maximum value recorded during the time window. If no new values are recorded within the length of the time window, the max gauge resets to 0.0 when a new time window starts.
Determine Timers for Which Metrics Are Recorded and Published
For more fine-grained control, combine the following properties to configure for which timers the application collects and publishes metric data:
- ohi.instrumentation.<timer>.regex.tagname, where the placeholder <timer> is the name of the timer.
- ohi.instrumentation.<timer>.regex, where the placeholder <timer> is the name of the timer.
For a particular timer, the application checks whether the value of the tag with the configured tag name matches the regular expression. If it does, the metrics data for the timer is published.
For example, to collect metrics data for timer ohi.resource.timer for specific HTTP API resources (also known as IP resources), configure the following properties:

    ohi.instrumentation.ohi.resource.timer.regex.tagname=resource
    ohi.instrumentation.ohi.resource.timer.regex=^(?!\\/generic\\/).+
Resource paths starting with /generic/ identify generic HTTP API resources (note that generic is the resource that summarizes the generic resources). The regular expression does not match these. As a result, the application records and publishes metrics data for IP resources only.
Alternatively, to collect metrics data for timer ohi.resource.timer for generic resources only, thus ignoring timer data for IP resources, configure the following properties:

    ohi.instrumentation.ohi.resource.timer.regex.tagname=resource
    ohi.instrumentation.ohi.resource.timer.regex=^\\/generic\\/.+
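The effect of the two regular expressions can be checked with a short sketch. It assumes that the doubled backslashes in the property values are escaping for the property syntax, so that the effective patterns contain a single backslash before each slash. The resource path /generic/persons is a hypothetical example, and the matching here uses Python's re module, which may differ in detail from how the application evaluates the expression.

```python
import re

# Effective patterns, assuming "\\/" in the property value denotes "\/".
IP_RESOURCES_ONLY = re.compile(r"^(?!\/generic\/).+")
GENERIC_RESOURCES_ONLY = re.compile(r"^\/generic\/.+")


def is_published(pattern, resource_tag_value):
    """Return True when the value of the resource tag matches the pattern."""
    return pattern.match(resource_tag_value) is not None


for path in ("/currentproperties", "/generic/persons"):  # second path is hypothetical
    print(
        path,
        is_published(IP_RESOURCES_ONLY, path),
        is_published(GENERIC_RESOURCES_ONLY, path),
    )
```

The negative lookahead (?!...) in the first pattern rejects any value that begins with /generic/, while the second pattern accepts only such values, so the two configurations are complementary for these paths.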
Configure ohi.resource.client.timer Resource Tag Construction
The resource tag for an ohi.resource.client.timer must point to the name of the resource and not be more specific than that. For example, for a /persons resource with the following URL:

    /persons/1234

the value for the resource tag must be /persons and not the specific person resource /persons/1234.
Overly specific values like /persons/1234 cause an explosion in the number of ohi.resource.client.timer metrics that are recorded and published. For that reason, Oracle Health Insurance applications stop after the first path segment when determining the resource tag for an ohi.resource.client.timer.
The first path segment may not be specific enough to identify the actual resource name, for example when it is a mandatory (load balancer) context root. In that case, configure the system to continue traversing the resource path until it encounters a path segment that is not in the list of path segment prefixes to ignore.
Use the system property ohi.instrumentation.resourceclienttimer.segment.prefixes to specify a comma-separated list of known segment prefixes.
For example, for an external /persons resource with the following URL:

    /loadbalancer-url/api/persons/1234

and an external /providers resource with the following URL:

    /provider-system-api/providers/ABCD

configure the property:

    ohi.instrumentation.resourceclienttimer.segment.prefixes=loadbalancer-url,api,provider-system-api

Oracle Health Insurance applications then determine the resource tag values /loadbalancer-url/api/persons and /provider-system-api/providers, respectively.
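The traversal described above can be sketched as follows. This is an illustration of the documented behavior, not the application's actual code. The function name is hypothetical, and the sketch assumes that a configured prefix must equal a whole path segment.

```python
def resource_tag(path, segment_prefixes=()):
    """Derive a resource tag value from a resource path.

    Keeps leading segments listed in segment_prefixes and stops after the
    first segment that is not in that list.
    """
    kept = []
    for segment in (s for s in path.split("/") if s):
        kept.append(segment)
        if segment not in segment_prefixes:
            break
    return "/" + "/".join(kept)


# Value of ohi.instrumentation.resourceclienttimer.segment.prefixes from the example.
prefixes = set("loadbalancer-url,api,provider-system-api".split(","))

print(resource_tag("/persons/1234"))
print(resource_tag("/loadbalancer-url/api/persons/1234", prefixes))
print(resource_tag("/provider-system-api/providers/ABCD", prefixes))
```

Without configured prefixes the function stops after the first segment, which mirrors the default behavior; with the example prefixes it yields the two resource tag values named above.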