25 Monitoring ECE in a Cloud Native Environment
Learn how to monitor system processes, such as memory and thread usage, of your Oracle Communications Elastic Charging Engine (ECE) components in a cloud native environment.
About Monitoring ECE in a Cloud Native Environment
You can set up monitoring of your ECE components in a cloud native environment. When configured to do so, ECE exposes JVM, Coherence, and application metric data through a single endpoint in an OpenMetrics/Prometheus exposition format. You can then use an external centralized metrics service, such as Prometheus, to scrape the ECE cloud native metrics and store them for analysis and monitoring.
ECE cloud native exposes metric data for the following components by default:
- ECE Server
- BRM Gateway
- Customer Updater
- Diameter Gateway
- EM Gateway
- HTTP Gateway
- CDR Formatter
- Pricing Updater
- Radius Gateway
- Rated Event Formatter
Setting up monitoring of these ECE cloud native components involves the following high-level tasks:
- Ensuring that the ECE metric endpoints are enabled. See "Enabling ECE Metric Endpoints".
  ECE metric data will be exposed through the following endpoint: http://localhost:19612/metrics.
- Setting up a centralized metrics service, such as Prometheus, to scrape metrics from the endpoint.
- Setting up a visualization tool, such as Grafana, to display your ECE metric data in a graphical format.
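Before you wire up a full metrics service, it can help to confirm that the endpoint returns data. The following Python sketch reads the endpoint named above and parses the Prometheus text exposition format into simple tuples. It is a minimal illustration, not part of ECE: the function names, the `SAMPLE` payload, and the `area` label shown in it are assumptions made for the example, and the parser handles only plain sample lines.

```python
import re
from urllib.request import urlopen

# Endpoint taken from this section; adjust the host and port for your deployment.
METRICS_URL = "http://localhost:19612/metrics"

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, label_text, value = match.groups()
        labels = dict(LABEL_RE.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Read the metrics endpoint once and return the parsed samples."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Illustrative payload only; the real output of an ECE pod will differ.
SAMPLE = """\
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap"} 1.5e+08
process_cpu_usage 0.25
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

To run the check against a live pod, call `fetch_metrics()` from a shell that can reach the endpoint, for example from inside the pod or through a port forward.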
Enabling ECE Metric Endpoints
The default ECE cloud native configuration exposes JVM, Coherence, and application metric data for all ECE components to a single REST endpoint. If you create any additional instances of ECE components, you must configure them to expose metric data.
To ensure that the ECE metric endpoints are enabled:
- Open your override-values.yaml file for oc-cn-ece-helm-chart.
- Verify that the charging.metrics.port key is set to the port number at which you want to expose the ECE metrics. The default is 19612.
- Verify that each ECE component instance has metrics enabled.
  Each application role under the charging key can be configured to enable or disable metrics. In the jvmOpts key, setting the ece.metrics.http.service.enabled option either enables (true) or disables (false) the metrics service for that role.
  For example, these override-values.yaml entries would enable the metrics service for ecs1:
  charging:
     labels: "ece"
     jmxport: "9999"
     …
     metrics:
        port: "19612"
     ecs1:
        jmxport: ""
        replicas: 1
        …
        jvmOpts: "-Dece.metrics.http.service.enabled=true"
        restartCount: "0"
- Save and close your override-values.yaml file.
- Run the helm upgrade command to update your ECE Helm release:
  helm upgrade EceReleaseName oc-cn-ece-helm-chart --namespace EceNameSpace --values OverrideValuesFile
  where:
  - EceReleaseName is the release name for oc-cn-ece-helm-chart.
  - EceNameSpace is the namespace in which to create ECE Kubernetes objects for the ECE Helm chart.
  - OverrideValuesFile is the name and location of your override-values.yaml file for oc-cn-ece-helm-chart.
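The verification step in the procedure above can also be scripted. The following Python sketch walks the contents of the charging key, already loaded into a dictionary, and reports roles whose jvmOpts do not set ece.metrics.http.service.enabled to true. It rests on assumptions: it treats any entry that has a jvmOpts key as an application role, the helper names are invented for this example, and the `examplerole` entry is hypothetical. A role that omits the option in the override file may still inherit a value from the chart defaults, which this sketch cannot see.

```python
METRICS_FLAG = "-Dece.metrics.http.service.enabled="


def metrics_enabled(jvm_opts):
    """Return True or False if the option is present in jvmOpts, else None."""
    for token in jvm_opts.split():
        if token.startswith(METRICS_FLAG):
            return token[len(METRICS_FLAG):].strip().lower() == "true"
    return None


def roles_without_metrics(charging):
    """List roles under 'charging' whose jvmOpts do not enable metrics."""
    flagged = []
    for role, settings in charging.items():
        # Assumption: only entries with a jvmOpts key are application roles.
        if not isinstance(settings, dict) or "jvmOpts" not in settings:
            continue
        if metrics_enabled(settings["jvmOpts"] or "") is not True:
            flagged.append(role)
    return flagged


# Values mirroring the example in this section, plus one hypothetical role.
charging = {
    "labels": "ece",
    "jmxport": "9999",
    "metrics": {"port": "19612"},
    "ecs1": {
        "jmxport": "",
        "replicas": 1,
        "jvmOpts": "-Dece.metrics.http.service.enabled=true",
        "restartCount": "0",
    },
    # Hypothetical role name, used only to show a disabled case.
    "examplerole": {"jvmOpts": "-Dece.metrics.http.service.enabled=false"},
}

if __name__ == "__main__":
    print(roles_without_metrics(charging))
```

In practice you would load override-values.yaml with a YAML parser of your choice and pass its charging mapping to `roles_without_metrics`.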
ECE Cloud Native Metrics
ECE cloud native collects metrics in the following groups to produce data for monitoring your ECE components:
JVM Metrics
The JVM Metrics group contains standard metrics about the central processing unit (CPU) and memory utilization of the JVMs that are members of the ECE grid. Table 25-1 lists the metrics in this group.
Table 25-1 JVM Metrics
Metric Name | Type | Description |
---|---|---|
jvm_memory_bytes_init | Gauge | Contains the initial size, in bytes, for the Java heap and non-heap memory. |
jvm_memory_bytes_committed | Gauge | Contains the committed size, in bytes, for the Java heap and non-heap memory. |
jvm_memory_bytes_used | Gauge | Contains the amount of Java heap and non-heap memory, in bytes, that is in use. |
jvm_memory_bytes_max | Gauge | Contains the maximum size, in bytes, for the Java heap and non-heap memory. |
jvm_memory_pool_bytes_init | Gauge | Contains the initial size, in bytes, of the following JVM memory pools: G1 Eden Space, G1 Old Gen, and G1 Survivor Space. |
jvm_memory_pool_bytes_committed | Gauge | Contains the committed size, in bytes, of the following JVM memory pools: G1 Eden Space, G1 Old Gen, and G1 Survivor Space. |
jvm_memory_pool_bytes_used | Gauge | Contains the amount of Java memory space, in bytes, that is in use by the following JVM memory pools: G1 Eden Space, G1 Old Gen, and G1 Survivor Space. |
jvm_buffer_count_buffers | Gauge | Contains the estimated number of mapped and direct buffers in the JVM memory pool. |
jvm_buffer_total_capacity_bytes | Gauge | Contains the estimated total capacity, in bytes, of the mapped and direct buffers in the JVM memory pool. |
process_cpu_usage | Gauge | Contains the CPU usage, as a percentage, for each ECE component on the server. JVMs collect this data from the corresponding MBean attributes. |
process_files_open_files | Gauge | Contains the total number of file descriptors currently available for an ECE component and the number of descriptors in use for that ECE component. |
coherence_os_system_cpu_load | Gauge | Contains the CPU load, as a percentage, for each system in the cluster. These statistics are based on the average data collected from all the ECE grid members running on a server. |
system_load_average_1m | Gauge | Contains the system load average (the number of items waiting in the CPU run queue) for each machine in the cluster. These statistics are based on the average data collected from all the ECE grid members running on a server. |
coherence_os_free_swap_space_size | Gauge | Contains system swap usage information (by default in megabytes) for each system in the cluster. These statistics are based on the average data collected from all the ECE grid members running on a server. |
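The memory gauges in Table 25-1 are most useful in combination. As a rough illustration, the following Python sketch divides jvm_memory_bytes_used by jvm_memory_bytes_max to get a utilization ratio for one memory area. It assumes that samples arrive as (name, labels, value) tuples and that heap and non-heap series are distinguished by an `area` label; both are assumptions for this example, so check the label names that your endpoint actually emits.

```python
def memory_utilization(samples, area="heap"):
    """Return used/max for the given memory area, or None if not computable."""
    used = maximum = None
    for name, labels, value in samples:
        # Assumption: an "area" label separates heap from non-heap series.
        if labels.get("area") != area:
            continue
        if name == "jvm_memory_bytes_used":
            used = value
        elif name == "jvm_memory_bytes_max":
            maximum = value
    # A JVM can report an undefined maximum as -1, so guard against it.
    if used is None or maximum is None or maximum <= 0:
        return None
    return used / maximum


# Illustrative values only.
samples = [
    ("jvm_memory_bytes_used", {"area": "heap"}, 512.0),
    ("jvm_memory_bytes_max", {"area": "heap"}, 2048.0),
    ("jvm_memory_bytes_used", {"area": "nonheap"}, 100.0),
    ("jvm_memory_bytes_max", {"area": "nonheap"}, -1.0),
]

if __name__ == "__main__":
    print(memory_utilization(samples, "heap"))
    print(memory_utilization(samples, "nonheap"))
```

Returning None rather than a number when the maximum is undefined keeps a dashboard or alert from treating a missing limit as zero or full utilization.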
BRS Metrics
The BRS Metrics group contains the metrics for tracking throughput and latency of the charging clients that use batch request service (BRS). Table 25-2 lists the metrics in this group.
Table 25-2 ECE BRS Metrics
Metric Name | Metric Type | Description |
---|---|---|
ece_brs_task_processed | Counter | Tracks the total number of requests that have been accepted, processed, timed out, or rejected by the ECE component. You can use this to track the approximate processing rate over time, aggregate over all client applications, and so on. |
ece_brs_task.pending_count | Gauge | Contains the number of requests that are pending for the ECE component. |
ece.brs.current.latency.by.type | Gauge | Tracks the latency of a charging client for each charging operation type in the current query interval. This metric provides the latency information for the following operation types: Initiate, Update, Terminate, Cancel, Price_Enquiry, Balance_Query, Debit_Amount, Debit_Unit, Refund_Amount, and Refund_Unit. |
ece.brs.current.latency | Gauge | Tracks the current operation latency for a charging client in the current scrape interval. This metric contains the BRS statistics tracked using the charging.brsConfigurations MBean attributes. This configuration tracks the maximum and average latency for an operation type since the last query. The maximum window size for collecting this data is 30 seconds, so the query must be run at least once every 30 seconds. This metric provides the latency information for the following operation types: Initiate, Update, Terminate, Cancel, Price_Enquiry, Balance_Query, Debit_Amount, Debit_Unit, Refund_Amount, Refund_Unit, and Spending_Limit_Report. |
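Because ece_brs_task_processed is a counter, its raw value only grows, and the approximate processing rate mentioned in Table 25-2 comes from the difference between two scrapes. The following Python sketch shows that arithmetic. The function name and the sample numbers are assumptions for illustration, and the reset handling is a simplification of what the PromQL rate() function does in Prometheus.

```python
def counter_rate(previous, current):
    """Return the per-second increase between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = previous, current
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("current sample must be newer than previous sample")
    # If the value dropped, the process restarted and the counter began
    # again from zero, so everything counted so far is new.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / elapsed


if __name__ == "__main__":
    # Two scrapes 15 seconds apart: 600 more requests were processed.
    print(counter_rate((0.0, 1000.0), (15.0, 1600.0)))
    # Counter reset between scrapes: only the new total is counted.
    print(counter_rate((0.0, 1000.0), (10.0, 50.0)))
```

Keep the interval between the two samples at or below 30 seconds if you also read the latency gauges on the same schedule, because the table notes a 30-second maximum collection window for that data.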
Session Metrics
The Session Metrics group contains metrics on ECE server sessions. Table 25-3 lists the metrics in this group.
Table 25-3 Session Metrics
Metric Name | Type | Description |
---|---|---|
ece_session_metrics | Counter | Contains the total number of sessions opened or closed by rating group, node, or cluster. |
Rated Events Metrics
The Rated Events Metrics group contains metrics on rated events processed by ECE server sessions. Table 25-4 lists the metrics in this group.
Table 25-4 Rated Events Metrics
Metric Name | Type | Description |
---|---|---|
ece_rated_events_formatted | Counter | Contains the number of rated events that were formatted successfully or that failed formatting, per RatedEventFormatter worker thread, for each formatting job operation from NoSQL or the Oracle database. |
ece_rated_events_cached | Counter | Contains the total number of rated events cached by each ECE node. |
ece_rated_events_inserted | Counter | Contains the total number of rated events that were successfully inserted into the cache. |
ece_rated_events_insert_failed | Counter | Contains the total number of rated events that failed to be inserted into the cache. |
ece_rated_events_purged | Counter | Contains the total number of rated events that were purged. |
ece_requests_by_result_code | Counter | Tracks the total number of requests processed, by result code. |
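The insert counters in Table 25-4 can be combined into a single health indicator. The following Python sketch sums ece_rated_events_inserted and ece_rated_events_insert_failed across all series and returns the share of insert attempts that failed. The (name, labels, value) sample shape, the helper names, and the `node` label are assumptions made for this example.

```python
def total(samples, metric_name):
    """Sum a metric's value across every label combination."""
    return sum(value for name, _labels, value in samples if name == metric_name)


def insert_failure_ratio(samples):
    """Return failed / (inserted + failed), or None when nothing was attempted."""
    inserted = total(samples, "ece_rated_events_inserted")
    failed = total(samples, "ece_rated_events_insert_failed")
    attempts = inserted + failed
    return None if attempts == 0 else failed / attempts


# Illustrative values only; the "node" label name is hypothetical.
samples = [
    ("ece_rated_events_inserted", {"node": "a"}, 90.0),
    ("ece_rated_events_inserted", {"node": "b"}, 100.0),
    ("ece_rated_events_insert_failed", {"node": "a"}, 10.0),
]

if __name__ == "__main__":
    print(insert_failure_ratio(samples))
```

Because both metrics are counters, this ratio covers the whole lifetime of the processes. For a recent-window view, apply the same calculation to the increases between two scrapes instead of the raw totals.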
CDR Formatter Metrics
The CDR Formatter Metrics group contains the metrics for tracking Charging Function (CHF) records. Table 25-5 lists the metrics in this group.
Table 25-5 CDR Formatter Metrics
Metric Name | Metric Type | Description |
---|---|---|
ece_chf_records_processed | Counter | Tracks the total number of CHF records that have been processed by the CDR formatter. |
ece_chf_records_purged | Counter | Tracks the total number of CHF records that have been purged by the CDR formatter. |
ece_chf_records_loaded | Counter | Tracks the total number of CHF records that have been loaded by the CDR formatter. |
Coherence Metrics
All Coherence metrics that are available through the Coherence metrics endpoint are also accessible through the ECE metrics endpoint. For more information about the Coherence metrics, see "Oracle Coherence MBeans Reference" in Oracle Fusion Middleware Managing Oracle Coherence.
For information about querying for Coherence metrics, see "Querying for Coherence Metrics" in Oracle Fusion Middleware Managing Oracle Coherence.