11 Monitoring ECE Composable Service Processes
You can monitor the system processes of the Oracle Communications Elastic Charging Engine (ECE) composable services, including their memory and thread usage.
About Monitoring the ECE Composable Services
You can set up monitoring of your ECE composable services using a Kubernetes Service Monitor, the Prometheus Operator, and Grafana Dashboards. Service Monitor exposes JVM and application metric data through a single endpoint in an OpenMetrics/Prometheus exposition format. Prometheus then scrapes the metrics and stores them for analysis and monitoring through the Grafana Dashboards.
Setting Up Monitoring of the ECE Composable Services
Setting up monitoring of the ECE composable services involves the following high-level tasks:
- Deploying the following prerequisite software on your system:
  - Deploying the Prometheus Operator for the ECE composable services. See "Installing Prometheus Operator".
  - Deploying Grafana for the ECE composable services. See "Installing Grafana".

  For the list of compatible versions, see BRM Compatibility Matrix.
- Enabling the Service Monitor and Grafana Dashboards. See "Enabling the Service Monitor".
Enabling the Service Monitor
By default, the Service Monitor and Grafana Dashboards are disabled in the ECE composable services. You must enable both if you want to monitor your system's processes using Prometheus and Grafana.
To enable the Service Monitor and Grafana Dashboards:
- Create an override-values.yaml file for the oc-ccs-version Helm chart.
- Enable the Service Monitor and Grafana Dashboards by setting these keys:
  - serviceMonitor.enabled: Set this to true.
  - serviceMonitor.namespace: Set this to the namespace in which to deploy the Service Monitor CRD.
  - grafanaDashboards.enabled: Set this to true.
  - grafanaDashboards.grafanaNamespace: Set this to the namespace in which to deploy the Grafana Dashboards.
- Optionally, modify the default settings for Grafana:
  - grafanaDashboards.labels.grafana_dashboard: Specify the labels to add to the Grafana CRDs. This helps Grafana discover the dashboards.
  - grafanaDashboards.annotations.k8s-sidecar-target-directory: Set this to the directory for the Grafana sidecar.
- Save and close your override-values.yaml file.
- Run the helm upgrade command to update your Helm release:

      helm upgrade EceCompServicesReleaseName oc-ccs-version --namespace EceCompServicesNameSpace --values override-values.yaml

  where EceCompServicesReleaseName is the release name for the ECE composable services, and EceCompServicesNameSpace is the namespace in which to create Kubernetes objects for the Helm chart.
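The keys described in this procedure might look like the following in override-values.yaml. This is a minimal sketch, not the chart's documented defaults: the nesting is inferred from the dotted key names, and the namespace, label value, and sidecar directory shown are placeholder assumptions that you would replace with values from your own environment.

```yaml
# override-values.yaml (illustrative sketch; all values are placeholders)

serviceMonitor:
  enabled: true
  # Assumption: the namespace that your Prometheus Operator watches for Service Monitors
  namespace: monitoring

grafanaDashboards:
  enabled: true
  # Assumption: the namespace in which Grafana is deployed
  grafanaNamespace: monitoring

  # Optional settings. The values below are examples only.
  labels:
    # Assumption: the label your Grafana sidecar is configured to look for
    grafana_dashboard: "1"
  annotations:
    # Assumption: a directory that the Grafana sidecar is allowed to write to
    k8s-sidecar-target-directory: /tmp/dashboards/ece
```

If dashboards do not appear in Grafana after the upgrade, a reasonable first check is whether the label and namespace above match what your Grafana sidecar is configured to discover.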
Metrics for the ECE Composable Services
The ECE composable services can collect metrics in the following groups to produce monitoring data:
Charging Gateway Metrics
The Charging Gateway group contains standard metrics for JVM CPU and memory utilization. Table 11-1 lists the metrics in this group.
Table 11-1 Charging Gateway Metrics
| Metric | Type | Description |
|---|---|---|
| ccs_service_processor_seconds_count | Histogram | Contains the time taken to process the request. |
| ccs_service_request_seconds_count | Gauge | Contains the total amount of time to process the request. |
| ccs_service_topic_messages_received | Counter | Tracks the count of messages received from the topic. |
| ccs_service_request_failed | Counter | Tracks the count of failed subscriber requests. |
| ccs_service_cgf_kafka_subscriber_count | Gauge | Contains the count of instances of Kafka subscribers. |
| ccs_service_processor_duplicate | Counter | Tracks the count of duplicate CHF CDR events. |
| ccs_service_processor_isn_gaps | Counter | Tracks the count of ISN gaps. |
| ccs_service_processor_isn_gap_occurrences | Counter | Tracks the count of ISN gap occurrences. |
| ccs_service_processor_isn_gap_filled | Counter | Tracks the count of ISN gaps filled. |
| ccs_service_processor_closed_cdr_count | Counter | Tracks the count of closed CDRs. |
| ccs_service_retransmitted_requests | Counter | Tracks the number of retransmitted requests. |
| ccs_cgf_remote_records_consumed_total | Counter | Tracks the number of records consumed from the remote site. |
| ccs_cgf_remote_records_published_total | Counter | Tracks the count of records published that were consumed from the remote site. |
| ccs_suspect_retry_count | Counter | Tracks the suspect records retried by the coordinator. |
| ccs_suspect_published_count | Counter | Tracks the suspect records successfully published by the coordinator. |
| ccs_local_site_ownership_change_record_count | Counter | Tracks the CDR site ownership transfers handled by the local site ownership service. |
| ccs_remote_site_ownership_change_record_count | Counter | Tracks the CDR site ownership transfers handled by the remote site ownership service. |
| ccs_cgf_db_stable_state | Gauge | Contains the stable database availability state: up (1) or down (0). |
| ccs_cgf_db_observed_state | Gauge | Contains the most recently observed availability state of the database: up (1) or down (0). |
| ccs_cgf_assignor_remote_assignment_active | Gauge | Contains the remote assignment active state: remote assignment active (1) or normal (0). |
| ccs_cgf_assignor_partitions_assigned | Gauge | Contains the current partition assignments by group, topic, and site. |
| ccs_scheduled_task_seconds_count | Counter | Tracks the time taken to execute a scheduled task. |
| ccs_scheduled_task_enabled | Gauge | Indicates whether the scheduled task is enabled (1) or disabled (0). |
| ccs_scheduled_task_last_run | Gauge | Contains the epoch seconds of the most recent task execution. |
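As one way to act on these metrics, the following sketch defines two alerts through a Prometheus Operator PrometheusRule object. It is an illustration rather than a rule set shipped with ECE: the object name, namespace, selector label, thresholds, and durations are all assumptions, and the metric names are used exactly as listed in Table 11-1 (your scrape output may expose counters with an additional suffix such as _total, so check the endpoint before relying on these expressions).

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ece-charging-gateway-alerts   # hypothetical name
  namespace: monitoring               # assumption: namespace watched by your Prometheus
  labels:
    release: prometheus               # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: ece-charging-gateway
      rules:
        # Fires when the gateway has reported the database as down for 5 minutes.
        - alert: EceCgfDatabaseDown
          expr: ccs_cgf_db_stable_state == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Charging Gateway reports its database as down"

        # Fires when failed subscriber requests keep increasing for 10 minutes.
        - alert: EceCgfRequestsFailing
          expr: rate(ccs_service_request_failed[5m]) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Charging Gateway is recording failed subscriber requests"
```

The gauge-based alert compares the current value directly, while the counter-based alert uses rate() because a counter's absolute value only ever grows and is not meaningful on its own.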
ECS Publisher Metrics
The ECS Publisher group contains metrics for tracking the processing performance of events sent to the Kafka server. Table 11-2 lists the metrics in this group.
Table 11-2 ECS Publisher Metrics
| Metric | Type | Description |
|---|---|---|
| ece.ratedevent.publisher.publish.time | Timer | The end-to-end latency of a transactional batch publish, including the mapping/envelope build, asynchronous publish, and the wait up to publishTimeoutSeconds. It is recorded after a successful commit. |
| ece.ratedevent.publisher.batch.size | DistributionSummary | The number of records published in the current transactional batch. |
| ece.ratedevent.publisher.error | Counter | A monotonic count of publisher errors by bounded errorKind and errorScope. |
| ece.ratedevent.publisher.dlq.published | Counter | Counts the number of records successfully published to the Dead Letter Queue (DLQ). |
| ece.ratedevent.publisher.dlq.error | Counter | Counts the number of records that failed to be published to the DLQ. |
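The metric names in Table 11-2 use dotted notation, which Prometheus does not accept in metric names. The sketch below assumes that the names are exported using the common Micrometer-style convention (dots become underscores, counters gain a _total suffix, and timers are exposed as _seconds_sum and _seconds_count series). That mapping is an assumption, not something this chapter states, so confirm the actual series names on the metrics endpoint before using these expressions. The object name, namespace, label, and rule names are likewise hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ece-publisher-rules           # hypothetical name
  namespace: monitoring               # assumption: namespace watched by your Prometheus
  labels:
    release: prometheus               # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: ece-publisher
      rules:
        # Average publish latency over 5 minutes: total time divided by number of publishes.
        # Assumed series names derived from ece.ratedevent.publisher.publish.time.
        - record: ece:publisher_publish_time_seconds:avg5m
          expr: >
            rate(ece_ratedevent_publisher_publish_time_seconds_sum[5m])
            /
            rate(ece_ratedevent_publisher_publish_time_seconds_count[5m])

        # Fires when records fail to reach the Dead Letter Queue.
        # Assumed series name derived from ece.ratedevent.publisher.dlq.error.
        - alert: EcePublisherDlqErrors
          expr: rate(ece_ratedevent_publisher_dlq_error_total[5m]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Rated event publisher is failing to write records to the DLQ"
```

A DLQ publish failure is treated as critical here because a record that fails both the primary publish and the DLQ write may have nowhere else to go; adjust the severity to match your own recovery procedures.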