7 Metrics, KPIs, Alerts, and Alarms
This chapter details the Metrics, KPIs, Alerts, and Alarms used by OCNADD.
7.1 Metrics, Dimensions, and Common Attributes
This section defines the metrics, dimensions, and attributes used by OCNADD.
7.1.1 Dimensions and Common Attributes
This section includes information about Dimensions and Common Attributes of metrics for OCNADD.
Dimensions
The following table includes information about dimensions of OCNADD.
Table 7-1 Dimensions
| Dimension | Values / Type | Description |
|---|---|---|
| quantile | Integer values | It captures the latency values with ranges: 10 ms, 20 ms, 40 ms, 80 ms, 100 ms, 200 ms, 500 ms, 1000 ms, and 5000 ms. |
| instance_identifier | Prefix configured in Helm, UNKNOWN | Prefix of the pod configured in Helm when there are multiple instances in the same deployment. |
| processor_node_id | – | Stream processor node ID in the aggregation service. |
| serviceId | serviceType-N | Identifier for the service instance used for registration with the health monitoring service. |
| serviceType | CONFIGURATION, ALARM, OCNADD-ADMIN, AGGREGATION-DIAMETER, CORRELATION-DIAMETER | The OCNADD service type. |
| service | ocnaddadminservice, ocnaddconfiguration, ocnaddhealthmonitoring, ocnadddiameteraggregation, ocnadddiamtercorrelation | The name of the Data Director microservice. |
| request_type | Diameter Correlation | Type of the data feed created using REST; this is used to identify if the xDR feed is for HTTP2 or Diameter. |
| nf_feed_type | VCOLLECTOR | The source NF for the feed or the name of the Diameter data provider. |
| correlation-id | – | Taken from the correlation-id present in the metadata list. |
| way | – | Taken from the message-direction present in the metadata list. |
| srcIP | – | Obtained from the source IP address present in the metadata list of the Diameter message sent by vCollector. |
| dstIP | – | Obtained from the destination IP address present in the metadata list of the Diameter message sent by vCollector. |
| srcPort | – | Obtained from the source port present in the metadata list of the Diameter message sent by vCollector. |
| dstPort | – | Obtained from the destination port present in the metadata list of the Diameter message sent by vCollector. |
| worker_group | String | Name of the worker group in which the corresponding traffic processing services (relay agent and mediation groups) are running. |
| relay_agent_group | String | The name of the relay agent group through which the Diameter message from vCollector is transmitted and where processing services are running. |
| mediation_group | String | The name of the mediation group where xDR processing services are running, allowing third-party applications to consume the processed data. |
Attributes
The following table includes information about common attributes of OCNADD.
Table 7-2 Attributes
| Attribute | Description |
|---|---|
| application | The name of the application that the microservice is a part of. |
| microservice | The name of the microservice. |
| namespace | The Kubernetes namespace in which the microservice is running. |
| node | The name of the worker node that the microservice is running on. |
| pod | The name of the Kubernetes pod. |
7.1.2 Metrics
This section provides information about important metrics related to OCNADD.
To retrieve the following Diameter metrics and other supported OCNADD metrics, see "OCNADD Metrics" section in the Oracle Communications Network Analytics Data Director User Guide.
- kafka_stream_processor_node_process_total
- kafka_stream_processor_node_process_rate
- kafka_stream_task_dropped_records_total
- kafka_stream_task_dropped_records_rate
- ocnadd_health_total_alarm_raised_total
- ocnadd_health_total_alarm_cleared_total
- ocnadd_health_total_active_number_of_alarm_raised_total
- ocnadd_ext_kafka_feed_record_total
7.2 KPIs
This section provides information about important KPIs related to OCNADD.
Note:
- The namespace in the KPIs should be updated to reflect the current namespace used in the Data Director deployment.
- The queries should be used per relay agent and/or mediation group of the worker group wherever applicable, such as KPIs for ingress and egress MPS, failure/success rate, packet drop, etc. The label "worker_group" should be used to filter based on the worker group name in the KPI queries.
- The queries are in PromQL and MQL syntax. Use PromQL for CNE and MQL for OCI-based deployments.
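The notes above can be illustrated with a minimal Python sketch that assembles a PromQL expression filtered by namespace and by the "worker_group" label. This is not a query taken from the User Guide: the helper name `build_rate_query`, the 5-minute window, and the example namespace and worker group names are assumptions, and the sketch assumes the chosen metric is a counter. MQL for OCI deployments is not covered.

```python
def build_rate_query(metric, namespace, worker_group=None, window="5m"):
    """Return a PromQL expression for the summed per-second rate of a counter.

    Hypothetical helper: it only builds the query string and does not
    contact Prometheus.
    """
    # Always filter on the namespace of the Data Director deployment.
    selectors = ['namespace="' + namespace + '"']
    # Optionally restrict the query to one worker group.
    if worker_group is not None:
        selectors.append('worker_group="' + worker_group + '"')
    label_str = ",".join(selectors)
    return "sum(rate(" + metric + "{" + label_str + "}[" + window + "]))"


# Example values are placeholders, not names from a real deployment.
query = build_rate_query(
    "kafka_stream_processor_node_process_total",
    namespace="ocnadd-deploy",
    worker_group="wg1",
)
print(query)
```

Keeping the namespace and worker group as parameters makes it straightforward to reuse the same expression for each relay agent or mediation group deployment.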
To retrieve the following Diameter KPIs and other supported OCNADD KPIs, see "OCNADD KPIs" section in the Oracle Communications Network Analytics Data Director User Guide.
- ocnadd_ingress_record_count_by_service
- ocnadd_ingress_record_count_total
- ocnadd_ingress_mps_per_service_10mAgg
- ocnadd_ingress_mps_10mAgg
- ocnadd_ingress_mps_per_service_10mAgg_last_24h
- ocnadd_ingress_record_count_per_service_10mAgg_last_24h
- ocnadd_kafka_ingress_record_drop_rate_10minAgg
- ocnadd_kafka_ingress_record_drop_rate_per_service_10minAgg
- ocnadd_ext_kafka_feed_record_total per external feed rate (MPS)
- Memory Usage per POD
- CPU Usage per POD
- Service Status
7.3 Alerts
This section provides information about the OCNADD alerts and their descriptions.
Alerts Interpretation
The table below defines the alert severity interpretation based on the infrastructure.
Table 7-3 Alerts Interpretation
| Alert Severity | Interpretation |
|---|---|
| Critical | Critical |
| Major | Error |
| Minor | Error |
| Warning | Warning |
| Info | Info |
Note:
Alert OIDs are deprecated for OCI deployments.
For information on monitoring the following Diameter alerts and other supported OCNADD alerts, see "OCNADD Alerts" section in the Oracle Communications Network Analytics Data Director User Guide.
- System Level Alerts
- Application Level Alerts
- OCNADD Alert Configuration
- OCNADD configuration when Prometheus is deployed without operator
7.3.1 Adding SNMP Support
OCNADD forwards the Prometheus alerts as Simple Network Management Protocol (SNMP) traps to the southbound SNMP servers. OCNADD uses two SNMP MIB files to generate the traps. The alert manager configuration is modified by updating the alertmanager.yaml file. In the alertmanager.yaml file, the alerts can be grouped based on pod name, alert name, severity, namespace, and so on. The Prometheus alert manager is integrated with the Oracle Communications Cloud Native Core, Cloud Native Environment (CNE) snmp-notifier service. The external SNMP servers are set up to receive the Prometheus alerts as SNMP traps. The operator must update the MIB files along with the alert manager file to fetch the SNMP traps in their environment.
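As a rough illustration of the grouping described above, the following Python sketch assembles an alert manager routing structure with a `group_by` list and a webhook receiver. It is a sketch under stated assumptions, not the OCNADD procedure: the helper name, the receiver name, the webhook URL, and the exact label names are hypothetical, and the structure is printed as JSON only for inspection. The supported steps are in the User Guide.

```python
import json


def build_alertmanager_config(notifier_url, group_by):
    """Build a minimal alert manager configuration as a Python dict.

    Hypothetical helper: the receiver name and URL are placeholders.
    """
    return {
        # Alerts sharing the same values for these labels are grouped.
        "route": {"receiver": "snmp-notifier", "group_by": list(group_by)},
        # The webhook forwards grouped alerts to the notifier service.
        "receivers": [
            {"name": "snmp-notifier", "webhook_configs": [{"url": notifier_url}]}
        ],
    }


config = build_alertmanager_config(
    "http://snmp-notifier.example:9464/alerts",  # placeholder URL
    ["namespace", "alertname", "severity", "pod"],  # assumed label names
)
print(json.dumps(config, indent=2))
```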
Note:
- SNMP is not supported on OCI.
- The following procedure requires admin privileges.
Procedures:
- Alert Manager Configuration
- Integrating with snmp-notifier service
- Verifying SNMP notification
- OCNADD MIB FILES
To configure the alert manager, see "Alert Manager Configuration" section in the Oracle Communications Network Analytics Data Director User Guide.
7.4 Alarms
This section provides information on all the alarms generated by OCNADD.
Alarm Types
The following table depicts the OCNADD alarm types and their ranges:
Table 7-4 Alarm Types
| Alarm Type | Description | Range |
|---|---|---|
| SECURITY | Security Violation | 1000–1999 |
| COMMUNICATION | Communication Failure | 2000–2999 |
| QOS | Quality Of Service | 3000–3999 |
| PROCESSING_ERROR | Processing Error | 4000–4999 |
| OPERATIONAL_ALARMS | Operational Alarms | 5000–5999 |
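The ranges in the table can be expressed as a small lookup. The following Python sketch assumes that the Range column refers to numeric alarm identifiers; the function name `alarm_type_for` is hypothetical and is not part of OCNADD.

```python
# Alarm type ranges from Table 7-4; range() excludes the upper bound,
# so range(1000, 2000) covers 1000 through 1999.
ALARM_TYPE_RANGES = {
    "SECURITY": range(1000, 2000),
    "COMMUNICATION": range(2000, 3000),
    "QOS": range(3000, 4000),
    "PROCESSING_ERROR": range(4000, 5000),
    "OPERATIONAL_ALARMS": range(5000, 6000),
}


def alarm_type_for(alarm_id):
    """Return the alarm type whose range contains alarm_id, or None."""
    for alarm_type, id_range in ALARM_TYPE_RANGES.items():
        if alarm_id in id_range:
            return alarm_type
    return None


print(alarm_type_for(2001))
```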
Note:
Alarm Purge or Clear Criteria
The raised alarm persists in the database and is cleared or purged when either of the following conditions is met:
- The corresponding service sends a clear alarm request to the Alarm service.
- The configured purge alarm timeout expires. By default, the timeout is 7 days.
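The two conditions can be summarized in a short Python sketch. It assumes that the purge timeout is measured from the time the alarm was raised, which this section does not state explicitly; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

# Default purge alarm timeout stated in this section.
DEFAULT_PURGE_TIMEOUT = timedelta(days=7)


def is_alarm_removed(raised_at, now, clear_requested,
                     purge_timeout=DEFAULT_PURGE_TIMEOUT):
    """Return True if the alarm is cleared or purged.

    Assumption: the timeout is counted from raised_at.
    """
    # Condition 1: the owning service sent a clear alarm request.
    if clear_requested:
        return True
    # Condition 2: the configured purge alarm timeout has expired.
    return now - raised_at >= purge_timeout


raised = datetime(2024, 1, 1, 12, 0, 0)
print(is_alarm_removed(raised, raised + timedelta(days=3), clear_requested=False))
print(is_alarm_removed(raised, raised + timedelta(days=8), clear_requested=False))
```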
For information on using the following, see "OCNADD Alarms" section in the Oracle Communications Network Analytics Data Director User Guide:
- OCNADD OIDs
- Alarm Type
- Communication Failure Alarms
- Processing Error Alarms
- Operational Alarms