Metrics, KPIs, Alerts, and Alarms

7.1 Metrics, Dimensions, and Common Attributes

This section defines the metrics, dimensions, and attributes used by OCNADD.

7.1.1 Dimensions and Common Attributes

This section includes information about Dimensions and Common Attributes of metrics for OCNADD.

Dimensions

The following table includes information about dimensions of OCNADD.

Table 7-1 Dimensions

Dimension	Values / Type	Description
quantile	Integer values	Captures latency values in the following ranges: 10 ms, 20 ms, 40 ms, 80 ms, 100 ms, 200 ms, 500 ms, 1000 ms, and 5000 ms.
instance_identifier	Prefix configured in Helm, UNKNOWN	Prefix of the pod configured in Helm when there are multiple instances in the same deployment.
processor_node_id	–	Stream processor node ID in the aggregation service.
serviceId	serviceType-N	Identifier for the service instance used for registration with the health monitoring service.
serviceType	CONFIGURATION, ALARM, OCNADD-ADMIN, AGGREGATION-DIAMETER, CORRELATION-DIAMETER	The OCNADD service type.
service	ocnaddadminservice, ocnaddconfiguration, ocnaddhealthmonitoring, ocnadddiameteraggregation, ocnadddiamtercorrelation	The name of the Data Director microservice.
request_type	Diameter Correlation	Type of the data feed created using REST. It identifies whether the xDR feed is for HTTP2 or Diameter.
nf_feed_type	VCOLLECTOR	The source NF for the feed or the name of the Diameter data provider.
correlation-id	–	Taken from the correlation-id present in the metadata list.
way	–	Taken from the message-direction present in the metadata list.
srcIP	–	Obtained from the source IP address present in the metadata list of the Diameter message sent by vCollector.
dstIP	–	Obtained from the destination IP address present in the metadata list of the Diameter message sent by vCollector.
srcPort	–	Obtained from the source port present in the metadata list of the Diameter message sent by vCollector.
dstPort	–	Obtained from the destination port present in the metadata list of the Diameter message sent by vCollector.
worker_group	String	Name of the worker group in which the corresponding traffic processing services (relay agent and mediation groups) are running.
relay_agent_group	String	The name of the relay agent group through which the Diameter message from vCollector is transmitted and where processing services are running.
mediation_group	String	The name of the mediation group where xDR processing services are running, allowing third-party applications to consume the processed data.

Attributes

The following table includes information about common attributes of OCNADD.

Table 7-2 Attributes

Attribute	Description
application	The name of the application that the microservice is a part of.
microservice	The name of the microservice.
namespace	The Kubernetes namespace in which the microservice is running.
node	The name of the worker node that the microservice is running on.
pod	The name of the Kubernetes pod.

7.1.2 Metrics

This section provides information about important metrics related to OCNADD.

To retrieve the following Diameter metrics and other supported OCNADD metrics, see the "OCNADD Metrics" section in the Oracle Communications Network Analytics Data Director User Guide.

kafka_stream_processor_node_process_total
kafka_stream_processor_node_process_rate
kafka_stream_task_dropped_records_total
kafka_stream_task_dropped_records_rate
ocnadd_health_total_alarm_raised_total
ocnadd_health_total_alarm_cleared_total
ocnadd_health_total_active_number_of_alarm_raised_total
ocnadd_ext_kafka_feed_record_total

7.2 KPIs

This section provides information about important KPIs related to OCNADD.

Note:

The namespace in the KPIs should be updated to reflect the current namespace used in the Data Director deployment.
The queries should be used per relay agent and/or mediation group of the worker group wherever applicable, such as KPIs for ingress and egress MPS, failure/success rate, packet drop, etc. The label "worker_group" should be used to filter based on the worker group name in the KPI queries.
The queries are in PromQL and MQL syntax. Use PromQL for CNE and MQL for OCI-based deployments.
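
The namespace and worker_group notes above can be illustrated with a small helper that assembles a PromQL expression. This is a minimal sketch, not a query taken from the User Guide: the function name build_rate_query, the 10-minute window, and the example values ocnadd-deploy and wg1 are assumptions. Only the metric name and the namespace and worker_group labels come from this chapter. It covers PromQL only; MQL queries for OCI are not shown.

```python
def build_rate_query(metric, namespace, worker_group=None, window="10m"):
    """Return a PromQL expression for the summed per-second rate of a counter.

    Hypothetical helper: the label names come from this chapter, but the
    query shape is an illustrative assumption, not an official KPI query.
    """
    labels = {"namespace": namespace}
    if worker_group is not None:
        # Filter on the worker group only when one is given.
        labels["worker_group"] = worker_group
    selector = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return f"sum(rate({metric}{{{selector}}}[{window}]))"


query = build_rate_query(
    "kafka_stream_processor_node_process_total",
    namespace="ocnadd-deploy",  # assumed namespace; replace with your own
    worker_group="wg1",         # assumed worker group name
)
print(query)
```

Keeping the namespace as a required argument reflects the first note: a query copied without updating the namespace would silently match nothing in a different deployment.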

To retrieve the following Diameter KPIs and other supported OCNADD KPIs, see the "OCNADD KPIs" section in the Oracle Communications Network Analytics Data Director User Guide.

ocnadd_ingress_record_count_by_service
ocnadd_ingress_record_count_total
ocnadd_ingress_mps_per_service_10mAgg
ocnadd_ingress_mps_10mAgg
ocnadd_ingress_mps_per_service_10mAgg_last_24h
ocnadd_ingress_record_count_per_service_10mAgg_last_24h
ocnadd_kafka_ingress_record_drop_rate_10minAgg
ocnadd_kafka_ingress_record_drop_rate_per_service_10minAgg
ocnadd_ext_kafka_feed_record_total per external feed rate (MPS)
Memory Usage per POD
CPU Usage per POD
Service Status

7.3 Alerts

This section provides information about the OCNADD alerts and their descriptions.

Alerts Interpretation

The table below defines the alert severity interpretation based on the infrastructure.

Table 7-3 Alerts Interpretation

Alert Severity	Interpretation
Critical	Critical
Major	Error
Minor	Error
Warning	Warning
Info	Info
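
The severity interpretation in Table 7-3 can be expressed as a simple lookup, which may be useful when post-processing alerts in a script. This is a minimal sketch: the mapping values are copied from the table, while the names SEVERITY_INTERPRETATION and interpret_severity and the case-insensitive matching are assumptions made for illustration.

```python
# Mapping copied from Table 7-3 (Alert Severity -> Interpretation).
SEVERITY_INTERPRETATION = {
    "Critical": "Critical",
    "Major": "Error",
    "Minor": "Error",
    "Warning": "Warning",
    "Info": "Info",
}


def interpret_severity(severity):
    """Return the interpretation for an alert severity (hypothetical helper)."""
    key = severity.strip().capitalize()  # assumed: accept any letter case
    if key not in SEVERITY_INTERPRETATION:
        raise ValueError(f"unknown alert severity: {severity!r}")
    return SEVERITY_INTERPRETATION[key]


print(interpret_severity("major"))
```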

Note:

Alert OIDs are deprecated for OCI deployments.

For information on monitoring the following Diameter alerts and other supported OCNADD alerts, see the "OCNADD Alerts" section in the Oracle Communications Network Analytics Data Director User Guide.

System Level Alerts
Application Level Alerts
OCNADD Alert Configuration
OCNADD configuration when Prometheus is deployed without operator

7.3.1 Adding SNMP Support

OCNADD forwards the Prometheus alerts as Simple Network Management Protocol (SNMP) traps to the southbound SNMP servers. OCNADD uses two SNMP MIB files to generate the traps. The alert manager configuration is modified by updating the alertmanager.yaml file. In the alertmanager.yaml file, the alerts can be grouped based on pod name, alert name, severity, namespace, and so on. The Prometheus alert manager is integrated with the Oracle Communications Cloud Native Core, Cloud Native Environment (CNE) snmp-notifier service. The external SNMP servers are set up to receive the Prometheus alerts as SNMP traps. The operator must update the MIB files along with the alert manager file to fetch the SNMP traps in their environment.

Note:

SNMP is not supported on OCI.
The following procedure requires admin privileges.

Procedures:

Alert Manager Configuration
Integrating with snmp-notifier service
Verifying SNMP notification
OCNADD MIB FILES

To configure the alert manager, see the "Alert Manager Configuration" section in the Oracle Communications Network Analytics Data Director User Guide.

7.4 Alarms

This section provides information on all the alarms generated by OCNADD.

Alarm Types

The following table depicts the OCNADD alarm types and their ranges:

Table 7-4 Alarm Types

Alarm Type	Description	Range
SECURITY	Security Violation	1000–1999
COMMUNICATION	Communication Failure	2000–2999
QOS	Quality Of Service	3000–3999
PROCESSING_ERROR	Processing Error	4000–4999
OPERATIONAL_ALARMS	Operational Alarms	5000–5999
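
The ranges in Table 7-4 lend themselves to a range lookup. The following is a minimal sketch, assuming the Range column refers to numeric alarm identifiers. The names ALARM_TYPE_RANGES and alarm_type_for_code are hypothetical and are not part of OCNADD.

```python
# Ranges copied from Table 7-4; both bounds are treated as inclusive.
ALARM_TYPE_RANGES = [
    ("SECURITY", 1000, 1999),
    ("COMMUNICATION", 2000, 2999),
    ("QOS", 3000, 3999),
    ("PROCESSING_ERROR", 4000, 4999),
    ("OPERATIONAL_ALARMS", 5000, 5999),
]


def alarm_type_for_code(code):
    """Return the alarm type whose range contains code, or None if no range matches."""
    for name, low, high in ALARM_TYPE_RANGES:
        if low <= code <= high:
            return name
    return None


print(alarm_type_for_code(2005))
```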

Note:

Alarm Purge or Clear Criteria

The raised alarm will persist in the database and will be cleared or purged when either of the following conditions is met:

The corresponding service sends a clear alarm request to the Alarm service.
The configured purge alarm timeout expires. The default timeout is 7 days.
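
The two criteria above amount to a simple either-or check. The sketch below illustrates that logic only and does not describe how the Alarm service is implemented: the function name should_remove_alarm, its parameters, and the choice to measure the timeout from the time the alarm was raised are assumptions. The 7-day default comes from the text.

```python
from datetime import datetime, timedelta

# Default purge timeout stated in this section.
DEFAULT_PURGE_TIMEOUT = timedelta(days=7)


def should_remove_alarm(raised_at, now, clear_requested,
                        purge_timeout=DEFAULT_PURGE_TIMEOUT):
    """Return True if an alarm meets either criterion (hypothetical helper).

    clear_requested: the owning service sent a clear alarm request.
    The age check assumes the timeout runs from raised_at.
    """
    expired = (now - raised_at) >= purge_timeout
    return clear_requested or expired


raised = datetime(2024, 1, 1, 12, 0, 0)
print(should_remove_alarm(raised, datetime(2024, 1, 3, 12, 0, 0), False))
print(should_remove_alarm(raised, datetime(2024, 1, 9, 12, 0, 0), False))
```

With the default timeout, the first call reports that a two-day-old alarm with no clear request is kept, and the second reports that an eight-day-old alarm is removed.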

For information on using the following, see the "OCNADD Alarms" section in the Oracle Communications Network Analytics Data Director User Guide:

OCNADD OIDs
Alarm Type
Communication Failure Alarms
Processing Error Alarms
Operational Alarms
