7 Metrics, KPIs, Alerts, and Alarms

This chapter describes the metrics, KPIs, alerts, and alarms used by OCNADD.

7.1 Metrics, Dimensions, and Common Attributes

This section defines the metrics, dimensions, and attributes used by OCNADD.

7.1.1 Dimensions and Common Attributes

This section describes the dimensions and common attributes of OCNADD metrics.

Dimensions

The following table describes the dimensions of OCNADD metrics.

Table 7-1 Dimensions

Dimension | Values / Type | Description
quantile | Integer values | Captures latency values in the following ranges: 10 ms, 20 ms, 40 ms, 80 ms, 100 ms, 200 ms, 500 ms, 1000 ms, and 5000 ms.
instance_identifier | Prefix configured in Helm, UNKNOWN | Prefix of the pod configured in Helm when there are multiple instances in the same deployment.
processor_node_id | - | Stream processor node ID in the aggregation service.
serviceId | serviceType-N | Identifier of the service instance used for registration with the health monitoring service.
serviceType | CONFIGURATION, ALARM, OCNADD-ADMIN, AGGREGATION-DIAMETER, CORRELATION-DIAMETER | The OCNADD service type.
service | ocnaddadminservice, ocnaddconfiguration, ocnaddhealthmonitoring, ocnadddiameteraggregation, ocnadddiametercorrelation | The name of the Data Director microservice.
request_type | Diameter Correlation | Type of the data feed created using REST; used to identify whether the xDR feed is for HTTP2 or Diameter.
nf_feed_type | VCOLLECTOR | The source NF for the feed, or the name of the Diameter data provider.
correlation-id | - | Taken from the correlation-id in the metadata list.
way | - | Taken from the message-direction in the metadata list.
srcIP | - | The source IP address from the metadata list of the Diameter message sent by vCollector.
dstIP | - | The destination IP address from the metadata list of the Diameter message sent by vCollector.
srcPort | - | The source port from the metadata list of the Diameter message sent by vCollector.
dstPort | - | The destination port from the metadata list of the Diameter message sent by vCollector.
worker_group | String | Name of the worker group in which the corresponding traffic processing services (relay agent and mediation groups) run.
relay_agent_group | String | Name of the relay agent group through which the Diameter message from vCollector is transmitted and in which the processing services run.
mediation_group | String | Name of the mediation group in which the xDR processing services run, allowing third-party applications to consume the processed data.
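As an illustration of the quantile dimension, the following sketch assigns a measured latency to one of the listed ranges. It assumes that each listed value is an inclusive upper bound, similar to the "le" label of a Prometheus histogram. This is an assumption, not documented OCNADD behavior, and the function name is hypothetical.

```python
import bisect
from typing import List, Optional

# Latency range bounds (ms) listed for the quantile dimension in Table 7-1.
LATENCY_BOUNDS_MS: List[int] = [10, 20, 40, 80, 100, 200, 500, 1000, 5000]


def latency_range_ms(latency_ms: float) -> Optional[int]:
    """Return the smallest listed bound that is >= latency_ms (hypothetical helper).

    Assumption: each bound is an inclusive upper limit. Returns None when the
    latency exceeds the largest listed bound (5000 ms).
    """
    if latency_ms < 0:
        raise ValueError("latency must be non-negative")
    # bisect_left returns the index of the first bound that is >= latency_ms.
    idx = bisect.bisect_left(LATENCY_BOUNDS_MS, latency_ms)
    return LATENCY_BOUNDS_MS[idx] if idx < len(LATENCY_BOUNDS_MS) else None
```

Under this assumption, a 15 ms latency falls into the 20 ms range, and a 10 ms latency falls into the 10 ms range.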

Attributes

The following table describes the common attributes of OCNADD metrics.

Table 7-2 Attributes

Attribute | Description
application | The name of the application that the microservice is a part of.
microservice | The name of the microservice.
namespace | The Kubernetes namespace in which the microservice is running.
node | The name of the worker node that the microservice is running on.
pod | The name of the Kubernetes pod.

7.1.2 Metrics

This section provides information about important metrics related to OCNADD.

To retrieve the following Diameter metrics and other supported OCNADD metrics, see the "OCNADD Metrics" section in the Oracle Communications Network Analytics Data Director User Guide.

  • kafka_stream_processor_node_process_total
  • kafka_stream_processor_node_process_rate
  • kafka_stream_task_dropped_records_total
  • kafka_stream_task_dropped_records_rate
  • ocnadd_health_total_alarm_raised_total
  • ocnadd_health_total_alarm_cleared_total
  • ocnadd_health_total_active_number_of_alarm_raised_total
  • ocnadd_ext_kafka_feed_record_total

7.2 KPIs

This section provides information about important KPIs related to OCNADD.

Note:

  • Update the namespace in the KPI queries to match the namespace used in your Data Director deployment.
  • Where applicable, such as for ingress and egress MPS, failure and success rate, and packet drop KPIs, run the queries per relay agent group and/or mediation group of the worker group. Use the "worker_group" label to filter the KPI queries by worker group name.
  • The queries are provided in PromQL and MQL syntax. Use PromQL for CNE-based deployments and MQL for OCI-based deployments.
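To show how the namespace and worker_group filters from the notes above combine in a PromQL query, the following sketch builds a rate expression for a counter metric. The 5m window, the sum() aggregation, the example namespace, and the worker group name are illustrative assumptions. This is not the documented KPI definition, and the function name is hypothetical.

```python
from typing import Optional


def _quote(value: str) -> str:
    # Escape backslashes and double quotes for a PromQL label value.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_rate_query(metric: str, namespace: str,
                     worker_group: Optional[str] = None,
                     window: str = "5m") -> str:
    """Build a PromQL expression summing the per-second rate of a counter.

    Hypothetical helper: the namespace and worker_group label names come from
    the KPI notes; the window and aggregation are illustrative choices.
    """
    matchers = [f"namespace={_quote(namespace)}"]
    if worker_group is not None:
        matchers.append(f"worker_group={_quote(worker_group)}")
    selector = ",".join(matchers)
    # The tripled braces emit literal { } around the label matchers.
    return f"sum(rate({metric}{{{selector}}}[{window}]))"
```

For example, build_rate_query("ocnadd_ext_kafka_feed_record_total", "ocnadd-deploy", "wg1") yields sum(rate(ocnadd_ext_kafka_feed_record_total{namespace="ocnadd-deploy",worker_group="wg1"}[5m])), where "ocnadd-deploy" and "wg1" are placeholder names.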

To retrieve the following Diameter KPIs and other supported OCNADD KPIs, see the "OCNADD KPIs" section in the Oracle Communications Network Analytics Data Director User Guide.

  • ocnadd_ingress_record_count_by_service
  • ocnadd_ingress_record_count_total
  • ocnadd_ingress_mps_per_service_10mAgg
  • ocnadd_ingress_mps_10mAgg
  • ocnadd_ingress_mps_per_service_10mAgg_last_24h
  • ocnadd_ingress_record_count_per_service_10mAgg_last_24h
  • ocnadd_kafka_ingress_record_drop_rate_10minAgg
  • ocnadd_kafka_ingress_record_drop_rate_per_service_10minAgg
  • ocnadd_ext_kafka_feed_record_total (rate per external feed, in MPS)
  • Memory Usage per POD
  • CPU Usage per POD
  • Service Status
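Several of the KPIs above report an MPS value aggregated over 10 minutes. The following sketch shows the underlying arithmetic, assuming that "10mAgg" refers to a 10-minute (600-second) window over a cumulative record counter. It computes the average messages per second from two counter samples. Unlike Prometheus rate() and increase(), it does not compensate for counter resets or extrapolate to the window boundaries. The function name is hypothetical.

```python
def average_mps(count_start: float, count_end: float,
                window_seconds: float = 600.0) -> float:
    """Average messages per second between two samples of a cumulative counter.

    Hypothetical helper. Assumes the counter did not reset inside the window;
    a decrease is rejected rather than corrected.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if count_end < count_start:
        raise ValueError("counter decreased; a reset likely occurred")
    return (count_end - count_start) / window_seconds
```

For example, if the counter grows from 1,200,000 to 1,800,000 records over 10 minutes, the average rate is 600,000 / 600 = 1000 MPS.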

7.3 Alerts

This section provides information about the OCNADD alerts and their descriptions.

Alerts Interpretation

The following table defines how each alert severity is interpreted by the infrastructure.

Table 7-3 Alerts Interpretation

Alert Severity | Interpretation
Critical | Critical
Major | Error
Minor | Error
Warning | Warning
Info | Info

Note:

Alert OIDs are deprecated for OCI deployments.

For information on the following topics, including how to monitor Diameter alerts and other supported OCNADD alerts, see the "OCNADD Alerts" section in the Oracle Communications Network Analytics Data Director User Guide.

  • System Level Alerts
  • Application Level Alerts
  • OCNADD Alert Configuration
  • OCNADD configuration when Prometheus is deployed without operator

7.3.1 Adding SNMP Support

OCNADD forwards Prometheus alerts as Simple Network Management Protocol (SNMP) traps to southbound SNMP servers and uses two SNMP MIB files to generate the traps. To modify the alert manager configuration, update the alertmanager.yaml file. In this file, alerts can be grouped by pod name, alert name, severity, namespace, and so on. The Prometheus alert manager is integrated with the snmp-notifier service of Oracle Communications Cloud Native Core, Cloud Native Environment (CNE), and the external SNMP servers are set up to receive the Prometheus alerts as SNMP traps. To receive the SNMP traps in their environment, operators must update the MIB files along with the alert manager configuration file.
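The following sketch models alert grouping by labels. Alerts that share the same values for every label in the group-by list end up in one group, and therefore in one notification. This is a conceptual model of the group_by setting in the alert manager configuration, not the alert manager's implementation. The alert names and namespaces in the usage example are hypothetical, and treating a missing label as an empty string is a simplification made for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # An alert represented by its label set.


def group_alerts(alerts: List[Alert],
                 group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Group alerts whose values match for every label in group_by."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        # The group key is the tuple of values for the group-by labels.
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Usage with hypothetical alerts.
alerts = [
    {"alertname": "ExampleAlertA", "namespace": "ns1", "pod": "pod-1"},
    {"alertname": "ExampleAlertA", "namespace": "ns1", "pod": "pod-2"},
    {"alertname": "ExampleAlertB", "namespace": "ns1", "pod": "pod-1"},
]
grouped = group_alerts(alerts, ["alertname", "namespace"])
```

With alertname and namespace as the group-by labels, the two ExampleAlertA alerts form one group even though they come from different pods. Adding pod to the list would split them into separate groups.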

Note:

  • SNMP is not supported on OCI.
  • The following procedures require admin privileges.

Procedures:

  • Alert Manager Configuration
  • Integrating with snmp-notifier service
  • Verifying SNMP notification
  • OCNADD MIB FILES

To configure the alert manager, see the "Alert Manager Configuration" section in the Oracle Communications Network Analytics Data Director User Guide.

7.4 Alarms

This section provides information on all the alarms generated by OCNADD.

Alarm Types

The following table lists the OCNADD alarm types and their ranges:

Table 7-4 Alarm Types

Alarm Type | Description | Range
SECURITY | Security Violation | 1000–1999
COMMUNICATION | Communication Failure | 2000–2999
QOS | Quality of Service | 3000–3999
PROCESSING_ERROR | Processing Error | 4000–4999
OPERATIONAL_ALARMS | Operational Alarms | 5000–5999
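The following sketch maps a numeric alarm code to its alarm type using the ranges in Table 7-4. It assumes that the number being classified is the numeric part of the alarm identifier and that both range bounds are inclusive. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Alarm type ranges from Table 7-4 (inclusive bounds).
ALARM_TYPE_RANGES: List[Tuple[int, int, str]] = [
    (1000, 1999, "SECURITY"),
    (2000, 2999, "COMMUNICATION"),
    (3000, 3999, "QOS"),
    (4000, 4999, "PROCESSING_ERROR"),
    (5000, 5999, "OPERATIONAL_ALARMS"),
]


def alarm_type_for(code: int) -> Optional[str]:
    """Return the alarm type whose range contains code, or None if no range matches."""
    for low, high, alarm_type in ALARM_TYPE_RANGES:
        if low <= code <= high:
            return alarm_type
    return None
```

For example, a code of 2001 falls in the COMMUNICATION range, while 999 and 6000 fall outside all documented ranges.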

Note:

Alarm Purge or Clear Criteria

A raised alarm persists in the database until it is cleared or purged, which happens when either of the following conditions is met:

  • The corresponding service sends a clear alarm request to the Alarm service.
  • The configured purge alarm timeout expires. By default, the timeout is 7 days.
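The following sketch expresses the two clear or purge conditions as a single check. It uses the documented 7-day default as the purge timeout. The function and parameter names are hypothetical. This sketch also treats an alarm as expired once its age reaches the timeout, which is an assumption about how the boundary case is handled.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_PURGE_TIMEOUT = timedelta(days=7)  # Documented default purge timeout.


def should_remove(raised_at: datetime, now: datetime, clear_requested: bool,
                  purge_timeout: timedelta = DEFAULT_PURGE_TIMEOUT) -> bool:
    """Return True if the alarm should be cleared or purged (hypothetical helper).

    Condition 1: the owning service sent a clear alarm request.
    Condition 2: the alarm's age has reached the purge timeout.
    """
    return clear_requested or (now - raised_at) >= purge_timeout


raised = datetime(2024, 1, 1, tzinfo=timezone.utc)
```

For example, an alarm raised at the time shown above and not explicitly cleared is kept on January 5, 2024, and purged by January 8, 2024, once 7 days have elapsed.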

For information on the following, see the "OCNADD Alarms" section in the Oracle Communications Network Analytics Data Director User Guide:

  • OCNADD OIDs
  • Alarm Type
  • Communication Failure Alarms
  • Processing Error Alarms
  • Operational Alarms