9 OCNADD KPIs

This section provides information about Key Performance Indicators (KPIs) used for Oracle Communications Network Analytics Data Director (OCNADD).

Note:

The "namespace" in the KPIs should be updated to reflect the current namespace used in the OCNADD deployment.

Use the queries per worker group wherever applicable, for example for the ingress and egress MPS, failure or success rate, and packet drop KPIs. To restrict a KPI query to one worker group, filter on the "worker_group" label using the worker group name.
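As an illustration of the note above, the following Python sketch fills in the "$NAMESPACE" placeholder and adds a "worker_group" matcher to a KPI query. The helper name "scope_query" and the values "ocnadd-deploy" and "wg1" are hypothetical; only the "namespace" and "worker_group" label names come from this section.

```python
from typing import Optional


def scope_query(template: str, namespace: str, worker_group: Optional[str] = None) -> str:
    """Return the KPI query with the namespace filled in.

    If worker_group is given, a worker_group matcher is added next to the
    namespace matcher so the query only selects that worker group.
    """
    matcher = f'namespace="{namespace}"'
    if worker_group is not None:
        matcher += f',worker_group="{worker_group}"'
    # Most KPI queries in this section use this placeholder form; queries
    # that use another variable (for example "$Namespace") are left as-is.
    return template.replace('namespace="$NAMESPACE"', matcher)


# Query template taken from the ocnadd_ingress_mps_5mAgg KPI.
INGRESS_MPS = (
    'sum(rate(kafka_stream_processor_node_process_total'
    '{namespace="$NAMESPACE",service=~".*aggregation.*"}[5m]))'
)

if __name__ == "__main__":
    # "ocnadd-deploy" and "wg1" are placeholder values, not product defaults.
    print(scope_query(INGRESS_MPS, "ocnadd-deploy", "wg1"))
```

Calling the helper without a worker group only substitutes the namespace, which may suit deployments that use a single default worker group.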

The following KPIs are added in OCNADD 23.4.0.0.1.

Table 9-1 ocnadd_ingress_record_count_by_service

KPI Detail Measures the total ingress records in the Kafka source topics per aggregation service at the current time
Metric Used for the KPI sum by (service)(kafka_stream_processor_node_process_total{namespace="$NAMESPACE", service=~".*aggregation.*"})
Service Operation NA
Response Code NA

Table 9-2 ocnadd_ingress_record_count_total

KPI Detail Measures the total ingress records in the Kafka source topics at the current time
Metric Used for the KPI sum (kafka_stream_processor_node_process_total{namespace="$NAMESPACE", service=~".*aggregation.*"})
Service Operation NA
Response Code NA

Table 9-3 ocnadd_ingress_mps_per_service_5mAgg

KPI Detail Measures the ingress MPS per service aggregated over 5min
Metric Used for the KPI sum by (service)(rate(kafka_stream_processor_node_process_total{namespace="$NAMESPACE",service=~".*aggregation.*"}[5m]))
Service Operation NA
Response Code NA

Table 9-4 ocnadd_ingress_mps_5mAgg

KPI Detail Measures the ingress MPS aggregated over 5min
Metric Used for the KPI sum(rate(kafka_stream_processor_node_process_total{namespace="$NAMESPACE",service=~".*aggregation.*"}[5m]))
Service Operation NA
Response Code NA
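The MPS KPIs above rely on the PromQL rate() function, which reports the average per-second increase of a counter over the selected window. The following Python sketch shows the underlying arithmetic on hypothetical counter samples. It is a simplification: Prometheus also handles counter resets and extrapolates to the window boundaries, which this sketch does not.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample.

    Simplification: no counter-reset handling and no extrapolation to the
    window boundaries, both of which Prometheus rate() performs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


if __name__ == "__main__":
    # Hypothetical record-count samples spanning a 5-minute (300 s) window.
    window = [(0.0, 1_000_000.0), (150.0, 1_750_000.0), (300.0, 2_500_000.0)]
    print(simple_rate(window))  # messages per second (MPS)
```

With these sample values, 1,500,000 records arrive in 300 seconds, so the sketch reports 5000 MPS.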

Table 9-5 ocnadd_ingress_mps_per_service_5mAgg_last_24h

KPI Detail Measures the ingress MPS per service aggregated over 5min for last 24 hours
Metric Used for the KPI sum by (service)(rate(kafka_stream_processor_node_process_total{namespace="$NAMESPACE",service=~".*aggregation.*"}[5m]))[24h:5m]
Service Operation NA
Response Code NA

Table 9-6 ocnadd_ingress_record_count_per_service_5mAgg_last_24h

KPI Detail Measures the ingress messages per service aggregated over 5min for last 24 hours
Metric Used for the KPI sum by (service)(increase(kafka_stream_processor_node_process_total{namespace="$NAMESPACE",service=~".*aggregation.*"}[5m]))[24h:5m]
Service Operation NA
Response Code NA

Table 9-7 ocnadd_kafka_ingress_record_drop_rate_5minAgg

KPI Detail Measures the total ingress message drop rate aggregated over 5min
Metric Used for the KPI sum(rate(kafka_stream_task_dropped_records_total{namespace="$NAMESPACE",service=~".*aggregation.*"}[5m]))
Service Operation NA
Response Code NA

Table 9-8 ocnadd_kafka_ingress_record_drop_rate_per_service_5minAgg

KPI Detail Measures the total ingress message drop rate per service and pod aggregated over 5min
Metric Used for the KPI sum(rate(kafka_stream_task_dropped_records_total{namespace="$NAMESPACE",service=~".*aggregation.*"}[5m])) by (service,pod)
Service Operation NA
Response Code NA

Table 9-9 ocnadd_egress_request_count_total_by_3rdparty_destination_endpoint

KPI Detail Total egress requests per 3rd party application per destination endpoint
Metric Used for the KPI sum by (instance_identifier,destination_endpoint)(ocnadd_egress_requests_total{namespace="$NAMESPACE"})
Service Operation POST
Response Code NA

Table 9-10 ocnadd_egress_response_count_total_by_3rdparty_destination_endpoint

KPI Detail Total egress responses per 3rd party application per destination endpoint
Metric Used for the KPI sum by (instance_identifier,destination_endpoint)(ocnadd_egress_responses_total{namespace="$NAMESPACE"})
Service Operation POST
Response Code NA

Table 9-11 ocnadd_egress_failure_count_total_by_3rdparty_destination_endpoint

KPI Detail Total egress failure count per 3rd party application per destination endpoint
Metric Used for the KPI sum by (destination_endpoint,instance_identifier)(ocnadd_egress_failed_request_total{namespace="$NAMESPACE"})
Service Operation POST
Response Code NA

Table 9-12 ocnadd_egress_request_rate_by_3rdparty_5mAgg

KPI Detail Total egress request rate per 3rd party application in 5min Aggregation
Metric Used for the KPI sum by (instance_identifier)(rate(ocnadd_egress_requests_total{namespace="$NAMESPACE"}[5m]))
Service Operation POST
Response Code NA

Table 9-13 ocnadd_egress_failure_rate_by_3rdparty_5mAgg

KPI Detail Total egress failure rate per 3rd party application in 5min Aggregation
Metric Used for the KPI

sum by (instance_identifier)(irate(ocnadd_egress_failed_request_total{namespace="$NAMESPACE"}[5m]))

/

sum by (instance_identifier) (irate(ocnadd_egress_requests_total{namespace="$NAMESPACE"}[5m]))

Service Operation POST
Response Code NA
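The failure rate query above divides the failed request rate by the total request rate for each "instance_identifier". The following Python sketch reproduces that division on hypothetical per-application rates. The function name and the sample values are assumptions for illustration. As in PromQL, only identifiers present on both sides produce a result, and the result is a fraction between 0 and 1 rather than a percentage.

```python
from typing import Dict


def failure_ratio(failed: Dict[str, float], total: Dict[str, float]) -> Dict[str, float]:
    """Divide failed-request rates by total-request rates per identifier.

    Keys stand for instance_identifier values. Identifiers missing from
    either side are dropped, mirroring PromQL one-to-one vector matching.
    """
    ratios = {}
    for key, failed_rate in failed.items():
        if key not in total:
            continue
        total_rate = total[key]
        if total_rate == 0:
            # Floating-point division semantics: 0/0 is NaN, x/0 is +Inf.
            ratios[key] = float("nan") if failed_rate == 0 else float("inf")
        else:
            ratios[key] = failed_rate / total_rate
    return ratios


if __name__ == "__main__":
    # Hypothetical per-second rates keyed by 3rd party application.
    failed = {"app-a": 2.0, "app-b": 0.0}
    total = {"app-a": 100.0, "app-b": 50.0, "app-c": 10.0}
    print(failure_ratio(failed, total))
```

To display the KPI as a percentage, multiply the resulting fraction by 100.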

Table 9-14 ocnadd_egress_failure_rate_by_3rdparty_per_destination_endpoint_5mAgg

KPI Detail Total egress failure rate per 3rd party application per destination endpoint in 5min Aggregation
Metric Used for the KPI

sum by (instance_identifier, destination_endpoint)(irate(ocnadd_egress_failed_request_total{namespace="$NAMESPACE"}[5m]))

/

sum by (instance_identifier, destination_endpoint) (irate(ocnadd_egress_requests_total{namespace="$NAMESPACE"}[5m]))

Service Operation POST
Response Code NA

Table 9-15 ocnadd_e2e_avg_record_latency_by_3rdparty

KPI Detail Total e2e average latency per 3rd party application in 5min Aggregation
Metric Used for the KPI

(sum (irate(ocnadd_egress_e2e_request_processing_latency_seconds_sum{namespace="$NAMESPACE"}[5m])) by (instance_identifier)

/

(sum (irate(ocnadd_egress_e2e_request_processing_latency_seconds_count{namespace="$NAMESPACE"}[5m])) by (instance_identifier)))

Service Operation POST
Response Code NA
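The average latency queries in this section divide the rate of a "_sum" series by the rate of the matching "_count" series. Assuming both series are sampled at the same timestamps, the time term cancels and the result is the added latency divided by the number of added records. The following Python sketch shows this arithmetic with hypothetical sample values.

```python
def average_latency(sum_start: float, sum_end: float,
                    count_start: float, count_end: float) -> float:
    """Average latency in seconds per record between two samples.

    sum_* are values of a latency "_sum" series (seconds) and count_* are
    values of the matching "_count" series, taken at the same two instants.
    """
    records = count_end - count_start
    if records == 0:
        return float("nan")  # no records in the interval
    return (sum_end - sum_start) / records


if __name__ == "__main__":
    # Hypothetical samples: 600 new records added 30 s of total latency.
    print(average_latency(120.0, 150.0, 10_000.0, 10_600.0))
```

With these values the average is 30 / 600 = 0.05 seconds per record.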

Table 9-16 ocnadd_e2e_avg_record_latency_by_3rdparty_per_adapter_pod

KPI Detail Total e2e average latency per 3rd party application per egress adapter POD in 5min Aggregation
Metric Used for the KPI

(sum (irate(ocnadd_egress_e2e_request_processing_latency_seconds_sum{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod)

/

(sum (irate(ocnadd_egress_e2e_request_processing_latency_seconds_count{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod)))

Service Operation POST
Response Code NA

Table 9-17 ocnadd_egress_adapter_processing_avg_record_latency_by_3rdparty_per_adapter_pod

KPI Detail Total service processing average latency per 3rd party application per adapter POD in 5min Aggregation
Metric Used for the KPI

(sum (irate(ocnadd_egress_service_request_processing_latency_seconds_sum{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod)

/

(sum (irate(ocnadd_egress_service_request_processing_latency_seconds_count{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod)))

Service Operation POST
Response Code NA

Table 9-18 ocnadd_egress_adapter_request_processing_avg_record_latency_by_3rdparty_per_adapter_pod

KPI Detail Total request processing average latency per 3rd party application per adapter POD in 5min Aggregation. This includes the network latency added by the response from the 3rd party application.
Metric Used for the KPI

(sum (irate(ocnadd_egress_request_latency_seconds_sum{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod)

/

(sum (irate(ocnadd_egress_request_latency_seconds_count{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod)))

Service Operation POST
Response Code NA

Table 9-19 ocnadd_egress_e2e_avg_latency_95percentile_for_a_given_egress_adapter

KPI Detail The 95th percentile of the e2e latency in milliseconds for a given egress adapter, calculated over a period of 5min
Metric Used for the KPI histogram_quantile(0.95, sum(rate(ocnadd_egress_e2e_request_processing_latency_seconds_bucket{namespace="$namespaces",service="$servicename"}[5m])) by (le))
Service Operation POST
Response Code NA
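The percentile query above uses histogram_quantile(), which estimates a quantile from cumulative "le" buckets by linear interpolation inside the bucket that contains the target rank. The following Python sketch is a simplified illustration of that approach, not the Prometheus implementation. The bucket bounds and counts are hypothetical. It also assumes that the latency is recorded in seconds, as the "_seconds" metric suffix suggests, so a conversion is shown for displaying milliseconds.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative buckets.

    Simplified sketch: requires 0 < q <= 1, at least two buckets, and a
    final +Inf bucket holding the total observation count.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets ending with +Inf")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")  # no observations
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, prev_count = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


if __name__ == "__main__":
    # Hypothetical bucket rates: bounds in seconds, cumulative counts.
    sample = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
    p95_seconds = histogram_quantile(0.95, sample)
    print(p95_seconds * 1000, "ms")
```

In this example the rank 95 falls halfway through the 0.5 to 1.0 second bucket, so the estimate is 0.75 seconds. The accuracy of the estimate depends on how finely the bucket bounds are spaced around the percentile of interest.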

The following KPI should be used in the context of both the management group and the worker group. The namespaces of the management group and the worker group may differ if there is no default worker group.

Table 9-20 Memory Usage per POD

KPI Detail Measures the memory usage per POD
Metric Used for the KPI sum(container_memory_working_set_bytes{namespace=~"$Namespace",image!=""}/(1024*1024*1024)) by (pod)
Service Operation NA
Response Code NA

The following KPI should also be used in the context of both the management group and the worker group. The namespaces of the management group and the worker group may differ if there is no default worker group.

Table 9-21 CPU Usage per POD

KPI Detail Measures the CPU usage per POD
Metric Used for the KPI sum(rate(container_cpu_usage_seconds_total{namespace=~"$Namespace",image!=""}[2m])) by (pod) * 1000
Service Operation NA
Response Code NA
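The memory and CPU usage queries above apply unit conversions: the memory query divides bytes by 1024*1024*1024 to report GiB, and the CPU query multiplies the per-second CPU time (in cores) by 1000 to report millicores. The following Python sketch shows the same conversions. The function names and sample values are hypothetical.

```python
BYTES_PER_GIB = 1024 * 1024 * 1024


def bytes_to_gib(working_set_bytes: float) -> float:
    """Convert a memory working set from bytes to GiB."""
    return working_set_bytes / BYTES_PER_GIB


def cores_to_millicores(cpu_cores: float) -> float:
    """Convert CPU usage from cores (CPU seconds per second) to millicores."""
    return cpu_cores * 1000


if __name__ == "__main__":
    # Hypothetical pod using 1,610,612,736 bytes and a quarter of a core.
    print(bytes_to_gib(1_610_612_736))  # GiB
    print(cores_to_millicores(0.25))    # millicores
```

These units line up with the way Kubernetes memory and CPU requests and limits are commonly expressed, which makes the KPI values easy to compare against the configured pod resources.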

Table 9-22 Service Status

KPI Detail Provides the status of each Data Director service running in the specified namespace
Metric Used for the KPI up{namespace="$NAMESPACE"}
Service Operation NA
Response Code NA
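The "up" metric has the value 1 when Prometheus scraped a target successfully and 0 when the scrape failed. The following Python sketch lists the targets reported as down from the JSON body of a Prometheus instant query. The function name, the sample response, and its label values are hypothetical, and the labels attached to "up" depend on the scrape configuration of the deployment.

```python
from typing import Any, Dict, List


def down_targets(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the label sets of all series whose "up" value is 0."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    down = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # the value is a string, e.g. "1"
        if float(value) == 0.0:
            down.append(series["metric"])
    return down


if __name__ == "__main__":
    # Hypothetical instant-query response for up{namespace="ocnadd-deploy"}.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"job": "svc-a", "namespace": "ocnadd-deploy"},
                 "value": [1700000000.0, "1"]},
                {"metric": {"job": "svc-b", "namespace": "ocnadd-deploy"},
                 "value": [1700000000.0, "0"]},
            ],
        },
    }
    print(down_targets(sample))
```

An empty list means every scraped target in the namespace reported as up at query time.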

Table 9-23 ocnadd_ext_kafka_feed_record_total per external feed rate(MPS)

KPI Detail The rate of messages consumed per second per external Kafka consumer, calculated over a period of 5min
Metric Used for the KPI sum(irate(ocnadd_ext_kafka_feed_record_total{namespace="$Namespace"}[5m])) by (feed_name)
Service Operation NA
Response Code NA