8 OCNADD KPIs
Important:
The "namespace" in the KPIs should be updated to the namespace used in the current OCNADD deployment. The configured group message count should be reflected in EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY for the latency KPIs. The following KPIs are added in OCNADD 22.0.0.
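Before a query can be run, its placeholders must be replaced. The following minimal sketch uses Python's standard string.Template with two query templates copied from Tables 8-2 and 8-15. It fills in $NAMESPACE and $EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY and builds a request URL for the Prometheus HTTP API instant-query endpoint (/api/v1/query). The namespace, group message count, and Prometheus address in the usage lines are hypothetical values.

```python
from string import Template
from urllib.parse import urlencode

# Two KPI query templates copied from Table 8-2 and Table 8-15.
# $NAMESPACE and $EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY are the placeholders to fill in.
KPI_TEMPLATES = {
    "ocnadd_ingress_record_count_total": (
        'sum (kafka_stream_processor_node_process_total{namespace="$NAMESPACE"})'
    ),
    "ocnadd_e2e_avg_record_latency_by_3rdparty": (
        '(sum (irate(ocnadd_egressgateway_e2e_request_processing_latency_seconds_sum'
        '{namespace="$NAMESPACE"}[5m])) by (instance_identifier)'
        ' / (sum (irate(ocnadd_egressgateway_e2e_request_processing_latency_seconds_count'
        '{namespace="$NAMESPACE"}[5m])) by (instance_identifier)'
        ' *$EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY ))'
    ),
}


def render_kpi(name: str, namespace: str, group_message_count: int) -> str:
    """Return the PromQL for one KPI with its placeholders substituted."""
    return Template(KPI_TEMPLATES[name]).substitute(
        NAMESPACE=namespace,
        EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY=str(group_message_count),
    )


def instant_query_url(prometheus_base: str, promql: str) -> str:
    """Build a GET URL for the Prometheus HTTP API instant-query endpoint."""
    return prometheus_base.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


if __name__ == "__main__":
    # Hypothetical deployment values, for illustration only.
    q = render_kpi("ocnadd_ingress_record_count_total", "ocnadd-deploy", 10)
    print(q)
    print(instant_query_url("http://prometheus.example:9090", q))
```

The same substitution works for every query in this chapter that uses the $NAMESPACE spelling.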
Table 8-1 ocnadd_ingress_record_count_by_service
KPI Detail | Measures the total number of ingress records in the Kafka source topics per aggregation service at the current time |
---|---|
Metric Used for the KPI | sum by (service)(kafka_stream_processor_node_process_total{namespace="$NAMESPACE"}) |
Service Operation | NA |
Response Code | NA |
Table 8-2 ocnadd_ingress_record_count_total
KPI Detail | Measures the total number of ingress records in the Kafka source topics at the current time |
---|---|
Metric Used for the KPI | sum (kafka_stream_processor_node_process_total{namespace="$NAMESPACE"}) |
Service Operation | NA |
Response Code | NA |
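Tables 8-1 and 8-2 differ only in grouping. sum by (service) keeps one total per value of the service label, while a bare sum collapses every series into a single number. The following minimal Python sketch shows that aggregation over samples that have already been scraped. The label sets and values are hypothetical.

```python
from collections import defaultdict


def sum_by(series, labels):
    """Mimic PromQL `sum by (labels)(...)`.

    series: iterable of (label_dict, value) pairs, one per time series.
    Series missing a label are grouped under an empty string, as in PromQL.
    """
    totals = defaultdict(float)
    for label_set, value in series:
        key = tuple(label_set.get(name, "") for name in labels)
        totals[key] += value
    return dict(totals)


def sum_all(series):
    """Mimic PromQL `sum(...)` without grouping: one total for all series."""
    return sum(value for _, value in series)


# Hypothetical samples of kafka_stream_processor_node_process_total.
samples = [
    ({"service": "svc-a", "pod": "svc-a-0"}, 100.0),
    ({"service": "svc-a", "pod": "svc-a-1"}, 50.0),
    ({"service": "svc-b", "pod": "svc-b-0"}, 25.0),
]
print(sum_by(samples, ["service"]))  # per-service totals, as in Table 8-1
print(sum_all(samples))              # overall total, as in Table 8-2
```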
Table 8-3 ocnadd_ingress_mps_per_service_5mAgg
KPI Detail | Measures the total ingress MPS (messages per second) per service, aggregated over 5 minutes |
---|---|
Metric Used for the KPI | sum by (service)(rate(kafka_stream_processor_node_process_total{namespace="$NAMESPACE"}[5m])) |
Service Operation | NA |
Response Code | NA |
Table 8-4 ocnadd_total_ingress_mps_5mAgg
KPI Detail | Measures the total ingress MPS, aggregated over 5 minutes |
---|---|
Metric Used for the KPI | sum(rate(kafka_stream_processor_node_process_total{namespace="$NAMESPACE"}[5m])) |
Service Operation | NA |
Response Code | NA |
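Tables 8-3 and 8-4 apply rate() to a counter over a 5-minute range. The result is a per-second value, which is the MPS. The following Python sketch is a simplified version of that calculation. It handles counter resets the way Prometheus does, treating a drop in value as a restart from zero. Unlike Prometheus, it does not extrapolate to the edges of the window. The timestamps and counter values are hypothetical.

```python
def simple_rate(points):
    """Per-second increase of a counter across the samples in one window.

    points: list of (timestamp_seconds, counter_value), oldest first.
    Simplification: no extrapolation to the window boundaries.
    """
    if len(points) < 2:
        return None  # at least two samples are needed to compute a rate
    increase = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        # A lower value means the counter was reset, so count from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = points[-1][0] - points[0][0]
    return increase / elapsed


# Hypothetical 5-minute window sampled every 150 s: 600 records in 300 s.
print(simple_rate([(0, 0), (150, 300), (300, 600)]))   # 2.0 MPS
# A reset between the 2nd and 3rd sample: 200 + 50 records in 200 s.
print(simple_rate([(0, 100), (100, 300), (200, 50)]))  # 1.25 MPS
```

Summing these per-series rates, as sum(rate(...)) does, gives the total ingress MPS.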
Table 8-5 ocnadd_ingress_mps_per_service_5mAgg_last_24h
KPI Detail | Measures the total ingress MPS per service, aggregated over 5 minutes, for the last 24 hours |
---|---|
Metric Used for the KPI | sum by (service)(rate(kafka_stream_processor_node_process_total{namespace="$NAMESPACE"}[5m]))[24h:5m] |
Service Operation | NA |
Response Code | NA |
Table 8-6 ocnadd_ingress_record_count_per_service_5mAgg_last_24h
KPI Detail | Measures the total number of ingress messages per service, aggregated over 5 minutes, for the last 24 hours |
---|---|
Metric Used for the KPI | sum by (service)(increase(kafka_stream_processor_node_process_total{namespace="$NAMESPACE"}[5m]))[24h:5m] |
Service Operation | NA |
Response Code | NA |
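Tables 8-5 and 8-6 combine two PromQL features. increase() returns the growth of the counter over the range instead of a per-second value, and Prometheus defines it as rate() multiplied by the number of seconds in the range. The trailing [24h:5m] is a subquery that evaluates the inner expression every 5 minutes over the last 24 hours. The following small Python sketch shows both relationships. The example rate is hypothetical.

```python
WINDOW_SECONDS = 5 * 60        # the [5m] range inside rate()/increase()
SUBQUERY_RANGE = 24 * 60 * 60  # the 24h range of the [24h:5m] subquery
SUBQUERY_STEP = 5 * 60         # the 5m resolution of the subquery


def increase_from_rate(rate_per_second, window_seconds=WINDOW_SECONDS):
    """increase() over a range equals rate() over it times its length in seconds."""
    return rate_per_second * window_seconds


def subquery_point_count(range_seconds=SUBQUERY_RANGE, step_seconds=SUBQUERY_STEP):
    """Approximate number of evaluations in a [range:step] subquery.

    The exact count can differ by one depending on step alignment.
    """
    return range_seconds // step_seconds


print(increase_from_rate(2.0))  # 600.0 records in one 5-minute bucket
print(subquery_point_count())   # 288 points for 24 hours at 5-minute steps
```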
Table 8-7 ocnadd_kafka_ingress_record_drop_rate_5minAgg
KPI Detail | Measures the total ingress message drop rate, aggregated over 5 minutes |
---|---|
Metric Used for the KPI | sum(rate(kafka_stream_task_dropped_records_total{namespace="$NAMESPACE"}[5m])) |
Service Operation | NA |
Response Code | NA |
Table 8-8 ocnadd_kafka_ingress_record_drop_rate_per_service_5minAgg
KPI Detail | Measures the total ingress message drop rate per service and pod, aggregated over 5 minutes |
---|---|
Metric Used for the KPI | sum(rate(kafka_stream_task_dropped_records_total{namespace="$NAMESPACE"}[5m])) by (service,pod) |
Service Operation | NA |
Response Code | NA |
Table 8-9 ocnadd_egw_request_count_total_by_3rdparty_destination_endpoint
KPI Detail | Total egress requests per 3rd party application per destination endpoint |
---|---|
Metric Used for the KPI | sum by (instance_identifier,destination_endpoint)(ocnadd_egressgateway_http_requests_total{namespace="$NAMESPACE"}) |
Service Operation | POST |
Response Code | NA |
Table 8-10 ocnadd_egw_response_count_total_by_3rdparty_destination_endpoint
KPI Detail | Total egress responses per 3rd party application per destination endpoint |
---|---|
Metric Used for the KPI | sum by (instance_identifier,destination_endpoint)(ocnadd_egressgateway_http_responses_total{namespace="$NAMESPACE"}) |
Service Operation | POST |
Response Code | NA |
Table 8-11 ocnadd_egw_failure_count_total_by_3rdparty_destination_endpoint
KPI Detail | Total egress failure count per 3rd party application per destination endpoint |
---|---|
Metric Used for the KPI | sum by (destination_endpoint,instance_identifier)(ocnadd_egressgateway_connection_failure_total{namespace="$NAMESPACE"}) |
Service Operation | POST |
Response Code | NA |
Table 8-12 ocnadd_egw_request_rate_by_3rdparty_5mAgg
KPI Detail | Total egress request rate per 3rd party application, aggregated over 5 minutes |
---|---|
Metric Used for the KPI | sum by (instance_identifier)(rate(ocnadd_egressgateway_http_requests_total{namespace="$NAMESPACE"}[5m])) |
Service Operation | POST |
Response Code | NA |
Table 8-13 ocnadd_egw_failure_rate_by_3rdparty_5mAgg
KPI Detail | Egress failure rate per 3rd party application, aggregated over 5 minutes (connection failures as a fraction of requests) |
---|---|
Metric Used for the KPI | sum by (instance_identifier)(rate(ocnadd_egressgateway_connection_failure_total{namespace="$NAMESPACE"}[5m])) / sum by (instance_identifier) (rate(ocnadd_egressgateway_http_requests_total{namespace="$NAMESPACE"}[5m])) |
Service Operation | POST |
Response Code | NA |
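The query in Table 8-13 divides one sum by (instance_identifier) result by another, so the KPI is a ratio of connection failures to requests rather than a per-second rate. Table 8-14 does the same with destination_endpoint added to the grouping. PromQL pairs series that have identical grouping labels, and it follows IEEE float rules when the divisor is zero. The following Python sketch mimics that one-to-one matching. The instance identifiers and rates are hypothetical.

```python
import math


def ratio_by_labels(numerator, denominator):
    """Mimic PromQL one-to-one division of two `sum by (...)` results.

    Both arguments map a tuple of grouping-label values to a per-second rate.
    Keys present on only one side produce no output, as in PromQL.
    """
    result = {}
    for key, num in numerator.items():
        if key not in denominator:
            continue
        den = denominator[key]
        if den == 0:
            # Float division semantics: 0/0 is NaN, x/0 is +/-Inf.
            result[key] = math.nan if num == 0 else math.copysign(math.inf, num)
        else:
            result[key] = num / den
    return result


# Hypothetical 5-minute rates keyed by instance_identifier.
failures = {("app-1",): 0.5, ("app-2",): 0.0, ("app-3",): 1.0}
requests = {("app-1",): 10.0, ("app-2",): 0.0}
print(ratio_by_labels(failures, requests))
# app-1 -> 0.05 (5 % of requests failed); app-2 -> nan; app-3 has no match
```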
Table 8-14 ocnadd_egw_failure_rate_by_3rdparty_per_destination_endpoint_5mAgg
KPI Detail | Egress failure rate per 3rd party application per destination endpoint, aggregated over 5 minutes (connection failures as a fraction of requests) |
---|---|
Metric Used for the KPI | sum by (instance_identifier, destination_endpoint)(rate(ocnadd_egressgateway_connection_failure_total{namespace="$NAMESPACE"}[5m])) / sum by (instance_identifier, destination_endpoint) (rate(ocnadd_egressgateway_http_requests_total{namespace="$NAMESPACE"}[5m])) |
Service Operation | POST |
Response Code | NA |
Table 8-15 ocnadd_e2e_avg_record_latency_by_3rdparty
KPI Detail | Average e2e (end-to-end) latency per 3rd party application over a 5-minute window |
---|---|
Metric Used for the KPI | (sum (irate(ocnadd_egressgateway_e2e_request_processing_latency_seconds_sum{namespace="$NAMESPACE"}[5m])) by (instance_identifier) / (sum (irate(ocnadd_egressgateway_e2e_request_processing_latency_seconds_count{namespace="$NAMESPACE"}[5m])) by (instance_identifier) *$EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY )) |
Service Operation | POST |
Response Code | NA |
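The latency KPIs in Tables 8-15 to 8-18 share one shape. The rate of a histogram's _sum series divided by the rate of its _count series gives the average latency per observation. The extra division by $EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY then turns that into a per-record figure. This reading assumes that each observation covers one group of that many records, as the placeholder name suggests. Note also that irate() uses only the last two samples inside the [5m] range. The following minimal Python sketch uses hypothetical numbers.

```python
def avg_record_latency(sum_rate, count_rate, group_message_count):
    """Average latency per record, following the formula in Table 8-15.

    sum_rate:            per-second rate of ..._latency_seconds_sum
    count_rate:          per-second rate of ..._latency_seconds_count
    group_message_count: value used for $EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY
    Returns seconds, or None when there were no observations.
    """
    if count_rate == 0:
        return None
    return sum_rate / (count_rate * group_message_count)


# Hypothetical: 40 observations/s accumulating 2 s of latency per second,
# with each observation assumed to cover a group of 10 records.
latency = avg_record_latency(2.0, 40.0, 10)
print(f"{latency * 1000:.1f} ms per record")  # 5.0 ms per record
```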
Table 8-16 ocnadd_e2e_avg_record_latency_by_3rdparty_per_egw_pod
KPI Detail | Average e2e latency per 3rd party application per EGW pod over a 5-minute window |
---|---|
Metric Used for the KPI | (sum (irate(ocnadd_egressgateway_e2e_request_processing_latency_seconds_sum{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod) / (sum (irate(ocnadd_egressgateway_e2e_request_processing_latency_seconds_count{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod) *$EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY )) |
Service Operation | POST |
Response Code | NA |
Table 8-17 ocnadd_egw_service_processing_avg_record_latency_by_3rdparty_per_egw_pod
KPI Detail | Average service processing latency per 3rd party application per EGW pod over a 5-minute window |
---|---|
Metric Used for the KPI | (sum (irate(ocnadd_egressgateway_service_request_processing_latency_seconds_sum{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod) / (sum (irate(ocnadd_egressgateway_service_request_processing_latency_seconds_count{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod) *$EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY )) |
Service Operation | POST |
Response Code | NA |
Table 8-18 ocnadd_egw_request_processing_avg_record_latency_by_3rdparty_per_egw_pod
KPI Detail | Average request processing latency per 3rd party application per EGW pod over a 5-minute window. This includes the network latency added by the response from the 3rd party application |
---|---|
Metric Used for the KPI | (sum (irate(ocnadd_egressgateway_request_latency_seconds_sum{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod) / (sum (irate(ocnadd_egressgateway_request_latency_seconds_count{namespace="$NAMESPACE"}[5m])) by (instance_identifier,pod) *$EGW_GROUP_MESSAGE_COUNT_FOR_LATENCY )) |
Service Operation | POST |
Response Code | NA |
Table 8-19 ocnadd_egw_e2e_avg_record_latency_95percentile_by_3rdparty_per_egw_pod
KPI Detail | The 95th percentile of the e2e latency for the egress gateway, calculated over a 5-minute period. The query returns the value in seconds (the unit of the _seconds_bucket histogram); multiply by 1000 for milliseconds |
---|---|
Metric Used for the KPI | histogram_quantile(0.95, sum(rate(ocnadd_egressgateway_e2e_request_processing_latency_seconds_bucket{namespace="$NAMESPACE"}[5m])) by (le)) |
Service Operation | POST |
Response Code | NA |
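histogram_quantile() in Table 8-19 estimates the 95th percentile from cumulative bucket counts. It finds the bucket that contains the target rank and interpolates linearly inside that bucket. Because the buckets come from a _seconds histogram, the result is in seconds. As written, the query aggregates by (le) only, so it returns one value across all pods and 3rd party applications. The following Python sketch is a simplified version that assumes non-negative bucket bounds and a final +Inf bucket. The bucket counts are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Simplified PromQL histogram_quantile() for non-negative bounds.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
             ending with (math.inf, total) for the le="+Inf" bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank lies in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan  # not reached for well-formed cumulative buckets


# Hypothetical summed bucket rates (le in seconds -> cumulative count).
buckets = [(0.005, 20), (0.01, 60), (0.05, 95), (0.1, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(f"p95 = {p95:.3f} s = {p95 * 1000:.0f} ms")  # p95 = 0.050 s = 50 ms
```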
Table 8-20 Memory Usage per POD
KPI Detail | Measures the memory usage (container working set) per pod, in GiB |
---|---|
Metric Used for the KPI | sum(container_memory_working_set_bytes{namespace=~"$Namespace",image!=""}/(1024*1024*1024)) by (pod) |
Service Operation | NA |
Response Code | NA |
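The memory query in Table 8-20 divides working-set bytes by 1024*1024*1024, so the result is in GiB. The image!="" filter is commonly used to skip cAdvisor series that are not tied to a container image, such as pod-level cgroup aggregates, so that memory is not counted twice. The following minimal Python sketch shows the per-pod sum and conversion using hypothetical samples.

```python
BYTES_PER_GIB = 1024 * 1024 * 1024


def memory_gib_per_pod(samples):
    """Sum container working-set bytes per pod and convert to GiB.

    samples: iterable of (pod_name, working_set_bytes), one per container.
    """
    totals = {}
    for pod, working_set_bytes in samples:
        totals[pod] = totals.get(pod, 0.0) + working_set_bytes / BYTES_PER_GIB
    return totals


# Hypothetical containers: two in pod-a, one in pod-b.
print(memory_gib_per_pod([
    ("pod-a", 1073741824),  # 1 GiB
    ("pod-a", 536870912),   # 0.5 GiB
    ("pod-b", 268435456),   # 0.25 GiB
]))  # {'pod-a': 1.5, 'pod-b': 0.25}
```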
Table 8-21 CPU Usage per POD
KPI Detail | Measures the CPU usage per pod, in millicores (CPU cores × 1000) |
---|---|
Metric Used for the KPI | sum(rate(container_cpu_usage_seconds_total{namespace=~"$Namespace",image!=""}[2m])) by (pod) * 1000 |
Service Operation | NA |
Response Code | NA |
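container_cpu_usage_seconds_total counts CPU-seconds, so its rate() over the [2m] range is the number of cores in use. Multiplying by 1000 in Table 8-21 expresses that value in millicores, the unit Kubernetes uses for CPU requests and limits. The following short Python sketch uses hypothetical per-container CPU-seconds consumed during a 2-minute window.

```python
WINDOW_SECONDS = 2 * 60  # the [2m] range used in Table 8-21


def pod_cpu_millicores(cpu_seconds_per_container, window_seconds=WINDOW_SECONDS):
    """Convert CPU-seconds consumed by a pod's containers in a window to millicores.

    Simplification: uses the plain increase over the window (no counter resets,
    no extrapolation), so the result approximates rate() * 1000.
    """
    cores = sum(cpu_seconds_per_container) / window_seconds
    return cores * 1000


# Hypothetical pod with two containers that used 30 CPU-seconds each in 2 minutes.
print(pod_cpu_millicores([30.0, 30.0]))  # 500.0 millicores, i.e. half a core
```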
Table 8-22 Service Status
KPI Detail | Provides the status of each Data Director service running in the given namespace (1 = the last metrics scrape of the service succeeded, 0 = it failed) |
---|---|
Metric Used for the KPI | up{namespace="$NAMESPACE"} |
Service Operation | NA |
Response Code | NA |
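Prometheus sets up to 1 for each target whose last scrape succeeded and to 0 when the scrape failed, so the KPI in Table 8-22 can be read as a health list. The following Python sketch picks out the failing targets from up values that have already been retrieved. The service names are hypothetical.

```python
def down_services(up_values):
    """Return the services whose `up` sample is 0 (last scrape failed).

    up_values: mapping of service or target name to its latest `up` value.
    """
    return sorted(name for name, value in up_values.items() if value == 0)


# Hypothetical up{namespace="$NAMESPACE"} results keyed by service.
status = {"svc-a": 1, "svc-b": 0, "svc-c": 1, "svc-d": 0}
print(down_services(status))  # ['svc-b', 'svc-d']
```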