5 SEPP Alerts
This section provides information about the SEPP alerts and their configuration.
Note:
The label names for the namespace and pod name in the alert summaries depend on the CNE version.
For CNE 1.8.4 or earlier versions:
- namespace: {{$labels.kubernetes_namespace}}
- podname: {{$labels.kubernetes_pod_name}}
For CNE 1.9.x or later versions:
- namespace: {{$labels.namespace}}
- podname: {{$labels.pod}}
5.1 System Level Alerts
5.1.1 SEPPPodMemoryUsageAlert
Table 5-1 SEPPPodMemoryUsageAlert
Field | Details |
---|---|
Trigger Condition | Pod memory usage is above the threshold (70%). |
Severity | Warning |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory usage is {{ $value | printf "%.2f" }} which is above 70% (current value is: {{ $value }})'
Expression:
|
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4003 |
Metric Used | kube_pod_container_resource_limits
Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric exposed by the monitoring system. |
Resolution | The alert is cleared when the memory utilization falls below the threshold (70%).
Note: The threshold is configurable in the SeppAlertrules.yaml file. If guidance is required, contact My Oracle Support. |
5.1.2 SEPPPodCpuUsageAlert
Table 5-2 SEPPPodCpuUsageAlert
Field | Details |
---|---|
Trigger Condition | Pod CPU usage is above the threshold (70%). |
Severity | Warning |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: CPU usage is {{ $value | printf "%.2f" }} which is above 70% (current value is: {{ $value }})'
Expression: (sum by (namespace,container) (rate(container_cpu_usage_seconds_total{container=~".*cn32c-svc.*|.*pn32c-svc.*|.*cn32f-svc.*|.*pn32f-svc.*|.*config-mgr-svc.*|.*n32-egress-gateway.*|.*n32-ingress-gateway.*|.*plmn-egress-gateway.*|.*plmn-ingress-gateway.*|.*nf-mediation.*"}[2m])) ) / (sum by (container, namespace) (kube_pod_container_resource_limits{resource="cpu",container=~".*cn32c-svc.*|.*pn32c-svc.*|.*cn32f-svc.*|.*pn32f-svc.*|.*config-mgr-svc.*|.*n32-egress-gateway.*|.*n32-ingress-gateway.*|.*plmn-egress-gateway.*|.*plmn-ingress-gateway.*|.*nf-mediation.*"}) ) * 100 >= 70 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4002 |
Metric Used | container_cpu_usage_seconds_total, kube_pod_container_resource_limits
Note: These are Kubernetes metrics used for instance availability monitoring. If a metric is not available, use a similar metric exposed by the monitoring system. |
Resolution | The alert is cleared when the CPU utilization falls below the threshold (70%).
Note: The threshold is configurable in the SeppAlertrules.yaml file. If guidance is required, contact My Oracle Support. |
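The CPU expression divides the per-second rate of container_cpu_usage_seconds_total (CPU seconds consumed, so the rate is in cores) by the CPU limit in cores and multiplies by 100. The following Python sketch reproduces that arithmetic for a single container, assuming two counter samples taken 120 seconds apart stand in for the [2m] window. The sample values are hypothetical, and Prometheus rate() also handles counter resets and extrapolation, which the sketch ignores.

```python
# Sketch of the SEPPPodCpuUsageAlert arithmetic for one container.
# Assumption: two samples of container_cpu_usage_seconds_total taken
# window_seconds apart approximate rate(...[2m]); values are hypothetical.

CPU_THRESHOLD_PERCENT = 70.0  # default; configurable in SeppAlertrules.yaml


def cpu_usage_percent(cpu_seconds_start: float, cpu_seconds_end: float,
                      window_seconds: float, cpu_limit_cores: float) -> float:
    """CPU usage as a percentage of the container CPU limit."""
    cores_used = (cpu_seconds_end - cpu_seconds_start) / window_seconds  # ~rate()
    return cores_used / cpu_limit_cores * 100


def cpu_alert_fires(usage_percent: float) -> bool:
    """The rule uses '>= 70', so exactly 70% also fires."""
    return usage_percent >= CPU_THRESHOLD_PERCENT


# 180 CPU-seconds over 120 s is 1.5 cores; against a 2-core limit that is 75%.
usage = cpu_usage_percent(1000.0, 1180.0, 120.0, 2.0)
print(f"{usage:.2f}", cpu_alert_fires(usage))  # 75.00 True
```

Because the real rule sums by (namespace, container), it aggregates all pods of a container before dividing, so one hot pod can be averaged out by idle replicas.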
5.2 Application Level Alerts
5.2.1 Common Alerts
5.2.1.1 SEPPN32fRoutingFailure
Table 5-3 SEPPN32fRoutingFailure
Field | Details |
---|---|
Trigger Condition | The N32f service is not able to forward a message. |
Severity | Info |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: N32f service not able to forward message because {{ $labels.error_msg }}'
Expression: idelta(ocsepp_cn32f_requests_failure_total[2m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4001 |
Metric Used | ocsepp_cn32f_requests_failure_total |
Resolution |
The alert is cleared when the Consumer SEPP accepts the request, which happens only if the producer NF domain and PLMN match the configured Remote SEPP. Steps: Check the failure reason in the alert. Possible Resolutions:
|
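The SEPPN32fRoutingFailure expression relies on PromQL idelta(), which returns the difference between the last two samples in the range rather than the change across the whole window. The following Python sketch models that behavior on a hypothetical list of counter samples; it is an illustration of the function's semantics, not SEPP code.

```python
# Sketch of PromQL idelta() as used by SEPPN32fRoutingFailure:
#   idelta(ocsepp_cn32f_requests_failure_total[2m]) > 0
# idelta() returns the difference between the last two samples in the
# range, so the rule fires only when the counter moved between the two
# most recent scrapes. Sample values are hypothetical.

from typing import Optional, Sequence


def idelta(samples: Sequence[float]) -> Optional[float]:
    """Difference between the last two samples, or None with fewer than two."""
    if len(samples) < 2:
        return None  # Prometheus returns no result for such a series
    return samples[-1] - samples[-2]


def routing_failure_fires(samples: Sequence[float]) -> bool:
    delta = idelta(samples)
    # A counter reset (pod restart) gives a negative delta, which does not fire.
    return delta is not None and delta > 0


print(routing_failure_fires([4, 4, 4, 6]))  # True: failures grew on the last scrape
print(routing_failure_fires([4, 6, 6, 6]))  # False: the growth was earlier in the window
```

Because only the last two samples count, this alert can clear as soon as two consecutive scrapes show no new failures.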
5.2.1.2 SEPPConfigMgrRouteFailureAlert
Table 5-4 SEPPConfigMgrRouteFailureAlert
Field | Details |
---|---|
Trigger Condition | A routing failure occurs while posting a Remote SEPP or a Roaming Partner Set. |
Severity | Major |
Alert details provided |
Summary: namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Route Failure has occurred because {{ $labels.errorReason }}
Expression: sum(increase(ocsepp_configmgr_routefailure_total{app="config-mgr-svc"}[5m]) >0 or (ocsepp_configmgr_routefailure_total{app="config-mgr-svc"} unless ocsepp_configmgr_routefailure_total{app="config-mgr-svc"} offset 5m )) by (namespace,errorCode) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4026 |
Metric Used | ocsepp_configmgr_routefailure_total |
Resolution | The alert is cleared if no new failures are observed in a 5-minute window. |
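The expression combines two conditions: increase(...[5m]) > 0 catches a counter that grew, and the "m unless m offset 5m" clause catches a series that did not exist 5 minutes earlier. The second clause matters because a new errorCode series that first appears with a value of 1 has no earlier sample to compare against, so increase() alone would not report it. The following Python sketch models that logic on hypothetical counter snapshots keyed by (namespace, errorCode), ignoring counter resets.

```python
# Sketch of the pattern used by SEPPConfigMgrRouteFailureAlert:
#   increase(m[5m]) > 0 or (m unless m offset 5m)
# A series fires if its counter grew in the window, or if it did not exist
# 5 minutes ago. Values and label values are hypothetical, and counter resets
# are ignored for brevity.

from typing import Dict, Set, Tuple

SeriesKey = Tuple[str, str]  # (namespace, errorCode)


def firing_series(now: Dict[SeriesKey, float],
                  five_min_ago: Dict[SeriesKey, float]) -> Set[SeriesKey]:
    fired: Set[SeriesKey] = set()
    for key, value in now.items():
        if key not in five_min_ago:
            fired.add(key)  # "m unless m offset 5m": the series is new
        elif value - five_min_ago[key] > 0:
            fired.add(key)  # "increase(m[5m]) > 0"
    return fired


now = {("sepp", "404"): 3.0, ("sepp", "503"): 1.0}
before = {("sepp", "404"): 3.0}
print(sorted(firing_series(now, before)))  # [('sepp', '503')]
```

The same pattern appears in the N32c handshake failure alerts, grouped by remote_sepp_name, nfinstanceid, and peer_fqdn instead of errorCode.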
5.2.1.3 EgressSbiErrorRateAbove1Percent
Table 5-5 EgressSbiErrorRateAbove1Percent
Field | Details |
---|---|
Trigger Condition | The SBI transaction error rate exceeds the configured threshold (1%). |
Severity | Major |
Alert details provided | Summary: "Sbi Transaction Error Rate detected above 1 Percent of Total Sbi Transactions"
Expression: sum(rate(oc_egressgateway_sbiRouting_http_responses_total{Status!~"2.*"}[24h])) by (app,pod, namespace) /sum(rate(oc_egressgateway_sbiRouting_http_responses_total[24h])) by (app,pod, namespace) *100 >= 1 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7001 |
Metric Used | oc_egressgateway_sbiRouting_http_responses_total |
Resolution |
This alert is raised when the SBI transaction error rate is at or above 1% of the total transactions over a 24-hour period. The alert is cleared when the error rate falls below 1%. |
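The error-rate expression divides the rate of non-2xx responses by the rate of all responses over the same 24-hour window. Because both rates share that window, the ratio equals the ratio of the response counts. The following Python sketch computes it from hypothetical per-status counts for one (app, pod, namespace) group.

```python
# Sketch of the EgressSbiErrorRateAbove1Percent ratio. Assumption: the
# inputs are per-status response counts accumulated over the same 24-hour
# window, so the ratio of rate() values equals the ratio of the counts.
# Status codes and counts are hypothetical.

import math
from typing import Dict

THRESHOLD_PERCENT = 1.0


def sbi_error_rate_percent(counts_by_status: Dict[str, float]) -> float:
    total = sum(counts_by_status.values())
    if total == 0:
        return math.nan  # PromQL 0/0 is NaN, and NaN >= 1 never fires
    # Status!~"2.*" is a fully anchored regex: any status not starting with "2".
    errors = sum(c for status, c in counts_by_status.items()
                 if not status.startswith("2"))
    return errors / total * 100


def sbi_alert_fires(counts_by_status: Dict[str, float]) -> bool:
    return sbi_error_rate_percent(counts_by_status) >= THRESHOLD_PERCENT


print(sbi_alert_fires({"200": 9900, "201": 50, "503": 40, "404": 10}))  # False (0.5%)
print(sbi_alert_fires({"200": 980, "500": 20}))                        # True (2%)
```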
5.2.2 Handshake Alerts
5.2.2.1 SEPPCn32cHandshakeFailureAlert
Table 5-6 SEPPCn32cHandshakeFailureAlert
Field | Details |
---|---|
Trigger Condition | The handshake procedure has failed on the Consumer SEPP. |
Severity | Major |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Handshake procedure has failed on Consumer side because {{ $labels.reason }}'
Expression: sum(increase(ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"}[5m]) >0 or (ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"} unless ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"} offset 5m )) by (namespace,remote_sepp_name,nfinstanceid,peer_fqdn,app) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.2001 |
Metric Used | ocsepp_n32c_handshake_failure_attempts_total filtered by app=cn32c-svc |
Resolution 1 | The alert is cleared when the N32c handshake is established after a successful TCP connection to the remote SEPP.
Failure reason: The release name used during the Helm installation is other than ocsepp-release.
Error verification: Check the failure reason in the alert. If the failure reason is "404 – route not found" or "Route not found", follow the recovery steps:
|
Resolution 2 |
The alert is cleared when the N32c handshake is established after a successful TCP connection to the remote SEPP. Steps: Check the failure reason in the alert. Possible Resolutions:
|
5.2.2.2 SEPPPn32cHandshakeFailureAlert
Table 5-7 SEPPPn32cHandshakeFailureAlert
Field | Details |
---|---|
Trigger Condition | The handshake procedure has failed on the Producer SEPP. |
Severity | Major |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Handshake procedure has failed on Producer side because {{ $labels.error_msg }}'
Expression: sum(increase(ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"}[5m]) >0 or (ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"} unless ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"} offset 5m )) by (namespace,remote_sepp_name,nfinstanceid,peer_fqdn,app) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.3001 |
Metric Used | ocsepp_n32c_handshake_failure_attempts_total filtered by app=pn32c-svc |
Resolution |
The alert is cleared when the N32c handshake succeeds after the Producer SEPP establishes a TCP connection to the Consumer SEPP. Steps: Check the failure reason in the alert. Possible Resolution: Update and reinitiate the handshake. |
5.2.3 Upgrade Alerts
5.2.3.1 SEPPUpgradeStartedAlert
Table 5-8 SEPPUpgradeStartedAlert
Field | Details |
---|---|
Trigger Condition | REST API trigger at the start of an upgrade |
Severity | NA |
Alert details provided |
applicationname, alertname, servicename, releasename, namespace, oid, severity, vendor, sourcerelease, targetrelease |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.8001 |
Metric Used | NA |
Resolution |
When the success alert is generated, the start and failure alerts are cleared. |
5.2.3.2 SEPPUpgradeFailedAlert
Table 5-9 SEPPUpgradeFailedAlert
Field | Details |
---|---|
Trigger Condition | REST API trigger at the failure of an upgrade |
Severity | NA |
Alert details provided |
applicationname, alertname, servicename, releasename, namespace, oid, severity, vendor, sourcerelease, targetrelease |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.8002 |
Metric Used | NA |
Resolution |
When the success alert is generated, the start and failure alerts are cleared. |
5.2.3.3 SEPPUpgradeSuccessfulAlert
Table 5-10 SEPPUpgradeSuccessfulAlert
Field | Details |
---|---|
Trigger Condition | REST API trigger at the success of an upgrade |
Severity | NA |
Alert details provided |
applicationname, alertname, servicename, releasename, namespace, oid, severity, vendor, sourcerelease, targetrelease |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.8003 |
Metric Used | NA |
Resolution |
When the success alert is generated, the start and failure alerts are cleared. |
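The clearing rule shared by the three upgrade alerts can be modeled as a small state update. In the following Python sketch, the alert names and OIDs come from Tables 5-8 to 5-10. The in-memory set of active alerts is a hypothetical model for illustration, not how SEPP or the alerting system actually stores alerts.

```python
# Sketch of the upgrade-alert clearing rule: when SEPPUpgradeSuccessfulAlert
# is generated, any active SEPPUpgradeStartedAlert and SEPPUpgradeFailedAlert
# are cleared. The set of active alerts is a hypothetical model.

from typing import Set

UPGRADE_OIDS = {
    "SEPPUpgradeStartedAlert": "1.3.6.1.4.1.323.5.3.46.1.2.8001",
    "SEPPUpgradeFailedAlert": "1.3.6.1.4.1.323.5.3.46.1.2.8002",
    "SEPPUpgradeSuccessfulAlert": "1.3.6.1.4.1.323.5.3.46.1.2.8003",
}


def raise_upgrade_alert(active: Set[str], alertname: str) -> Set[str]:
    """Return the new set of active upgrade alerts after alertname is raised."""
    if alertname not in UPGRADE_OIDS:
        raise ValueError(f"unknown upgrade alert: {alertname}")
    updated = set(active)
    if alertname == "SEPPUpgradeSuccessfulAlert":
        updated -= {"SEPPUpgradeStartedAlert", "SEPPUpgradeFailedAlert"}
    updated.add(alertname)
    return updated


# A failed upgrade that is retried and then succeeds.
active: Set[str] = set()
for name in ["SEPPUpgradeStartedAlert", "SEPPUpgradeFailedAlert",
             "SEPPUpgradeSuccessfulAlert"]:
    active = raise_upgrade_alert(active, name)
print(sorted(active))  # ['SEPPUpgradeSuccessfulAlert']
```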
5.2.4 Rollback Alerts
5.2.4.1 SEPPRollbackStartedAlert
Table 5-11 SEPPRollbackStartedAlert
Field | Details |
---|---|
Trigger Condition | REST API trigger at the start of a rollback |
Severity | NA |
Alert details provided |
applicationname, alertname, servicename, releasename, namespace, oid, severity, vendor, sourcerelease, targetrelease |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.8004 |
Metric Used | NA |
Resolution |
When the success alert is generated, the start and failure alerts are cleared. |
5.2.4.2 SEPPRollbackFailedAlert
Table 5-12 SEPPRollbackFailedAlert
Field | Details |
---|---|
Trigger Condition | REST API trigger at the failure of a rollback |
Severity | NA |
Alert details provided |
applicationname, alertname, servicename, releasename, namespace, oid, severity, vendor, sourcerelease, targetrelease |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.8005 |
Metric Used | NA |
Resolution |
When the success alert is generated, the start and failure alerts are cleared. |
5.2.4.3 SEPPRollbackSuccessfulAlert
Table 5-13 SEPPRollbackSuccessfulAlert
Field | Details |
---|---|
Trigger Condition | REST API trigger at the success of a rollback |
Severity | NA |
Alert details provided |
applicationname, alertname, servicename, releasename, namespace, oid, severity, vendor, sourcerelease, targetrelease |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.8006 |
Metric Used | NA |
Resolution | The alert is cleared after DEFAULT_DURATION_FOR_ALERT_EXPIRY minutes. |
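A time-based expiry like the one for SEPPRollbackSuccessfulAlert maps naturally onto an alert whose end time is set in advance. The following Python sketch builds such an alert in the JSON shape accepted by the Prometheus Alertmanager v2 API (POST /api/v2/alerts takes a list of alerts with labels, startsAt, and endsAt). This assumes delivery through Alertmanager, which this section does not state. The label names, alert name, and OID come from Table 5-13; every label value and the expiry value are hypothetical.

```python
# Sketch of a SEPPRollbackSuccessfulAlert payload in Alertmanager v2 API shape.
# Assumptions: delivery goes through Alertmanager, and every label value below
# (and the expiry value) is hypothetical. Only label names, the alert name,
# and the OID come from Table 5-13.

import json
from datetime import datetime, timedelta, timezone

DEFAULT_DURATION_FOR_ALERT_EXPIRY = 30  # minutes; hypothetical value


def build_rollback_success_alert(*, applicationname: str, servicename: str,
                                 releasename: str, namespace: str, vendor: str,
                                 sourcerelease: str, targetrelease: str,
                                 starts_at: datetime) -> dict:
    ends_at = starts_at + timedelta(minutes=DEFAULT_DURATION_FOR_ALERT_EXPIRY)
    return {
        "labels": {
            "applicationname": applicationname,
            "alertname": "SEPPRollbackSuccessfulAlert",
            "servicename": servicename,
            "releasename": releasename,
            "namespace": namespace,
            "oid": "1.3.6.1.4.1.323.5.3.46.1.2.8006",
            "severity": "NA",
            "vendor": vendor,
            "sourcerelease": sourcerelease,
            "targetrelease": targetrelease,
        },
        # Alertmanager treats the alert as resolved once endsAt has passed.
        "startsAt": starts_at.isoformat(),
        "endsAt": ends_at.isoformat(),
    }


alert = build_rollback_success_alert(
    applicationname="example-app", servicename="example-svc",
    releasename="example-release", namespace="example-ns",
    vendor="example-vendor", sourcerelease="2.0.0", targetrelease="1.0.0",
    starts_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(json.dumps([alert], indent=2))
```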
5.2.5 Global Rate Limiting on Ingress Gateway of SEPP Alerts
5.2.5.1 IngressGlobalMessageDropAbovePointOnePercent
Table 5-14 IngressGlobalMessageDropAbovePointOnePercent
Field | Details |
---|---|
Trigger Condition | Ingress Global Message Drop Rate detected greater than or equal to 0.1 Percent of Total Transactions. |
Severity | Warning |
Alert details provided | Summary: "Ingress Global Message Drop Rate detected above 0.1 Percent of Total Transactions"
Expression: sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 0.1 < 1 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7002 |
Metric Used | oc_ingressgateway_global_ratelimit_total |
Resolution |
The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 0.1% of the total messages received. It is cleared when that percentage falls below 0.1% or reaches 1% or more. |
5.2.5.2 IngressGlobalMessageDropAbove1Percent
Table 5-15 IngressGlobalMessageDropAbove1Percent
Field | Details |
---|---|
Trigger Condition | Ingress Global Message Drop Rate detected greater than or equal to 1 Percent of Total Transactions. |
Severity | Warning |
Alert details provided | Summary: "Ingress Global Message Drop Rate detected above 1 Percent of Total Transactions"
Expression: sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 1 < 10 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7003 |
Metric Used | oc_ingressgateway_global_ratelimit_total |
Resolution |
The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 1% of the total messages received. It is cleared when that percentage falls below 1% or reaches 10% or more. |
5.2.5.3 IngressGlobalMessageDropAbove10Percent
Table 5-16 IngressGlobalMessageDropAbove10Percent
Field | Details |
---|---|
Trigger Condition | Ingress Global Message Drop Rate detected greater than or equal to 10 Percent of Total Transactions. |
Severity | Minor |
Alert details provided | Summary: "Ingress Global Message Drop Rate detected above 10 Percent of Total Transactions"
Expression: sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 10 < 25 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7004 |
Metric Used | oc_ingressgateway_global_ratelimit_total |
Resolution |
The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 10% of the total messages received. It is cleared when that percentage falls below 10% or reaches 25% or more. |
5.2.5.4 IngressGlobalMessageDropAbove25Percent
Table 5-17 IngressGlobalMessageDropAbove25Percent
Field | Details |
---|---|
Trigger Condition | Ingress Global Message Drop Rate detected greater than or equal to 25 Percent of Total Transactions. |
Severity | Major |
Alert details provided | Summary: "Ingress Global Message Drop Rate detected above 25 Percent of Total Transactions"
Expression: sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 25 < 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7005 |
Metric Used | oc_ingressgateway_global_ratelimit_total |
Resolution |
The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 25% of the total messages received. It is cleared when that percentage falls below 25% or reaches 50% or more. |
5.2.5.5 IngressGlobalMessageDropAbove50Percent
Table 5-18 IngressGlobalMessageDropAbove50Percent
Field | Details |
---|---|
Trigger Condition | Ingress Global Message Drop Rate detected greater than or equal to 50 Percent of Total Transactions. |
Severity | Critical |
Alert details provided | Summary: "Ingress Global Message Drop Rate detected above 50 Percent of Total Transactions"
Expression: sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7006 |
Metric Used | oc_ingressgateway_global_ratelimit_total |
Resolution |
The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 50% of the total messages received. It is cleared when that percentage falls below 50%. |
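In PromQL, a chained filter such as "*100 >= 10 < 25" keeps only values inside the half-open band [10, 25). As a result, the five Global Rate Limiting alerts are mutually exclusive for a given namespace. The following Python sketch maps a drop percentage to the single matching alert and severity. The bands and severities are copied from Tables 5-14 to 5-18; the dropped and total counts in the example are hypothetical.

```python
# Sketch: map an ingress global drop percentage to the one matching alert.
# Bands come from the rule expressions (lower bound inclusive, upper exclusive).

import math
from typing import Optional, Tuple

GLOBAL_DROP_BANDS = [
    (0.1, 1.0, "IngressGlobalMessageDropAbovePointOnePercent", "Warning"),
    (1.0, 10.0, "IngressGlobalMessageDropAbove1Percent", "Warning"),
    (10.0, 25.0, "IngressGlobalMessageDropAbove10Percent", "Minor"),
    (25.0, 50.0, "IngressGlobalMessageDropAbove25Percent", "Major"),
    (50.0, math.inf, "IngressGlobalMessageDropAbove50Percent", "Critical"),
]


def drop_percent(dropped: float, total: float) -> float:
    """Dropped messages as a percentage of all messages (NaN with no traffic)."""
    return dropped / total * 100 if total else math.nan


def global_drop_alert(percent: float) -> Optional[Tuple[str, str]]:
    """Return (alert name, severity) for the band containing percent, if any."""
    for lower, upper, name, severity in GLOBAL_DROP_BANDS:
        if lower <= percent < upper:  # mirrors ">= lower < upper" in PromQL
            return name, severity
    return None  # below 0.1%, or NaN (no traffic), matches no band


print(global_drop_alert(drop_percent(dropped=120, total=1000)))
# ('IngressGlobalMessageDropAbove10Percent', 'Minor')
```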
5.2.6 Topology Hiding Alerts
5.2.6.1 SEPPN32fTopologyOperationFailureAlert
Table 5-19 SEPPN32fTopologyOperationFailureAlert
Field | Details |
---|---|
Trigger Condition | Topology Hiding or Recovery Failure exceeded configured threshold (1%) |
Severity | Major |
Alert details provided | Summary: "Topology hiding/recovery operation failures reached more than configured threshold"
Expression: delta(ocsepp_topology_header_failure_total[2m])>0 or (ocsepp_topology_header_failure_total unless ocsepp_topology_header_failure_total offset 2m) |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4004 |
Metric Used | ocsepp_topology_header_failure_total, ocsepp_topology_header_success_total |
Resolution |
This alert is raised when Topology Hiding or Recovery failures exceed 1%. The alert is cleared when the error rate falls below 1%. Possible Resolutions:
Note: The alert is cleared only if the corresponding success metric is pegged. |
5.2.6.2 SEPPN32fTopologyBodyOperationFailureAlert
Table 5-20 SEPPN32fTopologyBodyOperationFailureAlert
Field | Details |
---|---|
Trigger Condition | Topology hiding or recovery operation failures on the message body exceed the defined threshold. |
Severity | Major |
Alert details provided | Summary: "Topology Hiding/Recovery Operation failures reached more than configured threshold"
Expression: delta(ocsepp_topology_body_failure_total[2m])>0 or (ocsepp_topology_body_failure_total unless ocsepp_topology_body_failure_total offset 2m) |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4006 |
Metric Used | ocsepp_topology_body_failure_total, ocsepp_topology_body_success_total |
Resolution | This alert is raised when Topology Hiding or Recovery failures for the message body exceed 1%. The alert is cleared when the error rate falls below 1%.
Possible Resolutions:
|
5.2.7 5G SBI Message Mediation Support Alerts
5.2.7.1 SEPPCN32fMediationFailure
Table 5-21 SEPPCN32fMediationFailure
Field | Details |
---|---|
Trigger Condition | Mediation processing failure |
Severity | Info |
Alert details provided | Summary: "Mediation processing Failure"
Expression: increase(ocsepp_cn32f_mediation_response_failure{status_code!="504 GATEWAY_TIMEOUT"}[10m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4007 |
Metric Used | ocsepp_cn32f_mediation_response_failure |
Resolution |
This alert is raised when the Mediation microservice is unable to apply rules to the incoming request and response from SEPP. Possible Resolution:
|
5.2.7.2 SEPPCN32fMediationUnreachable
Table 5-22 SEPPCN32fMediationUnreachable
Field | Details |
---|---|
Trigger Condition | Mediation service is not accessible |
Severity | Critical |
Alert details provided | Summary: "Mediation service is not accessible"
Expression: increase(ocsepp_cn32f_mediation_response_failure{status_code="504 GATEWAY_TIMEOUT"}[10m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4008 |
Metric Used | ocsepp_cn32f_mediation_response_failure |
Resolution |
This alert is raised when the Mediation microservice is not accessible. Possible Resolution:
|
5.2.7.3 SEPPPN32fMediationFailure
Table 5-23 SEPPPN32fMediationFailure
Field | Details |
---|---|
Trigger Condition | Mediation processing failure |
Severity | Info |
Alert details provided | Summary: "Mediation processing Failure"
Expression: increase(ocsepp_pn32f_mediation_response_failure{status_code!="504 GATEWAY_TIMEOUT"}[10m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4009 |
Metric Used | ocsepp_pn32f_mediation_response_failure |
Resolution |
This alert is raised when the Mediation microservice is unable to apply rules to the incoming request and response from SEPP. Possible Resolution:
|
5.2.7.4 SEPPPN32fMediationUnreachable
Table 5-24 SEPPPN32fMediationUnreachable
Field | Details |
---|---|
Trigger Condition | Mediation service is not accessible |
Severity | Critical |
Alert details provided | Summary: "Mediation service is not accessible"
Expression: increase(ocsepp_pn32f_mediation_response_failure{status_code="504 GATEWAY_TIMEOUT"}[10m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4010 |
Metric Used | ocsepp_pn32f_mediation_response_failure |
Resolution |
This alert is raised when the Mediation microservice is not accessible. Possible Resolution:
|
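The four mediation alerts split the same two metrics on the status_code label. A status_code of "504 GATEWAY_TIMEOUT" indicates that the Mediation service is unreachable (Critical). Any other status_code indicates a processing failure (Info). The following Python sketch shows that classification for one status_code series. The alert names and severities come from Tables 5-21 to 5-24; the "400 BAD_REQUEST" value in the example is a hypothetical status_code.

```python
# Sketch: classify one status_code series of
# ocsepp_<cn32f|pn32f>_mediation_response_failure into the matching alert.

from typing import Optional, Tuple

TIMEOUT_STATUS = "504 GATEWAY_TIMEOUT"

# (service, unreachable?) -> (alert name, severity), per Tables 5-21 to 5-24.
MEDIATION_ALERTS = {
    ("cn32f", False): ("SEPPCN32fMediationFailure", "Info"),
    ("cn32f", True): ("SEPPCN32fMediationUnreachable", "Critical"),
    ("pn32f", False): ("SEPPPN32fMediationFailure", "Info"),
    ("pn32f", True): ("SEPPPN32fMediationUnreachable", "Critical"),
}


def mediation_alert(service: str, status_code: str,
                    increase_10m: float) -> Optional[Tuple[str, str]]:
    """Return (alert name, severity), or None when the counter did not grow."""
    if increase_10m <= 0:
        return None  # the rules require increase(...[10m]) > 0
    return MEDIATION_ALERTS[(service, status_code == TIMEOUT_STATUS)]


print(mediation_alert("pn32f", "504 GATEWAY_TIMEOUT", 3))  # ('SEPPPN32fMediationUnreachable', 'Critical')
print(mediation_alert("cn32f", "400 BAD_REQUEST", 1))      # ('SEPPCN32fMediationFailure', 'Info')
```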
5.2.8 Overload Control Alerts
5.2.8.1 SEPPServiceOverload65Percent
Table 5-25 SEPPServiceOverload65Percent
Field | Details |
---|---|
Trigger Condition | CPU or memory usage of pn32f-svc is more than 65% |
Severity | Warning |
Alert details provided | Summary: Backend service is in overload with load level > 65%
Expression: service_resource_overload_level == 1 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7007 |
Metric Used | service_resource_overload_level |
Resolution |
The alert is cleared when CPU or memory usage for backend-svc falls below 60%. |
5.2.8.2 SEPPServiceOverload70Percent
Table 5-26 SEPPServiceOverload70Percent
Field | Details |
---|---|
Trigger Condition | CPU or memory usage of pn32f-svc is more than 70% |
Severity | Minor |
Alert details provided | Summary: Backend service is in overload with load level > 70%
Expression: service_resource_overload_level == 2 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7008 |
Metric Used | service_resource_overload_level |
Resolution |
The alert is cleared when CPU or memory usage for backend-svc falls below 70%. |
5.2.8.3 SEPPServiceOverload80Percent
Table 5-27 SEPPServiceOverload80Percent
Field | Details |
---|---|
Trigger Condition | CPU or memory usage of pn32f-svc is more than 80% |
Severity | Major |
Alert details provided | Summary: Backend service is in overload with load level > 80%
Expression: service_resource_overload_level == 3 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7009 |
Metric Used | service_resource_overload_level |
Resolution |
The alert is cleared when CPU or memory usage for backend-svc falls below 80%. |
5.2.8.4 SEPPServiceOverload90Percent
Table 5-28 SEPPServiceOverload90Percent
Field | Details |
---|---|
Trigger Condition | CPU or memory usage of pn32f-svc is more than 90% |
Severity | Critical |
Alert details provided | Summary: Backend service is in overload with load level > 90%
Expression: service_resource_overload_level == 4 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7010 |
Metric Used | service_resource_overload_level |
Resolution |
The alert is cleared when CPU or memory usage for backend-svc falls below 90%. |
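Each overload rule matches one exact value of the service_resource_overload_level gauge with "==", so at most one overload alert is active at a time. The following Python sketch is a lookup table from the gauge value to the alert, severity, and clearing percentage, copied from Tables 5-25 to 5-28. How the gauge itself is computed is not described in this section and is not modeled.

```python
# Sketch: service_resource_overload_level value -> (alert, severity, clear-below %).
# Values come from Tables 5-25 to 5-28; the gauge computation is not modeled.

from typing import Optional, Tuple

OVERLOAD_ALERTS = {
    1: ("SEPPServiceOverload65Percent", "Warning", 60),
    2: ("SEPPServiceOverload70Percent", "Minor", 70),
    3: ("SEPPServiceOverload80Percent", "Major", 80),
    4: ("SEPPServiceOverload90Percent", "Critical", 90),
}


def overload_alert(level: int) -> Optional[Tuple[str, str, int]]:
    """Each rule matches with '==', so at most one overload alert is active."""
    return OVERLOAD_ALERTS.get(level)


print(overload_alert(3))  # ('SEPPServiceOverload80Percent', 'Major', 80)
print(overload_alert(0))  # None: no rule matches
```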
5.2.9 Hosted SEPP Alerts
5.2.9.1 SEPPPn32fHSRoutingFailureAlert
Table 5-29 SEPPPn32fHSRoutingFailureAlert
Field | Details |
---|---|
Trigger Condition | The routing failure rate at the Pn32f service is greater than 20 percent. |
Severity | Major |
Alert details provided | Summary: Allowed P-RSS Validation failure at Roaming Hub
Expression: ((sum by(namespace, app, nfInstanceId, pod) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod) (ocsepp_pn32f_requests_total))) > 0.2 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4013 |
Metric Used | ocsepp_allowed_p_rss_routing_failure_total, ocsepp_pn32f_requests_total |
Resolution | The alert gets automatically cleared when the failure rate at pn32f microservice goes below 20 percent. |
5.2.9.2 SEPPCn32fHSRoutingFailureAlert
Table 5-30 SEPPCn32fHSRoutingFailureAlert
Field | Details |
---|---|
Trigger Condition | The routing failure rate at the Cn32f service is greater than 50 percent. |
Severity | Minor |
Alert details provided | Summary: Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP.
Expression: ((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.5 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4014 |
Metric Used | ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total |
Resolution | The alert gets automatically cleared when the failure rate at cn32f microservice goes below 50 percent. |
5.2.9.3 SEPPCn32fHSRoutingFailureAlert
Table 5-31 SEPPCn32fHSRoutingFailureAlert
Field | Details |
---|---|
Trigger Condition | The routing failure rate at the Cn32f service is greater than 60 percent. |
Severity | Major |
Alert details provided | Summary: Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP.
Expression: ((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.6 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4015 |
Metric Used | ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total |
Resolution | The alert gets automatically cleared when the failure rate at cn32f microservice goes below 60 percent. |
5.2.9.4 SEPPCn32fHSRoutingFailureAlert
Table 5-32 SEPPCn32fHSRoutingFailureAlert
Field | Details |
---|---|
Trigger Condition | The routing failure rate at the Cn32f service is greater than 65 percent. |
Severity | Critical |
Alert details provided | Summary: Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP.
Expression: ((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.65 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4016 |
Metric Used | ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total |
Resolution | The alert gets automatically cleared when the failure rate at cn32f microservice goes below 65 percent. |
5.2.9.5 SEPPCn32fHSRoutingFailureAlert
Table 5-33 SEPPCn32fHSRoutingFailureAlert
Field | Details |
---|---|
Trigger Condition | The routing failure rate at the Cn32f service is greater than 25 percent. |
Severity | Warning |
Alert details provided | Summary: Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP.
Expression: ((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.25 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4017 |
Metric Used | ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total |
Resolution | The alert gets automatically cleared when the failure rate at cn32f microservice goes below 25 percent. |
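The four SEPPCn32fHSRoutingFailureAlert rules use only a lower bound ("> threshold"), so a high failure ratio fires every tier at or below it at the same time. The expressions also divide raw counter values without rate(). The ratio is therefore cumulative since the counters were last reset, not a recent window. The following Python sketch lists the tiers that fire for hypothetical counter values. The thresholds, severities, and OIDs come from Tables 5-30 to 5-33.

```python
# Sketch: which SEPPCn32fHSRoutingFailureAlert tiers fire for one label group.
# Thresholds, severities, and OIDs come from Tables 5-30 to 5-33.

from typing import List

CN32F_HS_TIERS = [
    (0.25, "Warning", "1.3.6.1.4.1.323.5.3.46.1.2.4017"),
    (0.50, "Minor", "1.3.6.1.4.1.323.5.3.46.1.2.4014"),
    (0.60, "Major", "1.3.6.1.4.1.323.5.3.46.1.2.4015"),
    (0.65, "Critical", "1.3.6.1.4.1.323.5.3.46.1.2.4016"),
]


def firing_severities(p_rss_failures: float, cn32f_requests: float) -> List[str]:
    """Every rule whose '>' threshold is exceeded fires; there is no upper bound."""
    if cn32f_requests == 0:
        return []  # no requests: treated as not firing in this sketch
    ratio = p_rss_failures / cn32f_requests  # cumulative counter values, no rate()
    return [severity for threshold, severity, _oid in CN32F_HS_TIERS
            if ratio > threshold]


print(firing_severities(62, 100))  # ['Warning', 'Minor', 'Major']
```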
5.2.10 SEPP Message Feed Alerts
5.2.10.1 DDUnreachableFromN32IGW
Table 5-34 DDUnreachableFromN32IGW
Field | Details |
---|---|
Trigger Condition | This alert is raised when Data Director is not reachable from the N32 Ingress Gateway. |
Severity | Major |
Alert details provided | Expression: oc_ingressgateway_dd_unreachable{app="n32-ingress-gateway"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4018 |
Metric Used | oc_ingressgateway_dd_unreachable |
Resolution | The alert is cleared automatically when the connection with Data Director is established. |
5.2.10.2 DDUnreachableFromPLMNIGW
Table 5-35 DDUnreachableFromPLMNIGW
Field | Details |
---|---|
Trigger Condition | This alert is raised when Data Director is not reachable from the PLMN Ingress Gateway. |
Severity | Major |
Alert details provided | Expression: oc_ingressgateway_dd_unreachable{app="plmn-ingress-gateway"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4019 |
Metric Used | oc_ingressgateway_dd_unreachable |
Resolution | The alert is cleared automatically when the connection with Data Director is established. |
5.2.10.3 DDUnreachableFromN32EGW
Table 5-36 DDUnreachableFromN32EGW
Field | Details |
---|---|
Trigger Condition | This alert is raised when Data Director is not reachable from the N32 Egress Gateway. |
Severity | Major |
Alert details provided | Expression: oc_egressgateway_dd_unreachable{app="n32-egress-gateway"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4020 |
Metric Used | oc_egressgateway_dd_unreachable |
Resolution | The alert is cleared automatically when the connection with Data Director is established. |
5.2.10.4 DDUnreachableFromPLMNEGW
Table 5-37 DDUnreachableFromPLMNEGW
Field | Details |
---|---|
Trigger Condition | This alert is raised when Data Director is not reachable from the PLMN Egress Gateway. |
Severity | Major |
Alert details provided | Expression: oc_egressgateway_dd_unreachable{app="plmn-egress-gateway"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4021 |
Metric Used | oc_egressgateway_dd_unreachable |
Resolution | The alert is cleared automatically when the connection with Data Director is established. |
5.2.11 Steering of Roaming (SOR) Alerts
5.2.11.1 SEPPPn32fSORFailureAlertPercent30to40
Table 5-38 SEPPPn32fSORFailureAlertPercent30to40
Field | Details |
---|---|
Trigger Condition | 30% to 40% of SOR traffic results in failure. |
Severity | Minor |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Expression: sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)>=0.3 and sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)<0.4 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4022 |
Metric Used | ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total |
Resolution |
This alert is raised when 30% to 40% of SOR responses are failures in the sample collected over the last 2 minutes. Possible Resolutions:
|
5.2.11.2 SEPPPn32fSORFailureAlertPercent40to50
Table 5-39 SEPPPn32fSORFailureAlertPercent40to50
Field | Details |
---|---|
Trigger Condition | 40% to 50% of SOR traffic results in failure. |
Severity | Major |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Expression: sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)>=0.4 and sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)<0.5 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4023 |
Metric Used | ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total |
Resolution |
This alert is raised when 40% to 50% of SOR responses are failures in the sample collected over the last 2 minutes. Possible Resolutions:
|
5.2.11.3 SEPPPn32fSORFailureAlertPercentAbove50
Table 5-40 SEPPPn32fSORFailureAlertPercentAbove50
Field | Details |
---|---|
Trigger Condition | 50% or more of SOR traffic results in failure. |
Severity | Critical |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Expression: sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)>=0.5 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4024 |
Metric Used | ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total |
Resolution |
This alert is raised when 50% or more of SOR responses are failures in the sample collected over the last 2 minutes. Possible Resolutions:
|
5.2.11.4 SEPPPn32fSORTimeoutFailureAlert
Table 5-41 SEPPPn32fSORTimeoutFailureAlert
Field | Details |
---|---|
Trigger Condition | More than five SOR timeout errors in the last two minutes. |
Severity | Critical |
Alert details provided | Summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Expression: idelta(ocsepp_pn32f_sor_timeout_failure_total[2m]) > 5 or (ocsepp_pn32f_sor_timeout_failure_total unless ocsepp_pn32f_sor_timeout_failure_total offset 2m) |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4025 |
Metric Used | ocsepp_pn32f_sor_timeout_failure_total |
Resolution |
This alert is raised when responses received from the SOR server indicate that the server is down or unreachable, with more than five such errors in the sample collected over the last two minutes. Possible Resolutions:
|
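The three SOR failure alerts use non-overlapping ratio bands ([0.3, 0.4), [0.4, 0.5), and 0.5 or more), so at most one of them fires for a given label group. The timeout alert uses idelta(). That function compares only the last two samples in the 2-minute range, not the total count across the two minutes. The following Python sketch models both checks. The rates and samples are hypothetical, and the series passed to the timeout check is assumed to exist now.

```python
# Sketch of the SOR alert checks. Alert names and bands come from
# Tables 5-38 to 5-41; input values are hypothetical.

from typing import Optional, Sequence, Tuple


def sor_failure_alert(failure_rate: float,
                      request_rate: float) -> Optional[Tuple[str, str]]:
    """Map the SOR failure ratio (both rates over [2m]) to an alert, if any."""
    if request_rate == 0:
        return None  # no SOR traffic in the window
    ratio = failure_rate / request_rate
    if ratio >= 0.5:
        return "SEPPPn32fSORFailureAlertPercentAbove50", "Critical"
    if ratio >= 0.4:
        return "SEPPPn32fSORFailureAlertPercent40to50", "Major"
    if ratio >= 0.3:
        return "SEPPPn32fSORFailureAlertPercent30to40", "Minor"
    return None


def sor_timeout_alert_fires(samples: Sequence[float], existed_2m_ago: bool) -> bool:
    """idelta(m[2m]) > 5, or the (currently present) series is new."""
    if not existed_2m_ago:
        return True  # "m unless m offset 2m"
    # idelta: difference between the last two samples only.
    return len(samples) >= 2 and samples[-1] - samples[-2] > 5


print(sor_failure_alert(failure_rate=0.9, request_rate=2.0))  # ('SEPPPn32fSORFailureAlertPercent40to50', 'Major')
print(sor_timeout_alert_fires([10, 12, 19], existed_2m_ago=True))  # True
```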
5.2.12 RSS Based Rate Limiting on Ingress Gateway of SEPP Alerts
5.2.12.1 Ingress RSS Rate Limit per RSS Message Drop Above Point one Percent Alert
Table 5-42 Ingress RSS Rate Limit per RSS Message Drop Above Point one Percent Alert
Field | Details |
---|---|
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and this metric is pegged. The alert is raised when the drop rate per RSS is at or above 0.1 percent of the total transactions of that RSS. |
Severity | Warning |
Alert details provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 0.1 Percent of Total Transactions of that RSS
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 0.1 < 10 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7011 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert is cleared when the drop rate for that RSS falls below 0.1 percent or reaches 10 percent or more. |
5.2.12.2 Ingress RSS Rate Limit per RSS Message Drop Above 10 Percent Alert
Table 5-43 Ingress RSS Rate Limit per RSS Message Drop Above 10 Percent Alert
Field | Details |
---|---|
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and this metric is pegged. The alert is raised when the drop rate per RSS is at or above 10 percent of the total transactions of that RSS. |
Severity | Minor |
Alert Details Provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 10 Percent of Total Transactions of that RSS
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 10 < 25 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7012 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert gets cleared when the drop rate per RSS is no longer between 10 and 25 percent. |
5.2.12.3 Ingress RSS Rate Limit per RSS Message Drop Above 25 Percent Alert
Table 5-44 Ingress RSS Rate Limit per RSS Message Drop Above 25 Percent Alert
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_rss_ratelimit_total metric. If the drop rate per RSS is above 25 percent of the total transactions of that RSS, the corresponding alert is raised. |
Severity | Major |
Alert Details Provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 25 Percent of Total Transactions of that RSS
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 25 < 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7013 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert gets cleared when the drop rate per RSS is no longer between 25 and 50 percent. |
5.2.12.4 Ingress RSS Rate Limit per RSS Message Drop Above 50 Percent Alert
Table 5-45 Ingress RSS Rate Limit per RSS Message Drop Above 50 Percent Alert
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_rss_ratelimit_total metric. If the drop rate per RSS is above 50 percent of the total transactions of that RSS, the corresponding alert is raised. |
Severity | Critical |
Alert Details Provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 50 Percent of Total Transactions of that RSS
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7014 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert gets cleared when the drop rate per RSS falls below 50 percent. |
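The four per-RSS alerts in Tables 5-42 to 5-45 share one ratio and differ only in the percentage band they test. The bands do not overlap, so at most one of these alerts is active for a given Remote_SEPP_Set at a time. The Python sketch below shows how a single drop percentage maps to one severity. It assumes the per-RSS rate() values are already computed. The function names and sample rates are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (lower bound inclusive, upper bound exclusive, severity), from Tables 5-42 to 5-45
PER_RSS_BANDS = [
    (0.1, 10.0, "Warning"),
    (10.0, 25.0, "Minor"),
    (25.0, 50.0, "Major"),
    (50.0, float("inf"), "Critical"),
]


def drop_percent(dropped_rate: float, total_rate: float) -> Optional[float]:
    """dropped / total * 100, as in the alert expressions."""
    if total_rate == 0:
        # Dropped requests are a subset of all requests, so total == 0 means 0/0,
        # which PromQL evaluates to NaN; NaN never satisfies a comparison.
        return None
    return dropped_rate / total_rate * 100


def per_rss_severity(dropped_rate: float, total_rate: float) -> Optional[str]:
    percent = drop_percent(dropped_rate, total_rate)
    if percent is None:
        return None
    for low, high, severity in PER_RSS_BANDS:
        if low <= percent < high:
            return severity
    return None  # below 0.1 percent: no alert


# Hypothetical rates keyed by (Remote_SEPP_Set, namespace): (dropped/s, total/s)
rates: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("rss-a", "sepp"): (30.0, 100.0),
    ("rss-b", "sepp"): (0.0, 500.0),
}
alerts = {key: per_rss_severity(*r) for key, r in rates.items()}
```

Here rss-a lands in the Major band (30 percent), and rss-b raises nothing.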
5.2.12.5 Ingress RSS Rate Limit Message Drop Above Point one Percent Alert
Table 5-46 Ingress RSS Rate Limit Message Drop Above Point one Percent Alert
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_rss_ratelimit_total metric. If the drop rate is above 0.1 percent of total transactions, the corresponding alert is raised. |
Severity | Warning |
Alert Details Provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 0.1 Percent of Total Transactions
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (namespace) *100 >= 0.1 < 1 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7015 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert gets cleared when the drop rate is no longer between 0.1 and 1 percent. |
5.2.12.6 Ingress RSS Rate Limit Message Drop Above one Percent Alert
Table 5-47 Ingress RSS Rate Limit Message Drop Above one Percent Alert
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_rss_ratelimit_total metric. If the drop rate is above 1 percent of total transactions, the corresponding alert is raised. |
Severity | Warning |
Alert Details Provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 1 Percent of Total Transactions
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (namespace) *100 >= 1 < 10 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7016 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert gets cleared when the drop rate is no longer between 1 and 10 percent. |
5.2.12.7 Ingress RSS Rate Limit Message Drop Above 10 Percent Alert
Table 5-48 Ingress RSS Rate Limit Message Drop Above 10 Percent Alert
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_rss_ratelimit_total metric. If the drop rate is above 10 percent of total transactions, the corresponding alert is raised. |
Severity | Minor |
Alert Details Provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 10 Percent of Total Transactions
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 10 < 25 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7017 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert gets cleared when the drop rate is no longer between 10 and 25 percent. |
5.2.12.8 Ingress RSS Rate Limit Message Drop Above 25 Percent Alert
Table 5-49 Ingress RSS Rate Limit Message Drop Above 25 Percent Alert
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_rss_ratelimit_total metric. If the drop rate is above 25 percent of total transactions, the corresponding alert is raised. |
Severity | Major |
Alert Details Provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 25 Percent of Total Transactions
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 25 < 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7018 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert gets cleared when the drop rate is no longer between 25 and 50 percent. |
5.2.12.9 Ingress RSS Rate Limit Message Drop Above 50 Percent Alert
Table 5-50 Ingress RSS Rate Limit Message Drop Above 50 Percent Alert
Trigger Condition | When all the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_rss_ratelimit_total metric. If the drop rate is above 50 percent of total transactions, the corresponding alert is raised. |
Severity | Critical |
Alert Details Provided |
Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 50 Percent of Total Transactions
Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.7019 |
Metric Name | oc_ingressgateway_rss_ratelimit_total |
Resolution | The alert gets cleared when the drop rate falls below 50 percent. |
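Tables 5-46 and 5-47 aggregate by namespace instead of by (Remote_SEPP_Set,namespace). Because sum() is applied to the numerator and the denominator separately, the namespace-wide percentage is weighted by traffic rather than averaged across RSSs. As a result, one heavily throttled RSS can raise a Critical per-RSS alert while the namespace-wide percentage stays low.

The Python sketch below illustrates this. It assumes the per-RSS rates are already available. The function names and sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Bands taken from the titles of Tables 5-46 to 5-50 (lower inclusive, upper exclusive)
OVERALL_BANDS = [
    (0.1, 1.0, "Warning"),
    (1.0, 10.0, "Warning"),
    (10.0, 25.0, "Minor"),
    (25.0, 50.0, "Major"),
    (50.0, float("inf"), "Critical"),
]


def namespace_drop_percent(per_rss: Dict[str, Tuple[float, float]]) -> Optional[float]:
    """Sum numerators and denominators first, then divide, like sum(...) by (namespace)."""
    dropped = sum(d for d, _ in per_rss.values())
    total = sum(t for _, t in per_rss.values())
    if total == 0:
        return None
    return dropped / total * 100


def overall_severity(percent: Optional[float]) -> Optional[str]:
    if percent is None:
        return None
    for low, high, severity in OVERALL_BANDS:
        if low <= percent < high:
            return severity
    return None


# rss-a drops half of its traffic; rss-b drops nothing but carries 9x the volume.
per_rss = {"rss-a": (50.0, 100.0), "rss-b": (0.0, 900.0)}
percent = namespace_drop_percent(per_rss)  # 50 / 1000 * 100
```

In this example rss-a alone would be at 50 percent, but the namespace-wide figure is only 5 percent.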
5.2.13 Cat-0 SBI Message Schema Validation Alerts
5.2.13.1 SEPPN32fMessageValidationOnHeaderFailureMinorAlert
Table 5-51 SEPPN32fMessageValidationOnHeaderFailureMinorAlert
Field | Details |
---|---|
Trigger Condition | Message validation failed for request query parameters for 40 percent or more (but less than 60 percent) of the requests on which message validation was applied, in the last 2 minutes. |
Severity | Minor |
Alert Details Provided |
Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}
Expression: (sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 40 < 60 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4026 |
Metric Used | ocsepp_message_validation_on_header_failure_total |
Resolution | The alert gets cleared when the failure percentage is no longer between 40 and 60. |
5.2.13.2 SEPPN32fMessageValidationOnHeaderFailureMajorAlert
Table 5-52 SEPPN32fMessageValidationOnHeaderFailureMajorAlert
Field | Details |
---|---|
Trigger Condition | Message validation failed for request query parameters for 60 percent or more (but less than 80 percent) of the requests on which message validation was applied, in the last 2 minutes. |
Severity | Major |
Alert Details Provided |
Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}
Expression: (sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 60 < 80 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4027 |
Metric Name | ocsepp_message_validation_on_header_failure_total |
Resolution | The alert gets cleared when the failure percentage is no longer between 60 and 80. Possible Resolutions:
|
5.2.13.3 SEPPN32fMessageValidationOnHeaderFailureCriticalAlert
Table 5-53 SEPPN32fMessageValidationOnHeaderFailureCriticalAlert
Field | Details |
---|---|
Trigger Condition | Message validation failed for request query parameters for 80 percent or more of the requests on which message validation was applied, in the last 2 minutes. |
Severity | Critical |
Alert Details Provided |
Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}
Expression: (sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 80 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4028 |
Metric Name | ocsepp_message_validation_on_header_failure_total |
Resolution | The alert gets cleared when the failure percentage falls below 80. |
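The three header-validation alerts in Tables 5-51 to 5-53 fire for disjoint ranges of one failure percentage. The body-validation alerts that follow in Tables 5-54 to 5-56 use the same 40, 60, and 80 percent bands. This explains the Resolution rows: when the percentage moves into the next band, the lower-severity alert clears and the next one fires.

Below is a minimal Python sketch of that behavior. It assumes the rate() values are precomputed for one (app, pod, namespace, nf_instance_id) group. The helper name is hypothetical.

```python
from typing import Optional

# Alert name per band for the header-validation alerts (lower inclusive, upper exclusive)
HEADER_ALERTS = [
    (40.0, 60.0, "SEPPN32fMessageValidationOnHeaderFailureMinorAlert"),
    (60.0, 80.0, "SEPPN32fMessageValidationOnHeaderFailureMajorAlert"),
    (80.0, float("inf"), "SEPPN32fMessageValidationOnHeaderFailureCriticalAlert"),
]


def active_header_alert(failure_rate: float, applied_rate: float) -> Optional[str]:
    """Return the single header-validation alert whose band contains the failure percentage."""
    if applied_rate == 0:
        return None  # no validated requests in the window, so there is no ratio to compare
    percent = failure_rate / applied_rate * 100
    for low, high, name in HEADER_ALERTS:
        if low <= percent < high:
            return name
    return None


# Failure percentage rising from 50 to 70: the Minor alert clears and the Major alert fires.
before = active_header_alert(5.0, 10.0)
after = active_header_alert(7.0, 10.0)
```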
5.2.13.4 SEPPN32fMessageValidationOnBodyFailureMinorAlert
Table 5-54 SEPPN32fMessageValidationOnBodyFailureMinorAlert
Field | Details |
---|---|
Trigger Condition | Message validation failed for the request body for 40 percent or more (but less than 60 percent) of the requests on which message validation was applied, in the last 2 minutes. |
Severity | Minor |
Alert Details Provided |
Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}
Expression: (sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 40 < 60 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4029 |
Metric Name | ocsepp_message_validation_on_body_failure_total |
Resolution | The alert gets cleared when the failure percentage is no longer between 40 and 60. |
5.2.13.5 SEPPN32fMessageValidationOnBodyFailureMajorAlert
Table 5-55 SEPPN32fMessageValidationOnBodyFailureMajorAlert
Field | Details |
---|---|
Trigger Condition | Message validation failed for the request body for 60 percent or more (but less than 80 percent) of the requests on which message validation was applied, in the last 2 minutes. |
Severity | Major |
Alert Details Provided |
Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}
Expression: (sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 60 < 80 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4030 |
Metric Name | ocsepp_message_validation_on_body_failure_total |
Resolution | The alert gets cleared when the failure percentage is no longer between 60 and 80. |
5.2.13.6 SEPPN32fMessageValidationOnBodyFailureCriticalAlert
Table 5-56 SEPPN32fMessageValidationOnBodyFailureCriticalAlert
Field | Details |
---|---|
Trigger Condition | Message validation failed for the request body for 80 percent or more of the requests on which message validation was applied, in the last 2 minutes. |
Severity | Critical |
Alert Details Provided |
Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}
Expression: (sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 80 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4031 |
Metric Name | ocsepp_message_validation_on_body_failure_total |
Resolution | The alert gets cleared when the failure percentage falls below 80. |
5.2.14 Cat-1 Service API Validation Alerts
5.2.14.1 SEPPN32fServiceApiValidationFailureAlert
Table 5-57 SEPPN32fServiceApiValidationFailureAlert
Trigger Condition | Service API not in allowed list |
Severity | Major |
Alert details provided | Summary: N32f : Service API not in allowed list
Expression: delta(ocsepp_security_service_api_failure_total[2m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4005 |
Metric Used | ocsepp_security_service_api_failure_total |
Resolution 1 |
This alert is raised when there is a difference of at least 1 between the first and last data points in the samples collected over the last 2 minutes. The alert is cleared after 2 minutes. Possible Resolutions:
|
Resolution 2 |
The alert gets cleared when the N32C handshake is established after a successful TCP connection to the remote SEPP. Steps: The failure reason is present in the alert. Possible Resolutions:
|
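Resolution 1 describes delta(ocsepp_security_service_api_failure_total[2m]) > 0 as a difference between the first and last data points in the 2-minute window. The Python sketch below follows that description literally. It is a simplification: Prometheus delta() also extrapolates the difference to the edges of the window, so its result can be fractional. The function names and counter values are hypothetical.

```python
from typing import Optional, Sequence


def first_last_difference(values: Sequence[float]) -> Optional[float]:
    """Last sample minus first sample in the window; None with fewer than two samples."""
    if len(values) < 2:
        return None
    return values[-1] - values[0]


def service_api_alert_fires(values: Sequence[float]) -> bool:
    """Simplified check for delta(ocsepp_security_service_api_failure_total[2m]) > 0."""
    diff = first_last_difference(values)
    return diff is not None and diff > 0


# Counter values scraped during the last 2 minutes (hypothetical).
print(service_api_alert_fires([4.0, 4.0, 5.0]))  # True: one new rejected request
print(service_api_alert_fires([4.0, 4.0, 4.0]))  # False: nothing new, so the alert clears
```

This also shows why the alert clears on its own after 2 minutes: once the last rejected request drops out of the window, the first and last samples are equal again.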
5.2.15 Cat-2 Network ID Validation Alerts
5.2.15.1 SEPPN32fNetworkIDValidationHeaderFailureAlert
Table 5-58 SEPPN32fNetworkIDValidationHeaderFailureAlert
Field | Details |
---|---|
Trigger Condition | If Network ID Validation fails for a header, the ocsepp_network_id_validation_header_failure_total metric is pegged and the corresponding alert is raised. |
Severity | Major |
Alert details provided | Summary: 'namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Network ID Validation has failed because {{ $labels.cause }}'
Expression: sum(increase(ocsepp_network_id_validation_header_failure_total[2m]) >0 or (ocsepp_network_id_validation_header_failure_total unless ocsepp_network_id_validation_header_failure_total offset 2m )) by (namespace, remote_sepp_name, nf_instance_id, peer_fqdn, plmn_identifier, app, resource_uri, pod) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4011 |
Metric Used | ocsepp_network_id_validation_header_failure_total |
Resolution | The alert gets cleared when no new failures are counted in the last 2 minutes. |
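The Network ID validation expressions (Tables 5-58 and 5-59) pair increase(...[2m]) > 0 with (x unless x offset 2m). The second clause matters because increase() needs at least two samples. Without it, the first failure that creates a new counter series would go unreported.

The Python sketch below models both clauses for one label set. It makes these simplifications:

- It ignores Prometheus extrapolation.
- It treats any drop in value as a counter reset.
- It uses a hypothetical existed_2m_ago flag in place of the offset lookup.

The function names are also hypothetical.

```python
from typing import Optional, Sequence


def simple_increase(values: Sequence[float]) -> Optional[float]:
    """Sum of increases across the window, treating any decrease as a counter reset."""
    if len(values) < 2:
        return None
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # After a reset the counter restarts from 0, so the new value is the increase.
        total += cur - prev if cur >= prev else cur
    return total


def network_id_alert_fires(values: Sequence[float], existed_2m_ago: bool) -> bool:
    """Simplified: sum(increase(x[2m]) > 0 or (x unless x offset 2m)) by (...) > 0"""
    inc = simple_increase(values)
    if inc is not None and inc > 0:
        return True
    if values and not existed_2m_ago:
        # New series: its current value is used and then compared by the outer '> 0'.
        return values[-1] > 0
    return False


# The very first failure creates the series with value 1 and a single sample.
print(network_id_alert_fires([1.0], existed_2m_ago=False))
```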
5.2.15.2 SEPPN32fNetworkIDValidationBodyIEFailureAlert
Table 5-59 SEPPN32fNetworkIDValidationBodyIEFailureAlert
Field | Details |
---|---|
Trigger Condition | If Network ID Validation fails for the message body, the ocsepp_network_id_validation_body_failure_total metric is pegged and the corresponding alert is raised. |
Severity | Major |
Alert details provided | Summary: 'namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Network ID Body Validation has failed because {{ $labels.cause }}'
Expression: sum(increase(ocsepp_network_id_validation_body_failure_total[2m]) >0 or (ocsepp_network_id_validation_body_failure_total unless ocsepp_network_id_validation_body_failure_total offset 2m )) by (namespace, remote_sepp_name, nf_instance_id, peer_fqdn, plmn_identifier, app, resource_uri, pod) > 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4012 |
Metric Used | ocsepp_network_id_validation_body_failure_total |
Resolution | The alert gets cleared when no new failures are counted in the last 2 minutes. |
5.2.16 Cat-3 Previous Location Check Alerts
5.2.16.1 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent30to40
Table 5-60 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent30to40
Trigger Condition | This alert is raised when previous location check validation failures make up between 30 and 40 percent of total transactions. |
Severity | Minor |
Alert Details Provided |
Summary: Previous location check validation failure detected between 30 to 40 Percent of Total Transactions
Expression: sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.3 and sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.4 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4032 |
Metric Name | ocsepp_previous_location_validation_failure_total |
Resolution | The alert gets cleared when previous location check validation failures no longer lie between 30 and 40 percent of total transactions. |
5.2.16.2 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent40to50
Table 5-61 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent40to50
Trigger Condition | This alert is raised when previous location check validation failures make up between 40 and 50 percent of total transactions. |
Severity | Major |
Alert Details Provided |
Summary: Previous location check validation failure detected between 40 to 50 Percent of Total Transactions
Expression: sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.4 and sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.5 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4033 |
Metric Name | ocsepp_previous_location_validation_failure_total |
Resolution | The alert gets cleared when previous location check validation failures no longer lie between 40 and 50 percent of total transactions. |
5.2.16.3 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercentAbove50
Table 5-62 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercentAbove50
Trigger Condition | This alert is raised when previous location check validation failures make up more than 50 percent of total transactions. |
Severity | Critical |
Alert Details Provided |
Summary: Previous location check validation failure detected above 50 Percent of Total Transactions
Expression: sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.5 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4034 |
Metric Name | ocsepp_previous_location_validation_failure_total |
Resolution | The alert gets cleared when previous location check validation failures fall below 50 percent of total transactions. |
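Unlike the percentage-based alerts elsewhere in this chapter, the previous location check expressions compare a plain fraction (0.3, 0.4, 0.5). They also express each band as two comparisons joined by "and". The Python sketch below shows the equivalent range check. It assumes precomputed rates for one namespace, and the function name is hypothetical.

The same bands appear to apply to the exception-failure alerts in Tables 5-63 to 5-65, which share the ocsepp_previous_location_validation_requests_total denominator.

```python
from typing import Optional


def previous_location_severity(failure_rate: float, request_rate: float) -> Optional[str]:
    """Map failure_rate / request_rate (a fraction, not a percentage) to a severity."""
    if request_rate == 0:
        return None
    ratio = failure_rate / request_rate
    if 0.3 <= ratio < 0.4:
        return "Minor"     # ...AlertPercent30to40
    if 0.4 <= ratio < 0.5:
        return "Major"     # ...AlertPercent40to50
    if ratio >= 0.5:
        return "Critical"  # ...AlertPercentAbove50
    return None


print(previous_location_severity(35.0, 100.0))
```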
5.2.16.4 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent30to40
Table 5-63 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent30to40
Trigger Condition | This alert is raised when previous location check exception failures make up between 30 and 40 percent of total transactions. |
Severity | Minor |
Alert Details Provided |
Summary: Previous location check exception failure detected between 30 to 40 Percent of Total Transactions
Expression: sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.3 and sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.4 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4035 |
Metric Name | ocsepp_previous_location_exception_failure_total |
Resolution | The alert gets cleared when previous location check exception failures no longer lie between 30 and 40 percent of total transactions. |
5.2.16.5 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent40to50
Table 5-64 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent40to50
Trigger Condition | This alert is raised when previous location check exception failures make up between 40 and 50 percent of total transactions. |
Severity | Major |
Alert Details Provided |
Summary: Previous location check exception failure detected between 40 to 50 Percent of Total Transactions
Expression: sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.4 and sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.5 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4036 |
Metric Name | ocsepp_previous_location_exception_failure_total |
Resolution | The alert gets cleared when previous location check exception failures no longer lie between 40 and 50 percent of total transactions. |
5.2.16.6 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercentAbove50
Table 5-65 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercentAbove50
Trigger Condition | This alert is raised when previous location check exception failures make up more than 50 percent of total transactions. |
Severity | Critical |
Alert Details Provided |
Summary: Previous location check exception failure detected above 50 Percent of Total Transactions
Expression: sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.5 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4037 |
Metric Name | ocsepp_previous_location_exception_failure_total |
Resolution | The alert gets cleared when previous location check exception failures fall below 50 percent of total transactions. |
5.2.17 Cat-3 Time Check for Roaming Subscribers
5.2.17.1 pn32fTimeUnauthLocChkValFailAlrtMinor
Table 5-66 pn32fTimeUnauthLocChkValFailAlrtMinor
Field | Details |
---|---|
Trigger Condition | Triggered in case of a minor failure for Cat-3 Time Unauthenticated Location Check, that is, when 1 to 10 validation failures are counted in the last 2 minutes. |
Severity | Minor |
Alert Details Provided |
Summary: namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Expression: sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) >= 1 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) <= 10 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4055 |
Metric Name | ocsepp_time_unauthenticated_location_validation_failure_total |
Resolution | The alert gets cleared when the failure count in the last 2 minutes is no longer between 1 and 10. |
5.2.17.2 pn32fTimeUnauthLocChkValFailAlrtMajor
Table 5-67 pn32fTimeUnauthLocChkValFailAlrtMajor
Field | Details |
---|---|
Trigger Condition | Triggered in case of a major failure for Cat-3 Time Unauthenticated Location Check, that is, when 11 to 50 validation failures are counted in the last 2 minutes. |
Severity | Major |
Alert Details Provided |
Summary: namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Expression: sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) >= 11 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) <= 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4056 |
Metric Name | ocsepp_time_unauthenticated_location_validation_failure_total |
Resolution | The alert gets cleared when the failure count in the last 2 minutes is no longer between 11 and 50. |
5.2.17.3 pn32fTimeUnauthLocChkValFailAlrtCritical
Table 5-68 pn32fTimeUnauthLocChkValFailAlrtCritical
Field | Details |
---|---|
Trigger Condition | Triggered in case of a critical failure for Cat-3 Time Unauthenticated Location Check. |
Severity | Critical |
Alert Details Provided |
Summary: namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Expression:
|
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4057 |
Metric Name | ocsepp_time_unauthenticated_location_validation_failure_total |
Resolution | The alert gets cleared when the failure count falls below 51. |
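The time-check alerts use absolute failure counts over 2 minutes rather than percentages: 1 to 10 for Minor and 11 to 50 for Major. Table 5-68 does not show the Critical expression, so the sketch below assumes a threshold of 51 or more, based on that table's Resolution row.

Because increase() extrapolates, it can return fractional values. A value such as 10.5 falls between the Minor and Major ranges and matches neither rule. The Python function name is hypothetical.

```python
from typing import Optional


def time_check_severity(failures_in_2m: float) -> Optional[str]:
    """Classify an increase(...[2m]) value using the ranges from Tables 5-66 to 5-68."""
    if 1 <= failures_in_2m <= 10:
        return "Minor"
    if 11 <= failures_in_2m <= 50:
        return "Major"
    if failures_in_2m >= 51:  # assumption: the Critical expression is not shown in Table 5-68
        return "Critical"
    return None  # includes gaps such as 10.5 or 50.5


for value in (0, 3, 10.5, 20, 75):
    print(value, time_check_severity(value))
```

The same count bands appear to apply to the exception-failure alerts in Tables 5-69 to 5-71.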
5.2.17.4 pn32fTimeUnauthLocChkExcepFailAlrtMinor
Table 5-69 pn32fTimeUnauthLocChkExcepFailAlrtMinor
Field | Details |
---|---|
Trigger Condition | Triggered in case of a minor exception for Cat-3 Time Unauthenticated Location Check, that is, when 1 to 10 exception failures are counted in the last 2 minutes. |
Severity | Minor |
Alert Details Provided |
Summary: namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Expression: sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) >= 1 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) <= 10 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4058 |
Metric Name | ocsepp_time_unauthenticated_location_exception_failure_total |
Resolution | The alert gets cleared when the exception count in the last 2 minutes is no longer between 1 and 10. |
5.2.17.5 pn32fTimeUnauthLocChkExcepFailAlrtMajor
Table 5-70 pn32fTimeUnauthLocChkExcepFailAlrtMajor
Field | Details |
---|---|
Trigger Condition | Triggered in case of a major exception for Cat-3 Time Unauthenticated Location Check, that is, when 11 to 50 exception failures are counted in the last 2 minutes. |
Severity | Major |
Alert Details Provided |
Summary: namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Expression: sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) >= 11 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) <= 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4059 |
Metric Name | ocsepp_time_unauthenticated_location_exception_failure_total |
Resolution | The alert gets cleared when the exception count in the last 2 minutes is no longer between 11 and 50. |
5.2.17.6 pn32fTimeUnauthLocChkExcepFailAlrtCritical
Table 5-71 pn32fTimeUnauthLocChkExcepFailAlrtCritical
Field | Details |
---|---|
Trigger Condition | Triggered in case of a critical exception for Cat-3 Time Unauthenticated Location Check. |
Severity | Critical |
Alert Details Provided |
Summary: namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Expression:
|
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4060 |
Metric Name | ocsepp_time_unauthenticated_location_exception_failure_total |
Resolution | The alert gets cleared when the exception count falls below 51. |
5.2.18 Rate Limiting for Egress Roaming Signaling per PLMN Alerts
5.2.18.1 Egress Request Rate Limit per PLMN Message Drop Above 10 Percent Alert
Table 5-72 Egress Request Rate Limit per PLMN Message Drop Above 10 Percent Alert
Trigger Condition | When the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_plmn_egress_ratelimit_total metric. If the drop rate per PLMN is above 10 percent of the total transactions of that PLMN, the corresponding alert is raised. |
Severity | Minor |
Alert Details Provided |
Summary: Egress Rate Limiting Request Drop Rate detected per PLMN above 10 Percent of Total Transactions
Expression: sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 10 < 25 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4039 |
Metric Name | oc_ingressgateway_plmn_egress_ratelimit_total |
Resolution | The alert gets cleared when the drop rate per PLMN is no longer between 10 and 25 percent. |
5.2.18.2 Egress Request Rate Limit per PLMN Message Drop Above 25 Percent Alert
Table 5-73 Egress Request Rate Limit per PLMN Message Drop Above 25 Percent Alert
Trigger Condition | When the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_plmn_egress_ratelimit_total metric. If the drop rate per PLMN is above 25 percent of the total transactions of that PLMN, the corresponding alert is raised. |
Severity | Major |
Alert Details Provided |
Summary: Egress Rate Limiting Request Drop Rate detected per PLMN above 25 Percent of Total Transactions
Expression: sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 25 < 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4040 |
Metric Name | oc_ingressgateway_plmn_egress_ratelimit_total |
Resolution | The alert gets cleared when the drop rate per PLMN is no longer between 25 and 50 percent. |
5.2.18.3 Egress Request Rate Limit per PLMN Message Drop Above 50 Percent Alert
Table 5-74 Egress Request Rate Limit per PLMN Message Drop Above 50 Percent Alert
Trigger Condition | When the tokens in the bucket are exhausted, requests are dropped and pegged in the oc_ingressgateway_plmn_egress_ratelimit_total metric. If the drop rate per PLMN is above 50 percent of the total transactions of that PLMN, the corresponding alert is raised. |
Severity | Critical |
Alert Details Provided |
Summary: Egress Rate Limiting Request Drop Rate detected per PLMN above 50 Percent of Total Transactions
Expression: sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 50 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4041 |
Metric Name | oc_ingressgateway_plmn_egress_ratelimit_total |
Resolution | The alert gets cleared when the drop rate per PLMN falls below 50 percent. |
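The egress per-PLMN expressions group by EgressRateLimitList, PLMN_ID, and namespace. They count as drops only the series whose Status label is ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT. The Python sketch below reproduces that grouping on a list of (labels, rate) pairs. The input format, the "ALLOWED" status value, and the function names are hypothetical. One difference from PromQL is noted in a comment.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

REJECT_STATUS = "ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"
GROUP_LABELS = ("EgressRateLimitList", "PLMN_ID", "namespace")
# Bands from the titles of Tables 5-72 to 5-74 (lower inclusive, upper exclusive)
PLMN_BANDS = [(10.0, 25.0, "Minor"), (25.0, 50.0, "Major"), (50.0, float("inf"), "Critical")]

Series = Tuple[Dict[str, str], float]  # (label set, per-second rate)


def plmn_drop_percentages(series: List[Series]) -> Dict[Tuple[str, ...], float]:
    dropped: Dict[Tuple[str, ...], float] = defaultdict(float)
    total: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, rate in series:
        key = tuple(labels.get(name, "") for name in GROUP_LABELS)
        total[key] += rate
        if labels.get("Status") == REJECT_STATUS:
            dropped[key] += rate
    # PromQL returns nothing for groups without any rejected series;
    # this sketch reports 0.0 for them instead.
    return {key: dropped[key] / t * 100 for key, t in total.items() if t > 0}


def plmn_severity(percent: float) -> Optional[str]:
    for low, high, severity in PLMN_BANDS:
        if low <= percent < high:
            return severity
    return None


base = {"EgressRateLimitList": "erl-1", "PLMN_ID": "001-01", "namespace": "sepp"}
sample = [({**base, "Status": REJECT_STATUS}, 30.0), ({**base, "Status": "ALLOWED"}, 70.0)]
percentages = plmn_drop_percentages(sample)
```

With 30 rejected requests out of 100 per second, this group falls in the 25 to 50 percent (Major) band.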
5.2.19 Separate Port Configurations for N32c and N32f on the Egress Routes Alerts
5.2.19.1 EgressInterfaceConnectionFailure
Table 5-75 EgressInterfaceConnectionFailure
Field | Details |
---|---|
Trigger Condition | This alert is raised when the destination host and port configured in the Remote profile are unreachable or unavailable. |
Severity | Major |
Alert Details Provided |
Summary: Egress connection failure on the interface
Expression: sum(increase(oc_egressgateway_connection_failure_total{app="n32-egress-gateway"}[5m])) by (namespace,app,Host,Port) >0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4042 |
Metric Name | oc_egressgateway_connection_failure_total |
Resolution | The alert gets cleared when the destination host and port become reachable. |
5.2.20 Support for TLS 1.3
5.2.20.1 SEPPConnectionFailurePLMNIGWAlert
Table 5-76 SEPPConnectionFailurePLMNIGWAlert
Field | Details |
---|---|
Trigger Condition | Connection failure occurs for incoming traffic at PLMN Ingress Gateway |
Severity | Major |
Alert details provided |
Summary:
Expression:
|
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4043 |
Metric used | oc_ingressgateway_connection_failure_total |
Resolution | This alert is removed after the cause of the connection failure is resolved. |
5.2.20.2 SEPPConnectionFailureN32IGWAlert
Table 5-77 SEPPConnectionFailureN32IGWAlert
Field | Details |
---|---|
Trigger Condition | A connection failure occurs for incoming traffic at the N32 Ingress Gateway. |
Severity | Major |
Alert details provided |
Summary:
Expression:
|
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4044 |
Metric used | oc_ingressgateway_connection_failure_total |
Resolution | The alert is cleared after the cause of the connection failure is resolved. |
5.2.20.3 SEPPX509CertificateExpiryAlertMinor
Table 5-78 SEPPX509CertificateExpiryAlertMinor
Field | Details |
---|---|
Trigger Condition | The TLS certificate has only 6 months of validity remaining before it expires. |
Severity | Minor |
Alert details provided |
Summary:
Expression:
|
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4045 |
Metric used | security_cert_x509_expiration_seconds |
Resolution | The alert is cleared only after the certificates are updated. |
5.2.20.4 SEPPX509CertificateExpiryAlertMajor
Table 5-79 SEPPX509CertificateExpiryAlertMajor
Field | Details |
---|---|
Trigger Condition | The TLS certificate has only 3 months of validity remaining before it expires. |
Severity | Major |
Alert details provided |
Summary:
Expression:
|
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4046 |
Metric used | security_cert_x509_expiration_seconds |
Resolution | The alert is cleared only after the certificates are updated. |
5.2.20.5 SEPPX509CertificateExpiryAlertCritical
Table 5-80 SEPPX509CertificateExpiryAlertCritical
Field | Details |
---|---|
Trigger Condition | The TLS certificate has only 1 month of validity remaining before it expires. |
Severity | Critical |
Alert details provided |
Summary:
Expression:
|
OID | |
Metric used | security_cert_x509_expiration_seconds |
Resolution | The alert is cleared only after the certificates are updated. |
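The three certificate expiry alerts differ only in how much validity remains: 6, 3, or 1 month. Their expressions are not reproduced in this section, so the following Python sketch illustrates the tiering under two stated assumptions. First, security_cert_x509_expiration_seconds is assumed to carry the certificate's expiry time as a Unix timestamp. Second, a month is approximated as 30 days. Either assumption may not match the shipped alert rules. The sketch also returns only the most severe matching level, whereas the actual rules may leave several alerts active at once.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400

# Assumption: a "month" is approximated as 30 days. The real thresholds are
# defined in the SEPP alert rules and may differ.
THRESHOLDS = [  # (remaining-validity limit in seconds, severity), most severe first
    (1 * 30 * SECONDS_PER_DAY, "Critical"),
    (3 * 30 * SECONDS_PER_DAY, "Major"),
    (6 * 30 * SECONDS_PER_DAY, "Minor"),
]


def expiry_severity(expiry_timestamp: float, now: float) -> Optional[str]:
    """Most severe alert level for a certificate, or None if no alert applies.

    Assumes the metric value is the certificate's expiry time as a Unix
    timestamp (hypothetical interpretation, not confirmed by this guide).
    """
    remaining = expiry_timestamp - now
    for limit, severity in THRESHOLDS:
        if remaining <= limit:
            return severity
    return None


# A certificate that expires 100 days from now falls in the 6-month tier.
print(expiry_severity(100 * SECONDS_PER_DAY, now=0))  # Minor
```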
5.2.21 Multiple SEPP Instances on Shared cnDBTier Cluster Alerts
5.2.21.1 Cn32fConnectionFailureWithDatabaseAlert
Table 5-81 Cn32fConnectionFailureWithDatabaseAlert
Field | Details |
---|---|
Trigger Condition | ocsepp_cn32f_database_connectivity_healthy == 0 |
Severity | Major |
Alert Details Provided |
Summary: The alert is raised when connectivity between CN32f and cnDBTier is broken. When connectivity breaks, the metric value is pegged as 0.
Expression: ocsepp_cn32f_database_connectivity_healthy == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4050 |
Metric Name | ocsepp_cn32f_database_connectivity_healthy |
Resolution | Restore the connectivity between SEPP and cnDBTier. |
5.2.21.2 Cn32cConnectionFailureWithDatabaseAlert
Table 5-82 Cn32cConnectionFailureWithDatabaseAlert
Field | Details |
---|---|
Trigger Condition | ocsepp_cn32c_database_connectivity_healthy == 0 |
Severity | Major |
Alert Details Provided |
Summary: The alert is raised when connectivity between CN32c and cnDBTier is broken for more than 30 seconds. When connectivity breaks, the metric value is pegged as 0.
Expression: ocsepp_cn32c_database_connectivity_healthy == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4051 |
Metric Name | ocsepp_cn32c_database_connectivity_healthy |
Resolution | Restore the connectivity between SEPP and cnDBTier. |
5.2.21.3 Pn32fConnectionFailureWithDatabaseAlert
Table 5-83 Pn32fConnectionFailureWithDatabaseAlert
Field | Details |
---|---|
Trigger Condition | ocsepp_pn32f_database_connectivity_healthy == 0 |
Severity | Major |
Alert Details Provided |
Summary: The alert is raised when connectivity between PN32F and cnDBTier is broken for more than 30 seconds. When connectivity breaks, the metric value is pegged as 0.
Expression: ocsepp_pn32f_database_connectivity_healthy == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4052 |
Metric Name | ocsepp_pn32f_database_connectivity_healthy |
Resolution | Restore the connectivity between SEPP and cnDBTier. |
5.2.21.4 Pn32cConnectionFailureWithDatabaseAlert
Table 5-84 Pn32cConnectionFailureWithDatabaseAlert
Field | Details |
---|---|
Trigger Condition | ocsepp_pn32c_database_connectivity_healthy == 0 |
Severity | Major |
Alert Details Provided |
Summary: The alert is raised when connectivity between PN32C and cnDBTier is broken for more than 30 seconds. When connectivity breaks, the metric value is pegged as 0.
Expression: ocsepp_pn32c_database_connectivity_healthy == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4053 |
Metric Name | ocsepp_pn32c_database_connectivity_healthy |
Resolution | Restore the connectivity between SEPP and cnDBTier. |
5.2.21.5 ConfigManagerConnectionFailureWithDatabaseAlert
Table 5-85 ConfigManagerConnectionFailureWithDatabaseAlert
Field | Details |
---|---|
Trigger Condition | ocsepp_configmgr_database_connectivity_healthy == 0 |
Severity | Major |
Alert Details Provided |
Summary: The alert is raised when connectivity between Config Manager and cnDBTier is broken for more than 30 seconds. When connectivity breaks, the metric value is pegged as 0.
Expression: ocsepp_configmgr_database_connectivity_healthy == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4054 |
Metric Name | ocsepp_configmgr_database_connectivity_healthy |
Resolution | Restore the connectivity between SEPP and cnDBTier. |
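Each connectivity alert fires when its ocsepp_*_database_connectivity_healthy metric is 0. The summaries for CN32c, PN32F, PN32C, and Config Manager add that the condition must persist for more than 30 seconds. The following Python sketch models that behavior as a pending period. It assumes, without confirmation from this section, that the 30 seconds is implemented as the alert rule's `for` duration and that samples arrive as (timestamp, value) pairs. Prometheus moves a pending alert to firing once the condition has held for the `for` duration.

```python
def connectivity_alert_fires(samples, for_seconds=30.0):
    """Return True if the metric has been 0 continuously for at least `for_seconds`.

    `samples` is a time-ordered list of (timestamp_seconds, value) pairs for one
    ocsepp_*_database_connectivity_healthy series (1 = healthy, 0 = broken).
    The decision is made at the timestamp of the last sample.
    """
    if not samples:
        return False
    broken_since = None
    for ts, value in samples:
        if value == 0:
            if broken_since is None:
                broken_since = ts  # condition "== 0" just became true
        else:
            broken_since = None  # connectivity restored: reset the pending timer
    if broken_since is None:
        return False
    return samples[-1][0] - broken_since >= for_seconds


# Broken since t=15 s and still broken at t=45 s: 30 s have elapsed.
print(connectivity_alert_fires([(0, 1), (15, 0), (30, 0), (45, 0)]))  # True
# Briefly restored at t=15 s, so the timer restarts at t=30 s.
print(connectivity_alert_fires([(0, 0), (15, 1), (30, 0), (45, 0)]))  # False
```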
5.2.21.6 Cn32fIncorrectDatabaseConfigurationAlert
Table 5-86 Cn32fIncorrectDatabaseConfigurationAlert
Field | Details |
---|---|
Trigger Condition | The alert is raised when an incorrect database configuration is provided for the cn32f service, resulting in a connection failure with the database. |
Severity | Major |
Alert Details Provided |
Summary: The connection with the database failed due to an incorrect database configuration.
Expression: (up{app="cn32f-svc"} unless on (namespace) absent(hikaricp_connections{app="cn32f-svc"})) == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4057 |
Metric Name | NA |
Resolution | Configure correct values in the deployment of the Cn32f pod. |
5.2.21.7 Cn32cIncorrectDatabaseConfigurationAlert
Table 5-87 Cn32cIncorrectDatabaseConfigurationAlert
Field | Details |
---|---|
Trigger Condition | The alert is raised when an incorrect database configuration is provided for the cn32c service, resulting in a connection failure with the database. |
Severity | Major |
Alert Details Provided |
Summary: The connection with the database failed due to an incorrect database configuration.
Expression: (up{app="cn32c-svc"} unless on (namespace) absent(hikaricp_connections{app="cn32c-svc"})) == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4056 |
Metric Name | NA |
Resolution | Configure correct values in the deployment of the Cn32c pod. |
5.2.21.8 Pn32fIncorrectDatabaseConfigurationAlert
Table 5-88 Pn32fIncorrectDatabaseConfigurationAlert
Field | Details |
---|---|
Trigger Condition | The alert is raised when an incorrect database configuration is provided for the pn32f service, resulting in a connection failure with the database. |
Severity | Major |
Alert Details Provided |
Summary: The connection with the database failed due to an incorrect database configuration.
Expression: (up{app="pn32f-svc"} unless on (namespace) absent(hikaricp_connections{app="pn32f-svc"})) == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4058 |
Metric Name | NA |
Resolution | Configure correct values in the deployment of the Pn32f pod. |
5.2.21.9 pn32cIncorrectDbConf
Table 5-89 pn32cIncorrectDbConf
Field | Details |
---|---|
Trigger Condition | The alert is raised when an incorrect database configuration is provided for the pn32c service, resulting in a connection failure with the database. |
Severity | Major |
Alert Details Provided |
Summary: The connection with the database failed due to an incorrect database configuration.
Expression: (up{app="pn32c-svc"} unless on (namespace) absent(hikaricp_connections{app="pn32c-svc"})) == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4059 |
Metric Name | NA |
Resolution | Configure correct values in the deployment of the pn32c pod. |
5.2.21.10 ConfigManagerIncorrectDatabaseConfigurationAlert
Table 5-90 ConfigManagerIncorrectDatabaseConfigurationAlert
Field | Details |
---|---|
Trigger Condition | The alert is raised when an incorrect database configuration is provided for the config manager service, resulting in a connection failure with the database. |
Severity | Major |
Alert Details Provided |
Summary: The connection with the database failed due to an incorrect database configuration.
Expression: (up{app="config-mgr-svc"} unless on (namespace) absent(hikaricp_connections{app="config-mgr-svc"})) == 0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4055 |
Metric Name | NA |
Resolution | Configure correct values in the deployment of the ConfigManager pod. |
5.2.22 Proactive Status Updates on SEPP Alerts
5.2.22.1 EgressGatewayPeerUnhealthyAlert
Table 5-91 EgressGatewayPeerUnhealthyAlert
Field | Details |
---|---|
Trigger Condition | A peer becomes unhealthy, that is, the value of oc_egressgateway_peer_health_status for the peer is 1. |
Severity | Major |
Alert Details Provided |
Summary: Peer is unhealthy
Expression: sum(oc_egressgateway_peer_health_status{app="n32-egress-gateway"}) by (namespace,app,peer) >0 |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4048 |
Metric Name | oc_egressgateway_peer_health_status |
Resolution | The alert is cleared when the peer becomes healthy again, that is, when oc_egressgateway_peer_health_status for the peer becomes 0. |
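The expression sums oc_egressgateway_peer_health_status per namespace, app, and peer, and fires when the sum is greater than 0. Any series that reports the peer as unhealthy (value 1) is therefore enough to raise the alert. A minimal Python sketch of that grouping follows. The input layout and the pod label are hypothetical.

```python
from collections import defaultdict


def unhealthy_peers(health_samples):
    """Peers for which the EgressGatewayPeerUnhealthyAlert condition holds.

    `health_samples` is a hypothetical mapping of
    (namespace, app, peer, pod) -> oc_egressgateway_peer_health_status,
    where 1 means unhealthy and 0 means healthy.
    """
    grouped = defaultdict(float)
    for (namespace, app, peer, _pod), status in health_samples.items():
        if app == "n32-egress-gateway":  # mirrors the app label selector
            grouped[(namespace, app, peer)] += status  # sum by (namespace,app,peer)
    return sorted(key for key, value in grouped.items() if value > 0)


example_health = {
    ("sepp", "n32-egress-gateway", "peerA", "egw-0"): 1,
    ("sepp", "n32-egress-gateway", "peerA", "egw-1"): 0,
    ("sepp", "n32-egress-gateway", "peerB", "egw-0"): 0,
    ("sepp", "n32-egress-gateway", "peerB", "egw-1"): 0,
}
print(unhealthy_peers(example_health))  # [('sepp', 'n32-egress-gateway', 'peerA')]
```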
5.2.22.2 EgressGatewayAllPeersUnhealthyAlert
Table 5-92 EgressGatewayAllPeersUnhealthyAlert
Field | Details |
---|---|
Trigger Condition | When all peers in a peerset become unhealthy. |
Severity | Critical |
Alert Details Provided |
Summary: All peers unhealthy
Expression: (sum(oc_egressgateway_peer_count) by (namespace) -sum(oc_egressgateway_peer_available_count) by (namespace))==sum(oc_egressgateway_peer_count) by (namespace) |
OID | 1.3.6.1.4.1.323.5.3.46.1.2.4049 |
Metric Name | oc_egressgateway_peer_count, oc_egressgateway_peer_available_count |
Resolution | The alert is cleared when at least one peer in the peerset becomes healthy again. |
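Algebraically, (peer_count - available_count) == peer_count holds exactly when the summed available count for the namespace is 0. This is why one healthy peer is enough to clear the alert. The following Python sketch evaluates the expression per namespace. The input lists are hypothetical. Like PromQL's one-to-one vector matching, the sketch compares only namespaces that report both metrics.

```python
from collections import defaultdict


def namespaces_all_peers_unhealthy(peer_counts, available_counts):
    """Namespaces where (peer_count - available_count) == peer_count.

    Each argument is a hypothetical list of (namespace, value) samples for
    oc_egressgateway_peer_count and oc_egressgateway_peer_available_count,
    summed per namespace as in `sum(...) by (namespace)`.
    """
    total = defaultdict(float)
    available = defaultdict(float)
    for namespace, value in peer_counts:
        total[namespace] += value
    for namespace, value in available_counts:
        available[namespace] += value
    # One-to-one matching: a namespace must appear on both sides to be compared.
    return sorted(
        ns for ns in total if ns in available and total[ns] - available[ns] == total[ns]
    )


example_peers = [("sepp-a", 2), ("sepp-a", 1), ("sepp-b", 3)]
example_available = [("sepp-a", 0), ("sepp-b", 1)]
print(namespaces_all_peers_unhealthy(example_peers, example_available))  # ['sepp-a']
```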