5 SEPP Alerts

This section provides information about the SEPP alerts and their configuration.

Note:

For CNE 1.8.4 or earlier versions:

  • namespace: {{$labels.kubernetes_namespace}}
  • podname: {{$labels.kubernetes_pod_name}}

For CNE 1.9.x or later versions:

  • namespace: {{$labels.namespace}}
  • podname: {{$labels.pod}}
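
The version-dependent label names in the note above can be expressed as a small lookup. The following is a minimal sketch, not part of the product: the function name `label_names` is hypothetical, and it assumes the CNE version is given as a dotted numeric string. Versions between 1.8.4 and 1.9.0 are not covered by the note, so the sketch rejects them rather than guessing.

```python
def label_names(cne_version: str) -> dict:
    """Return the label names to use in the alert templates for a CNE version."""
    # Keep at most three numeric components and pad short versions, so "1.9" -> (1, 9, 0).
    parts = [int(p) for p in cne_version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    version = tuple(parts)
    if version <= (1, 8, 4):
        return {"namespace": "kubernetes_namespace", "podname": "kubernetes_pod_name"}
    if version >= (1, 9, 0):
        return {"namespace": "namespace", "podname": "pod"}
    # The note does not say which labels apply between 1.8.4 and 1.9.0.
    raise ValueError(f"label names for CNE {cne_version} are not documented here")


print(label_names("1.8.4"))
print(label_names("1.9.2"))
```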

5.1 System Level Alerts

5.1.1 SEPPPodMemoryUsageAlert

Table 5-1 SEPPPodMemoryUsageAlert

Trigger Condition Pod memory usage is above the threshold (70%)
Severity Warning
Alert details provided Summary
'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, 
timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: 
Memory usage is {{ $value | printf "%.2f" }} which is above 70% (current value is: {{ $value }})'
Expression:
(sum by(namespace,container,pod) (container_memory_usage_bytes{container=~".*cn32c-svc.*|.*pn32c-svc.*|.*cn32f-svc.*|.*pn32f-svc.*|.*config-mgr-svc.*|.*n32-egress-gateway.*|.*n32-ingress-gateway.*|.*plmn-egress-gateway.*|.*plmn-ingress-gateway.*|.*nf-mediation.*"})) / (sum by (namespace,container,pod)(kube_pod_container_resource_limits{resource="memory",container=~".*cn32c-svc.*|.*pn32c-svc.*|.*cn32f-svc.*|.*pn32f-svc.*|.*config-mgr-svc.*|.*n32-egress-gateway.*|.*n32-ingress-gateway.*|.*plmn-egress-gateway.*|.*plmn-ingress-gateway.*|.*nf-mediation.*"}) ) * 100 >= 70
OID 1.3.6.1.4.1.323.5.3.46.1.2.4003
Metric Used

kube_pod_container_resource_limits

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.

Resolution

The alert gets cleared when the memory utilization falls below the critical threshold.

Note: The threshold is configurable in the SeppAlertrules.yaml file.

If guidance is required, contact My Oracle Support.
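
To make the arithmetic behind the SEPPPodMemoryUsageAlert expression concrete, the sketch below reproduces it for a single container: memory in use divided by the memory limit, multiplied by 100, compared against the threshold. The function names are hypothetical and the byte values are illustrative. The real rule sums both metrics per namespace, container, and pod before dividing.

```python
def memory_usage_percent(usage_bytes: float, limit_bytes: float) -> float:
    """Memory in use as a percentage of the configured memory limit."""
    return usage_bytes * 100 / limit_bytes


def memory_alert_fires(usage_bytes: float, limit_bytes: float,
                       threshold_percent: float = 70.0) -> bool:
    """Mirror the '>= 70' comparison at the end of the alert expression."""
    return memory_usage_percent(usage_bytes, limit_bytes) >= threshold_percent


MIB = 1024 * 1024
print(memory_usage_percent(768 * MIB, 1024 * MIB))  # 768 MiB used of a 1 GiB limit
print(memory_alert_fires(768 * MIB, 1024 * MIB))
print(memory_alert_fires(512 * MIB, 1024 * MIB))
```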

5.1.2 SEPPPodCpuUsageAlert

Table 5-2 SEPPPodCpuUsageAlert

Field Details
Trigger Condition Pod CPU usage is above the threshold (70%)
Severity Warning
Alert details provided Summary
'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, 
timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: 
CPU usage is {{ $value | printf "%.2f" }} which is above 70% (current value is: {{ $value }})'
Expression:
(sum by (namespace,container) (rate(container_cpu_usage_seconds_total{container=~".*cn32c-svc.*|.*pn32c-svc.*|.*cn32f-svc.*|.*pn32f-svc.*|.*config-mgr-svc.*|.*n32-egress-gateway.*|.*n32-ingress-gateway.*|.*plmn-egress-gateway.*|.*plmn-ingress-gateway.*|.*nf-mediation.*"}[2m])) ) / (sum by (container, namespace) (kube_pod_container_resource_limits{resource="cpu",container=~".*cn32c-svc.*|.*pn32c-svc.*|.*cn32f-svc.*|.*pn32f-svc.*|.*config-mgr-svc.*|.*n32-egress-gateway.*|.*n32-ingress-gateway.*|.*plmn-egress-gateway.*|.*plmn-ingress-gateway.*|.*nf-mediation.*"}) ) * 100 >= 70
OID 1.3.6.1.4.1.323.5.3.46.1.2.4002
Metric Used

container_cpu_usage_seconds_total

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.

Resolution The alert gets cleared when the CPU utilization is below the critical threshold.

Note: The threshold is configurable in the SeppAlertrules.yaml file.

If guidance is required, contact My Oracle Support.
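
The SEPPPodCpuUsageAlert expression divides the per-second rate of container_cpu_usage_seconds_total over a 2-minute window by the CPU limit. The sketch below shows that calculation from two counter samples. It is a simplification under stated assumptions: the function name is hypothetical, and Prometheus rate() also handles counter resets and extrapolates to the window edges, which this sketch does not.

```python
def cpu_usage_percent(cpu_seconds_start: float, cpu_seconds_end: float,
                      window_seconds: float, cpu_limit_cores: float) -> float:
    """Average CPU use over the window as a percentage of the CPU limit."""
    # CPU-seconds consumed per wall-clock second equals the average number of cores in use.
    cores_used = (cpu_seconds_end - cpu_seconds_start) / window_seconds
    return cores_used * 100 / cpu_limit_cores


# 180 CPU-seconds consumed in a 120-second window against a 2-core limit.
usage = cpu_usage_percent(1000.0, 1180.0, 120.0, 2.0)
print(usage, usage >= 70)
```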

5.2 Application Level Alerts

5.2.1 Common Alerts

5.2.1.1 SEPPN32fRoutingFailure

Table 5-3 SEPPN32fRoutingFailure

Trigger Condition The N32f service is not able to forward the message
Severity Info
Alert details provided Summary
'namespace: {{$labels.namespace}}, podname: {{$labels.pod}},
 timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: 
N32f service not able to forward message because {{ $labels.error_msg }}'
Expression:
 idelta(ocsepp_cn32f_requests_failure_total[2m]) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4001
Metric Used ocsepp_cn32f_requests_failure_total
Resolution

The alert gets cleared when the Consumer SEPP accepts the request, which happens only if the producer NF domain and PLMN match the configured Remote SEPP.

Steps:

The failure reason is present in the alert.

Possible Resolutions:

  1. Check whether the Remote SEPP is present in the database.
  2. Validate the PLMN that is configured for the Remote SEPP.
  3. Validate that the handshake with the Remote SEPP is completed and that the context is present in the database.
  4. Validate the producer NF domain.
  5. Check whether the Remote SEPP Set for the required Remote SEPP is present in the database.
  6. Check whether the N32F route is present in the database (common_configuration table).
5.2.1.2 SEPPConfigMgrRouteFailureAlert

Table 5-4 SEPPConfigMgrRouteFailureAlert

Trigger Condition This alert is raised when a routing failure occurs while posting a remote SEPP or roaming partner set.
Severity Major
Alert Details Provided

Summary

namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Route Failure has occurred because {{ $labels.errorReason }}

Expression

sum(increase(ocsepp_configmgr_routefailure_total{app="config-mgr-svc"}[5m]) >0 or (ocsepp_configmgr_routefailure_total{app="config-mgr-svc"} unless ocsepp_configmgr_routefailure_total{app="config-mgr-svc"} offset 5m )) by (namespace,errorCode) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4026
Metric Used ocsepp_configmgr_routefailure_total
Resolution The alert is cleared if no new failures are observed within a 5-minute window.
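
The expression for this alert combines two conditions: the counter increased during the last 5 minutes, or the series exists now but did not exist 5 minutes ago (the "unless ... offset 5m" branch), which catches a failure counter that has just been created. The sketch below models that decision for a single series. It is an illustration only: the function name is hypothetical, `None` stands for "series absent", and counter resets are ignored.

```python
from typing import Optional


def route_failure_alert_fires(value_now: Optional[float],
                              value_5m_ago: Optional[float]) -> bool:
    """Model 'increase(m[5m]) > 0 or (m unless m offset 5m)' followed by '> 0'."""
    if value_now is None:
        # No series at all: nothing to evaluate.
        return False
    if value_5m_ago is None:
        # New series: the 'unless' branch passes the current value through.
        return value_now > 0
    # Existing series: fire only when the counter grew within the window.
    return value_now - value_5m_ago > 0


print(route_failure_alert_fires(1, None))   # counter created by the first failure
print(route_failure_alert_fires(5, 3))      # two new failures in the window
print(route_failure_alert_fires(3, 3))      # no new failures
```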
5.2.1.3 EgressSbiErrorRateAbove1Percent

Table 5-5 EgressSbiErrorRateAbove1Percent

Trigger Condition SBI transaction error rate exceeded the configured threshold
Severity Major
Alert details provided Summary
"Sbi Transaction Error Rate detected above 1 Percent of Total Sbi Transactions"
Expression
sum(rate(oc_egressgateway_sbiRouting_http_responses_total{Status!~"2.*"}[24h])) by (app,pod,namespace) / sum(rate(oc_egressgateway_sbiRouting_http_responses_total[24h])) by (app,pod,namespace) * 100 >= 1
OID 1.3.6.1.4.1.323.5.3.46.1.2.7001
Metric Used oc_egressgateway_sbiRouting_http_responses_total
Resolution

This alert is raised when the SBI transaction error rate is greater than or equal to 1% of the total transactions during a 24-hour period. The alert is cleared when the error rate falls below 1%.
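
As a worked example of the error-rate calculation, the sketch below treats every response whose Status label does not match the pattern 2.* as an error, as the expression does, and computes the error percentage. The function name and the sample counts are hypothetical. Counts are used in place of rates because the ratio is the same when both are taken over the same window.

```python
import re


def sbi_error_rate_percent(responses_by_status: dict) -> float:
    """Percentage of responses whose status does not fully match '2.*'."""
    total = sum(responses_by_status.values())
    if total == 0:
        # PromQL would yield NaN here, which also does not satisfy '>= 1'.
        return 0.0
    # PromQL regex matchers are anchored, so fullmatch is the closest equivalent.
    errors = sum(count for status, count in responses_by_status.items()
                 if not re.fullmatch(r"2.*", status))
    return errors * 100 / total


sample = {"200": 980, "404": 5, "503": 15}  # illustrative counts only
rate = sbi_error_rate_percent(sample)
print(rate, rate >= 1)
```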

5.2.2 Handshake Alerts

5.2.2.1 SEPPCn32cHandshakeFailureAlert

Table 5-6 SEPPCn32cHandshakeFailureAlert

Trigger Condition Handshake procedure has failed on Consumer SEPP
Severity Major
Alert details provided Summary
'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, 
timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}:
 Handshake procedure has failed on Consumer side because {{ $labels.reason }}'
Expression:
 sum(increase(ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"}[5m])
    >0 or (ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"}  unless
    ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"}  offset 5m )) by
    (namespace,remote_sepp_name,nfinstanceid,peer_fqdn,app)  > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.2001
Metric Used ocsepp_n32c_handshake_failure_attempts_total filtered by app=cn32c-svc
Resolution 1 The alert gets cleared when the N32C Handshake is established after successful TCP connection to remote SEPP.

Failure reason: The release name used during Helm installation is other than ocsepp-release.

Error Verification: Check the failure reason in the alert. If the failure reason is "404 - route not found" or "Route not found", follow these recovery steps:

  1. Run the following command to get pod details:

    $ kubectl get pods -n <namespace>

    Example:
    
    # kubectl get pods -n csepp
    NAME                                                 READY   STATUS                  RESTARTS   AGE
    ocsepp-release-appinfo-6cdc48fc47-c9gfv              1/1     Running                 0          8d
    ocsepp-release-cn32c-svc-6547db777d-76gwd            1/1     Running                 0          8d
    ocsepp-release-cn32f-svc-7cd54bdf68-czbnb            1/1     Running                 0          8d
    ocsepp-release-config-mgr-svc-79c95d4b9d-8stk7       1/1     Running                 0          8d
    ocsepp-release-n32-egress-gateway-54c658b947-s5f9m   0/2     Pending                 0          23h
    ocsepp-release-n32-egress-gateway-54c658b947-scvvp   2/2     Running                 0          7d23h
    ocsepp-release-n32-ingress-gateway-777c68cb9-8jsdc   0/2     Pending                 0          23h
    ocsepp-release-n32-ingress-gateway-777c68cb9-98t7x   0/2     Init:ImagePullBackOff   0          23h
    ocsepp-release-pn32c-svc-58bff857f-jmfdd             1/1     Running                 0          8d
    ocsepp-release-pn32f-svc-784d5c7568-rh24g            
    
  2. Run the following command to navigate to the pod:

    $ kubectl exec -it <config-mgr-pod-name> -n <namespace> bash

    Example:

    $ kubectl exec -it ocsepp-release-config-mgr-svc-79c95d4b9d-8stk7 -n csepp bash
  3. Run the command to get the existing route details present on N32 Egress Gateway:

    curl -X GET http://<config-manager-service-name>:9090/sepp/nf-common-component/v1/egw/n32/routesconfiguration

    Example:

    curl -X GET http://ocsepp-release-config-mgr-svc:9090/sepp/nf-common-component/v1/egw/n32/routesconfiguration
  4. If the output is null, add the configuration details to the config-mgr-svc deployment.

    For more information about the configuration details, see the Deployment Configuration for Config-mgr-svc section in Oracle Communications Cloud Native Core Security Edge Protection Proxy Installation Guide.

  5. After the config-mgr-svc pod restarts, repeat steps 1 to 3 and confirm that the curl command in step 3 returns the route details.
  6. Delete and re-add the Remote SEPP, then reinitiate the handshake.

    If the value is still null, contact My Oracle Support.
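
The commands from steps 1 to 3 can be assembled programmatically, for example when scripting the check across several namespaces. The sketch below only builds the argument lists and the route configuration URL and does not run anything. The helper names are hypothetical; the port 9090 and the URL path are taken from the example above, and the curl command is meant to be run from inside the config-mgr-svc pod.

```python
def route_config_url(config_mgr_service: str, port: int = 9090) -> str:
    """URL that returns the routes configured on the N32 Egress Gateway."""
    return (f"http://{config_mgr_service}:{port}"
            "/sepp/nf-common-component/v1/egw/n32/routesconfiguration")


def recovery_commands(namespace: str, config_mgr_pod: str,
                      config_mgr_service: str) -> list:
    """Argument lists for steps 1 to 3, in order."""
    return [
        ["kubectl", "get", "pods", "-n", namespace],
        ["kubectl", "exec", "-it", config_mgr_pod, "-n", namespace, "bash"],
        # Run this one inside the pod shell opened by the previous command.
        ["curl", "-X", "GET", route_config_url(config_mgr_service)],
    ]


for command in recovery_commands("csepp",
                                 "ocsepp-release-config-mgr-svc-79c95d4b9d-8stk7",
                                 "ocsepp-release-config-mgr-svc"):
    print(" ".join(command))
```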
Resolution 2

The alert gets cleared when the N32C Handshake is established after successful TCP connection to remote SEPP.

Steps:

The failure reason is present in the alert.

Possible Resolutions:

  1. Disable the Remote SEPP.
  2. Delete the Remote SEPP.
  3. Update and reinitiate Handshake.
5.2.2.2 SEPPPn32cHandshakeFailureAlert

Table 5-7 SEPPPn32cHandshakeFailureAlert

Trigger Condition Handshake procedure has failed on Producer SEPP
Severity Major
Alert details provided Summary
 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}},
 timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}:
 Handshake procedure has failed on Producer side because {{ $labels.error_msg }}'
Expression:
sum(increase(ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"}[5m])
    >0 or (ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"}  unless
    ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"}  offset 5m )) by
    (namespace,remote_sepp_name,nfinstanceid,peer_fqdn,app)  > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.3001
Metric Used ocsepp_n32c_handshake_failure_attempts_total filtered by app=pn32c-svc
Resolution

The alert gets cleared when the N32C handshake succeeds after a successful TCP connection between the Producer SEPP and the Consumer SEPP.

Steps:

The failure reason is present in the alert.

Possible Resolution:

Update and reinitiate the Handshake.

5.2.3 Upgrade Alerts

5.2.3.1 SEPPUpgradeStartedAlert

Table 5-8 SEPPUpgradeStartedAlert

Trigger Condition REST API trigger at the start of an upgrade
Severity NA
Alert details provided

applicationname

alertname

servicename

releasename

namespace

oid

severity

vendor

sourcerelease

targetrelease

OID 1.3.6.1.4.1.323.5.3.46.1.2.8001
Metric Used NA
Resolution

If a success alert is generated then start and failure alerts will be cleared.

5.2.3.2 SEPPUpgradeFailedAlert

Table 5-9 SEPPUpgradeFailedAlert

Trigger Condition REST API trigger at the failure of an upgrade
Severity NA
Alert details provided

applicationname

alertname

servicename

releasename

namespace

oid

severity

vendor

sourcerelease

targetrelease

OID 1.3.6.1.4.1.323.5.3.46.1.2.8002
Metric Used NA
Resolution

If a success alert is generated then start and failure alerts will be cleared.

5.2.3.3 SEPPUpgradeSuccessfulAlert

Table 5-10 SEPPUpgradeSuccessfulAlert

Trigger Condition REST API trigger at the success of an upgrade
Severity NA
Alert details provided

applicationname

alertname

servicename

releasename

namespace

oid

severity

vendor

sourcerelease

targetrelease

OID 1.3.6.1.4.1.323.5.3.46.1.2.8003
Metric Used NA
Resolution

If a success alert is generated then start and failure alerts will be cleared.

5.2.4 Rollback Alerts

5.2.4.1 SEPPRollbackStartedAlert

Table 5-11 SEPPRollbackStartedAlert

Trigger Condition REST API trigger at the start of a rollback
Severity NA
Alert details provided

applicationname

alertname

servicename

releasename

namespace

oid

severity

vendor

sourcerelease

targetrelease

OID 1.3.6.1.4.1.323.5.3.46.1.2.8004
Metric Used NA
Resolution

If a success alert is generated then start and failure alerts will be cleared.

5.2.4.2 SEPPRollbackFailedAlert

Table 5-12 SEPPRollbackFailedAlert

Trigger Condition REST API trigger at the failure of a rollback
Severity NA
Alert details provided

applicationname

alertname

servicename

releasename

namespace

oid

severity

vendor

sourcerelease

targetrelease

OID 1.3.6.1.4.1.323.5.3.46.1.2.8005
Metric Used NA
Resolution

If a success alert is generated then start and failure alerts will be cleared.

5.2.4.3 SEPPRollbackSuccessfulAlert

Table 5-13 SEPPRollbackSuccessfulAlert

Trigger Condition REST API trigger at the success of a rollback
Severity NA
Alert details provided

applicationname

alertname

servicename

releasename

namespace

oid

severity

vendor

sourcerelease

targetrelease

OID 1.3.6.1.4.1.323.5.3.46.1.2.8006
Metric Used NA
Resolution Cleared after DEFAULT_DURATION_FOR_ALERT_EXPIRY minutes

5.2.5 Global Rate Limiting on Ingress Gateway of SEPP Alerts

5.2.5.1 IngressGlobalMessageDropAbovePointOnePercent

Table 5-14 IngressGlobalMessageDropAbovePointOnePercent

Trigger Condition Ingress Global Message Drop Rate detected greater than or equal to 0.1 Percent of Total Transactions.
Severity Warning
Alert details provided Summary
"Ingress Global Message Drop Rate detected above 0.1 Percent of Total Transactions"
Expression
sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace) / sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) * 100 >= 0.1 < 1
OID 1.3.6.1.4.1.323.5.3.46.1.2.7002
Metric Used oc_ingressgateway_global_ratelimit_total
Resolution

The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 0.1% of the total messages received. It is cleared once the percentage of rejected messages falls below 0.1% or reaches 1% or more.

5.2.5.2 IngressGlobalMessageDropAbove1Percent

Table 5-15 IngressGlobalMessageDropAbove1Percent

Trigger Condition Ingress Global Message Drop Rate detected greater than or equal to 1 Percent of Total Transactions.
Severity Warning
Alert details provided Summary
"Ingress Global Message Drop Rate detected above 1 Percent of Total Transactions"
Expression
sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace) / sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) * 100 >= 1 < 10
OID 1.3.6.1.4.1.323.5.3.46.1.2.7003
Metric Used oc_ingressgateway_global_ratelimit_total
Resolution

The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 1% of the total messages received. It is cleared once the percentage of rejected messages falls below 1% or reaches 10% or more.

5.2.5.3 IngressGlobalMessageDropAbove10Percent

Table 5-16 IngressGlobalMessageDropAbove10Percent

Trigger Condition Ingress Global Message Drop Rate detected greater than or equal to 10 Percent of Total Transactions
Severity Minor
Alert details provided Summary
"Ingress Global Message Drop Rate detected above 10 Percent of Total Transactions"
Expression
sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace) / sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) * 100 >= 10 < 25
OID 1.3.6.1.4.1.323.5.3.46.1.2.7004
Metric Used oc_ingressgateway_global_ratelimit_total
Resolution

The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 10% of the total messages received. It is cleared once the percentage of rejected messages falls below 10% or reaches 25% or more.

5.2.5.4 IngressGlobalMessageDropAbove25Percent

Table 5-17 IngressGlobalMessageDropAbove25Percent

Trigger Condition Ingress Global Message Drop Rate detected greater than or equal to 25 Percent of Total Transactions
Severity Major
Alert details provided Summary
"Ingress Global Message Drop Rate detected above 25 Percent of Total Transactions"
Expression
sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace) / sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) * 100 >= 25 < 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.7005
Metric Used oc_ingressgateway_global_ratelimit_total
Resolution

The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 25% of the total messages received. It is cleared once the percentage of rejected messages falls below 25% or reaches 50% or more.

5.2.5.5 IngressGlobalMessageDropAbove50Percent

Table 5-18 IngressGlobalMessageDropAbove50Percent

Trigger Condition Ingress Global Message Drop Rate detected greater than or equal to 50 Percent of Total Transactions
Severity Critical
Alert details provided Summary
"Ingress Global Message Drop Rate detected above 50 Percent of Total Transactions"
Expression
sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace) / sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) * 100 >= 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.7006
Metric Used oc_ingressgateway_global_ratelimit_total
Resolution

The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 50% of the total messages received. It is cleared once the percentage of rejected messages falls below 50%.
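
The five Global Rate Limiting alerts partition the drop percentage into non-overlapping bands, so at most one of them is active for a namespace at a time. The sketch below encodes the bands and severities from Tables 5-14 to 5-18 and returns the alert that matches a given drop percentage. The names `DROP_BANDS` and `active_drop_alert` are hypothetical, and message counts stand in for the 5-minute rates used by the real expressions.

```python
# (lower bound inclusive, upper bound exclusive or None, alert name, severity)
DROP_BANDS = [
    (0.1, 1.0, "IngressGlobalMessageDropAbovePointOnePercent", "Warning"),
    (1.0, 10.0, "IngressGlobalMessageDropAbove1Percent", "Warning"),
    (10.0, 25.0, "IngressGlobalMessageDropAbove10Percent", "Minor"),
    (25.0, 50.0, "IngressGlobalMessageDropAbove25Percent", "Major"),
    (50.0, None, "IngressGlobalMessageDropAbove50Percent", "Critical"),
]


def active_drop_alert(dropped: float, total: float):
    """Return (alert name, severity) for the band that contains the drop percentage."""
    if total <= 0:
        return None
    percent = dropped * 100 / total
    for lower, upper, name, severity in DROP_BANDS:
        if percent >= lower and (upper is None or percent < upper):
            return name, severity
    return None  # below 0.1%: no alert


print(active_drop_alert(5, 1000))    # 0.5% dropped
print(active_drop_alert(300, 1000))  # 30% dropped
print(active_drop_alert(0, 1000))    # nothing dropped
```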

5.2.6 Topology Hiding Alerts

5.2.6.1 SEPPN32fTopologyOperationFailureAlert

Table 5-19 SEPPN32fTopologyOperationFailureAlert

Field Details
Trigger Condition Topology Hiding or Recovery Failure exceeded configured threshold (1%)
Severity Major
Alert details provided Summary
"Topology hiding/recovery operation failures reached more than configured threshold"
Expression
 delta(ocsepp_topology_header_failure_total[2m])>0 or 
(ocsepp_topology_header_failure_total unless ocsepp_topology_header_failure_total offset 2m)
OID 1.3.6.1.4.1.323.5.3.46.1.2.4004
Metric Used ocsepp_topology_header_failure_total, ocsepp_topology_header_success_total
Resolution

This alert will be raised when the total Topology Hiding or Recovery failures reach more than 1%.

Alert will be cleared when the error rate is below 1%.

Possible Resolutions:

  1. Check the header for which the alert is raised. The header name is present in the alert label.
  2. Verify the error_msg using "ocsepp_topology_header_failure_total" metric and KPI.
  3. Fix or add configuration for the header.

Note: The alert will be cleared only if the corresponding success metric is pegged.

5.2.6.2 SEPPN32fTopologyBodyOperationFailureAlert

Table 5-20 SEPPN32fTopologyBodyOperationFailureAlert

Field Details
Trigger Condition

Topology Operation failed and exceeds defined threshold

Severity Major
Alert details provided Summary
"Topology Hiding/Recovery Operation failures reached more than configured threshold"
Expression:
delta(ocsepp_topology_body_failure_total[2m])>0 or 
(ocsepp_topology_body_failure_total unless ocsepp_topology_body_failure_total offset 2m)
OID 1.3.6.1.4.1.323.5.3.46.1.2.4006
Metric Used ocsepp_topology_body_failure_total

ocsepp_topology_body_success_total
Resolution This alert will be raised when the total Topology Hiding or Recovery failures for the message body reach more than 1%.

Alert will be cleared when the error rate is below 1%.

Possible Resolutions:

  1. Check the apiUrl and method for which the alert is raised. The apiUrl is present in the alert label.
  2. Verify the error_msg using "ocsepp_topology_body_failure_total" metric and KPI.
  3. Fix or add configuration for the body identifiers.

Note: The alert will be cleared only if the corresponding success metric is pegged.

5.2.7 5G SBI Message Mediation Support Alerts

5.2.7.1 SEPPCN32fMediationFailure

Table 5-21 SEPPCN32fMediationFailure

Trigger Condition

Mediation processing Failure

Severity Info
Alert details provided Summary
 "Mediation processing Failure"
Expression:
increase(ocsepp_cn32f_mediation_response_failure{status_code!="504 GATEWAY_TIMEOUT"}[10m]) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4007
Metric Used ocsepp_cn32f_mediation_response_failure
Resolution

This alert will be raised when the Mediation microservice is unable to apply rules on the incoming request and response from SEPP.

Possible Resolution:
  1. Check if the Mediation Rules exist.
  2. Check that the Agenda Group in the mediation rule matches the request and response sent from SEPP.
5.2.7.2 SEPPCN32fMediationUnreachable

Table 5-22 SEPPCN32fMediationUnreachable

Trigger Condition

Mediation service is not accessible

Severity Critical
Alert details provided Summary
"Mediation service is not accessible"
Expression:
increase(ocsepp_cn32f_mediation_response_failure{status_code="504 GATEWAY_TIMEOUT"}[10m]) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4008
Metric Used ocsepp_cn32f_mediation_response_failure
Resolution

This alert will be raised when Mediation microservice is not accessible.

Possible Resolution:
  1. Check if the Mediation microservice pod is up.
  2. Check if the Mediation Service Name and servicePort number are correct.
5.2.7.3 SEPPPN32fMediationFailure

Table 5-23 SEPPPN32fMediationFailure

Trigger Condition

Mediation processing Failure

Severity Info
Alert details provided Summary
"Mediation processing Failure"
Expression:
increase(ocsepp_pn32f_mediation_response_failure{status_code!="504 GATEWAY_TIMEOUT"}[10m]) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4009
Metric Used ocsepp_pn32f_mediation_response_failure
Resolution

This alert will be raised when the Mediation microservice is unable to apply rules on the incoming request and response from SEPP.

Possible Resolution:
  1. Check if the Mediation Rules exist.
  2. Check that the Agenda Group in the mediation rule matches the request and response sent from SEPP.
5.2.7.4 SEPPPN32fMediationUnreachable

Table 5-24 SEPPPN32fMediationUnreachable

Trigger Condition

Mediation service is not accessible

Severity Critical
Alert details provided Summary
"Mediation service is not accessible"
Expression:
increase(ocsepp_pn32f_mediation_response_failure{status_code="504 GATEWAY_TIMEOUT"}[10m]) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4010
Metric Used ocsepp_pn32f_mediation_response_failure
Resolution

This alert will be raised when Mediation microservice is not accessible.

Possible Resolution:
  1. Check if the Mediation microservice pod is up.
  2. Check if the Mediation Service Name and servicePort number are correct.
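
The four mediation alerts differ only in the service that reports the failure and in the status_code label: the value "504 GATEWAY_TIMEOUT" selects the Unreachable alert, and any other value selects the Failure alert. The sketch below captures that routing. The function name is hypothetical, and the non-504 status string in the usage lines is an illustrative value, not one confirmed by this guide.

```python
def mediation_alert(service: str, status_code: str):
    """Return (alert name, severity) for a mediation response failure.

    service is "cn32f" or "pn32f", matching the metric that was incremented.
    """
    prefix = {"cn32f": "SEPPCN32f", "pn32f": "SEPPPN32f"}[service]
    if status_code == "504 GATEWAY_TIMEOUT":
        # The Mediation microservice could not be reached at all.
        return prefix + "MediationUnreachable", "Critical"
    # Mediation was reached but could not process the message.
    return prefix + "MediationFailure", "Info"


print(mediation_alert("cn32f", "504 GATEWAY_TIMEOUT"))
print(mediation_alert("pn32f", "500 INTERNAL_SERVER_ERROR"))  # illustrative label value
```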

5.2.8 Overload Control Alerts

5.2.8.1 SEPPServiceOverload65Percent

Table 5-25 SEPPServiceOverload65Percent

Trigger Condition CPU memory of pn32f-svc more than 65%
Severity Warning
Alert details provided Summary
Backend service is in overload with load level > 65%
Expression
 service_resource_overload_level == 1 
OID 1.3.6.1.4.1.323.5.3.46.1.2.7007
Metric Used service_resource_overload_level
Resolution

The alert will be cleared when CPU Memory for backend-svc goes below 60%.

5.2.8.2 SEPPServiceOverload70Percent

Table 5-26 SEPPServiceOverload70Percent

Trigger Condition CPU memory of pn32f-svc more than 70%
Severity Minor
Alert details provided Summary
Backend service is in overload with load level > 70%
Expression
 service_resource_overload_level == 2 
OID 1.3.6.1.4.1.323.5.3.46.1.2.7008
Metric Used service_resource_overload_level
Resolution

The alert will be cleared when CPU Memory for backend-svc goes below 70%

5.2.8.3 SEPPServiceOverload80Percent

Table 5-27 SEPPServiceOverload80Percent

Trigger Condition CPU memory of pn32f-svc more than 80%
Severity Major
Alert details provided Summary
Backend service is in overload with load level > 80%
Expression
 service_resource_overload_level == 3
OID 1.3.6.1.4.1.323.5.3.46.1.2.7009
Metric Used service_resource_overload_level
Resolution

The alert will be cleared when CPU Memory for backend-svc goes below 80%

5.2.8.4 SEPPServiceOverload90Percent

Table 5-28 SEPPServiceOverload90Percent

Trigger Condition CPU memory of pn32f-svc more than 90%
Severity Critical
Alert details provided Summary
Backend service is in overload with load level > 90%
Expression
 service_resource_overload_level == 4
OID 1.3.6.1.4.1.323.5.3.46.1.2.7010
Metric Used service_resource_overload_level
Resolution

The alert will be cleared when CPU Memory for backend-svc goes below 90%
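
Each overload alert fires on one exact value of the service_resource_overload_level metric. The sketch below collects the mapping from Tables 5-25 to 5-28 in one place. The names `OVERLOAD_ALERTS` and `overload_alert` are hypothetical, and the sketch assumes that any level outside 1 to 4 raises no overload alert, which matches the expressions listed here.

```python
# service_resource_overload_level value -> (alert name, severity)
OVERLOAD_ALERTS = {
    1: ("SEPPServiceOverload65Percent", "Warning"),
    2: ("SEPPServiceOverload70Percent", "Minor"),
    3: ("SEPPServiceOverload80Percent", "Major"),
    4: ("SEPPServiceOverload90Percent", "Critical"),
}


def overload_alert(level: int):
    """Return (alert name, severity) for a level, or None if no alert matches."""
    return OVERLOAD_ALERTS.get(level)


for level in range(0, 5):
    print(level, overload_alert(level))
```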

5.2.9 Hosted SEPP Alerts

5.2.9.1 SEPPPn32fHSRoutingFailureAlert

Table 5-29 SEPPPn32fHSRoutingFailureAlert

Trigger Condition When the routing failure rate at Pn32f service is greater than 20 percent.
Severity Major
Alert details provided Allowed P-RSS Validation failure at Roaming Hub

Expression

((sum by(namespace, app, nfInstanceId, pod) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod) (ocsepp_pn32f_requests_total))) > 0.2

OID 1.3.6.1.4.1.323.5.3.46.1.2.4013
Metric Used ocsepp_allowed_p_rss_routing_failure_total , ocsepp_pn32f_requests_total
Resolution The alert gets automatically cleared when the failure rate at pn32f microservice goes below 20 percent.
5.2.9.2 SEPPCn32fHSRoutingFailureAlert

Table 5-30 SEPPCn32fHSRoutingFailureAlert

Trigger Condition When the routing failure rate at Cn32f service is greater than 50 percent.
Severity Minor
Alert details provided Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP.

Expression

((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.5

OID 1.3.6.1.4.1.323.5.3.46.1.2.4014
Metric Used ocsepp_allowed_p_rss_routing_failure_total , ocsepp_cn32f_requests_total
Resolution The alert gets automatically cleared when the failure rate at cn32f microservice goes below 50 percent.
5.2.9.3 SEPPCn32fHSRoutingFailureAlert

Table 5-31 SEPPCn32fHSRoutingFailureAlert

Trigger Condition When the routing failure rate at Cn32f service is greater than 60 percent.
Severity Major
Alert details provided Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP.

Expression

((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.6

OID 1.3.6.1.4.1.323.5.3.46.1.2.4015
Metric Used ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total
Resolution The alert gets automatically cleared when the failure rate at cn32f microservice goes below 60 percent.
5.2.9.4 SEPPCn32fHSRoutingFailureAlert

Table 5-32 SEPPCn32fHSRoutingFailureAlert

Trigger Condition When the routing failure rate at Cn32f service is greater than 65 percent.
Severity Critical
Alert details provided Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP.

Expression

((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.65

OID 1.3.6.1.4.1.323.5.3.46.1.2.4016
Metric Used ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total
Resolution The alert gets automatically cleared when the failure rate at cn32f microservice goes below 65 percent.
5.2.9.5 SEPPCn32fHSRoutingFailureAlert

Table 5-33 SEPPCn32fHSRoutingFailureAlert

Trigger Condition When the routing failure rate at Cn32f service is greater than 25 percent.
Severity Warning
Alert details provided Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP.

Expression

((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.25

OID 1.3.6.1.4.1.323.5.3.46.1.2.4017
Metric Used ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total
Resolution The alert gets automatically cleared when the failure rate at cn32f microservice goes below 25 percent.
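
The Cn32f expressions above use the same failure ratio with four different lower bounds (0.25, 0.5, 0.6, and 0.65) and no upper bound, so a high ratio satisfies several of them at once. The sketch below computes the ratio and lists every severity whose threshold is exceeded. The names are hypothetical, and plain counts stand in for the summed counters in the expressions.

```python
# (threshold from the expression, severity from the matching table)
CN32F_THRESHOLDS = [
    (0.25, "Warning"),
    (0.5, "Minor"),
    (0.6, "Major"),
    (0.65, "Critical"),
]


def cn32f_firing_severities(failures: float, requests: float) -> list:
    """Severities of all Cn32f routing-failure alerts that the ratio exceeds."""
    if requests <= 0:
        return []
    ratio = failures / requests
    # Each expression is a strict 'greater than' comparison with no upper bound.
    return [severity for threshold, severity in CN32F_THRESHOLDS if ratio > threshold]


print(cn32f_firing_severities(10, 100))
print(cn32f_firing_severities(55, 100))
print(cn32f_firing_severities(70, 100))
```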

5.2.10 SEPP Message Feed Alerts

5.2.10.1 DDUnreachableFromN32IGW

Table 5-34 DDUnreachableFromN32IGW

Trigger Condition This alarm is raised when Data Director is not reachable from N32 Ingress Gateway.
Severity Major
Alert details provided Summary

(oc_ingressgateway_dd_unreachable{app="n32-ingress-gateway"} == 1)
OID 1.3.6.1.4.1.323.5.3.46.1.2.4018
Metric Used oc_ingressgateway_dd_unreachable
Resolution Alert gets cleared automatically when the connection with Data Director is established.
5.2.10.2 DDUnreachableFromPLMNIGW

Table 5-35 DDUnreachableFromPLMNIGW

Trigger Condition This alarm is raised when Data Director is not reachable from PLMN Ingress Gateway.
Severity Major
Alert details provided Summary

(oc_ingressgateway_dd_unreachable{app="plmn-ingress-gateway"} == 1)
OID 1.3.6.1.4.1.323.5.3.46.1.2.4019
Metric Used oc_ingressgateway_dd_unreachable
Resolution Alert gets cleared automatically when the connection with Data Director is established.
5.2.10.3 DDUnreachableFromN32EGW

Table 5-36 DDUnreachableFromN32EGW

Trigger Condition This alarm is raised when Data Director is not reachable from N32 Egress Gateway.
Severity Major
Alert details provided Summary

(oc_egressgateway_dd_unreachable{app="n32-egress-gateway"} == 1)
OID 1.3.6.1.4.1.323.5.3.46.1.2.4020
Metric Used oc_egressgateway_dd_unreachable
Resolution Alert gets cleared automatically when the connection with Data Director is established.
5.2.10.4 DDUnreachableFromPLMNEGW

Table 5-37 DDUnreachableFromPLMNEGW

Trigger Condition This alarm is raised when Data Director is not reachable from PLMN Egress Gateway.
Severity Major
Alert details provided Summary (oc_egressgateway_dd_unreachable{app="plmn-egress-gateway"} == 1)
OID 1.3.6.1.4.1.323.5.3.46.1.2.4021
Metric Used oc_egressgateway_dd_unreachable
Resolution Alert gets cleared automatically when the connection with Data Director is established.

5.2.11 Steering of Roaming (SOR) Alerts

5.2.11.1 SEPPPn32fSORFailureAlertPercent30to40

Table 5-38 SEPPPn32fSORFailureAlertPercent30to40

Field Details
Trigger Condition 30% to 40% of SOR traffic results in failure.
Severity Minor
Alert details provided Summary:

'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'

Expression:

sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)>=0.3 and sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)<0.4

OID 1.3.6.1.4.1.323.5.3.46.1.2.4022
Metric Used ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total
Resolution

This alert is raised when the failure percentage of SOR responses is in the 30%-40% range in the samples collected over the last 2 minutes.

Possible Resolutions:

  1. Check the following headers in the response coming from the SOR server. If any of these is missing, it causes an SOR failure:
    1. Server Header
    2. Location Header
  2. Check whether the redirection code (3xx) received from SOR is the same as the one configured through CNC Console. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
  3. Check whether the SOR server is sending a 5xx response code and either the code is not configured through CNC Console or retry to Producer NF is disabled. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
  4. Check whether any client error (4xx) occurs while connecting to the SOR server. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
5.2.11.2 SEPPPn32fSORFailureAlertPercent40to50

Table 5-39 SEPPPn32fSORFailureAlertPercent40to50

Field Details
Trigger Condition 40% to 50% of SOR traffic results in failure.
Severity Major
Alert details provided Summary:

'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'

Expression:

sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)>=0.4 and sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)<0.5

OID 1.3.6.1.4.1.323.5.3.46.1.2.4023
Metric Used ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total
Resolution

This alert is raised when the percentage of failed SOR responses is in the range 40% to 50% in the samples collected over the last 2 minutes.

Possible Resolutions:

  1. Check for the following headers in the response coming from the SOR server. If either header is missing, it causes an SOR failure:
    1. Server Header
    2. Location Header
  2. Check that the redirection code (3xx) received from the SOR server is the same as the one configured through CNC Console. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
  3. Check whether the SOR server is sending a 5xx response code and whether that code is not configured through CNC Console or retry to Producer NF is disabled. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
  4. Check whether any client error (4xx) occurs while connecting to the SOR server. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
5.2.11.3 SEPPPn32fSORFailureAlertPercentAbove50

Table 5-40 SEPPPn32fSORFailureAlertPercentAbove50

Field Details
Trigger Condition 50% or more of SOR traffic results in failure.
Severity Critical
Alert details provided Summary:

'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'

Expression:

sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)>=0.5

OID 1.3.6.1.4.1.323.5.3.46.1.2.4024
Metric Used ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total
Resolution

This alert is raised when the percentage of failed SOR responses is 50% or above in the samples collected over the last 2 minutes.

Possible Resolutions:

  1. Check for the following headers in the response coming from the SOR server. If either header is missing, it causes an SOR failure:
    1. Server Header
    2. Location Header
  2. Check that the redirection code (3xx) received from the SOR server is the same as the one configured through CNC Console. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
  3. Check whether the SOR server is sending a 5xx response code and whether that code is not configured through CNC Console or retry to Producer NF is disabled. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
  4. Check whether any client error (4xx) occurs while connecting to the SOR server. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
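
The three SOR failure alerts split the failure ratio into non-overlapping bands. The following Python sketch mirrors the thresholds used in the SOR failure expressions (0.3, 0.4, and 0.5). The function name sor_failure_severity and its two arguments are hypothetical and are not part of SEPP or Prometheus; the arguments stand in for the values that rate(...[2m]) would return for the two metrics.

```python
from typing import Optional


def sor_failure_severity(failure_rate: float, request_rate: float) -> Optional[str]:
    """Return the severity band for a SOR failure ratio, or None if no alert applies.

    failure_rate and request_rate are illustrative stand-ins for
    rate(ocsepp_pn32f_sor_failure_total[2m]) and
    rate(ocsepp_pn32f_sor_requests_total[2m]).
    """
    if request_rate <= 0:
        # Assumption of this sketch: with no SOR requests, report no alert.
        return None
    ratio = failure_rate / request_rate
    if ratio >= 0.5:
        return "Critical"  # SEPPPn32fSORFailureAlertPercentAbove50
    if ratio >= 0.4:
        return "Major"     # SEPPPn32fSORFailureAlertPercent40to50
    if ratio >= 0.3:
        return "Minor"     # SEPPPn32fSORFailureAlertPercent30to40
    return None
```

Because each band has an exclusive upper bound, at most one of the three alerts is expected to be active for a given label set at any time.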
5.2.11.4 SEPPPn32fSORTimeoutFailureAlert

Table 5-41 SEPPPn32fSORTimeoutFailureAlert

Field Details
Trigger Condition Increase of more than five timeout errors in last two minutes for SOR.
Severity critical
Alert details provided Summary:

'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'

Expression:

idelta(ocsepp_pn32f_sor_timeout_failure_total[2m]) > 5 or (ocsepp_pn32f_sor_timeout_failure_total unless ocsepp_pn32f_sor_timeout_failure_total offset 2m)
OID 1.3.6.1.4.1.323.5.3.46.1.2.4025
Metric Used ocsepp_pn32f_sor_timeout_failure_total
Resolution

This alert is raised when the responses received from the SOR server indicate that the server is either down or unreachable, and more than five such errors are counted in the samples collected over the last two minutes.

Possible Resolutions:

  1. Check whether the SOR server is unreachable, and restore connectivity if it is.
  2. Check whether the server configuration made through CNC Console has wrong values. Verify that the configured FQDN and port are correct.
  3. Verify that the selected scheme is supported by the SOR server.
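
The timeout expression combines two conditions: idelta(...) compares the last two samples in the 2-minute window, and the "unless ... offset 2m" clause matches a series that exists now but did not exist two minutes earlier. The following Python sketch is a simplified model of that logic under stated assumptions. The function name timeout_alert_fires and its inputs are hypothetical, and Prometheus staleness handling is ignored.

```python
from typing import Optional, Sequence


def timeout_alert_fires(window: Sequence[float], value_2m_ago: Optional[float]) -> bool:
    """Simplified model of the SEPPPn32fSORTimeoutFailureAlert expression.

    window: counter samples scraped in the last 2 minutes, oldest first.
    value_2m_ago: the counter value two minutes ago, or None if the
    series did not exist at that time.
    """
    # idelta(...[2m]) > 5: difference between the last two samples in the window.
    if len(window) >= 2 and window[-1] - window[-2] > 5:
        return True
    # metric unless metric offset 2m: the series is present now but was absent 2 minutes ago.
    return len(window) >= 1 and value_2m_ago is None
```

For example, a counter that moves from 10 to 17 between two consecutive scrapes satisfies the first condition, while a counter that has just appeared satisfies the second.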

5.2.12 Global Rate Limiting on Ingress Gateway of SEPP Alerts

5.2.12.1 Ingress RSS Rate Limit per RSS Message Drop Above Point one Percent Alert

Table 5-42 Ingress RSS Rate Limit per RSS Message Drop Above Point one Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate per RSS is detected above 0.1 percent of total transactions of that RSS, this metric will be pegged and corresponding alert will be raised.
Severity Warning
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 0.1 Percent of Total Transactions of that RSS

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 0.1 < 10
OID 1.3.6.1.4.1.323.5.3.46.1.2.7011
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 0.1 percent.
5.2.12.2 Ingress RSS Rate Limit per RSS Message Drop Above 10 Percent Alert

Table 5-43 Ingress RSS Rate Limit per RSS Message Drop Above 10 Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate per RSS is detected above 10 percent of total transactions of that RSS, this metric will be pegged and corresponding alert will be raised.
Severity Minor
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 10 Percent of Total Transactions of that RSS

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 10 < 25
OID 1.3.6.1.4.1.323.5.3.46.1.2.7012
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 10 percent.
5.2.12.3 Ingress RSS Rate Limit per RSS Message Drop Above 25 Percent Alert

Table 5-44 Ingress RSS Rate Limit per RSS Message Drop Above 25 Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate per RSS is detected above 25 percent of total transactions of that RSS, this metric will be pegged and corresponding alert will be raised.
Severity Major
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 25 Percent of Total Transactions of that RSS

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 25 < 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.7013
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 25 percent.
5.2.12.4 Ingress RSS Rate Limit per RSS Message Drop Above 50 Percent Alert

Table 5-45 Ingress RSS Rate Limit per RSS Message Drop Above 50 Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate per RSS is detected above 50 percent of total transactions of that RSS, this metric will be pegged and corresponding alert will be raised.
Severity Critical
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 50 Percent of Total Transactions of that RSS

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.7014
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 50 percent.
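
The per-RSS expressions divide the dropped rate by the total rate for each (Remote_SEPP_Set, namespace) pair and compare the resulting percentage with a band. The following Python sketch reproduces that grouping and the four per-RSS bands. The function names, the sample tuple layout, and the Status value "accepted" are assumptions made for illustration; only the Status value "dropped" and the label names appear in the expressions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (Remote_SEPP_Set, namespace, Status, per-second rate); layout is hypothetical.
Sample = Tuple[str, str, str, float]


def drop_percent_per_rss(samples: Iterable[Sample]) -> Dict[Tuple[str, str], float]:
    """Compute the dropped percentage for each (Remote_SEPP_Set, namespace) pair."""
    dropped: Dict[Tuple[str, str], float] = defaultdict(float)
    total: Dict[Tuple[str, str], float] = defaultdict(float)
    for rss, namespace, status, rate in samples:
        key = (rss, namespace)
        total[key] += rate
        if status == "dropped":
            dropped[key] += rate
    # Pairs with no traffic are skipped in this sketch.
    return {key: 100.0 * dropped[key] / total[key] for key in total if total[key] > 0}


def per_rss_severity(percent: float) -> Optional[str]:
    """Map a per-RSS drop percentage to the band used in Tables 5-42 to 5-45."""
    if percent >= 50:
        return "Critical"
    if percent >= 25:
        return "Major"
    if percent >= 10:
        return "Minor"
    if percent >= 0.1:
        return "Warning"
    return None
```

A Remote SEPP Set that drops 2 of every 10 requests per second yields 20 percent, which falls in the Minor band.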
5.2.12.5 Ingress RSS Rate Limit Message Drop Above Point one Percent Alert

Table 5-46 Ingress RSS Rate Limit Message Drop Above Point one Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 0.1 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity Warning
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate detected above 0.1 Percent of Total Transaction

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (namespace) *100 >= 0.1 < 1
OID 1.3.6.1.4.1.323.5.3.46.1.2.7015
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 0.1 percent.
5.2.12.6 Ingress RSS Rate Limit Message Drop Above one Percent Alert

Table 5-47 Ingress RSS Rate Limit Message Drop Above one Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 1 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity Warning
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate detected above 1 Percent of Total Transactions

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (namespace) *100 >= 1 < 10
OID 1.3.6.1.4.1.323.5.3.46.1.2.7016
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 1 percent.
5.2.12.7 Ingress RSS Rate Limit Message Drop Above 10 Percent Alert

Table 5-48 Ingress RSS Rate Limit Message Drop Above 10 Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 10 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity Minor
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate detected above 10 Percent of Total Transactions.

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 10 < 25
OID 1.3.6.1.4.1.323.5.3.46.1.2.7017
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 10 percent.
5.2.12.8 Ingress RSS Rate Limit Message Drop Above 25 Percent Alert

Table 5-49 Ingress RSS Rate Limit Message Drop Above 25 Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 25 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity Major
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate detected above 25 Percent of Total Transactions

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 25 < 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.7018
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 25 percent.
5.2.12.9 Ingress RSS Rate Limit Message Drop Above 50 Percent Alert

Table 5-50 Ingress RSS Rate Limit Message Drop Above 50 Percent Alert

Trigger Condition If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 50 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity Critical
Alert Details Provided

Summary:

Ingress RSS Based Rate Limiting Message Drop Rate detected above 50 Percent of Total Transactions

Expression:

sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.7019
Metric Name oc_ingressgateway_rss_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below 50 percent.

5.2.13 Cat-0 SBI Message Schema Validation Alerts

5.2.13.1 SEPPN32fMessageValidationOnHeaderFailureMinorAlert

Table 5-51 SEPPN32fMessageValidationOnHeaderFailureMinorAlert

Field Details
Trigger Condition Message validation failed for request query parameters for 40% to 60% of the requests on which message validation was applied in the last 2 minutes.
Severity minor
Alert Details Provided

Summary:

Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}

Expression:

(sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 40 < 60

OID 1.3.6.1.4.1.323.5.3.46.1.2.4026
Metric Used ocsepp_message_validation_on_header_failure_total
Resolution The alert gets cleared when the failure percentage is not between 40 and 60.
5.2.13.2 SEPPN32fMessageValidationOnHeaderFailureMajorAlert

Table 5-52 SEPPN32fMessageValidationOnHeaderFailureMajorAlert

Field Description
Trigger Condition Message validation failed for request query parameters for 60% to 80% of the requests on which message validation was applied in the last 2 minutes.
Severity major
Alert Details Provided

Summary:

Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}

Expression:

(sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 60 < 80

OID 1.3.6.1.4.1.323.5.3.46.1.2.4027
Metric Name ocsepp_message_validation_on_header_failure_total
Resolution The alert gets cleared when the failure percentage is not between 60 and 80.

Possible Resolutions:

  1. Check logs or metrics:

    Review the following metrics for message validation failures:

    • ocsepp_message_validation_on_body_failure
    • ocsepp_message_validation_on_header_failure
  2. To identify the failing resource URI and HTTP method, search the logs as follows:
    • For request body validation failures, search for the text: "Message validation failed for request body for request"
    • For query parameter validation failures, search for the text: "Message validation failed for request query parameter(s) for request"
    • For more detailed information about logs, refer to Oracle Communications Cloud Native Core, Security Edge Protection Proxy Troubleshooting Guide.
  3. In the CNC Console GUI, navigate to SEPP and select Security Countermeasure from the left-hand menu.
    • Click Cat 0 - SBI Message Schema Validation to open the Message Validation List.
    • Search for the relevant resource URI to retrieve the corresponding schema.
    • Compare the request body or query parameters against the schema to ensure that the request complies with it. If necessary, update the schema to reflect the correct structure.
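
The log search in step 2 can be scripted once the logs have been exported to text. The following Python sketch counts lines that contain either of the two search strings quoted in the steps. The function name count_validation_failures is hypothetical, and the surrounding log line format is not assumed; only a substring match is performed.

```python
from collections import Counter
from typing import Iterable

# Search strings taken from the resolution steps.
BODY_MARKER = "Message validation failed for request body for request"
QUERY_MARKER = "Message validation failed for request query parameter(s) for request"


def count_validation_failures(lines: Iterable[str]) -> Counter:
    """Count body and query parameter validation failures in exported log lines."""
    counts: Counter = Counter()
    for line in lines:
        if BODY_MARKER in line:
            counts["body"] += 1
        elif QUERY_MARKER in line:
            counts["query"] += 1
    return counts
```

The matching lines themselves are the place to look for the resource URI and HTTP method; their exact layout is described in the Troubleshooting Guide rather than assumed here.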
5.2.13.3 SEPPN32fMessageValidationOnHeaderFailureCriticalAlert

Table 5-53 SEPPN32fMessageValidationOnHeaderFailureCriticalAlert

Field Description
Trigger Condition Message validation failed for request query parameters for 80% or more of the requests on which message validation was applied in the last 2 minutes.
Severity critical
Alert Details Provided

Summary:

Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}

Expression:

(sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 80
OID 1.3.6.1.4.1.323.5.3.46.1.2.4028
Metric Name ocsepp_message_validation_on_header_failure_total
Resolution The alert gets cleared when the failure percentage falls below 80.
5.2.13.4 SEPPN32fMessageValidationOnBodyFailureMinorAlert

Table 5-54 SEPPN32fMessageValidationOnBodyFailureMinorAlert

Field Description
Trigger Condition Message validation failed for request body for 40% to 60% of the requests on which message validation was applied in the last 2 minutes.
Severity minor
Alert Details Provided

Summary:

Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}

Expression:

(sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 40 < 60

OID 1.3.6.1.4.1.323.5.3.46.1.2.4029
Metric Name ocsepp_message_validation_on_body_failure_total
Resolution The alert gets cleared when the failure percentage is not between 40 and 60.
5.2.13.5 SEPPN32fMessageValidationOnBodyFailureMajorAlert

Table 5-55 SEPPN32fMessageValidationOnBodyFailureMajorAlert

Field Details
Trigger Condition Message validation failed for request body for 60% to 80% of the requests on which message validation was applied in the last 2 minutes.
Severity major
Alert Details Provided

Summary:

Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}

Expression:

(sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 60 < 80
OID 1.3.6.1.4.1.323.5.3.46.1.2.4030
Metric Name ocsepp_message_validation_on_body_failure_total
Resolution The alert gets cleared when the failure percentage is not between 60 and 80.
5.2.13.6 SEPPN32fMessageValidationOnBodyFailureCriticalAlert

Table 5-56 SEPPN32fMessageValidationOnBodyFailureCriticalAlert

Field Details
Trigger Condition Message validation failed for request body for 80% or more of the requests on which message validation was applied in the last 2 minutes.
Severity critical
Alert Details Provided

Summary:

Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }}

Expression:

(sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 80

OID 1.3.6.1.4.1.323.5.3.46.1.2.4031
Metric Name ocsepp_message_validation_on_body_failure_total
Resolution The alert gets cleared when the failure percentage falls below 80.

5.2.14 Cat-1 Service API Validation Alerts

5.2.14.1 SEPPN32fServiceApiValidationFailureAlert

Table 5-57 SEPPN32fServiceApiValidationFailureAlert

Trigger Condition Service API not in allowed list
Severity Major
Alert details provided Summary
N32f : Service API not in allowed list
Expression:
delta(ocsepp_security_service_api_failure_total[2m]) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4005
Metric Used ocsepp_security_service_api_failure_total
Resolution 1

This alert is raised when there is a difference of at least 1 between the first and last data points in the samples collected over the last 2 minutes. The alert is cleared after 2 minutes.

Possible Resolutions:

  1. Check the Resource URI + Method for which alert is raised.
  2. Verify the error_msg using "ocsepp_security_service_api_failure_total" metric and KPI.
  3. Fix or add configuration for the Resource URI + Method in Service API's and Allowed List.
Resolution 2

The alert gets cleared when the N32C Handshake is established after successful TCP connection to remote SEPP.

Steps:

The failure reason is present in the alert.

Possible Resolutions:

  1. Disable the Remote SEPP.
  2. Delete the Remote SEPP.
  3. Update and reinitiate Handshake.

5.2.15 Cat-2 Network ID Validation Alerts

5.2.15.1 SEPPN32fNetworkIDValidationHeaderFailureAlert

Table 5-58 SEPPN32fNetworkIDValidationHeaderFailureAlert

Field Details
Trigger Condition If Network ID Validation for Header fails, this metric is pegged and the corresponding alert is raised.
Severity Major
Alert details provided Summary: 'namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Network ID Validation has failed because {{ $labels.cause }}'

Expression:

sum(increase(ocsepp_network_id_validation_header_failure_total[2m]) >0 or (ocsepp_network_id_validation_header_failure_total unless ocsepp_network_id_validation_header_failure_total offset 2m )) by (namespace, remote_sepp_name, nf_instance_id, peer_fqdn, plmn_identifier, app, resource_uri, pod) > 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4011
Metric Used ocsepp_network_id_validation_header_failure_total
Resolution The alert gets cleared when the failure count stops increasing, that is, when the expression no longer returns a value above 0.
5.2.15.2 SEPPN32fNetworkIDValidationBodyIEFailureAlert

Table 5-59 SEPPN32fNetworkIDValidationBodyIEFailureAlert

Field Details
Trigger Condition If Network ID Validation for Body fails, this metric is pegged and the corresponding alert is raised.
Severity Major
Alert details provided Summary:

'namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Network ID Body Validation has failed because {{ $labels.cause }}'

Expression:

sum(increase(ocsepp_network_id_validation_body_failure_total[2m]) >0 or (ocsepp_network_id_validation_body_failure_total unless ocsepp_network_id_validation_body_failure_total offset 2m )) by (namespace, remote_sepp_name, nf_instance_id, peer_fqdn, plmn_identifier, app, resource_uri, pod) > 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4012
Metric Used ocsepp_network_id_validation_body_failure_total
Resolution The alert gets cleared when the failure count stops increasing, that is, when the expression no longer returns a value above 0.

5.2.16 Cat-3 Previous Location Check Alerts

5.2.16.1 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent30to40

Table 5-60 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent30to40

Trigger Condition This alert is raised when previous location check validation failures are detected for 30 to 40 percent of total transactions.
Severity Minor
Alert Details Provided

Summary

Previous location check validation failure detected between 30 to 40 Percent of Total Transactions

Expression

sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.3 and sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.4
OID 1.3.6.1.4.1.323.5.3.46.1.2.4032
Metric Name ocsepp_previous_location_validation_failure_total
Resolution The alert gets cleared when previous location check validation failures are no longer between 30 and 40 percent of total transactions.
5.2.16.2 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent40to50

Table 5-61 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent40to50

Trigger Condition This alert is raised when previous location check validation failures are detected for 40 to 50 percent of total transactions.
Severity Major
Alert Details Provided

Summary

Previous location check validation failure detected between 40 to 50 Percent of Total Transactions

Expression

sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.4 and sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.5
OID 1.3.6.1.4.1.323.5.3.46.1.2.4033
Metric Name ocsepp_previous_location_validation_failure_total
Resolution The alert gets cleared when previous location check validation failures are no longer between 40 and 50 percent of total transactions.
5.2.16.3 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercentAbove50

Table 5-62 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercentAbove50

Trigger Condition This alert is raised when previous location check validation failures are detected for 50 percent or more of total transactions.
Severity Critical
Alert Details Provided

Summary

Previous location check validation failure detected above 50 Percent of Total Transactions

Expression

sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.5
OID 1.3.6.1.4.1.323.5.3.46.1.2.4034
Metric Name ocsepp_previous_location_validation_failure_total
Resolution The alert gets cleared when previous location check validation failures fall below 50 percent of total transactions.
5.2.16.4 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent30to40

Table 5-63 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent30to40

Trigger Condition This alert is raised when previous location check exception failures are detected for 30 to 40 percent of total transactions.
Severity Minor
Alert Details Provided

Summary

Previous location check exception failure detected between 30 to 40 Percent of Total Transactions

Expression

sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.3 and sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.4
OID 1.3.6.1.4.1.323.5.3.46.1.2.4035
Metric Name ocsepp_previous_location_exception_failure_total
Resolution The alert gets cleared when previous location check exception failures are no longer between 30 and 40 percent of total transactions.
5.2.16.5 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent40to50

Table 5-64 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent40to50

Trigger Condition This alert is raised when previous location check exception failures are detected for 40 to 50 percent of total transactions.
Severity Major
Alert Details Provided

Summary

Previous location check exception failure detected between 40 to 50 Percent of Total Transactions

Expression

sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.4 and sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.5
OID 1.3.6.1.4.1.323.5.3.46.1.2.4036
Metric Name ocsepp_previous_location_exception_failure_total
Resolution The alert gets cleared when previous location check exception failures are no longer between 40 and 50 percent of total transactions.
5.2.16.6 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercentAbove50

Table 5-65 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercentAbove50

Trigger Condition This alert is raised when previous location check exception failures are detected for 50 percent or more of total transactions.
Severity Critical
Alert Details Provided

Summary

Previous location check exception failure detected above 50 Percent of Total Transactions

Expression

sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.5
OID 1.3.6.1.4.1.323.5.3.46.1.2.4037
Metric Name ocsepp_previous_location_exception_failure_total
Resolution The alert gets cleared when previous location check exception failures fall below 50 percent of total transactions.

5.2.17 Cat-3 Time Check for Roaming Subscribers

5.2.17.1 pn32fTimeUnauthLocChkValFailAlrtMinor

Table 5-66 pn32fTimeUnauthLocChkValFailAlrtMinor

Field Details
Trigger Condition Triggered in case of a minor failure for Cat-3 Time Unauthenticated Location Check.
Severity Minor
Alert Details Provided

Summary

namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}

Expression

sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) >= 1 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) <= 10
OID 1.3.6.1.4.1.323.5.3.46.1.2.4055
Metric Name ocsepp_time_unauthenticated_location_validation_failure_total
Resolution The alert gets cleared when the failure count is not between 1 and 10.
5.2.17.2 pn32fTimeUnauthLocChkValFailAlrtMajor

Table 5-67 pn32fTimeUnauthLocChkValFailAlrtMajor

Field Details
Trigger Condition Triggered in case of a major failure for Cat-3 Time Unauthenticated Location Check.
Severity Major
Alert Details Provided

Summary

namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}

Expression

sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) >= 11 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) <= 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.4056
Metric Name ocsepp_time_unauthenticated_location_validation_failure_total
Resolution The alert gets cleared when the failure count is not between 11 and 50.
5.2.17.3 pn32fTimeUnauthLocChkValFailAlrtCritical

Table 5-68 pn32fTimeUnauthLocChkValFailAlrtCritical

Field Details
Trigger Condition Triggered in case of a critical failure for Cat-3 Time Unauthenticated Location Check.
Severity Critical
Alert Details Provided

Summary

namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}

Expression

sum(increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m  ) by (namespace,nf_instance_id,app,pod) >=51

OID 1.3.6.1.4.1.323.5.3.46.1.2.4057
Metric Name ocsepp_time_unauthenticated_location_validation_failure_total
Resolution The alert gets cleared when the failure count is below 51.
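
The three validation failure alerts in this section use count-based bands (1 to 10, 11 to 50, and 51 or more) on the increase over the last 2 minutes. The following Python sketch mirrors those bounds. The function name time_check_severity is hypothetical. Because the Prometheus increase() function extrapolates and can return non-integer values, a value such as 10.5 matches none of the three expressions, and the sketch reports no alert in that case.

```python
from typing import Optional


def time_check_severity(increase_2m: float) -> Optional[str]:
    """Map a 2-minute failure increase to the band used by the alert expressions.

    increase_2m is an illustrative stand-in for the summed result of
    increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]).
    """
    if increase_2m >= 51:
        return "Critical"  # pn32fTimeUnauthLocChkValFailAlrtCritical
    if 11 <= increase_2m <= 50:
        return "Major"     # pn32fTimeUnauthLocChkValFailAlrtMajor
    if 1 <= increase_2m <= 10:
        return "Minor"     # pn32fTimeUnauthLocChkValFailAlrtMinor
    # Below 1, or in a gap between the integer bounds (for example 10.5).
    return None
```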
5.2.17.4 pn32fTimeUnauthLocChkExcepFailAlrtMinor

Table 5-69 pn32fTimeUnauthLocChkExcepFailAlrtMinor

Field Details
Trigger Condition Triggered in case of a minor exception for Cat-3 Time Unauthenticated Location Check.
Severity Minor
Alert Details Provided

Summary

namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}

Expression

sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) >= 1 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) <= 10
OID 1.3.6.1.4.1.323.5.3.46.1.2.4058
Metric Name ocsepp_time_unauthenticated_location_exception_failure_total
Resolution The alert gets cleared when the exception count is no longer between 1 and 10.
5.2.17.5 pn32fTimeUnauthLocChkExcepFailAlrtMajor

Table 5-70 pn32fTimeUnauthLocChkExcepFailAlrtMajor

Field Details
Trigger Condition Triggered in case of a major exception for Cat-3 Time Unauthenticated Location Check.
Severity Major
Alert Details Provided

Summary

namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}

Expression

sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) >= 11 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) <= 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.4059
Metric Name ocsepp_time_unauthenticated_location_exception_failure_total
Resolution The alert gets cleared when the exception count is no longer between 11 and 50.
5.2.17.6 pn32fTimeUnauthLocChkExcepFailAlrtCritical

Table 5-71 pn32fTimeUnauthLocChkExcepFailAlrtCritical

Field Details
Trigger Condition Triggered in case of a critical exception for Cat-3 Time Unauthenticated Location Check.
Severity Critical
Alert Details Provided

Summary

namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Expression
sum(increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m ) by (namespace,nf_instance_id,app,pod) >=51
OID 1.3.6.1.4.1.323.5.3.46.1.2.4060
Metric Name ocsepp_time_unauthenticated_location_exception_failure_total
Resolution The alert gets cleared when the exception count is below 51.
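
The Minor, Major, and Critical exception alerts above split the 2-minute exception count into three bands. The following Python sketch restates those bands for a single count. The function name is hypothetical, and the count is assumed to be the already aggregated value that the alert expression compares against.

```python
from typing import Optional


def exception_alert_severity(count: float) -> Optional[str]:
    """Map a 2-minute exception count to the band used by the expressions above."""
    if count >= 51:
        return "Critical"
    if 11 <= count <= 50:
        return "Major"
    if 1 <= count <= 10:
        return "Minor"
    # Counts below 1 raise no alert. Because the bounds are written as
    # <= 10 / >= 11 and <= 50 / >= 51, a fractional value such as 10.5
    # also matches no band.
    return None


for sample in (0, 1, 10, 11, 50, 51):
    print(sample, exception_alert_severity(sample))
```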

5.2.18 Rate Limiting for Egress Roaming Signaling per PLMN Alerts

5.2.18.1 Egress Request Rate Limit per PLMN Message Drop Above 10 Percent Alert

Table 5-72 Egress Request Rate Limit per PLMN Message Drop Above 10 Percent Alert

Trigger Condition If a request is dropped because the tokens in the bucket are exhausted and the drop rate per PLMN is above 10 percent of the total transactions of that PLMN, the oc_ingressgateway_plmn_egress_ratelimit_total metric is pegged and the corresponding alert is raised.
Severity Minor
Alert Details Provided

Summary

Egress Rate Limiting Request Drop Rate detected per PLMN above 10 Percent of Total Transactions

Expression

sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 10 < 25
OID 1.3.6.1.4.1.323.5.3.46.1.2.4039
Metric Name oc_ingressgateway_plmn_egress_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below the threshold.
5.2.18.2 Egress Request Rate Limit per PLMN Message Drop Above 25 Percent Alert

Table 5-73 Egress Request Rate Limit per PLMN Message Drop Above 25 Percent Alert

Trigger Condition If a request is dropped because the tokens in the bucket are exhausted and the drop rate per PLMN is above 25 percent of the total transactions of that PLMN, the oc_ingressgateway_plmn_egress_ratelimit_total metric is pegged and the corresponding alert is raised.
Severity Major
Alert Details Provided

Summary

Egress Rate Limiting Request Drop Rate detected per PLMN above 25 Percent of Total Transactions

Expression

sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 25 < 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.4040
Metric Name oc_ingressgateway_plmn_egress_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below the threshold.
5.2.18.3 Egress Request Rate Limit per PLMN Message Drop Above 50 Percent Alert

Table 5-74 Egress Request Rate Limit per PLMN Message Drop Above 50 Percent Alert

Trigger Condition If a request is dropped because the tokens in the bucket are exhausted and the drop rate per PLMN is above 50 percent of the total transactions of that PLMN, the oc_ingressgateway_plmn_egress_ratelimit_total metric is pegged and the corresponding alert is raised.
Severity Critical
Alert Details Provided

Summary

Egress Rate Limiting Request Drop Rate detected per PLMN above 50 Percent of Total Transactions

Expression

sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 50
OID 1.3.6.1.4.1.323.5.3.46.1.2.4041
Metric Name oc_ingressgateway_plmn_egress_ratelimit_total
Resolution The alert gets cleared when the drop rate falls below the threshold.
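
The three alerts above compare the share of rejected requests (Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT") against the total request rate for the same PLMN. The following Python sketch reproduces that arithmetic for one PLMN. The function names are hypothetical, and the Major band is assumed to run from 25 up to (but not including) 50 percent, as the alert titles suggest.

```python
from typing import Optional


def drop_rate_percent(rejected_rate: float, total_rate: float) -> Optional[float]:
    """Rejected share of all rate-limited transactions, in percent."""
    if total_rate == 0:
        # In PromQL, 0/0 yields NaN, which satisfies no comparison,
        # so no alert is raised. None models that case here.
        return None
    return rejected_rate / total_rate * 100


def rate_limit_alert_severity(percent: Optional[float]) -> Optional[str]:
    """Pick the alert band for a drop rate in percent."""
    if percent is None:
        return None
    if percent >= 50:
        return "Critical"
    if percent >= 25:  # assumed band: 25 <= percent < 50
        return "Major"
    if percent >= 10:  # band from the first expression: 10 <= percent < 25
        return "Minor"
    return None


# One rejected request per second out of eight per second is 12.5 percent.
print(rate_limit_alert_severity(drop_rate_percent(1.0, 8.0)))
```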

5.2.19 Separate Port Configurations for N32c and N32f on the Egress Routes Alerts

5.2.19.1 EgressInterfaceConnectionFailure

Table 5-75 EgressInterfaceConnectionFailure

Field Details
Trigger Condition If the destination host and port mentioned in the Remote profile are unreachable or not available, then the alert will be raised.
Severity Major
Alert Details Provided

Summary:

Egress connection failure on the interface

Expression:

sum(increase(oc_egressgateway_connection_failure_total{app="n32-egress-gateway"}[5m])) by (namespace,app,Host,Port) >0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4042
Metric Name oc_egressgateway_connection_failure_total
Resolution If the destination host and port are reachable, then the alert will be cleared.

5.2.20 Support for TLS 1.3

5.2.20.1 SEPPConnectionFailurePLMNIGWAlert

Table 5-76 SEPPConnectionFailurePLMNIGWAlert

Field Details
Trigger Condition Connection failure occurs for incoming traffic at PLMN Ingress Gateway
Severity Major
Alert details provided
Summary:
namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Incoming connection failure on plmn-ingress-gateway due to {{ $labels.error_reason }}
Expression:
sum(increase(oc_ingressgateway_connection_failure_total{app="plmn-ingress-gateway"}[5m]) >0 or (oc_ingressgateway_connection_failure_total{app="plmn-ingress-gateway"} unless oc_ingressgateway_connection_failure_total{app="plmn-ingress-gateway"} offset 5m )) by (namespace,app) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4043
Metric used oc_ingressgateway_connection_failure_total
Resolution After resolving the reason for the connection failure, this alert will be removed.
5.2.20.2 SEPPConnectionFailureN32IGWAlert

Table 5-77 SEPPConnectionFailureN32IGWAlert

Field Details
Trigger Condition Connection failure occurs for incoming traffic at N32 Ingress Gateway
Severity Major
Alert details provided
Summary:
namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Incoming connection failure on n32-ingress-gateway due to {{ $labels.error_reason }}

Expression:

sum(increase(oc_ingressgateway_connection_failure_total{app="n32-ingress-gateway"}[5m]) >0 or (oc_ingressgateway_connection_failure_total{app="n32-ingress-gateway"} unless oc_ingressgateway_connection_failure_total{app="n32-ingress-gateway"} offset 5m )) by (namespace,app) > 0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4044
Metric used oc_ingressgateway_connection_failure_total
Resolution After resolving the reason for connection failure, this alert will be removed.
5.2.20.3 SEPPX509CertificateExpiryAlertMinor

Table 5-78 SEPPX509CertificateExpiryAlertMinor

Field Details
Trigger Condition The TLS certificate is due to expire within 6 months.
Severity Minor
Alert details provided
Summary:
Certificate expiry in less than 6 months

Expression:

security_cert_x509_expiration_seconds - time() <= 15724800
OID 1.3.6.1.4.1.323.5.3.46.1.2.4045
Metric used security_cert_x509_expiration_seconds
Resolution The alert gets cleared only after the certificates have been updated.
5.2.20.4 SEPPX509CertificateExpiryAlertMajor

Table 5-79 SEPPX509CertificateExpiryAlertMajor

Field Details
Trigger Condition The TLS certificate is due to expire within 3 months.
Severity Major
Alert details provided
Summary:
Certificate expiry in less than 3 months
Expression:
security_cert_x509_expiration_seconds - time() <= 7862400
OID 1.3.6.1.4.1.323.5.3.46.1.2.4046
Metric used security_cert_x509_expiration_seconds
Resolution The alert gets cleared only after the certificates have been updated.
5.2.20.5 SEPPX509CertificateExpiryAlertCritical

Table 5-80 SEPPX509CertificateExpiryAlertCritical

Field Details
Trigger Condition The TLS certificate is due to expire within 1 month.
Severity Critical
Alert details provided
Summary:
Certificate expiry in less than 1 month
Expression:
security_cert_x509_expiration_seconds - time() <= 2592000
OID 1.3.6.1.4.1.323.5.3.46.1.2.4047
Metric used security_cert_x509_expiration_seconds
Resolution The alert gets cleared only after the certificates have been updated.
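
The certificate expiry expressions above compare the remaining lifetime in seconds against fixed thresholds: 15724800 seconds is 182 days, 7862400 seconds is 91 days, and 2592000 seconds is 30 days. The following Python sketch shows the conversion and the comparison. The function names are hypothetical; the thresholds are taken from the expressions above.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400

# Thresholds from the alert expressions, most severe first.
THRESHOLDS = [
    ("Critical", 2_592_000),
    ("Major", 7_862_400),
    ("Minor", 15_724_800),
]


def threshold_days(seconds: int) -> int:
    """Convert a threshold in seconds to whole days."""
    return seconds // SECONDS_PER_DAY


def expiry_alert_severity(expiration_epoch: float, now_epoch: float) -> Optional[str]:
    """Most severe band matched by `security_cert_x509_expiration_seconds - time()`.

    The expressions have no lower bound, so in Prometheus every matching
    rule fires at once; this helper reports only the most severe one.
    """
    remaining = expiration_epoch - now_epoch
    for severity, limit in THRESHOLDS:
        if remaining <= limit:
            return severity
    return None


for name, limit in THRESHOLDS:
    print(name, threshold_days(limit), "days")
```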

5.2.21 Multiple SEPP Instances on Shared cnDBTier Cluster Alerts

5.2.21.1 Cn32fConnectionFailureWithDatabaseAlert

Table 5-81 Cn32fConnectionFailureWithDatabaseAlert

Field Details
Trigger Condition ocsepp_cn32f_database_connectivity_healthy == 0
Severity Major
Alert Details Provided

Summary:

Alert is raised when connectivity is broken between CN32f and cnDBTier. Metric value is pegged as 0 and then alert is raised.

Expression:

ocsepp_cn32f_database_connectivity_healthy == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4050
Metric Name ocsepp_cn32f_database_connectivity_healthy
Resolution Restore the connectivity between SEPP and cnDBTier.
5.2.21.2 Cn32cConnectionFailureWithDatabaseAlert

Table 5-82 Cn32cConnectionFailureWithDatabaseAlert

Field Details
Trigger Condition ocsepp_cn32c_database_connectivity_healthy == 0
Severity Major
Alert Details Provided

Summary:

Alert is raised when connectivity is broken between CN32c and cnDBTier for more than 30 seconds. Metric value is pegged as 0 and then alert is raised.

Expression:

ocsepp_cn32c_database_connectivity_healthy == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4051
Metric Name ocsepp_cn32c_database_connectivity_healthy
Resolution Restore the connectivity between SEPP and cnDBTier.
5.2.21.3 Pn32fConnectionFailureWithDatabaseAlert

Table 5-83 Pn32fConnectionFailureWithDatabaseAlert

Field Details
Trigger Condition ocsepp_pn32f_database_connectivity_healthy == 0
Severity Major
Alert Details Provided

Summary:

Alert is raised when connectivity is broken between PN32F and cnDBTier for more than 30 seconds. Metric value is pegged as 0 and then alert is raised.

Expression:

ocsepp_pn32f_database_connectivity_healthy == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4052
Metric Name ocsepp_pn32f_database_connectivity_healthy
Resolution Restore the connectivity between SEPP and cnDBTier.
5.2.21.4 Pn32cConnectionFailureWithDatabaseAlert

Table 5-84 Pn32cConnectionFailureWithDatabaseAlert

Field Details
Trigger Condition ocsepp_pn32c_database_connectivity_healthy == 0
Severity Major
Alert Details Provided

Summary:

Alert is raised when connectivity is broken between PN32C and cnDBTier for more than 30 seconds. Metric value is pegged as 0 and then alert is raised.

Expression:

ocsepp_pn32c_database_connectivity_healthy == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4053
Metric Name ocsepp_pn32c_database_connectivity_healthy
Resolution Restore the connectivity between SEPP and cnDBTier.
5.2.21.5 ConfigManagerConnectionFailureWithDatabaseAlert

Table 5-85 ConfigManagerConnectionFailureWithDatabaseAlert

Trigger Condition ocsepp_configmgr_database_connectivity_healthy == 0
Severity Major
Alert Details Provided

Summary:

Alert is raised when connectivity is broken between Config Manager and cnDBTier for more than 30 seconds. Metric value is pegged as 0 and then alert is raised.

Expression:

ocsepp_configmgr_database_connectivity_healthy == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4054
Metric Name ocsepp_configmgr_database_connectivity_healthy
Resolution Restore the connectivity between SEPP and cnDBTier.
5.2.21.6 Cn32fIncorrectDatabaseConfigurationAlert

Table 5-86 Cn32fIncorrectDatabaseConfigurationAlert

Field Details
Trigger Condition This alert is raised when an incorrect database configuration is provided for the cn32f service, resulting in a connection failure with the database.
Severity Major
Alert Details Provided

Summary:

Due to incorrect database configuration, connection failed with database.

Expression:

(up{app="cn32f-svc"} unless on (namespace) absent(hikaricp_connections{app="cn32f-svc"})) == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4057
Metric Name NA
Resolution Configure correct values in the deployment of the Cn32f pod.
5.2.21.7 Cn32cIncorrectDatabaseConfigurationAlert

Table 5-87 Cn32cIncorrectDatabaseConfigurationAlert

Field Details
Trigger Condition This alert is raised when an incorrect database configuration is provided for the cn32c service, resulting in a connection failure with the database.
Severity Major
Alert Details Provided

Summary:

Due to incorrect database configuration, connection failed with database.

Expression:

(up{app="cn32c-svc"} unless on (namespace) absent(hikaricp_connections{app="cn32c-svc"})) == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4056
Metric Name NA
Resolution Configure correct values in the deployment of the Cn32c pod.
5.2.21.8 Pn32fIncorrectDatabaseConfigurationAlert

Table 5-88 Pn32fIncorrectDatabaseConfigurationAlert

Field Details
Trigger Condition This alert is raised when an incorrect database configuration is provided for the pn32f service, resulting in a connection failure with the database.
Severity Major
Alert Details Provided

Summary:

Due to incorrect database configuration, connection failed with database.

Expression:

(up{app="pn32f-svc"} unless on (namespace) absent(hikaricp_connections{app="pn32f-svc"})) == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4058
Metric Name NA
Resolution Configure correct values in the deployment of the Pn32f pod.
5.2.21.9 pn32cIncorrectDbConf

Table 5-89 pn32cIncorrectDbConf

Field Details
Trigger Condition This alert is raised when an incorrect database configuration is provided for the pn32c service, resulting in a connection failure with the database.
Severity Major
Alert Details Provided

Summary:

Due to incorrect database configuration, connection failed with database.

Expression:

(up{app="pn32c-svc"} unless on (namespace) absent(hikaricp_connections{app="pn32c-svc"})) == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4059
Metric Name NA
Resolution Configure correct values in the deployment of the pn32c pod.
5.2.21.10 ConfigManagerIncorrectDatabaseConfigurationAlert

Table 5-90 ConfigManagerIncorrectDatabaseConfigurationAlert

Field Details
Trigger Condition This alert is raised when an incorrect database configuration is provided for the config manager service, resulting in a connection failure with the database.
Severity Major
Alert Details Provided

Summary:

Due to incorrect database configuration, connection failed with database.

Expression:

(up{app="config-mgr-svc"} unless on (namespace) absent(hikaricp_connections{app="config-mgr-svc"})) == 0

OID 1.3.6.1.4.1.323.5.3.46.1.2.4055
Metric Name NA
Resolution Configure correct values in the deployment of the ConfigManager pod.

5.2.22 Proactive Status Updates on SEPP Alerts

5.2.22.1 EgressGatewayPeerUnhealthyAlert

Table 5-91 EgressGatewayPeerUnhealthyAlert

Field Details
Trigger Condition A peer becomes unhealthy, that is, the value of oc_egressgateway_peer_health_status for the peer is 1.
Severity Major
Alert Details Provided

Summary

Peer is unhealthy

Expression

sum(oc_egressgateway_peer_health_status{app="n32-egress-gateway"}) by (namespace,app,peer) >0
OID 1.3.6.1.4.1.323.5.3.46.1.2.4048
Metric Name oc_egressgateway_peer_health_status
Resolution The alert gets cleared when the peer becomes healthy again, that is, when oc_egressgateway_peer_health_status for the peer becomes 0.
5.2.22.2 EgressGatewayAllPeersUnhealthyAlert

Table 5-92 EgressGatewayAllPeersUnhealthyAlert

Field Details
Trigger Condition When all peers in a peerset become unhealthy.
Severity Critical
Alert Details Provided

Summary

All peers unhealthy

Expression

(sum(oc_egressgateway_peer_count) by (namespace) -sum(oc_egressgateway_peer_available_count) by (namespace))==sum(oc_egressgateway_peer_count) by (namespace)
OID 1.3.6.1.4.1.323.5.3.46.1.2.4049
Metric Name oc_egressgateway_peer_count, oc_egressgateway_peer_available_count
Resolution The alert gets cleared when at least one peer in a peerset becomes healthy again.
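
The EgressGatewayAllPeersUnhealthyAlert expression subtracts the available peer count from the total peer count and checks whether the result still equals the total, which holds only when no peer is available. The following Python sketch restates that check for a single pair of values. The function name is hypothetical, and the two arguments are assumed to be the summed values of oc_egressgateway_peer_count and oc_egressgateway_peer_available_count.

```python
def all_peers_unhealthy(peer_count: int, available_count: int) -> bool:
    """Mirror `(peer_count - available_count) == peer_count`."""
    unhealthy = peer_count - available_count
    # Equal only when available_count is 0, that is, no peer is available.
    return unhealthy == peer_count


# Three configured peers, none available: the condition holds.
print(all_peers_unhealthy(3, 0))
# One peer recovers: the condition no longer holds and the alert clears.
print(all_peers_unhealthy(3, 1))
```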