SEPP Alerts

Field	Details
Trigger Condition	Pod CPU usage is at or above the threshold (70%)
Severity	Warning
Alert details provided	Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: CPU usage is {{ $value | printf "%.2f" }} which is above 70% (current value is: {{ $value }})' Expression: (sum by (namespace,container) (rate(container_cpu_usage_seconds_total{container=~".*cn32c-svc.*|.*pn32c-svc.*|.*cn32f-svc.*|.*pn32f-svc.*|.*config-mgr-svc.*|.*n32-egress-gateway.*|.*n32-ingress-gateway.*|.*plmn-egress-gateway.*|.*plmn-ingress-gateway.*|.*nf-mediation.*"}[2m]))) / (sum by (container, namespace) (kube_pod_container_resource_limits{resource="cpu",container=~".*cn32c-svc.*|.*pn32c-svc.*|.*cn32f-svc.*|.*pn32f-svc.*|.*config-mgr-svc.*|.*n32-egress-gateway.*|.*n32-ingress-gateway.*|.*plmn-egress-gateway.*|.*plmn-ingress-gateway.*|.*nf-mediation.*"})) * 100 >= 70
OID	1.3.6.1.4.1.323.5.3.46.1.2.4002
Metric Used	container_cpu_usage_seconds_total Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric exposed by the monitoring system.
Resolution	The alert gets cleared when the CPU utilization is below the critical threshold. Note: The threshold is configurable in the SeppAlertrules.yaml file. If guidance is required, contact My Oracle Support.
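
The expression for this alert divides the per-container CPU consumption rate by the configured CPU limit and scales the result to a percentage. A minimal Python sketch of that arithmetic follows; the function names and sample values are hypothetical and are not taken from SeppAlertrules.yaml.

```python
def cpu_usage_percent(cpu_seconds_per_second: float, cpu_limit_cores: float) -> float:
    """Return CPU usage as a percentage of the configured CPU limit."""
    if cpu_limit_cores <= 0:
        raise ValueError("CPU limit must be positive")
    return cpu_seconds_per_second / cpu_limit_cores * 100


def cpu_alert_fires(cpu_seconds_per_second: float, cpu_limit_cores: float,
                    threshold_percent: float = 70.0) -> bool:
    """Mirror the '>= 70' comparison at the end of the alert expression."""
    return cpu_usage_percent(cpu_seconds_per_second, cpu_limit_cores) >= threshold_percent


# A container consuming 1.5 CPU seconds per second against a 2-core limit
# is at 75% of its limit, so the alert condition holds.
print(cpu_alert_fires(1.5, 2.0))
```

The threshold is a parameter here because the source notes that it is configurable.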

5.2.1 Common Alerts

5.2.1.1 SEPPN32fRoutingFailure

Table 5-3 SEPPN32fRoutingFailure

Trigger Condition	The N32f service is not able to forward a message
Severity	Info
Alert details provided	Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: N32f service not able to forward message because {{ $labels.error_msg }}' Expression: idelta(ocsepp_cn32f_requests_failure_total[2m]) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4001
Metric Used	ocsepp_cn32f_requests_failure_total
Resolution	The alert gets cleared when the Consumer SEPP accepts the request, which it does only if the producer NF domain and PLMN match the configured Remote SEPP. Steps: The failure reason is present in the alert. Possible resolutions: Check whether the Remote SEPP is present in the database. Validate the PLMN configured for the Remote SEPP. Validate that the handshake with the Remote SEPP is completed and that the context is present in the database. Validate the producer NF domain. Check whether the Remote SEPP Set for the required Remote SEPP is present in the database. Check whether the N32F route is present in the database (common_configuration table).

5.2.1.2 SEPPConfigMgrRouteFailureAlert

Table 5-4 SEPPConfigMgrRouteFailureAlert

Trigger Condition	Raised when a routing failure occurs while posting a Remote SEPP or roaming partner set.
Severity	Major
Alert details provided	Summary: namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Route Failure has occurred because {{ $labels.errorReason }} Expression: sum(increase(ocsepp_configmgr_routefailure_total{app="config-mgr-svc"}[5m]) > 0 or (ocsepp_configmgr_routefailure_total{app="config-mgr-svc"} unless ocsepp_configmgr_routefailure_total{app="config-mgr-svc"} offset 5m)) by (namespace,errorCode) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4026
Metric Used	ocsepp_configmgr_routefailure_total
Resolution	The alert is cleared if no new failures are observed within a 5-minute window.
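
The expression for this alert combines two conditions: the counter increased within the window, or the series exists now but did not exist at the start of the window (the `unless ... offset` branch), which catches a counter that first appears with a non-zero value. The following Python sketch models that logic under simplifying assumptions: it compares two point-in-time samples and ignores counter resets and the extrapolation that Prometheus applies in `increase()`. The function name is hypothetical.

```python
from typing import Optional


def counter_alert_fires(value_now: Optional[float],
                        value_window_ago: Optional[float]) -> bool:
    """Decide whether a failure counter should raise an alert.

    None models "no sample exists for this series at that time".
    """
    if value_now is None:
        # No series at all: nothing to alert on.
        return False
    if value_window_ago is None:
        # New series: the `unless ... offset` branch returns the current
        # sample, and the outer `> 0` requires it to be non-zero.
        return value_now > 0
    # Existing series: alert only if the counter grew within the window.
    return value_now - value_window_ago > 0


print(counter_alert_fires(3, None))  # series appeared with failures already counted
print(counter_alert_fires(3, 3))     # no new failures within the window
```

The same increase-or-new-series pattern appears in the handshake and topology hiding alert expressions.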

5.2.1.3 EgressSbiErrorRateAbove1Percent

Table 5-5 EgressSbiErrorRateAbove1Percent

Trigger Condition	SBI transaction error rate exceeded the configured threshold (1%)
Severity	Major
Alert details provided	Summary: "Sbi Transaction Error Rate detected above 1 Percent of Total Sbi Transactions" Expression: sum(rate(oc_egressgateway_sbiRouting_http_responses_total{Status!~"2.*"}[24h])) by (app,pod,namespace) / sum(rate(oc_egressgateway_sbiRouting_http_responses_total[24h])) by (app,pod,namespace) * 100 >= 1
OID	1.3.6.1.4.1.323.5.3.46.1.2.7001
Metric Used	oc_egressgateway_sbiRouting_http_responses_total
Resolution	The alert is raised when the SBI transaction error rate is at or above 1% of the total transactions during a 24-hour period. The alert is cleared when the error rate falls below 1%.
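
The error rate in this alert is the share of responses whose `Status` label does not start with `2`, expressed as a percentage of all responses. A minimal Python sketch of that calculation follows. It assumes that status label values begin with the HTTP status code (for example `200 OK`); the sample label values and the function name are hypothetical.

```python
from typing import Mapping


def sbi_error_rate_percent(responses_by_status: Mapping[str, float]) -> float:
    """Return the percentage of responses whose status is not 2xx."""
    total = sum(responses_by_status.values())
    if total == 0:
        # With no traffic, PromQL would produce no usable result; report 0 here.
        return 0.0
    errors = sum(count for status, count in responses_by_status.items()
                 if not status.startswith("2"))
    return errors / total * 100


sample = {"200 OK": 950.0, "503 SERVICE_UNAVAILABLE": 50.0}
rate = sbi_error_rate_percent(sample)
print(f"error rate: {rate:.2f}%, alert: {rate >= 1}")
```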

5.2.2 Handshake Alerts

5.2.2.1 SEPPCn32cHandshakeFailureAlert

Table 5-6 SEPPCn32cHandshakeFailureAlert

Trigger Condition	Handshake procedure has failed on Consumer SEPP
Severity	Major
Alert details provided	Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Handshake procedure has failed on Consumer side because {{ $labels.reason }}' Expression: sum(increase(ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"}[5m]) > 0 or (ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"} unless ocsepp_n32c_handshake_failure_attempts_total{app="cn32c-svc"} offset 5m)) by (namespace,remote_sepp_name,nfinstanceid,peer_fqdn,app) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.2001
Metric Used	ocsepp_n32c_handshake_failure_attempts_total filtered by app=cn32c-svc
Resolution 1	The alert gets cleared when the N32C handshake is established after a successful TCP connection to the remote SEPP.
Failure reason: The release name used during Helm installation is other than `ocsepp-release`.
Error verification: Check the failure reason in the alert. If the failure reason is "404 - route not found" or "Route not found", follow these recovery steps:
1. Run the following command to get the pod details: `$ kubectl get pods -n <namespace>`
   Example:
   # kubectl get pods -n csepp
   NAME                                                READY   STATUS                  RESTARTS   AGE
   ocsepp-release-appinfo-6cdc48fc47-c9gfv             1/1     Running                 0          8d
   ocsepp-release-cn32c-svc-6547db777d-76gwd           1/1     Running                 0          8d
   ocsepp-release-cn32f-svc-7cd54bdf68-czbnb           1/1     Running                 0          8d
   ocsepp-release-config-mgr-svc-79c95d4b9d-8stk7      1/1     Running                 0          8d
   ocsepp-release-n32-egress-gateway-54c658b947-s5f9m  0/2     Pending                 0          23h
   ocsepp-release-n32-egress-gateway-54c658b947-scvvp  2/2     Running                 0          7d23h
   ocsepp-release-n32-ingress-gateway-777c68cb9-8jsdc  0/2     Pending                 0          23h
   ocsepp-release-n32-ingress-gateway-777c68cb9-98t7x  0/2     Init:ImagePullBackOff   0          23h
   ocsepp-release-pn32c-svc-58bff857f-jmfdd            1/1     Running                 0          8d
   ocsepp-release-pn32f-svc-784d5c7568-rh24g
2. Run the following command to open a shell in the config-mgr pod: `$ kubectl exec -it <config-mgr-pod name> -n <namespace> bash`
   Example: `$ kubectl exec -it ocsepp-release-config-mgr-svc-79c95d4b9d-8stk7 -n csepp bash`
3. Run the following command to get the existing route details present on the N32 Egress Gateway: `curl -X GET http://<config-manager-service-name>:9090/sepp/nf-common-component/v1/egw/n32/routesconfiguration`
   Example: `curl -X GET http://ocsepp-release-config-mgr-svc:9090/sepp/nf-common-component/v1/egw/n32/routesconfiguration`
4. If the output is null, add the configuration details in the `config-mgr-svc` deployment. For more information about the configuration details, see the Deployment Configuration for Config-mgr-svc section in Oracle Communications Cloud Native Core Security Edge Protection Proxy Installation Guide.
5. After the `config-mgr-svc` pod is restarted, run step 1 to step 3 again. After adding the configuration, rerun the curl command from step 3 to get the route details.
6. Delete and re-add the Remote SEPP, then reinitiate the handshake.
7. If the value is still null, contact My Oracle Support.
Resolution 2	The alert gets cleared when the N32C Handshake is established after successful TCP connection to remote SEPP. Steps: The failure reason is present in the alert. Possible Resolutions: Disable the Remote SEPP. Delete the Remote SEPP. Update and reinitiate Handshake.
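
The route check described under Resolution 1 can be scripted. The following Python sketch only builds the command lines and the route URL from this section; it does not run them. The helper names are hypothetical. Running curl through `kubectl exec ... -- curl` instead of an interactive shell is a variation on the documented steps, and it assumes that curl is available in the config-mgr container.

```python
from typing import List

ROUTES_PATH = "/sepp/nf-common-component/v1/egw/n32/routesconfiguration"


def get_pods_command(namespace: str) -> List[str]:
    """Build the command that lists the pods in the SEPP namespace."""
    return ["kubectl", "get", "pods", "-n", namespace]


def routes_configuration_url(config_mgr_service: str, port: int = 9090) -> str:
    """Build the URL that returns the routes configured on the N32 Egress Gateway."""
    return f"http://{config_mgr_service}:{port}{ROUTES_PATH}"


def route_check_command(config_mgr_pod: str, namespace: str,
                        config_mgr_service: str) -> List[str]:
    """Build a non-interactive command that runs the curl query inside the pod."""
    return ["kubectl", "exec", config_mgr_pod, "-n", namespace, "--",
            "curl", "-X", "GET", routes_configuration_url(config_mgr_service)]


# Values taken from the example in this section.
print(" ".join(get_pods_command("csepp")))
print(" ".join(route_check_command(
    "ocsepp-release-config-mgr-svc-79c95d4b9d-8stk7",
    "csepp",
    "ocsepp-release-config-mgr-svc")))
```

The returned lists can be passed to `subprocess.run` on a host where `kubectl` is configured for the cluster.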

5.2.2.2 SEPPPn32cHandshakeFailureAlert

Table 5-7 SEPPPn32cHandshakeFailureAlert

Trigger Condition	Handshake procedure has failed on Producer SEPP
Severity	Major
Alert details provided	Summary: 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Handshake procedure has failed on Producer side because {{ $labels.error_msg }}' Expression: sum(increase(ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"}[5m]) > 0 or (ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"} unless ocsepp_n32c_handshake_failure_attempts_total{app="pn32c-svc"} offset 5m)) by (namespace,remote_sepp_name,nfinstanceid,peer_fqdn,app) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.3001
Metric Used	ocsepp_n32c_handshake_failure_attempts_total filtered by app=pn32c-svc
Resolution	The alert gets cleared when the N32C handshake succeeds after a successful TCP connection from the Producer SEPP to the Consumer SEPP. Steps: The failure reason is present in the alert. Possible resolution: Update and reinitiate the handshake.

5.2.3 Upgrade Alerts

5.2.3.1 SEPPUpgradeStartedAlert

Table 5-8 SEPPUpgradeStartedAlert

Trigger Condition	REST API trigger at the start of the upgrade
Severity	NA
Alert details provided	applicationname alertname servicename releasename namespace oid severity vendor sourcerelease targetrelease
OID	1.3.6.1.4.1.323.5.3.46.1.2.8001
Metric Used	NA
Resolution	If a success alert is generated then start and failure alerts will be cleared.

5.2.3.2 SEPPUpgradeFailedAlert

Table 5-9 SEPPUpgradeFailedAlert

Trigger Condition	REST API trigger at the failure of the upgrade
Severity	NA
Alert details provided	applicationname alertname servicename releasename namespace oid severity vendor sourcerelease targetrelease
OID	1.3.6.1.4.1.323.5.3.46.1.2.8002
Metric Used	NA
Resolution	If a success alert is generated then start and failure alerts will be cleared.

5.2.3.3 SEPPUpgradeSuccessfulAlert

Table 5-10 SEPPUpgradeSuccessfulAlert

Trigger Condition	REST API trigger at the success of the upgrade
Severity	NA
Alert details provided	applicationname alertname servicename releasename namespace oid severity vendor sourcerelease targetrelease
OID	1.3.6.1.4.1.323.5.3.46.1.2.8003
Metric Used	NA
Resolution	If a success alert is generated then start and failure alerts will be cleared.

5.2.4 Rollback Alerts

5.2.4.1 SEPPRollbackStartedAlert

Table 5-11 SEPPRollbackStartedAlert

Trigger Condition	REST API trigger at the start of the rollback
Severity	NA
Alert details provided	applicationname alertname servicename releasename namespace oid severity vendor sourcerelease targetrelease
OID	1.3.6.1.4.1.323.5.3.46.1.2.8004
Metric Used	NA
Resolution	If a success alert is generated then start and failure alerts will be cleared.

5.2.4.2 SEPPRollbackFailedAlert

Table 5-12 SEPPRollbackFailedAlert

Trigger Condition	REST API trigger at the failure of the rollback
Severity	NA
Alert details provided	applicationname alertname servicename releasename namespace oid severity vendor sourcerelease targetrelease
OID	1.3.6.1.4.1.323.5.3.46.1.2.8005
Metric Used	NA
Resolution	If a success alert is generated then start and failure alerts will be cleared.

5.2.4.3 SEPPRollbackSuccessfulAlert

Table 5-13 SEPPRollbackSuccessfulAlert

Trigger Condition	REST API trigger at the success of the rollback
Severity	NA
Alert details provided	applicationname alertname servicename releasename namespace oid severity vendor sourcerelease targetrelease
OID	1.3.6.1.4.1.323.5.3.46.1.2.8006
Metric Used	NA
Resolution	Cleared after DEFAULT_DURATION_FOR_ALERT_EXPIRY minutes

5.2.5 Global Rate Limiting on Ingress Gateway of SEPP Alerts

5.2.5.1 IngressGlobalMessageDropAbovePointOnePercent

Table 5-14 IngressGlobalMessageDropAbovePointOnePercent

Trigger Condition	Ingress Global Message Drop Rate detected greater than or equal to 0.1 Percent of Total Transactions.
Severity	Warning
Alert details provided	Summary "Ingress Global Message Drop Rate detected above 0.1 Percent of Total Transactions" Expression sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 0.1 < 1
OID	1.3.6.1.4.1.323.5.3.46.1.2.7002
Metric Used	oc_ingressgateway_global_ratelimit_total
Resolution	The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 0.1% of the total messages received. It is cleared once the percentage of rejected messages is below 0.1%, or greater than or equal to 1%.

5.2.5.2 IngressGlobalMessageDropAbove1Percent

Table 5-15 IngressGlobalMessageDropAbove1Percent

Trigger Condition	Ingress Global Message Drop Rate detected greater than or equal to 1 Percent of Total Transactions.
Severity	Warning
Alert details provided	Summary "Ingress Global Message Drop Rate detected above 1 Percent of Total Transactions" Expression sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 1 < 10
OID	1.3.6.1.4.1.323.5.3.46.1.2.7003
Metric Used	oc_ingressgateway_global_ratelimit_total
Resolution	The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 1% of the total messages received. It is cleared once the percentage of rejected messages is below 1%, or greater than or equal to 10%.

5.2.5.3 IngressGlobalMessageDropAbove10Percent

Table 5-16 IngressGlobalMessageDropAbove10Percent

Trigger Condition	Ingress Global Message Drop Rate detected greater than or equal to 10 Percent of Total Transactions
Severity	Minor
Alert details provided	Summary "Ingress Global Message Drop Rate detected above 10 Percent of Total Transactions" Expression sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 10 < 25
OID	1.3.6.1.4.1.323.5.3.46.1.2.7004
Metric Used	oc_ingressgateway_global_ratelimit_total
Resolution	The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 10% of the total messages received. It is cleared once the percentage of rejected messages is below 10%, or greater than or equal to 25%.

5.2.5.4 IngressGlobalMessageDropAbove25Percent

Table 5-17 IngressGlobalMessageDropAbove25Percent

Trigger Condition	Ingress Global Message Drop Rate detected greater than or equal to 25 Percent of Total Transactions
Severity	Major
Alert details provided	Summary "Ingress Global Message Drop Rate detected above 25 Percent of Total Transactions" Expression sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 25 < 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.7005
Metric Used	oc_ingressgateway_global_ratelimit_total
Resolution	The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 25% of the total messages received. It is cleared once the percentage of rejected messages is below 25%, or greater than or equal to 50%.

5.2.5.5 IngressGlobalMessageDropAbove50Percent

Table 5-18 IngressGlobalMessageDropAbove50Percent

Trigger Condition	Ingress Global Message Drop Rate detected greater than or equal to 50 Percent of Total Transactions
Severity	Critical
Alert details provided	Summary "Ingress Global Message Drop Rate detected above 50 Percent of Total Transactions" Expression sum(rate(oc_ingressgateway_global_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_global_ratelimit_total[5m])) by (namespace) *100 >= 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.7006
Metric Used	oc_ingressgateway_global_ratelimit_total
Resolution	The alert is raised when the percentage of messages rejected by Global Rate Limiting is greater than or equal to 50% of the total messages received. It is cleared once the percentage of rejected messages is below 50%.
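
The five alerts in this section cover non-overlapping bands of the drop percentage, so at most one of them is active for a namespace at a time. The Python sketch below restates the bands from the expressions above and picks the matching alert. The function name and the input values are hypothetical.

```python
from typing import List, Optional, Tuple

# (lower bound inclusive, upper bound exclusive or None, alert name, severity)
DROP_BANDS: List[Tuple[float, Optional[float], str, str]] = [
    (0.1, 1.0, "IngressGlobalMessageDropAbovePointOnePercent", "Warning"),
    (1.0, 10.0, "IngressGlobalMessageDropAbove1Percent", "Warning"),
    (10.0, 25.0, "IngressGlobalMessageDropAbove10Percent", "Minor"),
    (25.0, 50.0, "IngressGlobalMessageDropAbove25Percent", "Major"),
    (50.0, None, "IngressGlobalMessageDropAbove50Percent", "Critical"),
]


def active_drop_alert(dropped_rate: float, total_rate: float) -> Optional[str]:
    """Return the name of the alert whose band contains the drop percentage."""
    if total_rate <= 0:
        return None
    percent = dropped_rate / total_rate * 100
    for lower, upper, name, _severity in DROP_BANDS:
        if percent >= lower and (upper is None or percent < upper):
            return name
    return None


# 1 dropped message out of every 4 received is 25%.
print(active_drop_alert(1, 4))
```

Because each band has an upper bound, a rising drop rate clears the lower alert at the same moment it raises the next one.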

5.2.6 Topology Hiding Alerts

5.2.6.1 SEPPN32fTopologyOperationFailureAlert

Table 5-19 SEPPN32fTopologyOperationFailureAlert

Field	Details
Trigger Condition	Topology Hiding or Recovery Failure exceeded configured threshold (1%)
Severity	Major
Alert details provided	Summary: "Topology hiding/recovery operation failures reached more than configured threshold" Expression: delta(ocsepp_topology_header_failure_total[2m]) > 0 or (ocsepp_topology_header_failure_total unless ocsepp_topology_header_failure_total offset 2m)
OID	1.3.6.1.4.1.323.5.3.46.1.2.4004
Metric Used	ocsepp_topology_header_failure_total, ocsepp_topology_header_success_total
Resolution	The alert is raised when the total Topology Hiding or Recovery failures exceed 1%. The alert is cleared when the error rate is below 1%. Possible resolutions: Check the header for which the alert is raised; the header name is present in the alert label. Verify the error_msg using the "ocsepp_topology_header_failure_total" metric and KPI. Fix or add the configuration for the header. Note: The alert is cleared only if the corresponding success metric is pegged.

5.2.6.2 SEPPN32fTopologyBodyOperationFailureAlert

Table 5-20 SEPPN32fTopologyBodyOperationFailureAlert

Field	Details
Trigger Condition	Topology Operation failed and exceeds defined threshold
Severity	Major
Alert details provided	Summary "Topology Hiding/Recovery Operation failures reached more than configured threshold" Expression: delta(ocsepp_topology_body_failure_total[2m])>0 or (ocsepp_topology_body_failure_total unless ocsepp_topology_body_failure_total offset 2m)
OID	1.3.6.1.4.1.323.5.3.46.1.2.4006
Metric Used	ocsepp_topology_body_failure_total, ocsepp_topology_body_success_total
Resolution	The alert is raised when the total Topology Hiding or Recovery failures for the message body exceed 1%. The alert is cleared when the error rate is below 1%. Possible resolutions: Check the apiUrl and method for which the alert is raised; the apiUrl is present in the alert label. Verify the error_msg using the "ocsepp_topology_body_failure_total" metric and KPI. Fix or add the configuration for the body identifiers. Note: The alert is cleared only if the corresponding success metric is pegged.

5.2.7 5G SBI Message Mediation Support Alerts

5.2.7.1 SEPPCN32fMediationFailure

Table 5-21 SEPPCN32fMediationFailure

Trigger Condition	Mediation processing Failure
Severity	Info
Alert details provided	Summary "Mediation processing Failure" Expression: increase(ocsepp_cn32f_mediation_response_failure{status_code!="504 GATEWAY_TIMEOUT"}[10m]) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4007
Metric Used	ocsepp_cn32f_mediation_response_failure
Resolution	The alert is raised when the Mediation microservice is unable to apply rules to the incoming request and response from SEPP. Possible resolutions: Check whether the mediation rules exist. Check whether the Agenda Group in the mediation rule matches the request and response sent from SEPP.

5.2.7.2 SEPPCN32fMediationUnreachable

Table 5-22 SEPPCN32fMediationUnreachable

Trigger Condition	Mediation service is not accessible
Severity	Critical
Alert details provided	Summary "Mediation service is not accessible" Expression: increase(ocsepp_cn32f_mediation_response_failure {status_code="504 GATEWAY_TIMEOUT"}[10m]) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4008
Metric Used	ocsepp_cn32f_mediation_response_failure
Resolution	The alert is raised when the Mediation microservice is not accessible. Possible resolutions: Check whether the Mediation microservice pod is up. Check whether the Mediation service name and servicePort number are correct.

5.2.7.3 SEPPPN32fMediationFailure

Table 5-23 SEPPPN32fMediationFailure

Trigger Condition	Mediation processing Failure
Severity	Info
Alert details provided	Summary "Mediation processing Failure" Expression: increase(ocsepp_pn32f_mediation_response_failure {status_code!="504 GATEWAY_TIMEOUT"}[10m]) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4009
Metric Used	ocsepp_pn32f_mediation_response_failure
Resolution	The alert is raised when the Mediation microservice is unable to apply rules to the incoming request and response from SEPP. Possible resolutions: Check whether the mediation rules exist. Check whether the Agenda Group in the mediation rule matches the request and response sent from SEPP.

5.2.7.4 SEPPPN32fMediationUnreachable

Table 5-24 SEPPPN32fMediationUnreachable

Trigger Condition	Mediation service is not accessible
Severity	Critical
Alert details provided	Summary "Mediation service is not accessible" Expression: increase(ocsepp_pn32f_mediation_response_failure {status_code="504 GATEWAY_TIMEOUT"}[10m]) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4010
Metric Used	ocsepp_pn32f_mediation_response_failure
Resolution	The alert is raised when the Mediation microservice is not accessible. Possible resolutions: Check whether the Mediation microservice pod is up. Check whether the Mediation service name and servicePort number are correct.
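
The four mediation alerts split the same failure metrics by the `status_code` label: `504 GATEWAY_TIMEOUT` raises the Unreachable alert, and any other status raises the Failure alert. The Python sketch below shows that split. The function name and the non-504 sample value are hypothetical.

```python
TIMEOUT_STATUS = "504 GATEWAY_TIMEOUT"

ALERT_PREFIX = {
    "cn32f": "SEPPCN32fMediation",
    "pn32f": "SEPPPN32fMediation",
}


def mediation_alert_for(service: str, status_code: str) -> str:
    """Return the alert raised for a mediation failure on the given service."""
    suffix = "Unreachable" if status_code == TIMEOUT_STATUS else "Failure"
    return ALERT_PREFIX[service] + suffix


print(mediation_alert_for("cn32f", "504 GATEWAY_TIMEOUT"))
print(mediation_alert_for("pn32f", "400 BAD_REQUEST"))
```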

5.2.8 Overload Control Alerts

5.2.8.1 SEPPServiceOverload65Percent

Table 5-25 SEPPServiceOverload65Percent

Trigger Condition	CPU memory of pn32f-svc more than 65%
Severity	Warning
Alert details provided	Summary Backend service is in overload with load level > 65% Expression service_resource_overload_level == 1
OID	1.3.6.1.4.1.323.5.3.46.1.2.7007
Metric Used	service_resource_overload_level
Resolution	The alert will be cleared when CPU Memory for backend-svc goes below 60%.

5.2.8.2 SEPPServiceOverload70Percent

Table 5-26 SEPPServiceOverload70Percent

Trigger Condition	CPU memory of pn32f-svc more than 70%
Severity	Minor
Alert details provided	Summary Backend service is in overload with load level > 70% Expression service_resource_overload_level == 2
OID	1.3.6.1.4.1.323.5.3.46.1.2.7008
Metric Used	service_resource_overload_level
Resolution	The alert will be cleared when CPU Memory for backend-svc goes below 70%

5.2.8.3 SEPPServiceOverload80Percent

Table 5-27 SEPPServiceOverload80Percent

Trigger Condition	CPU memory of pn32f-svc more than 80%
Severity	Major
Alert details provided	Summary Backend service is in overload with load level > 80% Expression service_resource_overload_level == 3
OID	1.3.6.1.4.1.323.5.3.46.1.2.7009
Metric Used	service_resource_overload_level
Resolution	The alert will be cleared when CPU Memory for backend-svc goes below 80%

5.2.8.4 SEPPServiceOverload90Percent

Table 5-28 SEPPServiceOverload90Percent

Trigger Condition	CPU memory of pn32f-svc more than 90%
Severity	Critical
Alert details provided	Summary Backend service is in overload with load level > 90% Expression service_resource_overload_level == 4
OID	1.3.6.1.4.1.323.5.3.46.1.2.7010
Metric Used	service_resource_overload_level
Resolution	The alert will be cleared when CPU Memory for backend-svc goes below 90%
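
Each overload alert matches one discrete value of the `service_resource_overload_level` gauge instead of comparing a percentage directly. The Python sketch below restates that mapping from the four tables in this section. The function name is hypothetical, and the sketch assumes that any value outside 1 to 4 (for example 0) raises no alert.

```python
from typing import Dict, Optional, Tuple

# Gauge value -> (alert name, severity), as listed in Tables 5-25 to 5-28.
OVERLOAD_ALERTS: Dict[int, Tuple[str, str]] = {
    1: ("SEPPServiceOverload65Percent", "Warning"),
    2: ("SEPPServiceOverload70Percent", "Minor"),
    3: ("SEPPServiceOverload80Percent", "Major"),
    4: ("SEPPServiceOverload90Percent", "Critical"),
}


def overload_alert(level: int) -> Optional[Tuple[str, str]]:
    """Return the alert and severity for an overload level, or None if no alert applies."""
    return OVERLOAD_ALERTS.get(level)


print(overload_alert(3))
```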

5.2.9 Hosted SEPP Alerts

5.2.9.1 SEPPPn32fHSRoutingFailureAlert

Table 5-29 SEPPPn32fHSRoutingFailureAlert

Trigger Condition	The routing failure rate at the Pn32f service is greater than 20 percent.
Severity	Major
Alert details provided	Allowed P-RSS Validation failure at Roaming Hub Expression ((sum by(namespace, app, nfInstanceId, pod) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod) (ocsepp_pn32f_requests_total))) > 0.2
OID	1.3.6.1.4.1.323.5.3.46.1.2.4013
Metric Used	ocsepp_allowed_p_rss_routing_failure_total , ocsepp_pn32f_requests_total
Resolution	The alert gets automatically cleared when the failure rate at pn32f microservice goes below 20 percent.

5.2.9.2 SEPPCn32fHSRoutingFailureAlert

Table 5-30 SEPPCn32fHSRoutingFailureAlert

Trigger Condition	The routing failure rate at the Cn32f service is greater than 50 percent.
Severity	Minor
Alert details provided	Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP. Expression ((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.5
OID	1.3.6.1.4.1.323.5.3.46.1.2.4014
Metric Used	ocsepp_allowed_p_rss_routing_failure_total , ocsepp_cn32f_requests_total
Resolution	The alert gets automatically cleared when the failure rate at cn32f microservice goes below 50 percent.

5.2.9.3 SEPPCn32fHSRoutingFailureAlert

Table 5-31 SEPPCn32fHSRoutingFailureAlert

Trigger Condition	The routing failure rate at the Cn32f service is greater than 60 percent.
Severity	Major
Alert details provided	Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP. Expression ((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.6
OID	1.3.6.1.4.1.323.5.3.46.1.2.4015
Metric Used	ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total
Resolution	The alert gets automatically cleared when the failure rate at cn32f microservice goes below 60 percent.

5.2.9.4 SEPPCn32fHSRoutingFailureAlert

Table 5-32 SEPPCn32fHSRoutingFailureAlert

Trigger Condition	The routing failure rate at the Cn32f service is greater than 65 percent.
Severity	Critical
Alert details provided	Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP. Expression ((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.65
OID	1.3.6.1.4.1.323.5.3.46.1.2.4016
Metric Used	ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total
Resolution	The alert gets automatically cleared when the failure rate at cn32f microservice goes below 65 percent.

5.2.9.5 SEPPCn32fHSRoutingFailureAlert

Table 5-33 SEPPCn32fHSRoutingFailureAlert

Trigger Condition	The routing failure rate at the Cn32f service is greater than 25 percent.
Severity	Warning
Alert details provided	Allowed P-RSS Validation failure at Roaming Hub for Consumer SEPP. Expression ((sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_allowed_p_rss_routing_failure_total) ) / (sum by(namespace, app, nfInstanceId, pod, sourceRss) (ocsepp_cn32f_requests_total))) > 0.25
OID	1.3.6.1.4.1.323.5.3.46.1.2.4017
Metric Used	ocsepp_allowed_p_rss_routing_failure_total, ocsepp_cn32f_requests_total
Resolution	The alert gets automatically cleared when the failure rate at cn32f microservice goes below 25 percent.
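
Unlike the banded ingress drop alerts, the Cn32f expressions in this section have only a lower bound, so a high failure ratio satisfies several of them at once. The Python sketch below lists which severities are active for a given ratio, using the thresholds from the expressions above. The function name and sample counts are hypothetical.

```python
from typing import List, Tuple

# (threshold on the failure ratio, severity), from the Cn32f expressions.
CN32F_THRESHOLDS: List[Tuple[float, str]] = [
    (0.25, "Warning"),
    (0.5, "Minor"),
    (0.6, "Major"),
    (0.65, "Critical"),
]


def firing_severities(failures: float, requests: float) -> List[str]:
    """Return every severity whose threshold the failure ratio exceeds."""
    if requests <= 0:
        return []
    ratio = failures / requests
    return [severity for threshold, severity in CN32F_THRESHOLDS if ratio > threshold]


# 55 routing failures out of 100 requests exceed the 0.25 and 0.5 thresholds.
print(firing_severities(55, 100))
```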

5.2.10 SEPP Message Feed Alerts

5.2.10.1 DDUnreachableFromN32IGW

Table 5-34 DDUnreachableFromN32IGW

Trigger Condition	This alarm is raised when Data Director is not reachable from N32 Ingress Gateway.
Severity	Major
Alert details provided	Summary (oc_ingressgateway_dd_unreachable{app="n32-ingress-gateway"} == 1)
OID	1.3.6.1.4.1.323.5.3.46.1.2.4018
Metric Used	oc_ingressgateway_dd_unreachable
Resolution	Alert gets cleared automatically when the connection with Data Director is established.

5.2.10.2 DDUnreachableFromPLMNIGW

Table 5-35 DDUnreachableFromPLMNIGW

Trigger Condition	This alarm is raised when Data Director is not reachable from PLMN Ingress Gateway.
Severity	Major
Alert details provided	Summary (oc_ingressgateway_dd_unreachable{app="plmn-ingress-gateway"} == 1)
OID	1.3.6.1.4.1.323.5.3.46.1.2.4019
Metric Used	oc_ingressgateway_dd_unreachable
Resolution	Alert gets cleared automatically when the connection with Data Director is established.

5.2.10.3 DDUnreachableFromN32EGW

Table 5-36 DDUnreachableFromN32EGW

Trigger Condition	This alarm is raised when Data Director is not reachable from N32 Egress Gateway.
Severity	Major
Alert details provided	Summary (oc_egressgateway_dd_unreachable{app="n32-egress-gateway"} == 1)
OID	1.3.6.1.4.1.323.5.3.46.1.2.4020
Metric Used	oc_egressgateway_dd_unreachable
Resolution	Alert gets cleared automatically when the connection with Data Director is established.

5.2.10.4 DDUnreachableFromPLMNEGW

Table 5-37 DDUnreachableFromPLMNEGW

Trigger Condition	This alarm is raised when Data Director is not reachable from PLMN Egress Gateway.
Severity	Major
Alert details provided	Summary (oc_egressgateway_dd_unreachable{app="plmn-egress-gateway"} == 1)
OID	1.3.6.1.4.1.323.5.3.46.1.2.4021
Metric Used	oc_egressgateway_dd_unreachable
Resolution	Alert gets cleared automatically when the connection with Data Director is established.

5.2.11 Steering of Roaming (SOR) Alerts

5.2.11.1 SEPPPn32fSORFailureAlertPercent30to40

Table 5-38 SEPPPn32fSORFailureAlertPercent30to40

Field	Details
Trigger Condition	30% to 40% of SOR traffic results in failure.
Severity	Minor
Alert details provided	Summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' Expression: sum(rate(ocsepp_pn32f_sor_failure_total[2m])) by (namespace,nf_instance_id,app) / sum(rate(ocsepp_pn32f_sor_requests_total[2m])) by (namespace,nf_instance_id,app) >= 0.3 and sum(rate(ocsepp_pn32f_sor_failure_total[2m])) by (namespace,nf_instance_id,app) / sum(rate(ocsepp_pn32f_sor_requests_total[2m])) by (namespace,nf_instance_id,app) < 0.4
OID	1.3.6.1.4.1.323.5.3.46.1.2.4022
Metric Used	ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total
Resolution	The alert is raised when the percentage of failed SOR responses is in the range 30%-40% in the sample collected over the last 2 minutes. Possible resolutions: Check that the following headers are present in the response coming from the SOR server; if either is missing, it causes an SOR failure: Server header, Location header. Check that the redirection code (3xx) received from the SOR server is the same as the one configured through CNC Console. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total. Check whether the SOR server is sending a 5xx response code that is not configured through CNC Console, or whether retry to Producer NF is disabled. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total. Check whether any client error (4xx) occurs while connecting to the SOR server. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.

5.2.11.2 SEPPPn32fSORFailureAlertPercent40to50

Table 5-39 SEPPPn32fSORFailureAlertPercent40to50

Field	Details
Trigger Condition	40% to 50% of SOR traffic results in failure.
Severity	Major
Alert details provided	Summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' Expression: sum(rate(ocsepp_pn32f_sor_failure_total[2m])) by (namespace,nf_instance_id,app) / sum(rate(ocsepp_pn32f_sor_requests_total[2m])) by (namespace,nf_instance_id,app) >= 0.4 and sum(rate(ocsepp_pn32f_sor_failure_total[2m])) by (namespace,nf_instance_id,app) / sum(rate(ocsepp_pn32f_sor_requests_total[2m])) by (namespace,nf_instance_id,app) < 0.5
OID	1.3.6.1.4.1.323.5.3.46.1.2.4023
Metric Used	ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total
Resolution	The alert is raised when the percentage of failed SOR responses is in the range 40%-50% in the sample collected over the last 2 minutes. Possible resolutions: Check that the following headers are present in the response coming from the SOR server; if either is missing, it causes an SOR failure: Server header, Location header. Check that the redirection code (3xx) received from the SOR server is the same as the one configured through CNC Console. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total. Check whether the SOR server is sending a 5xx response code that is not configured through CNC Console, or whether retry to Producer NF is disabled. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total. Check whether any client error (4xx) occurs while connecting to the SOR server. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.

5.2.11.3 SEPPPn32fSORFailureAlertPercentAbove50

Table 5-40 SEPPPn32fSORFailureAlertPercentAbove50

Field	Details
Trigger Condition	50% of SOR traffic results in failure
Severity	Critical
Alert details provided	Summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}' Expression: sum(rate(ocsepp_pn32f_sor_failure_total[2m]))by(namespace,nf_instance_id,app)/sum(rate(ocsepp_pn32f_sor_requests_total[2m]))by(namespace,nf_instance_id,app)>=0.5
OID	1.3.6.1.4.1.323.5.3.46.1.2.4024
Metric Used	ocsepp_pn32f_sor_failure_total and ocsepp_pn32f_sor_requests_total
Resolution	This alert is raised when the percentage of failed SOR responses is 50% or above in the sample collected in the last 2 minutes. Possible Resolutions: Check the following headers in the response coming from the SOR server; if either is missing, it causes an SOR failure: Server header, Location header. Check that the redirection code (3xx) received from SOR is the same as the one configured through CNC Console. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total. Check if the SOR server is sending a 5xx response code and the code is not configured through CNC Console, or Retry to Producer NF is disabled. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total. Check if any client error (4xx) occurs while connecting to SOR. This code can be viewed in the metric ocsepp_pn32f_sor_failure_total.
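
The two SOR failure alerts split the failure ratio into adjacent bands (0.4 up to 0.5, and 0.5 or above). The following is a minimal illustrative sketch of that banding, not SEPP code. The function name `sor_failure_band` and the returned labels are hypothetical, and the two arguments stand in for the per-second rates that Prometheus would compute.

```python
from typing import Optional


def sor_failure_band(failure_rate: float, request_rate: float) -> Optional[str]:
    """Return the band a SOR failure ratio falls into, or None if no alert fires.

    failure_rate and request_rate stand in for
    rate(ocsepp_pn32f_sor_failure_total[2m]) and
    rate(ocsepp_pn32f_sor_requests_total[2m]).
    """
    if request_rate <= 0:
        # No SOR traffic in the window: the ratio is undefined, so nothing fires.
        return None
    ratio = failure_rate / request_rate
    if ratio >= 0.5:
        return "above50"  # hypothetical label for the >= 0.5 expression
    if ratio >= 0.4:
        return "40to50"   # hypothetical label for the >= 0.4 and < 0.5 expression
    return None
```

Because the upper bound of the first band is exclusive and the lower bound of the second is inclusive, a ratio of exactly 0.5 belongs only to the second band.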

5.2.11.4 SEPPPn32fSORTimeoutFailureAlert

Table 5-41 SEPPPn32fSORTimeoutFailureAlert

Field	Details
Trigger Condition	Increase of more than five timeout errors in last two minutes for SOR.
Severity	Critical
Alert details provided	Summary: 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}' Expression: idelta(ocsepp_pn32f_sor_timeout_failure_total[2m]) > 5 or (ocsepp_pn32f_sor_timeout_failure_total unless ocsepp_pn32f_sor_timeout_failure_total offset 2m)
OID	1.3.6.1.4.1.323.5.3.46.1.2.4025
Metric Used	ocsepp_pn32f_sor_timeout_failure_total
Resolution	This alert is raised when the responses received from the SOR server suggest that the server is down or unreachable, with an increase of more than five timeout errors in the sample collected in the last two minutes. Possible Resolutions: Check and fix if the SOR server is unreachable. Check and fix if the configuration made through CNC Console has wrong values for the server. Check if the FQDN and port configured are correct. The scheme selected must be supported by the SOR server.
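
The timeout expression has two branches: `idelta(...) > 5`, which compares the last two samples in the window, and `X unless X offset 2m`, which matches a series that exists now but did not exist two minutes earlier. The sketch below approximates both branches on plain Python values. It is a simplification for illustration only; the function name and arguments are hypothetical, and Prometheus staleness handling is ignored.

```python
from typing import Optional, Sequence


def timeout_alert_fires(window_samples: Sequence[float],
                        value_two_minutes_ago: Optional[float]) -> bool:
    """Approximate the two branches of the SOR timeout expression.

    window_samples: counter samples inside the last 2 minutes, oldest first.
    value_two_minutes_ago: the sample at "now - 2m", or None if the series
    did not exist then.
    """
    # Branch 1: idelta(...) > 5 compares only the last two samples in the window.
    if len(window_samples) >= 2 and window_samples[-1] - window_samples[-2] > 5:
        return True
    # Branch 2: "X unless X offset 2m" keeps the series only when it exists now
    # but had no sample two minutes ago, that is, it has just appeared.
    return len(window_samples) >= 1 and value_two_minutes_ago is None
```

The second branch exists because a counter that is created with its first timeout has no earlier sample to compare against, so the first branch alone would miss it.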

5.2.12 Global Rate Limiting on Ingress Gateway of SEPP Alerts

5.2.12.1 Ingress RSS Rate Limit per RSS Message Drop Above Point one Percent Alert

Table 5-42 Ingress RSS Rate Limit per RSS Message Drop Above Point one Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate per RSS is detected above 0.1 percent of total transactions of that RSS, this metric will be pegged and corresponding alert will be raised.
Severity	Warning
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 0.1 Percent of Total Transactions of that RSS Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 0.1 < 10
OID	1.3.6.1.4.1.323.5.3.46.1.2.7011
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.
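
The trigger conditions in this section refer to requests being dropped once all the tokens in the bucket are exhausted. As background, the following is a minimal token bucket sketch. It is a generic illustration of the technique under assumed parameters (`capacity`, `refill_per_second`), not the Ingress Gateway implementation.

```python
class TokenBucket:
    """Minimal token bucket: refill at a fixed rate, drop when empty."""

    def __init__(self, capacity: int, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last_refill = now

    def allow(self, now: float) -> bool:
        """Return True if the request may pass, False if it must be dropped."""
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=1.0)
# Three requests arrive at the same instant; the third finds the bucket empty.
results = [bucket.allow(now=0.0) for _ in range(3)]
```

Each `False` result corresponds to a dropped request, which is the kind of event that a drop counter would record.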

5.2.12.2 Ingress RSS Rate Limit per RSS Message Drop Above 10 Percent Alert

Table 5-43 Ingress RSS Rate Limit per RSS Message Drop Above 10 Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate per RSS is detected above 10 percent of total transactions of that RSS, this metric will be pegged and corresponding alert will be raised.
Severity	Minor
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 10 Percent of Total Transactions of that RSS Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 10 < 25
OID	1.3.6.1.4.1.323.5.3.46.1.2.7012
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.

5.2.12.3 Ingress RSS Rate Limit per RSS Message Drop Above 25 Percent Alert

Table 5-44 Ingress RSS Rate Limit per RSS Message Drop Above 25 Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate per RSS is detected above 25 percent of total transactions of that RSS, this metric will be pegged and corresponding alert will be raised.
Severity	Major
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 25 Percent of Total Transactions of that RSS Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 25 < 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.7013
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.

5.2.12.4 Ingress RSS Rate Limit per RSS Message Drop Above 50 Percent Alert

Table 5-45 Ingress RSS Rate Limit per RSS Message Drop Above 50 Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate per RSS is detected above 50 percent of total transactions of that RSS, this metric will be pegged and corresponding alert will be raised.
Severity	Critical
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate per RSS detected above 50 Percent of Total Transactions of that RSS Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.7014
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.
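
The four per-RSS alerts (Tables 5-42 to 5-45) partition the drop percentage into non-overlapping bands. The sketch below restates those bands as a lookup, purely as a reading aid. The names `PER_RSS_BANDS` and `per_rss_severity` are hypothetical.

```python
from typing import Optional

# (lower bound inclusive, upper bound exclusive, severity), from Tables 5-42 to 5-45.
PER_RSS_BANDS = [
    (0.1, 10.0, "Warning"),
    (10.0, 25.0, "Minor"),
    (25.0, 50.0, "Major"),
    (50.0, float("inf"), "Critical"),
]


def per_rss_severity(drop_percent: float) -> Optional[str]:
    """Return the severity whose band contains drop_percent, or None."""
    for lower, upper, severity in PER_RSS_BANDS:
        if lower <= drop_percent < upper:
            return severity
    return None
```

Since each upper bound is exclusive, a given Remote SEPP Set matches at most one of the four expressions at a time.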

5.2.12.5 Ingress RSS Rate Limit Message Drop Above Point one Percent Alert

Table 5-46 Ingress RSS Rate Limit Message Drop Above Point one Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 0.1 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity	Warning
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 0.1 Percent of Total Transaction Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (namespace) *100 >= 0.1 < 1
OID	1.3.6.1.4.1.323.5.3.46.1.2.7015
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.

5.2.12.6 Ingress RSS Rate Limit Message Drop Above one Percent Alert

Table 5-47 Ingress RSS Rate Limit Message Drop Above one Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 1 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity	Warning
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 1 Percent of Total Transactions Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (namespace) *100 >= 1 < 10
OID	1.3.6.1.4.1.323.5.3.46.1.2.7016
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.

5.2.12.7 Ingress RSS Rate Limit Message Drop Above 10 Percent Alert

Table 5-48 Ingress RSS Rate Limit Message Drop Above 10 Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 10 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity	Minor
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 10 Percent of Total Transactions. Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 10 < 25
OID	1.3.6.1.4.1.323.5.3.46.1.2.7017
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.

5.2.12.8 Ingress RSS Rate Limit Message Drop Above 25 Percent Alert

Table 5-49 Ingress RSS Rate Limit Message Drop Above 25 Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 25 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity	Major
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 25 Percent of Total Transactions Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 25 < 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.7018
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.

5.2.12.9 Ingress RSS Rate Limit Message Drop Above 50 Percent Alert

Table 5-50 Ingress RSS Rate Limit Message Drop Above 50 Percent Alert

Trigger Condition	If a request has to be dropped when all the tokens in the bucket are exhausted and drop rate is detected above 50 percent of total transactions, this metric will be pegged and corresponding alert will be raised.
Severity	Critical
Alert Details Provided	Summary: Ingress RSS Based Rate Limiting Message Drop Rate detected above 50 Percent of Total Transactions Expression: sum(rate(oc_ingressgateway_rss_ratelimit_total{Status="dropped"}[5m])) by (Remote_SEPP_Set,namespace)/sum(rate(oc_ingressgateway_rss_ratelimit_total[5m])) by (Remote_SEPP_Set,namespace) *100 >= 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.7019
Metric Name	oc_ingressgateway_rss_ratelimit_total
Resolution	The alert gets cleared when the drop rate goes below the threshold.
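
The expressions in this section divide the rate of samples labeled `Status="dropped"` by the rate of all samples of `oc_ingressgateway_rss_ratelimit_total`, grouped by label. The sketch below reproduces that arithmetic for a grouping by namespace. It is illustrative only: it uses counter increases over one window (the window length cancels in the ratio), and the `"accepted"` status value and the namespace names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One entry per time series: (namespace, Status label, counter increase over the window).
Sample = Tuple[str, str, float]


def drop_percent_by_namespace(samples: Iterable[Sample]) -> Dict[str, float]:
    """Return the percentage of dropped requests for each namespace."""
    dropped: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    for namespace, status, increase in samples:
        total[namespace] += increase
        if status == "dropped":
            dropped[namespace] += increase
    # Namespaces with no traffic are skipped, since the ratio is undefined there.
    return {ns: dropped[ns] * 100 / total[ns] for ns in total if total[ns] > 0}


samples = [
    ("sepp-a", "dropped", 5.0),
    ("sepp-a", "accepted", 95.0),  # "accepted" is a hypothetical Status value
    ("sepp-b", "accepted", 40.0),
]
percentages = drop_percent_by_namespace(samples)
```

Adding `Remote_SEPP_Set` to the grouping key would give the per-RSS variant of the same calculation.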

5.2.13 Cat-0 SBI Message Schema Validation Alerts

5.2.13.1 SEPPN32fMessageValidationOnHeaderFailureMinorAlert

Table 5-51 SEPPN32fMessageValidationOnHeaderFailureMinorAlert

Field	Details
Trigger Condition	Message validation failed for request query parameters for 40% of requests (on which message validation was applied) in the last 2 minutes.
Severity	Minor
Alert Details Provided	Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }} Expression: (sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 40 < 60
OID	1.3.6.1.4.1.323.5.3.46.1.2.4026
Metric Used	ocsepp_message_validation_on_header_failure_total
Resolution	The alert gets cleared when the failure percentage is no longer between 40 and 60.

5.2.13.2 SEPPN32fMessageValidationOnHeaderFailureMajorAlert

Table 5-52 SEPPN32fMessageValidationOnHeaderFailureMajorAlert

Field	Details
Trigger Condition	Message validation failed for request query parameters for 60% of requests (on which message validation was applied) in the last 2 minutes.
Severity	Major
Alert Details Provided	Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }} Expression: (sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 60 < 80
OID	1.3.6.1.4.1.323.5.3.46.1.2.4027
Metric Name	ocsepp_message_validation_on_header_failure_total
Resolution	The alert gets cleared when the failure percentage is no longer between 60 and 80. Possible Resolutions: Check logs or metrics: Review the following metrics for message validation failures: `ocsepp_message_validation_on_body_failure` and `ocsepp_message_validation_on_header_failure`. To identify the failing resource URI and HTTP method, search the logs as follows: For request body validation failures, search for the text "Message validation failed for request body for request". For query parameter validation failures, search for the text "Message validation failed for request query parameter(s) for request". For more detailed information about logs, refer to Oracle Communications Cloud Native Core, Security Edge Protection Proxy Troubleshooting Guide. In the CNC Console GUI, navigate to SEPP and select Security Countermeasure from the left-hand menu. Click Cat 0 - SBI Message Schema Validation to open the Message Validation List. Search for the relevant resource URI to retrieve the corresponding schema. Compare the request body or query parameters against the schema to ensure that the request complies with it. If necessary, update the schema to reflect the correct structure.
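
The log search described in the resolution above can be scripted. The following sketch counts lines containing each of the two search strings quoted in the resolution. It assumes the logs are available as plain text lines; the sample lines and the function name `count_validation_failures` are hypothetical, and the real log layout is described in the Troubleshooting Guide.

```python
from collections import Counter
from typing import Iterable

BODY_MARKER = "Message validation failed for request body for request"
QUERY_MARKER = "Message validation failed for request query parameter(s) for request"


def count_validation_failures(log_lines: Iterable[str]) -> Counter:
    """Count body and query parameter validation failures in log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        if BODY_MARKER in line:
            counts["body"] += 1
        elif QUERY_MARKER in line:
            counts["query"] += 1
    return counts


# Hypothetical log lines, used only to exercise the function.
sample_lines = [
    "... Message validation failed for request body for request POST /example/a",
    "... Message validation failed for request query parameter(s) for request GET /example/b",
    "... unrelated line",
]
summary = count_validation_failures(sample_lines)
```

Printing the matching lines instead of counting them would show the resource URI and HTTP method to look up in the Message Validation List.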

5.2.13.3 SEPPN32fMessageValidationOnHeaderFailureCriticalAlert

Table 5-53 SEPPN32fMessageValidationOnHeaderFailureCriticalAlert

Field	Details
Trigger Condition	Message validation failed for request query parameters for 80% of requests (on which message validation was applied) in the last 2 minutes.
Severity	Critical
Alert Details Provided	Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }} Expression: (sum(rate(ocsepp_message_validation_on_header_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 80
OID	1.3.6.1.4.1.323.5.3.46.1.2.4028
Metric Name	ocsepp_message_validation_on_header_failure_total
Resolution	The alert gets cleared when the failure percentage falls below 80.

5.2.13.4 SEPPN32fMessageValidationOnBodyFailureMinorAlert

Table 5-54 SEPPN32fMessageValidationOnBodyFailureMinorAlert

Field	Details
Trigger Condition	Message validation failed for request body for 40% of requests (on which message validation was applied) in the last 2 minutes.
Severity	Minor
Alert Details Provided	Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }} Expression: (sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 40 < 60
OID	1.3.6.1.4.1.323.5.3.46.1.2.4029
Metric Name	ocsepp_message_validation_on_body_failure_total
Resolution	The alert gets cleared when the failure percentage is no longer between 40 and 60.

5.2.13.5 SEPPN32fMessageValidationOnBodyFailureMajorAlert

Table 5-55 SEPPN32fMessageValidationOnBodyFailureMajorAlert

Field	Details
Trigger Condition	Message validation failed for request body for 60% of requests (on which message validation was applied) in the last 2 minutes.
Severity	Major
Alert Details Provided	Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }} Expression: (sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 60 < 80
OID	1.3.6.1.4.1.323.5.3.46.1.2.4030
Metric Name	ocsepp_message_validation_on_body_failure_total
Resolution	The alert gets cleared when the failure percentage is no longer between 60 and 80.

5.2.13.6 SEPPN32fMessageValidationOnBodyFailureCriticalAlert

Table 5-56 SEPPN32fMessageValidationOnBodyFailureCriticalAlert

Field	Details
Trigger Condition	Message validation failed for request body for 80% of requests (on which message validation was applied) in the last 2 minutes.
Severity	Critical
Alert Details Provided	Summary: Namespace: {{ $labels.kubernetes_namespace }}, Podname: {{$labels.kubernetes_pod_name}}, App: {{ $labels.app }}, Nfinstanceid: {{ $labels.nfInstanceId }} Expression: (sum(rate(ocsepp_message_validation_on_body_failure_total[2m])) by (app, pod, namespace, nf_instance_id) /sum(rate(ocsepp_message_validation_applied_total[2m])) by (app, pod, namespace, nf_instance_id))*100 >= 80
OID	1.3.6.1.4.1.323.5.3.46.1.2.4031
Metric Name	ocsepp_message_validation_on_body_failure_total
Resolution	The alert gets cleared when the failure percentage falls below 80.

5.2.14 Cat-1 Service API Validation Alerts

5.2.14.1 SEPPN32fServiceApiValidationFailureAlert

Table 5-57 SEPPN32fServiceApiValidationFailureAlert

Trigger Condition	Service API not in allowed list
Severity	Major
Alert details provided	Summary N32f : Service API not in allowed list Expression: delta(ocsepp_security_service_api_failure_total[2m]) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4005
Metric Used	ocsepp_security_service_api_failure_total
Resolution 1	This alert is raised when there is a difference of at least 1 between the first and last data points in the sample collected in the last 2 minutes. The alert is cleared after 2 minutes. Possible Resolutions: Check the Resource URI + Method for which the alert is raised. Verify the error_msg using the "ocsepp_security_service_api_failure_total" metric and KPI. Fix or add the configuration for the Resource URI + Method in Service APIs and Allowed List.
Resolution 2	The alert gets cleared when the N32C Handshake is established after successful TCP connection to remote SEPP. Steps: The failure reason is present in the alert. Possible Resolutions: Disable the Remote SEPP. Delete the Remote SEPP. Update and reinitiate Handshake.
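
Resolution 1 describes the trigger as a difference between the first and last data points in the 2-minute sample. The sketch below shows that comparison on a list of samples. It is a simplified illustration: the PromQL `delta()` function also extrapolates to the window boundaries, which changes the magnitude but not the sign of the result. The function name is hypothetical.

```python
from typing import Sequence


def service_api_alert_fires(window_samples: Sequence[float]) -> bool:
    """True when the last sample in the 2-minute window exceeds the first."""
    if len(window_samples) < 2:
        # A single sample gives no difference to compare.
        return False
    return window_samples[-1] - window_samples[0] > 0
```

For example, samples `[3, 3, 4]` fire the alert, while `[3, 3, 3]` do not.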

5.2.15 Cat-2 Network ID Validation Alerts

5.2.15.1 SEPPN32fNetworkIDValidationHeaderFailureAlert

Table 5-58 SEPPN32fNetworkIDValidationHeaderFailureAlert

Field	Details
Trigger Condition	If Network ID Validation for Header fails, this metric is pegged and the corresponding alert is raised.
Severity	Major
Alert details provided	Summary: 'namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Network ID Validation has failed because {{ $labels.cause }}' Expression: sum(increase(ocsepp_network_id_validation_header_failure_total[2m]) >0 or (ocsepp_network_id_validation_header_failure_total unless ocsepp_network_id_validation_header_failure_total offset 2m )) by (namespace, remote_sepp_name, nf_instance_id, peer_fqdn, plmn_identifier, app, resource_uri, pod) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4011
Metric Used	ocsepp_network_id_validation_header_failure_total
Resolution	The alert gets cleared when the failure count stops increasing.

5.2.15.2 SEPPN32fNetworkIDValidationBodyIEFailureAlert

Table 5-59 SEPPN32fNetworkIDValidationBodyIEFailureAlert

Field	Details
Trigger Condition	If Network ID Validation for Body fails, this metric is pegged and the corresponding alert is raised.
Severity	Major
Alert details provided	Summary: 'namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Network ID Body Validation has failed because {{ $labels.cause }}' Expression: sum(increase(ocsepp_network_id_validation_body_failure_total[2m]) >0 or (ocsepp_network_id_validation_body_failure_total unless ocsepp_network_id_validation_body_failure_total offset 2m )) by (namespace, remote_sepp_name, nf_instance_id, peer_fqdn, plmn_identifier, app, resource_uri, pod) > 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4012
Metric Used	ocsepp_network_id_validation_body_failure_total
Resolution	The alert gets cleared when the failure count stops increasing.
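
Both Network ID validation expressions rely on `increase(...[2m]) > 0`, which measures how much a counter grew across the window even if the pod restarted and the counter reset. The sketch below shows the reset-tolerant summation in isolation. It is a simplified illustration that omits the extrapolation Prometheus applies, and the function name `counter_increase` is hypothetical.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Sum the growth of a counter across a window, tolerating resets.

    When a sample is lower than its predecessor, the counter is assumed to have
    restarted from zero, so the new value itself counts as growth.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        total += (current - previous) if current >= previous else current
    return total
```

With this definition, a window of `[5, 7, 2, 3]` yields an increase of 5 (2 before the reset, 2 at the reset, and 1 after it), so the alert condition `> 0` still holds across a restart.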

5.2.16 Cat-3 Previous Location Check Alerts

5.2.16.1 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent30to40

Table 5-60 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent30to40

Trigger Condition	When previous location check validation failure is detected between 30 and 40 percent of total transactions, this alert is raised.
Severity	Minor
Alert Details Provided	Summary Previous location check validation failure detected between 30 to 40 Percent of Total Transactions Expression sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.3 and sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.4
OID	1.3.6.1.4.1.323.5.3.46.1.2.4032
Metric Name	ocsepp_previous_location_validation_failure_total
Resolution	The alert gets cleared when the previous location check validation failure rate no longer lies between 30 and 40 percent of total transactions.

5.2.16.2 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent40to50

Table 5-61 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercent40to50

Trigger Condition	When previous location check validation failure is detected between 40 and 50 percent of total transactions, this alert is raised.
Severity	Major
Alert Details Provided	Summary Previous location check validation failure detected between 40 to 50 Percent of Total Transactions Expression sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.4 and sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.5
OID	1.3.6.1.4.1.323.5.3.46.1.2.4033
Metric Name	ocsepp_previous_location_validation_failure_total
Resolution	The alert gets cleared when the previous location check validation failure rate no longer lies between 40 and 50 percent of total transactions.

5.2.16.3 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercentAbove50

Table 5-62 SEPPPn32fPreviousLocationCheckValidationFailureAlertPercentAbove50

Trigger Condition	When previous location check validation failure is detected above 50 percent of total transactions, this alert is raised.
Severity	Critical
Alert Details Provided	Summary Previous location check validation failure detected above 50 Percent of Total Transactions Expression sum(rate(ocsepp_previous_location_validation_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.5
OID	1.3.6.1.4.1.323.5.3.46.1.2.4034
Metric Name	ocsepp_previous_location_validation_failure_total
Resolution	The alert gets cleared when the previous location check validation failure rate falls below 50 percent of total transactions.

5.2.16.4 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent30to40

Table 5-63 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent30to40

Trigger Condition	When previous location check exception failure is detected between 30 and 40 percent of total transactions, this alert is raised.
Severity	Minor
Alert Details Provided	Summary Previous location check exception failure detected between 30 to 40 Percent of Total Transactions Expression sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.3 and sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.4
OID	1.3.6.1.4.1.323.5.3.46.1.2.4035
Metric Name	ocsepp_previous_location_exception_failure_total
Resolution	The alert gets cleared when the previous location check exception failure rate no longer lies between 30 and 40 percent of total transactions.

5.2.16.5 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent40to50

Table 5-64 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercent40to50

Trigger Condition	When previous location check exception failure is detected between 40 and 50 percent of total transactions, this alert is raised.
Severity	Major
Alert Details Provided	Summary Previous location check exception failure detected between 40 to 50 Percent of Total Transactions Expression sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.4 and sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)<0.5
OID	1.3.6.1.4.1.323.5.3.46.1.2.4036
Metric Name	ocsepp_previous_location_exception_failure_total
Resolution	The alert gets cleared when the previous location check exception failure rate no longer lies between 40 and 50 percent of total transactions.

5.2.16.6 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercentAbove50

Table 5-65 SEPPPn32fPreviousLocationCheckExceptionFailureAlertPercentAbove50

Trigger Condition	When previous location check exception failure is detected above 50 percent of total transactions, this alert is raised.
Severity	Critical
Alert Details Provided	Summary Previous location check exception failure detected above 50 Percent of Total Transactions Expression sum(rate(ocsepp_previous_location_exception_failure_total[2m]))by(namespace)/sum(rate(ocsepp_previous_location_validation_requests_total[2m]))by(namespace)>=0.5
OID	1.3.6.1.4.1.323.5.3.46.1.2.4037
Metric Name	ocsepp_previous_location_exception_failure_total
Resolution	The alert gets cleared when the previous location check exception failure rate falls below 50 percent of total transactions.
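
The six alerts in this section follow one naming pattern: a failure kind (Validation or Exception) combined with a ratio band (30to40, 40to50, Above50). The sketch below maps a kind and a failure ratio to the alert name documented above. It is a reading aid only, and the function name `previous_location_alert` is hypothetical.

```python
from typing import Optional


def previous_location_alert(kind: str, failure_ratio: float) -> Optional[str]:
    """Return the alert name for a failure ratio, or None if no alert fires.

    kind is "Validation" or "Exception"; failure_ratio is the failure rate divided
    by the rate of ocsepp_previous_location_validation_requests_total over the
    same window.
    """
    if kind not in ("Validation", "Exception"):
        raise ValueError(f"unknown kind: {kind}")
    if failure_ratio >= 0.5:
        band = "Above50"
    elif failure_ratio >= 0.4:
        band = "40to50"
    elif failure_ratio >= 0.3:
        band = "30to40"
    else:
        return None
    return f"SEPPPn32fPreviousLocationCheck{kind}FailureAlertPercent{band}"
```

Note that both kinds share the same denominator, so the validation and exception ratios are directly comparable.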

5.2.17 Cat-3 Time Check for Roaming Subscribers

5.2.17.1 pn32fTimeUnauthLocChkValFailAlrtMinor

Table 5-66 pn32fTimeUnauthLocChkValFailAlrtMinor

Field	Details
Trigger Condition	Triggered in case of a minor failure for Cat-3 Time Unauthenticated Location Check.
Severity	Minor
Alert Details Provided	Summary namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} Expression sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) >= 1 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) <= 10
OID	1.3.6.1.4.1.323.5.3.46.1.2.4055
Metric Name	ocsepp_time_unauthenticated_location_validation_failure_total
Resolution	The alert gets cleared when the failure count is no longer between 1 and 10.

5.2.17.2 pn32fTimeUnauthLocChkValFailAlrtMajor

Table 5-67 pn32fTimeUnauthLocChkValFailAlrtMajor

Field	Details
Trigger Condition	Triggered in case of a major failure for Cat-3 Time Unauthenticated Location Check.
Severity	Major
Alert Details Provided	Summary namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} Expression sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) >= 11 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m) <= 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.4056
Metric Name	ocsepp_time_unauthenticated_location_validation_failure_total
Resolution	The alert gets cleared when the failure count is no longer between 11 and 50.

5.2.17.3 pn32fTimeUnauthLocChkValFailAlrtCritical

Table 5-68 pn32fTimeUnauthLocChkValFailAlrtCritical

Field	Details
Trigger Condition	Triggered in case of a critical failure for Cat-3 Time Unauthenticated Location Check.
Severity	Critical
Alert Details Provided	Summary namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} Expression `sum(increase(ocsepp_time_unauthenticated_location_validation_failure_total[2m]) or ocsepp_time_unauthenticated_location_validation_failure_total unless ocsepp_time_unauthenticated_location_validation_failure_total offset 2m ) by (namespace,nf_instance_id,app,pod) >=51`
OID	1.3.6.1.4.1.323.5.3.46.1.2.4057
Metric Name	ocsepp_time_unauthenticated_location_validation_failure_total
Resolution	The alert gets cleared when the failure count is below 51.

5.2.17.4 pn32fTimeUnauthLocChkExcepFailAlrtMinor

Table 5-69 pn32fTimeUnauthLocChkExcepFailAlrtMinor

Field	Details
Trigger Condition	Triggered in case of a minor exception for Cat-3 Time Unauthenticated Location Check.
Severity	Minor
Alert Details Provided	Summary namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} Expression sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) >= 1 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) <= 10
OID	1.3.6.1.4.1.323.5.3.46.1.2.4058
Metric Name	ocsepp_time_unauthenticated_location_exception_failure_total
Resolution	The alert gets cleared when the exception count is no longer between 1 and 10.

5.2.17.5 pn32fTimeUnauthLocChkExcepFailAlrtMajor

Table 5-70 pn32fTimeUnauthLocChkExcepFailAlrtMajor

Field	Details
Trigger Condition	Triggered in case of a major exception for Cat-3 Time Unauthenticated Location Check.
Severity	Major
Alert Details Provided	Summary namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} Expression sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) >= 11 and sum by (namespace, nf_instance_id, app, pod) (increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m) <= 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.4059
Metric Name	ocsepp_time_unauthenticated_location_exception_failure_total
Resolution	The alert gets cleared when the exception count is no longer between 11 and 50.

5.2.17.6 pn32fTimeUnauthLocChkExcepFailAlrtCritical

Table 5-71 pn32fTimeUnauthLocChkExcepFailAlrtCritical

Field	Details
Trigger Condition	Triggered in case of a critical exception for Cat-3 Time Unauthenticated Location Check.
Severity	Critical
Alert Details Provided	Summary namespace: {{ $labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }} Expression `sum(increase(ocsepp_time_unauthenticated_location_exception_failure_total[2m]) or ocsepp_time_unauthenticated_location_exception_failure_total unless ocsepp_time_unauthenticated_location_exception_failure_total offset 2m ) by (namespace,nf_instance_id,app,pod) >=51`
OID	1.3.6.1.4.1.323.5.3.46.1.2.4060
Metric Name	ocsepp_time_unauthenticated_location_exception_failure_total
Resolution	The alert gets cleared when the exception count is below 51.
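
The Minor, Major, and Critical expressions in this section use inclusive integer bounds (1 to 10, 11 to 50, and 51 or more). The sketch below restates those bounds. Because the PromQL `increase()` function extrapolates and can return non-integer values, a result that falls strictly between two bands, such as 10.5, would match none of the three expressions. The function name `time_check_severity` is hypothetical.

```python
from typing import Optional


def time_check_severity(failure_increase: float) -> Optional[str]:
    """Map a 2-minute failure increase to the severity whose expression matches."""
    if failure_increase >= 51:
        return "Critical"
    if 11 <= failure_increase <= 50:
        return "Major"
    if 1 <= failure_increase <= 10:
        return "Minor"
    # Values below 1, or in the gaps (10, 11) and (50, 51), match no expression.
    return None
```

The same bounds apply to both the validation failure and the exception failure counters.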

5.2.18 Rate Limiting for Egress Roaming Signaling per PLMN Alerts

5.2.18.1 Egress Request Rate Limit per PLMN Message Drop Above 10 Percent Alert

Table 5-72 Egress Request Rate Limit per PLMN Message Drop Above 10 Percent Alert

Trigger Condition	If a request is dropped because the tokens in the bucket are exhausted, and the drop rate per PLMN is detected above 10 percent of the total transactions of that PLMN, the oc_ingressgateway_plmn_egress_ratelimit_total metric is pegged and the corresponding alert is raised.
Severity	Minor
Alert Details Provided	Summary Egress Rate Limiting Request Drop Rate detected per PLMN above 10 Percent of Total Transactions Expression sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 10 < 25
OID	1.3.6.1.4.1.323.5.3.46.1.2.4039
Metric Name	oc_ingressgateway_plmn_egress_ratelimit_total
Resolution	The alert gets cleared when the count goes down.

5.2.18.2 Egress Request Rate Limit per PLMN Message Drop Above 25 Percent Alert

Table 5-73 Egress Request Rate Limit per PLMN Message Drop Above 25 Percent Alert

Trigger Condition	If a request is dropped because the tokens in the bucket are exhausted, and the drop rate per PLMN is detected above 25 percent of the total transactions of that PLMN, the oc_ingressgateway_plmn_egress_ratelimit_total metric is pegged and the corresponding alert is raised.
Severity	Major
Alert Details Provided	Summary Egress Rate Limiting Request Drop Rate detected per PLMN above 25 Percent of Total Transactions Expression sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 25 < 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.4040
Metric Name	oc_ingressgateway_plmn_egress_ratelimit_total
Resolution	The alert gets cleared when the count goes down.

5.2.18.3 Egress Request Rate Limit per PLMN Message Drop Above 50 Percent Alert

Table 5-74 Egress Request Rate Limit per PLMN Message Drop Above 50 Percent Alert

Trigger Condition	If a request is dropped because the tokens in the bucket are exhausted, and the drop rate per PLMN is detected above 50 percent of the total transactions of that PLMN, the oc_ingressgateway_plmn_egress_ratelimit_total metric is pegged and the corresponding alert is raised.
Severity	Critical
Alert Details Provided	Summary Egress Rate Limiting Request Drop Rate detected per PLMN above 50 Percent of Total Transactions Expression sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total{Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"}[5m])) by (EgressRateLimitList,PLMN_ID,namespace)/sum(rate(oc_ingressgateway_plmn_egress_ratelimit_total[5m])) by (EgressRateLimitList,PLMN_ID,namespace) *100 >= 50
OID	1.3.6.1.4.1.323.5.3.46.1.2.4041
Metric Name	oc_ingressgateway_plmn_egress_ratelimit_total
Resolution	The alert gets cleared when the count goes down.
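The three alerts in this section compare the share of rejected requests against 10, 25, and 50 percent. The following Python sketch illustrates that arithmetic under stated assumptions: the function names are hypothetical, the inputs stand in for the two rate() sums in the expressions, and the upper bound of 50 for the Major band is inferred from the Critical threshold rather than stated in the table.

```python
from typing import Optional


def drop_rate_percent(rejected_rate: float, total_rate: float) -> Optional[float]:
    """Percentage of requests rejected for one PLMN.

    rejected_rate stands in for the rate of samples with
    Status="ERL_MATCH_NO_TOKEN_LOW_PRI_REJECT"; total_rate stands in for
    the rate over all statuses. Returns None when there is no traffic.
    """
    if total_rate <= 0:
        return None
    return 100.0 * rejected_rate / total_rate


def drop_severity(percent: Optional[float]) -> Optional[str]:
    """Map a drop percentage to the alert band it would fall into."""
    if percent is None:
        return None
    if percent >= 50:
        return "critical"
    if percent >= 25:   # assumed band: 25 <= percent < 50
        return "major"
    if percent >= 10:   # band from Table 5-72: 10 <= percent < 25
        return "minor"
    return None


if __name__ == "__main__":
    print(drop_severity(drop_rate_percent(15.0, 50.0)))
```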

5.2.19 Separate Port Configurations for N32c and N32f on the Egress Routes Alerts

5.2.19.1 EgressInterfaceConnectionFailure

Table 5-75 EgressInterfaceConnectionFailure

Field	Details
Trigger Condition	If the destination host and port mentioned in the Remote profile are unreachable or not available, then the alert will be raised.
Severity	Major
Alert Details Provided	Summary: Egress connection failure on the interface Expression: sum(increase(oc_egressgateway_connection_failure_total{app="n32-egress-gateway"}[5m])) by (namespace,app,Host,Port) >0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4042
Metric Name	oc_egressgateway_connection_failure_total
Resolution	If the destination host and port are reachable, then the alert will be cleared.

5.2.20 Support for TLS 1.3

5.2.20.1 SEPPConnectionFailurePLMNIGWAlert

Table 5-76 SEPPConnectionFailurePLMNIGWAlert

Field	Details
Trigger Condition	Connection failure occurs for incoming traffic at PLMN Ingress Gateway
Severity	Major
Alert details provided	Summary: `namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Incoming connection failure on plmn-ingress-gateway due to {{ $labels.error_reason }}` Expression: `sum(increase(oc_ingressgateway_connection_failure_total{app="plmn-ingress-gateway"}[5m]) >0 or (oc_ingressgateway_connection_failure_total{app="plmn-ingress-gateway"} unless oc_ingressgateway_connection_failure_total{app="plmn-ingress-gateway"} offset 5m )) by (namespace,app) > 0`
OID	1.3.6.1.4.1.323.5.3.46.1.2.4043
Metric used	oc_ingressgateway_connection_failure_total
Resolution	This alert is cleared after the cause of the connection failure is resolved.

5.2.20.2 SEPPConnectionFailureN32IGWAlert

Table 5-77 SEPPConnectionFailureN32IGWAlert

Field	Details
Trigger Condition	Connection failure occurs for incoming traffic at N32 Ingress Gateway
Severity	Major
Alert details provided	Summary: `namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Incoming connection failure on n32-ingress-gateway due to {{ $labels.error_reason }}` Expression: `sum(increase(oc_ingressgateway_connection_failure_total{app="n32-ingress-gateway"}[5m]) >0 or (oc_ingressgateway_connection_failure_total{app="n32-ingress-gateway"} unless oc_ingressgateway_connection_failure_total{app="n32-ingress-gateway"} offset 5m )) by (namespace,app) > 0`
OID	1.3.6.1.4.1.323.5.3.46.1.2.4044
Metric used	oc_ingressgateway_connection_failure_total
Resolution	This alert is cleared after the cause of the connection failure is resolved.
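The two connection failure expressions above combine increase(...) over 5 minutes with "metric unless metric offset 5m". One common reading of this idiom is that the second branch catches counter series that did not exist 5 minutes earlier, for which increase() may have too few samples to report a value. The Python sketch below models that reading in a simplified way; the function names and the pod labels are hypothetical, and Prometheus extrapolation and counter resets are ignored.

```python
from typing import Dict, Optional, Tuple

# Maps a series label (hypothetical pod names) to a pair:
# (counter value now, counter value 5 minutes ago or None if the series
# did not exist at that time).
Samples = Dict[str, Tuple[float, Optional[float]]]


def failures_in_window(current: float, five_min_ago: Optional[float]) -> float:
    """Failures attributed to one series for the last 5 minutes."""
    if five_min_ago is None:
        # Models "metric unless metric offset 5m": a series with no sample
        # 5 minutes ago contributes its current value.
        return current
    # Models increase(metric[5m]) as a plain difference.
    return max(current - five_min_ago, 0.0)


def should_fire(samples: Samples) -> bool:
    """True when the summed failures across series are greater than zero."""
    return sum(failures_in_window(c, p) for c, p in samples.values()) > 0


if __name__ == "__main__":
    print(should_fire({"pod-a": (5.0, 3.0), "pod-b": (1.0, None)}))
```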

5.2.20.3 SEPPX509CertificateExpiryAlertMinor

Table 5-78 SEPPX509CertificateExpiryAlertMinor

Field	Details
Trigger Condition	The TLS certificate expires in 6 months or less.
Severity	Minor
Alert details provided	Summary: `Certificate expiry in less than 6 months` Expression: `security_cert_x509_expiration_seconds - time() <= 15724800`
OID	1.3.6.1.4.1.323.5.3.46.1.2.4045
Metric used	security_cert_x509_expiration_seconds
Resolution	This alert is cleared only after the certificates have been updated.

5.2.20.4 SEPPX509CertificateExpiryAlertMajor

Table 5-79 SEPPX509CertificateExpiryAlertMajor

Field	Details
Trigger Condition	The TLS certificate expires in 3 months or less.
Severity	Major
Alert details provided	Summary: `Certificate expiry in less than 3 months` Expression: `security_cert_x509_expiration_seconds - time() <= 7862400`
OID	1.3.6.1.4.1.323.5.3.46.1.2.4046
Metric used	security_cert_x509_expiration_seconds
Resolution	This alert is cleared only after the certificates have been updated.

5.2.20.5 SEPPX509CertificateExpiryAlertCritical

Table 5-80 SEPPX509CertificateExpiryAlertCritical

Field	Details
Trigger Condition	The TLS certificate expires in 1 month or less.
Severity	Critical
Alert details provided	Summary: `Certificate expiry in less than 1 month` Expression: `security_cert_x509_expiration_seconds - time() <= 2592000`
OID	1.3.6.1.4.1.323.5.3.46.1.2.4047
Metric used	security_cert_x509_expiration_seconds
Resolution	This alert is cleared only after the certificates have been updated.
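The three certificate expiry expressions compare the remaining lifetime in seconds against fixed constants. The Python sketch below converts those constants to days and picks the most severe matching level; the names are hypothetical. Note that in Prometheus each expression is evaluated independently, so a certificate within 30 days of expiry would satisfy all three expressions at once, whereas this sketch returns only the highest severity.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400

# Thresholds copied from the expressions above, most severe first.
THRESHOLDS = [
    ("critical", 2_592_000),   # 1 month
    ("major", 7_862_400),      # 3 months
    ("minor", 15_724_800),     # 6 months
]


def threshold_days(seconds: int) -> int:
    """Convert a threshold in seconds to whole days."""
    return seconds // SECONDS_PER_DAY


def expiry_severity(expiration_epoch: float, now_epoch: float) -> Optional[str]:
    """Most severe level whose threshold the remaining lifetime satisfies.

    expiration_epoch stands in for security_cert_x509_expiration_seconds
    and now_epoch for time() in the expressions.
    """
    remaining = expiration_epoch - now_epoch
    for name, limit in THRESHOLDS:
        if remaining <= limit:
            return name
    return None


if __name__ == "__main__":
    for name, limit in THRESHOLDS:
        print(name, threshold_days(limit), "days")
```

Under these constants, the thresholds correspond to 30, 91, and 182 days.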

5.2.21 Multiple SEPP Instances on Shared cnDBTier Cluster Alerts

5.2.21.1 Cn32fConnectionFailureWithDatabaseAlert

Table 5-81 Cn32fConnectionFailureWithDatabaseAlert

Field	Details
Trigger Condition	ocsepp_cn32f_database_connectivity_healthy == 0
Severity	Major
Alert Details Provided	Summary: Alert is raised when connectivity is broken between CN32f and cnDBTier. Metric value is pegged as 0 and then alert is raised. Expression: ocsepp_cn32f_database_connectivity_healthy == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4050
Metric Name	ocsepp_cn32f_database_connectivity_healthy
Resolution	Restore the connectivity between SEPP and cnDBTier.

5.2.21.2 Cn32cConnectionFailureWithDatabaseAlert

Table 5-82 Cn32cConnectionFailureWithDatabaseAlert

Field	Details
Trigger Condition	ocsepp_cn32c_database_connectivity_healthy == 0
Severity	Major
Alert Details Provided	Summary: Alert is raised when connectivity is broken between CN32c and cnDBTier for more than 30 seconds. Metric value is pegged as 0 and then alert is raised. Expression: ocsepp_cn32c_database_connectivity_healthy == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4051
Metric Name	ocsepp_cn32c_database_connectivity_healthy
Resolution	Restore the connectivity between SEPP and cnDBTier.

5.2.21.3 Pn32fConnectionFailureWithDatabaseAlert

Table 5-83 Pn32fConnectionFailureWithDatabaseAlert

Field	Details
Trigger Condition	ocsepp_pn32f_database_connectivity_healthy == 0
Severity	Major
Alert Details Provided	Summary: Alert is raised when connectivity is broken between PN32F and cnDBTier for more than 30 seconds. Metric value is pegged as 0 and then alert is raised. Expression: ocsepp_pn32f_database_connectivity_healthy == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4052
Metric Name	ocsepp_pn32f_database_connectivity_healthy
Resolution	Restore the connectivity between SEPP and cnDBTier.

5.2.21.4 Pn32cConnectionFailureWithDatabaseAlert

Table 5-84 Pn32cConnectionFailureWithDatabaseAlert

Field	Details
Trigger Condition	ocsepp_pn32c_database_connectivity_healthy == 0
Severity	Major
Alert Details Provided	Summary: Alert is raised when connectivity is broken between PN32C and cnDBTier for more than 30 seconds. Metric value is pegged as 0 and then alert is raised. Expression: ocsepp_pn32c_database_connectivity_healthy == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4053
Metric Name	ocsepp_pn32c_database_connectivity_healthy
Resolution	Restore the connectivity between SEPP and cnDBTier.

5.2.21.5 ConfigManagerConnectionFailureWithDatabaseAlert

Table 5-85 ConfigManagerConnectionFailureWithDatabaseAlert

Trigger Condition	ocsepp_configmgr_database_connectivity_healthy == 0
Severity	Major
Alert Details Provided	Summary: Alert is raised when connectivity is broken between Config Manager and cnDBTier for more than 30 seconds. Metric value is pegged as 0 and then alert is raised. Expression: ocsepp_configmgr_database_connectivity_healthy == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4054
Metric Name	ocsepp_configmgr_database_connectivity_healthy
Resolution	Restore the connectivity between SEPP and cnDBTier.

5.2.21.6 Cn32fIncorrectDatabaseConfigurationAlert

Table 5-86 Cn32fIncorrectDatabaseConfigurationAlert

Field	Details
Trigger Condition	This alert is raised when an incorrect database configuration is provided for the cn32f service, resulting in a connection failure with the database.
Severity	Major
Alert Details Provided	Summary: Due to incorrect database configuration, connection failed with database. Expression: (up{app="cn32f-svc"} unless on (namespace) absent(hikaricp_connections{app="cn32f-svc"})) == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4057
Metric Name	NA
Resolution	Configure correct values in the deployment of the Cn32f pod.

5.2.21.7 Cn32cIncorrectDatabaseConfigurationAlert

Table 5-87 Cn32cIncorrectDatabaseConfigurationAlert

Field	Details
Trigger Condition	This alert is raised when an incorrect database configuration is provided for the cn32c service, resulting in a connection failure with the database.
Severity	Major
Alert Details Provided	Summary: Due to incorrect database configuration, connection failed with database. Expression: (up{app="cn32c-svc"} unless on (namespace) absent(hikaricp_connections{app="cn32c-svc"})) == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4056
Metric Name	NA
Resolution	Configure correct values in the deployment of the Cn32c pod.

5.2.21.8 Pn32fIncorrectDatabaseConfigurationAlert

Table 5-88 Pn32fIncorrectDatabaseConfigurationAlert

Field	Details
Trigger Condition	This alert is raised when an incorrect database configuration is provided for the pn32f service, resulting in a connection failure with the database.
Severity	Major
Alert Details Provided	Summary: Due to incorrect database configuration, connection failed with database. Expression: (up{app="pn32f-svc"} unless on (namespace) absent(hikaricp_connections{app="pn32f-svc"})) == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4058
Metric Name	NA
Resolution	Configure correct values in the deployment of the Pn32f pod.

5.2.21.9 pn32cIncorrectDbConf

Table 5-89 pn32cIncorrectDbConf

Field	Details
Trigger Condition	This alert is raised when an incorrect database configuration is provided for the pn32c service, resulting in a connection failure with the database.
Severity	Major
Alert Details Provided	Summary: Due to incorrect database configuration, connection failed with database. Expression: (up{app="pn32c-svc"} unless on (namespace) absent(hikaricp_connections{app="pn32c-svc"})) == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4059
Metric Name	NA
Resolution	Configure correct values in the deployment of the pn32c pod.

5.2.21.10 ConfigManagerIncorrectDatabaseConfigurationAlert

Table 5-90 ConfigManagerIncorrectDatabaseConfigurationAlert

Field	Details
Trigger Condition	This alert is raised when an incorrect database configuration is provided for the config manager service, resulting in a connection failure with the database.
Severity	Major
Alert Details Provided	Summary: Due to incorrect database configuration, connection failed with database. Expression: (up{app="config-mgr-svc"} unless on (namespace) absent(hikaricp_connections{app="config-mgr-svc"})) == 0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4055
Metric Name	NA
Resolution	Configure correct values in the deployment of the ConfigManager pod.

5.2.22 Proactive Status Updates on SEPP Alerts

5.2.22.1 EgressGatewayPeerUnhealthyAlert

Table 5-91 EgressGatewayPeerUnhealthyAlert

Field	Details
Trigger Condition	When a peer becomes unhealthy, that is, when the value of `oc_egressgateway_peer_health_status` for the peer is 1.
Severity	Major
Alert Details Provided	Summary Peer is unhealthy Expression sum(oc_egressgateway_peer_health_status{app="n32-egress-gateway"}) by (namespace,app,peer) >0
OID	1.3.6.1.4.1.323.5.3.46.1.2.4048
Metric Name	`oc_egressgateway_peer_health_status`
Resolution	The alert is cleared when the peer becomes healthy again, that is, when `oc_egressgateway_peer_health_status` for the peer becomes 0.

5.2.22.2 EgressGatewayAllPeersUnhealthyAlert

Table 5-92 EgressGatewayAllPeersUnhealthyAlert

Field	Details
Trigger Condition	When all peers in a peerset become unhealthy.
Severity	Critical
Alert Details Provided	Summary All peers unhealthy Expression (sum(oc_egressgateway_peer_count) by (namespace) -sum(oc_egressgateway_peer_available_count) by (namespace))==sum(oc_egressgateway_peer_count) by (namespace)
OID	1.3.6.1.4.1.323.5.3.46.1.2.4049
Metric Name	`oc_egressgateway_peer_count,oc_egressgateway_peer_available_count`
Resolution	The alert is cleared when one or more peers in the peerset become healthy.
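The EgressGatewayAllPeersUnhealthyAlert expression fires when the number of unavailable peers (total minus available) equals the total number of peers in a namespace. The Python sketch below restates that comparison; the function name is hypothetical, and the two arguments stand in for the per-namespace sums of oc_egressgateway_peer_count and oc_egressgateway_peer_available_count.

```python
def all_peers_unhealthy(peer_count: int, available_count: int) -> bool:
    """Mirror of (peer_count - available_count) == peer_count.

    Algebraically this is the same as available_count == 0.
    """
    return (peer_count - available_count) == peer_count


if __name__ == "__main__":
    print(all_peers_unhealthy(3, 0))  # no peer available
    print(all_peers_unhealthy(3, 1))  # one peer available
```

Read this way, the comparison also holds when both sums are 0, so a namespace reporting zero configured peers would satisfy the condition in this sketch.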

Trigger Condition	Pod memory usage is above the threshold (70%)
Severity	Warning
Alert details provided	Summary 'namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . \| first \| value \| humanizeTimestamp }}{{ end }}: Memory usage is {{ $value \| printf "%.2f" }} which is above 70% (current value is: {{ $value }})' Expression: (sum by(namespace,container,pod) (container_memory_usage_bytes{container=~".cn32c-svc.\|.pn32c-svc.\|.cn32f-svc.\|.pn32f-svc.\|.config-mgr-svc.\|.n32-egress-gateway.\|.n32-ingress-gateway.\|.plmn-egress-gateway.\|.plmn-ingress-gateway.\|.nf-mediation."})) / (sum by (namespace,container,pod)(kube_pod_container_resource_limits{resource="memory",container=~".cn32c-svc.\|.pn32c-svc.\|.cn32f-svc.\|.pn32f-svc.\|.config-mgr-svc.\|.n32-egress-gateway.\|.n32-ingress-gateway.\|.plmn-egress-gateway.\|.plmn-ingress-gateway.\|.nf-mediation."}) ) * 100 >= 70
OID	1.3.6.1.4.1.323.5.3.46.1.2.4003
Metric Used	kube_pod_container_resource_limits Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use the similar metric as exposed by the monitoring system.
Resolution	The alert gets cleared when the memory utilization falls below the critical threshold. Note: The threshold is configurable in the SeppAlertrules.yaml file. If guidance is required, contact My Oracle Support.

5 SEPP Alerts

5.1 System Level Alerts

5.1.1 SEPPPodMemoryUsageAlert

5.1.2 SEPPPodCpuUsageAlert

5.2 Application Level Alerts