6 Alerts

This section provides information about the supported alerts and how to configure the alerts.

Note:

The performance and capacity of the SCP system may vary based on the call model, feature or interface configuration, network conditions, and underlying CNE and hardware environment.

You can configure alerts in Prometheus and in the ScpAlertrules.yaml file.

The following table provides information about Service Communication Proxy (SCP) alerts.

Caution:

User, computer, application, and character encoding settings may cause issues when you copy and paste commands or any other content from a PDF. The PDF reader version also affects the copy and paste functionality. Verify the pasted content, especially when hyphens or other special characters are part of the copied content.

Note:

kubectl commands might vary based on the platform deployment. Replace kubectl with Kubernetes environment-specific command line tool to configure Kubernetes resources through kube-api server. The instructions provided in this document are as per the Oracle Communications Cloud Native Environment (OCCNE) version of kube-api server.
The alert file can be customized as required by the deployment environment. For example, a namespace can be added as a filter criterion to the alert expression so that alerts are raised only for a specific namespace.
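
The namespace customization mentioned above can be sketched as follows. This is a minimal illustration, not part of the SCP tooling: the helper name `add_namespace_filter` and the example namespace `scpsvc` are assumptions, and the helper only handles the simple case of extending the first label selector in an expression. Some metrics in this section carry the `kubernetes_namespace` label rather than `namespace`, so the label name is a parameter.

```python
def add_namespace_filter(expr: str, namespace: str, label: str = "namespace") -> str:
    """Append label="<namespace>" to the first {...} selector in a PromQL expression.

    Hypothetical helper for illustration. It assumes the first selector has
    braces and that no '}' character appears inside a label value.
    """
    close = expr.find("}")
    if close == -1:
        raise ValueError("expression has no label selector to extend")
    open_brace = expr.rfind("{", 0, close)
    if open_brace == -1:
        raise ValueError("unbalanced braces in expression")
    existing = expr[open_brace + 1:close].strip()
    separator = "," if existing else ""
    matcher = f'{separator}{label}="{namespace}"'
    # Insert the new matcher just before the closing brace.
    return expr[:close] + matcher + expr[close:]


if __name__ == "__main__":
    original = (
        'sum(container_memory_usage_bytes{image!="",pod=~".*scpc-notification.+"}) '
        "by (pod,namespace, instance) > 3006477107"
    )
    print(add_namespace_filter(original, "scpsvc"))
```

Review the resulting expression in Prometheus before adding it to the alert file.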

6.1 System level alerts

This section lists the system level alerts.

6.1.1 SCPNotificationPodMemoryUsage

Table 6-1 SCPNotificationPodMemoryUsage

Field	Description
Severity	Major
Conditions	sum(container_memory_usage_bytes{image!="",pod=~".*scpc-notification.+"}) by (pod,namespace, instance) > 3006477107
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.3001
Description	Notifies when the Notification service pod memory usage is above the threshold. Threshold value is 85% of allocated (4 GB) memory: 3.4 GB.
Recommended Actions	Cause: A high notification rate or a very large NF profile size in notifications. Diagnostic Information: Monitor the notification metric: ocscp_nrf_notifications_requests_nf_total. Notification pod memory usage reduces after some time when it crosses 2.5 GB or 3 GB. Recovery: This alert is cleared automatically when the scpc-notification pod memory usage reduces below the defined threshold. Reduce the notification rate. These notifications are generated by NRF and can be controlled through NRF. For any assistance, contact My Oracle Support.
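
When the memory threshold is customized, the byte value in the alert expression has to be derived from the pod memory allocation and the chosen percentage. The following is a minimal sketch of that arithmetic, assuming the allocation is expressed in GiB (1024^3 bytes); the function names are hypothetical. Under that assumption, the value 3006477107 in the Conditions row corresponds to about 70% of a 4 GiB allocation, so confirm the intended percentage before editing the rule.

```python
GIB = 1024 ** 3  # assumption: allocations are counted in GiB, not decimal GB


def memory_threshold_bytes(allocated_gib: int, percent: int) -> int:
    """Return the byte threshold for a percentage of the allocated memory."""
    return allocated_gib * GIB * percent // 100


def percent_of_allocation(threshold_bytes: int, allocated_gib: int) -> float:
    """Return which percentage of the allocation a byte threshold represents."""
    return 100 * threshold_bytes / (allocated_gib * GIB)


if __name__ == "__main__":
    # Threshold for 85% of a 4 GiB allocation.
    print(memory_threshold_bytes(4, 85))
    # Percentage represented by the value used in the Conditions row.
    print(round(percent_of_allocation(3006477107, 4), 1))
```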

6.1.2 SCPWorkerPodMemoryUsage

Table 6-2 SCPWorkerPodMemoryUsage

Field	Description
Severity	Major
Conditions	sum(container_memory_usage_bytes{image!="",pod=~".*scp-worker.+"}) by (pod,namespace, instance) > 6012954214
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7004
Description	Notifies when the memory usage of a Worker pod is above the threshold. Threshold value is 85% of allocated (8 GB) memory: 7.3 GB.
Recommended Actions	Cause: A high traffic rate, alternate routing, a large number or size of routing rules, or network or producer NF latency. Diagnostic Information: Monitor traffic rate, alerts, and latency on the KPI Dashboard. Check whether the traffic rates of the following metrics are too high: ocscp_metric_http_rx_req_total, ocscp_metric_http_tx_req_total, ocscp_metric_http_rx_res_total, ocscp_metric_http_tx_res_total. Check the upstream response time by using the following metric and determine whether upstream is taking too long to respond: ocscp_metric_upstream_service_time_total. Check the following platform metric for current memory usage by the scp-worker pod: container_memory_usage_bytes. Recovery: This alert is cleared automatically when the scp-worker pod memory usage reduces below the defined threshold. Reduce the traffic rate and improve the latency. For any assistance, contact My Oracle Support.
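
The traffic metrics listed in the Diagnostic Information can also be checked outside Grafana through the Prometheus HTTP API instant query endpoint (`/api/v1/query`). The following sketch only builds the query URL and parses a response body; it does not send any request. The base URL `http://prometheus.example:9090` and the helper names are assumptions for illustration, and the 2-minute window mirrors the rate expressions used elsewhere in this section.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Metrics named in the Diagnostic Information above.
TRAFFIC_METRICS = [
    "ocscp_metric_http_rx_req_total",
    "ocscp_metric_http_tx_req_total",
    "ocscp_metric_http_rx_res_total",
    "ocscp_metric_http_tx_res_total",
]


def rate_query(metric: str, window: str = "2m") -> str:
    """Build a PromQL expression for the summed per-second rate of a counter."""
    return f"sum(rate({metric}[{window}]))"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL for a Prometheus server (base_url is an assumption)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def parse_instant_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-query JSON response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query failed: " + str(payload.get("error")))
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


if __name__ == "__main__":
    for name in TRAFFIC_METRICS:
        print(instant_query_url("http://prometheus.example:9090", rate_query(name)))
```

The printed URLs can be fetched with any HTTP client that can reach the Prometheus service in your deployment.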

6.1.3 SCPInstanceDown

Table 6-3 SCPInstanceDown

Field	Description
Severity	Critical
Conditions	kube_pod_status_ready{pod =~ '.*scp-worker.*|.*scpc-notification.*|.*scpc-subscription.*|.*scpc-configuration.*|.*scpc-audit.*|.*scpc-alternate-resolution.*',condition =~ 'true'} != 1
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7006
Description	Notifies when any pod in the ocscp release is down. Provides information such as the pod name, instance ID, and app name.
Recommended Actions	Cause: When the following issues occur: The Control plane pods, such as configuration, subscription, notification, audit, and alternate-resolution, are down due to connection failure with DB. Pod restarts due to Kubernetes liveness or readiness probe failures. Application restart or startup failure. Diagnostic Information: Check if DB services are active by running the following command: `kubectl describe pod <podname> -n <namespace>`. Check Kubernetes events for probe failures in the platform logs. Check if any exception is reported in the SCP application logs. Recovery: This alert is cleared automatically when the inactive pod becomes active. Recover DB services if down. Collect the application logs and contact My Oracle Support for any assistance.
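
To see which pods are currently not ready, the JSON output of `kubectl get pods -n <namespace> -o json` can be inspected. The following is a minimal sketch under that assumption; the function name `not_ready_pods` and the pod names in the example are hypothetical. It relies only on the standard `status.conditions` list of a Kubernetes pod, in which the `Ready` condition has the status `True` for a ready pod.

```python
import json
from typing import List


def not_ready_pods(pod_list_json: str) -> List[str]:
    """Return names of pods whose Ready condition is not "True".

    pod_list_json is the text printed by `kubectl get pods -n <namespace> -o json`.
    """
    down = []
    for pod in json.loads(pod_list_json).get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            down.append(pod["metadata"]["name"])
    return down


if __name__ == "__main__":
    # Hypothetical, heavily trimmed sample of the kubectl JSON output.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "ocscp-scp-worker-example"},
             "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
            {"metadata": {"name": "ocscp-scpc-audit-example"},
             "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
        ]
    })
    print(not_ready_pods(sample))
```

Use `kubectl describe pod <podname> -n <namespace>` on each reported pod to see the related events.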

6.2 Application level alerts

This section lists the application level alerts.

6.2.1 SCPCcaFeatureEnabledWithoutHttps

Table 6-4 SCPCcaFeatureEnabledWithoutHttps

Field	Description
Severity	Info
Conditions	ocscp_worker_cca_validation_feature_enabled_without_https > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.9022
Description	An alert is raised when the CCA validation feature is enabled without enabling HTTPS.
Recommended Actions	Cause: The CCA validation feature is enabled without enabling HTTPS. Diagnostic Information: Deploy SCP with HTTPS enabled. Recovery: The alert is cleared automatically if either the CCA feature is disabled or the deployment is changed to HTTPS. For any assistance, contact My Oracle Support.

6.2.2 SCPIngressTrafficRateAboveMinorThreshold

Table 6-5 SCPIngressTrafficRateAboveMinorThreshold

Field	Description
Severity	Minor
Conditions	sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1200 to 1400
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7001
Description	Notifies that the traffic rate is from 1200 to 1400 mps (user-configured minor threshold value), along with the locality and the current value of the traffic rate.
Recommended Actions	Cause: When the Consumer NF sends more traffic than expected. Diagnostic Information: Monitor the ingress traffic to pod using the KPI Dashboard. Refer to the rate of ocscp_metric_http_rx_req_total metric on the Grafana GUI. Recovery: This alert is cleared automatically when the ingress traffic reduces below the minor threshold or exceeds the major threshold. If this alert is not cleared, then check the Consumer NF for an uneven distribution of traffic per connection or for any other issue. For any assistance, contact My Oracle Support.

6.2.3 SCPIngressTrafficRateAboveMajorThreshold

Table 6-6 SCPIngressTrafficRateAboveMajorThreshold

Field	Description
Severity	Major
Conditions	sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m]))by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1400 to 1600
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7001
Description	Notifies that the traffic rate is from 1400 to 1600 mps (user-configured major threshold value), along with the locality and the current value of the traffic rate.
Recommended Actions	Cause: When the Consumer NF sends more traffic than expected. Diagnostic Information: Monitor the ingress traffic to pod using the KPI Dashboard. Refer to the rate of ocscp_metric_http_rx_req_total metric on the Grafana GUI. Recovery: This alert is cleared automatically when the ingress traffic reduces below the major threshold or exceeds the critical threshold. If this alert is not cleared, then check the Consumer NF for an uneven distribution of traffic per connection or for any other issue. If this alert continues for a long duration, then reduce the ingress traffic from consumer to pod. For any assistance, contact My Oracle Support.

6.2.4 SCPIngressTrafficRateAboveCriticalThreshold

Table 6-7 SCPIngressTrafficRateAboveCriticalThreshold

Field	Description
Severity	Critical
Conditions	sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m]))by(kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1600
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7001
Description	Notifies that the traffic rate is above 1600 mps (user-configured critical threshold value), along with the locality and the current value of the traffic rate.
Recommended Actions	Cause: When the Consumer NF sends more traffic than expected. Diagnostic Information: Monitor the ingress traffic to pod using the KPI Dashboard. Refer to the rate of the ocscp_metric_http_rx_req_total metric on the Grafana GUI. Recovery: This alert is cleared automatically when the ingress traffic reduces below the critical threshold. If this alert is not cleared, then check the Consumer NF for an uneven distribution of traffic per connection or for any other issue. If this alert continues for a long duration, then reduce the ingress traffic from consumer to pod. For any assistance, contact My Oracle Support.
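
The three ingress traffic alerts form consecutive bands: a minor alert clears when the rate falls below the minor threshold or reaches the major threshold, and so on. The following is a minimal sketch of that banding, using the threshold values 1200, 1400, and 1600 mps from the tables above as defaults; the function name is hypothetical, and the thresholds are user configurable.

```python
from typing import Optional


def ingress_rate_severity(
    rate_mps: float,
    minor: float = 1200,
    major: float = 1400,
    critical: float = 1600,
) -> Optional[str]:
    """Return the severity band for a per-pod ingress rate, or None if below minor."""
    if rate_mps >= critical:
        return "critical"
    if rate_mps >= major:
        return "major"
    if rate_mps >= minor:
        return "minor"
    return None


if __name__ == "__main__":
    for rate in (900, 1250, 1400, 1700):
        print(rate, ingress_rate_severity(rate))
```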

6.2.5 SCPRoutingFailedForProducer

Table 6-8 SCPRoutingFailedForProducer

Field	Description
Severity	Info
Conditions	increase(ocscp_metric_routing_attempt_fail_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7005
Description	Notifies that routing failed for a producer. Provides details such as NFService Type, NFType, Locality, producer FQDN, and value.
Recommended Actions	Cause: When routing fails to select a producer NF due to unavailability of routing rules for an NF service or producer. Diagnostic Information: Check whether the routing rules are configured for the NF for which routing failed. Check the notification logs for any error while processing the notification of the NF for which routing failed. To get the notification logs, run the following command: `kubectl logs <podname> -n <namespace>`. Check whether the NF is reachable by using one of the following steps: Run the ping command from primary or secondary nodes using the IP address of the service. Example of a ping command: `ping <IPAddress>`. Run the ping command from inside the pod if the FQDN of the service is used. Instead of using the ping command, you can collect a tcpdump to verify connectivity. tcpdump must be run on the debug container for the scp-worker microservice. Example of a tcpdump command: `tcpdump -w capture.pcap -i <pod interface>`. Recovery: This alert is cleared automatically when the routing is complete for a producer NF or no more traffic is received in the next Prometheus scrape interval. Check if the NF is deregistered. Register the NF to create routing rules if rules do not exist. For any assistance, contact My Oracle Support.

6.2.6 SCPAuditErrorResponse

Table 6-9 SCPAuditErrorResponse

Field	Description
Severity	Info
Conditions	ocscp_audit_error_response > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.4001
Description	Alert is raised when Audit module receives a 3xx, 4xx, or 5xx error from NRF. This alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod. Note: Alert is cleared on the next audit cycle.
Recommended Actions	Cause: When the configured NRF sends error responses, is down, or is not reachable. Diagnostic Information: Check if NRF is up and reachable. To check the NRF status, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether the NRF is reachable by using one of the following steps: Run the ping command from primary or secondary nodes using the IP address of NRF. Example of a ping command: `ping <IPAddress>`. Run the ping command from inside the pod if the FQDN of NRF is used. Instead of using ping, you can collect a tcpdump to verify connectivity. Example of a tcpdump command: `tcpdump -w capture.pcap -i <pod interface>`. Monitor audit and worker service logs: `kubectl logs <pod name> -n <namespace>`. Check Jaeger traces for scp-worker. Recovery: The alert is cleared automatically during the next audit cycle and when no more errors are received. Collect audit and worker service logs and contact My Oracle Support for any assistance.

6.2.7 SCPAuditEmptyNFArrayResponse

Table 6-10 SCPAuditEmptyNFArrayResponse

Field	Description
Severity	Critical
Conditions	ocscp_audit_2xx_empty_nf_array_rx_total > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.4002
Description	Alert is generated when Audit module receives a 2xx response with empty NFInstance array from NRF. Alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod. Alert is cleared if Audit receives a success response with non-empty NFInstance array or on next audit cycle if topology source is changed to LOCAL.
Recommended Actions	Cause: When NRF does not have any NF registered or due to an error condition on NRF. Diagnostic Information: Check if NRF contains any registered NF and validate as required. For more information, refer to the NRF documents. Recovery: This alert is cleared automatically if Audit receives a success response with a non-empty NFInstance array or during the next audit cycle when the topology source is changed to LOCAL. Register an NF with NRF or change the topology source to LOCAL. For any assistance, contact My Oracle Support.

6.2.8 DuplicateLocalityFoundInForeignNF

Table 6-11 DuplicateLocalityFoundInForeignNF

Field	Description
Severity	Major
Conditions	ocscp_notification_duplicate_foreign_location > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.3002
Description	Alert is raised when an unknown NF or SCP is registered with duplicate locality from the present region.
Recommended Actions	Cause: When SCP discovers a duplicate locality of an NF from an unknown region. Diagnostic Information: Check logs for the received NF notification by running the following command: `kubectl -n <namespace> logs <pod name>`. Check the following metric to get the NFInstanceId information for which this alert is raised: ocscp_notification_duplicate_foreign_location (nfInstanceId). From the metric, get the NF Instance ID, Locality, and serving_scope. Check the NF Profile of the corresponding NF in the unknown region as identified by the serving_scope. Correct the locality in the NF Profile so that it matches the localities of that unknown region, which must be different from the locality of the SCP that reported this alert. Recovery: This alert is cleared automatically if the unknown NF or SCP is deregistered or sends a registration update with the correct locality. Re-register the NF with the correct locality information. Collect logs for the notification and audit services. For any assistance, contact My Oracle Support.

6.2.9 ForeignNFLocalityNotServed

Table 6-12 ForeignNFLocalityNotServed

Field	Description
Severity	Critical
Conditions	ocscp_notification_foreign_nf_locality_unserved > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.3003
Description	Alert is raised when a Foreign Producer NF's locality is not served by any SCP.
Recommended Actions	Cause: When SCP discovers an unknown producer NF whose locality is not served by any SCP. Diagnostic Information: Check logs for the received NF notification. To get the name of the notification pod, run the following command: `kubectl get pods -n <namespace>`. Then use the complete name of the notification pod in the following command: `kubectl logs <pod> -n <namespace>`. Check the following metric to get the NFInstanceId information for which this alert is raised: ocscp_notification_foreign_nf_locality_unserved (nfInstanceId). Recovery: This alert is cleared automatically if the unknown NF is deregistered or a registration update is received with a locality served by SCP. Re-register the NF with the correct locality information. For any assistance, contact My Oracle Support.
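
The two-step lookup in the Diagnostic Information (list the pods, then fetch logs by the complete pod name) can be sketched as follows. The helper names are hypothetical, the worker pod name in the sample is invented, and the sketch assumes the default tabular output of `kubectl get pods`, in which the first column is the pod name. It only builds the command; it does not run kubectl.

```python
from typing import List


def find_pods(kubectl_get_pods_output: str, name_fragment: str) -> List[str]:
    """Return complete pod names from `kubectl get pods` output that contain name_fragment."""
    names = []
    for line in kubectl_get_pods_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "NAME":
            continue
        if name_fragment in fields[0]:
            names.append(fields[0])
    return names


def logs_command(pod: str, namespace: str) -> List[str]:
    """Build the `kubectl logs <pod> -n <namespace>` command as an argument list."""
    return ["kubectl", "logs", pod, "-n", namespace]


if __name__ == "__main__":
    sample = (
        "NAME                                       READY   STATUS    RESTARTS   AGE\n"
        "ocscp-scpc-notification-547f699c96-m7nc8   1/1     Running   0          5d\n"
        "ocscp-scp-worker-example                   1/1     Running   0          5d\n"
    )
    for pod in find_pods(sample, "scpc-notification"):
        print(" ".join(logs_command(pod, "scpsvc")))
```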

6.2.10 UnknownLocalityFoundInForeignNF

Table 6-13 UnknownLocalityFoundInForeignNF

Field	Description
Severity	Critical
Conditions	ocscp_notification_foreign_nf_locality_absent > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.3004
Description	Alert is raised when a Foreign Producer NF's locality is unknown.
Recommended Actions	Cause: When SCP discovers an unknown producer NF without locality information. Diagnostic Information: Check logs for the received NF notification. To get the name of the notification pod, run the following command: `kubectl get pods -n <namespace>`. Then use the complete name of the notification pod in the following command: `kubectl logs <pod> -n <namespace> -f --tail=0`. Check the following metric to get the NFInstanceId information for which this alert is raised: ocscp_notification_foreign_nf_locality_absent (nfInstanceId). Recovery: This alert is cleared automatically if the unknown NF is deregistered or a registration update is received with a locality known to SCP. Re-register the NF with the correct locality information. For any assistance, contact My Oracle Support.

6.2.11 SCPUpstreamResponseTimeout

Table 6-14 SCPUpstreamResponseTimeout

Field	Description
Severity	Info
Conditions	idelta(ocscp_metric_upstream_response_timeout_total[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7011
Description	Alert is raised when the upstream connection to a producer NF fails.
Recommended Actions	Cause: When a producer NF is down, not reachable, or latency is high. Diagnostic Information: Check whether the producer NF is up and network connectivity to the producer NF is established by using one of the following steps: Run the ping command from primary or secondary nodes by using the IP address of the producer NF. Example of a ping command: `ping <IPAddress>`. Run the ping command from inside the pod if the FQDN of the producer NF is used. Instead of using the ping command, you can collect a tcpdump to verify connectivity. tcpdump must be run on the debug container for the scp-worker microservice. Example of a tcpdump command: `tcpdump -w capture.pcap -i <pod interface>`. Check the upstream response time by using the following metric and determine if upstream is taking too long to respond: ocscp_metric_upstream_service_time (producer FQDN). Recovery: This alert is cleared automatically in the next scrape interval if the system does not observe any error. For any assistance, contact My Oracle Support.

6.2.12 SCPSingleNfInstanceAvailableForNFType

Table 6-15 SCPSingleNfInstanceAvailableForNFType

Field	Description
Severity	Major
Conditions	ocscp_no_nf_instance == 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.3005
Description	Alert is raised when there is a single NFInstance available with SCP for an NFType.
Recommended Actions	Cause: When the `preventiveAuditOnLastNFInstanceDeletion` attribute is set to true and SCP has a single NFInstance available for an NFType. Diagnostic Information: Check all SCP NRFs for the specific NFType in the alert to confirm whether only one NFInstance is available. For information about registered NFs, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check the number of NFs of a particular type by using the API or CNC Console of SCP. For information about procedures to check the NFs available with SCP, see "Configuring Service Communication Proxy using the CNC Console" in Oracle Communications Cloud Native Core, Service Communication Proxy User Guide. Recovery: This alert is cleared automatically in the next scrape interval if more than one NFInstance is available for the NFType specified in the alert. For any assistance, contact My Oracle Support.

6.2.13 SCPNoNfInstanceForNFType

Table 6-16 SCPNoNfInstanceForNFType

Field	Description
Severity	Critical
Conditions	ocscp_no_nf_instance == 1
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.3006
Description	Alert is raised when there is no NFInstance available with SCP for an NFType.
Recommended Actions	Cause: When the `preventiveAuditOnLastNFInstanceDeletion` flag is set to true and SCP has no NFInstance available for an NFType. Diagnostic Information: Check all SCP NRFs for the specific NFType in the alert to confirm whether no NFInstance is available. For information about registered NFs, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check the number of NFs of a particular type by using the API or CNC Console of SCP. For information about procedures to check the NFs available with SCP, see "Configuring Service Communication Proxy using the CNC Console" in Oracle Communications Cloud Native Core, Service Communication Proxy User Guide. Recovery: This alert is cleared automatically in the next scrape interval if at least one NFInstance is available for the NFType specified in the alert. For any assistance, contact My Oracle Support.

6.2.14 SCPIngressTrafficRateExceededConfiguredLimit

Table 6-17 SCPIngressTrafficRateExceededConfiguredLimit

Alert Parameters	Value
Description	Ingress traffic rate exceeds configured rate limit for consumer fqdn: {{$labels.ocscp_consumer_fqdn}}
Summary	'Ingress traffic rate exceeds configured rate limit for consumer fqdn: ocscpconsumerfqdn = {{$labels.ocscp_consumer_host}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumernftype = {{$labels.ocscp_consumer_nf_type}}, configuredingressrate = {{$labels.ocscp_configured_ingress_rate}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scp_fqdn: ' {{$labels.scp_fqdn}} ',timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} '
Severity	Critical
Condition	This alert is raised when the ingress traffic rate exceeds the configured rate for consumer FQDN. increase(ocscp_metric_ingress_rate_limiting_throttle_req_total[2m]) > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7012
Metric Used	ocscp_metric_ingress_rate_limiting_throttle_req_total
Recommended Actions	Cause: When the ingress traffic rate exceeds the configured rate limit for the consumer FQDN. Diagnostic Information: Check the ingress traffic rate from the consumer FQDN. To check the ingress rate, refer to the following metric: ocscp_metric_http_rx_req_total. Check the ingress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert is cleared when no more requests get suppressed due to ingress rate limiting in the next scrape interval. For any assistance, contact My Oracle Support.

6.2.15 SCPIngressTrafficRoutedWithoutRateLimitTreatment

Table 6-18 SCPIngressTrafficRoutedWithoutRateLimitTreatment

Alert Parameters	Value
Description	Ingress traffic routed without rate limit treatment
Summary	'Ingress traffic routed without rate limit treatment: consumernftype = {{$labels.ocscp_consumer_nf_type}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumerfqdn = {{$labels.ocscp_consumer_host}}, cause = {{$labels.ocscp_cause}} ,namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scp_fqdn: ' {{$labels.scp_fqdn}} ', timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} '
Severity	Major
Condition	This alert is raised when the ingress traffic routes without rate limiting treatment. increase(ocscp_metric_ingress_rate_limiting_not_applied_req_total[2m]) > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7013
Metric Used	ocscp_metric_ingress_rate_limiting_not_applied_req_total
Recommended Actions	Cause: When the ingress traffic routes without rate limiting treatment. Diagnostic Information: Check the ingress rate limiting configurations for the untreated FQDNs that can be obtained from the following metric: ocscp_metric_ingress_rate_limiting_not_applied_req_total(ocscp_consumer_fqdn) Check the ingress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert is cleared when no more requests get routed without ingress rate limiting treatment in the next scrape interval. For any assistance, contact My Oracle Support.

6.2.16 SCPEgressTrafficRateExceededConfiguredLimit

Table 6-19 SCPEgressTrafficRateExceededConfiguredLimit

Field	Description
Summary	'Egress traffic rate exceeds configured rate limit: producernftype = {{$labels.ocscp_nf_type}}, producernfservicetype = {{$labels.ocscp_nf_service_type}}, producernfinstanceid = {{$labels.ocscp_nf_instance_id}}, producerfqdn = {{$labels.ocscp_producer_fqdn}}, consumernftype = {{$labels.ocscp_consumer_nf_type}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumerfqdn = {{$labels.ocscp_consumer_host}}, configuredegressrate = {{$labels.ocscp_configured_egress_rate}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scp_fqdn: ' {{$labels.scp_fqdn}} ', timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} '
Severity	Critical
Conditions	idelta(ocscp_metric_egress_rate_limiting_throttle_req_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7014
Description	Alert is raised when the egress traffic rate exceeds the configured rate.
Recommended Actions	Cause: When the egress traffic rate exceeds the configured rate. Diagnostic Information: Check the egress traffic rate by using the following metric: ocscp_metric_http_tx_req_total. Check the egress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert is cleared when no more requests get suppressed due to egress rate limiting in the next scrape interval. For any assistance, contact My Oracle Support.

6.2.17 SCPEgressTrafficRoutedWithoutRateLimitTreatment

Table 6-20 SCPEgressTrafficRoutedWithoutRateLimitTreatment

Field	Description
Severity	Major
Conditions	idelta(ocscp_metric_egress_rate_limiting_not_applied_req_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7015
Description	Alert is raised when egress traffic routes without rate limiting.
Recommended Actions	Cause: When the egress traffic routes without rate limiting treatment. Diagnostic Information: Check the egress rate limiting configurations for the untreated producer FQDN. Obtain the producer FQDN by using the following metric: ocscp_metric_egress_rate_limiting_not_applied_req_total(ocscp_producer_fqdn) Check the egress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert is cleared when no more requests get routed without egress rate limiting treatment in the next scrape interval. For any assistance, contact My Oracle Support.

6.2.18 SCPNotificatoinRejectTopologySourceLocal

Table 6-21 SCPNotificatoinRejectTopologySourceLocal

Field	Description
Severity	Info
Conditions	increase(ocscp_notifications_rejected_topologysource_local_total[15m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.3007
Description	Alert is raised when SCP rejects a notification from NRF due to topology source set to LOCAL for NF Type.
Recommended Actions	Cause: When NF Topology Source Info is set to LOCAL. Diagnostic Information: Check the topology source information of an NF Type. For information about the topology source APIs, see Oracle Communications Cloud Native Core, Service Communication Proxy User Guide. Recovery: This alert is cleared automatically after 15 minutes when NF Topology Source Info is set to NRF from LOCAL. For any assistance, contact My Oracle Support.

6.2.19 SCPNotificationProcessingFailureForNF

Table 6-22 SCPNotificationProcessingFailureForNF

Field	Description
Severity	Major
Conditions	increase(ocscp_failure_processed_nf_notification_total[15m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.3008
Description	Alert is raised when notification processing has failed on SCP.
Recommended Actions	Cause: When Notification processing has failed on SCP. Diagnostic Information: Check notification pod logs for any errors by running the following command: `kubectl logs <notification pod name> -n <scp namespace>`. To get the list of pods, run the following command: `kubectl get pod -n <scp namespace>`. Sample logs: {"instant":{"epochSecond":1620272241,"nanoOfSecond":609935406},"thread":"runQueueThreadPool1","level":"ERROR","loggerName":"com.oracle.cgbu.cne.scp.soothsayer.Process","message":"{logMsg=Notified profile IP endpoints already present in stored profile, logMsgCode=DUPLICATE_IPENDPOINT_OR_FQDN_FOUND_IN_STORED_PROFILE, rootCause=Notified Profile contains an ipEndPoint {\"ipv4Address\":\"10.75.203.74\",\"transport\":\"TCP\",\"port\":32673} which is already present in stored Profile with nfInstanceIid 93ED74AA-A29C-4450-9D7A-9278CAF6266D for serviceInstanceId audmcl08nv08-udmueauthn-589d6d5bcc-4dslh}","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":34,"threadPriority":5,"messageTimestamp":"21-05-06 03:37:21.609+0000","application":"ocscp-soothsayer","microservice":"ocscp-scpc-notification","engVersion":"23.4.4","mktgVersion":"23.4.4.0.0","vendor":"oracle","namespace":"scpsvc","node":"slave1","pod":"ocscp-scpc-notification-547f699c96-m7nc8","subsystem":"notification","instanceType":"prod","processId":"1"} {"instant":{"epochSecond":1620272241,"nanoOfSecond":734873890},"thread":"runQueueThreadPool1","level":"WARN","loggerName":"com.oracle.cgbu.cne.scp.soothsayer.scheduler.RunQueueConsumer","message":"{logMsg=Profile Processing failed, nfInstanceId=93ED74AA-A29C-4450-9D7A-9278CAF6266D}","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":34,"threadPriority":5,"messageTimestamp":"21-05-06 03:37:21.734+0000","application":"ocscp-soothsayer","microservice":"ocscp-scpc-notification","engVersion":"23.4.4","mktgVersion":"23.4.4.0.0","vendor":"oracle","namespace":"scpsvc","node":"slave1","pod":"ocscp-scpc-notification-547f699c96-m7nc8","subsystem":"notification","instanceType":"prod","processId":"1"} For information about the topology source APIs, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert is cleared automatically after 15 minutes. For any assistance, contact My Oracle Support.
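
Because the SCP pods emit one JSON object per log line, as in the sample logs above, the output of `kubectl logs` can be filtered by the `level` field. The following is a minimal sketch; the function name `filter_logs` is hypothetical, the sample lines are shortened illustrations, and only the field names `level`, `messageTimestamp`, and `message` from the sample logs are assumed to be present.

```python
import json
from typing import Iterable, Iterator, Tuple


def filter_logs(
    lines: Iterable[str], levels: Tuple[str, ...] = ("ERROR", "WARN")
) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, level, message) for JSON log lines at the given levels."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON object
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if record.get("level") in levels:
            yield (
                record.get("messageTimestamp", ""),
                record["level"],
                record.get("message", ""),
            )


if __name__ == "__main__":
    # Shortened, illustrative lines that follow the structure of the sample logs.
    sample = [
        '{"level":"ERROR","message":"{logMsg=Profile Processing failed}",'
        '"messageTimestamp":"21-05-06 03:37:21.734+0000"}',
        '{"level":"INFO","message":"{logMsg=started}",'
        '"messageTimestamp":"21-05-06 03:37:20.000+0000"}',
        "plain text line",
    ]
    for timestamp, level, message in filter_logs(sample):
        print(timestamp, level, message)
```

The input can be the lines saved from `kubectl logs <notification pod name> -n <scp namespace>`.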

6.2.20 SCPSubscriptionFailureForNFType

Table 6-23 SCPSubscriptionFailureForNFType

Field	Description
Severity	Critical
Conditions	increase(ocscp_subscription_nf_failure_total[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.2001
Description	Alert is raised when SCP subscription to NRF has failed. This alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod.
Recommended Actions	Cause: When the subscription fails for an NF Type with NRF. Diagnostic Information: Check whether NRF is up. To check the NRF status, see the Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether the NRF is reachable or not by using one of the following steps: Run the ping command from primary or secondary nodes using IP of NRF. Example of a ping command: `ping <IPAddress>`. Run the ping command from inside the pod if FQDN of NRF is used. Instead of using ping, you can collect tcpdump for ensuring the connectivity. Example of a tcpdump: `tcpdump -w capture.pcap -i <pod interface>`. If NRF is up, check scp-worker logs to find any error response from NRF. If there are error responses, monitor NRF logs: `kubectl logs <pod name> -n <namespace>`. Sample logs: {"instant":{"epochSecond":1620275506,"nanoOfSecond":134773910},"thread":"pool-8-thread-1","level":"ERROR","loggerName":"com.oracle.cgbu.cne.scp.soothsayer.subscription.processor.SubscriptionDataConsumer","message":"{logMsg=Exception occurred while handling action for subscriber data, action=TO_BE_RENEWED, stackTrace=com.oracle.cgbu.cne.scp.soothsayer.subscription.operations.NrfSubscriptionClient.triggerPatchRequest(NrfSubscriptionClient.java:177)\ncom.oracle.cgbu.cne.scp.soothsayer.subscription.processor.SubscriptionDataConsumer.handleAction(SubscriptionDataConsumer.java:322)\ncom.oracle.cgbu.cne.scp.soothsayer.subscription.processor.SubscriptionDataConsumer.consumeSQ(SubscriptionDataConsumer.java:140)\ncom.oracle.cgbu.cne.scp.soothsayer.subscription.processor.SubscriptionDataConsumer.run(SubscriptionDataConsumer.java:84)\njava.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)\njava.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)\njava.base/java.lang.Thread.run(Thread.java:832)}","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":33,"threadPriority":5,"messageTimestamp":"21-05-06 04:31:46.134+0000","application":"ocscp-soothsayer","microservice":"ocscp-scpc-subscription","engVersion":"1.15.0","mktgVersion":"1.15.0.0.0","vendor":"oracle","namespace":"scpsvc","node":"master","pod":"ocscp-scpc-subscription-55cfb57cc6-2qp2g","subsystem":"subscription","instanceType":"prod","processId":"1"} Recovery: This alert is cleared automatically when NRF is up and running or errors are corrected for received error responses. For any assistance, contact My Oracle Support.

6.2.21 SCPReSubscriptionFailureForNFType

Table 6-24 SCPReSubscriptionFailureForNFType

Field	Description
Severity	Critical
Conditions	increase(ocscp_patch_subscription_nf_failure_total[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.2002
Description	Alert is raised when SCP re-subscription to NRF has failed. This alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod.
Recommended Actions	Cause: When the re-subscription fails for an NF Type with NRF. Diagnostic Information: Check whether NRF is up. To check the NRF status, see the Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether the NRF is reachable or not by using one of the following steps: Run the ping command from primary/secondary nodes by using IP of NRF. Example of a ping command: `ping <IPAddress>`. Run the ping command from inside the pod, if FQDN of NRF is used. Instead of using ping, you can collect tcpdump for ensuring the connectivity. Example of a tcpdump: `tcpdump -w capture.pcap -i <pod interface>`. Check scp-worker logs to find any error response from NRF. If there are error responses, monitor NRF logs: kubectl logs <pod name> -n <namespace>. Recovery: This alert is cleared automatically when NRF is up and running or errors are corrected for received error responses. For any assistance, contact My Oracle Support.
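The reachability checks described for the NRF-related alerts rely on ping or tcpdump. Where ICMP is blocked, a TCP connection attempt is a common substitute. The following is a minimal Python sketch of that substitute, not a procedure from this guide. The function name `is_reachable`, the default timeout, and the host and port in the usage comment are hypothetical.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that something accepts connections on the port.
    It does not confirm that the NRF application itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Hypothetical usage; replace the placeholders with the NRF address and port
# used in your deployment:
#   is_reachable("<NRF FQDN or IP>", <port>)
```

A `False` result narrows the problem to connectivity or name resolution. A `True` result suggests moving on to the scp-worker and NRF logs.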

6.2.22 SCPNrfRegistrationFailureForRegionOrSetId

Table 6-25 SCPNrfRegistrationFailureForRegionOrSetId

Field	Description
Severity	Major
Conditions	increase(ocscp_nrf_registration_failure_total[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.2003
Description	Alert is raised when SCP registration fails. This alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod.
Recommended Actions	Cause: When the registration fails for an NF Type with NRF. Diagnostic Information: Check whether NRF is up. To check the NRF status, see the Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether the NRF is reachable or not by using one of the following steps: Run the ping command from primary or secondary nodes by using IP of NRF. Example of a ping command: `ping <IPAddress>`. Run the ping command from inside the pod if FQDN of NRF is used. Instead of using ping, you can collect tcpdump for ensuring the connectivity. Example of a tcpdump: `tcpdump -w capture.pcap -i <pod interface>`. Check scp-worker logs to find any error response from NRF. If there are error responses, monitor NRF logs: kubectl logs <pod name> -n <namespace>. Sample logs: {"instant":{"epochSecond":1620638888,"nanoOfSecond":78728229},"thread":"registration-0","level":"ERROR","loggerName":"com.oracle.cgbu.cne.scp.soothsayer.subscription.processor.NrfRegistrationProcessor","message":"{logMsg=Registration will be retried after configured interval, configuredIntervalInSec=6}","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":33,"threadPriority":5,"messageTimestamp":"21-05-10 09:28:08.078+0000","application":"ocscp-soothsayer","microservice":"ocscp-scpc-subscription","engVersion":"23.4.4","mktgVersion":"23.4.4.0.0","vendor":"oracle","namespace":"scpsvc","node":"slave1","pod":"ocscp-scpc-subscription-66c68b9db6-6g582","subsystem":"subscription","instanceType":"prod","processId":"1"} Recovery: This alert is cleared automatically when NRF is up and running or errors are corrected for received error responses. For any assistance, contact My Oracle Support.

6.2.23 SCPNrfHeartbeatFailureForRegionOrSetId

Table 6-26 SCPNrfHeartbeatFailureForRegionOrSetId

Field	Description
Severity	Major
Conditions	increase(ocscp_nrf_heartbeat_failures_total[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.2004
Description	Alert is raised when the SCP heartbeat fails.
Recommended Actions	Cause: When the Heartbeat fails for an NF Type with NRF. Diagnostic Information: Check whether NRF is up. To check the NRF status, see the Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether the NRF is reachable or not by using one of the following steps: Run the ping command from primary or secondary nodes by using IP of NRF. Example of a ping command: `ping <IPAddress>`. Run the ping command from inside the pod if FQDN of NRF is used. Instead of using ping, you can collect tcpdump for ensuring the connectivity. Example of a tcpdump: `tcpdump -w capture.pcap -i <pod interface>`. Check scp-worker logs to find any error response from NRF. If there are error responses, monitor NRF logs: kubectl logs <pod name> -n <namespace>. Recovery: This alert is cleared automatically when NRF is up and running or errors are corrected for received error responses. For any assistance, contact My Oracle Support.

6.2.24 SCPDBOperationFailure

Table 6-27 SCPDBOperationFailure

Field	Description
Severity	Warning
Conditions	increase(ocscp_db_operation_failure_total[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.2005
Description	Alert is raised for any DB operation failures.
Recommended Actions	Cause: When the SCP DB operation fails. Diagnostic Information: Check whether the DB service is up. Check the status/age of the mysql pod by using the following command: `kubectl get pods -n <namespace>`. Where, <namespace> is the namespace in which mysql pod is deployed. Recovery: This alert is cleared automatically when the DB service is up and running. For any assistance, contact My Oracle Support.

6.2.25 SCPGeneratedErrorsResponseForNFService

Table 6-28 SCPGeneratedErrorsResponseForNFService

Field	Description
Severity	Info
Conditions	increase(ocscp_metric_scp_generated_response_total[2m]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7016
Description	Alert is raised for NF type for which SCP generated response is triggered.
Recommended Actions	Cause: When the error response is generated for NF Service Type by SCP. Diagnostic Information: Monitor scp-worker logs to determine the reason for error responses generated by SCP. Check for error reason in the logs: kubectl logs <pod name> -n <namespace>. Recovery: This alert is cleared automatically when the cause for error response at SCP worker is corrected and configured. For any assistance, contact My Oracle Support.
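Several conditions in this section share the form `increase(<metric>[2m]) > 0`, and the alert file can be customized, for example by filtering on a namespace. The sketch below shows one way to assemble such an expression with optional label matchers. The helper name `increase_alert_expr` is hypothetical, and the label name `namespace` used in the example is an assumption. Check the labels actually exposed by your metrics before using it.

```python
def increase_alert_expr(metric: str, window: str = "2m", **label_matchers: str) -> str:
    """Build an 'increase(metric{labels}[window]) > 0' expression string.

    Label values are inserted as-is (no escaping), so this is only
    suitable for simple values such as a namespace name.
    """
    selector = ",".join(
        f'{name}="{value}"' for name, value in sorted(label_matchers.items())
    )
    braces = f"{{{selector}}}" if selector else ""
    return f"increase({metric}{braces}[{window}]) > 0"


# Without a filter, this reproduces the condition documented above.
print(increase_alert_expr("ocscp_metric_scp_generated_response_total"))

# With an assumed 'namespace' label, restricted to the scpsvc namespace.
print(increase_alert_expr("ocscp_metric_scp_generated_response_total", namespace="scpsvc"))
```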

6.2.26 SCPCircuitBreakingAppliedForNF

Table 6-29 SCPCircuitBreakingAppliedForNF

Field	Description
Severity	Info
Conditions	ocscp_circuit_breaking_applied > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.7017
Description	Alert is raised for NF when circuit breaking is applied.
Recommended Actions	Cause: When Circuit Breaking applies for producer NF FQDN based on the configured http2MaxRequests value. Diagnostic Information: Monitor scp-worker logs for the number of error responses when outstanding requests exceed the configured http2MaxRequests value: kubectl logs <pod name> -n <namespace>. Check the latency from SCP to the upstream producer by using the following metric: ocscp_metric_upstream_service_time_total(ocscp_producer_fqdn or ocscp_nf_end_point). Recovery: This alert is cleared automatically when the http2MaxRequests value for circuit breaking is configured above the traffic level at the worker, or when the traffic drops below the value configured for circuit breaking. For any assistance, contact My Oracle Support.

6.2.27 SCPUpgradeStarted

Table 6-30 SCPUpgradeStarted

Field	Description
Severity	Info
Conditions	When the SCP upgrade process for an SCP microservice starts.
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.6001
Description	Alert is raised when the SCP upgrade process for an SCP microservice starts.
Recommended Actions	Cause: When SCP upgrade is performed for a particular microservice. Diagnostic Information: Not applicable. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to `false` in the `ocscp_values.yaml` file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is `true`. For any assistance, contact My Oracle Support.
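The clearing rule for the upgrade and rollback alerts depends on two parameters, customAlertExpiryEnabled and customAlertExpiryDuration. The sketch below restates that rule as a small function. It is illustrative only: the function name is hypothetical, and the duration is passed as a `timedelta` because the unit of customAlertExpiryDuration is not stated here.

```python
from datetime import timedelta

# Clearing delay described above for customAlertExpiryEnabled set to false.
DEFAULT_ALERT_EXPIRY = timedelta(minutes=5)


def alert_clear_delay(
    custom_alert_expiry_enabled: bool,
    custom_alert_expiry_duration: timedelta,
) -> timedelta:
    """Return how long after being raised the alert is expected to clear."""
    if custom_alert_expiry_enabled:
        return custom_alert_expiry_duration
    return DEFAULT_ALERT_EXPIRY


# Custom expiry disabled: the configured duration is ignored.
print(alert_clear_delay(False, timedelta(minutes=30)))
# Custom expiry enabled: the configured duration applies.
print(alert_clear_delay(True, timedelta(minutes=30)))
```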

6.2.28 SCPUpgradeFailed

Table 6-31 SCPUpgradeFailed

Field	Description
Severity	Critical
Conditions	When any SCP microservice upgrade fails during the upgrade process.
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.6002
Description	Alert is raised when any SCP microservice upgrade fails.
Recommended Actions	Cause: When any SCP microservice upgrade fails during the upgrade process. Diagnostic Information: Monitor new hook-jobs that might have failed after multiple attempts. Also, monitor any failed log. Run the following command to check the pod of hook-job: `kubectl get pods -n <namespace>`. Run the following command to check the logs: `kubectl logs <pod name> -n <namespace>`. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to `false` in the `ocscp_values.yaml` file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is `true`. For any assistance, contact My Oracle Support.

6.2.29 SCPUpgradeSuccessful

Table 6-32 SCPUpgradeSuccessful

Field	Description
Severity	Info
Conditions	When any SCP microservice upgrade is completed.
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.6003
Description	Alert is raised when any SCP microservice upgrade is completed.
Recommended Actions	Cause: When any SCP microservice upgrade is completed. Diagnostic Information: Not applicable. Run the following command to check the pod of hook-job: `kubectl get pods -n <namespace>`. Run the following command to check the logs: `kubectl logs <pod name> -n <namespace>`. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to `false` in the `ocscp_values.yaml` file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is `true`. For any assistance, contact My Oracle Support.

6.2.30 SCPRollbackStarted

Table 6-33 SCPRollbackStarted

Field	Description
Severity	Info
Conditions	When the rollback process for an SCP microservice starts.
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.6004
Description	Alert is raised when the rollback process for an SCP microservice starts.
Recommended Actions	Cause: When the rollback process for an SCP microservice starts. Diagnostic Information: Not applicable. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to `false` in the `ocscp_values.yaml` file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is `true`. For any assistance, contact My Oracle Support.

6.2.31 SCPRollbackFailed

Table 6-34 SCPRollbackFailed

Field	Description
Severity	Critical
Conditions	When any SCP microservice rollback fails during the rollback process.
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.6005
Description	Alert is raised when any SCP microservice rollback fails.
Recommended Actions	Cause: When any SCP microservice rollback fails during the rollback process. Diagnostic Information: Monitor new hook-jobs that might have failed after multiple attempts. Also, monitor any failed log. Run the following command to check the pod of hook-job: `kubectl get pods -n <namespace>`. Run the following command to check the logs: `kubectl logs <pod name> -n <namespace>`. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to `false` in the `ocscp_values.yaml` file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is `true`. For any assistance, contact My Oracle Support.

6.2.32 SCPRollbackSuccessful

Table 6-35 SCPRollbackSuccessful

Field	Description
Severity	Info
Conditions	When any SCP microservice rollback is completed.
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.6006
Description	Alert is raised when any SCP microservice rollback is completed.
Recommended Actions	Cause: When any SCP microservice rollback is completed. Diagnostic Information: Not applicable. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to `false` in the `ocscp_values.yaml` file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is `true`. For any assistance, contact My Oracle Support.

6.2.33 ScpWorkerPodCpuUtilizationAboveWarnThreshold

Table 6-36 ScpWorkerPodCpuUtilizationAboveWarnThreshold

Field	Details
Description	CPU utilization of SCP worker at warn level
Summary	CPU utilization of SCP worker at warn level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}
Severity	Warning
Condition	This alert is raised when CPU utilization of SCP-Worker reaches the WARN level. ocscp_worker_pod_overload_control_cpu_utilization_warn > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7018
Metric Used	ocscp_worker_pod_overload_control_cpu_utilization_warn
Recommended Action	Cause: When CPU utilization of scp-worker reaches the WARN level. Diagnostic Information: Get the configured threshold level values using Pod Overload Control Policy REST APIs. For more information about Pod Overload Control Policy, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Check the CPU threshold level status from the scp-worker pod logs under the warn level. Recovery: Reduce the incoming service request rate. This alert is automatically cleared when CPU utilization is reduced to below WARN threshold level. For any assistance, contact My Oracle Support.

6.2.34 ScpWorkerPodCpuUtilizationAboveMinorThreshold

Table 6-37 ScpWorkerPodCpuUtilizationAboveMinorThreshold

Field	Details
Description	CPU utilization of SCP worker at minor level
Summary	CPU utilization of SCP worker at minor level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}
Severity	Minor
Condition	This alert is raised when CPU utilization of scp-worker reaches the MINOR level. ocscp_worker_pod_overload_control_cpu_utilization_minor > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7019
Metric Used	ocscp_worker_pod_overload_control_cpu_utilization_minor
Recommended Action	Cause: When CPU utilization of SCP-Worker reaches the MINOR level. Diagnostic Information: Get the configured threshold level values using Pod Overload Control Policy REST APIs. For more information about Pod Overload Control Policy, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Check the CPU threshold level status from the scp-worker pod logs under the warn level. Recovery: Reduce the incoming service request rate. This alert is automatically cleared when CPU utilization is reduced to below MINOR threshold level. For any assistance, contact My Oracle Support.

6.2.35 ScpWorkerPodCpuUtilizationAboveMajorThreshold

Table 6-38 ScpWorkerPodCpuUtilizationAboveMajorThreshold

Field	Details
Description	CPU utilization of SCP worker at major level
Summary	CPU utilization of SCP worker at major level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}
Severity	Major
Condition	This alert is raised when CPU utilization of scp-worker reaches the MAJOR level. ocscp_worker_pod_overload_control_cpu_utilization_major > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7020
Metric Used	ocscp_worker_pod_overload_control_cpu_utilization_major
Recommended Action	Cause: When CPU utilization of SCP-Worker reaches the MAJOR level. Diagnostic Information: Get the configured threshold level values using Pod Overload Control Policy REST APIs. For more information about Pod Overload Control Policy, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Check the CPU threshold level status from the scp-worker pod logs under the warn level. Recovery: Reduce the incoming service request rate. This alert is automatically cleared when CPU utilization is reduced to below MAJOR threshold level. For any assistance, contact My Oracle Support.

6.2.36 ScpWorkerPodCpuUtilizationAboveCriticalThreshold

Table 6-39 ScpWorkerPodCpuUtilizationAboveCriticalThreshold

Field	Details
Description	CPU utilization of SCP worker at critical level
Summary	CPU utilization of SCP worker at critical level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}
Severity	Critical
Condition	This alert is raised when CPU utilization of scp-worker reaches the CRITICAL level. ocscp_worker_pod_overload_control_cpu_utilization_critical > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7021
Metric Used	ocscp_worker_pod_overload_control_cpu_utilization_critical
Recommended Action	Cause: When CPU utilization of scp-worker reaches the CRITICAL level. Diagnostic Information: Get the configured threshold level values using Pod Overload Control Policy REST APIs. For more information about Pod Overload Control Policy, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Check the CPU threshold level status from the scp-worker pod logs under the warn level. Recovery: Reduce the incoming service request rate. This alert is automatically cleared when CPU utilization is reduced to below CRITICAL threshold level. For any assistance, contact My Oracle Support.

6.2.37 SCPUnhealthyPeerSCPDetected

Table 6-40 SCPUnhealthyPeerSCPDetected

Field	Details
Description	Next hop SCP is marked unhealthy
Summary	'Next hop SCP is marked unhealthy. peerScpFqdn: {{$labels.peerScpName}}, scpFqdn: {{$labels.scpFqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}'
Severity	Info
Condition	This alert is raised when the peer SCP is marked as unhealthy. ocscp_peer_scp_unhealthy > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7022
Metric Used	ocscp_peer_scp_unhealthy
Recommended Action	Cause: The peer SCP is marked as unhealthy because of consecutive failure responses. Diagnostic Information: Check transport failures and routing errors on the peer SCP. Run the ping command from primary or secondary nodes using the IP of the service. Sample command: `ping <IPAddress>`. If the FQDN of the service is used, run the ping command from inside the pod. If the pod does not support the ping command, get the debug container of the SCP pod. If you do not want to use the ping command, collect `tcpdump` to verify the connectivity. Sample command: `tcpdump -w capture.pcap -i <pod interface>`. Recovery: This alert is automatically cleared after the degradation time is over. Degradation time = Number of consecutive degradations multiplied by configured base ejection. For any assistance, contact My Oracle Support.
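The degradation time formula above can be worked through with a short sketch. The function name and the sample values are hypothetical. The result is in whatever time unit the base ejection value is configured in, and the example assumes seconds.

```python
def degradation_time(consecutive_degradations: int, base_ejection: float) -> float:
    """Degradation time = number of consecutive degradations x base ejection.

    The result uses the same time unit as base_ejection.
    """
    if consecutive_degradations < 0 or base_ejection < 0:
        raise ValueError("inputs must be non-negative")
    return consecutive_degradations * base_ejection


# Assumed example values: 3 consecutive degradations, base ejection of 30 seconds.
print(degradation_time(3, 30.0))
```

Under this formula, each additional consecutive degradation extends the wait before the alert clears by one more base ejection interval.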

6.2.38 SCPDnsSrvQueryFailure

Table 6-41 SCPDnsSrvQueryFailure

Field	Details
Description	DNS SRV Query failed with cause {{$labels.cause}}
Summary	'DNS SRV Query failed with cause {{$labels.cause}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when the DNS server lookup for the SRV query fails due to network or timeout error. ocscp_alternate_resolution_dnssrv_rx_error_res == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.8001
Metric Used	ocscp_alternate_resolution_dnssrv_rx_error_res
Recommended Action	Cause: When the DNS SRV lookup fails due to network or timeout error. Diagnostic Information: Check the DNS SRV server status and restore the server to normal operation. Recovery: This alert is automatically cleared when SCP performs a successful DNS SRV query. For any assistance, contact My Oracle Support.

6.2.39 SCPProducerOverloadThrottled

Table 6-42 SCPProducerOverloadThrottled

Field	Details
Description	Producer is in Throttled Overload state
Summary	'Producer is in Throttled Overload state. producerFqdn: {{$labels.producerFqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Info
Condition	This alert is raised when the producer NF is in the throttled congestion state. ocscp_producer_load_throttled == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.7023
Metric Used	ocscp_producer_load_throttled
Recommended Action	Cause: When the load of producer NF is higher than the throttled threshold configured for the service. Diagnostic Information: Check and configure the throttled threshold for each service using Routing Options REST APIs configurations as described in "Configuring Routing Options" in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Check the load for each service using the NF Profile REST APIs as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert clears automatically when the NF profile is deregistered or changed with load less than the throttled abatement threshold. For any assistance, contact My Oracle Support.

6.2.40 SCPProducerOverloadAlternateRouted

Table 6-43 SCPProducerOverloadAlternateRouted

Field	Details
Description	Producer is in Alternate Route Overload state
Summary	'Producer is in Alternate Route Overload state. producerFqdn: {{$labels.producerFqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Info
Condition	This alert is raised when the producer NF is in the alternate routing congestion state. ocscp_producer_load_alternateRoute == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.7024
Metric Used	ocscp_producer_load_alternateRoute
Recommended Action	Cause: When the load of producer NF is higher than alternate routing threshold configured for the service. Diagnostic Information: Check and configure the alternate routing threshold for each service using Routing Options REST APIs configurations as described in "Configuring Routing Options" in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Check the load for each service using the NF Profile REST APIs as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert clears automatically when the NF profile is deregistered or changed with load less than the alternate routing abatement threshold. For any assistance, contact My Oracle Support.

6.2.41 SCPSeppNotConfigured

Table 6-44 SCPSeppNotConfigured

Field	Details
Description	SEPP is not configured for PLMN
Summary	'SEPP is not configured for PLMN. plmnid: {{$labels.plmn_id}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when Security Edge Protection Proxy (SEPP) is not configured. ocscp_metric_sepp_not_configured_current == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.7025
Metric Used	ocscp_metric_sepp_not_configured_current
Recommended Action	Cause: When SEPP routing related rules are not configured at SCP for selected PLMN in the inter-PLMN routing. Diagnostic Information: Check whether SEPP profile is registered or SEPP related configuration is made at SCP. Verify the routing rules created for selected inter-PLMN in the PLMN_SEPP_MAPPING table. Recovery: This alert clears automatically when the SEPP related routing rules are created at SCP for selected PLMN in the inter-PLMN routing. For any assistance, contact My Oracle Support.

6.2.42 SCPSeppRoutingFailed

Table 6-45 SCPSeppRoutingFailed

Field	Details
Description	Routing towards SEPP failed
Summary	'Routing towards SEPP failed. sepp_fqdn: {{$labels.ocscp_sepp_fqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when routing towards SEPP fails. ocscp_metric_sepp_routing_attempt_fail_current == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.7026
Metric Used	ocscp_metric_sepp_routing_attempt_fail_current
Recommended Action	Cause: Inter-PLMN routing failed for the selected SEPP instances. Diagnostic Information: Check whether the selected SEPP is up and healthy. Check whether the selected SEPP is reachable. Run the ping command from primary or secondary nodes using the IP of SEPP. Sample command: `ping <IPAddress>`. If the FQDN of SEPP is used, run the ping command from inside the pod. Instead of using ping, you can collect tcpdump to verify connectivity. Sample command: `tcpdump -w capture.pcap -i <pod interface>`. Check scp-worker logs for any error response from the selected SEPP. If there are error responses, monitor the selected SEPP logs by running this command: `kubectl logs <pod name> -n <namespace>`. Recovery: This alert clears automatically when routing is successful for the selected SEPP. For any assistance, contact My Oracle Support.

6.2.43 SCPWorkerSSLCertificateExpire

Table 6-46 SCPWorkerSSLCertificateExpire

Field	Details
Description	Whenever an SCP SSL certificate is about to expire, an alert will be raised before the configured interval time.
Summary	'SCP Worker SSL Certificate is about to expire, ocscp_worker_fqdn: {{$labels.ocscp_producer_fqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}, certificatetype: {{$labels.certificatetype}} {{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	ocscp_metric_ssl_certificate_expire_total > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.8024
Metric Used	ocscp_metric_ssl_certificate_expire_total
Recommended Action	Cause: The SCP TLS or SSL certificate expires before the configured time. Diagnostic Information: When this alert is raised, it means that the SSL certificates configured for SCP are about to expire. Verify the certificate expiry date. Recovery: Update the SCP SSL Secret with renewed SSL certificate. For any assistance, contact My Oracle Support.
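Verifying the certificate expiry date can be scripted. The sketch below is one possible approach, not a procedure from this guide. It assumes that the expiry timestamp has already been obtained in the `notAfter` text format, for example from the output of `openssl x509 -enddate -noout -in <certificate file>`. The function name and the sample date are hypothetical.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now_epoch: float) -> float:
    """Return the number of days between now_epoch and the certificate expiry.

    not_after uses the certificate time format, for example
    "Jan  5 09:34:43 2018 GMT". A negative result means the
    certificate has already expired.
    """
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return (expiry_epoch - now_epoch) / SECONDS_PER_DAY


# Hypothetical usage with the current time:
#   import time
#   days_until_expiry("<notAfter value>", time.time())
```

Comparing the result with the configured alert interval indicates whether the certificate should be renewed and the SCP SSL Secret updated.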

6.2.44 SCPWorkerHTTPSConnectionFailure

Table 6-47 SCPWorkerHTTPSConnectionFailure

Field	Details
Description	SCP Worker HTTPS Connection Establishment is failed
Summary	Scp Worker HTTPS Connection Establishment is failed ocscp_worker_fqdn: {{$labels.ocscp_producer_fqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity	Info
Condition	This alert is raised when an egress connection fails. increase(ocscp_metric_https_egress_connection_failure_total{app_kubernetes_io_name="scp-worker"} [10m] ) > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.8025
Metric Used	ocscp_metric_https_egress_connection_failure_total
Recommended Action	Cause: SCP fails to send a message request to a producer NF due to HTTPS connection errors with the producer NF. Diagnostic Information: Check the errors of the ocscp_metric_https_egress_connection_failure_total metric on the Grafana dashboard. Run the following command to monitor scp-worker logs: `kubectl logs <pod name> -n <namespace>`. Recovery: Ensure that all Transport Layer Security (TLS) or Secure Sockets Layer (SSL) certificates configured at SCP and producer NFs are valid. For any assistance, contact My Oracle Support.

6.2.45 SCPGlobalEgressRLRemoteParticipantConnectivityFailure

Table 6-48 SCPGlobalEgressRLRemoteParticipantConnectivityFailure

Field	Details
Description	SCP Global Egress RL Remote Participant Connectivity Failure for participant
Summary	'SCP Global Egress RL Remote Participant Connectivity Failure for participant: {{$labels.scp_remote_coh_cluster_name}}, scp_fqdn: {{$labels.scp_fqdn}}, scp_local_coh_cluster_name: {{$labels.scp_local_coh_cluster_name}}, scp_remote_coh_cluster_fqdn: {{$labels.scp_remote_coh_cluster_fqdn }}, scp_remote_coh_cluster_port: {{$labels.scp_remote_coh_cluster_port }}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when the remote participant SCP connection is not established or goes down. ocscp_global_egress_rl_remote_participant_connectivity_failure == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9001
Metric Used	ocscp_global_egress_rl_remote_participant_connectivity_failure
Recommended Action	Cause: When the remote participant SCP connection is not established or down. Diagnostic Information: Check whether the FQDN or IP port are properly configured with remote SCP's scp-cache microservice. Check whether clusterName and NFInstanceID are configured for the remote SCP. Check whether communication path is active between the two SCP's scp-cache microservice. Run the following command to monitor scp-cache logs: `kubectl logs <pod name> -n <namespace>` Recovery: This alert clears automatically if the connection is established with the remote participant SCP. For any assistance, contact My Oracle Support.

6.2.46 SCPGlobalEgressRLRemoteParticipantWithDuplicateNFInstanceId

Table 6-49 SCPGlobalEgressRLRemoteParticipantWithDuplicateNFInstanceId

Field	Details
Description	SCP global egress RL remote participant configured with duplicate NF InstanceId for participant.
Summary	'SCP Global Egress RL Remote Participant Configured With Duplicate NFInstanceId for participant: {{$labels.scp_remote_coh_cluster_name}}, scp_fqdn: {{$labels.scp_fqdn}}, scp_nf_instance_id: {{$labels.scp_nf_instance_id}}, scp_local_coh_cluster_name: {{$labels.scp_local_coh_cluster_name}}, scp_remote_coh_cluster_fqdn: {{$labels.scp_remote_coh_cluster_fqdn }}, scp_remote_coh_cluster_port: {{$labels.scp_remote_coh_cluster_port }}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when a duplicate remote coherence participant is found. ocscp_global_egress_rl_remote_participant_is_duplicate == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9002
Metric Used	ocscp_global_egress_rl_remote_participant_is_duplicate
Recommended Action	Cause: Duplicate configuration of remote coherence participants with local SCP. Diagnostic Information: Ensure Global Rate Limit feature is enabled. Check whether the clusterName and NFInstanceID of local SCPs and remote SCPs are not duplicates. Check whether the clusterName and NFInstanceID of the first remote SCP and the second remote SCP are not duplicates. Configure unique values for clusterName and NFInstanceID for local SCPs as well as remote SCPs. Recovery: This alert is cleared automatically if no duplicate configurations between local and remote SCPs are found. For any assistance, contact My Oracle Support.

6.2.47 SCPMediationConnectivityFailure

Table 6-50 SCPMediationConnectivityFailure

Field	Details
Description	'SCP Mediation Connectivity Failed, scp_fqdn
Summary	'SCP Mediation Connectivity Failed, scp_fqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when Mediation connection is not established or request to Mediation is not successful. ocscp_mediation_http_not_reachable == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9002
Metric Used	ocscp_mediation_http_not_reachable
Recommended Action	Cause: The remote Mediation connection is not established or a request to Mediation is not successful. Diagnostic Information: Check the errors of the ocscp_mediation_http_not_reachable metric on the Grafana dashboard. Run the following command to check the status of the mediation pod: `kubectl get pods -n <namespace>` Recovery: If the mediation pod is not in the ready state, run the following command to check the scp-mediation logs: `kubectl logs <pod name> -n <namespace>` If the mediation pod is absent, redeploy or upgrade SCP with `mediationService` set to `true`. This alert clears automatically when the connection with Mediation is established the next time Mediation is invoked from any of the trigger points. For any assistance, contact My Oracle Support.
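
When many pods are listed, the readiness check can be scripted. The following Python sketch parses text in the default column layout of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, AGE) and reports pods that are not fully ready. The pod names in the sample are made up, and the helper name `pods_not_ready` is hypothetical.

```python
def pods_not_ready(kubectl_output):
    """Return (name, ready, status) for pods whose READY column is not n/n or whose STATUS is not Running."""
    not_ready = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if ready_count != total or status != "Running":
            not_ready.append((name, ready, status))
    return not_ready


# Sample output with made-up pod names, in the default `kubectl get pods` column layout.
sample = """
NAME                              READY   STATUS             RESTARTS   AGE
ocscp-scp-mediation-7d9f8-abcde   0/1     CrashLoopBackOff   5          10m
ocscp-scp-worker-5c6b7-fghij      1/1     Running            0          2d
"""

for name, ready, status in pods_not_ready(sample):
    print(f"{name}: READY={ready}, STATUS={status}")
```

Any pod reported by this check is a candidate for the `kubectl logs <pod name> -n <namespace>` step described above.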

6.2.48 SCPNotificationQueuesUtilizationAboveMinorThreshold

Table 6-51 SCPNotificationQueuesUtilizationAboveMinorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Minor Threshold'
Summary	'SCP Notification Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when the queues in the notification service are utilized above 65% of the maximum size (the user-configured minor threshold value). ocscp_notification_queue_alert{severity="MINOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.3009
Metric Used	ocscp_notification_queue_alert
Recommended Action	Cause: The Notification module is getting more traffic than expected. Diagnostic Information: Monitor Notification traffic to the pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_notification_queue_utilization Recovery: This alert clears automatically when notification traffic goes below the minor threshold or exceeds the major threshold. For any assistance, contact My Oracle Support.

6.2.49 SCPNotificationQueuesUtilizationAboveMajorThreshold

Table 6-52 SCPNotificationQueuesUtilizationAboveMajorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Major Threshold'
Summary	'SCP Notification Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when the queues in the notification service are utilized above 75% of the maximum size (the user-configured major threshold value). ocscp_notification_queue_alert{severity="MAJOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.3010
Metric Used	ocscp_notification_queue_alert
Recommended Action	Cause: The Notification module is getting more traffic than expected. Diagnostic Information: Monitor Notification traffic to the pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_notification_queue_utilization Recovery: This alert clears automatically when notification traffic goes below the major threshold or above the critical threshold. For any assistance, contact My Oracle Support.

6.2.50 SCPNotificationQueuesUtilizationAboveCriticalThreshold

Table 6-53 SCPNotificationQueuesUtilizationAboveCriticalThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Critical Threshold'
Summary	'SCP Notification Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when the queues in the notification service are utilized above 85% of the maximum size (the user-configured critical threshold value). ocscp_notification_queue_alert{severity="CRITICAL"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.3011
Metric Used	ocscp_notification_queue_alert
Recommended Action	Cause: The Notification module is getting more traffic than expected. Diagnostic Information: Monitor Notification traffic to the pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_notification_queue_utilization Recovery: This alert clears automatically when notification traffic goes below the critical threshold. For any assistance, contact My Oracle Support.
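
The recovery text for the minor, major, and critical queue alerts implies that a given utilization value falls into one severity band at a time: a minor alert clears when utilization crosses the major threshold, and so on. The following Python sketch illustrates that banding with the 65%, 75%, and 85% values quoted in the conditions. It is an interpretation for illustration only, not the SCP implementation, and the function name is hypothetical.

```python
# Percentages taken from the alert conditions above; deployments may configure other values.
THRESHOLDS = (("CRITICAL", 85.0), ("MAJOR", 75.0), ("MINOR", 65.0))


def active_severity(utilization_percent, thresholds=THRESHOLDS):
    """Return the single severity band a utilization value falls into, or None below the minor threshold."""
    for severity, limit in thresholds:  # ordered from highest to lowest
        if utilization_percent > limit:
            return severity
    return None


for value in (60, 70, 80, 90):
    print(value, active_severity(value))
```

With these values, 70% maps to MINOR only and 80% maps to MAJOR only, which matches the statement that the minor alert clears when utilization goes above the major threshold.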

6.2.51 SCPNrfProxyQueuesUtilizationAboveMinorThreshold

Table 6-54 SCPNrfProxyQueuesUtilizationAboveMinorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Minor Threshold'
Summary	'SCP Nrfproxy Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when the task queues in the scp-nrfproxy service are utilized above 65% of the maximum size (the user-configured minor threshold value). ocscp_nrfproxy_queue_alert{severity="MINOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9010
Metric Used	ocscp_nrfproxy_queue_alert
Recommended Action	Cause: NrfProxy task queues are getting filled and the traffic is more than expected. Diagnostic Information: Monitor traffic towards the nrfProxy pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_nrfproxy_queue_utilization Recovery: This alert clears automatically when the traffic goes below the minor threshold or above the major threshold. For any assistance, contact My Oracle Support.

6.2.52 SCPNrfProxyQueuesUtilizationAboveMajorThreshold

Table 6-55 SCPNrfProxyQueuesUtilizationAboveMajorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Major Threshold'
Summary	'SCP Nrfproxy Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when the task queues in the scp-nrfproxy service are utilized above 75% of the maximum size (the user-configured major threshold value). ocscp_nrfproxy_queue_alert{severity="MAJOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9011
Metric Used	ocscp_nrfproxy_queue_alert
Recommended Action	Cause: NrfProxy task queues are getting filled and the traffic is more than expected. Diagnostic Information: Monitor traffic towards the nrfProxy pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_nrfproxy_queue_utilization Recovery: This alert clears automatically when the traffic goes below the major threshold or above the critical threshold. For any assistance, contact My Oracle Support.

6.2.53 SCPNrfProxyQueuesUtilizationAboveCriticalThreshold

Table 6-56 SCPNrfProxyQueuesUtilizationAboveCriticalThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Critical Threshold'
Summary	'SCP Nrfproxy Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when the task queues in the scp-nrfproxy service are utilized above 85% of the maximum size (the user-configured critical threshold value). ocscp_nrfproxy_queue_alert{severity="CRITICAL"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9012
Metric Used	ocscp_nrfproxy_queue_alert
Recommended Action	Cause: NrfProxy task queues are getting filled and the traffic is more than expected. Diagnostic Information: Monitor traffic towards the nrfProxy pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_nrfproxy_queue_utilization Recovery: This alert clears automatically when the traffic goes below the critical threshold. For any assistance, contact My Oracle Support.

6.2.54 SCPWorkerQueuesUtilizationAboveMinorThreshold

Table 6-57 SCPWorkerQueuesUtilizationAboveMinorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Minor Threshold'
Summary	'SCP Worker Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when the task queues in the scp-worker service are utilized above 65% of the maximum size (the user-configured minor threshold value). ocscp_worker_queue_alert{severity="MINOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9007
Metric Used	ocscp_worker_queue_alert
Recommended Action	Cause: Worker task queues are getting filled and the traffic is more than expected. Diagnostic Information: Monitor traffic towards the scp-worker pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_worker_queue_utilization Recovery: This alert clears automatically when the traffic goes below the minor threshold or above the major threshold. For any assistance, contact My Oracle Support.

6.2.55 SCPWorkerQueuesUtilizationAboveMajorThreshold

Table 6-58 SCPWorkerQueuesUtilizationAboveMajorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Major Threshold'
Summary	'SCP Worker Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when the task queues in the scp-worker service are utilized above 75% of the maximum size (the user-configured major threshold value). ocscp_worker_queue_alert{severity="MAJOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9008
Metric Used	ocscp_worker_queue_alert
Recommended Action	Cause: Worker task queues are getting filled and the traffic is more than expected. Diagnostic Information: Monitor traffic towards the scp-worker pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_worker_queue_utilization Recovery: This alert clears automatically when the traffic goes below the major threshold or above the critical threshold. For any assistance, contact My Oracle Support.

6.2.56 SCPWorkerQueuesUtilizationAboveCriticalThreshold

Table 6-59 SCPWorkerQueuesUtilizationAboveCriticalThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Critical Threshold'
Summary	'SCP Worker Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when the task queues in the scp-worker service are utilized above 85% of the maximum size (the user-configured critical threshold value). ocscp_worker_queue_alert{severity="CRITICAL"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9009
Metric Used	ocscp_worker_queue_alert
Recommended Action	Cause: Worker task queues are getting filled and the traffic is more than expected. Diagnostic Information: Monitor traffic towards the scp-worker pod using the KPI dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_worker_queue_utilization Recovery: This alert clears automatically when the traffic goes below the critical threshold. For any assistance, contact My Oracle Support.

6.2.57 SCPCacheQueuesUtilizationAboveMinorThreshold

Table 6-60 SCPCacheQueuesUtilizationAboveMinorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Minor Threshold'
Summary	'SCP Cache Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when the task queues in the scp-cache service are utilized above 65% of their maximum size (the user-configured minor threshold value). ocscp_cache_queue_alert{severity="MINOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.13002
Metric Used	ocscp_cache_queue_utilization
Recommended Action	Cause: The cache task queues are getting filled and traffic is higher than expected. Diagnostic Information: Monitor traffic towards the cache pod using the KPI Dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_cache_queue_utilization. Recovery: The alert is cleared automatically when traffic falls below the minor threshold or goes above the major threshold. For any assistance, contact My Oracle Support.

6.2.58 SCPCacheQueuesUtilizationAboveMajorThreshold

Table 6-61 SCPCacheQueuesUtilizationAboveMajorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Major Threshold'
Summary	'SCP Cache Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when the task queues in the scp-cache service are utilized above 75% of their maximum size (the user-configured major threshold value). ocscp_cache_queue_alert{severity="MAJOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.13001
Metric Used	ocscp_cache_queue_utilization
Recommended Action	Cause: When the cache task queues are getting filled, and traffic is higher than expected. Diagnostic Information: Monitor traffic towards the cache pod using the KPI Dashboard. Refer to rate of the following metric on the Grafana dashboard: ocscp_cache_queue_utilization. Recovery: The alert is cleared automatically when traffic falls below a major threshold or goes above a critical threshold. For any assistance, contact My Oracle Support.

6.2.59 SCPCacheQueuesUtilizationAboveCriticalThreshold

Table 6-62 SCPCacheQueuesUtilizationAboveCriticalThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Critical Threshold'
Summary	'SCP Cache Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when the task queues in the scp-cache service are utilized above 85% of their maximum size (the user-configured critical threshold value). ocscp_cache_queue_alert{severity="CRITICAL"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.13000
Metric Used	ocscp_cache_queue_utilization
Recommended Action	Cause: When the cache task queues are getting filled, and traffic is higher than expected. Diagnostic Information: Monitor traffic towards the cache pod using the KPI Dashboard. Refer to rate of the following metric on the Grafana dashboard: ocscp_cache_queue_utilization. Recovery: The alert is cleared automatically when traffic falls below a critical threshold. For any assistance, contact My Oracle Support.

6.2.60 SCPLoadManagerQueuesUtilizationAboveMinorThreshold

Table 6-63 SCPLoadManagerQueuesUtilizationAboveMinorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Minor Threshold'
Summary	'SCP Load Manager Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when the task queues in the scp-load-manager service are utilized above 65% of their maximum size (the user-configured minor threshold value). ocscp_load_manager_queue_alert{severity="MINOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.11002
Metric Used	ocscp_load_manager_queue_utilization
Recommended Action	Cause: The load manager task queues are getting filled and traffic is higher than expected. Diagnostic Information: Monitor traffic towards the scp-load-manager pod using the KPI Dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_load_manager_queue_utilization. Recovery: The alert is cleared automatically when traffic falls below the minor threshold. For any assistance, contact My Oracle Support.

6.2.61 SCPLoadManagerQueuesUtilizationAboveMajorThreshold

Table 6-64 SCPLoadManagerQueuesUtilizationAboveMajorThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Major Threshold'
Summary	'SCP Load Manager Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when the task queues in the scp-load-manager service are utilized above 75% of their maximum size (the user-configured major threshold value). ocscp_load_manager_queue_alert{severity="MAJOR"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.11001
Metric Used	ocscp_load_manager_queue_utilization
Recommended Action	Cause: The load manager task queues are getting filled and traffic is higher than expected. Diagnostic Information: Monitor traffic towards the scp-load-manager pod using the KPI Dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_load_manager_queue_utilization. Recovery: The alert is cleared automatically when traffic falls below the major threshold. For any assistance, contact My Oracle Support.

6.2.62 SCPLoadManagerQueuesUtilizationAboveCriticalThreshold

Table 6-65 SCPLoadManagerQueuesUtilizationAboveCriticalThreshold

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Critical Threshold'
Summary	'SCP Load Manager Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when the task queues in the scp-load-manager service are utilized above 85% of their maximum size (the user-configured critical threshold value). ocscp_load_manager_queue_alert{severity="CRITICAL"} == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.11000
Metric Used	ocscp_load_manager_queue_utilization
Recommended Action	Cause: The load manager task queues are getting filled and traffic is higher than expected. Diagnostic Information: Monitor traffic towards the scp-load-manager pod using the KPI Dashboard. Refer to the rate of the following metric on the Grafana dashboard: ocscp_load_manager_queue_utilization. Recovery: The alert is cleared automatically when traffic falls below the critical threshold. For any assistance, contact My Oracle Support.

6.2.63 SCPProducerNfSetUnhealthy

Table 6-66 SCPProducerNfSetUnhealthy

Field	Details
Description	All producer NFs in NF set are marked unhealthy
Summary	'All producer NFs in NF set are marked unhealthy. nfSet: {{$labels.ocscp_nf_setid}}, scpFqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Info
Condition	This alert is raised when all producer NFs in an NF Set are marked unhealthy. ocscp_metric_nf_set_unhealthy > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7027
Metric Used	ocscp_metric_nf_set_unhealthy
Recommended Action	Cause: All the producer NFs are marked unhealthy because of consecutive failure responses. Diagnostic Information: Check transport failures and routing errors on producer NFs. Run the ping command from primary or secondary nodes using the IP of the service. Sample command: `ping <IPAddress>`. If the FQDN of the service is used, run the ping command from inside the pod. If the pod does not support the ping command, get the debug container of the SCP pod. If you do not want to use the ping command, collect a `tcpdump` capture to check whether the connection is established. Sample command: `tcpdump -w capture.pcap -i <pod interface>` Recovery: This alert is automatically cleared after the degradation time is over. Degradation time = Number of consecutive degradations multiplied by configured base ejection. For any assistance, contact My Oracle Support.
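
As a worked example of the degradation time formula, the following Python sketch multiplies the two inputs. The numbers are hypothetical, and the unit of the base ejection value (assumed here to be seconds) is not stated in this section.

```python
def degradation_time(consecutive_degradations, base_ejection):
    """Degradation time = number of consecutive degradations x configured base ejection."""
    if consecutive_degradations < 0 or base_ejection < 0:
        raise ValueError("inputs must be non-negative")
    return consecutive_degradations * base_ejection


# Hypothetical values: a base ejection of 30 (assumed to be seconds) and 3 consecutive degradations.
print(degradation_time(3, 30))  # 90
```

Under these assumed values, the alert would be expected to clear 90 seconds after the NF set was marked unhealthy.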

6.2.64 SCPPeerSeppUnhealthy

Table 6-67 SCPPeerSeppUnhealthy

Field	Details
Description	Peer Sepp is marked unhealthy
Summary	'Peer Sepp is marked unhealthy. seppFqdn: {{$labels.ocscp_sepp_fqdn}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Info
Condition	This alert is raised when peer SEPP is marked unhealthy. ocscp_sepp_unhealthy > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7028
Metric Used	ocscp_sepp_unhealthy
Recommended Action	Cause: The peer SEPP is marked unhealthy because of consecutive failure responses. Diagnostic Information: Check transport failures and routing errors on the peer SEPP. Run the ping command from primary or secondary nodes using the IP of the service. Sample command: `ping <IPAddress>`. If the FQDN of the service is used, run the ping command from inside the pod. If the pod does not support the ping command, get the debug container of the SCP pod. If you do not want to use the ping command, collect a `tcpdump` capture to check whether the connection is established. Sample command: `tcpdump -w capture.pcap -i <pod interface>` Recovery: This alert is automatically cleared after the degradation time is over. Degradation time = Number of consecutive degradations multiplied by configured base ejection. For any assistance, contact My Oracle Support.

6.2.65 SCPMicroServiceUnreachable

Table 6-68 SCPMicroServiceUnreachable

Field	Details
Description	'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}: SCP communication between the micro-services indicated by source and destination has failed'
Summary	'SCP communication between the micro-services indicated by source and destination has failed: {{$labels.instance}}, namespace: {{$labels.namespace}}, source:{{$labels.source}}, destination: {{$labels.destination}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when the communication between SCP microservices indicated by source and destination has failed. ocscp_metric_svc_unreachable == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.7029
Metric Used	ocscp_metric_svc_unreachable
Recommended Action	Cause: Communication between SCP microservices has failed. Diagnostic Information: Verify whether endpoints of all the services are in Running and Ready state. If not, restart the services. Recovery: This alert clears automatically when the required services are in Running and Ready state. For any assistance, contact My Oracle Support.

6.2.66 SCPTrafficFeedSendFailed

Table 6-69 SCPTrafficFeedSendFailed

Field	Details
Description	'Sending messages to Traffic Feed failed. Cause : {{$labels.ocscp_cause}}'
Summary	'Sending messages to Traffic Feed failed, cause: {{$labels.ocscp_cause}}, scp_fqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when sending messages to traffic feed fails. increase(ocscp_metric_trafficfeed_attempted_total{app_kubernetes_io_name="scp-worker"}[1h]) > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9003
Metric Used	ocscp_metric_trafficfeed_attempted_total
Recommended Action	Cause: Sending of message to traffic feed failed. Diagnostic Information: Check failure reason. Check if the traffic feed OCNADD configuration is correct. Check if the OCNADD server is reachable and available. Recovery: This alert clears automatically after 24 hrs if sending messages to traffic feed stops failing. For any assistance, contact My Oracle Support.
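
The condition fires when the counter has grown within the lookback window. The following Python sketch approximates that idea on a list of samples. It is a simplification: PromQL `increase()` also extrapolates to the window boundaries, which is omitted here, and the sample values and the function name `simple_increase` are hypothetical.

```python
def simple_increase(samples):
    """Approximate the growth of a counter over a window of (timestamp, value) samples.

    A drop in value is treated as a counter reset, so counting resumes from zero.
    Unlike PromQL increase(), no extrapolation to the window edges is applied.
    """
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        total += current - previous if current >= previous else current
    return total


# Hypothetical samples (seconds, counter value) inside a one-hour window.
window = [(0, 10.0), (1200, 10.0), (2400, 12.0), (3600, 12.0)]
alert_firing = simple_increase(window) > 0
print(simple_increase(window), alert_firing)
```

Once the window no longer contains any growth of the counter, the computed increase is zero and the expression stops matching.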

6.2.67 SCPTrafficFeedKafkaClusterUnhealthy

Table 6-70 SCPTrafficFeedKafkaClusterUnhealthy

Field	Details
Description	'Kafka cluster is marked unhealthy, Cause : {{$labels.ocscp_cause}}'
Summary	'Kafka cluster is marked unhealthy, cause: {{$labels.ocscp_cause}}, scp_fqdn: {{$labels.ocscp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when the Kafka cluster is unhealthy. ocscp_metric_trafficfeed_cluster_unhealthy == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9026
Metric Used	ocscp_metric_trafficfeed_cluster_unhealthy
Recommended Action	Cause: The Kafka cluster is unhealthy. Diagnostic Information: Check and diagnose OCNADD Kafka cluster. Recovery: This alert clears when the Kafka cluster recovers from the failure condition. For any assistance, contact My Oracle Support.

6.2.68 SCPTrafficFeedPartitionUnhealthy

Table 6-71 SCPTrafficFeedPartitionUnhealthy

Field	Details
Description	'Kafka partition {{$labels.kafka_partition_id}} is marked unhealthy, Cause : {{$labels.ocscp_cause}}'
Summary	'Kafka cluster is marked unhealthy, cause: {{$labels.ocscp_cause}}, partition_id: {{$labels.kafka_partition_id}}, topic: {{$labels.topic}}, scp_fqdn: {{$labels.ocscp_fqdn}}, namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when the Kafka partition is unhealthy. ocscp_metric_trafficfeed_partition_unhealthy == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9025
Metric Used	ocscp_metric_trafficfeed_partition_unhealthy
Recommended Action	Cause: The Kafka partition is unhealthy. Diagnostic Information: Check and diagnose OCNADD Kafka cluster. Recovery: This alert clears when the Kafka partition recovers from the failure condition. For any assistance, contact My Oracle Support.

6.2.69 SCPServiceMeshFailure

Table 6-72 SCPServiceMeshFailure

Field	Details
Description	'SCP servicemesh failure encountered'
Summary	'SCP servicemesh failure encountered for nfservicetype: {{$labels.ocscp_nf_service_type}}, nftype: {{$labels.ocscp_nf_type}}, nfinstanceid: {{$labels.ocscp_nf_instance_id}}, serviceinstanceid: {{$labels.ocscp_service_instance_id}}, producerfqdn: {{$labels.ocscp_producer_fqdn}}, responsecode: {{$labels.ocscp_response_code}} serverheader:{{$labels.ocscp_server_header}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Info
Condition	This alert is raised when service mesh failure occurs. increase(ocscp_metric_sidecarproxy_failures_total[2m]) > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.7030
Metric Used	ocscp_metric_sidecarproxy_failures_total
Recommended Action	Cause: Service mesh failure observed at SCP. Diagnostic Information: The service mesh is unable to reach the peer NF because of connection failures, because the host is not known to the service mesh, or because of other errors at the service mesh. Verify the service mesh logs for error details. Check the sidecar status of the peer NF. Recovery: This alert clears automatically after 2 minutes if there is no service mesh failure observed by SCP with the same dimensions. For any assistance, contact My Oracle Support.

6.2.70 SCPHealthCheckFailedForPeerSCP

Table 6-73 SCPHealthCheckFailedForPeerSCP

Field	Details
Description	'SCP HealthCheck failed for peer SCP'
Summary	'SCP HealthCheck failed for peer SCP. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Info
Condition	This alert is raised when peer SCP or inter-SCP becomes unhealthy due to health check status and outlier detection. ocscp_interscp_health_check_status_failed == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9023
Metric Used	ocscp_interscp_health_check_status_failed
Recommended Action	Cause: The peer SCP is unhealthy and cannot receive SBI message requests, as determined by health check and outlier detection. Diagnostic Information: Monitor if the overall average load of peer SCP is greater than the configured threshold value. Check outlier detection. Recovery: This alert clears automatically if SCP-C decides SCP-P is healthy or available based on the current and previous status of outlier detection and health check. For any assistance, contact My Oracle Support.

6.2.71 SCPHealthCheckFailed

Table 6-74 SCPHealthCheckFailed

Field	Details
Description	'SCP HealthCheck failed'
Summary	'SCP HealthCheck failed. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Info
Condition	This alert is raised when SCP is unhealthy because the overall average load of SCP is greater than the configured threshold. ocscp_health_check_status_failed == 1
OID	1.3.6.1.4.1.323.5.3.35.1.2.9024
Metric Used	ocscp_health_check_status_failed
Recommended Action	Cause: SCP is too unhealthy to receive SBI message requests because of its overall average load. Diagnostic Information: Check whether the overall average load of SCP is greater than the configured threshold value. Recovery: This alert clears automatically when the overall average load of SCP is less than the configured threshold value. For any assistance, contact My Oracle Support.

6.2.72 ScpWorkerPodPendingTransUtilizationAboveMinorThreshold

Table 6-75 ScpWorkerPodPendingTransUtilizationAboveMinorThreshold

Field	Details
Description	Worker Pending Transaction lead to minor level
Summary	'Worker Pending Transaction lead to minor level.namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when pending transaction utilization of SCP-Worker reaches MINOR level. ocscp_worker_pod_overload_control_pendingTrans_utilization_minor > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9014
Metric Used	ocscp_worker_pod_overload_control_pendingTrans_utilization_minor
Recommended Action	Cause: When pending transactions utilization of SCP-Worker reaches MINOR level. Diagnostic Information: Monitor pending transactions usage while processing traffic. The threshold value of minor pending transaction utilization can be checked from database. Check MINOR level logs of SCP-Worker for pending transaction status. Recovery: This alert clears automatically when pending transaction utilization is below MINOR threshold level. For any assistance, contact My Oracle Support.

6.2.73 ScpWorkerPodPendingTransUtilizationAboveMajorThreshold

Table 6-76 ScpWorkerPodPendingTransUtilizationAboveMajorThreshold

Field	Details
Description	Worker Pending Transaction lead to major level
Summary	'Worker Pending Transaction lead to major level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when pending transaction utilization of SCP-Worker reaches MAJOR level. ocscp_worker_pod_overload_control_pendingTrans_utilization_major > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9015
Metric Used	ocscp_worker_pod_overload_control_pendingTrans_utilization_major
Recommended Action	Cause: When pending transactions utilization of SCP-Worker reaches MAJOR level. Diagnostic Information: Monitor pending transactions usage while processing traffic. The threshold value of major pending transaction utilization can be checked from database. Check MAJOR level logs of SCP-Worker for pending transaction status. Recovery: This alert clears automatically when pending transaction utilization is below MAJOR threshold level. For any assistance, contact My Oracle Support.

6.2.74 ScpWorkerPodPendingTransUtilizationAboveCriticalThreshold

Table 6-77 ScpWorkerPodPendingTransUtilizationAboveCriticalThreshold

Field	Details
Description	Worker Pending Transaction lead to critical level
Summary	'Worker Pending Transaction lead to critical level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when pending transaction utilization of SCP-Worker reaches CRITICAL level. ocscp_worker_pod_overload_control_pendingTrans_utilization_critical > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9016
Metric Used	ocscp_worker_pod_overload_control_pendingTrans_utilization_critical
Recommended Action	Cause: When pending transactions utilization of SCP-Worker reaches CRITICAL level. Diagnostic Information: Monitor pending transactions usage while processing traffic. The threshold value of critical pending transaction utilization can be checked from database. Check CRITICAL level logs of SCP-Worker for pending transaction status. Recovery: This alert clears automatically when pending transaction utilization is below CRITICAL threshold level. For any assistance, contact My Oracle Support.

6.2.75 ScpWorkerPodPendingTransUtilizationAboveWarnThreshold

Table 6-78 ScpWorkerPodPendingTransUtilizationAboveWarnThreshold

Field	Details
Description	Worker Pending Transaction lead to Warn level
Summary	'Worker Pending Transaction lead to warn level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Warn
Condition	This alert is raised when pending transaction utilization of SCP-Worker reaches WARN level. ocscp_worker_pod_overload_control_pendingTrans_utilization_warn > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9017
Metric Used	ocscp_worker_pod_overload_control_pendingTrans_utilization_warn
Recommended Action	Cause: When pending transactions utilization of SCP-Worker reaches WARN level. Diagnostic Information: Monitor pending transactions usage while processing traffic. The threshold value of warn pending transaction utilization can be checked from database. Check WARN level logs of SCP-Worker for pending transaction status. Recovery: This alert clears automatically when pending transaction utilization is below WARN threshold level. For any assistance, contact My Oracle Support.

6.2.76 ScpWorkerPodResourceUtilizationAboveMinorThreshold

Table 6-79 ScpWorkerPodResourceUtilizationAboveMinorThreshold

Field	Details
Description	Worker overload control lead to minor level
Summary	'Worker overload control lead to minor level.namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Minor
Condition	This alert is raised when overload control resource utilization of SCP-Worker reaches MINOR level. ocscp_worker_pod_overload_control_resource_utilization_minor > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9018
Metric Used	ocscp_worker_pod_overload_control_resource_utilization_minor
Recommended Action	Cause: When overload control resource utilization of SCP-Worker reaches MINOR level. Diagnostic Information: Monitor pending transactions usage while processing traffic. The threshold value of minor overload control resource utilization can be checked from database. Check MINOR level logs of SCP-Worker for overload control resource utilization status. Recovery: This alert clears automatically when overload control resource utilization is below MINOR threshold level. For any assistance, contact My Oracle Support.

6.2.77 ScpWorkerPodResourceUtilizationAboveMajorThreshold

Table 6-80 ScpWorkerPodResourceUtilizationAboveMajorThreshold

Field	Details
Description	Worker overload control lead to major level
Summary	'Worker overload control lead to major level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Major
Condition	This alert is raised when overload control resource utilization of SCP-Worker reaches MAJOR level. ocscp_worker_pod_overload_control_resource_utilization_major > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9019
Metric Used	ocscp_worker_pod_overload_control_resource_utilization_major
Recommended Action	Cause: When overload control resource utilization of SCP-Worker reaches MAJOR level. Diagnostic Information: Monitor pending transactions usage while processing traffic. The threshold value of major overload control resource utilization can be checked from database. Check MAJOR level logs of SCP-Worker for overload control resource utilization status. Recovery: This alert clears automatically when overload control resource utilization is below MAJOR threshold level. For any assistance, contact My Oracle Support.

6.2.78 ScpWorkerPodResourceUtilizationAboveWarnThreshold

Table 6-81 ScpWorkerPodResourceUtilizationAboveWarnThreshold

Field	Details
Description	'Worker overload control lead to Warn level'
Summary	'Worker overload control lead to warn level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Warning
Condition	This alert is raised when overload control resource utilization of SCP-Worker reaches WARN level. ocscp_worker_pod_overload_control_resource_utilization_warn > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9021
Metric Used	ocscp_worker_pod_overload_control_resource_utilization_warn
Recommended Action	Cause: When overload control resource utilization of SCP-Worker reaches WARN level. Diagnostic Information: Monitor pending transactions usage while processing traffic. The threshold value of warn overload control resource utilization can be checked from database. Check WARN level logs of SCP-Worker for overload control resource utilization status. Recovery: This alert clears automatically when overload control resource utilization is below WARN threshold level. For any assistance, contact My Oracle Support.

6.2.79 ScpWorkerPodResourceUtilizationAboveCriticalThreshold

Table 6-82 ScpWorkerPodResourceUtilizationAboveCriticalThreshold

Field	Details
Description	Worker overload control lead to critical level
Summary	'Worker overload control lead to critical level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity	Critical
Condition	This alert is raised when overload control resource utilization of SCP-Worker reaches CRITICAL level. ocscp_worker_pod_overload_control_resource_utilization_critical > 0
OID	1.3.6.1.4.1.323.5.3.35.1.2.9020
Metric Used	ocscp_worker_pod_overload_control_resource_utilization_critical
Recommended Action	Cause: When overload control resource utilization of SCP-Worker reaches CRITICAL level. Diagnostic Information: Monitor pending transactions usage while processing traffic. The threshold value of critical overload control resource utilization can be checked from database. Check CRITICAL level logs of SCP-Worker for overload control resource utilization status. Recovery: This alert clears automatically when overload control resource utilization is below CRITICAL threshold level. For any assistance, contact My Oracle Support.

6.2.80 SCPDNSSRVNRFMigrationTaskFailure

Table 6-83 SCPDNSSRVNRFMigrationTaskFailure

Field	Description
Severity	critical
Condition	ocscp_configuration_dnssrv_nrf_migration_task_failure == 1
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.15001
Description	An alert is raised to notify that migration from static to DNS has failed.
Recommended Actions	Cause: This alert is raised in any of the following cases: the migration-to-DNS task was waiting for DNS SRV data and the wait time elapsed; a migration task failed because no acknowledgement was received from other microservices; the wait time elapsed before a task completion response was received for a migration task. Diagnostic Information: Verify that all the DNS SRV configurations are correct and that all SCP pods are up and running. Recovery: DNS SRV data, wait time elapsed: Once data is received from DNS, the alert is cleared. No acknowledgement: Keep retrying for a success acknowledgement, and clear or remove the alert on receiving it. Wait time elapsed for completion response: Resend the event and wait. Repeat this until a completion response is received, and then clear the alert. In all of these conditions, the alert is also cleared when a new migration task is received. For any assistance, contact My Oracle Support.

6.2.81 SCPDNSSRVNRFNonMigrationTaskFailure

Table 6-84 SCPDNSSRVNRFNonMigrationTaskFailure

Field	Description
Severity	critical
Condition	ocscp_configuration_dnssrv_nrf_non_migration_task_failure == 1
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.15003
Description	An alert is raised to notify that the non-migrated task has failed.
Recommended Actions	Cause: This alert is raised when a non-migrated task fails because no acknowledgement was received from other microservices, or when the wait time elapsed before a task completion response was received. Diagnostic Information: Verify that all the DNS SRV configurations are correct and that all SCP pods are up and running. Recovery: No acknowledgement: Keep retrying for a success acknowledgement, and clear or remove the alert on receiving it. Wait time elapsed: When a new task is submitted for the same SPN, this alert is cleared immediately; if processing of the subsequent task fails, the alert is raised again. For any assistance, contact My Oracle Support.

6.2.82 SCPDNSSRVNRFDuplicateTargetDetected

Table 6-85 SCPDNSSRVNRFDuplicateTargetDetected

Field	Description
Severity	critical
Condition	ocscp_configuration_dnssrv_nrf_duplicate_target_detected == 1
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.15002
Description	An alert is raised to notify that a duplicate target NRF has been detected in the DNS SRV records.
Recommended Actions	Cause: This alert is raised when a duplicate target FQDN is received from the DNS SRV for different NRF SRV FQDNs. In this case, the first NRF SRV FQDN data that the scpc-configuration service receives from the scpc-alternate-resolution service is processed, the subsequent NRF SRV FQDN data is ignored, and this alert is raised. Diagnostic Information: Verify that all the DNS SRV configurations are correct and that all SCP pods are up and running. Recovery: The alert is cleared when non-overlapping target FQDNs are received for the same NRF SRV FQDN. For any assistance, contact My Oracle Support.

6.2.83 SCPHighResponseTimeFromProducer

Table 6-86 SCPHighResponseTimeFromProducer

Field	Description
Severity	Info
Condition	(sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time="15000ms"}[2m])) by (kubernetes_namespace) + sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time=">15000ms"}[2m])) by (kubernetes_namespace)) > 200
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.15004
Description	It notifies when the traffic exceeds 200 messages per second and the response delay from the producer takes more than 10 seconds.
Recommended Actions	Cause: More than 200 messages per second have an upstream response time above 10 seconds. Diagnostic Information: Monitor the metric ocscp_metric_upstream_service_time_total with ocscp_upstream_service_time="15000ms" and ocscp_upstream_service_time=">15000ms". Recovery: This alert is cleared automatically when the number of responses with a response delay of more than 10 seconds falls below 200 messages per second. If the alert is not cleared, check for any producer NFs or specific service request types that are taking more than 10 seconds to respond, and take corrective actions if needed. Immediate action may not be needed because this alert is informational. However, too many requests with a long response delay may cause performance degradation at SCP. For any assistance, contact My Oracle Support.

6.2.84 SCPCGroupVersionDetectionFailed

Table 6-87 SCPCGroupVersionDetectionFailed

Field	Description
Severity	critical
Condition	ocscp_worker_cgroup_version_detection_failed == 1
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.15005
Description	Notify that cgroup version detection has failed.
Recommended Actions	Cause: SCP is unable to detect the cgroup version from the underlying kernel with the command "stat -fc %T /sys/fs/cgroup/". The expected value is either tmpfs or cgroup2fs. Diagnostic Information: Option 1: Check worker error level logs for failure of cgroup version detection. Option 2: Log in to the worker pod and run the command "stat -fc %T /sys/fs/cgroup". Verify whether it outputs either tmpfs or cgroup2fs. Recovery: Make sure that the cgroup filesystem type is either tmpfs or cgroup2fs. Perform the same version upgrade on SCP. For any assistance, contact My Oracle Support.
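
The manual check in Option 2 can be scripted. The following is a minimal sketch, assuming a POSIX shell inside the worker pod. The function name cgroup_version_from_fstype is hypothetical, and the v1/v2 labels follow the usual Linux convention (tmpfs for cgroup v1, cgroup2fs for cgroup v2) rather than anything specific to SCP.

```shell
#!/bin/sh
# Map the filesystem type reported for /sys/fs/cgroup to a cgroup version.
# cgroup_version_from_fstype is a hypothetical helper, not part of SCP.
cgroup_version_from_fstype() {
  case "$1" in
    tmpfs)     echo "v1" ;;       # cgroup v1 hierarchy mounted on tmpfs
    cgroup2fs) echo "v2" ;;       # unified cgroup v2 hierarchy
    *)         echo "unknown" ;;  # neither of the two expected values
  esac
}

# Inside the worker pod, the helper could be fed the real value:
#   cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```

A result of unknown corresponds to the failure condition that this alert reports.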

6.2.85 SCPCPUUsageFileReadFailed

Table 6-88 SCPCPUUsageFileReadFailed

Field	Description
Severity	critical
Condition	ocscp_worker_cpu_usage_file_read_failed == 1
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.15006
Description	Notify that the CPU usage file read operation failed within the detected cgroup version.
Recommended Actions	Cause: SCP encountered a failure in performing a read operation for the CPU usage file within the detected cgroup version. The file path is determined based on the detected cgroup version. Diagnostic Information: Option 1: Check worker warn level logs for failure in performing a read operation for the CPU usage file. Option 2: Log in to the worker pod and run the command "stat -fc %T /sys/fs/cgroup" to confirm whether it outputs either 'tmpfs' or 'cgroup2fs'. Depending on the detected cgroup version, inspect the file located at /sys/fs/cgroup/cpu/cpuacct.usage for 'tmpfs' or /sys/fs/cgroup/cpu.stat for 'cgroup2fs'. Ensure that the file exists at the specified path and possesses the required permissions for read operations. Recovery: Verify that the files are appropriate according to the cgroup version and possess the necessary permissions for read operations. Perform the same version upgrade on the SCP. For any assistance, contact My Oracle Support.
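
The file check in Option 2 can be sketched as a small script. This is an illustration only, assuming a POSIX shell inside the worker pod. The helper names cpu_usage_file_for and check_cpu_usage_file are hypothetical; the two paths are the ones named in the Recommended Actions above.

```shell
#!/bin/sh
# Return the CPU usage file that corresponds to a cgroup filesystem type.
# cpu_usage_file_for is a hypothetical helper, not part of SCP.
cpu_usage_file_for() {
  case "$1" in
    tmpfs)     echo "/sys/fs/cgroup/cpu/cpuacct.usage" ;;
    cgroup2fs) echo "/sys/fs/cgroup/cpu.stat" ;;
    *)         return 1 ;;  # unexpected filesystem type
  esac
}

# Report whether the file for the given filesystem type exists and is readable.
check_cpu_usage_file() {
  file=$(cpu_usage_file_for "$1") || {
    echo "unexpected filesystem type: $1"
    return 1
  }
  if [ -r "$file" ]; then
    echo "readable: $file"
  else
    echo "missing or unreadable: $file"
    return 1
  fi
}

# Inside the worker pod:
#   check_cpu_usage_file "$(stat -fc %T /sys/fs/cgroup)"
```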

6.2.86 SCPIgnoreUnknownService

Table 6-89 SCPIgnoreUnknownService

Field	Description
Severity	Info
Condition	increase(ocscp_ignore_unknown_service_total[24h]) > 0
OID used for SNMP Traps	1.3.6.1.4.1.323.5.3.35.1.2.15000
Description	An alert is raised to notify that SCP ignored an unknown service in the NF profile.
Recommended Actions	Cause: SCP has received the NF profile with an unknown service and processed the profile by ignoring this unknown service. Diagnostic Information: Check the received NF profile for the unknown services. Recovery: If the unknown services are not present in the NF profile in the next scrape interval, the alert is cleared. For any assistance, contact My Oracle Support.

6.3 Configuring Alerts

6.3.1 Applying Alerts Rule to CNE without Prometheus Operator

SCP Helm Chart Release Name: _NAME_

Prometheus NameSpace: _Namespace_

Perform the following procedure to configure Service Communication Proxy alerts in Prometheus.

Run the following command to check the name of the config map used by Prometheus:

$ kubectl get configmap -n <_Namespace_>

Example:

$ kubectl get configmap -n prometheus-alert2
NAME                                  DATA   AGE
lisa-prometheus-alert2-alertmanager   1      146d
lisa-prometheus-alert2-server         4      146d

Take a backup of the current config map of Prometheus. This command saves the configmap in the provided file. In the following command, the configmap is stored in the /tmp/tempConfig.yaml file:
```
$ kubectl get configmaps <_NAME_>-server -o yaml -n <_Namespace_> > /tmp/tempConfig.yaml
```
Example:
```
$ kubectl get configmaps lisa-prometheus-alert2-server -o yaml -n prometheus-alert2 > /tmp/tempConfig.yaml
```
Check whether the "alertsscp" rule is already configured in the Prometheus config map and, if it is, delete it. The following command removes the "alertsscp" rule when it is present. This step is optional when you are configuring the alerts for the first time.
```
$ sed -i '/etc\/config\/alertsscp/d' /tmp/tempConfig.yaml
```
Add the "alertsscp" rule in the configmap dump file under the 'rule_files' tag.
```
$ sed -i '/rule_files:/a\    \- /etc/config/alertsscp'  /tmp/tempConfig.yaml
```
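
To see the effect of the two sed commands before touching a real backup, they can be tried on a scratch file. The following is a minimal sketch, assuming GNU sed; the sample file content is illustrative and is not taken from an actual Prometheus config map.

```shell
#!/bin/sh
# Dry run of the delete-then-add steps on a scratch file (GNU sed assumed).
# The sample content below is illustrative, not a real Prometheus config map.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
    rule_files:
    - /etc/config/rules
    - /etc/config/alertsscp
EOF

# Remove any existing alertsscp entry, then add it back under rule_files.
sed -i '/etc\/config\/alertsscp/d' "$tmp"
sed -i '/rule_files:/a\    \- /etc/config/alertsscp' "$tmp"

# Count the lines that reference the alertsscp rule file.
count=$(grep -c 'etc/config/alertsscp' "$tmp")
echo "alertsscp entries: $count"
```

Because the delete runs before the add, repeating both commands leaves a single alertsscp entry rather than accumulating duplicates.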
Update the configmap using the following command. Ensure that you use the same configmap name that was used to take the backup of the Prometheus configmap.
```
$ kubectl replace configmap <_NAME_>-server -f /tmp/tempConfig.yaml
```
Example:
```
$ kubectl replace configmap lisa-prometheus-alert2-server -f /tmp/tempConfig.yaml
```
Run the following command to patch the configmap with a new "alertsscp" rule:

Note:
The patch file, SCPAlertrules.yaml, is provided in the ocscp_csar_23_2_0_0_0.zip folder delivered with SCP.
```
$ kubectl patch configmap <_NAME_>-server -n <_Namespace_> --type merge --patch "$(cat ~/SCPAlertrules.yaml)"
```
Example:
```
$ kubectl patch configmap lisa-prometheus-alert2-server -n prometheus-alert2 --type merge --patch "$(cat ~/SCPAlertrules.yaml)"
```

Note:

Prometheus takes about 20 seconds to apply the updated Config map.

6.3.2 Applying Alerts Rule to CNE with Prometheus Operator

Perform the following procedure to apply alerts rule to Cloud Native Environment (CNE) with Prometheus Operator (CNE 1.9.0 and later).

Run the following command to apply SCP alerts file to create Prometheus rules Custom Resource Definition (CRD):
```
kubectl apply -f <file_name> -n <scp namespace>
```
Where,
- <file_name> is the SCP alerts file.
- <scp namespace> is the SCP namespace.
Example:
```
kubectl apply -f ocscp_alerting_rules_promha_23.4.4.yaml -n scpsvc
```
Sample file delivered with SCP package:
```
ocscp_alerting_rules_promha_23.4.4.yaml
```
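
For orientation, the general shape of a PrometheusRule resource is sketched below. This is not the content of the delivered file: the metadata name and group name are hypothetical, any labels that the Prometheus Operator in your CNE requires for rule selection are omitted, and only one alert is shown, using the expression, severity, and OID from Table 6-74.

```shell
#!/bin/sh
# Write an illustrative PrometheusRule manifest to a scratch file.
# The metadata name, group name, and label layout are assumptions; the
# expression, severity, and OID come from Table 6-74 (SCPHealthCheckFailed).
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-scp-alerting-rules   # hypothetical name
spec:
  groups:
  - name: ExampleScpAlerts           # hypothetical group name
    rules:
    - alert: SCPHealthCheckFailed
      expr: ocscp_health_check_status_failed == 1
      labels:
        severity: info
        oid: "1.3.6.1.4.1.323.5.3.35.1.2.9024"
      annotations:
        description: 'SCP HealthCheck failed'
EOF
echo "wrote $rule_file"

# A file of this shape is applied the same way as the delivered file:
#   kubectl apply -f "$rule_file" -n <scp namespace>
```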

6.3.3 Configuring Service Communication Proxy Alert using the SCPAlertrules.yaml file

Note:

Default NameSpace is scpsvc for Service Communication Proxy. You can update the NameSpace as per the deployment.

To access the scpAlertsrules_<scp release number>.yaml file from the Scripts folder of ocscp_csar_23_2_0_0_0.zip, download the SCP package from My Oracle Support as described in "Downloading the SCP Package" in Oracle Communications Cloud Native Core, Service Communication Proxy Installation, Upgrade, and Fault Recovery Guide.

Alerts Details

Description and summary for alerts are added by the Prometheus alert manager.

Alerts are supported for three different resource or routing conditions that cross a threshold.

SCP Ingress Traffic Rate Above Threshold
- Has three threshold levels: Minor (above 1400 mps to 2000 mps), Major (1600 to 1800 mps), and Critical (above 1800 mps). These values are configurable.
- In the description, information is presented similar to: "Ingress Traffic Rate at Locality: <Locality of scp> is above <threshold level (minor/major/critical)> threshold (i.e. <value of threshold>)"
- In Summary: "Namespace: <Namespace of scp deployment at that Locality>, Pod: <SCP-worker Pod name>: Current Ingress Traffic Rate is <Current rate of Ingress traffic> mps which is above 70 Percent of Max MPS(<upper limit of ingress traffic rate per pod>)"
  
  Note:
  Ingress traffic rate is per scp-worker pod in a namespace at a particular SCP-Locality. Currently, 2000 mps is the upper limit per scp-worker pod.
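
The relationship between a measured rate, the threshold levels, and the percentage quoted in the summary can be sketched as follows. This is an illustration that assumes the default values quoted above; classify_ingress_rate and percent_of_max are hypothetical helpers, and the real evaluation is done by the alert expressions, not by a script.

```shell
#!/bin/sh
# Classify a per-pod ingress rate (mps) against the default thresholds quoted
# above. classify_ingress_rate is a hypothetical helper; checking the highest
# level first means only the lower bound of each level is needed.
MAX_MPS=2000

classify_ingress_rate() {
  rate=$1
  if   [ "$rate" -gt 1800 ]; then echo "critical"
  elif [ "$rate" -ge 1600 ]; then echo "major"
  elif [ "$rate" -gt 1400 ]; then echo "minor"
  else echo "none"
  fi
}

# Express a rate as a whole-number percentage of the per-pod maximum.
percent_of_max() {
  echo $(( $1 * 100 / MAX_MPS ))
}

classify_ingress_rate 1500   # prints: minor
percent_of_max 1400          # prints: 70
```

With a 2000 mps upper limit, 1400 mps is 70 percent of the maximum, which is the figure that appears in the summary text for the lowest level.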
SCP Routing Failed For Service
- Identifies the NF Service Type and NF Type for which routing failed at a particular locality.
- Description: "Routing failed for service"
- Summary: "Routing failed for service: NFService Type = <Message NF Service Type>, NFType = <Message NF Type>, Locality = <SCP Locality where Routing Failed> and value = <Accumulated failure till now, of such message for NFType and NFService Type>"
  
  Note:
  The value field currently does not provide the number of failures in a particular time interval; instead, it provides the total number of routing failures.
SCP Pod Memory Usage: Type of alert is SCPWorkerPodMemoryUsage.
- Pod memory usage for SCP Pods (Soothsayer and Worker) deployed at a particular node instance is provided.
- The Soothsayer pod threshold is 8 GB
- The Worker pod threshold is 4 GB
- Summary: "Instance: <Node Instance name>, NameSpace: <Namespace of SCP deployment>, Pod: <(Soothsayer/Worker) Pod name>: <Soothsayer/Worker> Pod High Memory usage detected"
- Summary: "Instance: <Node Instance name>, Namespace: <Namespace of SCP deployment>, Pod: <(Soothsayer/Worker) Pod name>: Memory usage is above <threshold value>G (current value is: <current value of memory usage>)"

6.3.4 Configuring Alert Manager for SNMP Notifier

Grouping of alerts is based on:

podname
alertname
severity
namespace
nfServiceType
nfServiceInstanceId

Add subroutes for SCP alerts in the Alertmanager config map as follows:

Take a backup of the current config map of Alertmanager by running the following command:

kubectl get configmaps <NAME-alertmanager> -oyaml -n <Namespace> > /tmp/bkupAlertManagerConfig.yaml

Example:

kubectl get configmaps occne-prometheus-alertmanager -oyaml -n occne-infra > /tmp/bkupAlertManagerConfig.yaml

Edit Configmap to add subroute for SCP Trap OID:

kubectl edit configmaps <NAME-alertmanager> -n <Namespace>

Example:

kubectl edit configmaps occne-prometheus-alertmanager -n occne-infra

Add the subroute under 'route' in configmap:

routes:
      - receiver: default-receiver
        group_interval: 1m
        group_wait: 10s
        repeat_interval: 9y
        group_by: [podname, alertname, severity, namespace, nfservicetype, nfserviceinstanceid, servingscope, nftype]
        match_re:
          oid: ^1.3.6.1.4.1.323.5.3.35.(.*)
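
To check which OIDs the subroute pattern selects, the expression can be tried locally. The following is a rough sketch that uses grep -E as a stand-in for Alertmanager's own regular-expression matching, so it is an approximation; the second OID is a hypothetical value outside the SCP subtree.

```shell
#!/bin/sh
# Check which OIDs the subroute pattern would select, using grep -E as a
# stand-in for Alertmanager's regular-expression matching (an approximation).
pattern='^1.3.6.1.4.1.323.5.3.35.(.*)'

oid_matches() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo "match"
  else
    echo "no match"
  fi
}

oid_matches "1.3.6.1.4.1.323.5.3.35.1.2.9024"   # OID of SCPHealthCheckFailed
oid_matches "1.3.6.1.4.1.323.5.3.99.1.2.1"      # hypothetical OID outside the SCP subtree
```

Because the dots in the pattern are unescaped, each one matches any single character, so the pattern is slightly looser than a literal OID prefix.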

MIB Files for SCP

Two MIB files are used to generate the traps. You need to update these files along with the alert file to receive the traps in your environment.

ocscp_mib_tc_23.4.4.mib: This is the SCP top-level MIB file, where the objects and their data types are defined.
ocscp_mib_23.4.4.mib: This file fetches the objects from the top-level MIB file. Based on the alert notification, these objects can be selected for display.

Note:

MIB files are packaged with ocscp_csar_23_2_0_0_0.zip. You can download the file from MOS as described in Oracle Communications Cloud Native Core, Service Communication Proxy Installation, Upgrade, and Fault Recovery Guide.