6 Alerts

This section describes the alerts supported by SCP and how to configure them.

Note:

The performance and capacity of the SCP system may vary based on the call model, feature or interface configuration, network conditions, and underlying CNE and hardware environment.

You can configure alerts in Prometheus by using the ScpAlertrules.yaml file.
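
For example, one rule entry in the ScpAlertrules.yaml file may look like the following minimal sketch. The structure follows the standard Prometheus rule-file format; the group name and the exact label and annotation keys are illustrative, while the expression and severity are taken from the SCPNotificationPodMemoryUsage alert described later in this chapter:

  groups:
    - name: scp-alerts
      rules:
        - alert: SCPNotificationPodMemoryUsage
          expr: sum(container_memory_usage_bytes{image!="",pod=~".*scpc-notification.+"}) by (kubernetes_pod_name,kubernetes_namespace, instance) > 3650722201
          labels:
            severity: major
          annotations:
            summary: 'Memory usage is above 85% for podname: {{ $labels.kubernetes_pod_name }}'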

The following table describes the various severity types of alerts generated by SCP:

Table 6-1 Alerts Levels or Severity Types

Alerts Levels / Severity Types Definition
Critical Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of SCP.
Major Indicates a significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the service of SCP.
Minor Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the service of SCP.
Info or Warn (Informational) Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of SCP.

Caution:

User, computer, and application settings, as well as character encoding settings, may cause issues when commands or other content are copied from this PDF. The PDF reader version also affects the copy-paste functionality. It is recommended to verify the pasted content, especially when hyphens or other special characters are part of the copied content.

Note:

  • kubectl commands might vary based on the platform deployment. Replace kubectl with the command-line tool specific to your Kubernetes environment to configure Kubernetes resources through the kube-api server. The instructions provided in this document are based on the Oracle Communications Cloud Native Environment (OCCNE) version of the kube-api server.
  • The alert file can be customized as required by the deployment environment. For example, namespace can be added as a filter criterion to an alert expression to restrict the alert to a specific namespace, as shown in the following example.
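
For example, assuming SCP is deployed in a namespace named scpsvc (the namespace used in the sample logs later in this chapter), the condition of the SCPNotificationPodMemoryUsage alert could be restricted to that namespace by adding a label matcher; the kubernetes_namespace label name follows the expressions used throughout this chapter:

  sum(container_memory_usage_bytes{image!="",kubernetes_namespace="scpsvc",pod=~".*scpc-notification.+"}) by (kubernetes_pod_name,kubernetes_namespace, instance) > 3650722201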

6.1 System level alerts

This section lists the system level alerts.

6.1.1 SCPNotificationPodMemoryUsage

Table 6-2 SCPNotificationPodMemoryUsage

Field Description
Description

Notifies when the memory usage of the Notification service pod is above the threshold.

Threshold value is 85% of the allocated memory (4 GB): 3.4 GB

Summary Memory usage is above 85% for podname: {{$labels.kubernetes_pod_name}}, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}} with current value {{ $value }},scpfqdn: {{$labels.ocscp_fqdn}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Conditions sum(container_memory_usage_bytes{image!="",pod=~".*scpc-notification.+"}) by (kubernetes_pod_name,kubernetes_namespace, instance) > 3650722201
OID 1.3.6.1.4.1.323.5.3.35.1.2.3001
Metric Used ocscp_nrf_notifications_requests_nf_total
Recommended Actions

Cause: A high notification rate or a very large NF profile size in notifications.

Diagnostic Information: Monitor the notification metric: ocscp_nrf_notifications_requests_nf_total.

Notification pod memory usage reduces after some time once it crosses 2.5 GB or 3 GB.

Recovery: This alert is cleared automatically when the scpc-notification pod memory usage reduces below the defined threshold.

Reduce the notification rate. These notifications are generated by NRF and can be controlled through NRF.

For any assistance, contact My Oracle Support.
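
The byte value used in the condition above can be derived from the stated threshold, assuming binary units (1 GB = 1024^3 bytes): 0.85 × 4 × 1024^3 = 3,650,722,201.6, which rounds down to the 3650722201 used in the condition. The same calculation for the 16 GB worker pod in the next section gives 0.85 × 16 × 1024^3 ≈ 14602888806.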

6.1.2 SCPWorkerPodMemoryUsage

Table 6-3 SCPWorkerPodMemoryUsage

Field Description
Description Notifies when the per-pod memory usage of SCP Worker is above the threshold. Threshold value is 85% of the allocated memory (16 GB): 13.6 GB
Summary Memory usage is above 85% for podname: {{$labels.kubernetes_pod_name}}, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}} with current value {{ $value }},scpfqdn: {{$labels.ocscp_fqdn}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Conditions sum(container_memory_usage_bytes{image!="",pod=~".*scp-worker.+"}) by (kubernetes_pod_name,kubernetes_namespace, instance) > 14602888806
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.7004
Metric Used
  • ocscp_metric_http_rx_req_total
  • ocscp_metric_http_tx_req_total
  • ocscp_metric_http_rx_res_total
  • ocscp_metric_http_tx_res_total
Recommended Actions

Cause: A high traffic rate, alternate routing, a large number of routing rules or large rule size, or network or producer NF latency.

Diagnostic Information: Monitor traffic rate, alerts, and latency on the KPI Dashboard.

Check the traffic rates of the following metrics if they are too high:
  • ocscp_metric_http_rx_req_total
  • ocscp_metric_http_tx_req_total
  • ocscp_metric_http_rx_res_total
  • ocscp_metric_http_tx_res_total

Check the upstream response time by using the following metric and determine whether the upstream is taking too long to respond: ocscp_metric_upstream_service_time_total.
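
Assuming ocscp_metric_upstream_service_time_total is a cumulative counter of upstream service time and ocscp_metric_http_tx_req_total counts the corresponding outgoing requests (the label name ocscp_producer_host is taken from the metrics referenced elsewhere in this chapter, and its presence on both metrics is an assumption), an approximate average upstream latency over the last two minutes could be queried as:

  sum(rate(ocscp_metric_upstream_service_time_total[2m])) by (ocscp_producer_host)
    / sum(rate(ocscp_metric_http_tx_req_total[2m])) by (ocscp_producer_host)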

Check the following platform metric for current memory usage by the scp-worker pod: container_memory_usage_bytes.

Recovery: This alert is cleared automatically when the scp-worker pod memory usage reduces below the defined threshold. Reduce the traffic rate and improve the latency.

For any assistance, contact My Oracle Support.

6.1.3 SCPInstanceDown

Table 6-4 SCPInstanceDown

Field Description
Description Notifies when any pod in the ocscp release is down. Provides information such as pod name, instance ID, and app name.
Summary Pod with podname: {{$labels.kubernetes_pod_name}} is Down , timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Critical
Conditions kube_pod_status_ready{pod =~ '.*scp-worker.*|.*scpc-notification.*|.*scpc-subscription.*|.*scpc-configuration.*|.*scpc-audit.*|.*scp-nrfproxy.*|.*scp-load-manager.*|.*scp-cache.*|.*scpc-alternate-resolution.*|.*scp-mediation.*',condition =~ 'true'} !=1
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.7006
Recommended Actions
Cause: When the following issues occur:
  • The control plane pods, such as configuration, subscription, notification, audit, and alternate-resolution, are down due to a connection failure with the DB.
  • Pod restarts due to Kubernetes liveness or readiness probe failures.
  • Application restart or startup failure.
Diagnostic Information:
  • Check if DB services are active by running the following command:
    kubectl describe pod <podname> -n <namespace>
  • Check Kubernetes events for probe failures in the platform logs.
  • Check if any exception is reported in the SCP application logs. A quick way to list pods that are not ready is shown after this list.
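
As a quick check, list the pods in the SCP namespace and inspect the READY and STATUS columns; pods reported as 0/1 or in a state other than Running need attention (standard kubectl output, no product-specific options assumed):

  kubectl get pods -n <namespace>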

Recovery: This alert is cleared automatically when the inactive pod becomes active. Recover DB services if they are down. Collect the application logs and contact My Oracle Support for any assistance.

6.2 Application level alerts

This section lists the application level alerts.

6.2.1 SCPCcaFeatureEnabledWithoutHttps

Table 6-5 SCPCcaFeatureEnabledWithoutHttps

Field Description
Severity Info
Condition ocscp_worker_cca_validation_feature_enabled_without_https > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.9022
Description An alert is raised when the CCA validation feature is enabled without enabling HTTPS.
Recommended Actions

Cause: CCA validation feature is enabled without enabling HTTPS.

Diagnostic Information:

Deploy SCP with HTTPS enabled.

Recovery: The alert is cleared automatically if either the CCA feature is disabled or deployment is changed to HTTPS.

For any assistance, contact My Oracle Support.

6.2.2 SCPIngressTrafficRateAboveMinorThreshold

Table 6-6 SCPIngressTrafficRateAboveMinorThreshold

Field Details
Description This alert notifies that the traffic rate has increased to between 9800 and 11200 MPS, based on the user-configured minor threshold value. It also includes the locality and the current traffic rate value.
Summary namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scpfqdn: {{$labels.ocscp_fqdn}},scpauthority:{{$labels.ocscp_authority}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Current Ingress Traffic Rate is {{ $value | printf "%.2f" }} mps which is above 70 Percent of Max MPS(14000)
Severity Minor
Condition sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name,ocscp_fqdn,ocscp_authority) >= 9800 < 11200
OID 1.3.6.1.4.1.323.5.3.35.1.2.7001
Metric Used ocscp_metric_http_rx_req_total
Recommended Action

Cause: When the Consumer NF sends more traffic than expected.

Diagnostic Information:

Monitor the ingress traffic to pod using the KPI Dashboard.

Refer to the rate of ocscp_metric_http_rx_req_total metric on the Grafana GUI.

Recovery: This alert is cleared automatically when the ingress traffic reduces below the minor threshold or exceeds the major threshold. If this alert is not cleared, then check the Consumer NF for an uneven distribution of traffic per connection or for any other issue.

For any assistance, contact My Oracle Support.

Note:

The alert expression is configured for the SCP profile (12 vCPUs and 16 Gi of memory). For a smaller resource profile (for example, 8 vCPUs and 12 Gi of memory), the per-worker pod MPS should be evaluated based on the dimensioning sheet, and the alert file must be updated accordingly.
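
The same percentage levels can be reapplied when the dimensioned rate changes. For example, if the dimensioning sheet yields a maximum of 10000 MPS per worker pod (an illustrative value), the thresholds in the three ingress traffic alert conditions become: minor ≥ 0.70 × 10000 = 7000, major ≥ 0.80 × 10000 = 8000, and critical ≥ 0.95 × 10000 = 9500, matching the 9800, 11200, and 13300 values derived from 14000 MPS in this document.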

6.2.3 SCPIngressTrafficRateAboveMajorThreshold

Table 6-7 SCPIngressTrafficRateAboveMajorThreshold

Field Details
Description This alert notifies that the traffic rate has increased to between 11200 and 13300 MPS, based on the user-configured major threshold value. It also includes the locality and the current traffic rate value.
Summary namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scpfqdn: {{$labels.ocscp_fqdn}},scpauthority:{{$labels.ocscp_authority}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Current Ingress Traffic Rate is {{ $value | printf "%.2f" }} mps which is above 80 Percent of Max MPS(14000)
Severity Major
Condition sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name,ocscp_fqdn,ocscp_authority) >= 11200 < 13300
OID 1.3.6.1.4.1.323.5.3.35.1.2.7001
Metric Used ocscp_metric_http_rx_req_total
Recommended Action

Cause: When the Consumer NF sends more traffic than expected.

Diagnostic Information: Monitor the ingress traffic to pod using the KPI Dashboard.

Refer to the rate of ocscp_metric_http_rx_req_total metric on the Grafana GUI.

Recovery: This alert is cleared automatically when the ingress traffic reduces below the major threshold or exceeds the critical threshold. If this alert is not cleared, then check the Consumer NF for an uneven distribution of traffic per connection or for any other issue.

If this alert continues for a long duration, then reduce the ingress traffic from consumer to pod.

For any assistance, contact My Oracle Support.

Note:

The alert expression is configured for the SCP profile (12 vCPUs and 16 Gi of memory). For a smaller resource profile (for example, 8 vCPUs and 12 Gi of memory), the per-worker pod MPS should be evaluated based on the dimensioning sheet, and the alert file must be updated accordingly.

6.2.4 SCPIngressTrafficRateAboveCriticalThreshold

Table 6-8 SCPIngressTrafficRateAboveCriticalThreshold

Field Details
Description This alert notifies that the traffic rate has increased above 13300 MPS, based on the user-configured critical threshold value. It also includes the locality and the current traffic rate value.
Summary namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scpauthority:{{$labels.ocscp_authority}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Current Ingress Traffic Rate is {{ $value | printf "%.2f" }} mps which is above 95 Percent of Max MPS(14000)
Severity Critical
Condition sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name,ocscp_fqdn,ocscp_authority) >= 13300
OID 1.3.6.1.4.1.323.5.3.35.1.2.7001
Metric Used ocscp_metric_http_rx_req_total
Recommended Action

Cause: When the Consumer NF sends more traffic than expected.

Diagnostic Information: Monitor the ingress traffic to pod using the KPI Dashboard.

Refer to the rate of the ocscp_metric_http_rx_req_total metric on the Grafana GUI.

Recovery: This alert is cleared automatically when the ingress traffic reduces below the critical threshold. If this alert is not cleared, then check the Consumer NF for an uneven distribution of traffic per connection or for any other issue.

If this alert continues for a long duration, then reduce the ingress traffic from consumer to pod.

For any assistance, contact My Oracle Support.

Note:

The alert expression is configured for the SCP profile (12 vCPUs and 16 Gi of memory). For a smaller resource profile (for example, 8 vCPUs and 12 Gi of memory), the per-worker pod MPS should be evaluated based on the dimensioning sheet, and the alert file must be updated accordingly.

6.2.5 SCPRoutingFailedForProducer

Table 6-9 SCPRoutingFailedForProducer

Field Description
Severity Info
Conditions increase(ocscp_metric_routing_attempt_fail_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.7005
Description Notifies that routing failed for a producer. Provides details such as NFService type, NF type, locality, producer FQDN, and value.
Recommended Actions

Cause: When routing fails to select a producer NF due to unavailability of routing rules for an NF service or producer.

Diagnostic Information:
  • Check whether the routing rules are configured for the NF for which routing failed.
  • Check the notification logs for any error while processing the notification of the NF for which routing failed. To get the notification logs, run the following command: kubectl logs <podname> -n <namespace>
  • Check whether the NF is reachable or not by using one of the following steps:
    • Run the ping command from primary/secondary nodes using IP of service. Example of a ping command: ping <IPAddress>.
    • Run the ping command from inside the pod, if FQDN of service is used.
    • Instead of using the ping command, you can collect tcpdump for ensuring the connectivity. tcpdump must be run on the debug container for scp-worker microservice. Example of a tcpdump: tcpdump -w capture.pcap -i <pod interface>.

Recovery: This alert is cleared automatically when routing succeeds for a producer NF or no more traffic is received in the next Prometheus scrape interval.

Check if the NF is deregistered. Register the NF to create routing rules if rules do not exist.

For any assistance, contact My Oracle Support.
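
Note:

Many conditions in this chapter follow the same increase(<counter>[2m]) > 0 pattern. In standard PromQL, increase(ocscp_metric_routing_attempt_fail_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0 fires when the failure counter grew at all during the trailing two minutes and stops firing once two minutes pass without new failures, subject to the Prometheus scrape and rule evaluation intervals.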

6.2.6 SCPAuditErrorResponse

Table 6-10 SCPAuditErrorResponse

Field Description
Severity Info
Conditions ocscp_audit_error_response > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.4001
Description

Alert is raised when Audit module receives a 3xx, 4xx, or 5xx error from NRF. This alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod.

Note: Alert is cleared on the next audit cycle.

Recommended Actions

Cause: When the configured NRF sends error responses, is down, or is not reachable.

Diagnostic Information:
  • Check if NRF is up and reachable. To check the NRF status, see Oracle Communications Cloud Native Core, Network Repository Function User Guide.
  • Check if the NF is reachable or not using one of the following steps:
    • Run the ping command from primary/secondary nodes using IP of NRF. Example of a ping command: ping <IpAddress>.
    • Run the ping command from inside the pod if FQDN of NRF is used.
    • Instead of using ping, you can collect tcpdump for ensuring connectivity. Example of a tcpdump: tcpdump -w capture.pcap -i <pod interface>.
  • Monitor audit and worker service logs: kubectl logs <pod name> -n <namespace>
  • Check Jaeger traces for scp-worker.

Recovery: The alert is cleared automatically during the next audit cycle and when no more errors are received. Collect audit and worker service logs and contact My Oracle Support for any assistance.

6.2.7 SCPAuditEmptyNFArrayResponse

Table 6-11 SCPAuditEmptyNFArrayResponse

Field Description
Description

Alert is generated when Audit module receives a 2xx response with empty NFInstance array from NRF. Alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod.

Alert is cleared if Audit receives a success response with non-empty NFInstance array or on next audit cycle if topology source is changed to LOCAL.

Summary SCP Audit received Empty NF Array Response for nfType {{$labels.nfType}}, nrfRegionOrSetId: {{$labels.nrfRegionOrSetId}}, auditmethod: {{$labels.auditmethod}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scpfqdn: {{$labels.ocscp_fqdn}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Critical
Conditions ocscp_audit_2xx_empty_nf_array > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.4002
Recommended Actions

Cause: When NRF does not have any NF registered or due to any error condition on NRF.

Diagnostic Information: Check if NRF contains any registered NF and validate as required. For more information, refer to NRF documents.

Recovery: This alert is cleared automatically if Audit receives a success response with non-empty NFInstance array or during the next audit cycle when the topology source is changed to LOCAL.

Register a NF with NRF or change the topology source to LOCAL.

For any assistance, contact My Oracle Support.

6.2.8 DuplicateLocalityFoundInForeignNF

Table 6-12 DuplicateLocalityFoundInForeignNF

Field Description
Severity Major
Conditions ocscp_notification_duplicate_foreign_location > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.3002
Description Alert is raised when an unknown NF or SCP is registered with duplicate locality from the present region.
Recommended Actions

Cause: When SCP discovers a duplicate locality of an NF from an unknown region.

Diagnostic Information: Check logs for NF notification received by running the following command: kubectl -n <namespace> logs <pod name>.

Check the following metric to get the NFInstanceId information for which this alert is raised: ocscp_notification_duplicate_foreign_location (nfInstanceId).

From the metric, get the NF Instance ID, Locality, and serving_scope.

Check the NF Profile of the corresponding NF in the unknown region as identified by the serving_scope.

Check and correct the locality in the NF profile to ensure that it aligns with the localities of that unknown region, which should be different from the locality of the SCP that reported this alert.

Recovery: This alert is cleared automatically if the unknown NF or SCP is deregistered or a registration update is received with the correct locality.

Re-register NF with correct locality information.

Collect logs for notification and audit service.

For any assistance, contact My Oracle Support.

6.2.9 ForeignNFLocalityNotServed

Table 6-13 ForeignNFLocalityNotServed

Field Description
Severity Critical
Conditions ocscp_notification_foreign_nf_locality_unserved > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.3003
Description Alert is raised when a Foreign Producer NF's locality is not served by any SCP.
Recommended Actions

Cause: When SCP discovers an unknown producer NF whose locality is not served by any SCP.

Diagnostic Information: Check the logs for the received NF notification. To get the name of the notification pod, run the following command: kubectl get pods -n <namespace>.

Note: Use the complete name of the notification pod in the following command: kubectl logs <pod> -n <namespace>.

Check the following metric to get the NFInstanceId information for which this alert is raised: ocscp_notification_foreign_nf_locality_unserved (nfInstanceId).

Recovery: This alert is cleared automatically if the unknown NF is deregistered or a registration update is received with a locality served by SCP.

Re-register NF with correct locality information.

For any assistance, contact My Oracle Support.

6.2.10 UnknownLocalityFoundInForeignNF

Table 6-14 UnknownLocalityFoundInForeignNF

Field Description
Severity Critical
Conditions ocscp_notification_foreign_nf_locality_absent > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.3004
Description Alert is raised when a foreign producer NF's locality is unknown.
Recommended Actions

Cause: When SCP discovers an unknown producer NF without locality information.

Diagnostic Information: Check the logs for the received NF notification. To get the name of the notification pod, run the following command: kubectl get pods -n <namespace>.

Use the complete name of the notification pod in the following command: kubectl logs <pod> -n <namespace> -f --tail=0

Check the following metric to get the NFInstanceId information for which this alert is raised: ocscp_notification_foreign_nf_locality_absent(nfInstanceId).

Recovery: This alert is cleared automatically if the unknown NF is deregistered or a registration update is received with a locality known to SCP.

Re-register NF with correct locality information.

For any assistance, contact My Oracle Support.

6.2.11 SCPUpstreamResponseTimeout

Table 6-15 SCPUpstreamResponseTimeout

Field Description
Description Alert is raised when the upstream connection to a producer NF fails.
Summary SCP Upstream Response Timeout for nfservicetype: {{$labels.ocscp_nf_service_type}}, nftype {{$labels.ocscp_nf_type}}, responsecode {{$labels.ocscp_response_code}}, nfclustername {{$labels.ocscp_producer_host}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Info
Conditions increase(ocscp_metric_upstream_response_timeout_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.7011
Recommended Actions

Cause: When a producer NF is down, not reachable, or latency is high.

Diagnostic Information: Check whether the producer NF is up and network connectivity to the producer NF is established by using one of the following steps:
  • Run the ping command from primary/secondary nodes by using IP of producer NF. Example of a ping command: ping <IPAddress>.
  • Run the ping command from inside the pod, if FQDN of producer NF is used.
  • Instead of using the ping command, you can collect tcpdump for ensuring the connectivity. tcpdump must be run on the debug container for scp-worker microservice. Example of a tcpdump: tcpdump -w capture.pcap -i <pod interface>.

Check the upstream response time by using the following metric and determine if upstream is taking too long to respond: ocscp_metric_upstream_service_time (producer FQDN)

Recovery: This alert is cleared automatically in the next scrape interval if the system does not observe any error.

For any assistance, contact My Oracle Support.

6.2.12 SCPSingleNfInstanceAvailableForNFType

Table 6-16 SCPSingleNfInstanceAvailableForNFType

Field Description
Severity Major
Conditions ocscp_no_nf_instance == 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.3005
Description Alert is raised when there is a single NFInstance available with SCP for an NFType.
Recommended Actions

Cause: When the preventiveAuditOnLastNFInstanceDeletion attribute is set to true and SCP has a single NFInstance available for an NFType.

Diagnostic Information: Check all NRFs configured on SCP to confirm whether only one NFInstance is available for the NFType specified in the alert.

For information about registered NFs, see Oracle Communications Cloud Native Core, Network Repository Function User Guide.

Check the number of NFs of a particular type by using the API or the CNC Console of SCP. For information about procedures to check the NFs available with SCP, see "Configuring Service Communication Proxy using the CNC Console" in Oracle Communications Cloud Native Core, Service Communication Proxy User Guide.

Recovery: This alert is cleared automatically in the next scrape interval if more than one NFInstance is available for a specified NFType in the alert.

For any assistance, contact My Oracle Support.

6.2.13 SCPNoNfInstanceForNFType

Table 6-17 SCPNoNfInstanceForNFType

Field Description
Severity Critical
Conditions ocscp_no_nf_instance == 1
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.3006
Description Alert is raised when there is no NFInstance available with SCP for an NFType.
Recommended Actions

Cause: When the preventiveAuditOnLastNFInstanceDeletion flag is set to true and SCP has no NFInstance available for an NFType.

Diagnostic Information: Check all NRFs configured on SCP to confirm that no NFInstance is available for the NFType specified in the alert.

For information about registered NFs, see Oracle Communications Cloud Native Core, Network Repository Function User Guide.

Check the number of NFs of a particular type by using the API or the CNC Console of SCP. For information about procedures to check the NFs available with SCP, see "Configuring Service Communication Proxy using the CNC Console" in Oracle Communications Cloud Native Core, Service Communication Proxy User Guide.

Recovery: This alert is cleared automatically in the next scrape interval if at least one NFInstance is available for a specified NFType in the alert.

For any assistance, contact My Oracle Support.

6.2.14 SCPIngressTrafficRateExceededConfiguredLimit

Table 6-18 SCPIngressTrafficRateExceededConfiguredLimit

Alert Parameters Value
Description Ingress traffic rate exceeds configured rate limit for consumer fqdn: {{$labels.ocscp_consumer_fqdn}}
Summary 'Ingress traffic rate exceeds configured rate limit for consumer fqdn: ocscpconsumerfqdn = {{$labels.ocscp_consumer_host}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumernftype = {{$labels.ocscp_consumer_nf_type}}, configuredingressrate = {{$labels.ocscp_configured_ingress_rate}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scp_fqdn: ' {{$labels.scp_fqdn}} ',timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} '
Severity Critical
Condition This alert is raised when the ingress traffic rate exceeds the configured rate for consumer FQDN.

increase(ocscp_metric_ingress_rate_limiting_throttle_req_total[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.7012
Metric Used ocscp_metric_ingress_rate_limiting_throttle_req_total
Recommended Actions Cause:

When the ingress traffic rate exceeds the configured rate limit for the consumer FQDN.

Diagnostic Information:
  • Check the ingress traffic rate from the consumer FQDN.
  • To check the ingress rate, refer to the following metric: ocscp_metric_http_rx_req_total
  • Check the ingress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.
Recovery:

This alert is cleared when no more requests get suppressed due to ingress rate limiting in the next scrape interval.

For any assistance, contact My Oracle Support.

6.2.15 SCPIngressTrafficRoutedWithoutRateLimitTreatment

Table 6-19 SCPIngressTrafficRoutedWithoutRateLimitTreatment

Alert Parameters Value
Description Ingress traffic routed without rate limit treatment
Summary 'Ingress traffic routed without rate limit treatment: consumernftype = {{$labels.ocscp_consumer_nf_type}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumerfqdn = {{$labels.ocscp_consumer_host}}, cause = {{$labels.ocscp_cause}} ,namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scp_fqdn: ' {{$labels.scp_fqdn}} ', timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} '
Severity Major
Condition This alert is raised when the ingress traffic routes without rate limiting treatment.

increase(ocscp_metric_ingress_rate_limiting_not_applied_req_total[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.7013
Metric Used ocscp_metric_ingress_rate_limiting_not_applied_req_total
Recommended Actions Cause:

When the ingress traffic routes without rate limiting treatment.

Diagnostic Information:
  • Check the ingress rate limiting configurations for the untreated FQDNs that can be obtained from the following metric: ocscp_metric_ingress_rate_limiting_not_applied_req_total(ocscp_consumer_fqdn)
  • Check the ingress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.
Recovery:

This alert is cleared when no more requests get routed without ingress rate limiting treatment in the next scrape interval.

For any assistance, contact My Oracle Support.

6.2.16 SCPEgressTrafficRateExceededConfiguredLimit

Table 6-20 SCPEgressTrafficRateExceededConfiguredLimit

Field Description
Description Alert is raised when the egress traffic rate exceeds the configured rate.
Summary Egress traffic rate exceeds configured rate limit: consumernftype = {{$labels.ocscp_consumer_nf_type}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumerfqdn = {{$labels.ocscp_consumer_host}}, producernftype = {{$labels.ocscp_nf_type}}, producernfservicetype = {{$labels.ocscp_nf_service_type}}, producernfinstanceid = {{$labels.ocscp_nf_instance_id}}, producerfqdn = {{$labels.ocscp_producer_host}}, configuredegressrate = {{$labels.ocscp_configured_egress_rate}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scpfqdn: {{$labels.ocscp_fqdn}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}
Severity Critical
Conditions increase(ocscp_metric_egress_rate_limiting_throttle_req_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.7014
Recommended Actions

Cause: When the egress traffic rate exceeds the configured rate.

Diagnostic Information: Check the egress traffic rate by using the following metric: ocscp_metric_http_tx_req_total.

Check the egress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.

Recovery: This alert is cleared when no more requests get suppressed due to egress rate limiting in the next scrape interval.

For any assistance, contact My Oracle Support.

6.2.17 SCPEgressTrafficRoutedWithoutRateLimitTreatment

Table 6-21 SCPEgressTrafficRoutedWithoutRateLimitTreatment

Field Description
Description Alert is raised when egress traffic routes without rate limiting.
Summary Egress traffic routed without rate limiting: producernftype = {{$labels.ocscp_nf_type}}, producernfservicetype = {{$labels.ocscp_nf_service_type}}, producernfinstanceid = {{$labels.ocscp_nf_instance_id}}, producerfqdn = {{$labels.ocscp_producer_host}}, cause = {{$labels.ocscp_cause}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, scpfqdn: {{$labels.ocscp_fqdn}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}
Severity Major
Conditions increase(ocscp_metric_egress_rate_limiting_not_applied_req_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.7015
Recommended Actions

Cause: When the egress traffic routes without rate limiting treatment.

Diagnostic Information: Check the egress rate limiting configurations for the untreated producer FQDN.

Obtain the producer FQDN by using the following metric: ocscp_metric_egress_rate_limiting_not_applied_req_total(ocscp_producer_fqdn)

Check the egress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.

Recovery: This alert is cleared when no more requests get routed without egress rate limiting treatment in the next scrape interval.

For any assistance, contact My Oracle Support.

6.2.18 SCPNotificationRejectTopologySourceLocal

Table 6-22 SCPNotificationRejectTopologySourceLocal

Field Description
Severity Info
Conditions increase(ocscp_notifications_rejected_topologysource_local_total[15m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.3007
Description Alert is raised when SCP rejects a notification from NRF due to topology source set to LOCAL for NF Type.
Recommended Actions

Cause: When NF Topology Source Info is set to LOCAL.

Diagnostic Information: Check the topology source information of an NF Type.

For information about the topology source APIs, see Oracle Communications Cloud Native Core, Service Communication Proxy User Guide.

Recovery: This alert is cleared automatically after 15 minutes when the NF topology source is changed from LOCAL to NRF.

For any assistance, contact My Oracle Support.

6.2.19 SCPNotificationProcessingFailureForNF

Table 6-23 SCPNotificationProcessingFailureForNF

Field Description
Description Alert is raised when notification processing fails on SCP.
Summary SCP Notification Processing failure for nfInstanceId: {{$labels.nfInstanceId}}, nfType: {{$labels.nfType}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scpfqdn: {{$labels.ocscp_fqdn}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Conditions increase(ocscp_notification_nf_profile_processing_failure_total[15m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.3008
Recommended Actions

Cause: When Notification processing has failed on SCP.

Diagnostic Information: Check the notification pod logs for any errors by running the following command: kubectl logs <notification pod name> -n <scp namespace>. To get the list of pods, run the following command: kubectl get pod -n <scp namespace>
Sample logs:
{"instant":{"epochSecond":1620272241,"nanoOfSecond":609935406},"thread":"runQueueThreadPool1","level":"ERROR","loggerName":"com.oracle.cgbu.cne.scp.soothsayer.Process","message":"{logMsg=Notified profile IP endpoints already present in stored profile, logMsgCode=DUPLICATE_IPENDPOINT_OR_FQDN_FOUND_IN_STORED_PROFILE, rootCause=Notified Profile contains an ipEndPoint {\"ipv4Address\":\"10.75.203.74\",\"transport\":\"TCP\",\"port\":32673} which is already present in stored Profile with nfInstanceIid 93ED74AA-A29C-4450-9D7A-9278CAF6266D for serviceInstanceId audmcl08nv08-udmueauthn-589d6d5bcc-4dslh}","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":34,"threadPriority":5,"messageTimestamp":"21-05-06 03:37:21.609+0000","application":"ocscp-soothsayer","microservice":"ocscp-scpc-notification","engVersion":"25.1.100","mktgVersion":"25.1.100.0.0","vendor":"oracle","namespace":"scpsvc","node":"worker1","pod":"ocscp-scpc-notification-547f699c96-m7nc8","subsystem":"notification","instanceType":"prod","processId":"1"}
      {"instant":{"epochSecond":1620272241,"nanoOfSecond":734873890},"thread":"runQueueThreadPool1","level":"WARN","loggerName":"com.oracle.cgbu.cne.scp.soothsayer.scheduler.RunQueueConsumer","message":"{logMsg=Profile Processing failed, nfInstanceId=93ED74AA-A29C-4450-9D7A-9278CAF6266D}","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":34,"threadPriority":5,"messageTimestamp":"21-05-06 03:37:21.734+0000","application":"ocscp-soothsayer","microservice":"ocscp-scpc-notification","engVersion":"25.1.100","mktgVersion":"25.1.100.0.0","vendor":"oracle","namespace":"scpsvc","node":"worker1","pod":"ocscp-scpc-notification-547f699c96-m7nc8","subsystem":"notification","instanceType":"prod","processId":"1"}

For information about the topology source APIs, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.

Recovery: This alert is cleared automatically after 15 minutes.

For any assistance, contact My Oracle Support.

6.2.20 SCPSubscriptionFailureForNFType

Table 6-24 SCPSubscriptionFailureForNFType

Field Description
Severity Critical
Conditions increase(ocscp_subscription_nf_failure_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.2001
Description Alert is raised when SCP subscription to NRF fails. This alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod.
Recommended Actions

Cause: When the subscription fails for an NF Type with NRF.

Diagnostic Information:

Check whether NRF is up. To check the NRF status, see the Oracle Communications Cloud Native Core, Network Repository Function User Guide.

Check whether the NRF is reachable or not by using one of the following steps:
  • Run the ping command from primary or secondary nodes using IP of NRF. Example of a ping command: ping <IPAddress>.
  • Run the ping command from inside the pod if FQDN of NRF is used.
  • Instead of using ping, you can collect tcpdump for ensuring the connectivity. Example of a tcpdump: tcpdump -w capture.pcap -i <pod interface>.

If NRF is up, check scp-worker logs to find any error response from NRF. If there are error responses, monitor NRF logs: kubectl logs <pod name> -n <namespace>.

Sample logs:
{"instant":{"epochSecond":1620275506,"nanoOfSecond":134773910},"thread":"pool-8-thread-1","level":"ERROR","loggerName":"com.oracle.cgbu.cne.scp.soothsayer.subscription.processor.SubscriptionDataConsumer","message":"{logMsg=Exception occurred while handling action for subscriber data, action=TO_BE_RENEWED, stackTrace=com.oracle.cgbu.cne.scp.soothsayer.subscription.operations.NrfSubscriptionClient.triggerPatchRequest(NrfSubscriptionClient.java:177)\ncom.oracle.cgbu.cne.scp.soothsayer.subscription.processor.SubscriptionDataConsumer.handleAction(SubscriptionDataConsumer.java:322)\ncom.oracle.cgbu.cne.scp.soothsayer.subscription.processor.SubscriptionDataConsumer.consumeSQ(SubscriptionDataConsumer.java:140)\ncom.oracle.cgbu.cne.scp.soothsayer.subscription.processor.SubscriptionDataConsumer.run(SubscriptionDataConsumer.java:84)\njava.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)\njava.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)\njava.base/java.lang.Thread.run(Thread.java:832)}","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":33,"threadPriority":5,"messageTimestamp":"21-05-06 04:31:46.134+0000","application":"ocscp-soothsayer","microservice":"ocscp-scpc-subscription","engVersion":"1.15.0","mktgVersion":"1.15.0.0.0","vendor":"oracle","namespace":"scpsvc","node":"primary","pod":"ocscp-scpc-subscription-55cfb57cc6-2qp2g","subsystem":"subscription","instanceType":"prod","processId":"1"}

Recovery: This alert is cleared automatically when NRF is up and running or errors are corrected for received error responses.

For any assistance, contact My Oracle Support.

6.2.21 SCPReSubscriptionFailureForNFType

Table 6-25 SCPReSubscriptionFailureForNFType

Field Description
Severity Critical
Conditions increase(ocscp_patch_subscription_nf_failure_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.2002
Description Alert is raised when SCP re-subscription to NRF fails. This alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod.
Recommended Actions

Cause: When the re-subscription fails for an NF Type with NRF.

Diagnostic Information:

Check whether NRF is up. To check the NRF status, see the Oracle Communications Cloud Native Core, Network Repository Function User Guide.

Check whether the NRF is reachable or not by using one of the following steps:
  • Run the ping command from primary/secondary nodes by using IP of NRF. Example of a ping command: ping <IPAddress>.
  • Run the ping command from inside the pod, if FQDN of NRF is used.
  • Instead of using ping, you can collect tcpdump for ensuring the connectivity. Example of a tcpdump: tcpdump -w capture.pcap -i <pod interface>.

Check scp-worker logs to find any error response from NRF. If there are error responses, monitor NRF logs: kubectl logs <pod name> -n <namespace>.

Recovery: This alert is cleared automatically when NRF is up and running or errors are corrected for received error responses.

For any assistance, contact My Oracle Support.

6.2.22 SCPNrfRegistrationFailureForRegionOrSetId

Table 6-26 SCPNrfRegistrationFailureForRegionOrSetId

Field Description
Severity Major
Conditions increase(ocscp_nrf_registration_failure_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.2003
Description Alert is raised when SCP registration fails. This alert is labeled with specific nftype, nrfRegionOrSetId, and auditmethod.
Recommended Actions

Cause: When the registration fails for an NF Type with NRF.

Diagnostic Information:

Check whether NRF is up. To check the NRF status, see the Oracle Communications Cloud Native Core, Network Repository Function User Guide.

Check whether the NRF is reachable or not by using one of the following steps:
  • Run the ping command from primary or secondary nodes by using IP of NRF. Example of a ping command: ping <IPAddress>.
  • Run the ping command from inside the pod if FQDN of NRF is used.
  • Instead of using ping, you can collect tcpdump for ensuring the connectivity. Example of a tcpdump: tcpdump -w capture.pcap -i <pod interface>.

Check scp-worker logs to find any error response from NRF. If there are error responses, monitor NRF logs: kubectl logs <pod name> -n <namespace>.

Sample logs:

{"instant":{"epochSecond":1620638888,"nanoOfSecond":78728229},"thread":"registration-0","level":"ERROR","loggerName":"com.oracle.cgbu.cne.scp.soothsayer.subscription.processor.NrfRegistrationProcessor","message":"{logMsg=Registration will be retried after configured interval, configuredIntervalInSec=6}","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":33,"threadPriority":5,"messageTimestamp":"21-05-10 09:28:08.078+0000","application":"ocscp-soothsayer","microservice":"ocscp-scpc-subscription","engVersion":"25.1.100","mktgVersion":"25.1.100.0.0","vendor":"oracle","namespace":"scpsvc","node":"worker1","pod":"ocscp-scpc-subscription-66c68b9db6-6g582","subsystem":"subscription","instanceType":"prod","processId":"1"}

Recovery: This alert is cleared automatically when NRF is up and running or errors are corrected for received error responses.

For any assistance, contact My Oracle Support.

6.2.23 SCPNrfHeartbeatFailureForRegionOrSetId

Table 6-27 SCPNrfHeartbeatFailureForRegionOrSetId

Field Description
Description Alert is raised when the SCP heartbeat to NRF fails.
Summary SCP Heartbeat to NRF Failure for Region Or SetId: {{$labels.nrfRegionOrSetId}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scpfqdn: {{$labels.ocscp_fqdn}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Conditions increase(ocscp_subscription_nrf_heartbeat_failures_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.2004
Recommended Actions

Cause: When the heartbeat fails for an NF Type with NRF.

Diagnostic Information:

Check whether NRF is up. To check the NRF status, see the Oracle Communications Cloud Native Core, Network Repository Function User Guide.

Check whether the NRF is reachable or not by using one of the following steps:
  • Run the ping command from primary or secondary nodes by using IP of NRF. Example of a ping command: ping <IPAddress>.
  • Run the ping command from inside the pod if FQDN of NRF is used.
  • Instead of using ping, you can collect tcpdump for ensuring the connectivity. Example of a tcpdump: tcpdump -w capture.pcap -i <pod interface>.

Check scp-worker logs to find any error response from NRF. If there are error responses, monitor NRF logs: kubectl logs <pod name> -n <namespace>.

Recovery: This alert is cleared automatically when NRF is up and running or errors are corrected for received error responses.

For any assistance, contact My Oracle Support.

6.2.24 SCPDBOperationFailure

Table 6-28 SCPDBOperationFailure

Field Description
Severity Warning
Conditions increase(ocscp_db_operation_failure_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.2005
Description Alert is raised for any DB operation failures.
Recommended Actions

Cause: When the SCP DB operation fails.

Diagnostic Information:

Check whether the DB service is up.

Check the status and age of the MySQL pod by using the following command: kubectl get pods -n <namespace>, where <namespace> is the namespace in which the MySQL pod is deployed.

Recovery: This alert is cleared automatically when the DB service is up and running.

For any assistance, contact My Oracle Support.

6.2.25 SCPGeneratedErrorsResponseForNFService

Table 6-29 SCPGeneratedErrorsResponseForNFService

Field Description
Severity Info
Conditions increase(ocscp_metric_scp_generated_response_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.7016
Description Alert is raised for the NF type for which an SCP-generated response is triggered.
Recommended Actions

Cause: When the error response is generated for NF Service Type by SCP.

Diagnostic Information:

Monitor scp-worker logs to determine the reason for error responses generated by SCP. Check for error reason in the logs: kubectl logs <pod name> -n <namespace>.

Recovery: This alert is cleared automatically when the cause of the error response at the SCP worker is corrected and the required configuration is in place.

For any assistance, contact My Oracle Support.

6.2.26 SCPCircuitBreakingAppliedForNF

Table 6-30 SCPCircuitBreakingAppliedForNF

Field Description
Severity Info
Conditions ocscp_circuit_breaking_applied > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.7017
Description Alert is raised for NF when circuit breaking is applied.
Recommended Actions

Cause: When circuit breaking is applied for a producer NF FQDN based on the configured http2MaxRequests value.

Diagnostic Information:

Monitor scp-worker logs for the number of error responses when outstanding requests exceed the configured http2MaxRequests value: kubectl logs <pod name> -n <namespace>.

Check the latency to upstream producer from SCP. Use the following metric to check the same: ocscp_metric_upstream_service_time_total(ocscp_producer_host or ocscp_nf_end_point).

Recovery: This alert is cleared automatically when http2MaxRequests is configured to a value higher than the traffic at the worker, or when the traffic is reduced below the value configured for circuit breaking.

For any assistance, contact My Oracle Support.

6.2.27 SCPUpgradeStarted

Table 6-31 SCPUpgradeStarted

Field Description
Severity Info
Conditions When the SCP upgrade process for an SCP microservice starts.
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.6001
Description Alert is raised when the SCP upgrade process for an SCP microservice starts.
Recommended Actions

Cause: When SCP upgrade is performed for a particular microservice.

Diagnostic Information: Not applicable.

Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the ocscp_values.yaml file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is true.

For any assistance, contact My Oracle Support.
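
A minimal sketch of how these parameters might appear in the ocscp_values.yaml file is shown below. The parameter names are taken from this section; their exact location in the file hierarchy and the unit of the duration value (assumed here to be seconds) must be verified against the delivered values file:

  customAlertExpiryEnabled: true
  # Duration after which upgrade or rollback alerts are cleared (assumed unit: seconds)
  customAlertExpiryDuration: 300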

6.2.28 SCPUpgradeFailed

Table 6-32 SCPUpgradeFailed

Field Description
Severity Critical
Conditions When any SCP microservice upgrade fails during the upgrade process.
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.6002
Description Alert is raised when any SCP microservice upgrade fails.
Recommended Actions

Cause: When any SCP microservice upgrade fails during the upgrade process.

Diagnostic Information: Monitor new hook-jobs that might have failed after multiple attempts. Also, check the logs of any failed jobs.

Run the following command to check the pod of hook-job: kubectl get pods -n <namespace>.

Run the following command to check the logs: kubectl logs <pod name> -n <namespace>.

Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the ocscp_values.yaml file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is true.

For any assistance, contact My Oracle Support.

6.2.29 SCPUpgradeSuccessful

Table 6-33 SCPUpgradeSuccessful

Field Description
Severity Info
Conditions When any SCP microservice upgrade is completed.
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.6003
Description Alert is raised when any SCP microservice upgrade is completed.
Recommended Actions

Cause: When any SCP microservice upgrade is completed.

Diagnostic Information: Not applicable.

Run the following command to check the pod of hook-job: kubectl get pods -n <namespace>.

Run the following command to check the logs: kubectl logs <pod name> -n <namespace>.

Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the ocscp_values.yaml file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is true.

For any assistance, contact My Oracle Support.

6.2.30 SCPRollbackStarted

Table 6-34 SCPRollbackStarted

Field Description
Severity Info
Conditions When the rollback process for an SCP microservice starts.
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.6004
Description Alert is raised when the rollback process for an SCP microservice starts.
Recommended Actions

Cause: When the rollback process for an SCP microservice starts.

Diagnostic Information: Not applicable.

Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the ocscp_values.yaml file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is true.

For any assistance, contact My Oracle Support.

6.2.31 SCPRollbackFailed

Table 6-35 SCPRollbackFailed

Field Description
Severity Critical
Conditions When any SCP microservice rollback fails during the rollback process.
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.6005
Description Alert is raised when any SCP microservice rollback fails.
Recommended Actions

Cause: When any SCP microservice rollback fails during the rollback process.

Diagnostic Information: Monitor new hook-jobs that might have failed after multiple attempts. Also, check the logs of any failed jobs.

Run the following command to check the pod of hook-job:
kubectl get pods -n <namespace>

Run the following command to check the logs:

kubectl logs <pod name> -n <namespace>

Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the ocscp_values.yaml file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is true.

For any assistance, contact My Oracle Support.

6.2.32 SCPRollbackSuccessful

Table 6-36 SCPRollbackSuccessful

Field Description
Severity Info
Conditions When any SCP microservice rollback is completed.
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.6006
Description Alert is raised when any SCP microservice rollback is completed.
Recommended Actions

Cause: When any SCP microservice rollback is completed.

Diagnostic Information: Not applicable.

Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the ocscp_values.yaml file. Otherwise, it is cleared after a specific duration as specified in the customAlertExpiryDuration parameter when the customAlertExpiryEnabled value is true.

For any assistance, contact My Oracle Support.

6.2.33 ScpWorkerPodCpuUtilizationAboveWarnThreshold

Table 6-37 ScpWorkerPodCpuUtilizationAboveWarnThreshold

Field Details
Description CPU utilization of SCP worker at warn level
Summary CPU utilization of SCP worker at warn level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}
Severity Warning
Condition This alert is raised when CPU utilization of scp-worker reaches the WARN level.

increase(ocscp_worker_pod_overload_control_cpu_utilization_total{ocscp_threshold_level="WARN"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.7018
Metric Used ocscp_worker_pod_overload_control_cpu_utilization_warn
Recommended Action

Cause: When CPU utilization of scp-worker reaches the WARN level.

Diagnostic Information:
  1. Get the configured threshold level values using Pod Overload Control Policy REST APIs. For more information about Pod Overload Control Policy, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.
  2. Check the CPU threshold level status from the scp-worker pod logs under the warn level.

Recovery:

  • Reduce the incoming service request rate.
  • This alert is automatically cleared when CPU utilization is reduced to below WARN threshold level.

For any assistance, contact My Oracle Support.

6.2.34 ScpWorkerPodCpuUtilizationAboveMinorThreshold

Table 6-38 ScpWorkerPodCpuUtilizationAboveMinorThreshold

Field Details
Description CPU utilization of SCP worker at minor level
Summary Worker CPU utilization lead to minor level.namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Minor
Condition This alert is raised when CPU utilization of scp-worker reaches the MINOR level.

increase(ocscp_worker_pod_overload_control_cpu_utilization_total{ocscp_threshold_level="MINOR"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.7019
Metric Used ocscp_worker_pod_overload_control_cpu_utilization_minor
Recommended Action

Cause: When CPU utilization of scp-worker reaches the MINOR level.

Diagnostic Information:
  1. Get the configured threshold level values using Pod Overload Control Policy REST APIs. For more information about Pod Overload Control Policy, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.
  2. Check the CPU threshold level status from the scp-worker pod logs under the minor level.

Recovery:

  • Reduce the incoming service request rate.
  • This alert is automatically cleared when CPU utilization is reduced to below MINOR threshold level.

For any assistance, contact My Oracle Support.

6.2.35 ScpWorkerPodCpuUtilizationAboveMajorThreshold

Table 6-39 ScpWorkerPodCpuUtilizationAboveMajorThreshold

Field Details
Description CPU utilization of SCP worker at major level
Summary Worker CPU utilization lead to major level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Condition This alert is raised when CPU utilization of scp-worker reaches the MAJOR level.

increase(ocscp_worker_pod_overload_control_cpu_utilization_total{ocscp_threshold_level="MAJOR"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.7020
Metric Used ocscp_worker_pod_overload_control_cpu_utilization_major
Recommended Action

Cause: When CPU utilization of scp-worker reaches the MAJOR level.

Diagnostic Information:
  1. Get the configured threshold level values using Pod Overload Control Policy REST APIs. For more information about Pod Overload Control Policy, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.
  2. Check the CPU threshold level status from the scp-worker pod logs under the major level.

Recovery:

  • Reduce the incoming service request rate.
  • This alert is automatically cleared when CPU utilization is reduced to below MAJOR threshold level.

For any assistance, contact My Oracle Support.

6.2.36 ScpWorkerPodCpuUtilizationAboveCriticalThreshold

Table 6-40 ScpWorkerPodCpuUtilizationAboveCriticalThreshold

Field Details
Description CPU utilization of SCP worker at critical level
Summary Worker CPU utilization lead to critical level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Critical
Condition This alert is raised when CPU utilization of scp-worker reaches the CRITICAL level.

increase(ocscp_worker_pod_overload_control_cpu_utilization_total{ocscp_threshold_level="CRITICAL"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.7021
Metric Used ocscp_worker_pod_overload_control_cpu_utilization_critical
Recommended Action

Cause: When CPU utilization of scp-worker reaches the CRITICAL level.

Diagnostic Information:
  1. Get the configured threshold level values using Pod Overload Control Policy REST APIs. For more information about Pod Overload Control Policy, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.
  2. Check the CPU threshold level status in the scp-worker pod logs at the WARN log level.

Recovery:

  • Reduce the incoming service request rate.
  • This alert is automatically cleared when CPU utilization drops below the CRITICAL threshold level.

For any assistance, contact My Oracle Support.

6.2.37 SCPUnhealthyPeerSCPDetected

Table 6-41 SCPUnhealthyPeerSCPDetected

Field Details
Description Next hop SCP is marked unhealthy
Summary 'Next hop SCP is marked unhealthy. peerscphost: {{$labels.peerScpName}}, scpFqdn: {{$labels.scpFqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}'
Severity Info
Condition This alert is raised when peer SCP is marked as unhealthy.

ocscp_peer_scp_unhealthy > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.7022
Metric Used ocscp_peer_scp_unhealthy
Recommended Action

Cause: The peer SCP is marked as unhealthy because of consecutive failure responses.

Diagnostic Information:
  1. Check transport failures and routing errors on the peer SCP.
  2. Run the ping command from the primary or worker nodes using the IP address of the service.

    Sample command: ping <IPAddress>.

  3. If the FQDN of the service is used, run the ping command from inside the pod.
  4. If the pod does not support the ping command, start a debug container for the SCP pod (see the sample command after this list).
  5. If you do not want to use the ping command, collect a tcpdump capture to verify connectivity.

    Sample command: tcpdump -w capture.pcap -i <pod interface>
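
For step 4, assuming the cluster supports ephemeral debug containers, a debug container can be attached to the SCP pod as follows; the image, pod, container, and namespace names are placeholders:

    Sample command: kubectl debug -it <scp pod name> -n <namespace> --image=busybox --target=<container name>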

Recovery: This alert is automatically cleared after the degradation time is over. Degradation time = number of consecutive degradations multiplied by the configured base ejection time.
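
For example, assuming a configured base ejection time of 30 seconds, a peer SCP that has degraded three consecutive times remains marked unhealthy for 3 x 30 = 90 seconds, after which the alert clears.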

For any assistance, contact My Oracle Support.

6.2.38 SCPDnsSrvQueryFailure

Table 6-42 SCPDnsSrvQueryFailure

Field Details
Description DNS SRV Query failed with cause {{$labels.cause}}
Summary 'DNS SRV Query failed with cause {{$labels.cause}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when the DNS server lookup for SRV fails due to network, servfail, or timeout errors.

ocscp_alternate_resolution_dnssrv_rx_error_res == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.8001
Metric Used ocscp_alternate_resolution_dnssrv_rx_error_res
Recommended Action

Cause: When the dnsSRVAlternateRouting flag is set to true and the DNS SRV lookup fails due to network, servfail, or timeout errors.

Diagnostic Information: Check the DNS SRV server status and restore it to normal (see the sample query below).
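
As an illustration, assuming the dig utility is available on a node that can reach the configured DNS server, the SRV record can be queried manually; the record name and server IP address are placeholders:

    Sample command: dig +short SRV <nrf srv record name> @<dns server ip>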

Recovery: This alert is automatically cleared when SCP performs a successful DNS SRV query.

For any assistance, contact My Oracle Support.

6.2.39 SCPProducerOverloadThrottled

Table 6-43 SCPProducerOverloadThrottled

Field Details
Description Producer is in Throttled Overload state
Summary Producer is in Throttled Overload state. ocscp_peer_fqdn: {{$labels.ocscp_peer_fqdn}}, ocscp_peer_nf_instance_id: {{$labels.ocscp_peer_nf_instance_id}}, ocscp_peer_service_instance_id: {{$labels.ocscp_peer_service_instance_id}}, scpfqdn: {{$labels.ocscp_fqdn}}, podname: {{$labels.kubernetes_pod_name}}, namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Info
Condition ocscp_load_manager_peer_load_throttled_threshold == 1
OID 1.3.6.1.4.1.323.5.3.35.1.2.7023
Metric Used ocscp_load_manager_peer_load_throttled_threshold
Recommended Action

Cause: When the load of the producer NF is higher than the throttled threshold configured for the service.

Diagnostic Information:
  1. Check and configure the throttled threshold for each service using Routing Options REST APIs configurations as described in "Configuring Routing Options" in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.
  2. Check the load for each service using the NF Profile REST APIs as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.

Recovery: This alert clears automatically when the NF profile is deregistered or updated with a load less than the throttled abatement threshold.

For any assistance, contact My Oracle Support.

6.2.40 SCPProducerOverloadAlternateRouted

Table 6-44 SCPProducerOverloadAlternateRouted

Field Details
Description Producer is in Alternate Route Overload state
Summary Producer is in Alternate Route Overload state. ocscp_peer_fqdn: {{$labels.ocscp_peer_fqdn}}, ocscp_peer_nf_instance_id: {{$labels.ocscp_peer_nf_instance_id}}, ocscp_peer_service_instance_id: {{$labels.ocscp_peer_service_instance_id}}, scpfqdn: {{$labels.ocscp_fqdn}}, podname: {{$labels.kubernetes_pod_name}}, namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Info
Condition ocscp_load_manager_peer_load_alternateRoute_threshold == 1
OID 1.3.6.1.4.1.323.5.3.35.1.2.7024
Metric Used ocscp_load_manager_peer_load_alternateRoute_threshold
Recommended Action

Cause: When the load of the producer NF is higher than the alternate routing threshold configured for the service.

Diagnostic Information:
  1. Check and configure the alternate routing threshold for each service using Routing Options REST APIs configurations as described in "Configuring Routing Options" in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.
  2. Check the load for each service using the NF Profile REST APIs as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide.

Recovery: This alert clears automatically when the NF profile is deregistered or updated with a load less than the alternate routing abatement threshold.

For any assistance, contact My Oracle Support.

6.2.41 SCPSeppNotConfigured

Table 6-45 SCPSeppNotConfigured

Field Details
Description SEPP is not configured for PLMN
Summary 'SEPP is not configured for PLMN. plmnid: {{$labels.plmn_id}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when Security Edge Protection Proxy (SEPP) is not configured.

ocscp_metric_sepp_not_configured_current == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.7025
Metric Used ocscp_metric_sepp_not_configured_current
Recommended Action

Cause: When SEPP routing rules are not configured at SCP for the selected PLMN in inter-PLMN routing.

Diagnostic Information:
  1. Check whether the SEPP profile is registered or the SEPP-related configuration exists at SCP.
  2. Verify the routing rules created for the selected inter-PLMN in the PLMN_SEPP_MAPPING table.

Recovery: This alert clears automatically when the SEPP-related routing rules are created at SCP for the selected PLMN in inter-PLMN routing.

For any assistance, contact My Oracle Support.

6.2.42 SCPSeppRoutingFailed

Table 6-46 SCPSeppRoutingFailed

Field Details
Description Routing towards SEPP failed
Summary 'Routing towards SEPP failed. sepp_fqdn: {{$labels.ocscp_sepp_fqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Minor
Condition This alert is raised when routing towards SEPP fails.

ocscp_metric_sepp_routing_attempt_fail_current == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.7026
Metric Used ocscp_metric_sepp_routing_attempt_fail_current
Recommended Action

Cause: Inter-PLMN routing failed for the selected SEPP instances.

Diagnostic Information:
  1. Check whether selected SEPP is up and healthy.
  2. Check if selected SEPP is reachable.
  3. Run the ping command from the primary or secondary nodes using the IP address of the SEPP.

    Sample command: ping <IPAddress>

  4. If the FQDN of the SEPP is used, run the ping command from inside the pod.
  5. As an alternative to ping, collect a tcpdump capture to verify connectivity.

    Sample command: tcpdump -w capture.pcap -i <pod interface>

  6. Check scp-worker logs for any error response from the selected SEPP. If there are error responses, monitor the selected SEPP logs by running this command: kubectl logs <pod name> -n <namespace>

Recovery: This alert clears automatically when routing is successful for selected SEPP.

For any assistance, contact My Oracle Support.

6.2.43 SCPGlobalEgressRLRemoteParticipantConnectivityFailure

Table 6-47 SCPGlobalEgressRLRemoteParticipantConnectivityFailure

Field Details
Description SCP Global Egress RL Remote Participant Connectivity Failure for participant
Summary 'SCP Global Egress RL Remote Participant Connectivity Failure for participant: {{$labels.scp_remote_coh_cluster_name}}, scp_fqdn: {{$labels.scp_fqdn}}, scp_local_coh_cluster_name: {{$labels.scp_local_coh_cluster_name}}, scp_remote_coh_cluster_fqdn: {{$labels.scp_remote_coh_cluster_fqdn }}, scp_remote_coh_cluster_port: {{$labels.scp_remote_coh_cluster_port }}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when the remote participant SCP connection is not established or goes down.

ocscp_global_egress_rl_remote_participant_connectivity_failure == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9001
Metric Used ocscp_global_egress_rl_remote_participant_connectivity_failure
Recommended Action

Cause: When the remote participant SCP connection is not established or goes down.

Diagnostic Information:
  1. Check whether the FQDN or IP address and port of the remote SCP's scp-cache microservice are configured correctly.
  2. Check whether clusterName and NFInstanceID are configured for the remote SCP.
  3. Check whether the communication path between the scp-cache microservices of the two SCPs is active (see the sample command after this list).
  4. Run the following command to monitor scp-cache logs: kubectl logs <pod name> -n <namespace>
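
For step 3, assuming the netcat (nc) utility is available in the environment, basic TCP reachability of the remote scp-cache endpoint can be checked; the FQDN and port are placeholders:

    Sample command: nc -zv <remote scp-cache fqdn> <port>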

Recovery: This alert clears automatically if the connection is established with the remote participant SCP.

For any assistance, contact My Oracle Support.

6.2.44 SCPGlobalEgressRLRemoteParticipantWithDuplicateNFInstanceId

Table 6-48 SCPGlobalEgressRLRemoteParticipantWithDuplicateNFInstanceId

Field Details
Description This alert is raised when a duplicate remote coherence participant is found.
Summary SCP Global Egress RL Remote Participant Configured With Duplicate NFInstanceId for participant: {{$labels.scp_remote_coh_cluster_name}}, scp_fqdn: {{$labels.ocscp_fqdn}}, scp_nf_instance_id: {{$labels.scp_nf_instance_id}}, scp_local_coh_cluster_name: {{$labels.scp_local_coh_cluster_name}}, scp_remote_coh_cluster_fqdn: {{$labels.scp_remote_coh_cluster_fqdn }}, scp_remote_coh_cluster_port: {{$labels.scp_remote_coh_cluster_port }}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Condition ocscp_global_egress_rl_remote_participant_is_duplicate == 1
OID 1.3.6.1.4.1.323.5.3.35.1.2.9002
Metric Used ocscp_global_egress_rl_remote_participant_is_duplicate
Recommended Action

Cause: Duplicate configuration of remote coherence participants with local SCP.

Diagnostic Information:
  1. Ensure Global Rate Limit feature is enabled.
  2. Check whether the clusterName and NFInstanceID of local SCPs and remote SCPs are not duplicates.
  3. Check whether the clusterName and NFInstanceID of the first remote SCP and the second remote SCP are not duplicates.
  4. Configure unique values for clusterName and NFInstanceID for local SCPs as well as remote SCPs.

Recovery:

  • This alert is cleared automatically if no duplicate configurations between local and remote SCPs are found.

For any assistance, contact My Oracle Support.

6.2.45 SCPMediationConnectivityFailure

Table 6-49 SCPMediationConnectivityFailure

Field Details
Description SCP Mediation Connectivity Failed
Summary 'SCP Mediation Connectivity Failed, scp_fqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when the Mediation connection is not established or a request to Mediation is unsuccessful.

ocscp_mediation_http_not_reachable == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9002
Metric Used ocscp_mediation_http_not_reachable
Recommended Action

Cause: The remote Mediation connection is not established or a request to Mediation is unsuccessful.

Diagnostic Information:
  1. Check the errors of the ocscp_mediation_http_not_reachable metric on the Grafana dashboard.
  2. Run the following command to check the status of the mediation pod:
    kubectl get pods -n <namespace>
Recovery:
  1. If the mediation pod is not in the ready state, run the following command to check the scp-mediation logs:
    kubectl logs <pod name> -n <namespace>
  2. If the mediation pod is absent, redeploy or upgrade SCP with mediationService set to true (see the sketch after this list).
This alert clears automatically when the connection is established with Mediation after Mediation is invoked from any of the trigger points.
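
As a sketch of step 2, assuming SCP is deployed with Helm and that mediationService is the values key used by your release (verify the exact key in the deployment values file), the upgrade could look like the following; the release, chart, and namespace names are placeholders:

    Sample command: helm upgrade <release name> <scp chart> -n <namespace> --set mediationService=true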

For any assistance, contact My Oracle Support.

6.2.46 SCPNotificationQueuesUtilizationAboveMinorThreshold

Table 6-50 SCPNotificationQueuesUtilizationAboveMinorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Minor Threshold'
Summary 'SCP Notification Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Minor
Condition This alert is raised when the queues in the notification service are utilized above 65% of the maximum size (the user-configured minor threshold value).

ocscp_notification_queue_alert{severity="MINOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.3009
Metric Used ocscp_notification_queue_alert
Recommended Action

Cause: The Notification module is getting more traffic than expected.

Diagnostic Information:
  1. Monitor Notification traffic to the pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_notification_queue_utilization (see the sample query after this list).
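
To inspect the queue utilization outside Grafana, the same metric can be queried with promtool, assuming it is installed and can reach Prometheus; the host and port are placeholders:

    Sample command: promtool query instant http://<prometheus-host>:9090 'ocscp_notification_queue_utilization'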

Recovery: This alert clears automatically when notification traffic goes below the minor threshold or exceeds the major threshold.

For any assistance, contact My Oracle Support.

6.2.47 SCPNotificationQueuesUtilizationAboveMajorThreshold

Table 6-51 SCPNotificationQueuesUtilizationAboveMajorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Major Threshold'
Summary 'SCP Notification Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when the queues in the notification service are utilized above 75% of the maximum size (the user-configured major threshold value).

ocscp_notification_queue_alert{severity="MAJOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.3010
Metric Used ocscp_notification_queue_alert
Recommended Action

Cause: The Notification module is getting more traffic than expected.

Diagnostic Information:
  1. Monitor Notification traffic to the pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_notification_queue_utilization

Recovery: This alert clears automatically when notification traffic goes below the major threshold or exceeds the critical threshold.

For any assistance, contact My Oracle Support.

6.2.48 SCPNotificationQueuesUtilizationAboveCriticalThreshold

Table 6-52 SCPNotificationQueuesUtilizationAboveCriticalThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Critical Threshold'
Summary 'SCP Notification Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when the queues in the notification service are utilized above 85% of the maximum size (the user-configured critical threshold value).

ocscp_notification_queue_alert{severity="CRITICAL"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.3011
Metric Used ocscp_notification_queue_alert
Recommended Action

Cause: The Notification module is getting more traffic than expected.

Diagnostic Information:
  1. Monitor Notification traffic to the pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_notification_queue_utilization

Recovery: This alert clears automatically when notification traffic goes below the critical threshold.

For any assistance, contact My Oracle Support.

6.2.49 SCPNrfProxyQueuesUtilizationAboveMinorThreshold

Table 6-53 SCPNrfProxyQueuesUtilizationAboveMinorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Minor Threshold'
Summary 'SCP Nrfproxy Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Minor
Condition This alert is raised when the task queues in the scp-nrfproxy service are utilized above 65% of the maximum size (the user-configured minor threshold value).

ocscp_nrfproxy_queue_alert{severity="MINOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9010
Metric Used ocscp_nrfproxy_queue_alert
Recommended Action

Cause: The NrfProxy task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the nrfProxy pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_nrfproxy_queue_utilization

Recovery: This alert clears automatically when the traffic goes below the minor threshold or above the major threshold.

For any assistance, contact My Oracle Support.

6.2.50 SCPNrfProxyQueuesUtilizationAboveMajorThreshold

Table 6-54 SCPNrfProxyQueuesUtilizationAboveMajorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Major Threshold'
Summary 'SCP Nrfproxy Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when the task queues in the scp-nrfproxy service are utilized above 75% of the maximum size (the user-configured major threshold value).

ocscp_nrfproxy_queue_alert{severity="MAJOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9011
Metric Used ocscp_nrfproxy_queue_alert
Recommended Action

Cause: The NrfProxy task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the nrfProxy pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_nrfproxy_queue_utilization

Recovery: This alert clears automatically when the traffic goes below the major threshold or above the critical threshold.

For any assistance, contact My Oracle Support.

6.2.51 SCPNrfProxyQueuesUtilizationAboveCriticalThreshold

Table 6-55 SCPNrfProxyQueuesUtilizationAboveCriticalThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Critical Threshold'
Summary 'SCP Nrfproxy Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when the task queues in the scp-nrfproxy service are utilized above 85% of the maximum size (the user-configured critical threshold value).

ocscp_nrfproxy_queue_alert{severity="CRITICAL"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9012
Metric Used ocscp_nrfproxy_queue_alert
Recommended Action

Cause: The NrfProxy task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the nrfProxy pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_nrfproxy_queue_utilization

Recovery: This alert clears automatically when the traffic goes below the critical threshold.

For any assistance, contact My Oracle Support.

6.2.52 SCPWorkerQueuesUtilizationAboveMinorThreshold

Table 6-56 SCPWorkerQueuesUtilizationAboveMinorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Minor Threshold'
Summary 'SCP Worker Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Minor
Condition This alert is raised when the task queues in the scp-worker service are utilized above 65% of the maximum size (the user-configured minor threshold value).

ocscp_worker_queue_alert{severity="MINOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9007
Metric Used ocscp_worker_queue_alert
Recommended Action

Cause: The worker task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the scp-worker pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_worker_queue_utilization

Recovery: This alert clears automatically when the traffic goes below the minor threshold or above the major threshold.

For any assistance, contact My Oracle Support.

6.2.53 SCPWorkerQueuesUtilizationAboveMajorThreshold

Table 6-57 SCPWorkerQueuesUtilizationAboveMajorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Major Threshold'
Summary 'SCP Worker Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when the task queues in the scp-worker service are utilized above 75% of the maximum size (the user-configured major threshold value).

ocscp_worker_queue_alert{severity="MAJOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9008
Metric Used ocscp_worker_queue_alert
Recommended Action

Cause: The worker task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the scp-worker pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_worker_queue_utilization

Recovery: This alert clears automatically when the traffic goes below the major threshold or goes above the critical threshold.

For any assistance, contact My Oracle Support.

6.2.54 SCPWorkerQueuesUtilizationAboveCriticalThreshold

Table 6-58 SCPWorkerQueuesUtilizationAboveCriticalThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Critical Threshold'
Summary 'SCP Worker Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when the task queues in the scp-worker service are utilized above 85% of the maximum size (the user-configured critical threshold value).

ocscp_worker_queue_alert{severity="CRITICAL"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9009
Metric Used ocscp_worker_queue_alert
Recommended Action

Cause: The worker task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the scp-worker pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_worker_queue_utilization

Recovery: This alert clears automatically when the traffic goes below the critical threshold.

For any assistance, contact My Oracle Support.

6.2.55 SCPCacheQueuesUtilizationAboveMinorThreshold

Table 6-59 SCPCacheQueuesUtilizationAboveMinorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Minor Threshold'
Summary 'SCP Cache Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Minor
Condition This alert is raised when the task queues in the scp-cache service are utilized above 65% of their maximum size (the user-configured minor threshold value).

ocscp_cache_queue_alert{severity="MINOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.13002
Metric Used ocscp_cache_queue_utilization
Recommended Action

Cause: When the cache task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the cache pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_cache_queue_utilization.

Recovery: The alert is cleared automatically when traffic falls below the minor threshold or goes above the major threshold.

For any assistance, contact My Oracle Support.

6.2.56 SCPCacheQueuesUtilizationAboveMajorThreshold

Table 6-60 SCPCacheQueuesUtilizationAboveMajorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Major Threshold'
Summary 'SCP Cache Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when the task queues in the scp-cache service are utilized above 75% of their maximum size (the user-configured major threshold value).

ocscp_cache_queue_alert{severity="MAJOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.13001
Metric Used ocscp_cache_queue_utilization
Recommended Action

Cause: When the cache task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the cache pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_cache_queue_utilization.

Recovery: The alert is cleared automatically when traffic falls below the major threshold or goes above the critical threshold.

For any assistance, contact My Oracle Support.

6.2.57 SCPCacheQueuesUtilizationAboveCriticalThreshold

Table 6-61 SCPCacheQueuesUtilizationAboveCriticalThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Critical Threshold'
Summary 'SCP Cache Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when the task queues in the scp-cache service are utilized above 85% of their maximum size (the user-configured critical threshold value).

ocscp_cache_queue_alert{severity="CRITICAL"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.13000
Metric Used ocscp_cache_queue_utilization
Recommended Action

Cause: When the cache task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the cache pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_cache_queue_utilization.

Recovery: The alert is cleared automatically when traffic falls below the critical threshold.

For any assistance, contact My Oracle Support.

6.2.58 SCPLoadManagerQueuesUtilizationAboveMinorThreshold

Table 6-62 SCPLoadManagerQueuesUtilizationAboveMinorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Minor Threshold'
Summary SCP Load Manager Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Minor
Condition

This alert is raised when the task queues in the scp-load-manager service are utilized above 65% of their maximum size (the user-configured minor threshold value).

ocscp_load_manager_queue_alert{severity="MINOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.11002
Metric Used ocscp_load_manager_queue_utilization
Recommended Action

Cause: When the load manager task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the load manager pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_load_manager_queue_utilization.

Recovery: The alert is cleared automatically when traffic falls below the minor threshold.

For any assistance, contact My Oracle Support.

6.2.59 SCPLoadManagerQueuesUtilizationAboveMajorThreshold

Table 6-63 SCPLoadManagerQueuesUtilizationAboveMajorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Major Threshold'
Summary 'SCP Load Manager Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when the task queues in the scp-load-manager service are utilized above 75% of their maximum size (the user-configured major threshold value).

ocscp_load_manager_queue_alert{severity="MAJOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.11001
Metric Used ocscp_load_manager_queue_utilization
Recommended Action

Cause: When the load manager task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the load manager pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_load_manager_queue_utilization.

Recovery: The alert is cleared automatically when traffic falls below the major threshold.

For any assistance, contact My Oracle Support.

6.2.60 SCPLoadManagerQueuesUtilizationAboveCriticalThreshold

Table 6-64 SCPLoadManagerQueuesUtilizationAboveCriticalThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Critical Threshold'
Summary 'SCP Load Manager Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when the task queues in the scp-load-manager service are utilized above 85% of their maximum size (the user-configured critical threshold value).

ocscp_load_manager_queue_alert{severity="CRITICAL"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.11000
Metric Used ocscp_load_manager_queue_utilization
Recommended Action

Cause: When the load manager task queues are filling up and traffic is higher than expected.

Diagnostic Information:
  1. Monitor traffic towards the load manager pod using the KPI dashboard.
  2. Refer to the rate of the following metric on the Grafana dashboard: ocscp_load_manager_queue_utilization.

Recovery: The alert is cleared automatically when traffic falls below the critical threshold.

For any assistance, contact My Oracle Support.

6.2.61 SCPProducerNfSetUnhealthy

Table 6-65 SCPProducerNfSetUnhealthy

Field Details
Description All producer NFs in NF set are marked unhealthy
Summary All producer NFs in NF set are marked unhealthy. nfSet: {{$labels.ocscp_nf_setid}}, scpFqdn: {{$labels.ocscp_fqdn}} , namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Info
Condition ocscp_metric_nf_set_unhealthy == 1
OID 1.3.6.1.4.1.323.5.3.35.1.2.7027
Metric Used ocscp_metric_nf_set_unhealthy
Recommended Action

Cause: All the producer NFs are marked unhealthy because of consecutive failure responses.

Diagnostic Information:
  1. Check transport failures and routing errors on producer NFs.
  2. Run the ping command from the primary or secondary nodes using the IP address of the service.

    Sample command: ping <IPAddress>.

  3. If the FQDN of the service is used, run the ping command from inside the pod.
  4. If the pod does not support the ping command, start a debug container for the SCP pod.
  5. If you do not want to use the ping command, collect a tcpdump capture to verify connectivity.

    Sample command: tcpdump -w capture.pcap -i <pod interface>

Recovery: This alert is automatically cleared after the degradation time is over. Degradation time = number of consecutive degradations multiplied by the configured base ejection time.

For any assistance, contact My Oracle Support.

6.2.62 SCPPeerSeppUnhealthy

Table 6-66 SCPPeerSeppUnhealthy

Field Details
Description Peer Sepp is marked unhealthy
Summary Peer Sepp is marked unhealthy. seppFqdn: {{$labels.ocscp_sepp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Info
Condition ocscp_sepp_unhealthy == 1
OID 1.3.6.1.4.1.323.5.3.35.1.2.7028
Metric Used ocscp_sepp_unhealthy
Recommended Action

Cause: The peer SEPP is marked unhealthy because of consecutive failure responses.

Diagnostic Information:
  1. Check transport failures and routing errors on the peer SEPP.
  2. Run the ping command from the primary or secondary nodes using the IP address of the service.

    Sample command: ping <IPAddress>.

  3. If the FQDN of the service is used, run the ping command from inside the pod.
  4. If the pod does not support the ping command, start a debug container for the SCP pod.
  5. If you do not want to use the ping command, collect a tcpdump capture to verify connectivity.

    Sample command: tcpdump -w capture.pcap -i <pod interface>

Recovery: This alert is automatically cleared after the degradation time is over. Degradation time = number of consecutive degradations multiplied by the configured base ejection time.

For any assistance, contact My Oracle Support.

6.2.63 SCPMicroServiceUnreachable

Table 6-67 SCPMicroServiceUnreachable

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}: SCP communication between the micro-services indicated by source and destination has failed'
Summary 'SCP communication between the micro-services indicated by source and destination has failed: {{$labels.instance}}, namespace: {{$labels.namespace}}, source:{{$labels.source}}, destination: {{$labels.destination}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when the communication between SCP microservices indicated by source and destination has failed.

ocscp_metric_svc_unreachable==1

OID 1.3.6.1.4.1.323.5.3.35.1.2.7029
Metric Used ocscp_metric_svc_unreachable
Recommended Action

Cause: Communication between SCP microservices has failed.

Diagnostic Information: Verify that the pods backing all SCP services are in the Running and Ready state and that each service has endpoints (see the sample commands below). If not, restart the affected services.
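
A quick way to perform this check, assuming standard kubectl access; the namespace is a placeholder:

    Sample commands:
    kubectl get pods -n <namespace> -o wide
    kubectl get endpoints -n <namespace>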

Recovery: This alert clears automatically when the required services are in Running and Ready state.

For any assistance, contact My Oracle Support.

6.2.64 SCPTrafficFeedSendFailed

Table 6-68 SCPTrafficFeedSendFailed

Field Details
Description Sending messages to Traffic Feed failed. Cause : {{$labels.ocscp_cause}}
Summary Sending messages to Traffic Feed failed, cause: {{$labels.ocscp_cause}}, scp_fqdn: {{$labels.ocscp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Minor
Condition increase(ocscp_metric_trafficfeed_failed_total{app_kubernetes_io_name="scp-worker"}[1h]) > 0
OID 1.3.6.1.4.1.323.5.3.35.1.2.9003
Metric Used ocscp_metric_trafficfeed_attempted_total
Recommended Action

Cause: Sending of messages to the traffic feed failed.

Diagnostic Information:
  • Check the failure reason.
  • Check if the traffic feed OCNADD configuration is correct.
  • Check if the OCNADD server is reachable and available.

Recovery: This alert clears automatically after 24 hours if sending messages to the traffic feed stops failing.

For any assistance, contact My Oracle Support.

6.2.65 SCPTrafficFeedKafkaClusterUnhealthy

Table 6-69 SCPTrafficFeedKafkaClusterUnhealthy

Field Details
Description 'Kafka cluster is marked unhealthy, Cause : {{$labels.ocscp_cause}}'
Summary 'Kafka cluster is marked unhealthy, cause: {{$labels.ocscp_cause}}, scp_fqdn: {{$labels.ocscp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when the Kafka cluster is unhealthy.

ocscp_metric_trafficfeed_cluster_unhealthy == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9026
Metric Used ocscp_metric_trafficfeed_cluster_unhealthy
Recommended Action

Cause: The Kafka cluster is unhealthy.

Diagnostic Information:
  • Check and diagnose OCNADD Kafka cluster.

Recovery: This alert clears when the Kafka cluster recovers from the failure condition.

For any assistance, contact My Oracle Support.

6.2.66 SCPTrafficFeedPartitionUnhealthy

Table 6-70 SCPTrafficFeedPartitionUnhealthy

Field Details
Description 'Kafka partition {{$labels.kafka_partition_id}} is marked unhealthy, Cause : {{$labels.ocscp_cause}}'
Summary 'Kafka partition is marked unhealthy, cause: {{$labels.ocscp_cause}}, partition_id: {{$labels.kafka_partition_id}}, topic: {{$labels.topic}}, scp_fqdn: {{$labels.ocscp_fqdn}}, namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when the Kafka partition is unhealthy.

ocscp_metric_trafficfeed_partition_unhealthy == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9025
Metric Used ocscp_metric_trafficfeed_partition_unhealthy
Recommended Action

Cause: The Kafka partition is unhealthy.

Diagnostic Information:
  • Check and diagnose OCNADD Kafka cluster.

Recovery: This alert clears when the Kafka partition recovers from the failure condition.

For any assistance, contact My Oracle Support.

6.2.67 SCPServiceMeshFailure

Table 6-71 SCPServiceMeshFailure

Field Details
Description 'SCP servicemesh failure encountered'
Summary 'SCP servicemesh failure encountered for nfservicetype: {{$labels.ocscp_nf_service_type}}, nftype: {{$labels.ocscp_nf_type}}, nfinstanceid: {{$labels.ocscp_nf_instance_id}}, serviceinstanceid: {{$labels.ocscp_service_instance_id}}, producerfqdn: {{$labels.ocscp_producer_host}}, responsecode: {{$labels.ocscp_response_code}} serverheader:{{$labels.ocscp_server_header}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Info
Condition

This alert is raised when service mesh failure occurs.

increase(ocscp_metric_sidecarproxy_failures_total[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.7030
Metric Used ocscp_metric_sidecarproxy_failures_total
Recommended Action

Cause: Service mesh failure observed at SCP.

Diagnostic Information:
  • The service mesh is unable to reach the peer NF due to connection failures, an unknown host, or other errors at the service mesh.
  • Verify the service mesh logs for error details.
  • Check the sidecar status of the peer NF.

Recovery: This alert clears automatically after 2 minutes if there is no service mesh failure observed by SCP with the same dimensions.

For any assistance, contact My Oracle Support.

6.2.68 SCPHealthCheckFailedForPeerSCP

Table 6-72 SCPHealthCheckFailedForPeerSCP

Field Details
Description 'SCP HealthCheck failed for peer SCP'
Summary 'SCP HealthCheck failed for peer SCP. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Info
Condition This alert is raised when peer SCP or inter-SCP becomes unhealthy due to health check status and outlier detection.

ocscp_interscp_health_check_status_failed == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9023
Metric Used ocscp_interscp_health_check_status_failed
Recommended Action

Cause: When the peer SCP is unhealthy and cannot receive SBI message requests due to health check and outlier detection.

Diagnostic Information:
  • Monitor if the overall average load of peer SCP is greater than the configured threshold value.
  • Check outlier detection.

Recovery: This alert clears automatically when SCP-C determines that SCP-P is healthy or available based on the current and previous status of outlier detection and health check.

For any assistance, contact My Oracle Support.

6.2.69 SCPHealthCheckFailed

Table 6-73 SCPHealthCheckFailed

Field Details
Description 'SCP HealthCheck failed'
Summary

'SCP HealthCheck failed. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'

Severity Info
Condition This alert is raised when SCP is unhealthy because the overall average load of SCP is greater than the configured threshold.

ocscp_health_check_status_failed == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.9024
Metric Used ocscp_health_check_status_failed
Recommended Action

Cause: When SCP is unhealthy and cannot receive SBI message requests due to the overall average load.

Diagnostic Information: Monitor if the overall average load of SCP is greater than the configured threshold value.

Recovery: This alert clears automatically when the overall average load of SCP is less than the configured threshold value.

For any assistance, contact My Oracle Support.

6.2.70 ScpWorkerPodPendingTransUtilizationAboveMinorThreshold

Table 6-74 ScpWorkerPodPendingTransUtilizationAboveMinorThreshold

Field Details
Description Worker Pending Transaction lead to minor level
Summary 'Worker Pending Transaction lead to minor level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Minor
Condition This alert is raised when pending transaction utilization of SCP-Worker reaches MINOR level.

ocscp_worker_pod_overload_control_pendingTrans_utilization_minor > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.9014
Metric Used ocscp_worker_pod_overload_control_pendingTrans_utilization_minor
Recommended Action

Cause: When pending transactions utilization of SCP-Worker reaches MINOR level.

Diagnostic Information:
  • Monitor pending transactions usage while processing traffic. The threshold value of minor pending transaction utilization can be checked in the database.
  • Check the MINOR level logs of SCP-Worker for pending transaction status.

Recovery: This alert clears automatically when pending transaction utilization is below MINOR threshold level.

For any assistance, contact My Oracle Support.

6.2.71 ScpWorkerPodPendingTransUtilizationAboveMajorThreshold

Table 6-75 ScpWorkerPodPendingTransUtilizationAboveMajorThreshold

Field Details
Description Worker Pending Transaction lead to major level
Summary 'Worker Pending Transaction lead to major level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when pending transaction utilization of SCP-Worker reaches MAJOR level.

ocscp_worker_pod_overload_control_pendingTrans_utilization_major > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.9015
Metric Used ocscp_worker_pod_overload_control_pendingTrans_utilization_major
Recommended Action

Cause: When pending transactions utilization of SCP-Worker reaches MAJOR level.

Diagnostic Information:
  • Monitor pending transactions usage while processing traffic. The threshold value of major pending transaction utilization can be checked in the database.
  • Check the MAJOR level logs of SCP-Worker for pending transaction status.

Recovery: This alert clears automatically when pending transaction utilization is below MAJOR threshold level.

For any assistance, contact My Oracle Support.

6.2.72 ScpWorkerPodPendingTransUtilizationAboveCriticalThreshold

Table 6-76 ScpWorkerPodPendingTransUtilizationAboveCriticalThreshold

Field Details
Description Worker Pending Transaction lead to critical level
Summary 'Worker Pending Transaction lead to critical level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when pending transaction utilization of SCP-Worker reaches CRITICAL level.

ocscp_worker_pod_overload_control_pendingTrans_utilization_critical > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.9016
Metric Used ocscp_worker_pod_overload_control_pendingTrans_utilization_critical
Recommended Action

Cause: When pending transactions utilization of SCP-Worker reaches CRITICAL level.

Diagnostic Information:
  • Monitor pending transactions usage while processing traffic. The threshold value of critical pending transaction utilization can be checked in the database.
  • Check the CRITICAL level logs of SCP-Worker for pending transaction status.

Recovery: This alert clears automatically when pending transaction utilization is below CRITICAL threshold level.

For any assistance, contact My Oracle Support.

6.2.73 ScpWorkerPodPendingTransUtilizationAboveWarnThreshold

Table 6-77 ScpWorkerPodPendingTransUtilizationAboveWarnThreshold

Field Details
Description Worker Pending Transaction lead to Warn level
Summary 'Worker Pending Transaction lead to warn level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Warn
Condition This alert is raised when pending transaction utilization of SCP-Worker reaches WARN level.

ocscp_worker_pod_overload_control_pendingTrans_utilization_warn > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.9017
Metric Used ocscp_worker_pod_overload_control_pendingTrans_utilization_warn
Recommended Action

Cause: When pending transactions utilization of SCP-Worker reaches WARN level.

Diagnostic Information:
  • Monitor pending transactions usage while processing traffic. The threshold value of warn pending transaction utilization can be checked in the database.
  • Check the WARN level logs of SCP-Worker for pending transaction status.

Recovery: This alert clears automatically when pending transaction utilization is below WARN threshold level.

For any assistance, contact My Oracle Support.

6.2.74 ScpWorkerPodResourceUtilizationAboveMinorThreshold

Table 6-78 ScpWorkerPodResourceUtilizationAboveMinorThreshold

Field Details
Description Worker overload control lead to minor level
Summary 'Worker overload control lead to minor level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Minor
Condition This alert is raised when overload control resource utilization of SCP-Worker reaches MINOR level.

ocscp_worker_pod_overload_control_resource_utilization_minor > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.9018
Metric Used ocscp_worker_pod_overload_control_resource_utilization_minor
Recommended Action

Cause: When overload control resource utilization of SCP-Worker reaches MINOR level.

Diagnostic Information:
  • Monitor overload control resource usage while processing traffic. The threshold value of minor overload control resource utilization can be checked in the database.
  • Check the MINOR level logs of SCP-Worker for overload control resource utilization status.

Recovery: This alert clears automatically when overload control resource utilization is below MINOR threshold level.

For any assistance, contact My Oracle Support.

6.2.75 ScpWorkerPodResourceUtilizationAboveMajorThreshold

Table 6-79 ScpWorkerPodResourceUtilizationAboveMajorThreshold

Field Details
Description Worker overload control lead to major level
Summary 'Worker overload control lead to major level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when overload control resource utilization of SCP-Worker reaches MAJOR level.

ocscp_worker_pod_overload_control_resource_utilization_major > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.9019
Metric Used ocscp_worker_pod_overload_control_resource_utilization_major
Recommended Action

Cause: When overload control resource utilization of SCP-Worker reaches MAJOR level.

Diagnostic Information:
  • Monitor overload control resource usage while processing traffic. The threshold value of major overload control resource utilization can be checked in the database.
  • Check the MAJOR level logs of SCP-Worker for overload control resource utilization status.

Recovery: This alert clears automatically when overload control resource utilization is below MAJOR threshold level.

For any assistance, contact My Oracle Support.

6.2.76 ScpWorkerPodResourceUtilizationAboveWarnThreshold

Table 6-80 ScpWorkerPodResourceUtilizationAboveWarnThreshold

Field Details
Description 'Worker overload control lead to Warn level'
Summary 'Worker overload control lead to warn level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Warn
Condition This alert is raised when overload control resource utilization of SCP-Worker reaches WARN level.

ocscp_worker_pod_overload_control_resource_utilization_warn > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.9021
Metric Used ocscp_worker_pod_overload_control_resource_utilization_warn
Recommended Action

Cause: When overload control resource utilization of SCP-Worker reaches WARN level.

Diagnostic Information:
  • Monitor overload control resource usage while processing traffic. The threshold value of warn overload control resource utilization can be checked in the database.
  • Check the WARN level logs of SCP-Worker for overload control resource utilization status.

Recovery: This alert clears automatically when overload control resource utilization is below WARN threshold level.

For any assistance, contact My Oracle Support.

6.2.77 ScpWorkerPodResourceUtilizationAboveCriticalThreshold

Table 6-81 ScpWorkerPodResourceUtilizationAboveCriticalThreshold

Field Details
Description Worker overload control lead to critical level
Summary 'Worker overload control lead to critical level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when overload control resource utilization of SCP-Worker reaches CRITICAL level.

ocscp_worker_pod_overload_control_resource_utilization_critical > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.9020
Metric Used ocscp_worker_pod_overload_control_resource_utilization_critical
Recommended Action

Cause: When overload control resource utilization of SCP-Worker reaches CRITICAL level.

Diagnostic Information:
  • Monitor overload control resource usage while processing traffic. The threshold value of critical overload control resource utilization can be checked in the database.
  • Check the CRITICAL level logs of SCP-Worker for overload control resource utilization status.

Recovery: This alert clears automatically when overload control resource utilization is below CRITICAL threshold level.

For any assistance, contact My Oracle Support.

6.2.78 SCPDNSSRVNRFMigrationTaskFailure

Table 6-82 SCPDNSSRVNRFMigrationTaskFailure

Field Description
Severity Critical
Condition ocscp_configuration_dnssrv_nrf_migration_task_failure == 1
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15001
Description An alert is raised to notify that migration from static to DNS has failed.
Recommended Actions
Cause:
  • The migration-to-DNS task was waiting for DNS SRV data and the wait time elapsed.
  • A migration task failed because no acknowledgement was received from other microservices.
  • The wait time elapsed while waiting for the task completion response of a migration task.

Diagnostic Information:

Verify that all DNS SRV configurations are correct and that all SCP pods are up and in the Running and Ready state (see the sample command below).
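
For example, assuming standard kubectl access, pods that are not in the Running phase can be listed as follows; the namespace is a placeholder:

    Sample command: kubectl get pods -n <namespace> --field-selector=status.phase!=Running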

Recovery:
  • DNS SRV data wait time elapsed: Once the data is received from DNS, the alert is cleared.
  • No acknowledgement: SCP keeps retrying until a success acknowledgement is received, and the alert is cleared on receiving it.
  • Wait time elapsed for the completion response: The event is resent and the wait is repeated until a completion response is received, and the alert is cleared on receiving it.

    In all the raised conditions, the alert is also cleared on receiving a new migration task.

For any assistance, contact My Oracle Support.

6.2.79 SCPDNSSRVNRFNonMigrationTaskFailure

Table 6-83 SCPDNSSRVNRFNonMigrationTaskFailure

Field Description
Severity Critical
Condition ocscp_configuration_dnssrv_nrf_non_migration_task_failure == 1
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15003
Description An alert is raised to notify that the non-migrated task has failed.
Recommended Actions
Cause:
  • A non-migrated task failed because no acknowledgement was received from other microservices.
  • The wait time elapsed for the task completion response.

Diagnostic Information:

Verify that all the DNS SRV configurations are proper and that all SCP pods are up and running in the proper state.

Recovery:
  • No acknowledgement: SCP keeps retrying until a success acknowledgement is received, and the alert is cleared on receiving it.
  • Wait time elapsed: When a new task is submitted for the same SPN, this alert is immediately cleared; if subsequent task processing fails again, the alert is raised again.
For any assistance, contact My Oracle Support.

6.2.80 SCPDNSSRVNRFDuplicateTargetDetected

Table 6-84 SCPDNSSRVNRFDuplicateTargetDetected

Field Description
Severity Critical
Condition ocscp_configuration_dnssrv_nrf_duplicate_target_detected == 1
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15002
Description An alert is raised to notify that a duplicate target NRF has been detected in the DNS SRV records.
Recommended Actions

Cause: This alert is raised when a duplicate target FQDN is received from the DNS SRV for different NRF SRV FQDNs. In this case, the target FQDNs received against the first NRF SRV FQDN in the scpc-configuration service from the scpc-alternate-resolution service are processed, but the target FQDNs received against the subsequent NRF SRV FQDN are ignored, and this alert is raised.

Diagnostic Information:

Verify that all the DNS SRV configurations are proper and that all SCP pods are up and running in the proper state.

Recovery:
  • If non-overlapping target FQDNs are received for the same NRF SRV FQDN, then the alert is cleared.
For any assistance, contact My Oracle Support.

6.2.81 SCPHighResponseTimeFromProducer

Table 6-85 SCPHighResponseTimeFromProducer

Field Description
Description It notifies when the traffic exceeds 200 messages per second and the responses from the producer NF are delayed by more than 50 seconds.
Summary More than 200 responses received by the SCP have a response time exceeding 50,000 milliseconds. Instance name: {{$labels.instance}}, Namespace: {{$labels.namespace}}, Pod name: {{$labels.pod}}, Timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Info
Condition (sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time="50000ms"}[2m])) by (kubernetes_namespace) + sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time=">50000ms"}[2m])) by (kubernetes_namespace)) > 200
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15004
Metric Used ocscp_metric_upstream_service_time_total
Recommended Actions

Cause: More than 200 messages per second have an upstream response time above 50 seconds.

Diagnostic Information:

Monitor the metric ocscp_metric_upstream_service_time_total with ocscp_upstream_service_time="50000ms" and ocscp_upstream_service_time=">50000ms".

Recovery: The alert is cleared automatically when the number of responses with delays exceeding 50 seconds falls below 200 messages per second. If the alert does not clear, check for any producer NFs or specific service request types that are taking more than 50 seconds to respond and take corrective actions if necessary. Note that immediate action may not be required, as this alert is informational. However, a high number of requests with long response delays could lead to performance degradation at the SCP.

For any assistance, contact My Oracle Support.
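
To narrow down which upstream destinations contribute to the delay, the same metric can be broken down by additional labels. A hedged example query, assuming the metric also carries a producer-identifying label such as ocscp_producer_host (the label name is an assumption):

    # Rate of slow responses (>= 50 s) per namespace and producer host.
    sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time=~"50000ms|>50000ms"}[2m]))
      by (kubernetes_namespace, ocscp_producer_host)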

6.2.82 SCPCGroupVersionDetectionFailed

Table 6-86 SCPCGroupVersionDetectionFailed

Field Description
Severity Critical
Condition ocscp_worker_cgroup_version_detection_failed == 1
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15005
Description Notify that cgroup version detection has failed.
Recommended Actions

Cause: SCP is unable to detect the cgroup version from the underlying kernel with the command "stat -fc %T /sys/fs/cgroup". The expected output is either tmpfs or cgroup2fs.

Diagnostic Information:
  • Option 1: Check worker error level logs for failure of cgroup version detection.
  • Option 2:
    • Log in to the worker pod and run the command "stat -fc %T /sys/fs/cgroup".
    • Verify whether it outputs either tmpfs or cgroup2fs.
Recovery:
  • Make sure the cgroup reports either tmpfs or cgroup2fs.
  • Perform the same version upgrade on SCP.
For any assistance, contact My Oracle Support.
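
Both diagnostic options can be run from outside the pod with a single command. A sketch, where the namespace and pod name are placeholders:

    # Detect the cgroup filesystem type inside a worker pod.
    # Expected output: "tmpfs" (cgroup v1) or "cgroup2fs" (cgroup v2).
    $ kubectl exec -n <scp-namespace> <scp-worker-pod> -- stat -fc %T /sys/fs/cgroup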

6.2.83 SCPCPUUsageFileReadFailed

Table 6-87 SCPCPUUsageFileReadFailed

Field Description
Severity Critical
Condition ocscp_worker_cpu_usage_file_read_failed == 1
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15006
Description Notify that the CPU usage file read operation failed within the detected cgroup version.
Recommended Actions

Cause: SCP encountered a failure in performing a read operation for the CPU usage file within the detected cgroup version. The file path is determined based on the detected cgroup version.

Diagnostic Information:
  • Option 1: Check worker warn level logs for failure in performing a read operation for the CPU usage file.
  • Option 2:
    • Log in to the worker pod and run the command "stat -fc %T /sys/fs/cgroup" to confirm whether it outputs either 'tmpfs' or 'cgroup2fs'.
    • Depending on the detected cgroup version, inspect the file located at /sys/fs/cgroup/cpu/cpuacct.usage for 'tmpfs' or /sys/fs/cgroup/cpu.stat for 'cgroup2fs'.
    • Ensure that the file exists at the specified path and possesses the required permissions for read operations.
Recovery:
  • Verify that the files are appropriate according to the cgroup version and possess the necessary permissions for read operations.
  • Perform the same version upgrade on the SCP.
For any assistance, contact My Oracle Support.
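
The file inspection described in Option 2 can be performed as follows; the paths are the ones listed above, and the namespace and pod name are placeholders:

    # cgroup v1 ("tmpfs"): cumulative CPU usage in nanoseconds.
    $ kubectl exec -n <scp-namespace> <scp-worker-pod> -- cat /sys/fs/cgroup/cpu/cpuacct.usage

    # cgroup v2 ("cgroup2fs"): the usage_usec line reports CPU usage in microseconds.
    $ kubectl exec -n <scp-namespace> <scp-worker-pod> -- cat /sys/fs/cgroup/cpu.stat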

6.2.84 SCPIgnoreUnknownService

Table 6-88 SCPIgnoreUnknownService

Field Description
Severity Info
Condition increase(ocscp_ignore_unknown_service_total[24h]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15000
Description An alert is raised to notify that SCP ignored an unknown service in the NF profile.
Recommended Actions

Cause: SCP has received the NF profile with an unknown service and processed the profile by ignoring this unknown service.

Diagnostic Information:

Check the received NF profile for the unknown services.

Recovery: If the unknown services are not present in the NF profile in the next scraping interval, then the alert is cleared.

For any assistance, contact My Oracle Support.

6.2.85 SCPWorkerSSLCertificateOnCriticalExpiry

Table 6-89 SCPWorkerSSLCertificateOnCriticalExpiry

Field Description
Severity Critical
Condition ocscp_metric_ssl_certificate_expire_total == 1
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15010
Description An alert is raised whenever the SCP SSL certificate is about to expire, based on the configured threshold values.
Recommended Actions

Cause: The SSL certificate expiration is approaching the configured critical expiry time.

Diagnostic Information:

  • Whenever this alert is raised, it indicates that the SSL certificates configured for SCP are about to expire.
  • Verify the certificate expiry date.

Recovery: The SCP SSL secret needs to be updated with renewed SSL certificates.

For any assistance, contact My Oracle Support.
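
One way to verify the expiry date of the certificate stored in the SCP SSL secret is shown below; the secret name, namespace, and data key are deployment-specific placeholders. The same check applies to the major and minor expiry alerts in the next two sections:

    # Extract the certificate from the secret and print its expiry date.
    $ kubectl get secret <scp-ssl-secret> -n <scp-namespace> \
        -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate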

6.2.86 SCPWorkerSSLCertificateOnMajorExpiry

Table 6-90 SCPWorkerSSLCertificateOnMajorExpiry

Field Description
Severity Major
Condition ocscp_metric_ssl_certificate_expire_total == 2
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15011
Description An alert is raised whenever the SCP SSL certificate is about to expire, based on the configured threshold values.
Recommended Actions

Cause: The SSL certificate expiration is approaching the configured major expiry time.

Diagnostic Information:

  • Whenever this alert is raised, it indicates that the SSL certificates configured for SCP are about to expire.
  • Verify the certificate expiry date.

Recovery: The SCP SSL secret needs to be updated with renewed SSL certificates.

For any assistance, contact My Oracle Support.

6.2.87 SCPWorkerSSLCertificateOnMinorExpiry

Table 6-91 SCPWorkerSSLCertificateOnMinorExpiry

Field Description
Severity Minor
Condition ocscp_metric_ssl_certificate_expire_total == 3
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15012
Description An alert is raised whenever the SCP SSL certificate is about to expire, based on the configured threshold values.
Recommended Actions

Cause: The SSL certificate expiration is approaching the configured minor expiry time.

Diagnostic Information:

  • Whenever this alert is raised, it indicates that the SSL certificates configured for SCP are about to expire.
  • Verify the certificate expiry date.

Recovery: The SCP SSL secret needs to be updated with renewed SSL certificates.

For any assistance, contact My Oracle Support.

6.2.88 SCPIngressConnectionEstablishmentFailure

Table 6-92 SCPIngressConnectionEstablishmentFailure

Field Description
Severity Info
Condition increase(ocscp_worker_https_ingress_connection_failure_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15007
Description An alert is raised whenever any ingress HTTPS connection fails.
Recommended Actions

Cause: Whenever an Ingress HTTPS connection establishment fails.

Diagnostic Information:

This alert may be raised if any connection establishment fails or if a handshake fails.

Recovery: The alert will be cleared if there are no ingress HTTPS connection failures in the next scrape interval (2 minutes).

For any assistance, contact My Oracle Support.

6.2.89 SCPEgressConnectionEstablishmentFailure

Table 6-93 SCPEgressConnectionEstablishmentFailure

Field Description
Severity Info
Condition increase(ocscp_worker_https_egress_connection_failure_total[2m]) > 0
OID used for SNMP Traps 1.3.6.1.4.1.323.5.3.35.1.2.15008
Description An alert is raised whenever any egress HTTPS connection fails.
Recommended Actions

Cause: Whenever an egress HTTPS connection establishment fails to send the request to producer NFs.

Diagnostic Information:

This alert may be raised if any connection establishment fails or if a handshake fails.

Recovery: The alert will be cleared if there are no egress HTTPS connection failures in the next scrape interval (2 minutes).

For any assistance, contact My Oracle Support.
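
To see which pods report the connection failures behind this alert and the previous one, the underlying counters can be queried per pod. Example queries; the grouping labels are assumptions and may differ per deployment:

    sum(increase(ocscp_worker_https_ingress_connection_failure_total[2m])) by (namespace, pod)
    sum(increase(ocscp_worker_https_egress_connection_failure_total[2m])) by (namespace, pod)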

6.2.90 SCPNrfProxyOauthQueuesUtilizationAboveCriticalThreshold

Table 6-94 SCPNrfProxyOauthQueuesUtilizationAboveCriticalThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}: SCP NrfProxy Oauth Queues Utilization Above Critical Threshold'
Summary 'SCP NrfProxy Oauth Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Critical
Condition This alert is raised when SCP NrfProxy Oauth Queues Utilization is above the critical threshold.

ocscp_nrfproxy_oauth_queue_alert{severity="CRITICAL"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.14000
Metric Used ocscp_nrfproxy_oauth_queue_alert
Recommended Action

Cause: When the traffic exceeds the limit, the NrfProxy task queues fill up.

Diagnostic Information:
  • Monitor the traffic toward the nrfProxy pod using the Grafana dashboard.
  • Refer to the rate of ocscp_nrfproxy_oauth_queue_alert metric on the Grafana dashboard.

Recovery: This alert clears automatically when the traffic decreases below the critical threshold.

For any assistance, contact My Oracle Support.

6.2.91 SCPNrfProxyOauthQueuesUtilizationAboveMajorThreshold

Table 6-95 SCPNrfProxyOauthQueuesUtilizationAboveMajorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}: SCP NrfProxyOauth Queues Utilization Above Major Threshold'
Summary 'SCP NrfProxy Oauth Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Major
Condition This alert is raised when SCP NrfProxy Oauth Queues Utilization is above the major threshold.

ocscp_nrfproxy_oauth_queue_alert{severity="MAJOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.14001
Metric Used ocscp_nrfproxy_oauth_queue_alert
Recommended Action

Cause: When the traffic exceeds the limit, the NrfProxy task queues fill up.

Diagnostic Information:
  • Monitor the traffic toward the nrfProxy pod using the Grafana dashboard.
  • Refer to the rate of ocscp_nrfproxy_oauth_queue_alert metric on the Grafana dashboard.

Recovery: This alert clears automatically when the traffic decreases below the major threshold.

For any assistance, contact My Oracle Support.

6.2.92 SCPNrfProxyOauthQueuesUtilizationAboveMinorThreshold

Table 6-96 SCPNrfProxyOauthQueuesUtilizationAboveMinorThreshold

Field Details
Description 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}: SCP NrfProxyOauth Queues Utilization Above Minor Threshold'
Summary 'SCP NrfProxy Oauth Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
Severity Minor
Condition This alert is raised when SCP NrfProxy Oauth Queues Utilization is above the minor threshold.

ocscp_nrfproxy_oauth_queue_alert{severity="MINOR"} == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.14002
Metric Used ocscp_nrfproxy_oauth_queue_alert
Recommended Action

Cause: When the traffic exceeds the limit, the NrfProxy task queues fill up.

Diagnostic Information:
  • Monitor the traffic toward the nrfProxy pod using the Grafana dashboard.
  • Refer to the rate of ocscp_nrfproxy_oauth_queue_alert metric on the Grafana dashboard.

Recovery: This alert clears automatically when the traffic decreases below the minor threshold.

For any assistance, contact My Oracle Support.

6.2.93 ScpSelfOCIThresholdAboveWarn

Table 6-97 ScpSelfOCIThresholdAboveWarn

Field Details
Description An alert is raised whenever the configured self-OCI for SCP reaches a warn level.
Summary SCP load level lead to Warn level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Warning
Condition

increase(ocscp_worker_self_oci_above_oci_conveyance_threshold_total{ocscp_oci_threshold_level="WARN"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.15013
Metric Used ocscp_worker_self_oci_above_oci_conveyance_threshold_total
Recommended Action

Cause: Whenever the configured self-OCI for SCP reaches a warn level.

Diagnostic Information:

SCP may be experiencing high load.

Recovery: The alert will clear automatically when the load on SCP decreases and falls below the configured warn level.

For any assistance, contact My Oracle Support.

6.2.94 ScpSelfOCIThresholdAboveMinor

Table 6-98 ScpSelfOCIThresholdAboveMinor

Field Details
Description An alert is raised whenever the configured self-OCI for SCP reaches a minor level.
Summary SCP load level lead to Minor level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Minor
Condition

increase(ocscp_worker_self_oci_above_oci_conveyance_threshold_total{ocscp_oci_threshold_level="MINOR"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.15014
Metric Used ocscp_worker_self_oci_above_oci_conveyance_threshold_total
Recommended Action

Cause: Whenever the configured self-OCI for SCP reaches a minor level.

Diagnostic Information:

SCP may be experiencing high load.

Recovery: The alert will clear automatically when the load on SCP decreases and falls below the configured minor level.

For any assistance, contact My Oracle Support.

6.2.95 ScpSelfOCIThresholdAboveMajor

Table 6-99 ScpSelfOCIThresholdAboveMajor

Field Details
Description An alert is raised whenever the configured self-OCI for SCP reaches a major level.
Summary SCP load level lead to Major level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Condition

increase(ocscp_worker_self_oci_above_oci_conveyance_threshold_total{ocscp_oci_threshold_level="MAJOR"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.15015
Metric Used ocscp_worker_self_oci_above_oci_conveyance_threshold_total
Recommended Action

Cause: Whenever the configured self-OCI for SCP reaches a major level.

Diagnostic Information:

SCP may be experiencing high load.

Recovery: The alert will clear automatically when the load on SCP decreases and falls below the configured major level.

For any assistance, contact My Oracle Support.

6.2.96 ScpSelfOCIThresholdAboveCritical

Table 6-100 ScpSelfOCIThresholdAboveCritical

Field Details
Description An alert is raised whenever the configured self-OCI for SCP reaches a critical level.
Summary SCP load level lead to Critical level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Critical
Condition

increase(ocscp_worker_self_oci_above_oci_conveyance_threshold_total{ocscp_oci_threshold_level="CRITICAL"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.15016
Metric Used ocscp_worker_self_oci_above_oci_conveyance_threshold_total
Recommended Action

Cause: Whenever the configured self-OCI for SCP reaches a critical level.

Diagnostic Information:

SCP may be experiencing high load.

Recovery: The alert will clear automatically when the load on SCP decreases and falls below the configured critical level.

For any assistance, contact My Oracle Support.
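
The four self-OCI alerts in Sections 6.2.93 through 6.2.96 differ only in the ocscp_oci_threshold_level label value and the severity. As a sketch, the critical rule could be expressed in the alert file as follows (the group name is illustrative):

    groups:
      - name: scp-self-oci               # illustrative group name
        rules:
          - alert: ScpSelfOCIThresholdAboveCritical
            # Fires when the self-OCI conveyance counter increased at CRITICAL level
            # within the last 2 minutes.
            expr: increase(ocscp_worker_self_oci_above_oci_conveyance_threshold_total{ocscp_oci_threshold_level="CRITICAL"}[2m]) > 0
            labels:
              severity: critical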

6.2.97 SCPWorkerSSLPartialSecret

Table 6-101 SCPWorkerSSLPartialSecret

Field Details
Description 'SCP Worker SSL Secret is Invalid or Partial'
Summary SCP Worker SSL Secret is Invalid or Partial, ocscp_worker_fqdn: {{$labels.ocscp_producer_host}}, scpfqdn: {{$labels.ocscp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, secretName: {{$labels.secretName}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Condition This alert is raised when any SCP-Worker secret is patched with partial data.

ocscp_worker_ssl_partial_secret == 1

OID 1.3.6.1.4.1.323.5.3.35.1.2.15017
Metric Used ocscp_worker_ssl_partial_secret
Recommended Action

Cause: Whenever any SCP-Worker secret is patched with partial data.

Diagnostic Information: Verify whether the patched secret contains all the required data.

Recovery: The alert is cleared automatically when a valid secret containing all the required data is patched.

For any assistance, contact My Oracle Support.

6.2.98 SCPWorkerAndNFTimeSyncFailure

Table 6-102 SCPWorkerAndNFTimeSyncFailure

Field Details
Description 'Consumer NF and SCP are not time synchronized'
Summary Consumer NF and SCP are not time synchronized: consumernftype = {{$labels.ocscp_consumer_nf_type}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, scpfqdn: {{$labels.ocscp_fqdn}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}
Severity Major
Condition

This alert is raised when there is a time synchronization difference between SCP-Worker and the consumer NF that sent the request, and the timestamp_headers_support feature is enabled.

increase(ocscp_worker_timestamp_headers_validation_fail_total{ocscp_validation_failure="TIME_SYNC_FAILURE"}[2m]) > 0

OID 1.3.6.1.4.1.323.5.3.35.1.2.15019
Metric Used ocscp_worker_timestamp_headers_validation_fail_total
Recommended Action

Cause: When there is a time synchronization difference between SCP-Worker and the consumer NF that sent the request, and the timestamp_headers_support feature is enabled.

Diagnostic Information: SCP may have received a request from a consumer NF that is not time synchronized with SCP while the timestamp_headers_support feature is enabled.

Recovery: The alert is cleared automatically when no such validation failure occurs over the next scrape interval.

For any assistance, contact My Oracle Support.

6.3 Configuring Alerts

This section describes how to configure alerts.

6.3.1 Applying Alerts Rule to CNE without Prometheus Operator

SCP Helm Chart Release Name: _NAME_

Prometheus NameSpace: _Namespace_

Perform the following procedure to configure Service Communication Proxy alerts in Prometheus.
  1. Run the following command to check the name of the config map used by Prometheus:
    $kubectl get configmap -n <_Namespace_>
    Example:
    $kubectl get configmap -n prometheus-alert2
    NAME                                  DATA   AGE
    lisa-prometheus-alert2-alertmanager   1      146d
    lisa-prometheus-alert2-server         4      146d
  2. Take a backup of the current config map of Prometheus. This command saves the configmap in the provided file. In the following command, the configmap is stored in the /tmp/tempConfig.yaml file:
    $ kubectl get configmaps <_NAME_>-server -o yaml -n <_Namespace_> > /tmp/tempConfig.yaml
    
    Example:
    $ kubectl get configmaps lisa-prometheus-alert2-server -o yaml -n prometheus-alert2 > /tmp/tempConfig.yaml
  3. Check and delete the "alertsscp" rule if it is already configured in the Prometheus config map. If configured, this step removes the "alertsscp" rule. This step is not required when configuring the alerts for the first time.
    $ sed -i '/etc\/config\/alertsscp/d' /tmp/tempConfig.yaml
  4. Add the "alertsscp" rule in the configmap dump file under the ' rule_files ' tag.
    $ sed -i '/rule_files:/a\    \- /etc/config/alertsscp'  /tmp/tempConfig.yaml
  5. Update the configmap using the following command. Ensure that you use the same configmap name that was used while taking the backup of the Prometheus configmap.
    $ kubectl replace configmap <_NAME_>-server -f /tmp/tempConfig.yaml
    Example:
    $ kubectl replace configmap lisa-prometheus-alert2-server -f /tmp/tempConfig.yaml
  6. Run the following command to patch the configmap with a new "alertsscp" rule:

    Note:

    The patch file, SCPAlertrules.yaml, is provided in the ocscp_csar_23_2_0_0_0.zip package delivered with SCP.
    $ kubectl patch  configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/SCPAlertrules.yaml)"
    
    Example:
    $ kubectl patch configmap lisa-prometheus-alert2-server -n prometheus-alert2 --type merge --patch "$(cat ~/SCPAlertrules.yaml)"

Note:

Prometheus takes about 20 seconds to apply the updated configmap.
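
After steps 3 and 4, the rule_files section of /tmp/tempConfig.yaml should look similar to the following; any other entries in the list are deployment-specific:

    rule_files:
      - /etc/config/alertsscp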

6.3.2 Applying Alerts Rule to CNE with Prometheus Operator

Perform the following procedure to apply alerts rule to Cloud Native Environment (CNE) with Prometheus Operator (CNE 1.9.0 and later).
  1. Run the following command to apply SCP alerts file to create Prometheus rules Custom Resource Definition (CRD):
    kubectl apply -f <file_name> -n <scp namespace>
    Where,
    • <file_name> is the SCP alerts file.
    • <scp namespace> is the SCP namespace.

    Example:

    kubectl apply -f ocscp_alerting_rules_promha_25.1.100.yaml -n scpsvc

    Sample file delivered with SCP package:

    ocscp_alerting_rules_promha_25.1.100.yaml
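
    For orientation, the alert file applied above is a PrometheusRule custom resource. A minimal sketch of its shape is shown below; the metadata name and group name are illustrative, and the actual file delivered with SCP contains the full rule set:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: ocscp-alerting-rules         # illustrative name
      namespace: scpsvc
    spec:
      groups:
        - name: scp-alerts               # illustrative group name
          rules:
            - alert: SCPIgnoreUnknownService
              expr: increase(ocscp_ignore_unknown_service_total[24h]) > 0
              labels:
                severity: info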

6.3.3 Configuring Service Communication Proxy Alert using the SCPAlertrules.yaml file

Note:

The default namespace for Service Communication Proxy is scpsvc. You can update the namespace as per the deployment.

To access the scpAlertsrules_<scp release number>.yaml file from the Scripts folder of ocscp_csar_25_1_1_0_0_0.zip, download the SCP package from My Oracle Support as described in "Downloading the SCP Package" in Oracle Communications Cloud Native Core, Service Communication Proxy Installation, Upgrade, and Fault Recovery Guide.

Alerts Details

Description and summary for alerts are added by the Prometheus alert manager.

Alerts are supported for three different resource or routing threshold crossings:
  • SCPIngress Traffic Rate Above Threshold
    • Has three threshold levels: Minor (above 9800 mps up to 11200 mps), Major (above 11200 mps up to 13300 mps), and Critical (above 13300 mps). These values are configurable.
    • In the description, information is presented similar to: "Ingress Traffic Rate at Locality: <Locality of scp> is above <threshold level (minor/major/critical)> threshold (that is, <value of threshold>)"
    • In Summary: "Namespace: <Namespace of SCP deployment at that Locality>, Pod: <SCP-worker Pod name>: Current Ingress Traffic Rate is <current rate of ingress traffic> mps which is above 70 percent of Max MPS (<upper limit of ingress traffic rate per pod>)"

      Note:

      The ingress traffic rate is per scp-worker pod in a namespace at a particular SCP locality. Currently, 14000 mps is the upper limit per scp-worker pod.
  • SCP Routing Failed For Service
    • It indicates the NF service type and NF type at a particular locality for which routing failed.
    • Description: "Routing failed for service"
    • Summary: "Routing failed for service: NFService Type = <Message NF Service Type>, NFType = <Message NF Type>, Locality = <SCP Locality where Routing Failed> and value = <Accumulated failure till now, of such message for NFType and NFService Type>"

      Note:

      The value field currently does not provide the number of failures in a particular time interval; instead, it provides the total number of routing failures accumulated so far.
  • SCP Pod Memory Usage: Type of alert is SCPWorkerPodMemoryUsage.
    • Pod memory usage for SCP Pods (Soothsayer and Worker) deployed at a particular node instance is provided.
    • The Soothsayer pod threshold is 8 GB
    • The Worker pod threshold is 16 Gi
    • Summary: "Instance: <Node Instance name>, Namespace: <Namespace of SCP deployment>, Pod: <(Soothsayer/Worker) Pod name>: <Soothsayer/Worker> Pod High Memory usage detected"
    • Summary: "Instance: <Node Instance name>, Namespace: <Namespace of SCP deployment>, Pod: <(Soothsayer/Worker) Pod name>: Memory usage is above <threshold value>G (current value is: <current value of memory usage>)"

6.3.4 Configuring Alert Manager for SNMP Notifier

Grouping of alerts is based on:

  • podname
  • alertname
  • severity
  • namespace
  • nfServiceType
  • nfServiceInstanceId
Add subroutes for SCP alerts in the AlertManager config map as follows:
  1. Take a backup of the current config map of Alertmanager by running the following command:
    kubectl get configmaps <NAME-alertmanager> -oyaml -n <Namespace> > /tmp/bkupAlertManagerConfig.yaml
    

    Example:

    kubectl get configmaps occne-prometheus-alertmanager -oyaml -n occne-infra > /tmp/bkupAlertManagerConfig.yaml
  2. Edit Configmap to add subroute for SCP Trap OID:
    kubectl edit configmaps <NAME-alertmanager> -n <Namespace>
    Example:
    kubectl edit configmaps occne-prometheus-alertmanager -n occne-infra
  3. Add the subroute under 'route' in configmap:
    routes:
          - receiver: default-receiver
            group_interval: 1m
            group_wait: 10s
            repeat_interval: 9y
            group_by: [podname, alertname, severity, namespace, nfservicetype, nfserviceinstanceid, servingscope, nftype]
            match_re:
              oid: ^1.3.6.1.4.1.323.5.3.35.(.*)
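
Optionally, validate the edited AlertManager configuration before saving it. A sketch using amtool, assuming it is available and that the configmap stores the configuration under the alertmanager.yml key:

    # Dump the alertmanager.yml key from the configmap and validate it.
    $ kubectl get configmap <NAME-alertmanager> -n <Namespace> \
        -o jsonpath='{.data.alertmanager\.yml}' > /tmp/alertmanager.yml
    $ amtool check-config /tmp/alertmanager.yml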

MIB Files for SCP

There are two MIB files that are used to generate the traps. Update these files along with the alert file to fetch the traps in your environment.
  • ocscp_mib_tc_25.1.100.mib: This is the SCP top-level MIB file, where the objects and their data types are defined.
  • ocscp_mib_25.1.100.mib: This file fetches the objects from the top-level MIB file, and based on the alert notification, these objects can be selected for display.

Note:

MIB files are packaged with ocscp_csar_23_2_0_0_0.zip. You can download the file from MOS as described in Oracle Communications Cloud Native Core, Service Communication Proxy Installation, Upgrade, and Fault Recovery Guide.

6.4 Configuring SCP Alerts for OCI

OCI supports metric expressions written in MQL (Metric Query Language); therefore, the ocscp_oci_alertrules_25.1.100.zip file is required for configuring SCP alerts on the OCI observability platform. For more information, see Oracle Communications Cloud Native Core, OCI Deployment Guide.

6.5 Mediation Alerts

This section provides detailed information on all mediation alerts, including their descriptions, severity, and recommended actions.

6.5.1 NFMediationUserDefinedVariablesMaxSizeLimitExceeded

Table 6-103 NFMediationUserDefinedVariablesMaxSizeLimitExceeded

Field Details
Description

This alert is raised when the total size of all user-defined variables and their values exceeds the configured size.

Summary namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Currently user defined variables maximum size limit exceeded the configured limit {{ $value | printf "%.2f" }} times
Severity Minor
Condition

increase(ocscp_med_udvs_max_size_limit_exceeded_total[5m]) > 0

OID 1.3.6.1.4.1.323.5.3.47.1.2.1005
Metric Used ocscp_med_udvs_max_size_limit_exceeded_total
Recommended Action

Cause: This alert is raised when the total size of all user-defined variables and their values exceeds the configured limit.

Diagnostic Information: The size of the user-defined variables list (med-user-defined-var-list) during mediation may have exceeded the configured limit.

Recovery: Reduce the size or number of user-defined variables to bring the size within the limit.

For any assistance, contact My Oracle Support.
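
To check how often the limit was exceeded recently, the alert expression can be run directly in Prometheus. A per-pod breakdown, with the grouping labels assumed from the Summary template above:

    sum(increase(ocscp_med_udvs_max_size_limit_exceeded_total[5m])) by (namespace, pod)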