6 Alerts
This section describes the supported alerts and how to configure them.
Note:
The performance and capacity of the SCP system may vary based on the call model, feature or interface configuration, network conditions, and underlying CNE and hardware environment.
You can configure alerts in Prometheus and in the ScpAlertrules.yaml file.
Caution:
User, computer, application, and character encoding settings may cause issues when copying and pasting commands or other content from the PDF. The PDF reader version also affects copy-paste behavior. Verify the pasted content, especially when it contains hyphens or other special characters.
Note:
kubectl commands might vary based on the platform deployment. Replace kubectl with the Kubernetes environment-specific command-line tool to configure Kubernetes resources through the kube-api server. The instructions provided in this document are as per the Oracle Communications Cloud Native Environment (OCCNE) version of the kube-api server.
- The alert file can be customized as required by the deployment environment. For example, a namespace can be added as a filter criterion to the alert expression so that alerts are raised only for a specific namespace.
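As a sketch of the namespace filter mentioned in the note above, the following Python function adds a namespace label matcher to each selector of one metric in an alert expression. It is a plain string rewrite, not a PromQL parser, and it assumes the metric is always written with a `{...}` selector, as in this chapter's expressions. The namespace `ocscp` is hypothetical. The label name is a parameter because it differs by metric: the expressions here group container_memory_usage_bytes by `namespace` and the ocscp_* metrics by `kubernetes_namespace`.

```python
import re


def add_namespace_filter(expr: str, metric: str, namespace: str,
                         label: str = "namespace") -> str:
    """Add label="namespace" to every `metric{...}` selector in expr.

    Simplified string rewrite, not a PromQL parser: it only handles
    occurrences written as `metric{`, as in the expressions in this chapter.
    """
    matcher = f'{label}="{namespace}",'
    # \b stops "x_total" from matching inside "yx_total"; re.escape guards
    # against regex metacharacters in the metric name.
    pattern = re.compile(r"\b" + re.escape(metric) + r"\{")
    return pattern.sub(lambda m: m.group(0) + matcher, expr)


expr = ('sum(container_memory_usage_bytes{image!="",pod=~".*scp-worker.+"}) '
        'by (pod,namespace, instance) > 6012954214')
print(add_namespace_filter(expr, "container_memory_usage_bytes", "ocscp"))
```

The printed expression has `namespace="ocscp",` at the start of the selector; the same change can be made by hand in ScpAlertrules.yaml.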
6.1 System level alerts
This section lists the system level alerts.
6.1.1 SCPNotificationPodMemoryUsage
Table 6-1 SCPNotificationPodMemoryUsage
Field | Description |
---|---|
Severity | Major |
Conditions | sum(container_memory_usage_bytes{image!="",pod=~".*scpc-notification.+"}) by (pod,namespace, instance) > 3006477107 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.3001 |
Description | Notifies that the memory usage of an scpc-notification pod is above the threshold. Threshold value: 85% of the allocated memory (4 GB), that is, 3.4 GB. |
Recommended Actions | Cause: A high notification rate, or very large NF profiles in notifications. Diagnostic Information: Monitor the notification metric ocscp_nrf_notifications_requests_nf_total. Notification pod memory usage reduces after some time when it crosses 2.5 GB or 3 GB. Recovery: This alert is cleared automatically when the scpc-notification pod memory usage reduces below the defined threshold. Reduce the notification rate; these notifications are generated by NRF and can be controlled through NRF. For any assistance, contact My Oracle Support. |
6.1.2 SCPWorkerPodMemoryUsage
Table 6-2 SCPWorkerPodMemoryUsage
Field | Description |
---|---|
Severity | major |
Conditions | sum(container_memory_usage_bytes{image!="",pod=~".*scp-worker.+"}) by (pod,namespace, instance) > 6012954214 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7004 |
Description | Notifies that the memory usage of an scp-worker pod is above the threshold. Threshold value: 85% of the allocated memory (8 GB), that is, 7.3 GB. |
Recommended Actions | Cause: A high traffic rate, alternate routing, a large number or size of routing rules, or network or producer NF latency. Diagnostic Information: Monitor the traffic rate, alerts, and latency on the KPI Dashboard. Check whether the traffic rates of the following metrics are too high:
Check the upstream response time by using the following metric to determine whether the upstream is taking too long to respond: ocscp_metric_upstream_service_time_total. Check the following platform metric for the current memory usage of the scp-worker pod: container_memory_usage_bytes. Recovery: This alert is cleared automatically when the scp-worker pod memory usage reduces below the defined threshold. Reduce the traffic rate and improve the latency. For any assistance, contact My Oracle Support. |
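The byte literals in the Conditions rows of Tables 6-1 and 6-2 can be related to the allocated memory with simple arithmetic. The Python sketch below assumes the 4 GB and 8 GB allocations are binary gigabytes (GiB, 2^30 bytes). Under that assumption both literals work out to 70% of the allocation rather than the 85% stated in the Description rows; an 85% threshold would be 3650722201 and 7301444403 bytes (rounded down). Confirm the intended percentage before adjusting ScpAlertrules.yaml.

```python
GIB = 2 ** 30  # bytes in one gibibyte (assumed unit of the 4 GB / 8 GB allocations)


def threshold_bytes(allocated_gib: int, percent: int) -> int:
    """Return percent% of an allocation in bytes, rounded down."""
    return allocated_gib * GIB * percent // 100


def percent_of_allocation(threshold: int, allocated_gib: int) -> float:
    """Express a byte threshold as a percentage of the allocation."""
    return 100 * threshold / (allocated_gib * GIB)


# Literals from the Conditions rows of Tables 6-1 and 6-2.
for name, literal, alloc in [("scpc-notification", 3006477107, 4),
                             ("scp-worker", 6012954214, 8)]:
    print(name, round(percent_of_allocation(literal, alloc), 2),
          threshold_bytes(alloc, 70), threshold_bytes(alloc, 85))
```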
6.1.3 SCPInstanceDown
Table 6-3 SCPInstanceDown
Field | Description |
---|---|
Severity | Critical |
Conditions | kube_pod_status_ready{pod=~'.*scp-worker.*|.*scpc-notification.*|.*scpc-subscription.*|.*scpc-configuration.*|.*scpc-audit.*|.*scpc-alternate-resolution.*',condition=~'true'} != 1 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7006 |
Description | Notifies that a pod in the ocscp release is down. Provides information such as the pod name, instance ID, and app name. |
Recommended Actions |
Cause: When the following issues occur:
Diagnostic Information:
Recovery: This alert is cleared automatically when the inactive pod becomes active. Recover the DB services if they are down. Collect the application logs and contact My Oracle Support for any assistance. |
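Prometheus anchors `=~` regular expressions at both ends, so each alternative such as `.*scp-worker.*` must match the entire pod name. The Python sketch below uses `re.fullmatch`, which applies the same anchoring, to preview which pod names the SCPInstanceDown selector covers. The pod names are hypothetical; Python's `re` and Prometheus's RE2 syntax differ in some features, but not for this simple pattern.

```python
import re

# Pod-name pattern of the SCPInstanceDown condition.
SCP_PODS = re.compile(
    r".*scp-worker.*|.*scpc-notification.*|.*scpc-subscription.*"
    r"|.*scpc-configuration.*|.*scpc-audit.*|.*scpc-alternate-resolution.*"
)


def covered_by_alert(pod_name: str) -> bool:
    # fullmatch mirrors Prometheus, which wraps =~ regexes in ^(?:...)$.
    return SCP_PODS.fullmatch(pod_name) is not None


for pod in ["ocscp-scp-worker-7c9d-abcde", "ocscp-scpc-audit-0",
            "ocscp-scp-cache-0"]:  # hypothetical pod names
    print(pod, covered_by_alert(pod))
```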
6.2 Application level alerts
This section lists the application level alerts.
6.2.1 SCPCcaFeatureEnabledWithoutHttps
Table 6-4 SCPCcaFeatureEnabledWithoutHttps
Field | Description |
---|---|
Severity | Info |
Condition | ocscp_worker_cca_validation_feature_enabled_without_https > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.9022 |
Description | An alert is raised when the CCA validation feature is enabled without enabling HTTPS. |
Recommended Actions | Cause: The CCA validation feature is enabled without enabling HTTPS. Diagnostic Information: Deploy SCP with HTTPS enabled. Recovery: The alert is cleared automatically when either the CCA validation feature is disabled or the deployment is changed to HTTPS. For any assistance, contact My Oracle Support. |
6.2.2 SCPIngressTrafficRateAboveMinorThreshold
Table 6-5 SCPIngressTrafficRateAboveMinorThreshold
Field | Description |
---|---|
Severity | minor |
Condition | sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1200 < 1400 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7001 |
Description | Notifies that the traffic rate is between 1200 and 1400 mps (user-configurable minor threshold), along with the locality and the current traffic rate. |
Recommended Actions | Cause: The consumer NF sends more traffic than expected. Diagnostic Information: Monitor the ingress traffic to the pod by using the KPI Dashboard, and refer to the rate of the ocscp_metric_http_rx_req_total metric on the Grafana GUI. Recovery: This alert is cleared automatically when the ingress traffic drops below the minor threshold or exceeds the major threshold. If this alert is not cleared, check the consumer NF for an uneven distribution of traffic per connection or for any other issue. For any assistance, contact My Oracle Support. |
6.2.3 SCPIngressTrafficRateAboveMajorThreshold
Table 6-6 SCPIngressTrafficRateAboveMajorThreshold
Field | Description |
---|---|
Severity | major |
Conditions | sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1400 < 1600 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7001 |
Description | Notifies that the traffic rate is between 1400 and 1600 mps (user-configurable major threshold), along with the locality and the current traffic rate. |
Recommended Actions | Cause: The consumer NF sends more traffic than expected. Diagnostic Information: Monitor the ingress traffic to the pod by using the KPI Dashboard, and refer to the rate of the ocscp_metric_http_rx_req_total metric on the Grafana GUI. Recovery: This alert is cleared automatically when the ingress traffic drops below the major threshold or exceeds the critical threshold. If this alert is not cleared, check the consumer NF for an uneven distribution of traffic per connection or for any other issue. If this alert persists for a long duration, reduce the ingress traffic from the consumer to the pod. For any assistance, contact My Oracle Support. |
6.2.4 SCPIngressTrafficRateAboveCriticalThreshold
Table 6-7 SCPIngressTrafficRateAboveCriticalThreshold
Field | Description |
---|---|
Severity | critical |
Conditions | sum(rate(ocscp_metric_http_rx_req_total{app_kubernetes_io_name="scp-worker"}[2m]))by(kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1600 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7001 |
Description | Notifies that the traffic rate is at or above 1600 mps (user-configurable critical threshold), along with the locality and the current traffic rate. |
Recommended Actions | Cause: The consumer NF sends more traffic than expected. Diagnostic Information: Monitor the ingress traffic to the pod by using the KPI Dashboard, and refer to the rate of the ocscp_metric_http_rx_req_total metric on the Grafana GUI. Recovery: This alert is cleared automatically when the ingress traffic drops below the critical threshold. If this alert is not cleared, check the consumer NF for an uneven distribution of traffic per connection or for any other issue. If this alert persists for a long duration, reduce the ingress traffic from the consumer to the pod. For any assistance, contact My Oracle Support. |
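The three ingress traffic alerts partition the per-pod request rate into bands. The Python sketch below classifies a rate using the default thresholds from Tables 6-5 to 6-7 (1200, 1400, and 1600 mps), passed as parameters because they are user-configurable. It assumes each band includes its lower bound and excludes its upper bound.

```python
from typing import Optional


def ingress_rate_alert(rate_mps: float, minor: float = 1200,
                       major: float = 1400,
                       critical: float = 1600) -> Optional[str]:
    """Map a per-pod ingress rate to the alert that would fire, if any."""
    if rate_mps >= critical:
        return "SCPIngressTrafficRateAboveCriticalThreshold"
    if rate_mps >= major:
        return "SCPIngressTrafficRateAboveMajorThreshold"
    if rate_mps >= minor:
        return "SCPIngressTrafficRateAboveMinorThreshold"
    return None  # below the minor threshold: no ingress-rate alert


for rate in (900, 1250, 1400, 1750):
    print(rate, ingress_rate_alert(rate))
```

Because the bands do not overlap, at most one of the three alerts is active for a pod at a time, which is why the minor and major alerts clear when the rate moves into the next band.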
6.2.5 SCPRoutingFailedForProducer
Table 6-8 SCPRoutingFailedForProducer
Field | Description |
---|---|
Severity | Info |
Conditions | increase(ocscp_metric_routing_attempt_fail_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7005 |
Description | Notifies that routing failed for a producer. Provides details such as the NF service type, NF type, locality, producer FQDN, and value. |
Recommended Actions | Cause: Routing fails to select a producer NF because routing rules are unavailable for an NF service or producer. Diagnostic Information:
Recovery: This alert is cleared automatically when routing succeeds for a producer NF or no more traffic is received in the next Prometheus scrape interval. Check whether the NF is deregistered. If routing rules do not exist, register the NF to create them. For any assistance, contact My Oracle Support. |
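Several alerts in this section use `increase(counter[2m]) > 0`, which fires whenever the counter grew at any point in the last two minutes. The Python sketch below approximates `increase()` over raw samples, including the counter-reset handling Prometheus applies (a drop in value is treated as a restart from zero). Unlike Prometheus, it does not extrapolate to the window boundaries, so its result can differ slightly from what Prometheus reports. The sample values are hypothetical.

```python
from typing import Sequence


def approx_increase(samples: Sequence[float]) -> float:
    """Sum the growth of a counter across consecutive samples.

    A decrease is treated as a counter reset: the new value is counted
    as growth from zero. No boundary extrapolation is done.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical ocscp_metric_routing_attempt_fail_total samples in a 2m window.
steady = [42, 42, 42, 42]
failing = [42, 43, 45, 45]
after_restart = [42, 44, 1, 3]  # pod restarted, counter reset to 0

for name, s in [("steady", steady), ("failing", failing),
                ("after_restart", after_restart)]:
    print(name, approx_increase(s), approx_increase(s) > 0)
```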
6.2.6 SCPAuditErrorResponse
Table 6-9 SCPAuditErrorResponse
Field | Description |
---|---|
Severity | Info |
Conditions | ocscp_audit_error_response > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.4001 |
Description | Alert is raised when the Audit module receives a 3xx, 4xx, or 5xx error from NRF. This alert is labeled with the specific nftype, nrfRegionOrSetId, and auditmethod. Note: The alert is cleared on the next audit cycle. |
Recommended Actions | Cause: The configured NRF sends error responses, is down, or is not reachable. Diagnostic Information:
Recovery: The alert is cleared automatically during the next audit cycle when no more errors are received. Collect the audit and worker service logs and contact My Oracle Support for any assistance. |
6.2.7 SCPAuditEmptyNFArrayResponse
Table 6-10 SCPAuditEmptyNFArrayResponse
Field | Description |
---|---|
Severity | Critical |
Conditions | ocscp_audit_2xx_empty_nf_array_rx_total > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.4002 |
Description | Alert is raised when the Audit module receives a 2xx response with an empty NFInstance array from NRF. The alert is labeled with the specific nftype, nrfRegionOrSetId, and auditmethod. The alert is cleared when Audit receives a success response with a non-empty NFInstance array, or on the next audit cycle if the topology source is changed to LOCAL. |
Recommended Actions | Cause: NRF does not have any NF registered, or an error condition exists on NRF. Diagnostic Information: Check whether NRF contains any registered NF and validate as required. For more information, see the NRF documentation. Recovery: This alert is cleared automatically when Audit receives a success response with a non-empty NFInstance array, or during the next audit cycle when the topology source is changed to LOCAL. Register an NF with NRF or change the topology source to LOCAL. For any assistance, contact My Oracle Support. |
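Tables 6-9 and 6-10 split audit outcomes by the NRF response: a 3xx, 4xx, or 5xx status raises SCPAuditErrorResponse, while a 2xx with an empty NF instance array raises SCPAuditEmptyNFArrayResponse. The Python sketch below expresses that split. The `nfInstances` key is an assumption modeled on the 3GPP TS 29.510 discovery SearchResult; the field SCP actually inspects may differ by audit method.

```python
from typing import Any, Mapping, Optional


def audit_alert(status: int, body: Mapping[str, Any],
                key: str = "nfInstances") -> Optional[str]:
    """Return the audit alert an NRF response would raise, if any.

    `key` names the NF instance array in the response body; the default
    is an assumption, not taken from the SCP implementation.
    """
    if 300 <= status <= 599:
        return "SCPAuditErrorResponse"
    # A missing key is treated the same as an empty array here.
    if 200 <= status <= 299 and not body.get(key):
        return "SCPAuditEmptyNFArrayResponse"
    return None


print(audit_alert(503, {}))
print(audit_alert(200, {"nfInstances": []}))
print(audit_alert(200, {"nfInstances": [{"nfInstanceId": "hypothetical-id"}]}))
```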
6.2.8 DuplicateLocalityFoundInForeignNF
Table 6-11 DuplicateLocalityFoundInForeignNF
Field | Description |
---|---|
Severity | Major |
Conditions | ocscp_notification_duplicate_foreign_location > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.3002 |
Description | Alert is raised when an unknown NF or SCP is registered with a locality that duplicates a locality of the present region. |
Recommended Actions | Cause: SCP discovers an NF from an unknown region with a duplicate locality. Diagnostic Information: Check the logs for the received NF notification by running the following command:
Check the following metric to get the NFInstanceId for which this alert is raised: ocscp_notification_duplicate_foreign_location (nfInstanceId). From the metric, get the NF instance ID, locality, and serving_scope. Check the NF profile of the corresponding NF in the unknown region identified by serving_scope. Correct the locality in the NF profile so that it aligns with the localities of that unknown region, which must differ from the locality of the SCP that reported this alert. Recovery: This alert is cleared automatically when the unknown NF or SCP is deregistered or sends a registration update with the correct locality. Re-register the NF with the correct locality information, and collect the logs for the notification and audit services. For any assistance, contact My Oracle Support. |
6.2.9 ForeignNFLocalityNotServed
Table 6-12 ForeignNFLocalityNotServed
Field | Description |
---|---|
Severity | Critical |
Conditions | ocscp_notification_foreign_nf_locality_unserved > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.3003 |
Description | Alert is raised when a foreign producer NF's locality is not served by any SCP. |
Recommended Actions | Cause: SCP discovers an unknown producer NF whose locality is not served by any SCP. Diagnostic Information: Check the logs for the received NF notification by running the following command:
Note: Use the complete name of the notification pod in the following command:
Check the following metric to get the NFInstanceId for which this alert is raised: ocscp_notification_foreign_nf_locality_unserved (nfInstanceId). Recovery: This alert is cleared automatically when the unknown NF is deregistered or a registration update is received with a locality served by SCP. Re-register the NF with the correct locality information. For any assistance, contact My Oracle Support. |
6.2.10 UnknownLocalityFoundInForeignNF
Table 6-13 UnknownLocalityFoundInForeignNF
Field | Description |
---|---|
Severity | critical |
Conditions | ocscp_notification_foreign_nf_locality_absent > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.3004 |
Description | Alert is raised when a foreign producer NF's locality is unknown. |
Recommended Actions | Cause: SCP discovers an unknown producer NF without locality information. Diagnostic Information: Check the logs for the received NF notification by running the following command:
Use the complete name of the notification pod in the following command:
Check the following metric to get the NFInstanceId for which this alert is raised: ocscp_notification_foreign_nf_locality_absent (nfInstanceId). Recovery: This alert is cleared automatically when the unknown NF is deregistered or a registration update is received with a locality known to SCP. Re-register the NF with the correct locality information. For any assistance, contact My Oracle Support. |
6.2.11 SCPUpstreamResponseTimeout
Table 6-14 SCPUpstreamResponseTimeout
Field | Description |
---|---|
Severity | info |
Conditions | idelta(ocscp_metric_upstream_response_timeout_total[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7011 |
Description | Alert is raised when the upstream connection to a producer NF fails. |
Recommended Actions | Cause: A producer NF is down or not reachable, or latency is high. Diagnostic Information: Check whether the producer NF is up and network connectivity to the producer NF is established by using one of the following steps:
Check the upstream response time by using the following metric to determine whether the upstream is taking too long to respond: ocscp_metric_upstream_service_time (producer FQDN). Recovery: This alert is cleared automatically in the next scrape interval if the system does not observe any error. For any assistance, contact My Oracle Support. |
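SCPUpstreamResponseTimeout, like the egress rate-limiting alerts in Tables 6-19 and 6-20, uses `idelta()` rather than `increase()`. PromQL's `idelta()` returns the difference between the last two samples in the range and applies no counter-reset correction, so the condition reflects only whether the counter moved between the two most recent scrapes. A Python sketch of that behavior with hypothetical samples:

```python
from typing import Optional, Sequence


def idelta(samples: Sequence[float]) -> Optional[float]:
    """Difference between the last two samples, like PromQL idelta().

    Returns None when fewer than two samples exist (Prometheus returns
    no result in that case). No counter-reset correction is applied.
    """
    if len(samples) < 2:
        return None
    return samples[-1] - samples[-2]


# Hypothetical ocscp_metric_upstream_response_timeout_total samples (2m window).
earlier_timeouts_only = [10, 14, 14, 14]  # timeouts happened, then stopped
recent_timeout = [10, 10, 10, 11]

for name, s in [("earlier_timeouts_only", earlier_timeouts_only),
                ("recent_timeout", recent_timeout)]:
    d = idelta(s)
    print(name, d, d is not None and d > 0)
```

This is why the alert clears at the next scrape once no new timeouts arrive, even though `increase()` over the same 2m window would still report the earlier growth in the first sample list.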
6.2.12 SCPSingleNfInstanceAvailableForNFType
Table 6-15 SCPSingleNfInstanceAvailableForNFType
Field | Description |
---|---|
Severity | Major |
Conditions | ocscp_no_nf_instance == 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.3005 |
Description | Alert is raised when there is a single NFInstance available with SCP for an NFType. |
Recommended Actions |
Cause: When the
Diagnostic Information: Check all SCP NRFs for the NFType in the alert to verify whether only one NFInstance is available. For information about registered NFs, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check the number of NFs of a particular type by using the API or the CNC Console of SCP. For information about procedures to check the NFs available with SCP, see "Configuring Service Communication Proxy using the CNC Console" in Oracle Communications Cloud Native Core, Service Communication Proxy User Guide. Recovery: This alert is cleared automatically in the next scrape interval if more than one NFInstance is available for the NFType specified in the alert. For any assistance, contact My Oracle Support. |
6.2.13 SCPNoNfInstanceForNFType
Table 6-16 SCPNoNfInstanceForNFType
Field | Description |
---|---|
Severity | Critical |
Conditions | ocscp_no_nf_instance == 1 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.3006 |
Description | Alert is raised when no NFInstance is available with SCP for an NFType. |
Recommended Actions |
Cause: When the
Diagnostic Information: Check all SCP NRFs for the NFType in the alert to verify whether any NFInstance is available. For information about registered NFs, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check the number of NFs of a particular type by using the API or the CNC Console of SCP. For information about procedures to check the NFs available with SCP, see "Configuring Service Communication Proxy using the CNC Console" in Oracle Communications Cloud Native Core, Service Communication Proxy User Guide. Recovery: This alert is cleared automatically in the next scrape interval if at least one NFInstance is available for the NFType specified in the alert. For any assistance, contact My Oracle Support. |
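Tables 6-15 and 6-16 read the same gauge, ocscp_no_nf_instance, with different comparisons: a value of 0 means a single NF instance is available for the NF type, and a value of 1 means none is available. The small Python lookup below makes that encoding explicit. Mapping any other value to no alert is an assumption of this sketch.

```python
from typing import Optional, Tuple

# Gauge value -> (alert, severity), per the Conditions rows of Tables 6-15 and 6-16.
NO_NF_INSTANCE_ALERTS = {
    0: ("SCPSingleNfInstanceAvailableForNFType", "Major"),
    1: ("SCPNoNfInstanceForNFType", "Critical"),
}


def nf_instance_alert(gauge_value: int) -> Optional[Tuple[str, str]]:
    """Return (alert name, severity) for an ocscp_no_nf_instance value.

    Values other than 0 and 1 return None (assumed: no alert).
    """
    return NO_NF_INSTANCE_ALERTS.get(gauge_value)


print(nf_instance_alert(0))
print(nf_instance_alert(1))
```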
6.2.14 SCPIngressTrafficRateExceededConfiguredLimit
Table 6-17 SCPIngressTrafficRateExceededConfiguredLimit
Alert Parameters | Value |
---|---|
Description | Ingress traffic rate exceeds configured rate limit for consumer fqdn: {{$labels.ocscp_consumer_fqdn}} |
Summary | 'Ingress traffic rate exceeds configured rate limit for consumer fqdn: ocscpconsumerfqdn = {{$labels.ocscp_consumer_host}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumernftype = {{$labels.ocscp_consumer_nf_type}}, configuredingressrate = {{$labels.ocscp_configured_ingress_rate}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scp_fqdn: ' {{$labels.scp_fqdn}} ',timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} ' |
Severity | Critical |
Condition | This alert is raised when the ingress traffic rate exceeds the configured rate for the consumer FQDN: increase(ocscp_metric_ingress_rate_limiting_throttle_req_total[2m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7012 |
Metric Used | ocscp_metric_ingress_rate_limiting_throttle_req_total |
Recommended Actions | Cause: The ingress traffic rate exceeds the configured rate limit for the consumer FQDN. Diagnostic Information:
Recovery: This alert is cleared when no more requests are suppressed by ingress rate limiting in the next scrape interval. For any assistance, contact My Oracle Support. |
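Table 6-17 lists the fields that make up one alerting rule. To show how they fit together in a Prometheus rule file such as ScpAlertrules.yaml, the Python sketch below assembles the rule as a dictionary and prints it as JSON, which YAML parsers also accept. The group name, the shortened summary, the lowercase severity value, and placing the OID under `labels.oid` are assumptions; check the alert file shipped with your release for its actual layout.

```python
import json

rule = {
    "alert": "SCPIngressTrafficRateExceededConfiguredLimit",
    "expr": "increase(ocscp_metric_ingress_rate_limiting_throttle_req_total[2m]) > 0",
    "labels": {
        "severity": "critical",
        "oid": "1.3.6.1.4.1.323.5.3.35.1.2.7012",  # assumed placement of the OID
    },
    "annotations": {
        "description": "Ingress traffic rate exceeds configured rate limit "
                       "for consumer fqdn: {{$labels.ocscp_consumer_fqdn}}",
        # Shortened from the Summary row of Table 6-17.
        "summary": "Ingress traffic rate exceeds configured rate limit: "
                   "consumerfqdn = {{$labels.ocscp_consumer_host}}, "
                   "value = {{ $value }}",
    },
}

# Prometheus rule files hold a list of groups, each with a list of rules.
rule_file = {"groups": [{"name": "scp-application-alerts",  # hypothetical name
                         "rules": [rule]}]}
print(json.dumps(rule_file, indent=2))
```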
6.2.15 SCPIngressTrafficRoutedWithoutRateLimitTreatment
Table 6-18 SCPIngressTrafficRoutedWithoutRateLimitTreatment
Alert Parameters | Value |
---|---|
Description | Ingress traffic routed without rate limit treatment |
Summary | 'Ingress traffic routed without rate limit treatment: consumernftype = {{$labels.ocscp_consumer_nf_type}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumerfqdn = {{$labels.ocscp_consumer_host}}, cause = {{$labels.ocscp_cause}} ,namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scp_fqdn: ' {{$labels.scp_fqdn}} ', timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} ' |
Severity | Major |
Condition | This alert is raised when ingress traffic is routed without rate limiting treatment: increase(ocscp_metric_ingress_rate_limiting_not_applied_req_total[2m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7013 |
Metric Used | ocscp_metric_ingress_rate_limiting_not_applied_req_total |
Recommended Actions | Cause: Ingress traffic is routed without rate limiting treatment. Diagnostic Information:
Recovery: This alert is cleared when no more requests are routed without ingress rate limiting treatment in the next scrape interval. For any assistance, contact My Oracle Support. |
6.2.16 SCPEgressTrafficRateExceededConfiguredLimit
Table 6-19 SCPEgressTrafficRateExceededConfiguredLimit
Field | Description |
---|---|
Summary | 'Egress traffic rate exceeds configured rate limit: producernftype = {{$labels.ocscp_nf_type}}, producernfservicetype = {{$labels.ocscp_nf_service_type}}, producernfinstanceid = {{$labels.ocscp_nf_instance_id}}, producerfqdn = {{$labels.ocscp_producer_fqdn}}, consumernftype = {{$labels.ocscp_consumer_nf_type}},consumernfinstanceid = {{$labels.ocscp_consumer_nf_instance_id}}, consumerfqdn = {{$labels.ocscp_consumer_host}}, configuredegressrate = {{$labels.ocscp_configured_egress_rate}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}},scp_fqdn: ' {{$labels.scp_fqdn}} ', timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} ' |
Severity | Critical |
Conditions | idelta(ocscp_metric_egress_rate_limiting_throttle_req_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7014 |
Description | Alert is raised when the egress traffic rate exceeds the configured rate. |
Recommended Actions | Cause: The egress traffic rate exceeds the configured rate. Diagnostic Information: Check the egress traffic rate by using the following metric: ocscp_metric_http_tx_req_total. Check the egress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert is cleared when no more requests are suppressed by egress rate limiting in the next scrape interval. For any assistance, contact My Oracle Support. |
6.2.17 SCPEgressTrafficRoutedWithoutRateLimitTreatment
Table 6-20 SCPEgressTrafficRoutedWithoutRateLimitTreatment
Field | Description |
---|---|
Severity | Major |
Conditions | idelta(ocscp_metric_egress_rate_limiting_not_applied_req_total{app_kubernetes_io_name="scp-worker"}[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7015 |
Description | Alert is raised when egress traffic is routed without rate limiting treatment. |
Recommended Actions | Cause: Egress traffic is routed without rate limiting treatment. Diagnostic Information: Check the egress rate limiting configuration for the untreated producer FQDN. Obtain the producer FQDN by using the following metric: ocscp_metric_egress_rate_limiting_not_applied_req_total (ocscp_producer_fqdn). Check the egress rate limit configuration as described in Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert is cleared when no more requests are routed without egress rate limiting treatment in the next scrape interval. For any assistance, contact My Oracle Support. |
6.2.18 SCPNotificatoinRejectTopologySourceLocal
Table 6-21 SCPNotificatoinRejectTopologySourceLocal
Field | Description |
---|---|
Severity | Info |
Conditions | increase(ocscp_notifications_rejected_topologysource_local_total[15m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.3007 |
Description | Alert is raised when SCP rejects a notification from NRF because the topology source for the NF type is set to LOCAL. |
Recommended Actions | Cause: The NF Topology Source Info is set to LOCAL. Diagnostic Information: Check the topology source information of the NF type. For information about the topology source APIs, see Oracle Communications Cloud Native Core, Service Communication Proxy User Guide. Recovery: This alert is cleared automatically 15 minutes after the NF Topology Source Info is changed from LOCAL to NRF. For any assistance, contact My Oracle Support. |
6.2.19 SCPNotificationProcessingFailureForNF
Table 6-22 SCPNotificationProcessingFailureForNF
Field | Description |
---|---|
Severity | Major |
Conditions | increase(ocscp_failure_processed_nf_notification_total[15m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.3008 |
Description | Alert is raised when notification processing fails on SCP. |
Recommended Actions | Cause: Notification processing has failed on SCP. Diagnostic Information: Check the notification pod logs for any errors by running the following command:
Sample logs:
To get the list of pods, run the following command:
For information about the topology source APIs, see Oracle Communications Cloud Native Core, Service Communication Proxy REST Specification Guide. Recovery: This alert is cleared automatically after 15 minutes. For any assistance, contact My Oracle Support. |
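The 15-minute clearing time in Tables 6-21 and 6-22 follows from the 15m range in their conditions: `increase(...[15m]) > 0` stays true until the last counter increment falls out of the window. The Python sketch below computes that approximate clear time from the time of the last rejected or failed notification. It ignores scrape and rule-evaluation intervals, which add some delay in practice, and the timestamp is hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # range used by both conditions


def earliest_clear_time(last_event: datetime) -> datetime:
    """When increase(counter[15m]) drops back to 0, ignoring scrape delay."""
    return last_event + WINDOW


def alert_active(last_event: datetime, now: datetime) -> bool:
    """True while the last increment is still inside the 15m window."""
    return last_event <= now < earliest_clear_time(last_event)


last = datetime(2024, 1, 1, 10, 0, 0)  # hypothetical last failure
print(earliest_clear_time(last))                          # 2024-01-01 10:15:00
print(alert_active(last, last + timedelta(minutes=14)))   # True
print(alert_active(last, last + timedelta(minutes=16)))   # False
```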
6.2.20 SCPSubscriptionFailureForNFType
Table 6-23 SCPSubscriptionFailureForNFType
Field | Description |
---|---|
Severity | Critical |
Conditions | increase(ocscp_subscription_nf_failure_total[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.2001 |
Description | Alert is raised when SCP subscription to NRF fails. This alert is labeled with the specific nftype, nrfRegionOrSetId, and auditmethod. |
Recommended Actions | Cause: The subscription fails for an NF type with NRF. Diagnostic Information: Check whether NRF is up. To check the NRF status, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether NRF is reachable by using one of the following steps:
If NRF is up, check the scp-worker logs for any error response from NRF. If there are error responses, monitor the NRF logs: kubectl logs <pod name> -n <namespace>. Sample logs:
Recovery: This alert is cleared automatically when NRF is up and running or the causes of the received error responses are corrected. For any assistance, contact My Oracle Support. |
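Tables 6-23 to 6-26 ask you to check whether NRF is reachable. One quick check, from any pod or host with Python available, is a TCP connection attempt to the NRF endpoint; this is an illustrative option, not necessarily one of the steps this guide lists. The sketch uses only the standard library, and a successful connection shows only that the port accepts connections, not that NRF is healthy.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name resolution failure, or timeout
        return False
```

Call it with the NRF FQDN and port used by your deployment, for example `tcp_reachable("<NRF FQDN>", <NRF port>)`; both values are placeholders.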
6.2.21 SCPReSubscriptionFailureForNFType
Table 6-24 SCPReSubscriptionFailureForNFType
Field | Description |
---|---|
Severity | Critical |
Conditions | increase(ocscp_patch_subscription_nf_failure_total[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.2002 |
Description | Alert is raised when SCP re-subscription to NRF fails. This alert is labeled with the specific nftype, nrfRegionOrSetId, and auditmethod. |
Recommended Actions | Cause: The re-subscription fails for an NF type with NRF. Diagnostic Information: Check whether NRF is up. To check the NRF status, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether NRF is reachable by using one of the following steps:
Check the scp-worker logs for any error response from NRF. If there are error responses, monitor the NRF logs: kubectl logs <pod name> -n <namespace>. Recovery: This alert is cleared automatically when NRF is up and running or the causes of the received error responses are corrected. For any assistance, contact My Oracle Support. |
6.2.22 SCPNrfRegistrationFailureForRegionOrSetId
Table 6-25 SCPNrfRegistrationFailureForRegionOrSetId
Field | Description |
---|---|
Severity | Major |
Conditions | increase(ocscp_nrf_registration_failure_total[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.2003 |
Description | Alert is raised when SCP registration with NRF fails. This alert is labeled with the specific nftype, nrfRegionOrSetId, and auditmethod. |
Recommended Actions | Cause: The registration fails for an NF type with NRF. Diagnostic Information: Check whether NRF is up. To check the NRF status, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether NRF is reachable by using one of the following steps:
Check the scp-worker logs for any error response from NRF. If there are error responses, monitor the NRF logs: kubectl logs <pod name> -n <namespace>. Sample logs:
Recovery: This alert is cleared automatically when NRF is up and running or the causes of the received error responses are corrected. For any assistance, contact My Oracle Support. |
6.2.23 SCPNrfHeartbeatFailureForRegionOrSetId
Table 6-26 SCPNrfHeartbeatFailureForRegionOrSetId
Field | Description |
---|---|
Severity | Major |
Conditions | increase(ocscp_nrf_heartbeat_failures_total[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.2004 |
Description | Alert is raised when the SCP heartbeat to NRF fails. |
Recommended Actions | Cause: The heartbeat with NRF fails for an NF type. Diagnostic Information: Check whether NRF is up. To check the NRF status, see Oracle Communications Cloud Native Core, Network Repository Function User Guide. Check whether NRF is reachable by using one of the following steps:
Check the scp-worker logs for any error response from NRF. If there are error responses, monitor the NRF logs: kubectl logs <pod name> -n <namespace>. Recovery: This alert is cleared automatically when NRF is up and running or the causes of the received error responses are corrected. For any assistance, contact My Oracle Support. |
6.2.24 SCPDBOperationFailure
Table 6-27 SCPDBOperationFailure
Field | Description |
---|---|
Severity | Warning |
Conditions | increase(ocscp_db_operation_failure_total[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.2005 |
Description | Alert is raised when a DB operation fails. |
Recommended Actions | Cause: An SCP DB operation fails. Diagnostic Information: Check whether the DB service is up. Check the status and age of the MySQL pod by using the following command:
Recovery: This alert is cleared automatically when the DB service is up and running. For any assistance, contact My Oracle Support. |
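To check the status and age of the database pod (Table 6-27), or to find which SCP pod is not ready (Table 6-3), you can inspect the JSON output of `kubectl get pods -n <namespace> -o json`. The Python sketch below summarizes each pod whose name contains a given substring, using the standard Pod fields `metadata.name`, `metadata.creationTimestamp`, `status.phase`, and `status.containerStatuses[].ready` / `restartCount`. The `mysql` substring and the sample data are hypothetical; use whatever name your database pods carry.

```python
import json
from typing import Any, Dict, List


def pod_summary(pod_list: Dict[str, Any], name_filter: str) -> List[Dict[str, Any]]:
    """Summarize pods from `kubectl get pods -o json` whose name contains name_filter."""
    rows = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if name_filter not in name:
            continue
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        rows.append({
            "name": name,
            "phase": status.get("phase"),
            # Counted as ready only if every container reports ready.
            "ready": bool(containers) and all(c.get("ready") for c in containers),
            "restarts": sum(c.get("restartCount", 0) for c in containers),
            "created": pod["metadata"].get("creationTimestamp"),
        })
    return rows


# Trimmed, hypothetical `kubectl get pods -o json` output.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "mysql-cluster-0", "creationTimestamp": "2024-01-01T10:00:00Z"},
   "status": {"phase": "Running",
              "containerStatuses": [{"ready": false, "restartCount": 4}]}},
  {"metadata": {"name": "ocscp-scp-worker-abc", "creationTimestamp": "2024-01-01T09:00:00Z"},
   "status": {"phase": "Running",
              "containerStatuses": [{"ready": true, "restartCount": 0}]}}
]}
""")
for row in pod_summary(sample, "mysql"):
    print(row)
```

In a live cluster, pass the parsed output of the real `kubectl get pods -n <namespace> -o json` command instead of the sample.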
6.2.25 SCPGeneratedErrorsResponseForNFService
Table 6-28 SCPGeneratedErrorsResponseForNFService
Field | Description |
---|---|
Severity | Info |
Conditions | increase(ocscp_metric_scp_generated_response_total[2m]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7016 |
Description | Alert is raised for the NF type for which an SCP-generated response is triggered. |
Recommended Actions | Cause: SCP generates an error response for an NF service type. Diagnostic Information: Monitor the scp-worker logs to determine why SCP generated the error responses, and check for the error reason in the logs: kubectl logs <pod name> -n <namespace>. Recovery: This alert is cleared automatically when the cause of the error response at the SCP worker is identified and corrected in the configuration. For any assistance, contact My Oracle Support. |
6.2.26 SCPCircuitBreakingAppliedForNF
Table 6-29 SCPCircuitBreakingAppliedForNF
Field | Description |
---|---|
Severity | Info |
Conditions | ocscp_circuit_breaking_applied > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.7017 |
Description | Alert is raised for an NF when circuit breaking is applied. |
Recommended Actions | Cause: Circuit breaking is applied for a producer NF FQDN based on the configured http2MaxRequests value. Diagnostic Information: Monitor the scp-worker logs for the number of error responses when outstanding requests exceed the configured http2MaxRequests value: kubectl logs <pod name> -n <namespace>. Check the latency from SCP to the upstream producer by using the following metric: ocscp_metric_upstream_service_time_total (ocscp_producer_fqdn or ocscp_nf_end_point). Recovery: This alert is cleared automatically when http2MaxRequests for circuit breaking is configured above the traffic at the worker, or when the traffic drops below the configured circuit breaking value. For any assistance, contact My Oracle Support. |
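Circuit breaking in Table 6-29 is tied to the number of outstanding requests to a producer compared with http2MaxRequests. The Python sketch below is a simplified, hypothetical model of such a limit, not SCP's implementation: a request is admitted only while fewer than `max_requests` are in flight, and is rejected otherwise.

```python
class OutstandingRequestLimiter:
    """Admit a request only while in-flight requests are below the limit."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests  # plays the role of http2MaxRequests
        self.in_flight = 0
        self.rejected = 0  # requests turned away by the limit

    def try_start(self) -> bool:
        if self.in_flight >= self.max_requests:
            self.rejected += 1
            return False
        self.in_flight += 1
        return True

    def finish(self) -> None:
        # Called when the producer's response (or an error) completes a request.
        if self.in_flight > 0:
            self.in_flight -= 1


# Hypothetical: limit of 3, and 5 requests arrive before any response returns.
limiter = OutstandingRequestLimiter(max_requests=3)
results = [limiter.try_start() for _ in range(5)]
print(results, limiter.rejected)  # [True, True, True, False, False] 2
limiter.finish()
print(limiter.try_start())        # True: a response freed one slot
```

Because a request stays outstanding until the producer responds, higher upstream latency raises the in-flight count at the same request rate. That is why the recommended actions ask you to check ocscp_metric_upstream_service_time_total as well as the configured limit.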
6.2.27 SCPUpgradeStarted
Table 6-30 SCPUpgradeStarted
Field | Description |
---|---|
Severity | Info |
Conditions | When the SCP upgrade process for an SCP microservice starts. |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.6001 |
Description | Alert is raised when the SCP upgrade process for an SCP microservice starts. |
Recommended Actions |
Cause: An SCP upgrade is performed for a particular microservice. Diagnostic Information: Not applicable. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the
For any assistance, contact My Oracle Support. |
6.2.28 SCPUpgradeFailed
Table 6-31 SCPUpgradeFailed
Field | Description |
---|---|
Severity | Critical |
Conditions | When any SCP microservice upgrade fails during the upgrade process. |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.6002 |
Description | Alert is raised when any SCP microservice upgrade fails. |
Recommended Actions |
Cause: An SCP microservice upgrade fails during the upgrade process. Diagnostic Information: Check for new hook-jobs that might have failed after multiple attempts, and check the logs for failures. Run the following command to check the pod of hook-job:
Run the following command to check the logs:
Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the
For any assistance, contact My Oracle Support. |
6.2.29 SCPUpgradeSuccessful
Table 6-32 SCPUpgradeSuccessful
Field | Description |
---|---|
Severity | Info |
Conditions | When any SCP microservice upgrade is completed. |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.6003 |
Description | Alert is raised when any SCP microservice upgrade is completed. |
Recommended Actions |
Cause: When any SCP microservice upgrade is completed. Diagnostic Information: Not applicable. Run the following command to check the pod of hook-job:
Run the following command to check the logs:
Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the
For any assistance, contact My Oracle Support. |
6.2.30 SCPRollbackStarted
Table 6-33 SCPRollbackStarted
Field | Description |
---|---|
Severity | Info |
Conditions | When the rollback process for an SCP microservice starts. |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.6004 |
Description | Alert is raised when the rollback process for an SCP microservice starts. |
Recommended Actions |
Cause: The rollback process for an SCP microservice starts. Diagnostic Information: Not applicable. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the
For any assistance, contact My Oracle Support. |
6.2.31 SCPRollbackFailed
Table 6-34 SCPRollbackFailed
Field | Description |
---|---|
Severity | Critical |
Conditions | When any SCP microservice rollback fails during the rollback process. |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.6005 |
Description | Alert is raised when any SCP microservice rollback fails. |
Recommended Actions |
Cause: An SCP microservice rollback fails during the rollback process. Diagnostic Information: Check for new hook-jobs that might have failed after multiple attempts, and check the logs for failures. Run the following command to check the pod of hook-job:
Run the following command to check the logs:
Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the
For any assistance, contact My Oracle Support. |
6.2.32 SCPRollbackSuccessful
Table 6-35 SCPRollbackSuccessful
Field | Description |
---|---|
Severity | Info |
Conditions | When any SCP microservice rollback is completed. |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.6006 |
Description | Alert is raised when any SCP microservice rollback is completed. |
Recommended Actions |
Cause: An SCP microservice rollback is completed. Diagnostic Information: Not applicable. Recovery: This alert is cleared automatically in 5 minutes when the customAlertExpiryEnabled parameter is set to false in the
For any assistance, contact My Oracle Support. |
6.2.33 ScpWorkerPodCpuUtilizationAboveWarnThreshold
Table 6-36 ScpWorkerPodCpuUtilizationAboveWarnThreshold
Field | Details |
---|---|
Description | CPU utilization of SCP worker at warn level |
Summary | CPU utilization of SCP worker at warn level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} |
Severity | Warning |
Condition | This alert is raised when CPU utilization of scp-worker reaches the WARN level.
ocscp_worker_pod_overload_control_cpu_utilization_warn > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7018 |
Metric Used | ocscp_worker_pod_overload_control_cpu_utilization_warn |
Recommended Action |
Cause: When CPU utilization of scp-worker reaches the WARN level. Diagnostic Information:
Recovery:
For any assistance, contact My Oracle Support. |
6.2.34 ScpWorkerPodCpuUtilizationAboveMinorThreshold
Table 6-37 ScpWorkerPodCpuUtilizationAboveMinorThreshold
Field | Details |
---|---|
Description | CPU utilization of SCP worker at minor level |
Summary | CPU utilization of SCP worker at minor level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} |
Severity | Minor |
Condition | This alert is raised when CPU utilization of scp-worker reaches the MINOR level.
ocscp_worker_pod_overload_control_cpu_utilization_minor > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7019 |
Metric Used | ocscp_worker_pod_overload_control_cpu_utilization_minor |
Recommended Action |
Cause: When CPU utilization of scp-worker reaches the MINOR level. Diagnostic Information:
Recovery:
For any assistance, contact My Oracle Support. |
6.2.35 ScpWorkerPodCpuUtilizationAboveMajorThreshold
Table 6-38 ScpWorkerPodCpuUtilizationAboveMajorThreshold
Field | Details |
---|---|
Description | CPU utilization of SCP worker at major level |
Summary | CPU utilization of SCP worker at major level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} |
Severity | Major |
Condition | This alert is raised when CPU utilization of scp-worker reaches the MAJOR level.
ocscp_worker_pod_overload_control_cpu_utilization_major > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7020 |
Metric Used | ocscp_worker_pod_overload_control_cpu_utilization_major |
Recommended Action |
Cause: When CPU utilization of scp-worker reaches the MAJOR level. Diagnostic Information:
Recovery:
For any assistance, contact My Oracle Support. |
6.2.36 ScpWorkerPodCpuUtilizationAboveCriticalThreshold
Table 6-39 ScpWorkerPodCpuUtilizationAboveCriticalThreshold
Field | Details |
---|---|
Description | CPU utilization of SCP worker at critical level |
Summary | CPU utilization of SCP worker at critical level. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} |
Severity | Critical |
Condition | This alert is raised when CPU utilization of scp-worker reaches the CRITICAL level.
ocscp_worker_pod_overload_control_cpu_utilization_critical > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7021 |
Metric Used | ocscp_worker_pod_overload_control_cpu_utilization_critical |
Recommended Action |
Cause: When CPU utilization of scp-worker reaches the CRITICAL level. Diagnostic Information:
Recovery:
For any assistance, contact My Oracle Support. |
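Each of the four CPU alerts above fires when its own gauge is greater than 0. The sketch below evaluates those four conditions against one scrape of a single scp-worker pod and reports the most severe level. It is a hypothetical helper: the snapshot dictionary is an assumed input format, and because this section does not say whether lower-level gauges stay set when a higher level is reached, the code handles either case.

```python
from typing import List, Mapping, Optional

# Checked from most to least severe; each alert fires when its gauge is > 0.
CPU_LEVELS = ("critical", "major", "minor", "warn")
METRIC_PREFIX = "ocscp_worker_pod_overload_control_cpu_utilization_"


def firing_cpu_alerts(snapshot: Mapping[str, float]) -> List[str]:
    """Return every level whose gauge is > 0 in one scrape of one pod."""
    return [lvl for lvl in CPU_LEVELS if snapshot.get(METRIC_PREFIX + lvl, 0) > 0]


def highest_cpu_level(snapshot: Mapping[str, float]) -> Optional[str]:
    """Return the most severe firing level, or None if no CPU alert fires."""
    firing = firing_cpu_alerts(snapshot)
    return firing[0] if firing else None
```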
6.2.37 SCPUnhealthyPeerSCPDetected
Table 6-40 SCPUnhealthyPeerSCPDetected
Field | Details |
---|---|
Description | Next hop SCP is marked unhealthy |
Summary | 'Next hop SCP is marked unhealthy. peerScpFqdn: {{$labels.peerScpName}}, scpFqdn: {{$labels.scpFqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }}' |
Severity | Info |
Condition |
This alert is raised when the peer SCP is marked as unhealthy. ocscp_peer_scp_unhealthy > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7022 |
Metric Used | ocscp_peer_scp_unhealthy |
Recommended Action |
Cause: The peer SCP is marked as unhealthy because of consecutive failure responses. Diagnostic Information:
Recovery: This alert is cleared automatically after the degradation time is over. The degradation time is the number of consecutive degradations multiplied by the configured base ejection time. For any assistance, contact My Oracle Support. |
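The degradation-time formula above can be worked through as a short sketch. It assumes the base ejection time is expressed in seconds and that the window is measured from the moment the peer SCP is marked unhealthy; both are assumptions, and the function names are hypothetical. For example, 3 consecutive degradations with a 30-second base ejection give a 90-second degradation time.

```python
def degradation_time_s(consecutive_degradations: int, base_ejection_s: float) -> float:
    """Degradation time = consecutive degradations multiplied by the base ejection time."""
    if consecutive_degradations < 0 or base_ejection_s < 0:
        raise ValueError("inputs must be non-negative")
    return consecutive_degradations * base_ejection_s


def unhealthy_alert_cleared(marked_at_s: float, now_s: float,
                            consecutive_degradations: int,
                            base_ejection_s: float) -> bool:
    """True once the peer SCP's degradation window has elapsed (assumed start point)."""
    window = degradation_time_s(consecutive_degradations, base_ejection_s)
    return now_s >= marked_at_s + window
```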
6.2.38 SCPDnsSrvQueryFailure
Table 6-41 SCPDnsSrvQueryFailure
Field | Details |
---|---|
Description | DNS SRV Query failed with cause {{$labels.cause}} |
Summary | 'DNS SRV Query failed with cause {{$labels.cause}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when the DNS SRV lookup fails due to a network or timeout error.
ocscp_alternate_resolution_dnssrv_rx_error_res == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.8001 |
Metric Used | ocscp_alternate_resolution_dnssrv_rx_error_res |
Recommended Action |
Cause: The DNS SRV lookup fails due to a network or timeout error. Diagnostic Information: Check the DNS SRV server status and restore it to normal operation. Recovery: This alert is cleared automatically when SCP performs a successful DNS SRV query. For any assistance, contact My Oracle Support. |
6.2.39 SCPProducerOverloadThrottled
Table 6-42 SCPProducerOverloadThrottled
Field | Details |
---|---|
Description | Producer is in Throttled Overload state |
Summary | 'Producer is in Throttled Overload state. producerFqdn: {{$labels.producerFqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Info |
Condition | This alert is raised when the producer NF is in the throttled congestion state.
ocscp_producer_load_throttled == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7023 |
Metric Used | ocscp_producer_load_throttled |
Recommended Action |
Cause: The load of the producer NF is higher than the throttled threshold configured for the service. Diagnostic Information:
Recovery: This alert clears automatically when the NF profile is deregistered, or is updated with a load below the throttled abatement threshold. For any assistance, contact My Oracle Support. |
6.2.40 SCPProducerOverloadAlternateRouted
Table 6-43 SCPProducerOverloadAlternateRouted
Field | Details |
---|---|
Description | Producer is in Alternate Route Overload state |
Summary | 'Producer is in Alternate Route Overload state. producerFqdn: {{$labels.producerFqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Info |
Condition | This alert is raised when the producer NF is in the alternate routing congestion state.
ocscp_producer_load_alternateRoute == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7024 |
Metric Used | ocscp_producer_load_alternateRoute |
Recommended Action |
Cause: The load of the producer NF is higher than the alternate routing threshold configured for the service. Diagnostic Information:
Recovery: This alert clears automatically when the NF profile is deregistered, or is updated with a load below the alternate routing abatement threshold. For any assistance, contact My Oracle Support. |
6.2.41 SCPSeppNotConfigured
Table 6-44 SCPSeppNotConfigured
Field | Details |
---|---|
Description | SEPP is not configured for PLMN |
Summary | 'SEPP is not configured for PLMN. plmnid: {{$labels.plmn_id}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when Security Edge Protection Proxy (SEPP) is not configured for a PLMN.
ocscp_metric_sepp_not_configured_current == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7025 |
Metric Used | ocscp_metric_sepp_not_configured_current |
Recommended Action |
Cause: SEPP routing rules are not configured at SCP for the selected PLMN in inter-PLMN routing. Diagnostic Information:
Recovery: This alert clears automatically when the SEPP routing rules are created at SCP for the selected PLMN in inter-PLMN routing. For any assistance, contact My Oracle Support. |
6.2.42 SCPSeppRoutingFailed
Table 6-45 SCPSeppRoutingFailed
Field | Details |
---|---|
Description | Routing towards SEPP failed |
Summary | 'Routing towards SEPP failed. sepp_fqdn: {{$labels.ocscp_sepp_fqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when routing towards SEPP fails.
ocscp_metric_sepp_routing_attempt_fail_current == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7026 |
Metric Used | ocscp_metric_sepp_routing_attempt_fail_current |
Recommended Action |
Cause: Inter-PLMN routing failed for the selected SEPP instances. Diagnostic Information:
Recovery: This alert clears automatically when routing is successful for the selected SEPP. For any assistance, contact My Oracle Support. |
6.2.43 SCPWorkerSSLCertificateExpire
Table 6-46 SCPWorkerSSLCertificateExpire
Field | Details |
---|---|
Description | This alert is raised when an SCP SSL certificate is due to expire within the configured interval. |
Summary |
'SCP Worker SSL Certificate is about to expire, ocscp_worker_fqdn: {{$labels.ocscp_producer_fqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, certificatetype: {{$labels.certificatetype}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | ocscp_metric_ssl_certificate_expire_total > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.8024 |
Metric Used | ocscp_metric_ssl_certificate_expire_total |
Recommended Action |
Cause: The SCP TLS or SSL certificate is due to expire within the configured interval. Diagnostic Information:
Recovery: Update the SCP SSL secret with the renewed SSL certificate. For any assistance, contact My Oracle Support. |
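To check by hand how close a certificate is to expiry, the notAfter field can be compared against the alert interval. This sketch uses the Python standard library's ssl.cert_time_to_seconds, which parses the notAfter format returned by ssl.SSLSocket.getpeercert(). The alert_interval_days parameter is a hypothetical stand-in for SCP's configured interval, whose actual parameter name is not given here.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if already expired).

    not_after uses the getpeercert() format, for example "Jan 15 09:34:43 2018 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expiry - current) / SECONDS_PER_DAY


def should_raise_expiry_alert(not_after: str, alert_interval_days: float,
                              now: Optional[float] = None) -> bool:
    """True when expiry falls within the (hypothetical) configured alert interval."""
    return days_until_expiry(not_after, now) <= alert_interval_days
```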
6.2.44 SCPWorkerHTTPSConnectionFailure
Table 6-47 SCPWorkerHTTPSConnectionFailure
Field | Details |
---|---|
Description | SCP Worker HTTPS connection establishment failed |
Summary | 'SCP Worker HTTPS Connection Establishment failed, ocscp_worker_fqdn: {{$labels.ocscp_producer_fqdn}}, scpfqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Info |
Condition | This alert is raised when an egress connection fails.
increase(ocscp_metric_https_egress_connection_failure_total{app_kubernetes_io_name="scp-worker"}[10m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.8025 |
Metric Used | ocscp_metric_https_egress_connection_failure_total |
Recommended Action |
Cause: SCP fails to send a message request to a producer NF due to HTTPS connection errors with the producer NF. Diagnostic Information:
Recovery: Ensure that all Transport Layer Security (TLS) or Secure Sockets Layer (SSL) certificates configured at SCP and the producer NFs are valid. For any assistance, contact My Oracle Support. |
6.2.45 SCPGlobalEgressRLRemoteParticipantConnectivityFailure
Table 6-48 SCPGlobalEgressRLRemoteParticipantConnectivityFailure
Field | Details |
---|---|
Description | 'SCP Global Egress RL Remote Participant Connectivity Failure for participant |
Summary | 'SCP Global Egress RL Remote Participant Connectivity Failure for participant: {{$labels.scp_remote_coh_cluster_name}}, scp_fqdn: {{$labels.scp_fqdn}}, scp_local_coh_cluster_name: {{$labels.scp_local_coh_cluster_name}}, scp_remote_coh_cluster_fqdn: {{$labels.scp_remote_coh_cluster_fqdn }}, scp_remote_coh_cluster_port: {{$labels.scp_remote_coh_cluster_port }}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when the connection to the remote participant SCP is not established or goes down.
ocscp_global_egress_rl_remote_participant_connectivity_failure == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9001 |
Metric Used | ocscp_global_egress_rl_remote_participant_connectivity_failure |
Recommended Action |
Cause: The connection to the remote participant SCP is not established or is down. Diagnostic Information:
Recovery: This alert clears automatically when the connection with the remote participant SCP is established. For any assistance, contact My Oracle Support. |
6.2.46 SCPGlobalEgressRLRemoteParticipantWithDuplicateNFInstanceId
Table 6-49 SCPGlobalEgressRLRemoteParticipantWithDuplicateNFInstanceId
Field | Details |
---|---|
Description | SCP global egress RL remote participant configured with duplicate NF InstanceId for participant. |
Summary | 'SCP Global Egress RL Remote Participant Configured With Duplicate NFInstanceId for participant: {{$labels.scp_remote_coh_cluster_name}}, scp_fqdn: {{$labels.scp_fqdn}}, scp_nf_instance_id: {{$labels.scp_nf_instance_id}}, scp_local_coh_cluster_name: {{$labels.scp_local_coh_cluster_name}}, scp_remote_coh_cluster_fqdn: {{$labels.scp_remote_coh_cluster_fqdn }}, scp_remote_coh_cluster_port: {{$labels.scp_remote_coh_cluster_port }}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when a duplicate remote coherence participant is found.
ocscp_global_egress_rl_remote_participant_is_duplicate == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9002 |
Metric Used | ocscp_global_egress_rl_remote_participant_is_duplicate |
Recommended Action |
Cause: Duplicate configuration of remote coherence participants with local SCP. Diagnostic Information:
Recovery:
For any assistance, contact My Oracle Support. |
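One way to pre-check a global egress rate limiting participant list before applying it is sketched below. It assumes, based on the alert name, that a duplicate means a remote participant whose NF instance ID matches the local SCP's ID or another remote participant's ID. The function name and input shape are hypothetical and do not reflect SCP's configuration schema.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def duplicate_participants(local_nf_instance_id: str,
                           remotes: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return names of remote participants whose NF instance ID is not unique.

    remotes holds (participant_name, nf_instance_id) pairs. An ID counts as a
    duplicate if it matches the local SCP's ID or appears on more than one remote.
    """
    remote_list = list(remotes)
    counts = Counter(nf_id for _, nf_id in remote_list)
    return {
        name for name, nf_id in remote_list
        if nf_id == local_nf_instance_id or counts[nf_id] > 1
    }
```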
6.2.47 SCPMediationConnectivityFailure
Table 6-50 SCPMediationConnectivityFailure
Field | Details |
---|---|
Description | 'SCP Mediation Connectivity Failed, scp_fqdn |
Summary | 'SCP Mediation Connectivity Failed, scp_fqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when the Mediation connection is not established or a request to Mediation is not successful.
ocscp_mediation_http_not_reachable == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9002 |
Metric Used | ocscp_mediation_http_not_reachable |
Recommended Action |
Cause: The remote Mediation connection is not established or a request to Mediation is not successful. Diagnostic Information:
Recovery:
For any assistance, contact My Oracle Support. |
6.2.48 SCPNotificationQueuesUtilizationAboveMinorThreshold
Table 6-51 SCPNotificationQueuesUtilizationAboveMinorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Minor Threshold' |
Summary | 'SCP Notification Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when the queues in the notification service are utilized above 65% of their maximum size (the user-configured minor threshold value).
ocscp_notification_queue_alert{severity="MINOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.3009 |
Metric Used | ocscp_notification_queue_alert |
Recommended Action |
Cause: The notification module is receiving more traffic than expected. Diagnostic Information:
Recovery: This alert clears automatically when the notification queue utilization falls below the minor threshold or rises above the major threshold. For any assistance, contact My Oracle Support. |
6.2.49 SCPNotificationQueuesUtilizationAboveMajorThreshold
Table 6-52 SCPNotificationQueuesUtilizationAboveMajorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Major Threshold' |
Summary | 'SCP Notification Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when the queues in the notification service are utilized above 75% of their maximum size (the user-configured major threshold value).
ocscp_notification_queue_alert{severity="MAJOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.3010 |
Metric Used | ocscp_notification_queue_alert |
Recommended Action |
Cause: The notification module is receiving more traffic than expected. Diagnostic Information:
Recovery: This alert clears automatically when the notification queue utilization falls below the major threshold or rises above the critical threshold. For any assistance, contact My Oracle Support. |
6.2.50 SCPNotificationQueuesUtilizationAboveCriticalThreshold
Table 6-53 SCPNotificationQueuesUtilizationAboveCriticalThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Notification Queues Utilization Above Critical Threshold' |
Summary | 'SCP Notification Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when the queues in the notification service are utilized above 85% of their maximum size (the user-configured critical threshold value).
ocscp_notification_queue_alert{severity="CRITICAL"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.3011 |
Metric Used | ocscp_notification_queue_alert |
Recommended Action |
Cause: The notification module is receiving more traffic than expected. Diagnostic Information:
Recovery: This alert clears automatically when the notification queue utilization falls below the critical threshold. For any assistance, contact My Oracle Support. |
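The three notification queue alerts divide utilization into bands: each alert clears when utilization leaves its band in either direction, so at most one of them is active at a time. The sketch below maps a queue's fill level to that single band, using the 65%, 75%, and 85% values quoted in the tables as defaults. The function name and its inputs (current queue depth and maximum size) are hypothetical. The same banding appears in the scp-nrfproxy, scp-worker, and scp-cache queue alerts in this section.

```python
from typing import Optional, Sequence, Tuple

# Percentages quoted in the tables; each is user-configurable. Most severe first.
DEFAULT_THRESHOLDS: Sequence[Tuple[str, float]] = (
    ("CRITICAL", 85.0),
    ("MAJOR", 75.0),
    ("MINOR", 65.0),
)


def queue_alert_severity(queued: int, max_size: int,
                         thresholds: Sequence[Tuple[str, float]] = DEFAULT_THRESHOLDS
                         ) -> Optional[str]:
    """Return the single severity band for the current queue utilization, or None."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    utilization_pct = 100.0 * queued / max_size
    for severity, limit in thresholds:
        if utilization_pct > limit:  # "above" the threshold, so equality does not trigger
            return severity
    return None
```

With a maximum size of 1000, a depth of 700 maps to MINOR, 800 to MAJOR, and 900 to CRITICAL, while exactly 650 maps to no alert.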
6.2.51 SCPNrfProxyQueuesUtilizationAboveMinorThreshold
Table 6-54 SCPNrfProxyQueuesUtilizationAboveMinorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Minor Threshold' |
Summary | 'SCP Nrfproxy Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when the task queues in the scp-nrfproxy service are utilized above 65% of their maximum size (the user-configured minor threshold value).
ocscp_nrfproxy_queue_alert{severity="MINOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9010 |
Metric Used | ocscp_nrfproxy_queue_alert |
Recommended Action |
Cause: The scp-nrfproxy task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: This alert clears automatically when the queue utilization falls below the minor threshold or rises above the major threshold. For any assistance, contact My Oracle Support. |
6.2.52 SCPNrfProxyQueuesUtilizationAboveMajorThreshold
Table 6-55 SCPNrfProxyQueuesUtilizationAboveMajorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Major Threshold' |
Summary | 'SCP Nrfproxy Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when the task queues in the scp-nrfproxy service are utilized above 75% of their maximum size (the user-configured major threshold value).
ocscp_nrfproxy_queue_alert{severity="MAJOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9011 |
Metric Used | ocscp_nrfproxy_queue_alert |
Recommended Action |
Cause: The scp-nrfproxy task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: This alert clears automatically when the queue utilization falls below the major threshold or rises above the critical threshold. For any assistance, contact My Oracle Support. |
6.2.53 SCPNrfProxyQueuesUtilizationAboveCriticalThreshold
Table 6-56 SCPNrfProxyQueuesUtilizationAboveCriticalThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Nrfproxy Queues Utilization Above Critical Threshold' |
Summary | 'SCP Nrfproxy Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when the task queues in the scp-nrfproxy service are utilized above 85% of their maximum size (the user-configured critical threshold value).
ocscp_nrfproxy_queue_alert{severity="CRITICAL"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9012 |
Metric Used | ocscp_nrfproxy_queue_alert |
Recommended Action |
Cause: The scp-nrfproxy task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: This alert clears automatically when the queue utilization falls below the critical threshold. For any assistance, contact My Oracle Support. |
6.2.54 SCPWorkerQueuesUtilizationAboveMinorThreshold
Table 6-57 SCPWorkerQueuesUtilizationAboveMinorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Minor Threshold' |
Summary | 'SCP Worker Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when the task queues in the scp-worker service are utilized above 65% of their maximum size (the user-configured minor threshold value).
ocscp_worker_queue_alert{severity="MINOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9007 |
Metric Used | ocscp_worker_queue_alert |
Recommended Action |
Cause: The scp-worker task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: This alert clears automatically when the queue utilization falls below the minor threshold or rises above the major threshold. For any assistance, contact My Oracle Support. |
6.2.55 SCPWorkerQueuesUtilizationAboveMajorThreshold
Table 6-58 SCPWorkerQueuesUtilizationAboveMajorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Major Threshold' |
Summary | 'SCP Worker Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when the task queues in the scp-worker service are utilized above 75% of their maximum size (the user-configured major threshold value).
ocscp_worker_queue_alert{severity="MAJOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9008 |
Metric Used | ocscp_worker_queue_alert |
Recommended Action |
Cause: The scp-worker task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: This alert clears automatically when the queue utilization falls below the major threshold or rises above the critical threshold. For any assistance, contact My Oracle Support. |
6.2.56 SCPWorkerQueuesUtilizationAboveCriticalThreshold
Table 6-59 SCPWorkerQueuesUtilizationAboveCriticalThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Worker Queues Utilization Above Critical Threshold' |
Summary | 'SCP Worker Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when the task queues in the scp-worker service are utilized above 85% of their maximum size (the user-configured critical threshold value).
ocscp_worker_queue_alert{severity="CRITICAL"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9009 |
Metric Used | ocscp_worker_queue_alert |
Recommended Action |
Cause: The scp-worker task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: This alert clears automatically when the queue utilization falls below the critical threshold. For any assistance, contact My Oracle Support. |
6.2.57 SCPCacheQueuesUtilizationAboveMinorThreshold
Table 6-60 SCPCacheQueuesUtilizationAboveMinorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Minor Threshold' |
Summary | 'SCP Cache Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when the task queues in the scp-cache service are utilized above 65% of their maximum size (the user-configured minor threshold value).
ocscp_cache_queue_alert{severity="MINOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.13002 |
Metric Used | ocscp_cache_queue_utilization |
Recommended Action |
Cause: When the cache task queues are getting filled, and traffic is higher than expected. Diagnostic Information:
Recovery: The alert is cleared automatically when the queue utilization falls below the minor threshold or rises above the major threshold. For any assistance, contact My Oracle Support. |
6.2.58 SCPCacheQueuesUtilizationAboveMajorThreshold
Table 6-61 SCPCacheQueuesUtilizationAboveMajorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Major Threshold' |
Summary | 'SCP Cache Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when the task queues in the
scp-cache service are utilized above 75% of their maximum size (the
user-configured major threshold value).
ocscp_cache_queue_alert{severity="MAJOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.13001 |
Metric Used | ocscp_cache_queue_utilization |
Recommended Action |
Cause: The cache task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: The alert is cleared automatically when traffic falls below the major threshold or rises above the critical threshold. For any assistance, contact My Oracle Support. |
6.2.59 SCPCacheQueuesUtilizationAboveCriticalThreshold
Table 6-62 SCPCacheQueuesUtilizationAboveCriticalThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Critical Threshold' |
Summary | 'SCP Cache Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when the task queues in the
scp-cache service are utilized above 85% of their maximum size (the
user-configured critical threshold value).
ocscp_cache_queue_alert{severity="CRITICAL"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.13000 |
Metric Used | ocscp_cache_queue_utilization |
Recommended Action |
Cause: The cache task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: The alert is cleared automatically when traffic falls below the critical threshold. For any assistance, contact My Oracle Support. |
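The three cache-queue conditions in Tables 6-60 to 6-62 differ only in the `severity` value they select. The following is a minimal sketch of how they might be written in the standard Prometheus alerting-rule format; it is not the content of the shipped SCPAlertrules.yaml file. The group name and the lowercase `severity` label values are assumptions, and only one annotation is shown.

```yaml
# Sketch only: the group name and lowercase severity values are assumptions.
# Labels set in a rule overwrite the series' own "severity" label on the alert.
groups:
  - name: scp-cache-queue-alerts
    rules:
      - alert: SCPCacheQueuesUtilizationAboveMinorThreshold
        expr: ocscp_cache_queue_alert{severity="MINOR"} == 1
        labels:
          severity: minor
        annotations:
          description: 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Cache Queues Utilization Above Minor Threshold'
      - alert: SCPCacheQueuesUtilizationAboveMajorThreshold
        expr: ocscp_cache_queue_alert{severity="MAJOR"} == 1
        labels:
          severity: major
      - alert: SCPCacheQueuesUtilizationAboveCriticalThreshold
        expr: ocscp_cache_queue_alert{severity="CRITICAL"} == 1
        labels:
          severity: critical
```

Keeping one rule per severity, rather than a single rule on all severities, lets each alert carry its own name and SNMP OID.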
6.2.60 SCPLoadManagerQueuesUtilizationAboveMinorThreshold
Table 6-63 SCPLoadManagerQueuesUtilizationAboveMinorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Minor Threshold' |
Summary | 'SCP Load Manager Queues Utilization Above Minor Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when the task queues in the
scp-load-manager service are utilized above 65% of their maximum size
(the user-configured minor threshold value).
ocscp_load_manager_queue_alert{severity="MINOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.11002 |
Metric Used | ocscp_load_manager_queue_utilization |
Recommended Action |
Cause: The load manager task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: The alert is cleared automatically when traffic falls below the minor threshold. For any assistance, contact My Oracle Support. |
6.2.61 SCPLoadManagerQueuesUtilizationAboveMajorThreshold
Table 6-64 SCPLoadManagerQueuesUtilizationAboveMajorThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Major Threshold' |
Summary | 'SCP Load Manager Queues Utilization Above Major Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when the task queues in the
scp-load-manager service are utilized above 75% of their maximum size
(the user-configured major threshold value).
ocscp_load_manager_queue_alert{severity="MAJOR"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.11001 |
Metric Used | ocscp_load_manager_queue_utilization |
Recommended Action |
Cause: The load manager task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: The alert is cleared automatically when traffic falls below the major threshold. For any assistance, contact My Oracle Support. |
6.2.62 SCPLoadManagerQueuesUtilizationAboveCriticalThreshold
Table 6-65 SCPLoadManagerQueuesUtilizationAboveCriticalThreshold
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: SCP Load Manager Queues Utilization Above Critical Threshold' |
Summary | 'SCP Load Manager Queues Utilization Above Critical Threshold, instancename: {{$labels.instance}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when the task queues in the
scp-load-manager service are utilized above 85% of their maximum size
(the user-configured critical threshold value).
ocscp_load_manager_queue_alert{severity="CRITICAL"} == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.11000 |
Metric Used | ocscp_load_manager_queue_utilization |
Recommended Action |
Cause: The load manager task queues are filling up because traffic is higher than expected. Diagnostic Information:
Recovery: The alert is cleared automatically when traffic falls below the critical threshold. For any assistance, contact My Oracle Support. |
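The alert expressions can be narrowed to a single namespace by adding a label matcher. The following sketch applies this to the critical load manager rule, assuming the series carry the `kubernetes_namespace` label used in the description templates above and that SCP runs in the default `scpsvc` namespace. The group name is hypothetical.

```yaml
# Sketch only: the group name is hypothetical, and the namespace filter
# (kubernetes_namespace="scpsvc") is an example to adapt to your deployment.
groups:
  - name: scp-load-manager-queue-alerts
    rules:
      - alert: SCPLoadManagerQueuesUtilizationAboveCriticalThreshold
        # Only series from the scpsvc namespace can trigger this alert.
        expr: ocscp_load_manager_queue_alert{severity="CRITICAL", kubernetes_namespace="scpsvc"} == 1
        labels:
          severity: critical
```

The same matcher can be added to the minor and major load manager rules in the same way.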
6.2.63 SCPProducerNfSetUnhealthy
Table 6-66 SCPProducerNfSetUnhealthy
Field | Details |
---|---|
Description | All producer NFs in NF set are marked unhealthy |
Summary | 'All producer NFs in NF set are marked unhealthy. nfSet: {{$labels.ocscp_nf_setid}}, scpFqdn: {{$labels.scp_fqdn}} , namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Info |
Condition |
This alert is raised when all producer NFs in an NF Set are marked unhealthy. ocscp_metric_nf_set_unhealthy > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7027 |
Metric Used | ocscp_metric_nf_set_unhealthy |
Recommended Action |
Cause: All the producer NFs are marked unhealthy because of consecutive failure responses. Diagnostic Information:
Recovery: This alert is automatically cleared after the degradation time is over. Degradation time = Number of consecutive degradations multiplied by the configured base ejection time. For any assistance, contact My Oracle Support. |
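The recovery rule above, which also applies to SCPPeerSeppUnhealthy, can be written as a formula. The 30 s base ejection time and the 3 consecutive degradations in the example are hypothetical values used only to illustrate the arithmetic.

```latex
T_{\text{degradation}} = N_{\text{consecutive degradations}} \times T_{\text{base ejection}},
\qquad \text{e.g. } 3 \times 30\ \text{s} = 90\ \text{s}
```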
6.2.64 SCPPeerSeppUnhealthy
Table 6-67 SCPPeerSeppUnhealthy
Field | Details |
---|---|
Description | Peer Sepp is marked unhealthy |
Summary | 'Peer Sepp is marked unhealthy. seppFqdn: {{$labels.ocscp_sepp_fqdn}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Info |
Condition |
This alert is raised when peer SEPP is marked unhealthy. ocscp_sepp_unhealthy > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7028 |
Metric Used | ocscp_sepp_unhealthy |
Recommended Action |
Cause: The peer SEPP is marked unhealthy because of consecutive failure responses. Diagnostic Information:
Recovery: This alert is automatically cleared after the degradation time is over. Degradation time = Number of consecutive degradations multiplied by the configured base ejection time. For any assistance, contact My Oracle Support. |
6.2.65 SCPMicroServiceUnreachable
Table 6-68 SCPMicroServiceUnreachable
Field | Details |
---|---|
Description | 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}: SCP communication between the micro-services indicated by source and destination has failed' |
Summary | 'SCP communication between the micro-services indicated by source and destination has failed: {{$labels.instance}}, namespace: {{$labels.namespace}}, source:{{$labels.source}}, destination: {{$labels.destination}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when the communication between SCP
microservices indicated by source and destination has failed.
ocscp_metric_svc_unreachable==1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7029 |
Metric Used | ocscp_metric_svc_unreachable |
Recommended Action |
Cause: Communication between SCP microservices has failed. Diagnostic Information: Verify whether endpoints of all the services are in Running and Ready state. If not, restart the services. Recovery: This alert clears automatically when the required services are in Running and Ready state. For any assistance, contact My Oracle Support. |
6.2.66 SCPTrafficFeedSendFailed
Table 6-69 SCPTrafficFeedSendFailed
Field | Details |
---|---|
Description | 'Sending messages to Traffic Feed failed. Cause : {{$labels.ocscp_cause}}' |
Summary | 'Sending messages to Traffic Feed failed, cause: {{$labels.ocscp_cause}}, scp_fqdn: {{$labels.scp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when sending messages to
traffic feed fails.
increase(ocscp_metric_trafficfeed_attempted_total{app_kubernetes_io_name="scp-worker"}[1h]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9003 |
Metric Used | ocscp_metric_trafficfeed_attempted_total |
Recommended Action |
Cause: Sending messages to the traffic feed failed. Diagnostic Information:
Recovery: This alert clears automatically after 24 hours if sending messages to the traffic feed stops failing. For any assistance, contact My Oracle Support. |
6.2.67 SCPTrafficFeedKafkaClusterUnhealthy
Table 6-70 SCPTrafficFeedKafkaClusterUnhealthy
Field | Details |
---|---|
Description | 'Kafka cluster is marked unhealthy, Cause : {{$labels.ocscp_cause}}' |
Summary | 'Kafka cluster is marked unhealthy, cause: {{$labels.ocscp_cause}}, scp_fqdn: {{$labels.ocscp_fqdn}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when the Kafka cluster is
unhealthy.
ocscp_metric_trafficfeed_cluster_unhealthy == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9026 |
Metric Used | ocscp_metric_trafficfeed_cluster_unhealthy |
Recommended Action |
Cause: The Kafka cluster is unhealthy. Diagnostic Information:
Recovery: This alert clears when the Kafka cluster recovers from the failure condition. For any assistance, contact My Oracle Support. |
6.2.68 SCPTrafficFeedPartitionUnhealthy
Table 6-71 SCPTrafficFeedPartitionUnhealthy
Field | Details |
---|---|
Description | 'Kafka partition {{$labels.kafka_partition_id}} is marked unhealthy, Cause : {{$labels.ocscp_cause}}' |
Summary | 'Kafka cluster is marked unhealthy, cause: {{$labels.ocscp_cause}}, partition_id: {{$labels.kafka_partition_id}}, topic: {{$labels.topic}}, scp_fqdn: {{$labels.ocscp_fqdn}}, namespace: {{$labels.namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when the Kafka partition is
unhealthy.
ocscp_metric_trafficfeed_partition_unhealthy == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9025 |
Metric Used | ocscp_metric_trafficfeed_partition_unhealthy |
Recommended Action |
Cause: The Kafka partition is unhealthy. Diagnostic Information:
Recovery: This alert clears when the Kafka partition recovers from the failure condition. For any assistance, contact My Oracle Support. |
6.2.69 SCPServiceMeshFailure
Table 6-72 SCPServiceMeshFailure
Field | Details |
---|---|
Description | 'SCP servicemesh failure encountered' |
Summary | 'SCP servicemesh failure encountered for nfservicetype: {{$labels.ocscp_nf_service_type}}, nftype: {{$labels.ocscp_nf_type}}, nfinstanceid: {{$labels.ocscp_nf_instance_id}}, serviceinstanceid: {{$labels.ocscp_service_instance_id}}, producerfqdn: {{$labels.ocscp_producer_fqdn}}, responsecode: {{$labels.ocscp_response_code}} serverheader:{{$labels.ocscp_server_header}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Info |
Condition |
This alert is raised when service mesh failure occurs. increase(ocscp_metric_sidecarproxy_failures_total[2m]) > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.7030 |
Metric Used | ocscp_metric_sidecarproxy_failures_total |
Recommended Action |
Cause: Service mesh failure observed at SCP. Diagnostic Information:
Recovery: This alert clears automatically after 2 minutes if there is no service mesh failure observed by SCP with the same dimensions. For any assistance, contact My Oracle Support. |
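The 2-minute recovery follows from the `increase(...[2m])` window in the condition. A minimal sketch of the rule in Prometheus alerting-rule format, with a hypothetical group name, shows why the alert is tracked per set of dimensions:

```yaml
# Sketch only: the group name is hypothetical.
groups:
  - name: scp-service-mesh-alerts
    rules:
      - alert: SCPServiceMeshFailure
        # True while the failure counter of a series grew within the last 2 minutes.
        # Each label combination (NF type, producer FQDN, response code, ...) is a
        # separate series and therefore a separate alert instance.
        expr: increase(ocscp_metric_sidecarproxy_failures_total[2m]) > 0
        labels:
          severity: info
        annotations:
          description: 'SCP servicemesh failure encountered'
```

Once no new failures are recorded for a given label combination, that series drops out of the 2-minute window and its alert instance resolves.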
6.2.70 SCPHealthCheckFailedForPeerSCP
Table 6-73 SCPHealthCheckFailedForPeerSCP
Field | Details |
---|---|
Description | 'SCP HealthCheck failed for peer SCP' |
Summary | 'SCP HealthCheck failed for peer SCP. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Info |
Condition | This alert is raised when a peer SCP (inter-SCP) becomes
unhealthy based on health check status and outlier
detection.
ocscp_interscp_health_check_status_failed == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9023 |
Metric Used | ocscp_interscp_health_check_status_failed |
Recommended Action |
Cause: The peer SCP is unable to receive SBI message requests because it is unhealthy based on health check and outlier detection. Diagnostic Information:
Recovery: This alert clears automatically if SCP-C decides SCP-P is healthy or available based on the current and previous status of outlier detection and health check. For any assistance, contact My Oracle Support. |
6.2.71 SCPHealthCheckFailed
Table 6-74 SCPHealthCheckFailed
Field | Details |
---|---|
Description | 'SCP HealthCheck failed' |
Summary | 'SCP HealthCheck failed. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Info |
Condition | This alert is raised when SCP is unhealthy because
the overall average load of SCP is greater than the configured
threshold.
ocscp_health_check_status_failed == 1 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9024 |
Metric Used | ocscp_health_check_status_failed |
Recommended Action |
Cause: SCP is unable to receive SBI message requests because its overall average load exceeds the configured threshold. Diagnostic Information: Check whether the overall average load of SCP is greater than the configured threshold value. Recovery: This alert clears automatically when the overall average load of SCP is less than the configured threshold value. For any assistance, contact My Oracle Support. |
6.2.72 ScpWorkerPodPendingTransUtilizationAboveMinorThreshold
Table 6-75 ScpWorkerPodPendingTransUtilizationAboveMinorThreshold
Field | Details |
---|---|
Description | Worker Pending Transaction lead to minor level |
Summary | 'Worker Pending Transaction lead to minor level.namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when pending transaction
utilization of SCP-Worker reaches MINOR level.
ocscp_worker_pod_overload_control_pendingTrans_utilization_minor > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9014 |
Metric Used | ocscp_worker_pod_overload_control_pendingTrans_utilization_minor |
Recommended Action |
Cause: The pending transaction utilization of SCP-Worker has reached the MINOR level. Diagnostic Information:
Recovery: This alert clears automatically when pending transaction utilization is below MINOR threshold level. For any assistance, contact My Oracle Support. |
6.2.73 ScpWorkerPodPendingTransUtilizationAboveMajorThreshold
Table 6-76 ScpWorkerPodPendingTransUtilizationAboveMajorThreshold
Field | Details |
---|---|
Description | Worker Pending Transaction lead to major level |
Summary | 'Worker Pending Transaction lead to major level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when pending transaction
utilization of SCP-Worker reaches MAJOR level.
ocscp_worker_pod_overload_control_pendingTrans_utilization_major > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9015 |
Metric Used | ocscp_worker_pod_overload_control_pendingTrans_utilization_major |
Recommended Action |
Cause: The pending transaction utilization of SCP-Worker has reached the MAJOR level. Diagnostic Information:
Recovery: This alert clears automatically when pending transaction utilization is below MAJOR threshold level. For any assistance, contact My Oracle Support. |
6.2.74 ScpWorkerPodPendingTransUtilizationAboveCriticalThreshold
Table 6-77 ScpWorkerPodPendingTransUtilizationAboveCriticalThreshold
Field | Details |
---|---|
Description | Worker Pending Transaction lead to critical level |
Summary | 'Worker Pending Transaction lead to critical level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when pending transaction
utilization of SCP-Worker reaches CRITICAL level.
ocscp_worker_pod_overload_control_pendingTrans_utilization_critical > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9016 |
Metric Used | ocscp_worker_pod_overload_control_pendingTrans_utilization_critical |
Recommended Action |
Cause: The pending transaction utilization of SCP-Worker has reached the CRITICAL level. Diagnostic Information:
Recovery: This alert clears automatically when pending transaction utilization is below CRITICAL threshold level. For any assistance, contact My Oracle Support. |
6.2.75 ScpWorkerPodPendingTransUtilizationAboveWarnThreshold
Table 6-78 ScpWorkerPodPendingTransUtilizationAboveWarnThreshold
Field | Details |
---|---|
Description | Worker Pending Transaction lead to Warn level |
Summary | 'Worker Pending Transaction lead to warn level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Warn |
Condition | This alert is raised when pending transaction
utilization of SCP-Worker reaches WARN level.
ocscp_worker_pod_overload_control_pendingTrans_utilization_warn > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9017 |
Metric Used | ocscp_worker_pod_overload_control_pendingTrans_utilization_warn |
Recommended Action |
Cause: The pending transaction utilization of SCP-Worker has reached the WARN level. Diagnostic Information:
Recovery: This alert clears automatically when pending transaction utilization is below WARN threshold level. For any assistance, contact My Oracle Support. |
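Because the WARN, MINOR, MAJOR, and CRITICAL pending-transaction alerts can fire at the same time for one pod, an Alertmanager inhibition rule is one way to forward only the highest level. The following fragment is a suggestion for the Alertmanager configuration, not part of the SCP package. It assumes Alertmanager 0.22 or later (which supports `source_matchers` and `target_matchers`) and relies on the `namespace` and `pod` labels used in the summaries above.

```yaml
# Alertmanager configuration fragment (alertmanager.yml), not SCPAlertrules.yaml.
# While the CRITICAL alert fires for a pod, the lower levels for the same
# namespace and pod are suppressed.
inhibit_rules:
  - source_matchers:
      - 'alertname = "ScpWorkerPodPendingTransUtilizationAboveCriticalThreshold"'
    target_matchers:
      - 'alertname =~ "ScpWorkerPodPendingTransUtilizationAbove(Warn|Minor|Major)Threshold"'
    equal: ['namespace', 'pod']
```

A similar rule can be written for the ScpWorkerPodResourceUtilization alerts by changing the alert names.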
6.2.76 ScpWorkerPodResourceUtilizationAboveMinorThreshold
Table 6-79 ScpWorkerPodResourceUtilizationAboveMinorThreshold
Field | Details |
---|---|
Description | Worker overload control lead to minor level |
Summary | 'Worker overload control lead to minor level.namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Minor |
Condition | This alert is raised when overload control resource
utilization of SCP-Worker reaches MINOR level.
ocscp_worker_pod_overload_control_resource_utilization_minor > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9018 |
Metric Used | ocscp_worker_pod_overload_control_resource_utilization_minor |
Recommended Action |
Cause: The overload control resource utilization of SCP-Worker has reached the MINOR level. Diagnostic Information:
Recovery: This alert clears automatically when overload control resource utilization is below MINOR threshold level. For any assistance, contact My Oracle Support. |
6.2.77 ScpWorkerPodResourceUtilizationAboveMajorThreshold
Table 6-80 ScpWorkerPodResourceUtilizationAboveMajorThreshold
Field | Details |
---|---|
Description | Worker overload control lead to major level |
Summary | 'Worker overload control lead to major level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Major |
Condition | This alert is raised when overload control resource
utilization of SCP-Worker reaches MAJOR level.
ocscp_worker_pod_overload_control_resource_utilization_major > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9019 |
Metric Used | ocscp_worker_pod_overload_control_resource_utilization_major |
Recommended Action |
Cause: The overload control resource utilization of SCP-Worker has reached the MAJOR level. Diagnostic Information:
Recovery: This alert clears automatically when overload control resource utilization is below MAJOR threshold level. For any assistance, contact My Oracle Support. |
6.2.78 ScpWorkerPodResourceUtilizationAboveWarnThreshold
Table 6-81 ScpWorkerPodResourceUtilizationAboveWarnThreshold
Field | Details |
---|---|
Description | 'Worker overload control lead to Warn level' |
Summary | 'Worker overload control lead to warn level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Warning |
Condition | This alert is raised when overload control resource
utilization of SCP-Worker reaches WARN level.
ocscp_worker_pod_overload_control_resource_utilization_warn > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9021 |
Metric Used | ocscp_worker_pod_overload_control_resource_utilization_warn |
Recommended Action |
Cause: The overload control resource utilization of SCP-Worker has reached the WARN level. Diagnostic Information:
Recovery: This alert clears automatically when overload control resource utilization is below WARN threshold level. For any assistance, contact My Oracle Support. |
6.2.79 ScpWorkerPodResourceUtilizationAboveCriticalThreshold
Table 6-82 ScpWorkerPodResourceUtilizationAboveCriticalThreshold
Field | Details |
---|---|
Description | Worker overload control lead to critical level |
Summary | 'Worker overload control lead to critical level. namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}' |
Severity | Critical |
Condition | This alert is raised when overload control resource
utilization of SCP-Worker reaches CRITICAL level.
ocscp_worker_pod_overload_control_resource_utilization_critical > 0 |
OID | 1.3.6.1.4.1.323.5.3.35.1.2.9020 |
Metric Used | ocscp_worker_pod_overload_control_resource_utilization_critical |
Recommended Action |
Cause: The overload control resource utilization of SCP-Worker has reached the CRITICAL level. Diagnostic Information:
Recovery: This alert clears automatically when overload control resource utilization is below CRITICAL threshold level. For any assistance, contact My Oracle Support. |
6.2.80 SCPDNSSRVNRFMigrationTaskFailure
Table 6-83 SCPDNSSRVNRFMigrationTaskFailure
Field | Description |
---|---|
Severity | Critical |
Condition | ocscp_configuration_dnssrv_nrf_migration_task_failure == 1 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.15001 |
Description | An alert is raised to notify that migration from static to DNS has failed. |
Recommended Actions |
Cause:
Diagnostic Information: Verify that all the DNS SRV configurations are correct and that all SCP pods are up and running in the expected state. Recovery:
For any assistance, contact My Oracle Support.
|
6.2.81 SCPDNSSRVNRFNonMigrationTaskFailure
Table 6-84 SCPDNSSRVNRFNonMigrationTaskFailure
Field | Description |
---|---|
Severity | Critical |
Condition | ocscp_configuration_dnssrv_nrf_non_migration_task_failure == 1 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.15003 |
Description | An alert is raised to notify that a non-migration task has failed. |
Recommended Actions |
Cause:
Diagnostic Information: Verify that all the DNS SRV configurations are correct and that all SCP pods are up and running in the expected state. Recovery:
For any assistance, contact My Oracle Support.
|
6.2.82 SCPDNSSRVNRFDuplicateTargetDetected
Table 6-85 SCPDNSSRVNRFDuplicateTargetDetected
Field | Description |
---|---|
Severity | Critical |
Condition | ocscp_configuration_dnssrv_nrf_duplicate_target_detected == 1 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.15002 |
Description | An alert is raised to notify that a duplicate target NRF has been detected in the DNS SRV records. |
Recommended Actions |
Cause: This alert is raised when a duplicate target FQDN is received from DNS SRV for different NRF SRV FQDNs. In this case, the scpc-configuration service processes the first NRF SRV FQDN data received from the scpc-alternate-resolution service, ignores the subsequent NRF SRV FQDN data, and raises this alert. Diagnostic Information: Verify that all the DNS SRV configurations are correct and that all SCP pods are up and running in the expected state. Recovery:
For any assistance, contact My Oracle Support.
|
6.2.83 SCPHighResponseTimeFromProducer
Table 6-86 SCPHighResponseTimeFromProducer
Field | Description |
---|---|
Severity | Info |
Condition | (sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time="15000ms"}[2m])) by (kubernetes_namespace) + sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time=">15000ms"}[2m])) by (kubernetes_namespace)) > 200 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.15004 |
Description | Notifies when more than 200 messages per second receive a response from the producer with a delay of more than 10 seconds. |
Recommended Actions |
Cause: More than 200 messages per second have an upstream response time above 10 seconds. Diagnostic Information: Monitor the metric ocscp_metric_upstream_service_time_total with ocscp_upstream_service_time="15000ms" and ocscp_upstream_service_time=">15000ms". Recovery: The alert is cleared automatically when the number of responses with a response delay of more than 10 seconds falls below 200 messages per second. If the alert does not clear, check whether any producer NFs or specific service request types take more than 10 seconds to respond, and take corrective action if needed. Immediate action may not be needed because this alert is informational. However, too many requests with a long response delay may degrade SCP performance. For any assistance, contact My Oracle Support. |
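The condition for SCPHighResponseTimeFromProducer is long on one line. The following sketch places the same expression in a Prometheus rule using a YAML block scalar so that the two bucket terms are easier to read; the group name is hypothetical.

```yaml
# Sketch only: the group name is hypothetical; the expression is the one in Table 6-86.
groups:
  - name: scp-upstream-latency-alerts
    rules:
      - alert: SCPHighResponseTimeFromProducer
        # Per-namespace rate of responses in the "15000ms" and ">15000ms" buckets,
        # added together and compared against 200 per second.
        expr: |
          (
            sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time="15000ms"}[2m])) by (kubernetes_namespace)
            +
            sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time=">15000ms"}[2m])) by (kubernetes_namespace)
          ) > 200
        labels:
          severity: info
```

In PromQL, `+` between two vectors only returns label sets present on both sides, so a namespace with no samples in one of the buckets produces no result. A single selector such as `sum(rate(ocscp_metric_upstream_service_time_total{ocscp_upstream_service_time=~"15000ms|>15000ms"}[2m])) by (kubernetes_namespace) > 200` sums both buckets without that gap; consider this a possible variant rather than the shipped rule.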
6.2.84 SCPCGroupVersionDetectionFailed
Table 6-87 SCPCGroupVersionDetectionFailed
Field | Description |
---|---|
Severity | Critical |
Condition | ocscp_worker_cgroup_version_detection_failed == 1 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.15005 |
Description | Notify that cgroup version detection has failed. |
Recommended Actions |
Cause: SCP is unable to detect the cgroup version from the underlying kernel by using the command "stat -fc %T /sys/fs/cgroup/". The expected value is either tmpfs (cgroup v1) or cgroup2fs (cgroup v2). Diagnostic Information:
Recovery:
For any assistance, contact My Oracle Support.
|
6.2.85 SCPCPUUsageFileReadFailed
Table 6-88 SCPCPUUsageFileReadFailed
Field | Description |
---|---|
Severity | Critical |
Condition | ocscp_worker_cpu_usage_file_read_failed == 1 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.15006 |
Description | Notify that the CPU usage file read operation failed within the detected cgroup version. |
Recommended Actions |
Cause: SCP encountered a failure in performing a read operation for the CPU usage file within the detected cgroup version. The file path is determined based on the detected cgroup version. Diagnostic Information:
Recovery:
For any assistance, contact My Oracle Support.
|
6.2.86 SCPIgnoreUnknownService
Table 6-89 SCPIgnoreUnknownService
Field | Description |
---|---|
Severity | Info |
Condition | increase(ocscp_ignore_unknown_service_total[24h]) > 0 |
OID used for SNMP Traps | 1.3.6.1.4.1.323.5.3.35.1.2.15000 |
Description | An alert is raised to notify that SCP ignored an unknown service in the NF profile. |
Recommended Actions |
Cause: SCP has received an NF profile with an unknown service and processed the profile by ignoring this unknown service. Diagnostic Information: Check the received NF profile for the unknown services. Recovery: If the unknown services are no longer present in the NF profile in the next scraping interval, the alert is cleared. For any assistance, contact My Oracle Support. |
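The look-back window in the SCPIgnoreUnknownService condition affects how long the expression stays true. The following sketch writes the Table 6-89 condition as a Prometheus rule with a hypothetical group name. With a 24h range, `increase()` remains above zero while any increment of the counter is still inside the window, so compare the clearing behavior you observe against the range in your copy of the rule file.

```yaml
# Sketch only: the group name is hypothetical.
groups:
  - name: scp-nf-profile-alerts
    rules:
      - alert: SCPIgnoreUnknownService
        # Stays true while any increment of the counter falls within the last 24 hours.
        expr: increase(ocscp_ignore_unknown_service_total[24h]) > 0
        labels:
          severity: info
```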
6.3 Configuring Alerts
6.3.1 Applying Alerts Rule to CNE without Prometheus Operator
SCP Helm Chart Release Name: _NAME_
Prometheus NameSpace: _Namespace_
- Run the following command to check the name of the config map used by Prometheus:
$ kubectl get configmap -n <_Namespace_>
Example:
$ kubectl get configmap -n prometheus-alert2
NAME                                  DATA   AGE
lisa-prometheus-alert2-alertmanager   1      146d
lisa-prometheus-alert2-server         4      146d
- Take a backup of the current config map of Prometheus. This command saves the config map in the provided file. In the following command, the config map is stored in the /tmp/tempConfig.yaml file:
$ kubectl get configmaps <_NAME_>-server -o yaml -n <_Namespace_> > /tmp/tempConfig.yaml
Example:
$ kubectl get configmaps lisa-prometheus-alert2-server -o yaml -n prometheus-alert2 > /tmp/tempConfig.yaml
- If the "alertsscp" rule is already configured in the Prometheus config map, delete it. The following command removes the "alertsscp" rule entry. This step is optional when you configure the alerts for the first time.
$ sed -i '/etc\/config\/alertsscp/d' /tmp/tempConfig.yaml
- Add the "alertsscp" rule to the config map dump file under the rule_files tag:
$ sed -i '/rule_files:/a\ \- /etc/config/alertsscp' /tmp/tempConfig.yaml
- Update the config map by using the following command. Use the same config map name that you used to back up the Prometheus config map:
$ kubectl replace configmap <_NAME_>-server -f /tmp/tempConfig.yaml
Example:
$ kubectl replace configmap lisa-prometheus-alert2-server -f /tmp/tempConfig.yaml
- Run the following command to patch the config map with the new "alertsscp" rule:
Note:
The patch file, SCPAlertrules.yaml, is provided in the ocscp_csar_23_2_0_0_0.zip folder delivered with SCP.
$ kubectl patch configmap <_NAME_>-server -n <_Namespace_> --type merge --patch "$(cat ~/SCPAlertrules.yaml)"
Example:
$ kubectl patch configmap lisa-prometheus-alert2-server -n prometheus-alert2 --type merge --patch "$(cat ~/SCPAlertrules.yaml)"
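After these steps, the Prometheus server config map should both list the rule file and contain the rules. The following abbreviated sketch shows the expected shape, using the example names from the steps. It assumes that SCPAlertrules.yaml adds the rules under a data key named `alertsscp`, which a config map volume mounted at /etc/config exposes as the file /etc/config/alertsscp referenced by the sed commands; the `prometheus.yml` key name and the placeholder rule content are also assumptions.

```yaml
# Abbreviated sketch; other keys and Prometheus settings are omitted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: lisa-prometheus-alert2-server
  namespace: prometheus-alert2
data:
  prometheus.yml: |
    rule_files:
      - /etc/config/alertsscp
  alertsscp: |
    # SCP alert rule groups from SCPAlertrules.yaml go here
    groups: []
```

Checking for both entries with `kubectl get configmap <_NAME_>-server -n <_Namespace_> -o yaml` is a quick way to confirm that the sed and patch steps took effect.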
Note:
Prometheus takes about 20 seconds to apply the updated config map.
6.3.2 Applying Alerts Rule to CNE with Prometheus Operator
6.3.3 Configuring Service Communication Proxy Alert using the SCPAlertrules.yaml file
Note:
The default namespace for Service Communication Proxy is scpsvc. You can update the namespace as required by the deployment.
To access the scpAlertsrules_<scp release number>.yaml file from the Scripts folder of ocscp_csar_23_2_0_0_0.zip, download the SCP package from My Oracle Support as described in "Downloading the SCP Package" in Oracle Communications Cloud Native Core, Service Communication Proxy Installation, Upgrade, and Fault Recovery Guide.
Alerts Details
Description and summary for alerts are added by the Prometheus alert manager.
- SCPIngress Traffic Rate Above Threshold
- Has three threshold levels: Minor (1400 to 1600 mps), Major (1600 to 1800 mps), and Critical (above 1800 mps). These values are configurable.
- Description: information is presented similar to: "Ingress Traffic Rate at Locality: <Locality of scp> is above <threshold level (minor/major/critical)> threshold (i.e. <value of threshold>)"
- Summary: "Namespace: <Namespace of scp deployment at that Locality>, Pod: <SCP-worker Pod name>: Current Ingress Traffic Rate is <Current rate of Ingress traffic> mps which is above 70 Percent of Max MPS(<upper limit of ingress traffic rate per pod>)"
Note:
The ingress traffic rate is measured per scp-worker pod in a namespace at a particular SCP locality. Currently, the upper limit is 2000 mps per scp-worker pod.
- SCP Routing Failed For Service
- Indicates the NF Service Type and NF Type for which routing failed at a particular locality.
- Description: "Routing failed for service"
- Summary: "Routing failed for service: NFService Type = <Message NF Service Type>, NFType = <Message NF Type>, Locality = <SCP Locality where Routing Failed> and value = <Accumulated failure till now, of such message for NFType and NFService Type>"
Note:
The value field does not currently provide the number of failures in a particular time interval; instead, it provides the total number of routing failures.
- SCP Pod Memory Usage: The alert type is SCPWorkerPodMemoryUsage.
- Provides the memory usage of SCP pods (Soothsayer and Worker) deployed at a particular node instance.
- The Soothsayer pod threshold is 8 GB.
- The Worker pod threshold is 4 GB.
- Description: "Instance: <Node Instance name>, NameSpace: <Namespace of SCP deployment>, Pod: <(Soothsayer/Worker) Pod name>: <Soothsayer/Worker> Pod High Memory usage detected"
- Summary: "Instance: <Node Instance name>, Namespace: <Namespace of SCP deployment>, Pod: <(Soothsayer/Worker) Pod name>: Memory usage is above <threshold value>G (current value is: <current value of memory usage>)"
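The helpers below are a hypothetical shell sketch of how each alert condition in this list evaluates; the function names are illustrative and are not part of SCP, and the authoritative expressions remain those in the SCP alert rules file. The sketch assumes that each ingress severity starts where the previous one ends (minor above 1400 mps, major above 1600 mps, critical above 1800 mps per scp-worker pod), that the routing-failure value behaves like a cumulative counter that can restart from 0, and that "GB" in the memory thresholds means 1024^3 bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the alert logic described above.
# Illustrations only; the real conditions live in the Prometheus alert rules.

# --- SCPIngress Traffic Rate Above Threshold ----------------------------
# Assumed bands per scp-worker pod: minor > 1400, major > 1600,
# critical > 1800 mps (70/80/90 percent of the 2000 mps limit).
ingress_severity() {
  local rate=$1
  if (( rate > 1800 )); then echo critical
  elif (( rate > 1600 )); then echo major
  elif (( rate > 1400 )); then echo minor
  else echo none
  fi
}

# --- SCP Routing Failed For Service -------------------------------------
# The alert value is a running total. Failures in an interval are the
# difference between two samples; if the total dropped (for example, the
# counter restarted from 0), the current value is taken as the increase.
failures_in_interval() {
  local previous=$1 current=$2
  if (( current >= previous )); then echo $(( current - previous ))
  else echo "$current"
  fi
}

# --- SCPWorkerPodMemoryUsage --------------------------------------------
# Thresholds from the list above; 1 GB is assumed to mean 1024^3 bytes.
GIB=$(( 1024 * 1024 * 1024 ))
memory_threshold_bytes() {
  case $1 in
    soothsayer) echo $(( 8 * GIB )) ;;
    worker)     echo $(( 4 * GIB )) ;;
    *)          return 1 ;;
  esac
}

# Prints "high" or "ok"; returns 2 for an unknown pod type.
memory_state() {
  local pod_type=$1 usage=$2 limit
  limit=$(memory_threshold_bytes "$pod_type") || return 2
  if (( usage > limit )); then echo high; else echo ok; fi
}

ingress_severity 1700              # major
failures_in_interval 120 150       # 30
memory_state worker 5000000000     # high
```

In Prometheus itself, the per-interval view of a cumulative counter is usually obtained with increase() over a range, which also compensates for counter resets; the helper above only illustrates that idea for two samples.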
6.3.4 Configuring Alert Manager for SNMP Notifier
Grouping of alerts is based on:
- podname
- alertname
- severity
- namespace
- nfServiceType
- nfServiceInstanceId
- Take a backup of the current config map of Alertmanager by
running the following command:
kubectl get configmaps <NAME-alertmanager> -oyaml -n <Namespace> > /tmp/bkupAlertManagerConfig.yaml
Example:
kubectl get configmaps occne-prometheus-alertmanager -oyaml -n occne-infra > /tmp/bkupAlertManagerConfig.yaml
- Edit the configmap to add a subroute for the SCP trap OID:
kubectl edit configmaps <NAME-alertmanager> -n <Namespace>
Example:
kubectl edit configmaps occne-prometheus-alertmanager -n occne-infra
- Add the subroute under 'route' in
configmap:
routes:
- receiver: default-receiver
  group_interval: 1m
  group_wait: 10s
  repeat_interval: 9y
  group_by: [podname, alertname, severity, namespace, nfservicetype, nfserviceinstanceid, servingscope, nftype]
  match_re:
    oid: ^1.3.6.1.4.1.323.5.3.35.(.*)
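To check which traps this subroute catches, the match_re pattern can be tried against sample OIDs. The sketch below approximates Alertmanager's regex matching with grep -E; Alertmanager actually uses Go (RE2) syntax and anchors the whole label value, which makes no difference for this pattern because it already starts with ^ and ends in (.*). Note that the unescaped dots match any character, not only a literal dot. The helper name is_scp_oid is hypothetical, the first OID is the SCP alert OID 1.3.6.1.4.1.323.5.3.35.1.2.3001 listed earlier in this chapter, and the second is a made-up non-SCP value.

```shell
#!/usr/bin/env bash
# Approximate the Alertmanager match_re check on the "oid" label with grep -E.
SCP_OID_RE='^1.3.6.1.4.1.323.5.3.35.(.*)'

# Hypothetical helper: succeeds if the OID would be routed to this subroute.
is_scp_oid() {
  printf '%s\n' "$1" | grep -Eq "$SCP_OID_RE"
}

# First OID: an SCP alert OID. Second: a made-up OID outside the SCP subtree.
for oid in 1.3.6.1.4.1.323.5.3.35.1.2.3001 1.3.6.1.4.1.323.5.3.36.1.2.1; do
  if is_scp_oid "$oid"; then
    echo "$oid -> SCP subroute (default-receiver)"
  else
    echo "$oid -> not matched by this subroute"
  fi
done
```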
MIB Files for SCP
- ocscp_mib_tc_23.4.4.mib: The SCP top-level MIB file, which defines the objects and their data types.
- ocscp_mib_23.4.4.mib: This file imports the objects from the top-level MIB file; depending on the alert notification, these objects can be selected for display.
Note:
MIB files are packaged with ocscp_csar_23_2_0_0_0.zip. You can download the file from MOS as described in Oracle Communications Cloud Native Core, Service Communication Proxy Installation, Upgrade, and Fault Recovery Guide.