5 Alerts
This section provides information about alerts for Oracle Communications Cloud Native Core, Network Slice Selection Function (NSSF).
5.1 System Level Alerts
This section lists the system level alerts.
5.1.1 OcnssfNfStatusUnavailable
Table 5-1 OcnssfNfStatusUnavailable
| Field | Details |
|---|---|
| Description | 'OCNSSF services unavailable' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All OCNSSF services are unavailable.' |
| Severity | Critical |
| Condition | All the NSSF services are unavailable, either because the NSSF is being deployed or purged. The NSSF services considered are nssfselection, nssfsubscription, nssfavailability, nssfconfiguration, appinfo, ingressgateway, and egressgateway. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9001 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared automatically when the NSSF services start becoming available. Steps: |
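For illustration, a Prometheus alerting rule of the kind that raises this alert could be sketched as follows. This is a minimal sketch only: the service label selector, the evaluation window, and the group name are assumptions and may differ from the expression shipped in the NSSF alerts.yaml.

```yaml
groups:
- name: ocnssf-system-alerts
  rules:
  - alert: OcnssfNfStatusUnavailable
    # 'up' is scraped per target; the 'service' label selector below is an
    # assumption about how the NSSF microservices are labelled.
    expr: sum(up{service=~".*(nsselection|nssubscription|nsavailability|nssfconfiguration|appinfo|ingressgateway|egressgateway).*"}) == 0
    for: 1m
    labels:
      severity: critical
```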
5.1.2 OcnssfPodsRestart
Table 5-2 OcnssfPodsRestart
| Field | Details |
|---|---|
| Description | 'Pod <Pod Name> has restarted.' |
| Summary | 'kubernetes_namespace: {{$labels.namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted' |
| Severity | Major |
| Condition | A pod belonging to any of the NSSF services has restarted. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9002 |
| Metric Used | 'kube_pod_container_status_restarts_total' Note: This is a Kubernetes metric. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared automatically if the specific pod is up. Steps: |
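As an illustration only, a restart-detection rule over this metric might look like the following sketch (a rule entry under a rule group, as in the first sketch above); the namespace value and the five-minute window are assumptions.

```yaml
- alert: OcnssfPodsRestart
  # Fires when any container in the NSSF namespace has restarted within the
  # last five minutes (the namespace value and window are assumed).
  expr: increase(kube_pod_container_status_restarts_total{namespace="ocnssf"}[5m]) > 0
  labels:
    severity: major
```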
5.1.3 OcnssfSubscriptionServiceDown
Table 5-3 OcnssfSubscriptionServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF Subscription service <ocnssf-nssubscription> is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NssfSubscriptionServiceDown service down' |
| Severity | Critical |
| Condition | The NssfSubscription service is unavailable. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9003 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the NssfSubscription service is available. Steps: |
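The per-service *ServiceDown alerts in this section (Table 5-3 through Table 5-15) all follow the same pattern over the 'up' metric. A hedged sketch of the subscription variant is shown below; the service selector and the 'for' duration are assumptions, and the other service-down alerts differ only in the selector, severity, and OID.

```yaml
- alert: OcnssfSubscriptionServiceDown
  # absent(... == 1) fires when no nssubscription target reports up == 1;
  # the label selector is an assumption about the deployment's labelling.
  expr: absent(up{service=~".*nssubscription.*"} == 1)
  for: 2m
  labels:
    severity: critical
```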
5.1.4 OcnssfSelectionServiceDown
Table 5-4 OcnssfSelectionServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF Selection service <ocnssf-nsselection> is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : OcnssfSelectionServiceDown service down' |
| Severity | Critical |
| Condition | None of the pods of the NSSFSelection microservice is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9004 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the nsselection service is available. Steps: |
5.1.5 OcnssfAvailabilityServiceDown
Table 5-5 OcnssfAvailabilityServiceDown
| Field | Details |
|---|---|
| Description | 'Ocnssf Availability service ocnssf-nsavailability is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NssfAvailability service down' |
| Severity | Critical |
| Condition | None of the pods of the NsAvailability microservice is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9005 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the ocnssf-nsavailability service is available. Steps: |
5.1.6 OcnssfConfigurationServiceDown
Table 5-6 OcnssfConfigurationServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF Config service nssfconfiguration is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : OcnssfConfigServiceDown service down' |
| Severity | Critical |
| Condition | None of the pods of the NssfConfiguration microservice is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9006 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the nssfconfiguration service is available. Steps: |
5.1.7 OcnssfAppInfoServiceDown
Table 5-7 OcnssfAppInfoServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF Appinfo service appinfo is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Appinfo service down' |
| Severity | Critical |
| Condition | None of the pods of the App Info microservice is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9007 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the appinfo service is available. Steps: |
5.1.8 OcnssfIngressGatewayServiceDown
Table 5-8 OcnssfIngressGatewayServiceDown
| Field | Details |
|---|---|
| Description | 'Ocnssf Ingress-Gateway service ingressgateway is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : OcnssfIngressGwServiceDown service down' |
| Severity | Critical |
| Condition | None of the pods of the Ingress-Gateway microservice is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9008 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the ingressgateway service is available. Steps: |
5.1.9 OcnssfEgressGatewayServiceDown
Table 5-9 OcnssfEgressGatewayServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF Egress service egressgateway is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : OcnssfEgressGwServiceDown service down' |
| Severity | Critical |
| Condition | None of the pods of the Egress-Gateway microservice is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9009 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the egressgateway service is available. Note: The threshold is configurable in the alerts.yaml. Steps: |
5.1.10 OcnssfOcpmConfigServiceDown
Table 5-10 OcnssfOcpmConfigServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF OCPM Config service is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnssf OCPM Config service down' |
| Severity | Critical |
| Condition | None of the pods of the ConfigService is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9037 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the ConfigService is available. Note: The threshold is configurable in the alerts.yaml. Steps: |
5.1.11 OcnssfPerfInfoServiceDown
Table 5-11 OcnssfPerfInfoServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF PerfInfo service is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnssf PerfInfo service down' |
| Severity | Critical |
| Condition | None of the pods of the PerfInfo service is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9036 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the PerfInfo service is available. Note: The threshold is configurable in the alerts.yaml. Steps: |
5.1.12 OcnssfNrfClientManagementServiceDown
Table 5-12 OcnssfNrfClientManagementServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF NrfClient Management service is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnssf NrfClient Management service down' |
| Severity | Critical |
| Condition | None of the pods of the NrfClientManagement service is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9034 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the NrfClientManagement service is available. Note: The threshold is configurable in the alerts.yaml. Steps: |
5.1.13 OcnssfNrfClientDiscoveryServiceDown
Table 5-13 OcnssfNrfClientDiscoveryServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF NrfClient Discovery service is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnssf NrfClient Discovery service down' |
| Severity | Critical |
| Condition | None of the pods of the NrfClient Discovery service is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9033 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the NrfClient Discovery service is available. Note: The threshold is configurable in the alerts.yaml. Steps: |
5.1.14 OcnssfAlternateRouteServiceDown
Table 5-14 OcnssfAlternateRouteServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF Alternate Route service is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnssf Alternate Route service down' |
| Severity | Critical |
| Condition | None of the pods of the Alternate Route service is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9032 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the Alternate Route service is available. Note: The threshold is configurable in the alerts.yaml. Steps: |
5.1.15 OcnssfAuditorServiceDown
Table 5-15 OcnssfAuditorServiceDown
| Field | Details |
|---|---|
| Description | 'OCNSSF NsAuditor service is down' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnssf NsAuditor service down' |
| Severity | Critical |
| Condition | None of the pods of the NsAuditor service is available. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9031 |
| Metric Used | 'up' Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use the similar metric as exposed by the monitoring system. |
| Recommended Actions | The alert is cleared when the NsAuditor service is available. Note: The threshold is configurable in the alerts.yaml. Steps: |
5.1.16 OcnssfTotalIngressTrafficRateAboveMinorThreshold
Table 5-16 OcnssfTotalIngressTrafficRateAboveMinorThreshold
| Field | Details |
|---|---|
| Description | 'Ingress traffic Rate is above the configured minor threshold i.e. 800 requests per second (current value is: {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second(1000)' |
| Severity | Minor |
| Condition | The total OCNSSF ingress message rate has crossed the configured minor threshold of 800 TPS. By default, this alert is triggered when the OCNSSF ingress rate crosses 80% of 1000 (the maximum ingress request rate), as configured in NrfAlertValues.yaml. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9010 |
| Metric Used | 'oc_ingressgateway_http_requests_total' |
| Recommended Actions | The alert is cleared either when the total ingress traffic rate falls below the minor threshold or when it crosses the major threshold, in which case the OcnssfTotalIngressTrafficRateAboveMajorThreshold alert is raised. Note: The threshold is configurable in the alerts.yaml. Steps: Reassess why the NSSF is receiving additional traffic, for example, the mated site NSSF being unavailable in a georedundancy scenario. If this is unexpected, contact My Oracle Support. |
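A hedged sketch of how the three ingress-rate tiers can be expressed over oc_ingressgateway_http_requests_total is shown below; the two-minute rate window is an assumption, and the bounds follow the documented defaults of 800, 900, and 950 out of 1000 TPS.

```yaml
- alert: OcnssfTotalIngressTrafficRateAboveMinorThreshold
  expr: sum(rate(oc_ingressgateway_http_requests_total[2m])) >= 800 < 900
  labels:
    severity: minor
- alert: OcnssfTotalIngressTrafficRateAboveMajorThreshold
  expr: sum(rate(oc_ingressgateway_http_requests_total[2m])) >= 900 < 950
  labels:
    severity: major
- alert: OcnssfTotalIngressTrafficRateAboveCriticalThreshold
  expr: sum(rate(oc_ingressgateway_http_requests_total[2m])) >= 950
  labels:
    severity: critical
```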
5.1.17 OcnssfTotalIngressTrafficRateAboveMajorThreshold
Table 5-17 OcnssfTotalIngressTrafficRateAboveMajorThreshold
| Field | Details |
|---|---|
| Description | 'Ingress traffic Rate is above the configured major threshold i.e. 900 requests per second (current value is: {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second(1000)' |
| Severity | Major |
| Condition | The total OCNSSF ingress message rate has crossed the configured major threshold of 900 TPS. By default, this alert is triggered when the OCNSSF ingress rate crosses 90% of 1000 (the maximum ingress request rate), as configured in NrfAlertValues.yaml. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9011 |
| Metric Used | 'oc_ingressgateway_http_requests_total' |
| Recommended Actions | The alert is cleared when the total ingress traffic rate falls below the major threshold or when it crosses the critical threshold, in which case the OcnssfTotalIngressTrafficRateAboveCriticalThreshold alert is raised. Note: The threshold is configurable in the alerts.yaml. Steps: Reassess why the NSSF is receiving additional traffic, for example, the mated site NSSF being unavailable in a georedundancy scenario. If this is unexpected, contact My Oracle Support. |
5.1.18 OcnssfTotalIngressTrafficRateAboveCriticalThreshold
Table 5-18 OcnssfTotalIngressTrafficRateAboveCriticalThreshold
| Field | Details |
|---|---|
| Description | 'Ingress traffic Rate is above the configured critical threshold i.e. 950 requests per second (current value is: {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 95 Percent of Max requests per second(1000)' |
| Severity | Critical |
| Condition | The total OCNSSF ingress message rate has crossed the configured critical threshold of 950 TPS. By default, this alert is triggered when the OCNSSF ingress rate crosses 95% of 1000 (the maximum ingress request rate), as configured in NrfAlertValues.yaml. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9012 |
| Metric Used | 'oc_ingressgateway_http_requests_total' |
| Recommended Actions | The alert is cleared when the ingress traffic rate falls below the critical threshold. Note: The threshold is configurable in the alerts.yaml. Steps: Reassess why the NSSF is receiving additional traffic, for example, the mated site NSSF being unavailable in a georedundancy scenario. If this is unexpected, contact My Oracle Support. |
5.1.19 OcnssfTransactionErrorRateAbove0.1Percent
Table 5-19 OcnssfTransactionErrorRateAbove0.1Percent
| Field | Details |
|---|---|
| Description | 'Transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 0.1 Percent of Total Transactions' |
| Severity | Warning |
| Condition | The number of failed transactions is above 0.1 percent of the total transactions. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9013 |
| Metric Used | 'oc_ingressgateway_http_responses_total' |
| Recommended Actions | The alert is cleared when the number of failed transactions is below 0.1 percent of the total transactions or when it crosses the 1% threshold, in which case the OcnssfTransactionErrorRateAbove1Percent alert is raised. Steps: |
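The transaction error-rate alerts (0.1%, 1%, 10%, 25%, and 50%) share the same error ratio over oc_ingressgateway_http_responses_total. The sketch below shows the 1 percent tier only; the 'response_code' label name, the error-code pattern, and the rate window are assumptions and may not match the labels actually exposed by the Ingress Gateway.

```yaml
- alert: OcnssfTransactionErrorRateAbove1Percent
  # Percentage of 4xx/5xx responses over all responses in the last 5 minutes;
  # the bounds keep this rule between the 1% and 10% tiers.
  expr: |
    100 * sum(rate(oc_ingressgateway_http_responses_total{response_code=~"4..|5.."}[5m]))
        / sum(rate(oc_ingressgateway_http_responses_total[5m])) >= 1 < 10
  labels:
    severity: warning
```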
5.1.20 OcnssfTransactionErrorRateAbove1Percent
Table 5-20 OcnssfTransactionErrorRateAbove1Percent
| Field | Details |
|---|---|
| Description | 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 1 Percent of Total Transactions' |
| Severity | Warning |
| Condition | The number of failed transactions is above 1 percent of the total transactions. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9014 |
| Metric Used | 'oc_ingressgateway_http_responses_total' |
| Recommended Actions | The alert is cleared when the number of failed transactions is below 1 percent of the total transactions or when it crosses the 10% threshold, in which case the OcnssfTransactionErrorRateAbove10Percent alert is raised. Steps: |
5.1.21 OcnssfTransactionErrorRateAbove10Percent
Table 5-21 OcnssfTransactionErrorRateAbove10Percent
| Field | Details |
|---|---|
| Description | 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 10 Percent of Total Transactions' |
| Severity | Minor |
| Condition | The number of failed transactions has crossed the minor threshold of 10 percent of the total transactions. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9015 |
| Metric Used | 'oc_ingressgateway_http_responses_total' |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 10 percent of the total transactions or when it crosses the 25% threshold, in which case the OcnssfTransactionErrorRateAbove25Percent alert is raised. Steps: |
5.1.22 OcnssfTransactionErrorRateAbove25Percent
Table 5-22 OcnssfTransactionErrorRateAbove25Percent
| Field | Details |
|---|---|
| Description | 'Transaction Error rate is above 25 Percent of Total Transactions (current value is {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 25 Percent of Total Transactions' |
| Severity | Major |
| Condition | The number of failed transactions has crossed the major threshold of 25 percent of the total transactions. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9016 |
| Metric Used | 'oc_ingressgateway_http_responses_total' |
| Recommended Actions | The alert is cleared when the number of failed transactions falls below 25 percent of the total transactions or when it crosses the 50% threshold, in which case the OcnssfTransactionErrorRateAbove50Percent alert is raised. Steps: |
5.1.23 OcnssfTransactionErrorRateAbove50Percent
Table 5-23 OcnssfTransactionErrorRateAbove50Percent
| Field | Details |
|---|---|
| Description | 'Transaction Error rate is above 50 Percent of Total Transactions (current value is {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 50 Percent of Total Transactions' |
| Severity | Critical |
| Condition | The number of failed transactions has crossed the critical threshold of 50 percent of the total transactions. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9017 |
| Metric Used | 'oc_ingressgateway_http_responses_total' |
| Recommended Actions | The alert is cleared when the number of failed transactions is below 50 percent of the total transactions. Steps: |
5.1.24 OcnssfIngressGatewayPodCongestionStateWarning
Table 5-24 OcnssfIngressGatewayPodCongestionStateWarning
| Field | Details |
|---|---|
| Description | Ingress gateway pod congestion state reached DOC |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Ingress gateway pod congestion state reached DOC' |
| Severity | Warning |
| Condition | The Ingress Gateway pod has moved into the DOC state for any of the monitored metrics. Thresholds are configured for CPU and pending messages count. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9027 |
| Metric Used | oc_ingressgateway_pod_congestion_state |
| Recommended Actions | Reassess the reasons leading to NSSF receiving additional traffic. If this is unexpected, contact My Oracle Support. Steps: 1. Refer to the alert to determine which service is receiving high traffic; it may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. 2. Check the service pod logs on Kibana to determine the reason for the errors. 3. If this is expected traffic, check in Grafana how the traffic is distributed among the Ingress Gateway pods; the threshold levels may then be reevaluated as per the call rate and reconfigured as described in Oracle Communications Cloud Native Core, Network Slice Selection Function REST Specification Guide. |
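A hedged sketch for the congestion-state pair of alerts is shown below. It assumes the oc_ingressgateway_pod_congestion_state gauge encodes NORMAL, DOC, and CONGESTED as increasing integer values (for example 0, 1, and 2); the actual encoding may differ per release.

```yaml
- alert: OcnssfIngressGatewayPodCongestionStateWarning
  expr: oc_ingressgateway_pod_congestion_state == 1   # assumed DOC encoding
  labels:
    severity: warning
- alert: OcnssfIngressGatewayPodCongestionStateMajor
  expr: oc_ingressgateway_pod_congestion_state == 2   # assumed CONGESTED encoding
  labels:
    severity: major
```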
5.1.25 OcnssfIngressGatewayPodCongestionStateMajor
Table 5-25 OcnssfIngressGatewayPodCongestionStateMajor
| Field | Details |
|---|---|
| Description | Ingress gateway pod congestion state reached CONGESTED |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Ingress gateway pod congestion state reached CONGESTED' |
| Severity | Major |
| Condition | The Ingress Gateway pod has moved into the CONGESTED state for any of the monitored metrics. Thresholds are configured for CPU and pending messages count. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9028 |
| Metric Used | oc_ingressgateway_pod_congestion_state |
| Recommended Actions | Reassess the reasons leading to NSSF receiving additional traffic. If this is unexpected, contact My Oracle Support. Steps: 1. Refer to the alert to determine which service is receiving high traffic; it may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. 2. Check the service pod logs on Kibana to determine the reason for the errors. 3. If this is expected traffic, check in Grafana how the traffic is distributed among the Ingress Gateway pods; the threshold levels may then be reevaluated as per the call rate and reconfigured as described in Oracle Communications Cloud Native Core, Network Slice Selection Function REST Specification Guide. |
5.1.26 OcnssfIngressGatewayPodResourceStateWarning
Table 5-26 OcnssfIngressGatewayPodResourceStateWarning
| Field | Details |
|---|---|
| Description | The ingress gateway pod congestion state reached DOC because of excessive usage of resources |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The ingress gateway pod congestion state reached DOC because of excessive usage of resources' |
| Severity | Warning |
| Condition | The configured threshold of resource consumption for the DOC state of the Ingress Gateway is breached. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9029 |
| Metric Used | oc_ingressgateway_pod_resource_state |
| Recommended Actions | Reassess the reasons leading to NSSF receiving additional traffic. If this is unexpected, contact My Oracle Support. Steps: 1. Refer to the alert to determine which service is receiving high traffic; it may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. 2. Check the service pod logs on Kibana to determine the reason for the errors. 3. If this is expected traffic, check in Grafana how the traffic is distributed among the Ingress Gateway pods; the threshold levels may then be reevaluated as per the call rate and reconfigured as described in Oracle Communications Cloud Native Core, Network Slice Selection Function REST Specification Guide. |
5.1.27 OcnssfIngressGatewayPodResourceStateMajor
Table 5-27 OcnssfIngressGatewayPodResourceStateMajor
| Field | Details |
|---|---|
| Description | The ingress gateway pod congestion state reached CONGESTED because of excessive usage of resources |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The ingress gateway pod congestion state reached CONGESTED because of excessive usage of resources' |
| Severity | Major |
| Condition | The configured threshold of resource consumption for the CONGESTED state of the Ingress Gateway is breached. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9030 |
| Metric Used | oc_ingressgateway_pod_resource_state |
| Recommended Actions | Reassess the reasons leading to NSSF receiving additional traffic. If this is unexpected, contact My Oracle Support. Steps: 1. Refer to the alert to determine which service is receiving high traffic; it may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. 2. Check the service pod logs on Kibana to determine the reason for the errors. 3. If this is expected traffic, check in Grafana how the traffic is distributed among the Ingress Gateway pods; the threshold levels may then be reevaluated as per the call rate and reconfigured as described in Oracle Communications Cloud Native Core, Network Slice Selection Function REST Specification Guide. |
5.2 Application Level Alerts
This section lists the application level alerts.
5.2.1 ocnssfPolicyNotFoundWarning
Table 5-28 ocnssfPolicyNotFoundWarning
| Field | Details |
|---|---|
| Description | 'Policy Not Found Rate is above warning threshold i.e. 700 mps (current value is: {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Policy Not Found Rate is above 70 Percent' |
| Severity | Warning |
| Condition | Rate of messages that did not find a matching policy is above warning threshold (Threshold: <>, Current: <>). |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9018 |
| Metric Used | ocnssf_nsselection_policy_not_found_total |
| Recommended Actions | This alert is cleared when the number of error transactions is below 70 percent of the total traffic. Steps: |
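A hedged sketch for the policy-not-found tiers over ocnssf_nsselection_policy_not_found_total is shown below, using the documented trigger points of 700, 850, and 950 mps; the two-minute rate window is an assumption.

```yaml
- alert: ocnssfPolicyNotFoundWarning
  expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 700 < 850
  labels:
    severity: warning
- alert: ocnssfPolicyNotFoundMajor
  expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 850 < 950
  labels:
    severity: major
- alert: ocnssfPolicyNotFoundCritical
  expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 950
  labels:
    severity: critical
```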
5.2.2 ocnssfPolicyNotFoundMajor
Table 5-29 ocnssfPolicyNotFoundMajor
| Field | Details |
|---|---|
| Description | 'Policy Not Found Rate is above major threshold i.e. 850 mps (current value is: {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Policy Not Found Rate is above 85 Percent' |
| Severity | Major |
| Condition | Rate of messages that did not find a matching policy is above major threshold. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9019 |
| Metric Used | ocnssf_nsselection_policy_not_found_total |
| Recommended Actions | This alert is cleared when the number of error transactions is below 85 percent of the total traffic. Steps: |
5.2.3 ocnssfPolicyNotFoundCritical
Table 5-30 ocnssfPolicyNotFoundCritical
| Field | Details |
|---|---|
| Description | 'Policy Not Found Rate is above critical threshold i.e. 950 mps (current value is: {{ $value }})' |
| Summary | 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Policy Not Found Rate is above 95 Percent' |
| Severity | Critical |
| Condition | Rate of messages that did not find a matching policy is above critical threshold. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9020 |
| Metric Used | ocnssf_nsselection_policy_not_found_total |
| Recommended Actions | This alert is cleared when the number of error transactions is below 95 percent of the total traffic. Steps: |
5.2.4 OcnssfOverloadThresholdBreachedL1
Table 5-31 OcnssfOverloadThresholdBreachedL1
| Field | Details |
|---|---|
| Description | 'Overload Level of {{$labels.app_kubernetes_io_name}} service is L1' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Overload Level of {{$labels.app_kubernetes_io_name}} service is L1' |
| Severity | Warning |
| Condition | NSSF services have breached their configured threshold of Level L1 for any of the monitored metrics. Thresholds are configured for CPU, svc_failure_count, svc_pending_count, and memory. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9021 |
| Metric Used | load_level |
| Recommended Actions | The alert is cleared when the ingress traffic rate falls below the configured L1 threshold. Note: The thresholds can be configured using REST API. Steps: Reassess the reasons leading to NSSF receiving additional traffic. If this is unexpected, contact My Oracle Support. 1. Refer to the alert to determine which service is receiving high traffic; it may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. 2. Check the service pod logs on Kibana to determine the reason for the errors. 3. If this is expected traffic, the threshold levels may be reevaluated as per the call rate and reconfigured as described in Oracle Communications Cloud Native Core, Network Slice Selection Function REST Specification Guide. |
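A hedged sketch for the overload-level alerts is shown below. It assumes the load_level gauge reports the current overload level of each service as an integer (1 for L1, 2 for L2, and so on) and carries the per-service labels referenced in the summary text; both assumptions may differ per release. The L2, L3, and L4 variants change only the compared value and OID.

```yaml
- alert: OcnssfOverloadThresholdBreachedL1
  # Assumes load_level == 1 means the service is currently at overload level L1.
  expr: load_level == 1
  labels:
    severity: warning
```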
5.2.5 OcnssfOverloadThresholdBreachedL2
Table 5-32 OcnssfOverloadThresholdBreachedL2
| Field | Details |
|---|---|
| Description | 'Overload Level of {{$labels.app_kubernetes_io_name}} service is L2' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Overload Level of {{$labels.app_kubernetes_io_name}} service is L2' |
| Severity | Warning |
| Condition | NSSF services have breached their configured threshold of Level L2 for any of the monitored metrics. Thresholds are configured for CPU, svc_failure_count, svc_pending_count, and memory. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9022 |
| Metric Used | load_level |
| Recommended Actions | The alert is cleared when the ingress traffic rate falls below the configured L2 threshold. Note: The thresholds can be configured using REST API. Steps: Reassess the reasons leading to NSSF receiving additional traffic. If this is unexpected, contact My Oracle Support. 1. Refer to the alert to determine which service is receiving high traffic; it may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. 2. Check the service pod logs on Kibana to determine the reason for the errors. 3. If this is expected traffic, the threshold levels may be reevaluated as per the call rate and reconfigured as described in Oracle Communications Cloud Native Core, Network Slice Selection Function REST Specification Guide. |
5.2.6 OcnssfOverloadThresholdBreachedL3
Table 5-33 OcnssfOverloadThresholdBreachedL3
| Field | Details |
|---|---|
| Description | 'Overload Level of {{$labels.app_kubernetes_io_name}} service is L3' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Overload Level of {{$labels.app_kubernetes_io_name}} service is L3' |
| Severity | Warning |
| Condition | NSSF services have breached their configured threshold of Level L3 for any of the monitored metrics. Thresholds are configured for CPU, svc_failure_count, svc_pending_count, and memory. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9023 |
| Metric Used | load_level |
| Recommended Actions | The alert is cleared when the ingress traffic rate falls below the configured L3 threshold. Note: The thresholds can be configured using REST API. Steps: Reassess the reasons leading to NSSF receiving additional traffic. If this is unexpected, contact My Oracle Support. 1. Refer to the alert to determine which service is receiving high traffic; it may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. 2. Check the service pod logs on Kibana to determine the reason for the errors. 3. If this is expected traffic, the threshold levels may be reevaluated as per the call rate and reconfigured as described in Oracle Communications Cloud Native Core, Network Slice Selection Function REST Specification Guide. |
5.2.7 OcnssfOverloadThresholdBreachedL4
Table 5-34 OcnssfOverloadThresholdBreachedL4
| Field | Details |
|---|---|
| Description | 'Overload Level of {{$labels.app_kubernetes_io_name}} service is L4' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Overload Level of {{$labels.app_kubernetes_io_name}} service is L4' |
| Severity | Warning |
| Condition | NSSF services have breached their configured threshold of Level L4 for any of the monitored metrics. Thresholds are configured for CPU, svc_failure_count, svc_pending_count, and memory. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9024 |
| Metric Used | load_level |
| Recommended Actions | The alert is cleared when the ingress traffic rate falls below the configured L4 threshold. Note: The thresholds can be configured using REST API. Steps: Reassess the reasons leading to NSSF receiving additional traffic. If this is unexpected, contact My Oracle Support. 1. Refer to the alert to determine which service is receiving high traffic; it may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. 2. Check the service pod logs on Kibana to determine the reason for the errors. 3. If this is expected traffic, the threshold levels may be reevaluated as per the call rate and reconfigured as described in Oracle Communications Cloud Native Core, Network Slice Selection Function REST Specification Guide. |
5.2.8 OcnssfScpMarkedAsUnavailable
Table 5-35 OcnssfScpMarkedAsUnavailable
| Field | Details |
|---|---|
| Description | 'An SCP has been marked unavailable' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : One of the SCP has been marked unavailable' |
| Severity | Major |
| Condition | One of the SCPs has been marked unhealthy. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9025 |
| Metric Used | 'oc_egressgateway_peer_health_status' |
| Recommended Actions | This alert gets cleared when the unavailable SCPs become available. |
5.2.9 OcnssfAllScpMarkedAsUnavailable
Table 5-36 OcnssfAllScpMarkedAsUnavailable
| Field | Details |
|---|---|
| Description | 'All SCPs have been marked unavailable' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All SCPs have been marked as unavailable' |
| Severity | Critical |
| Condition | All SCPs have been marked unavailable. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9026 |
| Metric Used | 'oc_egressgateway_peer_count and oc_egressgateway_peer_available_count' |
| Recommended Actions | The NF clears the critical alarm when at least one SCP peer in a peer set becomes available, even if all other SCP or SEPP peers in the given peer set are still unavailable. |
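A hedged sketch for the SCP availability pair is shown below. It assumes oc_egressgateway_peer_health_status reports 1 for an available peer and 0 for an unavailable one, and that the peer-count gauges behave as their names suggest; the actual label sets and value semantics may differ.

```yaml
- alert: OcnssfScpMarkedAsUnavailable
  expr: oc_egressgateway_peer_health_status == 0   # assumed: 0 = unavailable
  labels:
    severity: major
- alert: OcnssfAllScpMarkedAsUnavailable
  # Fires when peers are configured but none is reported available.
  expr: sum(oc_egressgateway_peer_available_count) == 0 and sum(oc_egressgateway_peer_count) > 0
  labels:
    severity: critical
```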
5.2.10 OcnssfTLSCertificateExpireMinor
Table 5-37 OcnssfTLSCertificateExpireMinor
| Field | Details |
|---|---|
| Description | 'TLS certificate to expire in 6 months' |
| Summary | 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : TLS certificate to expire in 6 months' |
| Severity | Minor |
| Condition | This alert is raised when the TLS certificate is about to expire in six months. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9038 |
| Metric Used | security_cert_x509_expiration_seconds |
| Recommended Actions | The alert is cleared when the TLS certificate is renewed. For more information about certificate renewal, see the "Creating Private Keys and Certificate" section in the Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide. |
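The three certificate-expiry alerts differ only in the remaining-lifetime bound. A hedged sketch of the six-month (minor) tier is shown below; it assumes security_cert_x509_expiration_seconds exposes the certificate expiry time as a Unix timestamp in seconds.

```yaml
- alert: OcnssfTLSCertificateExpireMinor
  # Remaining lifetime below roughly 6 months (assumed as 182 days); the major
  # and critical tiers use roughly 3 months and 1 month respectively.
  expr: security_cert_x509_expiration_seconds - time() < 182 * 24 * 3600
  labels:
    severity: minor
```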
5.2.11 OcnssfTLSCertificateExpireMajor
Table 5-38 OcnssfTLSCertificateExpireMajor
| Field | Details |
|---|---|
| Description | 'TLS certificate to expire in 3 months.' |
| Summary | 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : TLS certificate to expire in 3 months' |
| Severity | Major |
| Condition | This alert is raised when the TLS certificate is about to expire in three months. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9039 |
| Metric Used | security_cert_x509_expiration_seconds |
| Recommended Actions | The alert is cleared when the TLS certificate is renewed. For more information about certificate renewal, see the "Creating Private Keys and Certificate" section in the Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide. |
5.2.12 OcnssfTLSCertificateExpireCritical
Table 5-39 OcnssfTLSCertificateExpireCritical
| Field | Details |
|---|---|
| Description | 'TLS certificate to expire in one month.' |
| Summary | 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : TLS certificate to expire in 1 month' |
| Severity | Critical |
| Condition | This alert is raised when the TLS certificate is about to expire in one month. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9040 |
| Metric Used | security_cert_x509_expiration_seconds |
| Recommended Actions | The alert is cleared when the TLS certificate is renewed. For more information about certificate renewal, see the "Creating Private Keys and Certificate" section in the Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide. |
5.2.13 OcnssfNrfInstancesInDownStateMajor
Table 5-40 OcnssfNrfInstancesInDownStateMajor
| Field | Details |
|---|---|
| Description | 'When current operative status of any NRF Instance is unavailable/unhealthy' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Few of the NRF instances are in unavailable state' |
| Severity | Major |
| Condition | When the sum of the metric values across the NRF instances is greater than 0 but less than 3. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9042 |
| Metric Used | nrfclient_nrf_operative_status |
| Recommended Actions | This alert is cleared when the operative status of all the NRF instances is available/healthy. Steps: |
5.2.14 OcnssfAllNrfInstancesInDownStateCritical
Table 5-41 OcnssfAllNrfInstancesInDownStateCritical
| Field | Details |
|---|---|
| Description | 'When current operative status of all the NRF Instances is unavailable/unhealthy' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All the NRF instances are in unavailable state' |
| Severity | Critical |
| Condition | When the sum of the metric values across the NRF instances is equal to 0. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9041 |
| Metric Used | nrfclient_nrf_operative_status |
| Recommended Actions | This alert is cleared when the current operative status of at least one NRF instance is available/healthy. Steps: |
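A hedged sketch for the NRF operative-status pair is shown below. It assumes nrfclient_nrf_operative_status reports 1 for an available NRF instance and 0 otherwise, and that three NRF instances are configured, as implied by the condition text above.

```yaml
- alert: OcnssfAllNrfInstancesInDownStateCritical
  expr: sum(nrfclient_nrf_operative_status) == 0
  labels:
    severity: critical
- alert: OcnssfNrfInstancesInDownStateMajor
  # Some, but not all, of the assumed three NRF instances are available.
  expr: sum(nrfclient_nrf_operative_status) > 0 < 3
  labels:
    severity: major
```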
5.2.15 SubscriptionToNrfFailed
Table 5-42 SubscriptionToNrfFailed
| Field | Details |
|---|---|
| Description | 'Subscription to NRF failed for NSSF' |
| Summary | 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Subscription to NRF failed for NSSF' |
| Severity | Major |
| Condition | This alert is triggered when the NSSF's subscription to the NRF fails in a georedundancy (GR) scenario. |
| OID | 1.3.6.1.4.1.323.5.3.40.1.2.9043 |
| Metric Used | nssf_subscription_to_nrf_successful |
| Recommended Actions | The alert is triggered when the value of the above metric is 0. Once the subscription is successful, the value of the metric changes to 1 and the alert is cleared. Steps: |
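A hedged sketch of this rule is shown below, following the recommended-actions text: the metric is assumed to report 0 while the subscription to the NRF has failed and 1 once it succeeds.

```yaml
- alert: SubscriptionToNrfFailed
  # Assumed semantics: 0 = subscription failed, 1 = subscription successful.
  expr: nssf_subscription_to_nrf_successful == 0
  labels:
    severity: major
```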