5 NRF Alerts

This chapter includes information about the NRF alerts.

The following table describes the various alert levels generated by NRF:

Table 5-1 Alert Levels or Severity Types

Alert Levels/Severity Types Definition
Critical Indicates a severe issue that poses a significant risk to safety, security, or operational integrity. It requires an immediate response to address the situation and prevent serious consequences. Raised for conditions that may affect the service of NRF.
Major Indicates a more significant issue that has an impact on operations or poses a moderate risk. It requires prompt attention and action to mitigate potential escalation. Raised for conditions that may affect the service of NRF.
Minor Indicates a situation that is low in severity and does not pose an immediate risk to safety, security, or operations. It requires attention but does not demand urgent action. Raised for conditions that may affect the service of NRF.
Info or Warn (Informational) Provides general information or updates that are not related to immediate risks or actions. These alerts are for awareness and do not typically require any specific response. WARN and INFO alerts may not impact the service of NRF.

Note:

  • Summary or dimensions may vary based on deployment.
  • The alert triggering time varies as per the environment in which it is deployed.
  • The performance and capacity of the NRF system may vary based on the call model, Feature or Interface configuration, and underlying CNE and hardware environment.

5.1 System Level Alerts

This section lists the system level alerts.

5.1.1 OcnrfNfStatusUnavailable

Table 5-2 OcnrfNfStatusUnavailable

Field Details
Description 'OCNRF services unavailable'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All OCNRF services are unavailable.'
Severity Critical
Condition This alert is raised when all the NRF services are unavailable, for example, when NRF is being deployed or purged. The NRF services considered are nfregistration, nfsubscription, nrfauditor, nrfconfiguration, nfaccesstoken, nfdiscovery, appinfo, ingressgateway, and egressgateway.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7016
Metric Used

'up'

Note: This is a Prometheus metric used for instance availability monitoring.

If this metric is not available, use a similar metric as exposed by the monitoring system.
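
For reference, a minimal PromQL sketch of this availability check is shown below. The namespace value and label names are illustrative assumptions; the actual expression is defined in the NRF alert file and depends on the Prometheus scrape configuration.

  # Hypothetical check: no NRF service instance in the namespace reports up=1
  sum(up{kubernetes_namespace="ocnrf"}) == 0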

Recommended Actions The alert is cleared automatically when the NRF services restart.

Steps:

  1. Check for service-specific alerts which may be causing the issues with service exposure.
  2. Run the following command to check the pod status:
    $ kubectl get po -n <namespace>
    1. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  3. Refer to the application logs on Kibana and check for database related failures such as connectivity and invalid secrets. The logs can be filtered based on the services.
  4. Check the Helm status to make sure there are no errors:
    $ helm status <helm release name of the desired NF> -n <namespace>

    If the status is not “DEPLOYED”, capture the logs and events again.

  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No
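
For the pod-status checks in the table above, the following standard kubectl option can be used to list only the pods that are not in the Running state; it is provided here as a convenience and is not part of the NRF alert definition:

  $ kubectl get pods -n <namespace> --field-selector=status.phase!=Running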

5.1.2 OcnrfPodsRestart

Table 5-3 OcnrfPodsRestart

Field Details
Description 'Pod <Pod Name> has restarted.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A Pod has restarted'
Severity Major
Condition A pod belonging to any of the NRF services has restarted.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7017
Metric Used 'kube_pod_container_status_restarts_total'

Note: This is a Kubernetes metric. If this metric is not available, use a similar metric as exposed by the monitoring system.
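
A minimal PromQL sketch of such a restart check is shown below. The namespace value, label name, and time window are illustrative assumptions; the actual expression is defined in the NRF alert file.

  # Hypothetical check: a container restarted within the last 5 minutes
  increase(kube_pod_container_status_restarts_total{namespace="ocnrf"}[5m]) > 0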

Recommended Actions

The alert is cleared automatically if the specific pod is up.

Steps:

  1. Refer to the application logs on Kibana, filter based on the pod name, and check for database-related failures such as connectivity issues and invalid Kubernetes secrets.
  2. To check the orchestration logs for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  3. Check the database status. For more information, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.3 NnrfNFManagementServiceDown

Table 5-4 NnrfNFManagementServiceDown

Field Details
Description 'OCNRF Nnrf_Management service <nfregistration|nfsubscription|nrfauditor> is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFManagement service is down'
Severity Critical
Condition This alert is raised when the NFRegistration, NFSubscription, or NrfAuditor service is unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7018
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions The alert is cleared when all the Nnrf_NFManagement services nfregistration, nfsubscription, and nrfauditor are available.

Steps:

  1. Check if NfService specific alerts are generated to understand which service is down.

    Either some or all of the following alerts are generated based on which services are down:

    • OcnrfRegistrationServiceDown
    • OcnrfSubscriptionServiceDown
    • OcnrfAuditorServiceDown
  2. To check the orchestration logs for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  3. Check whether the pods are in the “Running” state by running the following command:
    $ kubectl get pod -n <namespace>
    If a pod is not in the “Running” state, capture the pod logs and events by running the following command:
    $ kubectl get events --sort-by=.metadata.creationTimestamp -n <namespace>
  4. Refer to the application logs on Kibana and filter based on the aforementioned service names. Check for ERROR and WARNING logs for each of these services.
  5. Check the database status. For more information on how to check the database status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  6. Refer to the application logs on Kibana, filter on the appinfo service, and check the service status of the nfregistration, nfsubscription, and nrfauditor services.
  7. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.4 NnrfAccessTokenServiceDown

Table 5-5 NnrfAccessTokenServiceDown

Field Details
Description 'OCNRF Nnrf_NFAccessToken service nfaccesstoken is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccessToken service down'
Severity Critical
Condition This alert is raised when NFAccessToken service is unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7020
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions The alert is cleared when the Nnrf_AccessToken service is available.

Steps:

  1. To check the orchestration logs of the nfaccesstoken service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the nfaccesstoken service name. Check for ERROR and WARNING logs.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. Depending on the failure reason, take the resolution steps.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.5 NnrfNFDiscoveryServiceDown

Table 5-6 NnrfNFDiscoveryServiceDown

Field Details
Description 'OCNRF Nnrf_NFDiscovery service nfdiscovery is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service down'
Severity Critical
Condition NFDiscovery is unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7019
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the Nnrf_NFDiscovery service is available.

Steps:

  1. To check the orchestration logs of the nfdiscovery service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the nfdiscovery service name. Check for ERROR and WARNING logs.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. Depending on the failure reason, take the resolution steps.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.6 OcnrfRegistrationServiceDown

Table 5-7 OcnrfRegistrationServiceDown

Field Details
Description 'OCNRF NFRegistration service nfregistration is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFRegistration service is down'
Severity Critical
Condition None of the pods of the NFRegistration microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7021
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the nfregistration service is available.

Steps:

  1. To check the orchestration logs of the nfregistration service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the nfregistration service name. Check for ERROR and WARNING logs.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.7 OcnrfSubscriptionServiceDown

Table 5-8 OcnrfSubscriptionServiceDown

Field Details
Description 'OCNRF NFSubscription service nfsubscription is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFSubscription service is down'
Severity Critical
Condition None of the pods of the NFSubscription microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7022
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions The alert is cleared when the nfsubscription service is available.

Steps:

  1. To check the orchestration logs of the nfsubscription service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the nfsubscription service name. Check for ERROR and WARNING logs.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. Depending on the failure reason, take the resolution steps.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.8 OcnrfDiscoveryServiceDown

Table 5-9 OcnrfDiscoveryServiceDown

Field Details
Description 'OCNRF NFDiscovery service nfdiscovery is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFDiscovery service down'
Severity Critical
Condition None of the pods of the NFDiscovery microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7023
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions The alert is cleared when the nfdiscovery service is available.

Steps:

  1. To check the orchestration logs of the nfdiscovery service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the nfdiscovery service name. Check for ERROR and WARNING logs.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.9 OcnrfAccessTokenServiceDown

Table 5-10 OcnrfAccessTokenServiceDown

Field Details
Description 'OCNRF NFAccessToken service nfaccesstoken is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NFAccesstoken service down'
Severity Critical
Condition None of the pods of the NFAccessToken microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7024
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions The alert is cleared when the nfaccesstoken service is available.

Steps:

  1. To check the orchestration logs of the nfaccesstoken service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the nfaccesstoken service name. Check for ERROR and WARNING logs.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. Depending on the failure reason, take the resolution steps.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.10 OcnrfAuditorServiceDown

Table 5-11 OcnrfAuditorServiceDown

Field Details
Description 'OCNRF NrfAuditor service nrfauditor is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfAuditor service down'
Severity Critical
Condition None of the pods of the NrfAuditor microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7026
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the nrfauditor service is available.

Steps:

  1. To check the orchestration logs of the nrfauditor service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the nrfauditor service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. Depending on the failure reason, take the resolution steps.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.11 OcnrfConfigurationServiceDown

Table 5-12 OcnrfConfigurationServiceDown

Field Details
Description 'OCNRF NrfConfiguration service nrfconfiguration is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfConfiguration service down'
Severity Critical
Condition None of the pods of the NrfConfiguration microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7025
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the nrfconfiguration service is available.

Steps:

  1. To check the orchestration logs of the nrfconfiguration service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the nrfconfiguration service name. Check for ERROR and WARNING logs.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. Depending on the failure reason, take the resolution steps.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.12 OcnrfAppInfoServiceDown

Table 5-13 OcnrfAppInfoServiceDown

Field Details
Description 'OCNRF Appinfo service appinfo is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Appinfo service down'
Severity Critical
Condition None of the pods of the appinfo microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7027
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the appinfo service is available.

Steps:

  1. To check the orchestration logs of the appinfo service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the appinfo service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.13 OcnrfArtisanServiceDown

Table 5-14 OcnrfArtisanServiceDown

Field Details
Description 'OCNRF NrfArtisan service {{$labels.app_kubernetes_io_name}} is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : NrfArtisan service is down'
Severity Critical
Condition NrfArtisan is unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7056
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the NrfArtisan service is available.

Steps:

  1. To check the orchestration logs of the NrfArtisan service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get pod -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter the logs based on NrfArtisan service names. Check for ERROR and WARNING logs related to thread exceptions.
  3. Check the database status. For more information, see the Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.14 OcnrfAlternateRouteServiceDown

Table 5-15 OcnrfAlternateRouteServiceDown

Field Details
Description 'OCNRF AlternateRoute service {{$labels.app_kubernetes_io_name}} is down'
Applicable in OCI No
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : AlternateRoute service is down'
Severity Critical
Condition AlternateRoute is unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7057
Metric Used

'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the alternate-route service is available.

Steps:

  1. To check the orchestration logs of the alternate-route service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get pod -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter the logs based on Alternate-Route service names. Check for ERROR and WARNING logs related to thread exceptions.
  3. Check the database status. For more information, see the Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

5.1.15 OcnrfPerfInfoServiceDown

Table 5-16 OcnrfPerfInfoServiceDown

Field Details
Description 'OCNRF Perfinfo service {{$labels.app_kubernetes_io_name}} is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Perfinfo service down'
Severity Critical
Condition Perfinfo is unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7058
Metric Used

'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the Perfinfo service is available.

Steps:

  1. To check the orchestration logs of the Perfinfo service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get pod -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter the logs based on Perf-Info service names. Check for ERROR and WARNING logs related to thread exceptions.
  3. Check the database status. For more information, see the Oracle Communications Cloud Native Core, cnDBTier User Guide.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.16 OcnrfIngressGatewayServiceDown

Table 5-17 OcnrfIngressGatewayServiceDown

Field Details
Description 'OCNRF Ingress-Gateway service ingressgateway is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ingress-gateway service down'
Severity Critical
Condition None of the pods of the Ingress Gateway microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7028
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the Ingress Gateway service is available.

Steps:

  1. To check the orchestration logs of the Ingress Gateway service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the Ingress Gateway service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.17 OcnrfEgressGatewayServiceDown

Table 5-18 OcnrfEgressGatewayServiceDown

Field Details
Description 'OCNRF Egress-Gateway service egressgateway is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Egress-Gateway service down'
Severity Critical
Condition None of the pods of the Egress Gateway microservice is available.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7029
Metric Used 'up'

Note: This is a Prometheus metric used for instance availability monitoring. If this metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions

The alert is cleared when the Egress Gateway service is available.

Steps:

  1. To check the orchestration logs of the Egress Gateway service for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the Running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on the Egress Gateway service name. Check for ERROR and WARNING logs related to thread exceptions.
  3. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.1.18 OcnrfTotalIngressTrafficRateAboveMinorThreshold

Table 5-19 OcnrfTotalIngressTrafficRateAboveMinorThreshold

Field Details
Description 'Total Ingress traffic Rate is above configured minor threshold. (current value is: {{ $value }})'
Summary 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 80 Percent of Max requests per second'
Severity Minor
Condition

The total NRF Ingress Message rate has crossed the configured minor threshold of 800 TPS.

The default trigger point in the alert file is when the NRF ingress rate crosses 80% of 1000 TPS (the maximum ingress request rate).

OID 1.3.6.1.4.1.323.5.3.36.1.2.7001
Metric Used 'oc_ingressgateway_http_requests_total'
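
A minimal PromQL sketch of this threshold check is shown below. The rate window, namespace value, and label names are illustrative assumptions; the actual expression and the 800 TPS threshold are defined in the NRF alert file.

  # Hypothetical check: total ingress request rate above 80% of 1000 TPS
  sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnrf"}[2m])) >= 800
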
Recommended Actions

The alert is cleared either when the total Ingress Traffic rate falls below the minor threshold or when the total traffic rate crosses the major threshold, in which case the OcnrfTotalIngressTrafficRateAboveMajorThreshold alert is raised.

Note: The threshold is configurable in the alert file.

Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.
Steps:
  1. Refer to Grafana to determine which service is receiving high traffic.
  2. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.
Available in OCI No

5.1.19 OcnrfTotalIngressTrafficRateAboveMajorThreshold

Table 5-20 OcnrfTotalIngressTrafficRateAboveMajorThreshold

Field Details
Description 'Total Ingress traffic Rate is above major threshold. (current value is: {{ $value }})'
Summary 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 90 Percent of Max requests per second'
Severity Major
Condition

The total NRF Ingress Message rate has crossed the configured major threshold of 900 TPS.

The default trigger point in the alert file is when the NRF ingress rate crosses 90% of 1000 TPS (the maximum ingress request rate).

OID 1.3.6.1.4.1.323.5.3.36.1.2.7002
Metric Used 'oc_ingressgateway_http_requests_total'
Recommended Actions

The alert is cleared when the total Ingress Traffic rate falls below the major threshold or when the total traffic rate crosses the critical threshold, in which case the OcnrfTotalIngressTrafficRateAboveCriticalThreshold alert is raised.

Note: The threshold is configurable in the alert file.

Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.
Steps:
  1. Refer to Grafana to determine which service is receiving high traffic.
  2. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.
Available in OCI No

5.1.20 OcnrfTotalIngressTrafficRateAboveCriticalThreshold

Table 5-21 OcnrfTotalIngressTrafficRateAboveCriticalThreshold

Field Details
Description 'Total Ingress traffic Rate is above critical threshold.(current value is: {{ $value }})'
Summary 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is more than 52000 requests per second'
Severity Critical
Condition

The total NRF Ingress Message rate has crossed the configured critical threshold of 52000 TPS.

The default trigger point in the alert file is when the NRF ingress rate crosses 52000 TPS.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7003
Metric Used 'oc_ingressgateway_http_requests_total'
Recommended Actions

The alert is cleared when the Ingress traffic rate falls below the critical threshold.

Note: The threshold is configurable in the alert file.

Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.
Steps:
  1. Refer to Grafana to determine which service is receiving high traffic.
  2. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the Ingress Gateway logs on Kibana to determine the reason for the errors.
Available in OCI No

5.1.21 OcnrfTransactionErrorRateAbove0Dot1Percent

Table 5-22 OcnrfTransactionErrorRateAbove0Dot1Percent

Field Details
Description 'Transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'
Summary 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 0.1 Percent of Total Transactions'
Severity Warning
Condition The number of failed transactions is above 0.1 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7004
Metric Used 'oc_ingressgateway_http_responses_total'
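
A minimal PromQL sketch of this error-rate check is shown below. The label name Status is taken from the metric example in the steps below and, together with the rate window, is an illustrative assumption; the actual expression is defined in the NRF alert file.

  # Hypothetical check: non-2xx responses exceed 0.1% of all responses
  ( sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*"}[5m]))
    / sum(rate(oc_ingressgateway_http_responses_total[5m])) ) * 100 > 0.1
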
Recommended Actions

The alert is cleared when the number of failed transactions falls below 0.1 percent of the total transactions or when it crosses the 1 percent threshold, in which case the OcnrfTransactionErrorRateAbove1Percent alert is raised.

Steps:

  1. Check the service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. Check metrics per service, per method:

    For example, discovery requests can be determined from the following metrics:

    Metrics="oc_ingressgateway_http_responses_total"

    Method="GET"

    NFServiceType="nnrf-disc"

    Route_path="/nnrf-disc/v1/nf-instances/**"

    Status="503 SERVICE_UNAVAILABLE"

  3. If guidance is required, contact My Oracle Support.
Available in OCI No

5.1.22 OcnrfTransactionErrorRateAbove1Percent

Table 5-23 OcnrfTransactionErrorRateAbove1Percent

Field Details
Description 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'
Summary 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 1 Percent of Total Transactions'
Severity Warning
Condition The number of failed transactions is above 1 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7005
Metric Used 'oc_ingressgateway_http_responses_total'
Recommended Actions

The alert is cleared when the number of failed transactions falls below 1 percent of the total transactions or when it crosses the 10 percent threshold, in which case the OcnrfTransactionErrorRateAbove10Percent alert is raised.

Steps:

  1. Check the service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. Check metrics per service, per method:

    For example, discovery requests can be determined from the following metrics:

    Metrics="oc_ingressgateway_http_responses_total"

    Method="GET"

    NFServiceType="nnrf-disc"

    Route_path="/nnrf-disc/v1/nf-instances/**"

    Status="503 SERVICE_UNAVAILABLE"

  3. If guidance is required, contact My Oracle Support.
Available in OCI No

5.1.23 OcnrfTransactionErrorRateAbove10Percent

Table 5-24 OcnrfTransactionErrorRateAbove10Percent

Field Details
Description 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'
Summary 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 10 Percent of Total Transactions'
Severity Minor
Condition The number of failed transactions has crossed the minor threshold of 10 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7006
Metric Used 'oc_ingressgateway_http_responses_total'
Recommended Actions

The alert is cleared when the number of failed transactions falls below 10 percent of the total transactions or when it crosses the 25 percent threshold, in which case the OcnrfTransactionErrorRateAbove25Percent alert is raised.

Steps:

  1. Check the service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. Check metrics per service, per method:

    For example, discovery requests can be determined from the following metrics:

    Metrics="oc_ingressgateway_http_responses_total"

    Method="GET"

    NFServiceType="nnrf-disc"

    Route_path="/nnrf-disc/v1/nf-instances/**"

    Status="503 SERVICE_UNAVAILABLE"

  3. If guidance is required, contact My Oracle Support.
Available in OCI No

5.1.24 OcnrfTransactionErrorRateAbove25Percent

Table 5-25 OcnrfTransactionErrorRateAbove25Percent

Field Details
Description 'Transaction Error rate is above 25 Percent of Total Transactions (current value is {{ $value }})'
Summary 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 25 Percent of Total Transactions'
Severity Major
Condition The number of failed transactions has crossed the major threshold of 25 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7007
Metric Used 'oc_ingressgateway_http_responses_total'
Recommended Actions

The alert is cleared when the number of failed transactions falls below 25 percent of the total transactions or when it crosses the 50 percent threshold, in which case the OcnrfTransactionErrorRateAbove50Percent alert is raised.

Steps:

  1. Check the service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. Check metrics per service, per method:

    For example, discovery requests can be determined from the following metrics:

    Metrics="oc_ingressgateway_http_responses_total"

    Method="GET"

    NFServiceType="nnrf-disc"

    Route_path="/nnrf-disc/v1/nf-instances/**"

    Status="503 SERVICE_UNAVAILABLE"

  3. If guidance is required, contact My Oracle Support.
Available in OCI No

5.1.25 OcnrfTransactionErrorRateAbove50Percent

Table 5-26 OcnrfTransactionErrorRateAbove50Percent

Field Details
Description 'Transaction Error rate is above 50 Percent of Total Transactions (current value is {{ $value }})'
Summary 'timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Transaction Error Rate detected above 50 Percent of Total Transactions'
Severity Critical
Condition The number of failed transactions has crossed the critical threshold of 50 percent of the total transactions.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7008
Metric Used 'oc_ingressgateway_http_responses_total'
Recommended Actions

The alert is cleared when the number of failed transactions falls below 50 percent of the total transactions.

Steps:

  1. Check the service specific metrics to understand the specific service request errors.

    For example: ocnrf_nfDiscover_tx_responses_total with statusCode ~= 2xx.

  2. Check metrics per service, per method:

    For example, discovery requests can be determined from the following metrics:

    Metrics="oc_ingressgateway_http_responses_total"

    Method="GET"

    NFServiceType="nnrf-disc"

    Route_path="/nnrf-disc/v1/nf-instances/**"

    Status="503 SERVICE_UNAVAILABLE"

  3. If guidance is required, contact My Oracle Support.
Available in OCI No

5.1.26 OcnrfTotalEgressTrafficRateAboveCriticalThreshold

Table 5-27 OcnrfTotalEgressTrafficRateAboveCriticalThreshold

Field Details
Description 'Egress traffic rate is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 51600 requests per second'
Severity Critical
Condition This alarm is raised when the Egress traffic rate is greater than the configured critical threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7109
Metric Used oc_egressgateway_http_requests_total
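
A minimal PromQL sketch of this threshold check is shown below. The rate window and the absence of label filters are illustrative assumptions; the actual expression and the 51600 TPS threshold are defined in the NRF alert file.

  # Hypothetical check: total egress request rate above the critical threshold
  sum(rate(oc_egressgateway_http_requests_total[2m])) >= 51600
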
Recommended Actions The alert is cleared when the total Egress traffic rate falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to Grafana to determine which service is receiving high traffic.
  2. Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the Egress Gateway logs on Kibana to determine the reason for the errors.
Available in OCI No

5.1.27 OcnrfTotalForwardingTrafficRateAboveCriticalThreshold

Table 5-28 OcnrfTotalForwardingTrafficRateAboveCriticalThreshold

Field Details
Description 'NRF-NRF Forwarding Rate is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: $labels.kubernetes_namespace, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 5200 requests per second.'
Severity Critical
Condition This alarm is raised when the NRF-to-NRF forwarding rate is greater than the configured critical threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7110
Metric Used ocnrf_forward_nfDiscover_tx_requests_total
Recommended Actions The alert is cleared when the total NRF forwarding rate falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to Grafana to determine which service is receiving high traffic.
  2. Refer to the NRF Forwarding section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the NRF Forwarding logs on Kibana to determine the reason for the errors.
Available in OCI No

5.1.28 OcnrfHeapUsageCrossedMinorThreshold

Table 5-29 OcnrfHeapUsageCrossedMinorThreshold

Field Details
Description 'OCNRF Heap Usage for pod {{ $labels.pod }} has crossed the configured minor threshold (50%) (value={{ $value }}) of its limit.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Heap Usage of pod exceeded 50% of its limit.'
Severity Minor
Condition This alert is raised when the Java memory heap usage of pods exceeds the configured minor threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7126
Metric Used jvm_memory_used_bytes
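
A minimal PromQL sketch of this heap-usage check is shown below. The use of jvm_memory_max_bytes and the area="heap" label follows common Micrometer JVM metric conventions and is an illustrative assumption; the actual expression is defined in the NRF alert file.

  # Hypothetical check: per-pod heap usage above 50% of the heap limit
  ( sum by (pod) (jvm_memory_used_bytes{area="heap"})
    / sum by (pod) (jvm_memory_max_bytes{area="heap"}) ) * 100 > 50
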
Recommended Actions

The alert is cleared when the heap usage of pods falls below the minor threshold.

Note: The threshold is configurable in the alert file. If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Identify the pods reported in the alert.
  2. Refer to the Resource Usage section in the Grafana dashboard to check the memory usage.
  3. Collect the pod logs and top output of the pods.
  4. Contact My Oracle Support.
Available in OCI No

5.1.29 OcnrfHeapUsageCrossedMajorThreshold

Table 5-30 OcnrfHeapUsageCrossedMajorThreshold

Field Details
Description 'OCNRF Heap Usage for pod {{ $labels.pod }} has crossed the configured major threshold (60%) (value={{ $value }}) of its limit.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Heap Usage of pod is more than or equal to 60% and less than 70% of its limit.'
Severity Major
Condition This alert is raised when the Java memory heap usage of pods exceeds the configured major threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7127
Metric Used jvm_memory_used_bytes
Recommended Actions

The alert is cleared when the heap usage of pods falls below the major threshold.

Note: The threshold is configurable in the alert file. If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Identify the pods reported in the alert.
  2. Refer to the Resource Usage section in the Grafana dashboard to check the memory usage.
  3. Collect the pod logs and top output of the pods.
  4. Contact My Oracle Support.
Available in OCI No

5.1.30 OcnrfHeapUsageCrossedCriticalThreshold

Table 5-31 OcnrfHeapUsageCrossedCriticalThreshold

Field Details
Description 'OCNRF Heap Usage for pod {{ $labels.pod }} has crossed the configured critical threshold (70%) (value={{ $value }}) of its limit.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.pod}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Heap Usage of pod is more than 70% of its limit.'
Severity Critical
Condition This alert is raised when the Java memory heap usage of pods exceeds the configured critical threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7128
Metric Used jvm_memory_used_bytes
Recommended Actions

The alert is cleared when the heap usage of pods falls below the critical threshold.

Note: The threshold is configurable in the alert file. If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Identify the pods reported in the alert.
  2. Refer to the Resource Usage section in the Grafana dashboard to check the memory usage.
  3. Collect the pod logs and top output of the pods.
  4. Contact My Oracle Support.
Available in OCI No

5.2 Service Level Alerts

This section lists the service level alerts.

5.2.1 OcnrfAccessTokenRequestsRejected

Table 5-32 OcnrfAccessTokenRequestsRejected

Field Details
Description 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF.'
Severity Warning
Condition NRF rejected an AccessToken request.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7014
Metric Used 'ocnrf_accessToken_tx_responses_total'
Recommended Actions The alert is cleared automatically.
Steps:
  1. The Rejection Reason is present in the alert.
  2. If the RejectionReason is AuthScreeningFailed or ClientNotAuthorized, either reevaluate the configuration or check the consumer NF that requested the unauthorized token. For more information about token information, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  3. For other reasons, follow the RejectionReason.
Available in OCI No

5.2.2 OcnrfAuditorMultiplePodUnavailable

Table 5-33 OcnrfAuditorMultiplePodUnavailable

Field Details
Description Ocnrf Auditor Multiple Pods are Unavailable in deployment
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnrf Auditor Multiple Pods are Unavailable'
Severity Critical
Condition Ocnrf Auditor Multiple Pods are Unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7075
Metric Used NA
Recommended Actions

This alert is raised when multiple NRF Auditor pods are unavailable. It is cleared automatically when the pods become available again.

Available in OCI No

5.2.3 OcnrfAppInfoMultiplePodUnavailable

Table 5-34 OcnrfAppInfoMultiplePodUnavailable

Field Details
Description Ocnrf AppInfo Multiple Pods are Unavailable in deployment
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnrf AppInfo Multiple Pods are Unavailable'
Severity Critical
Condition Ocnrf AppInfo Multiple Pods are Unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7076
Metric Used NA
Recommended Actions

This alert is raised when multiple App-Info pods are unavailable. It is cleared automatically when the pods become available again.

Available in OCI No

5.2.4 OcnrfPerfInfoMultiplePodUnavailable

Table 5-35 OcnrfPerfInfoMultiplePodUnavailable

Field Details
Description Ocnrf PerfInfo Multiple Pods are Unavailable in deployment
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Ocnrf PerfInfo Multiple Pods are Unavailable'
Severity Critical
Condition Ocnrf PerfInfo Multiple Pods are Unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7077
Metric Used NA
Recommended Actions

This alert is raised when multiple Perf-Info pods are unavailable. It is cleared automatically when the pods become available again.

Available in OCI No

5.2.5 OcnrfTotalSLFRateAboveCriticalThreshold

Table 5-36 OcnrfTotalSLFRateAboveCriticalThreshold

Field Details
Description 'NRF-SLF Rate is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: $labels.kubernetes_namespace, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 45600 requests per second.'
Severity Critical
Condition This alarm is raised when the traffic rate between NRF and SLF is greater than the configured critical threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7111
Metric Used ocnrf_SLF_tx_requests_total
Recommended Actions The alert is cleared when the total SLF rate falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to Grafana to determine which service is receiving high traffic.
  2. Refer to the SLF section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the SLF logs on Kibana to determine the reason for the errors.
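The exact alert expression is defined in the alert file and varies by release; the following is a minimal PromQL sketch of the condition, assuming a 2-minute rate window and the default threshold of 45600 requests per second shown in the summary:

    # Approximate total NRF-to-SLF request rate across all pods (sketch; the
    # rate window and any label filters in the shipped alert file may differ).
    sum(rate(ocnrf_SLF_tx_requests_total[2m])) > 45600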
Available in OCI No

5.2.6 OcnrfTotalDiscoveryRateAboveCriticalThreshold

Table 5-37 OcnrfTotalDiscoveryRateAboveCriticalThreshold

Field Details
Description 'Total Discovery Rate is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: $labels.kubernetes_namespace, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Traffic Rate is above 51600 requests per second.'
Severity Critical
Condition This alert is raised when the total discovery rate is greater than the configured critical threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7112
Metric Used ocnrf_nfDiscover_rx_requests_total
Recommended Actions The alert is cleared when the total discovery rate falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, the mated site NRF is unavailable in a georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to Grafana to determine which service is receiving high traffic.
  2. Refer to the Discovery section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the Discovery logs on Kibana to determine the reason for the errors.
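A minimal PromQL sketch of the condition and a breakdown query; the rate window is an assumption, and the grouping label (shown here as targetNfType) should be verified against the labels actually exposed by your deployment:

    # Total incoming nfDiscover request rate against the default 51600 threshold (sketch).
    sum(rate(ocnrf_nfDiscover_rx_requests_total[2m])) > 51600
    # Breakdown to see which target NF type drives the traffic (label name is an assumption).
    sum by (targetNfType) (rate(ocnrf_nfDiscover_rx_requests_total[2m]))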
Available in OCI No

5.2.7 OcnrfAccessTokenRequestsAboveThreshold

Table 5-38 OcnrfAccessTokenRequestsAboveThreshold

Field Details
Description 'Total Access token request rate is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total Access token request rate is above 5'
Severity Critical
Condition The alert is raised when the rate of Access Token requests is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7115
Metric Used ocnrf_accessToken_rx_requests_total
Recommended Actions The alert is cleared when the total access token request rate falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, the mated site NRF is unavailable in a georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the NfAccessToken section in Grafana to determine the increase in TPS.
  2. Refer to Grafana to determine the increase in failure responses.
Available in OCI No

5.2.8 OcnrfNfUpdateRequestsAboveThreshold

Table 5-39 OcnrfNfUpdateRequestsAboveThreshold

Field Details
Description 'Total NfUpdate request rate is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total NfUpdate request rate is above 5'
Severity Critical
Condition This alert is raised when the total number of NfUpdate requests is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7116
Metric Used ocnrf_nfUpdate_rx_requests_total
Recommended Actions The alert is cleared when the total number of NfUpdate requests falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, the mated site NRF is unavailable in a georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the NfRegister section in Grafana to determine the increase in TPS.
  2. Refer to Grafana to determine the increase in failure responses.
Available in OCI No

5.2.9 OcnrfNfHeartBeatRequestsAboveThreshold

Table 5-40 OcnrfNfHeartBeatRequestsAboveThreshold

Field Details
Description 'Total NfHeartBeat request rate is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total NfHeartBeat request rate is above 52'
Severity Critical
Condition This alert is raised when the total number of NfHeartBeat requests is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7117
Metric Used ocnrf_nfHeartBeat_rx_requests_total
Recommended Actions The alert is cleared when the total number of NfHeartBeat requests falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, the mated site NRF is unavailable in a georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the NfRegister section in Grafana to determine the increase in TPS.
  2. Refer to Grafana to determine the increase in failure responses.
Available in OCI No

5.2.10 OcnrfRegisteredNfCountAboveThreshold

Table 5-41 OcnrfRegisteredNfCountAboveThreshold

Field Details
Description 'Total Number of active registrations in OCNRF is above critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total Number of active registrations in OCNRF is above 260'
Severity Critical
Condition The alert is raised when the total number of NFs registered in the set is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7118
Metric Used ocnrf_nf_registered_count
Recommended Actions The alert is cleared when the total number of active registrations in NRF falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional registrations. If this alert is unexpected, contact My Oracle Support.

Step:

  1. Refer to Grafana to determine the number of NFs per nfType (see the example query below).
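A minimal PromQL sketch for viewing the registration count per NF type from the metric used by this alert; the grouping label name (NfType) is an assumption based on the labels shown in the NfProfile status change alerts and may differ in your deployment:

    # Current number of registered NFs, broken down by NF type (sketch).
    sum by (NfType) (ocnrf_nf_registered_count)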
Available in OCI No

5.2.11 OcnrfNfProfileSizeAboveThreshold

Table 5-42 OcnrfNfProfileSizeAboveThreshold

Field Details
Description 'The size of the NF profile is above the critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The size of the NF profile is above 12kB threshold'
Severity Critical
Condition This alert is raised when the size of the NF profile is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7119
Metric Used ocnrf_nf_profile_size
Recommended Actions The alert is cleared when the size of the NF profile is smaller than the critical threshold.

Note: The threshold is configurable in the alert file.

Step:

  1. Verify which NF has registered an NF profile larger than the threshold size, using the nfInstanceId in the ocnrf_nf_profile_size metric (see the example query below).
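A minimal PromQL sketch to identify the offending NF instance; the label spelling (nfInstanceId) and the assumption that the metric reports the size in bytes should be verified against your deployment before use:

    # NF profiles whose size exceeds the 12kB threshold, keyed by NF instance (sketch).
    max by (nfInstanceId) (ocnrf_nf_profile_size) > 12000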
Available in OCI No

5.2.12 OcnrfDiscoveryResponseSizeAboveThreshold

Table 5-43 OcnrfDiscoveryResponseSizeAboveThreshold

Field Details
Description 'The size of nfDiscover response is above the critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The size of nfDiscover response is above 45kB threshold'
Severity Critical
Condition This alert is raised when the size of the nfDiscover response is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7120
Metric Used ocnrf_nfDiscover_tx_response_size_bytes_max
Recommended Actions The alert is cleared when the size of the nfDiscover response is less than the critical threshold.

Note: The threshold is configurable in the alert file.

Step:

  1. Refer to Grafana to check which targetNfType triggers a discovery response with a size greater than the threshold (see the example query below). Larger discovery responses may impact NRF discovery performance. If the alert is unexpected, contact My Oracle Support.
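A minimal PromQL sketch for spotting the largest discovery responses; the grouping label (targetNfType) is an assumption and should be verified against the labels exposed by the metric:

    # Largest nfDiscover response size observed, per target NF type (sketch;
    # the 45kB threshold is taken from the alert summary).
    max by (targetNfType) (ocnrf_nfDiscover_tx_response_size_bytes_max)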
Available in OCI No

5.2.13 OcnrfTotalSubscriptionsAboveThreshold

Table 5-44 OcnrfTotalSubscriptionsAboveThreshold

Field Details
Description 'Total Number of active subscriptions in OCNRF is above the critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total Number of active subscriptions in OCNRF is above 1000.'
Severity Critical
Condition This alert is raised when the total number of active subscriptions in NRF is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7121
Metric Used ocnrf_nfset_active_subscriptions
Recommended Actions The alert is cleared when the total number of active subscriptions in NRF is less than the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF has received additional subscriptions (for example, the mated site NRF is unavailable in a georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to Grafana to determine the total number of subscriptions created (see the example query below).
  2. Verify whether the Subscription Limit feature is enabled using the subscriptionLimit.featureStatus parameter. For more information, see Oracle Communications Cloud Native Core, Network Repository Function User Guide.
  3. Assess which NFs are creating the additional subscriptions.
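A minimal PromQL sketch for checking the active subscription count against the default threshold of 1000 shown in the summary; the exact expression in the shipped alert file may differ:

    # Total active subscriptions tracked by NRF (sketch).
    sum(ocnrf_nfset_active_subscriptions) > 1000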
Available in OCI No

5.2.14 OcnrfDiscoveryRequestsForUDRAboveThreshold

Table 5-45 OcnrfDiscoveryRequestsForUDRAboveThreshold

Field Details
Description 'Total NfDiscover request rate for nfType UDR is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total NfDiscover request rate for nfType UDR is above 700'
Severity Critical
Condition This alert is raised when the rate of nfDiscover requests for nfType UDR is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7122
Metric Used ocnrf_nfDiscover_rx_requests_total
Recommended Actions The alert is cleared when the rate of nfDiscover requests for nfType UDR falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic for UDR. If this alert is unexpected, contact My Oracle Support.
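A minimal PromQL sketch of the condition, assuming the discovery metric carries a label identifying the target NF type (shown here as targetNfType, to be verified against your deployment) and using the default threshold of 700 from the summary:

    # nfDiscover request rate for target NF type UDR (sketch).
    sum(rate(ocnrf_nfDiscover_rx_requests_total{targetNfType="UDR"}[2m])) > 700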
Available in OCI No

5.2.15 OcnrfDiscoveryRequestsForUDMAboveThreshold

Table 5-46 OcnrfDiscoveryRequestsForUDMAboveThreshold

Field Details
Description 'Total NfDiscover request rate for nfType UDM is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total NfDiscover request rate for nfType UDM is above 46000'
Severity Critical
Condition This alert is raised when the rate of nfDiscover requests for nfType UDM is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7123
Metric Used ocnrf_nfDiscover_rx_requests_total
Recommended Actions The alert is cleared when the rate of nfDiscover requests for nfType UDM falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic for UDM. If this alert is unexpected, contact My Oracle Support.
Available in OCI No

5.2.16 OcnrfDiscoveryRequestsForAMFAboveThreshold

Table 5-47 OcnrfDiscoveryRequestsForAMFAboveThreshold

Field Details
Description 'Total NfDiscover request rate for nfType AMF is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total NfDiscover request rate for nfType AMF is above 2500'
Severity Critical
Condition This alert is raised when the rate of nfDiscover requests for nfType AMF is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7124
Metric Used ocnrf_nfDiscover_rx_requests_total
Recommended Actions The alert is cleared when the rate of nfDiscover requests for nfType AMF falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic for AMF. If this alert is unexpected, contact My Oracle Support.
Available in OCI No

5.2.17 OcnrfDiscoveryRequestsForSMFAboveThreshold

Table 5-48 OcnrfDiscoveryRequestsForSMFAboveThreshold

Field Details
Description 'Total NfDiscover request rate for nfType SMF is above the configured critical threshold. (current value is: {{ $value }})'
Summary 'namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Total NfDiscover request rate for nfType SMF is above 4500'
Severity Critical
Condition This alert is raised when the rate of nfDiscover requests for nfType SMF is greater than the configured threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7125
Metric Used ocnrf_nfDiscover_rx_requests_total
Recommended Actions The alert is cleared when the rate of nfDiscover requests for nfType SMF falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic for SMF. If this alert is unexpected, contact My Oracle Support.
Available in OCI No

5.3 NfProfile Status Change Alerts

This section lists the alerts raised when there is status change in NfProfile.

5.3.1 OcnrfRegisteredPCFsBelowCriticalThreshold

Table 5-49 OcnrfRegisteredPCFsBelowCriticalThreshold

Field Details
Description 'The number of registered NFs detected below critical threshold (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.RequesterNfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below critical threshold.'
Severity Critical
Condition

The number of NFs of the given NFType PCF currently registered with NRF is below the critical threshold.

Note: Operator can add similar alerts for each NfType and configure the corresponding thresholds as required.

The default trigger point for this alert in the alert file is when the count of PCFs registered with NRF is below 2.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7009
Metric Used 'ocnrf_active_registrations_count'
Recommended Actions

The alert is cleared when the number of registered PCFs is above the critical threshold.

Steps:
  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that NRF FQDN is reachable from other NFs and Ingress Gateway is up and running.
  2. Check if the Ingress Gateway pod is up and running:
    kubectl get po -n <namespace>
  3. Check the registration pod logs on Kibana for ERROR and WARN logs.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Notes
  1. Operator can configure the threshold values to the number of NFs of type PCF expected within the network.
  2. PCFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' are considered as unregistered.
  3. Operator can configure the RequesterNfType expected within the network.
  4. Operator can add similar alerts for each NfType and configure the corresponding thresholds as required.
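As the notes above indicate, similar alerts can be defined per NfType. The following is a minimal PromQL sketch of the default condition (registered PCF count below 2); the NfType label name is an assumption and should be verified against the shipped alert file:

    # Registered PCF count below the default critical threshold of 2 (sketch).
    sum(ocnrf_active_registrations_count{NfType="PCF"}) < 2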
Available in OCI No

5.3.2 OcnrfRegisteredPCFsBelowMajorThreshold

Table 5-50 OcnrfRegisteredPCFsBelowMajorThreshold

Field Details
Description 'The number of registered NFs detected below major threshold (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below major threshold.'
Severity Major
Condition

The number of NFs of the given NFType PCF currently registered with NRF is below the major threshold.

Note: Operator can add similar alerts for each NfType and configure the corresponding thresholds as required.

The default trigger point for this alert in the alert file is when the count of PCFs registered with NRF is greater than or equal to 2 and below 10.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7010
Metric Used 'ocnrf_active_registrations_count'
Recommended Actions

The alert is cleared when the number of registered PCFs is above the major threshold.

Steps:
  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that NRF FQDN is reachable from other NFs and Ingress Gateway is up and running.
  2. Check if Ingress Gateway pod is up and running:
    kubectl get po -n <namespace>
  3. Check the registration pod logs on Kibana for ERROR and WARN logs.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Notes
  1. Operator can configure the threshold values with respect to the number of NFs of type PCF expected within the network.
  2. PCFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' are considered as unregistered.
  3. Operator can configure the RequesterNfType expected within the network.
  4. Operator can add similar alerts for each NfType and configure the corresponding thresholds as required.
Available in OCI No

5.3.3 OcnrfRegisteredPCFsBelowMinorThreshold

Table 5-51 OcnrfRegisteredPCFsBelowMinorThreshold

Field Details
Description 'The number of registered NFs detected below minor threshold (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs detected below minor threshold.'
Severity Minor
Condition

The number of NFs of the given NFType PCF currently registered with NRF is below the minor threshold.

Note: Operator can add similar alerts for each NfType and configure the corresponding thresholds as required.

The default trigger point for this alert in the alert file is when the count of PCFs registered with NRF is greater than or equal to 10 and below 20.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7011
Metric Used 'ocnrf_active_registrations_count'
Recommended Actions

The alert is cleared when the number of registered PCFs is above the minor threshold.

Steps:
  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that NRF FQDN is reachable from other NFs and Ingress Gateway is up and running.
  2. Check if the Ingress Gateway pod is up and running:
    kubectl get po -n <namespace>
  3. Check the Registration pod logs on Kibana for ERROR and WARN logs.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Notes
  1. Operator can configure the threshold values with respect to the number of NFs of type PCF expected within the network.
  2. PCFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' are considered as unregistered.
  3. Operator can configure the RequesterNfType expected within the network.
  4. Operator can add similar alerts for each NfType and configure the corresponding thresholds as required.
Available in OCI No

5.3.4 OcnrfRegisteredPCFsBelowThreshold

Table 5-52 OcnrfRegisteredPCFsBelowThreshold

Field Details
Description 'The number of registered NFs is approaching minor threshold (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The number of registered NFs approaching minor threshold.'
Severity Warning
Condition

The number of NFs of the given NFType PCF currently registered with NRF is approaching minor threshold.

Note: Operator can add similar alerts for each NfType and configure the corresponding thresholds as required.

The default trigger point for this alert in the alert file is when the count of PCFs registered with NRF is greater than or equal to 20 and below 30.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7012
Metric Used 'ocnrf_active_registrations_count'
Recommended Actions

The alert is cleared when the number of registered PCFs is no longer approaching the minor threshold.

Steps:
  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that NRF FQDN is reachable from other NFs and Ingress Gateway is up and running.
  2. Check if Ingress Gateway pod is up and running:
    kubectl get po -n <namespace>
  3. Check the Registration pod logs on Kibana for ERROR and WARN logs.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Notes
  1. Operator can configure the threshold values with respect to the number of NFs of type PCF expected within the network.
  2. PCFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' are considered as unregistered.
  3. Operator can configure the RequesterNfType expected within the network.
  4. Operator can add similar alerts for each NfType and configure the corresponding thresholds as required.
Available in OCI No

5.3.5 OcnrfTotalNFsRegisteredBelowCriticalThreshold

Table 5-53 OcnrfTotalNFsRegisteredBelowCriticalThreshold

Field Details
Description 'Number of active registrations in OCNRF (current value is: {{ $value }}) is below critical threshold'
Summary kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Active registrations count.
Severity Critical
Condition The total number of NFs currently in "REGISTERED" state with the NRF is below the critical threshold.

Note: The threshold values are provided as an example. The user can configure the threshold value as per the requirement.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7042
Metric Used 'ocnrf_active_registrations_count'
Recommended Actions The alert is cleared when the number of registered NFs is above the critical threshold.
Steps:
  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that NRF FQDN is reachable from other NFs and Ingress Gateway is up and running.
  2. Check if Ingress Gateway pod is up and running:
    kubectl get po -n <namespace>
  3. Check the registration pod logs on Kibana for ERROR and WARN logs.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Notes
  1. Operator can configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' are not considered as registered.
Available in OCI Yes

5.3.6 OcnrfTotalNFsRegisteredBelowMajorThreshold

Table 5-54 OcnrfTotalNFsRegisteredBelowMajorThreshold

Field Details
Description 'Number of active registrations in OCNRF (current value is: {{ $value }}) is below major threshold'
Summary kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Active registrations count.
Severity Major
Condition The total number of NFs currently in "REGISTERED" state with the NRF is below the major threshold.

Note: The threshold values are provided as an example. The user can configure the threshold value as per the requirement.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7043
Metric Used 'ocnrf_active_registrations_count'
Recommended Actions The alert is cleared when the number of registered NFs is above the major threshold.
Steps:
  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that NRF FQDN is reachable from other NFs and Ingress Gateway is up and running.
  2. Check if Ingress Gateway pod is up and running:
    kubectl get po -n <namespace>
  3. Check the Registration pod logs on Kibana for ERROR and WARN logs.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on capturing logs, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Notes
  1. Operator can configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' are not considered as registered.
Available in OCI Yes

5.3.7 OcnrfTotalNFsRegisteredBelowMinorThreshold

Table 5-55 OcnrfTotalNFsRegisteredBelowMinorThreshold

Field Details
Description 'Number of active registrations in OCNRF (current value is: {{ $value }}) is below minor threshold'
Summary kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Active registrations count.
Severity Minor
Condition The total number of NFs currently in "REGISTERED" state with the NRF is below the minor threshold.

Note: The threshold values are provided as an example. The user can configure the threshold value as per the requirement.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7044
Metric Used 'ocnrf_active_registrations_count'
Recommended Actions The alert is cleared when the number of registered NFs is above the minor threshold.
Steps:
  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that NRF FQDN is reachable from other NFs and Ingress Gateway is up and running.
  2. Check if Ingress Gateway pod is up and running:
    kubectl get po -n <namespace>
  3. Check the registration pod logs on Kibana for ERROR and WARN logs.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on capturing logs, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Notes
  1. Operator can configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' are not considered as registered.
Available in OCI Yes

5.3.8 OcnrfTotalNFsRegisteredApproachingMinorThreshold

Table 5-56 OcnrfTotalNFsRegisteredApproachingMinorThreshold

Field Details
Description 'Number of active registrations in OCNRF (current value is: {{ $value }}) is approaching minor threshold'
Summary kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Active registrations count.
Severity Info
Condition The total number of NFs currently in "REGISTERED" state with the NRF is approaching minor threshold.

Note: The threshold values are provided as an example. The user can configure the threshold value as per the requirement.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7045
Metric Used 'ocnrf_active_registrations_count'
Recommended Actions The alert is cleared when the number of registered NFs is no longer approaching the minor threshold.

Steps: No action is required. This is an information alert.

Notes
  1. Operator can configure the threshold values with respect to the number of NFs expected within the network.
  2. NFs with NFStatus as 'SUSPENDED' or 'UNDISCOVERABLE' are not considered as registered.
Available in OCI Yes

5.3.9 OcnrfNFStatusTransitionToRegistered

Table 5-57 OcnrfNFStatusTransitionToRegistered

Field Details
Description 'NF with NF profile fqdn {{$labels.NfProfileFqdn}} NF instance id {{$labels.NfInstanceId}} NF type {{$labels.NfType}} is REGISTERED, previous status was {{$labels.PreviousStatus}}'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}},podname: {{$labels.kubernetes_pod_name}},NfInstanceId: {{$labels.NfInstanceId}},NfProfileFqdn: {{$labels.NfProfileFqdn}},NfType: {{$labels.NfType}},PreviousStatus: {{$labels.PreviousStatus}},NewStatus: {{$labels.NewStatus}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} NF is REGISTERED.'
Severity Info
Condition NF Instance's status transitions to REGISTERED.

Note: When multiple alerts are present for a given NF, the latest alert is always considered. The timestamp can also be seen in the "Active Since" field of the alert in Prometheus.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7046
Metric Used ocnrf_nfInstance_status_change_total
Recommended Actions The alert is cleared automatically after a window of 5 minutes.

Steps:

No action is required. This is an information alert.
Available in OCI Yes

5.3.10 OcnrfNFServiceStatusTransitionToRegistered

Table 5-58 OcnrfNFServiceStatusTransitionToRegistered

Field Details
Description 'NF service {{$labels.NfServiceName}} and service instance id {{$labels.NfServiceInstanceId}} of NF profile fqdn {{$labels.NfProfileFqdn}} and instance id {{$labels.NfInstanceId}} is REGISTERED, previous status was {{$labels.PreviousStatus}}'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}},podname: {{$labels.kubernetes_pod_name}},NfInstanceId: {{$labels.NfInstanceId}},NfServiceName: {{$labels.NfServiceName}},NfServiceInstanceId:{{$labels.NfServiceInstanceId}},NfProfileFqdn: {{$labels.NfProfileFqdn}},NfServiceFqdn: {{$labels.NfServiceFqdn}},PreviousStatus: {{$labels.PreviousStatus}},NewStatus: {{$labels.NewStatus}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} NF service is REGISTERED.'
Severity Info
Condition Status of an NF Instance's service transitions to REGISTERED.

Note: When multiple alerts are present for a given NF, the latest alert is always considered. The timestamp can also be seen in the "Active Since" field of the alert in Prometheus.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7047
Metric Used ocnrf_nfService_status_change_total
Recommended Actions The alert is cleared automatically after a window of 5 minutes.

Steps:

No action is required. This is an information alert.
Available in OCI Yes

5.3.11 OcnrfNFStatusTransitionToSuspended

Table 5-59 OcnrfNFStatusTransitionToSuspended

Field Details
Description 'NF with NF profile fqdn {{$labels.NfProfileFqdn}} NF instance id {{$labels.NfInstanceId}} NF type {{$labels.NfType}} is SUSPENDED, previous status was {{$labels.PreviousStatus}}'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}},podname: {{$labels.kubernetes_pod_name}},NfInstanceId: {{$labels.NfInstanceId}},NfProfileFqdn: {{$labels.NfProfileFqdn}},NfType: {{$labels.NfType}},PreviousStatus: {{$labels.PreviousStatus}},NewStatus: {{$labels.NewStatus}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} NF is SUSPENDED.'
Severity Major
Condition NF Instance's status transitions to SUSPENDED.

Note: When multiple alerts are present for a given NF, the latest alert is always considered. The timestamp can also be seen in the "Active Since" field of the alert in Prometheus.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7048
Metric Used ocnrf_nfInstance_status_change_total
Recommended Actions The alert is cleared automatically after a window of 5 minutes.
Steps:
  1. Check logs in NRF registration pod for failing patch requests or check Jaeger traces to see traces for incoming requests.
  2. Check Ingress Gateway logs to see if the requests are coming.
  3. Check if the NRF pods are UP.
  4. Check for the Ingress Gateway metrics in Prometheus for PATCH requests or responses in this time frame. Confirm if the responses have any non-2xx error codes.
  5. Depending on the failure reason, take the resolution steps.
  6. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on capturing logs, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.
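A minimal PromQL sketch to list the NF instances that recently transitioned to SUSPENDED, using the label names shown in the alert summary; the 5-minute lookback mirrors the auto-clear window and is an assumption:

    # NF instances that changed status to SUSPENDED in the last 5 minutes (sketch).
    sum by (NfInstanceId, NfType, PreviousStatus) (increase(ocnrf_nfInstance_status_change_total{NewStatus="SUSPENDED"}[5m])) > 0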

Available in OCI Yes

5.3.12 OcnrfNFServiceStatusTransitionToSuspended

Table 5-60 OcnrfNFServiceStatusTransitionToSuspended

Field Details
Description 'NF service {{$labels.NfServiceName}} and service instance id {{$labels.NfServiceInstanceId}} of NF profile fqdn {{$labels.NfProfileFqdn}} and instance id {{$labels.NfInstanceId}} is SUSPENDED, previous status was {{$labels.PreviousStatus}}'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}},podname: {{$labels.kubernetes_pod_name}},NfInstanceId: {{$labels.NfInstanceId}},NfServiceName: {{$labels.NfServiceName}},NfServiceInstanceId:{{$labels.NfServiceInstanceId}},NfProfileFqdn: {{$labels.NfProfileFqdn}},NfServiceFqdn: {{$labels.NfServiceFqdn}},PreviousStatus: {{$labels.PreviousStatus}},NewStatus: {{$labels.NewStatus}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} NF service is SUSPENDED.'
Severity Minor
Condition Status of an NF Instance's service transitions to SUSPENDED.

Note: When multiple alerts are present for a given NF, the latest alert is always considered. The timestamp can also be seen in the "Active Since" field of the alert in Prometheus.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7049
Metric Used ocnrf_nfService_status_change_total
Recommended Actions The alert is cleared automatically after a window of 5 minutes.
Steps:
  1. Check logs in NRF registration pod for failing patch requests or check Jaeger traces to see traces for incoming requests.
  2. Check Ingress Gateway logs to see if the requests are coming.
  3. Check if the NRF pods are UP.
  4. Check for the Ingress Gateway metrics in Prometheus for PATCH requests or responses in this time frame. Confirm if the responses have any non-2xx error codes.
  5. Depending on the failure reason, take the resolution steps.
  6. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on capturing logs, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI Yes

5.3.13 OcnrfNFStatusTransitionToUndiscoverable

Table 5-61 OcnrfNFStatusTransitionToUndiscoverable

Field Details
Description 'NF with NF profile fqdn {{$labels.NfProfileFqdn}} NF instance id {{$labels.NfInstanceId}} NF type {{$labels.NfType}} is UNDISCOVERABLE, previous status was {{$labels.PreviousStatus}}'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}},podname: {{$labels.kubernetes_pod_name}},NfInstanceId: {{$labels.NfInstanceId}},NfProfileFqdn: {{$labels.NfProfileFqdn}},NfType: {{$labels.NfType}},PreviousStatus: {{$labels.PreviousStatus}},NewStatus: {{$labels.NewStatus}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} NF is UNDISCOVERABLE.'
Severity Info
Condition NF Instance's status transitions to UNDISCOVERABLE.

Note: When multiple alerts are present for a given NF, the latest alert is always considered. The timestamp can also be seen in the "Active Since" field of the alert in Prometheus.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7050
Metric Used ocnrf_nfInstance_status_change_total
Recommended Actions The alert is cleared automatically after a window of 5 minutes.

Steps:

  1. Check logs in NRF registration pod to verify if the NF has sent UNDISCOVERABLE status in NFRegister or NfUpdate requests or check Jaeger traces to see traces for incoming requests.
  2. If there is no such incoming request, collect the logs and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on capturing logs, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI Yes

5.3.14 OcnrfNFServiceStatusTransitionToUndiscoverable

Table 5-62 OcnrfNFServiceStatusTransitionToUndiscoverable

Field Details
Description

'NF service {{$labels.NfServiceName}} and service instance id {{$labels.NfServiceInstanceId}} of NF profile fqdn {{$labels.NfProfileFqdn}} and instance id {{$labels.NfInstanceId}} is UNDISCOVERABLE, previous status was {{$labels.PreviousStatus}}'

Summary

'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}},podname: {{$labels.kubernetes_pod_name}},NfInstanceId: {{$labels.NfInstanceId}},NfServiceName: {{$labels.NfServiceName}},NfServiceInstanceId:{{$labels.NfServiceInstanceId}},NfProfileFqdn: {{$labels.NfProfileFqdn}},NfServiceFqdn: {{$labels.NfServiceFqdn}},PreviousStatus: {{$labels.PreviousStatus}},NewStatus: {{$labels.NewStatus}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} NF service is UNDISCOVERABLE.'

Severity Info
Condition Status of an NF Instance's service transitions to UNDISCOVERABLE.

Note: When multiple alerts are present for a given NF, the latest alert is always considered. The timestamp can also be seen in the "Active Since" field of the alert in Prometheus.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7051
Metric Used ocnrf_nfService_status_change_total
Recommended Actions The alert is cleared automatically after a window of 5 minutes.

Steps:

  1. Check logs in NRF registration pod to verify if the NF has sent UNDISCOVERABLE status in NFRegister or NfUpdate requests or check Jaeger traces to see traces for incoming requests.
  2. If there is no such incoming request, collect the logs and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on capturing logs, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI Yes

5.3.15 OcnrfNFStatusTransitionToDeregistered

Table 5-63 OcnrfNFStatusTransitionToDeregistered

Field Details
Description 'NF with NF profile fqdn {{$labels.NfProfileFqdn}} NF instance id {{$labels.NfInstanceId}} NF type {{$labels.NfType}} is DEREGISTERED, previous status was {{$labels.PreviousStatus}}'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}},podname: {{$labels.kubernetes_pod_name}},NfInstanceId: {{$labels.NfInstanceId}},NfProfileFqdn: {{$labels.NfProfileFqdn}},NfType: {{$labels.NfType}},PreviousStatus: {{$labels.PreviousStatus}},NewStatus: {{$labels.NewStatus}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} NF is DEREGISTERED.'
Severity Info
Condition NF Instance's status transitions to DEREGISTERED.

Note: When multiple alerts are present for a given NF, the latest alert is always considered. The timestamp can also be seen in the "Active Since" field of the alert in Prometheus.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7052
Metric Used ocnrf_nfInstance_status_change_total
Recommended Actions The alert is cleared automatically after a window of 5 minutes.
Steps:
  1. Check logs in NRF registration pod for failing patch requests or check Jaeger traces to see traces for incoming requests.
  2. Check Ingress Gateway logs to see if the requests are coming.
  3. Check if the NRF pods are UP.
  4. Check for the Ingress Gateway metrics in Prometheus for PATCH requests or responses in this time frame. Confirm if the responses have any non 2xx error codes.
  5. Depending on the failure reason, take the resolution steps.
  6. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on capturing logs, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI Yes

5.3.16 OcnrfNFServiceStatusTransitionToDeregistered

Table 5-64 OcnrfNFServiceStatusTransitionToDeregistered

Field Details
Description 'NF service {{$labels.NfServiceName}} and service instance id {{$labels.NfServiceInstanceId}} of NF profile fqdn {{$labels.NfProfileFqdn}} and instance id {{$labels.NfInstanceId}} is DEREGISTERED, previous status was {{$labels.PreviousStatus}}'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}},podname: {{$labels.kubernetes_pod_name}},NfInstanceId: {{$labels.NfInstanceId}},NfServiceName: {{$labels.NfServiceName}},NfServiceInstanceId:{{$labels.NfServiceInstanceId}},NfProfileFqdn: {{$labels.NfProfileFqdn}},NfServiceFqdn: {{$labels.NfServiceFqdn}},PreviousStatus: {{$labels.PreviousStatus}},NewStatus: {{$labels.NewStatus}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} NF service is DEREGISTERED.'
Severity Info
Condition Status of an NF Instance's service transitions to DEREGISTERED.

Note: When multiple alerts are present for a given NF, the latest alert is always considered. The timestamp can also be seen in the "Active Since" field of the alert in Prometheus.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7053
Metric Used ocnrf_nfService_status_change_total
Recommended Actions The alert is cleared automatically after a window of 5 minutes.
Steps:
  1. Check logs in NRF registration pod for failing patch requests or check Jaeger traces to see traces for incoming requests.
  2. Check Ingress Gateway logs to see if the requests are coming.
  3. Check if the NRF pods are UP.
  4. Check for the Ingress Gateway metrics in Prometheus for PATCH requests or responses in this time frame. Confirm if the responses have any non 2xx error codes.
  5. Depending on the failure reason, take the resolution steps.
  6. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on capturing logs, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI Yes

5.4 Feature Specific Alerts

This section lists the feature specific alerts.

5.4.1 KeyID for AccessToken Feature

This section lists the alerts that are specific to KeyID for AccessToken feature. For more information about the feature, see the "Key-ID for AccessToken" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.1.1 OcnrfAccessTokenCurrentKeyIdNotConfigured

Table 5-65 OcnrfAccessTokenCurrentKeyIdNotConfigured

Field Details
Description 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF as Current Key Id is not configured.'
Severity Critical
Condition The NRF Access Token request is rejected because the CurrentKeyId is not configured.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7033
Metric Used 'ocnrf_accessToken_tx_responses_total'
Recommended Actions The alert is raised when NRF receives an Access Token request while the Current Key ID is not configured, and it is cleared automatically. For more information about configuring the currentKeyID parameter, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.1.2 OcnrfAccessTokenCurrentKeyIdInvalidDetails

Table 5-66 OcnrfAccessTokenCurrentKeyIdInvalidDetails

Field Details
Description 'AccessToken request(s) have been rejected by OCNRF (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyType: {{$labels.KeyType}}, RejectionReason: {{$labels.RejectionReason}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} AccessToken Request has been rejected by OCNRF as CurrentKeyId details are invalid.'
Severity Critical
Condition The NRF Access Token request is rejected because the token signing details corresponding to the CurrentKeyId are invalid.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7034
Metric Used 'ocnrf_accessToken_tx_responses_total'
Recommended Actions The alert is raised when NRF receives an Access Token request while the Current Key ID details are invalid, and it is cleared automatically. For more information about configuring the currentKeyID parameter, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.1.3 OcnrfOauthCurrentKeyNotConfigured

Table 5-67 OcnrfOauthCurrentKeyNotConfigured

Field Details
Description 'OCNRF Oauth Access token Current Key Id is not configured'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token Current Key Id is not configured.'
Severity Critical
Condition Oauth Current Key ID is not configured
OID 1.3.6.1.4.1.323.5.3.36.1.2.7035
Metric Used ocnrf_oauth_currentKeyId_configuredStatus
Recommended Actions The alert is cleared when the current key ID is configured.

Steps:

Configure valid current key ID in Access Token Configuration. For more information about configuring currentKeyID parameter, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.

Available in OCI No
5.4.1.4 OcnrfOauthCurrentKeyDataHealthStatus

Table 5-68 OcnrfOauthCurrentKeyDataHealthStatus

Field Details
Description 'OCNRF Oauth Access token Current Key Id status is not healthy'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, KeyType: {{$labels.KeyType}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token Current Key Id status is not healthy.'
Severity Critical
Condition The OAuth Current Key ID details are not healthy.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7036
Metric Used ocnrf_oauth_keyData_healthStatus
Recommended Actions

The alert is cleared when the current key ID status is healthy.

Steps: Rectify the condition by checking the ErrorCondition.

For example: For the ErrorCondition Invalid_Key_Details, check that the k8SecretName, k8SecretNameSpace, and filename combination exists correctly for both privateKey and certificate. Make sure that the pem file data is not corrupt and that the certificate has not expired (see the example commands below).
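As an illustration only, the secret contents and the certificate expiry can be checked with standard commands similar to the following sketch; the secret name, namespace, and file name are placeholders taken from your Access Token configuration:

    $ kubectl get secret <k8SecretName> -n <k8SecretNameSpace> -o yaml
    # Decode the certificate entry and print its expiry date (sketch).
    $ kubectl get secret <k8SecretName> -n <k8SecretNameSpace> -o jsonpath='{.data.<certificate file name>}' | base64 -d | openssl x509 -noout -enddate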

Available in OCI No
5.4.1.5 OcnrfOauthNonCurrentKeyDataHealthStatus

Table 5-69 OcnrfOauthNonCurrentKeyDataHealthStatus

Field Details
Description 'OCNRF Oauth Access token Non current Key Id status is not healthy'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, KeyType: {{$labels.KeyType}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id status is not healthy.'
Severity Info
Condition The OAuth Non Current Key ID details are not healthy.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7037
Metric Used ocnrf_oauth_keyData_healthStatus
Recommended Actions

The alert is cleared when the non-current key ID status is healthy.

Steps: Rectify the condition by checking the ErrorCondition.

For example: For the ErrorCondition Invalid_Key_Details, check that the k8SecretName, k8SecretNameSpace, and filename combination exists correctly for both privateKey and certificate. Make sure that the pem file data is not corrupt and that the certificate has not expired.

Available in OCI No
5.4.1.6 OcnrfOauthCurrentCertificateExpiringIn1Week

Table 5-70 OcnrfOauthCurrentCertificateExpiringIn1Week

Field Details
Description 'OCNRF Oauth Access token current Key Id certificate is expiring in less than 1 week'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token current Key Id certificate is expiring in less than 1 week.'
Severity Critical
Condition Oauth Current Key ID details are expiring in less than 1 week
OID 1.3.6.1.4.1.323.5.3.36.1.2.7038
Metric Used ocnrf_oauth_keyData_expiryStatus
Recommended Actions

The alert is cleared when the key expiry time is more than 1 week.

Steps:

Replace expiring certificate key pair with new ones. For more information on creating certificate key pair, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
Available in OCI No
5.4.1.7 OcnrfOauthNonCurrentCertificateExpiringIn1Week

Table 5-71 OcnrfOauthNonCurrentCertificateExpiringIn1Week

Field Details
Description 'OCNRF Oauth Access token non current Key Id certificate is expiring in less than 1 week'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id certificate is expiring in less than 1 week.'
Severity Info
Condition Oauth Non Current Key ID details are expiring in less than 1 week
OID 1.3.6.1.4.1.323.5.3.36.1.2.7039
Metric Used ocnrf_oauth_keyData_expiryStatus
Recommended Actions

The alert is cleared when the key expiry time is more than 1 week.

Steps:

Replace expiring certificate key pair with new ones. For more information on creating certificate key pair, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
Available in OCI No
5.4.1.8 OcnrfOauthCurrentCertificateExpiringIn30days

Table 5-72 OcnrfOauthCurrentCertificateExpiringIn30days

Field Details
Description 'OCNRF Oauth Access token current Key Id certificate is expiring in less than 30 days'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token current Key Id certificate is expiring in less than 30 days.'
Severity Major
Condition Oauth Current Key ID details are expiring in more than 24 hours and less than 30 days
OID 1.3.6.1.4.1.323.5.3.36.1.2.7040
Metric Used ocnrf_oauth_keyData_expiryStatus
Recommended Actions

The alert is cleared when the certificate for the current key ID has more than 30 days remaining until expiry.

Steps:

Replace expiring certificate key pair with new ones. For more information on creating certificate key pair, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
Available in OCI No
5.4.1.9 OcnrfOauthNonCurrentCertificateExpiringIn30days

Table 5-73 OcnrfOauthNonCurrentCertificateExpiringIn30days

Field Details
Description 'OCNRF Oauth Access token non current Key Id certificate is expiring in less than 30 days'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, KeyId: {{$labels.KeyId}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Oauth Access token non current Key Id certificate is expiring in less than 30 days.'
Severity Info
Condition Oauth Non Current Key ID details are expiring in more than 24 hours and less than 30 days
OID 1.3.6.1.4.1.323.5.3.36.1.2.7041
Metric Used ocnrf_oauth_keyData_expiryStatus
Recommended Actions

The alert is cleared when the certificate for the non-current key ID has more than 30 days remaining until expiry.

Steps:

Replace expiring certificate key pair with new ones. For more information on creating certificate key pair, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
Available in OCI No

5.4.2 Overload Control Based on Percentage Discards Feature

This section lists the alerts that are specific to Overload Control Based on Percentage Discards feature. For more information about the feature, see the "Overload Control Based on Percentage Discards" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.2.1 OcnrfMemoryUsageCrossedMinorThreshold

Table 5-74 OcnrfMemoryUsageCrossedMinorThreshold

Field Details
Description 'OCNRF Memory Usage for pod <Pod name> has crossed the configured minor threshold (50 %) (value={{ $value }}) of its limit.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 50% of its limit.'
Severity Minor
Condition A pod has reached the configured minor threshold (50%) of its memory resource limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7030
Metric Used 'container_memory_usage_bytes' and 'container_spec_memory_limit_bytes'

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions The alert is cleared when the memory utilization falls below the minor threshold or crosses the major threshold, in which case the OcnrfMemoryUsageCrossedMajorThreshold alert is raised.

Note: The threshold is configurable in the alerts file.

In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.
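A minimal PromQL sketch of the underlying condition, assuming the alert compares per-container memory usage against its configured limit with a 50% (0.5) ratio; the exact expression and label filters in the shipped alert file may differ:

    # Containers using more than 50% of their memory limit; containers without a
    # configured limit (limit reported as 0) are excluded (sketch).
    container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0) > 0.5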

Available in OCI Yes
5.4.2.2 OcnrfMemoryUsageCrossedMajorThreshold

Table 5-75 OcnrfMemoryUsageCrossedMajorThreshold

Field Details
Description 'OCNRF Memory Usage for pod <Pod name> has crossed the major threshold (60%) (value = {{ $value }}) of its limit.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 60% of its limit.'
Severity Major
Condition A pod has reached the configured major threshold (60%) of its memory resource limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7031
Metric Used 'container_memory_usage_bytes' and 'container_spec_memory_limit_bytes'

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions The alert is cleared when the memory utilization falls below the major threshold or crosses the critical threshold, in which case the OcnrfMemoryUsageCrossedCriticalThreshold alert is raised.

Note: The threshold is configurable in the alert file.

In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI Yes
5.4.2.3 OcnrfMemoryUsageCrossedCriticalThreshold

Table 5-76 OcnrfMemoryUsageCrossedCriticalThreshold

Field Details
Description 'OCNRF Memory Usage for pod <Pod name> has crossed the configured critical threshold (70%) (value = {{ $value }}) of its limit.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Memory Usage of pod exceeded 70% of its limit.'
Severity Critical
Condition A pod has reached the configured critical threshold (70%) of its memory resource limits.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7032
Metric Used 'container_memory_usage_bytes' and 'container_spec_memory_limit_bytes'

Note: This is a Kubernetes metric used for instance availability monitoring. If the metric is not available, use a similar metric as exposed by the monitoring system.

Recommended Actions The alert gets cleared when the memory utilization falls below the critical threshold.

Note: The threshold is configurable in the alert file.

In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI Yes
5.4.2.4 OcnrfOverloadThresholdBreachedL1

Table 5-77 OcnrfOverloadThresholdBreachedL1

Field Details
Description 'Overload Level of {{$labels.app_kubernetes_io_name}} service is L1'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Overload Level of {{$labels.app_kubernetes_io_name}} service is L1'
Severity Warning
Condition NRF services have breached their configured Level L1 threshold for any of the monitored metrics. Thresholds are configured for CPU, svc_failure_count, svc_pending_count, and memory.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7059
Metric Used load_level
Recommended Actions

The alert is cleared when the Ingress Traffic rate falls below the configured L1 threshold.

Note: The thresholds can be configured using REST API.

Steps:

  1. Reassess the reasons leading to NRF receiving additional traffic.
  2. Refer to the alert to determine which service is receiving high traffic (see the query sketch after this table). It may be due to a sudden spike in traffic.

    For example: When one mated site goes down, the NFs move to the given site.

  3. Check the service pod logs on Kibana to determine the reason for the errors.
  4. If this is expected traffic, then the threshold levels may be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  5. If this is unexpected traffic, contact My Oracle Support.
Available in OCI Yes
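As referenced in step 2 above, the current overload level of each NRF service can be read directly from the load_level metric that drives these alerts. The following is a minimal sketch using the standard Prometheus HTTP query API; the Prometheus host and port are assumptions and may differ per deployment:

$ curl -s "http://<prometheus-host>:9090/api/v1/query" \
  --data-urlencode 'query=max by (app_kubernetes_io_name) (load_level{kubernetes_namespace="<namespace>"})'

This shows the highest load level currently reported by each service in the namespace.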
5.4.2.5 OcnrfOverloadThresholdBreachedL2

Table 5-78 OcnrfOverloadThresholdBreachedL2

Field Details
Description 'Overload Level of {{$labels.app_kubernetes_io_name}} service is L2'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Overload Level of {{$labels.app_kubernetes_io_name}} service is L2'
Severity Warning
Condition NRF services have breached their configured Level L2 threshold for any of the monitored metrics. Thresholds are configured for CPU, svc_failure_count, svc_pending_count, and memory.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7060
Metric Used load_level
Recommended Actions

The alert is cleared when the Ingress Traffic rate falls below the configured L2 threshold.

Note: The thresholds can be configured using REST API.

Steps:

  1. Reassess the reasons leading to NRF receiving additional traffic.
  2. Refer to the alert to determine which service is receiving high traffic. It may be due to a sudden spike in traffic.

    For example: When one mated site goes down, the NFs move to the given site.

  3. Check the service pod logs on Kibana to determine the reason for the errors.
  4. If this is expected traffic, then the threshold levels may be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  5. If this is unexpected traffic, contact My Oracle Support.
Available in OCI Yes
5.4.2.6 OcnrfOverloadThresholdBreachedL3

Table 5-79 OcnrfOverloadThresholdBreachedL3

Field Details
Description 'Overload Level of {{$labels.app_kubernetes_io_name}} service is L3'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Overload Level of {{$labels.app_kubernetes_io_name}} service is L3'
Severity Warning
Condition NRF services have breached their configured Level L3 threshold for any of the monitored metrics. Thresholds are configured for CPU, svc_failure_count, svc_pending_count, and memory.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7061
Metric Used load_level
Recommended Actions

The alert is cleared when the Ingress Traffic rate falls below the configured L3 threshold.

Note: The thresholds can be configured using REST API.

Steps:

  1. Reassess the reasons leading to NRF receiving additional traffic.
  2. Refer to the alert to determine which service is receiving high traffic. It may be due to a sudden spike in traffic.

    For example: When one mated site goes down, the NFs move to the given site.

  3. Check the service pod logs on Kibana to determine the reason for the errors.
  4. If this is expected traffic, then the threshold levels may be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  5. If this is unexpected traffic, contact My Oracle Support.
Available in OCI Yes
5.4.2.7 OcnrfOverloadThresholdBreachedL4

Table 5-80 OcnrfOverloadThresholdBreachedL4

Field Details
Description 'Overload Level of {{$labels.app_kubernetes_io_name}} service is L4'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}: Overload Level of {{$labels.app_kubernetes_io_name}} service is L4'
Severity Warning
Condition NRF services have breached their configured Level L4 threshold for any of the monitored metrics. Thresholds are configured for CPU, svc_failure_count, svc_pending_count, and memory.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7062
Metric Used load_level
Recommended Actions

The alert is cleared when the Ingress Traffic rate falls below the configured L4 threshold.

Note: The thresholds can be configured using REST API.

Steps:

  1. Reassess the reasons leading to NRF receiving additional traffic.
  2. Refer to the alert to determine which service is receiving high traffic. It may be due to a sudden spike in traffic.

    For example: When one mated site goes down, the NFs move to the given site.

  3. Check the service pod logs on Kibana to determine the reason for the errors.
  4. If this is expected traffic, then the threshold levels may be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  5. If this is unexpected traffic, contact My Oracle Support.
Available in OCI Yes

5.4.3 DNS NAPTR Update Feature

This section lists the alerts that are specific to DNS NAPTR Update feature. For more information about the feature, see the "DNS NAPTR Update" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.3.1 OcnrfDnsNaptrFailureResponseStatus

Table 5-81 OcnrfDnsNaptrFailureResponseStatus

Field Details
Description OCNRF DNS NAPTR Response status is not healthy
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, NfInstanceId: {{$labels.NfInstanceId}}, NfSetFqdn: {{$labels.NfSetFqdn}}, Replacement: {{$labels.Replacement}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Dns Naptr Response status is not healthy.'
Severity Major
Condition The DNS NAPTR response towards DNS Server is not successful.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7063
Metric Used ocnrf_dns_naptr_failure_rx_response
Recommended Actions This alert is cleared when the DNS NAPTR response is successful, either automatically through service operations or through a manual trigger for update and delete NAPTR requests.
5.4.3.2 OcnrfAlternateRouteUpstreamDnsRetryExhausted

Table 5-82 OcnrfAlternateRouteUpstreamDnsRetryExhausted

Field Details
Description OCNRF alternate route upstream DNS retry exhausted
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, FQDNS_Name: {{$labels.FQDNS_Name}}, Replacement_Name: {{$labels.Replacement_Name}},timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF alternate route upstream dns retry exhausted'
Severity Major
Condition The DNS NAPTR retry is exhausted.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7064
Metric Used oc_alternate_route_upstream_dns_retry_exhausted
Recommended Actions This alert is cleared automatically in 2 minutes.
Available in OCI No

5.4.4 Notification Retry Feature

This section lists the alerts that are specific to Notification Retry feature. For more information about the feature, see the "Notification Retry" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.4.1 OcnrfNotificationRetryExhausted

Table 5-83 OcnrfNotificationRetryExhausted

Field Details
Description 'OCNRF NotificationRetry Exhausted'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, SubscriptionId: {{$labels.SubscriptionId}}, NotificationHostPort: {{$labels.NotificationHostPort}}'
Severity Major
Condition This alarm is raised when the number of retries is exhausted.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7065
Metric Used ocnrf_nfStatusNotify_rx_responses_total
Recommended Actions The alert is cleared automatically after 5 minutes.

Steps: Check the logs in the NF management pod to determine the reason for the notification retry failures (a kubectl sketch is provided after this table).

Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.
Available in OCI Yes
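As noted in the step above, the failure reason is recorded in the NF management pod logs. If Kibana is not readily available, a minimal sketch of reading the same logs directly with kubectl is shown below; the pod name placeholder and the filter string are assumptions:

$ kubectl logs <nf-management-pod-name> -n <namespace> | grep -i "notif"

Kibana remains the recommended way to browse and filter these logs.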
5.4.4.2 OcnrfNotificationFailureOtherThanRetryExhausted

Table 5-84 OcnrfNotificationFailureOtherThanRetryExhausted

Field Details
Description 'OCNRF notification failure other than retry exhausted'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, SubscriptionId: {{$labels.SubscriptionId}}, NotificationHostPort: {{$labels.NotificationHostPort}}, NumberOfRetriesAttempted: {{$labels.NumberOfRetriesAttempted}}'
Severity Major
Condition This alarm is raised when a notification failure occurs for a reason other than the retry count being exhausted.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7066
Metric Used ocnrf_nfStatusNotify_rx_responses_total
Recommended Actions The alert is cleared automatically after 5 minutes.

Steps: Check the logs in the NF management pod to determine the reason for the notification failures.

Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.
Available in OCI Yes

5.4.5 NRF Message Feed Feature

This section lists the alerts that are specific to NRF Message Feed feature. For more information about the feature, see the "NRF Message Feed" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.5.1 OcnrfIngressGatewayDDUnreachable

Table 5-85 OcnrfIngressGatewayDDUnreachable

Field Details
Description OCNRF Ingress Gateway Data Director unreachable
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Ingress Gateway Data Director unreachable'
Severity Major
Condition This alarm is raised when the Data Director is not reachable from the Ingress Gateway.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7067
Metric Used oc_ingressgateway_dd_unreachable
Recommended Actions The alert is cleared automatically when the connection with the Data Director is established.
Available in OCI No
5.4.5.2 OcnrfEgressGatewayDDUnreachable

Table 5-86 OcnrfEgressGatewayDDUnreachable

Field Details
Description OCNRF Egress Gateway Data Director unreachable
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} OCNRF Egress Gateway Data Director unreachable'
Severity Major
Condition This alarm is raised when the Data Director is not reachable from the Egress Gateway.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7068
Metric Used oc_egressgateway_dd_unreachable
Recommended Actions The alert is cleared automatically when the connection with the Data Director is established.
Available in OCI No

5.4.6 Subscription Limit Feature

This section lists the alerts that are specific to Subscription Limit feature. For more information about the feature, see the "Subscription Limit" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.6.1 OcnrfSubscriptionGlobalCountWarnThresholdBreached

Table 5-87 OcnrfSubscriptionGlobalCountWarnThresholdBreached

Field Details
Description The total number of subscriptions has breached the configured WARN level threshold.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}: The total number of subscriptions has breached the configured WARN level threshold'
Severity Warning
Condition This alarm is raised when the total number of subscriptions has breached the configured WARN level threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7069
Metric Used ocnrf_nfset_limit_level
Recommended Actions

The alert is cleared automatically when the count comes down due to unsubscription.

Note: The thresholds can be configured using REST API.

Steps:

  1. Reassess the reasons for new subscriptions or subscription renewals.
  2. If the subscriptions are expected, then the subscription limit may be reevaluated as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  3. If the subscriptions are unexpected, contact My Oracle Support.
Available in OCI Yes
5.4.6.2 OcnrfSubscriptionGlobalCountMinorThresholdBreached

Table 5-88 OcnrfSubscriptionGlobalCountMinorThresholdBreached

Field Details
Description The total number of subscriptions has breached the configured MINOR level threshold
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}: The total number of subscriptions has breached the configured MINOR level threshold'
Severity Minor
Condition This alarm is raised when the total number of subscriptions has breached the configured MINOR level threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7070
Metric Used ocnrf_nfset_limit_level
Recommended Actions

The alert is cleared automatically when the count comes down due to unsubscription.

Note: The thresholds can be configured using REST API.

Steps:

  1. Reassess the reasons for new subscriptions or subscription renewals.
  2. If the subscriptions are expected, then the subscription limit may be reevaluated as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  3. If the subscriptions are unexpected, contact My Oracle Support.
Available in OCI Yes
5.4.6.3 OcnrfSubscriptionGlobalCountMajorThresholdBreached

Table 5-89 OcnrfSubscriptionGlobalCountMajorThresholdBreached

Field Details
Description The total number of subscriptions has breached the configured MAJOR level threshold
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}: The total number of subscriptions has breached the configured MAJOR level threshold'
Severity Major
Condition This alarm is raised when the total number of subscriptions has breached the configured MAJOR level threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7071
Metric Used ocnrf_nfset_limit_level
Recommended Actions

The alert is cleared automatically when the count comes down due to unsubscription.

Note: The thresholds can be configured using REST API.

Steps:

  1. Reassess the reasons for new subscriptions or subscription renewals.
  2. If the subscriptions are expected, then the subscription limit may be reevaluated as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  3. If the subscriptions are unexpected, contact My Oracle Support.
Available in OCI Yes
5.4.6.4 OcnrfSubscriptionGlobalCountCriticalThresholdBreached

Table 5-90 OcnrfSubscriptionGlobalCountCriticalThresholdBreached

Field Details
Description The total number of subscriptions has breached the configured CRITICAL level threshold
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}: The total number of subscriptions has breached the configured CRITICAL level threshold'
Severity Critical
Condition This alarm is raised when the total number of subscriptions has breached the configured CRITICAL level threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7072
Metric Used ocnrf_nfset_limit_level
Recommended Actions

The alert is cleared automatically when the count comes down due to unsubscription.

Note: The thresholds can be configured using REST API.

Steps:

  1. Reassess the reasons for new subscriptions or subscription renewals.
  2. If the subscriptions are expected, then the subscription limit may be reevaluated as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  3. If the subscriptions are unexpected, contact My Oracle Support.
Available in OCI Yes
5.4.6.5 OcnrfSubscriptionMigrationInProgressWarn

Table 5-91 OcnrfSubscriptionMigrationInProgressWarn

Field Details
Description The subscription migration is pending and subscriptionLimit feature is disabled
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, subscriptionLimitFeatureStatus:{{$labels.subscriptionLimitFeatureStatus}}: The subscription migration is pending and subscriptionLimit feature is disabled'
Severity Warning
Condition The subscription migration is pending and subscriptionLimit feature is disabled.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7073
Metric Used ocnrf_subscription_migration_status
Recommended Actions This alert is cleared automatically when the migration is complete.
5.4.6.6 OcnrfSubscriptionMigrationInProgressCritical

Table 5-92 OcnrfSubscriptionMigrationInProgressCritical

Field Details
Description The subscription migration is pending and subscriptionLimit feature is enabled
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, subscriptionLimitFeatureStatus:{{$labels.subscriptionLimitFeatureStatus}}: The subscription migration is pending and subscriptionLimit feature is enabled'
Severity Warning
Condition The subscription migration is pending and subscriptionLimit feature is enabled.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7074
Metric Used ocnrf_subscription_migration_status
Recommended Actions

This alert is cleared automatically when the migration is complete.

Steps: Disable the Subscription Limit feature. For more information, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.

Available in OCI No

5.4.7 Pod Protection Support for NRF Subscription Microservice Feature

This section lists the alerts that are specific to Pod Protection Support for NRF Subscription Microservice feature. For more information about the feature, see the "Pod Protection Support for NRF Subscription Microservice" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.7.1 OcnrfPodInDangerOfCongestionState

Table 5-93 OcnrfPodInDangerOfCongestionState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} of service {{$labels.app_kubernetes_io_name}} is in Danger of Congestion state'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Danger of Congestion state'
Severity Major
Condition A pod of a service is in Danger Of Congestion state. This could be due to CPU Usage or Pending Message Count above configured thresholds.

This alert is raised when the Pod Protection feature is enabled for the NfSubscription service. Currently, this is applicable to the NfSubscription service only.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7079
Metric Used ocnrf_pod_congestion_state
Recommended Actions The alert is cleared when the CPU usage or Pending Message Count goes below the configured thresholds for the Danger of Congestion state.

Note: The thresholds can be viewed using REST API.

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

Steps:
  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic. For example: When one mated site goes down, the NFs move to the given site. Check whether the NF is sending a high number of update, register, or deregister requests.
  2. Check for the corresponding congestion alert for CPU and Pending Message Count to understand the reason for pod congestion (a kubectl top sketch is provided after this table).
  3. Check the service pod logs on Kibana to determine the reason for the errors.
  4. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
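As referenced in step 2 above, the per-pod CPU consumption can be cross-checked quickly with the Kubernetes metrics API. This is a minimal sketch and assumes that metrics-server is installed in the cluster; the pod name placeholder is illustrative:

$ kubectl top pod -n <namespace>
$ kubectl top pod <nfsubscription-pod-name> -n <namespace> --containers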
5.4.7.2 OcnrfPodPendingMessageCountInDangerOfCongestionState

Table 5-94 OcnrfPodPendingMessageCountInDangerOfCongestionState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} of service {{$labels.app_kubernetes_io_name}} is in Danger of Congestion state due to Pending Message Count above threshold'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Danger of Congestion state due to Pending Message Count above threshold'
Severity Major
Condition

A pod of a service is in Danger Of Congestion state due to its Pending Message Count above configured thresholds.

Currently this is applicable for NfSubscription service only.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7081
Metric Used ocnrf_pod_pending_message_count_congestion_state
Recommended Actions The alert is cleared when the pending message count goes below the configured thresholds for the Danger of Congestion state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic. For example: When one mated site goes down, the NFs move to the given site. Check whether the NF is sending a high number of update, register, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.7.3 OcnrfPodInCongestedState

Table 5-95 OcnrfPodInCongestedState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} of service {{$labels.app_kubernetes_io_name}} is in Congested state'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Congested state'
Severity Major
Condition One or more pods of a service are in congested state. This could be due to CPU usage or Pending Message Count above configured thresholds. Currently this is applicable for NfSubscription service only.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7082
Metric Used ocnrf_pod_congested_state
Recommended Actions The alert is cleared when the CPU usage or Pending Message Count goes below the configured thresholds for the congested state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic. For example: When one mated site goes down, the NFs move to the given site. Check whether the NF is sending a high number of update, register, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.7.4 OcnrfPodCpuUsageInCongestedState

Table 5-96 OcnrfPodCpuUsageInCongestedState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} of service {{$labels.app_kubernetes_io_name}} is in Congested state due to CPU usage above threshold'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Congested state due to CPU usage above threshold'
Severity Major
Condition A pod of a service is in Congested state due to its CPU Usage above configured thresholds. Currently this is applicable for NfSubscription service only.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7083
Metric Used ocnrf_pod_cpu_congestion_state
Recommended Actions The alert is cleared when the CPU usage goes below the configured thresholds for the congested state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic. For example: When one mated site goes down, the NFs move to the given site. Check whether the NF is sending a high number of update, register, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.7.5 OcnrfPodCpuUsageInDangerOfCongestionState

Table 5-97 OcnrfPodCpuUsageInDangerOfCongestionState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} of service {{$labels.app_kubernetes_io_name}} is in Danger of Congestion state due to CPU usage above threshold'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Danger of Congestion state due to CPU usage above threshold'
Severity Major
Condition

A pod of a service is in Danger Of Congestion state due to its CPU usage above the configured thresholds.

This alert is raised when the Pod Protection feature is enabled for the NfSubscription service. Currently, this is applicable to the NfSubscription service only.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7080
Metric Used ocnrf_pod_cpu_congestion_state
Recommended Actions The alert is cleared when the CPU usage goes below the configured thresholds for the Danger of Congestion state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic. For example: When one mated site goes down, the NFs move to the given site. Check whether the NF is sending a high number of update, register, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.7.6 OcnrfPodPendingMessageCountInCongestedState

Table 5-98 OcnrfPodPendingMessageCountInCongestedState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} of service {{$labels.app_kubernetes_io_name}} is in Congested state due to Pending Message Count above threshold'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Congested state due to Pending Message Count above threshold'
Severity Major
Condition A pod of a service is in Congested state due to its Pending Message Count above configured thresholds. Currently this is applicable for NfSubscription service only.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7084
Metric Used ocnrf_pod_pending_message_count_congestion_state
Recommended Actions The alert is cleared when the pending message count goes below the configured thresholds for the congested state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic. For example: When one mated site goes down, the NFs move to the given site. Check whether the NF is sending a high number of update, register, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No

5.4.8 Controlled Shutdown of NRF Feature

This section lists the alerts that are specific to Controlled Shutdown of NRF feature. For more information about the feature, see the "Controlled Shutdown of NRF" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.8.1 OcnrfOperationalStateCompleteShutdown

Table 5-99 OcnrfOperationalStateCompleteShutdown

Field Details
Description 'The operational state of NRF is Complete Shutdown.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The Operational state of NRF is Complete Shutdown'
Severity Warning
Condition The operator has changed the operational state of NRF to Complete Shutdown.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7085
Metric Used ocnrf_operational_state
Recommended Actions The alert is cleared when the user changes the operational state to NORMAL.
  • If the alert is not cleared automatically after the operational state changes to NORMAL, collect the following:
    • all the logs from the NrfConfiguration, Ingress Gateway, Egress Gateway, and NrfAuditor microservices
    • the database dump from the site
    • the REST output of operationalState, operationalStateHistory, and controlledShutdownOptions
  • Contact My Oracle Support.
Available in OCI No
5.4.8.2 OcnrfAuditOperationsPaused

Table 5-100 OcnrfAuditOperationsPaused

Field Details
Description 'The Audit procedures at NRF have been paused.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The Audit procedures at NRF has been paused'
Severity Warning
Condition The NrfAuditor microservice has paused all audit procedures.
This occurs during any of the following scenarios:
  1. The NRF is in COMPLETE_SHUTDOWN operational state or just transitioned from COMPLETE_SHUTDOWN to a NORMAL operational state.
  2. The database has been down for a prolonged period of time. To restore the database, see section "Database Corruption" in Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
  3. The NrfAuditor pod has transitioned from READY to NOT_READY state.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7086
Metric Used ocnrf_audit_status
Recommended Actions The alert is expected to clear automatically after the waiting period, once all the above conditions are resolved.
  • If the alert is not cleared automatically, collect the following:
    • all the logs from the NrfConfiguration microservice and the NrfAuditor pod,
    • the database dump from the site,
    • the REST output of operationalState, operationalStateHistory, and controlledShutdownOptions
  • Contact My Oracle Support.
Notes

NrfAuditor continues to remain in the paused state for some time, even after the OcnrfOperationalStateCompleteShutdown alarm is cleared. For more information, see the "From CONTROLLED_SHUTDOWN to NORMAL" subsection under the "Controlled Shutdown of NRF" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

Available in OCI No

5.4.9 Monitoring the Availability of SCP Using SCP Health APIs Feature

This section lists the alerts that are specific to Monitoring the Availability of SCP Using SCP Health APIs feature. For more information about the feature, see the "Monitoring the Availability of SCP Using SCP Health APIs" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.9.1 OcnrfAllSCPsMarkedAsUnavailable

Table 5-101 OcnrfAllSCPsMarkedAsUnavailable

Field Details
Description 'All SCPs have been marked unavailable.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : All SCPs have been marked as unavailable'
Severity Critical
Condition All SCPs have been marked unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7088
Metric Used 'oc_egressgateway_peer_count and oc_egressgateway_peer_available_count'
Recommended Actions The critical alarm is cleared when at least one SCP peer in a peer set becomes available, even if all the other SCP peers in that peer set are still unavailable.
Available in OCI Yes
5.4.9.2 OcnrfSCPMarkedAsUnavailable

Table 5-102 OcnrfSCPMarkedAsUnavailable

Field Details
Description 'An SCP has been marked unavailable.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : One of the SCP has been marked unavailable'
Severity Major
Condition One of the SCPs has been marked unhealthy.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7087
Metric Used oc_egressgateway_peer_health_status
Recommended Actions This alert gets cleared when unavailable SCPs become available.
Available in OCI Yes

5.4.10 CCA Header Validation in NRF for Access Token Service Operation Feature

This section lists the alerts that are specific to CCA Header Validation in NRF for Access Token Service Operation feature. For more information about the feature, see the "CCA Header Validation in NRF for Access Token Service Operation" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.10.1 OcnrfCcaRootCertificateExpiringIn4Hours

Table 5-103 OcnrfCcaRootCertificateExpiringIn4Hours

Field Details
Description 'The CCA Root Certificates expiring in 4 hours'.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : CCA Root Certificate is expiring in 4 Hours'
Severity Critical
Condition Indicates the expiry dates of the CCA Root certificates that are expiring in four hours.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7091
Metric Used 'oc_ingressgateway_cca_certificate_info'
Recommended Actions The alert is cleared when the expiring CCA root certificates are replaced with new ones.

Steps: Replace the expiring certificate key pair with a new one (an openssl sketch for checking the expiry date is provided after this table). For more information on creating a certificate key pair, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.

Available in OCI No
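As referenced in the step above, the actual expiry date of the configured CCA root certificate can be verified before replacing the key pair. The following is a minimal sketch; the certificate file name, secret name, and data key are assumptions that depend on how the certificates were installed:

$ openssl x509 -noout -subject -enddate -in <cca-root-certificate.pem>
$ kubectl get secret <cca-root-cert-secret> -n <namespace> -o jsonpath='{.data.<certificate-key>}' | base64 -d | openssl x509 -noout -enddate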
5.4.10.2 OcnrfCcaRootCertificateExpiringIn1Day

Table 5-104 OcnrfCcaRootCertificateExpiringIn1Day

Field Details
Description 'The CCA Root Certificates expiring in 1 day'.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : CCA Root Certificate is expiring in 1 Day'
Severity Major
Condition Indicates the expiry dates of the CCA Root certificates that are expiring in one day.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7090
Metric Used 'oc_ingressgateway_cca_certificate_info'
Recommended Actions The alert is cleared when the expiring CCA root certificates are replaced with new ones.

Steps: Replace the expiring certificate key pair with a new one. For more information on creating a certificate key pair, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.

Available in OCI No
5.4.10.3 OcnrfCcaRootCertificateExpiringIn5Days

Table 5-105 OcnrfCcaRootCertificateExpiringIn5Days

Field Details
Description 'The CCA Root Certificates expiring in 5 days.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : CCA Root Certificate is expiring in 5 Days'
Severity Minor
Condition Indicates the expiry dates of the CCA Root certificates that are expiring in five days.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7089
Metric Used 'oc_ingressgateway_cca_certificate_info'
Recommended Actions The alert is cleared when the expiring CCA root certificates are replaced with new ones.

Steps: Replace the expiring certificate key pair with a new one. For more information on creating a certificate key pair, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.

Available in OCI No

5.4.11 NRF Georedundancy Feature

This section lists the alerts that are specific to NRF Georedundancy feature. For more information about the feature, see the "NRF Georedundancy" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.11.1 OcnrfDbReplicationStatusInactive

Table 5-106 OcnrfDbReplicationStatusInactive

Field Details
Description 'The Database Replication Status is currently INACTIVE.'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nftype:{{$labels.NfType}}, nrflevel:{{$labels.NrfLevel}}, remoteNrfInstanceId: {{$labels.nrfInstanceId}}, remoteSiteName: {{$labels.siteName}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The database replication status is INACTIVE.'
Severity Critical
Condition The database replication channel status between the given site and the georedundant site(s) is inactive. The alert is raised per replication channel. The alarm is raised or cleared only if the georedundancy feature is enabled.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7013
Metric Used 'ocnrf_dbreplication_status'
Recommended Actions The alert is cleared when the database channel replication status between the given site and the georedundant site(s) is up. For more information on how to check the database replication status, see Oracle Communications Cloud Native Core, cnDBTier User Guide.
Notes The alarm is included only if the georedundancy feature is enabled.
Available in OCI No
5.4.11.2 OcnrfReplicationStatusMonitoringInactive

Table 5-107 OcnrfReplicationStatusMonitoringInactive

Field Details
Description 'OCNRF Replication Status Monitoring Inactive'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Pod {{ $labels.kubernetes_pod_name}} are not monitoring the replication status'
Severity Critical
Condition This alarm is raised when one or more pods are not monitoring the replication status.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7078
Metric Used ocnrf_replication_status_monitoring_inactive
Recommended Actions Resolution Steps:
  1. Identify the pod for which the alert is raised.
  2. Run the following command to restart the pod:

    kubectl delete pod <pod_name> -n <namespace>

Available in OCI No
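For step 1 above, the affected pod name is carried as a label on the metric that drives this alert. The following is a minimal sketch using the standard Prometheus HTTP query API; the Prometheus host and port are assumptions and may differ per deployment:

$ curl -s "http://<prometheus-host>:9090/api/v1/query" \
  --data-urlencode 'query=ocnrf_replication_status_monitoring_inactive{kubernetes_namespace="<namespace>"}'

The kubernetes_pod_name label in the result identifies the pod to restart.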

5.4.12 XFCC Header Validation Feature

This section lists the alert that is specific to XFCC Header Validation feature. For more information about the feature, see the "XFCC Header Validation" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.12.1 OcnrfNfAuthenticationFailureRequestsRejected

Table 5-108 OcnrfNfAuthenticationFailureRequestsRejected

Field Details
Description 'Service request(s) received from NF have been rejected by OCNRF (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Request rejected for Nf FQDN based Authentication failure.'
Severity Warning
Condition NRF rejected a service request due to NF authentication failure.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7015
Metric Used 'ocnrf_nf_authentication_failure_total'
Recommended Actions The alert is cleared automatically.

Steps:

Filter the nfAccessToken application ERROR logs on Kibana for more details (a kubectl sketch is provided after this table).
Available in OCI No
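As noted above, the rejection details are recorded in the nfAccessToken application ERROR logs. If Kibana is not readily available, a minimal sketch of reading the same logs directly with kubectl is shown below; the pod name placeholder is illustrative:

$ kubectl logs <nfaccesstoken-pod-name> -n <namespace> | grep "ERROR"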

5.4.13 Enhanced NRF Set Based Deployment (NRF Growth) Feature

This section lists the alert that is specific to Enhanced NRF Set Based Deployment (NRF Growth) feature. For more information about the feature, see the "Enhanced NRF Set Based Deployment (NRF Growth)" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.13.1 OcnrfRemoteSetNrfSyncFailed

Table 5-109 OcnrfRemoteSetNrfSyncFailed

Field Details
Description 'A sync request to the NRF in the remote set has failed.'

Note: The alert must be configured only if the NRF Growth feature is enabled.

Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A sync request to the NRF in the remote set has failed.'
Severity Minor
Condition Sync request to the NRF in the remote NRF set has failed.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7098
Metric Used ocnrf_query_remote_cds_responses_total
Recommended Actions

The alert is cleared when the synchronization with the remote NRF set is successful.

Steps:

  1. Verify the remote NRF set is up.
  2. Verify the connectivity between the local NRF set and remote NRF set.
  3. Collect logs from local NRF and remote NRF(s). Contact My Oracle Support.
5.4.13.2 OcnrfSyncFailureFromAllNrfsOfAnyRemoteSet

Table 5-110 OcnrfSyncFailureFromAllNrfsOfAnyRemoteSet

Field Details
Description 'Sync requests to all the NRFs of a remote set has failed.'

Note: The alert must be configured only if the NRF Growth feature is enabled.

Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Sync requests to all the NRFs in any of the remote sets have failed'
Severity Major
Condition Sync requests to all the NRFs in a remote set have failed.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7099
Metric Used ocnrf_remote_set_unavailable_total
Recommended Actions The alert is cleared when synchronization is successful with at least one NRF of the remote NRF set.

Steps:

  1. Verify the remote NRF sets are up.
  2. Verify the host details configured in the nrfHostConfig attribute using REST API. For more information about the attribute, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  3. Verify the connectivity between the local NRF set and remote NRF set.
  4. Collect logs from local NRF and remote NRF(s). Contact My Oracle Support.
Available in OCI No
5.4.13.3 OcnrfSyncFailureFromAllNrfsOfAllRemoteSets

Table 5-111 OcnrfSyncFailureFromAllNrfsOfAllRemoteSets

Field Details
Description 'Sync request to all the NRFs in all the remote sets have failed.'

Note: The alert must be configured only if the NRF Growth feature is enabled.

Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Sync request to all the NRFs in all the remote sets have failed'
Severity Critical
Condition Sync requests to all the NRFs in all the remote sets have failed.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7100
Metric Used ocnrf_all_remote_sets_unavailable_total
Recommended Actions

The alert is cleared when synchronization is successful with at least one NRF of the remote set(s).

Steps:

  1. Verify the remote NRF sets are up.
  2. Verify the host details configured in the nrfHostConfig attribute using REST API. For more information about the attribute, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
  3. Verify the connectivity between the local NRF set and remote NRF set.
  4. Collect logs from local NRF and remote NRF(s). Contact My Oracle Support.
Available in OCI No
5.4.13.4 OcnrfCacheDataServiceDown

Table 5-112 OcnrfCacheDataServiceDown

Field Details
Description 'OCNRF NrfCacheData service {{$labels.app_kubernetes_io_name}} is down'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : Cache Data Service is down'
Severity Critical
Condition Cache Data Service is unavailable.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7101
Metric Used up
Recommended Actions

The alert is cleared when the Cache Data Service (CDS) is available.

Steps:
  1. To check the orchestration logs of the CDS for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on service names. Check for ERROR and WARNING logs.
  3. Check the DB status. For more information on how to check the DB status, see Oracle Communications Cloud Native Core, cnDBTier User Guide. Depending on the failure reason, take the resolution steps.
  4. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

Note: Use the CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No
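For step 1 of the procedure above, pods that are not in the Running state can be listed directly instead of scanning the full pod list. This is a minimal sketch using the standard kubectl field selector; note that pods in the Succeeded phase (for example, completed jobs) are also returned:

$ kubectl get pods -n <namespace> --field-selector=status.phase!=Running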
5.4.13.5 OcnrfDatabaseFallbackUsed

Table 5-113 OcnrfDatabaseFallbackUsed

Field Details
Description 'A service operation is unable to get data from the Cache Data Service, and hence gets the data from the cnDBTier to fulfill the service operation'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : A service Operation is unable to get data from the Cache Data Service, so falling back to DB'
Severity Major
Condition A service operation is unable to get data from the Cache Data Service and hence gets the data from the database to fulfill the service operation.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7102
Metric Used ocnrf_db_fallback_total
Recommended Actions

The alert is cleared automatically.

Steps:

  1. To check the orchestration logs of the CDS for liveness or readiness probe failures, do the following:
    1. Run the following command to check the pod status:
      $ kubectl get po -n <namespace>
    2. Run the following command to analyze the error condition of the pod that is not in the running state:
      $ kubectl describe pod <pod name not in Running state> -n <namespace>

      Where <pod name not in Running state> indicates the pod that is not in the Running state.

  2. Refer to the application logs on Kibana and filter based on service names. Check for ERROR and WARNING logs.
  3. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.
Available in OCI No
5.4.13.6 OcnrfTotalNFsRegisteredAtSegmentBelowMinorThreshold

Table 5-114 OcnrfTotalNFsRegisteredAtSegmentBelowMinorThreshold

Field Details
Description The alert is raised when the number of NFs registered at the segment is below the configured minor threshold.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The number of NFs registered at the segment is below minor threshold'
Severity Minor
Condition The number of NFs registered at the segment is below minor threshold.

Note: This alert is triggered when the registered NF count is greater than or equal to 10 and below 20. These default values can be modified in the ocnrf_alertrules_25.1.200.yaml or ocnrf_alertrules_promha_25.1.200.yaml file, depending on the Prometheus version.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7103
Metric Used ocnrf_nf_registered_count
Recommended Actions

The alert is cleared when the number of registered NFs in the segment is above the minor threshold.

Steps:

  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that the NRF FQDN is reachable from other NFs and that the Ingress Gateway is up and running in all NRF sets.
  2. Check if the Ingress Gateway pod is up and running in all NRF sets (see the example after this table):
    kubectl get po -n <namespace>
  3. Validate that the CDS synchronization with remote NRF sets is successful. Validate that the following alerts are not present in the system:
    1. OcnrfSyncFailureFromAllNrfsOfAnyRemoteSet
    2. OcnrfSyncFailureFromAllNrfsOfAllRemoteSets
  4. Check the registration pod logs on Kibana for ERROR and WARN logs.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use the CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No
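For step 2 above (checking that the Ingress Gateway pods are up in every NRF set), the pod list can be narrowed with a label selector. This is a minimal sketch; the label key and value are assumptions derived from the app_kubernetes_io_name label used in the alert expressions and may differ per deployment:

$ kubectl get pods -n <namespace> -l app.kubernetes.io/name=ingressgateway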
5.4.13.7 OcnrfTotalNFsRegisteredAtSegmentBelowMajorThreshold

Table 5-115 OcnrfTotalNFsRegisteredAtSegmentBelowMajorThreshold

Field Details
Description The alert is raised when the number of NFs registered at the segment is below the configured major threshold.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The number of NFs registered at the segment is below major threshold'
Severity Major
Condition The number of NFs registered at the segment is below major threshold.

Note: This alert is triggered when the registered NF count is greater than or equal to 2 and below 10. These default values can be modified in the ocnrf_alertrules_25.1.200.yaml or ocnrf_alertrules_promha_25.1.200.yaml file, depending on the Prometheus version.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7104
Metric Used ocnrf_nf_registered_count
Recommended Actions

The alert is cleared when the number of registered NFs in the segment is above the major threshold.

Steps:

  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that the NRF FQDN is reachable from other NFs and that the Ingress Gateway is up and running in all NRF sets.
  2. Check if the Ingress Gateway pod is up and running in all NRF sets.
    kubectl get po -n <namespace>
  3. Validate that the CDS synchronization with remote NRF sets is successful. Validate that the following alerts are not present in the system:
    1. OcnrfSyncFailureFromAllNrfsOfAnyRemoteSet
    2. OcnrfSyncFailureFromAllNrfsOfAllRemoteSets
  4. Check the registration pod logs on Kibana for ERROR and WARN logs.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use the CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No
5.4.13.8 OcnrfTotalNFsRegisteredAtSegmentBelowCriticalThreshold

Table 5-116 OcnrfTotalNFsRegisteredAtSegmentBelowCriticalThreshold

Field Details
Description The alert is raised when the number of NFs registered at the segment is below the configured critical threshold.
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The number of NFs registered at the segment is below critical threshold'
Severity Critical
Condition The number of NFs registered at the segment is below critical threshold.

Note: This alert is triggered when the registered NF count is below 2. This default value can be modified in the ocnrf_alertrules_25.1.200.yaml or ocnrf_alertrules_promha_25.1.200.yaml file, depending on the Prometheus version.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7105
Metric Used ocnrf_nf_registered_count
Recommended Actions

The alert is cleared when the number of registered NFs in the segment is above the critical threshold.

Steps:

  1. Check if there is traffic for requests other than registration (for example, discovery requests). This ensures that the NRF FQDN is reachable from other NFs and that the Ingress Gateway is up and running in all NRF sets.
  2. Check if the Ingress Gateway pod is up and running in all NRF sets.
    kubectl get po -n <namespace>
  3. Validate that the CDS synchronization with remote NRF sets is successful. Validate that the following alerts are not present in the system:
    1. OcnrfSyncFailureFromAllNrfsOfAnyRemoteSet
    2. OcnrfSyncFailureFromAllNrfsOfAllRemoteSets
  4. Check the registration pod logs on Kibana for ERROR and WARN logs.
  5. In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.

    Note: Use the CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using the Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI No

5.4.14 Ingress Gateway Pod Protection Feature

This section lists the alerts that are specific to Ingress Gateway Pod Protection feature. For more information about the feature, see the "Ingress Gateway Pod Protection" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.14.1 OcnrfIngressGatewayPodInDangerOfCongestionState

Table 5-117 OcnrfIngressGatewayPodInDangerOfCongestionState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} is in Danger of Congestion state'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Danger of Congestion state'
Severity Major
Condition

The Ingress Gateway pod is in Danger Of Congestion state.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7092
Metric Used oc_ingressgateway_pod_congestion_state
Recommended Actions The alert is cleared when the pod is out of Danger Of Congestion (DoC) state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. Check if an NF is sending a high number of register, update, or deregister requests. A sample query to identify the affected pod is shown after this table.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
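
As an illustrative sketch, the congestion state reported by each Ingress Gateway pod can be listed by querying the oc_ingressgateway_pod_congestion_state metric from Prometheus. The Prometheus host, port, and namespace value are placeholders:

    # List the congestion state metric per Ingress Gateway pod (host, port, and namespace are placeholders)
    $ curl -s 'http://<prometheus-server>:<port>/api/v1/query' \
        --data-urlencode 'query=oc_ingressgateway_pod_congestion_state{kubernetes_namespace="<namespace>"}'
    # The kubernetes_pod_name label in the result identifies the pod that is in the DoC or Congested state.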
5.4.14.2 OcnrfIngressGatewayPodInCongestedState

Table 5-118 OcnrfIngressGatewayPodInCongestedState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} is in Congested state'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Congested state'
Severity Critical
Condition

When the Ingress Gateway pod is in the Congested state.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7093
Metric Used oc_ingressgateway_pod_congestion_state
Recommended Actions The alert is cleared when the pod is out of Congested state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. Check if an NF is sending a high number of register, update, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.14.3 OcnrfIngressGatewayPodCpuUsageInCongestedState

Table 5-119 OcnrfIngressGatewayPodCpuUsageInCongestedState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} is in Congested state'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Congested state'
Severity Critical
Condition

The Ingress Gateway pod is in the Congested state because the CPU consumption is above the configured thresholds.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7094
Metric Used oc_ingressgateway_pod_resource_state
Recommended Actions The alert is cleared when the CPU consumption goes below the configured thresholds for the Congested state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. Check if an NF is sending a high number of register, update, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.14.4 OcnrfIngressGatewayPodCpuUsageInDangerOfCongestionState

Table 5-120 OcnrfIngressGatewayPodCpuUsageInDangerOfCongestionState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} is in Danger of Congestion state due to CPU usage above threshold'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Danger of Congestion state due to CPU usage above threshold'
Severity Major
Condition

The Ingress Gateway pod is in the Danger of Congestion state because the CPU consumption is above the configured thresholds.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7095
Metric Used oc_ingressgateway_pod_resource_state
Recommended Actions The alert is cleared when the CPU consumption is no longer within the configured threshold range for the Danger of Congestion state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. Check if an NF is sending a high number of register, update, or deregister requests. A sample command to cross-check the pod CPU usage is shown after this table.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
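
For the CPU-based congestion alerts above, the actual CPU usage of the Ingress Gateway pods can also be cross-checked on the cluster. This is a minimal sketch and assumes that the Kubernetes Metrics Server is available; the grep filter is only an example:

    # Check the current CPU and memory usage of the Ingress Gateway pods (requires Metrics Server)
    $ kubectl top pod -n <namespace> | grep ingressgateway
    # Compare the reported CPU usage with the thresholds configured for the pod protection feature.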
5.4.14.5 OcnrfIngressGatewayPodPendingMessageInCongestedState

Table 5-121 OcnrfIngressGatewayPodPendingMessageInCongestedState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} is in Congested state'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Congested state'
Severity Critical
Condition

The Ingress Gateway pod is in the Congested state because the pending message count is above the configured thresholds.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7096
Metric Used oc_ingressgateway_pod_resource_state
Recommended Actions The alert is cleared when the pending message count goes below the configured threshold for the Congested state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. Check if an NF is sending a high number of register, update, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No
5.4.14.6 OcnrfIngressGatewayPodPendingMessageInDangerOfCongestionState

Table 5-122 OcnrfIngressGatewayPodPendingMessageInDangerOfCongestionState

Field Details
Description 'The pod {{$labels.kubernetes_pod_name}} is in Danger of Congestion state due to Pending Message above threshold'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}},podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : The pod is in Danger of Congestion state due to Pending Message above threshold'
Severity Major
Condition

The Ingress Gateway pod is in the Danger of Congestion state because the pending message count is above the configured thresholds.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7097
Metric Used oc_ingressgateway_pod_resource_state
Recommended Actions The alert is cleared when the pending message count is no longer within the configured threshold range for the Danger of Congestion state.

Note: The thresholds can be viewed using REST API.

Steps:

Reassess if the NRF is receiving additional traffic.

If this is unexpected, contact My Oracle Support.

  1. Refer to the alert to determine which pod is receiving high traffic. It may be due to a sudden spike in traffic, for example, when one mated site goes down and the NFs move to the given site. Check if an NF is sending a high number of register, update, or deregister requests.
  2. Check the service pod logs on Kibana to determine the reason for the errors.
  3. If this is expected traffic, then the threshold levels may need to be reevaluated as per the call rate and reconfigured as mentioned in Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
Available in OCI No

5.4.15 Subscriber Location Function Feature

This section lists the alert that is specific to Subscriber Location Function feature. For more information about the feature, see the "Subscriber Location Function Feature" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.15.1 OcnrfMaxSlfAttemptsExhausted

Table 5-123 OcnrfMaxSlfAttemptsExhausted

Field Details
Description 'NF discovery request with fqdn {{$labels.NfProfileFqdn}} NF type {{$labels.NfType}} has exhausted maximum SLF attempts'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.kubernetes_pod_name}}, NfProfileFqdn: {{$labels.NfProfileFqdn}}, NfType: {{$labels.NfType}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: The maximum slf attempts have exhausted.'
Severity Critical
Condition

The NF discovery request with the FQDN of the given NF type UDR has exhausted the maximum SLF attempts. This alert is raised when the ocnrf_max_slf_attempts_exhausted_total metric is pegged.

Note: This alert is included if SLF selection from registered profiles is enabled.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7054
Metric Used 'ocnrf_max_slf_attempts_exhausted_total'
Recommended Actions

The alert is cleared automatically after 5 minutes.

Steps:
  1. Check the logs in the NF discovery pod to identify the reason for SLF query failures. A sample command is shown after this table.
  2. In DISCOVERED_SLF_CONFIG_MODE, make sure that the SLFs are registered with valid IPv4, IPv6, or FQDN information. Verify the same in the slfDiscoveredCandidateList from the slfOptions.
  3. In STATIC_SLF_CONFIG_MODE, verify that the slfHostConfig details are configured correctly.

    Note: Use CNC NF Data Collector tool for capturing logs. For more information on how to collect logs using Data Collector tool, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.

Available in OCI Yes
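
For example, the reason for SLF query failures can often be found in the NF discovery pod logs. The following is a minimal sketch; the label selector and the grep filter are assumptions and may differ in your deployment:

    # Tail the NF discovery pod logs and filter for SLF-related entries (label selector is an example)
    $ kubectl logs -n <namespace> -l app.kubernetes.io/name=nfdiscovery --tail=1000 | grep -i slf
    # Review the ERROR and WARN entries to identify connectivity or configuration issues toward the SLF.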

5.4.16 EmptyList in Discovery Response Feature

This section lists the alert that is specific to EmptyList in Discovery Response feature. For more information about the feature, see the "EmptyList in Discovery Response" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.16.1 OcnrfNFDiscoveryEmptyListObservedNotification

Table 5-124 OcnrfNFDiscoveryEmptyListObservedNotification

Field Details
Description 'Empty List observed with received discovery request with NfType {{$labels.NfType}} Feature Status {{$labels.FeatureStatus}}'
Summary 'namespace: {{$labels.namespace}}, nrflevel:{{$labels.NrfLevel}}, podname: {{$labels.pod}}, NfType: {{$labels.NfType}}, FeatureStatus: {{$labels.FeatureStatus}}: Empty List observed with received discovery request'
Severity Critical
Condition

This alert is raised when no profiles match the discovery request.

It is also raised when a SUSPENDED profile is included in the response to the incoming request and the EmptyList feature is enabled.

OID 1.3.6.1.4.1.323.5.3.36.1.2.7055
Metric Used ocnrf_nfDiscover_emptyList_total
Recommended Actions

The alert is cleared automatically after a duration of 5 minutes.

Steps:

  1. Collect the logs. A sample command is shown after this table.
  2. Check the logs for the following conditions:
    • Verify whether an empty list was sent in the NF discovery response.
    • Check whether SUSPENDED profiles were sent in the response for incoming requests when the EmptyList feature is ENABLED.
    • Check the case where the response is neither an empty list nor contains SUSPENDED profiles.
  3. If the alert still persists, contact My Oracle Support.
Note: Use CNC NF Data Collector tool for capturing logs. For more details, see Oracle Communications Cloud Native Core, Network Function Data Collector User Guide.
Available in OCI No
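
As a hedged example, the NF discovery pod logs can be collected and filtered for empty-list or SUSPENDED indications before contacting support. The label selector and the search strings below are assumptions:

    # Capture recent NF discovery pod logs and search for empty-list or SUSPENDED indications (filters are examples)
    $ kubectl logs -n <namespace> -l app.kubernetes.io/name=nfdiscovery --tail=2000 > /tmp/nfdiscovery.log
    $ grep -iE 'emptylist|suspended' /tmp/nfdiscovery.log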

5.4.17 Support for TLS Feature

This section lists the alert that is specific to Support for TLS feature. For more information about the feature, see the "Support for TLS" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

5.4.17.1 OcnrfTLSCertificateExpireMinor

Table 5-125 OcnrfTLSCertificateExpireMinor

Field Details
Description 'TLS certificate to expire in 6 months'.
Summary 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : TLS certificate to expire in 6 months'
Severity Minor
Condition This alert is raised when the TLS certificate is about to expire in six months.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7106
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

The alert is cleared when the TLS certificate is renewed.

For more information about certificate renewal, see "Creating Private Keys and Certificate " section in the Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.

Available in OCI No
5.4.17.2 OcnrfTLSCertificateExpireMajor

Table 5-126 OcnrfTLSCertificateExpireMajor

Field Details
Description 'TLS certificate to expire in 3 months.'
Summary 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : TLS certificate to expire in 3 months'
Severity Major
Condition This alert is raised when the TLS certificate is about to expire in three months.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7107
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

The alert is cleared when the TLS certificate is renewed.

For more information about certificate renewal, see "Creating Private Keys and Certificate" section in the Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.

Available in OCI No
5.4.17.3 OcnrfTLSCertificateExpireCritical

Table 5-127 OcnrfTLSCertificateExpireCritical

Field Details
Description 'TLS certificate to expire in one month.'
Summary 'namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} : TLS certificate to expire in 1 month'
Severity Critical
Condition This alert is raised when the TLS certificate is about to expire in one month.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7108
Metric Used security_cert_x509_expiration_seconds
Recommended Actions

The alert is cleared when the TLS certificate is renewed.

For more information about certificate renewal, see "Creating Private Keys and Certificate" section in the Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.

Available in OCI No
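
For the TLS certificate expiry alerts above, the remaining validity of the configured certificate can be checked directly from the Kubernetes secret. This is a minimal sketch; the secret name and certificate key are placeholders that depend on your installation:

    # Decode the TLS certificate from the secret and print its expiry date (secret name and key are placeholders)
    $ kubectl get secret <tls-secret-name> -n <namespace> -o jsonpath='{.data.<certificate-key>}' | base64 -d | openssl x509 -noout -enddate
    # Renew the certificate before the printed notAfter date to clear the alert.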

5.4.18 Egress Gateway Pod Throttling

5.4.18.1 OcnrfEgressPerPodDiscardRateAboveMajorThreshold

Table 5-128 OcnrfEgressPerPodDiscardRateAboveMajorThreshold

Field Details
Description 'Egressgateway PerPod Discard Rate is greater than the configured major threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Egressgateway PerPod Discard Rate is more than 1 request per second.'
Severity Major
Condition This alert is raised when the rate at which the Egress Gateway pods discard traffic due to the per-pod request limit is greater than the configured major threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7113
Metric Used oc_egressgateway_podlevel_throttling_discarded_total
Recommended Actions The alert is cleared when the Egress Gateway pods discard traffic rate falls below the major threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the Egress Gateway section in Grafana to determine which service is sending high traffic.
  2. Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the Egress Gateway logs on Kibana to determine the reason for the errors.
Available in OCI No
5.4.18.2 OcnrfEgressPerPodDiscardRateAboveCriticalThreshold

Table 5-129 OcnrfEgressPerPodDiscardRateAboveCriticalThreshold

Field Details
Description 'Egressgateway PerPod Discard Rate is greater than the configured critical threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Egressgateway PerPod Discard Rate is more than 100 requests per second.'
Severity Critical
Condition This alert is raised when the rate at which the Egress Gateway pods discard traffic due to the per-pod request limit is greater than the configured critical threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7114
Metric Used oc_egressgateway_podlevel_throttling_discarded_total
Recommended Actions The alert is cleared when the Egress Gateway pods discard traffic rate falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the Egress Gateway section in Grafana to determine which service is sending high traffic. A sample Prometheus query for the per-pod discard rate is shown after this table.
  2. Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Check the Egress Gateway logs on Kibana to determine the reason for the errors.
Available in OCI No
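
To check how close the current discard rate is to the configured thresholds, the discard counter can be rated over a short window in Prometheus. A minimal sketch; the Prometheus host and port are placeholders, the 5-minute window is only an example, and the grouping label may differ per deployment:

    # Per-pod Egress Gateway discard rate over the last 5 minutes (host, port, and window are examples)
    $ curl -s 'http://<prometheus-server>:<port>/api/v1/query' \
        --data-urlencode 'query=sum by (kubernetes_pod_name) (rate(oc_egressgateway_podlevel_throttling_discarded_total[5m]))'
    # Compare the result with the major and critical thresholds described above.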

5.4.19 Ingress Gateway Pod Protection Using Rate Limiting

5.4.19.1 OcnrfIngressDiscardDueToRateLimitMajorThreshold

Table 5-130 OcnrfIngressDiscardDueToRateLimitMajorThreshold

Field Details
Description 'Ingress Gateway discards due to rate limit exceeds the configured major threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Ingressgateway Discard due to Rate Limit is more than or equal to 1 requests per second and less than 100 requests per second.'
Severity Major
Condition This alert is raised when the Ingress Gateway discards requests because the rate limit exceeds the configured major threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7129
Metric Used oc_ingressgateway_http_request_ratelimit_denied_count_total
Recommended Actions

The alert is cleared when the Ingress Gateway pods discard traffic rate falls below the major threshold or exceeds the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the Ingress Gateway Pod Protection By Rate Limit section in Grafana to determine which pods are overloaded and the rate of traffic received.
  2. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Refer to the Grafana dashboard to determine which service traffic is above expectation.
Available in OCI No
5.4.19.2 OcnrfIngressDiscardDueToRateLimitCriticalThreshold

Table 5-131 OcnrfIngressDiscardDueToRateLimitCriticalThreshold

Field Details
Description 'Ingress gateway discards due to rate limit exceeds the configured critical threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Ingressgateway Discard due to Rate Limit is more than or equal to 100 requests per second.'
Severity Critical
Condition This alert is raised when the Ingress Gateway discards requests because the rate limit exceeds the configured critical threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7130
Metric Used oc_ingressgateway_http_request_ratelimit_denied_count_total
Recommended Actions

The alert is cleared when the Ingress Gateway pods discard traffic rate falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the Ingress Gateway Pod Protection By Rate Limit section in Grafana to determine which pods are overloaded and the rate of traffic received. A sample Prometheus query for the discard rate is shown after this table.
  2. Refer to the Ingress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Refer to the Grafana dashboard to determine which service traffic is above expectation.
Available in OCI No
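
Similarly, the rate of requests discarded by the Ingress Gateway due to rate limiting can be derived from the denied counter. A minimal sketch with placeholder endpoint values:

    # Rate of requests denied by Ingress Gateway rate limiting over the last 5 minutes (host and port are placeholders)
    $ curl -s 'http://<prometheus-server>:<port>/api/v1/query' \
        --data-urlencode 'query=sum(rate(oc_ingressgateway_http_request_ratelimit_denied_count_total[5m]))'
    # Values at or above the configured major and critical thresholds correspond to the alerts described above.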

5.4.20 Egress Gateway Pod Protection Using Rate Limiting

5.4.20.1 OcnrfEgressDiscardDueToRateLimitMajorThreshold

Table 5-132 OcnrfEgressDiscardDueToRateLimitMajorThreshold

Field Details
Description 'Egress Gateway discards due to rate limit exceeds the configured major threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Egressgateway Discard due to Rate Limit is more than or equal to 1 requests per second and less than 100 requests per second.'
Severity Major
Condition This alert is raised when the Egress Gateway discards requests because the rate limit exceeds the configured major threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7131
Metric Used oc_egressgateway_http_request_ratelimit_denied_count_total
Recommended Actions

The alert is cleared when the Egress Gateway pods discard traffic rate falls below the major threshold or exceeds the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the Egress Gateway Pod Protection By Rate Limit section in Grafana to determine which pods are overloaded and the rate of traffic received.
  2. Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Refer to the Grafana dashboard to determine which service traffic is above expectation.
Available in OCI No
5.4.20.2 OcnrfEgressDiscardDueToRateLimitCriticalThreshold

Table 5-133 OcnrfEgressDiscardDueToRateLimitCriticalThreshold

Field Details
Description 'Egress gateway discards due to rate limit exceeds the configured critical threshold. (current value is: {{ $value }})'
Summary 'kubernetes_namespace: {{$labels.kubernetes_namespace}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Egressgateway Discard due to Rate Limit is more than or equal to 100 requests per second.'
Severity Critical
Condition This alert is raised when the Egress Gateway discards requests because the rate limit exceeds the configured critical threshold.
OID 1.3.6.1.4.1.323.5.3.36.1.2.7132
Metric Used oc_egressgateway_http_request_ratelimit_denied_count_total
Recommended Actions

The alert is cleared when the Egress Gateway pods discard traffic rate falls below the critical threshold.

Note: The threshold is configurable in the alert file. Reassess why the NRF is receiving additional traffic (for example, Mated site NRF is unavailable in georedundancy scenario). If this alert is unexpected, contact My Oracle Support.

Steps:

  1. Refer to the Egress Gateway Pod Protection By Rate Limit section in Grafana to determine which pods are overloaded and the rate of traffic received.
  2. Refer to the Egress Gateway section in Grafana to determine the increase in 4xx and 5xx error codes.
  3. Refer to the Grafana dashboard to determine which service traffic is above expectation.
Available in OCI No

5.5 NRF Alert Configuration

Follow the steps below for NRF Alert configuration in Prometheus:

Note:

  1. The Name is the release name used in the helm install command.
  2. The Namespace is the namespace used in the helm install command. By default, the Namespace for NRF is ocnrf; update it as per the deployment.
  3. The ocnrf-config-1.1.0.0.0.zip file can be downloaded from OHC.

    Unzip the ocnrf-config-1.1.0.0.0.zip package after downloading it to get the NrfAlertrules.yaml file.

  1. Take a backup of the current Prometheus configuration map:
    $ kubectl get configmaps <NAME>-server -o yaml -n <Namespace> > /tmp/tempConfig.yaml
  2. Check and add the NRF alert file name inside the Prometheus configuration map:
    $ sed -i '/etc\/config\/alertsnrf/d' /tmp/tempConfig.yaml
    $ sed -i '/rule_files:/a\  \- /etc/config/alertsnrf' /tmp/tempConfig.yaml
  3. Update the configuration map with the updated NRF alert file name:
    $ kubectl replace configmap <NAME>-server -f /tmp/tempConfig.yaml
  4. Add the NRF alert rules in the configuration map under the NRF alert file name:
    $ kubectl patch configmap <NAME>-server -n <Namespace> --type merge --patch "$(cat ~/NrfAlertrules.yaml)"

Note:

The Prometheus server automatically reloads the updated configuration map after approximately 60 seconds. Refresh the Prometheus GUI to confirm that the NRF alerts are loaded. A command-line check to confirm that the rules are loaded is shown below.
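
In addition to the GUI check, the loaded NRF alert rules can be listed through the Prometheus rules API. This is a minimal sketch; the Prometheus host and port are placeholders:

    # List the NRF alert rule names currently loaded in Prometheus (host and port are placeholders)
    $ curl -s 'http://<prometheus-server>:<port>/api/v1/rules' | grep -o 'Ocnrf[A-Za-z]*' | sort -u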

5.5.1 Disable Alerts

This section explains the procedure to disable the alerts in NRF.
  1. Edit the NrfAlertrules-25.1.200.yaml file to remove a specific alert.
  2. Remove the complete content of the specific alert from the NrfAlertrules-25.1.200.yaml file.
    For example, to remove the OcnrfTrafficRateAboveMinorThreshold alert, remove the following content:
    ## ALERT SAMPLE START##
    
          - alert: OcnrfTrafficRateAboveMinorThreshold
            annotations:
              description: 'Ingress traffic Rate is above minor threshold i.e. 800 mps (current value is: {{ $value }})'
              summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
            expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="ocnrf"}[2m])) >= 800 < 900
            labels:
              severity: Minor
    ## ALERT SAMPLE END##
  3. Perform the alert configuration. For more information about configuring alerts, see the NRF Alert Configuration section.

5.5.2 Configuring SNMP Notifier

This section describes the procedure to configure SNMP Notifier.

Configure the IP and port of the SNMP trap receiver in the SNMP Notifier using the following procedure:
  1. Run the following command to edit the deployment:
    $ kubectl edit deploy <snmp_notifier_deployment_name> -n <namespace>

    Example:

    $ kubectl edit deploy occne-snmp-notifier -n occne-infra

    SNMP deployment yaml file is displayed.

  2. Edit the SNMP destination in the deployment yaml file as follows:
    --snmp.destination=<destination_ip>:<destination_port>

    Example:

    --snmp.destination=10.75.203.94:162
  3. Save the file.
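
After saving the file, the snmp-notifier deployment rolls out with the new destination. As a hedged example, the rollout and the updated argument can be verified as follows (the deployment name and namespace follow the earlier example):

    # Wait for the snmp-notifier deployment to finish rolling out after the edit
    $ kubectl rollout status deploy occne-snmp-notifier -n occne-infra
    # Confirm that the new destination argument is present in the deployment specification
    $ kubectl get deploy occne-snmp-notifier -n occne-infra -o yaml | grep 'snmp.destination'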
Checking SNMP Traps
Following is an example of how to capture the logs of the trap receiver server to view the generated SNMP traps:
$ docker logs <trapd_container_id>
Sample output:
2020-04-29 15:34:24 10.75.203.103 [UDP: [10.75.203.103]:2747->[172.17.0.4]:162]:DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (158510800) 18 days, 8:18:28.00        SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003    SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.1 = STRING: "1.3.6.1.4.1.323.5.3.36.1.2.7003[]"  SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.2 = STRING: "critical"      SNMPv2-SMI::enterprises.323.5.3.36.1.2.7003.3 = STRING: "Status: critical- Alert: OcnrfActiveSubscribersBelowCriticalThreshold  Summary: namespace: ocnrf, nftype:5G_EIR, nrflevel:6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, podname: ocnrf-nrfauditor-6b459f5db5-4kvt4, timestamp: 2020-04-29 15:33:24.408 +0000 UTC: Current number of registered NFs detected below critical threshold.  Description: The number of registered NFs detected below critical threshold (current value is: 0)
MIB Files for NRF

The following MIB files are used to generate the traps. The user needs to update these files along with the alert file to fetch the traps in their environment.

  • ocnrf_mib_tc_25.1.200.mib

    This is the NRF top-level MIB file, where the objects and their data types are defined.

  • ocnrf_mib_25.1.200.mib

    This file fetches the objects from the top-level MIB file; based on the alert notification, these objects can be selected for display.

  • toplevel_25.1.200.mib: This defines the OIDs for all NFs.

Note:

MIB files are packaged along with the release package. Download the file from MOS. For more information on downloading the release package, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.