10 OCNWDAF Alerts

This chapter describes the following information about OCNWDAF alerts and KPIs:

10.1 OCNWDAF Alert Configuration

This section describes how to configure measurement-based alert rules for OCNWDAF. The Alert Manager uses the Prometheus metric values reported by the microservices to evaluate the conditions defined in the alert rules and trigger alerts.

OCNWDAF Alert configuration in Prometheus

The following procedure is used to configure alerts in Prometheus:

  1. Download the ocn-nwdaf-alerting-rules.yaml file and edit it to configure the alert rules. The parameters that can be edited include the name (name) and, under rules, the alert name (alert) and the expression (expr) that triggers the alert.
  2. Copy the updated ocn-nwdaf-alerting-rules.yaml file to Bastion Host.
  3. Run the following command:

    kubectl apply -f ocn-nwdaf-alerting-rules.yaml -n ocn-nwdaf

  4. To verify that the PrometheusRule custom resource is created, run the following command:

    kubectl get prometheusrule -n ocn-nwdaf

  5. Verify the alerts in the Prometheus GUI, where the alert names and expressions are listed. See the example below:

    Figure 10-1 Prometheus GUI

Alert Rules

The alerts are configured on the Prometheus server. Each scraped metric corresponds to a pod that runs a single microservice, so each alert belongs to one of the running pods. Prometheus continuously collects metrics and triggers an alert when the condition of any alerting rule is met. All the alert rules are written in one or more .yml files and deployed as described in the procedure OCNWDAF Alert configuration in Prometheus. The alert rules for the various alerts captured for OCNWDAF are listed below:

Status Alert Rule
- name: <ALERT NAME>
  rules:
  - alert: <ALERT NAME>
    expr: up{app="<SERVICE LABEL>"} == 0
Example:
- name: OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING
  rules:
  - alert: OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING
    expr: up{app="ocn-nwdaf-data-collection"} == 0
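The status rule follows a fixed pattern, so the rule group for a service can be generated from its service label. The following is a minimal Python sketch of that pattern, not part of the product. The function name status_rule_group is hypothetical, and deriving the alert name from the service label (hyphens replaced by underscores, upper-cased, with a _NOT_RUNNING suffix) is an assumption based on the example above.

```python
import json


def status_rule_group(service_label: str) -> dict:
    """Build one status rule group for the given service label (illustrative only)."""
    # Assumed naming convention: ocn-nwdaf-data-collection -> OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING
    alert_name = service_label.replace("-", "_").upper() + "_NOT_RUNNING"
    return {
        "name": alert_name,
        "rules": [
            {
                "alert": alert_name,
                # Doubled braces produce literal { and } in the PromQL selector
                "expr": f'up{{app="{service_label}"}} == 0',
            }
        ],
    }


group = status_rule_group("ocn-nwdaf-data-collection")
print(json.dumps(group, indent=2))
```

The printed structure mirrors the name, rules, alert, and expr keys of the rule shown above. It is a single rule group, not a complete rules file.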
Traffic Alert Rule
  • Request rate rule:

    - name: <ALERT NAME>
      rules:
      - alert: <ALERT NAME>
        expr: >
          sum without(method,status,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="<URI ENDPOINT>"}[1m])) > 1000

    Example:

    - name: HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE
      rules:
      - alert: HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE
        expr: sum without(method,status,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR"}[1m])) > 1000
  • Failure rate request rule:

    - name: <ALERT NAME>
      rules:
      - alert: <ALERT NAME>
        expr: >
          (sum without(method,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="<URI ENDPOINT>",status=~"[4-5].."}[1m])) / ignoring(status) sum without(method,status,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="<URI ENDPOINT>"}[1m]))) * 100 > 70

    Example:

    - name: HIGH_ABNORMAL_BEHAVIOUR_REQUEST_FAILURE_RATE
      rules:
      - alert: HIGH_ABNORMAL_BEHAVIOUR_REQUEST_FAILURE_RATE
        expr: (sum without(method,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR",status=~"[4-5].."}[1m])) / ignoring(status) sum without(method,status,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR"}[1m]))) * 100 > 70
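As a worked illustration of the two traffic conditions, the sketch below reproduces their arithmetic in plain Python. The function names and the sample rates are hypothetical. The thresholds of 1000 requests per second and 70% are taken from the rules above.

```python
REQUEST_RATE_THRESHOLD = 1000.0  # requests per second, from the request rate rule
FAILURE_RATE_THRESHOLD = 70.0    # percent, from the failure rate request rule


def request_rate_alert(total_rate: float) -> bool:
    """Mirror of the condition `sum(rate(...)) > 1000`."""
    return total_rate > REQUEST_RATE_THRESHOLD


def failure_rate_percent(failed_rate: float, total_rate: float) -> float:
    """Mirror of `(failed / total) * 100`; both arguments are per-second rates."""
    if total_rate <= 0:
        # The ratio is undefined without traffic; this sketch simply reports 0
        return 0.0
    return failed_rate / total_rate * 100.0


def failure_rate_alert(failed_rate: float, total_rate: float) -> bool:
    """Mirror of the condition `(failed / total) * 100 > 70`."""
    return failure_rate_percent(failed_rate, total_rate) > FAILURE_RATE_THRESHOLD


# Hypothetical sample: 1200 requests/s in total, 900 of them with a 4xx or 5xx status
print(request_rate_alert(1200.0))           # True
print(failure_rate_percent(900.0, 1200.0))  # 75.0
print(failure_rate_alert(900.0, 1200.0))    # True
```

Both conditions use a strict comparison, so a rate of exactly 1000 requests per second or a failure rate of exactly 70% does not trigger the alert.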
CPU Alert Rule
- name: <ALERT NAME>
  rules:
  - alert: <ALERT NAME>
    expr: system_cpu_usage{app="<SERVICE LABEL>"} * 100 > 80
Example:
- name: OCN_NWDAF_DATA_COLLECTION_HIGH_CPU_LOAD
  rules:
  - alert: OCN_NWDAF_DATA_COLLECTION_HIGH_CPU_LOAD
    expr: system_cpu_usage{app="ocn-nwdaf-data-collection"} * 100 > 80
JVM Memory Usage Alert Rule
- name: <ALERT NAME>
  rules:
  - alert: <ALERT NAME>
    expr: >
      (sum(avg_over_time(jvm_memory_used_bytes{area="heap",app="<SERVICE LABEL>"}[1m]))/sum(avg_over_time(jvm_memory_max_bytes{area="heap",app="<SERVICE LABEL>"}[1m]))) * 100 > 80
Example:
- name: OCN_NWDAF_DATA_COLLECTION_HIGH_JVM_HEAP_MEMORY_USAGE
  rules:
  - alert: OCN_NWDAF_DATA_COLLECTION_HIGH_JVM_HEAP_MEMORY_USAGE
    expr: (sum(avg_over_time(jvm_memory_used_bytes{area="heap",app="ocn-nwdaf-data-collection"}[1m]))/sum(avg_over_time(jvm_memory_max_bytes{area="heap",app="ocn-nwdaf-data-collection"}[1m]))) * 100 > 80
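The JVM expression averages each heap series over the last minute, sums the averages, and compares the used-to-maximum ratio against 80%. A minimal Python sketch of that calculation follows. The memory pool names and sample values are made up for illustration, and the helper name heap_usage_percent is hypothetical.

```python
def average(samples):
    """Average of the samples in one series (stands in for avg_over_time over 1m)."""
    return sum(samples) / len(samples)


def heap_usage_percent(used_series, max_series):
    """Sum the per-series averages, then express used heap as a percentage of maximum heap."""
    used_total = sum(average(samples) for samples in used_series.values())
    max_total = sum(average(samples) for samples in max_series.values())
    return used_total / max_total * 100.0


# Hypothetical one-minute windows, in bytes, for two heap memory pools
used_series = {"eden": [80e6, 100e6], "old": [170e6, 190e6]}
max_series = {"eden": [100e6, 100e6], "old": [200e6, 200e6]}

usage = heap_usage_percent(used_series, max_series)
print(round(usage, 2))  # percentage of heap in use
print(usage > 80)       # condition checked by the alert rule
```

In this sample the averaged used heap is 270 MB out of 300 MB, so the condition is met.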

10.1.1 SNMP Support

Simple Network Management Protocol (SNMP) is an application-layer protocol designed for monitoring and managing network devices within a Local Area Network (LAN) or Wide Area Network (WAN).

OCNWDAF forwards the Prometheus alerts as SNMP traps to the southbound SNMP servers. OCNWDAF uses two SNMP MIB files to generate the traps. To configure the alert manager, update the alertmanager.yaml file. In this file, the alerts can be grouped based on podname, alertname, severity, namespace, and so on. The Prometheus Alert Manager is integrated with the Oracle Communications Cloud Native Core, Cloud Native Environment (CNE) snmp-notifier service. The external SNMP servers are set up to receive the Prometheus alerts as SNMP traps. The operator must update the MIB and alert manager files to receive the SNMP traps in their environment.

Configuring SNMP Support

The alertmanager.yaml file is updated to include additional information for SNMP traps.

Sample of the alertmanager.yaml file:

{{- range $key, $svcName := .Values.global.rules.services }}
- alert: {{ $svcName | replace "-" "_" | upper }}_HIGH_CPU_LOAD
  expr: system_cpu_usage{job={{ $svcName | quote }}, namespace={{ $.Release.Namespace | quote }} } * 100 > 90
  for: 5m
  labels:
    alertname: "OCN_NWDAF_SVC_HIGH_CPU_LOAD"
    oid: "1.3.6.1.4.1.323.5.3.45.1.{{ index $.Values.global.rules.oid $svcName }}.4002"
    severity: critical
    namespace: {{ $.Release.Namespace | quote }}
  annotations:
    namespace: {{ $.Release.Namespace | quote }}
    severity: critical
    summary: "Service {{ "{{$labels.app}}" }} CPU load is high."
    description: "Service {{ "{{$labels.app}}" }} CPU load has been high for more than 5 minutes."
{{- end }}

Configure the SNMP Test Client

Follow the steps below to configure the SNMP Test Client:

  1. Create a ConfigMap that includes the MIB files. Run the following command:
    kubectl create configmap my-config --from-file=/path/to/mib/files/ -n <namespace>

    Where my-config is the name of the ConfigMap. Use the same name in the pod configuration file. The ConfigMap must be in the same namespace as the SNMP client.

  2. To start the SNMP trap daemon service, use a service configuration .yaml file such as the example below:
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        name: snmptrapd
      name: snmptrapd
      namespace: performance-idc  # namespace in which you want to deploy the service
    spec:
      ports:
      - name: snmptrapd
        port: 162
        protocol: UDP
        targetPort: 162
      selector:
        name: snmptrapd
      sessionAffinity: None
      type: ClusterIP
  3. Use the following pod deployment configuration to deploy the pod corresponding to the above service. The commands mentioned in this file add the MIB files from the ConfigMap to the pod, following which the SNMP trap daemon service application starts.

    Sample docker file:

    FROM ocr-docker-remote.artifactory.oci.oraclecorp.com/os/oraclelinux:8-slim
    ARG HTTPS_PROXY=http://www-proxy.us.oracle.com:80
    RUN echo -e "[main]\nproxy=${HTTPS_PROXY}" >> /etc/dnf/dnf.conf
    RUN microdnf update -y && microdnf install -y lsof
    RUN microdnf install -y net-snmp
    ADD snmptrapd.conf /etc/snmp/snmptrapd.conf
    EXPOSE 162/udp
    CMD ["/bin/sh"]

    Sample snmp-pod.yaml file:

    apiVersion: v1
    kind: Pod
    metadata:
      name: snmptrapd
      labels:
        name: snmptrapd
        role: snmptrapd
      namespace: performance-idc   # namespace in which you want to deploy the pod
    spec:
      containers:
      - name: snmptrapd
        image: occne-repo-host:5000/snmptrapd:1.1.1  # create your own snmptrapd image using the Dockerfile
        volumeMounts:
          - name: config-volume
            mountPath: /MIB
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash","-c","kill -9 $(lsof -t -i:162); cp /MIB/* /usr/share/snmp/mibs && echo MIB files copied successfully ;snmptrapd -m ALL -f -Of -Lo"]
        ports:
        - containerPort: 162
          protocol: UDP
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: "1"
            memory: 1Gi
      volumes:
      - name: config-volume
        configMap:
          name: my-config
  4. Run the following commands to deploy the service and pod in the SNMP Client:
    kubectl apply -f snmp-pod.yaml
    kubectl apply -f snmp-svc.yaml
  5. Run the following command to view the pod logs:
    $ kubectl logs pod/snmptrapd -n performance-idc

    Sample of the pod logs:

    kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
    MIB files copied successfully
    NET-SNMP version 5.7.2

Integrate the Alert Manager with snmp-notifier Service

Update the occne-snmp-notifier service with the IP address of the destination SNMP client:

$ kubectl edit deployment -n occne-infra occne-snmp-notifier
In the args of the container, update the --snmp.destination=<IP>:<port> field with the destination of the SNMP client as follows:
- --snmp.destination=<fqdn of target receiver>:162

Verify the Traps

Run the following command to verify the traps:

$ kubectl logs pod/snmptrapd -n performance-idc -f

Sample output:

2024-03-11 11:31:10 10-233-87-165.occne-snmp-notifier.occne-infra.svc.blurr8 [UDP: [10.233.87.165]:46951->[10.233.116.34]:162]:
.iso.org.dod.internet.mgmt.mib-2.system.sysUpTime.sysUpTimeInstance = Timeticks: (147060000) 17 days, 0:30:00.00        .iso.org.dod.internet.snmpV2.snmpModules.snmpMIB.snmpMIBObjects.snmpTrap.snmpTrapOID.0 = OID: .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.ocnwdafGeoredagent.ocnwdafgeoredagentSvcDown     .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.ocnwdafGeoredagent.ocnwdafgeoredagentSvcDown.1 = STRING: "1.3.6.1.4.1.323.5.3.45.1.33.2002[job=ocn-nwdaf-georedagent]"        .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.ocnwdafGeoredagent.ocnwdafgeoredagentSvcDown.2 = STRING: "critical"  .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.ocnwdafGeoredagent.ocnwdafgeoredagentSvcDown.3 = STRING: "Status: critical
- Alert: OCN_NWDAF_GEOREDAGENT_NOT_RUNNING
  Summary: Service is down.
  Description: Service has been down for more than 2 minutes."
2024-03-11 11:35:38 10-233-87-165.occne-snmp-notifier.occne-infra.svc.blurr8 [UDP: [10.233.87.165]:56385->[10.233.116.34]:162]:
.iso.org.dod.internet.mgmt.mib-2.system.sysUpTime.sysUpTimeInstance = Timeticks: (147086800) 17 days, 0:34:28.00        .iso.org.dod.internet.snmpV2.snmpModules.snmpMIB.snmpMIBObjects.snmpTrap.snmpTrapOID.0 = OID: .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.cap4cModelController.ocnwdafcap4cModelControllerSvcDown  .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.cap4cModelController.ocnwdafcap4cModelControllerSvcDown.1 = STRING: "1.3.6.1.4.1.323.5.3.45.1.24.2002[job=cap4c-model-controller]"    .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.cap4cModelController.ocnwdafcap4cModelControllerSvcDown.2 = STRING: "critical"        .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.cap4cModelController.ocnwdafcap4cModelControllerSvcDown.3 = STRING: "Status: critical
- Alert: CAP4C_MODEL_CONTROLLER_NOT_RUNNING
  Summary: Service is down.
  Description: Service has been down for more than 2 minutes."
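In the sample output, the first variable binding of each trap carries a string such as "1.3.6.1.4.1.323.5.3.45.1.33.2002[job=ocn-nwdaf-georedagent]", which combines the alert OID with a label. The following Python sketch shows one way to split such a string when post-processing trap logs. The function name parse_trap_id is hypothetical, and the assumption that multiple labels would be comma-separated key=value pairs is not confirmed by the sample, which shows a single label only.

```python
import re

# Dotted numeric OID followed by a bracketed label list, as seen in the sample output
TRAP_ID_PATTERN = re.compile(r"^(?P<oid>\d+(?:\.\d+)*)\[(?P<labels>[^\]]*)\]$")


def parse_trap_id(value: str) -> dict:
    """Split '<oid>[key=value,...]' into the OID, its last two arcs, and the labels."""
    match = TRAP_ID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected trap identifier format: {value!r}")
    arcs = match.group("oid").split(".")
    # Assumption: labels are comma-separated key=value pairs
    labels = dict(
        pair.split("=", 1) for pair in match.group("labels").split(",") if pair
    )
    return {
        "oid": match.group("oid"),
        "service_index": int(arcs[-2]),  # 33 is ocn-nwdaf-georedagent in Table 10-1
        "alert_suffix": int(arcs[-1]),
        "labels": labels,
    }


parsed = parse_trap_id("1.3.6.1.4.1.323.5.3.45.1.33.2002[job=ocn-nwdaf-georedagent]")
print(parsed["service_index"], parsed["alert_suffix"], parsed["labels"])
```

The service index can then be looked up in the OID definitions table to confirm which microservice raised the trap.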

Figure 10-2 Prometheus GUI

OCNWDAF MIB Files

Two OCNWDAF MIB files are used to generate the traps. The operator must update the MIB files and the alert manager file to receive the traps in their environment. The files are:
  • OCNWDAF-MIB-TC-24.2.0.mib: This is the top-level MIB file, where the objects and their data types are defined.
  • OCNWDAF-MIB-24.2.0.mib: This file fetches the objects from the top-level MIB file, and the objects are selected for display based on the alert notification.

OID Definition for OCNWDAF Services

OCNWDAF microservices and OID definitions are listed below:

OCNWDAF's OID: 1.3.6.1.4.1.323.5.3.45

Table 10-1 OID Definitions

Service Name OID
cap4c-api-gateway 1.3.6.1.4.1.323.5.3.45.1.20
cap4c-capex-optimization-service 1.3.6.1.4.1.323.5.3.45.1.21
cap4c-configuration-manager-service 1.3.6.1.4.1.323.5.3.45.1.22
cap4c-kafka-ingestor 1.3.6.1.4.1.323.5.3.45.1.23
cap4c-model-controller 1.3.6.1.4.1.323.5.3.45.1.24
cap4c-stream-analytics 1.3.6.1.4.1.323.5.3.45.1.25
cap4c-stream-transformer 1.3.6.1.4.1.323.5.3.45.1.26
nwdaf-cap4c-reporting-service 1.3.6.1.4.1.323.5.3.45.1.27
nwdaf-cap4c-scheduler-service 1.3.6.1.4.1.323.5.3.45.1.28
nwdaf-cap4c-spring-cloud-config-server 1.3.6.1.4.1.323.5.3.45.1.29
ocn-nwdaf-analytics-info 1.3.6.1.4.1.323.5.3.45.1.30
ocn-nwdaf-data-collection-service 1.3.6.1.4.1.323.5.3.45.1.31
ocn-nwdaf-datacollection-controller 1.3.6.1.4.1.323.5.3.45.1.32
ocn-nwdaf-georedagent 1.3.6.1.4.1.323.5.3.45.1.33
ocn-nwdaf-mtlf-service 1.3.6.1.4.1.323.5.3.45.1.34
ocn-nwdaf-subscription-service 1.3.6.1.4.1.323.5.3.45.1.35
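The full OID of an alert is formed by appending an alert-specific number to the OID of the service, as in the oid label of the sample rule in Configuring SNMP Support. The Python sketch below illustrates that composition for a few of the services in the table. The function name alert_oid and the dictionary are hypothetical helpers, and the suffix 2002 is taken from the sample trap output rather than from a rule definition.

```python
OCNWDAF_BASE_OID = "1.3.6.1.4.1.323.5.3.45"

# Last arc of the service OID, copied from Table 10-1 (subset only)
SERVICE_INDEX = {
    "cap4c-model-controller": 24,
    "ocn-nwdaf-georedagent": 33,
    "ocn-nwdaf-subscription-service": 35,
}


def service_oid(service: str) -> str:
    """Return the service OID, for example 1.3.6.1.4.1.323.5.3.45.1.33."""
    return f"{OCNWDAF_BASE_OID}.1.{SERVICE_INDEX[service]}"


def alert_oid(service: str, alert_suffix: int) -> str:
    """Append the alert-specific number to the service OID."""
    return f"{service_oid(service)}.{alert_suffix}"


print(alert_oid("ocn-nwdaf-georedagent", 2002))
# 1.3.6.1.4.1.323.5.3.45.1.33.2002
```

The printed value matches the OID reported for ocn-nwdaf-georedagent in the sample trap output.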

Alerts

OCNWDAF Subscription Alerts

This section lists the OCNWDAF subscription alerts:

Table 10-2 OCNWDAF_SUBSCRIPTION_CREATE

Field Details
Severity Info
OID to be appended 2000
Description Indicates the subscription is successfully created.

Table 10-3 OCNWDAF_SUBSCRIPTION_CREATE_FAILURE

Field Details
Severity Warning
OID to be appended 2001
Description Indicates an issue in creating the subscription.

Table 10-4 OCNWDAF_SUBSCRIPTION_DELETE

Field Details
Severity Info
OID to be appended 2002
Description Indicates the subscription is successfully deleted.

Table 10-5 OCNWDAF_SUBSCRIPTION_UPDATE

Field Details
Severity Info
OID to be appended 2003
Description Indicates the subscription is successfully updated.

Table 10-6 OCNWDAF_SUBSCRIPTION_DELETE_FAILURE

Field Details
Severity Warning
OID to be appended 2004
Description Indicates an issue in deleting the subscription.

Table 10-7 OCNWDAF_SUBSCRIPTION_UPDATE_FAILURE

Field Details
Severity Warning
OID to be appended 2005
Description Indicates an issue in updating the subscription.

Notification Alerts

This section lists the notification alerts:

Table 10-8 OCNWDAF_ABNORMAL_BEHAVIOR_STATISTICS_NOTIFICATION

Field Details
Severity Info
OID to be appended 3000
Description Indicates abnormal behavior statistics notification is received.

Table 10-9 OCNWDAF_ABNORMAL_BEHAVIOR_THRESHOLD_NOTIFICATION

Field Details
Severity Info
OID to be appended 3001
Description Indicates abnormal behavior threshold notification is received.

Table 10-10 OCNWDAF_ABNORMAL_BEHAVIOR_PREDICTION_NOTIFICATION

Field Details
Severity Info
OID to be appended 3002
Description Indicates abnormal behavior prediction notification is received.

Table 10-11 OCNWDAF_NETWORK_PERFORMANCE_STATISTICS_NOTIFICATION

Field Details
Severity Info
OID to be appended 3003
Description Indicates network performance statistics notification is received.

Table 10-12 OCNWDAF_NETWORK_PERFORMANCE_THRESHOLD_NOTIFICATION

Field Details
Severity Info
OID to be appended 3004
Description Indicates network performance threshold notification is received.

Table 10-13 OCNWDAF_NETWORK_PERFORMANCE_PREDICTION_NOTIFICATION

Field Details
Severity Info
OID to be appended 3005
Description Indicates network performance prediction notification is received.

Table 10-14 OCNWDAF_NF_LOAD_STATISTICS_NOTIFICATION

Field Details
Severity Info
OID to be appended 3006
Description Indicates NF load statistics notification is received.

Table 10-15 OCNWDAF_NF_LOAD_THRESHOLD_NOTIFICATION

Field Details
Severity Info
OID to be appended 3007
Description Indicates NF load threshold notification is received.

Table 10-16 OCNWDAF_NF_LOAD_PREDICTION_NOTIFICATION

Field Details
Severity Info
OID to be appended 3008
Description Indicates NF load prediction notification is received.

Table 10-17 OCNWDAF_SLICE_LOAD_STATISTICS_NOTIFICATION

Field Details
Severity Info
OID to be appended 3009
Description Indicates slice load statistics notification is received.

Table 10-18 OCNWDAF_SLICE_LOAD_THRESHOLD_NOTIFICATION

Field Details
Severity Info
OID to be appended 3010
Description Indicates slice load threshold notification is received.

Table 10-19 OCNWDAF_SLICE_LOAD_PREDICTION_NOTIFICATION

Field Details
Severity Info
OID to be appended 3011
Description Indicates slice load prediction notification is received.

ML Model Alerts

This section lists the ML model alerts:

Table 10-20 OCNWDAF_MODEL_CREATION_FAILURE

Field Details
Severity Critical
OID to be appended 4000
Description Indicates an issue in ML model creation.

Table 10-21 OCNWDAF_MODEL_CREATION_SUCCESS

Field Details
Severity Info
OID to be appended 4001
Description Indicates ML model is successfully created.

Data Collection Alerts

This section lists the data collection alerts:

Table 10-22 PRESENCE_IN_AOI_REPORT_RECEIVED

Field Details
Severity Info
OID to be appended 5000
Description Indicates Presence in AOI report is successfully received.

Table 10-23 LOCATION_REPORT_RECEIVED

Field Details
Severity Info
OID to be appended 5001
Description Indicates location report is successfully received.

Table 10-24 UES_IN_AREA_REPORT_RECEIVED

Field Details
Severity Info
OID to be appended 5002
Description Indicates UEs in area report is successfully received.

Table 10-25 NF_LOAD_REPORT_RECEIVED

Field Details
Severity Info
OID to be appended 5003
Description Indicates NF load report is successfully received.

Table 10-26 SMF_SES_EST_REPORT_RECEIVED

Field Details
Severity Info
OID to be appended 5004
Description Indicates an SMF session established report is successfully received.

Table 10-27 SMF_SES_REL_REPORT_RECEIVED

Field Details
Severity Info
OID to be appended 5005
Description Indicates an SMF session released report is successfully received.

Table 10-28 KAFKA_SOURCED_REPORT_RECEIVED

Field Details
Severity Info
OID to be appended 5006
Description Indicates Kafka sourced report is successfully received.

Operation Alerts

This section lists the operational alerts:

Table 10-29 OCN_NWDAF_SVC_HIGH_CPU_LOAD

Field Details
Severity Critical
OID to be appended 6000
Description Verifies if the CPU usage of a particular service is exceeding 90%.

Table 10-30 OCN_NWDAF_SVC_HIGH_JVM_MEMORY_USAGE

Field Details
Severity Critical
OID to be appended 6001
Description Verifies if the percentage of heap memory used by a specific JVM instance/service is exceeding 90% over a one minute duration.

Table 10-31 OCN_NWDAF_SVC_NOT_RUNNING_ALERT

Field Details
Severity Critical
OID to be appended 6002
Description Verifies if there are no instances of the specified service running in the specified Kubernetes namespace or if all instances of the service are not healthy.

10.2 System Level Alerts

This section lists the system level alerts.

OCN_NWDAF_ANALYTICS_HIGH_CPU_LOAD

Table 10-32 OCN_NWDAF_ANALYTICS_HIGH_CPU_LOAD

Field Details
Description CPU load is high at the pod where the microservice is running.
Affected Functions All
Cause CPU load is more than 80% of the allocated resources.

OCN_NWDAF_COMMUNICATION_HIGH_CPU_LOAD

Table 10-33 OCN_NWDAF_COMMUNICATION_HIGH_CPU_LOAD

Field Details
Description CPU load is high at the pod where the microservice is running.
Affected Functions All
Cause CPU load is more than 80% of the allocated resources.

OCN_NWDAF_CONFIGURATION_SERVICE_HIGH_CPU_LOAD

Table 10-34 OCN_NWDAF_CONFIGURATION_SERVICE_HIGH_CPU_LOAD

Field Details
Description CPU load is high at the pod where the microservice is running.
Affected Functions All
Cause CPU load is more than 80% of the allocated resources.

OCN_NWDAF_DATA_COLLECTION_HIGH_CPU_LOAD

Table 10-35 OCN_NWDAF_DATA_COLLECTION_HIGH_CPU_LOAD

Field Details
Description CPU load is high at the pod where the microservice is running.
Affected Functions All
Cause CPU load is more than 80% of the allocated resources.

OCN_NWDAF_GATEWAY_HIGH_CPU_LOAD

Table 10-36 OCN_NWDAF_GATEWAY_HIGH_CPU_LOAD

Field Details
Description CPU load is high at the pod where the microservice is running.
Affected Functions All
Cause CPU load is more than 80% of the allocated resources.

OCN_NWDAF_MTLF_HIGH_CPU_LOAD

Table 10-37 OCN_NWDAF_MTLF_HIGH_CPU_LOAD

Field Details
Description CPU load is high at the pod where the microservice is running.
Affected Functions All
Cause CPU load is more than 80% of the allocated resources.

OCN_NWDAF_SUBSCRIPTION_HIGH_CPU_LOAD

Table 10-38 OCN_NWDAF_SUBSCRIPTION_HIGH_CPU_LOAD

Field Details
Description CPU load is high at the pod where the microservice is running.
Affected Functions All
Cause CPU load is more than 80% of the allocated resources.

OCN_NWDAF_ANALYTICS_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-39 OCN_NWDAF_ANALYTICS_HIGH_JVM_HEAP_MEMORY_USAGE

Field Details
Description The average heap memory usage is high.
Affected Functions All
Cause The heap memory usage is more than 80%.

OCN_NWDAF_COMMUNICATION_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-40 OCN_NWDAF_COMMUNICATION_HIGH_JVM_HEAP_MEMORY_USAGE

Field Details
Description The average heap memory usage is high.
Affected Functions All
Cause The heap memory usage is more than 80%.

OCN_NWDAF_CONFIGURATION_SERVICE_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-41 OCN_NWDAF_CONFIGURATION_SERVICE_HIGH_JVM_HEAP_MEMORY_USAGE

Field Details
Description The average heap memory usage is high.
Affected Functions All
Cause The heap memory usage is more than 80%.

OCN_NWDAF_DATA_COLLECTION_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-42 OCN_NWDAF_DATA_COLLECTION_HIGH_JVM_HEAP_MEMORY_USAGE

Field Details
Description The average heap memory usage is high.
Affected Functions All
Cause The heap memory usage is more than 80%.

OCN_NWDAF_GATEWAY_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-43 OCN_NWDAF_GATEWAY_HIGH_JVM_HEAP_MEMORY_USAGE

Field Details
Description The average heap memory usage is high.
Affected Functions All
Cause The heap memory usage is more than 80%.

OCN_NWDAF_MTLF_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-44 OCN_NWDAF_MTLF_HIGH_JVM_HEAP_MEMORY_USAGE

Field Details
Description The average heap memory usage is high.
Affected Functions All
Cause The heap memory usage is more than 80%.

OCN_NWDAF_SUBSCRIPTION_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-45 OCN_NWDAF_SUBSCRIPTION_HIGH_JVM_HEAP_MEMORY_USAGE

Field Details
Description The average heap memory usage is high.
Affected Functions All
Cause The heap memory usage is more than 80%.

10.3 Application Level Alerts

This section lists the application level alerts.

OCN_NWDAF_ANALYTICS_NOT_RUNNING

Table 10-46 OCN_NWDAF_ANALYTICS_NOT_RUNNING

Field Details
Description The microservice is not available or not reachable.
Cause Microservice ocn-nwdaf-analytics is down.

OCN_NWDAF_COMMUNICATION_NOT_RUNNING

Table 10-47 OCN_NWDAF_COMMUNICATION_NOT_RUNNING

Field Details
Description The microservice is not available or not reachable.
Cause Microservice ocn-nwdaf-communication is down.

OCN_NWDAF_CONFIGURATION_SERVICE_NOT_RUNNING

Table 10-48 OCN_NWDAF_CONFIGURATION_SERVICE_NOT_RUNNING

Field Details
Description The microservice is not available or not reachable.
Cause Microservice ocn-nwdaf-configuration-service is down.

OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING

Table 10-49 OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING

Field Details
Description The microservice is not available or not reachable.
Cause Microservice ocn-nwdaf-data-collection is down.

OCN_NWDAF_GATEWAY_NOT_RUNNING

Table 10-50 OCN_NWDAF_GATEWAY_NOT_RUNNING

Field Details
Description The microservice is not available or not reachable.
Cause Microservice ocn-nwdaf-gateway is down.

OCN_NWDAF_MTLF_NOT_RUNNING

Table 10-51 OCN_NWDAF_MTLF_NOT_RUNNING

Field Details
Description The microservice is not available or not reachable.
Cause Microservice ocn-nwdaf-mtlf is down.

OCN_NWDAF_SUBSCRIPTION_NOT_RUNNING

Table 10-52 OCN_NWDAF_SUBSCRIPTION_NOT_RUNNING

Field Details
Description The microservice is not available or not reachable.
Cause Microservice ocn-nwdaf-subscription is down.

HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE

Table 10-53 HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE

Field Details
Description The number of requests received per second is high.
Cause Traffic is high, above 1000 requests per second.
URI Endpoint nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR
Affected Functions ABNORMAL_BEHAVIOUR

HIGH_UE_MOBILITY_REQUEST_RATE

Table 10-54 HIGH_UE_MOBILITY_REQUEST_RATE

Field Details
Description The number of requests received per second is high.
Cause Traffic is high, above 1000 requests per second.
URI Endpoint nnwdaf-analyticsinfo/v1/analytics?event-id=UE_MOBILITY
Affected Functions UE_MOBILITY

HIGH_EVENT_SUBSCRIPTION_REQUEST_RATE

Table 10-55 HIGH_EVENT_SUBSCRIPTION_REQUEST_RATE

Field Details
Description The number of requests received per second is high.
Cause Traffic is high, above 1000 requests per second.
URI Endpoint nnwdaf-eventssubscription/v1/subscriptions
Affected Functions UE_MOBILITY, SLICE_LOAD_LEVEL, ABNORMAL_BEHAVIOUR

HIGH_ABNORMAL_BEHAVIOUR_REQUEST_FAILURE_RATE

Table 10-56 HIGH_ABNORMAL_BEHAVIOUR_REQUEST_FAILURE_RATE

Field Details
Description The number of requests failing per second is high.
Cause The request failure rate is more than 70%.
URI Endpoint nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR
Affected Functions ABNORMAL_BEHAVIOUR

HIGH_UE_MOBILITY_REQUEST_FAILURE_RATE

Table 10-57 HIGH_UE_MOBILITY_REQUEST_FAILURE_RATE

Field Details
Description The number of requests failing per second is high.
Cause The request failure rate is more than 70%.
URI Endpoint nnwdaf-analyticsinfo/v1/analytics?event-id=UE_MOBILITY
Affected Functions UE_MOBILITY

HIGH_EVENT_SUBSCRIPTION_REQUEST_FAILURE_RATE

Table 10-58 HIGH_EVENT_SUBSCRIPTION_REQUEST_FAILURE_RATE

Field Details
Description The number of requests failing per second is high.
Cause The request failure rate is more than 70%.
URI Endpoint nnwdaf-eventssubscription/v1/subscriptions
Affected Functions UE_MOBILITY, SLICE_LOAD_LEVEL, ABNORMAL_BEHAVIOUR

10.4 OCNWDAF KPIs

This section provides information about Key Performance Indicators (KPIs) used for Oracle Communications Networks Data Analytics Function (OCNWDAF).

OCNWDAF KPIs are listed below:

Table 10-59 Frontend Reports Received Total

KPI Detail Total number of reports received on Front End.
Metric Used for the KPI (CNE) PromQL: total_fe_reports_recieved_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:total_fe_reports_recieved_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Source NF: AMF,SMF,NRF,OAM

Table 10-60 Frontend Bytes Received Total

KPI Detail Total number of bytes received on Front End.
Metric Used for the KPI (CNE) PromQL: fe_bytes_recieved_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:fe_bytes_recieved_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Source NF: AMF,SMF,NRF,OAM

Table 10-61 Kafka Sourced Reports Received Total

KPI Detail Total number of reports received by NWDAF Front End through Kafka.
Metric Used for the KPI (CNE) PromQL: total_kafka_sourced_reports_recieved_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:total_kafka_sourced_reports_recieved_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Source NF: OAM

Table 10-62 Total Kafka Bytes Received

KPI Detail Total number of Kafka bytes received.
Metric Used for the KPI (CNE) PromQL: total_kafka_bytes_recieved{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:total_kafka_bytes_recieved[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Source NF: AMF,SMF,NRF,OAM

Report Type:

NW_PERF_OAM_REPORT, QOS_OAM_REPORT, UDC_OAM_REPORT

Table 10-63 Nwdaf Subscriptions Created Total

KPI Detail Total number of subscriptions created.
Metric Used for the KPI (CNE) PromQL: nwdaf_subscriptions_created_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:nwdaf_subscriptions_created_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.

Notification Method: DESCRIPTIVE, PREDICTIVE, THRSHOLDING.

Table 10-64 Nwdaf Subscriptions Accepted Total

KPI Detail Total number of subscriptions accepted out of the subscriptions created.
Metric Used for the KPI (CNE) PromQL: nwdaf_subscriptions_accepted_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:nwdaf_subscriptions_accepted_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.

Notification Method: DESCRIPTIVE, PREDICTIVE, THRSHOLDING.

Table 10-65 Nwdaf Subscriptions Data Reports Sent

KPI Detail Total number of reports or notifications sent.
Metric Used for the KPI (CNE) PromQL: nwdaf_subscriptions_data_reports_sent_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:nwdaf_subscriptions_data_reports_sent_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.

Notification Method: DESCRIPTIVE, PREDICTIVE, THRSHOLDING.

Table 10-66 Nwdaf Subscriptions Threshold Reports Sent

KPI Detail Total number of threshold reports or notifications sent out of the total reports.
Metric Used for the KPI (CNE) PromQL: nwdaf_subscriptions_threshold_reports_sent_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:nwdaf_subscriptions_threshold_reports_sent_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.

Table 10-67 Nwdaf Subscriptions Prediction Reports Sent

KPI Detail Total number of predictive reports or notifications sent out of the total reports.
Metric Used for the KPI (CNE) PromQL: nwdaf_subscriptions_prediction_reports_sent_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:nwdaf_subscriptions_prediction_reports_sent_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.

Table 10-68 Analyticsinfo Request Received Total

KPI Detail Total number of analytics information requests received.
Metric Used for the KPI (CNE) PromQL: analyticsinfo_request_received_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI) MQL:analyticsinfo_request_received_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation NA
Response Code NA
Tags and Values

Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.
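The PromQL expressions in the KPI tables use $NAMESPACE as a placeholder for the deployment namespace. As a minimal sketch, the Python code below substitutes a namespace and builds a request URL for the instant query endpoint of the Prometheus HTTP API (/api/v1/query). The host name prometheus.example, the port, and the namespace ocn-nwdaf are assumptions chosen for illustration, and the function name kpi_query_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def kpi_query_url(prometheus_base_url: str, metric: str, namespace: str) -> str:
    """Build an instant-query URL for one KPI metric filtered by namespace."""
    # Doubled braces produce the literal { and } of the PromQL label selector
    promql = f'{metric}{{namespace="{namespace}"}}'
    query_string = urlencode({"query": promql})
    return prometheus_base_url.rstrip("/") + "/api/v1/query?" + query_string


# Metric name copied as-is from Table 10-59; host and namespace are placeholders
url = kpi_query_url(
    "http://prometheus.example:9090/",
    "total_fe_reports_recieved_total",
    "ocn-nwdaf",
)
print(url)

# Decoding the query string recovers the original PromQL expression
print(parse_qs(urlsplit(url).query)["query"][0])
```

The sketch only builds the URL. Sending the request, for example with urllib.request.urlopen, depends on how Prometheus is exposed in the deployment.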