OCNWDAF Alerts

10 OCNWDAF Alerts

This chapter describes OCNWDAF alert configuration, the system level and application level alerts, and the OCNWDAF KPIs.

10.1 OCNWDAF Alert Configuration

This section describes the measurement-based alert rule configuration for OCNWDAF. The Alert Manager uses the Prometheus measurement values reported by the microservices as conditions in the alert rules to trigger alerts.

OCNWDAF Alert configuration in Prometheus

The following procedure is used to configure alerts in Prometheus:

Download the ocn-nwdaf-alerting-rules.yaml file and edit it to configure the alert rules. The editable parameters include the group name (name) and, for each rule, the alert name (alert) and the expression (expr) that triggers the alert.
Copy the updated ocn-nwdaf-alerting-rules.yaml file to the Bastion Host.
Run the following command:
kubectl apply -f ocn-nwdaf-alerting-rules.yaml -n ocn-nwdaf
To verify that the PrometheusRule custom resource is created, run the following command:
kubectl get prometheusrule -n ocn-nwdaf
Verify the alerts in the Prometheus GUI, where the alert names and expressions are listed, as in the following example:

Figure 10-1 Prometheus GUI

Alert Rules

The alerts are configured on the Prometheus server. The metrics scraped correspond to a pod that runs a single microservice, so each alert belongs to one of the running pods. Prometheus continuously collects metrics, and when the condition of any alerting rule is met, the corresponding alert is triggered. All the alert rules are written in one or more .yml files and deployed as described in the procedure OCNWDAF Alert configuration in Prometheus. The alert rules for the various alerts captured for OCNWDAF are listed below:

Status Alert Rule

- name: <ALERT NAME>
  rules:
  - alert: <ALERT NAME>
    expr: up{app="<SERVICE LABEL>"} == 0

Example:

- name: OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING
  rules:
  - alert: OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING
    expr: up{app="ocn-nwdaf-data-collection"} == 0
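
As an illustration of how such a rule could be packaged for the kubectl apply step in the procedure above, the following Python sketch wraps the example rule in a PrometheusRule manifest and prints it as JSON. It assumes that the prometheusrule resource is provided by the Prometheus Operator (API version monitoring.coreos.com/v1). The manifest name is hypothetical, and the rules file delivered with OCNWDAF may carry additional metadata, such as labels, that is not shown here.

```python
import json


def build_prometheus_rule(manifest_name, namespace, group_name, alert_name, expr):
    """Wrap one alert rule in a PrometheusRule manifest (as a plain dict)."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": manifest_name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": group_name,
                    "rules": [{"alert": alert_name, "expr": expr}],
                }
            ]
        },
    }


manifest = build_prometheus_rule(
    manifest_name="ocn-nwdaf-alerting-rules",  # hypothetical resource name
    namespace="ocn-nwdaf",
    group_name="OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING",
    alert_name="OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING",
    expr='up{app="ocn-nwdaf-data-collection"} == 0',
)

# kubectl apply -f accepts JSON manifests as well as YAML ones.
print(json.dumps(manifest, indent=2))
```

Generating the manifest from a function keeps the group name and alert name in step, which is the convention the templates in this section follow.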

Traffic Alert Rule

Request rate rule:

- name: <ALERT NAME>
  rules:
  - alert: <ALERT NAME>
    expr: >
      sum without(method,status,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="<URI ENDPOINT>"}[1m])) > 1000

Example:

- name: HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE
  rules:
  - alert: HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE
    expr: sum without(method,status,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR"}[1m])) > 1000
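
Because the request rate rule differs between endpoints only in the URI, the expression can be generated rather than copied by hand. The following Python sketch fills in the template above for a given URI endpoint. The function name and its parameters are illustrative and not part of OCNWDAF; the list of aggregated labels, the one-minute window, and the threshold of 1000 are taken from the template.

```python
# Labels aggregated away by the rule, copied from the template above.
DROPPED_LABELS = (
    "method", "status", "outcome", "exception", "app",
    "instance", "container", "pod", "pod_template_hash",
)


def request_rate_expr(uri, threshold=1000, window="1m"):
    """Return the request rate alert expression for one URI endpoint."""
    without = ",".join(DROPPED_LABELS)
    return (
        f"sum without({without}) "
        f'(rate(http_server_requests_seconds_count{{uri="{uri}"}}[{window}])) '
        f"> {threshold}"
    )


# Expression for the UE_MOBILITY analytics endpoint.
print(request_rate_expr("nnwdaf-analyticsinfo/v1/analytics?event-id=UE_MOBILITY"))
```

Calling the function with the ABNORMAL_BEHAVIOUR URI reproduces the expression of the HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE example.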

Request failure rate rule:

- name: <ALERT NAME>
  rules:
  - alert: <ALERT NAME>
    expr: >
      (sum without(method,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="<URI ENDPOINT>",status=~"[4-5].."}[1m])) / ignoring(status) sum without(method,status,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="<URI ENDPOINT>"}[1m]))) * 100 > 70

Example:

- name: HIGH_ABNORMAL_BEHAVIOUR_REQUEST_FAILURE_RATE
  rules:
  - alert: HIGH_ABNORMAL_BEHAVIOUR_REQUEST_FAILURE_RATE
    expr: (sum without(method,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR",status=~"[4-5].."}[1m])) / ignoring(status) sum without(method,status,outcome,exception,app,instance,container,pod,pod_template_hash) (rate(http_server_requests_seconds_count{uri="nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR"}[1m]))) * 100 > 70
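
The arithmetic behind the failure rate rule can be sketched in Python as follows. This is a simplified illustration, not the way Prometheus evaluates the expression: it takes per-second request rates that are already broken down by HTTP status, treats statuses that match the pattern [4-5].. as failures, and compares the resulting percentage with the threshold of 70. The sample rates are hypothetical.

```python
def failure_rate_percent(requests_by_status):
    """Percentage of requests whose HTTP status is 4xx or 5xx.

    requests_by_status maps a status code string such as "200" to the
    per-second request rate observed for that status.
    """
    total = sum(requests_by_status.values())
    if total == 0:
        # No traffic. In PromQL, 0/0 is NaN, which does not satisfy "> 70".
        return 0.0
    failed = sum(
        rate
        for status, rate in requests_by_status.items()
        if len(status) == 3 and status[0] in "45"
    )
    return 100.0 * failed / total


# Hypothetical per-second rates for one URI over the last minute.
sample = {"200": 20.0, "404": 30.0, "500": 50.0}
percent = failure_rate_percent(sample)
print(f"failure rate: {percent:.1f}%  alert fires: {percent > 70}")
```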

CPU Alert Rule

- name: <ALERT NAME>
  rules:
  - alert: <ALERT NAME>
    expr: system_cpu_usage{app="<SERVICE LABEL>"} * 100 > 80

Example:

- name: OCN_NWDAF_DATA_COLLECTION_HIGH_CPU_LOAD
  rules:
  - alert: OCN_NWDAF_DATA_COLLECTION_HIGH_CPU_LOAD
    expr: system_cpu_usage{app="ocn-nwdaf-data-collection"} * 100 > 80

JVM Memory Usage Alert Rule

- name: <ALERT NAME>
  rules:
  - alert: <ALERT NAME>
    expr: >
      (sum(avg_over_time(jvm_memory_used_bytes{area="heap",app="<SERVICE LABEL>"}[1m]))/sum(avg_over_time(jvm_memory_max_bytes{area="heap",app="<SERVICE LABEL>"}[1m]))) * 100 > 80

Example:

- name: OCN_NWDAF_DATA_COLLECTION_HIGH_JVM_HEAP_MEMORY_USAGE
  rules:
  - alert: OCN_NWDAF_DATA_COLLECTION_HIGH_JVM_HEAP_MEMORY_USAGE
    expr: (sum(avg_over_time(jvm_memory_used_bytes{area="heap",app="ocn-nwdaf-data-collection"}[1m]))/sum(avg_over_time(jvm_memory_max_bytes{area="heap",app="ocn-nwdaf-data-collection"}[1m]))) * 100 > 80
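
The JVM rule divides the summed one-minute averages of used heap by the summed one-minute averages of maximum heap. The following Python sketch reproduces that calculation on sample data. The pool names and byte values are hypothetical; the sketch assumes that each heap memory pool contributes one series to both metrics.

```python
def heap_usage_percent(used_bytes_avg, max_bytes_avg):
    """Heap usage as a percentage, summed across heap memory pools.

    Both arguments map a memory pool name to its one-minute average in bytes.
    """
    total_max = sum(max_bytes_avg.values())
    if total_max <= 0:
        raise ValueError("heap maximum must be positive")
    return 100.0 * sum(used_bytes_avg.values()) / total_max


# Hypothetical one-minute averages for three heap pools of a single service.
used = {"eden": 300_000_000, "survivor": 50_000_000, "old": 500_000_000}
maximum = {"eden": 400_000_000, "survivor": 100_000_000, "old": 500_000_000}

percent = heap_usage_percent(used, maximum)
print(f"heap usage: {percent:.1f}%  alert fires: {percent > 80}")
```

Summing before dividing means that one full pool does not trigger the alert on its own; the alert reflects the usage of the heap as a whole.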

10.1.1 SNMP Support

Simple Network Management Protocol (SNMP) is an application-layer protocol designed for monitoring and managing network devices within a Local Area Network (LAN) or Wide Area Network (WAN).

OCNWDAF forwards the Prometheus alerts as Simple Network Management Protocol (SNMP) traps to the southbound SNMP servers. OCNWDAF uses two SNMP MIB files to generate the traps. Update the alertmanager.yaml file to configure the alert manager. In the alertmanager.yaml file, the alerts can be grouped based on podname, alertname, severity, namespace, and so on. The Prometheus Alert Manager is integrated with Oracle Communications Cloud Native Core, Cloud Native Environment (CNE) snmp-notifier service. The external SNMP servers are set up to receive the Prometheus alerts as SNMP traps. The operator must update the MIB and alert manager files to fetch the SNMP traps in their environment.

Configuring SNMP Support

Update the alertmanager.yaml file to include the additional information required for SNMP traps.

Sample of the alertmanager.yaml file:

{{- range $key, $svcName := .Values.global.rules.services }}
- alert: {{ $svcName | replace "-" "_" | upper }}_HIGH_CPU_LOAD
  expr: system_cpu_usage{job={{ $svcName | quote }}, namespace={{ $.Release.Namespace | quote }} } * 100 > 90
  for: 5m
  labels:
    alertname: "OCN_NWDAF_SVC_HIGH_CPU_LOAD"
    oid: "1.3.6.1.4.1.323.5.3.45.1.{{ index $.Values.global.rules.oid $svcName }}.4002"
    severity: critical
    namespace: {{ $.Release.Namespace | quote }}
  annotations:
    namespace: {{ $.Release.Namespace | quote }}
    severity: critical
    summary: "Service {{ "{{$labels.app}}" }} CPU load is high."
    description: "Service {{ "{{$labels.app}}" }} CPU load has been high for more than 5 minutes."
{{- end }}
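
To make the template easier to follow, the Python sketch below mirrors how it derives the alert name and the oid label for one service. It is an illustration only: the helper names are hypothetical, and it assumes that the value looked up in .Values.global.rules.oid is the last component of the service OID listed in Table 10-1.

```python
NWDAF_BASE_OID = "1.3.6.1.4.1.323.5.3.45"


def alert_name(service_name, suffix="HIGH_CPU_LOAD"):
    """Mirror the template pipeline: replace "-" with "_", then upper-case."""
    return f"{service_name.replace('-', '_').upper()}_{suffix}"


def alert_oid(service_index, alert_code):
    """Build the oid label: base OID, ".1.", service index, alert code."""
    return f"{NWDAF_BASE_OID}.1.{service_index}.{alert_code}"


# 33 is the last component of the ocn-nwdaf-georedagent OID in Table 10-1
# (assumed to be the value stored in .Values.global.rules.oid).
print(alert_name("ocn-nwdaf-georedagent"))
print(alert_oid(33, 4002))
```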

Configure the SNMP Test Client

Follow the steps below to configure the SNMP Test Client:

Create a ConfigMap that includes the MIB files. Run the following command:
```
kubectl create configmap my-config --from-file=/path/to/mib/files/ -n <namespace>
```
Where my-config is the name of the ConfigMap. Use the same name in the pod configuration file. The ConfigMap must be in the same namespace as the SNMP client.

To start the SNMP trap daemon service, use a service configuration .yaml file such as the following example:

apiVersion: v1
kind: Service
metadata:
  labels:
    name: snmptrapd
  name: snmptrapd
  namespace: performance-idc  # namespace in which you want to deploy the service
spec:
  ports:
  - name: snmptrapd
    port: 162
    protocol: UDP
    targetPort: 162
  selector:
    name: snmptrapd
  sessionAffinity: None
  type: ClusterIP

Use the following pod deployment configuration to deploy the pod corresponding to the above service. The commands mentioned in this file add the MIB files from the ConfigMap to the pod, following which the SNMP trap daemon service application starts.

Sample Dockerfile:

FROM ocr-docker-remote.artifactory.oci.oraclecorp.com/os/oraclelinux:8-slim
ARG HTTPS_PROXY=http://www-proxy.us.oracle.com:80
RUN echo -e "[main]\nproxy=${HTTPS_PROXY}" >> /etc/dnf/dnf.conf
RUN microdnf update -y && microdnf install -y lsof
RUN microdnf install net-snmp
ADD snmptrapd.conf /etc/snmp/snmptrapd.conf
EXPOSE 162
CMD ["/bin/sh"]

Sample snmp-pod.yaml file:

apiVersion: v1
kind: Pod
metadata:
  name: snmptrapd
  labels:
    name: snmptrapd
    role: snmptrapd
  namespace: performance-idc   # namespace in which you want to deploy the pod
spec:
  containers:
  - name: snmptrapd
    image: occne-repo-host:5000/snmptrapd:1.1.1  # create your own snmptrapd image using the Dockerfile
    volumeMounts:
      - name: config-volume
        mountPath: /MIB
    imagePullPolicy: IfNotPresent
    command: ["/bin/bash","-c","kill -9 $(lsof -t -i:162); cp /MIB/* /usr/share/snmp/mibs && echo MIB files copied successfully ;snmptrapd -m ALL -f -Of -Lo"]
    ports:
    - containerPort: 162
      protocol: UDP
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
  volumes:
  - name: config-volume
    configMap:
      name: my-config

Run the following commands to deploy the service and pod in the SNMP Client:
```
kubectl apply -f snmp-pod.yaml
```
```
kubectl apply -f snmp-svc.yaml
```

Run the following command to view the pod logs:

$ kubectl logs pod/snmptrapd -n performance-idc

Sample of the pod logs:

kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
MIB files copied successfully
NET-SNMP version 5.7.2

Integrate the Alert Manager with snmp-notifier Service

Update the occne-snmp-notifier deployment with the IP address of the destination SNMP client:

$ kubectl edit deployment -n occne-infra occne-snmp-notifier

In the args of the container, set the --snmp.destination=<IP>:<port> field to the destination of the SNMP client, as follows:

- --snmp.destination=<fqdn of target receiver>:162

Verify the Traps

Run the following command to verify the traps:

$ kubectl logs pod/snmptrapd -n performance-idc -f

Sample output:

2024-03-11 11:31:10 10-233-87-165.occne-snmp-notifier.occne-infra.svc.blurr8 [UDP: [10.233.87.165]:46951->[10.233.116.34]:162]:
.iso.org.dod.internet.mgmt.mib-2.system.sysUpTime.sysUpTimeInstance = Timeticks: (147060000) 17 days, 0:30:00.00        .iso.org.dod.internet.snmpV2.snmpModules.snmpMIB.snmpMIBObjects.snmpTrap.snmpTrapOID.0 = OID: .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.ocnwdafGeoredagent.ocnwdafgeoredagentSvcDown     .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.ocnwdafGeoredagent.ocnwdafgeoredagentSvcDown.1 = STRING: "1.3.6.1.4.1.323.5.3.45.1.33.2002[job=ocn-nwdaf-georedagent]"        .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.ocnwdafGeoredagent.ocnwdafgeoredagentSvcDown.2 = STRING: "critical"  .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.ocnwdafGeoredagent.ocnwdafgeoredagentSvcDown.3 = STRING: "Status: critical
- Alert: OCN_NWDAF_GEOREDAGENT_NOT_RUNNING
  Summary: Service is down.
  Description: Service has been down for more than 2 minutes."
2024-03-11 11:35:38 10-233-87-165.occne-snmp-notifier.occne-infra.svc.blurr8 [UDP: [10.233.87.165]:56385->[10.233.116.34]:162]:
.iso.org.dod.internet.mgmt.mib-2.system.sysUpTime.sysUpTimeInstance = Timeticks: (147086800) 17 days, 0:34:28.00        .iso.org.dod.internet.snmpV2.snmpModules.snmpMIB.snmpMIBObjects.snmpTrap.snmpTrapOID.0 = OID: .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.cap4cModelController.ocnwdafcap4cModelControllerSvcDown  .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.cap4cModelController.ocnwdafcap4cModelControllerSvcDown.1 = STRING: "1.3.6.1.4.1.323.5.3.45.1.24.2002[job=cap4c-model-controller]"    .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.cap4cModelController.ocnwdafcap4cModelControllerSvcDown.2 = STRING: "critical"        .iso.org.dod.internet.private.enterprises.tekelecCorp.tekelecProductGroups.tekelecSwitchingGroup.oracleNWDAF.oracleNWDAFMIB.cap4cModelController.ocnwdafcap4cModelControllerSvcDown.3 = STRING: "Status: critical
- Alert: CAP4C_MODEL_CONTROLLER_NOT_RUNNING
  Summary: Service is down.
  Description: Service has been down for more than 2 minutes."

Figure 10-2 Prometheus GUI
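
When checking traps, it can help to split the first string value of a trap into the service OID, the alert code, and the labels. The Python sketch below does this for the value shown in the sample output. The pattern is inferred from that sample only; the comma as a separator between several labels is an assumption, since the sample contains a single label.

```python
import re

# Matches values such as 1.3.6.1.4.1.323.5.3.45.1.33.2002[job=ocn-nwdaf-georedagent]
TRAP_ID_PATTERN = re.compile(r"^(?P<oid>\d+(?:\.\d+)*)\[(?P<labels>[^\]]*)\]$")


def parse_trap_id(value):
    """Split a trap identifier string into its OID and a dict of labels."""
    match = TRAP_ID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognized trap identifier: {value!r}")
    labels = {}
    # Assumption: multiple labels, if present, are separated by commas.
    for pair in filter(None, match.group("labels").split(",")):
        key, _, val = pair.partition("=")
        labels[key.strip()] = val.strip()
    return match.group("oid"), labels


oid, labels = parse_trap_id("1.3.6.1.4.1.323.5.3.45.1.33.2002[job=ocn-nwdaf-georedagent]")
service_oid, alert_code = oid.rsplit(".", 1)
print(service_oid, alert_code, labels)
```

The service OID obtained this way can be compared with the OID definitions of the OCNWDAF services to confirm which microservice raised the trap.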

OCNWDAF MIB Files

Two OCNWDAF MIB files are used to generate the traps. The operator has to update the MIB files and the alert manager file to obtain the traps in their environment. The files are:

OCNWDAF-MIB-TC-24.2.0.mib: This is the top-level MIB file, which defines the objects and their data types.
OCNWDAF-MIB-24.2.0.mib: This file fetches the objects from the top-level MIB file and selects the objects to display based on the alert notification.

OID Definition for OCNWDAF Services

OCNWDAF microservices and OID definitions are listed below:

OCNWDAF's OID: 1.3.6.1.4.1.323.5.3.45

Table 10-1 OID Definitions

Service Name	OID
cap4c-api-gateway	1.3.6.1.4.1.323.5.3.45.1.20
cap4c-capex-optimization-service	1.3.6.1.4.1.323.5.3.45.1.21
cap4c-configuration-manager-service	1.3.6.1.4.1.323.5.3.45.1.22
cap4c-kafka-ingestor	1.3.6.1.4.1.323.5.3.45.1.23
cap4c-model-controller	1.3.6.1.4.1.323.5.3.45.1.24
cap4c-stream-analytics	1.3.6.1.4.1.323.5.3.45.1.25
cap4c-stream-transformer	1.3.6.1.4.1.323.5.3.45.1.26
nwdaf-cap4c-reporting-service	1.3.6.1.4.1.323.5.3.45.1.27
nwdaf-cap4c-scheduler-service	1.3.6.1.4.1.323.5.3.45.1.28
nwdaf-cap4c-spring-cloud-config-server	1.3.6.1.4.1.323.5.3.45.1.29
ocn-nwdaf-analytics-info	1.3.6.1.4.1.323.5.3.45.1.30
ocn-nwdaf-data-collection-service	1.3.6.1.4.1.323.5.3.45.1.31
ocn-nwdaf-datacollection-controller	1.3.6.1.4.1.323.5.3.45.1.32
ocn-nwdaf-georedagent	1.3.6.1.4.1.323.5.3.45.1.33
ocn-nwdaf-mtlf-service	1.3.6.1.4.1.323.5.3.45.1.34
ocn-nwdaf-subscription-service	1.3.6.1.4.1.323.5.3.45.1.35

Alerts

OCNWDAF Subscription Alerts

This section lists the OCNWDAF subscription alerts:

Table 10-2 OCNWDAF_SUBSCRIPTION_CREATE

Field	Details
Severity	Info
OID to be appended	2000
Description	Indicates the subscription is successfully created.

Table 10-3 OCNWDAF_SUBSCRIPTION_CREATE_FAILURE

Field	Details
Severity	Warning
OID to be appended	2001
Description	Indicates an issue in creating the subscription.

Table 10-4 OCNWDAF_SUBSCRIPTION_DELETE

Field	Details
Severity	Info
OID to be appended	2002
Description	Indicates the subscription is successfully deleted.

Table 10-5 OCNWDAF_SUBSCRIPTION_UPDATE

Field	Details
Severity	Info
OID to be appended	2003
Description	Indicates the subscription is successfully updated.

Table 10-6 OCNWDAF_SUBSCRIPTION_DELETE_FAILURE

Field	Details
Severity	Warning
OID to be appended	2004
Description	Indicates an issue in deleting the subscription.

Table 10-7 OCNWDAF_SUBSCRIPTION_UPDATE_FAILURE

Field	Details
Severity	Warning
OID to be appended	2005
Description	Indicates an issue in updating the subscription.

Notification Alerts

This section lists the notification alerts:

Table 10-8 OCNWDAF_ABNORMAL_BEHAVIOR_STATISTICS_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3000
Description	Indicates abnormal behavior statistics notification is received.

Table 10-9 OCNWDAF_ABNORMAL_BEHAVIOR_THRESHOLD_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3001
Description	Indicates abnormal behavior threshold notification is received.

Table 10-10 OCNWDAF_ABNORMAL_BEHAVIOR_PREDICTION_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3002
Description	Indicates abnormal behavior prediction notification is received.

Table 10-11 OCNWDAF_NETWORK_PERFORMANCE_STATISTICS_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3003
Description	Indicates network performance statistics notification is received.

Table 10-12 OCNWDAF_NETWORK_PERFORMANCE_THRESHOLD_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3004
Description	Indicates network performance threshold notification is received.

Table 10-13 OCNWDAF_NETWORK_PERFORMANCE_PREDICTION_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3005
Description	Indicates network performance prediction notification is received.

Table 10-14 OCNWDAF_NF_LOAD_STATISTICS_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3006
Description	Indicates NF load statistics notification is received.

Table 10-15 OCNWDAF_NF_LOAD_THRESHOLD_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3007
Description	Indicates NF load threshold notification is received.

Table 10-16 OCNWDAF_NF_LOAD_PREDICTION_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3008
Description	Indicates NF load prediction notification is received.

Table 10-17 OCNWDAF_SLICE_LOAD_STATISTICS_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3009
Description	Indicates slice load statistics notification is received.

Table 10-18 OCNWDAF_SLICE_LOAD_THRESHOLD_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3010
Description	Indicates slice load threshold notification is received.

Table 10-19 OCNWDAF_SLICE_LOAD_PREDICTION_NOTIFICATION

Field	Details
Severity	Info
OID to be appended	3011
Description	Indicates slice load prediction notification is received.

ML Model Alerts

This section lists the ML model alerts:

Table 10-20 OCNWDAF_MODEL_CREATION_FAILURE

Field	Details
Severity	Critical
OID to be appended	4000
Description	Indicates an issue in ML model creation.

Table 10-21 OCNWDAF_MODEL_CREATION_SUCCESS

Field	Details
Severity	Info
OID to be appended	4001
Description	Indicates ML model is successfully created.

Data Collection Alerts

This section lists the data collection alerts:

Table 10-22 PRESENCE_IN_AOI_REPORT_RECEIVED

Field	Details
Severity	Info
OID to be appended	5000
Description	Indicates Presence in AOI report is successfully received.

Table 10-23 LOCATION_REPORT_RECEIVED

Field	Details
Severity	Info
OID to be appended	5001
Description	Indicates location report is successfully received.

Table 10-24 UES_IN_AREA_REPORT_RECEIVED

Field	Details
Severity	Info
OID to be appended	5002
Description	Indicates UEs in area report is successfully received.

Table 10-25 NF_LOAD_REPORT_RECEIVED

Field	Details
Severity	Info
OID to be appended	5003
Description	Indicates NF load report is successfully received.

Table 10-26 SMF_SES_EST_REPORT_RECEIVED

Field	Details
Severity	Info
OID to be appended	5004
Description	Indicates a SMF session established report is successfully received.

Table 10-27 SMF_SES_REL_REPORT_RECEIVED

Field	Details
Severity	Info
OID to be appended	5005
Description	Indicates a SMF session released report is successfully received.

Table 10-28 KAFKA_SOURCED_REPORT_RECEIVED

Field	Details
Severity	Info
OID to be appended	5006
Description	Indicates Kafka sourced report is successfully received.

Operation Alerts

This section lists the operational alerts:

Table 10-29 OCN_NWDAF_SVC_HIGH_CPU_LOAD

Field	Details
Severity	Critical
OID to be appended	6000
Description	Verifies if the CPU usage of a particular service is exceeding 90%.

Table 10-30 OCN_NWDAF_SVC_HIGH_JVM_MEMORY_USAGE

Field	Details
Severity	Critical
OID to be appended	6001
Description	Verifies if the percentage of heap memory used by a specific JVM instance/service is exceeding 90% over a one minute duration.

Table 10-31 OCN_NWDAF_SVC_NOT_RUNNING_ALERT

Field	Details
Severity	Critical
OID to be appended	6002
Description	Verifies if there are no instances of the specified service running in the specified Kubernetes namespace or if all instances of the service are not healthy.

10.2 System Level Alerts

This section lists the system level alerts.

OCN_NWDAF_ANALYTICS_HIGH_CPU_LOAD

Table 10-32 OCN_NWDAF_ANALYTICS_HIGH_CPU_LOAD

Field	Details
Description	CPU load is high at the pod where the microservice is running.
Affected Functions	All
Cause	CPU load is more than 80% of the allocated resources.

OCN_NWDAF_COMMUNICATION_HIGH_CPU_LOAD

Table 10-33 OCN_NWDAF_COMMUNICATION_HIGH_CPU_LOAD

Field	Details
Description	CPU load is high at the pod where the microservice is running.
Affected Functions	All
Cause	CPU load is more than 80% of the allocated resources.

OCN_NWDAF_CONFIGURATION_SERVICE_HIGH_CPU_LOAD

Table 10-34 OCN_NWDAF_CONFIGURATION_SERVICE_HIGH_CPU_LOAD

Field	Details
Description	CPU load is high at the pod where the microservice is running.
Affected Functions	All
Cause	CPU load is more than 80% of the allocated resources.

OCN_NWDAF_DATA_COLLECTION_HIGH_CPU_LOAD

Table 10-35 OCN_NWDAF_DATA_COLLECTION_HIGH_CPU_LOAD

Field	Details
Description	CPU load is high at the pod where the microservice is running.
Affected Functions	All
Cause	CPU load is more than 80% of the allocated resources.

OCN_NWDAF_GATEWAY_HIGH_CPU_LOAD

Table 10-36 OCN_NWDAF_GATEWAY_HIGH_CPU_LOAD

Field	Details
Description	CPU load is high at the pod where the microservice is running.
Affected Functions	All
Cause	CPU load is more than 80% of the allocated resources.

OCN_NWDAF_MTLF_HIGH_CPU_LOAD

Table 10-37 OCN_NWDAF_MTLF_HIGH_CPU_LOAD

Field	Details
Description	CPU load is high at the pod where the microservice is running.
Affected Functions	All
Cause	CPU load is more than 80% of the allocated resources.

OCN_NWDAF_SUBSCRIPTION_HIGH_CPU_LOAD

Table 10-38 OCN_NWDAF_SUBSCRIPTION_HIGH_CPU_LOAD

Field	Details
Description	CPU load is high at the pod where the microservice is running.
Affected Functions	All
Cause	CPU load is more than 80% of the allocated resources.

OCN_NWDAF_ANALYTICS_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-39 OCN_NWDAF_ANALYTICS_HIGH_JVM_HEAP_MEMORY_USAGE

Field	Details
Description	The average of the memory heap usage is high.
Affected Functions	All
Cause	The heap memory usage is more than 80%.

OCN_NWDAF_COMMUNICATION_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-40 OCN_NWDAF_COMMUNICATION_HIGH_JVM_HEAP_MEMORY_USAGE

Field	Details
Description	The average of the memory heap usage is high.
Affected Functions	All
Cause	The heap memory usage is more than 80%.

OCN_NWDAF_CONFIGURATION_SERVICE_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-41 OCN_NWDAF_CONFIGURATION_SERVICE_HIGH_JVM_HEAP_MEMORY_USAGE

Field	Details
Description	The average of the memory heap usage is high.
Affected Functions	All
Cause	The heap memory usage is more than 80%.

OCN_NWDAF_DATA_COLLECTION_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-42 OCN_NWDAF_DATA_COLLECTION_HIGH_JVM_HEAP_MEMORY_USAGE

Field	Details
Description	The average of the memory heap usage is high.
Affected Functions	All
Cause	The heap memory usage is more than 80%.

OCN_NWDAF_GATEWAY_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-43 OCN_NWDAF_GATEWAY_HIGH_JVM_HEAP_MEMORY_USAGE

Field	Details
Description	The average of the memory heap usage is high.
Affected Functions	All
Cause	The heap memory usage is more than 80%.

OCN_NWDAF_MTLF_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-44 OCN_NWDAF_MTLF_HIGH_JVM_HEAP_MEMORY_USAGE

Field	Details
Description	The average of the memory heap usage is high.
Affected Functions	All
Cause	The heap memory usage is more than 80%.

OCN_NWDAF_SUBSCRIPTION_HIGH_JVM_HEAP_MEMORY_USAGE

Table 10-45 OCN_NWDAF_SUBSCRIPTION_HIGH_JVM_HEAP_MEMORY_USAGE

Field	Details
Description	The average of the memory heap usage is high.
Affected Functions	All
Cause	The heap memory usage is more than 80%.

10.3 Application Level Alerts

This section lists the application level alerts.

OCN_NWDAF_ANALYTICS_NOT_RUNNING

Table 10-46 OCN_NWDAF_ANALYTICS_NOT_RUNNING

Field	Details
Description	The microservice is not available or not reachable.
Cause	Microservice ocn-nwdaf-analytics is down.

OCN_NWDAF_COMMUNICATION_NOT_RUNNING

Table 10-47 OCN_NWDAF_COMMUNICATION_NOT_RUNNING

Field	Details
Description	The microservice is not available or not reachable.
Cause	Microservice ocn-nwdaf-communication is down.

OCN_NWDAF_CONFIGURATION_SERVICE_NOT_RUNNING

Table 10-48 OCN_NWDAF_CONFIGURATION_SERVICE_NOT_RUNNING

Field	Details
Description	The microservice is not available or not reachable.
Cause	Microservice ocn-nwdaf-configuration-service is down.

OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING

Table 10-49 OCN_NWDAF_DATA_COLLECTION_NOT_RUNNING

Field	Details
Description	The microservice is not available or not reachable.
Cause	Microservice ocn-nwdaf-data-collection is down.

OCN_NWDAF_GATEWAY_NOT_RUNNING

Table 10-50 OCN_NWDAF_GATEWAY_NOT_RUNNING

Field	Details
Description	The microservice is not available or not reachable.
Cause	Microservice ocn-nwdaf-gateway is down.

OCN_NWDAF_MTLF_NOT_RUNNING

Table 10-51 OCN_NWDAF_MTLF_NOT_RUNNING

Field	Details
Description	The microservice is not available or not reachable.
Cause	Microservice ocn-nwdaf-mtlf is down.

OCN_NWDAF_SUBSCRIPTION_NOT_RUNNING

Table 10-52 OCN_NWDAF_SUBSCRIPTION_NOT_RUNNING

Field	Details
Description	The microservice is not available or not reachable.
Cause	Microservice ocn-nwdaf-subscription is down.

HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE

Table 10-53 HIGH_ABNORMAL_BEHAVIOUR_REQUEST_RATE

Field	Details
Description	The number of requests received per second is high.
Cause	Traffic is high, above 1000 requests per second.
URI Endpoint	`nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR`
Affected Functions	ABNORMAL_BEHAVIOUR

HIGH_UE_MOBILITY_REQUEST_RATE

Table 10-54 HIGH_UE_MOBILITY_REQUEST_RATE

Field	Details
Description	The number of requests received per second is high.
Cause	Traffic is high, above 1000 requests per second.
URI Endpoint	`nnwdaf-analyticsinfo/v1/analytics?event-id=UE_MOBILITY`
Affected Functions	UE_MOBILITY

HIGH_EVENT_SUBSCRIPTION_REQUEST_RATE

Table 10-55 HIGH_EVENT_SUBSCRIPTION_REQUEST_RATE

Field	Details
Description	The number of requests received per second is high.
Cause	Traffic is high, above 1000 requests per second.
URI Endpoint	`nnwdaf-eventssubscription/v1/subscriptions`
Affected Functions	UE_MOBILITY, SLICE_LOAD_LEVEL, ABNORMAL_BEHAVIOUR

HIGH_ABNORMAL_BEHAVIOUR_REQUEST_FAILURE_RATE

Table 10-56 HIGH_ABNORMAL_BEHAVIOUR_REQUEST_FAILURE_RATE

Field	Details
Description	The number of requests failing per second is high.
Cause	The request failure rate is more than 70%.
URI Endpoint	`nnwdaf-analyticsinfo/v1/analytics?event-id=ABNORMAL_BEHAVIOUR`
Affected Functions	ABNORMAL_BEHAVIOUR

HIGH_UE_MOBILITY_REQUEST_FAILURE_RATE

Table 10-57 HIGH_UE_MOBILITY_REQUEST_FAILURE_RATE

Field	Details
Description	The number of requests failing per second is high.
Cause	The request failure rate is more than 70%.
URI Endpoint	`nnwdaf-analyticsinfo/v1/analytics?event-id=UE_MOBILITY`
Affected Functions	UE_MOBILITY

HIGH_EVENT_SUBSCRIPTION_REQUEST_FAILURE_RATE

Table 10-58 HIGH_EVENT_SUBSCRIPTION_REQUEST_FAILURE_RATE

Field	Details
Description	The number of requests failing per second is high.
Cause	The request failure rate is more than 70%.
URI Endpoint	`nnwdaf-eventssubscription/v1/subscriptions`
Affected Functions	UE_MOBILITY, SLICE_LOAD_LEVEL, ABNORMAL_BEHAVIOUR

10.4 OCNWDAF KPIs

This section provides information about Key Performance Indicators (KPIs) used for Oracle Communications Networks Data Analytics Function (OCNWDAF).

OCNWDAF KPIs are listed below:

Table 10-59 Frontend Reports Received Total

KPI Detail	Total number of reports received on Front End.
Metric Used for the KPI (CNE)	PromQL: total_fe_reports_recieved_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:total_fe_reports_recieved_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Source NF: AMF,SMF,NRF,OAM

Table 10-60 Frontend Bytes Received Total

KPI Detail	Total number of bytes received on Front End.
Metric Used for the KPI (CNE)	PromQL: fe_bytes_recieved_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:fe_bytes_recieved_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Source NF: AMF,SMF,NRF,OAM

Table 10-61 Kafka Sourced Reports Received Total

KPI Detail	Total number of reports received by NWDAF Front End through Kafka.
Metric Used for the KPI (CNE)	PromQL: total_kafka_sourced_reports_recieved_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:total_kafka_sourced_reports_recieved_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Source NF: OAM

Table 10-62 Total Kafka Bytes Received

KPI Detail	Total number of Kafka bytes received.
Metric Used for the KPI (CNE)	PromQL: total_kafka_bytes_recieved{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:total_kafka_bytes_recieved[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Source NF: AMF,SMF,NRF,OAM Report Type: NW_PERF_OAM_REPORT, QOS_OAM_REPORT, UDC_OAM_REPORT

Table 10-63 Nwdaf Subscriptions Created Total

KPI Detail	Total number of subscriptions created.
Metric Used for the KPI (CNE)	PromQL: nwdaf_subscriptions_created_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:nwdaf_subscriptions_created_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR. Notification Method: DESCRIPTIVE, PREDICTIVE, THRSHOLDING.

Table 10-64 Nwdaf Subscriptions Accepted Total

KPI Detail	Total number of subscriptions accepted out of the subscriptions created.
Metric Used for the KPI (CNE)	PromQL: nwdaf_subscriptions_accepted_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:nwdaf_subscriptions_accepted_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR. Notification Method: DESCRIPTIVE, PREDICTIVE, THRSHOLDING.

Table 10-65 Nwdaf Subscriptions Data Reports Sent

KPI Detail	Total number of reports or notifications sent.
Metric Used for the KPI (CNE)	PromQL: nwdaf_subscriptions_data_reports_sent_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:nwdaf_subscriptions_data_reports_sent_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR. Notification Method: DESCRIPTIVE, PREDICTIVE, THRSHOLDING.

Table 10-66 Nwdaf Subscriptions Threshold Reports Sent

KPI Detail	Total number of threshold reports or notifications sent out of the total reports.
Metric Used for the KPI (CNE)	PromQL: nwdaf_subscriptions_threshold_reports_sent_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:nwdaf_subscriptions_threshold_reports_sent_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.

Table 10-67 Nwdaf Subscriptions Prediction Reports Sent

KPI Detail	Total number of predictive reports or notifications sent out of the total reports.
Metric Used for the KPI (CNE)	PromQL: nwdaf_subscriptions_prediction_reports_sent_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:nwdaf_subscriptions_prediction_reports_sent_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.

Table 10-68 Analyticsinfo Request Received Total

KPI Detail	Total number of analytics information requests received.
Metric Used for the KPI (CNE)	PromQL: analyticsinfo_request_received_total{namespace="$NAMESPACE"}
Metric Used for the KPI (OCI)	MQL:analyticsinfo_request_received_total[5m]{k8Namespace="$NAMESPACE"}.count()
Service Operation	NA
Response Code	NA
Tags and Values	Event Name: SLICE_LOAD_LEVEL, NETWORK_PERFORMANCE, NF_LOAD, ABNORMAL_BEHAVIOUR.
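
All the KPI tables above follow the same two query patterns, one for CNE (PromQL) and one for OCI (MQL). As a convenience, the Python sketch below builds both query strings for a given metric and namespace. The function name is illustrative, and the namespace value ocn-nwdaf is only an example taken from the alert configuration procedure.

```python
def kpi_queries(metric, namespace, window="5m"):
    """Return the PromQL (CNE) and MQL (OCI) query strings for one KPI metric."""
    promql = f'{metric}{{namespace="{namespace}"}}'
    mql = f'{metric}[{window}]{{k8Namespace="{namespace}"}}.count()'
    return promql, mql


promql, mql = kpi_queries("nwdaf_subscriptions_created_total", "ocn-nwdaf")
print(promql)
print(mql)
```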