4 Troubleshooting OCNWDAF

This chapter provides information to troubleshoot the common errors which can be encountered during the preinstallation, installation, upgrade, and rollback procedures of OCNWDAF.

4.1 Generic Checklist

The following sections provide a generic checklist of troubleshooting tips.

Deployment related tips

Perform the following checks after the deployment:

  • Are OCNWDAF deployment, pods, and services created?

    Are OCNWDAF deployment, pods, and services running and available?

    Run the following command:
    # kubectl -n <namespace> get deployments,pods,svc
    Inspect the output and check the following columns:
    • AVAILABLE of deployment
    • READY, STATUS, and RESTARTS of a pod
    • PORT(S) of service
  • Check if the microservices can access each other via REST interface.

    Run the following command:

    # kubectl -n <namespace> exec <pod name> -- curl <uri>
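For example, assuming a deployment in a namespace named ocnwdaf (the namespace, pod name, service name, and URI below are illustrative placeholders only), these checks might look like:

# kubectl -n ocnwdaf get deployments,pods,svc
# kubectl -n ocnwdaf exec <nwdaf pod name> -- curl -v http://<nwdaf service name>:<service port>/<resource uri>

A healthy deployment shows all pods with STATUS Running, READY counts such as 1/1, and a low RESTARTS count.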

Application related tips

Run the following command to check the application logs and look for exceptions:

# kubectl -n <namespace> logs -f <pod name>

You can use '-f' to follow the logs or 'grep' to search for a specific pattern in the log output.
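For example, to follow the logs of a single pod and keep only the lines that mention errors or exceptions (the namespace and pod name are placeholders), you can run something like:

# kubectl -n <namespace> logs -f <pod name> | grep -iE "error|exception"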

4.2 Deployment Related Issues

This section describes the most common deployment related issues and their resolution steps. It is recommended to perform the resolution steps provided in this guide. If the issue still persists, then contact My Oracle Support.

4.2.1 Installation

This section describes the most common installation related issues and their resolution steps.

4.2.1.1 Pod Creation Failure

A pod creation can fail due to various reasons. Some of the possible scenarios are as follows:

Verifying Pod Image Correctness

To verify pod image:

  • Check whether any of the pods is in the ImagePullBackOff state.
  • Check if the image names used for all the pods are correct. Verify the image names and versions in the OCNWDAF installation file. For more information about the custom value file, see Oracle Communications Networks Data Analytics Function Installation and Fault Recovery Guide.

Verifying Resource Allocation Failure

To verify any resource allocation failure:

  • Run the following command to verify whether any pod is in the pending state.

    kubectl describe pod <nwdaf-drservice pod id> -n <ocnwdaf-namespace>

  • Verify whether any warning on insufficient CPU exists in the describe output of the respective pod. If it exists, it means there are insufficient CPUs for the pods to start. Address this hardware issue.
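A quick way to apply these checks (a sketch only; the namespace is a placeholder) is to list the pods stuck in the Pending state and then inspect the scheduling events of one of them:

    kubectl -n <ocnwdaf-namespace> get pods --field-selector=status.phase=Pending
    kubectl -n <ocnwdaf-namespace> describe pod <pending-pod-name> | grep -iA10 "Events"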

Verifying Resource Allocation Issues on Webscale Environment

The Webscale environment has OpenShift Container Platform installed. There can be cases where:

  • Pods do not scale after you run the installation command and the installation fails with a timeout error. In this case, check for preinstall hook failures. Run the oc get job command to list the jobs created by the hooks. Describe the job for which the pods are not getting scaled and check if there are quota limit exceeded errors for CPU or memory.
  • Any of the actual microservice pods do not scale after the hooks are completed. In this case, run the oc get rs command to get the list of ReplicaSets created for the NF deployment. Then, describe the ReplicaSet for which the pods are not getting scaled and check for resource quota limit exceeded errors for CPU or memory.
  • Installation times out after all the microservice pods are scaled with the expected number of replicas. In this case, check for postinstall hook failures. Run the oc get job command to get the postinstall jobs, describe the job for which the pods are not getting scaled, and check if there are quota limit exceeded errors for CPU or memory.
  • The resource quota is exceeded beyond its limits.
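The checks above can be performed with commands similar to the following (a sketch only; the namespace, job, and ReplicaSet names are placeholders). In the describe output, look for events that report exceeded resource quota for CPU or memory:

    oc get job -n <ocnwdaf-namespace>
    oc describe job <job-name> -n <ocnwdaf-namespace>
    oc get rs -n <ocnwdaf-namespace>
    oc describe rs <replicaset-name> -n <ocnwdaf-namespace>
    oc describe resourcequota -n <ocnwdaf-namespace>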

Verifying Resources Assigned to Previous Installation

If a previous installation's uninstall procedure was not successful and the uninstall process was forced, it is possible that some resources are still assigned to the previous installation. This can be detected by running the following command:

kubectl -n <namespace> describe pod <podname>

While searching through the events, if you detect a message similar to the following, it indicates that there are resources still assigned to the previous installation and they must be purged.

0/n nodes are available: n pods has unbound immediate PersistentVolumeClaims
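To locate such leftover resources (a sketch only; the namespace and claim names are placeholders), you can list the PersistentVolumeClaims and PersistentVolumes and delete the ones that still belong to the previous installation:

kubectl -n <namespace> get pvc
kubectl get pv
kubectl -n <namespace> delete pvc <old-pvc-name>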

4.2.1.2 Pod Startup Failure

Follow the guidelines below to debug pod startup failures and liveness check issues:
  • If dr-service, diameter-proxy, and diam-gateway services are stuck in the Init state, then the reason could be that config-server is not yet up. A sample log on these services is as follows:
    "Config Server is Not yet Up, Wait For config server to be up."

    To resolve this, either check why the config-server is not up or, if the config-server is not required, disable it.

  • If the notify and on-demand migration service is stuck in the Init state, then the reason could be the dr-service is not yet up. A sample log on these services is as follows:
    "DR Service is Not yet Up, Wait For dr service to be up."

    To resolve this, check for failures on dr-service.
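To see which init step a pod is stuck on and why (a sketch only; the pod and init container names are placeholders), check the pod status and the logs of the relevant init container. A pod waiting on an init container shows a STATUS such as Init:0/1:

    kubectl -n <namespace> get pod <pod-name>
    kubectl -n <namespace> logs <pod-name> -c <init-container-name>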

4.2.1.3 NRF Registration Failure

The OCNWDAF registration with NRF may fail due to various reasons. Some of the possible scenarios are as follows:
  • Confirm whether registration was successful from the nrf-client-service pod.
  • Check the ocnwdaf-nrf-client-nfmanagement logs. If the log has "OCNWDAF is Unregistered" then:
    • Check if all the services mentioned under allorudr/slf (depending on the OCNWDAF mode) in the installation file have the same spelling as the service names and are enabled.
    • Once all services are up, OCNWDAF must register with NRF.
  • If you see a log for SERVICE_UNAVAILABLE(503), check if the primary and secondary NRF configurations (primaryNrfApiRoot/secondaryNrfApiRoot) are correct and that the NRF instances are up and running.
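A quick way to check the registration status and any 503 responses from the logs (the namespace and pod name are placeholders) is:

    kubectl -n <namespace> logs <ocnwdaf-nrf-client-nfmanagement-pod> | grep -i "registered"
    kubectl -n <namespace> logs <ocnwdaf-nrf-client-nfmanagement-pod> | grep -i "503"
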
4.2.1.4 Helm Install Failure

This section describes the various scenarios in which helm install might fail. Following are some of the scenarios:

4.2.1.4.1 Incorrect image name in ocnwdaf-custom-values file

Problem

helm install might fail if an incorrect image name is provided in the ocnwdaf-custom-values.yaml file.

Error Code/Error Message

When kubectl get pods -n <ocnwdaf_namespace> is performed, the status of the pods might be ImagePullBackOff or ErrImagePull.

For example:

NAME                                                             READY   STATUS                       RESTARTS   AGE
cap4c-model-controller-deploy-779cbdcf8f-wscf9                   1/1     Running                      0          28d
cap4c-model-executor-deploy-68b498c765-rpwz8                     0/1     ImagePullBackOff             0          27d
cap4c-stream-analytics-deploy-744878569-xn4wb                    0/1     ImagePullBackOff             0          27d
kafka-sts-0                                                      1/1     Running                      1          95d
kafka-sts-1                                                      1/1     Running                      1          95d
kafka-sts-2                                                      1/1     Running                      1          95d
keycloak-pod                                                     1/1     Running                      0          3d19h
mysql-pod                                                        1/1     Running                      1          95d
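To identify exactly which image is failing to pull, describe the affected pod and filter for image related events; the pod name below is taken from the sample output above and is illustrative only:

kubectl -n <ocnwdaf_namespace> describe pod cap4c-model-executor-deploy-68b498c765-rpwz8 | grep -iE "image|failed"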

Solution

Perform the following steps to verify and correct the image name:
  1. Check that the ocnwdaf-custom-values.yaml file has the release-specific image names and tags.
    vi ocnwdaf-custom-values-<release-number>
    For details on the OCNWDAF images, see "Customizing OCNWDAF" in Oracle Communications Cloud Native Core Networks Data Analytics Function Installation Guide.
  2. Edit the ocnwdaf-custom-values file if the release-specific image names and tags must be modified.
  3. Save the file.
  4. Run the following command to delete the deployment:
    helm delete --purge <release_name>
    Sample command:
    helm delete --purge ocnwdaf
  5. To verify the deletion, see the "Verifying Uninstallation" section in Oracle Communications Networks Data Analytics Function Installation and Fault Recovery Guide.
  6. Run the helm install command. For the helm install command, see the "Customizing OCNWDAF" section in Oracle Communications Networks Data Analytics Function Installation and Fault Recovery Guide.
  7. Run kubectl get pods -n <ocnwdaf_namespace> to verify if all the pods are in Running state.

    For example:

    $ kubectl get pods -n ocnwdaf

    NAME                                                             READY   STATUS             RESTARTS          AGE
    cap4c-model-controller-deploy-b5f8b48d7-6h58w                    1/1     Running            0                 21h
    cap4c-model-executor-deploy-575b448467-j8tdd                     1/1     Running            1 (4d ago)        6d15h
    cap4c-stream-analytics-deploy-79ffd7fb65-5lzr5                   1/1     Running            0                 17h
    kafka-sts-0                                                      1/1     Running            0                 60d
    keycloak-pod                                                     1/1     Running            0                 3d17h
    mysql-pod                                                        1/1     Running            0                 60d
    nwdaf-cap4c-kafka-ui-pod                                         1/1     Running            0                 57d
    nwdaf-cap4c-scheduler-service-deploy-548c7948d4-64s85            1/1     Running            0                 6d14h
    nwdaf-cap4c-spring-cloud-config-server-deploy-565dd8f7d6-cxdwh   1/1     Running            0                 19d
    nwdaf-portal-deploy-55488c885-rgq77                              1/1     Running            0                 20h
    nwdaf-portal-service-deploy-8dc89dd9f-z2964                      1/1     Running            0                 20h
    ocn-nwdaf-analytics-info-deploy-f4585c4b-zbf5d                   1/1     Running            0                 3d16h
    ocn-nwdaf-communication-service-deploy-7bf75fbb7c-4qx9s          1/1     Running            3 (3d15h ago)     3d15h
    ocn-nwdaf-configuration-service-deploy-d87b66c55-7ttcc           1/1     Running            0                 3d16h
    ocn-nwdaf-data-collection-service-deploy-5ffcb86488-l9r9l        1/1     Running            0                 24h
    ocn-nwdaf-gateway-service-deploy-654cbc6475-h95tw                1/1     Running            0                 3d15h
    ocn-nwdaf-mtlf-service-deploy-545c8b445d-kqzfz                   1/1     Running            0                 3d15h
    ocn-nwdaf-subscription-service-deploy-f7959fc76-wfcxm            1/1     Running            0                 19h
    redis-master-pod                                                 1/1     Running            0                 60d
    redis-slave-sts-0                                                1/1     Running            0                 60d
    zookeper-sts-0                                                   1/1     Running            0                 60d

4.2.1.4.2 Docker registry is configured incorrectly

Problem

helm install might fail if the docker registry is not configured in all primary and secondary nodes.

Error Code/Error Message

When kubectl get pods -n <ocnwdaf_namespace> is performed, the status of the pods might be ImagePullBackOff or ErrImagePull.

For example:

$ kubectl get pods -n ocnwdaf

Solution

Configure docker registry on all primary and secondary nodes. For more information on configuring the docker registry, see Oracle Communications Cloud Native Environment Installation Guide.
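To confirm that a node can reach the registry, you can also try pulling one of the OCNWDAF images manually from that node. The registry host and image path below are placeholders, and the command depends on the container runtime in use (docker or podman):

docker pull <customer-registry>/<image-name>:<image-tag>
podman pull <customer-registry>/<image-name>:<image-tag>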

4.2.1.4.3 Continuous Restart of Pods

Problem

helm install might fail if the MySQL primary and secondary hosts are not configured properly in ocnwdaf-custom-values.yaml.

Error Code/Error Message

When kubectl get pods -n <ocnwdaf_namespace> is performed, the pods restart count increases continuously.

For example:

$ kubectl get pods -n ocnwdaf

Solution

MySQL server(s) may not be configured properly according to the preinstallation steps. For configuring MySQL servers, see the "Configuring Database, Creating Users, and Granting Permissions" section in Oracle Communications Cloud Native Core Networks Data Analytics Function Installation Guide.
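Before reinstalling, it can also help to confirm the MySQL host values configured in the custom values file and that they resolve from inside the cluster. The commands below are a sketch only; the file name, namespace, and host are placeholders, and the busybox image must be available in your registry:

grep -i -A 2 "mysql" ocnwdaf-custom-values.yaml
kubectl -n <ocnwdaf_namespace> run mysql-host-check --rm -it --restart=Never --image=busybox -- nslookup <mysql-primary-host>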

4.2.1.5 Custom Value File Parse Failure

This section explains the troubleshooting procedure for failures while parsing the ocnwdaf-custom-values.yaml file.

Problem

Unable to parse ocnwdaf-custom-values-x.x.x.yaml while running helm install.

Error Code/Error Message

Error: failed to parse ocnwdaf-custom-values-x.x.x.yaml: error converting YAML to JSON: yaml

Symptom

If the above error is received while creating the ocnwdaf-custom-values-x.x.x.yaml file, it means that the file is not created properly. The tree structure may not have been followed, or the file may contain tab characters.

Solution

Follow the procedure mentioned below:
  1. Download the latest OCNWDAF templates zip file from MOS. For more information, see the "Downloading OCNWDAF Package" section in Oracle Communications Cloud Native Core Networks Data Analytics Function Installation Guide.
  2. Follow the steps mentioned in the "Installation Tasks" section in Oracle Communications Cloud Native Core Networks Data Analytics Function Installation Guide.
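In addition to the steps above, a quick sanity check can catch tab characters and basic YAML or template errors before rerunning helm install. The chart directory below is a placeholder, and the grep check assumes GNU grep:

    grep -nP '\t' ocnwdaf-custom-values-x.x.x.yaml
    helm lint <ocnwdaf-chart-directory> -f ocnwdaf-custom-values-x.x.x.yaml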

4.2.2 Post Installation

4.2.2.1 Helm Test Error Scenario

Following are the error scenarios that may be identified using helm test.

  1. Run the following command to get the Helm Test pod name:
    kubectl get pods -n <deployment-namespace>
  2. When a helm test is performed, a new helm test pod is created. Check for the Helm Test pod that is in an error state.
  3. Get the logs using the following command:
    kubectl logs <podname> -n <namespace>
    Example:
    kubectl logs <helm_test_pod> -n ocnwdaf

    For further assistance, collect the logs and contact MOS.

4.2.2.2 Purge Kafka Topics for New Installation

If the Kafka topics contained messages in a previous OCNWDAF installation, the topics should be retained in the new installation but the messages should not. Follow the procedure below to purge the messages from the Kafka topics:

  1. Connect to Kafka pod in your Kubernetes environment, run the command:
    kubectl -n <namespace> exec -it <podname> -- bash
  2. Change directory, move to the directory that contains the binary files:
    cd kafka_2.13-3.1.0/bin/
  3. Obtain the list of topics, run the command:
    kafka-topics.sh --list --bootstrap-server localhost:9092
  4. Delete each topic (repeat this step for each topic):
    kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic <topicname>

On completion of this procedure, the Kafka topics exist, but the messages do not exist.
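If there are many topics, the deletion in step 4 can be scripted. The loop below is a sketch that runs inside the Kafka pod, reuses the same bootstrap server, and skips the Kafka internal topics (names starting with a double underscore) as a precaution:

    for topic in $(kafka-topics.sh --list --bootstrap-server localhost:9092 | grep -v '^__'); do
      kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic "$topic"
    done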

Note:

After every installation, it is recommended to purge the topics before uninstalling them.

4.3 Database Related Issues

This section describes the most common database related issues and their resolution steps. It is recommended to perform the resolution steps provided in this guide. If the issue still persists, then contact My Oracle Support.

4.3.1 Debugging MySQL DB Errors

If you are facing issues related to subscription creation, follow the procedure below to log in to the MySQL DB:

Note:

Once the MySQL cluster is created, the cndbtier_install container generates the password and stores it in the occne-mysqlndb-root-secret secret.
  1. Retrieve the MySQL root password from occne-mysqlndb-root-secret secret.
    Run the command:
    $ kubectl -n occne-cndbtier get secret occne-mysqlndb-root-secret -o jsonpath='{.data}'
    map[mysql_root_password:TmV4dEdlbkNuZQ==]
  2. Decode the encoded output received as an output of the previous step to get the actual password:
    $ echo TmV4dEdlbkNuZQ== | base64 --decode
    NextGenCne
  3. Login to MySQL pod, run the command:
    $ kubectl -n occnepsa exec -it ndbmysqld-0 -- bash

    Note:

    Default container name is: mysqlndbcluster.

    Run the command kubectl describe pod/ndbmysqld-0 -n occnepsa to see all the containers in this pod.

  4. Login using MySQL client as the root user, run the command:
    $ mysql -h 127.0.0.1 -uroot -p
  5. Enter the current root password for the MySQL root user, obtained in step 2.
  6. To debug each microservice, perform the following steps:
    • For the ocn-nwdaf-subscription service, run the following SQL commands:
      use <dbName>;
      use nwdaf_subscription;
      select * from nwdaf_subscription;
      select * from amf_ue_event_subscription;
      select * from smf_ue_event_subscription;
    • For the ocn-nrf-simulator service, run the following SQL commands:
      use <dbName>;
      use nrf;
      select * from profile;
    • For the ocn-smf-simulator service, run the following SQL commands:
      use <dbName>;
      use nrf;
      select * from smf_event_subscription;
    • For the ocn-amf-simulator service, run the following SQL commands:
      use <dbName>;
      use nrf;
      select * from amf_event_subscription;
    • For the ocn-nwdaf-data-collection service, run the following SQL commands:
      use <dbName>;
      use nwdaf_data_collection;
      select * from amf_event_notification_report_list;
      select * from amf_ue_event_report;               
      select * from cap4c_ue_notification;            
      select * from slice_load_level_notification;     
      select * from smf_event_notification_report_list;
      select * from smf_ue_event_report;               
      select * from ue_mobility_notification;
    • For the ocn-nwdaf-configuration-service service, run the following SQL commands:
      use <dbName>;
      use nwdaf_configuration_service;
      select * from slice;
      select * from tracking_area;
      select * from slice_tracking_area;            
      select * from cell;  
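The secret retrieval and decoding in steps 1 and 2 can also be combined into a single command (same secret and namespace as above):

    kubectl -n occne-cndbtier get secret occne-mysqlndb-root-secret -o jsonpath='{.data.mysql_root_password}' | base64 --decode

Inside the MySQL client, running show tables; after selecting a schema confirms that the tables listed above exist.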

4.4 Apache Kafka Related Issues

To debug issues related to Apache Kafka pipelines (such as being unable to read messages from or write messages to the pipeline), perform the following steps:

  1. Get the Kafka pods, run the command:
    kubectl -n performance-ns get pods -o wide | grep "kafka"
  2. Select any pod and access the pod using the command:
    kubectl -n performance-ns exec -it kafka-sts-0 -- bash
  3. Move to the directory containing the binary files, run the command:
    cd kafka_2.13-3.1.0/bin/
  4. Obtain the list of topics, run the command:
    kafka-topics.sh --list --bootstrap-server localhost:9092
  5. For each topic, run the following command:
    kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic <topic-name>
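    For a quick, bounded look at the contents of a topic, the consumer can also be started from the beginning of the topic and limited to a few messages, for example:
    kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic <topic-name> --from-beginning --max-messages 10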

4.5 CAP4C Related Issues

CAP4C comprises the following services:
  • cap4c-model-controller
  • cap4c-model-executor
  • kafka
  • mysql-pod

To obtain more information on the service pods, follow the steps listed below:

  1. Each of these services is deployed as a pod in Kubernetes. To find the status of the pods in Kubernetes, run the following command:

    $ kubectl get pods -n <namespace>

    Sample output:

    NAME                                                 READY   STATUS    RESTARTS   AGE
    cap4c-model-controller-deploy-779cbdcf8f-w2pfh       1/1     Running   0          4d8h
    cap4c-model-executor-deploy-f9c96db54-ttnhd          1/1     Running   0          4d5h
    cap4c-stream-analytics-deploy-744878569-5xr2w        1/1     Running   0          4d8h

  2. To verify the pod information, print the details of each pod:
    $ kubectl describe pod cap4c-model-controller-deploy-779cbdcf8f-w2pfh -n <namespace>

    Sample output:

    Name:         cap4c-model-controller-deploy-779cbdcf8f-w2pfh
    Namespace:    performance-ns
    Priority:     0
    Node:         sunstreaker-k8s-node-2/192.168.200.197
    Start Time:   Fri, 26 Aug 2022 15:31:39 +0000
    Labels:       app=cap4c-model-controller
                  pod-template-hash=779cbdcf8f
    Annotations:  cni.projectcalico.org/containerID: 480ca581a828184ccf6fabf7ec7cfb68920624f48d57148f6d93db4512bc5335
                  cni.projectcalico.org/podIP: 10.233.76.134/32
                  cni.projectcalico.org/podIPs: 10.233.76.134/32
                  kubernetes.io/psp: restricted
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
    Status:       Running
  3. List the service configuration for the pods, run the command:
    $ kubectl get svc -n <namespace>

    Sample output:

    NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
    cap4c-executor   ClusterIP   10.233.5.218   <none>        8888:32767/TCP   4d8h

4.6 Service Related Issues

This section describes the most common service related issues and their resolution steps. It is recommended to perform the resolution steps provided in this guide. If the issue still persists, then contact My Oracle Support.

4.6.1 Errors from Microservices

The OCNWDAF microservices are listed below:

  • ocn-nwdaf-subscription
  • ocn-nwdaf-data-collection
  • ocn-nwdaf-communication
  • ocn-nwdaf-configuration-service
  • ocn-nwdaf-analytics
  • ocn-nwdaf-gateway
  • ocn-nwdaf-mtlf
  • ocn-nrf-simulator
  • ocn-smf-simulator
  • ocn-amf-simulator
  • mesa-simulator

To debug microservice related errors, obtain the logs from the pods that are facing issues. Run the following commands for each microservice:

  1. Obtain the pod information, run the command:
    kubectl get pods -n <nameSpace> -o wide

    Sample output:

    Figure 4-1 Sample Output

  2. Obtain the log information for the pods, run the command:
    kubectl logs <podName> -n <nameSpace>
Sample commands:
  • kubectl logs ocn-nwdaf-subscription-84f8b74cc7-d7lk9 -n performance-ns
  • kubectl logs ocn-nwdaf-data-collection-57b948989c-xs7dq -n performance-ns
  • kubectl logs ocn-nwdaf-gateway-584577d8b7-f2xvd -n performance-ns
  • kubectl logs ocn-amf-simulator-584ccb8fd4-pcdn6 -n performance-ns
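If a pod is restarting, the logs of the previous container instance and a simple error filter are often useful as well; the pod and namespace names below follow the samples above and are illustrative only:
  • kubectl logs ocn-nwdaf-subscription-84f8b74cc7-d7lk9 -n performance-ns --previous
  • kubectl logs ocn-nwdaf-subscription-84f8b74cc7-d7lk9 -n performance-ns | grep -iE "error|exception"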