4 Troubleshooting NSSF

This section provides information about how to identify problems and a systematic approach to resolve the identified issues. It also includes a generic checklist to help identify the problem.

Note:

The performance and capacity of the NSSF system may vary based on the call model, Feature or Interface configuration, and underlying CNE and hardware environment.

4.1 Generic Checklist

The following sections provide a generic checklist of troubleshooting tips.

Deployment related tips

  • Are the NSSF deployments, pods, and services created, running, and available?

    Run the following command:
    # kubectl -n <namespace> get deployments,pods,svc
    

    Inspect the output and check the following columns:

    • AVAILABLE of deployment
    • READY, STATUS, and RESTARTS of pods
    • PORT(S) of service
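    For example, to list pods that are not fully running and to print per-container restart counts, you can use a field selector and custom columns (the namespace name is illustrative):

    # kubectl -n ocnssf get pods --field-selector=status.phase!=Running
    # kubectl -n ocnssf get pods -o custom-columns=NAME:.metadata.name,READY:.status.containerStatuses[*].ready,RESTARTS:.status.containerStatuses[*].restartCount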

  • Is the correct image used and are the correct environment variables set in the deployment? Run the following command:
    # kubectl -n <namespace> get deployment <deployment-name> -o yaml
    
    Inspect the output and check the environment variables and the image. For example:
    # kubectl -n nssf-svc get deployment ocnssf-nfregistration -o yaml
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "1"
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"ocnssf-nfregistration","namespace":"nssf-svc"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"ocnssf-nfregistration"}},"template":{"metadata":{"labels":{"app":"ocnssf-nfregistration"}},"spec":{"containers":[{"env":[{"name":"MYSQL_HOST","value":"mysql"},{"name":"MYSQL_PORT","value":"3306"},{"name":"MYSQL_DATABASE","value":"nssfdb"},{"name":"nssf_REGISTRATION_ENDPOINT","value":"ocnssf-nfregistration"},{"name":"nssf_SUBSCRIPTION_ENDPOINT","value":"ocnssf-nfsubscription"},{"name":"NF_HEARTBEAT","value":"120"},{"name":"DISC_VALIDITY_PERIOD","value":"3600"}],"image":"dsr-master0:5000/ocnssf-nfregistration:latest","imagePullPolicy":"Always","name":"ocnssf-nfregistration","ports":[{"containerPort":8080,"name":"server"}]}]}}}}
      creationTimestamp: 2018-08-27T15:45:59Z
      generation: 1
      name: ocnssf-nfregistration
      namespace: nssf-svc
      resourceVersion: "2336498"
      selfLink: /apis/extensions/v1beta1/namespaces/nssf-svc/deployments/ocnssf-nfregistration
      uid: 4b82fe89-aa10-11e8-95fd-fa163f20f9e2
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: ocnssf-nfregistration
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: ocnssf-nfregistration
        spec:
          containers:
          - env:
            - name: MYSQL_HOST
              value: mysql
            - name: MYSQL_PORT
              value: "3306"
            - name: MYSQL_DATABASE
              value: nssfdb
            - name: nssf_REGISTRATION_ENDPOINT
              value: ocnssf-nfregistration
            - name: nssf_SUBSCRIPTION_ENDPOINT
              value: ocnssf-nfsubscription
            - name: NF_HEARTBEAT
              value: "120"
            - name: DISC_VALIDITY_PERIOD
              value: "3600"
            image: dsr-master0:5000/ocnssf-nfregistration:latest
            imagePullPolicy: Always
            name: ocnssf-nfregistration
            ports:
            - containerPort: 8080
              name: server
              protocol: TCP
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    status:
      availableReplicas: 1
      conditions:
      - lastTransitionTime: 2018-08-27T15:46:01Z
        lastUpdateTime: 2018-08-27T15:46:01Z
        message: Deployment has minimum availability.
        reason: MinimumReplicasAvailable
        status: "True"
        type: Available
      - lastTransitionTime: 2018-08-27T15:45:59Z
        lastUpdateTime: 2018-08-27T15:46:01Z
        message: ReplicaSet "ocnssf-nfregistration-7898d657d9" has successfully progressed.
        reason: NewReplicaSetAvailable
        status: "True"
        type: Progressing
      observedGeneration: 1
      readyReplicas: 1
      replicas: 1
      updatedReplicas: 1
    
  • Check if the microservices can access each other through a REST interface. Run the following command:
    # kubectl -n <namespace> exec <pod name> -- curl <uri>
    

    Example:

    # kubectl -n nssf-svc exec $(kubectl -n nssf-svc get pods -o name|cut -d'/' -f2|grep nfs) -- \
        curl http://ocnssf-nfregistration:8080/nnssf-nfm/v1/nfinstances
    # kubectl -n nssf-svc exec $(kubectl -n nssf-svc get pods -o name|cut -d'/' -f2|grep nfr) -- \
        curl http://ocnssf-nfsubscription:8080/nnssf-nfm/v1/nfinstances

    Note:

    These commands are in their simplest form and work only if exactly one nfregistration pod and one nfsubscription pod are deployed.
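    If more than one matching pod is deployed, target a specific pod by name instead of filtering with grep (the pod name below is illustrative):

    # kubectl -n nssf-svc exec ocnssf-nfregistration-7898d657d9-abcde -- curl http://ocnssf-nfsubscription:8080/nnssf-nfm/v1/nfinstances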

Application related tips

  • Run the following command to check the application logs and look for exceptions:
    # kubectl -n <namespace> logs -f <pod name>
    You can use '-f' to follow the logs or 'grep' for specific patterns in the log output.

    Example:

    # kubectl -n nssf-svc logs -f $(kubectl -n nssf-svc get pods -o name|cut -d'/' -f2|grep nfr)
    # kubectl -n nssf-svc logs -f $(kubectl -n nssf-svc get pods -o name|cut -d'/' -f2|grep nfs) 

    Note:

    These commands are in their simplest form and display the logs only if exactly one nfregistration pod and one nfsubscription pod are deployed.

4.2 Deployment Related Issues

This section describes the most common deployment related issues and their resolution steps. It is recommended that users attempt these resolution steps before contacting Oracle Support.

4.2.1 Preinstallation

This section describes the common preinstallation issues and their resolution steps.

4.2.1.1 Debugging General CNE

Problem: The environment is not working as expected.

Solution:

Run the following command to get all the events:
kubectl get events -n <ocnssf_namespace>
4.2.1.1.1 The Environment is Not Working As Expected

Problem: The environment is not working as expected.

Solution:
  1. Check if kubectl is installed and working as expected.
  2. Check if kubectl version command works. This displays the Kubernetes client and server versions.
  3. Check if kubectl create namespace test command works.
  4. Check if kubectl delete namespace test command works.
  5. Check if helm is installed and working as expected.
  6. Check if helm version command works. This displays the helm client and server versions.
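For example, these checks can be run in sequence (the test namespace is disposable and used only for verification):

kubectl version
kubectl create namespace test
kubectl delete namespace test
helm version
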
4.2.1.2 Curl HTTP2 Not Supported

Problem

The system does not support Curl HTTP2.

Error Code or Error Message

Unsupported protocol error is thrown or connection is established with HTTP/1.1 200 OK

Symptom

If an unsupported protocol error is thrown or the connection is established with HTTP/1.1, it indicates that Curl HTTP2 support is unavailable on your machine.

Solution

Following is the procedure to install Curl with HTTP2 support:

1. Make sure git is installed:

$ sudo yum install git -y 

2. Install nghttp2:

$ git clone https://github.com/tatsuhiro-t/nghttp2.git
$ cd nghttp2
$ autoreconf -i
$ automake
$ autoconf
$ ./configure
$ make
$ sudo make install
$ sudo sh -c "echo '/usr/local/lib' > /etc/ld.so.conf.d/custom-libs.conf"
$ sudo ldconfig

3. Install the latest Curl:


$ wget http://curl.haxx.se/download/curl-7.46.0.tar.bz2 (NOTE: Check for latest version during Installation) 
$ tar -xvjf curl-7.46.0.tar.bz2 
$ cd curl-7.46.0 
$ ./configure --with-nghttp2=/usr/local --with-ssl 
$ make 
$ sudo make install 
$ sudo ldconfig 

4. Run the following command to verify that HTTP2 is added in features:

$ curl --http2-prior-knowledge -v "http://10.75.204.35:32270/nnrf-disc/v1/nf-instances?requester-nf-type=AMF&target-nf-type=SMF"
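
You can also confirm that HTTP2 is listed among the compiled features of the new curl binary:

$ curl -V | grep -i http2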

4.2.2 Installation

This section describes the common installation related issues and their resolution steps.

4.2.2.1 Helm Install Failure

This section describes the various scenarios in which helm install might fail. Following are some of the scenarios:

  • Incorrect image name in ocnssf-custom-values files
  • Docker registry is configured incorrectly
  • Continuous Restart of Pods
4.2.2.1.1 Incorrect image name in ocnssf-custom-values files

Problem

helm install might fail if an incorrect image name is provided in the ocnssf_custom_values_23.4.0.yaml file.

Error Code/Error Message

When kubectl get pods -n <ocnssf_namespace> is performed, the status of the pods might be ImagePullBackOff or ErrImagePull.

For example:

$ kubectl get pods -n ocnssf

NAME                                  READY STATUS RESTARTS AGE
ocnssf-appinfo-7969c9fbf7-4fmgj                 1/1  Running   0   18m 
ocnssf-config-server-54bf4bc8f9-s82cv           1/1  Running   0   18m 
ocnssf-egress-6b6bff8949-2mf7b                  0/1  ImagePullBackOff   0   18m 
ocnssf-ingress-68d76954f5-9fsfq                 1/1  Running   0   18m 
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-l4q2q   1/1  Running   0   18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-vmt5v   1/1  Running   0   18m
ocnssf-nrf-client-nfmanagement-7db4598fbb-672hc 1/1  Running   0   18m 
ocnssf-nsavailability-644999bbfb-9gcm5          1/1  Running   0   18m 
ocnssf-nsconfig-577446c487-dzsh6                1/1  Running   0   18m
ocnssf-nsdb-585f7bd7d-tdth4                     1/1  Running   0   18m 
ocnssf-nsselection-5dfcc94bc7-q9gct             1/1  Running   0   18m 
ocnssf-nssubscription-5c898fbbb9-fqcw6          1/1  Running   0   18m 
ocnssf-performance-6d75c7f966-qm5fq             1/1  Running   0   18m

Solution

Perform the following steps to verify and correct the image name:
  1. Check that the ocnssf_custom_values_23.4.0.yaml file has the release-specific image names and tags.
    vi ocnssf_custom_values_23.4.0.yaml
    For NSSF images details, see "Customizing NSSF" in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
  2. Edit the ocnssf_custom_values_23.4.0.yaml file if the release-specific image names or tags must be modified.
  3. Save the file.
  4. Run the following command to delete the deployment:
    helm delete --purge <release_name>
    Sample command:
    helm delete --purge ocnssf
  5. If the helm purge does not clean the deployment and Kubernetes objects completely, see the "Cleaning NSSF deployment" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
  6. Run helm install command. For helm install command, see the "Customizing NSSF" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
  7. Run kubectl get pods -n <ocnssf_namespace> to verify if the status of all the pods is Running.

    For example:

    $ kubectl get pods -n ocnssf
    NAME                                  READY STATUS RESTARTS AGE
    ocnssf-appinfo-7969c9fbf7-4fmgj                 1/1  Running   0   18m 
    ocnssf-config-server-54bf4bc8f9-s82cv           1/1  Running   0   18m 
    ocnssf-egress-6b6bff8949-2mf7b                  1/1  Running   0   18m 
    ocnssf-ingress-68d76954f5-9fsfq                 1/1  Running   0   18m 
    ocnssf-nrf-client-nfdiscovery-cf48cd8d8-l4q2q   1/1  Running   0   18m
    ocnssf-nrf-client-nfdiscovery-cf48cd8d8-vmt5v   1/1  Running   0   18m
    ocnssf-nrf-client-nfmanagement-7db4598fbb-672hc 1/1  Running   0   18m 
    ocnssf-nsavailability-644999bbfb-9gcm5          1/1  Running   0   18m 
    ocnssf-nsconfig-577446c487-dzsh6                1/1  Running   0   18m
    ocnssf-nsdb-585f7bd7d-tdth4                     1/1  Running   0   18m 
    ocnssf-nsselection-5dfcc94bc7-q9gct             1/1  Running   0   18m 
    ocnssf-nssubscription-5c898fbbb9-fqcw6          1/1  Running   0   18m 
    ocnssf-performance-6d75c7f966-qm5fq             1/1  Running   0   18m
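    To compare the images referenced by the running pods against the expected release-specific image names, you can list them with a jsonpath query (the namespace name is illustrative):

    kubectl -n ocnssf get pods -o jsonpath="{range .items[*]}{.metadata.name}{'\t'}{.spec.containers[*].image}{'\n'}{end}"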
4.2.2.1.2 Docker registry is configured incorrectly

Problem

helm install might fail if the docker registry is not configured in all primary and secondary nodes.

Error Code/Error Message

When kubectl get pods -n <ocnssf_namespace> is performed, the status of the pods might be ImagePullBackOff or ErrImagePull.

For example:

$ kubectl get pods -n ocnssf
NAME                                  READY STATUS RESTARTS AGE
ocnssf-appinfo-7969c9fbf7-4fmgj                 1/1  Running   0   18m 
ocnssf-config-server-54bf4bc8f9-s82cv           1/1  Running   0   18m 
ocnssf-egress-6b6bff8949-2mf7b                  0/1  ImagePullBackOff   0   18m 
ocnssf-ingress-68d76954f5-9fsfq                 1/1  Running   0   18m 
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-l4q2q   1/1  Running   0   18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-vmt5v   1/1  Running   0   18m
ocnssf-nrf-client-nfmanagement-7db4598fbb-672hc 1/1  Running   0   18m 
ocnssf-nsavailability-644999bbfb-9gcm5          1/1  Running   0   18m 
ocnssf-nsconfig-577446c487-dzsh6                1/1  Running   0   18m
ocnssf-nsdb-585f7bd7d-tdth4                     1/1  Running   0   18m 
ocnssf-nsselection-5dfcc94bc7-q9gct             1/1  Running   0   18m 
ocnssf-nssubscription-5c898fbbb9-fqcw6          1/1  Running   0   18m 
ocnssf-performance-6d75c7f966-qm5fq             1/1  Running   0   18m

Solution

Configure docker registry on all primary and secondary nodes. For more information on configuring the docker registry, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
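
To confirm that the failure is an image pull issue, describe the failing pod and review the Events section for registry errors (the pod name is illustrative):

kubectl -n ocnssf describe pod ocnssf-egress-6b6bff8949-2mf7b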

4.2.2.1.3 Continuous Restart of Pods

Problem

helm install might fail if the MySQL primary and secondary hosts are not configured properly in ocnssf-custom-values.yaml.

Error Code/Error Message

When kubectl get pods -n <ocnssf_namespace> is performed, the pods restart count increases continuously.

For example:

$ kubectl get pods -n ocnssf
NAME                                  READY STATUS RESTARTS AGE
ocnssf-appinfo-7969c9fbf7-4fmgj                 1/1  Running   0   18m 
ocnssf-config-server-54bf4bc8f9-s82cv           1/1  Running   5   18m 
ocnssf-egress-6b6bff8949-2mf7b                  1/1  Running   0   18m 
ocnssf-ingress-68d76954f5-9fsfq                 1/1  Running   0   18m 
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-l4q2q   1/1  Running   0   18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-vmt5v   1/1  Running   0   18m
ocnssf-nrf-client-nfmanagement-7db4598fbb-672hc 1/1  Running   0   18m 
ocnssf-nsavailability-644999bbfb-9gcm5          1/1  Running   5   18m 
ocnssf-nsconfig-577446c487-dzsh6                1/1  Running   5   18m
ocnssf-nsdb-585f7bd7d-tdth4                     1/1  Running   5   18m 
ocnssf-nsselection-5dfcc94bc7-q9gct             1/1  Running   5   18m 
ocnssf-nssubscription-5c898fbbb9-fqcw6          1/1  Running   5   18m 
ocnssf-performance-6d75c7f966-qm5fq             1/1  Running   0   18m

Solution

MySQL server(s) may not be configured properly according to the preinstallation steps. For configuring MySQL servers, see the "Configuring MySQL Database and User" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
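
Before reinstalling, you can verify MySQL reachability from inside the cluster by opening a client session from a temporary pod (the service name, username, and image are illustrative):

kubectl -n ocnssf run mysql-check --rm -it --image=mysql:8.0 --restart=Never -- mysql -h mysql-connectivity-service -P 3306 -u nssfusr -p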

4.2.2.1.4 Tiller Pod Failure

Problem

Tiller Pod is not ready to run helm install.

Error Code/Error Message

The 'could not find a ready tiller pod' error message is received.

Symptom

When you run the helm ls command, you receive the 'could not find a ready tiller pod' error message.

Solution

Following is the procedure to reinstall Helm and Tiller:

1. Delete the preinstalled helm:

kubectl delete svc tiller-deploy -n kube-system
kubectl delete deploy tiller-deploy -n kube-system

2. Install helm and tiller using these commands:

helm init --client-only
helm plugin install https://github.com/rimusz/helm-tiller
helm tiller install
helm tiller start kube-system
4.2.2.2 Custom Value File Parsing Failure

This section explains the troubleshooting procedure for a failure while parsing the ocnssf_custom_values_23.4.0.yaml file.

Problem

Unable to parse ocnssf_custom_values_23.4.0.yaml while running helm install.

Error Code/Error Message

Error: failed to parse ocnssf_custom_values_23.4.0.yaml: error converting YAML to JSON: yaml

Symptom

If the aforementioned error is received while creating the ocnssf_custom_values_23.4.0.yaml file, it means that the file is not created properly. The tree structure may not have been followed, or the file may contain tab characters.

Solution

Follow the procedure below:
  1. Download the latest NSSF templates zip file from MOS. For more information, see the "Downloading the NSSF Package and Custom Template ZIP file" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
  2. Follow the steps mentioned in the "Installation Tasks" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
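
Before rerunning helm install, you can also check the file for tab characters (YAML does not allow tabs in indentation) and dry-run the chart rendering against it (the chart directory name is illustrative):

grep -nP '\t' ocnssf_custom_values_23.4.0.yaml
helm template ocnssf/ -f ocnssf_custom_values_23.4.0.yaml > /dev/null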

4.2.3 Postinstallation

This section describes the common postinstallation issues and their resolution steps.

4.2.3.1 Helm Test Error Scenarios

Identify error scenarios using the helm test as follows:

  1. Run the following command to get the Helm Test pod name:
    kubectl get pods -n <deployment-namespace>
  2. Check for the Helm Test pod that is in the error state.
  3. Run the following command to get the logs:
    kubectl logs <podname> -n <namespace>
    Example:
    kubectl logs <helm_test_pod> -n ocnssf

    Depending on the failure reasons, perform the resolution steps.
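
    After applying the resolution steps, you can rerun the Helm test to confirm that the deployment is healthy (the release name is illustrative):

    helm test ocnssf -n ocnssf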

For further assistance, collect the logs and contact MOS.

4.3 Upgrade or Rollback Failure

When Oracle Communications Cloud Native Core, Network Slice Selection Function (NSSF) upgrade or rollback fails, perform the following procedure.

  1. Check the pre or post upgrade logs or rollback hook logs in Kibana as applicable.

    Users can filter upgrade or rollback logs using the following filters:

    • For upgrade: lifeCycleEvent=9001
    • For rollback: lifeCycleEvent=9002

    {
       "time_stamp":"2021-08-23 06:45:57.698+0000",
       "thread":"main",
       "level":"INFO",
       "logger":"com.oracle.cgbu.cne.ocnssf.hooks.releases.ReleaseHelmHook_1_14_1",
       "message":"{logMsg=Starting Pre-Upgrade hook Execution, lifeCycleEvent=9001 | Upgrade, sourceRelease=101400, targetRelease=101401}",
       "loc":"com.oracle.cgbu.ocnssf.common.utils.EventSpecificLogger.submit(EventSpecificLogger.java:94)"
    }
  2. Check the pod logs in Kibana to analyze the cause of failure.
  3. After detecting the cause of failure, do the following:
    • For upgrade failure:
      • If the cause of upgrade failure is a database or network connectivity issue, then resolve the issue and rerun the upgrade command.
      • If the cause of failure is not related to a database or network connectivity issue and is observed during the preupgrade phase, then do not perform rollback because NSSF deployment remains in the source or older release.
      • If the upgrade failure occurs during the postupgrade phase, for example, post upgrade hook failure due to target release pod not moving to ready state, then perform a rollback.
    • For rollback failure: If the cause of rollback failure is a database or network connectivity issue, contact your system administrator. When the issue is resolved, rerun the rollback command.
  4. If the issue persists, contact My Oracle Support.

4.3.1 Replication Channel Breaks While Rolling Back cnDBTier from 23.4.x to 23.3.x

Scenario

The replication channel breaks while rolling back cnDBTier from 23.4.x to 23.3.x.

Problem

Intermittently, during the rollback of cnDBTier from 23.4.x to 23.3.x in a georedundant scenario, replication goes down.

Solution

As a workaround, follow the recovery procedures explained in the sections "Resolving Georeplication Failure Between cnDBTier Clusters in a Two Site Replication" and "Resolving Georeplication Failure Between cnDBTier Clusters in a Three Site Replication" in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to recover the replication.

4.4 Database Related Issues

This section describes the most common database related issues and their resolution steps. It is recommended to attempt the resolution steps provided in this guide before contacting Oracle Support.

4.4.1 NSSF MySQL Database Access

Problem

Keyword - wait-for-db

Tags - "config-server" "database" "readiness" "init" "SQLException" "access denied"

Due to database accessibility issues from the NSSF service, pods stay in the init state.

Even though some pods are up, they keep receiving the following exception: "Cannot connect to database server (java.sql.SQLException)"

Reasons:

  1. The MySQL host IP address or the MySQL service name (in the case of occne-infra) is incorrect.
  2. Some MySQL nodes may be down.
  3. The username or password given in the secrets is not created in the database or does not have the proper grants or access to the service databases.
  4. The databases are not created with the same names as mentioned in the ocnssf_custom_values_23.4.0.yaml file while installing NSSF.

Resolution Steps

To resolve this issue, perform the following steps:
  1. Check if the database IP address is correct and pingable from the worker nodes of the Kubernetes cluster. Update the database IP address and service accordingly. If required, you can use a floating IP address as well. If the database connectivity issue persists, update the correct IP address.

    In the case of OCCNE-infra, instead of specifying an IP address for the MySQL connection, use the FQDN of mysql-connectivity-service to connect to the database.

  2. Manually log in to MySQL through the same database IP as mentioned in the ocnssf_custom_values_23.4.0.yaml file. In case of MySQL service name, run the following command to describe the service:
    kubectl describe svc <mysql-servicename> -n <namespace> 
    Log in to the MySQL database with every IP address described in the MySQL service. If any SQL node is down, it can lead to intermittent database query failures. Therefore, make sure that you can log in to MySQL from all the nodes listed in the output of the MySQL service describe command.

    Make sure that all the MySQL nodes are up and running before installing NSSF.

  3. Check the existing user list in the database using the SQL query: "SELECT user FROM mysql.user;"
    Check that all the users mentioned in the NSSF custom values file are present in the database.

    Note:

    Create the user with the correct password as mentioned in the NSSF secret file.
  4. Check the grants of all the users mentioned in the ocnssf_custom_values_23.4.0.yaml file using the SQL query: "SHOW GRANTS FOR <username>;"

    If a username or password issue persists, then correctly create a user with the required password and also provide the grants as per the Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.

  5. Check if the databases are created with the same name as mentioned in the ocnssf_custom_values_23.4.0.yaml file for the services.

    Note:

    Create the database as per the ocnssf_custom_values_23.4.0.yaml file.
  6. Check if the problematic pods are all being created on one particular worker node. If yes, the worker node itself may be the cause of the error. Try draining the problematic worker node so that the pods move to another node, as shown below.
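
    A worker node can be drained and later returned to service as follows (the node name is a placeholder; on older kubectl versions, use --delete-local-data instead of --delete-emptydir-data):

    kubectl drain <worker-node-name> --ignore-daemonsets --delete-emptydir-data
    kubectl uncordon <worker-node-name>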