4 Troubleshooting NSSF
This section provides information about how to identify problems and a systematic approach to resolve the identified issues. It also includes a generic checklist to help identify the problem.
Note:
The performance and capacity of the NSSF system may vary based on the call model, feature or interface configuration, and underlying CNE and hardware environment.
4.1 Generic Checklist
The following sections provide a generic checklist for troubleshooting tips.
Deployment related tips
- Are the NSSF deployment, pods, and services created, running, and available?
Run the following command:
# kubectl -n <namespace> get deployments,pods,svc
Inspect the output and check the following columns (a sample of healthy output follows this list):
- AVAILABLE of deployment
- READY, STATUS, and RESTARTS of pods
- PORT(S) of service
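For reference, a healthy deployment produces output similar to the following. The resource names, addresses, and ages shown here are illustrative only:
NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ocnssf-ingress   1/1     1            1           18m

NAME                                  READY   STATUS    RESTARTS   AGE
pod/ocnssf-ingress-68d76954f5-9fsfq   1/1     Running   0          18m

NAME                     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/ocnssf-ingress   ClusterIP   10.233.1.10   <none>        8080/TCP   18m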
- Is the correct image used and are the correct environment variables set in
the deployment? Run the following command:
# kubectl -n <namespace> get deployment <deployment-name> -o yaml
Example:
# kubectl -n nssf-svc get deployment ocnssf-nfregistration -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"ocnssf-nfregistration","namespace":"nssf-svc"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"ocnssf-nfregistration"}},"template":{"metadata":{"labels":{"app":"ocnssf-nfregistration"}},"spec":{"containers":[{"env":[{"name":"MYSQL_HOST","value":"mysql"},{"name":"MYSQL_PORT","value":"3306"},{"name":"MYSQL_DATABASE","value":"nssfdb"},{"name":"nssf_REGISTRATION_ENDPOINT","value":"ocnssf-nfregistration"},{"name":"nssf_SUBSCRIPTION_ENDPOINT","value":"ocnssf-nfsubscription"},{"name":"NF_HEARTBEAT","value":"120"},{"name":"DISC_VALIDITY_PERIOD","value":"3600"}],"image":"dsr-master0:5000/ocnssf-nfregistration:latest","imagePullPolicy":"Always","name":"ocnssf-nfregistration","ports":[{"containerPort":8080,"name":"server"}]}]}}}}
  creationTimestamp: 2018-08-27T15:45:59Z
  generation: 1
  name: ocnssf-nfregistration
  namespace: nssf-svc
  resourceVersion: "2336498"
  selfLink: /apis/extensions/v1beta1/namespaces/nssf-svc/deployments/ocnssf-nfregistration
  uid: 4b82fe89-aa10-11e8-95fd-fa163f20f9e2
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ocnssf-nfregistration
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ocnssf-nfregistration
    spec:
      containers:
      - env:
        - name: MYSQL_HOST
          value: mysql
        - name: MYSQL_PORT
          value: "3306"
        - name: MYSQL_DATABASE
          value: nssfdb
        - name: nssf_REGISTRATION_ENDPOINT
          value: ocnssf-nfregistration
        - name: nssf_SUBSCRIPTION_ENDPOINT
          value: ocnssf-nfsubscription
        - name: NF_HEARTBEAT
          value: "120"
        - name: DISC_VALIDITY_PERIOD
          value: "3600"
        image: dsr-master0:5000/ocnssf-nfregistration:latest
        imagePullPolicy: Always
        name: ocnssf-nfregistration
        ports:
        - containerPort: 8080
          name: server
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-08-27T15:46:01Z
    lastUpdateTime: 2018-08-27T15:46:01Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2018-08-27T15:45:59Z
    lastUpdateTime: 2018-08-27T15:46:01Z
    message: ReplicaSet "ocnssf-nfregistration-7898d657d9" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
- Check if the microservices can access each other through a REST
interface. Run the following command:
# kubectl -n <namespace> exec <pod name> -- curl <uri>
Example:
# kubectl -n nssf-svc exec $(kubectl -n nssf-svc get pods -o name|cut -d'/' -f2|grep nfs) -- curl http://ocnssf-nfregistration:8080/nnssf-nfm/v1/nfinstances
# kubectl -n nssf-svc exec $(kubectl -n nssf-svc get pods -o name|cut -d'/' -f2|grep nfr) -- curl http://ocnssf-nfsubscription:8080/nnssf-nfm/v1/nfinstances
Note:
These commands are in their simple form and work only if exactly one nfregistration and one nfsubscription pod are deployed.
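If more than one pod of a service is deployed, you can loop over all matching pods instead. The following is a minimal sketch under the same namespace, pod-name pattern, and URI assumptions as the example above:
for pod in $(kubectl -n nssf-svc get pods -o name | cut -d'/' -f2 | grep nfs); do
  kubectl -n nssf-svc exec "$pod" -- curl -s http://ocnssf-nfregistration:8080/nnssf-nfm/v1/nfinstances
done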
Application related tips
- Run the following command to check the application logs and look for
exceptions:
# kubectl -n <namespace> logs -f <pod name>
You can use '-f' to follow the logs or 'grep' for specific patterns in the log output.
Example:
# kubectl -n nssf-svc logs -f $(kubectl -n nssf-svc get pods -o name|cut -d'/' -f2|grep nfr)
# kubectl -n nssf-svc logs -f $(kubectl -n nssf-svc get pods -o name|cut -d'/' -f2|grep nfs)
Note:
These commands are in their simple form and display the logs only if there is one nfregistration and one nfsubscription pod deployed.
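For example, to scan only the most recent log output of a pod for errors rather than following it, you can combine the logs command with grep. The pod name and pattern below are illustrative:
# kubectl -n nssf-svc logs --tail=1000 <pod name> | grep -iE "exception|error"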
4.2 Deployment Related Issues
This section describes the most common deployment related issues and their resolution steps. It is recommended that users attempt the resolution steps provided in this guide before contacting Oracle Support.
4.2.1 Preinstallation
This section describes the common preinstallation issues and their resolution steps.
4.2.1.1 Debugging General CNE
Problem: The environment is not working as expected.
Solution:
Run the following command to check the events in the NSSF namespace and look for warnings or failures:
kubectl get events -n <ocnssf_namespace>
4.2.1.1.1 The Environment is Not Working As Expected
Problem: The environment is not working as expected.
- Check if kubectl is installed and working as expected.
  - Check if the kubectl version command works. This displays the Kubernetes client and server versions.
  - Check if the kubectl create namespace test command works.
  - Check if the kubectl delete namespace test command works.
- Check if helm is installed and working as expected.
  - Check if the helm version command works. This displays the helm client and server versions.
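The checks above can be run in one pass. The following is a minimal sketch that assumes kubectl and helm are available on the PATH of the bastion host:
kubectl version && \
kubectl create namespace test && \
kubectl delete namespace test && \
helm version && \
echo "Basic kubectl and helm checks passed"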
4.2.1.2 Curl HTTP2 Not Supported
Problem
The system does not support Curl HTTP2.
Error Code or Error Message
An unsupported protocol error is thrown, or the connection is established with HTTP/1.1 200 OK.
Symptom
If an unsupported protocol error is thrown or the connection is established over HTTP/1.1, it indicates that curl HTTP2 support is unavailable on your machine.
Solution
Following is the procedure to install Curl with HTTP2 support:
1. Make sure git is installed:
$ sudo yum install git -y
2. Install nghttp2:
$ git clone https://github.com/tatsuhiro-t/nghttp2.git
$ cd nghttp2
$ autoreconf -i
$ automake
$ autoconf
$ ./configure
$ make
$ sudo make install
$ sudo sh -c "echo '/usr/local/lib' > /etc/ld.so.conf.d/custom-libs.conf"
$ sudo ldconfig
3. Install the latest Curl:
$ wget http://curl.haxx.se/download/curl-7.46.0.tar.bz2 (NOTE: Check for latest version during Installation)
$ tar -xvjf curl-7.46.0.tar.bz2
$ cd curl-7.46.0
$ ./configure --with-nghttp2=/usr/local --with-ssl
$ make
$ sudo make install
$ sudo ldconfig
4. Run the following command to verify that HTTP2 is added in features:
$ curl --http2-prior-knowledge -v "http://10.75.204.35:32270/nnrf-disc/v1/nf-instances?requester-nf-type=AMF&target-nf-type=SMF"
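If HTTP2 support is present, curl --version lists HTTP2 on its Features line, and the verbose output of the request above contains lines similar to the following (the exact wording varies with the curl version):
* Using HTTP2, server supports multi-use
< HTTP/2 200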
4.2.2 Installation
This section describes the common installation related issues and their resolution steps.
4.2.2.1 Helm Install Failure
This section describes the various scenarios in which helm install might fail. Following are some of the scenarios:
- Incorrect image name in ocnssf-custom-values files
- Docker registry is configured incorrectly
- Continuous Restart of Pods
4.2.2.1.1 Incorrect image name in ocnssf-custom-values files
Problem
helm install might fail if an incorrect image name is provided in the ocnssf_custom_values_23.4.0.yaml file.
Error Code/Error Message
When kubectl get pods -n <ocnssf_namespace> is performed, the status of the pods might be ImagePullBackOff or ErrImagePull.
For example:
$ kubectl get pods -n ocnssf
NAME READY STATUS RESTARTS AGE
ocnssf-appinfo-7969c9fbf7-4fmgj 1/1 Running 0 18m
ocnssf-config-server-54bf4bc8f9-s82cv 1/1 Running 0 18m
ocnssf-egress-6b6bff8949-2mf7b 0/1 ImagePullBackOff 0 18m
ocnssf-ingress-68d76954f5-9fsfq 1/1 Running 0 18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-l4q2q 1/1 Running 0 18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-vmt5v 1/1 Running 0 18m
ocnssf-nrf-client-nfmanagement-7db4598fbb-672hc 1/1 Running 0 18m
ocnssf-nsavailability-644999bbfb-9gcm5 1/1 Running 0 18m
ocnssf-nsconfig-577446c487-dzsh6 1/1 Running 0 18m
ocnssf-nsdb-585f7bd7d-tdth4 1/1 Running 0 18m
ocnssf-nsselection-5dfcc94bc7-q9gct 1/1 Running 0 18m
ocnssf-nssubscription-5c898fbbb9-fqcw6 1/1 Running 0 18m
ocnssf-performance-6d75c7f966-qm5fq 1/1 Running 0 18m
Solution
- Check that the ocnssf_custom_values_23.4.0.yaml file has the release-specific image names and tags.
For NSSF image details, see "Customizing NSSF" in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
vi ocnssf_custom_values_23.4.0.yaml
- Edit the ocnssf_custom_values_23.4.0.yaml file in case the release-specific image names and tags must be modified.
- Save the file.
- Run the following command to delete the deployment:
helm delete --purge <release_namespace>
Sample command: helm delete --purge ocnssf
- In case the helm purge does not clean the deployment and Kubernetes objects completely, then see the "Cleaning NSSF deployment" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
- Run the helm install command. For the helm install command, see the "Customizing NSSF" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
- Run kubectl get pods -n <ocnssf_namespace> to verify that the status of all the pods is Running.
For example:
$ kubectl get pods -n ocnssf
NAME READY STATUS RESTARTS AGE
ocnssf-appinfo-7969c9fbf7-4fmgj 1/1 Running 0 18m
ocnssf-config-server-54bf4bc8f9-s82cv 1/1 Running 0 18m
ocnssf-egress-6b6bff8949-2mf7b 1/1 Running 0 18m
ocnssf-ingress-68d76954f5-9fsfq 1/1 Running 0 18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-l4q2q 1/1 Running 0 18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-vmt5v 1/1 Running 0 18m
ocnssf-nrf-client-nfmanagement-7db4598fbb-672hc 1/1 Running 0 18m
ocnssf-nsavailability-644999bbfb-9gcm5 1/1 Running 0 18m
ocnssf-nsconfig-577446c487-dzsh6 1/1 Running 0 18m
ocnssf-nsdb-585f7bd7d-tdth4 1/1 Running 0 18m
ocnssf-nsselection-5dfcc94bc7-q9gct 1/1 Running 0 18m
ocnssf-nssubscription-5c898fbbb9-fqcw6 1/1 Running 0 18m
ocnssf-performance-6d75c7f966-qm5fq 1/1 Running 0 18m
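If a pod remains in ImagePullBackOff or ErrImagePull, you can confirm which image it is trying to pull and why by inspecting the deployment spec and the pod events. This is a minimal sketch; the deployment and pod names are illustrative:
$ kubectl -n ocnssf get deployment ocnssf-egress -o jsonpath='{.spec.template.spec.containers[*].image}'
$ kubectl -n ocnssf describe pod <pod-name>
The Events section at the end of the describe output shows the exact image pull error.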
4.2.2.1.2 Docker registry is configured incorrectly
Problem
helm install might fail if the docker registry is not configured in all primary and secondary nodes.
Error Code/Error Message
When kubectl get pods -n <ocnssf_namespace> is performed, the status of the pods might be ImagePullBackOff or ErrImagePull.
For example:
$ kubectl get pods -n ocnssf
NAME READY STATUS RESTARTS AGE
ocnssf-appinfo-7969c9fbf7-4fmgj 1/1 Running 0 18m
ocnssf-config-server-54bf4bc8f9-s82cv 1/1 Running 0 18m
ocnssf-egress-6b6bff8949-2mf7b 0/1 ImagePullBackOff 0 18m
ocnssf-ingress-68d76954f5-9fsfq 1/1 Running 0 18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-l4q2q 1/1 Running 0 18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-vmt5v 1/1 Running 0 18m
ocnssf-nrf-client-nfmanagement-7db4598fbb-672hc 1/1 Running 0 18m
ocnssf-nsavailability-644999bbfb-9gcm5 1/1 Running 0 18m
ocnssf-nsconfig-577446c487-dzsh6 1/1 Running 0 18m
ocnssf-nsdb-585f7bd7d-tdth4 1/1 Running 0 18m
ocnssf-nsselection-5dfcc94bc7-q9gct 1/1 Running 0 18m
ocnssf-nssubscription-5c898fbbb9-fqcw6 1/1 Running 0 18m
ocnssf-performance-6d75c7f966-qm5fq 1/1 Running 0 18m
Solution
Configure docker registry on all primary and secondary nodes. For more information on configuring the docker registry, see Oracle Communications Cloud Native Core, Cloud Native Environment Installation, Upgrade, and Fault Recovery Guide.
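To verify that a node can reach the registry and pull the NSSF images, you can run a manual check from the node. This is a minimal sketch; the registry host, port, image, and tag are illustrative, and podman may be used instead of docker depending on the CNE version:
$ curl -s http://<registry-host>:<port>/v2/_catalog
$ docker pull <registry-host>:<port>/ocnssf-ingress:<tag>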
4.2.2.1.3 Continuous Restart of Pods
Problem
helm install might fail if the MySQL primary and secondary hosts are not configured properly in the ocnssf-custom-values.yaml file.
Error Code/Error Message
When kubectl get pods -n <ocnssf_namespace> is performed, the restart count of the pods increases continuously.
For example:
$ kubectl get pods -n ocnssf
NAME READY STATUS RESTARTS AGE
ocnssf-appinfo-7969c9fbf7-4fmgj 1/1 Running 0 18m
ocnssf-config-server-54bf4bc8f9-s82cv 1/1 Running 0 18m
ocnssf-egress-6b6bff8949-2mf7b 1/1 Running 0 18m
ocnssf-ingress-68d76954f5-9fsfq 1/1 Running 0 18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-l4q2q 1/1 Running 0 18m
ocnssf-nrf-client-nfdiscovery-cf48cd8d8-vmt5v 1/1 Running 0 18m
ocnssf-nrf-client-nfmanagement-7db4598fbb-672hc 1/1 Running 0 18m
ocnssf-nsavailability-644999bbfb-9gcm5 1/1 Running 0 18m
ocnssf-nsconfig-577446c487-dzsh6 1/1 Running 0 18m
ocnssf-nsdb-585f7bd7d-tdth4 1/1 Running 0 18m
ocnssf-nsselection-5dfcc94bc7-q9gct 1/1 Running 0 18m
ocnssf-nssubscription-5c898fbbb9-fqcw6 1/1 Running 0 18m
ocnssf-performance-6d75c7f966-qm5fq 1/1 Running 0 18m
Solution
The MySQL server(s) may not be configured properly according to the preinstallation steps. For configuring MySQL servers, see the "Configuring MySQL Database and User" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
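To verify that the configured MySQL host is reachable from within the cluster with the configured credentials, you can run a temporary MySQL client pod. This is a minimal sketch; the client image, host, and user are illustrative:
$ kubectl run mysql-client -n <ocnssf_namespace> --rm -it --image=mysql:8.0 --restart=Never -- mysql -h <mysql-host> -P 3306 -u <nssf-db-user> -p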
4.2.2.1.4 Tiller Pod Failure
Problem
Tiller Pod is not ready to run helm install.
Error Code/Error Message
The 'could not find a ready tiller pod' error message is received.
Symptom
When you run the helm ls command, the 'could not find a ready tiller pod' error message is received.
Solution
Following is the procedure to reinstall helm and tiller:
1. Delete the preinstalled helm:
kubectl delete svc tiller-deploy -n kube-system
kubectl delete deploy tiller-deploy -n kube-system
2. Install helm and tiller using these commands:
helm init --client-only
helm plugin install https://github.com/rimusz/helm-tiller
helm tiller install
helm tiller start kube-system
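After the reinstallation, verify that tiller is reachable before retrying the installation:
helm version
helm ls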
4.2.2.2 Custom Value File Parsing Failure
This section describes how to resolve a parsing failure of the ocnssf_custom_values_23.4.0.yaml file.
Problem
Unable to parse the ocnssf_custom_values_23.4.0.yaml file while running helm install.
Error Code/Error Message
Error: failed to parse ocnssf_custom_values_23.4.0.yaml: error converting YAML to JSON: yaml
Symptom
While creating the ocnssf_custom_values_23.4.0.yaml file, if the aforementioned error is received, it means that the file is not created properly. The YAML tree structure may not have been followed, or the file may contain tab spaces.
Solution
- Download the latest NSSF templates zip file from MOS. For more information, see the "Downloading the NSSF Package and Custom Template ZIP file" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
- Follow the steps mentioned in the "Installation Tasks" section in Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
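Before rerunning the installation, you can also check the edited file locally for tab characters and YAML syntax errors. This is a minimal sketch; the chart directory path is illustrative:
$ grep -nP '\t' ocnssf_custom_values_23.4.0.yaml
$ helm template <ocnssf-chart-directory> -f ocnssf_custom_values_23.4.0.yaml > /dev/null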
4.2.3 Post installation
This section describes the common post installation issues and their resolution steps.
4.2.3.1 Helm Test Error Scenarios
Identify error scenarios using the helm test as follows:
- Run the following command to get the Helm Test pod
name:
kubectl get pods -n <deployment-namespace>
- Check for the Helm Test pod that is in the error state.
- Run the following command to get the
logs:
kubectl logs <podname> -n <namespace>
Example: kubectl logs <helm_test_pod> -n ocnssf
Depending on the failure reasons, perform the resolution steps.
For further assistance, collect the logs and contact MOS.
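A minimal sketch to list failed pods in the deployment namespace and capture the logs of a failed Helm test pod for the support ticket (the namespace and pod name are illustrative):
kubectl get pods -n ocnssf --field-selector=status.phase=Failed
kubectl logs <helm_test_pod> -n ocnssf > helm-test-pod.log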
4.3 Upgrade or Rollback Failure
When Oracle Communications Cloud Native Core, Network Slice Selection Function (NSSF) upgrade or rollback fails, perform the following procedure.
- Check the pre or post upgrade logs or rollback hook logs in Kibana as
applicable.
Users can filter upgrade or rollback logs using the following filters:
- For upgrade: lifeCycleEvent=9001
- For rollback: lifeCycleEvent=9002
{ "time_stamp":"2021-08-23 06:45:57.698+0000", "thread":"main", "level":"INFO", "logger":"com.oracle.cgbu.cne.ocnssf.hooks.releases.ReleaseHelmHook_1_14_1", "message":"{logMsg=Starting Pre-Upgrade hook Execution, lifeCycleEvent=9001 | Upgrade, sourceRelease=101400, targetRelease=101401}", "loc":"com.oracle.cgbu.ocnssf.common.utils.EventSpecificLogger.submit(EventSpecificLogger.java:94)" }
- Check the pod logs in Kibana to analyze the cause of failure.
- After detecting the cause of failure, do the following:
- For upgrade failure:
- If the cause of upgrade failure is a database or network connectivity issue, then resolve the issue and rerun the upgrade command.
- If the cause of failure is not related to a database or network connectivity issue and is observed during the preupgrade phase, then do not perform rollback because NSSF deployment remains in the source or older release.
- If the upgrade failure occurs during the postupgrade phase, for example, post upgrade hook failure due to target release pod not moving to ready state, then perform a rollback.
- For rollback failure: If the cause of rollback failure is a database or network connectivity issue, contact your system administrator. When the issue is resolved, rerun the rollback command.
- If the issue persists, contact My Oracle Support.
4.3.1 Replication Channel Breaks While Rolling Back cnDBTier from 23.4.x to 23.3.x
Scenario
The replication channel breaks while performing a rollback of cnDBTier from 23.4.x to 23.3.x.
Problem
Intermittently, during the rollback of cnDBTier from 23.4.x to 23.3.x in a georedundant scenario, replication goes down.
Solution
As a workaround, follow the recovery procedure explained in the sections "Resolving Georeplication Failure Between cnDBTier Clusters in a Two Site Replication" and "Resolving Georeplication Failure Between cnDBTier Clusters in a Three Site Replication" in Oracle Communications Cloud Native Core, cnDBTier Installation, Upgrade, and Fault Recovery Guide to recover the replication.
4.4 Database Related Issues
This section describes the most common database related issues and their resolution steps. It is recommended to attempt the resolution steps provided in this guide before contacting Oracle Support.
4.4.1 NSSF MySQL Database Access
Problem
Keyword: wait-for-db
Tags: "config-server", "database", "readiness", "init", "SQLException", "access denied"
Due to database accessibility issues from the NSSF service, pods stay in the init state.
Even though some pods are up, they keep receiving the following exception: "Cannot connect to database server java.sql.SQLException".
Reasons:
- The MySQL host IP address or the MySQL service name (in case of occne-infra) is incorrect.
- A few MySQL nodes may be down.
- The username or password given in the secrets is not created in the database or does not have the proper grants or access to the service databases.
- Databases are not created correctly with the same name mentioned in the ocnssf_custom_values_23.4.0.yaml file while installing NSSF.
Resolution Steps
- Check if the database IP address is correct and pingable from the worker nodes of the Kubernetes cluster. Update the database IP address and service accordingly. If required, you can use a floating IP as well. If the database connectivity issue persists, then update the correct IP address.
In the case of OCCNE-infra, instead of mentioning an IP address for the MySQL connection, use the FQDN of the mysql-connectivity-service to connect to the database.
- Manually log in to MySQL through the same database IP address as mentioned in the ocnssf_custom_values_23.4.0.yaml file. In case a MySQL service name is used, run the following command to describe the service:
kubectl describe svc <mysql-servicename> -n <namespace>
Log in to the MySQL database with all sets of IPs described in the MySQL service. If any SQL node is down, it can lead to intermittent database query failures. So, make sure that you can log in to MySQL from all the nodes mentioned in the IP list of the MySQL service describe command output.
Make sure that all the MySQL nodes are up and running before installing NSSF.
- Check the existing user list in the database using the SQL query: "select user from mysql.user;"
Check if all the users mentioned in the custom values file of the NSSF installation are present in the database.
Note:
Create the user with the correct password as mentioned in the secret file of the NSSF.
- Check the grants of all the users mentioned in the ocnssf_custom_values_23.4.0.yaml file using the SQL query: "show grants for <username>;"
If a username or password issue persists, then correctly create the user with the required password and also provide the grants as per the Oracle Communications Cloud Native Core, Network Slice Selection Function Installation, Upgrade, and Fault Recovery Guide.
- Check if the databases are created with the same name as mentioned in the ocnssf_custom_values_23.4.0.yaml file for the services.
Note:
Create the database as per the ocnssf_custom_values_23.4.0.yaml file.
- Check if the problematic pods are getting created on any one unique worker node. If yes, the worker node may be the cause of the error. Try draining the problematic worker node and allowing the pods to move to another node, as shown in the sketch below.
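The following is a minimal sketch of draining and restoring a worker node; the node name is illustrative, and on older kubectl versions the flag --delete-emptydir-data is named --delete-local-data:
kubectl drain <worker-node-name> --ignore-daemonsets --delete-emptydir-data
kubectl get pods -n <ocnssf_namespace> -o wide
kubectl uncordon <worker-node-name>
Verify from the get pods output that the problematic pods have been rescheduled to a different node before uncordoning the drained node.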