4 Troubleshooting NRF
This chapter provides information to troubleshoot the common errors that can be encountered during the preinstall, installation, upgrade, and rollback procedures of Oracle Communications Cloud Native Core, Network Repository Function (NRF).
Following are the troubleshooting procedures:
- Helm Install Failure
- Custom Value File Parse Failure
- Helm Test Error Scenarios
- Upgrade or Rollback Failure
Note:
kubectl commands might vary based on the platform deployment. Replace kubectl with the Kubernetes environment-specific command line tool used to configure Kubernetes resources through the kube-api server. The instructions provided in this document are as per the Oracle Communications Cloud Native Core, Cloud Native Environment (CNE) version of the kube-api server.
Caution:
User, computer, application, and character encoding settings may cause issues when copying and pasting commands or any content from the PDF. The PDF reader version also affects the copy-paste functionality. It is recommended to verify the copied content, especially when hyphens or other special characters are part of it.
Note:
The performance and capacity of the NRF system may vary based on the call model, feature or interface configuration, and the underlying CNE and hardware environment.
4.1 Generic Checklist
The following sections provide a generic checklist for troubleshooting tips.
Deployment related tips
- Are NRF deployment, pods, and services created? Are NRF deployment, pods, and services running and available?
  Run the following command:
  # kubectl -n <namespace> get deployments,pods,svc
  Inspect the output and check the following columns:
  - AVAILABLE of deployment
  - READY, STATUS, and RESTARTS of a pod
  - PORT(S) of service
- Is the correct image used? Are the correct environment variables set in the deployment?
  Run the following command:
  # kubectl -n <namespace> get deployment <deployment-name> -o yaml
  Inspect the output and check the environment variables and image. For example:
  # kubectl -n nrf-svc get deployment ocnrf-nfregistration -o yaml
  apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "1"
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"ocnrf-nfregistration","namespace":"nrf-svc"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"ocnrf-nfregistration"}},"template":{"metadata":{"labels":{"app":"ocnrf-nfregistration"}},"spec":{"containers":[{"env":[{"name":"MYSQL_HOST","value":"mysql"},{"name":"MYSQL_PORT","value":"3306"},{"name":"MYSQL_DATABASE","value":"nrfdb"},{"name":"NRF_REGISTRATION_ENDPOINT","value":"ocnrf-nfregistration"},{"name":"NRF_SUBSCRIPTION_ENDPOINT","value":"ocnrf-nfsubscription"},{"name":"NF_HEARTBEAT","value":"120"},{"name":"DISC_VALIDITY_PERIOD","value":"3600"}],"image":"dsr-master0:5000/ocnrf-nfregistration:latest","imagePullPolicy":"Always","name":"ocnrf-nfregistration","ports":[{"containerPort":8080,"name":"server"}]}]}}}}
    creationTimestamp: 2018-08-27T15:45:59Z
    generation: 1
    name: ocnrf-nfregistration
    namespace: nrf-svc
    resourceVersion: "2336498"
    selfLink: /apis/extensions/v1beta1/namespaces/nrf-svc/deployments/ocnrf-nfregistration
    uid: 4b82fe89-aa10-11e8-95fd-fa163f20f9e2
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app: ocnrf-nfregistration
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: ocnrf-nfregistration
      spec:
        containers:
        - env:
          - name: MYSQL_HOST
            value: mysql
          - name: MYSQL_PORT
            value: "3306"
          - name: MYSQL_DATABASE
            value: nrfdb
          - name: NRF_REGISTRATION_ENDPOINT
            value: ocnrf-nfregistration
          - name: NRF_SUBSCRIPTION_ENDPOINT
            value: ocnrf-nfsubscription
          - name: NF_HEARTBEAT
            value: "120"
          - name: DISC_VALIDITY_PERIOD
            value: "3600"
          image: dsr-master0:5000/ocnrf-nfregistration:latest
          imagePullPolicy: Always
          name: ocnrf-nfregistration
          ports:
          - containerPort: 8080
            name: server
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    availableReplicas: 1
    conditions:
    - lastTransitionTime: 2018-08-27T15:46:01Z
      lastUpdateTime: 2018-08-27T15:46:01Z
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: 2018-08-27T15:45:59Z
      lastUpdateTime: 2018-08-27T15:46:01Z
      message: ReplicaSet "ocnrf-nfregistration-7898d657d9" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
    observedGeneration: 1
    readyReplicas: 1
    replicas: 1
    updatedReplicas: 1
- Check if the microservices can access each other using the REST interface.
  Run the following command:
  # kubectl -n <namespace> exec <pod name> -- curl <uri>
  Example:
  # kubectl -n nrf-svc exec ocnrf-nfregistration-44f4d8f5d5-6q92i -- curl http://ocnrf-nfregistration:8080/nnrf-nfm/v1/nf-instances
Note:
These commands are in their simplest form and work only if there is a single nfregistration and nfsubscription pod deployed.
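If more than one replica is deployed, a specific pod can be selected using its app label instead of the full pod name. The following is a minimal sketch, assuming the app=ocnrf-nfregistration label shown in the deployment output above:
# POD=$(kubectl -n nrf-svc get pods -l app=ocnrf-nfregistration -o name | head -1)
# kubectl -n nrf-svc exec ${POD#pod/} -- curl http://ocnrf-nfregistration:8080/nnrf-nfm/v1/nf-instances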
Application related tips
- Check the application logs using the following command:
  # kubectl -n <namespace> logs -f <pod name>
  You can use '-f' to follow the logs or 'grep' for a specific pattern in the log output.
Example:
# kubectl -n nrf-svc logs -f $(kubectl -n nrf-svc get pods -o name|cut -d'/' -f2|grep nfr)
# kubectl -n nrf-svc logs -f $(kubectl -n nrf-svc get pods -o name|cut -d'/' -f2|grep nfs)
Note:
These commands are in their simplest form and display the logs only if there is a single nfregistration and nfsubscription pod deployed.
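If more than one replica is deployed, the label selector form of kubectl logs can be used instead of the grep-based commands above. A minimal sketch, assuming the app labels shown in the earlier deployment output:
# kubectl -n nrf-svc logs -f -l app=ocnrf-nfregistration
# kubectl -n nrf-svc logs -f -l app=ocnrf-nfsubscription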
4.2 Deployment Related Issues
This section describes the most common deployment related issues and their resolution steps. It is recommended to perform the resolution steps provided in this guide. If the issue persists, contact My Oracle Support (MOS).
4.2.1 Installation
This section describes the common installation related issues and their resolution steps.
4.2.1.1 Helm Install Failure
The helm install command might fail. Following are some of the scenarios:
4.2.1.1.1 Incorrect image name in the ocnrf-custom-values file
Problem
helm install might fail if an incorrect image name is provided in the ocnrf-custom-values.yaml file.
Error Code/Error Message
When kubectl get pods -n <ocnrf_namespace> is run, the status of the pods might be ImagePullBackOff or ErrImagePull.
For example:
$ kubectl get pods -n ocnrf
NAME READY STATUS RESTARTS AGE
ocnrf-egressgateway-d6567bbdb-9jrsx 1/2 ImagePullBackOff 0 30h
ocnrf-egressgateway-d6567bbdb-ntn2v 2/2 Running 0 30h
ocnrf-ingressgateway-754d645984-h9vzq 2/2 Running 0 30h
ocnrf-ingressgateway-754d645984-njz4w 2/2 Running 0 30h
ocnrf-nfaccesstoken-59fb96494c-k8w9p 1/1 Running 0 30h
ocnrf-nfaccesstoken-49fb96494c-k8w9q 1/1 Running 0 30h
ocnrf-nfdiscovery-84965d4fb9-rjxg2 1/1 Running 0 30h
ocnrf-nfdiscovery-94965d4fb9-rjxg3 1/1 Running 0 30h
ocnrf-nfregistration-64f4d8f5d5-6q92j 1/1 Running 0 30h
ocnrf-nfregistration-44f4d8f5d5-6q92i 1/1 Running 0 30h
ocnrf-nfsubscription-5b6db965b9-gcvpf 1/1 Running 0 30h
ocnrf-nfsubscription-4b6db965b9-gcvpe 1/1 Running 0 30h
ocnrf-nrfauditor-67b676dd87-xktbm 1/1 Running 0 30h
ocnrf-nrfconfiguration-678fddc5f5-c5htj 1/1 Running 0 30h
ocnrf-appinfo-8b7879cdb-jds4r 1/1 Running 0 30h
Solution
- Check that the ocnrf-custom-values.yaml file has the release-specific image names and tags. For NRF image details, see "Customizing NRF" in Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
  vi ocnrf-custom-values-<release-number>
- Edit the ocnrf-custom-values file in case the release-specific image names and tags must be modified.
- Save the file.
- Run the helm install command. For the helm install command, see the "Customizing NRF" section in Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
- Run kubectl get pods -n <ocnrf_namespace> to verify that the status of all the pods is Running.
  For example:
  $ kubectl get pods -n ocnrf
  NAME READY STATUS RESTARTS AGE
  ocnrf-egressgateway-d6567bbdb-9jrsx 2/2 Running 0 30h
  ocnrf-egressgateway-d6567bbdb-ntn2v 2/2 Running 0 30h
  ocnrf-ingressgateway-754d645984-h9vzq 2/2 Running 0 30h
  ocnrf-ingressgateway-754d645984-njz4w 2/2 Running 0 30h
  ocnrf-nfaccesstoken-59fb96494c-k8w9p 1/1 Running 0 30h
  ocnrf-nfaccesstoken-49fb96494c-k8w9q 1/1 Running 0 30h
  ocnrf-nfdiscovery-84965d4fb9-rjxg2 1/1 Running 0 30h
  ocnrf-nfdiscovery-94965d4fb9-rjxg3 1/1 Running 0 30h
  ocnrf-nfregistration-64f4d8f5d5-6q92j 1/1 Running 0 30h
  ocnrf-nfregistration-44f4d8f5d5-6q92i 1/1 Running 0 30h
  ocnrf-nfsubscription-5b6db965b9-gcvpf 1/1 Running 0 30h
  ocnrf-nfsubscription-4b6db965b9-gcvpe 1/1 Running 0 30h
  ocnrf-nrfauditor-67b676dd87-xktbm 1/1 Running 0 30h
  ocnrf-nrfconfiguration-678fddc5f5-c5htj 1/1 Running 0 30h
  ocnrf-appinfo-8b7879cdb-jds4r 1/1 Running 0 30h
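To quickly confirm the image that each pod is actually running, a jsonpath query can be used. The following is a minimal sketch, assuming the ocnrf namespace used in the examples above:
# kubectl -n ocnrf get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'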
4.2.1.1.2 Docker registry is configured incorrectly
Problem
helm install might fail if the docker registry is not configured on all primary and secondary nodes.
Error Code or Error Message
When kubectl get pods -n <ocnrf_namespace> is run, the status of the pods might be ImagePullBackOff or ErrImagePull.
For example:
$ kubectl get pods -n ocnrf
NAME READY STATUS RESTARTS AGE
ocnrf-egressgateway-d6567bbdb-9jrsx 1/2 ImagePullBackOff 0 30h
ocnrf-egressgateway-d6567bbdb-ntn2v 2/2 Running 0 30h
ocnrf-ingressgateway-754d645984-h9vzq 2/2 Running 0 30h
ocnrf-ingressgateway-754d645984-njz4w 2/2 Running 0 30h
ocnrf-nfaccesstoken-59fb96494c-k8w9p 1/1 Running 0 30h
ocnrf-nfaccesstoken-49fb96494c-k8w9q 1/1 Running 0 30h
ocnrf-nfdiscovery-84965d4fb9-rjxg2 1/1 Running 0 30h
ocnrf-nfdiscovery-94965d4fb9-rjxg3 1/1 Running 0 30h
ocnrf-nfregistration-64f4d8f5d5-6q92j 1/1 Running 0 30h
ocnrf-nfregistration-44f4d8f5d5-6q92i 1/1 Running 0 30h
ocnrf-nfsubscription-5b6db965b9-gcvpf 1/1 Running 0 30h
ocnrf-nfsubscription-4b6db965b9-gcvpe 1/1 Running 0 30h
ocnrf-nrfauditor-67b676dd87-xktbm 1/1 Running 0 30h
ocnrf-nrfconfiguration-678fddc5f5-c5htj 1/1 Running 0 30h
ocnrf-appinfo-8b7879cdb-jds4r 1/1 Running 0 30h
Solution
Configure docker registry on all primary and secondary nodes. For more information on configuring the docker registry, see Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
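The pod events usually identify the registry or image that cannot be pulled. The following is a minimal sketch, assuming the failing egress gateway pod from the example above and a docker-based runtime; the registry host, port, and tag are placeholders:
# kubectl -n ocnrf describe pod ocnrf-egressgateway-d6567bbdb-9jrsx | grep -A10 Events
$ docker pull <registry-host>:<port>/ocnrf-nfregistration:<tag>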
4.2.1.1.3 Continuous Restart of Pods
Problem
helm install might fail if the MySQL primary and secondary hosts are not configured properly in ocnrf-custom-values.yaml.
Error Code/Error Message
When kubectl get pods -n <ocnrf_namespace> is run, the pod restart count increases continuously.
For example:
$ kubectl get pods -n ocnrf
NAME READY STATUS RESTARTS AGE
ocnrf-egressgateway-d6567bbdb-9jrsx 2/2 Running 0 30h
ocnrf-egressgateway-d6567bbdb-ntn2v 2/2 Running 0 30h
ocnrf-ingressgateway-754d645984-h9vzq 2/2 Running 0 30h
ocnrf-ingressgateway-754d645984-njz4w 2/2 Running 2 30h
ocnrf-nfaccesstoken-59fb96494c-k8w9p 1/1 Running 0 30h
ocnrf-nfaccesstoken-49fb96494c-k8w9q 1/1 Running 0 30h
ocnrf-nfdiscovery-84965d4fb9-rjxg2 1/1 Running 0 30h
ocnrf-nfdiscovery-94965d4fb9-rjxg3 1/1 Running 0 30h
ocnrf-nfregistration-64f4d8f5d5-6q92j 1/1 Running 0 30h
ocnrf-nfregistration-44f4d8f5d5-6q92i 1/1 Running 0 30h
ocnrf-nfsubscription-5b6db965b9-gcvpf 1/1 Running 0 30h
ocnrf-nfsubscription-4b6db965b9-gcvpe 1/1 Running 0 30h
ocnrf-nrfauditor-67b676dd87-xktbm 1/1 Running 0 30h
ocnrf-nrfconfiguration-678fddc5f5-c5htj 1/1 Running 0 30h
ocnrf-appinfo-8b7879cdb-jds4r 1/1 Running 0 30h
Solution
The MySQL server(s) may not be configured properly according to the preinstallation steps. For configuring MySQL servers, see the "Configuring Database, Creating Users, and Granting Permissions" section in Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
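The reason for the restarts can usually be confirmed from the logs of the previously terminated container. The following is a minimal sketch, assuming one of the restarting pods from the example above:
# kubectl -n ocnrf logs ocnrf-ingressgateway-754d645984-njz4w --previous | grep -i -E "mysql|connection|exception"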
4.2.1.2 Custom Value File Parse Failure
This section describes how to troubleshoot a parse failure of the ocnrf-custom-values.yaml file.
Problem
Unable to parse the ocnrf-custom-values-x.x.x.yaml file while running helm install.
Error Code/Error Message
Error: failed to parse ocnrf-custom-values-x.x.x.yaml: error converting YAML to JSON: yaml
Symptom
While creating the ocnrf-custom-values-x.x.x.yaml file, if the above error is received, it means that the file is not created properly: the YAML tree structure may not have been followed, or the file may contain tab characters.
Solution
- Download the latest NRF templates zip file from My Oracle Support. For more information, see the "Downloading NRF package" section in Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
- Follow the steps mentioned in the "Installation Tasks" section in Oracle Communications Cloud Native Core, Network Repository Function Installation, Upgrade, and Fault Recovery Guide.
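Before rerunning the installation, the custom values file can be checked for YAML errors and tab characters. The following is a minimal sketch; the chart archive name is a placeholder:
# helm template ocnrf ocnrf-<release>.tgz -f ocnrf-custom-values-x.x.x.yaml > /dev/null
# grep -n "$(printf '\t')" ocnrf-custom-values-x.x.x.yaml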
4.2.2 Postinstallation
This section describes the common postinstallation related issues and their resolution steps.
4.2.2.1 Helm Test Error Scenarios
Following are the error scenarios that may be identified using helm test.
- Run the following command to get the Helm Test pod name:
  kubectl get pods -n <deployment-namespace>
- When a helm test is performed, a new helm test pod is created. Check for the Helm Test pod that is in an error state.
- Get the logs using the following command:
  kubectl logs <podname> -n <namespace>
  Example:
  kubectl logs <helm_test_pod> -n ocnrf
For further assistance, collect the logs and contact MOS.
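After fixing the underlying issue, the Helm tests can be rerun with the standard Helm 3 command. A sketch, assuming the release name is ocnrf:
# helm test ocnrf -n ocnrf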
4.3 Upgrade or Rollback Failure
When NRF upgrade or rollback fails, perform the following procedure.
- Check the preupgrade or postupgrade logs, or the rollback hook logs, in Kibana as applicable.
  Users can filter upgrade or rollback logs using the following filters:
  - For upgrade: lifeCycleEvent=9001
  - For rollback: lifeCycleEvent=9002
  For example:
  {
    "time_stamp": "2021-08-23 06:45:57.698+0000",
    "thread": "main",
    "level": "INFO",
    "logger": "com.oracle.cgbu.cne.ocnrf.hooks.releases.ReleaseHelmHook_1_14_1",
    "message": "{logMsg=Starting Pre-Upgrade hook Execution, lifeCycleEvent=9001 | Upgrade, sourceRelease=101400, targetRelease=101401}",
    "loc": "com.oracle.cgbu.ocnrf.common.utils.EventSpecificLogger.submit(EventSpecificLogger.java:94)"
  }
- Check the pod logs in Kibana to analyze the cause of failure.
- After detecting the cause of failure, do the following:
- For upgrade failure:
  - If the cause of the upgrade failure is a database or network connectivity issue, contact your system administrator. When the issue is resolved, rerun the upgrade command.
  - If the failure occurs during the preupgrade phase, do not perform a rollback.
  - If the failure occurs during the postupgrade phase, for example, a postupgrade hook failure due to a target release pod not moving to the ready state, then perform a rollback.
- For rollback failure: If the cause of the rollback failure is a database or network connectivity issue, contact your system administrator. When the issue is resolved, rerun the rollback command.
- If the issue persists, contact My Oracle Support.
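The failed revision and a rollback target can be identified with standard Helm commands before rerunning the upgrade or rollback. A sketch, assuming the release name is ocnrf:
# helm history ocnrf -n ocnrf
# helm rollback ocnrf <revision> -n ocnrf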
4.4 Troubleshooting CDS
Service operation responses do not contain remote NRF set data
CDS is down
- When the CDS is down, the OcnrfCacheDataServiceDown alert is raised. All the NRF core microservices fall back to cnDBTier for serving the requests. In this case, the NRF instance has the local set georeplicated view and not the segment-level view.
- Check the resolution steps to resolve the OcnrfCacheDataServiceDown alert.
- Once the alert is cleared and CDS is in the Running state, the NRF core microservices connect to CDS to serve the requests.
- In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.
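The CDS pod state can also be confirmed directly from the command line. A sketch, assuming the CDS pod name contains the string cache-data-service (a hypothetical name; check the actual pod name in your deployment):
# kubectl -n ocnrf get pods | grep cache-data-service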
CDS unable to synchronize with the remote NRF set
- If the CDS from a set is unable to synchronize the in-memory cache from the remote NRF’s CDS, then the CDS attempts to reach healthy remote NRFs to synchronize the in-memory cache.
- The retry attempt to the same remote NRF is performed based on the configuration in Egress Gateway.
- The reroute from local NRF is based on the NRF Growth feature configuration. For more information about the feature configuration, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
- If none of the remote NRFs are reachable, then the CDS uses the last known data from the remote set to serve the service requests.
Incorrect Feature Configuration
- If the CDS from a set is unable to synchronize the in-memory cache from the remote NRF’s CDS, then the CDS attempts to reach healthy remote NRFs to synchronize the in-memory cache.
- Check the NRF Growth feature configuration as mentioned in the REST configuration. For more information about the feature configuration, see Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
- In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.
CDS Unreachable
- Check for the OcnrfDatabaseFallbackUsed alert.
If present, wait for 30 seconds to 1 minute and retry until the alerts are cleared. If the alerts are not cleared, see the alert details for resolution steps.
- In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.
CDS unable to synchronize with the local cnDBTier
- If the CDS is unable to synchronize the data with the local cnDBTier, then the CDS marks itself as not ready.
- With the CDS not being ready, the NRF core services mark themselves as not ready, forcing the NF consumers and producers to move to mated and healthy NRFs.
- The CDS-to-CDS synchronization request also fails so that the NRFs in the peer set move to healthy NRFs for updated data synchronization.
NF Records present in NRF after Deregistration
- Check for the following alerts:
- OcnrfRemoteSetNrfSyncFailed
- OcnrfSyncFailureFromAllNrfsOfAnyRemoteSet
- OcnrfSyncFailureFromAllNrfsOfAllRemoteSets
If present, wait for 30 seconds to 1 minute and retry until the alerts are cleared. If the alerts are not cleared, see the alert details for resolution steps.
- Check the nrfHostConfigList configuration in the local NRF set.
- In case the issue persists, capture all the outputs for the above steps and contact My Oracle Support.
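The active alerts can also be checked from the Prometheus API instead of the dashboard. A sketch, assuming a reachable Prometheus endpoint and the jq tool; the host and port are placeholders:
# curl -s http://<prometheus-host>:<port>/api/v1/alerts | jq -r '.data.alerts[].labels.alertname' | grep -i ocnrf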
4.5 TLS Connection Failure
This section describes the TLS related issues and their resolution steps. It is recommended to attempt the resolution steps provided in this guide before contacting Oracle Support.
Problem: Handshake is not established between NRFs.
Scenario: When the client version is TLS 1.2 and the server version is TLS 1.3
Server Error Message
The client supported protocol versions [TLSv1.2] are not accepted by server preferences [TLSv1.3]
Client Error Message
Received fatal alert: protocol_version
Scenario: When the client version is TLS 1.3 and the server version is TLS 1.2
Server Error Message
The client supported protocol versions [TLSv1.3] are not accepted by server preferences [TLSv1.2]
Client Error Message
Received fatal alert: protocol_version
Solution:
If the error logs have the SSL exception, do the following:
Check the TLS version of both NRFs. If each NRF supports a different, single TLS version (that is, NRF1 supports TLS 1.2 only and NRF2 supports TLS 1.3 only, or vice versa), the handshake fails. Ensure that the TLS version is the same for both NRFs, or revert to the default configuration for both NRFs. The supported TLS version combinations are listed in the following table.
Table 4-1 TLS Version Used
Client TLS Version | Server TLS Version | TLS Version Used
---|---|---
TLS 1.2, TLS 1.3 | TLS 1.2, TLS 1.3 | TLS 1.3
TLS 1.3 | TLS 1.3 | TLS 1.3
TLS 1.3 | TLS 1.2, TLS 1.3 | TLS 1.3
TLS 1.2, TLS 1.3 | TLS 1.3 | TLS 1.3
TLS 1.2 | TLS 1.2, TLS 1.3 | TLS 1.2
TLS 1.2, TLS 1.3 | TLS 1.2 | TLS 1.2
Check the cipher suites supported by both NRFs; they must either be the same or have at least one cipher suite in common. If not, revert to the default configuration.
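The TLS versions and cipher suites that a peer actually accepts can be probed with openssl (version 1.1.1 or later is needed for TLS 1.3). A sketch; the host and port are placeholders:
# openssl s_client -connect <nrf-host>:<port> -tls1_3 </dev/null
# openssl s_client -connect <nrf-host>:<port> -tls1_2 </dev/null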
Problem: Pods not coming up after populating the clientDisabledExtension or serverDisabledExtension parameter.
Solution:
- Check the values given in the Helm parameters. The following values must not be added to these parameters:
- supported_versions
- key_share
- supported_groups
- signature_algorithms
- pre_shared_key
Problem: Pods not coming up after populating the clientSignatureSchemes parameter.
Solution:
- Check the values given in the Helm parameters.
- The following values must not be removed from this parameter:
- rsa_pkcs1_sha512
- rsa_pkcs1_sha384
- rsa_pkcs1_sha256
Problem: Connection Failure Due to Cipher Mismatch Between the NRF Client and the Producer Server for TLS 1.3
Scenario: The NRF client is configured to request a connection using TLS 1.3 with specific ciphers that are not supported by the producer server. As a result, the connection fails due to the cipher mismatch, preventing secure communication between the client and server.
Error Messages
No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
Received fatal alert: handshake_failure
Solution:
- Ensure that the following cipher suites are configured for the NRF client to use with TLS 1.3:
- TLS_AES_128_GCM_SHA256
- TLS_AES_256_GCM_SHA384
- TLS_CHACHA20_POLY1305_SHA256
- Verify TLS 1.3 secure communication between the NRF client and the producer server to ensure that the issue has been resolved.
Problem: Connection Failure for TLS 1.3 Due to Expired Certificates
Scenario: The NRF client is attempting to establish a connection using TLS 1.3, but the connection fails due to expired certificates. Specifically, the NRF client is presenting TLS 1.3 certificates that have passed their validity period, which causes the producer server to reject the connection.
Error Messages
Service Unavailable for producer due to Certificate Expired
Received fatal alert: handshake_failure
Solution:
- Verify the validity of the current certificate (see the sketch after this list).
- If the certificate has expired, renew it or extend its validity.
- Attempt to establish a connection between the NRF client and the producer server to confirm that the issue has been resolved.
- Verify TLS 1.3 secure communication.
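Certificate validity can be checked with openssl, either on the certificate file or against the live endpoint. A minimal sketch; the file name, host, and port are placeholders:
# openssl x509 -in <client-cert>.pem -noout -enddate
# openssl s_client -connect <nrf-host>:<port> </dev/null 2>/dev/null | openssl x509 -noout -enddate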