Troubleshooting Cloud Native Core Policy (CNC Policy)

This section provides information to troubleshoot common errors that can be encountered during the installation and upgrade of Cloud Native Core Policy (CNC Policy).

If the helm install Command Fails

This section describes why the helm install command can fail and how to troubleshoot each cause.

Reasons for helm install failure:

Chart syntax issue (this failure appears within the first few seconds)
Fix the chart-specific errors and rerun the helm install command. No hooks should have started in this case.
Timeout (the most likely reason)
If a job is stuck in a pending or error state and cannot run, the command times out after 5 minutes, which is the default timeout for helm commands. In this case, follow the troubleshooting steps below.
Duplicated chart
The helm install command fails if objects created by an earlier installation of the chart still exist. For example:
```
helm install /home/cloud-user/pcf_1.6.1/sprint3.1/ocpcf-1.6.1-sprint.3.1.tgz --name ocpcf2 --namespace ocpcf2 -f <custom-value-file>
```
Error: release ocpcf2 failed: configmaps "perfinfo-config-ocpcf2" already exists
Here, the configmap 'perfinfo-config-ocpcf2' already exists, so creating the Kubernetes objects after the pre-upgrade hooks fails. In this case as well, follow the troubleshooting steps below.
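One way to catch the duplicated-chart case before running helm install is to look for configmaps left behind by an earlier attempt. The following is a minimal sketch, not part of the CNC Policy tooling: the function name leftover_configmaps is hypothetical, and the assumption that leftover configmaps end with "-<release-name>" is inferred only from the single name in the error message above.

```shell
#!/bin/sh
# Hypothetical pre-install check (an illustration, not a CNC Policy tool).
# Prints every configmap in the namespace whose name ends with "-<release>".
# The "-<release>" suffix is an assumption based on the error message above.
leftover_configmaps() {
  lc_release="$1"
  lc_namespace="$2"
  # "-o name" prints one object per line in the form "configmap/<name>".
  kubectl get configmaps -n "$lc_namespace" -o name |
    while IFS= read -r lc_cm; do
      case "$lc_cm" in
        *"-${lc_release}") printf '%s\n' "$lc_cm" ;;
      esac
    done
}
```

For example, `leftover_configmaps ocpcf2 ocpcf2` prints nothing when the namespace holds no matching configmap. Any name it does print is a candidate for the cleanup described in the troubleshooting steps.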
Troubleshooting steps:
1. Check the describe output and logs of the failed pods and fix the issues accordingly. To verify what went wrong during the installation of CNC Policy, check the following points:
  For pods that were not started, run the following command to check the failed pods:
```
kubectl describe pod <pod-name>  -n <release-namespace>
```
  For pods that were started but did not reach the "READY" state, run the following command to check the logs of the failed pods:
```
kubectl logs <pod-name> -n <release-namespace>
```
2. Run the following command to list the Kubernetes objects:
```
kubectl get all -n <release_namespace>
```
  This gives a detailed overview of which objects are stuck or in a failed state.
3. Run the following command to delete all Kubernetes objects:
```
kubectl delete all --all -n <release_namespace>
```
4. Run the following command to delete all current configmaps:
```
kubectl delete cm --all -n <release-namespace>
```
5. Run the following SQL statements to clean up the databases created by the helm install command and create them again:
```
DROP DATABASE IF EXISTS occnp_audit_service;
DROP DATABASE IF EXISTS occnp_config_server;
DROP DATABASE IF EXISTS occnp_pcf_am;
DROP DATABASE IF EXISTS occnp_pcf_sm;
DROP DATABASE IF EXISTS occnp_pcf_user;
DROP DATABASE IF EXISTS occnp_pcrf_core;
DROP DATABASE IF EXISTS occnp_release;
DROP DATABASE IF EXISTS occnp_binding;
CREATE DATABASE IF NOT EXISTS occnp_audit_service;
CREATE DATABASE IF NOT EXISTS occnp_config_server;
CREATE DATABASE IF NOT EXISTS occnp_pcf_am;
CREATE DATABASE IF NOT EXISTS occnp_pcf_sm;
CREATE DATABASE IF NOT EXISTS occnp_pcf_user;
CREATE DATABASE IF NOT EXISTS occnp_pcrf_core;
CREATE DATABASE IF NOT EXISTS occnp_release;
CREATE DATABASE IF NOT EXISTS occnp_binding;
```
6. Run the following command to check the status of the release:
```
helm ls --all
```
  If the release is in a failed state, purge it using the following command:
```
helm delete --purge <release_name>
```
  Note:
  If this command takes a long time to complete, run the following command in parallel in another session to clear all the delete jobs:
```
while true; do kubectl delete jobs --all -n <release_namespace>; sleep 5;done
```
  Keep monitoring the helm delete command in the first session:
```
helm delete --purge <release_name>
```
  Once it succeeds, press Ctrl+C to stop the loop in the other session.
7. After cleaning up and recreating the databases, run the helm install command again.
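The database reset in step 5 repeats the same two statements for each database, so it can be generated from a single list. The following is a minimal sketch under stated assumptions: the function name print_db_reset_sql and the variable CNC_POLICY_DBS are hypothetical, and only the database names are taken from step 5.

```shell
#!/bin/sh
# Database names copied from step 5 of the troubleshooting steps.
CNC_POLICY_DBS="occnp_audit_service occnp_config_server occnp_pcf_am occnp_pcf_sm occnp_pcf_user occnp_pcrf_core occnp_release occnp_binding"

# Hypothetical helper: print all DROP statements, then all CREATE statements,
# in the same order as the list above.
print_db_reset_sql() {
  for db in $CNC_POLICY_DBS; do
    printf 'DROP DATABASE IF EXISTS %s;\n' "$db"
  done
  for db in $CNC_POLICY_DBS; do
    printf 'CREATE DATABASE IF NOT EXISTS %s;\n' "$db"
  done
}
```

Assuming a MySQL-compatible database and the mysql command-line client, the output could be reviewed first and then piped to the client, for example `print_db_reset_sql | mysql -h <db-host> -u <db-user> -p`. The host and user placeholders are illustrative and must match your deployment.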

You can use the Data Collector tool to fetch Network Function (NF) specific logs, metrics, traces, and alerts from a production environment integrated with Elasticsearch and Prometheus. See Cloud Native Core NF Data Collector User's Guide for more information.