7 Troubleshooting Policy Control Function

This section describes how to troubleshoot common errors that can be encountered during the installation and upgrade of Policy Control Function.

If the helm install Command Fails

This section covers the reasons and troubleshooting procedures if the helm install command fails.

Reasons for helm install failure:
  • Chart syntax issue [this issue shows up within the first few seconds]

    Resolve the chart-specific issues and rerun the helm install command. Because no hooks should have started in this case, the command can be rerun directly.

  • Most likely reason [TIMEOUT]

    If any job is stuck in a pending or error state and cannot execute, the helm command times out after 5 minutes, which is its default timeout. In this case, follow the troubleshooting steps below.

  • helm install command fails in case of a duplicated chart
    helm install /home/cloud-user/pcf_1.6.0/sprint3.1/ocpcf-1.6.0-sprint.3.1.tgz --name ocpcf2 --namespace ocpcf2 -f cust-ashish.yaml
    Error: release ocpcf2 failed: configmaps "perfinfo-config-ocpcf2" already exists
    Here, the configmap 'perfinfo-config-ocpcf2' already exists, so creating the Kubernetes objects after the hooks have run fails. In this case, also follow the troubleshooting steps below.
    Troubleshooting steps:
    1. Execute the following SQL statements to clean up the databases created by the helm install command:
      
      DROP DATABASE IF EXISTS `pcf_smservice`;
      DROP DATABASE IF EXISTS `pcf_amservice`;
      DROP DATABASE IF EXISTS `pcf_userservice`;
      DROP DATABASE IF EXISTS `ocpm_config_server`;
      DROP DATABASE IF EXISTS `oc5g_audit_service`;
      DROP DATABASE IF EXISTS `pcf_release`;
    2. Execute the following command to list the Kubernetes objects:
      kubectl get all -n <release_namespace>
      This gives a detailed overview of which objects are stuck or in a failed state.
    3. Execute the following command to delete all Kubernetes objects:
      kubectl delete all --all -n <release_namespace>
    4. Execute the following command to check the status of the release:
      helm ls --all
      If the release is in a failed state, purge it using the following command. The command takes the release name, which in the example above is the same as the namespace:
      helm delete --purge <release_name>

      Note:

      If this command takes a long time to complete, run the following loop in parallel in another session to clear all the delete jobs:
      while true; do kubectl delete jobs --all -n <release_namespace>; sleep 5;done
      Monitor the progress of the following command:
      helm delete --purge <release_name>
      Once it succeeds, press Ctrl+C to stop the loop.
    5. After the cleanup is complete, rerun the helm install command.
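
The purge and the parallel job-cleanup loop from step 4 can also be combined in one session. The following is a minimal sketch, not a supported procedure: the function name purge_release and the variables HELM, KUBECTL, and POLL_SECONDS are hypothetical names introduced for illustration, and the sketch assumes the same Helm 2 syntax (helm delete --purge) used in this section.

```shell
#!/usr/bin/env bash
# Sketch only: purge a release while a background loop clears its delete jobs.
# HELM, KUBECTL, POLL_SECONDS and purge_release are illustrative assumptions.
HELM="${HELM:-helm}"
KUBECTL="${KUBECTL:-kubectl}"
POLL_SECONDS="${POLL_SECONDS:-5}"

purge_release() {
  release="$1"
  namespace="$2"

  # Background loop: same command as the manual "while true" loop in step 4.
  (
    while true; do
      "$KUBECTL" delete jobs --all -n "$namespace"
      sleep "$POLL_SECONDS"
    done
  ) &
  loop_pid=$!

  # Foreground purge; remember its exit status.
  "$HELM" delete --purge "$release"
  status=$?

  # Stop the cleanup loop (replaces pressing Ctrl+C in the other session).
  kill "$loop_pid" 2>/dev/null
  wait "$loop_pid" 2>/dev/null

  return "$status"
}

# Example call (placeholders as used in this section):
#   purge_release <release_name> <release_namespace>
```

The function returns the exit status of the purge itself, so a failed purge is still visible to the caller even though the cleanup loop is stopped in either case.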

If the helm upgrade Command Fails

This section covers the reasons and troubleshooting procedures if the helm upgrade command fails.

Reasons for helm upgrade failure:
  • Chart syntax issue [this issue shows up within the first few seconds]

    Resolve the chart-specific issues and rerun the helm upgrade command. Because no hooks should have started in this case, the command can be rerun directly.

  • Most likely reason [TIMEOUT]

    If any job is stuck in a pending or error state and cannot execute, the helm command times out after 5 minutes, which is its default timeout. In this case, follow the troubleshooting steps below.

  • helm upgrade command fails in case of a duplicated chart
    helm upgrade upgradetestpcf <helm-chart> -f custom-value.yaml
    Error: release ocpcf2 failed: configmaps "perfinfo-config-ocpcf2" already exists
    Here, the configmap 'perfinfo-config-ocpcf2' already exists, so creating the Kubernetes objects after the pre-upgrade hooks fails. In this case, also follow the troubleshooting steps below.
    Troubleshooting steps:
    1. Execute the following command to list the upgrade jobs:
      kubectl get jobs -n <release_namespace>
      This gives an overview of all the jobs in the release namespace. Delete them as described in the next step.
    2. Execute the following loop to delete the jobs:
      while true; do kubectl delete jobs --all -n <release_namespace>; sleep 5;done

      Keep the loop running until all jobs are removed from the release namespace, then press Ctrl+C to stop it.

    3. Execute the following SQL statements to repopulate the version configuration for a clean upgrade:
      TRUNCATE TABLE `pcf_release`.`release_config`;
      INSERT INTO `pcf_release`.`release_config` values('public.hook.configserver','{"currentVersion" : 100500,"rollbackVersion" : -1}');
      INSERT INTO `pcf_release`.`release_config` values('public.hook.smservice','{"currentVersion" : 100500,"rollbackVersion" : -1}');
      INSERT INTO `pcf_release`.`release_config` values('public.hook.amservice','{"currentVersion" : 100500,"rollbackVersion" : -1}');
      INSERT INTO `pcf_release`.`release_config` values('public.hook.auditservice','{"currentVersion" : 100500,"rollbackVersion" : -1}');
      INSERT INTO `pcf_release`.`release_config` values('public.hook.userservice','{"currentVersion" : 100500,"rollbackVersion" : -1}');
    4. Fix the issues that caused the helm upgrade to fail, and then run the helm upgrade command again.
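
The SQL statements in step 3 differ only in the hook name, so they can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the function name release_config_sql is hypothetical, the list of hook names is copied from step 3, and the version value must be supplied by the operator (100500 is the value shown in this section).

```shell
#!/usr/bin/env bash
# Sketch only: print the step 3 statements for a given version value.
# release_config_sql is an illustrative name, not part of the product.
release_config_sql() {
  version="$1"

  # Accept digits only, so nothing unexpected ends up inside the SQL text.
  case "$version" in
    ''|*[!0-9]*)
      echo "usage: release_config_sql <numeric_version>" >&2
      return 1
      ;;
  esac

  printf "TRUNCATE TABLE \`pcf_release\`.\`release_config\`;\n"

  # Hook names as listed in step 3.
  for svc in configserver smservice amservice auditservice userservice; do
    printf "INSERT INTO \`pcf_release\`.\`release_config\` values('public.hook.%s','{\"currentVersion\" : %s,\"rollbackVersion\" : -1}');\n" \
      "$svc" "$version"
  done
}

# Example call: review the output before running it against the database.
#   release_config_sql 100500
```

The output can be reviewed first and then passed to the database client in use, for example by redirecting it to a file and executing that file in the SQL session used for the other statements in this section.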