4 Troubleshooting Policy

This chapter describes how to troubleshoot common errors that can occur during the preinstallation, installation, upgrade, and rollback procedures of Policy.

Note:

The performance and capacity of the Policy system may vary based on the call model, the feature and interface configuration, and the underlying CNE and hardware environment, including but not limited to the complexity of deployed policies, policy table size, object expressions, and custom JSON usage in policy design.

4.1 Database Related Issues

This section describes the most common database related issues and their resolution steps. Perform the resolution steps provided in this guide. If the issue persists, contact Oracle Support.

4.1.1 Policy MySQL DB Access

Problem

Keyword - wait-for-db

Tags - "config-server" "database" "readiness" "init" "SQLException" "access denied"

When the Policy service cannot access the database, pods remain in the init state.

Some pods may come up but then keep logging the exception: "Cannot connect to database server java.sql.SQLException"

Reasons:

  1. The MySQL host IP address or the MySQL service name (in the case of occne-infra) is not specified correctly.
  2. Some MySQL nodes may be down.
  3. The username and password given in the secrets are not created in the database, or the user does not have the proper grants to access the service databases.
  4. The databases are not created with the same names as mentioned in the custom_value file used while installing Policy. This is the most likely cause.
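
Before working through the resolution steps, it can help to confirm which pods are affected. The following is a minimal sketch, not part of the product: list_init_pods is a hypothetical helper that filters the default tabular output of "kubectl get pods", and <namespace> and <pod-name> are placeholders.

```shell
#!/usr/bin/env bash
# Print the names of pods whose STATUS column starts with "Init".
# Reads the default tabular output of `kubectl get pods` on stdin
# (columns: NAME READY STATUS RESTARTS AGE); the header row is skipped.
list_init_pods() {
  awk 'NR > 1 && $3 ~ /^Init/ { print $1 }'
}

# Assumed usage against a live cluster (placeholders must be replaced):
#   kubectl get pods -n <namespace> | list_init_pods
#   kubectl describe pod <pod-name> -n <namespace>   # shows init container state
#   kubectl logs <pod-name> -n <namespace> | grep -F "java.sql.SQLException"
```

The helper only reads text, so it can be tried on saved "kubectl get pods" output before it is run against a cluster.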

Resolution Steps

To resolve this issue, perform the following steps:
  1. Check that the database IP address is correct and pingable from the worker nodes of the Kubernetes cluster. If there is a database connectivity issue, update the database IP address and service accordingly. If required, you can also use a floating IP address.

    In the CNE infrastructure, use the FQDN of mysql-connectivity-service instead of an IP address to connect to the database.

  2. Manually log in to MySQL using the same database IP address mentioned in the custom_value file. If a MySQL service name is used, describe the service by running the following command:
    kubectl describe svc <mysql-servicename> -n <namespace>
    Then log in to the MySQL database using each of the IP addresses listed for the MySQL service. If any SQL node is down, DB queries fail intermittently, so make sure that you can log in to MySQL on every node in the IP list shown by the describe command.

    Make sure that all the MySQL nodes are up and running before installing Policy.

  3. Check the list of existing users in the database using the SQL query: "select user from mysql.user;"
    Check that all the users mentioned in the custom_value file of the Policy installation are present in the database.

    Note:

    Create the user with the password mentioned in the secret file of Policy.
  4. Check the grants of all the users mentioned in the custom_value file using the SQL query: "show grants for <username>;"

    If there is a username or password issue, create the user with the required password and provide the grants as described in the installation guide.

  5. Check that the databases are created with the same names as mentioned in the custom_value file for the services.

    Note:

    Create the databases as specified in the custom_value file.
  6. Check whether the problematic pods are all being created on one particular worker node. If so, the worker node may be the cause of the error. Try draining the problematic worker node so that the pods move to another node.
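
The user, database, and worker node checks in the steps above can be partly scripted. The following is a minimal sketch under stated assumptions: report_missing and count_pods_per_node are hypothetical helpers, and the host, user, database, and node names in the usage comments are placeholders to be taken from your own custom_value and secret files.

```shell
#!/usr/bin/env bash
# Print each expected name (given as arguments) that is absent from the
# actual names supplied on stdin, one name per line. Matching is exact.
report_missing() {
  local actual name
  actual="$(cat)"
  for name in "$@"; do
    if ! printf '%s\n' "$actual" | grep -Fxq -- "$name"; then
      printf '%s\n' "$name"
    fi
  done
}

# Count pods per worker node. Reads "<pod-name> <node-name>" pairs on
# stdin and prints "<count> <node-name>", highest count first.
count_pods_per_node() {
  awk '{ count[$2]++ } END { for (node in count) print count[node], node }' | sort -rn
}

# Assumed usage (all names in angle brackets are placeholders):
#   mysql -h <db-host> -u <admin-user> -p -N -B -e "select user from mysql.user;" \
#     | report_missing <user1> <user2>
#   mysql -h <db-host> -u <admin-user> -p -N -B -e "show databases;" \
#     | report_missing <database1> <database2>
#   kubectl get pods -n <namespace> --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | count_pods_per_node
#   kubectl drain <node-name> --ignore-daemonsets   # only after confirming the node is at fault
```

Empty output from report_missing means that every expected user or database was found. The per-node count covers all pods in the namespace, so filter the input to the problematic pods first when checking step 6.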

4.2 Upgrade or Rollback Failure

If a Policy upgrade or rollback fails, perform the following procedure:

  1. Check the pre or post upgrade or rollback hook logs in Kibana, as applicable.
    You can filter the upgrade or rollback logs using the following filters:
    • For upgrade: lifeCycleEvent=9001 or 9011
    • For rollback: lifeCycleEvent=9002

  2. Check the pod logs in Kibana to analyze the cause of failure.
  3. After detecting the cause of failure, do the following:
    • For upgrade failure:
      • If the cause of the upgrade failure is a database or network connectivity issue, contact your system administrator. When the issue is resolved, rerun the upgrade command.
      • If the cause of the failure is not related to a database or network connectivity issue and the failure is observed during the preupgrade phase, do not perform a rollback, because the Policy deployment remains on the source (older) release.
      • If the upgrade failure occurs during the postupgrade phase, for example, a postupgrade hook failure caused by a target release pod not moving to the ready state, perform a rollback.
    • For rollback failure: If the cause of the rollback failure is a database or network connectivity issue, contact your system administrator. When the issue is resolved, rerun the rollback command.
  4. If the issue persists, contact My Oracle Support.
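
The decision logic in step 3 can be summarized as a small lookup. The following is a minimal sketch, not a product tool: next_action and its argument values are hypothetical. It assumes that a connectivity cause takes precedence over the phase, and it falls back to My Oracle Support for any combination the procedure does not cover.

```shell
#!/usr/bin/env bash
# Map a failure to the action described in the procedure above.
#   $1 operation: upgrade | rollback
#   $2 cause:     connectivity | other
#   $3 phase:     preupgrade | postupgrade (used for upgrade failures only)
next_action() {
  local operation="$1" cause="$2" phase="${3:-}"
  case "$operation:$cause:$phase" in
    upgrade:connectivity:*)    echo "contact system administrator, then rerun upgrade" ;;
    upgrade:other:preupgrade)  echo "do not roll back" ;;
    upgrade:other:postupgrade) echo "perform rollback" ;;
    rollback:connectivity:*)   echo "contact system administrator, then rerun rollback" ;;
    # Assumption: anything else is escalated, as in step 4.
    *)                         echo "contact My Oracle Support" ;;
  esac
}

# Example call: an upgrade that failed in the postupgrade phase for a
# reason other than database or network connectivity.
next_action upgrade other postupgrade
```

For the example call, the function prints "perform rollback".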