Troubleshooting SCP

Generic Checklist

The following sections provide a generic checklist of troubleshooting tips.

Deployment related tips

Perform the following checks after the deployment:

Are the OCSCP deployments, pods, and services created, running, and available?
Execute the following command:
```
# kubectl -n <namespace> get deployments,pods,svc
```
Inspect the output and check the following columns:
- AVAILABLE of deployment
- READY, STATUS and RESTARTS of pod
- PORT(S) of service
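
As a minimal sketch, the pod columns listed above can be screened with a small shell helper. The function name unhealthy_pods is hypothetical, and the helper assumes the default column order of kubectl get pods --no-headers (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: print pods that need attention.
# Reads the output of `kubectl get pods --no-headers` on stdin and assumes
# the default columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
    awk '{
        split($2, ready, "/")            # READY is "<ready>/<total>"
        if (ready[1] != ready[2] || $3 != "Running" || $4 + 0 > 0)
            print $1, $2, $3, $4         # NAME READY STATUS RESTARTS
    }'
}

# Usage (requires cluster access):
#   kubectl -n <namespace> get pods --no-headers | unhealthy_pods
```

An empty result suggests that every pod is fully ready, Running, and has never restarted. Pods that finished normally (for example, with status Completed) are also listed by this sketch and can be ignored.
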
Is the correct image used and the correct environment variables set in the deployment?
Execute the following command:
```
# kubectl -n <namespace> get deployment <deployment-name> -o yaml
```
Check whether the microservices can access each other through the REST interface.
Execute the following command:
```
# kubectl -n <namespace> exec <pod name> -- curl <uri>
```
Example:
```
# kubectl -n scp-svc exec $(kubectl -n scp-svc get pods -o name|cut -d'/' -f2|grep nfs) -- \
        curl http://ocscp-nfregistration:8080/nscp-nfm/v1/nf-instances
```
```
# kubectl -n scp-svc exec $(kubectl -n scp-svc get pods -o name|cut -d'/' -f2|grep nfr) -- \
        curl http://ocscp-nfsubscription:8080/nscp-nfm/v1/nf-instances
```
Note:
These commands are in their simplest form and work as expected only if exactly one nfr (registration) pod and one nfs (subscription) pod are deployed.
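
When more than one matching pod is deployed, the inner command expands to several pod names and kubectl exec fails. A minimal sketch of one way around this is to select a single pod explicitly. The function name first_pod_matching is hypothetical, and which pod is selected is arbitrary (the first one listed).

```shell
# Hypothetical helper: pick the first pod whose name contains a pattern.
# Reads `kubectl get pods -o name` output (lines such as "pod/<name>") on stdin.
first_pod_matching() {
    sed 's|^pod/||' | grep -e "$1" | head -n 1
}

# Usage (requires cluster access):
#   pod=$(kubectl -n <namespace> get pods -o name | first_pod_matching nfr)
#   kubectl -n <namespace> exec "$pod" -- curl <uri>
```
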

Application related tips

Check the application logs and look for exceptions by executing the following command:

# kubectl -n <namespace> logs -f <pod name>

Use '-f' to follow the logs, or pipe the output to 'grep' to search for a specific pattern in the log output.

Example:

# kubectl -n scp-svc logs -f $(kubectl -n scp-svc get pods -o name|cut -d'/' -f2|grep nfr)
# kubectl -n scp-svc logs -f $(kubectl -n scp-svc get pods -o name|cut -d'/' -f2|grep nfs)

Note:

These commands are in their simplest form and display the logs only if exactly one nfr (registration) pod and one nfs (subscription) pod are deployed.
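
As a sketch of the grep approach mentioned above, the log stream can be reduced to lines that mention an exception or error. The function name filter_exceptions and the search pattern are assumptions; adjust the pattern to the log format actually in use.

```shell
# Hypothetical helper: keep only log lines that mention an exception or error.
# The match is case-insensitive; the pattern itself is an assumption.
filter_exceptions() {
    grep -i -E 'exception|error'
}

# Usage (requires cluster access):
#   kubectl -n <namespace> logs <pod name> | filter_exceptions
```
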

Helm Install Failure

This section describes some of the scenarios in which helm install might fail.

Incorrect image name in ocscp-custom-values files

Problem

helm install might fail if an incorrect image name is provided in the ocscp-custom-values file.

Error Code/Error Message

When kubectl get pods -n <ocscp_namespace> is executed, the status of the pods might be ImagePullBackOff or ErrImagePull.

Solution

Perform the following steps to verify and correct the image name:

Edit the ocscp-custom-values file and provide the release-specific image names and tags.
Execute the helm install command.
Execute kubectl get pods -n <ocscp_namespace> to verify that the status of all the pods is Running.

Docker registry is configured incorrectly

Problem

helm install might fail if the docker registry is not configured on all primary and secondary nodes.

Error Code/Error Message

When kubectl get pods -n <ocscp_namespace> is executed, the status of the pods might be ImagePullBackOff or ErrImagePull.

Solution

Configure the docker registry on all primary and secondary nodes.

Continuous Restart of Pods

Problem

helm install might fail if the MySQL primary and secondary hosts are not configured properly in ocscp-custom-values.yaml.

Error Code/Error Message

When kubectl get pods -n <ocscp_namespace> is executed, the restart count of the pods increases continuously.

Solution

The MySQL server(s) may not be configured properly. Verify the MySQL host settings in ocscp-custom-values.yaml. Refer to Installation Tasks for more information on MySQL configuration.

Custom Value File Parse Failure

This section explains the troubleshooting procedure to follow when parsing of the custom values file fails.

Problem

helm install is unable to parse ocscp-custom-values-x.x.x.yaml.

Error Code/Error Message

Error: failed to parse ocscp-custom-values-x.x.x.yaml: error converting YAML to JSON: yaml

Symptom

If the above error is received, the ocscp-custom-values-x.x.x.yaml file was not created properly. The YAML tree structure (indentation) may not have been followed, or the file may contain tab characters, which YAML does not allow for indentation.

Solution

Follow this procedure:

Download the latest SCP templates zip file from OHC. Refer to Installation Tasks for more information.
Follow the steps mentioned in the Installation Tasks section.
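
Before re-creating the file, it may help to locate any tab characters mentioned in the symptom. The following is a minimal sketch; the function name find_tabs is hypothetical, and it only detects tabs, not other indentation mistakes.

```shell
# Hypothetical helper: print "<line-number>:<content>" for every line of the
# given file that contains a tab character.
find_tabs() {
    grep -n "$(printf '\t')" "$1"
}

# Usage:
#   find_tabs ocscp-custom-values-x.x.x.yaml
```

No output means that no tab characters were found in the file.
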

Curl HTTP2 Not Supported

Problem

The curl installed on the system does not support HTTP/2.

Error Code/Error Message

An "Unsupported protocol" error is thrown, or the connection is established over HTTP/1.1 (for example, the response starts with HTTP/1.1 200 OK).

Symptom

If an "Unsupported protocol" error is thrown or the connection is established over HTTP/1.1, the curl on your machine was probably built without HTTP/2 support.

Solution

Following is the procedure to install curl with HTTP2 support:

Make sure git is installed:
```
$ sudo yum install git -y 
```

Install nghttp2:

$ git clone https://github.com/tatsuhiro-t/nghttp2.git
$ cd nghttp2

$ autoreconf -i
$ automake
$ autoconf

$ ./configure
$ make
$ sudo make install

$ echo '/usr/local/lib' | sudo tee /etc/ld.so.conf.d/custom-libs.conf

$ sudo ldconfig

Install the latest curl. Check for the latest version during installation; the --http2-prior-knowledge option used in the verification step requires curl 7.49.0 or later.

$ wget http://curl.haxx.se/download/curl-7.46.0.tar.bz2
$ tar -xvjf curl-7.46.0.tar.bz2
$ cd curl-7.46.0

$ ./configure --with-nghttp2=/usr/local --with-ssl

$ make

$ sudo make install

$ sudo ldconfig

Verify HTTP/2 support by executing the following command. The verbose output should show that the connection uses HTTP/2:

$ curl --http2-prior-knowledge -v "http://10.75.204.35:32270/nscp-disc/v1/nf-instances?requester-nf-type=AMF&target-nf-type=SMF"
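
The build can also be checked without contacting a server: curl --version prints a Features line that lists HTTP2 when curl was built with nghttp2. A minimal sketch, in which the function name has_http2 is hypothetical:

```shell
# Hypothetical helper: succeed if `curl --version` output (read on stdin)
# lists HTTP2 as a whole word on its "Features:" line.
has_http2() {
    grep '^Features:' | grep -q -w 'HTTP2'
}

# Usage:
#   curl --version | has_http2 && echo "HTTP/2 supported" || echo "HTTP/2 missing"
```
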

Kubernetes Node Failure

Problem

A Kubernetes node goes down.

Error Code/Error Message

"NotReady" status is displayed against the Kubernetes node.

Symptom

On running the command kubectl get nodes, "NotReady" status is displayed, as shown below:

Figure 6-1 Kubernetes Nodes Output

Solution

Follow this procedure to identify the cause of the Kubernetes node failure:

Execute the following command to describe the node:
kubectl describe node <kubernetes_node_name>

Example: kubectl describe node k8s-1.odyssey.morrisville.us.lab.oracle.com

Check the node utilization by running the command:
kubectl top nodes
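
On a cluster with many nodes, the failed nodes can be picked out of the node list first. The following sketch assumes the default column order of kubectl get nodes --no-headers (NAME, STATUS, ROLES, AGE, VERSION); the function name not_ready_nodes is hypothetical.

```shell
# Hypothetical helper: print the names of nodes whose STATUS starts with
# "NotReady" (this also covers "NotReady,SchedulingDisabled").
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
    awk '$2 ~ /^NotReady/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes |
#       while read -r node; do kubectl describe node "$node"; done
```
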

SCP DB goes into deadlock state

Problem

MySQL locks get stuck.

Error Code/Error Message

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Symptom

Unable to access MySQL.

Solution

Follow this procedure to remove the deadlock:

Execute the following command on each SQL node:

SELECT
CONCAT('KILL ', id, ';')
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE `User` = '<DbUsername>'
AND `db` = '<DbName>';

This query returns one KILL statement for each connection opened by the given user on the given database.

Example:

SELECT
CONCAT('KILL ', id, ';')
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE `User` = 'scpuser'
AND `db` = 'ocscpdb';
+--------------------------+
| CONCAT('KILL ', id, ';') |
+--------------------------+
| KILL 204491;             |
| KILL 200332;             |
| KILL 202845;             |
+--------------------------+
3 rows in set (0.00 sec)

Execute each of the generated KILL statements on each SQL node.
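
When the query is run through the mysql command-line client, the generated statements can be separated from the table borders and header before they are reviewed and executed. A minimal sketch, in which the function name extract_kill_statements is hypothetical:

```shell
# Hypothetical helper: extract the generated "KILL <id>;" statements from
# mysql client output read on stdin. The header row is skipped because it
# contains no numeric id after "KILL ".
extract_kill_statements() {
    grep -o 'KILL [0-9][0-9]*;'
}

# Usage (review the statements before executing them):
#   mysql -u <DbUsername> -p -e "<query from the step above>" | extract_kill_statements
```
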

Tiller Pod Failure

Problem

Tiller Pod is not ready to run helm install.

Error Code/Error Message

The 'could not find a ready tiller pod' error message is received.

Symptom

When helm ls is executed, 'could not find a ready tiller pod' message is received.

Solution

Follow this procedure to reinstall helm and tiller:

Delete the pre-installed tiller service and deployment:

kubectl delete svc tiller-deploy -n kube-system
kubectl delete deploy tiller-deploy -n kube-system

Install helm and tiller using these commands:

helm init --client-only
helm plugin install https://github.com/rimusz/helm-tiller
helm tiller install
helm tiller start kube-system