Using Node Doctor to Troubleshoot Worker Node Issues

If a cluster has a worker node that is in a state other than Active or Running, use the Node Doctor utility to troubleshoot the issues.

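Node Doctor scans a worker node and reports the health status of the node. As described in this topic, Node Doctor can print information that identifies potential problem areas and can create a support bundle for Oracle Support.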

Use Node Doctor only on worker nodes. Because Node Doctor is installed on OKE images, Node Doctor is also available on cluster control plane nodes. Do not use Node Doctor on control plane nodes.

Check the Oracle Private Cloud Appliance Release Notes to find the release in which Node Doctor was first delivered. If your node pools were created on that release or later, you can follow the instructions in this topic. A worker node whose image is from an earlier release does not include Node Doctor. If your Private Cloud Appliance is running a release that includes Node Doctor, you can use node cycling to update older worker node images. See Node Cycling an OKE Node Pool.

Connect to the Worker Node Using SSH

Perform the following steps to connect to the worker node that you want to troubleshoot.

  1. Ensure that you have a private and public SSH key pair.

    You must have the private key that goes with the public key that was added to the node when the node was created.

  2. Get the node user name. OKE images are configured with the initial user name opc.

  3. Get the IP address of the worker node that you need to troubleshoot.

    The IP address is on the Networking tab of the node details page in the Compute Web UI.

    • If the node has a public IP address, use the public IP address.

    • If the node has only a private IP address, connect to the node through the bastion host.

      If a bastion host is not available, see Creating a Bastion.

  4. Enter the following command at a shell prompt on your local system (public IP address) or on the bastion host (private IP address):

    ssh -i private_key_file username@ip-address

    • private_key_file. The full path and name of the file that contains the private SSH key that goes with the public key that was added to the node when the node was created.

    • username. The default user name for the node. This value is typically opc.

    • ip-address. The node IP address that you got in Step 3.

  5. Ensure that you have permission to execute the following file:

    /usr/local/bin/node-doctor.sh
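
Steps 4 and 5 can be sketched as a pair of shell helpers. This is an illustration only and is not part of Node Doctor: the function names run_on_node and check_node_doctor_permission and the SSH_BIN variable are hypothetical, and the sketch assumes a standard OpenSSH client on the system where you run it (your local system or the bastion host).

```shell
# Illustrative sketch; the function names and SSH_BIN are hypothetical.
SSH_BIN=${SSH_BIN:-ssh}                      # SSH client to use
NODE_DOCTOR=/usr/local/bin/node-doctor.sh    # path given in Step 5

# Step 4: run a command on the worker node over SSH.
# Arguments: private key file, user name, IP address, then the remote command.
run_on_node() {
  local key_file=$1 user=$2 ip=$3
  shift 3
  "$SSH_BIN" -i "$key_file" "$user@$ip" "$@"
}

# Step 5: report whether the remote user can execute the Node Doctor script.
check_node_doctor_permission() {
  run_on_node "$1" "$2" "$3" \
    "test -x $NODE_DOCTOR && echo executable || echo not-executable"
}
```

For example, check_node_doctor_permission private_key_file opc ip-address runs the permission test on the node, with the same placeholder values as in Step 4.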

Print Troubleshooting Information

While logged in to the worker node as described in Connect to the Worker Node Using SSH, enter the following command to print information that identifies potential problem areas:

$ sudo /usr/local/bin/node-doctor.sh --check

Use the following command to see more options:

$ sudo /usr/local/bin/node-doctor.sh --help
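
If you want to keep a copy of the report, for example to compare results before and after a fix, a small wrapper such as the following might help. It is a sketch under assumptions: the function name save_check_report and the NODE_DOCTOR and SUDO variables are hypothetical, and nothing is assumed about the format of the --check output or the exit status of the script.

```shell
# Illustrative sketch; save_check_report, NODE_DOCTOR, and SUDO are hypothetical names.
NODE_DOCTOR=${NODE_DOCTOR:-/usr/local/bin/node-doctor.sh}
SUDO=${SUDO-sudo}

# Run the --check scan, show the report on the terminal,
# and save a copy (standard output and standard error) to the given file.
save_check_report() {
  local report_file=$1
  $SUDO "$NODE_DOCTOR" --check 2>&1 | tee "$report_file"
}
```

For example, save_check_report "$HOME/node-doctor-check.txt" prints the report and writes the same text to a file in the home directory of the logged-in user.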

Create a Support Bundle

If you are not able to resolve the issue, use the following command to create a support bundle with relevant information for Oracle Support:

$ sudo /usr/local/bin/node-doctor.sh --generate

The support bundle is in the /tmp directory as oke-support-bundle-dateTtime.tar.
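
To find the bundle that was just created, you could list the matching files by modification time. The following is a minimal sketch: the function name latest_support_bundle is hypothetical, and because the exact date and time format in the file name is not specified here, the sketch matches the name with a wildcard.

```shell
# Illustrative sketch; latest_support_bundle is a hypothetical helper name.
# Print the most recently modified support bundle in a directory (default: /tmp).
# Prints nothing if no bundle is found.
latest_support_bundle() {
  local dir=${1:-/tmp}
  ls -t "$dir"/oke-support-bundle-*.tar 2>/dev/null | head -n 1
}
```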

Note:

Monitor the /tmp directory to ensure that it does not fill up. For example, use the rm command to remove old support bundles that you no longer need.
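
One way to handle this cleanup is sketched below. The function names list_old_bundles and remove_old_bundles and the default age of 7 days are assumptions chosen for illustration, and the sketch relies on the -maxdepth option of find, which GNU find provides. Bundles created with sudo might be owned by root, in which case the removal would also need sudo.

```shell
# Illustrative sketch; the function names and the 7-day default are assumptions.

# List support bundles in a directory (default: /tmp) that were
# last modified more than the given number of days ago (default: 7).
list_old_bundles() {
  local dir=${1:-/tmp}
  local days=${2:-7}
  find "$dir" -maxdepth 1 -type f -name 'oke-support-bundle-*.tar' -mtime +"$days"
}

# Remove the bundles that list_old_bundles reports. Other files are not touched.
remove_old_bundles() {
  list_old_bundles "$1" "$2" | while IFS= read -r bundle; do
    rm -f -- "$bundle"
  done
}
```

Running list_old_bundles first shows which files would be removed before you call remove_old_bundles.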

See the following resources for information about uploading a bundle to a support ticket: