Troubleshooting Node Issues for Kubernetes Clusters Using the Node Doctor Script

Find out how to use the Node Doctor script to help you resolve issues with compute instances hosting worker nodes in clusters you've created using Kubernetes Engine (OKE).

The Node Doctor script is pre-installed on managed node compute instances to help you resolve issues with the instances. Depending on how you run it, the Node Doctor script:

Prints troubleshooting output that identifies potential problem areas, with links to documentation to address those areas.
Gathers system information into a bundle. My Oracle Support (MOS) provides instructions to upload the bundle to a support ticket.

If you see worker nodes with a Kubernetes Node Condition other than "Ready", or with a Node State other than "Active", use the Node Doctor script to troubleshoot the issues.
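As a quick first check, the Kubernetes side of this can be listed with kubectl before running the Node Doctor script. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The helper name not_ready_nodes is hypothetical, and it only inspects the STATUS column printed by kubectl get nodes, not the Node State shown in the Console.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS does not start with "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is still treated as Ready.
not_ready_nodes() {
  awk '$2 !~ /^Ready(,|$)/ { print $1 }'
}

# Usage (assumes kubectl can reach the cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Any node name printed by the helper is a candidate for the troubleshooting steps that follow.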

You can run the Node Doctor script in the following ways:

Using SSH
Using the Run Command feature

The Worker Node Troubleshooting Guide is a convenient way to start the Node Doctor script from the Console. The Worker Node Troubleshooting Guide provides dynamically populated commands to run the Node Doctor script using either SSH or the Run Command feature. To access the Worker Node Troubleshooting Guide:

On the node pool details page, select Troubleshoot Nodes from the Actions menu.
In the Worker Node Troubleshooting Guide dialog, select either SSH Connections or Run Command, and follow the instructions.

Note

The Node Doctor script is pre-installed on worker node instances created on or after July 19, 2021. Worker nodes created before July 19, 2021 do not have the Node Doctor script pre-installed. To find out how to install the Node Doctor script, see Downloading, Installing, and Updating the Node Doctor Script. To install the Node Doctor script on such nodes, you must have SSH access to them.
Oracle releases new versions of the Node Doctor script periodically. Before running the Node Doctor script for the first time (even on worker nodes created on or after July 19, 2021), follow the instructions in Downloading, Installing, and Updating the Node Doctor Script to update the script to the latest version. It's also good practice to update the Node Doctor script from time to time.
You can run the Node Doctor script on worker nodes in managed node pools, but not in virtual node pools.

Using the Run Command Feature to Run the Node Doctor Script

You can use the Run Command feature to troubleshoot node issues and generate a support bundle using the Node Doctor script. For more information about using the Run Command feature, see Running Commands on an Instance.

To run the Node Doctor script using the Run Command feature, do one of the following:

Use the Worker Node Troubleshooting Guide. On the node pool details page, select Troubleshoot Nodes from the Actions menu to display the Worker Node Troubleshooting Guide dialog. Select Run Command, and follow the instructions.
Follow the steps in this section.

Required IAM Policy

For administrators: To write a policy for the Run Command feature, do the following:

Write the following policy to allow any user to use the Run Command feature to issue commands, cancel commands, and view the command output for the instances in a compartment:
```
Allow any-user to use instance-agent-command-execution-family in compartment id <compartment-ocid> where request.instance.id=target.instance.id
```
If you want to save the output from the Node Doctor script in an Object Storage bucket, write the following policy:
```
Allow any-user to manage objects in compartment id <compartment-ocid-of-bucket> where all { request.principal.type='instance', request.principal.compartment.id='<compartment-ocid-of-node>', target.bucket.name = '<bucket-name>' }
```
where:
- <compartment-ocid-of-bucket> is the OCID of the compartment to which the Object Storage bucket belongs.
- <compartment-ocid-of-node> is the OCID of the compartment to which the worker node instance belongs.
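To avoid quoting mistakes when substituting the three placeholders, the Object Storage statement above can be generated rather than edited by hand. The following is a minimal sketch; the function name object_storage_policy_statement and the example OCIDs and bucket name in the usage comment are hypothetical. It only prints the statement text; creating the policy itself is still done by an administrator.

```shell
# Hypothetical helper: fill the placeholders of the Object Storage policy
# statement shown above.
# Arguments: <compartment-ocid-of-bucket> <compartment-ocid-of-node> <bucket-name>
object_storage_policy_statement() {
  if [ "$#" -ne 3 ]; then
    echo "usage: object_storage_policy_statement <compartment-ocid-of-bucket> <compartment-ocid-of-node> <bucket-name>" >&2
    return 2
  fi
  printf "Allow any-user to manage objects in compartment id %s where all { request.principal.type='instance', request.principal.compartment.id='%s', target.bucket.name = '%s' }\n" "$1" "$2" "$3"
}

# Usage (placeholder values, not real OCIDs):
#   object_storage_policy_statement ocid1.compartment.oc1..bucket ocid1.compartment.oc1..node node-doctor-output
```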

Creating the Command to Run the Node Doctor Script

To create the command to run the Node Doctor script on the instance:

On the Node pools tab of the cluster details page, select the managed node pool containing the managed node you want to troubleshoot.
On the Nodes tab, select the name of the node you want to troubleshoot, to display the instance details page.
Select the Management tab of the instance details page.
In the Run command section, select Create command.
Enter a name for the command. Avoid entering confidential information.
In the Timeout in seconds box, enter the amount of time to give the Compute Instance Run Command plugin to run the command on the instance before timing out. The timer starts when the plugin starts the command. For no timeout, enter 0.
In the Add script section, upload the script that you want the Compute Instance Run Command plugin to run on the instance. Select the Paste script option and paste one of the following commands in the box:
- sudo /usr/local/bin/node-doctor.sh --check to print troubleshooting output that identifies potential problem areas, with links to documentation to address those areas.
- sudo /usr/local/bin/node-doctor.sh --generate &> /dev/null && cat /tmp/oke-support-bundle.tar to gather system information in a bundle. My Oracle Support (MOS) provides instructions to upload the bundle to a support ticket.
Note

The commands in this step assume that the Node Doctor script is located in /usr/local/bin. However, in some cases, the Node Doctor script might be located in /usr/bin rather than in /usr/local/bin. If the Node Doctor script cannot be found in /usr/local/bin, modify the commands and specify /usr/bin instead.
In the Output type section, select the location to save the output of the command:
- Output as text: The output is saved as plain text. You can review the output on the Instance Details page.
- Output to an Object Storage bucket: The output is saved to an Object Storage bucket. Select a bucket. In the Object name box, enter a name for the output file. Avoid entering confidential information.
- Output to an Object Storage URL: The output is saved to an Object Storage URL. Enter the URL.
Select Create command.
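Because the script might be in either /usr/local/bin or /usr/bin, the pasted script can resolve the location itself instead of being edited per node. The following is a minimal sketch, not an official Oracle script; the helper name find_node_doctor is hypothetical, and the two default directories are the ones named in the note above.

```shell
# Hypothetical helper: print the first executable node-doctor.sh found in the
# given directories. With no arguments, it checks /usr/local/bin and /usr/bin.
find_node_doctor() {
  local dir
  if [ "$#" -eq 0 ]; then
    set -- /usr/local/bin /usr/bin
  fi
  for dir in "$@"; do
    if [ -x "$dir/node-doctor.sh" ]; then
      printf '%s\n' "$dir/node-doctor.sh"
      return 0
    fi
  done
  echo "node-doctor.sh not found in: $*" >&2
  return 1
}

# Example lines to append when pasting this into the Paste script box:
#   script=$(find_node_doctor) && sudo "$script" --check
```

If neither directory contains the script, the helper returns a non-zero status and the command output shows where it looked, which is usually a sign that the script needs to be installed first.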

Viewing the Output of the Node Doctor Script

How to view the output of the Node Doctor script depends on whether the output was saved to an Object Storage location or as a plain text file, as follows:

If the Node Doctor script output was saved to an Object Storage location, do one of the following:
- Download the response object from the bucket where it was saved.
- Navigate to the Object Storage pre-authenticated request URL.
If the Node Doctor script output was saved as a plain text file, do the following:
1. Open the navigation menu and select Compute. Under Compute, select Instances.
2. Select the instance that you're interested in.
3. Select the Management tab of the instance details page.
4. In the Run command section, find the command in the list, and then select View Command Details from the Actions menu (three dots) beside the command.

Using SSH to Run the Node Doctor Script

If you have SSH access to a managed node, you can use SSH to run the Node Doctor script to troubleshoot node issues and generate a support bundle.

To run the Node Doctor script using SSH, do one of the following:

Use the Worker Node Troubleshooting Guide. On the node pool details page, select Troubleshoot Nodes from the Actions menu to display the Worker Node Troubleshooting Guide dialog. Select SSH Connections, and follow the instructions.
Follow the steps in this section.

Establish an SSH connection with the worker node instance on which you want to run the Node Doctor script.
For detailed instructions to establish an SSH connection, see Connecting to Managed Nodes Using SSH. At a high level, the steps are:
1. Find out the IP address of the worker node instance that you want to troubleshoot, and make a note of it.
  For example, on the Node pools tab of the cluster details page, select the node pool containing the worker node. Select the Nodes tab, and then select the name of the node you are interested in to display the instance details page. Select the Networking tab to see the instance's IP address.
2. In a terminal window, enter ssh opc@<node_ip_address> to connect to the worker node, where <node_ip_address> is the IP address of the worker node instance that you made a note of earlier. For example, you might enter:
```
ssh opc@192.0.2.254
```
  If the SSH private key is not stored in the file or in the path that the ssh utility expects (for example, the ssh utility might expect the private key to be stored in ~/.ssh/id_rsa), you must explicitly specify the private key filename and location. For more information, see Connecting to Managed Nodes Using SSH.
In the terminal window in which you have established the SSH connection with the worker node instance, enter one of the following commands:
- sudo /usr/local/bin/node-doctor.sh --check to print troubleshooting output that identifies potential problem areas, with links to documentation to address those areas.
- sudo /usr/local/bin/node-doctor.sh --generate to gather system information in a bundle. My Oracle Support (MOS) provides instructions to upload the bundle to a support ticket.
Note

The commands in this step assume that the Node Doctor script is located in /usr/local/bin. However, in some cases, the Node Doctor script might be located in /usr/bin rather than in /usr/local/bin. If the Node Doctor script cannot be found in /usr/local/bin, modify the commands and specify /usr/bin instead.
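When the private key is not in the default location, the connection and the Node Doctor command can be combined into a single call from your workstation. The following is a minimal sketch; the function name node_doctor_over_ssh and the key path in the usage comment are hypothetical. It assumes the script is in /usr/local/bin and that the opc user can run sudo without a password prompt on the node.

```shell
# Hypothetical helper: run Node Doctor on a worker node over SSH.
# Arguments: <private-key-path> <node-ip-address> [check|generate]
node_doctor_over_ssh() {
  if [ "$#" -lt 2 ]; then
    echo "usage: node_doctor_over_ssh <private-key-path> <node-ip-address> [check|generate]" >&2
    return 2
  fi
  local key_path="$1" node_ip="$2" mode="${3:-check}"
  case "$mode" in
    check|generate) ;;
    *) echo "mode must be 'check' or 'generate'" >&2; return 2 ;;
  esac
  # -i names the private key file explicitly instead of relying on ~/.ssh/id_rsa.
  ssh -i "$key_path" "opc@${node_ip}" "sudo /usr/local/bin/node-doctor.sh --${mode}"
}

# Usage (example key path and the documentation IP address used above):
#   node_doctor_over_ssh ~/.ssh/oke_node_key 192.0.2.254 check
```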

Downloading, Installing, and Updating the Node Doctor Script

Worker nodes created on or after July 19, 2021 already have the Node Doctor script pre-installed.

Worker nodes created before July 19, 2021 do not have the Node Doctor script pre-installed. To run the Node Doctor script on such a worker node, you must download and install the script. To download and install the Node Doctor script, you must have SSH access to the worker node.

Note

Periodically, Oracle releases new versions of the Node Doctor script. Before running the Node Doctor script for the first time (even on worker nodes created on or after July 19, 2021), follow the final step in the instructions below to update the script to the latest version. It's also good practice to update the Node Doctor script from time to time.

To download, install, and update the Node Doctor script on a managed node:

Establish an SSH connection with the worker node.
For detailed instructions to establish an SSH connection, see Connecting to Managed Nodes Using SSH. At a high level, the steps are:
1. Find out the IP address of the worker node instance that you want to troubleshoot, and make a note of it.
  For example, on the Node pools tab of the cluster details page, select the node pool containing the worker node. Select the Nodes tab, and then select the name of the node you are interested in to display the instance details page. Select the Networking tab to see the instance's IP address.
2. In a terminal window, enter ssh opc@<node_ip_address> to connect to the worker node, where <node_ip_address> is the IP address of the worker node that you made a note of earlier. For example, you might enter:
```
ssh opc@192.0.2.254
```
  If the SSH private key is not stored in the file or in the path that the ssh utility expects (for example, the ssh utility might expect the private key to be stored in ~/.ssh/id_rsa), you must explicitly specify the private key filename and location. For more information, see Connecting to Managed Nodes Using SSH.

In the terminal window in which you have established the SSH connection with the worker node, download and install the Node Doctor script in the /usr/local/bin directory by entering:

sudo curl -s -X GET https://objectstorage.<region-name>.oraclecloud.com/n/odx-oke/b/public/o/artifacts/prd/workernode/14d06c9-431/node-doctor --output /usr/local/bin/node-doctor.sh

where <region-name> is the region in which the cluster is located. For example:

sudo curl -s -X GET https://objectstorage.us-ashburn-1.oraclecloud.com/n/odx-oke/b/public/o/artifacts/prd/workernode/14d06c9-431/node-doctor --output /usr/local/bin/node-doctor.sh

Before running the Node Doctor script for the first time, complete the next step.

When the Node Doctor script has been downloaded and installed on the worker node, get the latest version of the Node Doctor script by entering:
```
sudo /usr/local/bin/node-doctor.sh --update
```
It's good practice to keep the Node Doctor script up-to-date by running this command from time to time.

You can now use the Node Doctor script to troubleshoot worker node issues.
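The download and update steps above can be wrapped so that only the region name changes between clusters. The following is a minimal sketch to run on the worker node. The function names are hypothetical, the artifact path is copied from the download command above, and two details go beyond the documented commands: curl's --fail option, and a chmod +x that assumes the downloaded file might not be executable.

```shell
# Artifact path copied from the documented download command.
NODE_DOCTOR_ARTIFACT="n/odx-oke/b/public/o/artifacts/prd/workernode/14d06c9-431/node-doctor"

# Hypothetical helper: print the download URL for the given region name.
node_doctor_url() {
  local region="$1"
  if [ -z "$region" ]; then
    echo "usage: node_doctor_url <region-name>" >&2
    return 2
  fi
  printf 'https://objectstorage.%s.oraclecloud.com/%s\n' "$region" "$NODE_DOCTOR_ARTIFACT"
}

# Hypothetical helper: download, install, and update the script.
install_node_doctor() {
  local url target="/usr/local/bin/node-doctor.sh"
  url=$(node_doctor_url "$1") || return
  # --fail (an addition here) makes curl exit non-zero on an HTTP error
  # instead of saving the error response as the script.
  sudo curl --fail -s -X GET "$url" --output "$target" || return
  # Assumption: the downloaded file may not have the executable bit set.
  sudo chmod +x "$target"
  sudo "$target" --update
}

# Usage on the worker node:
#   install_node_doctor us-ashburn-1
```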