Using Node Doctor to Troubleshoot Worker Node Issues
If a cluster has a worker node that is in a state other than Active or Running, use the Node Doctor utility to troubleshoot the issues.
Node Doctor scans a worker node and reports the health status of the node. Node Doctor can do the following tasks:
- Identify potential problem areas and provide references to information to help you address those problem areas. See Print Troubleshooting Information.
- Collect node system information into a support bundle if you need help from Oracle Support. See Create a Support Bundle.
Use Node Doctor only on worker nodes. Because Node Doctor is installed on OKE images, Node Doctor is also available on cluster control plane nodes. Do not use Node Doctor on control plane nodes.
Check the Oracle Private Cloud Appliance Release Notes for the release in which Node Doctor was first delivered. If your node pools were created on that release or later, you can proceed with the instructions in this topic. If a worker node image is from an earlier release, Node Doctor is not available on that node. If your Private Cloud Appliance is running a release that includes Node Doctor, you can use node cycling to update older worker node images. See Node Cycling an OKE Node Pool.
Connect to the Worker Node Using SSH
Perform the following steps to connect to the worker node that you want to troubleshoot.
1. Ensure that you have a private and public SSH key pair.
   You must have the private key that goes with the public key that was added to the node when the node was created.
2. Get the node user name. OKE images have the initial user name opc configured.
3. Get the IP address of the worker node that you need to troubleshoot.
   The IP address is on the Networking tab of the node details page in the Compute Web UI.
   - If the node has a public IP address, use the public IP address.
   - If the node is on a private IP, then connect to the node via the bastion host.
     If a bastion host is not available, see Creating a Bastion.
4. Enter the following command at a shell prompt on your local system (public IP address) or on the bastion host (private IP address):
   ssh -i private_key_file username@ip-address
   - private_key_file. The full path and name of the file that contains the private SSH key that goes with the public key that was added to the node when the node was created.
   - username. The default user name for the node. This value probably is opc.
   - ip-address. The node IP address that you got in Step 3.
5. Ensure that you have permission to execute the following file:
   /usr/local/bin/node-doctor.sh
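One way to confirm the last step before going further is a quick file test on the worker node. The following is a minimal sketch, not part of Node Doctor: the function name check_script is hypothetical, and the test reflects only the permissions of the user who runs it.

```shell
# Hypothetical helper (not part of Node Doctor): report whether a script
# path is missing, present but not executable, or ready to run.
check_script() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ ! -x "$1" ]; then
    echo "not executable"
  else
    echo "ok"
  fi
}

# On the worker node:
# check_script /usr/local/bin/node-doctor.sh
```

A result of "missing" suggests that the worker node image is from a release that does not include Node Doctor.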
Print Troubleshooting Information
While logged in to the worker node as described in Connect to the Worker Node Using SSH, enter the following command to print information that identifies potential problem areas:
$ sudo /usr/local/bin/node-doctor.sh --check
Use the following command to see more options:
$ sudo /usr/local/bin/node-doctor.sh --help
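If you want to keep the check results for later comparison, you could save a copy of the output while still seeing it on screen. The following is a minimal sketch under stated assumptions: the function name run_and_log and the log file location are hypothetical, and the log is written to the home directory rather than /tmp to avoid adding files there.

```shell
# Hypothetical helper: run any command, show its output, and save a copy.
# $1 is the log file; the remaining arguments are the command to run.
run_and_log() {
  log_file="$1"
  shift
  # Merge stderr into stdout so warnings and errors land in the same log.
  "$@" 2>&1 | tee "$log_file"
}

# On the worker node (the log path is an assumption):
# run_and_log "$HOME/node-doctor-check.log" sudo /usr/local/bin/node-doctor.sh --check
```

Because the output passes through tee, the exit status of the pipeline is that of tee, not of the command that was run.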
Create a Support Bundle
If you are not able to resolve the issue, use the following command to create a support bundle with relevant information for Oracle Support:
$ sudo /usr/local/bin/node-doctor.sh --generate
The support bundle is in the /tmp directory as oke-support-bundle-dateTtime.tar.
Note:
Monitor the /tmp directory to ensure that it does not fill up. Remove old files by using the rm command, for example.
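The cleanup described in the note can be sketched as follows. This is an illustration only: the function name list_old_bundles and the seven-day threshold are assumptions, and the file name pattern is taken from the bundle name shown above. The function only lists candidates, so you can review them before removing anything with rm.

```shell
# Hypothetical helper: list support bundles in a directory that were last
# modified more than the given number of days ago. Nothing is deleted.
list_old_bundles() {
  dir="$1"
  days="$2"
  find "$dir" -maxdepth 1 -type f -name 'oke-support-bundle-*.tar' -mtime +"$days"
}

# On the worker node:
# df -h /tmp                  # show how full the file system that holds /tmp is
# list_old_bundles /tmp 7     # bundles older than 7 days (threshold is an assumption)
```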
See the following resources for information about uploading a bundle to a support ticket:
- Quick User Guide to Upload Files to My Oracle Support - MOS (Doc ID 1588459.1)
- Using Support Bundles in the Oracle Private Cloud Appliance Administrator Guide