Troubleshooting

Troubleshooting a data store involves a systematic approach to verifying service states, analyzing logs, and using built-in diagnostic utilities.

Identify Service States

Oracle NoSQL Database uses four types of services that contribute to your store's health: Admin Nodes, Storage Nodes, Replication Nodes, and Arbiter Nodes. To learn more about these services, see the Architecture topic in the Concepts guide.

Each service has a status, which you can view using the show topology command. The status is one of the following values:

ERROR_NO_RESTART: The service is in an error state and is not automatically restarted. Administrative intervention is required.
ERROR_RESTARTING: The service is in an error state. Oracle NoSQL Database attempts to restart the service.
RUNNING: The service is running normally.
STARTING: The service is coming up.
STOPPED: The service was stopped intentionally and cleanly.
STOPPING: The service is stopping, which may take some time.
SUCCEEDED: The plan has completed successfully.
UNREACHABLE: The service cannot be reached by the Admin. This may be temporary, or it may mask a STOPPED or ERROR state. Check network connectivity between the Admin and the Storage Node. If the Storage Node agent is reachable but a managed Replication Node is not, the network is probably fine and the problem lies elsewhere, for example in the Replication Node itself.
WAITING_FOR_DEPLOY: The service is waiting for deployment commands or acknowledgments from other services during startup. Storage Nodes in this state are generally awaiting the initial deploy-sn command. Other services should transition out of this state automatically.

A healthy service starts in the STARTING state, may briefly transition to WAITING_FOR_DEPLOY, and then moves to RUNNING once operational.

ERROR_RESTARTING and ERROR_NO_RESTART indicate problems that require investigation. An UNREACHABLE status can be temporary, but if it persists, check for network issues or for an underlying ERROR_RESTARTING or ERROR_NO_RESTART condition.
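
The guidance above can be condensed into a small lookup. The following Python sketch is illustrative only: the function name classify_status and the category labels are hypothetical and are not part of Oracle NoSQL Database. Only the status names come from the list above.

```python
# Hypothetical helper: groups the status values listed above by the
# follow-up they call for. The category labels are illustrative only.
HEALTHY = {"RUNNING"}
# Normal lifecycle states that usually need no action.
EXPECTED = {"STARTING", "WAITING_FOR_DEPLOY", "STOPPING", "STOPPED"}
INVESTIGATE = {"ERROR_RESTARTING", "ERROR_NO_RESTART"}


def classify_status(status: str) -> str:
    """Map a status name to 'healthy', 'expected', 'investigate',
    'recheck', or 'unknown'."""
    name = status.strip().upper()
    if name in HEALTHY:
        return "healthy"
    if name in EXPECTED:
        return "expected"
    if name in INVESTIGATE:
        return "investigate"
    if name == "UNREACHABLE":
        # May be temporary; look again, then check the network and
        # any underlying error state if it persists.
        return "recheck"
    # SUCCEEDED describes a plan outcome above, so it is not classified here.
    return "unknown"
```

For example, classify_status("ERROR_NO_RESTART") returns "investigate", while classify_status("WAITING_FOR_DEPLOY") returns "expected".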

Troubleshooting Commands

Use these commands to diagnose the current state of your data store:

ping: Checks if the services are reachable and active, and provides a concise health summary.
show plan -id <id>: Displays detailed error information for a plan that failed to execute, often revealing issues like insufficient memory or port conflicts.
show topology: Displays the physical layout of the store to ensure shards and replication nodes are correctly mapped. It also shows the status of the database services.
verify configuration: Inspects every component and reports any configuration violations.
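
If you want to run ping from a script, a wrapper along the following lines may help. This is a sketch under stated assumptions: it assumes kvstore.jar is located under KVHOME/lib and that ping accepts -host and -port for a registry host and port. A secure store needs additional security arguments that are not shown. The function names build_ping_command and run_ping are hypothetical. Verify the invocation against the documentation for your release before relying on it.

```python
import subprocess
from typing import List


def build_ping_command(kvhome: str, host: str, port: int) -> List[str]:
    """Assemble the argument list for the ping utility.

    Assumption: kvstore.jar is under <kvhome>/lib and ping takes
    -host and -port. Adjust for your installation.
    """
    return [
        "java", "-jar", f"{kvhome}/lib/kvstore.jar",
        "ping", "-host", host, "-port", str(port),
    ]


def run_ping(kvhome: str, host: str, port: int, timeout: int = 60) -> str:
    """Run ping and return its combined output without raising on a
    non-zero exit code, so the caller can inspect the text."""
    result = subprocess.run(
        build_ping_command(kvhome, host, port),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout + result.stderr
```

Keeping the command construction separate from its execution makes it easy to log or review the exact command before it runs.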

Log Analysis

Detailed log files are available in $KVROOT/storename/log and $KVROOT/*.log, where storename is the name of your data store.

The bootstrap logs available in $KVROOT/*.log are the most useful in diagnosing initial startup problems. The logs in $KVROOT/storename/log appear once the data store has been configured. The logs on the Storage Node chosen to run the Admin service are the most detailed and include a store-wide consolidated log file: $KVROOT/storename/log/storename_*.log
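
The locations described above can be collected with a few glob patterns. The following Python sketch is illustrative: the function name find_log_files and the dictionary keys are hypothetical, and the patterns simply mirror the paths named in this section.

```python
from pathlib import Path
from typing import Dict, List


def find_log_files(kvroot: str, storename: str) -> Dict[str, List[Path]]:
    """Group the files under KVROOT by the roles described above."""
    root = Path(kvroot)
    log_dir = root / storename / "log"
    return {
        # $KVROOT/*.log: most useful for initial startup problems.
        "bootstrap": sorted(root.glob("*.log")),
        # $KVROOT/storename/log/*.log: present once the store is configured.
        "store": sorted(log_dir.glob("*.log")),
        # $KVROOT/storename/log/storename_*.log: the store-wide consolidated
        # log, found on the Storage Node that runs the Admin service.
        # These files also appear in the "store" group.
        "consolidated": sorted(log_dir.glob(f"{storename}_*.log")),
        # Performance files for the replication nodes.
        "perf": sorted(log_dir.glob("*.perf")),
    }
```

An empty "store" list on a node whose "bootstrap" list is not empty suggests that the data store has not been configured on that node yet.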

Each entry in the log files is prefixed with the timestamp, severity level, and the component name that issued the message. For example:

<date> <timestamp> UTC INFO [admin1] Initializing Admin for store: mystore

You can use the timestamp and component name to focus your search on relevant log entries, and search for SEVERE (for example, with grep) to quickly locate critical errors when troubleshooting.
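
The same filtering can be scripted. The following Python sketch is a minimal example that assumes every entry follows the layout of the sample line above, with the literal text UTC between the timestamp and the severity level. It makes no assumption about the date or time format itself. The names LogEntry, parse_line, and filter_entries are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Layout assumed from the sample entry:
#   <date> <timestamp> UTC <LEVEL> [<component>] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>.*?) UTC (?P<level>[A-Z]+) "
    r"\[(?P<component>[^\]]+)\] (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    component: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line, or return None if it does not match the layout."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return LogEntry(**match.groupdict()) if match else None


def filter_entries(
    lines: Iterable[str],
    level: str = "SEVERE",
    component: Optional[str] = None,
) -> Iterator[LogEntry]:
    """Yield entries with the given level, optionally for one component."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # Skip lines without the prefix, such as stack traces.
        if entry.level != level:
            continue
        if component is not None and entry.component != component:
            continue
        yield entry
```

Because filter_entries accepts any iterable of lines, you can pass it an open file object directly, for example to list only the SEVERE entries issued by admin1.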

In addition to log files, the log directories also contain *.perf files. These are performance files for the replication nodes and provide statistics such as throughput, request latency, and operation counts. For more details, see System Log File Monitoring.

Diagnostics

To catch configuration errors early, run the diagnostics tool when troubleshooting your data store. You can also use this tool to package important information and files to send to Oracle Support. For more information, see Diagnostics Utility.

Common Failure Scenarios and Fixes

Typical errors when bringing up a data store are typos and misconfiguration. It is also possible to run into network port conflicts, especially if a previous deployment failed and you are starting over.

If a Storage Node Agent (SNA) is misconfigured, you need to kill the existing processes. Processes associated with a data store are reported by the jps -m command. The following is sample output from the command:

3659557 ManagedService -root kvroot/mystore/sn1 -secdir kvhome/kvroot/security -store mystore -class RepNode -service rg1-rn1
3659427 kvstore.jar start -root kvroot
3659907 ManagedService -root kvroot1/mystore/sn2 -secdir kvhome/kvroot1/security -store mystore -class RepNode -service rg2-rn1
3659553 ManagedService -root kvroot/mystore/sn1 -secdir kvhome/security -store mystore -class Admin -service admin1
3659599 kvstore.jar start -root kvroot1
3659903 ManagedService -root kvroot1/mystore/sn2 -secdir kvhome/kvroot1/security -store mystore -class Admin -service admin2
3660110 ManagedService -root kvroot2/mystore/sn3 -secdir kvhome/kvroot2/security -store mystore -class Admin -service admin3
3659672 kvstore.jar start -root kvroot2
3660120 ManagedService -root kvroot2/mystore/sn3 -secdir kvhome/kvroot2/security -store mystore -class RepNode -service rg3-rn1

In this output:

kvstore.jar start -root $KVROOT: The SNA processes.
ManagedService: The managed child processes, such as Replication Nodes and Admin Nodes.

Killing the SNA process also terminates its managed child processes.
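
To pick out the SNA process IDs from this kind of output, a small parser can help. The following Python sketch is illustrative and assumes the jps -m output has the shape of the sample above: a process ID, a name, and then the arguments. The function names are hypothetical. The sketch only reports processes and does not kill anything.

```python
from typing import Dict, List, Optional


def _option(args: List[str], flag: str) -> Optional[str]:
    """Return the value that follows `flag` in `args`, if any."""
    if flag in args:
        index = args.index(flag)
        if index + 1 < len(args):
            return args[index + 1]
    return None


def split_jps_output(text: str) -> Dict[str, List[dict]]:
    """Separate SNA processes from their managed child processes."""
    sna: List[dict] = []
    managed: List[dict] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # Not a "<pid> <name> ..." line.
        pid, name, args = int(parts[0]), parts[1], parts[2:]
        if name == "kvstore.jar" and "start" in args:
            sna.append({"pid": pid, "root": _option(args, "-root")})
        elif name == "ManagedService":
            managed.append({
                "pid": pid,
                "root": _option(args, "-root"),
                "class": _option(args, "-class"),
                "service": _option(args, "-service"),
            })
    return {"sna": sna, "managed": managed}
```

Applied to the sample output, the "sna" group contains the three kvstore.jar start processes (one per root directory), and the "managed" group contains the six Admin and RepNode services.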

Next steps after cleanup

Clear the existing KVROOT directory and storage directories (storagedir) to ensure no old configuration or corrupt data remains.
Recreate the bootstrap configuration using the makebootconfig utility.
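
The first step can be scripted with a dry-run safeguard. The following Python sketch is a hypothetical helper, not an Oracle NoSQL Database utility. It assumes that clearing a directory means deleting its contents while keeping the directory itself, and it should be run only after the data store processes have been stopped. It does not run makebootconfig, because the arguments for that utility depend on your deployment.

```python
import shutil
from pathlib import Path
from typing import List


def clear_directory(directory: str, dry_run: bool = True) -> List[Path]:
    """Delete everything inside `directory`, keeping the directory itself.

    With dry_run=True (the default), nothing is deleted and the function
    only returns the entries that would be removed.
    """
    root = Path(directory)
    targets = sorted(root.iterdir()) if root.is_dir() else []
    if not dry_run:
        for target in targets:
            if target.is_dir() and not target.is_symlink():
                shutil.rmtree(target)
            else:
                target.unlink()  # Regular files and symbolic links.
    return targets
```

Call the function once with the default dry_run=True for the KVROOT directory and for each storage directory, review the returned paths, and then repeat the call with dry_run=False.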