Troubleshooting
Troubleshooting a data store involves a systematic approach to verifying service states, analyzing logs, and using built-in diagnostic utilities.
Identify Service States
Oracle NoSQL Database uses four types of services that contribute to your store's health: Admin Nodes, Storage Nodes, Replication Nodes, and Arbiter Nodes. To learn more about these services, see the Architecture topic in the Concepts guide.
Each service has a status that can be viewed using the show topology command:
| Name | Description |
| ERROR_NO_RESTART | The service is in an error state and is not automatically restarted. Administrative intervention is required. |
| ERROR_RESTARTING | The service is in an error state. Oracle NoSQL Database attempts to restart the service. |
| RUNNING | The service is running normally. |
| STARTING | The service is coming up. |
| STOPPED | The service was stopped intentionally and cleanly. |
| STOPPING | The service is stopping, which may take some time. |
| SUCCEEDED | The plan has completed successfully. |
| UNREACHABLE | The service cannot be reached by the Admin. This may be temporary, or it may signal a STOPPED or ERROR state. Check network connectivity between the Admin and the Storage Node. If the Storage Node agent is reachable but a managed Replication Node is not, the issue likely lies elsewhere. |
| WAITING_FOR_DEPLOY | The service is waiting for deployment commands or acknowledgments from other services during startup. Storage Nodes in this state are generally awaiting the initial deploy-sn command. Other services should transition out of this state automatically. |
- A service normally begins in the STARTING state, may briefly transition to WAITING_FOR_DEPLOY, and then moves to RUNNING once operational.
- ERROR_RESTARTING and ERROR_NO_RESTART indicate problems that require investigation.
- If a service is UNREACHABLE, this could be a temporary state, but if it persists, check for underlying ERROR_RESTARTING or ERROR_NO_RESTART conditions or network issues.
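The guidance above can be condensed into a small triage helper. This is only an illustrative sketch: the state names come from the table, but the function `triage`, its `persisted` parameter, and the returned action labels are assumptions made for this example and are not part of Oracle NoSQL Database.

```python
# State names are taken from the status table; the grouping and the action
# labels ("ok", "recheck", "investigate", "unknown") are hypothetical.
NEEDS_INVESTIGATION = {"ERROR_RESTARTING", "ERROR_NO_RESTART"}
TRANSIENT = {"STARTING", "STOPPING", "WAITING_FOR_DEPLOY"}
HEALTHY_OR_INTENTIONAL = {"RUNNING", "STOPPED"}


def triage(status: str, persisted: bool = False) -> str:
    """Suggest a next step for a service status string.

    `persisted` should be True when the same status has been observed
    across several checks (how long to wait is left to the operator).
    """
    if status in NEEDS_INVESTIGATION:
        return "investigate"
    if status == "UNREACHABLE":
        # May be temporary; only escalate when it does not clear.
        return "investigate" if persisted else "recheck"
    if status in TRANSIENT:
        return "recheck"
    if status in HEALTHY_OR_INTENTIONAL:
        return "ok"
    return "unknown"
```

For example, `triage("UNREACHABLE")` suggests checking again later, while `triage("UNREACHABLE", persisted=True)` suggests looking for an underlying error or network problem.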
Troubleshooting Commands
- ping: Checks if the services are reachable and active, and provides a concise health summary.
- show plan -id <id>: Displays detailed error information for a plan that failed to execute, often revealing issues like insufficient memory or port conflicts.
- show topology: Displays the physical layout of the store to ensure shards and replication nodes are correctly mapped. It also shows the status of the database services.
- verify configuration: Inspects every component and reports any configuration violations.
Log Analysis
There are detailed log files available in $KVROOT/storename/log and $KVROOT/*.log.
The bootstrap logs available in $KVROOT/*.log are the most useful in diagnosing initial startup problems. The logs in $KVROOT/storename/log appear once the data store has been configured. The logs on the Storage Node chosen to run the Admin service are the most detailed and include a store-wide consolidated log file: $KVROOT/storename/log/storename_*.log
Each entry in the log files is prefixed with the timestamp, severity level, and the component name that issued the message. For example:
<date> <timestamp> UTC INFO [admin1] Initializing Admin for store: mystore
You can use the timestamp and component name to focus your search on relevant log entries, and grep SEVERE to quickly locate critical errors when troubleshooting.
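As a rough sketch of that filtering, the following Python parses entries shaped like the example above and selects them by severity and component. The regular expression assumes the date and timestamp are each a single whitespace-free token followed by `UTC`; that layout, and the names `LogEntry`, `parse_line`, and `filter_entries`, are assumptions for illustration and should be checked against your actual log files.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class LogEntry(NamedTuple):
    date: str
    time: str
    level: str
    component: str
    message: str


# Assumed layout: "<date> <timestamp> UTC <LEVEL> [<component>] <message>"
LINE_RE = re.compile(r"^(\S+) (\S+) UTC (\w+) \[([^\]]+)\] (.*)$")


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that do not match the layout
    (for example, stack-trace continuation lines)."""
    match = LINE_RE.match(line.rstrip("\n"))
    return LogEntry(*match.groups()) if match else None


def filter_entries(
    lines: Iterable[str],
    level: str = "SEVERE",
    component: Optional[str] = None,
) -> Iterator[LogEntry]:
    """Yield entries with the given severity, optionally for one component."""
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry.level != level:
            continue
        if component is not None and entry.component != component:
            continue
        yield entry
```

Passing an open log file as `lines` yields only the SEVERE entries; adding `component="admin1"` narrows the result to messages issued by that component.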
In addition to log files, the log directories also contain *.perf files. These are performance files for the replication nodes and provide statistics such as throughput, request latency, and operation counts. For more details, see System Log File Monitoring.
Diagnostics
In order to catch configuration errors early, you can use the diagnostics tool when troubleshooting your data store. You can use this tool to package important information and files to send to Oracle Support. For more information, see Diagnostics Utility.
Common Failure Scenarios and Fixes
Typical errors when bringing up a data store are typos and misconfiguration. It is also possible to run into network port conflicts, especially if a previous deployment failed and you are starting over.
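To see which Oracle NoSQL Database processes are running on a host, including any left over from a previous deployment, use the jps -m command. Below is a sample output of the command:
3659557 ManagedService -root kvroot/mystore/sn1 -secdir kvhome/kvroot/security -store mystore -class RepNode -service rg1-rn1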
3659427 kvstore.jar start -root kvroot
3659907 ManagedService -root kvroot1/mystore/sn2 -secdir kvhome/kvroot1/security -store mystore -class RepNode -service rg2-rn1
3659553 ManagedService -root kvroot/mystore/sn1 -secdir kvhome/security -store mystore -class Admin -service admin1
3659599 kvstore.jar start -root kvroot1
3659903 ManagedService -root kvroot1/mystore/sn2 -secdir kvhome/kvroot1/security -store mystore -class Admin -service admin2
3660110 ManagedService -root kvroot2/mystore/sn3 -secdir kvhome/kvroot2/security -store mystore -class Admin -service admin3
3659672 kvstore.jar start -root kvroot2
3660120 ManagedService -root kvroot2/mystore/sn3 -secdir kvhome/kvroot2/security -store mystore -class RepNode -service rg3-rn1
- kvstore.jar start -root $KVROOT are the SNA processes
- ManagedService are the managed child processes such as Replication Nodes and Admin Nodes
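When several Storage Nodes share a host, it can help to sort the jps -m output into SNA processes and managed services automatically. The sketch below does this for lines shaped like the sample output; the function names and the returned dictionary keys are hypothetical, and the parsing assumes arguments are separated by whitespace exactly as shown.

```python
from typing import Dict, List, Optional


def _flag_value(args: List[str], flag: str) -> Optional[str]:
    """Return the token following `flag`, or None if the flag is absent."""
    if flag in args:
        index = args.index(flag)
        if index + 1 < len(args):
            return args[index + 1]
    return None


def classify_jps_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Classify one line of `jps -m` output.

    Returns None for processes that do not look like store processes.
    """
    parts = line.split()
    if len(parts) < 2:
        return None
    pid, name, args = parts[0], parts[1], parts[2:]
    if name == "kvstore.jar" and "start" in args:
        # Storage Node Agent (SNA) process
        return {"pid": pid, "kind": "SNA", "service": None,
                "root": _flag_value(args, "-root")}
    if name == "ManagedService":
        # Managed child process: the -class value is RepNode or Admin
        # in the sample output
        return {"pid": pid, "kind": _flag_value(args, "-class"),
                "service": _flag_value(args, "-service"),
                "root": _flag_value(args, "-root")}
    return None
```

Feeding each output line through `classify_jps_line` gives a list of process IDs grouped by kind, which makes it easier to spot a remnant process that still holds a port.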
- Clear the existing KVROOT directory and storage directories (storagedir) to ensure no old configuration or corrupt data remains.
- Recreate the bootstrap configuration using the makebootconfig utility.