Troubleshooting

Typical errors when bringing up a data store are typos and misconfiguration. Network port conflicts are also possible, especially if a deployment failed and you are starting over. The jps -m command reports the processes associated with a data store. Examples include:
  • kvstore.jar start -root $KVROOT (SNA process)

  • ManagedService

Killing the SNA process should also kill its managed processes.
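To check quickly whether the SNA and its managed services are present on a host, the jps -m output can be filtered for the two patterns listed above. The following is a minimal sketch: the function name kv_procs is hypothetical, and the match patterns are taken only from the examples in this section, so adjust them if your process listing looks different.

```shell
# kv_procs (hypothetical helper): reads `jps -m`-style output on stdin and
# keeps only the lines that look like data store processes.
# The two patterns come from the examples above and are an assumption about
# how the processes appear on your host.
kv_procs() {
  grep -E 'kvstore\.jar|ManagedService'
}

# Typical use on a Storage Node host:
#   jps -m | kv_procs
```

If the command prints nothing, no matching data store process is running under the current user's view of the JVMs.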

There are detailed log files available in $KVROOT/storename/log as well as logs of the bootstrap process in $KVROOT/*.log. The bootstrap logs are most useful in diagnosing initial startup problems. The logs in storename/log appear once the data store has been configured. The logs on the Storage Node chosen for the admin process are the most detailed and include a store-wide consolidated log file: $KVROOT/storename/log/storename_*.log

Each line in the log file is prefixed with the date of the message, its severity, and the name of the component which issued it. For example:

2023-05-24 14:28:26.982 UTC INFO [admin1] Initializing Admin for store: mystore

When looking for more context for events at a given time, use the timestamp and component name to narrow down the section of the log to examine.
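Narrowing a log by time and component can be scripted with awk. The sketch below assumes the prefix layout shown in the example above (date, time, time zone, severity, then the component name in square brackets). The function name log_window and its argument order are hypothetical.

```shell
# log_window (hypothetical helper): print the log lines issued by one
# component between two timestamps.
# Assumes each message starts with: date time zone severity [component]
# Usage: log_window COMPONENT 'YYYY-MM-DD HH:MM:SS' 'YYYY-MM-DD HH:MM:SS' FILE...
log_window() {
  comp=$1; from=$2; to=$3
  shift 3
  awk -v comp="[$comp]" -v from="$from" -v to="$to" '
    { ts = $1 " " $2 }                      # e.g. "2023-05-24 14:28:26.982"
    $5 == comp && ts >= from && ts <= to    # timestamps compared as strings
  ' "$@"
}

# Example:
#   log_window admin1 '2023-05-24 14:28:00' '2023-05-24 14:29:00' \
#     "$KVROOT"/storename/log/storename_*.log
```

Because the timestamps are compared as plain strings, an upper bound of 14:29:00 does not include 14:29:00.001. Lines that do not begin with a timestamp, such as stack trace continuation lines, are not printed.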

Error messages in the logs are marked SEVERE, so you can grep for that string when troubleshooting. SEVERE error messages are also displayed by the CLI's show events command and by the ping command.
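Beyond a plain grep, it can help to see which component is logging the most SEVERE messages. The following sketch counts them per component. It assumes that the severity is the fourth whitespace-separated field and the bracketed component name is the fifth, as in the sample log line earlier in this section. The function name severe_by_component is hypothetical.

```shell
# severe_by_component (hypothetical helper): count SEVERE messages per
# component, highest count first.
# Assumes field 4 is the severity and field 5 is the [component] name.
severe_by_component() {
  awk '$4 == "SEVERE" { n[$5]++ }
       END { for (c in n) print n[c], c }' "$@" | sort -rn
}

# Example:
#   severe_by_component "$KVROOT"/storename/log/storename_*.log
```

Each output line is a count followed by the component name. The component with the highest count is usually a reasonable place to start reading the surrounding log context.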

In addition to log files, the log directories may also contain *.perf files, which are performance files for the replication nodes.

In general, the verify configuration command is the tool of choice for understanding the state of the data store. In addition to contacting the data store components, it cross-checks each component's parameters against the Admin database. For example, verify configuration might report that a replication node's helperHosts parameter is at odds with the Admin, which could explain why that replication node cannot come up. The verify configuration command also checks the Admins and verifies the configuration of Arbiter Nodes in the topology.

To catch configuration errors early, you can also use the diagnostics tool when troubleshooting your data store. The same tool can package important information and files so that you can send them to Oracle Support. For more information, see Diagnostics Utility.

Where to Find Error Information

As your data store operates, you can discover information about any problems that may be occurring by looking at the plan history and by looking at error logs.

The plan history indicates if any configuration or operational actions you attempted to take against the store encountered problems. This information is available as the plan executes and finishes. Errors are reported in the plan history each time an attempt to run the plan fails. The plan history can be seen using the CLI show plan command.

Other problems may occur asynchronously. You can learn about unexpected failures, service downtime, and performance issues through the CLI's show events command. Events come with a timestamp, and the description may contain enough information to diagnose the issue. In other cases, more context may be needed, and the administrator may want to see what else happened around that time.

The store-wide log consolidates logging output from all services. Browsing this file might give you a more complete view of activity during the problem period. It can be viewed using the CLI's logtail command, or by directly viewing the <storename>_N.log file in the $KVROOT/<storename>/log directory.

Service States

Oracle NoSQL Database uses four different types of services, all of which should be running correctly for your store to be in a healthy state. The four service types are the Admin, Storage Nodes, Replication Nodes, and Arbiter Nodes. You should have multiple instances of these services running throughout your store.

Each service has a status that can be viewed using any of the following:

  • The show topology command in the Administration CLI.

  • The ping command.

The status can be one of the following values:

Name                Description
ERROR_NO_RESTART    The service is in an error state and is not automatically restarted. Administrative intervention is required.
ERROR_RESTARTING    The service is in an error state. Oracle NoSQL Database attempts to restart the service.
RUNNING             The service is running normally.
STARTING            The service is coming up.
STOPPED             The service was stopped intentionally and cleanly.
STOPPING            The service is stopping. This may take some time as some services can be involved in time-consuming activities when they are asked to stop.
SUCCEEDED           The plan has completed successfully.
UNREACHABLE         The service is not reachable by the Admin. If the status was seen using a command issued by the Admin, this state may mask a STOPPED or ERROR state. If an SN is UNREACHABLE, or an RN is having problems and its SN is UNREACHABLE, the first thing to check is the network connectivity between the Admin and the SN. However, if the managing SNA is reachable and the managed Replication Node is not, the network is probably working and the problem lies elsewhere.
WAITING_FOR_DEPLOY  The service is waiting for commands or acknowledgments from other services during its startup processing. If it is a Storage Node, it is waiting for the initial deploy-SN command. Other services should transition out of this phase without any administrative intervention.

A healthy service begins with STARTING. It may transition to WAITING_FOR_DEPLOY for a short period before going on to RUNNING.

ERROR_RESTARTING and ERROR_NO_RESTART indicate that there has been a problem that should be investigated. An UNREACHABLE service may only be in that state temporarily, although if that state persists, the service may be truly in an ERROR_RESTARTING or ERROR_NO_RESTART state.
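When status output is long, it can be filtered for the states that call for investigation. The sketch below assumes only that the status names appear literally in the text being filtered; it makes no other assumption about the output layout of show topology or ping. The function name unhealthy_services is hypothetical.

```shell
# unhealthy_services (hypothetical helper): reads status output on stdin and
# prints only the lines that mention a state needing attention.
# Assumes the status names appear literally in the output being filtered.
unhealthy_services() {
  grep -E 'ERROR_NO_RESTART|ERROR_RESTARTING|UNREACHABLE'
}

# Example, using status output previously saved to a file:
#   unhealthy_services < status.txt
```

The function's exit status is 0 when at least one such line is found and non-zero when none is, so it can also be used as a condition in a monitoring script.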

Useful Commands

The following commands may be useful when troubleshooting your data store.

  • java -Xmx64m -Xms64m \
    -jar kvstore.tmp/kvstore.jar ping -host node01 -port 5000 \
    -security USER/security/admin.security

    Reports the status of the store running on the specified host and port. This command can be used against any of the host and port pairs used for Storage Nodes.

    Note:

    This assumes that you have completed the steps in Create users and configure security with remote access.

  • jps -m

    Reports the Java processes running on a machine. If the Oracle NoSQL Database processes are running, they are reported by this command.

  • ps -eaf | grep kv

    Lists the kvstore processes that are running.
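One practical detail with ps and grep: the grep process can match itself and show up in its own results. A common workaround, sketched here, is to write the pattern with a bracket expression so that the grep command line no longer contains the literal text "kv". The function name kv_ps is hypothetical.

```shell
# kv_ps (hypothetical helper): filter `ps -eaf`-style output on stdin for "kv".
# The pattern '[k]v' still matches the text "kv", but the grep command line
# itself now reads "grep [k]v", which the pattern does not match.
kv_ps() {
  grep '[k]v'
}

# Typical use:
#   ps -eaf | kv_ps
```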