Troubleshooting

Where to Find Error Information
Service States
Useful Commands

Typical errors when bringing up a store are typos and misconfiguration. It is also possible to run into network port conflicts, especially if a previous deployment failed and you are starting over. In that case, be sure to remove all partial store data and configuration, and kill any remnant processes. The processes associated with a store, as reported by "jps -m", are one of the following:

StorageNodeAgentImpl
ManagedService

If you kill the StorageNodeAgentImpl, it should also kill its managed processes.
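
For example, a cleanup on a Unix-like host might look like the following sketch. The process IDs are illustrative and the arguments are elided; your "jps -m" output will differ:

jps -m
305 StorageNodeAgentImpl ...
408 ManagedService ...
520 ManagedService ...
kill 305

Killing process 305, the StorageNodeAgentImpl, should also bring down the two ManagedService processes it started.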

You can use the monitoring tab in the Admin Console to look at various log files.

Detailed log files are available in KVROOT/storename/log, and logs of the bootstrap process are in KVROOT/*.log. The bootstrap logs are most useful for diagnosing initial startup problems; the logs in storename/log appear once the store has been configured. The logs on the host chosen for the admin process are the most detailed, and include a store-wide consolidated log file: KVROOT/storename/log/storename_*.log
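
For example, on a store named "mystore" with KVROOT set to /var/kvroot, listing the log directory might show files like these (the exact names depend on your topology):

ls /var/kvroot/mystore/log
admin1_0.log  mystore_0.log  rg1-rn1_0.log  sn1_0.log  ...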

Each line in a log file is prefixed with the date of the message, its severity, and the name of the component that issued it. For example:

2012-10-25 14:28:26.982 UTC INFO [admin1] Initializing Admin for store:
kvstore 

When looking for more context for events at a given time, use the timestamp and component name to narrow down the section of the log to examine.

Error messages in the logs are marked "SEVERE", so you can grep for that string when troubleshooting. SEVERE error messages are also displayed in the Admin's Topology tab, in the output of the CLI's show events command, and in the output of the ping command.
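
For example, to scan the consolidated log for errors, substituting your own KVROOT and store name:

grep SEVERE KVROOT/storename/log/storename_*.log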

In addition to log files, these directories may also contain *.perf files, which are performance files for the Replication Nodes.

Where to Find Error Information

As your store operates, you can discover information about any problems that may be occurring by looking at the plan history and by looking at error logs.

The plan history indicates whether any configuration or operational actions you attempted to take against the store encountered problems. This information is available as the plan executes and finishes. Errors are reported in the plan history each time an attempt to run the plan fails. The plan history can be seen using the CLI show plans command or the Admin's Plan History tab.
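
For example, the following sketch starts the Administration CLI and lists the plans. The host name and port are illustrative, and the exact output format varies by release:

java -jar KVHOME/lib/kvstore.jar runadmin -host node01 -port 5000
kv-> show plans

Each plan is listed with an ID, a name, and its most recent state, so a plan that ended in an error is easy to spot.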

Other problems may occur asynchronously. You can learn about unexpected failures, service downtime, and performance issues through the Admin's critical events display in the Logs tab, or through the CLI's show events command. Events come with a time stamp, and the description may contain enough information to diagnose the issue. In other cases, more context may be needed, and the administrator may want to see what else happened around that time.
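
To review recent critical events from the same CLI session:

kv-> show events

Each event carries a time stamp and severity, which you can use to decide where in the store-wide log to look for surrounding context.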

The store-wide log consolidates logging output from all services. Browsing this file may give you a more complete view of activity during the problem period. It can be viewed using the Admin's Logs tab, by using the CLI's logtail command, or by directly viewing the <storename>_N.log file in the <KVROOT>/<storename>/log directory. It is also possible to download the store-wide log file using the Admin's Logs tab.
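
For example, to follow the store-wide log from a CLI session as new entries are written:

kv-> logtail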

Service States

Oracle NoSQL Database uses three different types of services, all of which should be running correctly in order for your store to be in a healthy state. The three service types are the Admin, Storage Nodes, and Replication Nodes. You should have multiple instances of these services running throughout your store.

Each service has a status that can be viewed using any of the following:

  • The Topology tab in the Admin Console

  • The show topology command in the Administration CLI

  • The ping command
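
For example, from a CLI session started with runadmin (as in the earlier sketch):

kv-> show topology

The listing includes each service's current status alongside its entry in the topology.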

The status values can be one of the following:

  • STARTING

    The service is coming up.

  • RUNNING

    The service is running normally.

  • STOPPING

    The service is stopping. This may take some time as some services can be involved in time-consuming activities when they are asked to stop.

  • WAITING_FOR_DEPLOY

    The service is waiting for commands or acknowledgments from other services during its startup processing. If it is a Storage Node, it is waiting for the initial deploy-SN command. Other services should transition out of this phase without any administrative intervention from the user.

  • STOPPED

    The service was stopped intentionally and cleanly.

  • ERROR_RESTARTING

    The service is in an error state. Oracle NoSQL Database attempts to restart the service.

  • ERROR_NO_RESTART

    The service is in an error state and is not automatically restarted. Administrative intervention is required.

  • UNREACHABLE

    The service is not reachable by the Admin. If the status was seen using a command issued by the Admin, this state may mask a STOPPED or ERROR state.

A healthy service begins with STARTING. It may transition to WAITING_FOR_DEPLOY for a short period before going on to RUNNING.

ERROR_RESTARTING and ERROR_NO_RESTART indicate that there has been a problem that should be investigated. An UNREACHABLE service may be in that state only temporarily; if the state persists, however, the service may truly be in an ERROR_RESTARTING or ERROR_NO_RESTART state.

Note that the Admin's Topology tab only shows abnormal service statuses. A service that is RUNNING does not display its status in that tab.

Useful Commands

The following commands may be useful to you when troubleshooting your KVStore:

  • java -jar KVHOME/lib/kvstore.jar ping -host <host> -port <registryport>

    Reports the status of the store running on the specified host and port. This command can be used against any of the host and port pairs used for Storage Nodes. See the example after this list.

  • jps -m

    Reports the Java processes running on a machine. If the Oracle NoSQL Database processes are running, they are reported by this command.
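
For example, to check the store through a Storage Node listening on node01:5000 (the host and port are illustrative):

java -jar KVHOME/lib/kvstore.jar ping -host node01 -port 5000

The output reports the status of each service that the command can reach; compare those values against the service states described earlier.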

In addition, you can use the Admin Console to investigate the state of the KVStore. Point your browser to the administration port chosen on the administration host (for example, http://node01:5001, if node01 is the administration host and 5001 is the configured admin port; these values are illustrative).