Typical errors when bringing up a store are typos and misconfiguration. It is also possible to run into network port conflicts, especially if a previous deployment failed and you are starting over. In that case, be sure to remove all partial store data and configuration, and kill any remnant processes. Processes associated with a store, as reported by "jps -m", are one of the following:
StorageNodeAgentImpl
ManagedService
If you kill the StorageNodeAgentImpl, it should also kill its managed processes.
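As a sketch of that cleanup step, the following extracts the StorageNodeAgentImpl process ID from "jps -m" output so it can be passed to kill. The output here is canned for illustration; the PIDs and arguments are hypothetical, and in practice you would pipe the real "jps -m" output instead.

```shell
# Sample "jps -m" output; PIDs and arguments are hypothetical.
jps_output='4242 ManagedService -root /var/kvroot -class RepNode
4240 StorageNodeAgentImpl -root /var/kvroot
9999 Jps -m'

# Extract the PID on the StorageNodeAgentImpl line.
sna_pid=$(printf '%s\n' "$jps_output" | awk '/StorageNodeAgentImpl/ {print $1}')
echo "$sna_pid"

# A real cleanup would then run:  kill "$sna_pid"
# Killing the SNA should also take down its managed processes.
```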
You can use the monitoring tab in the Admin Console to look at various log files.
Detailed log files are available in KVROOT/storename/log, as well as logs of the bootstrap process in KVROOT/*.log. The bootstrap logs are most useful in diagnosing initial startup problems. The logs in storename/log appear once the store has been configured. The logs on the host chosen for the admin process are the most detailed and include a store-wide consolidated log file: KVROOT/storename/log/storename_*.log
Each line in a log file is prefixed with the date of the message, its severity, and the name of the component which issued it. For example:
2012-10-25 14:28:26.982 UTC INFO [admin1] Initializing Admin for store: kvstore
When looking for more context for events at a given time, use the timestamp and component name to narrow down the section of log to peruse.
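Because the prefix fields are whitespace-separated, the severity and component can be pulled out with standard text tools. This sketch uses the sample line shown above:

```shell
# The sample log line from above.
line='2012-10-25 14:28:26.982 UTC INFO [admin1] Initializing Admin for store: kvstore'

# Fields are: date, time, timezone, severity, [component], then the message.
severity=$(printf '%s\n' "$line" | awk '{print $4}')
component=$(printf '%s\n' "$line" | awk '{print $5}')
echo "$severity $component"
```

The date and component extracted this way can then drive a search over the rest of the log for surrounding context.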
Error messages in the logs show up with "SEVERE" in them, so you can grep for that if you are troubleshooting. SEVERE error messages are also displayed in the Admin's Topology tab, in the CLI's show events command, and when you use the ping command.
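For example, a grep for SEVERE over a log file surfaces only the error entries. The log below is a throwaway sample with illustrative contents; in practice you would grep the files under KVROOT/storename/log:

```shell
# Build a small sample log (contents are illustrative).
cat > /tmp/sample_store.log <<'EOF'
2012-10-25 14:28:26.982 UTC INFO [admin1] Initializing Admin for store: kvstore
2012-10-25 14:29:01.115 UTC SEVERE [rg1-rn1] Process exiting
EOF

# Pull out only the SEVERE entries.
grep SEVERE /tmp/sample_store.log
```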
In addition to log files, these directories may also contain *.perf files, which are performance files for the Replication Nodes.
As your store operates, you can discover information about any problems that may be occurring by looking at the plan history and by looking at error logs.
The plan history indicates whether any configuration or operational actions you attempted against the store encountered problems. This information is available as the plan executes and finishes. Errors are reported in the plan history each time an attempt to run the plan fails. The plan history can be seen using the CLI show plan command, or in the Admin's Plan History tab.
Other problems may occur asynchronously. You can learn about unexpected failures, service downtime, and performance issues through the Admin's critical events display in the Logs tab, or through the CLI's show events command. Events come with a time stamp, and the description may contain enough information to diagnose the issue. In other cases, more context may be needed, and the administrator may want to see what else happened around that time.
The store-wide log consolidates logging output from all services. Browsing this file might give you a more complete view of activity during the problem period. It can be viewed using the Admin's Logs tab, by using the CLI's logtail command, or by directly viewing the <storename>_N.log file in the <KVROOT>/<storename>/log directory. It is also possible to download the store-wide log file using the Admin's Logs tab.
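Since the store-wide log rolls over to a new <storename>_N.log file as N increments, one way to locate the current file is to sort on the numeric suffix. This sketch uses throwaway files with an assumed store name of mystore; substitute the real log directory in practice:

```shell
# Create throwaway files mimicking the store-wide log naming.
mkdir -p /tmp/kvlog-demo
touch /tmp/kvlog-demo/mystore_0.log /tmp/kvlog-demo/mystore_1.log /tmp/kvlog-demo/mystore_10.log

# Sort numerically on the suffix after the underscore; the last entry is the newest log.
current=$(ls /tmp/kvlog-demo/mystore_*.log | sort -t_ -k2 -n | tail -1)
echo "$current"
```

A plain lexical sort would misorder mystore_10.log before mystore_2.log, which is why the numeric sort key matters here.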
Oracle NoSQL Database uses three different types of services, all of which should be running correctly in order for your store to be in a healthy state. The three service types are the Admin, Storage Nodes, and Replication Nodes. You should have multiple instances of these services running throughout your store.
Each service has a status that can be viewed using any of the following:
The Topology tab in the Admin Console
The show topology command in the Administration CLI
The ping command
The status values can be one of the following:
STARTING
The service is coming up.
RUNNING
The service is running normally.
STOPPING
The service is stopping. This may take some time as some services can be involved in time-consuming activities when they are asked to stop.
WAITING_FOR_DEPLOY
The service is waiting for commands or acknowledgments from other services during its startup processing. If it is a Storage Node, it is waiting for the initial deploy-SN command. Other services should transition out of this phase without any administrative intervention from the user.
STOPPED
The service was stopped intentionally and cleanly.
ERROR_RESTARTING
The service is in an error state. Oracle NoSQL Database attempts to restart the service.
ERROR_NO_RESTART
The service is in an error state and is not automatically restarted. Administrative intervention is required.
UNREACHABLE
The service is not reachable by the Admin. If the status was seen using a command issued by the Admin, this state may mask a STOPPED or ERROR state.
A healthy service begins with STARTING. It may transition to WAITING_FOR_DEPLOY for a short period before going on to RUNNING. ERROR_RESTARTING and ERROR_NO_RESTART indicate that there has been a problem that should be investigated. An UNREACHABLE service may only be in that state temporarily, although if that state persists, the service may truly be in an ERROR_RESTARTING or ERROR_NO_RESTART state.
Note that the Admin's Topology tab only shows abnormal service statuses. A service that is RUNNING does not display its status in that tab.
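When checking many services at once, it can help to filter status output down to just the problem states. The excerpt below is a hypothetical, simplified sample of ping-style output (the exact format varies by release); the filtering technique is the point, and you would pipe the real command's output instead:

```shell
# Hypothetical excerpt of status output; the real format varies by release.
ping_output='Storage Node [sn1] on node01:5000  Status: RUNNING
Rep Node [rg1-rn1]  Status: RUNNING
Rep Node [rg1-rn2]  Status: ERROR_RESTARTING'

# Show only services in a state that needs attention.
printf '%s\n' "$ping_output" | grep -E 'ERROR_RESTARTING|ERROR_NO_RESTART|UNREACHABLE'
```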
The following commands may be useful to you when troubleshooting your KVStore:
java -jar KVHOME/lib/kvstore.jar ping -host <host> -port <registryport>
Reports the status of the store running on the specified host and port. This command can be used against any of the host and port pairs used for Storage Nodes.
jps -m
Reports the Java processes running on a machine. If the Oracle NoSQL Database processes are running, they are reported by this command.
In addition you can use the administration console to investigate the state of the KVStore. Point your browser to the administration port chosen on the administration host.