Troubleshooting
jps -m command. Some examples of them are:
- kvstore.jar start -root $KVROOT (SNA process)
- ManagedService
If you kill the SNA process it should also kill its managed processes.
There are detailed log files available in $KVROOT/storename/log as well as logs of the bootstrap process in $KVROOT/*.log. The bootstrap logs are most useful in diagnosing initial startup problems. The logs in storename/log appear once the data store has been configured.
The logs on the Storage Node chosen for the admin process are the most detailed and
include a store-wide consolidated log file:
$KVROOT/storename/log/storename_*.log
Each line in the log file is prefixed with the date of the message, its severity, and the name of the component which issued it. For example:
2023-05-24 14:28:26.982 UTC INFO [admin1] Initializing Admin for store: mystore
When looking for more context for events at a given time, use the timestamp and component name to narrow down the section of log to peruse.
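As a minimal sketch of that narrowing, the shell function below prints only the lines issued by one component inside a time window. It assumes every log line follows the layout of the example above (date, time, time zone, severity, bracketed component name, message). The name log_window and its arguments are hypothetical helpers for illustration, not part of Oracle NoSQL Database.

```shell
# log_window COMPONENT START END [FILE...]
# Hypothetical helper: prints log lines issued by COMPONENT whose timestamp
# falls between START and END (both "YYYY-MM-DD HH:MM:SS", compared as text).
# Assumes the line layout: DATE TIME ZONE SEVERITY [COMPONENT] MESSAGE
log_window() {
    lw_component=$1
    lw_start=$2
    lw_end=$3
    shift 3
    awk -v comp="[$lw_component]" -v start="$lw_start" -v end="$lw_end" '
        {
            # Rebuild "DATE TIME" so it can be compared with START and END.
            ts = $1 " " $2
            if ($5 == comp && ts >= start && ts <= end) print
        }
    ' "$@"
}
```

For example, log_window admin1 '2023-05-24 14:28:00' '2023-05-24 14:29:00' $KVROOT/storename/log/storename_*.log would show what admin1 logged during that minute. Continuation lines that do not start with a timestamp, such as stack traces, are not matched by this sketch.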
Error messages in the logs show up with SEVERE in them so you can grep for that if you are troubleshooting. SEVERE error messages are also displayed in the CLI's show events command, and when you use the ping command.
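A plain grep for SEVERE is enough for a quick look. As a rough sketch that goes one step further, the function below counts SEVERE lines per component so the noisiest service stands out. It assumes the severity is the fourth whitespace-separated field and the bracketed component name the fifth, as in the example log line shown earlier. The name severe_by_component is a hypothetical helper.

```shell
# severe_by_component [FILE...]
# Hypothetical helper: counts SEVERE lines per component, highest count first.
# Reads standard input when no files are given.
# Assumes the line layout: DATE TIME ZONE SEVERITY [COMPONENT] MESSAGE
severe_by_component() {
    awk '
        $4 == "SEVERE" { count[$5]++ }
        END { for (c in count) print count[c], c }
    ' "$@" | sort -k1,1nr -k2,2
}
```

A typical invocation would be severe_by_component $KVROOT/storename/log/storename_*.log, which prints one line per component with its SEVERE count.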
In addition to log files, the log directories may also contain *.perf files, which are performance files for the replication nodes.
In general, verify configuration is the tool of choice for understanding the state of the data store. In addition to contacting the data store components, it cross-checks each component's parameters against the Admin database. For example, verify configuration might report that a replication node's helperHosts parameter is at odds with the Admin. If so, that might explain why a replication node cannot come up. The verify configuration tool also checks on Admins and verifies the configuration of Arbiter Nodes in the topology.
To catch configuration errors early, you can also use the diagnostics tool when troubleshooting your data store. The same tool can package important information and files so that you can send them to Oracle Support. For more information, see Diagnostics Utility.
Where to Find Error Information
As your data store operates, you can discover information about any problems that may be occurring by looking at the plan history and by looking at error logs.
The plan history indicates if any configuration or operational actions you attempted to take against the store encountered problems. This information is available as the plan executes and finishes. Errors are reported in the plan history each time an attempt to run the plan fails. The plan history can be seen using the CLI show plan command.
Other problems may occur asynchronously. You can learn about unexpected failures, service downtime, and performance issues through the CLI's show events command. Events come with a time stamp, and the description may contain enough information to diagnose the issue. In other cases, more context may be needed, and the administrator may want to see what else happened around that time.
The store-wide log consolidates logging output from all services. Browsing this file might give you a more complete view of activity during the problem period. It can be viewed using the CLI's logtail command, or by directly viewing the <storename>_N.log file in the $KVROOT/<storename>/log directory.
Service States
Oracle NoSQL Database uses four different types of services, all of which should be running correctly in order for your store to be in a healthy state. The four service types are the Admin, Storage Nodes, Replication Nodes, and Arbiter Nodes. You should have multiple instances of these services running throughout your store.
Each service has a status that can be viewed using any of the following:
- The show topology command in the Administration CLI.
- Using the ping command.
The status values can be one of the following:
Name | Description |
---|---|
ERROR_NO_RESTART | The service is in an error state and is not automatically restarted. Administrative intervention is required. |
ERROR_RESTARTING | The service is in an error state. Oracle NoSQL Database attempts to restart the service. |
RUNNING | The service is running normally. |
STARTING | The service is coming up. |
STOPPED | The service was stopped intentionally and cleanly. |
STOPPING | The service is stopping. This may take some time as some services can be involved in time-consuming activities when they are asked to stop. |
SUCCEEDED | The plan has completed successfully. |
UNREACHABLE | The service is not reachable by the Admin. If the status was seen using a command issued by the Admin, this state may mask a STOPPED or ERROR state. If an SN is UNREACHABLE, or an RN is having problems and its SN is UNREACHABLE, the first thing to check is the network connectivity between the Admin and the SN. However, if the managing SNA is reachable and the managed Replication Node is not, we can guess that the network is OK and the problem lies elsewhere. |
WAITING_FOR_DEPLOY | The service is waiting for commands or acknowledgments from other services during its startup processing. If it is a Storage Node, it is waiting for the initial deploy-SN command. Other services should transition out of this phase without any administrative intervention from the user. |
A healthy service begins with STARTING. It may transition to WAITING_FOR_DEPLOY for a short period before going on to RUNNING.
ERROR_RESTARTING and ERROR_NO_RESTART indicate that there has been a problem that should be investigated. An UNREACHABLE service may only be in that state temporarily, although if that state persists, the service may be truly in an ERROR_RESTARTING or ERROR_NO_RESTART state.
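The status table and the transitions described above can be condensed into a small triage lookup. This is only a sketch: the function name status_hint and the category words it prints are hypothetical, chosen for illustration. The grouping is a simplification of the descriptions in the table. For instance, a Storage Node in WAITING_FOR_DEPLOY still needs its initial deploy-SN command.

```shell
# status_hint STATUS
# Hypothetical helper: maps a status value from the table to a rough
# triage category. The category names are illustrative only.
status_hint() {
    case $1 in
        RUNNING)                              echo healthy ;;
        STARTING|WAITING_FOR_DEPLOY|STOPPING) echo transitional ;;
        STOPPED)                              echo stopped ;;
        ERROR_RESTARTING|ERROR_NO_RESTART)    echo investigate ;;
        UNREACHABLE)                          echo check-network ;;
        SUCCEEDED)                            echo plan-completed ;;  # describes a plan, per the table
        *)                                    echo unknown ;;
    esac
}
```

For example, status_hint UNREACHABLE prints check-network, reflecting the advice to check connectivity between the Admin and the SN first.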
Useful Commands
The following commands may be useful to you when troubleshooting your KVStore.
- java -Xmx64m -Xms64m \
  -jar kvstore.tmp/kvstore.jar ping -host node01 -port 5000 \
  -security USER/security/admin.security
  Reports the status of the store running on the specified host and port. This command can be used against any of the host and port pairs used for Storage Nodes.
  Note: This assumes that you have completed the steps in Create users and configure security with remote access.
- jps -m
  Reports the Java processes running on a machine. If the Oracle NoSQL Database processes are running, they are reported by this command.
- ps -eaf | grep kv
  You can view the list of kvstore processes that are running.
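To pick the Oracle NoSQL Database processes out of jps -m output, a small filter can label them using the two patterns listed at the start of this section (kvstore.jar start for the SNA, ManagedService for its managed processes). The sketch below assumes each jps -m line has the form of a process ID, then the main class or JAR name, then the arguments. The name kv_processes is a hypothetical helper.

```shell
# kv_processes
# Hypothetical helper: reads `jps -m` output on standard input and labels
# the SNA process and its managed processes.
# Assumes each line has the form: PID MAIN-CLASS-OR-JAR ARGUMENTS...
kv_processes() {
    awk '
        $2 == "kvstore.jar" && $3 == "start" { print $1, "SNA" }
        $2 == "ManagedService"               { print $1, "managed" }
    '
}
```

It would typically be used as jps -m | kv_processes, printing one process ID and label per line and ignoring unrelated Java processes.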