Monitoring the Store

Information about the performance and availability of your store can be obtained from both a server-side and a client-side perspective:

Your Oracle NoSQL Database applications can obtain performance statistics by calling the oracle.kv.KVStore.getStats() method. This provides a client-side view of the complete round-trip performance of Oracle NoSQL Database operations.
The Oracle NoSQL Database administrative service collects and aggregates the status information, alerts, and performance statistics that components in the store generate. This provides a detailed, server-side view of the behavior and performance of the Oracle NoSQL Database server.
Each Oracle NoSQL Database storage node maintains detailed logs of trace information from the services housed on that node. The administrative service presents an aggregated, store-wide view of these component logs, but the logs remain available on each storage node in case the administrative service is unavailable, or if it is more convenient to examine the individual logs.
Oracle NoSQL Database allows Java Management Extensions (JMX) or Simple Network Management Protocol (SNMP) agents to be optionally available for monitoring. The SNMP and JMX interfaces allow you to poll the storage nodes for information about the storage node and about any replication nodes that are hosted on the storage node. See Standardized Monitoring Interfaces for more information.

In addition to the monitoring mechanisms noted above, you can view the current health of the store using the Admin Console. This information appears on the Topology pane, which shows you which services are currently unavailable. Problematic services are highlighted in red. Two lines at the top of the pane summarize the number of available and unavailable services.

Finally, you can monitor the status of the store by verifying it from within the CLI. See Verifying the Store for more information. You can also use the CLI to examine events.

Events

Events are special messages that inform you of the state of your system. As events are generated, they are routed through the monitoring system so that you can see them. There are four types of events that the store reports:

State Change events are issued when a service starts up or shuts down.
Performance events report statistics about the performance of various services.
Log events are records produced by the various system components to provide trace information for debugging. These records are produced by the standard java.util.logging package.
Plan Change events record the progress of plans as they execute, are interrupted, fail or are canceled.
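
Because Log events are ordinary java.util.logging records, the standard Handler mechanism is enough to illustrate how records at a given level can be singled out. The following is a minimal sketch, not the store's own code: the class name SevereCollector and the logger name example.component are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical handler that keeps only SEVERE records in memory.
final class SevereCollector extends Handler {
    private final List<LogRecord> records = new ArrayList<>();

    SevereCollector() {
        // Records below SEVERE fail the isLoggable() check in publish().
        setLevel(Level.SEVERE);
    }

    @Override
    public void publish(LogRecord record) {
        if (isLoggable(record)) {
            records.add(record);
        }
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    List<LogRecord> records() {
        return records;
    }

    public static void main(String[] args) {
        // "example.component" is an assumed logger name.
        Logger logger = Logger.getLogger("example.component");
        logger.setUseParentHandlers(false);
        SevereCollector collector = new SevereCollector();
        logger.addHandler(collector);

        logger.info("service started");  // dropped by the handler
        logger.severe("task failed");    // kept by the handler

        for (LogRecord r : collector.records()) {
            System.out.println(r.getLevel() + " " + r.getMessage());
        }
    }
}
```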

Note that some events are considered critical. These events are recorded in the administration service's database, and can be retrieved and viewed using the CLI or the Admin Console.

Other Events

Plan Change events cannot be directly viewed through Oracle NoSQL Database's administrative interfaces. However, State Change events, Performance events, and Log events are recorded using the EventRecorder facility internal to the Admin. Only events that are considered "critical" are recorded, and the criteria for being designated as such vary with the type of the event. All state change events are considered critical, but only SEVERE log events are. Performance events are considered critical if the reported performance is below a certain threshold.
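
The criteria above can be summarized as a small decision rule. This sketch only restates the rule in code under stated assumptions: CriticalEventRule and its methods are hypothetical names, not part of the EventRecorder facility, and because the source does not say which performance metric or threshold is used, the performance check assumes a generic metric where a higher value is better.

```java
import java.util.logging.Level;

// Hypothetical restatement of the "critical" criteria; not the Admin's actual code.
final class CriticalEventRule {
    private CriticalEventRule() { }

    // All state change events are considered critical.
    static boolean isCriticalStateChange() {
        return true;
    }

    // Only SEVERE log events are considered critical.
    static boolean isCriticalLog(Level level) {
        return Level.SEVERE.equals(level);
    }

    // A performance event is critical when the reported value falls below
    // the threshold. Both the metric and the threshold are assumptions.
    static boolean isCriticalPerformance(double reported, double threshold) {
        return reported < threshold;
    }
}
```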

All such events can be viewed in the CLI using the show events and show event commands.

Use the CLI show events command with no arguments to see all the unexpired events in the database. You can bound the range of events that are displayed using the -from and -to arguments. You can filter events by type or id as well, using either the -type or the -id arguments respectively.

For example, this is a fragment of the output from the show events command:

gt0hgvkiS STAT 09-25-11 16:30:54:162 EDT rg2-rn3 RUNNING sev1
gt0hgvkjS STAT 09-25-11 16:30:41:703 EDT rg1-rn1 RUNNING sev1
gt0hgvkkS STAT 09-25-11 16:30:51:540 EDT rg2-rn2 RUNNING sev1
gt0hicphL LOG  09-25-11 16:32:03:29 EDT SEVERE[admin1] Task StopAdmin 
failed: StopAdmin [INTERRUPTED] start=09-25-11 16:32:03 end=09-25-11 
16:32:03 Plan has been interrupted.: null: java.lang.InterruptedException

This shows three state change events and one severe log event. The tag at the beginning of each record is its individual event record identifier. If you want to see detailed information for a particular event, you can use the show event command, which takes as its argument an event record identifier:

kv-> show event -id gt0hicphL
gt0hicphL LOG  09-25-11 16:32:03:29 EDT SEVERE[admin1] Task StopAdmin 
failed: StopAdmin [INTERRUPTED] start=09-25-11 16:32:03 end=09-25-11 
16:32:03 Plan has been interrupted.: null: java.lang.InterruptedException
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.
doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1024)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.
tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1303)
    ....

The output continues with the complete stack trace.
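
If you capture this output in a script, the first line of each record can be split into its parts. The sketch below assumes the layout seen in the sample output (record identifier, type tag, timestamp with zone, then the remaining text); that layout is inferred from the samples only and is not a documented format. EventLine is a hypothetical helper name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the first line of an event record.
final class EventLine {
    // Groups: identifier, type tag, timestamp as printed (with zone), remaining text.
    private static final Pattern HEADER = Pattern.compile(
        "^(\\S+)\\s+(\\S+)\\s+"
        + "(\\d{2}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}:\\d{1,3} \\S+)"
        + "\\s+(.*)$");

    final String id;
    final String type;
    final String timestamp;
    final String detail;

    private EventLine(String id, String type, String timestamp, String detail) {
        this.id = id;
        this.type = type;
        this.timestamp = timestamp;
        this.detail = detail;
    }

    // Returns empty for lines that do not start a record, such as
    // wrapped message text or stack trace frames.
    static Optional<EventLine> parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(
            new EventLine(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        String sample =
            "gt0hgvkiS STAT 09-25-11 16:30:54:162 EDT rg2-rn3 RUNNING sev1";
        EventLine.parse(sample).ifPresent(e ->
            System.out.println(e.id + " | " + e.type + " | " + e.detail));
    }
}
```

The parsed identifier is the value that show event -id expects.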

Events expire from the system after a set period, which defaults to thirty days.
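
As a rough illustration of the expiry rule, the check below treats an event as expired once its age exceeds the retention period. It is a sketch only: EventRetention is a hypothetical name, the thirty-day constant mirrors the default stated above, and the handling of an event that is exactly thirty days old is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating the expiry rule; not the store's own code.
final class EventRetention {
    // Default retention period stated in the text.
    static final Duration DEFAULT_RETENTION = Duration.ofDays(30);

    private EventRetention() { }

    // True once the event is older than the retention period.
    // An event exactly at the boundary is treated as unexpired (assumption).
    static boolean isExpired(Instant recordedAt, Instant now, Duration retention) {
        return recordedAt.plus(retention).isBefore(now);
    }
}
```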