Monitoring the Store


Information about the performance and availability of your store can be obtained from both a server-side and a client-side perspective:

Your Oracle NoSQL Database applications can obtain performance statistics using the oracle.kv.KVStore.getStats() method. This provides a client-side view of the complete round-trip performance for Oracle NoSQL Database operations.
Oracle NoSQL Database automatically captures Replication Node performance statistics into a log file that can easily be imported and analyzed with spreadsheet software. Statistics are tracked, logged, and written at a user-specified interval to a CSV file (je.stat.csv) in the Environment directory. The logging occurs per Environment when the Environment is opened in read/write mode.

Configuration parameters control the size and number of rotating log files used (similar to Java logging; see java.util.logging.FileHandler). For a rotating set of files, as each file reaches a given size limit, it is closed, rotated out, and a new file is opened. Successively older files are named by adding "0", "1", "2", and so on into the file name. The format is je.stat[version number].csv.
The Oracle NoSQL Database administrative service collects and aggregates the status information, alerts, and performance statistics that components in the store generate. This provides a detailed, server-side view of the behavior and performance of the Oracle NoSQL Database server.
Each Oracle NoSQL Database Storage Node maintains detailed logs of trace information from the services that are housed on that node. The administrative service presents an aggregated, store-wide view of these component logs, but the logs are nevertheless available on each Storage Node in the event that the administrative service is somehow not available, or if it is more convenient to examine the individual logs.
Oracle NoSQL Database can optionally make Java Management Extensions (JMX) or Simple Network Management Protocol (SNMP) agents available for monitoring. The SNMP and JMX interfaces allow you to poll a Storage Node for information about the Storage Node itself and about any Replication Nodes that it hosts. See Standardized Monitoring Interfaces for more information.
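
The file rotation described above is compared to java.util.logging.FileHandler. As a minimal sketch of that general behavior, and not of the store's own statistics logger, the following Java example writes records through a FileHandler that has a size limit and a file count, then lists the files left behind. The class name RotationSketch, the demo%g.log pattern, and the limit values are assumptions chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;
import java.util.stream.Stream;

class RotationSketch {

    // Writes `records` log records through a rotating FileHandler and
    // returns the sorted names of the files found in `dir` afterwards.
    static List<String> writeAndListFiles(Path dir, int limitBytes,
                                          int fileCount, int records)
            throws IOException {
        // %g is replaced by the generation number: 0 is the newest file,
        // higher numbers are successively older files.
        String pattern = dir.resolve("demo%g.log").toString();
        FileHandler handler = new FileHandler(pattern, limitBytes, fileCount);
        handler.setFormatter(new SimpleFormatter());
        try {
            for (int i = 0; i < records; i++) {
                handler.publish(
                    new LogRecord(Level.INFO, "sample record " + i));
            }
        } finally {
            // Closing the handler also removes its lock file.
            handler.close();
        }

        List<String> names = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(p -> names.add(p.getFileName().toString()));
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rotation-sketch");
        // Hypothetical settings: 200-byte limit, at most 3 files.
        System.out.println(writeAndListFiles(dir, 200, 3, 50));
    }
}
```

Once enough records have been written to trigger several rotations, the directory should hold no more than the configured number of files, with the oldest content discarded.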

In addition to the logging mechanisms noted above, you can also view the current health of the store using the Admin Console. This information is viewable on the Topology pane. It shows you what services are currently unavailable. Problematic services are highlighted in red. Two lines at the top of the pane summarize the number of available and unavailable services.

Finally, you can monitor the status of the store by verifying it from within the CLI. See Verifying the Store for more information. You can also use the CLI to examine events.

Events

Events are special messages that inform you of the state of your system. As events are generated, they are routed through the monitoring system so that you can see them. There are four types of events that the store reports:

State Change events are issued when a service starts up or shuts down.
Performance events report statistics about the performance of various services.
Log events are records produced by the various system components to provide trace information for debugging. These records are produced by the standard java.util.logging package.
Plan Change events record the progress of plans as they execute, are interrupted, fail or are canceled.

Note that some events are considered critical. These events are recorded in the administration service's database, and can be retrieved and viewed using the CLI or the Admin Console.

Other Events

Plan Change events cannot be directly viewed through Oracle NoSQL Database's administrative interfaces. However, State Change events, Performance events, and Log events are recorded using the EventRecorder facility internal to the Admin. Only events that are considered "critical" are recorded, and the criteria for that designation vary with the type of the event. All State Change events are considered critical, but Log events are critical only at the SEVERE level. Performance events are considered critical if the reported performance is below a certain threshold.
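
The criteria above can be restated as a small decision function. This is only an illustrative sketch of the rules as described here, not the EventRecorder implementation. The class CriticalEventSketch, the Kind enum, and the threshold parameter are hypothetical names, and the sketch assumes a performance metric for which higher values are better.

```java
import java.util.logging.Level;

class CriticalEventSketch {

    // Hypothetical labels for the three recordable event types.
    enum Kind { STATE_CHANGE, LOG, PERFORMANCE }

    // Returns true if an event would count as critical under the rules
    // described in the text. `logLevel` is consulted only for LOG events;
    // `observed` and `threshold` only for PERFORMANCE events.
    static boolean isCritical(Kind kind, Level logLevel,
                              double observed, double threshold) {
        switch (kind) {
            case STATE_CHANGE:
                // Every state change is critical.
                return true;
            case LOG:
                // Only SEVERE log records are critical.
                return Level.SEVERE.equals(logLevel);
            case PERFORMANCE:
                // Critical when performance falls below the threshold
                // (assumed: higher values mean better performance).
                return observed < threshold;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isCritical(Kind.LOG, Level.WARNING, 0, 0));
        System.out.println(isCritical(Kind.LOG, Level.SEVERE, 0, 0));
    }
}
```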

All such events can be viewed in the CLI using the show events and show event commands.

Use the CLI show events command with no arguments to see all the unexpired events in the database. You can bound the range of events that are displayed using the -from and -to arguments. You can filter events by type or id as well, using either the -type or the -id arguments respectively.

For example, this is a fragment of the output from the show events command:

gt0hgvkiS STAT 2011-09-25 16:30:54.162 UTC rg2-rn3 RUNNING sev1
gt0hgvkjS STAT 2011-09-25 16:30:41.703 UTC rg1-rn1 RUNNING sev1
gt0hgvkkS STAT 2011-09-25 16:30:51.540 UTC rg2-rn2 RUNNING sev1
gt0hicphL LOG  2011-09-25 16:32:03.029 UTC SEVERE[admin1] Task StopAdmin 
failed: StopAdmin [INTERRUPTED] start=09-25-11 16:32:03 end=09-25-11 
16:32:03 Plan has been interrupted.: null: java.lang.InterruptedException

This shows three state change events and one severe log event. The tags at the beginning of each line are individual event record identifiers. If you want to see detailed information for a particular event, you can use the "show event" command, which takes as its argument an event record identifier:

kv-> show event -id gt0hicphL
gt0hicphL LOG  2011-09-25 16:32:03.029 UTC SEVERE[admin1] Task StopAdmin 
failed: StopAdmin [INTERRUPTED] start=09-25-11 16:32:03 end=09-25-11 
16:32:03 Plan has been interrupted.: null: java.lang.InterruptedException
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.
doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1024)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.
tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1303)
    ....

and so on, for a complete stack trace.
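
If you capture the output of show events for further processing, the first line of each record can be split into its fields. The following Java sketch assumes the layout seen in the fragment above: an event record identifier, a type tag, a date, a time, the literal UTC, and then the rest of the line. The class EventLineSketch and its field names are hypothetical, and wrapped continuation lines or stack trace lines are not handled.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class EventLineSketch {

    // Timestamp layout as it appears in the sample output.
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    final String id;      // event record identifier, e.g. gt0hicphL
    final String type;    // type tag, e.g. STAT or LOG
    final Instant time;   // event time, interpreted as UTC
    final String detail;  // remainder of the line, left unparsed

    EventLineSketch(String id, String type, Instant time, String detail) {
        this.id = id;
        this.type = type;
        this.time = time;
        this.detail = detail;
    }

    // Parses the first line of an event record. Throws
    // IllegalArgumentException if the line does not match the assumed
    // layout (for example, a continuation or stack trace line).
    static EventLineSketch parse(String line) {
        // Split on runs of whitespace; keep everything after "UTC" whole.
        String[] parts = line.trim().split("\\s+", 6);
        if (parts.length < 6 || !"UTC".equals(parts[4])) {
            throw new IllegalArgumentException(
                "Not an event summary line: " + line);
        }
        LocalDateTime stamp =
            LocalDateTime.parse(parts[2] + " " + parts[3], STAMP);
        return new EventLineSketch(
            parts[0], parts[1], stamp.toInstant(ZoneOffset.UTC), parts[5]);
    }

    public static void main(String[] args) {
        EventLineSketch e = parse(
            "gt0hgvkiS STAT 2011-09-25 16:30:54.162 UTC rg2-rn3 RUNNING sev1");
        System.out.println(e.id + " | " + e.type + " | " + e.time
            + " | " + e.detail);
    }
}
```

The identifier extracted this way is the value that show event -id expects.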

Events expire from the system after a set period, which defaults to thirty days.