Changes in 11gR2.2.0.26

The following changes were made in Oracle NoSQL Database 11gR2.2.0.26.

New Features

  1. This release adds the capability to remove an Admin service replica. If you have deployed more than one Admin, you can remove one of them using the following command:

    plan remove-admin -admin <adminId> 

    You cannot remove the sole Admin instance; if only one Admin is configured, the command fails.

    For availability and durability, it is strongly recommended that you maintain at least three Admin instances at all times. Accordingly, if removing an Admin would leave fewer than three, the command fails unless you supply the -force flag.

    If you remove the Admin that is currently the master, mastership transfers to another Admin. The plan is interrupted and can subsequently be re-executed on the new master Admin. To re-execute the interrupted plan, use this command:

    plan execute -id <planId>
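
    For example, in a deployment with three Admins where admin2 is to be retired, the removal and a subsequent re-execution of the interrupted plan might look as follows (both the Admin id and the plan id are illustrative; substitute the values reported by your own store):

    kv-> plan remove-admin -admin admin2 -force
    kv-> plan execute -id 12
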
  2. The Admin CLI verify command now includes a check that the Replication Nodes hosted on a single Storage Node have memory settings that fit within the Storage Node's memory budget. This guards against mistakes that can occur if the system administrator overrides the defaults and manually sets Replication Node heap sizes. [#21727]

  3. The Admin CLI verify command now labels each verification issue as either a violation or a note. Violations are the more serious category; the system administrator should determine how to adjust the system to address them. Notes are warnings of lesser importance. [#21950]

Bug Fixes

  1. Several corrections were made to the latency statistics. These corrections apply to the service-side statistics in the Admin console, the CLI show perf command, .perf files, and .csv files, as well as to the client-side statistics returned by KVStore.getStats. However, the corrections to the 95% and 99% values do not apply to the client-side statistics, since those values are not exposed by the client-side API. [#21763]

    • The definition of latency has been corrected for "multi" operation requests (multiGet, multiDelete, execute, etc.). These are labeled "multi" in the Op Type column wherever latency information is displayed. The previous definition was "latency in milliseconds per operation", while the new definition is "latency in milliseconds per request". In other words, for a "multi" operation request, latency now applies to the entire request rather than to each operation. For "single" operation requests, the definition of latency is unchanged.

    • To accompany the change above, a new column containing the number of requests in the sample, TotalReq, has been added to all latency information displays. The same value is available in the client-side statistics through the new OperationMetrics.getTotalRequests method (see the sketch after this list). For "multi" operation requests, the total number of requests is normally smaller than the total number of operations (the TotalOps column). For "single" operation requests, the numbers of requests and operations are equal.

    • Improved the consistency of the values reported in each sample so that, for example, the minimum latency never exceeds the maximum latency. Note, however, that statistics are collected without synchronization to avoid impacting performance, so for small sample sizes the values in a sample are not always accurate or self-consistent.

    • Fixed a bug that caused the 95% and 99% values to show the maximum latency recorded (within 1000 ms), rather than the lowest 95% or 99% as intended. This bug only applied to the "multi" operation requests.

    • Fixed a bug that caused the 95% and 99% values to sometimes mistakenly appear as -1. These values should only appear as -1 when there were no operations in the sample with a latency below 1000 ms.
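
    The minimal sketch below shows how a client might read these statistics, including the new per-request count. The store name "mystore" and helper address "node01:5000" are placeholder values; the calls themselves are part of the public oracle.kv API.

    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.stats.KVStats;
    import oracle.kv.stats.OperationMetrics;

    public class LatencyStatsExample {
        public static void main(String[] args) {
            /* "mystore" and "node01:5000" are placeholders. */
            KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("mystore", "node01:5000"));

            /* ... issue some single and multi operation requests ... */

            /* Snapshot the client-side statistics; passing true clears
               the counters so the next snapshot starts a fresh sample. */
            KVStats stats = store.getStats(true);
            for (OperationMetrics m : stats.getOpMetrics()) {
                System.out.println(m.getOperationName() +
                    ": requests=" + m.getTotalRequests() + // new in this release
                    " ops=" + m.getTotalOps() +
                    " avgLatencyMs=" + m.getAverageLatencyMs());
            }
            store.close();
        }
    }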

  2. Modified the Administration Process to allocate ports from within a port range when one is specified via the -servicerange argument to the makebootconfig utility. If the argument is not specified, the Administration Process uses any available port. See the Configuring the Firewall section for details on the ports used by Oracle NoSQL Database. [#21962]
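
    For example, the following invocation (the root directory, host name, and all port values are illustrative) confines the administrative services to ports 5030 through 5040:

    java -jar lib/kvstore.jar makebootconfig -root KVROOT \
        -host node01 -port 5000 -admin 5001 \
        -harange 5010,5020 -servicerange 5030,5040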

  3. Modified the Replication Node to handle the unlikely case where the locally stored topology is missing. Previously, a missing topology caused a java.lang.NullPointerException to be thrown in the TopologyManager, preventing the Replication Node from starting. [#22015]

  4. Replication Node memory calculations are now more robust for Storage Nodes that host multiple Replication Nodes. In previous releases, using the plan change-params command to reduce the capacity parameter of a Storage Node hosting multiple Replication Nodes could result in an over-aggressive increase in RN heap size, causing the Replication Nodes to fail at startup. The problem was corrected when the topology was rebalanced, but until then the Replication Nodes were unavailable. The default memory sizing calculation now factors in the number of RNs resident on a Storage Node, and adjusts RN heap sizes as Replication Nodes are relocated by the deploy-topology command. [#21942]
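
    As an illustration, the capacity of a hypothetical Storage Node sn1 might be reduced with a command along the following lines; the exact flags vary by release, so consult the Administrator's Guide for the authoritative syntax.

    kv-> plan change-params -service sn1 -params capacity=2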

  5. Fixed a bug that could cause a NullPointerException, such as the one below, during RN start-up. The exception appeared in the RN log, and the RN failed to start. The conditions under which this problem occurred include partition migration between shards combined with multiple abnormal RN shutdowns. If this bug is encountered, it can be corrected by upgrading to the current release; no data loss will occur. [#22052]

    Exception in thread "main" com.sleepycat.je.EnvironmentFailureException: (JE
    5.0.XX) ...  last LSN=.../... LOG_INTEGRITY: Log information is incorrect,
    problem is likely persistent. Environment is invalid and must be closed.
        at com.sleepycat.je.recovery.RecoveryManager.traceAndThrowException(RecoveryManager.java:2793)
        at com.sleepycat.je.recovery.RecoveryManager.undoLNs(RecoveryManager.java:1097)
        at com.sleepycat.je.recovery.RecoveryManager.buildTree(RecoveryManager.java:587)
        at com.sleepycat.je.recovery.RecoveryManager.recover(RecoveryManager.java:198)
        at com.sleepycat.je.dbi.EnvironmentImpl.finishInit(EnvironmentImpl.java:610)
        at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:208)
        at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:246)
        at com.sleepycat.je.Environment.<init>(Environment.java:227)
        at com.sleepycat.je.Environment.<init>(Environment.java:170)
        ...
    Caused by: java.lang.NullPointerException
        at com.sleepycat.je.log.entry.LNLogEntry.postFetchInit(LNLogEntry.java:406)
        at com.sleepycat.je.txn.TxnChain.<init>(TxnChain.java:133)
        at com.sleepycat.je.txn.TxnChain.<init>(TxnChain.java:84)
        at com.sleepycat.je.recovery.RollbackTracker$RollbackPeriod.getChain(RollbackTracker.java:1004)
        at com.sleepycat.je.recovery.RollbackTracker$Scanner.rollback(RollbackTracker.java:477)
        at com.sleepycat.je.recovery.RecoveryManager.undoLNs(RecoveryManager.java:1026)
        ... 10 more
  6. Fixed a bug that caused excess memory to be used in the storage engine cache on an RN, which could result in poor performance due to cache eviction and additional I/O. The problem occurred only when the KVStore.storeIterator or KVStore.storeKeysIterator method was used. [#21973]
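
    For reference, a typical full-store scan that exercises the affected code path looks like the sketch below; the store name and helper address are placeholders.

    import java.util.Iterator;

    import oracle.kv.Direction;
    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.KeyValueVersion;

    public class StoreScanExample {
        public static void main(String[] args) {
            /* "mystore" and "node01:5000" are placeholders. */
            KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("mystore", "node01:5000"));

            /* storeIterator requires Direction.UNORDERED; the batch
               size (100 here) bounds how many records are fetched per
               network round trip. */
            Iterator<KeyValueVersion> iter =
                store.storeIterator(Direction.UNORDERED, 100);

            long count = 0;
            while (iter.hasNext()) {
                KeyValueVersion kvv = iter.next();
                count++;    /* examine kvv.getKey()/getValue() here */
            }
            System.out.println("Scanned " + count + " records");
            store.close();
        }
    }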

Performance and Other General Changes

  1. The replicas in a shard now dynamically configure the JE property RepParams.REPLAY_MAX_OPEN_DB_HANDLES, which controls the size of the cache used to hold database handles during replication. The cache size is based on the number of partitions currently hosted by the shard. This improved cache sizing can result in better write performance for shards hosting large numbers of partitions. [#21967]

  2. The names of the client and server JAR files no longer include release version numbers. The files are now called:

    lib/kvstore.jar
    lib/kvclient.jar

    This change should reduce the amount of work needed to switch to a new release because the names of JAR files will no longer change between releases. Note that the name of the installation directory continues to include the release version number. [#22034]
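
    For example, a client application can now be compiled and run against a path that is stable across releases; KVHOME and HelloClient below are placeholders for an installation directory and an application class.

    javac -cp KVHOME/lib/kvclient.jar HelloClient.java
    java -cp KVHOME/lib/kvclient.jar:. HelloClient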

  3. A SEVERE level message is now logged and an admin alert is fired when the storage engine's average log cleaner (disk reclamation) backlog increases over time. An example of the message text is below.

    121215 13:48:57:480 SEVERE [...] Average cleaner backlog has grown from 0.0 to
    6.4. If the cleaner continues to be unable to make progress, the JE cache size
    and/or number of cleaner threads are probably too small. If this is not
    corrected, eventually all available disk space will be used.

    For more information on setting the cache size appropriately to avoid such problems, see "Determining the Per-Node Cache Size" in the Administrator's Guide. [#21111]

  4. The storage engine's log cleaner will now delete files in the latter portion of the log, even when the application is not performing any write operations. Previously, files were prohibited from being deleted in the portion of the log after the last application write. When a log cleaner backlog was present (for example, when the cache had been configured too small, relative to the data set size and write rate), this could cause the cleaner to operate continuously without being able to delete files or make forward progress. [#21069]

  5. NoSQL DB 2.0.23 introduced a performance regression over R1.2.23: the kvstore client library and Replication Node consumed a greater percentage of system CPU time. This regression has been fixed. [#22096]