Known Issues

Topology Changes May Fail During Software Upgrades

Topology changes that involve partition migration may fail if they are made while the store is being upgraded to a new software version. If a plan to deploy a new topology fails during partition migration, check whether the store's nodes are running different software versions, and upgrade any nodes running older versions before retrying the plan.

Modifying a topology with one of the following topology commands can require partition migration. Deploying the resulting topology with the 'plan deploy-topology' command can then fail if the plan runs during a store software version upgrade. The topology commands that can produce partition migrations are:

  • topology change-repfactor
  • topology contract
  • topology rebalance
  • topology redistribute

Other topology commands do not produce partition migration and do not cause this problem.

If a topology deployment fails, you can tell if it is related to partition migrations during a software version upgrade by looking for errors like the following:

Plan 24 ended with errors. Use "show plan -id 24" for more information
Plan Deploy Topo
Id:                    24
State:                 ERROR
Attempt number:        1
Started:               2020-04-10 15:19:59 UTC
Ended:                 2020-04-10 15:24:48 UTC
Plan failures:
	Failure 1: 17/MigratePartition PARTITION-2 from rg1 to rg2
	failed. target=rg2-rn1 state=ERROR java.lang.Exception:
	Migration of PARTITION-2 failed. Giving up after 10 attempt(s)

If you see a plan failure involving partition migrations like this, particularly if there are similar failures for all partition migration tasks, use the 'ping' or 'verify topology' commands to display information about the store and check whether storage nodes are running different major or minor software versions. If so, upgrade the nodes running the older software to the latest version before retrying the 'plan deploy-topology' command.
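The version check described above can be scripted. The following Java sketch scans saved 'ping' output and reports whether the Storage Nodes run different major.minor versions. It assumes the 'Storage Node [snX]' and 'Ver: x.y.z' line layout shown in the sample ping output later in these notes; the class and method names are hypothetical and are not part of the product.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch (hypothetical helper): extracts the major.minor software version of
 * each Storage Node from Admin CLI "ping" output, assuming the line layout of
 * the sample ping output in these notes.
 */
final class PingVersionCheck {
    // Matches lines such as "Storage Node [sn1] on localhost:10000".
    private static final Pattern SN_LINE = Pattern.compile("^\\s*Storage Node \\[(\\w+)\\]");
    // Matches "Ver: 18.3.0" (but not "Version:"), capturing major and minor.
    private static final Pattern VER_FIELD = Pattern.compile("\\bVer: (\\d+)\\.(\\d+)");

    /** Returns a map from Storage Node id to its "major.minor" version. */
    static Map<String, String> versionsBySn(List<String> pingLines) {
        Map<String, String> versions = new LinkedHashMap<>();
        String currentSn = null;
        for (String line : pingLines) {
            Matcher sn = SN_LINE.matcher(line);
            if (sn.find()) {
                currentSn = sn.group(1);   // following lines belong to this SN
                continue;
            }
            Matcher ver = VER_FIELD.matcher(line);
            if (currentSn != null && ver.find()) {
                versions.put(currentSn, ver.group(1) + "." + ver.group(2));
            }
        }
        return versions;
    }

    /** True if the Storage Nodes report more than one major.minor version. */
    static boolean hasMixedVersions(List<String> pingLines) {
        return versionsBySn(pingLines).values().stream().distinct().count() > 1;
    }
}
```

Patch-level differences (for example 18.3.0 and 18.3.5) are deliberately ignored, since the notes single out major and minor versions.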

Enterprise Manager Plug-in Not Compatible with EM 13.4.0.0 and Later

The Oracle NoSQL Database Enterprise Manager (EM) plug-in is compatible with EM versions up to and including 13.3.0.0. Because of architectural changes in EM's plug-in support, the plug-in is not compatible with EM 13.4.0.0 and later versions.

[KVSTORE-141]

Limitations on Multi-Region Tables in This Release

The Multi-Region Tables feature in this release has the following limitations:

  • Specifying a non-zero TTL when inserting or updating a row in a multi-region table is only supported after upgrading the driver, and may fail until the local store has been completely upgraded. In addition, TTL expiration times will be lost when rows are replicated to a remote region if the multi-region agent or store for that region has not been upgraded. [#28165]
  • Only one service agent is supported for each remote region. [#28166]
  • Elasticity operations must not be performed on stores that contain multi-region tables. [#28164]
  • If the multi-region agent is unable to replicate data from a remote region for a long period of time, whether because of a network failure, a store failure, or an agent failure for some other reason, table entries deleted in that remote region during the failure period may not be deleted in the local region. [#28136]
  • The import, export, and snapshot commands should not be used to restore multi-region tables. These commands do not currently account for region information or modification times, so using them to restore a multi-region table to the contents from an earlier time may produce inconsistent results. [KVSTORE-444]
  • Multi-region tables created using the 19.5 release should not be used in this release or later releases. Conflicts for data in tables created in the 19.5 release may not be resolved correctly in later releases, so a multi-region table may not contain the latest update made to the corresponding remote table. Note also that upgrading multi-region tables to later releases has received only limited testing, so there may be other issues that have not been detected.
  • Only one XRegion Service agent is allowed for a store. [KVSTORE-984]
  • Initializing a multi-region table may lose deletions made in a remote region. [KVSTORE-986]

We expect all of these limitations to be removed in future releases.

Updating Java Memory Settings after Release 18.1 Workaround

Starting with release 18.3, the Java heap overhead is explicitly accounted for by a new Storage Node parameter, jvmOverheadPercent, with a default value of 25%. If you are running a store using a version earlier than 18.3, and the store was configured with the workarounds suggested in the "Memory Allocation Algorithm Fails to Account for Java Memory Overhead Can Produce OutOfMemoryErrors" section of the 18.1 release notes, then you should make the following changes during the upgrade to release 18.3 or later. The changes to make depend on whether you followed the first or second set of workarounds, which in turn depended on whether your configuration has more than 48 GiB of memory per RN.

If you used the first set of instructions in the release notes because your configuration had no more than 48 GiB of memory per RN, run the following Admin CLI commands, performing steps 1 and 2 immediately before upgrading the store to release 18.3 or later:
  1. change-policy -params rnHeapPercent=68
  2. For each storage node, replacing snX as appropriate:

    plan change-parameters -service snX -wait -params rnHeapPercent=68
  3. After the upgrade, run the following Admin CLI command for each storage node, replacing snX as appropriate:

    plan change-parameters -service snX -wait -params memoryMB=0

You are done.

If you used the second set of instructions in the release notes because your configuration had more than 48 GiB of memory per RN, run the following Admin CLI commands after the upgrade:
  1. change-policy -params systemPercent=10
  2. For each storage node, replacing snX as appropriate:

    plan change-parameters -service snX -wait -params systemPercent=10 memoryMB=0

You are done.
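Because the two sets of commands differ only in which parameters change and when, they can be generated from a list of Storage Node ids. The following Java sketch is illustrative only: the class is hypothetical, it takes the memory per RN as the deciding input on the assumption that the workaround set you followed matches the 48 GiB rule above, and it only builds the command strings. You still run them in the Admin CLI at the points in the upgrade stated above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (hypothetical helper): builds the Admin CLI commands listed in these
 * notes for undoing the 18.1 memory workaround. Assumes the workaround set
 * that was followed matches the 48 GiB-per-RN rule.
 */
final class MemoryWorkaroundPlan {
    static final long THRESHOLD_GIB = 48;

    final List<String> beforeUpgrade = new ArrayList<>();
    final List<String> afterUpgrade = new ArrayList<>();

    MemoryWorkaroundPlan(long memoryPerRnGiB, List<String> storageNodes) {
        if (memoryPerRnGiB <= THRESHOLD_GIB) {
            // First set (no more than 48 GiB per RN): set rnHeapPercent before the upgrade...
            beforeUpgrade.add("change-policy -params rnHeapPercent=68");
            for (String sn : storageNodes) {
                beforeUpgrade.add("plan change-parameters -service " + sn
                        + " -wait -params rnHeapPercent=68");
            }
            // ...then reset memoryMB on every Storage Node after the upgrade.
            for (String sn : storageNodes) {
                afterUpgrade.add("plan change-parameters -service " + sn
                        + " -wait -params memoryMB=0");
            }
        } else {
            // Second set (more than 48 GiB per RN): all changes happen after the upgrade.
            afterUpgrade.add("change-policy -params systemPercent=10");
            for (String sn : storageNodes) {
                afterUpgrade.add("plan change-parameters -service " + sn
                        + " -wait -params systemPercent=10 memoryMB=0");
            }
        }
    }
}
```

For example, new MemoryWorkaroundPlan(32, List.of("sn1", "sn2")) yields three commands to run before the upgrade and two to run after it.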

Note that making changes to multiple Storage Nodes to update Java memory settings may result in warnings in the debug logs regarding mismatched cache sizes such as:

2019-11-14 15:26:40.762 UTC WARNING - [rg1-rn3] JE: Mismatched cache sizes, feeder:516738252 replica: 375809638 feeder off-heap: 0 replica off-heap: 0

Once the changes are completed for all Storage Nodes, these warnings should stop, and the warnings reported while the changes were in progress should be harmless.

[#27855]

Out-of-Order Processing During Streams API and Partition Migration

When an application uses the Streams API with a subscription that has multiple subscribers, and an elasticity operation is performed that involves a partition migration, the application may need to coordinate operations across subscribers. An elasticity change can cause the events being delivered for a given key to switch to a different subscriber. The Streams API delivers events in the proper order to the two subscribers, but it is up to the application to make sure that the subscribers perform actions for those events in the correct order. We hope to remove the need for this coordination in a future release.

[#27541]
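One possible way to coordinate subscribers, sketched below in Java, is to share a per-key guard that applies an event only if it is newer than the last event applied for that key. This assumes each event exposes its key and a per-key, monotonically increasing sequence number (the names here are hypothetical, not Streams API methods), and that a newer event supersedes an older one, so stale events can be dropped rather than reordered. An application that must apply every intermediate change would need buffering instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Sketch (hypothetical helper): a guard shared by all subscribers of one
 * subscription. Assumes every event carries its key and a sequence number
 * that increases with each change to that key.
 */
final class PerKeyOrderingGuard {
    private final ConcurrentHashMap<String, Long> lastApplied = new ConcurrentHashMap<>();

    /**
     * Runs action if seq is newer than the last sequence applied for key.
     * Returns true if the action ran, false if the event was stale or a duplicate.
     */
    boolean applyIfNewer(String key, long seq, Consumer<String> action) {
        boolean[] applied = {false};
        // ConcurrentHashMap.compute runs atomically per key, so two subscribers
        // cannot apply events for the same key at the same time.
        lastApplied.compute(key, (k, prev) -> {
            if (prev != null && prev >= seq) {
                return prev;          // stale or duplicate event: keep the newer value
            }
            action.accept(k);         // keep this short; it runs while the entry is locked
            applied[0] = true;
            return seq;
        });
        return applied[0];
    }
}
```

If a partition migration moves a key to a second subscriber while the first is still processing an older event for it, whichever event is newer wins, and the late, older event is skipped.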

Hive Used with Oracle Big Data SQL is Incompatible with Java 9, 10 and 11

Oracle NoSQL Database supports Oracle Big Data SQL using Apache Hive (TM) 1.2.1 and Hadoop-2.3.0-cdh5.1.0. The following warnings and exception are generated when you use Java 9, 10, or 11 to start Hive:

Logging initialized using configuration in file:/scratch/kmtest/release/hadoop/hive/conf/hive-log4j.properties
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/scratch/kmtest/release/hadoop/hadoop-2.6.0-cdh5.4.8/share/hadoop/common/lib/hadoop-auth-2.6.0-cdh5.4.8.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    ...
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)

The incompatibility warnings are produced because this version of Hive is incompatible with the module system introduced in Java 9. Use Java 8 to start Hive for Big Data SQL queries with Oracle NoSQL Database 19.1.

[#27565]
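A launcher can refuse to start Hive on the wrong JVM. The following Java sketch reads the standard java.specification.version system property, which reports "1.8" on Java 8 and "9", "10", or "11" on later releases; the class name and the exit behavior are illustrative assumptions.

```java
/**
 * Sketch (hypothetical helper): checks that the running JVM is Java 8 before
 * starting a tool that, per these notes, only works with Java 8.
 */
final class JavaVersionGuard {
    /** Converts a java.specification.version value to a feature number. */
    static int featureVersion(String specVersion) {
        // Java 8 and earlier use the "1.x" numbering scheme.
        if (specVersion.startsWith("1.")) {
            return Integer.parseInt(specVersion.substring(2));
        }
        // Java 9 and later report the feature number directly, e.g. "11".
        return Integer.parseInt(specVersion);
    }

    static boolean isJava8() {
        return featureVersion(System.getProperty("java.specification.version")) == 8;
    }

    public static void main(String[] args) {
        if (!isJava8()) {
            System.err.println("Start Hive with Java 8; found Java "
                    + System.getProperty("java.specification.version"));
            System.exit(1);
        }
        System.out.println("Java 8 detected");
    }
}
```

The same check applies to the migrator tool and the pre-19.1 proxy requirements described below.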

Import/Export to OCI Using the Migrator Tool Requires Java 8

If you are using the Oracle NoSQL Database on-premises migrator tool to import or export data to the OCI Classic object store, you must use Java 8. Other versions of Java are not supported.

Pre-19.1 Non-Java Drivers Still Require Java 8

Due to an incompatibility issue in the proxy, the Oracle NoSQL non-Java drivers released prior to 19.1 must continue to use Java 8 to run the proxy when connecting to an Oracle NoSQL Database 19.1 server that uses Java 10 or 11.

IDENTITY Column Definition Missing in Export Package

The Import/Export utility does not export a table's IDENTITY column property into the export package DDL file (tableSchema.ddl). This is a bug and will be fixed in a future release. The missing IDENTITY column property becomes noticeable only when importing data from the export package. The possible scenarios are:

  1. If the import table already exists and is non-empty, and the IDENTITY column is defined as GENERATED ALWAYS, the Oracle NoSQL Database will return an error saying that users cannot supply a value for GENERATED ALWAYS.
  2. If the import table already exists and is non-empty, and the IDENTITY column is defined as GENERATED BY DEFAULT, the Import/Export utility will return an error saying that the record is already present. The user can choose to overwrite the records by setting the overwrite option in the import config file to true.
  3. If the import table exists and is empty, and the IDENTITY column is defined as GENERATED ALWAYS, the Oracle NoSQL Database will return an error saying that users cannot supply a value for GENERATED ALWAYS.
  4. If the import table exists and is empty, and the IDENTITY column is defined as GENERATED BY DEFAULT, the import will succeed, taking the values from the export package. The user can then use the ALTER TABLE command to set the START WITH value to the next value in the sequence.
  5. If the import table does not exist, the import will create the table using the DDL in the export package, which lacks the IDENTITY column property, so knowledge of the original IDENTITY column is lost. This problem will be fixed in a future release. The import will succeed with the semantics of a table without an IDENTITY column.

In all of these cases, you can add or modify the IDENTITY column property using the ALTER TABLE command. See the IDENTITY column documentation for more details.

[#27562]
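The five import scenarios above can be summarized as a decision table. The following Java sketch is a hypothetical helper, not part of the Import/Export utility, and its outcome strings paraphrase the list above. Its nextStartWith method assumes the IDENTITY column increments by 1 and computes the value to use as START WITH in scenario 4.

```java
import java.util.stream.LongStream;

/**
 * Sketch (hypothetical helper): expected import outcome when tableSchema.ddl
 * is missing the IDENTITY property, following the scenarios in these notes.
 */
final class IdentityImportOutcome {
    enum IdentityMode { GENERATED_ALWAYS, GENERATED_BY_DEFAULT }

    static String expectedOutcome(boolean tableExists, boolean tableEmpty, IdentityMode mode) {
        if (!tableExists) {
            // Scenario 5: the table is created from DDL without the IDENTITY property.
            return "succeeds; table created without IDENTITY column";
        }
        if (mode == IdentityMode.GENERATED_ALWAYS) {
            // Scenarios 1 and 3: supplied values are rejected, empty table or not.
            return "error: cannot supply a value for GENERATED ALWAYS";
        }
        if (!tableEmpty) {
            // Scenario 2: existing records collide unless overwrite is true.
            return "error: record already present (set overwrite to true to replace)";
        }
        // Scenario 4: exported values are kept; reset START WITH afterwards.
        return "succeeds; reset START WITH with ALTER TABLE";
    }

    /** Next START WITH value after importing ids, assuming an increment of 1. */
    static long nextStartWith(long... importedIds) {
        // With no imported rows, fall back to 1.
        return LongStream.of(importedIds).max().orElse(0) + 1;
    }
}
```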

Export Hangs When Disk is Full at Sink

During an export, the Import/Export tool will hang if the sink runs out of disk space. This issue will be fixed in a future release. Users must restart the export after freeing up disk space at the sink. If the export was started in -verbose mode, the output will show java.io.IOException: No space left on device, as in the following example:

java -jar /home/jinzha/mywork/kv/lib/kvtool.jar export -helper-hosts 192.168.56.1:5000 \
-store kvstore -export-all -config /home/jinzha/mywork/export.cfg -verbose
Enter command: export
2019-04-22 23:55:16.316 UTC Start migration with configuration:
{
  "configFileVersion" : 1,
  "abortOnError" : true,
  "source" : {
    "type" : "nosqldb",
    "helperHosts" : [ "192.168.56.1:5000" ],
    "storeName" : "kvstore"
  },
  "sink" : {
    "type" : "file",
    "format" : "binary",
    "path" : "/home/jinzha/mywork/data"
  }
}
2019-04-22 23:55:16.338 UTC TaskWaiter thread spawned.
2019-04-22 23:55:16.693 UTC Exporting table schema: users. TableVersion: 1
2019-04-22 23:55:16.695 UTC Creating a new RecordStream for SchemaDefinition. File segment number: 1. Chunk sequence: abcdefghijlk
2019-04-22 23:55:16.701 UTC WriteTask worker thread spawned for SchemaDefinition
2019-04-22 23:55:16.704 UTC [binary]: Exported 1 record from tableSchema: 0min 0sec 361ms
2019-04-22 23:55:16.729 UTC Exporting store data with configuration: consistency=null; requestTimeout=0ms
2019-04-22 23:55:16.773 UTC Creating a new RecordStream for users. File segment number: 1. Chunk sequence: abcdefghijlk
2019-04-22 23:55:16.788 UTC WriteTask worker thread spawned for users
2019-04-22 23:55:18.954 UTC Exception exporting users. Chunk sequence: abcdefghijlk
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at oracle.kv.util.expimp.utils.exp.LocalStoreOutput.exportDataStream(LocalStoreOutput.java:211)
    at oracle.kv.util.expimp.utils.exp.LocalStoreOutput.doExport(LocalStoreOutput.java:149)
    at oracle.kv.util.expimp.utils.exp.AbstractStoreOutput$WriteTask.call(AbstractStoreOutput.java:639)
    at oracle.kv.util.expimp.utils.exp.AbstractStoreOutput$WriteTask.call(AbstractStoreOutput.java:620)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-04-22 23:56:06.705 UTC Exception exporting SchemaDefinition. Chunk sequence: abcdefghijlk
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at oracle.kv.util.expimp.utils.exp.LocalStoreOutput.exportDataStream(LocalStoreOutput.java:211)
    at oracle.kv.util.expimp.utils.exp.LocalStoreOutput.doExport(LocalStoreOutput.java:149)
    at oracle.kv.util.expimp.utils.exp.AbstractStoreOutput$WriteTask.call(AbstractStoreOutput.java:639)
    at oracle.kv.util.expimp.utils.exp.AbstractStoreOutput$WriteTask.call(AbstractStoreOutput.java:620)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-04-22 23:56:16.708 UTC [binary]: Writing continue.., wait 1 minutes
[#27574]
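Because the export hangs instead of exiting, a wrapper may need to watch the verbose output for the disk-full error. The following Java sketch scans the output line by line and returns as soon as the IOException text from the log above appears. It assumes the export was started with -verbose, since the message is not shown otherwise, and the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/**
 * Sketch (hypothetical helper): detects the "No space left on device" error
 * in the output of an export started with -verbose.
 */
final class ExportDiskFullDetector {
    static final String DISK_FULL = "java.io.IOException: No space left on device";

    static boolean indicatesDiskFull(Reader exportOutput) {
        // Not closed here: the caller owns the underlying stream.
        BufferedReader in = new BufferedReader(exportOutput);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains(DISK_FULL)) {
                    return true;   // returns immediately, even though the export keeps running
                }
            }
            return false;          // output ended without the error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, a launcher could start the export with java.lang.ProcessBuilder, call redirectErrorStream(true) so that stack traces reach the same stream, pass new InputStreamReader(process.getInputStream()) to indicatesDiskFull, and call process.destroy() when it returns true, before freeing disk space and restarting the export.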

Need a Minimum of 5 GB of Free Disk Space to Deploy a Storage Node That Hosts an Admin

If a Storage Node that hosts an admin is deployed on a system with less than 5 GB of free disk space, the following exception will occur:

Connected to Admin in read-only mode
(JE 18.1.8) Database AdminSchemaVersion not found. (18.1.3)

Make sure you have at least 5 GB of free disk space to successfully deploy a storage node. This same problem will occur when deploying KVLite. We expect to remove this restriction in a future release. [#26818]
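A pre-deployment check can catch this condition before the Storage Node or KVLite is started. The following Java sketch uses the standard java.nio.file API to compare the usable space of the file store holding a directory against a threshold. Treating 5 GB as 5 * 1024^3 bytes is an assumption, chosen because it matches the freeDiskLimit=5,368,709,120 value in the Admin log entries shown in these notes; the class name and the directory argument are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch (hypothetical helper): verifies that the file system holding a
 * Storage Node or KVLite root directory has at least 5 GB of usable space.
 */
final class FreeSpaceCheck {
    // Assumption: 5 GB interpreted as 5 * 1024^3 bytes.
    static final long MIN_FREE_BYTES = 5L * 1024 * 1024 * 1024;

    static boolean hasMinimumFreeSpace(Path dir, long minBytes) {
        try {
            // Usable space is what this JVM can write, which may be less than total free space.
            return Files.getFileStore(dir).getUsableSpace() >= minBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        if (!hasMinimumFreeSpace(root, MIN_FREE_BYTES)) {
            System.err.println("Less than 5 GB free under " + root + "; deployment may fail.");
            System.exit(1);
        }
        System.out.println("Free space check passed for " + root);
    }
}
```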

Users Must Manage Admin Directory Size; Running Out of Space Can Put All Admins Into "RUNNING,UNKNOWN" State

Every Admin is allocated a maximum of 3 GB of disk space by default, which is sufficient space for the vast majority of installations. However, under some rare circumstances you might want to change this 3 GB limit, especially if the Admin is sharing a disk with a Storage Node. For more information, see Managing Admin Directory Size.

If Admins run out of disk space, then there will be entries in the Admin logs saying "Disk usage is not within je.maxDisk or je.freeDisk limits and write operations are prohibited" and the output of the ping command will show all the Admins in the "RUNNING,UNKNOWN" state. Follow the procedure described in Managing Admin Directory Size to bring the Admins back to the "RUNNING,MASTER" or "RUNNING,REPLICA" state.

Below is sample output of the ping command, followed by log entries indicating that the Admins ran out of disk space.
kv-> ping
Connected to Admin in read-only mode
Pinging components of store kvstore based upon topology sequence #106
90 partitions and 3 storage nodes
Time: 2018-04-03 08:20:22 UTC   Version: 18.3.0
Shard Status: healthy:3 writable-degraded:0 read-only:0 offline:0 total:3
Admin Status: read-only
Zone [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    RN Status: online:9 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0
Storage Node [sn1] on localhost:10000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin1]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn1]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:10011 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg2-rn1]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:10012 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg3-rn1]      Status: RUNNING,MASTER sequenceNumber:92 haPort:10013
Storage Node [sn2] on localhost:11000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin2]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn2]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:11021 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg2-rn2]      Status: RUNNING,MASTER sequenceNumber:93 haPort:11022
        Rep Node [rg3-rn2]      Status: RUNNING,REPLICA sequenceNumber:92 haPort:11023 delayMillis:0 catchupTimeSecs:0
Storage Node [sn3] on localhost:12000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin3]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn3]      Status: RUNNING,MASTER sequenceNumber:93 haPort:12011
        Rep Node [rg2-rn3]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:12012 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg3-rn3]      Status: RUNNING,REPLICA sequenceNumber:92 haPort:12013 delayMillis:0 catchupTimeSecs:0

2018-04-03 08:18:52.254 UTC SEVERE [admin1] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=83,086
freeDiskShortage=-6,945,071,104 diskFreeSpace=12,313,780,224
availableLogSize=-83,086 totalLogSize=2,180,238 activeLogSize=2,180,238
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}

2018-04-03 08:19:34.808 UTC SEVERE [admin2] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=97,346
freeDiskShortage=-6,944,923,648 diskFreeSpace=12,313,632,768
availableLogSize=-97,346 totalLogSize=2,194,498 activeLogSize=2,194,498
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}

2018-04-03 08:19:36.063 UTC SEVERE [admin3] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=101,698
freeDiskShortage=-6,944,923,648 diskFreeSpace=12,313,632,768
availableLogSize=-101,698 totalLogSize=2,198,850 activeLogSize=2,198,850
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}
[#26922]
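The ping symptom described above can also be checked from a script. The following Java sketch collects the status of each 'Admin [adminN]' line in saved ping output and reports whether every Admin is in the RUNNING,UNKNOWN state. It assumes the line layout of the sample above, and the class name is hypothetical. A positive result only suggests the disk-limit problem; confirm it against the Admin logs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch (hypothetical helper): reports whether all Admins in Admin CLI
 * "ping" output are RUNNING,UNKNOWN, assuming the sample output layout.
 */
final class AdminDiskSymptom {
    // Matches "        Admin [admin1]          Status: RUNNING,UNKNOWN",
    // but not the summary line "Admin Status: read-only".
    private static final Pattern ADMIN_LINE =
            Pattern.compile("^\\s*Admin \\[\\w+\\]\\s+Status: (\\S+)");

    static List<String> adminStatuses(List<String> pingLines) {
        List<String> statuses = new ArrayList<>();
        for (String line : pingLines) {
            Matcher m = ADMIN_LINE.matcher(line);
            if (m.find()) {
                statuses.add(m.group(1));
            }
        }
        return statuses;
    }

    /** True if at least one Admin was found and every Admin is RUNNING,UNKNOWN. */
    static boolean allAdminsUnknown(List<String> pingLines) {
        List<String> statuses = adminStatuses(pingLines);
        return !statuses.isEmpty()
                && statuses.stream().allMatch("RUNNING,UNKNOWN"::equals);
    }
}
```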

Store With Full Text Search May Become Unsynchronized

A store that has enabled support for Full Text Search may, on rare occasions, encounter a bug in which internal components of a master Replication Node become unsynchronized, causing updates from that Replication Node to stop flowing to the Elasticsearch engine. This problem will cause data to be out of sync between the store and Elasticsearch.

When the problem occurs, the Elasticsearch indices stop being populated. The problem involves the shutdown of the feeder channel for a component called the TextIndexFeeder, and is logged in the debug logs for the Replication Node. For example:

2018-03-16 11:23:46.055 UTC INFO [rg1-rn1] JE: Inactive channel: TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78(2147483647) forced close. Timeout: 10000ms.
2018-03-16 11:23:46.059 UTC INFO [rg1-rn1] JE: Shutting down feeder for replica TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 Reason: null write time:  32ms Avg write time: 100us
2018-03-16 11:23:46.060 UTC INFO [rg1-rn1] JE: Feeder Output for TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 soft shutdown initiated.
2018-03-16 11:23:46.064 UTC WARNING [rg1-rn1] internal exception Expected bytes: 6 read bytes: 0
com.sleepycat.je.utilint.InternalException: Expected bytes: 6 read bytes: 0
    at com.sleepycat.je.rep.subscription.SubscriptionThread.loopInternal(SubscriptionThread.java:719)
    at com.sleepycat.je.rep.subscription.SubscriptionThread.run(SubscriptionThread.java:180)
Caused by: java.io.IOException: Expected bytes: 6 read bytes: 0
    at com.sleepycat.je.rep.utilint.BinaryProtocol.fillBuffer(BinaryProtocol.java:446)
    at com.sleepycat.je.rep.utilint.BinaryProtocol.read(BinaryProtocol.java:466)
    at com.sleepycat.je.rep.subscription.SubscriptionThread.loopInternal(SubscriptionThread.java:656)
    ... 1 more

2018-03-16 11:23:46.064 UTC INFO [rg1-rn1] SubscriptionProcessMessageThread soft shutdown initiated.
2018-03-16 11:23:46.492 UTC INFO [rg1-rn1] JE: Feeder output for TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 shutdown. feeder VLSN: 4,066 currentTxnEndVLSN: 4,065

If the TextIndexFeeder channel is shut down, you can restore it by creating a dummy full text search index. Here is an example of how you can do that.

Assuming that Elasticsearch is already registered, execute the following commands from the Admin CLI:

execute 'CREATE TABLE dummy (id INTEGER,title STRING,PRIMARY KEY (id))'
execute 'CREATE FULLTEXT INDEX dummytextindex ON dummy (title)'
execute 'DROP TABLE dummy'

Note that dummy is the name of a temporary table that must not already exist.

Creating a full text search index reestablishes the channel from the store to Elasticsearch and brings the Elasticsearch data up to date. [#26859]
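To reduce the chance of colliding with an existing table, the statements above can be generated with a unique table name. The following Java sketch only builds the command strings for pasting into the Admin CLI. The fts_dummy_ prefix, the _idx index suffix, and the 8-character random suffix are illustrative choices, not names the product requires, and you should still confirm that the generated name is not already in use.

```java
import java.util.List;
import java.util.UUID;

/**
 * Sketch (hypothetical helper): builds the Admin CLI statements that create
 * and then drop a temporary table with a full text index.
 */
final class FtsFeederRestart {
    static List<String> statements(String tableName) {
        String indexName = tableName + "_idx";
        return List.of(
            "execute 'CREATE TABLE " + tableName + " (id INTEGER,title STRING,PRIMARY KEY (id))'",
            "execute 'CREATE FULLTEXT INDEX " + indexName + " ON " + tableName + " (title)'",
            "execute 'DROP TABLE " + tableName + "'");
    }

    static String uniqueTableName() {
        // First 8 lowercase hex digits of a random UUID, e.g. "fts_dummy_3f2a9c1b".
        return "fts_dummy_" + UUID.randomUUID().toString().substring(0, 8);
    }
}
```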

Data Verifier is Enabled By Default

The data verifier is turned on by default. In some cases, the data verifier was using a lot of I/O bandwidth and causing the system to slow down. Users can turn off the data verifier by issuing the following two commands from the Admin CLI:

plan change-parameters -wait -all-rns -params "configProperties=je.env.runVerifier=false"
change-policy -params "configProperties=je.env.runVerifier=false"

Note that if the store has services with preexisting settings for the configProperties parameter, users will need to get the current values and merge them with the new setting that disables the verifier:

show param -service rg1-rn1
show param -policy

For example, suppose rg1-rn1 has set the following cleaner parameter:

kv-> show param -service rg1-rn1
[...]
configProperties=je.cleaner.minUtilization=40

When updating the configProperties parameter, add the new verifier setting after the existing settings, separating the settings with semicolons:

plan change-parameters -wait -all-rns -params "configProperties=je.cleaner.minUtilization=40;je.env.runVerifier=false" 
[KVSTORE-639]
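The merge described above can be done mechanically. The following Java sketch treats a configProperties value as semicolon-separated name=value pairs, as in the example above, replaces any existing setting of the same property in place, and appends it otherwise. The class name is hypothetical, and the same merge would need to be applied separately to the value reported by 'show param -policy'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch (hypothetical helper): merges one JE property into an existing
 * configProperties value made of semicolon-separated name=value pairs.
 */
final class ConfigPropertiesMerge {
    static String merge(String existing, String name, String value) {
        // Property name -> full "name=value" entry, in original order.
        Map<String, String> entries = new LinkedHashMap<>();
        if (existing != null) {
            for (String raw : existing.split(";")) {
                String entry = raw.trim();
                if (entry.isEmpty()) {
                    continue;
                }
                int eq = entry.indexOf('=');
                String key = (eq < 0 ? entry : entry.substring(0, eq)).trim();
                entries.put(key, entry);
            }
        }
        // Replaces an existing setting in its original position, or appends a new one.
        entries.put(name, name + "=" + value);
        return String.join(";", entries.values());
    }

    /** Builds the plan command from the example above with the merged value. */
    static String planCommand(String merged) {
        return "plan change-parameters -wait -all-rns -params \"configProperties=" + merged + "\"";
    }
}
```

For the rg1-rn1 example, merge("je.cleaner.minUtilization=40", "je.env.runVerifier", "false") produces je.cleaner.minUtilization=40;je.env.runVerifier=false, the value used in the command above.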

Subscription Cannot Connect and Fails With InternalException

If a master transfer occurs due to a failure after the publisher is started and before a subscriber connects, an InternalException can occur when the subscriber tries to connect. The exception message will read "Failed to connect, will retry after sleeping 3000 ms". Restart the publisher to work around this problem. [#27723]