Known Issues

Performance Issues with Virtual Threads

The Direct Java Driver API has not yet been optimized for the virtual threads introduced in Java 21; calling API methods from virtual threads may reduce performance. These performance issues will be addressed in a future release.
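
Until then, one possible mitigation is to route driver calls through a bounded pool of platform threads rather than invoking them directly from virtual threads. The sketch below is a minimal illustration of that pattern, not a documented workaround; the store name, helper host, and key are hypothetical.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.ValueVersion;

public class VirtualThreadBridge {
    public static void main(String[] args) {
        // Hypothetical store name and helper host.
        KVStore store = KVStoreFactory.getStore(
            new KVStoreConfig("kvstore", "localhost:5000"));

        // Bounded pool of platform threads that performs the driver calls.
        ExecutorService driverPool = Executors.newFixedThreadPool(16);

        // Virtual threads run application logic but delegate driver calls
        // to the platform-thread pool.
        try (ExecutorService vthreads =
                 Executors.newVirtualThreadPerTaskExecutor()) {
            vthreads.submit(() -> {
                try {
                    ValueVersion vv = driverPool.submit(
                        () -> store.get(Key.createKey("someKey"))).get();
                    // ... application logic using vv ...
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        driverPool.shutdown();
        store.close();
    }
}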

[KVSTORE-2121]

Topology Changes May Fail During Software Upgrades

Modifications to the store topology that involve partition migration may fail if they are performed while the store is being upgraded to a new software version. If you run a plan to deploy a new topology and it fails during partition migration, check whether the store's nodes are running different software versions, and upgrade any nodes running older versions before retrying the plan.

Modifying a topology with one of the following commands can result in the need for partition migration; deploying the resulting topology with the 'plan deploy-topology' command can then fail if the plan runs during a store software version upgrade. An example workflow appears after the list. The commands that can produce partition migrations are:

  • topology change-repfactor
  • topology contract
  • topology rebalance
  • topology redistribute

Other topology commands do not produce partition migration and do not cause this problem.
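
For example, a redistribution that triggers partition migration typically follows a workflow like the one below; the topology name is illustrative, and AllStorageNodes is the default storage node pool.

kv-> topology clone -current -name newTopo
kv-> topology redistribute -name newTopo -pool AllStorageNodes
kv-> plan deploy-topology -name newTopo -wait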

If a topology deployment fails, you can tell if it is related to partition migrations during a software version upgrade by looking for errors like the following:

Plan 24 ended with errors. Use "show plan -id 24" for more information
Plan Deploy Topo
Id:                    24
State:                 ERROR
Attempt number:        1
Started:               2020-04-10 15:19:59 UTC
Ended:                 2020-04-10 15:24:48 UTC
Plan failures:
	Failure 1: 17/MigratePartition PARTITION-2 from rg1 to rg2
	failed. target=rg2-rn1 state=ERROR java.lang.Exception:
	Migration of PARTITION-2 failed. Giving up after 10 attempt(s)

If you see a plan failure involving partition migrations like this, particularly if there are similar failures for all partition migration tasks, use the 'ping' or 'verify topology' commands to display information about the store and check to see if different storage nodes are running different major or minor software versions. If so, upgrade the nodes running the older software to the latest version before retrying the 'plan deploy-topology' command.
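
For example, a ping excerpt like the following (versions, hosts, and ports are illustrative) would show that sn1 is still running older software than sn2:

kv-> ping
...
Storage Node [sn1] on localhost:10000
    Status: RUNNING   Ver: 20.2.16 ...
Storage Node [sn2] on localhost:11000
    Status: RUNNING   Ver: 20.3.4 ...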

Limitations on Multi-Region Tables in This Release

The Multi-Region Tables feature in this release has the following limitations:

  • Specifying a non-zero TTL when inserting or updating a row in a Multi-Region Table is only supported after upgrading both the driver and the store to release 20.3 or later, and may fail until the local store has been completely upgraded. In addition, TTL expiration times will be lost when rows are replicated to a remote region if the multi-region agent or store for that region has not been upgraded. An illustrative sketch appears below. [#28165]
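
For reference, the sketch below shows how a row-level TTL is supplied through the Table API. It is a minimal illustration, assuming a hypothetical multi-region table named users with columns (id, name) and a driver and store already upgraded to release 20.3 or later.

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;
import oracle.kv.table.TimeToLive;

public class MultiRegionTtlExample {
    public static void main(String[] args) {
        // Hypothetical store name and helper host.
        KVStore store = KVStoreFactory.getStore(
            new KVStoreConfig("kvstore", "localhost:5000"));
        TableAPI tableAPI = store.getTableAPI();

        // Hypothetical multi-region table with columns (id, name).
        Table table = tableAPI.getTable("users");
        Row row = table.createRow();
        row.put("id", 1);
        row.put("name", "alice");

        // Non-zero TTL: requires driver and store at release 20.3 or later.
        row.setTTL(TimeToLive.ofDays(30));
        tableAPI.put(row, null, null);

        store.close();
    }
}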

Out-of-Order Processing During Streams API and Partition Migration

When an application uses the Streams API with a subscription that has multiple subscribers, and an elasticity operation is performed that involves a partition migration, the application may need to coordinate operations across subscribers. An elasticity change can cause the events being delivered for a given key to switch to a different subscriber. The Streams API delivers events in the proper order to the two subscribers, but it is up to the application to make sure that the subscribers perform actions for those events in the correct order. We hope to remove the need for this coordination in a future release.
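
The coordination itself is application-specific. One building block, sketched below, is to serialize processing per key: hash each key to a single-threaded lane shared by all subscribers, so that events for the same key are applied one at a time. Deciding the correct cross-subscriber ordering at the migration point remains the application's responsibility; the class and method names here are hypothetical.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Applies events for the same key one at a time, across subscribers. */
public class KeyOrderedExecutor {
    private final ExecutorService[] lanes;

    public KeyOrderedExecutor(int nLanes) {
        lanes = new ExecutorService[nLanes];
        for (int i = 0; i < nLanes; i++) {
            // One thread per lane preserves submission order within the lane.
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** All subscribers route events here; the same key maps to the same lane. */
    public void submit(String key, Runnable applyEvent) {
        int lane = Math.floorMod(key.hashCode(), lanes.length);
        lanes[lane].execute(applyEvent);
    }

    public void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
    }
}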

[#27541]

Need a Minimum of 5 GB of Free Disk Space to Deploy a Storage Node That Hosts an Admin

If a Storage Node that hosts an admin is deployed on a system with less than 5 GB of free disk space, the following exception will occur:

Connected to Admin in read-only mode
(JE 18.1.8) Database AdminSchemaVersion not found. (18.1.3)

Make sure you have at least 5 GB of free disk space to successfully deploy a storage node. This same problem will occur when deploying KVLite. We expect to remove this restriction in a future release.

[#26818]

Users Must Manage Admin Directory Size; Running Out of Disk Space Can Put All Admins Into the "RUNNING,UNKNOWN" State

Every Admin is allocated a maximum of 3 GB of disk space by default, which is sufficient space for the vast majority of installations. However, under some rare circumstances you might want to change this 3 GB limit, especially if the Admin is sharing a disk with a Storage Node. For more information, see Managing Admin Directory Size.

If Admins run out of disk space, then there will be entries in the Admin logs saying "Disk usage is not within je.maxDisk or je.freeDisk limits and write operations are prohibited" and the output of the ping command will show all the Admins in the "RUNNING,UNKNOWN" state. Follow the procedure described in Managing Admin Directory Size to bring the Admins back to the "RUNNING,MASTER" or "RUNNING,REPLICA" state.

Below is sample output of the ping command, along with log entries, indicating that the Admins ran out of disk space:

kv-> ping
Connected to Admin in read-only mode
Pinging components of store kvstore based upon topology sequence #106
90 partitions and 3 storage nodes
Time: 2018-04-03 08:20:22 UTC   Version: 18.3.0
Shard Status: healthy:3 writable-degraded:0 read-only:0 offline:0 total:3
Admin Status: read-only
Zone [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    RN Status: online:9 offline:0 maxDelayMillis:0 maxCatchupTimeSecs:0
Storage Node [sn1] on localhost:10000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin1]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn1]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:10011 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg2-rn1]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:10012 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg3-rn1]      Status: RUNNING,MASTER sequenceNumber:92 haPort:10013
Storage Node [sn2] on localhost:11000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin2]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn2]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:11021 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg2-rn2]      Status: RUNNING,MASTER sequenceNumber:93 haPort:11022
        Rep Node [rg3-rn2]      Status: RUNNING,REPLICA sequenceNumber:92 haPort:11023 delayMillis:0 catchupTimeSecs:0
Storage Node [sn3] on localhost:12000
    Zone: [name=Houston id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]
    Status: RUNNING   Ver: 18.3.0 2018-04-03 05:36:25 UTC  Build id: ec627ef967d6 Edition: Enterprise
        Admin [admin3]          Status: RUNNING,UNKNOWN
        Rep Node [rg1-rn3]      Status: RUNNING,MASTER sequenceNumber:93 haPort:12011
        Rep Node [rg2-rn3]      Status: RUNNING,REPLICA sequenceNumber:93 haPort:12012 delayMillis:0 catchupTimeSecs:0
        Rep Node [rg3-rn3]      Status: RUNNING,REPLICA sequenceNumber:92 haPort:12013 delayMillis:0 catchupTimeSecs:0

2018-04-03 08:18:52.254 UTC SEVERE [admin1] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=83,086
freeDiskShortage=-6,945,071,104 diskFreeSpace=12,313,780,224
availableLogSize=-83,086 totalLogSize=2,180,238 activeLogSize=2,180,238
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}

2018-04-03 08:19:34.808 UTC SEVERE [admin2] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=97,346
freeDiskShortage=-6,944,923,648 diskFreeSpace=12,313,632,768
availableLogSize=-97,346 totalLogSize=2,194,498 activeLogSize=2,194,498
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}

2018-04-03 08:19:36.063 UTC SEVERE [admin3] JE: Disk usage is not within
je.maxDisk or je.freeDisk limits and write operations are prohibited:
maxDiskLimit=2,097,152 freeDiskLimit=5,368,709,120
adjustedMaxDiskLimit=2,097,152 maxDiskOverage=101,698
freeDiskShortage=-6,944,923,648 diskFreeSpace=12,313,632,768
availableLogSize=-101,698 totalLogSize=2,198,850 activeLogSize=2,198,850
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}
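
As a hedged illustration of the procedure referenced above, the default limit can be raised by setting a larger je.maxDisk value in the Admins' configProperties; the 6 GB value below is arbitrary, and any preexisting configProperties settings must be merged in, as shown for the data verifier later in this section.

plan change-parameters -wait -all-admins -params "configProperties=je.maxDisk=6442450944"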

[#26922]

Store With Full Text Search May Become Unsynchronized

A store that has enabled support for Full Text Search may, on rare occasions, encounter a bug in which internal components of a master Replication Node become unsynchronized, causing updates from that Replication Node to stop flowing to the Elasticsearch engine. This problem will cause data to be out of sync between the store and Elasticsearch.

When the problem occurs, the Elasticsearch indices stop being populated. The problem involves the shutdown of the feeder channel for a component called the TextIndexFeeder, and is logged in the debug logs for the Replication Node. For example:

2018-03-16 11:23:46.055 UTC INFO [rg1-rn1] JE: Inactive channel: TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78(2147483647) forced close. Timeout: 10000ms.
2018-03-16 11:23:46.059 UTC INFO [rg1-rn1] JE: Shutting down feeder for replica TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 Reason: null write time:  32ms Avg write time: 100us
2018-03-16 11:23:46.060 UTC INFO [rg1-rn1] JE: Feeder Output for TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 soft shutdown initiated.
2018-03-16 11:23:46.064 UTC WARNING [rg1-rn1] internal exception Expected bytes: 6 read bytes: 0
com.sleepycat.je.utilint.InternalException: Expected bytes: 6 read bytes: 0
    at com.sleepycat.je.rep.subscription.SubscriptionThread.loopInternal(SubscriptionThread.java:719)
    at com.sleepycat.je.rep.subscription.SubscriptionThread.run(SubscriptionThread.java:180)
Caused by: java.io.IOException: Expected bytes: 6 read bytes: 0
    at com.sleepycat.je.rep.utilint.BinaryProtocol.fillBuffer(BinaryProtocol.java:446)
    at com.sleepycat.je.rep.utilint.BinaryProtocol.read(BinaryProtocol.java:466)
    at com.sleepycat.je.rep.subscription.SubscriptionThread.loopInternal(SubscriptionThread.java:656)
    ... 1 more

2018-03-16 11:23:46.064 UTC INFO [rg1-rn1] SubscriptionProcessMessageThread soft shutdown initiated.
2018-03-16 11:23:46.492 UTC INFO [rg1-rn1] JE: Feeder output for TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 shutdown. feeder VLSN: 4,066 currentTxnEndVLSN: 4,065

If the TextIndexFeeder channel is shut down, the user can restore it by creating a dummy full text search index. Here is an example of how to do that.

Assuming that Elasticsearch is already registered, execute the following commands from the Admin CLI:

execute 'CREATE TABLE dummy (id INTEGER,title STRING,PRIMARY KEY (id))'
execute 'CREATE FULLTEXT INDEX dummytextindex ON dummy (title)'
execute 'DROP TABLE dummy'

Note that dummy is the name of a temporary table that must not already exist.

Creating a full text search index reestablishes the channel from the store to Elasticsearch and ensures that data is synced up to date.

[#26859]

Data Verifier is Disabled By Default

The data verifier is turned off by default because, in some cases, it used a large amount of I/O bandwidth and slowed the system down. Users can turn the data verifier on by issuing the following two commands from the Admin CLI:

plan change-parameters -wait -all-rns -params "configProperties=je.env.runVerifier=true"
change-policy -params "configProperties=je.env.runVerifier=true"

Note that, if the store has services with preexisting settings for the configProperties parameter, then users will need to get the current values and merge them with the new setting that enables the verifier:

show param -service rg1-rn1
show param -policy

For example, suppose rg1-rn1 has set the following cleaner parameter:

kv-> show param -service rg1-rn1
[...]
configProperties=je.cleaner.minUtilization=40

When updating the configProperties parameter, the new verifier setting should be appended to the existing settings, separated by a semicolon:

plan change-parameters -wait -all-rns -params "configProperties=je.cleaner.minUtilization=40;je.env.runVerifier=true"
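
If the policy also carries preexisting settings, apply the same merged value there:

change-policy -params "configProperties=je.cleaner.minUtilization=40;je.env.runVerifier=true"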

[KVSTORE-639]

Subscription Cannot Connect and Fails With InternalException

If a master transfer occurs due to a failure after the publisher is started and before a subscriber connects, an InternalException can occur when the subscriber tries to connect. The exception message will read "Failed to connect, will retry after sleeping 3000 ms". Restart the publisher to work around this problem.
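
A minimal sketch of that workaround appears below. It treats the publisher generically and retries the subscription, closing and re-creating the publisher after each failed connect; the createPublisher and subscribe callbacks are hypothetical stand-ins for your Streams API setup code.

import java.util.concurrent.Callable;

public class PublisherRestart {

    /** Hypothetical callback that connects a subscriber via the publisher. */
    public interface SubscribeAction<P> {
        void run(P publisher);
    }

    /**
     * Retries the subscription, restarting the publisher whenever the
     * connect fails (for example, with InternalException).
     */
    public static <P extends AutoCloseable> void subscribeWithRestart(
            Callable<P> createPublisher,
            SubscribeAction<P> subscribe) throws Exception {
        while (true) {
            P publisher = createPublisher.call();
            try {
                subscribe.run(publisher);
                return;                    // subscriber connected
            } catch (RuntimeException e) {
                publisher.close();         // restart the publisher...
                Thread.sleep(3000);        // ...and retry after a pause
            }
        }
    }
}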

[#27723]