Known Issues

Limitations on KVInsight Monitoring Framework (Experimental)

Please carefully review the limitations listed below for this feature.

In this experimental release, there is no HTTPS support.
In this experimental release, KVInsight supports exporting metrics via CSV and Prometheus.

Limitations on UNION ALL and SUBQUERY (Experimental)

Please carefully review the limitations listed below for this feature.

This feature is only available in the Java Direct Driver or through the SQL CLI. It is not yet available for use with the SDKs.

Limitations on the Last Write Metadata feature

Please carefully review the limitations listed below for this feature.

This feature is only available in the Java SDK, Go SDK, Python SDK, and Java Direct Driver. It is not yet available in the Rust, Node.js, or C# SDKs.

bulkPut with LastWriteMetadata (LWM) against an older server may write rows that cannot be read by queries

When executing bulkPut operations that include LastWriteMetadata (LWM) using the kv-26.1.x driver against a kv-24.3.4 server or earlier (which does not support LWM), the operation may complete without returning an error.

In this mixed-version scenario, the row can be written to the store, but it may not be readable through queries because the older server cannot deserialize the newer format (deserialization occurs on the Replication Node). Direct key-based retrieval using get() may still succeed. Older-version clients may also be unable to read the row via queries.

This behavior is unexpected because LWM is not supported by the target server version; the expected behavior is a clear error indicating an incompatible/unsupported feature.

Workaround: Do not use LWM when targeting servers older than the version that supports it.
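
As a minimal sketch of the workaround, an application could compare the server version string against the last release this note names as lacking LWM support (24.3.4) before enabling the feature. The helper names below are hypothetical and not part of any driver API. The check only identifies servers known not to support LWM; a version above 24.3.4 is not guaranteed by this note to support it.

```python
# Last server version this note states does NOT support LWM.
LAST_KNOWN_UNSUPPORTED = (24, 3, 4)


def parse_version(text):
    """Turn a dotted version string such as '24.3.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def lwm_known_unsupported(server_version):
    """Return True if the server is at or below the last known unsupported release."""
    return parse_version(server_version) <= LAST_KNOWN_UNSUPPORTED


if __name__ == "__main__":
    for version in ("24.3.4", "26.1.0"):
        print(version, "known unsupported:", lwm_known_unsupported(version))
```

How the server version is obtained (for example, from ping output) is left to the caller.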

[KVSTORE-3257]

Need a minimum of 5GB of free disk space to deploy a storage node that hosts an admin

If a Storage Node that hosts an admin is deployed on a system with less than 5GB of free disk space, the following exception will occur:

Connected to Admin in read-only mode
oracle.kv.impl.admin.Admin$DBOperationFailedException:Disk limit violated in Admin environment directory (the-version)

Make sure you have at least 5GB of free disk space to successfully deploy a storage node. This same problem will occur when deploying KVLite.

$ java -jar $KVHOME/lib/kvstore.jar kvlite -root ./ici
Process exiting due to fault
oracle.kv.impl.fault.CommandFaultException: Disk limit violated in Admin environment directory

We expect to remove this restriction in a future release.
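
Until the restriction is lifted, free space can be checked before deployment. The following is a sketch only, using the Python standard library; the function name and the choice of directory to check are assumptions, and 5 GB is taken as 5 * 1024^3 bytes.

```python
import shutil

# 5 GB expressed in bytes (5 * 1024^3 = 5,368,709,120).
MIN_FREE_BYTES = 5 * 1024 ** 3


def has_enough_free_space(path, min_free_bytes=MIN_FREE_BYTES):
    """Return True if the file system holding `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


if __name__ == "__main__":
    # "." stands in for the directory that will hold the Admin environment.
    if has_enough_free_space("."):
        print("Free space check passed")
    else:
        print("Less than 5 GB free; deployment is likely to fail")
```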

[#26818]

JMX SSL Connection Failure with Recent Java

Newer JDKs enforce stricter TLS hostname verification for JMX/RMI. JConsole and the Collector Service may fail to connect over TLS if the server certificate lacks Subject Alternative Name (SAN) entries matching the hostname used. The default product-generated self-signed certificate has no SAN, so it may be rejected. Customers/admins must provide certificates with appropriate SANs (wildcard for a controlled DNS suffix or explicit host list). A new makebootconfig/securityconfig option allows generating certs with SAN hostnames. KVLite includes -host in SAN automatically.
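
To illustrate why a certificate without SAN entries is rejected, the sketch below checks a hostname against a list of SAN DNS names, allowing a wildcard only in the leftmost label. It is a simplified illustration, not the JDK's verification code; the function name is hypothetical, and the SAN list is assumed to be supplied by the caller (for example, copied from a certificate listing).

```python
def hostname_matches_san(hostname, san_dns_names):
    """Return True if hostname matches any SAN DNS entry.

    A '*.suffix' entry matches exactly one label in front of the suffix.
    An empty SAN list never matches.
    """
    host = hostname.lower().rstrip(".")
    for san in san_dns_names:
        pattern = san.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]  # for example ".example.com"
            if host.endswith(suffix):
                label = host[: -len(suffix)]
                # The wildcard must cover one non-empty label only.
                if label and "." not in label:
                    return True
    return False


if __name__ == "__main__":
    print(hostname_matches_san("node1.example.com", ["*.example.com"]))
    print(hostname_matches_san("node1.example.com", []))
```

With an empty SAN list, as in the default self-signed certificate, no hostname matches.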

[KVSTORE-3133]

Virtual threads on Java 21 (performance limitation)

When the Direct Java Driver API is called from Java 21 virtual threads, performance may be reduced due to virtual-thread pinning in certain blocking and synchronization scenarios in the overall application stack (including third-party dependencies). Customers using Java 21 virtual threads should validate and profile end-to-end performance.

This is primarily a Java platform limitation and is expected to improve in a future JDK (see JDK 25 / JEP 491).

[KVSTORE-2121]

Limitation on Diagnostics Utility

The diagnostics utility may fail to authenticate when an EdDSA (ed25519) SSH key is present in ~/.ssh. In this scenario, it can unexpectedly prompt for the user's password and then fail with an authentication error.

Error load ssh keys, Unsupported key type (ssh-ed25519)

To enable Ed25519 (EdDSA) support, add net.i2p.crypto.eddsa to the runtime classpath. It provides the required EdDSA implementation and prevents the utility from falling back to password-based authentication when an Ed25519 key is present in ~/.ssh.

java -cp kv/lib/kvstore.jar:eddsa.jar oracle.kv.impl.util.KVStoreMain diagnostics

[KVSTORE-1799]

Data Verifier is disabled by default

The data verifier is turned off by default. In some cases, the data verifier was using a lot of I/O bandwidth and causing the system to slow down. Users can turn on the data verifier by issuing the following two commands from the Admin CLI:

plan change-parameters -wait -all-rns -params "configProperties=je.env.runVerifier=true"
change-policy -params "configProperties=je.env.runVerifier=true"

Note that if the store has services with preexisting settings for the configProperties parameter, users must retrieve the current values and merge them with the new setting that enables the verifier:

show param -service rg1-rn1
show param -policy

For example, suppose rg1-rn1 has set the following cleaner parameter:

kv-> show param -service rg1-rn1
[...]
configProperties=je.cleaner.minUtilization=40

When updating the configProperties parameter, add the new verifier setting to the existing settings, separating each setting with a semicolon:

plan change-parameters -wait -all-rns -params "configProperties=je.cleaner.minUtilization=40;je.env.runVerifier=true"
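
The merge can also be scripted. The sketch below assumes the current configProperties value has already been copied from the show param output; the helper names are hypothetical. It keeps existing settings, adds or overrides the verifier setting, and builds the Admin CLI command shown above.

```python
def merge_config_properties(existing, new_settings):
    """Merge a 'name=value;name=value' string with a dict of new settings.

    Existing settings keep their position; new values override old ones.
    """
    merged = {}
    for item in existing.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        merged[name.strip()] = value.strip()
    merged.update(new_settings)
    return ";".join(f"{name}={value}" for name, value in merged.items())


def build_change_parameters_command(config_properties):
    """Build the Admin CLI command that applies the merged value to all RNs."""
    return ('plan change-parameters -wait -all-rns -params '
            f'"configProperties={config_properties}"')


if __name__ == "__main__":
    current = "je.cleaner.minUtilization=40"
    merged = merge_config_properties(current, {"je.env.runVerifier": "true"})
    print(build_change_parameters_command(merged))
```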

[KVSTORE-639]

Users must manage Admin directory size; running out of space can put all Admins into the "RUNNING,UNKNOWN" state

Every Admin is allocated a maximum of 3 GB of disk space by default, which is sufficient space for the vast majority of installations. However, under some rare circumstances you might want to change this 3 GB limit, especially if the Admin is sharing a disk with a Storage Node. For more information, see Managing Admin Directory Size.

If Admins run out of disk space, then there will be entries in the Admin logs saying "Disk usage is not within je.maxDisk or je.freeDisk limits and write operations are prohibited" and the output of the ping command will show all the Admins in the "RUNNING,UNKNOWN" state. Follow the procedure described in Managing Admin Directory Size to bring the Admins back to the "RUNNING,MASTER" or "RUNNING,REPLICA" state.

Below is sample output of the ping command, including the verification violations, followed by log entries indicating that the Admin ran out of disk space.

Pinging components of store OUG based upon topology sequence #308
300 partitions and 1 storage nodes
Time: 2024-03-27 17:13:33 UTC   Version: 24.4.9
Shard Status: healthy: 3 writable-degraded: 0 read-only: 0 offline: 0 total: 3
Admin Status: read-only
Zone [name=Cloud id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]   RN Status: online: 3 read-only: 0 offline: 0
Storage Node [sn1] on node1-nosql: 5000    Zone: [name=Cloud id=zn1 type=PRIMARY allowArbiters=false masterAffinity=false]    Status: RUNNING   Ver: 24.4.9 2024-11-21 17:05:14 UTC  Build id: 95fa28ea4441 Edition: Enterprise    isMasterBalanced: true        serviceStartTime: 2024-03-27 17:12:20 UTC
        Admin [admin1]          Status: RUNNING,UNKNOWN serviceStartTime: 2024-03-27 17:13:22 UTC       stateChangeTime: 2024-03-27 17:13:22 UTC        availableStorageSize: -4 GB
        Rep Node [rg1-rn1]      Status: RUNNING,MASTER sequenceNumber: 245 haPort: 5011 availableStorageSize: -4 GB storageType: HD     serviceStartTime: 2024-03-27 17:12:34 UTC       stateChangeTime: 2024-03-27 17:12:35 UTC
        Rep Node [rg2-rn1]      Status: RUNNING,MASTER sequenceNumber: 242 haPort: 5012 availableStorageSize: -4 GB storageType: HD     serviceStartTime: 2024-03-27 17:12:37 UTC       stateChangeTime: 2024-03-27 17:12:38 UTC
        Rep Node [rg3-rn1]      Status: RUNNING,MASTER sequenceNumber: 241 haPort: 5013 availableStorageSize: -4 GB storageType: HD     serviceStartTime: 2024-03-27 17:12:41 UTC       stateChangeTime: 2024-03-27 17:12:42 UTC

Verification complete, 3 violations, 0 notes found.
Verification violation: [rg1-rn1]       Storage directory on rg1-rn1 has exceeded storage size
[/scratch/test/nosql/data/disk1 size: 10 GB available: -4 GB]
Verification violation: [rg2-rn1]       Storage directory on rg2-rn1 has exceeded storage size
[/scratch/test/nosql/data/disk2 size: 10 GB available: -4 GB]
Verification violation: [rg3-rn1]       Storage directory on rg3-rn1 has exceeded storage size
[/scratch/test/nosql/data/disk3 size: 10 GB available: -4 GB]


2024-03-27 17:13:12.643 UTC SEVERE - [admin1] JE: Disk usage is not within
the je.freeDisk limit and write operations are prohibited:
maxDiskLimit=3,221,225,472 freeDiskLimit=5,368,709,120 adjustedMaxDiskLimit=3,221,225,472
maxDiskOverage=-3,220,031,078 freeDiskShortage=4,543,057,920 diskFreeSpace=825,651,200
availableLogSize=-4,543,057,920 totalLogSize=1,194,394 activeLogSize=1,194,394
reservedLogSize=0 protectedLogSize=0 protectedLogSizeMap={}
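
For monitoring purposes, the Admin state can be picked out of saved ping output. This is a sketch that assumes the text layout shown in the sample above; the function name is hypothetical, and the layout may differ between releases.

```python
import re

# Matches lines such as: "Admin [admin1]   Status: RUNNING,UNKNOWN ..."
# The summary line "Admin Status: read-only" has no bracketed name and is skipped.
ADMIN_LINE = re.compile(r"Admin \[(?P<name>[^\]]+)\]\s+Status:\s+(?P<status>\S+)")


def admins_in_unknown_state(ping_output):
    """Return the names of Admins reported as RUNNING,UNKNOWN."""
    return [
        match.group("name")
        for match in ADMIN_LINE.finditer(ping_output)
        if match.group("status") == "RUNNING,UNKNOWN"
    ]


if __name__ == "__main__":
    sample = (
        "Admin Status: read-only\n"
        "        Admin [admin1]          Status: RUNNING,UNKNOWN "
        "serviceStartTime: 2024-03-27 17:13:22 UTC\n"
    )
    print(admins_in_unknown_state(sample))
```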

[#26922]

Topology Changes May Fail During Software Upgrades

Making modifications to the store topology that include partition migration may fail if the modifications are performed while the store is being upgraded to a new software version. If you run a plan to deploy a new topology and the plan fails with problems during partition migration, check if the nodes of the store are running different software versions, and upgrade any nodes running old versions before retrying the plan.

Modifying a topology using one of the following topology commands can result in the need for partition migration. Deploying the resulting topology with the 'plan deploy-topology' command can then fail if the plan is performed during a store software version upgrade. The topology commands that can produce partition migrations are:

topology change-repfactor
topology contract
topology rebalance
topology redistribute

Other topology commands do not produce partition migration and do not cause this problem.

If a topology deployment fails, you can tell if it is related to partition migrations during a software version upgrade by looking for errors like the following:

Plan 24 ended with errors. Use "show plan -id 24" for more information
Plan Deploy Topo
Id:                    24
State:                 ERROR
Attempt number:        1
Started:               2025-04-10 15:19:59 UTC
Ended:                 2025-04-10 15:24:48 UTC
Plan failures:
	Failure 1: 17/MigratePartition PARTITION-2 from rg1 to rg2
	failed. target=rg2-rn1 state=ERROR java.lang.Exception:
	Migration of PARTITION-2 failed. Giving up after 10 attempt(s)

If you see a plan failure involving partition migrations like this, particularly if there are similar failures for all partition migration tasks, use the 'ping' or 'verify topology' commands to display information about the store and check to see if different storage nodes are running different major or minor software versions. If so, upgrade the nodes running the older software to the latest version before retrying the 'plan deploy-topology' command.
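
The version check can be automated against saved ping output. The sketch below assumes each Storage Node line carries a "Ver: major.minor.patch" field, as in ping output; the function names are hypothetical. It reports whether storage nodes differ in major or minor version.

```python
import re

# Matches lines such as: "Storage Node [sn1] on host: 5000 ... Ver: 24.4.9 ..."
SN_VERSION = re.compile(r"Storage Node \[([^\]]+)\].*?Ver:\s+(\d+)\.(\d+)")


def storage_node_versions(ping_output):
    """Map each storage node name to its (major, minor) version."""
    return {
        name: (int(major), int(minor))
        for name, major, minor in SN_VERSION.findall(ping_output)
    }


def has_mixed_versions(ping_output):
    """Return True if storage nodes report more than one major.minor version."""
    return len(set(storage_node_versions(ping_output).values())) > 1


if __name__ == "__main__":
    sample = (
        "Storage Node [sn1] on host1: 5000    Status: RUNNING   Ver: 24.4.9\n"
        "Storage Node [sn2] on host2: 5000    Status: RUNNING   Ver: 24.3.4\n"
    )
    print(storage_node_versions(sample), has_mixed_versions(sample))
```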

Store With Full Text Search May Become Unsynchronized

A store that has enabled support for Full Text Search may, on rare occasions, encounter a bug in which internal components of a master Replication Node become unsynchronized, causing updates from that Replication Node to stop flowing to the Elasticsearch engine. This problem will cause data to be out of sync between the store and Elasticsearch.

When the problem occurs, the Elasticsearch indices stop being populated. The problem involves the shutdown of the feeder channel for a component called the TextIndexFeeder, and is logged in the debug logs for the Replication Node. For example:

2018-03-16 11:23:46.055 UTC INFO [rg1-rn1] JE: Inactive channel: TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78(2147483647) forced close. Timeout: 10000ms.
2018-03-16 11:23:46.059 UTC INFO [rg1-rn1] JE: Shutting down feeder for replica TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 Reason: null write time:  32ms Avg write time: 100us
2018-03-16 11:23:46.060 UTC INFO [rg1-rn1] JE: Feeder Output for TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 soft shutdown initiated.
2018-03-16 11:23:46.064 UTC WARNING [rg1-rn1] internal exception Expected bytes: 6 read bytes: 0
com.sleepycat.je.utilint.InternalException: Expected bytes: 6 read bytes: 0
    at com.sleepycat.je.rep.subscription.SubscriptionThread.loopInternal(SubscriptionThread.java:719)
    at com.sleepycat.je.rep.subscription.SubscriptionThread.run(SubscriptionThread.java:180)
Caused by: java.io.IOException: Expected bytes: 6 read bytes: 0
    at com.sleepycat.je.rep.utilint.BinaryProtocol.fillBuffer(BinaryProtocol.java:446)
    at com.sleepycat.je.rep.utilint.BinaryProtocol.read(BinaryProtocol.java:466)
    at com.sleepycat.je.rep.subscription.SubscriptionThread.loopInternal(SubscriptionThread.java:656)
    ... 1 more

2018-03-16 11:23:46.064 UTC INFO [rg1-rn1] SubscriptionProcessMessageThread soft shutdown initiated.
2018-03-16 11:23:46.492 UTC INFO [rg1-rn1] JE: Feeder output for TextIndexFeeder-rg1-rn1-b4e92291-3c73-4128-9557-62dbd4e9ac78 shutdown. feeder VLSN: 4,066 currentTxnEndVLSN: 4,065
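
Because the problem is only visible in the Replication Node debug logs, a small scan can help detect it. The sketch below looks for the "Shutting down feeder for replica TextIndexFeeder-..." entry shown in the example log; the function name is hypothetical, and the exact log wording may vary between releases.

```python
import re

# Captures the Replication Node name and the feeder name from the log entry.
FEEDER_SHUTDOWN = re.compile(
    r"\[(?P<rn>[^\]]+)\] JE: Shutting down feeder for replica "
    r"(?P<feeder>TextIndexFeeder-\S+)"
)


def find_feeder_shutdowns(log_lines):
    """Return (replication node, feeder name) pairs for each shutdown entry."""
    hits = []
    for line in log_lines:
        match = FEEDER_SHUTDOWN.search(line)
        if match:
            hits.append((match.group("rn"), match.group("feeder")))
    return hits


if __name__ == "__main__":
    lines = [
        "2018-03-16 11:23:46.059 UTC INFO [rg1-rn1] JE: Shutting down feeder "
        "for replica TextIndexFeeder-rg1-rn1-b4e92291 Reason: null",
    ]
    print(find_feeder_shutdowns(lines))
```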

If the TextIndexFeeder channel is shut down, you can restore it by creating a dummy full text search index. Here is an example of how to do that.

Assuming that Elasticsearch is already registered, execute the following commands from the Admin CLI:

execute 'CREATE TABLE dummy (id INTEGER,title STRING,PRIMARY KEY (id))'
execute 'CREATE FULLTEXT INDEX dummytextindex ON dummy (title)'
execute 'DROP TABLE dummy'

Note that dummy is the name of a temporary table; a table with that name must not already exist.

Creating a full text search index reestablishes the channel from the store to Elasticsearch and ensures that data is synced up to date.

[#26859]

Subscription Cannot Connect and Fails With InternalException

If a master transfer occurs due to a failure after the publisher is started and before a subscriber connects, an InternalException can occur when the subscriber tries to connect. The exception message will read "Failed to connect, will retry after sleeping 3000 ms". Restart the publisher to work around this problem.

[#27723]