Viewing Key Distribution Statistics
As you might already know, Oracle NoSQL Database stores the data by distributing the rows across all the partitions by hashing each row’s shard key. Based on the activity in your store's tables, Oracle NoSQL Database collects the key distribution data into internally managed system tables. As needed, you can access these statistics by querying these system tables.
As an Oracle NoSQL Database administrator, you may encounter many situations where you need to view the key distribution statistics. To discuss one such use-case, consider a situation where you are not able achieve the expected amount of throughput for your Oracle NoSQL Database in spite of having multiple shards in your cluster. This might happen if the data in your store is not distributed across the shards evenly. In order to confirm if this is the reason behind low throughput, you need a mechanism to understand how the data is distributed across the Oracle NoSQL Database cluster. The Key Distribution Statistics provided by the Oracle NoSQL Database can help you understand the data distribution across multiple partitions and shards in your store.
The two system tables into which the Oracle NoSQL Database collects the key distribution statistics are:
-
SYS$TableStatsPartition
-
SYS$TableStatsIndex
Oracle NoSQL Database manages and maintains these
system tables internally. When you enable security on your store, these system tables
are read-only. Regardless of security, the schema for system tables is immutable. The
name of system tables is defined with the prefix SYS$
. You are not
allowed to create any other table name using this reserved prefix.
SYS$TableStatsPartition
This table stores the table key statistics at the partition level. It contains a row for each partition for every table. For example, if you created a store with 100 partitions, this table contains 100 rows for every table in your store. The statistics stored per partition for each table in your store are:
- The number of rows stored
- The average size of keys in bytes
- The size in bytes consumed by the rows
SYS$TableStatsPartition
table is as
below:
Column | Data Type | Description |
---|---|---|
tableName |
string | Name of the table whose Key Distribution Statistics are being stored. |
partitionId |
integer | Partition ID |
shardId |
integer | Shard ID |
count |
long | Number of rows stored. |
avgKeySize |
integer | The average size of keys in bytes. |
tableSize |
long | The size in bytes consumed by the rows. |
tableSizeWithTombstones |
long | The table storage in bytes including tombstones. |
Note:
The tombstone is a small piece of storage when a row in a multi-region table is deleted. This per-row storage overhead is used to keep some metadata of the deleted row, which will be used in conflict resolution when another row is replicated from remote regions. The tombstone will expire in 7 days after it is created and the storage will be released. For tables without tombstones (For example, non-multi-region tables, system tables, etc,) the metrictableSizeWithTombstones
would be the same as the metric
tableSize
in the system table. The difference between the two
metrics is the total storage size of tombstones in the table.
SYS$TableStatsIndex
This table stores the index key statistics at the shard level. This table contains a row for each shard for every index. You do not have direct control over the number of shards created in your store, but you can always view the store topology to know how many shards are created in your store. For more information, see show topology.
The statistics stored per shard for each table in your store are:
- The number of index rows
- The average size of the index keys in bytes
- The size in bytes consumed by the index rows
The structure of SYS$TableStatsIndex
system table is as below:
Column | Data Type | Description |
---|---|---|
tableName |
string | Name of the table whose Key Distribution Statistics are being stored. |
indexName |
string | Name of the index |
shardId |
integer | Shard ID |
count |
long | Number of index rows stored. |
avgKeySize |
integer | Average size of index keys in bytes. |
indexSize |
long | The size in bytes consumed by the index rows. |
The gathering of the key distribution statistics into the system tables is determined by two parameters:
rnStatisticsEnabled
:In Oracle NoSQL Database, the Key Distribution Statistics are enabled by default for all newly created stores. You can disable the capturing of these statistics by executing the following command from Admin Command Line Interface (CLI):
plan change-parameters -wait -all-rns -params "rnStatisticsEnabled=false"
rnStatisticsGatherInterval
:In Oracle NoSQL Database, the default time interval between two consecutive updates on
SYS$TableStatsPartition
andSYS$TableStatsIndex
is 24 hours. You can change the time interval between the capture of these statistics by modifying thernStatisticsGatherInterval
parameter. The time unit specified must be in days, hours, or minutes.For example, to instruct Oracle NoSQL Database to collect the Key Distribution Statistics after every minute, execute the following command from Admin Command Line Interface (CLI):plan change-parameters -wait -all-rns -params "rnStatisticsGatherInterval=1 min"
Note:
Enabling the Key Distribution Statistics does not immediately trigger the collection of statistics. Oracle NoSQL Database initiates the statistics collection at a time based on the collection interval defined by thernStatisticsGatherInterval
parameter.
rnStatisticsGatherInterval
is not
modified, the Key Distribution Statistics will be collected around midnight.
You can query the system tables to get key distribution data or review the gathering process.
In order to get a complete set of statistics for a given table, you must
aggregate the per-partition values stored for that table in the
SYS$TableStatsPartition
system table.
For example, to get the total number of rows in a table named
myTable
, you must sum the values in the count column for all the
rows in the SYS$TableStatsPartition
table where tableName =
myTable.
Example Query:
sql-> select * from SYS$TableStatsPartition where tableName = 'myTable';
Result:
{"tableName":"myTable","partitionId":8,"shardId":3,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":9,"shardId":4,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":1,"shardId":1,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":4,"shardId":2,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":7,"shardId":3,"count":50,"avgKeySize":15,"tableSize":103}
{"tableName":"myTable","partitionId":10,"shardId":4,"count":50,"avgKeySize":15,"tableSize":103}
{"tableName":"myTable","partitionId":5,"shardId":2,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":6,"shardId":2,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":2,"shardId":1,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":3,"shardId":1,"count":0,"avgKeySize":0,"tableSize":0}
In the above result, observe that there are 50 keys each in
"partitionId":7,"shardId":3
and
"partitionId":10,"shardId":4
whereas all the other partitions and
shards are empty. This shows that the key data is not distributed evenly across all the
partitions and shards.
Similarly, you can query the SYS$TableStatsIndex
system
table to read the index key distribution statistics for a given table at the shard
level.
For example, to get the total number of index rows in a table named
myTable
, you must sum the values in the count column for all the
index rows in the SYS$TableStatsIndex
table where tableName =
myTable.
Example Query:
sql-> select * from SYS$TableStatsIndex where tableName = 'myTable';
Result:
{"tableName":"myTable","indexName":"idx_shard_key","shardId":3,"count":50,"avgKeySize":1,"indexSize":75}
{"tableName":"myTable","indexName":"idx_shard_key","shardId":4,"count":50,"avgKeySize":1,"indexSize":75}
{"tableName":"myTable","indexName":"idx_shard_key","shardId":1,"count":0,"avgKeySize":0,"indexSize":0}
{"tableName":"myTable","indexName":"idx_shard_key","shardId":2,"count":0,"avgKeySize":0,"indexSize":0}
As you can see from the above result, there are 50 index keys each in
"shardId":3
and "shardId":4
whereas all the other
shards are empty. This shows that the index key data is not distributed evenly across
all the shards.
After collecting the key distribution statistics, they are retained in the
system tables for a fixed time period. This value is determined by the
rnStatisticsTTL
parameter. By default, these statistics are
retained for 60 days. However, you can change this value by executing the
change-parameters plan from the Admin CLI. The time unit specified must be in days or
hours.
For example, execute the following command from Admin Command Line Interface (CLI) to retain the Key Data Statistics in the system tables for 90 days:
plan change-parameters -wait -all-rns -params "rnStatisticsTTL=90 days"
Few points to note are:
- Any changes that you make to the
rnStatisticsTTL
parameter will not be applied to the existing rows in theSYS$TableStatsPartition
andSYS$TableStatsIndex
tables. They will take effect only after the next gathering scan. - If you disable the collection of Key Distribution Statistics, all the rows present in the system tables will expire after the current Time to Live (TTL) period.
- If you drop any tables or indexes in your store, their statistics rows present in the system tables will also expire after the TTL period.
- Even if you change the
rnStatisticsTTL
to a value less thanrnStatisticsGatherInterval
, all the existing statistics rows will only expire as the TTL value defined during the last scan. rnStatisticsTTL
can be set to 0 days. However, this is not recommended as it disables automatic removal of the statistics rows.