Viewing Key Distribution Statistics

As you might already know, Oracle NoSQL Database stores the data by distributing the rows across all the partitions by hashing each row’s shard key. Based on the activity in your store's tables, Oracle NoSQL Database collects the key distribution data into internally managed system tables. As needed, you can access these statistics by querying these system tables.

As an Oracle NoSQL Database administrator, you may encounter many situations where you need to view the key distribution statistics. To discuss one such use-case, consider a situation where you are not able achieve the expected amount of throughput for your Oracle NoSQL Database in spite of having multiple shards in your cluster. This might happen if the data in your store is not distributed across the shards evenly. In order to confirm if this is the reason behind low throughput, you need a mechanism to understand how the data is distributed across the Oracle NoSQL Database cluster. The Key Distribution Statistics provided by the Oracle NoSQL Database can help you understand the data distribution across multiple partitions and shards in your store.

The two system tables into which the Oracle NoSQL Database collects the key distribution statistics are:

  • SYS$TableStatsPartition

  • SYS$TableStatsIndex

Oracle NoSQL Database manages and maintains these system tables internally. When you enable security on your store, these system tables are read-only. Regardless of security, the schema for system tables is immutable. The name of system tables is defined with the prefix SYS$. You are not allowed to create any other table name using this reserved prefix.

SYS$TableStatsPartition

This table stores the table key statistics at the partition level. It contains a row for each partition for every table. For example, if you created a store with 100 partitions, this table contains 100 rows for every table in your store. The statistics stored per partition for each table in your store are:

  1. The number of rows stored
  2. The average size of keys in bytes
  3. The size in bytes consumed by the rows

The structure of the SYS$TableStatsPartition table is as below:

Column Data Type Description
tableName string Name of the table whose Key Distribution Statistics are being stored.
partitionId integer Partition ID
shardId integer Shard ID
count long Number of rows stored.
avgKeySize integer The average size of keys in bytes.
tableSize long The size in bytes consumed by the rows.
SYS$TableStatsIndex

This table stores the index key statistics at the shard level. This table contains a row for each shard for every index. You do not have direct control over the number of shards created in your store, but you can always view the store topology to know how many shards are created in your store. For more information, see show topology.

The statistics stored per shard for each table in your store are:

  1. The number of index rows
  2. The average size of the index keys in bytes
  3. The size in bytes consumed by the index rows

The structure of SYS$TableStatsIndex system table is as below:

Column Data Type Description
tableName string Name of the table whose Key Distribution Statistics are being stored.
indexName string Name of the index
shardId integer Shard ID
count long Number of index rows stored.
avgKeySize integer Average size of index keys in bytes.
indexSize long The size in bytes consumed by the index rows.
Gathering the Key Distribution Statistics

The gathering of the key distribution statistics into the system tables is determined by two parameters:

  • rnStatisticsEnabled:

    In Oracle NoSQL Database, the Key Distribution Statistics are enabled by default for all newly created stores. You can disable the capturing of these statistics by executing the following command from Admin Command Line Interface (CLI):

    plan change-parameters -wait -all-rns -params "rnStatisticsEnabled=false"
  • rnStatisticsGatherInterval:

    In Oracle NoSQL Database, the default time interval between two consecutive updates on SYS$TableStatsPartition and SYS$TableStatsIndex is 24 hours. You can change the time interval between the capture of these statistics by modifying the rnStatisticsGatherInterval parameter. The time unit specified must be in days, hours, or minutes.

    For example, to instruct Oracle NoSQL Database to collect the Key Distribution Statistics after every minute, execute the following command from Admin Command Line Interface (CLI):
    plan change-parameters -wait -all-rns -params "rnStatisticsGatherInterval=1 min"

Note:

Enabling the Key Distribution Statistics does not immediately trigger the collection of statistics. Oracle NoSQL Database initiates the statistics collection at a time based on the collection interval defined by the rnStatisticsGatherInterval parameter.
For example, if time interval is defined as 30 minutes, the statistics collection will be triggered at the half-an-hour marks such as 12:00, 12:30, 1:00, and so on. However, if the default value of rnStatisticsGatherInterval is not modified, the Key Distribution Statistics will be collected around midnight.
Reading the Key Distribution Statistics

You can query the system tables to get key distribution data or review the gathering process.

In order to get a complete set of statistics for a given table, you must aggregate the per-partition values stored for that table in the SYS$TableStatsPartition system table.

For example, to get the total number of rows in a table named myTable, you must sum the values in the count column for all the rows in the SYS$TableStatsPartition table where tableName = myTable.

Example Query:

sql-> select * from SYS$TableStatsPartition where tableName = 'myTable';

Result:

{"tableName":"myTable","partitionId":8,"shardId":3,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":9,"shardId":4,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":1,"shardId":1,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":4,"shardId":2,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":7,"shardId":3,"count":50,"avgKeySize":15,"tableSize":103}
{"tableName":"myTable","partitionId":10,"shardId":4,"count":50,"avgKeySize":15,"tableSize":103}
{"tableName":"myTable","partitionId":5,"shardId":2,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":6,"shardId":2,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":2,"shardId":1,"count":0,"avgKeySize":0,"tableSize":0}
{"tableName":"myTable","partitionId":3,"shardId":1,"count":0,"avgKeySize":0,"tableSize":0}

In the above result, observe that there are 50 keys each in "partitionId":7,"shardId":3 and "partitionId":10,"shardId":4 whereas all the other partitions and shards are empty. This shows that the key data is not distributed evenly across all the partitions and shards.

Similarly, you can query the SYS$TableStatsIndex system table to read the index key distribution statistics for a given table at the shard level.

For example, to get the total number of index rows in a table named myTable, you must sum the values in the count column for all the index rows in the SYS$TableStatsIndex table where tableName = myTable.

Example Query:

sql-> select * from SYS$TableStatsIndex where tableName = 'myTable';

Result:

{"tableName":"myTable","indexName":"idx_shard_key","shardId":3,"count":50,"avgKeySize":1,"indexSize":75}
{"tableName":"myTable","indexName":"idx_shard_key","shardId":4,"count":50,"avgKeySize":1,"indexSize":75}
{"tableName":"myTable","indexName":"idx_shard_key","shardId":1,"count":0,"avgKeySize":0,"indexSize":0}
{"tableName":"myTable","indexName":"idx_shard_key","shardId":2,"count":0,"avgKeySize":0,"indexSize":0}

As you can see from the above result, there are 50 index keys each in "shardId":3 and "shardId":4 whereas all the other shards are empty. This shows that the index key data is not distributed evenly across all the shards.

Retention of the Key Distribution Statistics

After collecting the key distribution statistics, they are retained in the system tables for a fixed time period. This value is determined by the rnStatisticsTTL parameter. By default, these statistics are retained for 60 days. However, you can change this value by executing the change-parameters plan from the Admin CLI. The time unit specified must be in days or hours.

For example, execute the following command from Admin Command Line Interface (CLI) to retain the Key Data Statistics in the system tables for 90 days:

plan change-parameters -wait -all-rns -params "rnStatisticsTTL=90 days"

Few points to note are:

  • Any changes that you make to the rnStatisticsTTL parameter will not be applied to the existing rows in the SYS$TableStatsPartition and SYS$TableStatsIndex tables. They will take effect only after the next gathering scan.
  • If you disable the collection of Key Distribution Statistics, all the rows present in the system tables will expire after the current Time to Live (TTL) period.
  • If you drop any tables or indexes in your store, their statistics rows present in the system tables will also expire after the TTL period.
  • Even if you change the rnStatisticsTTL to a value less than rnStatisticsGatherInterval, all the existing statistics rows will only expire as the TTL value defined during the last scan.
  • rnStatisticsTTL can be set to 0 days. However, this is not recommended as it disables automatic removal of the statistics rows.