Estimate total Shards and Machines

Having calculated the per shard capacity in terms of storage and throughput, the total number of shards and partitions can be estimated on the basis of the maximum storage and throughput required by the store as a whole using a simple extrapolation. The following inputs must be supplied for this calculation:

  1. The maximum number of KV pairs that will stored in the initial store. This value is defined by the cell MaxKVPairs. This initial maximum value can be increased subsequently by using the topology transformation commands described in Transforming the Topology Candidate.
  2. The maximum read/write mixed operation throughput expressed as operations/sec for the entire store. The percentage of read operations in this mix must be the same as that supplied earlier in the ReadOpsPercent cell. This value is defined by the cell MaxStorewideOpsPerSec.

The required number of shards is first computed on the basis of storage requirements as below:

MaxKVPairs/MaxKVPairsPerShard

This value is calculated by the cell StorageBasedShards.

The required number of shards is then computed again based upon IO throughput requirements as below:

MaxStorewideOpsPerSec/OpsPerShardPerSec

This value is calculated by the cell named OpsBasedShards.

The maximum of the shards computed on the basis of storage and throughput above is sufficient to satisfy both the total storage and throughput requirements of the application.

The value is calculated by the cell StoreShards. To highlight the basis on which the choice was made, the smaller of the two values in StorageBasedShards or OpsBasedShards has its value crossed out.

Having determined the number of required shards, the number of required machines is calculated as:

MAX(RF, (StoreShards*RF)/DisksPerMachine)

Number of Partitions

Every shard in the store must contain at least one partition, but it is best to configure the store so that each shard always contains more than one partition. The records in the KVStore are spread evenly across the KVStore partitions, and as a consequence they are also spread evenly across shards. The total number of partitions that the store should contain is determined when the store is initially created. This number is static and cannot be changed over the store's lifetime, so it is an important initial configuration parameter.

The number of partitions must be more than the largest number of shards the store will contain. It is possible to add shards to the store, and when you do, the store is re-balanced by moving partitions between shards (and with them, the data that they contain). Therefore, the total number of partitions is actually a permanent limit on the total number of shards your store is able to contain.

Note that there is some overhead in configuring an excessively large number of partitions. That said, it does no harm to select a partition value that provides plenty of room for growing the store. It is not unreasonable to select a partition number that is 10 times the maximum number of shards.

The number of partitions is calculated by the cell StorePartitions.

StoreShards * 10