Determine the Store's Configuration

Now that you have some idea of your store's storage and performance requirements, you can decide how to configure the store. To do this, you must decide on the target number of shards, the number of partitions, your Replication Factor, and the total number of Storage Nodes.

The following sections cover these topics in greater detail.

Identify the Target Number of Shards

The KVStore contains one or more shards. Each shard contains a single node that is responsible for servicing write requests, plus one or more nodes that are responsible for servicing read requests.

The more shards your store contains, the better your store is at servicing write requests. Therefore, if your Oracle NoSQL Database application requires high throughput on data writes (that is, record creations, updates, and deletions) then you want to configure your store with more shards.

Shards contain one or more partitions (described in the next section), and key-value pairs are spread evenly across those partitions. This means that the more shards your store contains, the less data each shard must hold, and so the less disk space your store requires on a per-node basis.

For example, suppose you know your store contains roughly n records, each of which represents a total of m bytes of data, for a total of n * m bytes of data to be managed by your store. If you have three shards, then each Storage Node must have enough disk space to contain (n * m) / 3 bytes of data.
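
For example, the following Python sketch works through this arithmetic with hypothetical figures; the record count, record size, and shard count here are assumptions, not recommendations:

n_records = 10**9            # n: roughly 1 billion records (assumption)
bytes_per_record = 1030      # m: average bytes per record, key plus value (assumption)
n_shards = 3

# Each shard holds an equal share of the data, so each Storage Node hosting
# that shard needs roughly this much disk space for the data alone:
bytes_per_node = (n_records * bytes_per_record) / n_shards
print(bytes_per_node)        # about 3.4 * 10^11 bytes, or roughly 343 GB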

It might help you to use the following formula to arrive at a rough initial estimate of the number of shards (replication groups, or RGs) that you need:

RG = (((((avg key size * 2) + avg value size) * max kv pairs) * 2) + 
      (avg key size * max kv pairs) / 100 ) / 
      (node storage capacity)

Note that the final factor of two in the first line of the equation is based upon a KVStore tuning control called the cleaner utilization. Here, we assume you leave the cleaner utilization at 50%.

As an example, a store sized to hold a maximum of 1 billion key-value pairs, with an average key size of 10 bytes, an average value size of 1 KB, and 1 TB (10^12 bytes) of storage available at each node, would require roughly two shards:

(((((10 * 2) + 1000) * 10^9) * 2) + ((10 * 10^9) / 100)) / 10^12 = 2.04, or roughly 2 RGs
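
If you prefer to script the estimate, the following Python sketch implements the same formula with the values from this example. The variable names are illustrative only and are not part of any Oracle NoSQL Database API:

avg_key_size = 10                 # bytes
avg_value_size = 1000             # bytes (roughly 1 KB)
max_kv_pairs = 10**9              # 1 billion key-value pairs
node_storage_capacity = 10**12    # 1 TB of disk available at each node

# First term: key and value bytes, doubled to allow for 50% cleaner utilization.
data_bytes = ((avg_key_size * 2) + avg_value_size) * max_kv_pairs * 2
# Second term: the per-key overhead term from the formula above.
overhead_bytes = (avg_key_size * max_kv_pairs) / 100

shards = (data_bytes + overhead_bytes) / node_storage_capacity
print(shards)                     # about 2.04, so plan on roughly two shards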

Remember that this formula only provides a rough estimate. Other factors such as I/O throughput and cache sizes need to be considered in order to arrive at a better approximation. Whatever number you arrive at here, you should thoroughly test it in a pre-production environment, and then make any necessary adjustments. (This is true of any estimate you make when planning your Oracle NoSQL Database installation.)

Identify the Number of Partitions

Every shard in your store must contain at least one partition, but you should configure your store so that each shard contains many partitions. The records in the KVStore are spread evenly across the KVStore partitions, and as a consequence they are also spread evenly across your shards. You identify the total number of partitions that your store should contain when you initially create your store. This number is static and cannot be changed over your store's lifetime.

Make sure the number of partitions you select is larger than the largest number of shards you ever expect your store to contain. It is possible to add shards to the store, and when you do, the store is rebalanced by moving partitions between shards (and with them, the data that they contain). Because every shard must contain at least one partition, the total number of partitions that you select is a permanent limit on the total number of shards your store can ever contain.

Note that there is some overhead in configuring an excessively large number of partitions. That said, it does no harm to select a partition value that gives you plenty of room for growing your store. It is not unreasonable to select a partition number that is 100 times the maximum number of shards that you ever expect to use with your store.
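
As a simple illustration of that guideline, the following Python sketch picks a partition count from a hypothetical planning figure (the expected maximum shard count is an assumption, not a recommendation):

max_expected_shards = 10                    # the largest shard count you ever expect (assumption)
n_partitions = 100 * max_expected_shards    # generous headroom, per the guideline above
print(n_partitions)                         # 1000 partitions, fixed for the store's lifetime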

Identify your Replication Factor

The KVStore contains one or more shards. Each shard contains a single node that is responsible for servicing write requests (the master), plus one or more nodes that are responsible for servicing read requests (the replicas).

The store's Replication Factor simply describes how many nodes (master + replicas) each shard contains. A Replication Factor of 3 gives you shards with one master plus two replicas. (Of course, if you lose or shut down a node that is hosting a master, then the master fails over to one of the other nodes in the shard, giving you a shard with one master and one replica. But this should be an unusual, and temporary, condition for your shards.)

The bigger your Replication Factor, the more responsive your store can be at servicing read requests because there are more nodes per shard available to service those requests. However, a larger Replication Factor reduces the number of shards your store can have, assuming a static number of Storage Nodes.

A large Replication Factor can also slow down your store's write performance, because each shard has more nodes to which updates must be transferred.

In general, we recommend a Replication Factor of 3, unless your performance testing suggests that some other number works better for your particular workload. Do not select a Replication Factor of 2: with only two nodes in a shard, a single failure leaves too few nodes to elect a new master.

Identify the Total Number of Nodes

You can estimate the total number of Storage Nodes needed for your store by multiplying the number of shards you require by your Replication Factor. This number should suffice unless you discover that your disks cannot deliver enough IOPS to meet your throughput requirements. In that case, you might need to increase your Replication Factor, or increase your total number of shards.
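
For example, continuing with hypothetical figures (two shards from the earlier estimate and the recommended Replication Factor of 3), a quick Python sketch of this arithmetic:

n_shards = 2                 # from the shard estimate above (assumption)
replication_factor = 3       # one master plus two replicas per shard

total_storage_nodes = n_shards * replication_factor
print(total_storage_nodes)   # 6 Storage Nodes, before any adjustment for IOPS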

If you underestimate the number of Storage Nodes, remember that it is possible to dynamically increase the number of Storage Nodes in use by the store. To use the command line interface to expand your store, see Transform the Topology Candidate.

Whatever estimates you arrive at, make sure to thoroughly test your configuration before deploying your store into a production environment.