Chapter 2. Planning Your Installation

Table of Contents

Identify Store Size and Throughput Requirements
Estimating the Record Size
Estimating the Workload
Estimate the Store's Permissible Average Latency
Determine the Store's Configuration
Identify the Target Number of Shards
Identify the Number of Partitions
Identify your Replication Factor
Identify the Total Number of Nodes
Determining the Per-Node Cache Size
Sizing Advice
Arriving at Sizing Numbers

Successfully deploying a KVStore requires analyzing the workload you will place on the store and determining how many hardware resources are needed to support that workload. Once you have performed this analysis, you can determine how to deploy the KVStore across those resources.

The overall process for planning the installation of your store involves these steps:

  1. Identify your store's size and throughput requirements. This means estimating the size of your records, the workload, and the store's permissible average latency.

  2. Determine the store's configuration. This means identifying the target number of shards, the number of partitions, your replication factor, the total number of nodes, and the per-node cache size.

Once you have performed each of the above steps, you should test your installation under a simulated load, refining the configuration as necessary, before placing your store into a production environment.

The following sections more fully describe these steps.

Identify Store Size and Throughput Requirements

Before you can plan your store's installation, you must have some understanding of the store's contents, as well as the performance characteristics that your application requires from the store. In particular, you must estimate:

  • The number and size of the keys and data items that are placed in the store.

  • The approximate maximum number of put and get operations performed per unit of time.

  • The maximum permissible latency for each store operation.

These topics are discussed in the following sections.

Estimating the Record Size

Your KVStore contains some number of key-value pairs. The number and size of the key-value pairs in your store determine how much disk storage your store requires. They also determine how large an in-memory cache is required on each physical machine that supports the store.

The key portion of each key-value pair comprises some combination of major and minor key components. Taken together, these look something like a path to a file in a file system. Like any file system path, keys can be very short or very long. Records that use many long key components obviously require more storage resources than records with a few short key components.
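
For example, here is a minimal sketch of how major and minor key components combine into a path-like string, using the Java driver's oracle.kv.Key class. The component names are hypothetical placeholders:

    import java.util.Arrays;

    import oracle.kv.Key;

    public class KeySizeExample {
        public static void main(String[] args) {
            // Two major components and one minor component.
            // The component names are illustrative placeholders.
            Key key = Key.createKey(
                Arrays.asList("users", "user1234"),  // major path
                Arrays.asList("email"));             // minor path

            // The canonical string form resembles a file system path
            // and gives a feel for how much storage the key portion
            // of each record consumes: /users/user1234/-/email
            String path = key.toString();
            System.out.println(path + " (" + path.length() + " characters)");
        }
    }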

Similarly, the amount of data associated with each key (that is, the value portion of each key-value pair) also affects how much storage capacity your store requires.

Finally, the number of records to be placed in your store also drives your storage capacity requirements.

Ultimately, prior to an actual production deployment, there is only one way to estimate your store's storage requirements: ask the people who are designing and building the application that the store is meant to support. Schema design is an important part of designing an Oracle NoSQL Database application, so your engineering team should be able to describe the size of the keys and the size of the data items in use by the store. They should also have an idea of how many key-value pairs the store will contain, and they should be able to advise you on how much disk storage you need for each node, based on how they designed their keys and values and on how many partitions you want to use.
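
Once your engineering team provides those figures, a back-of-envelope calculation gives a first approximation of raw storage needs. The numbers below are hypothetical placeholders, and the calculation ignores per-record storage overhead, so treat the result as a lower bound:

    public class StorageEstimate {
        public static void main(String[] args) {
            // Hypothetical figures; substitute your own estimates.
            long recordCount       = 500_000_000L; // key-value pairs
            long avgKeyBytes       = 50;           // average key size
            long avgValueBytes     = 1_024;        // average value size
            int  replicationFactor = 3;            // copies of each record

            long rawBytes   = recordCount * (avgKeyBytes + avgValueBytes);
            long totalBytes = rawBytes * replicationFactor;

            // Per-record overhead (log metadata, indexes) is ignored
            // here, so actual usage will be somewhat higher.
            System.out.printf("Raw data:      %,d GB%n", rawBytes   >> 30);
            System.out.printf("With replicas: %,d GB%n", totalBytes >> 30);
        }
    }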

Estimating the Workload

To determine how to deploy your store, you must estimate how many operations per second the store is expected to support. Estimate:

  • How many read operations your store must handle per second.

  • How many updates per second your store must support. This estimate must include all possible variants of put operations to existing keys.

  • How many record creations per second your store must support. This estimate must include all possible variants of put operations on new keys.

  • How many record deletions per second your store must support. This estimate must include all possible variants of delete operations.

If your application uses the multi-key operations (KVStore.execute(), multiGet(), or multiDelete()), then estimate the number of key-value pairs actually involved in each such multi-key operation to arrive at the necessary throughput numbers.
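
For example, a single multiGet() call that retrieves a parent key and all of its descendants touches many records, and each of them counts toward your read throughput. The sketch below (the helper name is hypothetical) illustrates the counting:

    import java.util.SortedMap;

    import oracle.kv.Depth;
    import oracle.kv.KVStore;
    import oracle.kv.Key;
    import oracle.kv.ValueVersion;

    public class MultiGetThroughput {
        // A single multiGet() that returns N records should be counted
        // as N read operations in your throughput estimate, not as one.
        static int countReads(KVStore store, Key parentKey) {
            SortedMap<Key, ValueVersion> results =
                store.multiGet(parentKey,
                               null, // no KeyRange restriction
                               Depth.PARENT_AND_DESCENDANTS);
            return results.size();
        }
    }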

Ultimately, the throughput requirements you identify must be well matched to the I/O capacity of the disk storage system used by your nodes, as well as to the amount of memory available at each node.

You may need to consult with your engineering team, or with the business plan driving the development and deployment of your Oracle NoSQL Database application, in order to obtain these estimates.

Estimate the Store's Permissible Average Latency

Latency is a measure of the time it takes your store to perform a given operation. You need to determine the permissible average latency for all possible store operations: reads, creates, updates, and deletes. The average latency for each of these is determined primarily by:

  • How long it takes your disk I/O system to perform reads and writes.

  • How much memory is available to the node (the more memory you have, the more data you can cache in memory, thereby avoiding expensive disk I/O).

  • Your application's data access patterns (the more your store's operations cluster on a subset of records, the more operations the store can service from the in-memory cache).

Note that if your read latency requirements are less than 10 ms, then the typical hard disk available on the market today is not sufficient on its own. To achieve latencies of less than 10 ms, you must make sure there is enough physical memory on each node so that an appropriate fraction of your read requests can be serviced from the in-memory cache. How much physical memory your nodes require is affected in part by how well your read requests cluster on records: the more your read requests tend to access the same records, the smaller your cache needs to be.
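
As a rough illustration, you can solve for the cache hit ratio required to meet a given latency target. All numbers below are hypothetical; substitute your hardware's measured figures:

    public class CacheHitTarget {
        public static void main(String[] args) {
            // Hypothetical figures; use your measured numbers.
            double diskReadMs  = 10.0; // average random disk read
            double cacheReadMs = 0.1;  // read served from the cache
            double targetMs    = 5.0;  // required average read latency

            // averageLatency = h * cacheReadMs + (1 - h) * diskReadMs
            // Solving for the required cache hit ratio h:
            double h = (diskReadMs - targetMs) / (diskReadMs - cacheReadMs);
            System.out.printf("Required cache hit ratio: %.0f%%%n", h * 100);
        }
    }

With these example figures, roughly half of all reads would need to be served from the cache to hold the average at 5 ms.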

Also, version-based write operations may require disk access to read the version number. The KVStore caches version numbers whenever possible to minimize this source of disk reads. Nevertheless, if your version-based write operations do not cluster well, then you may require a larger in-memory cache in order to achieve your latency requirements.
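
A minimal sketch of such a version-based write, using the Java driver's putIfVersion() (the helper name is hypothetical): the get() supplies the version that the subsequent write must match, and looking up that version can cost a disk access when the record is not cached.

    import oracle.kv.KVStore;
    import oracle.kv.Key;
    import oracle.kv.Value;
    import oracle.kv.ValueVersion;
    import oracle.kv.Version;

    public class VersionedUpdate {
        // Update the record only if it has not changed since we read it.
        static boolean updateIfUnchanged(KVStore store, Key key,
                                         Value newValue) {
            ValueVersion current = store.get(key);
            if (current == null) {
                return false; // no such record
            }
            // Returns null if another writer changed the record after
            // our get(); the version comparison may require a disk read
            // when the version is not in the in-memory cache.
            Version result =
                store.putIfVersion(key, newValue, current.getVersion());
            return result != null;
        }
    }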