D Initial Capacity Planning

To deploy a store, you must specify a replication factor, the desired number of partitions, and the Storage Nodes on which to deploy the store. The following sections describe how to calculate these values based on your application's requirements and the characteristics of the hardware available to host the store.

The resource estimation is a two-step process:

  1. Determine the storage and I/O throughput capacity of a representative shard, given the characteristics of the application, the disk configuration on each machine, and the disk throughput. As part of this step, you should also estimate the amount of physical memory that each machine requires, and its network throughput capacity.
  2. Use the shard-level storage and I/O throughput capacities as a basis for extrapolating from one shard to the required number of shards and machines, given the storewide application requirements.
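Step 2 can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model in which each shard contributes a fixed amount of storage and of read and write throughput. The ShardCapacity and StoreRequirements types, their fields, and the sample numbers are hypothetical and do not reproduce the spreadsheet's formulas. Under this model, the store needs as many shards as its most demanding requirement dictates.

```python
import math
from dataclasses import dataclass


@dataclass
class ShardCapacity:
    """Step 1 output (hypothetical): what one representative shard can sustain."""
    storage_gb: float
    reads_per_sec: float
    writes_per_sec: float


@dataclass
class StoreRequirements:
    """Storewide application requirements (hypothetical units)."""
    storage_gb: float
    reads_per_sec: float
    writes_per_sec: float


def shards_needed(shard: ShardCapacity, store: StoreRequirements) -> int:
    """Step 2: extrapolate from one shard to the whole store.

    Each requirement is divided by the per-shard capacity and rounded up;
    the largest result wins, with a floor of one shard.
    """
    by_storage = math.ceil(store.storage_gb / shard.storage_gb)
    by_reads = math.ceil(store.reads_per_sec / shard.reads_per_sec)
    by_writes = math.ceil(store.writes_per_sec / shard.writes_per_sec)
    return max(by_storage, by_reads, by_writes, 1)


if __name__ == "__main__":
    shard = ShardCapacity(storage_gb=500, reads_per_sec=20_000, writes_per_sec=5_000)
    store = StoreRequirements(storage_gb=3_200, reads_per_sec=90_000, writes_per_sec=12_000)
    print(shards_needed(shard, store))  # 7
```

With these sample numbers, storage (7 shards) rather than read or write throughput (5 and 3 shards) is the limiting factor.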

The Oracle NoSQL Database distribution includes a spreadsheet for you to use in the capacity planning process. The spreadsheet is located at <KVHOME>/doc/misc/InitialCapacityPlanning.xls.

The spreadsheet has two main sections:
  1. Shard Capacity
  2. Store Sizing
Both sections contain required parameters that you must supply, as well as parameters with default values.
The next sections in this appendix correspond to the named sections of the spreadsheet, which is laid out as follows:
  • Column A lists cell names associated with the values in column B.

  • Labels in dark purple bold text mark required values that you must provide as input.

  • Labels in dark blue bold text mark default values that you can optionally change. The supplied default values are adequate for most estimates.

  • Column C has descriptions of the value or computation associated with the value in column B.

  • The first three sections (Application Characteristics, Hardware Characteristics, and Machine Physical Memory) cover Shard Capacity and contain required inputs.

The spreadsheet computes all other cells using the formulas described in the following sections. After you fill in the required inputs:
  • The StoreMachines cell indicates how many Storage Nodes should be available in the Storage Node pool.

  • The StorePartitions cell indicates how many partitions to specify when creating the store.
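How a shard count could translate into values like StoreMachines and StorePartitions can be sketched in Python under an assumed model: every shard has replication_factor replicas, each Storage Node hosts at most sn_capacity of them, and the partition count is a multiple of the largest shard count you expect to reach. The function names, the sn_capacity parameter, and the default partitions_per_shard value of 10 are illustrative assumptions, not the spreadsheet's actual formulas or a documented recommendation.

```python
import math


def store_machines(num_shards: int, replication_factor: int, sn_capacity: int = 1) -> int:
    """Storage Nodes needed if each shard has `replication_factor` replicas
    and each Storage Node hosts at most `sn_capacity` replicas (assumed model)."""
    replication_nodes = num_shards * replication_factor
    return math.ceil(replication_nodes / sn_capacity)


def store_partitions(max_shards: int, partitions_per_shard: int = 10) -> int:
    """Partition count sized for the largest expected shard count.
    `partitions_per_shard` is a hypothetical multiplier, not a documented default."""
    return max_shards * partitions_per_shard


if __name__ == "__main__":
    shards = 7
    print(store_machines(shards, replication_factor=3))                 # 21
    print(store_machines(shards, replication_factor=3, sn_capacity=3))  # 7
    print(store_partitions(max_shards=10))                              # 100
```

Sizing partitions against an expected maximum shard count, rather than the initial one, is a design choice in this sketch that leaves room to add shards later.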

The spreadsheet calculations also account for JVM overhead. Keep in mind that these computations yield only estimates. The underlying model makes certain simplifying assumptions, because it is difficult to build a single model that works well across a wide range of application requirements. Use these estimates only as a starting point, and refine them as necessary under a simulated or actual load.