Calculating the Number of Hosts and Data Instances for the Grid

A database is distributed across multiple data instances that collectively provide a single database image. Data instances reside on hosts. You create each host and data instance that is to be included in the grid. Thus, you need to calculate how many hosts and data instances to create when you are designing your grid.

The K-Safety value (k), which defines the number of copies of the data, is a factor in how many data instances and hosts you need to create for your grid. If you request a duplicate copy of the data by setting k to 2, then you need twice as many data instances and hosts as when you request a single copy of the data with k set to 1.

Note:

The maximum value that you can assign to k is 5.

Calculate the Number of Data Instances to Create

The number of data instances that you create depends on two factors:

  • The value of k: If you set k to 1, the number of data instances you create equals the number of elements you want for each database. If you set k to 2 or greater, then you need to create k times as many data instances. Each set of data instances manages one copy of the database, and each copy is contained within one of the k data space groups.

  • The number of replica sets across which you want the data distributed: The number of data instances you create is dictated by the number of elements in all replica sets, since each data instance manages one element of each database.

    All elements that make up a single copy of the database are assigned within a data space. If you set k to 3 for three copies of the database, then each replica set contains three elements, where each element is an exact copy of the other elements in the replica set. Each data space contains one of the replica elements of each replica set.

    Note:

    Each data space logically contains a full copy of the data for the database. Since there are k copies of the data, there are k data spaces.

    Data instances are assigned to data spaces based on how hosts are assigned to data space groups.

    To calculate the number of replica sets across which you want the data distributed, determine the maximum of the two values below:

    • Database size versus host memory size. The size of the database and the amount of memory on each host determine the number of replica sets you need. For example, if you have a two terabyte database and hosts with 512 gigabytes of memory each, then you need at least four replica sets to hold all of the data. More likely, you will need five replica sets, since you cannot use all of the memory on each host for the data.

    • Throughput. Even if your database is small enough to fit in the memory of a single host, you need to spread your data over multiple hosts if a single host cannot handle the number of transactions per second that your applications require.
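The two sizing factors above can be sketched as a small calculation. This is only an illustrative sketch, not a product tool: the function and parameter names are hypothetical, the usable share of host memory (80 percent here) is an assumed figure, and the throughput numbers are invented placeholders that you would replace with measurements from your own workload.

```python
def ceil_div(numerator, denominator):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-numerator // denominator)


def replica_sets_needed(db_size_gb, host_memory_gb, usable_percent,
                        required_tps, tps_per_host):
    """Return the larger of the memory-driven and throughput-driven counts.

    usable_percent and tps_per_host are assumptions supplied by the caller;
    the documentation only states that not all host memory is usable.
    """
    # Replica sets needed so that one copy of the data fits in usable memory.
    by_memory = ceil_div(db_size_gb * 100, host_memory_gb * usable_percent)
    # Replica sets needed so that the transaction load is spread far enough.
    by_throughput = ceil_div(required_tps, tps_per_host)
    # At least one replica set is always required.
    return max(by_memory, by_throughput, 1)


# Two terabyte database (2048 GB), 512 GB hosts, all memory usable.
print(replica_sets_needed(2048, 512, 100, 0, 1))          # prints 4
# Same database, assuming only 80 percent of memory is usable.
print(replica_sets_needed(2048, 512, 80, 50000, 20000))   # prints 5
```

With every byte of host memory available, the two terabyte example needs four replica sets; under the assumed 80 percent usable share it needs five, which is larger than the three replica sets that the placeholder throughput figures would require.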

Once you decide on the number of replica sets, you can calculate the number of data instances.

In the equation for the number of data instances required, r represents the number of replica sets (where each replica set contains one or more elements) and k represents the K-Safety value, which denotes the number of copies of the data and, consequently, the number of elements in each replica set. To have enough data instances, you need to create k * r data instances.

number of data instances = k * r

For example, if you set k to 3 for three copies of the database and each copy of the database is to be distributed across two replica sets, then you need to create 6 data instances where each of the three data spaces contains two data instances.
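As a quick check of the arithmetic, the equation can be written as a short function. This is a minimal sketch with hypothetical function names; the upper bound of 5 for k is taken from the note earlier in this section.

```python
MAX_K = 5  # maximum value for k, as stated in the note in this section


def data_instances_needed(k, replica_sets):
    """Number of data instances = k * r."""
    if not 1 <= k <= MAX_K:
        raise ValueError(f"k must be between 1 and {MAX_K}")
    if replica_sets < 1:
        raise ValueError("at least one replica set is required")
    return k * replica_sets


def instances_per_data_space(k, replica_sets):
    """Each of the k data spaces holds one element of every replica set."""
    data_instances_needed(k, replica_sets)  # reuse the validation above
    return {data_space: replica_sets for data_space in range(1, k + 1)}


print(data_instances_needed(3, 2))      # prints 6
print(instances_per_data_space(3, 2))   # prints {1: 2, 2: 2, 3: 2}
```

The output matches the example: three copies of the database across two replica sets require six data instances, with two data instances in each of the three data spaces.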

See K-Safety.

Calculate the Number of Hosts You Need to Support Your Data Instances

The number of physical or virtual systems that you need for a production deployment of your grid depends on how many data instances you install on each host. The following considerations are described in Data Instances.

Each data instance usually resides on a separate host to provide maximum data availability and as a guard against data loss should one of the hosts fail. However, you might want to run multiple data instances on a single host if:

  • The hosts in the grid contain a large amount of computing resources.

  • You want to experiment with a larger grid configuration on a smaller number of hosts before deployment.

Thus, to decide on the number of hosts:

  • If you install a single data instance on each host, then the number of hosts required is the same as the number of data instances in the grid. For example, if you have six data instances, then you need six hosts.

  • If you install more than one data instance on each host, then the number of hosts required depends on how many data instances are on each host. For example, if you have eight data instances and you want to install two data instances on each host, then you only need four hosts.
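The two cases above reduce to one division, rounded up. The following is a minimal sketch with a hypothetical function name. It assumes that every host carries the same number of data instances and it does not check how the resulting hosts divide among data space groups.

```python
def hosts_needed(data_instances, instances_per_host=1):
    """Hosts required when each host runs the same number of data instances."""
    if data_instances < 1 or instances_per_host < 1:
        raise ValueError("both arguments must be at least 1")
    # Round up so that a partially filled host still counts as a host.
    return -(-data_instances // instances_per_host)


print(hosts_needed(6))      # prints 6: one data instance on each host
print(hosts_needed(8, 2))   # prints 4: two data instances on each host
```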

Once you create the hosts for data instances, you assign them to a data space group. See Assigning Hosts to Data Space Groups.