Allocation of data domains

When you provision a new data domain, Endeca Server first determines whether it has sufficient capacity to host the data domain and, if so, decides which nodes will host the Dgraph processes for the data domain. This topic summarizes the configuration parameters that affect these calculations so that you, as a system administrator of the Endeca Server cluster, can take them into account when configuring or administering the cluster.

Endeca Server capacity calculations

The Endeca Server uses the following parameters to calculate whether it has sufficient capacity to host (new or any additional) data domains:
  • Requirements configured in the data domain profile:
    • Oversubscribing. You can allow Endeca Server to oversubscribe resources for a particular data domain while hosting other data domains. For detailed information on how this feature affects data domain allocation, see How oversubscribing affects hardware utilization.
    • Auto-idling. You can allow Endeca Server to automatically idle a currently active data domain if it does not receive queries for the idling time period. If a data domain is set to automatically idle, this frees up resources on the Endeca Server nodes and affects allocation of other data domains to Endeca Server nodes. For detailed information on how this feature affects data domain allocation, see How idling affects data domain behavior.
  • The available total capacity of the Endeca Server cluster at the time when a new data domain is being created and needs to be allocated. The term "total capacity" covers two aspects: the available total number of threads on each node, and the available amount of memory on each node that can be allocated to a particular data domain.
    The available total capacity is calculated for these two aspects, memory and the number of threads, as follows:
    • Memory calculations. To determine whether the Endeca Server has a sufficient amount of memory to host a data domain, the logic is as follows. When it starts, the Endeca Server collects information about the total available RAM size and swap memory size. When a new data domain creation is requested, the Endeca Server uses this information to decide which nodes should be assigned to the new data domain. The Endeca Server selects only those nodes whose total virtual memory size is larger than the total memory footprint of all data domains already hosted on that node plus the amount of memory needed for the new data domain. The memory footprint is estimated on each Endeca Server node based on this formula:
      endeca-memory-to-index-size-ratio x indexes-size + computeCacheSizeMB
      where:
      • endeca-memory-to-index-size-ratio is the ratio of all virtual memory allocated for a data domain to the index size. The cluster administrator specifies this setting for each machine in the Endeca Server cluster, in the EndecaServer.properties file. The default ratio is 2.0; it is used if no other value is specified. For example, if the index size is 40MB, and the ratio is 2.0, the Endeca Server attempts to allocate 80MB of virtual memory to the data domain.
      • indexes-size is the total calculated size of all indexes for all data domains currently hosted in the Endeca Server. This setting is calculated internally by the Endeca Server and is not reported to the administrator.
      • computeCacheSizeMB is the amount of RAM, in MB, to allocate to the result cache for each Dgraph process of the data domain. The Endeca Server cluster administrator specifies this setting in the data domain profile, when creating it, by using the PutDataDomainProfile operation of the Cluster Web Service, or endeca-cmd put-dd-profile. If you do not specify it, the default of 0 is used, which is interpreted as follows: the Dgraph cache size is computed as 10% of the amount of RAM available on the Endeca Server node hosting the Dgraph node.
    • Threads calculations. To determine whether the Endeca Server has a sufficient number of processing threads to host a new (or additional) data domain, the logic is as follows. When it starts, the Endeca Server collects information about the total number of threads used by its hosted data domains. When a new data domain creation is requested, the Endeca Server creates it only if the total number of threads it has available is sufficient to accommodate both the new data domain and all already hosted data domains. The total number of threads is estimated on each Endeca Server node based on this formula:
      numCpuCores x endeca-threads-allowed-per-core
      where:
      • numCpuCores is the setting specified in the Endeca Server node profile. (The node profile, once defined, is used on all nodes in the Endeca Server cluster.) This is the number of CPU cores allocated to each Endeca Server instance. When the Endeca Server application is deployed in WebLogic Server, the number of CPU cores is determined automatically at startup. The default is the larger of 2 and the number of available CPU cores on the node. For example, if the number of available CPU cores on the machine is 12, it is used by default as the numCpuCores in the node profile.
      • endeca-threads-allowed-per-core is the setting specified in the EndecaServer.properties file. It is the number of threads allowed per core. The default is 1.0 and it is used if no other value is specified.

      If the number obtained from this formula is larger than the total number of threads used across all the Dgraph processes on all Endeca Server nodes, then the Endeca Server attempts to allocate threads to the new data domain.

      Note: To calculate the total number of threads used, the Endeca Server uses numComputeThreads from the data domain profiles. This setting specifies the number of threads to allocate for processing requests on each Dgraph node serving a data domain that uses this profile. The number of threads must be equal to or greater than 4. The default is 4.

      Here is another way to interpret this formula: if creating a new data domain would require more threads than an Endeca Server node has available to host both the existing data domains and the new one, that node cannot be allocated to the data domain, and the Endeca Server looks for other nodes in the cluster to allocate the data domain. If no nodes are found with enough threads to host the Dgraph processes for the new data domain, the data domain is not created.

    Note: The data domain allocation strategy ignores disabled and idle data domains when calculating threads and memory footprint. Enabling a disabled data domain and waking up an idle data domain succeed only if these operations meet all resource constraints.
  • Additionally, if you have deployed Endeca Server on Linux 6 and enabled the configuration and use of cgroups in the Endeca Server, cgroups are used to guarantee that Endeca Server will not consume all of the machine's resources for its data domain Dgraph processes and will always remain reachable by the system administrator on that machine. If cgroups are enabled, Endeca Server uses the cgroup limit on total virtual memory, instead of the total virtual memory of the machine, to allocate data domains. For information on how cgroups are used and enabled, see the Oracle Endeca Server Administrator's Guide.
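As a rough illustration, the memory check described above can be sketched in Python. The function names and the eligibility check are hypothetical, not part of any Endeca Server API; only the formula (endeca-memory-to-index-size-ratio x indexes-size + computeCacheSizeMB, with a computeCacheSizeMB of 0 falling back to 10% of node RAM) comes from the text:

```python
DEFAULT_MEMORY_TO_INDEX_RATIO = 2.0  # default endeca-memory-to-index-size-ratio

def estimate_footprint_mb(index_size_mb, compute_cache_size_mb, node_ram_mb,
                          ratio=DEFAULT_MEMORY_TO_INDEX_RATIO):
    """Estimate a data domain's memory footprint on a node (illustrative).

    A computeCacheSizeMB of 0 means "use the default": the Dgraph cache
    size is taken as 10% of the RAM available on the hosting node.
    """
    if compute_cache_size_mb == 0:
        compute_cache_size_mb = 0.10 * node_ram_mb
    return ratio * index_size_mb + compute_cache_size_mb

def node_has_memory_for(new_footprint_mb, existing_footprints_mb, node_virtual_mb):
    """A node qualifies only if its total virtual memory exceeds the
    footprints of all hosted data domains plus the new domain's footprint."""
    return node_virtual_mb > sum(existing_footprints_mb) + new_footprint_mb

# Example from the text: a 40 MB index at the default 2.0 ratio needs
# 80 MB of virtual memory, plus the result cache on top of that.
footprint = estimate_footprint_mb(index_size_mb=40, compute_cache_size_mb=100,
                                  node_ram_mb=8192)   # 180.0 MB
```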
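The threads check can be sketched the same way. Again, the function names are hypothetical; the capacity formula (numCpuCores x endeca-threads-allowed-per-core) and the minimum of 4 for numComputeThreads come from the text, while the exact comparison is an assumption based on the description above:

```python
DEFAULT_THREADS_PER_CORE = 1.0  # default endeca-threads-allowed-per-core
MIN_COMPUTE_THREADS = 4         # numComputeThreads must be at least 4

def node_thread_capacity(num_cpu_cores, threads_per_core=DEFAULT_THREADS_PER_CORE):
    """numCpuCores x endeca-threads-allowed-per-core."""
    return num_cpu_cores * threads_per_core

def can_host_threads(hosted_compute_threads, new_compute_threads,
                     num_cpu_cores, threads_per_core=DEFAULT_THREADS_PER_CORE):
    """The node qualifies only if its thread capacity accommodates both the
    hosted data domains and the new data domain's numComputeThreads."""
    if new_compute_threads < MIN_COMPUTE_THREADS:
        raise ValueError("numComputeThreads must be at least 4")
    used = sum(hosted_compute_threads)
    return node_thread_capacity(num_cpu_cores, threads_per_core) >= used + new_compute_threads

# A 12-core node at the default 1.0 threads per core offers 12 threads;
# with two hosted data domains using 4 threads each, a new 4-thread data
# domain fits, but an 8-thread one does not.
fits = can_host_threads([4, 4], 4, num_cpu_cores=12)   # True
```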

Endeca Server node allocation

Once the Endeca Server determines that it has sufficient capacity to host a data domain, it decides which Endeca Server nodes will host the particular Dgraph processes for the data domain.

Note: While this information is useful and visible to the Endeca Server cluster administrator, the allocation of a data domain's nodes to Endeca Server nodes is not exposed to the data domain administrator. From the data domain administrator's point of view, the Endeca Server hosts the data domain on its available nodes (making an effort to distribute resource use as efficiently as possible), and load balancing and routing of requests occur automatically, directing end-user queries to the appropriate Endeca Server nodes for each hosted data domain.
The Endeca Server uses the following principles for node allocation:
  • The Dgraph nodes for a given data domain cluster are automatically hosted on different Endeca Server nodes. In other words, the Endeca Server software does not support hosting multiple Dgraph nodes for the same data domain on the same Endeca Server node.

    For example, in a three-node Endeca Server cluster, you cannot create a data domain with five Dgraph nodes. For this data domain to be created, the number of Endeca Server nodes would need to also increase to five. The Manage Web Service of the Endeca Server returns a fault to any request that attempts to create or rescale a data domain beyond the number of Endeca Server nodes in the cluster.

  • If more than one data domain cluster is hosted in the Endeca Server, then whenever possible, the leader nodes for each of the data domain clusters are hosted on different Endeca Server nodes.
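The first principle above can be sketched as a simple constraint check. The functions below are illustrative only; the underlying rule (one Dgraph node per Endeca Server node, and a fault for requests that exceed the cluster size) is what the text describes:

```python
def can_allocate(num_dgraph_nodes, num_server_nodes):
    """A data domain fits only if the cluster has at least as many Endeca
    Server nodes as the data domain has Dgraph nodes."""
    return num_dgraph_nodes <= num_server_nodes

def assign_nodes(dgraph_names, server_nodes):
    """Place each Dgraph node on a distinct Endeca Server node, or raise
    where the Manage Web Service would return a fault (illustrative)."""
    if not can_allocate(len(dgraph_names), len(server_nodes)):
        raise ValueError("more Dgraph nodes than Endeca Server nodes")
    return dict(zip(dgraph_names, server_nodes))

# Example from the text: a data domain with five Dgraph nodes cannot be
# created in a three-node Endeca Server cluster.
ok = can_allocate(5, 3)   # False
```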