Endeca Server memory consumption

Endeca Server query performance depends on many characteristics of your specific deployment, such as query workload, query complexity, data domain configuration, the characteristics of the loaded records, and the size of the data domain's index. Because of these characteristics, you must perform hardware sizing before deployment to assess the memory consumption and other hardware needs of your deployment.

One of the inputs to this sizing is the projected memory consumption of the Endeca Server.

The Endeca Server memory consumption is characterized by these statements:

Initial memory consumption by the Dgraph is not an indicative or predictive estimate of the number of data domains that can be provisioned. When the Endeca Server Java application is started, its Dgraph process initially claims a significant amount of RAM on the system for its own use. You can observe this by running operating system diagnostic tools. However, the size of the Dgraph process on a lightly loaded machine does not indicate that the Dgraph process will actually use all of the physical memory it initially claims. The Dgraph process allocates considerable amounts of virtual memory while ingesting data or executing complex queries, but its usage of physical memory depends on the demands of other processes on the system. The Dgraph process releases physical memory when other processes in the operating system require it. When other memory-intensive processes are present, including other Dgraph processes, the Dgraph releases a significant portion of its physical memory quickly. Without such pressure, it may retain the physical memory indefinitely.
Hence, measurements of physical memory usage on a machine with few data domains do not predict the memory requirements for a larger number of data domains. Do not rely on such measurements to predict how much memory the Dgraph actually requires to support your data domains.
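
The difference between the virtual memory a process has claimed and the physical memory it currently holds can be observed on Linux by reading /proc/<pid>/status. The following is a minimal sketch under stated assumptions, not an Endeca Server tool: the function names and the sample values are hypothetical, and the process name "dgraph" in the sample is an assumption. VmSize and VmRSS are the standard Linux field names for virtual size and resident (physical) size.

```python
import re


def parse_memory_fields(status_text):
    """Extract VmSize (virtual) and VmRSS (resident) in kB from the
    text of a Linux /proc/<pid>/status file."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_process_memory(pid):
    """Read the memory fields of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_memory_fields(handle.read())


# Illustrative sample text only; these are not measured Dgraph values.
sample = "Name:\tdgraph\nVmSize:\t 8388608 kB\nVmRSS:\t 1048576 kB\n"
usage = parse_memory_fields(sample)
resident_share = usage["VmRSS"] / usage["VmSize"]
print(f"virtual={usage['VmSize']} kB, resident={usage['VmRSS']} kB, "
      f"share={resident_share:.1%}")
```

A single reading like this reflects only the current pressure from other processes on the machine, which is why it should not be used to size a deployment.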
Endeca Server relies on internal heuristics to estimate whether it can host multiple data domains. When you provision a new data domain, Endeca Server first determines whether it has sufficient capacity to host the data domain and, if so, decides which nodes will host the Dgraph processes for the data domain. To calculate whether it has sufficient capacity to host new or additional data domains, Endeca Server uses the following parameters: oversubscribing, auto-idling, memory calculations, and thread calculations.
Specifically, to allocate data domains, Endeca Server chooses nodes that have sufficient processing cores for the new data domains. It denies the creation of additional data domains when it does not have enough processing cores to host them (this behavior is better guaranteed on Oracle Linux 6 or RHEL 6 with cgroups enabled, if the Endeca Server is configured to use cgroups). To conserve resources, Endeca Server idles inactive data domains that are configured to auto-idle, and automatically reactivates them when an end-user request arrives. To help cluster administrators tune the allocation of data domains to Endeca Server instances, Endeca Server lets you specify the number of threads per core, and the ratio of allocated virtual memory to the index size, for the nodes in the Endeca Server deployment.

For detailed information about how the Endeca Server performs its own internal capacity calculations and which heuristics are being used, see the topic about data domain allocation in the Oracle Endeca Server Cluster Guide.
On Linux 6, cgroups can be used for OS-level data domain allocation guarantees. In addition to the parameters you can specify (such as oversubscribing, auto-idling of data domains, or the number of threads and CPU cores for nodes), if the Endeca Server is deployed on Oracle Linux 6 or RHEL 6, cgroups can be used for data domain allocation, which provides additional guarantees.

Important: If you are planning to deploy a large number of self-service applications in the Endeca Server, to increase the allocation guarantees and ensure that Endeca Server nodes continue to operate even when many data domains are provisioned, you are strongly encouraged to deploy Endeca Servers on Oracle Linux 6 or RHEL 6, which both allow Endeca Server to utilize cgroups. For information on cgroups, see About using control groups (cgroups) for data domains.
Data Enrichment plugins (used via Studio as Enrichments) require additional memory on each machine hosting Endeca Server. If you are planning to use Data Enrichment plugins (such as term extraction) in Studio, consider adding about 10GB of memory for each instance of a Data Enrichment plugin that is expected to run concurrently in the data domain. In other words, if users in the data domain plan to run term extraction, additional memory for each such process should be provisioned on all Endeca Server machines hosting this data domain.
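
The enrichment guideline above reduces to simple arithmetic, sketched here for illustration. The function names are hypothetical, and the figure of 10GB per concurrent instance is the approximate value stated above, not a measured requirement, so treat the result as a rough planning number.

```python
# Approximate guideline from the text above: about 10GB per concurrently
# running Data Enrichment plugin instance.
ENRICHMENT_GB_PER_INSTANCE = 10


def extra_memory_per_machine_gb(concurrent_instances,
                                gb_per_instance=ENRICHMENT_GB_PER_INSTANCE):
    """Additional memory to plan for on EACH machine hosting the data domain."""
    if concurrent_instances < 0:
        raise ValueError("concurrent_instances must not be negative")
    return concurrent_instances * gb_per_instance


def extra_memory_total_gb(concurrent_instances, hosting_machines,
                          gb_per_instance=ENRICHMENT_GB_PER_INSTANCE):
    """Additional memory summed over all machines hosting the data domain."""
    if hosting_machines < 0:
        raise ValueError("hosting_machines must not be negative")
    per_machine = extra_memory_per_machine_gb(concurrent_instances,
                                              gb_per_instance)
    return hosting_machines * per_machine


# Example: 3 concurrent term extraction runs, data domain hosted on 2 machines.
per_machine = extra_memory_per_machine_gb(3)
total = extra_memory_total_gb(3, hosting_machines=2)
print(f"{per_machine} GB per machine, {total} GB across the hosting machines")
```

In this example, each of the two machines would need about 30GB of additional memory, for about 60GB in total.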