Endeca Server query performance depends on many characteristics of your
specific deployment, such as query workload, query complexity, data domain
configuration, the characteristics of the loaded records, and the size of the
data domain's index. Because of this, perform hardware sizing before
deployment to assess the memory consumption and other hardware needs of your
deployment.
One factor that affects sizing is the projected memory consumption of the
Endeca Server, which is characterized by the following statements:
- Initial memory
consumption by the Dgraph is not an indicative or predictive estimate of the
number of data domains that can be provisioned. When the Endeca Server Java
application is started, its Dgraph process initially claims a significant
amount of RAM on the system for its use. This is observable if you run
operating system diagnostic tools. However, measuring the Dgraph process size
on a lightly-loaded machine is not an indication that the Dgraph process will
actually utilize all of the physical memory it initially claims. The Dgraph
process allocates considerable amounts of virtual memory while ingesting data
or executing complex queries; however, its use of physical memory depends on
the demands of other processes on the system. The Dgraph process releases
physical memory when other processes in the operating system require it. When
other memory-intensive processes are present, including other Dgraph processes,
the Dgraph releases a significant portion of its physical memory quickly.
Without such pressure, it may retain the physical memory indefinitely.
Hence, measurements of physical memory usage on a machine with few
data domains are not a predictive estimate of the memory requirements for a
larger number of data domains. In short, do not rely on these measurements to
predict how much memory the Dgraph actually requires to support your data
domains.
- Endeca Server relies on
internal heuristics to estimate whether it can host multiple data domains.
When you provision a new data domain, Endeca Server first determines whether it
has sufficient capacity to host the data domain and, if so, decides which
nodes will host the Dgraph processes for the data domain. To calculate whether
it has sufficient capacity to host new or additional data domains, Endeca
Server uses the following parameters: oversubscribing, auto-idling, memory
calculations, and thread calculations.
Specifically, to allocate data domains, Endeca Server chooses nodes
that have sufficient processing cores for the new data domains. It denies the
creation of additional data domains when it does not have enough processing
cores to host them (this behavior is more strongly guaranteed on Oracle Linux 6
or RHEL 6 with cgroups enabled, if the Endeca Server is configured to utilize
cgroups). To conserve resources, Endeca Server idles inactive data domains that
are configured to auto-idle, and automatically reactivates them when an
end-user request arrives. To help cluster administrators tune the allocation of
data domains to Endeca Server instances, Endeca Server allows specifying the
number of threads per core, and the ratio of allocated virtual memory to the
index size, for the nodes in the Endeca Server deployment.
For detailed information about how the Endeca Server performs its
own internal capacity calculations and which heuristics are being used, see the
topic about data domain allocation in the
Oracle Endeca Server Cluster Guide.
- On Linux 6, cgroups can be
utilized for OS-level data domain allocation guarantees. In addition to the
parameters you can specify (such as oversubscribing, auto-idling of data
domains, or the number of threads and CPU cores for nodes), if the Endeca
Server is deployed on Oracle Linux 6 or RHEL 6, cgroups can be used for data
domain allocation, providing additional guarantees.
Important: If you are planning to deploy a large number
of self-service applications in the Endeca Server, you are strongly encouraged
to deploy Endeca Server on Oracle Linux 6 or RHEL 6, both of which allow Endeca
Server to utilize cgroups. Doing so increases the allocation guarantees and
helps ensure that Endeca Server nodes continue to operate even when many data
domains are provisioned. For information on cgroups, see
About using control groups (cgroups) for data domains.
- Data Enrichment plugins
(used via Studio as Enrichments) require adding memory on each machine hosting
Endeca Server. If you are planning to use data enrichment plugins (such as
term extraction) in Studio, consider adding about 10 GB of memory for each
instance of a Data Enrichment plugin that is expected to run concurrently in
the data domain. In other words, if users in the data domain plan to run term
extraction, provision this additional memory for each such process on
all Endeca Server machines hosting this data domain.
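As a rough illustration of two points from the list above, the following Python sketch compares the virtual and resident memory figures that Linux reports for a process, and applies the approximate 10 GB figure for each concurrent enrichment run. It is not part of Endeca Server, and all function names are hypothetical. It assumes a Linux host, where /proc/<pid>/status reports the VmSize (virtual) and VmRSS (resident) fields in kB.

```python
import re

# Assumption: input follows the Linux /proc/<pid>/status format, where
# VmSize is virtual memory and VmRSS is resident (physical) memory, in kB.
_FIELD = re.compile(r"^(VmSize|VmRSS):\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vm_status(status_text):
    """Return a dict such as {'VmSize': kB, 'VmRSS': kB} from status text."""
    return {name: int(value) for name, value in _FIELD.findall(status_text)}


def enrichment_memory_gb(concurrent_runs, gb_per_run=10):
    """Extra RAM (GB) to add on each machine hosting the data domain.

    gb_per_run defaults to the approximate 10 GB per concurrent
    enrichment instance mentioned in the list above.
    """
    if concurrent_runs < 0:
        raise ValueError("concurrent_runs must be non-negative")
    return concurrent_runs * gb_per_run


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = "Name:\tdgraph\nVmSize:\t 52428800 kB\nVmRSS:\t  4194304 kB\n"
    usage = parse_vm_status(sample)
    print("virtual kB:", usage["VmSize"], "resident kB:", usage["VmRSS"])
    print("extra GB per machine:", enrichment_memory_gb(3))
```

For example, enrichment_memory_gb(3) returns 30, meaning roughly 30 GB of additional memory on each machine hosting the data domain. On a live Linux host, you could pass the contents of /proc/<pid>/status for the Dgraph process to parse_vm_status, keeping in mind that a large VmSize value on a lightly loaded machine does not predict actual memory needs.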