You can define several data domain profiles, each serving a
typical service use case. In addition, a default data domain profile exists.
Before you define a custom data domain profile,
it is useful to know the following about the Endeca Server cluster and its data
domains:
- Determining query load
distribution in the data domain.
In the data domain profile, you can decide whether your data domain
requires a leader node dedicated to handling updates, or a leader node that
also shares the regular query load with the follower nodes.
For example, if your update requests are frequent and large, it is
recommended to dedicate the leader node to processing update requests only,
so that the follower nodes process all read query requests. Conversely, if
the index for the data domain is rarely updated, you can configure the leader
node in the data domain to share the regular non-updating query load with the
other nodes, in addition to processing updates. The data domain profile
allows you to specify this option.
The Dgraph nodes in the data domain cluster can process read queries
and updating queries. Read queries can be processed by any Dgraph node
(follower or leader node); they represent responses to Guided Navigation
requests or search requests from the end users. Updating queries represent data
updates to the index, changes in the records schema, or changes in the Dgraph
configuration. Updating requests must be processed only by the leader node and
cannot be processed by follower nodes.
- Determining the desired number of query
processing nodes in the data domain. When creating data domain profiles,
you can specify the number of follower nodes (these are the nodes that process
only read queries, as opposed to the leader node that can process both queries
and updates). The number of follower nodes you need depends on the usage
patterns for the end users of the data domain. A data domain cluster may need
more query processing Dgraph nodes if the number of end users is high, and they
issue a high number of queries, often concurrently.
Keep in mind that you can only create a data domain with a certain
number of Dgraph nodes if you have a sufficient number of Endeca Server
instance nodes in the Endeca Server cluster. For each data domain, the Endeca
Server creates only one Dgraph node on a specific Endeca Server instance. In
other words, if you want to create a five-node data domain, the Endeca Server
cluster hosting it should have at least five Endeca Server nodes.
- Determining the allocation of processing
hardware resources in the Endeca Server cluster. When you create a new data
domain, the Endeca Server cluster allocates the CPU resources from its servers
to meet the needs of the data domain based on the configuration specified in
the data domain profile for the number of threads required for each data domain
node.
When defining a data domain profile, you can choose which of the
following hardware utilization patterns the Endeca Server cluster should use:
- Dedicate 100% of its
nodes' capacity to one hosted data domain.
- Share its capacity with
other data domains but remain within its total capacity.
- Oversubscribe its capacity:
it can start multiple Dgraph nodes (for different data domains)
on its Endeca Server nodes, where the total number of CPU threads requested by
data domains may exceed the total amount of CPU available on each Endeca Server
node.
A data domain profile relies on the characteristics defined in the
Endeca Server node profile. For example, the node profile determines the
potential limit on the number of dedicated data domains that can be hosted on
the node (dedicated data domains are those for which the Endeca Server nodes
dedicate 100% of their capacity).
- Determining whether the
data domain should be auto-idled. When you create a data domain, you can
specify whether the Endeca Server should automatically turn this data domain to
idle after a specified timeout, if no queries are issued for it during the
timeout period. This setting lets you limit data domain proliferation in
self-service applications. For example, if many data domains are
provisioned, but some of them are not used actively, they can be set to
auto-idle, allowing the Endeca Server to stop allocating resources to them if
these data domains are not used, and reactivate them once queries are again
issued for them. This further conserves resources for the Endeca Server,
allowing it to allocate resources more flexibly. Note that idling data domains
is only supported for data domains with a single Dgraph process.
- Determining whether the
data domain should be read-only. When defining a data domain profile, you
can specify whether the data domain should be created as read-only. This is
useful in the development environment or for demonstration purposes. For
example, you can export an existing data domain and then import its index using
a read-only data domain profile. This way, the imported data domain has an
index with the same data in it, but its Dgraph nodes are read-only
(follower nodes), which prevents end users from modifying its configuration or
index in any way.
Note that when you initially create a new data domain that is empty
of source data, its profile should not be configured as read-only, because its
index needs to be populated with data.
- Determining Dgraph
process behavior. In the data domain profile, you can optionally
specify configuration options for the Dgraph processes. If specified, these
options will be used on all Dgraph processes started by the Endeca Server for
this data domain.
You define these characteristics when configuring data domain profiles.
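
The constraints described above can be collected into a small validation sketch. This is an illustration only: the `ProfileSketch` class, its field names, the `check_profile` function, and the simplified per-host thread budget are all hypothetical and are not part of the Endeca Server interface or its profile schema. The sketch checks three rules from the text: a data domain needs at least as many Endeca Server nodes as it has Dgraph nodes, auto-idling is supported only for a single Dgraph process, and only an oversubscribing profile may request more CPU threads than a host has available.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Utilization(Enum):
    DEDICATED = "dedicated"            # host gives 100% of its capacity to one data domain
    SHARED = "shared"                  # host shares capacity but stays within its total
    OVERSUBSCRIBED = "oversubscribed"  # requested threads may exceed the host's CPU


@dataclass(frozen=True)
class ProfileSketch:
    """Hypothetical stand-in for a data domain profile (not the Endeca Server schema)."""
    dgraph_nodes: int                     # total Dgraph nodes in the data domain
    threads_per_node: int                 # threads requested for each Dgraph node
    utilization: Utilization
    leader_handles_queries: bool = True   # False: leader is dedicated to updates
    auto_idle: bool = False
    read_only: bool = False


def check_profile(profile: ProfileSketch, cluster_nodes: int,
                  host_cpu_threads: int, threads_in_use: int = 0) -> List[str]:
    """Return the constraints from the text that this profile would violate."""
    issues = []
    # Only one Dgraph node per Endeca Server instance for a given data domain.
    if profile.dgraph_nodes > cluster_nodes:
        issues.append(f"needs {profile.dgraph_nodes} Endeca Server nodes, "
                      f"cluster has {cluster_nodes}")
    # Idling is only supported for data domains with a single Dgraph process.
    if profile.auto_idle and profile.dgraph_nodes != 1:
        issues.append("auto-idle requires a single Dgraph process")
    # Simplified per-host thread budget (an assumption of this sketch).
    requested = threads_in_use + profile.threads_per_node
    if profile.utilization is Utilization.DEDICATED and threads_in_use > 0:
        issues.append("dedicated profile needs a host with no other data domains")
    elif profile.utilization is Utilization.SHARED and requested > host_cpu_threads:
        issues.append(f"shared profile would request {requested} threads, "
                      f"host has {host_cpu_threads}")
    return issues


# Example: a five-node, auto-idled, shared profile on a four-node cluster.
big = ProfileSketch(dgraph_nodes=5, threads_per_node=4,
                    utilization=Utilization.SHARED, auto_idle=True)
for issue in check_profile(big, cluster_nodes=4, host_cpu_threads=8, threads_in_use=6):
    print(issue)
```

Returning a list of violations instead of raising on the first one makes it possible to review every conflicting setting in a candidate profile at once.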