Deciding which data domain profile to use

You can define several data domain profiles, each suited to a typical use case. In addition, a default data domain profile exists.

Before you define a custom data domain profile, it helps to make the following decisions about the Endeca Server cluster and its data domains:

Determining query load distribution in the data domain.
In the data domain profile, you can specify whether the data domain uses a leader node dedicated to handling updates, or a leader node that also shares the regular query load with the follower nodes.

For example, if updating requests are frequent and large, it is recommended to dedicate the leader node to processing only updating requests and not regular queries; the follower nodes then process the read queries. Conversely, if the index for the data domain is rarely updated, you can configure the leader node to share the regular non-updating query load with the other nodes, in addition to processing updates. The data domain profile allows you to specify this option.

The Dgraph nodes in the data domain cluster can process read queries and updating queries. Read queries can be processed by any Dgraph node (follower or leader); they return responses to Guided Navigation requests or search requests from end users. Updating queries are data updates to the index, changes to the records schema, or changes to the Dgraph configuration. Updating requests are processed only by the leader node, never by follower nodes.
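The routing rule described above can be illustrated with a minimal Python sketch. It is a model of the rule only, not the Endeca Server's routing implementation; the names `DgraphNode`, `eligible_nodes`, and `leader_serves_reads` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DgraphNode:
    """Hypothetical stand-in for one Dgraph node in a data domain."""
    name: str
    is_leader: bool = False


def eligible_nodes(nodes, is_update, leader_serves_reads):
    """Return the nodes that may process a request, in their original order.

    is_update:           True for updating requests, False for read queries.
    leader_serves_reads: assumed profile choice; False models a leader node
                         dedicated to updates.
    """
    if is_update:
        # Updating requests go to the leader node only.
        return [n for n in nodes if n.is_leader]
    # Read queries go to followers, plus the leader if it shares the query load.
    return [n for n in nodes if not n.is_leader or leader_serves_reads]
```

With one leader and two followers, an updating request is eligible only for the leader, while a read query is eligible for the two followers (dedicated leader) or for all three nodes (shared leader).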
Determining the desired number of query processing nodes in the data domain.
When creating data domain profiles, you can specify the number of follower nodes (these are the nodes that process only read queries, as opposed to the leader node, which can process both queries and updates). The number of follower nodes you need depends on the usage patterns of the data domain's end users. A data domain cluster may need more query processing Dgraph nodes if there are many end users and they issue many queries, often concurrently.
Keep in mind that you can create a data domain with a given number of Dgraph nodes only if the Endeca Server cluster has enough Endeca Server instance nodes. For each data domain, the Endeca Server creates at most one Dgraph node on any given Endeca Server instance. In other words, if you want to create a five-node data domain, the Endeca Server cluster hosting it must have at least five Endeca Server nodes.
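As a rough illustration of this sizing constraint, the following Python sketch places the Dgraph nodes of one data domain on distinct servers and fails when the cluster is too small. The function name, the node naming scheme, and the server names are hypothetical; the actual placement is decided by the Endeca Server cluster.

```python
def place_dgraph_nodes(data_domain, dgraph_node_count, server_names):
    """Assign each Dgraph node of a data domain to a distinct server.

    Raises ValueError if the cluster has fewer Endeca Server nodes than the
    data domain needs, since one server hosts at most one Dgraph node per
    data domain.
    """
    if dgraph_node_count > len(server_names):
        raise ValueError(
            f"{data_domain}: needs {dgraph_node_count} Endeca Server nodes, "
            f"cluster has {len(server_names)}"
        )
    # Take the first N servers; a real scheduler would also weigh capacity.
    chosen = server_names[:dgraph_node_count]
    return {
        f"{data_domain}-dgraph-{i}": server
        for i, server in enumerate(chosen, start=1)
    }
```

For example, a five-node data domain fits on a cluster of five servers, but the same request against a four-server cluster raises `ValueError`.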
Determining the allocation of processing hardware resources in the Endeca Server cluster.
When you create a new data domain, the Endeca Server cluster allocates CPU resources from its servers according to the number of threads that the data domain profile requests for each data domain node.
When defining a data domain profile, you can choose which of the following hardware utilization patterns the Endeca Server cluster should use:
- Dedicate 100% of its nodes' capacity to one hosted data domain.
- Share its capacity among several data domains while remaining within its total capacity.
- Oversubscribe: start multiple Dgraph nodes (for different data domains) on its Endeca Server nodes, where the total number of CPU threads requested by the data domains may exceed the CPU capacity available on each Endeca Server node.
A data domain profile relies on the characteristics defined in the Endeca Server node profile. For example, the node profile determines the potential limit on the number of dedicated data domains that can be hosted on the node (dedicated data domains are those to which the Endeca Server nodes dedicate 100% of their capacity).
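The three utilization patterns can be compared with a simplified Python model of a single Endeca Server node. This is a sketch under stated assumptions, not the cluster's actual allocation algorithm: the names `Utilization`, `ServerNode`, and `can_place` are hypothetical, and capacity is reduced to a single thread count.

```python
from dataclasses import dataclass
from enum import Enum


class Utilization(Enum):
    DEDICATED = "dedicated"            # one data domain gets the whole node
    SHARED = "shared"                  # several domains, within total capacity
    OVERSUBSCRIBED = "oversubscribed"  # requests may exceed total capacity


@dataclass
class ServerNode:
    """Hypothetical view of one Endeca Server node's CPU capacity."""
    cpu_threads: int
    allocated_threads: int = 0
    dedicated: bool = False  # True once a dedicated data domain is hosted


def can_place(server, requested_threads, mode):
    """Decide whether a new Dgraph node fits on the server under a pattern."""
    if server.dedicated:
        # Assumption: a node dedicated to one data domain accepts no others.
        return False
    if mode is Utilization.DEDICATED:
        # A dedicated data domain needs a node with nothing else on it.
        return server.allocated_threads == 0
    if mode is Utilization.SHARED:
        return server.allocated_threads + requested_threads <= server.cpu_threads
    # OVERSUBSCRIBED: the thread total is allowed to exceed the capacity.
    return True
```

On an 8-thread node with 6 threads already allocated, a 4-thread request is rejected under the shared pattern but accepted under oversubscription.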
Determining whether the data domain should be auto-idled.
When you create a data domain, you can specify whether the Endeca Server should automatically idle this data domain if no queries are issued for it during a specified timeout period. This setting lets you limit data domain proliferation in self-service types of applications. For example, if many data domains are provisioned but some of them are not actively used, they can be set to auto-idle. The Endeca Server then stops allocating resources to these data domains while they are unused, and reactivates them once queries are issued for them again. This conserves resources for the Endeca Server and lets it allocate them more flexibly. Note that idling is supported only for data domains with a single Dgraph process.
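The auto-idle behavior described above amounts to a timeout check plus reactivation on the next query. The Python sketch below models that logic only; the class and field names (`DataDomainState`, `idle_timeout_s`, and so on) are hypothetical and do not correspond to Endeca Server settings.

```python
from dataclasses import dataclass


@dataclass
class DataDomainState:
    """Hypothetical runtime state tracked for one data domain."""
    auto_idle: bool
    idle_timeout_s: float
    dgraph_processes: int
    last_query_at: float  # timestamp of the most recent query, in seconds
    idle: bool = False


def should_idle(dd, now):
    """True if the data domain qualifies for idling at time `now`."""
    return (
        dd.auto_idle
        and not dd.idle
        and dd.dgraph_processes == 1  # idling needs a single Dgraph process
        and now - dd.last_query_at >= dd.idle_timeout_s
    )


def on_query(dd, now):
    """Record a query; an idle data domain is reactivated by it."""
    dd.idle = False
    dd.last_query_at = now
```

A periodic check would call `should_idle` for each data domain and set `idle = True` where it returns true; the next query then reactivates the data domain through `on_query`.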
Determining whether the data domain should be read-only.
When defining a data domain profile, you can specify whether the data domain should be created as read-only. This is useful in a development environment or for demonstration purposes. For example, you can export an existing data domain and then import its index using a read-only data domain profile. The imported data domain then has an index with the same data, but its Dgraph nodes are read-only (follower nodes), which prevents end users from modifying its configuration or index in any way.
Note that when you initially create a new data domain that is empty of source data, its profile should not be configured as read-only, because its index needs to be populated with data.
Determining Dgraph process behavior.
In the data domain profile, you can optionally specify configuration options for the Dgraph processes. If specified, these options are used for all Dgraph processes that the Endeca Server starts for this data domain.

You define these characteristics when configuring data domain profiles.