This topic describes the requirements for the shared file system
in an Endeca Server cluster.
- Access to a shared file system. Provision a shared file system on which
the indexes for the data domains will be stored. When you install and deploy
the Endeca Server cluster and start the data domain, all machines hosting the
Endeca Server nodes must have full (read/write) access to this shared file
system. On Windows, a file system that uses the CIFS (also known as SMB)
protocol is recommended. On Linux, NFS is recommended.
- File system size. You can start a data domain cluster with a single Dgraph
node that serves as both the leader and a follower node. As you add follower
nodes, file system size requirements (as measured by the high-water mark
parameters for shared storage) increase modestly and do not grow in proportion
to the number of follower nodes in any data domain.
- File system performance. For each data domain cluster hosted in an Endeca
Server cluster, the index files are stored on remote shared disks. The index
files are accessed at the startup of a data domain cluster, during data and
configuration updates, and when answering queries. For regular query
processing, the Endeca Server takes advantage of its cache. For updates in a
multi-node data domain cluster, all nodes access the index on remote storage
at the same time (the leader node writes updates to the index, while all
follower nodes must acquire read-only access to the updated index). This
coordinated access may affect the performance of the network or the shared
file system, especially when large updates are accessed for the first time.
- File system options. Typically, the Endeca Server cluster performs write
operations from the Endeca Server instance hosting the leader node for a
given data domain, and read operations from the Endeca Server instances
hosting the follower nodes in the data domain. To tune file system
performance, choose file system configuration options that suit this pattern.
In particular, mounting with the noatime option on Linux eliminates the cost
of frequent access-time updates from the follower data domain nodes and thus
improves file system performance. Particular file system types may offer
further options suited to this pattern of usage.
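As a sketch of the noatime recommendation above, an NFS mount for the shared index storage might be declared as follows. The server name, export path, and mount point are illustrative assumptions, not names from this product's configuration; substitute your own.

```shell
# /etc/fstab entry (illustrative): mount the shared index directory over NFS
# with noatime, so follower reads do not trigger access-time metadata writes.
# "nfs-server:/export/endeca_index" and "/mnt/endeca_index" are example names.
nfs-server:/export/endeca_index  /mnt/endeca_index  nfs  rw,noatime  0  0
```

The same mount can be made without editing /etc/fstab by running, as root, `mount -t nfs -o rw,noatime nfs-server:/export/endeca_index /mnt/endeca_index` on each Linux machine hosting an Endeca Server node.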
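The full read/write access required of every node can be verified with a minimal probe script such as the following sketch, run on each machine hosting an Endeca Server node. The default path /mnt/endeca_index is an assumed example; pass your actual mount point as the first argument. For demonstration purposes only, the script falls back to a temporary directory when the example path does not exist.

```shell
# Probe a shared file system for full (read/write) access from this node.
MOUNT_POINT="${1:-/mnt/endeca_index}"   # example path; pass the real mount point
[ -d "$MOUNT_POINT" ] || MOUNT_POINT="$(mktemp -d)"  # demo fallback only

# Write a uniquely named probe file, read it back, then remove it.
PROBE="$MOUNT_POINT/.rw_probe_$$"
echo "probe" > "$PROBE" || { echo "FAIL: cannot write to $MOUNT_POINT"; exit 1; }
grep -q "probe" "$PROBE" || { echo "FAIL: cannot read from $MOUNT_POINT"; exit 1; }
rm -f "$PROBE"
echo "OK: full read/write access to $MOUNT_POINT"
```

A FAIL result on any node indicates that the node cannot fully use the shared index and should be corrected before starting the data domain.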