HDFS

Storing your databases on HDFS improves the availability of the Dgraph: the contents of the databases are distributed across multiple nodes, so the Dgraph can continue to process queries if a node goes down. It also increases the amount of data your databases can contain.

To store your databases on HDFS, your system must meet the following requirements:

The HDFS DataNode service must be running on all nodes that will host the Dgraph. For best performance, this should be the only Hadoop service running on these nodes. In particular, the Dgraph shouldn't be hosted on Spark nodes, as both services require a lot of resources.
If you have to host the Dgraph on nodes running Spark or other Hadoop services, you should use cgroups to ensure it has access to sufficient resources. For more information, see Setting up cgroups.
For best performance, you should configure short-circuit reads in HDFS. This enables the Dgraph to access the local database files directly, rather than using the HDFS DataNode's network sockets to transfer the data. For instructions on enabling this, refer to the documentation for your Hadoop distribution.
The bdd user must have read and write permissions for the HDFS directory where the databases will be stored. Be sure to set these on all Dgraph nodes.
If you have HDFS data at rest encryption enabled in Hadoop, you must store your databases in an encryption zone. For more information, see HDFS data at rest encryption.
If you decide not to use the default HDFS mount point (the local directory where the Dgraph mounts the HDFS root directory), make sure the one you choose is empty and has read, write, and execute permissions for the bdd user. These permissions must be set on all Dgraph nodes.
Be sure to set the DGRAPH_HDFS_USE_MOUNT property in BDD's configuration file to TRUE.
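
The custom mount point requirement above (an empty directory with read, write, and execute permissions) can be checked before installation with a short script. The following is a minimal sketch, not part of BDD: the function name check_mount_point is hypothetical, and it tests the permissions of whichever user runs it, so it is assumed to be run as the bdd user on each Dgraph node.

```python
import os
import tempfile


def check_mount_point(path):
    """Return a list of problems with a candidate mount point (hypothetical helper).

    An empty list means the directory exists, is empty, and is readable,
    writable, and executable by the user running this script.
    """
    if not os.path.isdir(path):
        return [f"{path} is not an existing directory"]

    problems = []
    # os.access tests the permissions of the current user, so run this as bdd.
    for flag, name in ((os.R_OK, "read"), (os.W_OK, "write"), (os.X_OK, "execute")):
        if not os.access(path, flag):
            problems.append(f"{path} lacks {name} permission for the current user")

    # Listing the directory is only possible when it is readable.
    if os.access(path, os.R_OK) and os.listdir(path):
        problems.append(f"{path} is not empty")
    return problems


if __name__ == "__main__":
    # Demonstration on a throwaway directory; replace it with your own mount point.
    with tempfile.TemporaryDirectory() as candidate:
        for problem in check_mount_point(candidate) or ["no problems found"]:
            print(problem)
```

Because the permissions must be set on all Dgraph nodes, a check like this would need to be repeated on each of them.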

Additionally, to enable the Dgraph to access its databases in HDFS, you must install either the HDFS NFS Gateway service or FUSE. The option you use depends on your Hadoop cluster:

You must use the NFS Gateway if you have CDH 5.7.1 or have HDFS data at rest encryption enabled. For more information, see Installing the HDFS NFS Gateway service.
In all other cases, you can use either FUSE or the NFS Gateway. For more information on FUSE, see Installing FUSE.
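
The rule for choosing between the two options can be summarized as a small decision function. This is only an illustrative sketch of the rule as stated above. The function name, its parameters, and the string labels are assumptions made for this example and do not correspond to any BDD setting.

```python
def allowed_access_methods(cdh_version=None, encryption_at_rest=False):
    """Return the HDFS access options permitted by the rule above (hypothetical helper).

    cdh_version: the CDH version as a string, or None if the cluster is not CDH.
    encryption_at_rest: True if HDFS data at rest encryption is enabled.
    """
    # CDH 5.7.1 or data at rest encryption: only the NFS Gateway is allowed.
    if cdh_version == "5.7.1" or encryption_at_rest:
        return ["NFS Gateway"]
    # In all other cases either option may be used.
    return ["FUSE", "NFS Gateway"]


if __name__ == "__main__":
    print(allowed_access_methods(cdh_version="5.7.1"))
    print(allowed_access_methods(encryption_at_rest=True))
    print(allowed_access_methods())
```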