HDFS

Storing your databases on HDFS improves the Dgraph's high availability. The contents of the databases are distributed across multiple nodes, so the Dgraph can continue to process queries if a node goes down. HDFS storage also increases the amount of data your databases can contain.

Note: This information also applies to MapR-FS.

To store your databases on HDFS, your system must meet the following requirements:

The HDFS DataNode service must be running on all nodes that will host the Dgraph. For best performance, it should be the only Hadoop service running on your Dgraph nodes. In particular, do not co-locate the Dgraph with Spark, because both services are resource-intensive.
If you must co-locate the Dgraph with Spark or any other Hadoop service, use cgroups to isolate the Dgraph's resources. For more information, see Setting up cgroups.
For best performance, configure short-circuit reads in HDFS. This enables the Dgraph to access the local database files directly, rather than using the DataNode's network sockets to transfer the data. For instructions, refer to the documentation for your Hadoop distribution.
The bdd user must have read and write permissions for the HDFS directory where the databases will be stored. Be sure to set this on all Dgraph nodes.
If you have HDFS data at rest encryption enabled in Hadoop, you must store your databases in an encryption zone. For more information, see HDFS data at rest encryption.
If you decide not to use the default HDFS mount point (the local directory where the Dgraph mounts the HDFS root directory), make sure the directory you use instead is empty and has read, write, and execute permissions for the bdd user. This must be set on all Dgraph nodes.
Be sure to set the DGRAPH_USE_MOUNT_HDFS property in BDD's configuration file to TRUE.
To enable the Dgraph to access its databases in HDFS, you must install the HDFS NFS Gateway (called MapR NFS in MapR). For more information, see Installing the HDFS NFS Gateway service.
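If you use a custom HDFS mount point, you may want to verify it on each Dgraph node before installation. The following is a minimal sketch, not an Oracle-provided tool: it assumes the script runs as the bdd user (so `os.access()` reports that user's permissions), and the function name `check_mount_point` is hypothetical. It checks only the local directory requirements listed above: that the directory exists, is empty, and is readable, writable, and executable.

```python
import os
import sys


def check_mount_point(path: str) -> list[str]:
    """Return a list of problems with a candidate HDFS mount point.

    Hypothetical helper. Assumes it runs as the bdd user, so os.access()
    reflects that user's permissions. An empty list means no problems found.
    """
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]

    problems = []
    # The mount point must be empty before the Dgraph mounts HDFS onto it.
    if os.listdir(path):
        problems.append(f"{path} is not empty")
    # The bdd user needs read, write, and execute permissions on the directory.
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"{path} lacks read, write, and execute permissions for this user")
    return problems


if __name__ == "__main__":
    # Usage: python check_mount_point.py /path/to/mount/point [...]
    for candidate in sys.argv[1:]:
        issues = check_mount_point(candidate)
        print(candidate, "OK" if not issues else "; ".join(issues))
```

Because the requirement applies on every Dgraph node, a check like this would need to be run on each node individually.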
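To confirm that DGRAPH_USE_MOUNT_HDFS is set to TRUE, you could scan BDD's configuration file. The sketch below assumes the file consists of simple `KEY=VALUE` lines with `#` comments; that format, and the helper names `parse_config` and `hdfs_mount_enabled`, are assumptions for illustration rather than a documented BDD interface. The value comparison is deliberately exact, matching the `TRUE` setting the requirement specifies.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse assumed KEY=VALUE lines, skipping blanks and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that are not KEY=VALUE pairs
        settings[key.strip()] = value.strip()
    return settings


def hdfs_mount_enabled(settings: dict[str, str]) -> bool:
    """True only if DGRAPH_USE_MOUNT_HDFS is present and set to TRUE."""
    return settings.get("DGRAPH_USE_MOUNT_HDFS") == "TRUE"


if __name__ == "__main__":
    sample = "# Dgraph settings\nDGRAPH_USE_MOUNT_HDFS=TRUE\n"
    print(hdfs_mount_enabled(parse_config(sample)))
```

A missing property is treated the same as a value other than TRUE, so the check fails unless the setting is explicitly present.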