BDD supports many different cluster configurations. You should determine the one that best suits your needs before installing.
The following sections describe three configurations suitable for demonstration, development, and production environments, and their possible variations.
You can install BDD in a demo environment running on a single physical or virtual machine. This configuration can only handle a limited amount of data, so it is recommended solely for demonstrating the product's functionality with a small sample database.
In a single-node deployment, all BDD and Hadoop components are hosted on the same node, and the Dgraph databases are stored on the local filesystem.
You can install BDD in a development environment running on two nodes. This configuration can handle a slightly larger database than a single-node deployment, but it still has limited processing capacity. Additionally, it doesn't provide high availability for the Dgraph or Studio.
In a two-node configuration, Hadoop and Data Processing are hosted on one node, and WebLogic Server (including Studio and the Dgraph Gateway) and the Dgraph are hosted on another. The Dgraph databases are stored on the local filesystem.
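The single-node and two-node layouts described above can be summarized as a mapping from each node to the components it hosts. The following is only an illustrative sketch: the node names (`node1`, `node2`) and the `hosts_of` helper are hypothetical and are not part of BDD or its installer.

```python
# Illustrative only: node names are hypothetical, not BDD configuration keys.

# Single-node demo: every Hadoop and BDD component shares one machine.
SINGLE_NODE = {
    "node1": {"Hadoop", "Data Processing", "WebLogic Server", "Dgraph"},
}

# Two-node development: Hadoop and Data Processing on one node,
# WebLogic Server (Studio and the Dgraph Gateway) and the Dgraph on the other.
TWO_NODE = {
    "node1": {"Hadoop", "Data Processing"},
    "node2": {"WebLogic Server", "Dgraph"},
}


def hosts_of(layout, component):
    """Return the sorted names of the nodes that host the given component."""
    return sorted(node for node, comps in layout.items() if component in comps)
```

For example, `hosts_of(TWO_NODE, "Dgraph")` lists only the second node, which makes it clear that neither the Dgraph nor Studio has a backup node in this layout.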
A production environment can consist of any number of nodes required for scale; however, a cluster of six nodes, with BDD deployed on at least four Hadoop nodes, provides maximum availability guarantees.
Note that this configuration is different from the two described above, in which the Dgraph is separate from Hadoop and its databases are stored on the local filesystem. Storing the databases on HDFS is a high availability option for the Dgraph and is recommended for large production environments.
Remember that you aren't restricted to the above configuration; your cluster can contain as many Data Processing, WebLogic Server, and Dgraph nodes as necessary. You can also co-locate WebLogic Server and Hadoop on the same nodes, or host your databases on a shared NFS and run the Dgraph on its own node. Be aware that these decisions may impact your cluster's overall performance and are dependent on your site's resources and requirements.
Although this document doesn't include sizing recommendations, you can use the following guidelines along with your site's specific requirements to determine an appropriate size for your cluster. You can also add more Dgraph and Data Processing nodes later on, if necessary; for more information, see the Administrator's Guide.
One way to configure your cluster is to co-locate different components on the same nodes. This is a more efficient use of your hardware, since you don't have to devote an entire node to any specific BDD component.
Be aware, however, that the co-located components will compete for memory, which can have a negative impact on performance. The decision to host different components on the same nodes depends on your site's production requirements and your hardware's capacity.
Any combination of Hadoop and BDD components can run on a single node, including all three together. Possible combinations include:
For best performance, you shouldn't host the Dgraph on a node running Spark on YARN, because both processes require a lot of memory. However, if you have to co-locate them, you can use cgroups to partition resources for the Dgraph. For more information, see Setting up cgroups.
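When planning a co-located layout, it can help to check which nodes would host both the Dgraph and Spark on YARN, since those are the nodes where memory contention is most likely. The sketch below is one possible way to do that. It assumes a planned layout expressed as a plain dictionary; the node names and the `memory_contention_nodes` helper are hypothetical and do not correspond to any BDD tool or configuration file.

```python
# Illustrative only: a planning aid, not part of BDD.

def memory_contention_nodes(layout):
    """Return the sorted names of nodes hosting both the Dgraph and Spark on YARN."""
    heavy = {"Dgraph", "Spark on YARN"}
    return sorted(node for node, comps in layout.items() if heavy <= set(comps))


# Hypothetical three-node plan.
layout = {
    "node1": {"Spark on YARN", "Data Processing"},
    "node2": {"Dgraph", "Spark on YARN"},
    "node3": {"WebLogic Server", "Dgraph"},
}

print(memory_contention_nodes(layout))  # ['node2']
```

Any node this check reports is a candidate for either moving the Dgraph elsewhere or partitioning its resources with cgroups.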