You must install the following Hadoop components on your cluster before you install BDD:
| Component | Description |
|---|---|
| Cluster manager | Your cluster manager depends on your Hadoop distribution. The installer uses a RESTful API to query your cluster manager for information about your Hadoop nodes, such as their hostnames and port numbers. Your cluster manager must be installed on at least one node in your cluster, although it doesn't have to be one that will host BDD. |
| ZooKeeper | BDD uses ZooKeeper services to manage the Dgraph instances and ensure high availability of Dgraph query processing. ZooKeeper must be installed on at least one node in your cluster, although it doesn't have to be one that will host BDD. All Managed Servers must be able to connect to a node running ZooKeeper. For more information on ZooKeeper and how it affects the cluster deployment's high availability, see the Administrator's Guide. |
| HDFS | The Hive tables that contain your source data are stored in HDFS. HDFS must be installed on at least one node in your cluster. You can also store your Dgraph databases on HDFS. If you choose to do this, the DataNode service must be installed on all nodes that will run the Dgraph. |
| HCatalog | The Data Processing Hive Table Detector monitors HCatalog for new and deleted tables that require processing. HCatalog must be installed on at least one node in your cluster, although it doesn't have to be one that will host BDD. |
| Hive | All of your data is stored as Hive tables on HDFS. When BDD discovers a new or modified Hive table, it launches a Data Processing workflow for that table. |
| Spark on YARN | BDD uses Spark on YARN to run all Data Processing jobs. Spark on YARN must be installed on all nodes that will run Data Processing. |
| Hue | You can use Hue to load your source data into Hive and to view data exported from Studio. Note: HDP doesn't include Hue. If you have an HDP cluster, you must install it separately and set the HUE_URI property in BDD's configuration file. You can also use the bdd-admin script to update this property after installation, if necessary. For more information, see the Administrator's Guide. |
| YARN | YARN worker nodes run all Data Processing jobs. YARN must be installed on all nodes that will run Data Processing. |
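As the Cluster manager row notes, the installer discovers your Hadoop nodes through the cluster manager's REST API. The sketch below shows what such a query could look like; it assumes Cloudera Manager's hosts endpoint (`/api/v19/hosts`) as an example, and `cm.example.com` is a hypothetical hostname, not something from your environment:

```shell
# Hedged sketch: an installer-style REST query for Hadoop node information.
# CM_HOST is a hypothetical cluster manager hostname; 7180 is Cloudera
# Manager's default HTTP port.
CM_HOST="cm.example.com"
CM_PORT="7180"

# Build the hosts-listing URL. Cloudera Manager exposes host metadata
# (hostnames, ports, assigned roles) under /api/<version>/hosts.
cm_hosts_url() {
    echo "http://${1}:${2}/api/v19/hosts"
}

# A real query would need credentials, so it is left commented out here:
# curl -s -u "$CM_USER:$CM_PASS" "$(cm_hosts_url "$CM_HOST" "$CM_PORT")"

cm_hosts_url "$CM_HOST" "$CM_PORT"
```

If your distribution uses Ambari instead, the same idea applies with Ambari's REST endpoints, though the URL shape differs.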
If you want to store your Dgraph databases on HDFS, the Dgraph and Dgraph HDFS Agent must be installed on Hadoop DataNodes. For more information, see Dgraph database requirements.
You must also make a few changes within your Hadoop cluster to ensure that BDD can communicate with your Hadoop nodes. These changes are described below.
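One quick way to verify part of the connectivity described above (for example, that each Managed Server can reach a node running ZooKeeper) is ZooKeeper's four-letter `ruok` health probe, to which a healthy server replies `imok`. This is a minimal sketch, not part of BDD itself; the hostname in the usage comment is a placeholder:

```shell
# Hedged sketch: check that this machine can reach a ZooKeeper node, as
# each BDD Managed Server must. 2181 is ZooKeeper's default client port.
zk_reachable() {
    # "ruok" is ZooKeeper's four-letter health command; a healthy server
    # answers "imok". nc exits non-zero if the connection fails.
    reply="$(printf 'ruok' | nc -w 2 "$1" "${2:-2181}")"
    [ "$reply" = "imok" ]
}

# Example usage (uncomment and substitute a real ZooKeeper hostname):
# zk_reachable zk01.example.com && echo "ZooKeeper reachable"
```

Run the check from every Managed Server; a failure usually points to a firewall rule or a wrong client port rather than a ZooKeeper problem.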