Integration with Hadoop

BDD runs on top of an existing Hadoop cluster, which provides the components and tools that BDD requires to process and manage data. For example, the source data you load into BDD is stored in HDFS and processed by Spark on YARN.

BDD supports the following Hadoop distributions:

You must have one of these distributions installed on your cluster before you install BDD, because the configuration of your Hadoop cluster determines where many of the BDD components are installed. For supported versions and a list of required Hadoop components, see Hadoop requirements.