Installing Spark on CDH and HDP

This topic describes how to install Apache Spark on a CDH or HDP instance.

You must download the Apache Spark distribution that matches the Spark version of your CDH (Cloudera Distribution for Hadoop) or HDP (Hortonworks Data Platform) cluster. You can find the Spark version in your installed cluster or on the CDH or HDP official website.

To install the Spark 1.5.x or 1.6.x component:

  1. Create a directory on the Admin Server machine to store the Spark software component.
    For example, create a /localdisk/hadoop directory.
  2. Download the Spark archive that matches your cluster's Spark version:
    1. In a browser, go to: http://archive.apache.org/dist/spark/
    2. Navigate to the directory that matches your version of Spark.
    3. Download the <spark-version>-bin-hadoop2.6.tgz file.
  3. Unpack the archive file into the /localdisk/hadoop directory.
    When the file is unpacked, it produces a Spark directory. For example, unpacking the Spark 1.6.0 archive produces a spark-1.6.0-bin-hadoop2.6 directory.
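The steps above can be sketched as a short shell session. This is a minimal sketch, assuming Spark 1.6.0 and the /localdisk/hadoop directory used in this topic; substitute the package name that matches your cluster's Spark version.

```shell
# Assumptions: Spark 1.6.0, install directory from this topic.
SPARK_PKG="spark-1.6.0-bin-hadoop2.6"      # <spark-version>-bin-hadoop2.6
INSTALL_DIR="/localdisk/hadoop"
ARCHIVE_URL="http://archive.apache.org/dist/spark/spark-1.6.0/${SPARK_PKG}.tgz"

# Step 1: create the directory on the Admin Server machine.
mkdir -p "${INSTALL_DIR}"

# Step 2: download the archive from the Apache archive site.
curl -fO "${ARCHIVE_URL}"

# Step 3: unpack the archive into the install directory.
tar -xzf "${SPARK_PKG}.tgz" -C "${INSTALL_DIR}"

# The resulting Spark directory, used later as SPARK_HOME:
echo "${INSTALL_DIR}/${SPARK_PKG}"
```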
After the Spark directory is created, you set that directory as the SPARK_HOME property in the bdd-shell.conf file, as in this CDH example:
## Path to the Spark installation on the server running BDD Shell
SPARK_HOME=/localdisk/hadoop/spark-1.6.0-bin-hadoop2.6
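Before editing bdd-shell.conf, one hedged sanity check (assuming the example path above) is to confirm that the unpacked directory actually contains the Spark launcher scripts:

```shell
# Assumption: the CDH example path from this topic.
SPARK_HOME="/localdisk/hadoop/spark-1.6.0-bin-hadoop2.6"

# The unpacked archive should contain an executable bin/spark-submit;
# if this check fails, re-check the download and unpack steps.
if [ -x "${SPARK_HOME}/bin/spark-submit" ]; then
  echo "SPARK_HOME looks valid: ${SPARK_HOME}"
else
  echo "bin/spark-submit not found under ${SPARK_HOME}" >&2
fi
```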