Spark Avro JAR installation

A Spark Avro JAR must be installed on the machine that is configured as the BDD Admin Server.

The Spark Avro JAR is needed for both CDH and HDP clusters.

To install the Spark Avro JAR:

  1. If you have not already done so, create a directory on the Admin Server machine to store the Spark software component.
    This procedure assumes that you created the /localdisk/hadoop directory to store the Spark 1.5.x software, and it uses that same directory to store the Spark Avro JAR.
  2. Download https://repo1.maven.org/maven2/com/databricks/spark-avro_2.10/2.0.1/spark-avro_2.10-2.0.1.jar and save it to the /localdisk/hadoop directory.
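
The two steps above can be sketched as a small shell script. This is a minimal sketch, not part of the BDD installer: it assumes curl is available on the Admin Server, the function name fetch_avro_jar and the JAR_DIR variable are hypothetical, and it uses the https form of the download URL because Maven Central no longer serves plain HTTP.

```shell
#!/bin/sh
# Sketch only: the use of curl and the names below are assumptions.
JAR_DIR="${JAR_DIR:-/localdisk/hadoop}"   # directory assumed by this procedure
JAR_NAME="spark-avro_2.10-2.0.1.jar"
JAR_URL="https://repo1.maven.org/maven2/com/databricks/spark-avro_2.10/2.0.1/${JAR_NAME}"

# Create the target directory if needed, then download the JAR into it.
fetch_avro_jar() {
    mkdir -p "$JAR_DIR" || return 1
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to this file
    curl -fL -o "${JAR_DIR}/${JAR_NAME}" "$JAR_URL"
}

# Nothing is downloaded until the function is called, for example:
#   fetch_avro_jar && ls -l "${JAR_DIR}/${JAR_NAME}"
```

The download is wrapped in a function so that the script can be sourced and reviewed before anything is written to disk.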
After you download the Spark Avro JAR, set its location in the SPARK_EXTRA_CLASSPATH property of the bdd-shell.conf file, as in this example:
## Path of spark-avro_2.10-2.0.1.jar and other extra jars on the server running BDD Shell.
## spark-avro_2.10-2.0.1.jar is required here.
## You need to list the absolute path of each jar here separated by colon(":")
## Setup will copy the jars to BDD_HOME/common/bdd-shell/lib on localhost and every YARN Node Manager server.
SPARK_EXTRA_CLASSPATH=/localdisk/hadoop/spark-avro_2.10-2.0.1.jar

During the installation, the JARs listed in SPARK_EXTRA_CLASSPATH are copied to the BDD_HOME/common/bdd-shell/lib directory on the local machine and on every YARN NodeManager server.
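
Because SPARK_EXTRA_CLASSPATH must list the absolute path of each JAR, separated by colons, it may be worth checking the value before running the installer. The following is a minimal sketch of such a check. The check_classpath function is a hypothetical helper, not something BDD provides; it only verifies that each entry is an absolute path to an existing file.

```shell
#!/bin/sh
# check_classpath: hypothetical helper, not part of BDD.
# Prints each problem found and returns non-zero if the value is empty,
# or if any entry is not an absolute path to an existing regular file.
check_classpath() {
    cp_value=$1
    status=0
    if [ -z "$cp_value" ]; then
        echo "classpath is empty"
        return 1
    fi
    old_ifs=$IFS
    IFS=:      # split the value on colons
    set -f     # disable globbing while the value is split
    for entry in $cp_value; do
        case $entry in
            /*) ;;   # absolute path, keep checking
            *) echo "not an absolute path: $entry"; status=1; continue ;;
        esac
        if [ ! -f "$entry" ]; then
            echo "file not found: $entry"
            status=1
        fi
    done
    set +f
    IFS=$old_ifs
    return $status
}
```

For the example value shown above, the check could be run as check_classpath "/localdisk/hadoop/spark-avro_2.10-2.0.1.jar"; additional JARs would be appended to the same argument, separated by colons.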