Upgrading Hadoop

If you want to upgrade to a new version of your Hadoop distribution, you need to update your BDD cluster to integrate with it. You can do this using the bdd-admin script.

Before you run the script, you must obtain the new Hadoop client libraries for your distribution and move them to the Admin Server. When the script runs, it uses these libraries to generate a new fat jar, which it then distributes to all BDD nodes.

The script also obtains and distributes the new Hadoop client configuration files as described in Updating the Hadoop client configuration files.

Note: You can't use bdd-admin to switch to a different Hadoop distribution. For example, you could upgrade from CDH 5.4 to CDH 5.5, but not to HDP 2.3.

To upgrade Hadoop:

  1. Stop your BDD cluster by running the following from $BDD_HOME/BDD_manager/bin on the Admin Server:
    ./bdd-admin.sh stop [-t <minutes>]
  2. Upgrade your Hadoop cluster according to the instructions in your distribution's documentation.
  3. Verify that any configuration changes you made prior to installing BDD (for example, to your YARN settings) weren't reset during the upgrade.
    Additionally, if you have HDP:
    1. In mapred-site.xml, replace all instances of ${hdp.version} with your HDP version number.
    2. In hive-site.xml, remove the letter s from the values of the following properties (for example, change 300s to 300):
      • hive.metastore.client.connect.retry.delay
      • hive.metastore.client.socket.timeout
    If you have MapR, you may need to reinstall and reconfigure the MapR Client if the new version of MapR requires a different Client version. The MapR Client must be installed and added to the $PATH on all Dgraph, Studio, and Transform Service nodes that aren't part of your MapR cluster. For instructions on installing the Client, see Installing the MapR Client in MapR's documentation.
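    For HDP, the two configuration edits above can be scripted. The sketch below is illustrative only: the file locations (/etc/hadoop/conf and /etc/hive/conf) and the HDP version string are assumptions to replace with the values for your cluster.

    ```shell
    # Illustrative sketch -- file locations and the HDP version string are
    # assumptions; substitute the values for your cluster.
    HDP_VERSION=2.3.4.0-3485

    # Replace every ${hdp.version} placeholder in mapred-site.xml:
    sed -i "s/\${hdp.version}/${HDP_VERSION}/g" /etc/hadoop/conf/mapred-site.xml

    # Strip the trailing "s" from the two Hive timeout values, so that,
    # for example, <value>300s</value> becomes <value>300</value>:
    sed -i -E '/hive\.metastore\.client\.(connect\.retry\.delay|socket\.timeout)/,/<\/property>/ s|<value>([0-9]+)s</value>|<value>\1</value>|' /etc/hive/conf/hive-site.xml
    ```

    The second sed command uses an address range so the value substitution only applies inside the two relevant <property> elements.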
  4. Obtain the client libraries for the new version of your Hadoop distribution and put them on the Admin Server.
    The location you put them in is arbitrary, as you will provide the bdd-admin script with their paths at runtime.
    • If you have CDH, download the following packages from http://archive-primary.cloudera.com/cdh5/cdh/5/ and unzip them:
      • spark-<spark_version>.cdh.<cdh_version>.tar.gz
      • hive-<hive_version>.cdh.<cdh_version>.tar.gz
      • hadoop-<hadoop_version>.cdh.<cdh_version>.tar.gz
      • avro-<avro_version>.cdh.<cdh_version>.tar.gz
    • If you have HDP, copy the following directories from your Hadoop nodes to the Admin Server:
      • /usr/hdp/<version>/pig/lib/h2/
      • /usr/hdp/<version>/hive/lib/
      • /usr/hdp/<version>/spark/lib/
      • /usr/hdp/<version>/spark/external/spark-native-yarn/lib/
      • /usr/hdp/<version>/hadoop/
      • /usr/hdp/<version>/hadoop/lib/
      • /usr/hdp/<version>/hadoop-hdfs/
      • /usr/hdp/<version>/hadoop-hdfs/lib/
      • /usr/hdp/<version>/hadoop-yarn/
      • /usr/hdp/<version>/hadoop-yarn/lib/
      • /usr/hdp/<version>/hadoop-mapreduce/
      • /usr/hdp/<version>/hadoop-mapreduce/lib/
    • If you have MapR, copy the following directories from your Hadoop nodes to the Admin Server:
      • /opt/mapr/spark/spark-<version>/lib
      • /opt/mapr/hive/hive-<version>/lib
      • /opt/mapr/zookeeper/zookeeper-<version>
      • /opt/mapr/zookeeper/zookeeper-<version>/lib
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/common
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/common/lib
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/hdfs
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/hdfs/lib
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/mapreduce
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/mapreduce/lib
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/tools/lib
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/yarn
      • /opt/mapr/hadoop/hadoop-<version>/share/hadoop/yarn/lib
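    For CDH, after downloading the four packages listed above, the unpacking portion of this step can be sketched as follows. The staging directory is a placeholder, and the wildcard filenames stand in for the versioned tarball names you actually downloaded.

    ```shell
    # Hypothetical sketch: unpack the four CDH client-library tarballs into a
    # staging directory on the Admin Server. The staging path is an assumption;
    # any location works, since you pass the paths to bdd-admin at runtime.
    STAGING=/localdisk/hadoop-libs
    mkdir -p "$STAGING"
    for tarball in spark-*.tar.gz hive-*.tar.gz hadoop-*.tar.gz avro-*.tar.gz; do
      tar -xzf "$tarball" -C "$STAGING"
    done
    ```

    Note the resulting directory paths: you will pass them to bdd-admin in step 6.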
  5. Start your BDD cluster:
    ./bdd-admin.sh start
  6. Run the following to update BDD's Hadoop configuration:
    ./bdd-admin.sh publish-config hadoop -l <path[,path]> -j <file>
    Where:
    • <path[,path]> is a comma-separated list of the absolute paths to each of the client libraries on the Admin Server. For HDP clusters, the libraries must be specified in the order they are listed above.
    • <file> is the absolute path to the Spark on YARN jar on your Hadoop nodes. Unless the location of your Hadoop installation has changed, you can use the value of SPARK_ON_YARN_JAR in $BDD_HOME/BDD_manager/conf/bdd.conf. Verify the path before running the script.
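    As a concrete illustration, the publish-config invocation might be assembled as below. Every path here is a placeholder for the locations you chose in step 4, and the snippet assumes bdd.conf stores SPARK_ON_YARN_JAR as a simple KEY=value line; check that against your file.

    ```shell
    # Hypothetical sketch -- all library paths are placeholders for the
    # locations chosen in step 4. For HDP, list them in the documented order.
    LIB_ROOT=/localdisk/hadoop-libs

    # Join the library paths into the comma-separated list that -l expects:
    LIBS=$(printf '%s,' \
      "$LIB_ROOT/spark/lib" \
      "$LIB_ROOT/hive/lib" \
      "$LIB_ROOT/hadoop" \
      "$LIB_ROOT/avro")
    LIBS=${LIBS%,}   # drop the trailing comma

    # Read the Spark on YARN jar path from bdd.conf (assumes KEY=value format):
    SPARK_JAR=$(grep '^SPARK_ON_YARN_JAR' "$BDD_HOME/BDD_manager/conf/bdd.conf" | cut -d= -f2)

    ./bdd-admin.sh publish-config hadoop -l "$LIBS" -j "$SPARK_JAR"
    ```

    Building the list with printf keeps the command readable when many library directories are involved, as with HDP and MapR.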
  7. Restart your cluster so the changes take effect:
    ./bdd-admin.sh restart [-t <minutes>]