Adding Hadoop nodes

This topic describes how to add a YARN NodeManager node after BDD has been deployed.

The Data Processing modules are installed on all available YARN NodeManager nodes during the BDD installation process. You can add more nodes after deployment, provided that they run the same Hadoop distribution (CDH or HDP) and version as the existing nodes.

Before you begin, BDD must be installed and the new node must already have been added to the Hadoop cluster. The node must be running the following Hadoop components:
  • YARN
  • Spark on YARN
  • HDFS
  • Hive

Consult the CDH or HDP documentation for details on how to add the node to the CDH or HDP cluster.

You will copy files and directories from an existing YARN NodeManager node to the new YARN NodeManager node. The locations of some of these files are specified in the edp.properties file, which is located in the $BDD_HOME/dataprocessing/edp_cli/config directory. The relevant properties are:
  • sparkYarnJar
  • bddHadoopFatJar
  • edpJarDir
  • extraJars
  • oltHome
  • krb5ConfPath
  • clusterKerberosKeytabPath

For more information on these properties, see DP CLI configuration.
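
As an illustration, the locations to copy can be collected from edp.properties with a short script. The following Python sketch is not part of BDD. It assumes the file uses plain key=value lines and does not handle line continuations or escape sequences. The function names parse_properties and paths_to_copy are hypothetical. Values are returned exactly as written in the file, so a property that holds more than one path must still be split by hand.

```python
import os
from pathlib import Path

# Property names taken from the list above.
COPY_PROPERTIES = [
    "sparkYarnJar", "bddHadoopFatJar", "edpJarDir", "extraJars",
    "oltHome", "krb5ConfPath", "clusterKerberosKeytabPath",
]

def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments.

    Assumption: no line continuations or escapes are used in the file.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def paths_to_copy(text):
    """Return the non-empty values of the properties listed above."""
    props = parse_properties(text)
    return {name: props[name] for name in COPY_PROPERTIES if props.get(name)}

if __name__ == "__main__":
    bdd_home = os.environ.get("BDD_HOME")
    if bdd_home:
        conf = Path(bdd_home, "dataprocessing", "edp_cli", "config", "edp.properties")
        if conf.is_file():
            for name, value in paths_to_copy(conf.read_text()).items():
                print(f"{name}: {value}")
```

On a non-Kerberized cluster the two Kerberos properties may be empty or absent, in which case the sketch leaves them out of the result.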

To add a YARN NodeManager node to an existing BDD deployment:

  1. For all types of BDD deployments (both Kerberized and non-Kerberized), copy the entire $ORACLE_HOME and /opt/bdd directories from an existing YARN NodeManager node to the new YARN NodeManager node, and make sure that their permissions are the same on both nodes.
    For example, the $ORACLE_HOME directory should have +x permission and the $BDD_HOME/logs/edp directory should have 777 permissions.
  2. For all deployments, copy the files and directories specified in the sparkYarnJar, bddHadoopFatJar, edpJarDir, and extraJars properties to the new node at the same location.
    Note that the owner and permissions of each file and directory must be the same as on the other nodes.
  3. For all deployments, copy the files and directories specified in the oltHome property to the new node at the same location.
    Note that the owner and permissions of each file and directory must be the same as on the other nodes.
  4. For Kerberized clusters only, copy the files specified in the krb5ConfPath and clusterKerberosKeytabPath properties to the new node at the same location.
    Note that the owner and permissions of each file must be the same as on the other nodes.
  5. Update the YARN_NODE_MANAGER_SERVERS value in the bdd.conf file on each BDD node so that the uninstall.sh utility is aware of the new YARN NodeManager.
  6. If the new node has other functions in the cluster, re-download and replace the Hadoop configuration files on the Studio machine. On the Studio machine:
    1. In a text editor, open the $BDD_HOME/dataprocessing/edp_cli/data_processing_CLI script and note the directory that the HADOOP_CONF_DIR property is set to.
    2. Change to that directory.
    3. Download and replace the files in this directory with the files from the Hadoop cluster.
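
Because steps 1 through 4 require the owner and permissions on the new node to match those on the existing nodes, it can help to compare them mechanically. The following Python sketch is not part of BDD, and the function names describe, build_manifest, and diff_manifests are hypothetical. It records the permission bits and the numeric owner and group IDs for every path under a directory. It assumes that user and group IDs are numbered the same way on both nodes. If they are not, compare owner names instead.

```python
import os
import stat

def describe(path):
    """Return (octal mode, uid, gid) for a path without following symlinks."""
    info = os.lstat(path)
    return (format(stat.S_IMODE(info.st_mode), "04o"), info.st_uid, info.st_gid)

def build_manifest(root):
    """Map each path under root (relative to root) to its mode and owner."""
    manifest = {".": describe(root)}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = describe(full)
    return manifest

def diff_manifests(expected, actual):
    """List paths that are missing or whose mode or owner differs."""
    problems = []
    for rel, meta in sorted(expected.items()):
        if rel not in actual:
            problems.append((rel, "missing"))
        elif actual[rel] != meta:
            problems.append((rel, f"expected {meta}, found {actual[rel]}"))
    return problems
```

One way to use the sketch is to run build_manifest on an existing node and on the new node for each copied directory, save both results (for example as JSON), and pass them to diff_manifests. An empty list means that no missing paths or mismatches were found under that directory.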

After the new node has been added to the BDD deployment, the BDD Data Processing workflows can use it.