Software requirements

BDD has a number of software requirements. Many of these requirements must be met by all nodes in the cluster, while others only need to be met by nodes of a specific type. If you are installing on a single node, that node must meet all requirements.

The following table lists the software requirements for each type of node.

Servers Software requirements
All nodes (including CDH)
The following must be installed on all nodes, including CDH nodes:
  • Oracle Enterprise Linux 6 x86_64 or Red Hat Enterprise Linux 6
  • /usr/bin/sudo (this is the default version of sudo on OEL 6)
  • HotSpot JDK 1.7.0_67 or higher installed in the same location on all nodes. This location is arbitrary, but the BDD installer and this document both assume it is /usr/java/.
Note: JDK 1.7.0_67 is also a prerequisite for CDH 5.3.0, so it should already be installed on all CDH nodes. If the JDK you installed CDH with includes HotSpot, you can copy it from a CDH node to any BDD nodes that do not currently have it installed. Be sure to copy it to the same location on all nodes that will host BDD components.
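The version requirement above can be checked mechanically. The following is a minimal sketch, not part of the BDD installer: it assumes the version string uses the legacy `1.7.0_67` format, and the names `MIN_JDK`, `parse_jdk_version`, and `meets_minimum` are hypothetical. It does not verify that the JDK is HotSpot or where it is installed.

```python
import re

# Minimum JDK version stated in the requirements above (1.7.0_67).
MIN_JDK = (1, 7, 0, 67)


def parse_jdk_version(version):
    """Turn a legacy-style string such as '1.7.0_67' into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized JDK version string: {version!r}")
    major, minor, patch, update = match.groups()
    # A missing update number (for example '1.8.0') is treated as update 0.
    return (int(major), int(minor), int(patch), int(update or 0))


def meets_minimum(version, minimum=MIN_JDK):
    """Return True if the given version is at least the required minimum."""
    return parse_jdk_version(version) >= minimum
```

For example, `meets_minimum("1.7.0_55")` is false and `meets_minimum("1.8.0_25")` is true, because the tuples are compared element by element.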
Additionally, all nodes in the cluster must have the following:
  • The TTY requirement for sudo disabled, if it is currently enabled. You can do this by changing Defaults requiretty to Defaults !requiretty in the /etc/sudoers file.
  • The $JAVA_HOME environment variable set to the same location on all nodes. If the path is set to or contains a symlink, that symlink must be identical on all other nodes.
Note: All nodes must also meet a number of access requirements for the user who will perform the installation. For more information, see User access requirements.
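As an illustration of the sudo requirement above, the following sketch inspects the text of a sudoers file and reports whether requiretty is still in effect. It is a simplified reading of the sudoers format, not a full parser: it assumes only global `Defaults` lines matter, ignores scoped forms such as `Defaults:user` and included files, and treats the option as off when the file never mentions it. The function name `requiretty_enabled` is hypothetical.

```python
def requiretty_enabled(sudoers_text):
    """Return True if the last global Defaults setting leaves requiretty on."""
    enabled = False  # assumption: treated as off if the file never sets it
    for raw in sudoers_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] != "Defaults":
            continue  # not a global Defaults line (scoped forms are skipped)
        # A Defaults line may carry several comma-separated options.
        for option in parts[1].split(","):
            option = option.strip().replace(" ", "")
            if option == "requiretty":
                enabled = True
            elif option == "!requiretty":
                enabled = False
    return enabled
```

A node passes this check when the function returns false for the contents of its /etc/sudoers file; reading that file typically requires root privileges.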
CDH nodes that will run Data Processing once BDD is installed
The following CDH components must be installed on nodes that will host Data Processing:
  • Spark (Standalone)
  • YARN
  • HDFS
  • Hive
  • Oozie
Note: Big Data Discovery requires the Spark (Standalone) service. It does not support Spark on YARN.

Data Processing will automatically be installed on CDH nodes that host these components.

WebLogic Admin Server
Perl 5.10 or higher with multithreading support must be installed on the Admin Server.

Additionally, you must choose a username and password for the WebLogic Server administrator before installing. The password must contain at least 8 characters, include at least one number, and cannot start with a number.
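The password rules above are simple enough to check before running the installer. The sketch below encodes only the three rules stated here; the function name `is_valid_admin_password` is hypothetical, and WebLogic Server may enforce additional constraints that this check does not cover.

```python
import string


def is_valid_admin_password(password):
    """Check a candidate password against the three rules stated above."""
    # Rule 1: at least 8 characters.
    if len(password) < 8:
        return False
    # Rule 2: must not start with a number.
    if password[0] in string.digits:
        return False
    # Rule 3: at least one character must be a number.
    return any(ch in string.digits for ch in password)
```

For example, `is_valid_admin_password("welcome1")` is true, while `"1welcome"` fails because it starts with a number and `"welcomes"` fails because it contains no number.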

WebLogic Managed Servers
Each Managed Server must be able to connect to at least one CDH server to ensure it has access to a ZooKeeper instance. For more information on ZooKeeper and how it affects the cluster deployment's high availability, see the Administrator's Guide.
Dgraph nodes
Dgraph nodes, which the HDFS Agent will also run on, must have read/write access to the shared NFS. Note that at any given time, only one Dgraph instance, the leader, will have write access.

If you will be co-locating the Dgraph and CDH, you must enable cgroups and limit the Dgraph's memory consumption.

Note: Oracle recommends turning off hyper-threading for nodes hosting the Dgraph. Because of the way the Dgraph works, hyper-threading is detrimental to its cache performance.