Required settings

The first part of bdd.conf contains required settings. You must update these with information specific to your system, or the installer could fail.

Must Set settings

This section contains blank settings that you must provide values for. If you don't set these, the installation will fail.

Configuration property Description
ORACLE_HOME The path to the BDD root directory, where BDD will be installed on each node in the cluster. This directory must not exist, and its parent directories' permissions must be set to either 755 or 775.

Note that this is different from the ORACLE_HOME environment variable required by the database.

Important: You must ensure that the installer can create this directory on all nodes that will host BDD components, including Hadoop nodes that will host Data Processing.

On the install machine and all other nodes that will host WebLogic Server, this directory must contain at least 6GB of free space. Nodes that will host the Dgraph require 1GB of free space, and those that will host Data Processing require 2GB.

ORACLE_INV_PTR The absolute path to the Oracle inventory pointer file, which the installer will create. This file can't be located in the ORACLE_HOME directory.

If you have any other Oracle software products installed, this file will already exist. Update this property to point to it.

INSTALLER_PATH The absolute path to the installation source directory. This must contain at least 10GB of free space.

This property is optional. If you don't set it, you can either set the path to the installation source directory as the environment variable INSTALLER_PATH or enter it when the installer prompts you for it at runtime. For more information, see The BDD installer.

DGRAPH_INDEX_DIR The path to the directory on the shared NFS where the Dgraph index defined by DGRAPH_INDEX_NAME will be located. This shouldn't be located under ORACLE_HOME, or it will be deleted.

If you're installing with an existing index, set this property to its location. If you're not, set this to the location you want the installer to create the empty index in. The script will create this directory if it doesn't currently exist.

HADOOP_UI_HOST The name of the server hosting your Hadoop manager (Cloudera Manager or Ambari).
STUDIO_JDBC_URL The database JDBC URL, which Studio requires to connect to it.
There are three templates for this property. Copy the template that corresponds to your database type to STUDIO_JDBC_URL and update the URL to point to your database.
  • If you have a MySQL database, use the first template and update the URL as follows:
    jdbc:mysql://<database hostname>:<port number>/<database name>?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false
  • If you have an Oracle database, use the second template and update the URL as follows:
    jdbc:oracle:thin:@<database hostname>:<port number>:<database SID>
  • If you're not installing on a production environment and want the installer to create a Hypersonic database for you, use the third template. The script will create the database for you in the location defined by the URL.
Note: BDD doesn't currently support database migration. After deployment, the only ways to change to a different database are to reconfigure the database itself or reinstall BDD.
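As an illustration of how the MySQL and Oracle templates are filled in, the following is a minimal Python sketch that assembles the two URL forms shown above. The function names and the example hostnames, ports, and database names are hypothetical and are not part of BDD; only the URL formats come from this section.

```python
def mysql_jdbc_url(host: str, port: int, database: str) -> str:
    """Fill in the MySQL form of STUDIO_JDBC_URL shown in this section."""
    return (
        f"jdbc:mysql://{host}:{port}/{database}"
        "?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false"
    )


def oracle_jdbc_url(host: str, port: int, sid: str) -> str:
    """Fill in the Oracle form of STUDIO_JDBC_URL shown in this section."""
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


if __name__ == "__main__":
    # Hypothetical hosts, ports, and names, for illustration only.
    print(mysql_jdbc_url("db01.example.com", 3306, "studio"))
    print(oracle_jdbc_url("db02.example.com", 1521, "orcl"))
```

The resulting string is what you would paste as the value of STUDIO_JDBC_URL in bdd.conf.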

General settings

This section configures settings relevant to all components and the installation process itself.

Configuration property Description
INSTALL_TYPE Determines the installation type according to your hardware and Hadoop distribution. Set this to one of the following:
  • CDH
  • HW

Note that this document doesn't cover Oracle Big Data Appliance (BDA) or Oracle Public Cloud (OPC) installations. If you want to install on the Big Data Appliance, see the Oracle Big Data Appliance Owner's Guide Release 4 (4.3 or 4.4). Additionally, see the file BDD_README.txt for a workaround for a BDA bug related to installation.

CLUSTER_MODE Determines whether you're installing on a single machine or a cluster. Use TRUE if you're installing on a cluster, and FALSE if you're installing on a single machine.
Note: If you're installing on a single machine, see Single-Node Installation.
JAVA_HOME The absolute path to the JDK install directory. This must be the same on all BDD servers and should have the same value as the $JAVA_HOME environment variable.

If you have multiple versions of the JDK installed, be sure that this points to the correct one.

TEMP_FOLDER_PATH The temporary directory used on each node during the installation. This must point to an existing directory on all BDD nodes.

On the install machine and all other nodes that will host WebLogic Server or the Dgraph, this directory must contain at least 10GB of free space. Nodes that will host Data Processing require 3GB of free space.
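The free-space requirements for TEMP_FOLDER_PATH can be checked on each node before you run the installer. The following Python sketch is one way to do that; it is not part of the BDD installer. The role labels ("install", "weblogic", "dgraph", "data_processing") are hypothetical names chosen for this example, and treating 1GB as 2**30 bytes is an assumption.

```python
import shutil
from typing import Set

GIB = 1024 ** 3  # assumption: "GB" in this section is treated as 2**30 bytes


def required_temp_space_gb(roles: Set[str]) -> int:
    """Return the free space (in GB) that TEMP_FOLDER_PATH needs for a node."""
    # Install machine, WebLogic Server nodes, and Dgraph nodes need 10GB.
    if roles & {"install", "weblogic", "dgraph"}:
        return 10
    # Nodes that host only Data Processing need 3GB.
    if "data_processing" in roles:
        return 3
    # This section states no requirement for any other kind of node.
    return 0


def has_enough_temp_space(path: str, roles: Set[str]) -> bool:
    """Compare the free space on the filesystem holding `path` to the requirement."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_temp_space_gb(roles) * GIB
```

For example, has_enough_temp_space("/tmp", {"dgraph"}) would report whether /tmp on the current node has at least 10GB free.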

CDH/HDP settings

This section contains properties related to Hadoop. The installer uses these properties to query the manager for information about the Hadoop components, such as the URIs and names of their host servers.

Configuration property Description and possible settings
HADOOP_UI_PORT The port number of the server running the Hadoop manager.
HADOOP_UI_CLUSTER_NAME The name of your Hadoop cluster, which is listed in the manager. Be sure to replace any spaces in the cluster name with %20.
HUE_URI The hostname and port of the node running Hue, in the format <hostname>:<port>. This property is only required for HDP clusters.
HADOOP_CLIENT_LIB_PATHS A comma-separated list of the absolute paths to the Hadoop client libraries.
Note: You only need to set this property before installing if you have HDP. If you have CDH, the installer will download the required libraries and set this property automatically. Note that this requires an internet connection. If the script is unable to download the libraries, it will fail; see Failure to download the Hadoop client libraries for instructions on solving this issue.
There are two HDP templates for this property. Copy the template that corresponds to your HDP version to HADOOP_CLIENT_LIB_PATHS and update the paths to point to the client libraries you copied to the install machine.
  • If you have 2.2.4, use the second template.
  • If you have 2.3.x, use the third template.

Don't change the order of the paths in the list. They must be specified in the same order in which they appear in the template.
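The formatting rules for HADOOP_UI_CLUSTER_NAME, HUE_URI, and HADOOP_CLIENT_LIB_PATHS can be sketched in a few lines of Python. This is only an illustration of the value formats described above; the helper names and sample values are hypothetical.

```python
from typing import List


def encode_cluster_name(name: str) -> str:
    """Replace each space in the cluster name with %20."""
    return name.replace(" ", "%20")


def hue_uri(hostname: str, port: int) -> str:
    """Build HUE_URI in the <hostname>:<port> format."""
    return f"{hostname}:{port}"


def client_lib_paths(paths: List[str]) -> str:
    """Join the client library paths with commas, keeping their order."""
    return ",".join(paths)
```

For instance, a cluster listed in the manager as "Cluster 1" would be entered as Cluster%201.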

Kerberos settings

This section configures Kerberos for BDD.

Note: You only need to modify these properties if you want to enable Kerberos.
Configuration property Description and possible settings
ENABLE_KERBEROS Enables Kerberos in the BDD cluster. If Kerberos 5+ is installed on your cluster and you want BDD to integrate with it, set this value to TRUE; if not, set it to FALSE.
KERBEROS_PRINCIPAL The name of the BDD principal. This should include the name of your domain; for example, bdd-service@EXAMPLE.COM.

This property is only required if ENABLE_KERBEROS is set to TRUE.

KERBEROS_KEYTAB_PATH The absolute path to the BDD keytab file on the install machine.

The installer will rename this to bdd.keytab and copy it to $BDD_HOME/common/kerberos/ on all BDD nodes.

This property is only required if ENABLE_KERBEROS is set to TRUE.

KRB5_CONF_PATH The absolute path to the krb5.conf file on the install machine. The installer will copy this to /etc on all BDD nodes.

This property is only required if ENABLE_KERBEROS is set to TRUE.
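The dependency between ENABLE_KERBEROS and the other three properties can be expressed as a small pre-install check. The Python sketch below assumes the settings have already been read into a dictionary; how bdd.conf is parsed is outside the scope of this example, and the function name is hypothetical.

```python
from typing import Dict, List

# Properties that this section says are required when ENABLE_KERBEROS is TRUE.
KERBEROS_PROPERTIES = (
    "KERBEROS_PRINCIPAL",
    "KERBEROS_KEYTAB_PATH",
    "KRB5_CONF_PATH",
)


def missing_kerberos_settings(conf: Dict[str, str]) -> List[str]:
    """Return the Kerberos properties that are required but left blank."""
    if conf.get("ENABLE_KERBEROS", "").strip().upper() != "TRUE":
        return []  # Kerberos is not enabled, so nothing else is required
    return [key for key in KERBEROS_PROPERTIES if not conf.get(key, "").strip()]
```

An empty list means the Kerberos section is complete for the chosen value of ENABLE_KERBEROS.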

WebLogic settings

This section configures the WebLogic Server, including the Admin Server and all Managed Servers.

Configuration property Description and possible settings
ADMIN_SERVER The hostname of the install machine, which will become the Admin Server.

If you leave this blank, it will default to the hostname of the machine you're on.

MANAGED_SERVERS A comma-separated list of the Managed Server hostnames (the servers that will run WebLogic, Studio, and the Dgraph Gateway). This list must include the hostname for the Admin Server and can't contain duplicate values.
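The two constraints on MANAGED_SERVERS (it must include the Admin Server and contain no duplicates) are easy to verify before installing. The following Python sketch is one possible check, not the installer's own logic. It assumes ADMIN_SERVER has already been resolved to a hostname (for example, if it was left blank) and that hostnames are compared as exact strings.

```python
from typing import List


def check_managed_servers(admin_server: str, managed_servers: str) -> List[str]:
    """Return a list of problems found in the MANAGED_SERVERS value."""
    hosts = [h.strip() for h in managed_servers.split(",") if h.strip()]
    problems = []
    if admin_server not in hosts:
        problems.append(f"Admin Server {admin_server} is missing from the list")
    duplicates = sorted({h for h in hosts if hosts.count(h) > 1})
    if duplicates:
        problems.append("duplicate hostnames: " + ", ".join(duplicates))
    return problems
```

An empty result indicates that the list satisfies both constraints.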

Dgraph and HDFS Agent settings

This section configures the Dgraph and the HDFS Agent.

Configuration property Description and possible settings
DGRAPH_SERVERS A comma-separated list of the Dgraph hostnames. The installer will install the Dgraph on these nodes.

This list can't contain duplicate values and shouldn't contain hostnames of YARN worker nodes.

DGRAPH_THREADS The number of threads the Dgraph starts with. Oracle recommends the following:
  • For machines running only the Dgraph, the number of threads should be equal to the number of CPU cores on the machine.
  • For machines running the Dgraph and other BDD components, the number of threads should be the number of CPU cores minus 2. For example, a machine with 4 cores should have 2 threads.

The value you specify must be greater than 2. Be sure that the number you use is in compliance with the licensing agreement.

If you leave this property blank, it will default to the number of CPU cores minus 2.

DGRAPH_CACHE The size of the Dgraph cache, in MB. Only specify the number; don't include MB.

If you leave this property blank, it will default to either 50% of the node's available RAM or the total amount of free memory minus 2GB (whichever is larger).

Oracle recommends allocating at least 50% of the node's available RAM to the Dgraph cache. If you later find that queries are getting cancelled because there isn't enough available memory to process them, experiment with gradually decreasing this amount.

COORDINATOR_INDEX The index of the Dgraph cluster in the ZooKeeper ensemble, which ZooKeeper uses to identify it.

Note that this property is not related to the Dgraph index.

DGRAPH_INDEX_NAME The name of the Dgraph index, which will be located in the directory defined by DGRAPH_INDEX_DIR.

If you have an index, set this to its name, but don't include _indexes.

Note: If your index happens to be named base, you should rename it before installing. Otherwise, the installer will overwrite it with the empty indexes.

If you don't have an index, leave this property set to base. This tells the installer to create an empty index called base_indexes in the DGRAPH_INDEX_DIR.
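The thread recommendation for DGRAPH_THREADS and the default for DGRAPH_CACHE described above amount to simple arithmetic. The Python sketch below illustrates it. The function names are hypothetical, the inputs are assumed to be measured by you on the target node, and 2GB is taken to be 2048MB.

```python
def recommended_dgraph_threads(cpu_cores: int, dgraph_only: bool) -> int:
    """Thread count recommended in this section for a Dgraph node."""
    # Dedicated Dgraph machines use every core; shared machines keep 2 free.
    return cpu_cores if dgraph_only else cpu_cores - 2


def default_dgraph_cache_mb(available_ram_mb: int, free_memory_mb: int) -> int:
    """Default cache size in MB when DGRAPH_CACHE is left blank."""
    half_of_ram = available_ram_mb // 2
    free_minus_2gb = free_memory_mb - 2 * 1024  # assumption: 2GB = 2048MB
    return max(half_of_ram, free_minus_2gb)
```

For example, a machine with 4 cores that also runs other BDD components gets 2 threads, and a node with 16384MB of available RAM and 12288MB of free memory gets a default cache of 10240MB.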

Data Processing settings

This section configures Data Processing and the Hive Table Detector.

Configuration property Description and possible settings
HDFS_DP_USER_DIR The location within the HDFS /user directory that stores the Avro files created when users export data from BDD. The installer will create this directory if it doesn't already exist. The name of this directory must not include spaces or slashes (/).
YARN_QUEUE The YARN queue Data Processing jobs are submitted to.
HIVE_DATABASE_NAME The name of the Hive database that stores the source data for Studio data sets.

The default value is default. This is the same as the default value of DETECTOR_HIVE_DATABASE, which is used by the Hive Table Detector. You can use different databases for these properties, but Oracle recommends starting with one for a first-time installation.

SPARK_ON_YARN_JAR The absolute path to the Spark on YARN jar on your Hadoop nodes. This will be added to the CLI classpath.
There are three templates for this property. Copy the value of the template that corresponds to your Hadoop distribution to SPARK_ON_YARN_JAR and update its value as follows:
  • If you have CDH, use the first template and set it to the absolute path to spark-assembly.jar.
  • If you have HDP 2.2.x, use the second template and set it to the absolute paths to hive-exec.jar and spark-assembly.jar, separated by a colon:
    <path/to/hive-exec.jar>:<path/to/spark-assembly.jar>
  • If you have HDP 2.3.x, use the third template and set it to the absolute paths to the Hive version of hive-metastore.jar and to spark-assembly.jar, separated by a colon:
    <path/to/hive-metastore.jar>:<path/to/spark-assembly.jar>
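The value formats for SPARK_ON_YARN_JAR and the naming rule for HDFS_DP_USER_DIR can be summarized in a short Python sketch. The helper names and any paths you pass in are hypothetical; the actual jar locations depend on your Hadoop distribution and are not given in this section.

```python
from typing import Optional


def spark_on_yarn_jar(spark_assembly: str, hive_jar: Optional[str] = None) -> str:
    """Build the SPARK_ON_YARN_JAR value.

    For CDH, pass only the path to spark-assembly.jar. For HDP, also pass the
    Hive jar named for your HDP version; it is placed first, before a colon.
    """
    if hive_jar is None:
        return spark_assembly
    return f"{hive_jar}:{spark_assembly}"


def is_valid_hdfs_dp_user_dir(name: str) -> bool:
    """Check that the directory name contains no spaces or slashes."""
    return bool(name) and " " not in name and "/" not in name
```

The first helper keeps the Hive jar ahead of spark-assembly.jar, matching the order shown in the HDP templates.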