Required settings

Must Set

This section contains blank settings that you must provide values for. If you don't set these, the installation will fail.

Configuration property	Description
`ORACLE_HOME`	The path to the BDD root directory, where BDD will be installed on each node in the cluster. This directory must not exist. To ensure that the installer will be able to create it, its parent directories' permissions must be set to either 755 or 775, and there must be at least 30GB of space available on each BDD node. Note that this is different from the `ORACLE_HOME` environment variable required by the Studio database.
`ORACLE_INV_PTR`	The absolute path to the Oracle inventory pointer file, which the installer will create. This file can't be located in the `ORACLE_HOME` directory. If you have any other Oracle software products installed, this file will already exist. Update this property to point to it.
`INSTALLER_PATH`	Optional. The absolute path to the installation source directory. This must contain at least 10GB of free space. If you don't set this property, you can either set the `INSTALLER_PATH` environment variable or specify the path at runtime. For more information, see Installation overview.
`DGRAPH_INDEX_DIR`	The absolute path to the Dgraph databases. This directory shouldn't be located under `ORACLE_HOME`, or it will be deleted. The script will create this directory if it doesn't currently exist. If you're installing with existing databases, set this property to their parent directory. If you have HDFS data at rest encryption enabled in Hadoop and you want to store your databases on HDFS, be sure that this is in an encryption zone.
`HADOOP_UI_HOST`	The name of the server hosting your Hadoop manager (Cloudera Manager, Ambari, or MCS).
`STUDIO_JDBC_URL`	The JDBC URL for the Studio database. There are three templates for this property. Copy the template that corresponds to your database type to `STUDIO_JDBC_URL` and update the URL to point to your database. For MySQL databases, use the first template and update the URL as follows: jdbc:mysql://<database hostname>:<port number>/<database name>?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false For Oracle databases, use the first template and update the URL as follows: jdbc:oracle:thin:@<database hostname>:<port number>:<database SID> If you're not installing on a production environment and want the installer to create a Hypersonic database for you, use the third template. The script will create the database for you in the location defined by the URL. If you're installing on more than one machine, be sure to use the database host's FQDN and not `localhost`.
`WORKFLOW_MANAGER_JDBC_URL`	The JDBC URL for the Workflow Manager Service database. There are two templates for this property. Copy the template that corresponds to your database type to `WORKFLOW_MANAGER_JDBC_URL` and update the URL to point to your database. For MySQL databases, use the first template and update the URL as follows: jdbc:mysql://<database hostname>:<port number>/<database name>?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false For Oracle databases, use the first template and update the URL as follows: jdbc:oracle:thin:@<database hostname>:<port number>:<database SID> If you're installing on more than one machine, be sure to use the database host's FQDN and not `localhost`.

General

This section configures settings relevant to all components and the installation process itself.

Configuration property	Description
`INSTALL_TYPE`	Determines the installation type according to your hardware and Hadoop distribution. Set this to one of the following: `CDH` `HW` `MAPR` This document doesn't cover Oracle Big Data Appliance (`BDA`) or Oracle Public Cloud (`OPC`) installations. If you want to install on the Big Data Appliance, see the Oracle Big Data Appliance Owner's Guide Release 4 (4.x) and any corresponding MOS notes.
`JAVA_HOME`	The absolute path to the JDK install directory. This must be the same on all BDD servers and should have the same value as the `$JAVA_HOME` environment variable. If you have multiple versions of the JDK installed, be sure that this points to the correct one.
`TEMP_FOLDER_PATH`	The temporary directory used on each node during the installation. This directory must exist on all BDD nodes and must contain at least 20GB of free space.

Configuration property

Description

INSTALL_TYPE

Determines the installation type according to your hardware and Hadoop distribution. Set this to one of the following:

CDH
HW
MAPR

This document doesn't cover Oracle Big Data Appliance (BDA) or Oracle Public Cloud (OPC) installations. If you want to install on the Big Data Appliance, see the Oracle Big Data Appliance Owner's Guide Release 4 (4.x) and any corresponding MOS notes.

JAVA_HOME

The absolute path to the JDK install directory. This must be the same on all BDD servers and should have the same value as the $JAVA_HOME environment variable.

If you have multiple versions of the JDK installed, be sure that this points to the correct one.

TEMP_FOLDER_PATH

The temporary directory used on each node during the installation. This directory must exist on all BDD nodes and must contain at least 20GB of free space.

CDH/HDP/MapR

This section contains properties related to Hadoop. The installer uses these properties to query the Hadoop cluster manager (Cloudera Manager, Ambari, or MCS) for information about the Hadoop components, such as the URIs and names of their host servers.

Configuration property	Description and possible settings
`HADOOP_UI_PORT`	The port number of the server running the Hadoop cluster manager.
`HADOOP_UI_CLUSTER_NAME`	The name of your Hadoop cluster, which is listed in the cluster manager. Be sure to replace any spaces in the cluster name with `%20`.
`HUE_URI`	HDP only. The hostname and port of the node running Hue, in the format `<hostname>:<port>`.
`HADOOP_CLIENT_LIB_PATHS`	A comma-separated list of the absolute paths to the Hadoop client libraries. Note: You only need to set this property before installing if you have HDP or MapR. For CDH, the installer will download the required libraries and set this property automatically. Note that this requires an internet connection. If the script is unable to download the libraries, it will fail; see Failure to download the Hadoop client libraries for instructions on solving this issue. To set this property, copy the template for your Hadoop distribution to `HADOOP_CLIENT_LIB_PATHS` and update the paths to point to the client libraries you copied to the install machine. Be sure to replace all instances of `<UNZIPPED_XXX_BASE>` with the absolute path to the correct library. Don't change the order of the paths in the list as they must be specified as they appear.
`HADOOP_CERTIFICATES_PATH`	Only required for Hadoop clusters with TLS/SSL enabled. The absolute path to the directory on the install machine where you put the certificates for HDFS, YARN, Hive, and the KMS. Don't remove this directory after installing, as you will use it if you have to update the certificates.

Kerberos

This section configures Kerberos for BDD. Only modify these properties if you want to enable Kerberos.

Configuration property	Description and possible settings
`ENABLE_KERBEROS`	Enables Kerberos in the BDD cluster. If Kerberos is installed on your cluster and you want BDD to integrate with it, set this value to `TRUE`; if not, set it to `FALSE`.
`KERBEROS_PRINCIPAL`	The name of the BDD principal. This should include the name of your domain; for example, `bdd-service@EXAMPLE.COM`. This property is only required if `ENABLE_KERBEROS` is set to `TRUE`.
`KERBEROS_KEYTAB_PATH`	The absolute path to the BDD keytab file on the install machine. The installer will rename this to bdd.keytab and copy it to $BDD_HOME/common/kerberos/ on all BDD nodes. This property is only required if `ENABLE_KERBEROS` is set to `TRUE`.
`KRB5_CONF_PATH`	The absolute path to the krb5.conf file on the install machine. The installer will copy this to /etc on all BDD nodes. This property is only required if `ENABLE_KERBEROS` is set to `TRUE`.

WebLogic (BDD Server)

This section configures the WebLogic Server, including the Admin Server and all Managed Servers.

Configuration property	Description and possible settings
`ADMIN_SERVER`	The hostname of the install machine, which will become the Admin Server. If you leave this blank, it will default to the hostname of the machine you're on.
`MANAGED_SERVERS`	A comma-separated list of the Managed Server hostnames (the servers that will run WebLogic, Studio, and the Dgraph Gateway). This list must include the Admin Server and can't contain duplicate values. If you define more that one Managed Server, you must set up a load balancer in front of them after installing. For more information, see Configuring load balancing for Studio.

Dgraph and HDFS Agent

This section configures the Dgraph and the HDFS Agent.

Configuration property	Description and possible settings
`DGRAPH_SERVERS`	A comma-separated list of the hostnames of the nodes that will run the Dgraph and the Dgraph HDFS Agent. This list can't contain duplicate values. If you plan on storing your databases on HDFS, these must be HDFS DataNodes. For best performance, there shouldn't be any other Hadoop services running on these nodes, especially Spark.
`DGRAPH_THREADS`	The number of threads the Dgraph starts with. This should be at least 2. The exact number depends on the other services running on the machine: For machines running only the Dgraph, the number of threads should be equal to the number of cores on the machine. For machines running the Dgraph and other BDD components, the number of threads should be the number of cores minus 2. For example, a quad-core machine should have 2 threads. For HDFS nodes running the Dgraph, the number of threads should be the number of CPU cores minus the number required for the Hadoop services. For example, a quad-core machine running Hadoop services that require 2 cores should have 2 threads. If you leave this property blank, it will default to the number of CPU cores minus 2. Be sure that the number you use is in compliance with the licensing agreement.
`DGRAPH_CACHE`	The size of the Dgraph cache, in MB. Only specify the number; don't include `MB`. If you leave this property blank, it will default to either 50% of the node's available RAM or the total mount of free memory minus 2GB (whichever is larger). Oracle recommends allocating at least 50% of the node's available RAM to the Dgraph cache. If you later find that queries are getting cancelled because there isn't enough available memory to process them, experiment with gradually decreasing this amount.
`ZOOKEEPER_INDEX`	The index of the Dgraph cluster in the ZooKeeper ensemble, which ZooKeeper uses to identify it.

Data Processing

This section configures Data Processing and the Hive Table Detector.

Configuration property	Description and possible settings
`HDFS_DP_USER_DIR`	The location within the HDFS /user directory that stores the sample files created when Studio users export data. The name of this directory must not include spaces or slashes (/). The installer will create it if it doesn't already exist. If you have MapR and want to use an existing directory, it must be mounted with a volume.
`YARN_QUEUE`	The YARN queue Data Processing jobs are submitted to.
`HIVE_DATABASE_NAME`	The name of the Hive database that stores the source data for Studio data sets. The default value is `default`. This is the same as the default value of `DETECTOR_HIVE_DATABASE`, which is used by the Hive Table Detector. It is possible to use different databases for these properties, but it is recommended that you start with one for a first time installation.
`SPARK_ON_YARN_JAR`	The absolute path to the Spark on YARN JAR on your Hadoop nodes. This will be added to the CLI classpath. There are two templates for this property. Copy the value of the template that corresponds to your Hadoop distribution to `SPARK_ON_YARN_JAR` and update its value as follows: If you have CDH, use the first template. This should be the absolute path to spark-assembly.jar. For HDP, use the second template. This should be the absolute paths to hive-metastore.jar, hive-exec.jar and spark-assembly.jar, separated by a colon: <path/to/hive-metastore.jar>:<path/to/hive-exec.jar>:<path/to/spark-assembly.jar> If you have MapR, use the third template. This should be the absolute path to spark-assembly-1.5.2-mapr-1602-hadoop2.7.0-mapr-1602.jar. This JAR must be located in the same location on all Hadoop nodes.

Micro Service

This section configures the Transform Service.

Configuration property	Description and possible settings
`TRANSFORM_SERVICE_SERVERS`	A comma-separated list of the Transform Service nodes. For best performance, these should all be Managed Servers. In particular, they shouldn't be Dgraph nodes, as both the Dgraph and the Transform Service require a lot of memory. If you define multiple Transform Service nodes, you must set up a load balancer in front of them after installing. For instructions, see Configuring load balancing for the Transform Service.
`TRANSFORM_SERVICE_PORT`	The port the Transform Service listens on for requests from Studio.
`ENABLE_CLUSTERING_SERVICE`	For use by Oracle Support only. Leave this property set to `FALSE`.
`CLUSTERING_SERVICE_SERVERS`	For use by Oracle Support only. Don't modify this property.
`CLUSTERING_SERVICE_PORT`	For use by Oracle Support only. Don't modify this property.
`WORKFLOW_MANAGER_SERVERS`	The Workflow Manager Service node. Note that you can only define one.
`WORKFLOW_MANAGER_PORT`	The port the Workflow Manager Service listens on for data processing requests.