The first part of bdd.conf contains required settings. These settings are blank by default, and you must provide values specific to your system; if you don't, the installation will fail.
| Configuration property | Description |
|---|---|
| ORACLE_HOME | The path to the BDD root directory, where BDD will be installed on each node in the cluster. This directory must not exist, and its parent directories' permissions must be set to either 755 or 775. Note that this is different from the ORACLE_HOME environment variable required by the Studio database. Important: You must ensure that the installer can create this directory on all nodes that will host BDD components, including Hadoop nodes that will host Data Processing. On the install machine and all other nodes that will host WebLogic Server, this directory must contain at least 6GB of free space. Nodes that will host the Dgraph require 1GB of free space, and those that will host Data Processing require 2GB. |
| ORACLE_INV_PTR | The absolute path to the Oracle inventory pointer file, which the installer will create. This file can't be located in the ORACLE_HOME directory. If you have any other Oracle software products installed, this file will already exist; update this property to point to it. |
| INSTALLER_PATH | Optional. The absolute path to the installation source directory. This must contain at least 10GB of free space. If you don't set this property, you can either set the INSTALLER_PATH environment variable or specify the path at runtime. For more information, see The BDD installer. |
| DGRAPH_INDEX_DIR | The absolute path to the Dgraph databases. This shouldn't be located under ORACLE_HOME, or it will be deleted. The script will create this directory if it doesn't currently exist. If you're installing with existing databases, set this property to their parent directory. If you're installing on a CDH cluster with HDFS data at rest encryption enabled and you want to store your databases on HDFS, be sure that this directory is in an encryption zone. |
| HADOOP_UI_HOST | The name of the server hosting your Hadoop manager (Cloudera Manager or Ambari). |
| STUDIO_JDBC_URL | The JDBC URL for the Studio database. There are three templates for this property. Copy the template that corresponds to your database type to STUDIO_JDBC_URL and update the URL to point to your database. Note: BDD doesn't currently support database migration. After deployment, the only ways to change to a different database are to reconfigure the database itself or reinstall BDD. |
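For illustration, the required settings might look like the following sketch. All paths, hostnames, and the JDBC URL are hypothetical placeholders, not defaults; replace them with values from your own environment, and copy the JDBC URL template that matches your actual database type.

```shell
# Hypothetical example values -- replace with settings for your own system.
ORACLE_HOME=/opt/bdd                              # must not exist yet; parent perms 755 or 775
ORACLE_INV_PTR=/opt/oraInventory/oraInst.loc      # must be outside ORACLE_HOME
INSTALLER_PATH=/localdisk/bdd_installer           # optional; needs 10GB free space
DGRAPH_INDEX_DIR=/localdisk/dgraph_databases      # must NOT be under ORACLE_HOME
HADOOP_UI_HOST=manager.example.com                # Cloudera Manager or Ambari host
STUDIO_JDBC_URL=jdbc:mysql://db.example.com:3306/studio   # example URL only; use your template
```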
This section configures settings relevant to all components and the installation process itself.
| Configuration property | Description |
|---|---|
| INSTALL_TYPE | Determines the installation type according to your hardware and Hadoop distribution. Set this to the value that corresponds to your environment. Note that this document doesn't cover Oracle Big Data Appliance (BDA) or Oracle Public Cloud (OPC) installations. If you want to install on the Big Data Appliance, see the Oracle Big Data Appliance Owner's Guide Release 4 (4.5) and the corresponding MOS note. |
| CLUSTER_MODE | Determines whether you're installing on a single machine or a cluster. Use TRUE if you're installing on a cluster, and FALSE if you're installing on a single machine. Note: If you're installing on a single machine, see Single-Node Installation. |
| JAVA_HOME | The absolute path to the JDK install directory. This must be the same on all BDD servers and should have the same value as the $JAVA_HOME environment variable. If you have multiple versions of the JDK installed, be sure that this points to the correct one. |
| TEMP_FOLDER_PATH | The temporary directory used on each node during the installation. This must point to an existing directory on all BDD nodes. On the install machine and all other WebLogic and Dgraph nodes, this directory must contain at least 10GB of free space. Data Processing nodes require 3GB of free space. |
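As a sketch, the general settings might be filled in like this. INSTALL_TYPE is omitted because its valid values depend on your hardware and Hadoop distribution; the JDK path shown is a hypothetical example.

```shell
# Hypothetical example values -- adjust for your own environment.
CLUSTER_MODE=TRUE                   # TRUE for a cluster, FALSE for a single machine
JAVA_HOME=/usr/java/jdk1.8.0_131    # example path; must be identical on every BDD server
TEMP_FOLDER_PATH=/tmp               # must exist on all nodes; 10GB free on WebLogic/Dgraph nodes
```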
This section contains properties related to Hadoop. The installer uses these properties to query the Hadoop manager (Cloudera Manager or Ambari) for information about the Hadoop components, such as the URIs and names of their host servers.
| Configuration property | Description and possible settings |
|---|---|
| HADOOP_UI_PORT | The port number of the server running the Hadoop manager. |
| HADOOP_UI_CLUSTER_NAME | The name of your Hadoop cluster, which is listed in the manager. Be sure to replace any spaces in the cluster name with %20. |
| HUE_URI | HDP only. The hostname and port of the node running Hue, in the format <hostname>:<port>. |
| HADOOP_CLIENT_LIB_PATHS | A comma-separated list of the absolute paths to the Hadoop client libraries. Note: You only need to set this property before installing if you have HDP. For CDH, the installer will download the required libraries and set this property automatically. Note that this requires an internet connection; if the script is unable to download the libraries, it will fail. See Failure to download the Hadoop client libraries for instructions on solving this issue. To set this property, copy the template for your Hadoop distribution to HADOOP_CLIENT_LIB_PATHS and update the paths to point to the client libraries you copied to the install machine. Don't change the order of the paths; they must appear in the order given in the template. |
| HADOOP_CERTIFICATION_PATH | Only required for Hadoop clusters with TLS/SSL enabled. The absolute path to the directory on the install machine where you put the certificates for HDFS, YARN, Hive, and the KMS. |
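A hypothetical Hadoop section might look like the sketch below. The port shown happens to be Cloudera Manager's common default, but all values here are illustrative placeholders, and the client library paths are invented for the example rather than taken from a real template.

```shell
# Hypothetical example values -- replace with your Hadoop manager's details.
HADOOP_UI_PORT=7180                        # e.g. Cloudera Manager's usual port
HADOOP_UI_CLUSTER_NAME=Cluster%201         # spaces in the cluster name replaced with %20
HUE_URI=hue.example.com:8888               # HDP only
HADOOP_CLIENT_LIB_PATHS=/opt/bdd/hadoop_libs/spark.jar,/opt/bdd/hadoop_libs/hive.jar  # HDP only; keep template order
HADOOP_CERTIFICATION_PATH=/localdisk/hadoop_certs  # only for TLS/SSL-enabled clusters
```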
This section configures Kerberos for BDD.
| Configuration property | Description and possible settings |
|---|---|
| ENABLE_KERBEROS | Enables Kerberos in the BDD cluster. If Kerberos is installed on your cluster and you want BDD to integrate with it, set this value to TRUE; if not, set it to FALSE. |
| KERBEROS_PRINCIPAL | The name of the BDD principal. This should include the name of your domain; for example, bdd-service@EXAMPLE.COM. This property is only required if ENABLE_KERBEROS is set to TRUE. |
| KERBEROS_KEYTAB_PATH | The absolute path to the BDD keytab file on the install machine. The installer will rename this file to bdd.keytab and copy it to $BDD_HOME/common/kerberos/ on all BDD nodes. This property is only required if ENABLE_KERBEROS is set to TRUE. |
| KRB5_CONF_PATH | The absolute path to the krb5.conf file on the install machine. The installer will copy this to /etc on all BDD nodes. This property is only required if ENABLE_KERBEROS is set to TRUE. |
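A Kerberos-enabled configuration might be sketched as follows; the principal name reuses the example from the table, and the file paths are hypothetical.

```shell
# Hypothetical example values -- only meaningful if your Hadoop cluster uses Kerberos.
ENABLE_KERBEROS=TRUE
KERBEROS_PRINCIPAL=bdd-service@EXAMPLE.COM          # include your realm/domain
KERBEROS_KEYTAB_PATH=/localdisk/security/bdd.keytab # copied to $BDD_HOME/common/kerberos/
KRB5_CONF_PATH=/etc/krb5.conf                       # copied to /etc on all BDD nodes
```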
This section configures the WebLogic Server, including the Admin Server and all Managed Servers.
| Configuration property | Description and possible settings |
|---|---|
| ADMIN_SERVER | The hostname of the install machine, which will become the Admin Server. If you leave this property blank, it defaults to the hostname of the machine you're on. |
| MANAGED_SERVERS | A comma-separated list of the Managed Server hostnames (the servers that will run WebLogic, Studio, and the Dgraph Gateway). This list must include the Admin Server and can't contain duplicate values. Note: If you define more than one Managed Server, you must set up a load balancer in front of them after installing. For more information, see Configuring load balancing. |
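For example, with hypothetical hostnames, the WebLogic section might read:

```shell
# Hypothetical hostnames -- replace with your own servers.
ADMIN_SERVER=web01.example.com
MANAGED_SERVERS=web01.example.com,web02.example.com  # must include the Admin Server; no duplicates
```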
This section configures the Dgraph and the HDFS Agent.
| Configuration property | Description and possible settings |
|---|---|
| DGRAPH_SERVERS | A comma-separated list of the hostnames of the nodes that will run the Dgraph and the Dgraph HDFS Agent. This list can't contain duplicate values. If you plan on storing your databases on HDFS, these must be HDFS DataNodes. For best performance, there shouldn't be any other Hadoop services running on these nodes, especially Spark. |
| DGRAPH_THREADS | The number of threads the Dgraph starts with. This should be at least 2; the exact number depends on the other services running on the machine. If you leave this property blank, it defaults to the number of CPU cores minus two. Be sure that the number you use complies with the licensing agreement. |
| DGRAPH_CACHE | The size of the Dgraph cache, in MB. Only specify the number; don't include MB. If you leave this property blank, it defaults to either 50% of the node's available RAM or the total amount of free memory minus 2GB, whichever is larger. Oracle recommends allocating at least 50% of the node's available RAM to the Dgraph cache. If you later find that queries are getting cancelled because there isn't enough available memory to process them, experiment with gradually decreasing this amount. |
| ZOOKEEPER_INDEX | The index of the Dgraph cluster in the ZooKeeper ensemble, which ZooKeeper uses to identify it. |
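As an illustration, a two-node Dgraph cluster might be configured like this; the hostnames, thread count, and cache size are hypothetical and should be sized for your own hardware.

```shell
# Hypothetical example values -- size for your own hardware.
DGRAPH_SERVERS=dgraph01.example.com,dgraph02.example.com  # HDFS DataNodes if databases live on HDFS
DGRAPH_THREADS=6         # at least 2; defaults to CPU cores minus two if blank
DGRAPH_CACHE=20000       # in MB, no unit suffix; ~50% of available RAM is recommended
ZOOKEEPER_INDEX=1
```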
This section configures Data Processing and the Hive Table Detector.
| Configuration property | Description and possible settings |
|---|---|
| HDFS_DP_USER_DIR | The location within the HDFS /user directory that stores the Avro files created when Studio users export data. The installer will create this directory if it doesn't already exist. The name of this directory must not include spaces or slashes (/). |
| YARN_QUEUE | The YARN queue Data Processing jobs are submitted to. |
| HIVE_DATABASE_NAME | The name of the Hive database that stores the source data for Studio data sets. The default value is default, which is also the default value of DETECTOR_HIVE_DATABASE, used by the Hive Table Detector. You can set these properties to different databases, but for a first-time installation it's recommended to use one database for both. |
| SPARK_ON_YARN_JAR | The absolute path to the Spark on YARN JAR on your Hadoop nodes. This will be added to the CLI classpath. There are two templates for this property. Copy the value of the template that corresponds to your Hadoop distribution to SPARK_ON_YARN_JAR and update the path to match the JAR's location on your Hadoop nodes. |
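A hypothetical Data Processing section is sketched below. The Spark JAR path is an invented example in the style of a CDH parcel layout, not an official template value; verify the real location on your own Hadoop nodes.

```shell
# Hypothetical example values -- paths differ by Hadoop distribution and version.
HDFS_DP_USER_DIR=bdd_dp            # created under HDFS /user; no spaces or slashes
YARN_QUEUE=default
HIVE_DATABASE_NAME=default         # keep aligned with DETECTOR_HIVE_DATABASE at first
SPARK_ON_YARN_JAR=/opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly.jar  # verify on your nodes
```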
This section configures the Transform Service.
| Configuration property | Description and possible settings |
|---|---|
| TRANSFORM_SERVICE_SERVERS | A comma-separated list of the Transform Service nodes. For best performance, these should all be Managed Servers. In particular, they shouldn't be Dgraph nodes, as both the Dgraph and the Transform Service require a lot of memory. Note: If you define more than one Transform Service node, you must set up a load balancer in front of them after installing. For more information, see Configuring load balancing. |
| TRANSFORM_SERVICE_PORT | The port the Transform Service listens on for requests from Studio. |
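Finally, the Transform Service section might be filled in as follows; both the hostnames and the port number are hypothetical examples, so use whatever port your deployment reserves for the service.

```shell
# Hypothetical example values -- use your own Managed Server hostnames and port.
TRANSFORM_SERVICE_SERVERS=web01.example.com,web02.example.com  # ideally Managed Servers, not Dgraph nodes
TRANSFORM_SERVICE_PORT=7203                                    # example port only
```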