Updating the configuration file

Once you have created the installation source directory, you must configure your deployment by updating the bdd.conf file, which is located in the /BDD_deployer/installer directory.

Important: The bdd.conf file defines the configuration of your BDD cluster and provides the orchestration script with the parameters it requires to run. Updating this file is the most important step of the installation and deployment process. If you don't modify the file, or if you modify it incorrectly, the orchestration script could fail or your cluster could be configured differently than you intended.

You can edit the configuration file using any text editor. Be sure to save your changes before closing.

The orchestration script validates the configuration file at runtime and fails if the file contains any invalid values. To avoid this, keep the following in mind when updating the file:
  • You must provide a value for all properties except DGRAPH_ADDITIONAL_ARG, which is only intended for use by Oracle Support.
  • The accepted values for some properties are case-sensitive and must be entered exactly as they appear in this document.
  • You must provide fully qualified hostnames.
  • Any symlinks included in paths must be identical on all nodes. If any are different, or do not exist, the installation may fail.
  • Each port setting must have a unique value. You cannot use the same port number more than once.
  • Some of the directories defined in the configuration file have location requirements. These are specified in this document.

The following sections describe the properties in the configuration file and any requirements or restrictions they have. The configuration file itself also provides some of this information. Be sure to read the following sections carefully before modifying any properties.

Global settings

The first section in bdd.conf configures global settings, which are relevant to all components and the installation and deployment process itself.

Configuration property Description
INSTALL_TYPE Sets the installation type according to the hardware you're installing on. This can be set to one of the following:
  • BDA: Use this value if you're installing on the Oracle Big Data Appliance.
  • GENERIC: Use this value if you're installing on general-purpose hardware. This is the default value.
  • OPC: Use this value if you're installing on the Oracle Public Cloud.

Note that this document does not cover BDA or Cloud installation. For information on installing on either platform, please contact Oracle Customer Support.

CLUSTER_MODE Determines whether you're deploying to a single machine or a cluster. Use TRUE if you're deploying to a cluster. This is the default value.

If you're deploying to a single machine, use FALSE. When deploying to a single machine, you should also be sure that the MANAGED_SERVERS, DGRAPH_SERVERS, and DETECTOR_SERVER properties are set to ${ADMIN_SERVER}, or the orchestration script will fail.

Note that this property only accepts UPPERCASE values.

FORCE Determines whether the orchestration script will remove files and directories left over from previous installations when it runs.

When set to TRUE, the orchestration script removes any previous installations from the ORACLE_HOME directory. Use this value if you're rerunning the script after a failed attempt.

When set to FALSE, the orchestration script does not remove any previous installations. If one exists, the script will fail. This is the default value.

Note that this property only accepts UPPERCASE values.

ORACLE_HOME The path to the BDD root directory, where BDD will be installed on all nodes in the cluster. The orchestration script creates this directory, so it must not already exist.
Important: You must ensure that this directory can be created on all nodes that BDD will be installed on, including CDH nodes that will host Data Processing.

On the Admin Server and nodes that will host WebLogic Server, this directory must contain at least 6GB of free space. Nodes that will host the Dgraph require 1GB of free space, and those that will host Data Processing require 2GB.

The default value is /localdisk/Oracle/Middleware.

ORACLE_INV_PTR The path to the Oracle inventory pointer file. This file can't be located in the ORACLE_HOME directory. The default value is /localdisk/Oracle/oraInst.loc.

If other Oracle software products are installed on the machine, this file already exists; in that case, update this value to point to the existing file.

JAVA_HOME The path to the JDK install directory. This must be the same on all BDD servers. Note that this property is not the same as the JAVA_PATH property. The default value is /usr/java/jdk1.7.0_67.
INSTALLER_PATH The path to the installation source directory on the Admin Server (the location you moved the installation packages to). This directory must contain at least 6GB of free space. The default value is /localdisk/BDD_deployer/packages.
BDD_HOME The path to the BDD install directory, which the orchestration script will create on all BDD servers. This directory must be inside ORACLE_HOME. The default value is ${ORACLE_HOME}/BDD1.0.
ENABLE_AUTOSTART Determines whether the BDD cluster will automatically restart after its servers are rebooted:
  • TRUE: WebLogic (including Studio and the Dgraph Gateway), the Dgraph, and the HDFS Agent will automatically restart after their host servers are rebooted. This is the default value.
  • FALSE: WebLogic, the Dgraph, and the HDFS Agent must be restarted manually.

Note that this property only accepts UPPERCASE values.

TEMP_FOLDER_PATH The temporary directory used on each node during the installation. The default value is /tmp.

On the Admin Server and nodes that will host WebLogic Server or the Dgraph, this directory must contain at least 10GB of free space. Nodes that will host Data Processing require 3GB of free space.
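
For illustration, a global-settings block for a single-machine installation on general-purpose hardware might look like the following. This is a sketch, not a complete file: it assumes bdd.conf uses shell-style NAME=value assignments (consistent with the ${...} references in the documented defaults), and every value shown is either a documented default or a placeholder you would replace with your own.

# Global settings (sketch; single machine, generic hardware)
INSTALL_TYPE=GENERIC
# Single machine; also set MANAGED_SERVERS, DGRAPH_SERVERS, and DETECTOR_SERVER to ${ADMIN_SERVER}
CLUSTER_MODE=FALSE
FORCE=FALSE
ORACLE_HOME=/localdisk/Oracle/Middleware
ORACLE_INV_PTR=/localdisk/Oracle/oraInst.loc
JAVA_HOME=/usr/java/jdk1.7.0_67
INSTALLER_PATH=/localdisk/BDD_deployer/packages
BDD_HOME=${ORACLE_HOME}/BDD1.0
ENABLE_AUTOSTART=TRUE
TEMP_FOLDER_PATH=/tmp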

WebLogic settings

The third section in bdd.conf configures the WebLogic Server, including the Admin Server and all Managed Servers. It does not configure Studio or the Dgraph Gateway.

Configuration property Description and possible settings
WLS_START_MODE Defines the mode WebLogic Server will start in.

If set to prod, the WebLogic Server starts in production mode, which requires a username and password when it starts. This is the default value.

If set to dev, it starts in development mode, which does not require a username or password. The orchestration script will still prompt you for a username and password at runtime, but these will not be required when starting WebLogic Server.

Note that this property only accepts lowercase values.

ADMIN_SERVER The fully qualified hostname of the machine that will become the WebLogic Admin Server. This should be the machine you are currently working on.

There is no default value for this property, so you must provide one; the orchestration script will fail if it is not set.

MANAGED_SERVERS A comma-separated list of the fully qualified hostnames of the WebLogic Managed Servers (the servers that will run WebLogic, Studio, and the Dgraph Gateway). This list must include the hostname for the Admin Server, and cannot contain duplicate values.

If you're installing on a single machine, this property should be set to ${ADMIN_SERVER}, or the orchestration script will fail.

WEBLOGIC_DOMAIN_NAME The name of the WebLogic domain, which Studio and the Dgraph Gateway run in. The default value is bdd_domain.
ADMIN_SERVER_PORT The port number used by the Admin Server. This number must be unique. The default value is 7001.
MANAGED_SERVER_PORT The port used by the Managed Server (i.e., Studio). This number must be unique. The default value is 7003.

This property is still required if you are installing on a single server.

WLS_CPU_CORES

This property does not set the number of CPU cores that the WebLogic Server will actually use.

Instead, the orchestration script uses this value at runtime to check whether each Managed Server has the minimum number of CPU cores required.

The value you enter should be less than or equal to the number of CPU cores available on the node. If you are unsure of how many cores a node has, check its node file.

If you enter a value that is greater than the total number of cores available on the node, the script issues a warning but continues to run. If you do not specify a number for this property, the script uses the default value of 4.

WLS_RAM_SIZE

This property does not set the RAM size that the WebLogic Server will actually use.

Instead, the orchestration script uses this value at runtime to check whether each Managed Server has the minimum amount of RAM required, in KB.

The value you enter (in KB) should be less than or equal to the total amount of RAM available on the node. If you are unsure of how much RAM a node has, check its node file.

If you enter a value that is greater than the total amount of RAM available on the node, the script issues a warning but continues to run. If you do not specify a value for this property, the script uses the default value of 2048000 KB.
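
If you are unsure how many cores or how much RAM a node actually has, standard Linux commands can report both. These are general operating-system commands, not part of the BDD installer:

grep -c ^processor /proc/cpuinfo   # number of logical CPU cores
grep MemTotal /proc/meminfo        # total RAM, reported in KB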

WLS_SECURE_MODE Enables and disables SSL for Studio's outward-facing ports.

This can be set to TRUE or FALSE. When set to TRUE, the Studio instances on the Admin Server and the Managed Servers listen for requests on the ADMIN_SERVER_SECURE_PORT and MANAGED_SERVER_SECURE_PORT, respectively.

The default value is TRUE. Note that this property does not enable SSL for any other BDD components.

ADMIN_SERVER_SECURE_PORT The secure port on the Admin Server on which Studio listens when WLS_SECURE_MODE is set to TRUE. This number must be unique. The default value is 7002.

Note that when SSL is enabled, Studio still listens on the non-secure ADMIN_SERVER_PORT for requests from the Dgraph Gateway.

MANAGED_SERVER_SECURE_PORT The secure port on the Managed Server on which Studio listens when WLS_SECURE_MODE is set to TRUE. This number must be unique. The default value is 7004.

Note that when SSL is enabled, Studio still listens on the non-secure MANAGED_SERVER_PORT for requests from the Dgraph Gateway.
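
As a sketch of the WebLogic block for a clustered deployment, assuming the same NAME=value syntax and three hypothetical hosts (web01.example.com through web03.example.com), the configuration might read as follows; the port, core, and RAM values are the documented defaults:

WLS_START_MODE=prod
ADMIN_SERVER=web01.example.com
# The Managed Server list must include the Admin Server and contain no duplicates
MANAGED_SERVERS=web01.example.com,web02.example.com,web03.example.com
WEBLOGIC_DOMAIN_NAME=bdd_domain
ADMIN_SERVER_PORT=7001
MANAGED_SERVER_PORT=7003
WLS_CPU_CORES=4
WLS_RAM_SIZE=2048000
WLS_SECURE_MODE=TRUE
ADMIN_SERVER_SECURE_PORT=7002
MANAGED_SERVER_SECURE_PORT=7004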

CDH settings

The second section in bdd.conf contains properties related to Cloudera Manager. The orchestration script uses the values you provide to query Cloudera Manager for information about the other CDH components, such as the URIs and names of their host servers.

Configuration property Description and possible settings
CM_HOST The hostname of the server running Cloudera Manager. The default value is ${ADMIN_SERVER}.
CM_PORT The port number used by the server running Cloudera Manager. The default value is 7180.
CM_CLUSTER_NAME The name of the CDH cluster, as it is listed in Cloudera Manager. Be sure to replace any spaces in the cluster name with %20. The default value is Cluster%201.
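
For example, if Cloudera Manager runs on a hypothetical host cm01.example.com on its default port, and the cluster appears in Cloudera Manager as "Cluster 1", this block would read (note the space encoded as %20):

CM_HOST=cm01.example.com
CM_PORT=7180
CM_CLUSTER_NAME=Cluster%201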

Dgraph Gateway settings

The fourth section in bdd.conf configures the Dgraph Gateway.

Configuration property Description and possible settings
ENDECA_SERVER_LOG_LEVEL The log level used by the Dgraph Gateway:
  • DEBUG
  • INFO
  • WARN
  • ERROR
  • FATAL

The default value is ERROR.

More information on Dgraph Gateway log levels is available in the Oracle Big Data Discovery Administrator's Guide.
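
For example, to run the Dgraph Gateway with more verbose logging than the default, you might set:

ENDECA_SERVER_LOG_LEVEL=WARN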

Studio settings

The fifth section in bdd.conf configures Studio.

Configuration property Description and possible settings
SERVER_TIMEOUT The timeout value (in milliseconds) used when responding to requests sent to all Dgraph Gateway web services except the Data Ingest Web Service. A value of 0 means there is no timeout. The default value is 300000.
SERVER_INGEST_TIMEOUT The timeout value (in milliseconds) used when responding to requests sent to the Data Ingest Web Service. A value of 0 means there is no timeout. The default value is 1680000.
SERVER_HEALTHCHECK_TIMEOUT The timeout value (in milliseconds) used when checking data source availability when connections are initialized. A value of 0 means there is no timeout. The default value is 10000.
STUDIO_JDBC_URL The JDBC URL for the database, which enables Studio to connect to it. There are three templates for this property, but only one can be used. The remaining two must be commented out with a hash symbol (#).
The first template is for MySQL 5.5.3 (or later) databases, which use the com.mysql.jdbc.Driver driver. This template is uncommented by default. If you are using a MySQL database, leave this template uncommented, make sure the other two are commented out, and update the URL as follows:
jdbc:mysql://<database hostname>:<port number>/<database name>?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false
The second template is for Oracle 11g or 12c databases, which use the oracle.jdbc.OracleDriver driver. If you are using an Oracle database, uncomment this template, comment out the other two, and update the URL as follows:
jdbc:oracle:thin:@<database hostname>:<port number>:<database SID>

The third template is for Hypersonic databases, which use the org.hsqldb.jdbcDriver driver. Hypersonic is not supported for production environments, so you should only use this template if you are deploying to a demo environment. If you want the orchestration script to create a Hypersonic database for you, uncomment this template and comment out the other two. The orchestration script will create the database in the location defined by the URL.

Note: BDD does not currently support database migration. After deployment, the only ways to change to a different database are to reconfigure the database itself or reinstall BDD.
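
As an illustration, for a MySQL database on a hypothetical host db01.example.com (port 3306, database name studio), the MySQL template would be filled in as follows, with the Oracle and Hypersonic templates left commented out:

jdbc:mysql://db01.example.com:3306/studio?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false

The equivalent Oracle form, with a hypothetical SID of orcl on the default listener port, would be:

jdbc:oracle:thin:@db01.example.com:1521:orcl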

Dgraph and HDFS Agent settings

The sixth section in bdd.conf configures the Dgraph and HDFS Agent.

Configuration property Description and possible settings
DGRAPH_SERVERS A comma-separated list of the fully qualified hostnames of all Dgraph nodes in the cluster. The orchestration script will install and deploy the Dgraph to these nodes.

This list cannot contain duplicate values. Additionally, as Oracle does not recommend cohosting the Dgraph with Spark, this list should not contain hostnames of Spark nodes.

If you're installing on a single machine, this property should be set to ${ADMIN_SERVER}, or the orchestration script will fail.

DGRAPH_CPU_CORES

This property does not set the number of cores that the Dgraph will actually use.

Instead, the orchestration script uses this value at runtime to check whether each Dgraph node has the minimum number of CPU cores required for hosting the Dgraph and the HDFS Agent.

The value you enter should be less than or equal to the number of cores available on the machine.

If you enter a value that is greater than the total number of cores available on the Dgraph nodes, the script issues a warning but continues to run.

If you do not specify a number for this property, the orchestration script uses the default value of 2 cores.

DGRAPH_RAM_SIZE

This property does not set the RAM size that the Dgraph will actually use.

Instead, the orchestration script uses this value (in KB) at runtime to check whether each Dgraph node has the minimum amount of RAM required for hosting the Dgraph and the HDFS Agent.

The value you enter should be less than or equal to the total amount of RAM available on the node.

If you enter a value that is greater than the total amount of RAM available on the Dgraph nodes, the script issues a warning but continues to run.

If you do not specify a number for this property, the orchestration script uses the default value of 2048000 KB.

DGRAPH_OUT_FILE The path to the Dgraph's stdout/stderr file. The default value is ${BDD_HOME}/logs/dgraph.out.
DGRAPH_INDEX_DIR The path to the directory on the shared NFS in which the Dgraph index (named by DGRAPH_INDEX_NAME) will be located. The orchestration script will create this directory if it does not already exist.
The default value is /share/bdd_dgraph_index. If you are installing with an existing index, be sure to change the value of this property to the name of the directory the index is located in.
Important: If DGRAPH_INDEX_NAME is set to base, the orchestration script will delete any files in this location and replace them with an empty index.
DGRAPH_INDEX_NAME The name of the Dgraph index, which will be located in the directory defined by DGRAPH_INDEX_DIR. The default value is base.
Important: If you do not change this value, the orchestration script will delete all files in the DGRAPH_INDEX_DIR and create an empty index named base. Only use this value if you want to install with an empty index.

If you are installing with an existing index, move the index to the directory defined by DGRAPH_INDEX_DIR and change the value of this property to the name of the index you are using. If the index does not exist in the DGRAPH_INDEX_DIR location, the orchestration script will fail.

Do not include _indexes in the index's name. For example, if you have an index named product_indexes, you should only specify product.

DGRAPH_THREADS The number of threads the Dgraph starts with. There is no default value for this property, so you must provide one. Oracle recommends the following:
  • For machines running only the Dgraph, the number of threads should be equal to the number of CPU cores on the machine.
  • For machines running the Dgraph and other BDD components, the number of threads should be the number of CPU cores minus 2. For example, a machine with 4 cores should have 2 threads.

Be sure that the number you use is in compliance with the licensing agreement.

DGRAPH_CACHE The size of the Dgraph cache, in MB. There is no default value for this property, so you must provide one.

You only need to specify the number of MB to allocate to the cache. For example, a value of 50 sets the cache size to 50MB.

For enhanced performance, Oracle recommends allocating at least 50% of the node's available RAM to the Dgraph cache. If you later find that queries are getting cancelled because there is not enough available memory to process them, experiment with gradually decreasing this amount.

DGRAPH_WS_PORT The port number the Dgraph web service runs on. This number must be unique. The default value is 7010.
DGRAPH_BULKLOAD_PORT The port on which the Dgraph listens for bulk load ingest requests. This number must be unique. The default value is 7019.
COORDINATOR_INDEX The index of the Dgraph cluster in the ZooKeeper ensemble. ZooKeeper uses this value to identify the cluster. The default value is cluster1.

Note that this property is not related to the Dgraph index.

DGRAPH_ADDITIONAL_ARG Defines one or more flags to start the Dgraph with. More information on Dgraph flags is available in the Oracle Big Data Discovery Administrator's Guide.
Note: This property is only intended for use by Oracle Support. Do not provide a value for this property when installing BDD.
AGENT_PORT The port on which the HDFS Agent listens for HTTP requests. This number must be unique. The default value is 7102.
AGENT_EXPORT_PORT The port on which the HDFS Agent listens for requests from the Dgraph. This number must be unique. The default value is 7101.
AGENT_OUT_FILE The path to the HDFS Agent's stdout/stderr file. The default value is ${BDD_HOME}/logs/dgraphHDFSAgent.out.
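
To make the sizing guidance concrete: on a hypothetical Dgraph node with 8 CPU cores and 64GB of RAM that also hosts other BDD components, the recommendations above work out to DGRAPH_THREADS=6 (8 cores minus 2) and DGRAPH_CACHE=32768 (half of 64GB, expressed in MB). A sketch of this section for two hypothetical Dgraph hosts, again assuming NAME=value syntax and keeping the documented defaults elsewhere, might look like this:

DGRAPH_SERVERS=dgraph01.example.com,dgraph02.example.com
DGRAPH_CPU_CORES=2
DGRAPH_RAM_SIZE=2048000
DGRAPH_OUT_FILE=${BDD_HOME}/logs/dgraph.out
DGRAPH_INDEX_DIR=/share/bdd_dgraph_index
# base creates an empty index and removes any existing files in DGRAPH_INDEX_DIR
DGRAPH_INDEX_NAME=base
DGRAPH_THREADS=6
DGRAPH_CACHE=32768
DGRAPH_WS_PORT=7010
DGRAPH_BULKLOAD_PORT=7019
COORDINATOR_INDEX=cluster1
AGENT_PORT=7102
AGENT_EXPORT_PORT=7101
AGENT_OUT_FILE=${BDD_HOME}/logs/dgraphHDFSAgent.out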

Data Processing settings

The seventh section in bdd.conf configures Data Processing and the Hive Table Detector.

Configuration property Description and possible settings
HDFS_DP_USER_DIR The location within the HDFS /user directory that stores the Avro files created when users export data from BDD. The orchestration script will create this directory if it does not already exist. The name of this directory must not include spaces.

The default value is bdd.

ENABLE_HIVE_TABLE_DETECTOR Enables and disables the Hive Table Detector. When set to TRUE, the Hive Table Detector runs automatically on the server defined by DETECTOR_SERVER. When set to FALSE, the Hive Table Detector does not run. The default value is FALSE.
DETECTOR_SERVER The fully qualified hostname of the server the Hive Table Detector runs on. This must be one of the WebLogic Managed Servers. The default value is ${ADMIN_SERVER}.

If you are installing on a single machine, this property should be set to ${ADMIN_SERVER}, or the orchestration script will fail.

DETECTOR_HIVE_DATABASE The name of the Hive database that the Hive Table Detector monitors.

The default value is default. This is the same as the default value of HIVE_DATABASE_NAME, which is used by Studio and the CLI. It is possible to use different databases for these two properties, but for a first-time installation it is recommended that you use the same one.

DETECTOR_MAXIMUM_WAIT_TIME The maximum amount of time (in seconds) that the Hive Table Detector waits between update jobs. The default value is 1800.
DETECTOR_SCHEDULE A cron-format schedule that specifies how often the Hive Table Detector runs. The value must be enclosed in quotes. The default value is "0 0 * * *", which runs the Hive Table Detector at midnight every day.
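
A sketch of this section with the Hive Table Detector enabled on the Admin Server, using the documented defaults elsewhere, might look like the following; the quoted schedule fields follow the standard cron order (minute, hour, day of month, month, day of week):

HDFS_DP_USER_DIR=bdd
ENABLE_HIVE_TABLE_DETECTOR=TRUE
DETECTOR_SERVER=${ADMIN_SERVER}
DETECTOR_HIVE_DATABASE=default
DETECTOR_MAXIMUM_WAIT_TIME=1800
DETECTOR_SCHEDULE="0 0 * * *"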

CLI settings

The final section in bdd.conf configures the CLI. These properties are used in both Studio and Data Processing.

Configuration property Description and possible settings
ENABLE_ENRICHMENTS Determines whether data enrichments are run during the sampling phase of data processing. This setting controls the Language Detection, Term Extraction, Geocoding Address, Geocoding IP, and Reverse Geotagger modules.

When set to true, all of the data enrichments run, and when set to false, none of them run. The default value is true.

For more information on data enrichments, see the Data Processing Guide.

JAVA_PATH The path to the Java binaries within the Java installation, which should be in the same location on each server in the cluster. The default value is ${JAVA_HOME}/bin/java.

Note that this property is not the same as JAVA_HOME.

MAX_RECORDS The maximum number of records included in a data set. For example, if a Hive table has 1,000,000 records, you could restrict the total number of sampled records to 100,000.

Note that the actual number of records in each data set will sometimes be slightly more than or slightly less than the value of MAX_RECORDS.

The default value is 1000000.

SPARK_EXECUTOR_MEMORY The amount of memory that Data Processing jobs request from the Spark worker nodes. The default value is 48g. You should increase this value if you plan on processing very large Hive tables.

The value of this property must be equal to or less than the value of Spark's Total Java Heap Sizes of Worker's Executors in Bytes (executor_total_max_heapsize) property. To access the executor_total_max_heapsize property, open Cloudera Manager and select Clusters > Spark (Standalone), click the Configuration tab, and select the Worker Default Group category.

SANDBOX_PATH The path to the HDFS directory in which the Avro files created when users export data from BDD are stored. The default value is /user/${HDFS_DP_USER_DIR}.
LANGUAGE Specifies either a supported ISO-639 language code (en, de, fr, etc.) or a value of unknown to set the language property for all attributes in the data set. This controls whether Oracle Language Technology (OLT) libraries are invoked during indexing.

A language code requires more processing but produces better processing and indexing results by using OLT libraries for the specified language. If the value is unknown, the processing time is faster but the processing and indexing results are more generic and OLT is not invoked.

The default value is unknown.

HIVE_DATABASE_NAME The name of the Hive database that stores the source data for Studio data sets. This is used by Studio as well as the CLI.

The default value is default. This is the same as the default value of DETECTOR_HIVE_DATABASE, which is used by the Hive Table Detector. It is possible to use different databases for these two properties, but for a first-time installation it is recommended that you use the same one.
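
Finally, a sketch of the CLI section that caps data sets at 100,000 records and otherwise keeps the documented defaults might look like the following, again assuming NAME=value syntax:

ENABLE_ENRICHMENTS=true
JAVA_PATH=${JAVA_HOME}/bin/java
MAX_RECORDS=100000
SPARK_EXECUTOR_MEMORY=48g
SANDBOX_PATH=/user/${HDFS_DP_USER_DIR}
LANGUAGE=unknown
HIVE_DATABASE_NAME=default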