The orchestration script

BDD uses a single script, called the orchestration script, to install and deploy its components all at once. When the script finishes, BDD will be completely installed and your cluster will be up and running.

The orchestration script is contained in one of the BDD installation packages, which you will download to a single directory on the Admin Server. You must perform the entire installation process, including running the orchestration script, from this location.

The same installation package also contains the script's configuration file, bdd.conf, which defines the configuration of your cluster and provides the script with information it requires at runtime. You must update this file with information specific to your system and BDD cluster configuration before you run the orchestration script.

Silent installation

Normally, when the orchestration script runs, it prompts you to enter the following information:
  • The username and password for Cloudera Manager, which it uses to query Cloudera Manager for information related to your CDH cluster.
  • The username and password for the WebLogic Server administrator. The script will create this user when it deploys WebLogic.
  • The username and password for your Studio database, which it requires to connect Studio to the database.

You can avoid these prompts by running the script in silent mode. To do this, set the following environment variables on your system before running the script. When the script runs, it checks for these variables and executes silently if it finds them.

This table describes the environment variables required to run the orchestration script in silent mode.

Environment variable    Value
CM_USER                 The username for Cloudera Manager.
CM_PASSWORD             The password for Cloudera Manager.
WLS_USERNAME            The username for the WebLogic Server administrator.
WLS_PASSWORD            The password for the WebLogic Server administrator. Remember that this must contain at least 8 characters, one of which must be a number, and cannot start with a number.
STUDIO_JDBC_USERNAME    The username for your Studio database.
STUDIO_JDBC_PASSWORD    The password for your Studio database.
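
As a pre-flight check before a silent run, the following Python sketch reports which of the variables listed above are unset and tests WLS_PASSWORD against the password rule stated in the table. The helper names missing_silent_vars and wls_password_ok are hypothetical and are not part of BDD; the sketch only illustrates the documented requirements and does not reproduce any check the orchestration script itself performs.

```python
import os
import string

# Variable names are taken from the table above.
SILENT_VARS = [
    "CM_USER",
    "CM_PASSWORD",
    "WLS_USERNAME",
    "WLS_PASSWORD",
    "STUDIO_JDBC_USERNAME",
    "STUDIO_JDBC_PASSWORD",
]


def missing_silent_vars(env):
    """Return the required variables that are unset or empty in env."""
    return [name for name in SILENT_VARS if not env.get(name)]


def wls_password_ok(password):
    """Check the documented rule: at least 8 characters, at least one
    digit, and no digit in the first position."""
    return (
        len(password) >= 8
        and any(ch in string.digits for ch in password)
        and password[0] not in string.digits
    )


if __name__ == "__main__":
    missing = missing_silent_vars(os.environ)
    if missing:
        print("Not ready for silent mode; missing:", ", ".join(missing))
    elif not wls_password_ok(os.environ["WLS_PASSWORD"]):
        print("WLS_PASSWORD does not meet the documented password rule")
    else:
        print("All silent-mode variables are set")
```

Running this in the same shell session that will launch the orchestration script shows the environment the script would inherit.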

Orchestration script behavior

The following diagram illustrates the behavior of the orchestration script.

Note: This diagram shows how the orchestration script distributes portions of the BDD installation packages across the nodes in the deployment. It is not intended to show how many nodes you can have in your deployment. For deployment scenarios, including options for co-locating different parts of BDD on the same nodes, see Deployment configurations and diagrams.

[Diagram: how the orchestration script installs and deploys the parts of Big Data Discovery.]

When the script runs, it does the following:
  1. Reads and validates bdd.conf.
  2. Prompts you for the usernames and passwords for Cloudera Manager, the WebLogic Server administrator, and your Studio database. In silent mode, it reads these values from the environment variables instead.
  3. Queries Cloudera Manager for CDH-related information, including the hostnames and port numbers of specific CDH nodes.
  4. Verifies that the Managed Server nodes and Dgraph nodes meet the minimum CPU and RAM requirements defined in bdd.conf.
  5. Verifies that the COORDINATOR_INDEX defined in bdd.conf does not exist.
  6. Verifies that the Hive database defined in bdd.conf exists.
  7. Distributes the installation packages to each node in the cluster according to the configuration defined in bdd.conf.
  8. Verifies that each node meets all other requirements, including those for the operating system and the JDK.
  9. If the FORCE property in bdd.conf is set to TRUE, deletes the ORACLE_HOME directory from each node.
  10. Installs the components:
    1. Installs WebLogic Server (including Studio and the Dgraph Gateway) on the Admin Server node and all Managed Server nodes.
    2. Installs the Dgraph and HDFS Agent on all nodes that will host Dgraph instances.
    3. Installs Data Processing on the HDFS node and all Spark servers.
    4. Installs the Data Processing CLI on all Managed Server nodes.
    5. Installs the bdd-admin script on all Managed Server nodes, Dgraph nodes, Spark worker nodes, and YARN node manager servers (not shown in the diagram).
  11. Deploys Data Processing:
    1. Deploys Data Processing to the HDFS node and all Spark nodes.
    2. Deploys the CLI to all Managed Server nodes.
    3. If configured to do so, deploys the Hive Table Detector to the specified node and starts it.
  12. Deploys WebLogic Server:
    1. Creates the WebLogic domain and the Managed Servers.
    2. Deploys the Dgraph Gateway and Studio as applications within the WebLogic domain.
    3. Deploys WebLogic as a service on all Managed Servers.
    4. Starts all Managed Servers.
  13. Deploys the Dgraph and HDFS Agent:
    1. Deploys both components as services to all Dgraph nodes.
    2. If configured to do so, creates the empty Dgraph index files on the NFS.
    3. Starts the Dgraph and HDFS Agent.
  14. Verifies that the entire BDD deployment cluster is running.
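
Because step 9 is destructive, it can help to check the FORCE setting before starting a run. The Python sketch below reads a configuration excerpt and reports whether step 9 would apply. It rests on several assumptions that the text above does not confirm: that bdd.conf consists of KEY=VALUE lines with # comments, that ORACLE_HOME is defined in the same file, and that the value must match TRUE exactly. The functions parse_conf and force_enabled and the sample values are hypothetical.

```python
def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    The KEY=VALUE layout is an assumption about bdd.conf, not a
    documented format.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        props[key.strip()] = value.strip().strip('"')
    return props


def force_enabled(props):
    """Return True when FORCE is set to TRUE (the condition in step 9)."""
    return props.get("FORCE", "") == "TRUE"


# Hypothetical excerpt; the values are placeholders, not defaults.
SAMPLE = """
# excerpt of a configuration file
ORACLE_HOME=/path/to/oracle_home
FORCE=TRUE
"""

props = parse_conf(SAMPLE)
if force_enabled(props):
    print("Step 9 would delete", props["ORACLE_HOME"], "on each node")
```

To inspect a real file, the same functions could be given the contents of bdd.conf read from disk in place of SAMPLE.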