2 Installing Oracle Big Data SQL

Oracle Big Data SQL 3.0 can connect Oracle Database to the Hadoop environment on Oracle Big Data Appliance, on other systems based on CDH (Cloudera's Distribution including Apache Hadoop) or HDP (Hortonworks Data Platform), and potentially on other non-CDH Hadoop systems.

The procedures for installing Oracle Big Data SQL in these environments differ. To install the product in your particular environment, see the appropriate section:

2.1 Oracle Big Data SQL Compatibility Matrix

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support for up-to-date information on Big Data SQL compatibility with the following:

  • Oracle Engineered Systems.

  • Other systems.

  • Linux OS distributions and versions.

  • Hadoop distributions.

  • Oracle Database releases, including required patches.

2.2 Installing On Oracle Big Data Appliance and the Oracle Exadata Database Machine

To use Oracle Big Data SQL on an Oracle Exadata Database Machine connected to Oracle Big Data Appliance, you must install the Oracle Big Data SQL software on both systems.

2.2.1 Performing the Installation

Follow these steps to install the Oracle Big Data SQL software on Oracle Big Data Appliance and Oracle Exadata Database Machine.

Note:

This procedure is not applicable to the installation of Oracle Big Data SQL on systems other than Oracle Big Data Appliance and Oracle Exadata Database Machine.

The January 2016 Bundle Patch (12.1.0.2.160119 BP) for Oracle Database must be pre-installed on the Exadata Database Machine. Earlier Bundle Patches are not supported at this time.

  1. Download the Oracle Database one-off patch 22778199.
  2. On all Oracle Exadata Database Machine compute servers, install the patch on:
    • Grid Infrastructure home

    • Oracle Database home

    Remember to run the Datapatch part of the Bundle Patch. See the patch README for step-by-step instructions for installing the patch.

  3. On Oracle Big Data Appliance, install or upgrade the software to the latest version. See the Oracle Big Data Appliance Owner's Guide. However, skip the Oracle Big Data SQL 2.0 installation option in Mammoth and do not enable Oracle Big Data SQL Release 2.0. If Release 2.0 is present, it must be uninstalled before installing Oracle Big Data SQL 3.0.
  4. On Oracle Big Data Appliance, download the Oracle Big Data SQL 3.0 patch.
    Patch 22911748: PATCH FOR BIG DATA SQL V3.0.0 ON BDA V4.4.0 FOR ORACLE LINUX 6
    Follow the instructions in the patch README file.
  5. On each Oracle Exadata Database Machine, run the post-installation script.

    See "Running the Post-Installation Script for Oracle Big Data SQL".

    You must run the post-installation script on every node of the Exadata database cluster.

You can use Cloudera Manager to verify that Oracle Big Data SQL is up and running.

When you are done, if the cluster is secured by Kerberos, there are additional steps you must perform on both the cluster nodes and the Oracle Exadata Database Machine. See Enabling Oracle Big Data SQL Access to a Kerberized Cluster.

In the case of an Oracle Big Data Appliance upgrade, the customer is responsible for upgrading the Oracle Database to a supported level before re-running the post-installation script.

2.2.2 Running the Post-Installation Script for Oracle Big Data SQL

Important

Run bds-exa-install.sh on every node of the Exadata cluster. If this is not done, you will see RPC connection errors when the BDS service is started.

To run the Oracle Big Data SQL post-installation script:

  1. Copy the bds-exa-install.sh installation script from the Oracle Big Data Appliance to a temporary directory on the Oracle Exadata Database Machine. (Find the script on the node where Mammoth is installed, typically the first node in the cluster.) For example:

    # curl -O http://bda1node07/bda/bds-exa-install.sh
    
  2. Verify the name of the Oracle installation owner and set the executable bit for this user. Typically, the oracle user owns the installation. For example:

    $ ls -l bds-exa-install.sh
    $ chown oracle:oinstall bds-exa-install.sh
    $ chmod +x bds-exa-install.sh
    
  3. Set the following environment variables:

    export ORACLE_HOME=<database home>
    export ORACLE_SID=<database SID>
    export GI_HOME=<grid home>
    

    Note:

    You can pass the grid home to the install script with the --grid-home option, as described in step 5, instead of setting $GI_HOME in this step.

  4. Check that TNS_ADMIN points to the directory containing the listener.ora used by the running listener. If the listener uses the default TNS_ADMIN location, $ORACLE_HOME/network/admin, then there is no need to define TNS_ADMIN. If the listener is in a non-default location, point TNS_ADMIN to it with the command:

    export TNS_ADMIN=<path to listener.ora>
    
  5. Perform this step only if the ORACLE_SID is in uppercase; otherwise, proceed to the next step. The install script can derive the CRS database resource from ORACLE_SID only when the SID is lowercase. If the SID is uppercase, perform the following sequence of steps to pass the resource name to the script manually:

    1. Run the following command to list all the resources.

      $ crsctl stat res -t
      
    2. From the output note down the ora.<dbresource>.db resource name.

    3. Run the following command to verify that the correct ora.<dbresource>.db resource name is returned.

      $ ./crsctl stat res ora.<dbresource>.db
      

      The output displays the resource names as follows:

      NAME=ora.<dbresource>.db
      TYPE=ora.database.type
      TARGET=ONLINE , ONLINE
      STATE=ONLINE on <name01>, ONLINE on <name02>
      
    4. Specify the --db-name=<dbresource> as additional argument to the install script as follows:

      ./bds-exa-install.sh --db-name=<dbresource>
      

      Additionally, you can set the grid home instead of setting the $GI_HOME as mentioned in step 3, along with the above command as follows:

      ./bds-exa-install.sh --db-name=<dbresource> --grid-home=<grid home>
      

      Note:

      If you performed this step, you can skip the next step.

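The derivation described above can be sketched as follows. This is an illustration only, not the install script's actual internal logic: lowercasing an uppercase ORACLE_SID gives you the candidate resource name to confirm with crsctl and pass via --db-name.

```shell
# Illustration: derive a candidate CRS resource name from an uppercase SID.
# Example SID shown; confirm the result against "crsctl stat res -t" output.
ORACLE_SID=ORCL
db_res=$(printf '%s' "$ORACLE_SID" | tr '[:upper:]' '[:lower:]')
echo "ora.${db_res}.db"    # prints ora.orcl.db
```
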
  6. Run the script as any user who has dba privileges (who can connect to sys as sysdba).

    ./bds-exa-install.sh
    

    When prompted by the script, run the generated root shell script as root in another session, then return to the original session to continue as the oracle user. For example:

    $ ./bds-exa-install.sh
    bds-exa-install: root shell script         : /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bds-root-<cluster-name>-setup.sh
    please run as root:
    /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bds-root-<rack-name>-clu-setup.sh
    

    A sample output is shown here:

    bds-exa-install: platform is Linux
    bds-exa-install: setup script started at   : Sun Feb 14 20:06:17 PST 2016
    bds-exa-install: bds version               : bds-3.0-0.el6.x86_64
    bds-exa-install: bda cluster name          : mycluster1
    bds-exa-install: bda web server            : mycluster1bda16.us.oracle.com
    bds-exa-install: cloudera manager url      : mycluster1bda18.us.oracle.com:7180
    bds-exa-install: hive version              : hive-1.1.0-cdh5.5.1
    bds-exa-install: hadoop version            : hadoop-2.6.0-cdh5.5.1
    bds-exa-install: bds install date          : 02/14/2016 12:00 PST
    bds-exa-install: bd_cell version           : bd_cell-12.1.2.0.100_LINUX.X64_160131-1.x86_64
    bds-exa-install: action                    : setup
    bds-exa-install: crs                       : true
    bds-exa-install: db resource               : orcl
    bds-exa-install: database type             : SINGLE
    bds-exa-install: cardinality               : 1
    bds-exa-install: root shell script         : /u03/app/oracle/product/12.1.0/dbhome_1/install/bds-root-mycluster1-setup.sh
    please run as root:
    
    /u03/app/oracle/product/12.1.0/dbhome_1/install/bds-root-mycluster1-setup.sh
    
    waiting for root script to complete, press <enter> to continue checking.. q<enter> to quit
    bds-exa-install: root script seem to have succeeded, continuing with setup bds
    bds-exa-install: working directory         : /u03/app/oracle/product/12.1.0/dbhome_1/install
    bds-exa-install: downloading JDK
    bds-exa-install: working directory         : /u03/app/oracle/product/12.1.0/dbhome_1/install
    bds-exa-install: installing JDK tarball
    bds-exa-install: working directory         : /u03/app/oracle/product/12.1.0/dbhome_1/bigdatasql/jdk1.8.0_66/jre/lib/security
    bds-exa-install: Copying JCE policy jars
    /bin/mkdir: cannot create directory `bigdata_config/mycluster1': File exists
    bds-exa-install: working directory         : /u03/app/oracle/product/12.1.0/dbhome_1/bigdatasql/jlib
    bds-exa-install: removing old oracle bds jars if any
    bds-exa-install: downloading oracle bds jars
    bds-exa-install: installing oracle bds jars
    bds-exa-install: working directory         : /u03/app/oracle/product/12.1.0/dbhome_1/bigdatasql
    bds-exa-install: downloading               : hadoop-2.6.0-cdh5.5.1.tar.gz
    bds-exa-install: downloading               : hive-1.1.0-cdh5.5.1.tar.gz
    bds-exa-install: unpacking                 : hadoop-2.6.0-cdh5.5.1.tar.gz
    bds-exa-install: unpacking                 : hive-1.1.0-cdh5.5.1.tar.gz
    bds-exa-install: working directory         : /u03/app/oracle/product/12.1.0/dbhome_1/bigdatasql/hadoop-2.6.0-cdh5.5.1/lib
    bds-exa-install: downloading               : cdh-ol6-native.tar.gz
    bds-exa-install: creating /u03/app/oracle/product/12.1.0/dbhome_1/bigdatasql/hadoop_mycluster1.env for hdfs/mapred client access 
    bds-exa-install: working directory         : /u03/app/oracle/product/12.1.0/dbhome_1/bigdatasql
    bds-exa-install: creating bds property files
    bds-exa-install: working directory         : /u03/app/oracle/product/12.1.0/dbhome_1/bigdatasql/bigdata_config
    bds-exa-install: created bigdata.properties
    bds-exa-install: created  bigdata-log4j.properties
    bds-exa-install: creating default and cluster directories needed by big data external tables
    bds-exa-install: note this will grant default and cluster directories to public!
    catcon: ALL catcon-related output will be written to /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_catcon_29579.lst
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon*.log files for output generated by scripts
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_*.lst files for spool files, if any
    catcon.pl: completed successfully
    bds-exa-install: granted default and cluster directories to public!
    bds-exa-install: mta set to use listener end point : EXTPROC1521
    bds-exa-install: mta will be setup
    bds-exa-install: creating /u03/app/oracle/product/12.1.0/dbhome_1/hs/admin/initbds_orcl_mycluster1.ora
    bds-exa-install: mta setting agent home as : /u03/app/oracle/product/12.1.0/dbhome_1/hs/admin
    bds-exa-install: mta shutdown              : bds_orcl_mycluster1
    bds-exa-install: registering crs resource  : bds_orcl_mycluster1
    bds-exa-install: using dependency db resource of orcl
    bds-exa-install: starting crs resource     : bds_orcl_mycluster1
    CRS-2672: Attempting to start 'bds_orcl_mycluster1' on 'mycluster1bda09'
    CRS-2676: Start of 'bds_orcl_mycluster1' on 'mycluster1bda09' succeeded
    NAME=bds_orcl_mycluster1
    TYPE=generic_application
    TARGET=ONLINE
    STATE=ONLINE on mycluster1bda09
    
    bds-exa-install: patching view LOADER_DIR_OBJS
    catcon: ALL catcon-related output will be written to /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_catcon_30123.lst
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon*.log files for output generated by scripts
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_*.lst files for spool files, if any
    catcon.pl: completed successfully
    bds-exa-install: creating mta dblinks
    bds-exa-install: cluster name              : mycluster1
    bds-exa-install: extproc sid               : bds_orcl_mycluster1
    bds-exa-install: cdb                       : true
    catcon: ALL catcon-related output will be written to /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_dbcluster_dropdblink_catcon_30153.lst
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_dbcluster_dropdblink*.log files for output generated by scripts
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_dbcluster_dropdblink_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_default_dropdblink_catcon_30179.lst
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_default_dropdblink*.log files for output generated by scripts
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_default_dropdblink_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_dbcluster_createdblink_catcon_30205.lst
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_dbcluster_createdblink*.log files for output generated by scripts
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_dbcluster_createdblink_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_default_createdblink_catcon_30231.lst
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_default_createdblink*.log files for output generated by scripts
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_default_createdblink_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_catcon_30257.lst
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon*.log files for output generated by scripts
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_catcon_30283.lst
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon*.log files for output generated by scripts
    catcon: See /u03/app/oracle/product/12.1.0/dbhome_1/install/bdscatcon_*.lst files for spool files, if any
    catcon.pl: completed successfully
    bds-exa-install: setup script completed all steps
    

    For additional details see "Running the bds-exa-install Script".

  7. If you have a multi-instance database, repeat step 6 for each database instance.

When the script completes, the following items are in place, and Oracle Big Data SQL is available and running on the database instance. However, if events cause the Oracle Big Data SQL agent to stop, then you must restart it. See "Starting and Stopping the Big Data SQL Agent".

  • The Oracle Big Data SQL directory and configuration, with jar, environment, and properties files.

  • Database dba_directories.

  • Database dblinks.

  • Database big data spfile parameter.

    For example, you can verify the dba_directories from the SQL prompt as follows:

    SQL> select * from dba_directories where directory_name like '%BIGDATA%';
    

2.2.2.1 Running the bds-exa-install Script

The bds-exa-install script generates a custom installation script that is run by the owner of the Oracle home directory. That secondary script installs all the files needed by Oracle Big Data SQL into the $ORACLE_HOME/bigdatasql directory. For Oracle NoSQL Database support, it installs the client library (kvclient.jar). It also creates the database directory objects, and the database links for the multithreaded Oracle Big Data SQL agent.

2.2.2.2 bds-exa-install Syntax

The following is the bds-exa-install syntax:

Usage: bds-exa-install oracle-sid=<orcl>
           (
                --version
                --info
                --root-script-only
                --uninstall-as-primary
                --uninstall-as-secondary
                --install-as-secondary
                --jdk-home=<dir>
                --grid-home=<dir>
           )*

Options
   --version
      Prints the script version.
   --info
      Prints information such as the cluster name, CM host, and Oracle Big Data Appliance HTTP server.
   --root-script-only
      Only generates the root script.
   --uninstall-as-primary
      Uninstalls the cluster (scaj31cdh in this example), including the Hadoop client jars.
      Note: after this, any secondary clusters need to be reinstalled.
   --uninstall-as-secondary
      Attempts to uninstall the cluster (scaj31cdh in this example) as a secondary cluster.
   --install-as-secondary
      Default = false.
      Does not install client libraries, etc. The primary cluster is not affected.
   --jdk-home=<dir>
      For example: /opt/oracle/bd_cell12.1.2.0.100_LINUX.X64_150912.1/jdk
   --grid-home=<dir>
      Oracle Grid Infrastructure home.
      For example: /opt/oracle/bd_cell12.1.2.0.100_LINUX.X64_150912.1/../grid

2.2.2.3 Troubleshooting the bds-exa-install Script

If you encounter problems running the install script on Exadata, perform the following steps and open an SR with Oracle Support, including the details collected:

  1. Collect the debug output by running the script in a debug mode as follows:

    $ ./bds-exa-install.sh --db-name=<dbresource> --grid-home=<grid home>  --root-script=false --debug
    OR
    $ ./bds-exa-install.sh --root-script=false --debug
    
  2. Collect the Oracle Database version as follows:

    1. Collect the result of opatch lsinventory from RDBMS-RAC Home.

    2. Collect the result of opatch lsinventory from the Grid home.

  3. Collect the result of the following SQL statement to confirm that Datapatch has been applied:

    SQL> select patch_id, patch_uid, version, bundle_series, bundle_id, action, status from dba_registry_sqlpatch;
    
  4. Collect the information from the following environment variables:

    • $ORACLE_HOME

    • $ORACLE_SID

    • $GI_HOME

    • $TNS_ADMIN

  5. Collect the result of running the lsnrctl status command.
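
The items above can be gathered with a small helper before opening the SR. This is a convenience sketch, not part of the product; the output directory name is illustrative, and lsnrctl is captured only if it is available on the path.

```shell
# Collect environment details for the SR into a scratch directory.
outdir="/tmp/bds_diag.$$"
mkdir -p "$outdir"
{
  echo "ORACLE_HOME=${ORACLE_HOME:-(not set)}"
  echo "ORACLE_SID=${ORACLE_SID:-(not set)}"
  echo "GI_HOME=${GI_HOME:-(not set)}"
  echo "TNS_ADMIN=${TNS_ADMIN:-(not set)}"
} > "$outdir/env.txt"
lsnrctl status > "$outdir/lsnrctl.txt" 2>&1 || true   # listener status, if available
echo "diagnostics written to $outdir"
```

Attach the resulting files, along with the opatch lsinventory output and the SQL result, to the SR.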

2.2.3 About Data Security with Oracle Big Data SQL

Oracle Big Data Appliance already provides numerous security features to protect data stored in a CDH cluster on Oracle Big Data Appliance:

  • Kerberos authentication: Requires users and client software to provide credentials before accessing the cluster.

  • Apache Sentry authorization: Provides fine-grained, role-based authorization to data and metadata.

  • HDFS Transparent Encryption: Protects the data on disk and at rest. Data encryption and decryption is transparent to applications using the data.

  • Oracle Audit Vault and Database Firewall monitoring: The Audit Vault plug-in on Oracle Big Data Appliance collects audit and logging data from MapReduce, HDFS, and Oozie services. You can then use Audit Vault Server to monitor these services on Oracle Big Data Appliance.

Oracle Big Data SQL adds the full range of Oracle Database security features to this list. You can apply the same security policies and rules to your Hadoop data that you apply to your relational data.

2.2.4 Enabling Oracle Big Data SQL Access to a Kerberized Cluster

In order to give Oracle Big Data SQL access to HDFS data on a Kerberos-enabled cluster, make each Oracle Exadata Database Machine that needs access a Kerberos client. Also run kinit on the oracle account on each cluster node and Exadata Database Machine to ensure that the account is authenticated by Kerberos. There are two situations where this procedure is required:

  • When enabling Oracle Big Data SQL on a Kerberos-enabled cluster.

  • When enabling Kerberos on a cluster where Oracle Big Data SQL is already installed.

Note:

Oracle Big Data SQL queries will run on the Hadoop cluster as the owner of the Oracle Database process (i.e. the oracle user). Therefore, the oracle user needs a valid Kerberos ticket in order to access data. This ticket is required for every Oracle Database instance that is accessing the cluster. A valid ticket is also needed for each Big Data SQL Server process running on the Oracle Big Data Appliance. Run kinit oracle to obtain the ticket.

These steps enable the operating system user to authenticate with the kinit utility before submitting Oracle SQL Connector for HDFS jobs. The kinit utility typically uses a Kerberos keytab file for authentication without an interactive prompt for a password.

  1. On each node of the cluster:

    1. Log in as the oracle user.

    2. Run kinit on the oracle account.

      $ kinit oracle
      
    3. Enter the Kerberos password.

  2. Log on to the primary node and then stop and restart Oracle Big Data SQL.

    $ bdacli stop big_data_sql_cluster
    $ bdacli start big_data_sql_cluster
    
  3. On all Oracle Exadata Database Machines that need access to the cluster:

    1. Copy the Kerberos configuration file /etc/krb5.conf from the node where Mammoth is installed to the same path on each Oracle Exadata Machine.

    2. Run kinit on the oracle account and enter the Kerberos password.

    3. Re-run the Oracle Big Data SQL post-installation script

      $ ./bds-exa-install.sh
      

Avoiding Kerberos Ticket Expiration

The system should run kinit on a regular basis, before letting the Kerberos ticket expire, to enable Oracle SQL Connector for HDFS to authenticate transparently. Use cron or a similar utility to run kinit. For example, if Kerberos tickets expire every two weeks, then set up a cron job to renew the ticket weekly.
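
For example, a weekly renewal entry in the oracle user's crontab might look like the following. This is a sketch: the keytab path is hypothetical, and kinit -k -t authenticates from the keytab so no interactive password prompt is needed.

```shell
# Sketch: a weekly Kerberos ticket renewal for the oracle user.
# The keytab path below is hypothetical; adjust the schedule and paths as needed.
entry='0 2 * * 0 /usr/bin/kinit -k -t /home/oracle/oracle.keytab oracle'
printf '%s\n' "$entry"   # add this line to the oracle user's crontab (crontab -e)
```
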

2.2.5 Starting and Stopping the Big Data SQL Agent

The Big Data SQL agent on the database is managed by Oracle Clusterware. The agent is registered with Oracle Clusterware during Big Data SQL installation to automatically start and stop with the database. To check the status, you can run mtactl check from the Oracle Grid Infrastructure home or Oracle Clusterware home:

# mtactl check bds_databasename_clustername

2.3 Installing Oracle Big Data SQL on Other Hadoop Systems

Oracle Big Data SQL is deployed using the services provided by the cluster management server. The installation process uses the management server API to register the service and start the deployment task. From there, the management server controls the process.

After installing Big Data SQL on the cluster management server, use the tools provided in the bundle to generate an installation package for the database server side.

2.3.1 Downloading Oracle Big Data SQL

You can download Oracle Big Data SQL from the Oracle Software Delivery Cloud.

  1. On the cluster management server, create a new directory or choose an existing one to be the installation source directory.
  2. Log in to the Oracle Software Delivery Cloud.
  3. Search for Oracle Big Data SQL.
  4. Select Oracle Big Data SQL 3.0.0.0.0 for Linux x86-64.
  5. Read and agree to the Oracle Standard Terms and Restrictions.
  6. From the list of three files, select only the installer that is appropriate for your Hadoop system. You do not need V137415-01.zip.
    Big Data SQL (3.0)      
            V137415-01.zip Oracle Big Data SQL 3.0.0 cell software only                        440.6 MB
            V137419-01.zip Oracle Big Data SQL 3.0.0 installer for Cloudera Enterprise         625.4 MB
            V137420-01.zip Oracle Big Data SQL 3.0.0 installer for Hortonworks Data Platform   625.4 MB
    
  7. Download the file and extract the contents.
Your product bundle should include the content listed in the table below.

Table 2-1 Oracle Big Data SQL Product Bundle Inventory

File                                Description
setup-bds                           Cluster-side installation script
bds-config.json                     Configuration file
api_env.sh                          REST API environment setup script
platform_env.sh                     BDS service configuration script
BIGDATASQL-1.0.jar                  CSD file (in the Cloudera product bundle only)
bin/json-select                     JSON-select utility
db/bds-database-create-bundle.sh    Database bundle creation script
db/database-install.zip             Database-side installation files
repo/BIGDATASQL-1.0.0-el6.parcel    Parcel file (in the CDH product bundle only)
repo/manifest.json                  Hash key for the parcel file (in the CDH product bundle only)
BIGDATASQL-1.0.0-el6.stack          Stack file (in the HDP product bundle only)
setup-db.sh                         Script to acquire cluster information (currently used in the manual portion of the HDP cluster-side installation)

2.3.2 Prerequisites for Installing on an HDP Cluster

The following are required to install Oracle Big Data SQL on the Hortonworks Data Platform (HDP).

Services Running

The following services must be running at the time of the Big Data SQL installation.

  • HDP 2.3

  • Ambari 2.1.0

  • HDFS 2.7.1

  • YARN 2.7.1

  • Zookeeper 3.4.6

  • Hive 1.2.1

  • Tez 0.7.0

Packages

The following packages must be pre-installed before installing Big Data SQL.

  • JDK version 1.7 or later

  • Python version 2.6

  • OpenSSL version 1.01 build 16 or later

System Tools

  • curl

  • rpm

  • scp

  • tar

  • unzip

  • wget

  • yum

Environment Settings

The following environment settings are required prior to the installation.

  • ntp enabled

  • iptables disabled

  • Ensure that /usr/java/default exists and is linked to the appropriate Java version. To link it to the latest Java version, perform the following as root:

    $ ln -s /usr/java/latest /usr/java/default
    
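
A quick way to verify the link before installing is the following sketch; it only reports, changes nothing on the system, and assumes the path shown in the text.

```shell
# Check that /usr/java/default resolves to a JDK with a java executable.
if [ -x /usr/java/default/bin/java ]; then
  echo "/usr/java/default looks correct"
else
  echo "/usr/java/default is missing or not linked to a JDK"
fi
```
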

Access Control Settings

The following user and group must exist.

  • oracle user

  • oinstall group

The oracle user must be a member of the oinstall group.
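
These requirements can be checked with standard tools. The following is a verification sketch using the user and group names from the text; it changes nothing on the system.

```shell
# Verify the oracle user and oinstall group exist and are linked.
if getent group oinstall >/dev/null 2>&1 \
   && id -u oracle >/dev/null 2>&1 \
   && id -nG oracle | grep -qw oinstall; then
  echo "access control settings ok"
else
  echo "oracle user / oinstall group setup is incomplete"
fi
```
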

If Oracle Big Data SQL is Already Installed

If the Ambari Web GUI shows that Big Data SQL service is already installed, make sure that all Big Data SQL Cell components are stopped before reinstalling. (Use the actions button, as with any other service.)

2.3.3 Prerequisites for Installing on a CDH Cluster

The following conditions must be met when installing Oracle Big Data SQL on a CDH cluster that is not part of an Oracle Big Data Appliance.

Note:

The installation prerequisites and the procedure for installing Oracle Big Data SQL on Oracle Big Data Appliance differ from the process used for installations on other CDH systems. If you are installing on Oracle Big Data Appliance, see Installing On Oracle Big Data Appliance and the Oracle Exadata Database Machine.

Services Running

The following services must be running at the time of the Oracle Big Data SQL installation.

  • Cloudera’s Distribution including Apache Hadoop (CDH) 5.5 and higher

  • HDFS 2.6.0

  • YARN 2.6.0

  • Zookeeper 3.4.5

  • Hive 1.1.0

Packages

The following packages must be pre-installed before installing Oracle Big Data SQL. The Oracle clients are available for download on the Oracle Technology Network.

  • JDK version 1.7 or later

  • Oracle Instant Client – 12.1.0.2 or higher, e.g. oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm

  • Oracle Instant JDBC Client – 12.1.0.2 or higher, e.g. oracle-instantclient12.1-jdbc-12.1.0.2.0-

  • PERL LibXML – 1.7.0 or higher, e.g. perl-XML-LibXML-1.70-5.el6.x86_64.rpm

  • Apache log4j

System Tools

  • unzip

  • finger

  • wget

Environment Settings

The following environment settings are required prior to the installation.

  • Ensure that /usr/java/default exists and is linked to the appropriate Java version. To link it to the latest Java version, perform the following as root:

    $ ln -s /usr/java/latest /usr/java/default
    
  • The Java binaries must be available under /usr/java/latest.

  • The Hadoop libraries must be in the default location, /opt/cloudera/parcels/CDH/lib/.

Access Control Settings

The following user and group must exist.

  • oracle user

  • oinstall group

The oracle user must be a member of the oinstall group.

Settings to Save Before the Installation

If resource management is enabled on Cloudera Manager, then before installing Big Data SQL, save YARN's resource management configuration so that it can be restored to the original state if Big Data SQL is uninstalled later.

2.3.4 Installation Overview

The Oracle Big Data SQL installation consists of two stages.

  • Cluster-side installation:

    • Deploys binaries across the cluster.

    • Configures Linux and network settings for the service on each cluster node.

    • Configures the service on the management server.

    • Acquires cluster information for configuring the database connection.

    • Creates database bundle for the database side installation.

  • Oracle Database server-side installation:

    • Copies binaries onto the database node.

    • Configures network settings for the service.

    • Inserts cluster metadata into the database.

2.3.5 Installing on the Hadoop Cluster Management Server

The first step of the Oracle Big Data SQL installation is to run the installer on the Hadoop cluster management server (where Cloudera Manager runs on a CDH system or Ambari runs on an HDP system). As a post-installation task on the management server, you then run the script that prepares the installation bundle for the database server.

There are three tasks to perform on the cluster management server:
  • Extract the files from the product bundle you downloaded (either BigDataSQL-CDH-<version>.zip or BigDataSQL-HDP-<version>.zip), then configure and run the Oracle Big Data SQL installer found within the bundle. This installs Oracle Big Data SQL on the local server.

  • Run the database bundle creation script. This script generates the database bundle file that you will run on the Oracle Database server in order to install Oracle Big Data SQL there.

  • Check the parameters in the database bundle file and adjust as needed.

After you have checked and (if necessary) edited the database bundle file, copy it over to the Oracle Database server and run it as described in Installing on the Oracle Database Server.

Install Big Data SQL on the Cluster Management Server

To install Big Data SQL on the cluster management server:

  1. Copy the appropriate zip file (BigDataSQL-CDH-<version>.zip or BigDataSQL-HDP-<version>.zip) to a temporary location on the cluster management server.

  2. Unzip the file.

  3. Change directories to either BigDataSQL-HDP-<version> or BigDataSQL-CDH-<version>, depending on which platform you are working with.

  4. Edit the configuration file.

    Table 2-2 below describes the use of each configuration parameter.

    • For CDH, edit bds-config.json, as in this example. Any unused port will work as the web server port.

      {
        "CLUSTER_NAME" : "cluster",
        "CSD_PATH" : "/opt/cloudera/csd",
        "DATABASE_IP" : "10.12.13.14/24",
        "REST_API_PORT" : "7180",
        "WEB_SERVER_PORT" : "81"
      }
      
    • For HDP, edit bds-config.json as in this example:

      {
        "CLUSTER_NAME" : "clustername",
        "DATABASE_IP" : "10.10.10.10/24",
        "REST_API_PORT" : "8080"
      }
      

    DATABASE_IP must be the correct network interface address for the database node where you will perform the installation. You can confirm this by running /sbin/ip -o -f inet addr show on the database node.
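
Before running the installer you can also sanity-check the value itself. The following sketch uses the example address from the configuration above; it only confirms that the prefix length is present.

```shell
# DATABASE_IP must carry a prefix length, e.g. "10.12.13.14/24".
db_ip="10.12.13.14/24"
case "$db_ip" in
  */*) echo "prefix length present" ;;
  *)   echo "error: DATABASE_IP must include /prefix-length" ;;
esac
```
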

  5. Obtain the cluster administrator user ID and password, and then, as root, run setup-bds, passing the configuration file name (bds-config.json) as an argument. The script prompts for the administrator credentials and then installs BDS on the management server.

    $ ./setup-bds bds-config.json  
    

Table 2-2 Configuration Parameters for setup-bds

Configuration Parameter   Use                                                                Applies To
CLUSTER_NAME              The name of the cluster on the Hadoop server.                      CDH, HDP
CSD_PATH                  Location of Custom Service Descriptor files.                       CDH only
DATABASE_IP               The IP address of the Oracle Database server that will make        CDH, HDP
                          connection requests. The address must include the prefix length
                          (as in 100.112.10.36/24). Although only one IP address is
                          specified in the configuration file, it is possible to install
                          the database-side software on multiple database servers by
                          using a command-line parameter to override DATABASE_IP at
                          installation time. (See the description of --ip-cell in Table 2–6.)
REST_API_PORT             The port where the cluster management server listens for           CDH, HDP
                          requests.
WEB_SERVER_PORT           A port assigned temporarily to a repository for deployment         CDH only
                          tasks during installation. This can be any port where the
                          assignment does not conflict with cluster operations.

Important

Be sure that the address provided for DATABASE_IP is the correct address of a network interface on the database server and is accessible from each DataNode of the Hadoop system; otherwise, the installation will fail. You can test this by confirming that the database IP address replies to a ping from each DataNode. Also, currently the address string (including the prefix length) must be at least nine characters long.
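The two checks described above can be scripted. This is a sketch only, with DATABASE_IP set to the sample value used earlier; substitute your configured value, and run the ping from each DataNode.

```shell
DATABASE_IP=100.112.10.36/24    # substitute your configured value

# The full string, including the prefix length, must be at least nine
# characters long:
[ "${#DATABASE_IP}" -ge 9 ] && echo "address length ok"

# From each DataNode, confirm the bare address (prefix length stripped)
# replies to a ping:
# ping -c 3 "${DATABASE_IP%/*}"
```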

If the Oracle Big Data SQL Service Immediately Fails

If Ambari or Configuration Manager reports an Oracle Big Data SQL service failure immediately after service startup, do the following.

  1. Check the cell server (CELLSRV) log on the cluster management server for the following error at the time of failure:

    ossnet_create_box_handle: failed to parse ip : <IP Address>
    
  2. If the IP address in the error message is less than nine characters in length, for example, 10.0.1.4/24, then on the cluster management server, find this address in /opt/oracle/bd_cell/cellsrv/deploy/config/cellinit.ora. Edit the string by padding one or more of the octets with leading zeros to make the total at least nine characters in length, as in:

    ipaddress1=10.0.1.004/24
    
  3. Restart the Oracle Big Data SQL service.

The need for this workaround will be eliminated in a subsequent Oracle Big Data SQL release.
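The octet padding described in step 2 can be done with a small helper. This is an illustrative sketch, not part of the product; it pads the last octet with leading zeros until the bare address reaches the nine-character minimum.

```shell
# Pad the final octet of a short dotted-quad address with leading zeros
# until the bare address (before the prefix length) is at least nine
# characters long, as the workaround above requires.
pad_ip() {
  local addr="${1%/*}" prefix="${1#*/}"
  while [ "${#addr}" -lt 9 ]; do
    addr="${addr%.*}.0${addr##*.}"   # prepend a zero to the last octet
  done
  printf '%s/%s\n' "$addr" "$prefix"
}
pad_ip 10.0.1.4/24    # prints 10.0.1.04/24 (nine characters before the /)
```

Addresses that are already long enough pass through unchanged.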

2.3.6 Creating the Database-Side Installation Bundle

On the cluster management server, run the database bundle creation script from the Oracle Big Data SQL download to create an installation bundle to install the product on the Oracle Database server. If some of the external resources that the script requires are not accessible from the management server, you can add them manually.

The database bundle creation script attempts to download the following:

  • Hadoop and Hive client tarballs from the Cloudera or Hortonworks repository web sites.

  • Configuration files for Yarn and Hive from the cluster management server, via Cloudera Manager (for the CDH versions) or Ambari (for the HDP versions).

  • For HDP only, HDFS and MapReduce configuration files from Ambari.

  1. Change directories to BigDataSQL-CDH-<version>/db (or BigDataSQL-HDP-<version>/db).

  2. Run the BDS database bundle creation script. See the table below for optional parameters that you can pass to the script in order to override any of the default settings.

    $ bds-database-create-bundle.sh <optional parameters> 
    

    The message below is returned if the operation is successful.

          bds-database-create-bundle: database bundle creation script completed all steps
    

The database bundle creation script accepts a number of command line parameters. You can set any of these parameters as necessary. Any URLs specified must be accessible from the cluster management server at the time you run bds-database-create-bundle.sh.

Table 2-3 Command Line Parameters for bds-database-create-bundle.sh

Parameter Value
--hadoop-client-ws, --no-hadoop-client-ws Specify a URL for the Hadoop client tarball download, or bypass the download of this client.
--hive-client-ws, --no-hive-client-ws Specify a URL for the Hive client tarball download, or bypass the download of this client.
--yarn-conf-ws, --no-yarn-conf-ws Specify a URL for the YARN configuration zip file download, or bypass this download.
--hive-conf-ws, --no-hive-conf-ws Specify a URL for the Hive configuration zip file download, or bypass this download.
--ignore-missing-files Create the bundle file even if some files are missing.
--clean-previous Delete previous bundle files and directories from bds-database-install/.
--script-only Create only the database installation script file.
--hdfs-conf-ws, --no-hdfs-conf-ws Specify a URL for the HDFS configuration zip file download, or bypass this download (HDP only).
--mapreduce-conf-ws, --no-mapreduce-conf-ws Specify a URL for the MapReduce configuration zip file download, or bypass this download (HDP only).

Note:

In Big Data SQL 3.0, bds-database-create-bundle.sh does not include a command line parameter to override the default JDK (jdk-8u66-linux-x64). To include a different version of the JDK, run bds-database-create-bundle.sh twice, as follows.
  1. Run bds-database-create-bundle.sh to generate some files that you will edit or replace.

  2. Remove the existing JDK file, /bds-database-install/jdk-8u66-linux-x64.tar.gz.

  3. Remove the bds-database-install.zip bundle file generated by this first run of the script.

  4. Manually download the JDK tarball from Oracle Technology Network. Copy it into /bds-database-install.

  5. Edit bds-database-install/db/create-bundle.env. Update the $jdktar environment variable to match the JDK you downloaded. For example: jdktar=jdk-8u77-linux-x64.tar.gz

  6. Run bds-database-create-bundle.sh again to generate a new database bundle.
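Steps 4 and 5 amount to dropping in the new tarball and pointing the $jdktar variable at it. A sketch follows; the mkdir and echo lines only simulate the layout left by the first script run (on a real system those files already exist), and the JDK version shown is the example from step 5.

```shell
# Simulate the layout left by the first run of bds-database-create-bundle.sh:
mkdir -p bds-database-install/db
echo 'jdktar=jdk-8u66-linux-x64.tar.gz' > bds-database-install/db/create-bundle.env

# Step 5: point jdktar at the JDK tarball you downloaded manually.
sed -i 's/^jdktar=.*/jdktar=jdk-8u77-linux-x64.tar.gz/' \
  bds-database-install/db/create-bundle.env

cat bds-database-install/db/create-bundle.env
# prints jdktar=jdk-8u77-linux-x64.tar.gz
```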

Manually Adding Resources if Download Sites are not Accessible to the BDS Database Bundle Creation Script

If one or more of the default download sites are inaccessible from the cluster management server, there are two ways to work around the problem:

  • Download the files from another server first and then provide bds-database-create-bundle.sh with the alternate path as an argument. For example:

    $ ./bds-database-create-bundle.sh --yarn-conf-ws='http://nodexample:1234/config/yarn'
    
  • Because the script will first search locally in /bds-database-install for resources, you can download the files to another server, move the files into /bds-database-install on the cluster management server and then run the bundle creation script with no additional argument. For example:

    $ cp hadoop-xxxx.tar.gz bds-database-install/
    $ cp hive-xxxx.tar.gz bds-database-install/
    $ cp yarn-conf.zip bds-database-install/
    $ cp hive-conf.zip bds-database-install/
    $ cd db
    $ ./bds-database-create-bundle.sh
    

Copying the Database Bundle to the Oracle Database Server

Use scp to copy the database bundle you created to the Oracle Database server. In the example below, dbnode is the database server. The Linux account and target directory here are arbitrary. Use any account authorized to scp to the specified path.

$ scp bds-database-install.zip oracle@dbnode:/home/oracle

The next step is to log on to the Oracle Database server and install the bundle.

2.3.7 Installing on the Oracle Database Server

Oracle Big Data SQL must be installed on both the Hadoop cluster management server and the Oracle Database server. This section describes the database server installation.

Prerequisites for Installing on an Oracle Database Server

The information in this section does not apply to the installation of Oracle Big Data SQL on an Oracle Exadata Database Machine connected to Oracle Big Data Appliance.

Important

For multi-node databases, you must repeat this installation on every node of the database. For each node, you may need to modify the DATABASE_IP parameter of the installation bundle in order to identify the correct network interface. This is described in the section If You Need to Change the Configured Database_IP Address.


Required Software

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support for supported Linux distributions, Oracle Database release levels, and required patches.

Note:

Be sure that the correct Bundle Patch and one-off patch have been pre-applied before starting this installation. Earlier Bundle Patches are not supported for use with Big Data SQL 3.0 at this time.

Recommended Network Connections to the Hadoop Cluster

Oracle recommends 10 Gb/s Ethernet connections between Oracle Database and the Hadoop cluster.

Extract and Run the Big Data SQL Installation Script

Perform the procedure in this section as the oracle user, except where sudo is indicated.

  1. Check that /etc/oracle/cell/network-config/cellinit.ora exists. If not, do the following to create it and add the database server IP address:

    1. Create the network-config directory and set permissions:

      $ sudo mkdir -p /etc/oracle/cell/network-config/
      $ sudo chown oracle:dba /etc/oracle/cell/network-config
      $ sudo chmod ug+wx /etc/oracle/cell/network-config
      
    2. Find the private IP address of the database server (through inet addr or other means).

      The address must include the prefix length, as in 100.112.10.36/24.

    3. Create cellinit.ora under network-config and add the IP address as the value of ipaddress1 as shown below. Also be sure to include the two lines that follow.

      ipaddress1=100.112.10.36/24
      _skgxp_ant_options=1
      _skgxp_dynamic_protocol=2
      
  2. Locate the database bundle zip file that you copied over from the cluster management server.

  3. Unzip the bundle into a temporary directory.

  4. Change directories to bds-database-install, which was extracted from the zip file.

  5. Run bds-database-install.sh. Note the optional parameters listed in Table 2-4.

Table 2-4 Optional Parameters for bds-database-install.sh

Parameter Function
--version Show the bds-database-install.sh script version.
--info Show information about the cluster.
--ip-cell Set a particular IP address for the db_cell process. See If You Need to Change the Configured Database_IP Address, below.
--install-as-secondary Specify secondary cluster installation.
--uninstall-as-primary Uninstall Oracle Big Data SQL from the primary cluster.
--uninstall-as-secondary Uninstall Oracle Big Data SQL from a secondary cluster.
--jdk-home Specify the JDK home directory.
--grid-home Specify the Grid home directory.
--db-name Specify the Oracle Database SID.
--debug Activate shell trace mode. If you report a problem, Oracle Support may want to see this output.
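Step 1 of the procedure above (creating cellinit.ora) can be collected into one script. The sketch below writes to a local ./network-config directory for illustration; on the database server the target is /etc/oracle/cell/network-config, created with sudo and owned by oracle:dba as shown in the procedure, and the interface chosen here (the first one listed) is an assumption to verify against your own.

```shell
# Gather the first inet address (with its prefix length, e.g.
# 100.112.10.36/24) and write cellinit.ora with the two required
# trailing lines. CONF_DIR is a local stand-in for
# /etc/oracle/cell/network-config.
CONF_DIR=./network-config
mkdir -p "$CONF_DIR"
addr=$( (ip -o -f inet addr show 2>/dev/null || \
         /sbin/ip -o -f inet addr show 2>/dev/null) \
        | awk 'NR==1 {print $4}' )
cat > "$CONF_DIR/cellinit.ora" <<EOF
ipaddress1=$addr
_skgxp_ant_options=1
_skgxp_dynamic_protocol=2
EOF
cat "$CONF_DIR/cellinit.ora"
```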

If You Need to Change the Configured Database_IP Address

The DATABASE_IP parameter in the bds-config.json file identifies the network interface of the database node. If you run bds-database-install.sh with no parameters, it searches for that exact IP address (with that prefix length) among the available network interfaces. You can pass the --ip-cell parameter to bds-database-install.sh to override the configured DATABASE_IP setting:

$ ./bds-database-install.sh --ip-cell=10.20.30.40/24

Possible reasons for doing this are:

  • bds-database-install.sh terminates with an error. The configured IP address (or length) may be wrong.

  • There is an additional database node in the cluster and the defined DATABASE_IP address is not a network interface of the current node.

  • The connection is to a multi-node database. In this case, perform the installation on each database node. On each node, use the --ip-cell parameter to set the correct DATABASE_IP value.

To determine the correct value for --ip-cell, list all network interfaces on the node as follows:

/sbin/ip -o -f inet addr show
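To narrow that output to just the candidate address/prefix values (field 4 of each line), you can filter with awk; a small sketch, with a PATH fallback in case ip is not at /sbin:

```shell
# Print only the address/prefix column from the interface listing,
# e.g. 100.112.10.36/24:
( ip -o -f inet addr show 2>/dev/null || \
  /sbin/ip -o -f inet addr show 2>/dev/null ) | awk '{print $4}'
```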

2.3.8 Uninstalling Oracle Big Data SQL

The steps for uninstalling Oracle Big Data SQL from HDP and from CDH systems are different.

Uninstalling the Software from an HDP Hadoop Cluster

  1. In the Ambari web interface, stop the Big Data SQL service. All components on all DataNodes must be stopped.

  2. On the Ambari command line, delete the Big Data SQL service using a REST API call.

    curl --user admin:admin -H 'X-Requested-By:<user>' -X DELETE http://<ambari_server_fqdn>:<rest_api_port>/api/v1/clusters/<cluster_name>/services/BIGDATASQL
    
  3. On each DataNode, find and kill any Oracle Big Data SQL processes that are running.

    # ps -fea | grep bds
    # kill -9 <pid>
    
  4. On the Ambari command line, remove the BIGDATASQL stack from the services.

    # rm -rf /var/lib/ambari-server/resources/stacks/HDP/<version>/services/BIGDATASQL
    
  5. On each DataNode, remove the bd_cell RPM.

    # yum remove -y bd_cell
    
  6. On all DataNodes, remove the following directories.

    # rm -rf /opt/oracle/bd_cell
    # rm -rf /opt/oracle/bigdatasql
    # rm -rf /tmp/bigdatasql
    # rm -rf /var/log/oracle
    
  7. On the Ambari command line, restart Ambari.

    # ambari-server restart
    
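On clusters with many DataNodes, the per-node cleanup in steps 3, 5, and 6 can be wrapped in a helper and looped over a node list. This is a sketch only; passwordless root ssh and the datanodes.txt node list file are assumptions, not part of the product.

```shell
# Run the per-DataNode cleanup (kill bds processes, remove the bd_cell
# RPM, delete the Big Data SQL directories) on one node over ssh.
cleanup_datanode() {
  ssh root@"$1" '
    pkill -9 -f bds || true
    yum remove -y bd_cell
    rm -rf /opt/oracle/bd_cell /opt/oracle/bigdatasql \
           /tmp/bigdatasql /var/log/oracle
  '
}

# Example loop over a node list, one hostname per line:
# while read -r node; do cleanup_datanode "$node"; done < datanodes.txt
```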

Uninstalling Oracle Big Data SQL From a CDH Hadoop Cluster

To uninstall Big Data SQL from a CDH cluster (that is not hosted on Oracle Big Data Appliance), follow these steps:

  1. In the Cloudera Manager GUI, do the following:

    1. Stop the Big Data SQL service. All instances on all DataNodes must be stopped.

    2. Delete the service from the cluster.

    3. Deactivate and remove the parcel from all hosts.

    4. Delete the parcel.

    5. If resource management was not enabled on Cloudera Manager before Big Data SQL installation, disable the option Cgroup-based Resource Management and restart the YARN service.

  2. On each DataNode, kill any bds processes.
    # ps -fea | grep bds
    # kill -9 <pid>
    
  3. On the Cloudera Manager command line, remove the Big Data SQL JAR from the csd directory. (The default location is /opt/cloudera/csd, but this may differ.)

    # rm -f /opt/cloudera/csd/BIGDATASQL-1.0.jar
    
  4. On each DataNode, do the following:

    1. Remove the bd_cell RPM.

      # yum remove -y bd_cell
      
    2. Remove all Oracle Big Data SQL directories.

      # rm -rf /opt/oracle/bd_cell
      # rm -rf /opt/oracle/bigdatasql
      # rm -rf /tmp/bigdatasql
      # rm -rf /var/log/oracle
      

2.3.9 Securing Big Data SQL

Procedures for securing Oracle Big Data SQL on Hortonworks HDP and on CDH-based systems other than Oracle Big Data Appliance are not covered in this version of the guide. Please review the MOS documents referenced in this section for more information.

2.3.9.1 Big Data SQL Communications and Secure Hadoop Clusters

Please refer to MOS Document 2123125.1 at My Oracle Support for guidelines on securing Hadoop clusters for use with Big Data SQL.

2.3.9.2 Setting up Oracle Big Data SQL and Oracle Secure External Password Store

See MOS Document 2126903.1 for changes required in order to use Oracle Secure External Password Store with Oracle Big Data SQL.