8 Additional Tools Installed

8.1 Copy to Hadoop and OHSH

Copy to Hadoop

Copy to Hadoop makes it easy to identify and copy data from an Oracle Database table to the Apache Hadoop Distributed File System (HDFS) and create an Apache Hive external table over that data. In HDFS, the data can be saved in Oracle Data Pump files (which store data as Oracle data types). This provides an exact copy of the data in the database, without the potential for data type conversion errors. You can then query the data through Hive (and through Hadoop technologies that access data via Hive) and through Oracle Big Data SQL. Oracle Data Pump data can be converted to Parquet or ORC using Hive.
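
For example, once the Data Pump files are exposed through a Hive external table, converting the data to Parquet can be done with a CREATE TABLE AS SELECT statement in Hive. This is a minimal sketch; the table names sales_dp and sales_parquet are hypothetical:

  -- sales_dp is assumed to be a Hive external table over the Data Pump files
  CREATE TABLE sales_parquet STORED AS PARQUET AS SELECT * FROM sales_dp;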

Oracle Shell for Hadoop Loaders (OHSH)

OHSH is an intuitive command-line tool for moving data between Hadoop and Oracle Database. It provides a set of declarative commands you can use to copy contents from an Oracle Database table to a Hive table.
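
For example, assuming a Hive resource hive0 and a JDBC resource jdbc0 are already defined in the session, a single declarative command creates and loads a Hive table from an Oracle Database table. The resource and table names here are hypothetical, and the syntax is a sketch that may vary by OHSH version:

  ohsh> create hive table hive0:movie_sessions from oracle table jdbc0:MOVIE_SESSIONS using directcopy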

OHSH is bundled with both Oracle Big Data SQL and Oracle Big Data Connectors. With Oracle Big Data SQL, OHSH works with Copy to Hadoop. With Oracle Big Data Connectors, OHSH provides an interface to copy contents from Hadoop and Hive to Oracle tables using Oracle Loader for Hadoop (OLH) and Oracle SQL Connector for Hadoop Distributed File System (OSCH).

Although OHSH can serve as a front end for OLH and OSCH as well as for Copy to Hadoop, OLH and OSCH are part of the Oracle Big Data Connectors product and are not installed by Oracle Big Data SQL.

Installation and Configuration of Copy to Hadoop and OHSH

You can use Copy to Hadoop and OHSH from any of these environments:

  • A Hadoop cluster node

  • An Oracle Database node

  • An edge node

Oracle Big Data SQL installs the kits for Copy to Hadoop and Oracle Shell for Hadoop Loaders on the Hadoop cluster management server and on any Oracle Database server node where you install Oracle Big Data SQL.

You can also manually copy the kits for both tools from the Hadoop cluster management server to an edge node and configure them for use from the edge node. The kits are stored at /opt/oracle/orahivedp-<version> and /opt/oracle/ohsh-<version>. See /opt/oracle/ohsh-<version>/doc/OHSH_EDGE_NODE_README.txt on the Hadoop cluster management server for instructions.

Note:

In the Oracle Big Data SQL installation, the setup of Copy to Hadoop is fully automated except for the CP2HADOOP_HOME variable listed in the table below, which you must set manually.

You must complete configuration of Oracle Shell for Hadoop Loaders manually, as described in this section.

Environment Variables for OHSH

On the Hadoop cluster management server, the OHSH configuration file is /opt/oracle/ohsh-<version>/bin/ohsh_config.sh. On an Oracle Database node, it is $ORACLE_HOME/ohsh/bin/ohsh_config.sh. This file contains the environment variables required by OHSH. The set of variables you need to set differs depending on whether the installation is on a Hadoop cluster, an Oracle Database node, or an edge node. The table below shows which variables you need to set for each environment. Some of the required variables are set for you automatically by the Oracle Big Data SQL installer.
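
For illustration only, a fully configured ohsh_config.sh on the Hadoop cluster management server might contain entries such as the following. The variables are the ones discussed in this chapter; the values are placeholders:

  # Home of the Copy to Hadoop kit (must be set manually)
  export CP2HADOOP_HOME=/opt/oracle/orahivedp-<version>
  # HiveServer2 host and port
  export HS2_HOST_PORT=<HS2_HOST>:<HS2_PORT>
  # Optional Hive session variables (Kerberos, HTTP mode, SSL)
  export HIVE_SESS_VAR_LIST="principal=<HiveServer2 principal>"
  # Optional TNS and Oracle Wallet locations
  export TNS_ADMIN="<tns_admin_location>"
  export WALLET_LOCATION="<wallet_location>"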

8.1.1 Completing the OHSH Configuration on Oracle Database Nodes

The Oracle Big Data SQL installation automatically copies the OHSH installation kit to each node where Oracle Big Data SQL is installed. There are some manual steps you must perform to complete the installation.

Note:

The README file at $ORACLE_HOME/ohsh/doc/OHSH_DB_NODE_README.txt describes the full set of steps ordinarily needed to enable OHSH on an Oracle Database node. However, when OHSH is installed by Oracle Big Data SQL, some of the setup is done for you by the installer. To complete the configuration in this context, use this shortened procedure instead of the full procedure documented in the README.

Also note that patch 23237037 is required only for Oracle Database 12.1.0.2 and earlier releases. Do not install this patch if you are running a later release of Oracle Database.

  1. Log on to the Oracle Database server as the oracle user.

  2. Set up OHSH to work with Copy to Hadoop.

    1. Create a staging directory in the file system on the Oracle Database node:

      $ mkdir <OS_STAGE_DIR>
    2. Create an Oracle directory object over this staging directory, and grant read/write privileges on it to the Oracle Database user (oracle or other). In this example, ORACLE_STAGE_DIR refers to the name of the directory object in Oracle Database. OS_STAGE_DIR refers to the path on the server.

      SQL> CREATE DIRECTORY <ORACLE_STAGE_DIR> AS '<OS_STAGE_DIR>';
      SQL> GRANT READ,WRITE ON DIRECTORY <ORACLE_STAGE_DIR> TO <ORACLE_USER>;

      Note:

      Later, if you employ the “stage” method to copy Oracle tables, use the OHSH set locationdirectory command to point to the Oracle directory object name (the name of the directory object you created in this step), as shown in the sketch at the end of this step.
    3. If you are connecting to a CDH-based Hadoop system (Oracle Big Data Appliance or other), perform the substeps below to enable the OHSH “FUSE” method, in which you copy files to Hive tables over a mountable HDFS FUSE directory. Note that HDP does not support FUSE, so you cannot create a mount point to HDFS on HDP using these steps.

      1. If you are using Oracle Database 12.1.0.2 or earlier, install patch 23237037 (available on My Oracle Support) on all database nodes.

      2. On the Oracle Database host, create a FUSE mount to the root of the HDFS system on the Hadoop cluster. Refer to Cloudera’s documentation on installing FUSE. You may find current instructions at Configuring Mountable HDFS on Cloudera’s website.

      3. Create the Oracle directory on the FUSE mount and grant privileges. In this example, ORACLE_FUSE_DIR refers to the name of the directory object in Oracle Database. OS_FUSE_MOUNT refers to the path to the mount point on the server.

              SQL> CREATE DIRECTORY <ORACLE_FUSE_DIR> AS '<OS_FUSE_MOUNT>';
              SQL> GRANT READ,WRITE ON DIRECTORY <ORACLE_FUSE_DIR> TO <ORACLE_USER>;

      Note that later, when you use OHSH to copy Oracle tables to Hive using the “FUSE” method, you use the OHSH set locationdirectory command to point to this same <ORACLE_FUSE_DIR>.
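
      The sketch below shows how the directory objects created above might be used in a later OHSH session: point locationdirectory at <ORACLE_STAGE_DIR> for the “stage” method, or at <ORACLE_FUSE_DIR> for the “FUSE” method. The resource and table names are hypothetical:

        ohsh> set locationdirectory ORACLE_STAGE_DIR
        ohsh> create hive table hive0:sales from oracle table jdbc0:SALES using stage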

  3. Edit $ORACLE_HOME/ohsh/bin/ohsh_config.sh to configure the home directories of dependencies. When OHSH is installed by Oracle Big Data SQL, most of the required variables are already set for you. Set HS2_HOST_PORT and (if needed) HIVE_SESS_VAR_LIST as follows.

    export HS2_HOST_PORT=<HS2_HOST>:<HS2_PORT>
    

    Also set HIVE_SESS_VAR_LIST to any required Hive session variables. For example:

    • To connect to HiveServer2 with Kerberos authentication: export HIVE_SESS_VAR_LIST="principal=<The server principal of HiveServer2>"

      The Hive principal is specified by the configuration property hive.server2.authentication.kerberos.principal in hive-site.xml.

    • To connect to HiveServer2 running in HTTP mode: export HIVE_SESS_VAR_LIST="transportMode=http;httpPath=<The HTTP endpoint>"

    • To connect to HiveServer2 when SSL is enabled: export HIVE_SESS_VAR_LIST="ssl=true;sslTrustStore=<Trust Store path>;trustStorePassword=<Trust Store password>"
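
    For example, a Kerberos-secured configuration might look like the following. The hostname, port, and realm are hypothetical; 10000 is the default HiveServer2 port:

      export HS2_HOST_PORT=hs2node.example.com:10000
      export HIVE_SESS_VAR_LIST="principal=hive/hs2node.example.com@EXAMPLE.COM"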

  4. If you want to configure OHSH to use Oracle Wallet, set the environment variables TNS_ADMIN and WALLET_LOCATION to the directory paths where TNS and the Oracle Wallet are stored:

    export WALLET_LOCATION="<wallet_location>"
    export TNS_ADMIN="<tns_admin_location>"

    Note:

    Later, when you start an OHSH session, if TNS_ADMIN and WALLET_LOCATION are different on the Hadoop cluster, set hadooptnsadmin and hadoopwalletlocation:

    set hadooptnsadmin <tns_admin_hadoop_node>
    set hadoopwalletlocation <wallet_location_hadoop_node>
  5. Start an OHSH session and check for resources. This check verifies that the configuration is correct.

    $ ohsh
    ohsh> show resources

    You should see the resources hadoop0, hive0, and bash0.
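
    If the resources are present, you can optionally exercise them from the OHSH prompt to confirm connectivity. The following is a sketch using the OHSH resource-delegation syntax, in which a command prefixed with %<resource> is sent to that resource:

      ohsh> %bash0 ls
      ohsh> %hive0 show tables;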

8.1.2 Completing the OHSH Configuration on the Hadoop Cluster

On the Hadoop cluster management server, Oracle Big Data SQL installs OHSH from an RPM to /opt/oracle/ohsh. Perform the steps below as root to complete the Hadoop side of the configuration.

Note:

The installation instructions in OHSH_HADOOP_NODE_README.txt, in the OHSH doc directory on the Hadoop cluster management server, contain extra information that is not applicable to the installation of Copy to Hadoop and OHSH in the context of the Oracle Big Data SQL installation. Use the instructions below instead.
  1. If the Hadoop system is not an Oracle Big Data Appliance, then download the SQL*Plus Instant Client and the Oracle Instant Client for Linux from the Oracle Technology Network. Select the client versions that match the version of the Oracle Database.

    For example, you can find the client downloads for Oracle Database 12.2.0.1.0 at this address: http://www.oracle.com/technetwork/topics/linuxx86-64soft-092277.html

    Extract both packages into the same directory.

    Add the directory to PATH and LD_LIBRARY_PATH, as shown in the sketch below.
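
    A minimal sketch of this step, assuming the 12.2.0.1.0 Linux x86-64 downloads (the file names and target directory are illustrative):

      $ mkdir -p /opt/oracle/instantclient
      $ unzip instantclient-basic-linux.x64-12.2.0.1.0.zip -d /opt/oracle/instantclient
      $ unzip instantclient-sqlplus-linux.x64-12.2.0.1.0.zip -d /opt/oracle/instantclient
      $ export PATH=/opt/oracle/instantclient/instantclient_12_2:$PATH
      $ export LD_LIBRARY_PATH=/opt/oracle/instantclient/instantclient_12_2:$LD_LIBRARY_PATH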

  2. Create a local directory for TNS, and copy the tnsnames.ora and sqlnet.ora files from TNS_ADMIN on the Oracle Database host into this directory.

    If you created an Oracle Wallet for the Oracle Database host, then also copy over the Oracle Wallet files from the database server. Also check that sqlnet.ora points to the correct Oracle Wallet location:

      WALLET_LOCATION=
        (SOURCE=(METHOD=FILE)(METHOD_DATA=
          (DIRECTORY=<your wallet directory>)))
      sqlnet.wallet_override=true
  3. As root, edit /opt/oracle/ohsh/bin/ohsh_config.sh. Set the following environment variables.
    export CP2HADOOP_HOME=/opt/oracle/orahivedp-<version>
    export HS2_HOST_PORT=<HiveServer2 hostname>:<Port number>
    

    Also set HIVE_SESS_VAR_LIST to any required Hive session variables. For example:

    • To connect to HiveServer2 with Kerberos authentication: export HIVE_SESS_VAR_LIST="principal=<The server principal of HiveServer2>"

    • To connect to HiveServer2 running in HTTP mode: export HIVE_SESS_VAR_LIST="transportMode=http;httpPath=<The HTTP endpoint>"

    • To connect to HiveServer2 when SSL is enabled: export HIVE_SESS_VAR_LIST="ssl=true;sslTrustStore=<Trust Store path>;trustStorePassword=<Trust Store password>"

  4. If TNS and Oracle Wallet are enabled, also add the following variables to the file and export them:

    export WALLET_LOCATION="<your wallet directory>"
    export TNS_ADMIN="<your TNS directory>"
  5. Add the OHSH bin directory and $HADOOP_HOME/bin to the PATH variable:

    PATH=/opt/oracle/ohsh-<version>/bin:$PATH
    PATH=$HADOOP_HOME/bin:$PATH 
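
    These assignments apply only to the current shell. To make them persistent, one option (assuming the user who runs OHSH uses bash; this is an assumption, not a requirement) is to append them to that user's shell profile:

      $ echo 'export PATH=/opt/oracle/ohsh-<version>/bin:$HADOOP_HOME/bin:$PATH' >> ~/.bashrc
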
  6. Start an OHSH session and check for resources. This check verifies that the configuration is correct.

    $ ohsh
    ohsh> show resources

    You should see the resources hadoop0, hive0, and bash0.

8.1.3 Getting Started Using Copy to Hadoop and OHSH

See $ORACLE_HOME/ohsh/doc and $ORACLE_HOME/ohsh/examples for examples that show how to run the OHSH load methods.

See the Oracle Big Data SQL User’s Guide for more information on Copy to Hadoop and Oracle Shell for Hadoop Loaders.
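
As a starting point, an end-to-end Copy to Hadoop session might look like the sketch below. The connect identifier, resource names, and table name are hypothetical, and the exact resource-creation syntax can differ across OHSH versions, so treat the bundled examples as authoritative:

  $ ohsh
  ohsh> create sqlplus resource sql0 connectid="dbhost.example.com:1521/orcl"
  ohsh> create jdbc resource jdbc0 connectid="dbhost.example.com:1521/orcl"
  ohsh> create hive table hive0:sales from oracle table jdbc0:SALES using directcopy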