4 Ease of Use Tools for Oracle Big Data Connectors

Oracle Big Data Connectors are a powerful toolset for data interchange between Hadoop and Oracle Database. Learning how to set up the environment for these tools and use them to their full potential is worthwhile, but there are ways to get started and be productive before you complete that learning curve. Oracle is developing a set of ease-of-use tools for this purpose.

Oracle Shell for Hadoop Loaders is the first ease-of-use tool for Oracle Big Data Connectors.

4.1 Introducing Oracle Shell for Hadoop Loaders

What is Oracle Shell for Hadoop Loaders?

Oracle Shell for Hadoop Loaders (OHSH) is a helper shell that provides an easy-to-use command line interface to Oracle Loader for Hadoop, Oracle SQL Connector for HDFS, and Copy to Hadoop. It has basic shell features such as command line recall, history, inheriting environment variables from the parent process, setting new or existing environment variables, and performing environmental substitution in the command line. 

The core functionality of Oracle Shell for Hadoop Loaders includes the following:

  • Defining named external resources with which Oracle Shell for Hadoop Loaders interacts to perform loading tasks.

  • Setting default values for load operations.

  • Running load commands.

  • Delegating simple pre and post load tasks to the Operating System, HDFS, Hive and Oracle. These tasks include viewing the data to be loaded, and viewing the data in the target table after loading.

See Also:

  • To set up OHSH, follow the instructions in the Oracle Big Data SQL Installation Guide.

  • The examples directory in the OHSH kit contains many examples that define resources and load data using Oracle Shell for Hadoop Loaders.  See <OHSH_KIT>/examples/README.txt for a description of the examples and instructions on how to run OHSH load methods.

4.1.1 Getting Started with Oracle Shell for Hadoop Loaders

Starting an OHSH Interactive Session

To start an interactive session, enter ohsh on the command line (ohsh must be in your path). This brings you to the OHSH shell:

$ ohsh
ohsh>

You can execute OHSH commands in this shell (using the OHSH syntax). You can also execute commands for Beeline/Hive, Hadoop, Bash, and SQL*Plus. For non-OHSH commands, you add a delegation operator prefix ("%") to the name of the resource used to execute the command. For example:

ohsh> %bash0 ls -l

Scripting OHSH

You can also script the same commands that work in the CLI. The ohsh command provides three parameters (-i, -f, and -c) for working with scripts and inspecting the session configuration.

  • ohsh -i <filename>.ohsh

    The -i parameter tells OHSH to initialize an interactive session with the commands in the script before the prompt appears. This is a useful way to set up the required session resources and automate other preliminary tasks before you start working within the shell.

    $ ohsh -i initresources.ohsh
    
  • ohsh -f <filename>.ohsh

    The ohsh command with the -f parameter starts a non-interactive session and runs the commands in the script.

    $ ohsh -f myunattendedjobs.ohsh
    
  • ohsh -i <filename>.ohsh -f <filename>.ohsh

    You can use -i and -f together to initialize a non-interactive session with one script and then run another script in that session.

    $ ohsh -i mysetup.ohsh -f myunattendedjobs.ohsh
    
  • ohsh -c

    This parameter dumps all Hadoop configuration properties that an OHSH session inherits at startup.
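
The invocation modes above can be sketched as follows. The script name and its commands are illustrative assumptions; the resource-creation syntax follows the examples shown later under "Working With OHSH Resources", and the connect identifier is the sample used elsewhere in this chapter.

```shell
# Illustrative sketch: write a small OHSH setup script, then note the
# ohsh invocations that would consume it. The filename, resource name,
# and connect identifier are assumptions, not required values.
cat > initresources.ohsh <<'EOF'
create sqlplus resource sql0 connectid="bigdatalite.localdomain:1521/orcl"
show resources
EOF

# Interactive session, pre-loaded with the script's resources:
#   ohsh -i initresources.ohsh
# Non-interactive session that just runs the script and exits:
#   ohsh -f initresources.ohsh
# Initialize with one script, then run another unattended:
#   ohsh -i initresources.ohsh -f myunattendedjobs.ohsh
```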

Working With OHSH Resources

A resource is some named entity that OHSH interacts with. For example: a Hadoop cluster is a resource, as is a JDBC connection to an Oracle database, a Hive database, a SQL*Plus session with an Oracle database, and a Bash shell on the local OS.

OHSH provides two default resources at start up: hive0 (to connect to the default Hive database) and hadoop0.

  • Using hive0 resource to execute a Hive command:

    ohsh> %hive0 show tables;
    

    You can create additional Hive resources to connect to other Hive databases.

  • Using the hadoop0 resource to execute a Hadoop command:

    ohsh> %hadoop0 fs -ls
    

Within an interactive or scripted session, you can create instances of additional resources, such as SQL*Plus and JDBC. You need to create these two resources in order to connect to Oracle Database through OHSH.

  • Creating an SQL*Plus resource:

    ohsh> create sqlplus resource sql0 connectid="bigdatalite.localdomain:1521/orcl"
    
  • Creating a JDBC resource:

    ohsh> create jdbc resource jdbc0 connectid=<database connection URL>
    
  • Showing resources:

    ohsh> show resources
    

    This command lists default resources and any additional resources created within the session.
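
Putting these pieces together, a short interactive session that creates an Oracle connection resource and then delegates commands to several resources might look like this. The connect identifier and the queries are illustrative only:

```
$ ohsh
ohsh> create sqlplus resource sql0 connectid="bigdatalite.localdomain:1521/orcl"
ohsh> show resources
ohsh> %hive0 show tables;
ohsh> %sql0 select table_name from user_tables;
ohsh> %bash0 ls -l
```

Each %-prefixed line is handed to the named resource, so the %sql0 statement runs in a SQL*Plus session against the database defined when sql0 was created.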

Getting Help

The OHSH shell provides online help for all commands.

To get a list of all OHSH commands:

ohsh> help

To get help on a specific command, enter help, followed by the command:    

ohsh> help show

The table below describes the help categories available.

Help Command     Description
-------------    -----------
help load        Describes load commands for Oracle and Hadoop tables.
help set         Shows help for setting defaults for load operations, and describes which load methods a particular setting affects.
help show        Shows help for inspecting default settings.
help shell       Shows shell-like commands.
help resource    Shows commands for creating and dropping named resources.

4.1.2 Third-Party Licenses for Bundled Software

Oracle Shell for Hadoop Loaders includes the following third-party products:

  • Apache Commons Exec 1.3 (Apache 2.0 License)

  • ANTLR 4.7 (BSD License)

Unless otherwise specifically noted, or as required under the terms of the third party license (e.g., LGPL), the licenses and statements herein, including all statements regarding Apache-licensed code, are intended as notices only.

4.1.2.1 Apache Commons Exec 1.3

The following applies to all products licensed under the Apache 2.0 License:

You may not use the identified files except in compliance with the Apache License, Version 2.0 (the "License").

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. A copy of the license is also reproduced below.

4.1.2.2 ANTLR 4.7

Copyright (c) 2015 Terence Parr, Sam Harwell

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

4.2 Using Oracle SQL Developer With Oracle Big Data Connectors

Oracle SQL Developer is essentially a graphical version of SQL*Plus. Among its features is a user-friendly interface to tools for moving data between Hadoop and Oracle Database. It includes support for Copy to Hadoop, Oracle Loader for Hadoop, and Oracle SQL Connector for Hadoop Distributed File System (as well as Oracle Big Data SQL). Wizards in the interface assist with the use of all of these tools.

Follow these steps to set up Oracle SQL Developer to work with Oracle Big Data Connectors.

  1. Download and install Oracle SQL Developer.

  2. Download the Hive JDBC Drivers.

  3. Add the new Hive JDBC Drivers to Oracle SQL Developer.

  4. Set environment variables required for Oracle Big Data Connectors.

  5. Set up the necessary connections.

After you have installed the drivers, configured the environment, and created connections between Oracle Database and Hadoop, you can start using Oracle Big Data Connectors from within Oracle SQL Developer.

Downloading and Installing Oracle SQL Developer

Install Oracle SQL Developer 4.2 or later. Release 4.2 is recommended because it is the first release to include support for Copy to Hadoop.

The installation is simple. Just download the package and extract it.

  1. Go to the Oracle SQL Developer download site on the Oracle Technology Network (OTN).

  2. Accept the license agreement and download the version that is appropriate for your platform.

  3. Extract the downloaded ZIP file to your local drive.

    You can extract to any folder name.

See Installing and Getting Started with SQL Developer in the Oracle SQL Developer User’s Guide for further installation and configuration details.

Downloading and Installing the Hive JDBC Drivers for Cloudera Enterprise

To connect Oracle SQL Developer to Hive in the Hadoop environment, you need to download and install the Hive JDBC drivers for Cloudera Enterprise. These drivers are not included in the Oracle SQL Developer download package.

Note for HDP Users:

At this time, SQL Developer 4.2 requires the Cloudera JDBC drivers for Hive. However, these drivers appear to work against Hortonworks clusters as well. HDP users should test to determine whether these drivers meet their needs.

  1. Download the latest Cloudera JDBC drivers for Hive from the Cloudera website to any local directory.

    You can search for “cloudera hive jdbc drivers download” on the Cloudera website to locate the available driver packages.

    You are prompted to select the driver version, OS, and OS version (32/64 bit). At this time, the latest driver version is 2.5.18. You can choose the newest version available.

  2. Unzip the archive:

    unzip hive_jdbc_<version>.zip
    
  3. View the extracted content. Notice that under the top-level folder there are multiple ZIP files, one for each JDBC version. For this setup, only JDBC 4.0 is usable. Select the JDBC4_ ZIP file (JDBC4_<version>.zip).

    Important:

    Choose only the JDBC4_ ZIP file, which contains the drivers for JDBC 4.0. This is the only compatible version. The drivers in other packages, such as JDBC41_*, are not compatible with SQL Developer 4.2 and will return errors upon connection attempts.

  4. Unzip the JDBC4 archive to a target directory that is accessible to Oracle SQL Developer, for example, /home/oracle/jdbc :

    # unzip Cloudera_HiveJDBC4_<version>.zip -d /home/oracle/jdbc/
    

    The extracted content should be similar to this:

    Cloudera_HiveJDBC4_2.5.18.1050\Cloudera-JDBC-Driver-for-Apache-Hive-Install-Guide.pdf
    Cloudera_HiveJDBC4_2.5.18.1050\Cloudera-JDBC-Driver-for-Apache-Hive-Release-Notes.pdf
    Cloudera_HiveJDBC4_2.5.18.1050\commons-codec-1.3.jar
    Cloudera_HiveJDBC4_2.5.18.1050\commons-logging-1.1.1.jar
    Cloudera_HiveJDBC4_2.5.18.1050\HiveJDBC4.jar
    Cloudera_HiveJDBC4_2.5.18.1050\hive_metastore.jar
    Cloudera_HiveJDBC4_2.5.18.1050\hive_service.jar
    Cloudera_HiveJDBC4_2.5.18.1050\httpclient-4.1.3.jar
    Cloudera_HiveJDBC4_2.5.18.1050\httpcore-4.1.3.jar
    Cloudera_HiveJDBC4_2.5.18.1050\libfb303-0.9.0.jar
    Cloudera_HiveJDBC4_2.5.18.1050\libthrift-0.9.0.jar
    Cloudera_HiveJDBC4_2.5.18.1050\log4j-1.2.14.jar
    Cloudera_HiveJDBC4_2.5.18.1050\out.txt
    Cloudera_HiveJDBC4_2.5.18.1050\ql.jar
    Cloudera_HiveJDBC4_2.5.18.1050\slf4j-api-1.5.11.jar
    Cloudera_HiveJDBC4_2.5.18.1050\slf4j-log4j12-1.5.11.jar
    Cloudera_HiveJDBC4_2.5.18.1050\TCLIServiceClient.jar
    Cloudera_HiveJDBC4_2.5.18.1050\zookeeper-3.4.6.jar
    

Adding the new Hive JDBC Drivers to Oracle SQL Developer

Next, start up SQL Developer and add all of the extracted driver files as entries under "Third Party JDBC Drivers" in the Preferences window.

  1. Navigate to the folder where you downloaded and extracted Oracle SQL Developer.

  2. Click the sqldeveloper subfolder. Then, click sqldeveloper.exe in this folder.

  3. In the SQL Developer menu bar, select Tools>Preferences.

  4. In the file explorer of the Preferences window, expand Database and then click Third Party JDBC Drivers.

  5. Click Add Entry.

  6. Navigate to the folder where you extracted the files from Cloudera_HiveJDBC4_<version>.zip. Add all of the JAR files from the extraction and then click OK.

  7. Restart Oracle SQL Developer.

Setting up Environment Variables for Using Oracle Big Data Connectors With Oracle SQL Developer

SQL Developer with Oracle Big Data Connectors requires the user to make an SSH connection from SQL Developer to a Hadoop client, Hadoop node, or Hadoop edge node. The home directory of this account requires a specific environment file for each of the Big Data Connectors it runs.

Note:

If you want to do staged copies in Copy to Hadoop, then Copy to Hadoop requires an additional SSH connection to the oracle OS account on the database system.

You must create and populate the environment files. The exact filenames and the required contents of each file are listed below. Each file must be readable by the account using the Big Data Connector.

For Copy to Hadoop, create .sqldev_cp2hadoop_env:

#!/bin/bash
# Environment file for Copy to Hadoop

export CP2HADOOP_HOME=<Parent directory of the directory containing Copy to Hadoop JARs>  
# On Oracle Big Data Appliance, the Copy to Hadoop JARs are in /opt/oracle/bigdatasql/bdcell-12.1
export HADOOP_CLASSPATH=${CP2HADOOP_HOME}/jlib/*
# ----------------------------------------------
# If using Oracle Wallet, add the following four variables:
export WALLET_LOCATION=<Location of the Oracle Wallet files>
#For example: export WALLET_LOCATION=/home/${USER}/wallet
export TNS_ADMIN=<Like WALLET_LOCATION, this also points to the location of the Oracle Wallet files>
export CLUSTER_WALLET_LOCATION=${WALLET_LOCATION}
export CLUSTER_TNS_ADMIN=${TNS_ADMIN}

For Oracle Loader for Hadoop, create .sqldev_olh_env:

#!/bin/bash
# Environment file for Oracle Loader for Hadoop

export HIVE_HOME=<For example: /opt/cloudera/parcels/CDH/lib/hive>
export HIVE_CONF_DIR=<For example: /etc/hive/conf>
export OLH_HOME=<For example (on Oracle Big Data Appliance): /opt/oracle/olh>
export HADOOP_CLASSPATH=${OLH_HOME}/jlib/*:${HIVE_CONF_DIR}:${HIVE_HOME}/lib/*
export OLH_LIB_JARS=${HIVE_HOME}/lib/hive-exec.jar,${HIVE_HOME}/lib/hive-metastore.jar,${HIVE_HOME}/lib/libfb303-0.9.2.jar
# ----------------------------------------------
# If using Oracle Wallet, add the following four variables:
export WALLET_LOCATION=<Location of the Oracle Wallet files>
export TNS_ADMIN=<Same path as WALLET_LOCATION>
export CLUSTER_WALLET_LOCATION=${WALLET_LOCATION}
export CLUSTER_TNS_ADMIN=${TNS_ADMIN}

For Oracle SQL Connector for HDFS, create .sqldev_osch_env:

#!/bin/bash
# Environment file for Oracle SQL Connector for HDFS

export HIVE_HOME=<For example: /opt/cloudera/parcels/CDH/lib/hive>
export HIVE_CONF_DIR=<For example: /etc/hive/conf>
export OSCH_HOME=<For example (on Oracle Big Data Appliance): /opt/oracle/osch>
export HADOOP_CLASSPATH=${OSCH_HOME}/jlib/*:${HIVE_CONF_DIR}:${HIVE_HOME}/lib/*
# ----------------------------------------------
# If using Oracle Wallet, add the following four variables:
export WALLET_LOCATION=<Location of the Oracle Wallet files>
export TNS_ADMIN=<Same path as WALLET_LOCATION>
export CLUSTER_WALLET_LOCATION=${WALLET_LOCATION}
export CLUSTER_TNS_ADMIN=${TNS_ADMIN}
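
As a concrete sketch, the Copy to Hadoop file above could be created in the SSH account's home directory as follows. The CP2HADOOP_HOME path shown is the Oracle Big Data Appliance default noted in the comments above; substitute the correct location for your cluster, and add the wallet variables if you use Oracle Wallet.

```shell
# Sketch: write ~/.sqldev_cp2hadoop_env on the Hadoop node.
# CP2HADOOP_HOME below is the Oracle Big Data Appliance default; adjust it
# for your cluster. The Oracle Wallet variables are omitted in this sketch.
cat > "$HOME/.sqldev_cp2hadoop_env" <<'EOF'
#!/bin/bash
# Environment file for Copy to Hadoop
export CP2HADOOP_HOME=/opt/oracle/bigdatasql/bdcell-12.1
export HADOOP_CLASSPATH=${CP2HADOOP_HOME}/jlib/*
EOF

# The file must be readable by the account that runs the connector:
chmod u+r "$HOME/.sqldev_cp2hadoop_env"
```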

Setting Up Secure Connections for Oracle Big Data Connectors

See Apache Hadoop Connectors Support in SQL Developer in the Oracle SQL Developer User’s Guide for instructions on how to create SSH connections required for Oracle Big Data Connectors access to Hadoop.