4 Ease of Use Tools for Oracle Big Data Connectors

Oracle Big Data Connectors are a powerful toolset for data interchange between Hadoop and Oracle Database. Learning how to set up the environment for these tools and use them to full potential is worthwhile, but there are ways to get started and be productive without fully completing that learning curve. Oracle is developing a set of ease-of-use tools for this purpose.

Oracle Shell for Hadoop Loaders is the first ease-of-use tool for Oracle Big Data Connectors.

4.1 Introducing Oracle Shell for Hadoop Loaders

What is Oracle Shell for Hadoop Loaders?

Oracle Shell for Hadoop Loaders (OHSH) is a helper shell that provides a simple to use command line interface to Oracle Loader for Hadoop, Oracle SQL Connector for HDFS, and Copy to Hadoop. It has basic shell features such as command line recall, history, inheriting environment variables from the parent process, setting new or existing environment variables, and performing environmental substitution in the command line. 

The core functionality of Oracle Shell for Hadoop Loaders includes the following:

  • Defining named external resources with which Oracle Shell for Hadoop Loaders interacts to perform loading tasks.

  • Setting default values for load operations.

  • Running load commands.

  • Delegating simple pre and post load tasks to the Operating System, HDFS, Hive and Oracle. These tasks include viewing the data to be loaded, and viewing the data in the target table after loading.

Getting Started with Oracle Shell for Hadoop Loaders

To set up Oracle Shell for Hadoop Loaders, follow the instructions in the Oracle Big Data SQL Installation Guide.

The examples directory in the OHSH kit contains many examples that define resources and load data using Oracle Shell for Hadoop Loaders.  See <OHSH_KIT>/examples/README.txt for a description of the examples and instructions for running them.

Getting Help

The OHSH shell provides online help for all commands.

To get a list of all possible OHSH commands:

ohsh> help

To get help on a specific command, enter help, followed by the command:    

ohsh> help show

The table below describes the help categories available.

Help Command Description
help load Describes load commands for Oracle and Hadoop tables.
help set Shows help for setting defaults for load operations. It also describes what load methods are impacted by a particular setting.
help show Shows help for inspecting default settings.
help shell Shows shell-like commands.
help resource Show commands for creating and dropping named resources.

4.1.1 Third-Party Licenses for Bundled Software

Oracle Shell for Hadoop Loaders includes the following third-party products:

  • ANTLR 4.7

  • Apache Commons Exec 1.3

Unless otherwise specifically noted, or as required under the terms of the third party license (e.g., LGPL), the licenses and statements herein, including all statements regarding Apache-licensed code, are intended as notices only.

4.1.1.1 Apache Commons Exec 1.3

Include the following License ONLY ONCE in the documentation even if there are multiple products licensed under the license.

The following applies to all products licensed under the Apache 2.0 License:

You may not use the identified files except in compliance with the Apache License, Version 2.0 (the "License.").

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. A copy of the license is also reproduced below.

4.1.1.2 Apache License

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

  1. Definitions

    "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

    "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

    "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

    "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

    "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

    "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

    "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

    "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

    "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

    "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

  2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

  3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

  4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

    1. You must give any other recipients of the Work or Derivative Works a copy of this License; and

    2. You must cause any modified files to carry prominent notices stating that You changed the files; and

    3. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

    4. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

    You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

  5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

  6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

  7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

  8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

  9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software developed by The Apache Software Foundation (http://www.apache.org/) (listed below):

4.1.1.3 ANTLR 4.7

Copyright (c) 2015 Terence Parr, Sam Harwell

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

4.2 Using Oracle SQL Developer With Oracle Big Data Connectors

Oracle SQL Developer is an essentially a graphical version SQL*Plus. Among its features is a user-friendly interface to tools for moving data between Hadoop and Oracle Database. It includes support for Copy to Hadoop, Oracle Loader for Hadoop, and Oracle SQL Connector for Hadoop Distributed File System (as well as Oracle Big Data SQL). There are wizards in the interface to assist with use of all of of these tools.

Follow these steps to set up Oracle SQL Developer to work with Oracle Big Data Connectors.

  1. Download and install Oracle SQL Developer.

  2. Download the Hive JDBC Drivers.

  3. Add the new Hive JDBC Drivers to Oracle SQL Developer.

  4. Set environment variables required for Oracle Big Data Connectors.

  5. Set up the necessary connections.

After you have installed the drivers, configured the environment, and created connections between Oracle Database and Hadoop, you can start using Oracle Big Data Connectors from within Oracle SQL Developer.

Downloading and Installing Oracle SQL Developer

Install Oracle SQL Developer 4.2 or greater. Release 4.2 is recommended, because it is the first release to include support for Copy To Hadoop.

The installation is simple. Just download the package and extract it.

  1. Go to the Oracle SQL Developer download site on the Oracle Technology Network (OTN).

  2. Accept the license agreement and download the version that is appropriate for your platform.

  3. Extract the downloaded ZIP file to your local drive.

    You can extract to any folder name.

See Installing and Getting Started with SQL Developer in the Oracle SQL Developer User’s Guide for further installation and configuration details.

Downloading and Installing the Hive JDBC Drivers for Cloudera Enterprise

To connect Oracle SQL Developer to Hive in the Hadoop environment, you need to download and install the Hive JDBC drivers for Cloudera Enterprise. These drivers are not included in the Oracle SQL Developer download package.

Note for HDP Users:

At this time, SQL Developer 4.2 requires the Cloudera JDBC drivers for Hive. However, these drivers appear to work against Hortonworks clusters as well. HDP users should test to determine if these drivers meet their needs.
  1. Download the latest Cloudera JDBC drivers for Hive from the Cloudera website to any local directory.

    You can search for “cloudera hive jdbc drivers download” on the Cloudera website to locate the available driver packages.

    You are prompted to select the driver version, OS, and OS version (32/64 bit). At this time, the latest drive version is 2.5.18. You can choose the newest version available.

  2. Unzip the archive:

    unzip hive_jdbc_<version>.zip
    
  3. View the extracted content. Notice that under the top-level folder there are multiple ZIP files. Each is for a different JDBC version. For this setup, only JBDC 4.0 is usable. Select the JDBC4_ ZIP file (JDBC4_<version>.zip).

    Important:

    Choose only the JDBC4_ ZIP file, which contains the drivers for JDBC 4.0. This is the only compatible version. The drivers in other packages, such as JDBC41_*, are not compatible with SQL Developer 4.2 and will return errors upon connection attempts.
  4. Unzip the JDBC4 archive to a target directory that is accessible to Oracle SQL Developer, for example, ./home/oracle/jdbc :

    # unzip Cloudera_HiveJDBC4_<version>.zip -d /home/oracle/jdbc/
    

    The extracted content should be similar to this:

    Cloudera_HiveJDBC4_2.5.18.1050\Cloudera-JDBC-Driver-for-Apache-Hive-Install-Guide.pdf
    Cloudera_HiveJDBC4_2.5.18.1050\Cloudera-JDBC-Driver-for-Apache-Hive-Release-Notes.pdf
    Cloudera_HiveJDBC4_2.5.18.1050\commons-codec-1.3.jar
    Cloudera_HiveJDBC4_2.5.18.1050\commons-logging-1.1.1.jar
    Cloudera_HiveJDBC4_2.5.18.1050\HiveJDBC4.jar
    Cloudera_HiveJDBC4_2.5.18.1050\hive_metastore.jar
    Cloudera_HiveJDBC4_2.5.18.1050\hive_service.jar
    Cloudera_HiveJDBC4_2.5.18.1050\httpclient-4.1.3.jar
    Cloudera_HiveJDBC4_2.5.18.1050\httpcore-4.1.3.jar
    Cloudera_HiveJDBC4_2.5.18.1050\libfb303-0.9.0.jar
    Cloudera_HiveJDBC4_2.5.18.1050\libthrift-0.9.0.jar
    Cloudera_HiveJDBC4_2.5.18.1050\log4j-1.2.14.jar
    Cloudera_HiveJDBC4_2.5.18.1050\out.txt
    Cloudera_HiveJDBC4_2.5.18.1050\ql.jar
    Cloudera_HiveJDBC4_2.5.18.1050\slf4j-api-1.5.11.jar
    Cloudera_HiveJDBC4_2.5.18.1050\slf4j-log4j12-1.5.11.jar
    Cloudera_HiveJDBC4_2.5.18.1050\TCLIServiceClient.jar
    Cloudera_HiveJDBC4_2.5.18.1050\zookeeper-3.4.6.jar
    

Adding the new Hive JDBC Drivers to Oracle SQL Developer

Next, start up SQL Developer and copy all of the extracted driver files into “Third Party JDBC Drivers” in the Preferences window.

  1. Navigate to the folder where you downloaded and extracted Oracle SQL Developer.

  2. Click the sqldeveloper subfolder. Then, click sqldeveloper.exe in this folder.

  3. In the SQL Developer menu bar, select Tools>Preferences.

  4. In the file explorer of the Preferences window, expand Database and then click Third Party JDBC Drivers.

  5. Click Add Entry.

  6. Navigate to the folder where you sent the files extracted from Cloudera_HiveJDBC4_<version>.zip. Copy all of the JAR files from the ZIP extraction into this window and then click OK.

  7. Restart Oracle SQL Developer.

Setting up Environment Variables for Using Oracle Big Data Connectors With Oracle SQL Developer

SQL Developer with Oracle Big Data Connectors requires the user to make an SSH connection from SQL Developer to a Hadoop client, Hadoop node, or Hadoop edge node. The home directory of this account requires a specific environment file for each of the Big Data Connectors it runs

Note:

If you want to do staged copies in Copy to Hadoop, then Copy to Hadoop requires an additional SSH connection to the oracle OS account on the database system.

You must create and populate the environment files. The following table provides the exact filenames and the content you must add to each file. The file must be readable by the account using the Big Data Connector.

Environment File Name Contents
For Copy to Hadoop:
.sqldev_cp2hadoop_env
#!/bin/bash
# Environment file for Copy to Hadoop

export CP2HADOOP_HOME=<Parent directory of the directory containing Copy to Hadoop JARs>  
# On Oracle Big Data Appliance, the Copy to Hadoop JARs are in /opt/oracle/bigdatasql/bdcell-12.1
export HADOOP_CLASSPATH=${CP2HADOOP_HOME}/jlib/*
# ----------------------------------------------
# If using Oracle Wallet, add the following four variables:
export WALLET_LOCATION=<Location of the Oracle Wallet files>
#For example: export WALLET_LOCATION=/home/${USER}/wallet
export TNS_ADMIN=<Like WALLET_LOCATION, this also points to the location of the Oracle Wallet files>
export CLUSTER_WALLET_LOCATION=${WALLET_LOCATION}
export CLUSTER_TNS_ADMIN=${TNS_ADMIN}
For Oracle Loader for Hadoop:
.sqldev_olh_env
#!/bin/bash
# Environment file for Oracle Loader for Hadoop

export HIVE_HOME=<For example: /opt/cloudera/parcels/CDH/lib/hive>
export HIVE_CONF_DIR=<For example: /etc/hive/conf>
export OLH_HOME=<For example (on Oracle Big Data Appliance): /opt/oracle/olh>
export HADOOP_CLASSPATH=${OLH_HOME}/jlib/*:${HIVE_CONF_DIR}:${HIVE_HOME}/lib/*
export OLH_LIB_JARS=${HIVE_HOME}/lib/hive-exec.jar,${HIVE_HOME}/lib/hive-metastore.jar,${HIVE_HOME}/lib/libfb303-0.9.2.jar
# ----------------------------------------------
# If using Oracle Wallet, add the following four variables:
export WALLET_LOCATION=<Location of the Oracle Wallet files>
export TNS_ADMIN=<Same path as WALLET_LOCATION>
export CLUSTER_WALLET_LOCATION=${WALLET_LOCATION}
export CLUSTER_TNS_ADMIN=${TNS_ADMIN}
For Oracle SQL Connector for HDFS:
.sqldev_osch_env
#!/bin/bash
# Environment file for Oracle SQL Connector for HDFS

export HIVE_HOME=<For example: /opt/cloudera/parcels/CDH/lib/hive>
export HIVE_CONF_DIR=<For example: /etc/hive/conf>
export OSCH_HOME=<For example (on Oracle Big Data Appliance): /opt/oracle/osch>
export HADOOP_CLASSPATH=${OSCH_HOME}/jlib/*:${HIVE_CONF_DIR}:${HIVE_HOME}/lib/*
# ----------------------------------------------
# If using Oracle Wallet, add the following four variables:
export WALLET_LOCATION=<Location of the Oracle Wallet files>
export TNS_ADMIN=<Same path as WALLET_LOCATION>
export CLUSTER_WALLET_LOCATION=${WALLET_LOCATION}
export CLUSTER_TNS_ADMIN=${TNS_ADMIN}

Setting Up Secure Connections for Oracle Big Data Connectors

See Apache Hadoop Connectors Support in SQL Developer in the Oracle SQL Developer User’s Guide for instructions on how to create SSH connections required for Oracle Big Data Connectors access to Hadoop.