Oracle® Big Data Connectors User's Guide
Release 1 (1.0)

Part Number E27365-06

1 Getting Started with Oracle Big Data Connectors

This chapter introduces you to Oracle Big Data Connectors, provides installation instructions, and identifies the permissions needed for users to access the connectors.

This chapter contains these topics:

  • About Oracle Big Data Connectors

  • Downloading the Software

  • Oracle Direct Connector for Hadoop Distributed File System

  • Oracle Loader for Hadoop

  • Oracle Data Integrator Application Adapter for Hadoop

  • Oracle R Connector for Hadoop

1.1 About Oracle Big Data Connectors

Oracle Big Data Connectors facilitate data access between a Hadoop cluster and Oracle Database. They can be licensed for use on either Oracle Big Data Appliance or a Hadoop cluster running on commodity hardware.

These are the connectors:

  • Oracle Direct Connector for Hadoop Distributed File System

  • Oracle Loader for Hadoop

  • Oracle Data Integrator Application Adapter for Hadoop

  • Oracle R Connector for Hadoop

Individual connectors may require that software components be installed in Oracle Database, in the Hadoop cluster, and on the user's PC. Users may also need additional access privileges in Oracle Database.

See Also:

My Oracle Support Master Note 1416116.1 and its related notes

1.2 Downloading the Software

You can download Oracle Big Data Connectors from Oracle Technology Network (OTN) or Oracle Software Delivery Cloud.

To download from OTN: 

  1. Use any browser to visit this website:

    http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html

  2. Click the name of each connector to download a zip file containing the installation files.

To download from Oracle Software Delivery Cloud: 

  1. Use any browser to visit Oracle Software Delivery Cloud at

    https://edelivery.oracle.com/

  2. Accept the Terms and Restrictions to see the Media Pack Search page.

  3. Select the search terms:

    Select a Product Pack: Oracle Database

    Platform: Linux x86-64

  4. Click Go to display a list of product packs.

  5. Select Oracle Big Data Connectors Media Pack for Linux x86-64 (B65965-0x), then click Continue.

  6. Click Download for each connector to download a zip file containing the installation files.

1.3 Oracle Direct Connector for Hadoop Distributed File System

Oracle Direct Connector for Hadoop Distributed File System (Oracle Direct Connector) is installed and runs on the system where Oracle Database runs. Before installing Oracle Direct Connector, verify that you have the required software.

1.3.1 Required Software

Oracle Direct Connector requires the following software:

  • Cloudera's Distribution including Apache Hadoop Version CDH3 or Apache Hadoop 0.20.2.

  • Oracle JDK 1.6.0_8 or higher for CDH3. Cloudera recommends version 1.6.0_26.

  • Oracle Database 11g Release 2 (11.2.0.2 or 11.2.0.3) for Linux.

  • To support the Data Pump file format, a database one-off patch. To download this patch, go to http://support.oracle.com and search for bug 13079417.

  • The same version of Hadoop on the database system as your Hadoop cluster, either CDH3 or Apache Hadoop 0.20.2.

  • The same version of Oracle JDK on the database system as your Hadoop cluster.

1.3.2 Installing and Configuring Hadoop

Oracle Direct Connector works as an HDFS client. You do not need to configure Hadoop on the database system to run MapReduce jobs for Oracle Direct Connector. However, you must install Hadoop on the database system and minimally configure it for HDFS client use only.

To configure the database system as a Hadoop client: 

  1. Install CDH3 or Apache Hadoop 0.20.2 on the database system. Follow the installation instructions provided by the distributor (Cloudera or Apache). Do not follow the configuration instructions.

  2. Use a text editor to open conf/hadoop-env.sh in the Hadoop home directory on the database system, then make these changes:

    1. Uncomment the line that begins export JAVA_HOME.

    2. Set JAVA_HOME to the directory where JDK1.6 is installed.

  3. Edit conf/core-site.xml in the same directory to identify the NameNode of your Hadoop cluster as follows:

    <configuration>
       <property>
          <name>fs.default.name</name>
          <value>hdfs://host:port</value>
        </property>
    </configuration>
    
  4. Ensure that Oracle Database has access to Hadoop and HDFS:

    1. Log in to the system where Oracle Database is running using the Oracle database account.

    2. Open a bash shell and issue this command:

      $HADOOP_HOME/bin/hadoop fs -ls /user
      

      In this command, $HADOOP_HOME is the absolute path to the Hadoop home directory. You should see a list of files. If not, then first ensure that the Hadoop cluster is up and running. If the problem persists, then you must correct the Hadoop client configuration so that Oracle Database has access to the Hadoop cluster file system.

The database system is now ready for use as a Hadoop client. No other Hadoop configuration steps are needed.
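Steps 2 and 3 above amount to two small client-side edits. The following sketch shows them together; the JDK path and the NameNode address (nn.example.com:8020) are illustrative assumptions, not values prescribed by this guide:

```shell
# In conf/hadoop-env.sh: uncomment and set JAVA_HOME.
# The JDK path below is an assumption for illustration only.
export JAVA_HOME=/usr/java/jdk1.6.0_26

# In conf/core-site.xml: point fs.default.name at the cluster NameNode.
# nn.example.com:8020 is a hypothetical host:port; /tmp is used here only
# so the sketch is self-contained.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://nn.example.com:8020</value>
   </property>
</configuration>
EOF
```

After edits like these, the hadoop fs -ls check in step 4 should list files from the cluster file system.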

1.3.3 Installing Oracle Direct Connector

To install Oracle Direct Connector:

  1. Download the zip file to a directory on the system where Oracle Database runs.

  2. Unzip orahdfs-version.zip into a directory. The unzipped files have the structure shown in Example 1-1.

  3. Open the hdfs_stream bash shell script in a text editor and make these changes:

    • HADOOP_HOME: Set to the absolute path of the Hadoop home directory.

    • DIRECTHDFS_HOME: Set to the absolute path of the Oracle Direct Connector installation directory.

    The hdfs_stream script is the preprocessor script for the HDFS external table. Comments in the script provide complete instructions for making these changes.

  4. Run the hdfs_stream script from the Oracle Direct Connector installation directory. You should see this usage information:

    $ bin/hdfs_stream
    Oracle Direct HDFS Release 1.0.0.0.0 - Production
    Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
    Usage: $HADOOP_HOME/bin/hadoop jar orahdfs.jar oracle.hadoop.hdfs.exttab.HdfsStream <locationPath>
    

    If you do not see this usage information, then ensure that the operating system user that Oracle Database runs under has the following permissions:

    • Read and execute permissions on the hdfs_stream script:

      $ ls -l DIRECTHDFS_HOME/bin/hdfs_stream
      -rwxr-xr-x 1 oracle oinstall 2273 Apr 27 15:51 hdfs_stream
      

      If you do not see these permissions, then issue a chmod command to fix them:

      $ chmod 755 DIRECTHDFS_HOME/bin/hdfs_stream
      

      In these commands, DIRECTHDFS_HOME represents the Oracle Direct Connector home directory.

    • Read permission on DIRECTHDFS_HOME/jlib/orahdfs.jar.

  5. Create a database directory for the orahdfs-version/bin directory where hdfs_stream resides. In this example, the Oracle Direct Connector kit is installed in /etc:

    SQL> CREATE OR REPLACE DIRECTORY hdfs_bin_path AS '/etc/orahdfs-1.0/bin';
    
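The two variables edited in hdfs_stream (step 3) reduce to a pair of exports. This is a minimal sketch with purely illustrative paths; substitute the actual locations on your database system:

```shell
# hdfs_stream configuration (illustrative paths, not defaults from this guide):
export HADOOP_HOME=/usr/lib/hadoop          # absolute path to the Hadoop home directory
export DIRECTHDFS_HOME=/etc/orahdfs-1.0     # absolute path to the connector installation
```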

Example 1-1 Structure of the orahdfs Directory

orahdfs-version
   bin/
      hdfs_stream
   jlib/ 
      orahdfs.jar
   log/
   README.txt

1.3.4 Granting User Access to Oracle Direct Connector

Oracle Database users require these privileges to use Oracle Direct Connector:

  • CREATE SESSION

  • EXECUTE on the UTL_FILE PL/SQL package.

  • READ and EXECUTE on the HDFS_BIN_PATH database directory created in "Installing Oracle Direct Connector". Do not grant write access to anyone. Grant EXECUTE only to those who intend to use Oracle Direct Connector.

Example 1-2 shows the SQL commands granting these privileges to HDFSUSER.

Example 1-2 Granting Users Access to Oracle Direct Connector

CONNECT / AS SYSDBA;
CREATE USER hdfsuser IDENTIFIED BY password;
GRANT CREATE SESSION TO hdfsuser;
GRANT EXECUTE ON SYS.UTL_FILE TO hdfsuser;
GRANT READ, EXECUTE ON DIRECTORY hdfs_bin_path TO hdfsuser;

1.4 Oracle Loader for Hadoop

Before installing Oracle Loader for Hadoop, verify that you have the required software.

1.4.1 Required Software

Oracle Loader for Hadoop requires the following software:

  • A target database system running one of the following:

    • Oracle Database 10g Release 2 (10.2.0.5) with required patch

    • Oracle Database 11g Release 2 (11.2.0.2) with required patch

    • Oracle Database 11g Release 2 (11.2.0.3)

    Note:

    To use Oracle Loader for Hadoop with Oracle Database 10g Release 2 (10.2.0.5) or Oracle Database 11g Release 2 (11.2.0.2), you must first apply a one-off patch that addresses bug number 11897896. To access this patch, go to http://support.oracle.com and search for the bug number.
  • Cloudera's Distribution including Apache Hadoop (CDH3) or Apache Hadoop 0.20.2

  • Hive 0.7.0 or 0.7.1, if using the HiveToAvroInputFormat class

1.4.2 Installing Oracle Loader for Hadoop

Oracle Loader for Hadoop is packaged with the Oracle Database 11g Release 2 client libraries and Oracle Instant Client libraries for connecting to Oracle Database 10.2.0.5, 11.2.0.2, or 11.2.0.3.

To install Oracle Loader for Hadoop: 

  1. Unpack the content of the oraloader-version.zip archive into a directory on your Hadoop cluster.

    A directory named oraloader-version is created with the following subdirectories:

    • doc

    • jlib

    • lib

    • examples

    This guide uses the variable ${OLH_HOME} to refer to this installation directory.

  2. Add ${OLH_HOME}/jlib/* to the HADOOP_CLASSPATH variable.
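For example, assuming the kit was unpacked under /opt (a hypothetical location), the classpath update in step 2 might look like this:

```shell
# OLH_HOME is a hypothetical install location; adjust to your own directory.
export OLH_HOME=/opt/oraloader-1.1.0
# Prepend the Oracle Loader for Hadoop jars to HADOOP_CLASSPATH.
export HADOOP_CLASSPATH="${OLH_HOME}/jlib/*:${HADOOP_CLASSPATH}"
```

The jlib/* wildcard is expanded by the Java 6 class loader to all JAR files in that directory.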

1.5 Oracle Data Integrator Application Adapter for Hadoop

Installation requirements for Oracle Data Integrator Application Adapter for Hadoop are provided in these topics:

1.5.1 System Requirements and Certifications

To use the Application Adapter for Hadoop, you must first have Oracle Data Integrator, which is licensed separately from Oracle Big Data Connectors. You can download Oracle Data Integrator from the Oracle website at

http://www.oracle.com/technetwork/middleware/data-integrator/downloads/index.html

Oracle Data Integrator Application Adapter for Hadoop Knowledge Modules require Oracle Data Integrator version 11.1.1.6.0 or later.

Before performing any installation, read the system requirements and certification documentation to ensure that your environment meets the minimum installation requirements for the products you are installing.

The list of supported platforms and versions is available on Oracle Technology Network:

http://www.oracle.com/technology/products/oracle-data-integrator/index.html

1.5.2 Technology Specific Requirements

The list of supported technologies and versions is available on Oracle Technology Network:

http://www.oracle.com/technology/products/oracle-data-integrator/index.html

1.5.3 Location of the Oracle Data Integrator Application Adapter for Hadoop

Oracle Data Integrator Application Adapter for Hadoop is available in the xml-reference directory of the Oracle Data Integrator Companion CD.

1.6 Oracle R Connector for Hadoop

Oracle R Connector for Hadoop requires the installation of a software environment on the Hadoop side and on a client Linux system.

1.6.1 Installing the Server Software

Oracle Big Data Appliance supports Oracle R Connector for Hadoop without any additional software installation or configuration.

To use Oracle R Connector for Hadoop on any other Hadoop cluster, you must create the necessary environment.

Install these components on third-party servers: 

  • Java Virtual Machine (JVM), preferably Java HotSpot Virtual Machine 6.

  • R distribution 2.13.2 with all base libraries on all nodes in the Hadoop cluster.

  • ORHC package installed in the R engine on every node of the Hadoop cluster. See the following instructions.

To install ORHC: 

  1. Set the environment variables for the Hadoop and JVM home directories:

    $ setenv HADOOP_HOME /usr/lib/hadoop-0.20.2
    $ setenv JAVA_HOME /usr/lib/jdk6
    

    In this example, both home directories are in /usr/lib.

  2. Unzip the downloaded file:

    $ unzip orhc.tgz.zip
    Archive:  orhc.tgz.zip
    
  3. Open R and install the package:

    > install.packages("/home/tmp/orhc.tgz", repos=NULL)
    Installing package(s) into ...
    .
    .
    .
    Hadoop is up and running.
    
  4. Alternatively, you can install the package from the Linux command line:

    $ R CMD INSTALL orhc.tgz
    * installing *source* package 'ORHC' ...
    ** R
    .
    .
    .
    Hadoop is up and running.
     
    * DONE (ORHC)
    

1.6.2 Installing the Client Software

To give R users access to a Hadoop cluster, install these components on a Linux server:

  • Hadoop Client to allow access to the Hadoop cluster

    For Oracle Big Data Appliance, see the Oracle Big Data Appliance Software User's Guide for detailed instructions on setting up remote client access.

  • Java Virtual Machine, preferably Java HotSpot Virtual Machine 6

  • R distribution 2.13.2

  • ORHC R package

    Follow the steps for installing ORHC in "Installing the Server Software".

  • Oracle R Enterprise libraries (optional). These libraries provide access to Oracle Database. Without them, Oracle R Connector for Hadoop operates only with in-memory R objects and local data files, and cannot use the advanced statistical algorithms provided by Oracle R Enterprise. For example:

    library(DBI)
    library(ROracle)
    library(OREbase)
    library(OREeda)
    library(OREgraphics)
    library(OREstats)
    library(RToXmp)
    

When you are done, ensure that users have the necessary permissions to connect to the Linux server and run R.