Preface

The Oracle Big Data Connectors User's Guide describes how to install and use Oracle Big Data Connectors:

  • Oracle Loader for Hadoop

  • Oracle SQL Connector for Hadoop Distributed File System

  • Oracle XQuery for Hadoop

  • Oracle R Advanced Analytics for Hadoop

  • Oracle Datasource for Apache Hadoop

  • Oracle Data Integrator (Footnote 1)

Audience

This document is intended for users of Oracle Big Data Connectors, including the following:

  • Application developers

  • Java programmers

  • XQuery programmers

  • System administrators

  • Database administrators

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Related Documents

For more information, see the following documents:

Text Conventions

The following text conventions are used in this document:

Convention  Meaning
boldface    Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.
italic      Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
monospace   Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.

Syntax Conventions

The syntax is presented in a simple variation of Backus-Naur Form (BNF) that uses the following symbols and conventions:

Symbol or Convention    Description
[ ]                     Brackets enclose optional items.
{ }                     Braces enclose a choice of items, only one of which is required.
|                       A vertical bar separates alternatives within brackets or braces.
...                     Ellipses indicate that the preceding syntactic element can be repeated.
delimiters              Delimiters other than brackets, braces, and vertical bars must be entered as shown.

Changes in Oracle Big Data Connectors Release 4 (4.8)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.8).

The following table lists the software versions installed with Oracle Big Data Connectors 4.8:

Table 1 Software Versions for Oracle Big Data Connectors 4.8

Software                                          Version
Oracle SQL Connector for HDFS (OSCH)              3.7.1
Oracle Loader for Hadoop (OLH)                    3.8.1
Oracle Shell for Hadoop Loaders (OHSH)            1.2.1
Oracle XQuery for Hadoop (OXH)                    4.5.1.0.0
Oracle R Advanced Analytics for Hadoop (ORAAH)    2.7.0
Oracle Datasource for Apache Hadoop (OD4H)        1.2.0
Oracle Data Integrator (ODI)                      12.2.1.1
Copy to Hadoop                                    3.1.2

Change History for Previous Releases

The following are changes in previous versions of the product.

Changes in Oracle Big Data Connectors Release 4 (4.7)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.7).

The following table lists the software versions installed with Oracle Big Data Connectors 4.7:

Connector                               Version
Oracle SQL Connector for HDFS           3.7.0
Oracle Loader for Hadoop                3.8.0
Oracle Shell for Hadoop Loaders         1.2
Oracle XQuery for Hadoop                4.5.0
Oracle R Advanced Analytics for Hadoop  2.7.0
Oracle Data Integrator                  12.2.1.1

Changes in Oracle SQL Connector for HDFS

  • The property oracle.hadoop.exttab.dataCompressionCodec is now deprecated.

    OSCH now processes datasets containing both compressed and uncompressed files. OSCH automatically discovers the compression codec of the dataset at runtime.

  • -createTable for delimited text source now supports the NULLIF clause.

    You can use the configuration property nullIfSpecifier to set NULLIF at -createTable time. For example:

    oracle.hadoop.exttab.nullIfSpecifier=<NULLIF-value> 
    

    or:

    oracle.hadoop.exttab.colMap.<columnName>.nullIfSpecifier=<NULLIF-value>
    

    Note that the column-level nullIfSpecifier overrides the external table level nullIfSpecifier.
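The precedence rule above can be illustrated with a small shell sketch. It is not how OSCH resolves the property internally; it only models the stated rule that a column-level nullIfSpecifier, when present, wins over the external table level value. The column names (ORDER_DATE, CUSTOMER_ID) and the values TABLE_DEFAULT and COLUMN_VALUE are hypothetical placeholders, not recommended NULLIF values.

```shell
#!/bin/sh
# Hypothetical property set. The values are placeholders for <NULLIF-value>.
PROPS='oracle.hadoop.exttab.nullIfSpecifier=TABLE_DEFAULT
oracle.hadoop.exttab.colMap.ORDER_DATE.nullIfSpecifier=COLUMN_VALUE'

# Look up one property by exact key; prints nothing if the key is absent.
get_prop() {
  printf '%s\n' "$PROPS" |
    awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Column-level setting wins; otherwise fall back to the table-level setting.
effective_nullif() {
  value=$(get_prop "oracle.hadoop.exttab.colMap.$1.nullIfSpecifier")
  if [ -z "$value" ]; then
    value=$(get_prop "oracle.hadoop.exttab.nullIfSpecifier")
  fi
  printf '%s\n' "$value"
}

effective_nullif ORDER_DATE    # column-level value applies
effective_nullif CUSTOMER_ID   # no column-level value, table-level applies
```

In this sketch, ORDER_DATE resolves to its own setting, while CUSTOMER_ID, which has no column-level property, falls back to the table-level one.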

New and Enhanced Features

  • Oracle R Advanced Analytics for Hadoop (ORAAH) 2.7

    ORAAH 2.7 provides the following new features:

    • New ORAAH Spark-based LM algorithm with summary statistics.

    • Enhanced ORAAH Spark-based GLM full formula support and summary functions for the Spark-based GLM.

    • Enhanced ORAAH Spark-based Deep Neural Networks, which now support full formula parsing and both modeling and scoring in Spark, with computations up to 30% faster.

    • New Oracle R API for the Spark MLlib Gaussian Mixture Models clustering algorithm.

    • General improvements to Hive integration, especially for BDA secure clusters with SSL connections and Kerberos authentication enabled.

    • Automated Hive JDBC driver lookup for known installations, such as RPM or parcel installations.

  • Oracle Shell for Hadoop Loaders (OHSH) 1.2

    New features and changes in Release 1.2 include:

    • On-disk logging of load operations in the $HOME/.ohsh shadow directory.

    • The ability to minimize output when running load commands. (See the help command for set outputlevel.)

    • Loading Hive tables from Oracle tables that are not in the oracle user's schema.

    • Wallet and TNS usage by OHSH relies on the environment variables WALLET_LOCATION and TNS_ADMIN. The set tnsadmin and set walletlocation commands are no longer supported.

    In addition, you no longer set HIVE0_URL to the fully qualified URL of the remote HiveServer2 in order to create a %hive0 resource. In OHSH 1.2, set the environment variable HS2_HOST_PORT in bin/ohsh to the <hostname>:<port> pair of HiveServer2.
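As a minimal sketch of the settings described above, the following shell fragment assigns the three variables and checks that HS2_HOST_PORT has the <hostname>:<port> shape. The paths, the host name hs2.example.com, and the helper function is_host_port are assumptions made for illustration; they are not part of OHSH. Port 10000 is the usual HiveServer2 default, but your cluster may differ.

```shell
#!/bin/sh
# Hypothetical paths and host name; substitute the values for your site.
WALLET_LOCATION=/u01/app/oracle/wallet
TNS_ADMIN=/u01/app/oracle/network/admin
HS2_HOST_PORT=hs2.example.com:10000
export WALLET_LOCATION TNS_ADMIN HS2_HOST_PORT

# Return 0 if the argument looks like <hostname>:<port> with a numeric port.
is_host_port() {
  case "$1" in
    *:*) ;;
    *) return 1 ;;
  esac
  host=${1%:*}
  port=${1##*:}
  [ -n "$host" ] || return 1
  case "$port" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

if is_host_port "$HS2_HOST_PORT"; then
  echo "HS2_HOST_PORT looks valid: $HS2_HOST_PORT"
else
  echo "HS2_HOST_PORT must be <hostname>:<port>" >&2
fi
```

A check like this can catch a missing port or a stray full URL (for example, a leftover HIVE0_URL-style value) before OHSH is started.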

Changes in Oracle Big Data Connectors Release 4 (4.6)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.6).

The following table lists the software versions installed with Oracle Big Data Connectors 4.6:

Connector                               Version
Oracle SQL Connector for HDFS           3.6.0
Oracle Loader for Hadoop                3.7.0
Oracle Shell for Hadoop Loaders         1.1
Oracle XQuery for Hadoop                4.5.0
Oracle R Advanced Analytics for Hadoop  2.6.0
Oracle Data Integrator                  12.2.1.1

New and Enhanced Features

  • Oracle Datasource for Apache Hadoop (formerly Oracle Table Access for Apache Hadoop)

    Oracle Datasource for Apache Hadoop (OD4H) is now part of the Oracle Big Data Connectors suite and is licensed for use at no additional cost.

    OD4H turns Oracle Database tables into Hadoop data sources (i.e., external tables), enabling direct and consistent HiveQL/SparkSQL queries, as well as direct Hadoop API access.

    OD4H optimizes query execution plans using predicate and projection pushdown as well as partition pruning. Oracle Database table access is performed in parallel using smart and secure connections (Kerberos, SSL, Oracle Wallet), regulated by both Hadoop (i.e., maximum concurrent tasks) and Oracle DBAs (i.e., maximum pool size).

  • Oracle Shell for Hadoop Loaders 1.1

    Oracle Shell for Hadoop Loaders (OHSH) was introduced in Oracle Big Data Connectors 4.5. OHSH is an intuitive command-line tool for data migration. You can set up resources to connect to Hive, HDFS, or Oracle Database and access each of these data sources through OHSH’s uniform interface. Copy to Hadoop users can download OHSH from OTN.

    Changes in this release:

    • Interactive command history is now persistent across OHSH sessions.

    • Support for spooling of OHSH output to a text file. By default the spool file is ohshspool.txt in the directory where OHSH is invoked. Spooling can be turned on, off, or directed to a user-specified file as follows:

      ohsh> spool on
      ohsh> spool off
      ohsh> set spool <filename>
      
    • New Hive CLI. Beeline is now the CLI for Hive resources. The syntax to create a Hive resource is now as follows:

      ohsh> create hive resource <resource_id> connectionurl=<DQString>
      

      In this case, if the user has specified the HIVE0_URL variable in bin/ohsh, the command creates a hive0 resource.
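To make the create hive resource syntax above more concrete, the following shell sketch fills in the placeholders and prints the resulting OHSH command instead of running it. The resource name hive1, the host hs2.example.com, and the database default are hypothetical. The URL follows the general HiveServer2 JDBC form jdbc:hive2://<host>:<port>/<database>; whether your cluster needs extra URL parameters (for Kerberos or SSL, for instance) is not covered here.

```shell
#!/bin/sh
# Hypothetical resource name and HiveServer2 address; adjust for your cluster.
RESOURCE_ID=hive1
HS2_HOST=hs2.example.com
HS2_PORT=10000
HIVE_DB=default

# HiveServer2 JDBC URLs take the form jdbc:hive2://<host>:<port>/<database>.
CONNECTION_URL="jdbc:hive2://${HS2_HOST}:${HS2_PORT}/${HIVE_DB}"

# Compose the OHSH command; the URL is passed as a double-quoted string.
OHSH_COMMAND="create hive resource ${RESOURCE_ID} connectionurl=\"${CONNECTION_URL}\""
printf '%s\n' "$OHSH_COMMAND"
```

The printed line can then be entered at the ohsh> prompt.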

Changes in Oracle Big Data Connectors Release 4 (4.5)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.5).

The following table lists the software versions installed with Oracle Big Data Connectors 4.5:

Connector                               Version
Oracle SQL Connector for HDFS           3.5.0
Oracle Loader for Hadoop                3.6.0
Oracle XQuery for Hadoop                4.5.0
Oracle R Advanced Analytics for Hadoop  2.6.0
Oracle Data Integrator                  12.2.1

New and Enhanced Features

  • Oracle Shell for Hadoop Loaders

    Oracle Shell for Hadoop Loaders is a new user interface for Big Data Connectors. It is not itself a Big Data Connector. Oracle Shell for Hadoop Loaders is a shell and command-line interface that provides a single environment for interacting with Oracle Loader for Hadoop, Oracle SQL Connector for HDFS, and Copy to Hadoop. In addition to providing a single point of access, it can reduce some of the overhead involved in using the connectors, which otherwise must be configured, managed, and run separately.

  • Oracle R Advanced Analytics for Hadoop (ORAAH) 2.6 Improvements

    ORAAH 2.6 includes expanded support for predictive modeling algorithms, including integration of many Spark MLlib capabilities, as well as enhancements for existing custom Spark algorithms.

  • Oracle XQuery for Hadoop 4.5.0 Improvements

    Adds support for W3C XQuery 3.0 including the try/catch expression, the switch expression, and standard functions and operators.

Changes in Oracle Big Data Connectors Release 4 (4.4)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.4).

This table shows the software versions installed with Oracle Big Data Connectors 4.4:

Connector                               Version
Oracle SQL Connector for HDFS           3.4.0
Oracle Loader for Hadoop (Footnote 2)   3.5.0
Oracle XQuery for Hadoop                4.2.1
Oracle R Advanced Analytics for Hadoop  2.4.0
Oracle Data Integrator                  12.2.1

Footnote 2

Oracle Loader for Hadoop 3.5 supports filtering of data loaded from Hive tables at the individual record level. Previously Hive data could only be filtered at the partition level.

New Features

Changes in Oracle Big Data Connectors Release 4 (4.3)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.3).

This table shows the software versions installed with Oracle Big Data Connectors 4.3:

Connector                               Version
Oracle SQL Connector for HDFS           3.4.0
Oracle Loader for Hadoop (Footnote 3)   3.5.0
Oracle XQuery for Hadoop                4.2.1
Oracle R Advanced Analytics for Hadoop  2.4.0
Oracle Data Integrator (Footnote 4)     12.1.3.0

Footnote 3

Oracle Loader for Hadoop 3.5 supports filtering of data loaded from Hive tables at the individual record level. Previously Hive data could only be filtered at the partition level.

Footnote 4

For information about requirements and instructions to set up and use Oracle Data Integrator, refer to the Hadoop chapter of Oracle Fusion Middleware Application Adapters Guide for Oracle Data Integrator.

New Features

Changes in Oracle Big Data Connectors Release 4 (4.2)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.2).

This table shows the software versions installed with Oracle Big Data Connectors 4.2:

Connector                               Version
Oracle SQL Connector for HDFS           3.3.0
Oracle Loader for Hadoop                3.4.0
Oracle XQuery for Hadoop (Footnote 5)   4.2.0
Oracle R Advanced Analytics for Hadoop  2.4.0
Oracle Data Integrator (Footnote 6)     12.1.3.0

Footnote 5

Added support for Oracle NoSQL Database Table API and Oracle NoSQL Database Large Object API. For working with Oracle NoSQL Database Table API functions, you must have Oracle NoSQL Database 3.1 or above.

Footnote 6

For information about requirements and instructions to set up and use Oracle Data Integrator, refer to the Hadoop chapter of Oracle Fusion Middleware Application Adapters Guide for Oracle Data Integrator.

Changes in Oracle Big Data Connectors Release 4 (4.1)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.1).

This table shows the software versions installed with Oracle Big Data Connectors 4.1:

Connector                               Version
Oracle SQL Connector for HDFS           3.3.0
Oracle Loader for Hadoop                3.3.0
Oracle XQuery for Hadoop                4.2.0
Oracle R Advanced Analytics for Hadoop  2.4.0
Oracle Data Integrator (Footnote 7)     12.1.3.0

Footnote 7

For information about requirements and instructions to set up and use Oracle Data Integrator, refer to the Hadoop chapter of Oracle Fusion Middleware Application Adapters Guide for Oracle Data Integrator.

Changes in Oracle Big Data Connectors Release 4 (4.0)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.0).

This table shows the software versions installed with Oracle Big Data Connectors 4.0:

Connector                                                           Version
Oracle SQL Connector for HDFS                                       3.1
Oracle Loader for Hadoop                                            3.2
Oracle Data Integrator Application Adapter for Hadoop (Footnote 8)  12.1.3.0
Oracle XQuery for Hadoop                                            4.0.1
Oracle R Advanced Analytics for Hadoop                              2.4

Footnote 8

For information about requirements and instructions to set up and use Oracle Data Integrator Application Adapter for Hadoop, refer to the Hadoop chapter of Oracle Fusion Middleware Application Adapters Guide for Oracle Data Integrator.



Footnote Legend

Footnote 1:

Oracle Big Data Connectors includes a restricted-use license for Oracle Data Integrator when licensed on an Oracle Big Data Appliance. Additional licensing is required to use it on other Hadoop clusters.