The Oracle Big Data Connectors User's Guide describes how to install and use Oracle Big Data Connectors:
Oracle Loader for Hadoop
Oracle SQL Connector for Hadoop Distributed File System
Oracle XQuery for Hadoop
Oracle R Advanced Analytics for Hadoop
Oracle Datasource for Apache Hadoop
This document is intended for users of Oracle Big Data Connectors, including the following:
Application developers
Java programmers
XQuery programmers
System administrators
Database administrators
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.
Access to Oracle Support
Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
The following text conventions are used in this document:
| Convention | Meaning | 
|---|---|
| boldface | Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary. | 
| italic | Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values. | 
| monospace | Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter. | 
The syntax is presented in a simple variation of Backus-Naur Form (BNF) that uses the following symbols and conventions:
| Symbol or Convention | Description | 
|---|---|
| [ ] | Brackets enclose optional items. | 
| { } | Braces enclose a choice of items, only one of which is required. | 
| \| | A vertical bar separates alternatives within brackets or braces. | 
| ... | Ellipses indicate that the preceding syntactic element can be repeated. | 
| delimiters | Delimiters other than brackets, braces, and vertical bars must be entered as shown. | 
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.7).
The following table lists the software versions installed with Oracle Big Data Connectors 4.7:
| Connector | Version | 
|---|---|
| Oracle SQL Connector for HDFS | 3.7.0 | 
| Oracle Loader for Hadoop | 3.8.0 | 
| Oracle Shell for Hadoop Loaders | 1.2 | 
| Oracle XQuery for Hadoop | 4.5.0 | 
| Oracle R Advanced Analytics for Hadoop | 2.7.0 | 
| Oracle Data Integrator | 12.2.1.1 | 
Changes in Oracle SQL Connector for HDFS
The property oracle.hadoop.exttab.dataCompressionCodec is now deprecated.
OSCH does not process datasets containing both compressed and uncompressed files. OSCH automatically discovers the compression codec of the dataset at runtime.
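Because mixed datasets are not processed, it can help to check a file list before pointing OSCH at it. The following is a minimal sketch of such a pre-check, not OSCH's own logic: it assumes that compression can be guessed from the file suffix, and the suffix list, function names, and sample paths are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Assumption: these suffixes stand in for "compressed". OSCH's actual
# codec discovery may work differently.
COMPRESSED_SUFFIXES = {".gz", ".bz2", ".deflate", ".snappy", ".lzo"}


def is_compressed(path: str) -> bool:
    """Guess from the file suffix whether a data file is compressed."""
    return PurePosixPath(path).suffix.lower() in COMPRESSED_SUFFIXES


def has_mixed_compression(paths: Iterable[str]) -> bool:
    """Return True if the file list mixes compressed and uncompressed files."""
    kinds = {is_compressed(p) for p in paths}
    return len(kinds) == 2


if __name__ == "__main__":
    # Hypothetical dataset: one gzip file and one plain-text file.
    dataset = [
        "/user/scott/data/part-00000.gz",
        "/user/scott/data/part-00001.dat",
    ]
    if has_mixed_compression(dataset):
        print("Dataset mixes compressed and uncompressed files")
```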
-createTable for delimited text source does not support the NULLIF clause.
This change adds properties that let you configure null-if specifiers on an external table, or on specific columns of an external table, when OSCH reads data from a text source. Specify the properties at -createTable time in either of these forms:
- oracle.hadoop.exttab.nullIfSpecifier=<null-if-value>
- oracle.hadoop.exttab.colMap.<columnName>.nullIfSpecifier=<null-if-value>
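The table-level and column-level property forms described above can be assembled programmatically. The sketch below builds them as `-D property=value` arguments, assuming the properties are passed as Hadoop generic options on the -createTable command line. The helper name, the column name, and the null-if values are hypothetical and only illustrate the shape of the properties.

```python
from typing import Dict, List, Optional

# Property names as given in the text above.
TABLE_PROPERTY = "oracle.hadoop.exttab.nullIfSpecifier"
COLUMN_PROPERTY = "oracle.hadoop.exttab.colMap.{column}.nullIfSpecifier"


def null_if_args(
    table_value: Optional[str] = None,
    column_values: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build -D arguments for table-level and per-column null-if specifiers."""
    args: List[str] = []
    if table_value is not None:
        args += ["-D", f"{TABLE_PROPERTY}={table_value}"]
    for column, value in (column_values or {}).items():
        name = COLUMN_PROPERTY.format(column=column)
        args += ["-D", f"{name}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical values: "NA" for the whole table, a sentinel date for one column.
    for arg in null_if_args("NA", {"ORDER_DATE": "0000-00-00"}):
        print(arg)
```

The resulting list is meant to be appended to whatever -createTable invocation you already use; it does not run OSCH itself.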
New and Enhanced Features
Oracle R Advanced Analytics for Hadoop (ORAAH) 2.7
ORAAH 2.7 provides the following new features:
New ORAAH Spark-based LM algorithm with summary statistics.
Enhanced ORAAH Spark-based GLM, with full formula support and summary functions.
Enhanced ORAAH Spark-based Deep Neural Networks now supporting full formula parsing, and Modeling plus Scoring in Spark, with computations up to 30% faster.
New Oracle R API for the Spark MLlib Gaussian Mixture Models clustering algorithm.
General improvements to Hive integration, especially for BDA secure clusters with SSL connections and Kerberos authentication enabled.
Automated Hive JDBC driver lookup for known installations, such as RPM or parcel installations.
Oracle Shell for Hadoop Loaders (OHSH) 1.2
New features and changes in Release 1.2 include:
On-disk logging of load operations in the $HOME/.ohsh shadow directory.
The ability to minimize output when running load commands. (See the help command for set outputlevel.)
Loading Hive tables from Oracle tables not living in the oracle user's schema.
Wallet and TNS usage by OHSH relies on the environment variables WALLET_LOCATION and TNS_ADMIN. The set tnsadmin and set walletlocation commands are no longer supported.
In addition, you no longer set HIVE0_URL to the fully qualified URL of the remote HiveServer2 in order to create a %hive0 resource. In OHSH 1.2, set the environment variable HS2_HOST_PORT in bin/ohsh to the <hostname>:<port> pair of HiveServer2.
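As a rough illustration of the settings above, the following sketch checks that WALLET_LOCATION and TNS_ADMIN are present in an environment mapping and that a candidate HS2_HOST_PORT value has the <hostname>:<port> shape. It is not part of OHSH. The function names and the sample host are hypothetical, and the port-range check is an added assumption.

```python
import os
from typing import List, Mapping, Sequence, Tuple


def missing_variables(
    env: Mapping[str, str],
    names: Sequence[str] = ("WALLET_LOCATION", "TNS_ADMIN"),
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not env.get(name)]


def split_host_port(value: str) -> Tuple[str, int]:
    """Split a '<hostname>:<port>' pair, raising ValueError if malformed."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected <hostname>:<port>, got {value!r}")
    number = int(port)
    if not 0 < number < 65536:
        raise ValueError(f"port out of range: {number}")
    return host, number


if __name__ == "__main__":
    # Check the current process environment for the wallet and TNS settings.
    for name in missing_variables(os.environ):
        print(f"{name} is not set")
    # Hypothetical HiveServer2 address.
    print(split_host_port("hs2.example.com:10000"))
```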
The following are changes in previous versions of the product.
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.6).
The following table lists the software versions installed with Oracle Big Data Connectors 4.6:
| Connector | Version | 
|---|---|
| Oracle SQL Connector for HDFS | 3.6.0 | 
| Oracle Loader for Hadoop | 3.7.0 | 
| Oracle Shell for Hadoop Loaders | 1.1 | 
| Oracle XQuery for Hadoop | 4.5.0 | 
| Oracle R Advanced Analytics for Hadoop | 2.6.0 | 
| Oracle Data Integrator | 12.2.1.1 | 
New and Enhanced Features
Oracle Datasource for Apache Hadoop (formerly Oracle Table Access for Apache Hadoop)
Oracle Datasource for Apache Hadoop (OD4H) is now part of the Oracle Big Data Connectors suite and is licensed for use at no additional cost.
OD4H turns Oracle Database tables into Hadoop data sources (i.e., external tables), enabling direct and consistent HiveQL/SparkSQL queries, as well as direct Hadoop API access.
OD4H optimizes query execution plans using predicate and projection pushdown as well as partition pruning. Oracle Database table access is performed in parallel using smart and secure connections (Kerberos, SSL, Oracle Wallet), regulated by both Hadoop (i.e., maximum concurrent tasks) and Oracle DBAs (i.e., maximum pool size).
Oracle Shell for Hadoop Loaders 1.1
Oracle Shell for Hadoop Loaders (OHSH) was introduced in Oracle Big Data Connectors 4.5. OHSH is an intuitive command-line tool for data migration. You can set up resources to connect to Hive, HDFS, or Oracle Database and access each of these data sources through OHSH’s uniform interface. Copy to Hadoop users can download OHSH from OTN.
Changes in this release:
Interactive command history is now persistent across OHSH sessions.
Support for spooling of OHSH output to a text file. By default the spool file is ohshspool.txt in the directory where OHSH is invoked. Spooling can be turned on, off, or directed to a user-specified file as follows:
ohsh> spool on
ohsh> spool off
ohsh> set spool <filename>
New Hive CLI. Beeline is now the CLI for Hive resources. The syntax to create a Hive resource is now as follows.
ohsh> create hive resource <resource_id> connectionurl=<DQString>
In this case, if the user has specified the HIVE0_URL variable in bin/ohsh, the command creates a hive0 resource.
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.5).
The following table lists the software versions installed with Oracle Big Data Connectors 4.5:
| Connector | Version | 
|---|---|
| Oracle SQL Connector for HDFS | 3.5.0 | 
| Oracle Loader for Hadoop | 3.6.0 | 
| Oracle XQuery for Hadoop | 4.5.0 | 
| Oracle R Advanced Analytics for Hadoop | 2.6.0 | 
| Oracle Data Integrator | 12.2.1 | 
New and Enhanced Features
Oracle Shell for Hadoop Loaders
Oracle Shell for Hadoop Loaders is a new user interface for Big Data Connectors. It is not itself a Big Data Connector. Oracle Shell for Hadoop Loaders is a shell and command line that provides the user with a single environment for interacting with Big Data Connectors: Oracle Loader for Hadoop, Oracle SQL Connector for HDFS, and Copy to Hadoop. In addition to providing a single point of access, Oracle Shell for Hadoop Loaders can reduce some of the overhead involved in using the Connectors, because otherwise these products must be configured, managed, and run separately.
Oracle R Advanced Analytics for Hadoop (ORAAH) 2.6 Improvements
ORAAH 2.6 includes expanded support for predictive modeling algorithms, including integration of many Spark MLlib capabilities, as well as enhancements for existing custom Spark algorithms.
Oracle XQuery for Hadoop 4.5.0 Improvements
Adds support for W3C XQuery 3.0 including the try/catch expression, the switch expression, and standard functions and operators.
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.4).
This table shows the software versions installed with Oracle Big Data Connectors 4.4:
| Connector | Version | 
|---|---|
| Oracle SQL Connector for HDFS | 3.4.0 | 
| Oracle Loader for Hadoop (Footnote 2) | 3.5.0 | 
| Oracle XQuery for Hadoop | 4.2.1 | 
| Oracle R Advanced Analytics for Hadoop | 2.4.0 | 
| Oracle Data Integrator | 12.2.1 | 
Footnote 2
Oracle Loader for Hadoop 3.5 supports filtering of data loaded from Hive tables at the individual record level. Previously Hive data could only be filtered at the partition level.
New Features
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.3).
This table shows the software versions installed with Oracle Big Data Connectors 4.3:
| Connector | Version | 
|---|---|
| Oracle SQL Connector for HDFS | 3.4.0 | 
| Oracle Loader for Hadoop (Footnote 3) | 3.5.0 | 
| Oracle XQuery for Hadoop | 4.2.1 | 
| Oracle R Advanced Analytics for Hadoop | 2.4.0 | 
| Oracle Data Integrator (Footnote 4) | 12.1.3.0 | 
Footnote 3
Oracle Loader for Hadoop 3.5 supports filtering of data loaded from Hive tables at the individual record level. Previously Hive data could only be filtered at the partition level.
Footnote 4
For information about requirements and instructions to set up and use Oracle Data Integrator, refer to the Hadoop chapter of Oracle Fusion Middleware Application Adapters Guide for Oracle Data Integrator.
New Features
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.2).
This table shows the software versions installed with Oracle Big Data Connectors 4.2:
| Connector | Version | 
|---|---|
| Oracle SQL Connector for HDFS | 3.3.0 | 
| Oracle Loader for Hadoop | 3.4.0 | 
| Oracle XQuery for Hadoop (Footnote 5) | 4.2.0 | 
| Oracle R Advanced Analytics for Hadoop | 2.4.0 | 
| Oracle Data Integrator (Footnote 6) | 12.1.3.0 | 
Footnote 5
Added support for Oracle NoSQL Database Table API and Oracle NoSQL Database Large Object API. For working with Oracle NoSQL Database Table API functions, you must have Oracle NoSQL Database 3.1 or above.
Footnote 6
For information about requirements and instructions to set up and use Oracle Data Integrator, refer to the Hadoop chapter of Oracle Fusion Middleware Application Adapters Guide for Oracle Data Integrator.
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.1).
This table shows the software versions installed with Oracle Big Data Connectors 4.1:
| Connector | Version | 
|---|---|
| Oracle SQL Connector for HDFS | 3.3.0 | 
| Oracle Loader for Hadoop | 3.3.0 | 
| Oracle XQuery for Hadoop | 4.2.0 | 
| Oracle R Advanced Analytics for Hadoop | 2.4.0 | 
| Oracle Data Integrator (Footnote 7) | 12.1.3.0 | 
Footnote 7
For information about requirements and instructions to set up and use Oracle Data Integrator, refer to the Hadoop chapter of Oracle Fusion Middleware Application Adapters Guide for Oracle Data Integrator.
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 4 (4.0).
This table shows the software versions installed with Oracle Big Data Connectors 4.0:
| Connector | Version | 
|---|---|
| Oracle SQL Connector for HDFS | 3.1 | 
| Oracle Loader for Hadoop | 3.2 | 
| Oracle Data Integrator Application Adapter for Hadoop (Footnote 8) | 12.1.3.0 | 
| Oracle XQuery for Hadoop | 4.0.1 | 
| Oracle R Advanced Analytics for Hadoop | 2.4 | 
Footnote 8
For information about requirements and instructions to set up and use Oracle Data Integrator Application Adapter for Hadoop, refer to the Hadoop chapter of Oracle Fusion Middleware Application Adapters Guide for Oracle Data Integrator.
Footnote Legend
Footnote 1: Oracle Big Data Connectors includes a restricted use license for the Oracle Data Integrator when licensed on an Oracle Big Data Appliance. However, additional licensing is required for using it on other Hadoop clusters.