Changes in This Release for Oracle Big Data Connectors User's Guide

This preface contains:

Changes in Oracle Big Data Connectors Release 2 (2.6)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.6).

This table shows the software versions installed with Oracle Big Data Connectors 2.6:

Connector                                              Version
Oracle SQL Connector for HDFS                          3.0.0
Oracle Loader for Hadoop                               3.0.1
Oracle Data Integrator Application Adapter for Hadoop  11.1.1.7.0
Oracle XQuery for Hadoop                               3.0.0
Oracle R Advanced Analytics for Hadoop                 2.4

New Features

Oracle Big Data Connectors support Yet Another Resource Negotiator (YARN). Existing MapReduce programs might need to be recompiled to run under YARN.

  • Oracle SQL Connector for Hadoop Distributed File System

    • Supports partitioned Hive tables. You can query a single partition or a range of partitions and load only the data you need into Oracle Database tables for further analysis.

      See "Creating External Tables from Hive Tables".

    • Adds two commands, -drop and -describe, to the ExternalTable command-line tool. These commands make it easier to administer the external tables and location files that Oracle SQL Connector for HDFS generates.

      See "Using the ExternalTable Command-Line Tool".

    • Supports new data types introduced in Hive versions 0.12.0 and 0.13.0, and included in CDH5: char, date, decimal(p,s), and varchar.

      See "Data Type Mappings".

  • Oracle Loader for Hadoop

    • Supports partitioned Hive tables, so that you can load one or more partitions into Oracle Database instead of the entire table.

      See "Hive Table Input Format".

  • Oracle XQuery for Hadoop
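For the Oracle SQL Connector for HDFS features in this release, the ExternalTable tool is run through `hadoop jar` with configuration properties passed as `-D name=value` pairs. The following Python sketch only assembles the argument lists for `-createTable`, `-describe`, and `-drop` against a partitioned Hive source; it does not execute anything. The jar location, the helper name `exttab_command`, and all property values are illustrative assumptions, and the property names (including `oracle.hadoop.exttab.hive.partitionFilter`) should be checked against "Using the ExternalTable Command-Line Tool" and "Creating External Tables from Hive Tables".

```python
import shlex

# Assumed install location of the connector jar; adjust for your environment.
OSCH_JAR = "/opt/oracle/orahdfs-3.0.0/jlib/orahdfs.jar"
TOOL_CLASS = "oracle.hadoop.exttab.ExternalTable"


def exttab_command(command, props, jar=OSCH_JAR):
    """Build (but do not run) an ExternalTable invocation.

    command: one of "-createTable", "-describe", "-drop".
    props: configuration properties, emitted as -D name=value pairs.
    """
    if command not in ("-createTable", "-describe", "-drop"):
        raise ValueError(f"unsupported command: {command}")
    args = ["hadoop", "jar", jar, TOOL_CLASS]
    for name, value in props.items():
        args += ["-D", f"{name}={value}"]
    args.append(command)
    return args


# Hypothetical values: a Hive table partitioned by sale_date.
props = {
    "oracle.hadoop.exttab.tableName": "SALES_EXT",
    "oracle.hadoop.exttab.sourceType": "hive",
    "oracle.hadoop.exttab.hive.databaseName": "default",
    "oracle.hadoop.exttab.hive.tableName": "sales",
    # Assumed HiveQL filter selecting one month of partitions.
    "oracle.hadoop.exttab.hive.partitionFilter":
        "sale_date >= '2014-01-01' AND sale_date < '2014-02-01'",
    "oracle.hadoop.exttab.defaultDirectory": "SALES_DIR",
    "oracle.hadoop.connection.url": "jdbc:oracle:thin:@//dbhost:1521/orcl",
    "oracle.hadoop.connection.user": "SCOTT",
}

for cmd in ("-createTable", "-describe", "-drop"):
    # shlex.join quotes the filter expression so it can be pasted into a shell.
    print(shlex.join(exttab_command(cmd, props)))
```

Building the command as a list rather than one string keeps the partition filter, which contains spaces and quotes, intact as a single argument.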

Deprecated Features

The following features are deprecated in this release, and may be desupported in a future release:

Changes in Oracle Big Data Connectors Release 2 (2.5)

The following table shows the software versions installed with Oracle Big Data Connectors 2.5:

Connector                                              Version
Oracle SQL Connector for HDFS                          2.3.0
Oracle Loader for Hadoop                               2.3.1
Oracle Data Integrator Application Adapter for Hadoop  11.1.1.7.0
Oracle XQuery for Hadoop                               2.4.1
Oracle R Advanced Analytics for Hadoop                 2.3.1

Changes in Oracle Big Data Connectors Release 2 (2.4)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.4).

The following table shows the software versions installed with Oracle Big Data Connectors 2.4:

Connector                                              Version
Oracle SQL Connector for HDFS                          2.3.0
Oracle Loader for Hadoop                               2.3.1
Oracle Data Integrator Application Adapter for Hadoop  11.1.1.7.0
Oracle XQuery for Hadoop                               2.4.0
Oracle R Advanced Analytics for Hadoop                 2.3.1

New Features

The following features are new in this release:

  • Oracle XQuery for Hadoop

    • JSON File Adapter: The JSON file adapter replaces the JSON module. The adapter provides additional support for processing JSON files in parallel. A new built-in function named json:collection-jsonxml reads JSON files stored in HDFS.

      See "JSON File Adapter".

    • Text File Adapter: A new text:collection function lets you specify a custom delimiter in its $delimiter parameter.

      See "Text File Adapter".
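To illustrate what the custom `$delimiter` of text:collection changes, the Python sketch below models how one file's contents are cut into the strings passed to the query: records are split on the given delimiter instead of on newlines. This is a local model of the behavior only, not the adapter's implementation; the function name `split_records` and the rule that a trailing delimiter produces no empty final record are assumptions.

```python
def split_records(data, delimiter="\n"):
    """Split one file's contents into records on `delimiter`.

    Assumption of this sketch: a trailing delimiter does not produce
    an empty final record.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    records = data.split(delimiter)
    if records and records[-1] == "":
        records.pop()
    return records


# Hypothetical log data that uses "|" between records.
log = "id=1;status=ok|id=2;status=fail|id=3;status=ok|"
for record in split_records(log, delimiter="|"):
    print(record)
```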

Changes in Oracle Big Data Connectors Release 2 (2.3)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.3).

The name Oracle R Connector for Hadoop changed to Oracle R Advanced Analytics for Hadoop.

New Features

The following features are new in this release:

  • Oracle XQuery for Hadoop

    Oracle XQuery for Hadoop is a transformation engine for semi-structured data stored in Apache Hadoop. It runs transformations expressed in the XQuery language by translating them into a series of MapReduce jobs, which are executed in parallel on the Hadoop cluster.

    See Part III, "Oracle XQuery for Hadoop".

  • Oracle SQL Connector for HDFS

  • Oracle Loader for Hadoop

  • Oracle R Advanced Analytics for Hadoop

    For data stored in HDFS:

    • All delimiters are supported.

    • Key and value delimiters can be different.

    New and revised functions:

    • hadoop.jobs provides a list of active jobs with their attributes.

    • hadoop.runJobs returns the total number of job attempts in the current R session.

    • hdfs.to.RData and hdfs.from.RData enable you to convert data between delimited text files and the RData binary format. The read performance of RData-format files is close to what Java programs can achieve on Hadoop.

    • orch.debug supports local debugging of R code that creates mappers and reducers.

    • orch.getXlevels and hdfs.attach support creating factor variables from nonnumeric columns in the input data.

    • orch.glm provides functional capabilities similar to the R glm function.

    • orch.lm supports categorical predictors and the anova, vcov, and other methods.

    • orch.pack has improved performance for binary and compressed object exchanges between mappers and reducers.

    See the online Help for the syntax of these functions.
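The delimiter items above mean that a record's key and its value fields no longer have to share a separator. The Python sketch below shows the idea for one line of delimited text. It is a hypothetical illustration, not Oracle R Advanced Analytics for Hadoop code; the helper name `parse_record` and the defaults (a tab after the key, commas between value fields) are assumptions.

```python
def parse_record(line, key_sep="\t", val_sep=","):
    """Split one line into (key, [value fields]).

    The key ends at the first key_sep; the remainder is split on val_sep.
    A line that contains no key_sep is treated as having no key.
    """
    if key_sep in line:
        key, rest = line.split(key_sep, 1)
    else:
        key, rest = None, line
    return key, rest.split(val_sep)


print(parse_record("CA\t12,7.5,open"))
print(parse_record("NY|3;4", key_sep="|", val_sep=";"))
```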

Other Changes

The following are additional changes in the release:

Changes in Oracle Big Data Connectors Release 2 (2.2)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.2).

New Features

The following features are new in this release:

  • Oracle SQL Connector for Hadoop Distributed File System

    Supports the Apache Hive decimal data type.

  • Oracle Loader for Hadoop

    Supports Hive 0.10.0.

Deprecated Features

The following features are deprecated in this release, and may be desupported in a future release:

  • Oracle Loader for Hadoop

    • oracle.hadoop.loader.libjars

    • oracle.hadoop.loader.sharedlibs

Other Changes

The following are additional changes in the release:

  • Oracle Loader for Hadoop

    The file names of the two kits in the installation zip archive have changed to the following format:

    • oraloader-version-h1.x86_64.zip for Apache Hadoop 0.20.2 and CDH3

    • oraloader-version-h2.x86_64.zip for CDH4

Changes in Oracle Big Data Connectors Release 2 (2.0)

The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.0).

New Features

Oracle Big Data Connectors support Cloudera's Distribution including Apache Hadoop version 4 (CDH4). For other supported platforms, see the individual connectors in Chapter 1.

The name of Oracle Direct Connector for Hadoop Distributed File System changed to Oracle SQL Connector for Hadoop Distributed File System.

  • Oracle SQL Connector for Hadoop Distributed File System

    • Automatic creation of Oracle Database external tables from Hive tables, Data Pump files, or delimited text files.

    • Management of location files.

    See Chapter 2.

  • Oracle Loader for Hadoop

    • Support for Sockets Direct Protocol (SDP) for direct path loads

    • Support for secondary sort on user-specified columns

    • New input formats for regular expressions and Oracle NoSQL Database. The Avro record InputFormat is now supported code rather than sample code.

    • Simplified date format specification

    • New reject limit threshold

    • Improved job reporting and diagnostics

    See Chapter 3.

  • Oracle R Advanced Analytics for Hadoop

    Several analytic algorithms are now available: linear regression, neural networks for prediction, matrix completion using low rank matrix factorization, clustering, and nonnegative matrix factorization.

    Oracle R Advanced Analytics for Hadoop supports Hive data sources in addition to HDFS files.

    Oracle R Advanced Analytics for Hadoop can move data between HDFS and Oracle Database. Oracle R Enterprise is not required for this basic transfer of data.

    The following functions are new in this release:

    as.ore.*
    hadoop.jobs
    hdfs.head
    hdfs.tail
    is.ore.*
    orch.connected
    orch.dbg.lasterr
    orch.evaluate
    orch.export.fit
    orch.lm
    orch.lmf
    orch.neural
    orch.nmf
    orch.nmf.NMFalgo
    orch.temp.path
    ore.*
    predict.orch.lm
    print.summary.orch.lm
    summary.orch.lm

    See Chapter 8.
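Several of the Oracle Loader for Hadoop items above (secondary sort, the reject limit, and the simplified date format) are set through job configuration properties. The Python sketch below produces a Hadoop-style configuration document for them. The property names are recalled from the Oracle Loader for Hadoop configuration reference and should be verified against Chapter 3; the helper name `hadoop_conf_xml`, the table, the column names, and the values are hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_conf_xml(props):
    """Return a Hadoop <configuration> document for the given properties."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


olh_props = {
    # Hypothetical target table in Oracle Database.
    "oracle.hadoop.loader.loaderMap.targetTable": "SCOTT.SALES",
    # Secondary sort: order records within each reducer group by these columns.
    "oracle.hadoop.loader.sortKey": "SALE_DATE,STORE_ID",
    # Assumed limit on rejected records before the job stops.
    "oracle.hadoop.loader.rejectLimit": 1000,
    # Default date format as a Java SimpleDateFormat pattern.
    "oracle.hadoop.loader.defaultDateFormat": "yyyy-MM-dd HH:mm:ss",
}

print(hadoop_conf_xml(olh_props))
```

The output can be saved to a file and passed to the job with the standard Hadoop `-conf` option.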

Deprecated Features

The following features are deprecated in this release, and may be desupported in a future release:

  • Oracle SQL Connector for Hadoop Distributed File System

    • Location file format (version 1): Existing external tables with content published using Oracle Direct Connector for HDFS version 1 must be republished using Oracle SQL Connector for HDFS version 2, because of incompatible changes to the location file format.

      When Oracle SQL Connector for HDFS creates new location files, it does not delete the old location files.

      See Chapter 2.

    • oracle.hadoop.hdfs.exttab namespace (version 1): Oracle SQL Connector for HDFS uses the following new namespaces for all configuration properties:

      • oracle.hadoop.connection: Oracle Database connection and wallet properties

      • oracle.hadoop.exttab: All other properties

      See Chapter 2.

    • HDFS_BIN_PATH directory: The preprocessor directory name is now OSCH_BIN_PATH.

      See "Oracle SQL Connector for Hadoop Distributed File System Setup."

  • Oracle R Advanced Analytics for Hadoop

    • keyval: Use orch.keyval to generate key-value pairs.

    • orch.reconnect: Use orch.connect to reconnect using a connection object returned by orch.dbcon.
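Moving configuration files off the deprecated oracle.hadoop.hdfs.exttab namespace is mostly a renaming exercise. The Python sketch below is a first pass under two loudly stated assumptions: that version 1 connection settings lived under oracle.hadoop.hdfs.exttab.connection., and that the part of each name after the prefix is unchanged in version 2. Any property that was renamed outright still has to be mapped by hand using Chapter 2. The helper name `migrate_property_name` is hypothetical.

```python
OLD_PREFIX = "oracle.hadoop.hdfs.exttab."
# Assumption: version 1 connection settings used this sub-prefix.
OLD_CONNECTION_PREFIX = OLD_PREFIX + "connection."


def migrate_property_name(name):
    """Map a version 1 property name to the version 2 namespaces.

    Connection settings go to oracle.hadoop.connection; every other
    oracle.hadoop.hdfs.exttab property goes to oracle.hadoop.exttab.
    Names outside the old namespace are returned unchanged.
    """
    if name.startswith(OLD_CONNECTION_PREFIX):
        return "oracle.hadoop.connection." + name[len(OLD_CONNECTION_PREFIX):]
    if name.startswith(OLD_PREFIX):
        return "oracle.hadoop.exttab." + name[len(OLD_PREFIX):]
    return name


for old in ("oracle.hadoop.hdfs.exttab.connection.url",
            "oracle.hadoop.hdfs.exttab.tableName",
            "mapreduce.job.name"):
    print(old, "->", migrate_property_name(old))
```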

Desupported Features

The following features are no longer supported by Oracle:

  • Oracle Loader for Hadoop

    • oracle.hadoop.loader.configuredCounters

      See Chapter 3.

Other Changes

The following are additional changes in the release:

  • Oracle Loader for Hadoop

    The installation zip archive now contains two kits:

    • oraloader-2.0.0-1.x86_64.zip for Apache Hadoop 0.20.2 and CDH3

    • oraloader-2.0.0-2.x86_64.zip for CDH4

    See "Oracle Loader for Hadoop Setup."