The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.4).
The following table shows the software versions installed with Oracle Big Data Connectors 2.4:
| Connector | Version |
|---|---|
| Oracle SQL Connector for HDFS | 2.3.0 |
| Oracle Loader for Hadoop | 2.3.1 |
| Oracle Data Integrator Application Adapter for Hadoop | 11.1.1.7.0 |
| Oracle XQuery for Hadoop | 2.4.0 |
| Oracle R Advanced Analytics for Hadoop | 2.3.1 |
The following features are new in this release:
Oracle XQuery for Hadoop
JSON File Adapter: The JSON file adapter replaces the JSON module. The adapter provides additional support for processing JSON files in parallel. A new built-in function named json:collection-jsonxml reads JSON files stored in HDFS.
Text File Adapter: A new text:collection function enables you to specify a custom delimiter as the $delimiter parameter.
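These adapter functions are called from within an Oracle XQuery for Hadoop query. The following is a minimal sketch only; the module import syntax is taken from this product's conventions, and the HDFS path pattern and delimiter value are illustrative assumptions, not tested code:

```xquery
(: Sketch only: the HDFS path pattern and ";" delimiter are placeholders. :)
import module "oxh:text";

(: Read text records split on ";" instead of the default newline,
   and write each record to the output. :)
for $rec in text:collection("logs/*.txt", ";")
return text:put($rec)
```

A query for JSON input follows the same shape, substituting json:collection-jsonxml for text:collection after importing the oxh:json module.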
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.3).
The name Oracle R Connector for Hadoop changed to Oracle R Advanced Analytics for Hadoop.
The following features are new in this release:
Oracle XQuery for Hadoop
Oracle XQuery for Hadoop is a transformation engine for semi-structured data stored in Apache Hadoop. It runs transformations expressed in the XQuery language by translating them into a series of MapReduce jobs, which are executed in parallel on the Hadoop cluster.
Oracle SQL Connector for HDFS
Oracle Database 12c is supported.
Oracle SQL Connector for HDFS enables users to map Hive and text source columns to inline Oracle CLOB types.
See "oracle.hadoop.exttab.colMap.columnLength" and "oracle.hadoop.exttab.colMap.columnType".
By setting various configuration properties, you can override the default data type mappings used to create columns in an Oracle external table with the appropriate data types for the Hive and text sources.
Specifying the --output option with --noexecute in the hadoop command directs the report to a file instead of the screen.
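As an illustration of the type-mapping override, a configuration file along these lines sets the two properties named above (the values shown are placeholders, not recommendations):

```xml
<!-- Illustrative config fragment only: values are placeholders. -->
<configuration>
  <!-- Create the external table column as an inline CLOB ... -->
  <property>
    <name>oracle.hadoop.exttab.colMap.columnType</name>
    <value>CLOB</value>
  </property>
  <!-- ... with an explicit column length. -->
  <property>
    <name>oracle.hadoop.exttab.colMap.columnLength</name>
    <value>4000</value>
  </property>
</configuration>
```

Running the hadoop command with --noexecute and --output then lets you review the resulting table definition in a report file before anything is executed.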
Oracle Loader for Hadoop
Oracle Database 12c is supported.
When automatic mapping is not possible, configuration properties identify which input fields are loaded into specific columns of the target table.
A utility is provided to convert loader map files into configuration files.
Oracle R Advanced Analytics for Hadoop
For data stored in HDFS:
All delimiters are supported.
Key and value delimiters can be different.
New and revised functions:
hadoop.jobs provides a list of active jobs with their attributes.
hadoop.runJobs returns the total number of job attempts in the current R session.
hdfs.to.RData and hdfs.from.RData enable you to convert data between delimited text files and the RData binary format. The read performance of RData-format files is close to what Java programs can achieve on Hadoop.
orch.debug supports local debugging of R code that creates mappers and reducers.
orch.getXlevels and hdfs.attach support creating factor variables from nonnumeric columns in the input data.
orch.glm provides functional capabilities similar to the R glm function.
orch.lm supports categorical predictors, and the anova, vcov, and other methods.
orch.pack has improved performance for binary and compressed object exchanges between mappers and reducers.
See the online Help for the syntax of these functions.
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.2).
The following features are new in this release:
Oracle SQL Connector for Hadoop Distributed File System
Supports the Apache Hive decimal data type.
Oracle Loader for Hadoop
Supports Hive 0.10.0.
The following are changes in Oracle Big Data Connectors User's Guide for Oracle Big Data Connectors Release 2 (2.0).
Oracle Big Data Connectors support Cloudera's Distribution including Apache Hadoop version 4 (CDH4). For other supported platforms, see the individual connectors in Chapter 1.
The name of Oracle Direct Connector for Hadoop Distributed File System changed to Oracle SQL Connector for Hadoop Distributed File System.
Oracle SQL Connector for Hadoop Distributed File System
Automatic creation of Oracle Database external tables from Hive tables, Data Pump files, or delimited text files.
Management of location files.
See Chapter 2.
Oracle Loader for Hadoop
Support for Sockets Direct Protocol (SDP) for direct path loads
Support for secondary sort on user-specified columns
New input formats for regular expressions and Oracle NoSQL Database. The Avro record InputFormat is now supported code instead of sample code.
Simplified date format specification
New reject limit threshold
Improved job reporting and diagnostics
See Chapter 3.
Oracle R Advanced Analytics for Hadoop
Several analytic algorithms are now available: linear regression, neural networks for prediction, matrix completion using low rank matrix factorization, clustering, and nonnegative matrix factorization.
Oracle R Advanced Analytics for Hadoop supports Hive data sources in addition to HDFS files.
Oracle R Advanced Analytics for Hadoop can move data between HDFS and Oracle Database. Oracle R Enterprise is not required for this basic transfer of data.
The following functions are new in this release:
as.ore.*, hadoop.jobs, hdfs.head, hdfs.tail, is.ore.*, orch.connected, orch.dbg.lasterr, orch.evaluate, orch.export.fit, orch.lm, orch.lmf, orch.neural, orch.nmf, orch.nmf.NMFalgo, orch.temp.path, ore.*, predict.orch.lm, print.summary.orch.lm, summary.orch.lm
See Chapter 8.
The following features are deprecated in this release, and may be desupported in a future release:
Oracle SQL Connector for Hadoop Distributed File System
Location file format (version 1): Existing external tables with content published using Oracle Direct Connector for HDFS version 1 must be republished using Oracle SQL Connector for HDFS version 2, because of incompatible changes to the location file format.
When Oracle SQL Connector for HDFS creates new location files, it does not delete the old location files.
See Chapter 2.
oracle.hadoop.hdfs.exttab namespace (version 1): Oracle SQL Connector for HDFS uses the following new namespaces for all configuration properties:
oracle.hadoop.connection: Oracle Database connection and wallet properties
oracle.hadoop.exttab: All other properties
See Chapter 2.
HDFS_BIN_PATH directory: The preprocessor directory name is now OSCH_BIN_PATH.
See "Oracle SQL Connector for Hadoop Distributed File System Setup."
Oracle R Advanced Analytics for Hadoop
keyval: Use orch.keyval to generate key-value pairs.
orch.reconnect: Use orch.connect to reconnect using a connection object returned by orch.dbcon.
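As a sketch of the replacement pattern, a mapper or reducer emits its output through orch.keyval rather than the deprecated keyval. The argument names, the input path, and the exact hadoop.run options below are assumptions for illustration; check the online Help for the exact interface in your release:

```r
# Sketch only: assumes an initialized ORCH session; signatures may vary by release.
res <- hadoop.run(
  data    = hdfs.attach("logs/visits.txt"),   # placeholder HDFS path
  mapper  = function(key, val) {
    orch.keyval(key, val)                     # use orch.keyval, not the deprecated keyval
  },
  reducer = function(key, vals) {
    orch.keyval(key, length(vals))            # emit one count per key
  }
)
```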
The following features are no longer supported by Oracle.
Oracle Loader for Hadoop
oracle.hadoop.loader.configuredCounters
See Chapter 3.