Contents
Title and Copyright Information
Preface
Audience
Documentation Accessibility
Related Documents
Text Conventions
Syntax Conventions
Changes in Oracle Big Data Connectors Release 4 (4.8)
Change History for Previous Releases
Changes in Oracle Big Data Connectors Release 4 (4.7)
Changes in Oracle Big Data Connectors Release 4 (4.6)
Changes in Oracle Big Data Connectors Release 4 (4.5)
Changes in Oracle Big Data Connectors Release 4 (4.4)
Changes in Oracle Big Data Connectors Release 4 (4.3)
Changes in Oracle Big Data Connectors Release 4 (4.2)
Changes in Oracle Big Data Connectors Release 4 (4.1)
Changes in Oracle Big Data Connectors Release 4 (4.0)
Part I Setup
1
Getting Started with Oracle Big Data Connectors
1.1
About Oracle Big Data Connectors
1.2
Big Data Concepts and Technologies
1.2.1
What is MapReduce?
1.2.2
What is Apache Hadoop?
1.3
Downloading the Oracle Big Data Connectors Software
1.4
Oracle SQL Connector for Hadoop Distributed File System Setup
1.4.1
Software Requirements
1.4.2
Installing and Configuring a Hadoop Client on the Oracle Database System
1.4.3
Installing Oracle SQL Connector for HDFS
1.4.4
Granting User Privileges in Oracle Database
1.4.5
Setting Up User Accounts on the Oracle Database System
1.4.6
Using Oracle SQL Connector for HDFS on a Secure Hadoop Cluster
1.5
Oracle Loader for Hadoop Setup
1.5.1
Software Requirements
1.5.2
Installing Oracle Loader for Hadoop
1.5.3
Providing Support for Offline Database Mode
1.5.4
Using Oracle Loader for Hadoop on a Secure Hadoop Cluster
1.6
Oracle Shell for Hadoop Loaders Setup
1.7
Oracle XQuery for Hadoop Setup
1.7.1
Software Requirements
1.7.2
Installing Oracle XQuery for Hadoop
1.7.3
Troubleshooting the File Paths
1.7.4
Configuring Oozie for the Oracle XQuery for Hadoop Action
1.8
Oracle R Advanced Analytics for Hadoop Setup
1.8.1
Installing the Software on Hadoop
1.8.1.1
Software Requirements for a Third-Party Hadoop Cluster
1.8.1.2
Installing Sqoop on a Third-Party Hadoop Cluster
1.8.1.3
Installing Hive on a Third-Party Hadoop Cluster
1.8.1.4
Installing R on a Hadoop Client
1.8.1.5
Installing R on a Third-Party Hadoop Cluster
1.8.1.6
Installing the ORCH Package on a Third-Party Hadoop Cluster
1.8.2
Installing Additional R Packages
1.8.3
Providing Remote Client Access to R Users
1.8.3.1
Software Requirements for Remote Client Access
1.8.3.2
Configuring the Server as a Hadoop Client
1.8.3.3
Installing Sqoop on a Hadoop Client
1.8.3.4
Installing R on a Hadoop Client
1.8.3.5
Installing the ORCH Package on a Hadoop Client
1.8.3.6
Installing the Oracle R Enterprise Client Packages (Optional)
1.9
Oracle Data Integrator
1.10
Oracle Datasource for Apache Hadoop Setup
Part II Oracle Database Connectors
2
Oracle SQL Connector for Hadoop Distributed File System
2.1
About Oracle SQL Connector for HDFS
2.2
Getting Started With Oracle SQL Connector for HDFS
2.3
Configuring Your System for Oracle SQL Connector for HDFS
2.4
Using Oracle SQL Connector for HDFS with Oracle Big Data Appliance and Oracle Exadata
2.5
Using the ExternalTable Command-Line Tool
2.5.1
About ExternalTable
2.5.2
ExternalTable Command-Line Tool Syntax
2.6
Creating External Tables
2.6.1
Creating External Tables with the ExternalTable Tool
2.6.2
Creating External Tables from Data Pump Format Files
2.6.2.1
Required Properties
2.6.2.2
Optional Properties
2.6.2.3
Defining Properties in XML Files for Data Pump Format Files
2.6.2.4
Example
2.6.3
Creating External Tables from Hive Tables
2.6.3.1
Hive Table Requirements
2.6.3.2
Data Type Mappings
2.6.3.3
Required Properties
2.6.3.4
Optional Properties
2.6.3.5
Defining Properties in XML Files for Hive Tables
2.6.3.6
Example
2.6.3.7
Creating External Tables from Partitioned Hive Tables
2.6.3.7.1
Database Objects that Support Access to Partitioned Hive Tables
2.6.3.7.2
Querying the Metadata Table
2.6.3.7.3
Creating UNION ALL Views for Querying
2.6.3.7.4
Error Messages
2.6.3.7.5
Dropping Dangling Objects
2.6.4
Creating External Tables from Delimited Text Files
2.6.4.1
Data Type Mappings
2.6.4.2
Required Properties
2.6.4.3
Optional Properties
2.6.4.4
Defining Properties in XML Files for Delimited Text Files
2.6.4.5
Example
2.6.5
Creating External Tables in SQL
2.7
Publishing the HDFS Data Paths
2.7.1
ExternalTable Syntax for Publish
2.7.2
ExternalTable Example for Publish
2.8
Exploring External Tables and Location Files
2.8.1
ExternalTable Syntax for Describe
2.8.2
ExternalTable Example for Describe
2.9
Dropping Database Objects Created by Oracle SQL Connector for HDFS
2.9.1
ExternalTable Syntax for Drop
2.9.2
ExternalTable Example for Drop
2.10
More About External Tables Generated by the ExternalTable Tool
2.10.1
About Configurable Column Mappings
2.10.1.1
Default Column Mappings
2.10.1.2
All Column Overrides
2.10.1.3
One Column Overrides
2.10.1.4
Mapping Override Examples
2.10.2
What Are Location Files?
2.10.3
Enabling Parallel Processing
2.10.3.1
Setting Up the Degree of Parallelism
2.10.4
Location File Management
2.10.5
Location File Names
2.11
Configuring Oracle SQL Connector for HDFS
2.11.1
Creating a Configuration File
2.11.2
Oracle SQL Connector for HDFS Configuration Property Reference
2.12
Performance Tips for Querying Data in HDFS
3
Oracle Loader for Hadoop
3.1
What Is Oracle Loader for Hadoop?
3.2
About the Modes of Operation
3.2.1
Online Database Mode
3.2.2
Offline Database Mode
3.3
Getting Started With Oracle Loader for Hadoop
3.4
Creating the Target Table
3.4.1
Supported Data Types for Target Tables
3.4.2
Supported Partitioning Strategies for Target Tables
3.4.3
Compression
3.5
Creating a Job Configuration File
3.6
About the Target Table Metadata
3.6.1
Providing the Connection Details for Online Database Mode
3.6.2
Generating the Target Table Metadata for Offline Database Mode
3.6.2.1
OraLoaderMetadata Utility
3.7
About Input Formats
3.7.1
Delimited Text Input Format
3.7.1.1
About DelimitedTextInputFormat
3.7.1.2
Required Configuration Properties
3.7.1.3
Optional Configuration Properties
3.7.2
Complex Text Input Formats
3.7.2.1
About RegexInputFormat
3.7.2.2
Required Configuration Properties
3.7.2.3
Optional Configuration Properties
3.7.3
Hive Table Input Format
3.7.3.1
About HiveToAvroInputFormat
3.7.3.2
Required Configuration Properties
3.7.3.3
Optional Configuration Properties
3.7.4
Avro Input Format
3.7.4.1
Configuration Properties
3.7.5
Oracle NoSQL Database Input Format
3.7.5.1
About KVAvroInputFormat
3.7.5.2
Required Configuration Properties
3.7.6
Custom Input Formats
3.7.6.1
About Implementing a Custom Input Format
3.7.6.2
About Error Handling
3.7.6.3
Supporting Data Sampling
3.7.6.4
InputFormat Source Code Example
3.8
Mapping Input Fields to Target Table Columns
3.8.1
Automatic Mapping
3.8.2
Manual Mapping
3.8.3
Converting a Loader Map File
3.9
About Output Formats
3.9.1
JDBC Output Format
3.9.1.1
About JDBCOutputFormat
3.9.1.2
Configuration Properties
3.9.2
Oracle OCI Direct Path Output Format
3.9.2.1
About OCIOutputFormat
3.9.2.2
Configuration Properties
3.9.3
Delimited Text Output Format
3.9.3.1
About DelimitedTextOutputFormat
3.9.3.2
Configuration Properties
3.9.4
Oracle Data Pump Output Format
3.9.4.1
About DataPumpOutputFormat
3.10
Running a Loader Job
3.10.1
Specifying Hive Input Format JAR Files
3.10.2
Specifying Oracle NoSQL Database Input Format JAR Files
3.10.3
Job Reporting
3.11
Handling Rejected Records
3.11.1
Logging Rejected Records in Bad Files
3.11.2
Setting a Job Reject Limit
3.12
Balancing Loads When Loading Data into Partitioned Tables
3.12.1
Using the Sampling Feature
3.12.2
Tuning Load Balancing
3.12.3
Tuning Sampling Behavior
3.12.4
When Does Oracle Loader for Hadoop Use the Sampler's Partitioning Scheme?
3.12.5
Resolving Memory Issues
3.12.6
What Happens When a Sampling Feature Property Has an Invalid Value?
3.13
Optimizing Communications Between Oracle Engineered Systems
3.14
Oracle Loader for Hadoop Configuration Property Reference
3.15
Third-Party Licenses for Bundled Software
3.15.1
Apache Licensed Code
3.15.2
Apache License
3.15.2.1
Apache Avro 1.8.1
3.15.2.2
Apache Commons Mathematics Library 2.2
4
Ease of Use Tools for Oracle Big Data Connectors
4.1
Introducing Oracle Shell for Hadoop Loaders
4.1.1
Third-Party Licenses for Bundled Software
4.1.1.1
Apache Commons Exec 1.3
4.1.1.2
Apache License
4.1.1.3
ANTLR 4.5.3
Part III Oracle XQuery for Hadoop
5
Using Oracle XQuery for Hadoop
5.1
What Is Oracle XQuery for Hadoop?
5.2
Getting Started With Oracle XQuery for Hadoop
5.2.1
Basic Steps
5.2.2
Example: Hello World!
5.3
About the Oracle XQuery for Hadoop Functions
5.3.1
About the Adapters
5.3.2
About Other Modules for Use With Oracle XQuery for Hadoop
5.4
Creating an XQuery Transformation
5.4.1
XQuery Transformation Requirements
5.4.2
About XQuery Language Support
5.4.3
Accessing Data in the Hadoop Distributed Cache
5.4.4
Calling Custom Java Functions from XQuery
5.4.5
Accessing User-Defined XQuery Library Modules and XML Schemas
5.4.6
XQuery Transformation Examples
5.5
Running Queries
5.5.1
Oracle XQuery for Hadoop Options
5.5.2
Generic Options
5.5.3
About Running Queries Locally
5.6
Running Queries from Apache Oozie
5.6.1
Getting Started Using the Oracle XQuery for Hadoop Oozie Action
5.6.2
Supported XML Elements
5.6.3
Example: Hello World
5.7
Oracle XQuery for Hadoop Configuration Properties
5.8
Third-Party Licenses for Bundled Software
5.8.1
Apache Licensed Code
5.8.2
Apache License
5.8.3
ANTLR 3.2
5.8.4
Apache Ant 1.7.1
5.8.5
Apache Xerces 2.11
5.8.6
Woodstox XML Parser 4.2.0
6
Oracle XQuery for Hadoop Reference
6.1
Avro File Adapter
6.1.1
Built-in Functions for Reading Avro Files
6.1.1.1
avro:collection-avroxml
6.1.1.2
avro:get
6.1.2
Custom Functions for Reading Avro Container Files
6.1.3
Custom Functions for Writing Avro Files
6.1.4
Examples of Avro File Adapter Functions
6.1.5
About Converting Values Between Avro and XML
6.1.5.1
Reading Avro as XML
6.1.5.1.1
Reading Records
6.1.5.1.2
Reading Maps
6.1.5.1.3
Reading Arrays
6.1.5.1.4
Reading Unions
6.1.5.1.5
Reading Primitives
6.1.5.2
Writing XML as Avro
6.1.5.2.1
Writing Records
6.1.5.2.2
Writing Maps
6.1.5.2.3
Writing Arrays
6.1.5.2.4
Writing Unions
6.1.5.2.5
Writing Primitives
6.2
JSON File Adapter
6.2.1
Built-in Functions for Reading JSON
6.2.1.1
json:collection-jsonxml
6.2.1.2
json:parse-as-xml
6.2.1.3
json:get
6.2.2
Custom Functions for Reading JSON Files
6.2.3
Examples of JSON Functions
6.2.4
JSON File Adapter Configuration Properties
6.2.5
About Converting JSON Data Formats to XML
6.2.5.1
About Converting JSON Objects to XML
6.2.5.2
About Converting JSON Arrays to XML
6.2.5.3
About Converting Other JSON Types
6.3
Oracle Database Adapter
6.3.1
Custom Functions for Writing to Oracle Database
6.3.2
Examples of Oracle Database Adapter Functions
6.3.3
Oracle Loader for Hadoop Configuration Properties and Corresponding %oracle-property Annotations
6.4
Oracle NoSQL Database Adapter
6.4.1
Prerequisites for Using the Oracle NoSQL Database Adapter
6.4.2
Built-in Functions for Reading from and Writing to Oracle NoSQL Database
6.4.2.1
kv:collection-text
6.4.2.2
kv:collection-avroxml
6.4.2.3
kv:collection-xml
6.4.2.4
kv:collection-binxml
6.4.2.5
kv:collection-tika
6.4.2.6
kv:put-text
6.4.2.7
kv:put-xml
6.4.2.8
kv:put-binxml
6.4.2.9
kv:get-text
6.4.2.10
kv:get-avroxml
6.4.2.11
kv:get-xml
6.4.2.12
kv:get-binxml
6.4.2.13
kv:get-tika
6.4.2.14
kv:key-range
6.4.2.15
kv:key-range
6.4.3
Built-in Functions for Reading from and Writing to Oracle NoSQL Database using Table API
6.4.3.1
kv-table:collection-jsontext
6.4.3.2
kv-table:get-jsontext
6.4.3.3
kv-table:put-jsontext
6.4.4
Built-in Functions for Reading from and Writing to Oracle NoSQL Database using Large Object API
6.4.4.1
kv-lob:get-text
6.4.4.2
kv-lob:get-xml
6.4.4.3
kv-lob:get-binxml
6.4.4.4
kv-lob:get-tika
6.4.4.5
kv-lob:put-text
6.4.4.6
kv-lob:put-xml
6.4.4.7
kv-lob:put-binxml
6.4.5
Custom Functions for Reading Values from Oracle NoSQL Database
6.4.6
Custom Functions for Retrieving Single Values from Oracle NoSQL Database
6.4.7
Custom Functions for Reading Values from Oracle NoSQL Database using Table API
6.4.8
Custom Functions for Reading Single Row from Oracle NoSQL Database using Table API
6.4.9
Custom Functions for Retrieving Single Values from Oracle NoSQL Database using Large Object API
6.4.10
Custom Functions for Writing to Oracle NoSQL Database
6.4.11
Custom Functions for Writing Values to Oracle NoSQL Database using Table API
6.4.12
Custom Functions for Writing Values to Oracle NoSQL Database using Large Object API
6.4.13
Examples of Oracle NoSQL Database Adapter Functions
6.4.14
Oracle NoSQL Database Adapter Configuration Properties
6.5
Sequence File Adapter
6.5.1
Built-in Functions for Reading and Writing Sequence Files
6.5.1.1
seq:collection
6.5.1.2
seq:collection-xml
6.5.1.3
seq:collection-binxml
6.5.1.4
seq:collection-tika
6.5.1.5
seq:put
6.5.1.6
seq:put-xml
6.5.1.7
seq:put-binxml
6.5.2
Custom Functions for Reading Sequence Files
6.5.3
Custom Functions for Writing Sequence Files
6.5.4
Examples of Sequence File Adapter Functions
6.6
Solr Adapter
6.6.1
Prerequisites for Using the Solr Adapter
6.6.1.1
Configuration Settings
6.6.1.2
Example Query Using the Solr Adapter
6.6.2
Built-in Functions for Loading Data into Solr Servers
6.6.2.1
solr:put
6.6.3
Custom Functions for Loading Data into Solr Servers
6.6.4
Examples of Solr Adapter Functions
6.6.5
Solr Adapter Configuration Properties
6.7
Text File Adapter
6.7.1
Built-in Functions for Reading and Writing Text Files
6.7.1.1
text:collection
6.7.1.2
text:collection-xml
6.7.1.3
text:put
6.7.1.4
text:put-xml
6.7.1.5
text:trace
6.7.2
Custom Functions for Reading Text Files
6.7.3
Custom Functions for Writing Text Files
6.7.4
Examples of Text File Adapter Functions
6.8
Tika File Adapter
6.8.1
Built-in Library Functions for Parsing Files with Tika
6.8.1.1
tika:collection
6.8.1.2
tika:parse
6.8.2
Custom Functions for Parsing Files with Tika
6.8.3
Tika Parser Output Format
6.8.4
Tika Adapter Configuration Properties
6.8.5
Examples of Tika File Adapter Functions
6.9
XML File Adapter
6.9.1
Built-in Functions for Reading XML Files
6.9.1.1
xmlf:collection (Single Task)
6.9.1.2
xmlf:collection-multipart (Single Task)
6.9.1.3
xmlf:collection (Multiple Tasks)
6.9.2
Custom Functions for Reading XML Files
6.9.3
Examples of XML File Adapter Functions
6.10
Utility Module
6.10.1
Oracle XQuery Functions for Duration, Date, and Time
6.10.1.1
ora-fn:date-from-string-with-format
6.10.1.2
ora-fn:date-to-string-with-format
6.10.1.3
ora-fn:dateTime-from-string-with-format
6.10.1.4
ora-fn:dateTime-to-string-with-format
6.10.1.5
ora-fn:time-from-string-with-format
6.10.1.6
ora-fn:time-to-string-with-format
6.10.1.7
Format Argument
6.10.1.8
Locale Argument
6.10.2
Oracle XQuery Functions for Strings
6.10.2.1
ora-fn:pad-left
6.10.2.2
ora-fn:pad-right
6.10.2.3
ora-fn:trim
6.10.2.4
ora-fn:trim-left
6.10.2.5
ora-fn:trim-right
6.11
Hadoop Module
6.11.1
Built-in Functions for Using Hadoop
6.11.1.1
oxh:find
6.11.1.2
oxh:increment-counter
6.11.1.3
oxh:println
6.11.1.4
oxh:println-xml
6.11.1.5
oxh:property
6.12
Serialization Annotations
7
Oracle XML Extensions for Hive
7.1
What are the XML Extensions for Hive?
7.2
Using the Hive Extensions
7.3
About the Hive Functions
7.4
Creating XML Tables
7.4.1
Hive CREATE TABLE Syntax for XML Tables
7.4.2
CREATE TABLE Configuration Properties
7.4.3
CREATE TABLE Examples
7.4.3.1
Syntax Example
7.4.3.2
Simple Examples
7.4.3.3
OpenStreetMap Examples
7.1
Oracle XML Functions for Hive Reference
7.1.1
Data Type Conversions
7.1.2
Hive Access to External Files
7.2
Online Documentation of Functions
7.3
xml_exists
7.4
xml_query
7.5
xml_query_as_primitive
7.6
xml_table
Part IV Oracle R Advanced Analytics for Hadoop
8
Using Oracle R Advanced Analytics for Hadoop
8.1
About Oracle R Advanced Analytics for Hadoop
8.1.1
Oracle R Advanced Analytics for Hadoop Architecture
8.1.2
Oracle R Advanced Analytics for Hadoop packages and functions
8.1.3
Oracle R Advanced Analytics for Hadoop APIs
8.1.4
Inputs to Oracle R Advanced Analytics for Hadoop
8.2
Access to HDFS Files
8.3
Access to Apache Hive
8.3.1
ORCH Functions for Hive
8.3.2
ORE Functions for Hive
8.3.3
Generic R Functions Supported in Hive
8.3.4
Support for Hive Data Types
8.3.5
Usage Notes for Hive Access
8.3.6
Example: Loading Hive Tables into Oracle R Advanced Analytics for Hadoop
8.4
Access to Oracle Database
8.4.1
Usage Notes for Oracle Database Access
8.4.2
Scenario for Using Oracle R Advanced Analytics for Hadoop with Oracle R Enterprise
8.5
Oracle R Advanced Analytics for Hadoop Functions
8.5.1
Native Analytical Functions
8.5.2
Using the Hadoop Distributed File System (HDFS)
8.5.3
Using Apache Hive
8.5.4
Using Aggregate Functions in Hive
8.5.5
Making Database Connections
8.5.6
Copying Data and Working with HDFS Files
8.5.7
Converting to R Data Types
8.5.8
Using MapReduce
8.5.9
Debugging Scripts
8.6
Demos of Oracle R Advanced Analytics for Hadoop Functions
8.7
Security Notes for Oracle R Advanced Analytics for Hadoop
Part V Oracle DataSource for Apache Hadoop
9
Oracle DataSource for Apache Hadoop (OD4H)
9.1
Operational Data, Big Data and Requirements
9.2
Overview of Oracle DataSource for Apache Hadoop (OD4H)
9.2.1
Opportunity with Hadoop 2.x
9.2.2
Oracle Tables as Hadoop Data Source
9.2.3
External Tables
9.2.3.1
TBLPROPERTIES
9.2.3.2
SERDE PROPERTIES
9.2.4
List of jars in the OD4H package
9.3
How does OD4H work?
9.3.1
Create a new Oracle Database Table or Reuse an Existing Table
9.3.2
Hive DDL
9.3.3
Creating External Tables in Hive
9.4
Features of OD4H
9.4.1
Performance And Scalability Features
9.4.1.1
Splitters
9.4.1.2
Choosing a Splitter
9.4.1.3
Predicate Pushdown
9.4.1.4
Projection Pushdown
9.4.1.5
Partition Pruning
9.4.2
Smart Connection Management
9.4.3
Security Features
9.4.3.1
Improved Authentication
9.5
Using HiveQL with OD4H
9.6
Using Spark SQL with OD4H
9.7
Writing Back to Oracle Database
A
Additional Big Data Connector Resources
Index