Contents
Title and Copyright Information
Preface
Audience
Documentation Accessibility
Related Documents
Text Conventions
Syntax Conventions
Changes in This Release for Oracle Big Data Connectors User's Guide
Changes in Oracle Big Data Connectors Release 2 (2.3)
Changes in Oracle Big Data Connectors Release 2 (2.2)
Changes in Oracle Big Data Connectors Release 2 (2.0)
Part I Setup
1 Getting Started with Oracle Big Data Connectors
1.1 About Oracle Big Data Connectors
1.2 Big Data Concepts and Technologies
1.2.1 What is MapReduce?
1.2.2 What is Apache Hadoop?
1.3 Downloading the Oracle Big Data Connectors Software
1.4 Oracle SQL Connector for Hadoop Distributed File System Setup
1.4.1 Software Requirements
1.4.2 Installing and Configuring a Hadoop Client on the Oracle Database System
1.4.3 Installing Oracle SQL Connector for HDFS
1.4.4 Providing Support for Hive Tables
1.4.5 Granting User Privileges in Oracle Database
1.4.6 Setting Up User Accounts on the Oracle Database System
1.4.7 Using Oracle SQL Connector for HDFS on a Secure Hadoop Cluster
1.5 Oracle Loader for Hadoop Setup
1.5.1 Software Requirements
1.5.2 Installing Oracle Loader for Hadoop
1.5.3 Providing Support for Offline Database Mode
1.5.4 Using Oracle Loader for Hadoop on a Secure Hadoop Cluster
1.6 Oracle Data Integrator Application Adapter for Hadoop Setup
1.6.1 System Requirements and Certifications
1.6.2 Technology-Specific Requirements
1.6.3 Location of Oracle Data Integrator Application Adapter for Hadoop
1.6.4 Setting Up the Topology
1.7 Oracle XQuery for Hadoop Setup
1.7.1 Software Requirements
1.7.2 Installing Oracle XQuery for Hadoop
1.7.3 Troubleshooting the File Paths
1.8 Oracle R Advanced Analytics for Hadoop Setup
1.8.1 Installing the Software on Hadoop
1.8.1.1 Software Requirements for a Third-Party Hadoop Cluster
1.8.1.2 Installing Sqoop on a Hadoop Cluster
1.8.1.3 Installing Hive on a Hadoop Cluster
1.8.1.4 Installing R on a Hadoop Cluster
1.8.1.5 Installing the ORCH Package on a Hadoop Cluster
1.8.2 Installing Additional R Packages
1.8.3 Providing Remote Client Access to R Users
1.8.3.1 Software Requirements for Remote Client Access
1.8.3.2 Configuring the Server as a Hadoop Client
1.8.3.3 Installing Sqoop on a Hadoop Client
1.8.3.4 Installing R on a Hadoop Client
1.8.3.5 Installing the ORCH Package on a Hadoop Client
1.8.3.6 Installing the Oracle R Enterprise Client Packages (Optional)
Part II Oracle Database Connectors
2 Oracle SQL Connector for Hadoop Distributed File System
2.1 About Oracle SQL Connector for HDFS
2.2 Getting Started With Oracle SQL Connector for HDFS
2.3 Configuring Your System for Oracle SQL Connector for HDFS
2.4 Using the ExternalTable Command-Line Tool
2.4.1 About ExternalTable
2.4.2 ExternalTable Command-Line Tool Syntax
2.5 Creating External Tables
2.5.1 Creating External Tables with the ExternalTable Tool
2.5.2 Creating External Tables from Data Pump Format Files
2.5.2.1 Required Properties
2.5.2.2 Optional Properties
2.5.2.3 Defining Properties in XML Files for Data Pump Format Files
2.5.2.4 Example
2.5.3 Creating External Tables from Hive Tables
2.5.3.1 Hive Table Requirements
2.5.3.2 Data Type Mappings
2.5.3.3 Required Properties
2.5.3.4 Optional Properties
2.5.3.5 Defining Properties in XML Files for Hive Tables
2.5.3.6 Example
2.5.4 Creating External Tables from Delimited Text Files
2.5.4.1 Data Type Mappings
2.5.4.2 Required Properties
2.5.4.3 Optional Properties
2.5.4.4 Defining Properties in XML Files for Delimited Text Files
2.5.4.5 Example
2.5.5 Creating External Tables in SQL
2.6 Publishing the HDFS Data Paths
2.7 Listing Location File Metadata and Contents
2.8 Describing External Tables
2.9 More About External Tables Generated by the ExternalTable Tool
2.9.1 About Configurable Column Mappings
2.9.1.1 Default Column Mappings
2.9.1.2 All Column Overrides
2.9.1.3 One Column Overrides
2.9.1.4 Mapping Override Examples
2.9.2 What Are Location Files?
2.9.3 Enabling Parallel Processing
2.9.4 Location File Management
2.9.5 Location File Names
2.10 Configuring Oracle SQL Connector for HDFS
2.10.1 Creating a Configuration File
2.10.2 Oracle SQL Connector for HDFS Configuration Property Reference
2.11 Performance Tips for Querying Data in HDFS
3 Oracle Loader for Hadoop
3.1 What Is Oracle Loader for Hadoop?
3.2 About the Modes of Operation
3.2.1 Online Database Mode
3.2.2 Offline Database Mode
3.3 Getting Started With Oracle Loader for Hadoop
3.4 Creating the Target Table
3.4.1 Supported Data Types for Target Tables
3.4.2 Supported Partitioning Strategies for Target Tables
3.5 Creating a Job Configuration File
3.6 About the Target Table Metadata
3.6.1 Providing the Connection Details for Online Database Mode
3.6.2 Generating the Target Table Metadata for Offline Database Mode
3.6.2.1 OraLoaderMetadata Utility
3.7 About Input Formats
3.7.1 Delimited Text Input Format
3.7.1.1 About DelimitedTextInputFormat
3.7.1.2 Required Configuration Properties
3.7.1.3 Optional Configuration Properties
3.7.2 Complex Text Input Formats
3.7.2.1 About RegexInputFormat
3.7.2.2 Required Configuration Properties
3.7.2.3 Optional Configuration Properties
3.7.3 Hive Table Input Format
3.7.3.1 About HiveToAvroInputFormat
3.7.3.2 Required Configuration Properties
3.7.4 Avro Input Format
3.7.4.1 Configuration Properties
3.7.5 Oracle NoSQL Database Input Format
3.7.5.1 About KVAvroInputFormat
3.7.5.2 Required Configuration Properties
3.7.6 Custom Input Formats
3.7.6.1 About Implementing a Custom Input Format
3.7.6.2 About Error Handling
3.7.6.3 Supporting Data Sampling
3.7.6.4 InputFormat Source Code Example
3.8 Mapping Input Fields to Target Table Columns
3.8.1 Automatic Mapping
3.8.2 Manual Mapping
3.8.3 Converting a Loader Map File
3.9 About Output Formats
3.9.1 JDBC Output Format
3.9.1.1 About JDBCOutputFormat
3.9.1.2 Configuration Properties
3.9.2 Oracle OCI Direct Path Output Format
3.9.2.1 About OCIOutputFormat
3.9.2.2 Configuration Properties
3.9.3 Delimited Text Output Format
3.9.3.1 About DelimitedTextOutputFormat
3.9.3.2 Configuration Properties
3.9.4 Oracle Data Pump Output Format
3.9.4.1 About DataPumpOutputFormat
3.10 Running a Loader Job
3.10.1 Specifying Hive Input Format JAR Files
3.10.2 Specifying Oracle NoSQL Database Input Format JAR Files
3.10.3 Job Reporting
3.11 Handling Rejected Records
3.11.1 Logging Rejected Records in Bad Files
3.11.2 Setting a Job Reject Limit
3.12 Balancing Loads When Loading Data into Partitioned Tables
3.12.1 Using the Sampling Feature
3.12.2 Tuning Load Balancing
3.12.3 Tuning Sampling Behavior
3.12.4 When Does Oracle Loader for Hadoop Use the Sampler's Partitioning Scheme?
3.12.5 Resolving Memory Issues
3.12.6 What Happens When a Sampling Feature Property Has an Invalid Value?
3.13 Optimizing Communications Between Oracle Engineered Systems
3.14 Oracle Loader for Hadoop Configuration Property Reference
3.15 Third-Party Licenses for Bundled Software
3.15.1 Apache Licensed Code
3.15.2 Apache Avro 1.7.3
3.15.3 Apache Commons Mathematics Library 2.2
3.15.4 Jackson JSON 1.8.8
4 Oracle Data Integrator Application Adapter for Hadoop
4.1 Introduction
4.1.1 Concepts
4.1.2 Knowledge Modules
4.1.3 Security
4.2 Setting Up the Topology
4.2.1 Setting Up File Data Sources
4.2.2 Setting Up Hive Data Sources
4.2.3 Setting Up the Oracle Data Integrator Agent to Execute Hadoop Jobs
4.2.4 Configuring Oracle Data Integrator Studio for Executing Hadoop Jobs on the Local Agent
4.3 Setting Up an Integration Project
4.4 Creating an Oracle Data Integrator Model from a Reverse-Engineered Hive Model
4.4.1 Creating a Model
4.4.2 Reverse Engineering Hive Tables
4.5 Designing the Interface
4.5.1 Loading Data from Files into Hive
4.5.2 Validating and Transforming Data Within Hive
4.5.2.1 IKM Hive Control Append
4.5.2.2 CKM Hive
4.5.2.3 IKM Hive Transform
4.5.3 Loading Data into an Oracle Database from Hive and HDFS
Part III Oracle XQuery for Hadoop
5 Using Oracle XQuery for Hadoop
5.1 What Is Oracle XQuery for Hadoop?
5.2 Getting Started With Oracle XQuery for Hadoop
5.2.1 Basic Steps
5.2.2 Example: Hello World!
5.3 About the Adapters
5.3.1 About the Oracle XQuery for Hadoop Functions
5.3.2 About the Avro File Adapter
5.3.3 About the Oracle Database Adapter
5.3.4 About the Oracle NoSQL Database Adapter
5.3.5 About the Sequence File Adapter
5.3.6 About the Text File Adapter
5.3.7 About the XML File Adapter
5.3.8 About Other Modules for Use With Oracle XQuery for Hadoop
5.4 Creating an XQuery Transformation
5.4.1 XQuery Transformation Requirements
5.4.2 About XQuery Language Support
5.4.3 Accessing Data in the Hadoop Distributed Cache
5.4.4 Calling Custom Java Functions from XQuery
5.4.5 Accessing User-Defined XQuery Library Modules and XML Schemas
5.4.6 XQuery Transformation Examples
5.5 Running a Query
5.5.1 Oracle XQuery for Hadoop Options
5.5.2 Generic Options
5.5.3 About Running Queries Locally
5.6 Oracle XQuery for Hadoop Configuration Properties
5.7 Third-Party Licenses for Bundled Software
5.7.1 Apache Licensed Code
5.7.2 ANTLR 3.2
5.7.3 Apache Ant 1.7.1
5.7.4 Apache Avro 1.7.3, 1.7.4
5.7.5 Apache Xerces
5.7.6 Apache XMLBeans 2.5
5.7.7 Jackson 1.8.8
5.7.8 Woodstox XML Parser 4.2
6 Oracle XQuery for Hadoop Reference
Avro File Adapter
Built-in Functions for Reading Avro Files
avro:collection-avroxml
avro:get
Custom Functions for Reading Avro Container Files
Custom Functions for Writing Avro Files
About Converting Values Between Avro and XML
Reading Avro as XML
Writing XML as Avro
Oracle Database Adapter
Custom Functions for Writing to Oracle Database
%oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties
Oracle NoSQL Database Adapter
Prerequisites for Using the Oracle NoSQL Database Adapter
Built-in Functions for Reading from and Writing to Oracle NoSQL Database
kv:collection-text
kv:collection-text
kv:collection-text
kv:collection-avroxml
kv:collection-avroxml
kv:collection-avroxml
kv:collection-xml
kv:collection-xml
kv:collection-xml
kv:collection-binxml
kv:collection-binxml
kv:collection-binxml
kv:collection-binxml
kv:put-text
kv:put-xml
kv:put-binxml
kv:get-text
kv:get-avroxml
kv:get-xml
kv:get-binxml
kv:key-range
kv:key-range
Oracle NoSQL Database Adapter Examples
Custom Functions for Reading Values from Oracle NoSQL Database
Custom Functions for Retrieving Single Values from Oracle NoSQL Database
Custom Functions for Writing to Oracle NoSQL Database
Oracle NoSQL Database Adapter Configuration Properties
Sequence File Adapter
Built-in Functions for Reading and Writing Sequence Files
seq:collection
seq:collection-xml
seq:collection-binxml
seq:put
seq:put
seq:put-xml
seq:put-xml
seq:put-binxml
seq:put-binxml
Examples of Sequence File Adapter Functions
Custom Functions for Reading Sequence Files
Custom Functions for Writing Sequence Files
Text File Adapter
Built-in Functions for Reading and Writing Text Files
text:collection
text:collection-xml
text:put
text:put-xml
text:trace
Examples of Text File Adapter Functions
Custom Functions for Reading Text Files
Custom Functions for Writing Text Files
Examples of Text File Functions
XML File Adapter
Built-in Functions for Reading XML Files
xmlf:collection
xmlf:collection
Examples of XML File Adapter Functions
Custom Functions for Reading XML Files
JSON Module
Built-in Functions for Reading JSON
json:parse-as-xml
json:get
Examples of JSON Functions
Utility Module
Duration, Date, and Time Functions
String Functions
Hadoop Module
Serialization Annotations
7 Oracle XML Extensions for Hive
7.1 What are the XML Extensions for Apache Hive?
7.2 Using the Hive Extensions
7.3 Creating XML Tables
7.3.1 Hive CREATE TABLE Syntax for XML Tables
7.3.2 CREATE TABLE Examples
7.3.2.1 Simple Examples
7.3.2.2 Detailed Examples
XML Function Library for Apache Hive
Online Documentation of Functions
About Hive Access to External Files
About Data Type Conversions
xml_query
xml_query_as_primitive
xml_exists
xml_table
Part IV Oracle R Advanced Analytics for Hadoop
8 Using Oracle R Advanced Analytics for Hadoop
8.1 About Oracle R Advanced Analytics for Hadoop
8.2 Access to HDFS Files
8.3 Access to Apache Hive
8.3.1 ORE Functions for Hive
8.3.2 Generic R Functions Supported in Hive
8.3.3 Support for Hive Data Types
8.3.4 Usage Notes for Hive Access
8.3.5 Example: Loading Hive Tables into Oracle R Advanced Analytics for Hadoop
8.4 Access to Oracle Database
8.4.1 Usage Notes for Oracle Database Access
8.4.2 Scenario for Using Oracle R Advanced Analytics for Hadoop with Oracle R Enterprise
8.5 Analytic Functions in Oracle R Advanced Analytics for Hadoop
8.6 ORCH mapred.config Class
8.7 Examples and Demos of Oracle R Advanced Analytics for Hadoop
8.7.1 Using the Demos
8.7.2 Using the Examples
8.8 Security Notes for Oracle R Advanced Analytics for Hadoop
9 ORCH Library Reference
9.1 Functions in Alphabetical Order
9.2 Functions by Category
9.2.1 Making Connections
9.2.2 Copying Data
9.2.3 Exploring Files
9.2.4 Writing MapReduce Functions
9.2.5 Debugging Scripts
9.2.6 Using Hive Data
9.2.7 Writing Analytical Functions
hadoop.exec
hadoop.run
hdfs.attach
hdfs.cd
hdfs.cp
hdfs.describe
hdfs.download
hdfs.exists
hdfs.get
hdfs.head
hdfs.id
hdfs.ls
hdfs.mkdir
hdfs.mv
hdfs.parts
hdfs.pull
hdfs.push
hdfs.put
hdfs.pwd
hdfs.rm
hdfs.rmdir
hdfs.root
hdfs.sample
hdfs.setroot
hdfs.size
hdfs.tail
hdfs.upload
is.hdfs.id
orch.connect
orch.connected
orch.dbcon
orch.dbg.lasterr
orch.dbg.off
orch.dbg.on
orch.dbg.output
orch.dbinfo
orch.disconnect
orch.dryrun
orch.export
orch.keyval
orch.keyvals
orch.pack
orch.reconnect
orch.temp.path
orch.unpack
orch.version
Index