Contents
Title and Copyright Information
Preface
Audience
Documentation Accessibility
Related Documents
Text Conventions
Syntax Conventions
Changes in This Release for Oracle Big Data Connectors User's Guide
Changes in Oracle Big Data Connectors Release 2 (2.0)
1 Getting Started with Oracle Big Data Connectors
1.1 About Oracle Big Data Connectors
1.2 Big Data Concepts and Technologies
1.2.1 What is MapReduce?
1.2.2 What is Apache Hadoop?
1.3 Downloading the Oracle Big Data Connectors Software
1.4 Oracle SQL Connector for Hadoop Distributed File System Setup
1.4.1 Software Requirements
1.4.2 Installing and Configuring a Hadoop Client
1.4.3 Installing Oracle SQL Connector for HDFS
1.4.4 Providing Support for Hive Tables
1.4.5 Granting User Privileges in Oracle Database
1.4.6 Setting Up User Accounts on the Oracle Database System
1.4.7 Setting Up User Accounts on the Hadoop Cluster
1.5 Oracle Loader for Hadoop Setup
1.5.1 Software Requirements
1.5.2 Installing Oracle Loader for Hadoop
1.5.3 Providing Support for Offline Database Mode
1.6 Oracle Data Integrator Application Adapter for Hadoop Setup
1.6.1 System Requirements and Certifications
1.6.2 Technology-Specific Requirements
1.6.3 Location of Oracle Data Integrator Application Adapter for Hadoop
1.6.4 Setting Up the Topology
1.7 Oracle R Connector for Hadoop Setup
1.7.1 Installing the Software on Hadoop
1.7.1.1 Software Requirements for a Third-Party Hadoop Cluster
1.7.1.2 Installing Sqoop on a Hadoop Cluster
1.7.1.3 Installing Hive on a Hadoop Cluster
1.7.1.4 Installing R on a Hadoop Cluster
1.7.1.5 Installing the ORCH Package on a Hadoop Cluster
1.7.2 Installing Additional R Packages
1.7.3 Providing Remote Client Access to R Users
1.7.3.1 Software Requirements for Remote Client Access
1.7.3.2 Configuring the Server as a Hadoop Client
1.7.3.3 Installing Sqoop on a Hadoop Client
1.7.3.4 Installing R on a Hadoop Client
1.7.3.5 Installing the ORCH Package on a Hadoop Client
1.7.3.6 Installing the Oracle R Enterprise Client Packages (Optional)
2 Oracle SQL Connector for Hadoop Distributed File System
2.1 About Oracle SQL Connector for HDFS
2.2 Getting Started With Oracle SQL Connector for HDFS
2.3 Configuring Your System for Oracle SQL Connector for HDFS
2.4 Using the ExternalTable Command-Line Tool
2.4.1 About ExternalTable
2.4.2 ExternalTable Command-Line Tool Syntax
2.5 Creating External Tables
2.5.1 Creating External Tables with the ExternalTable Tool
2.5.2 Creating External Tables from Data Pump Format Files
2.5.2.1 Required Properties
2.5.2.2 Optional Properties
2.5.2.3 Defining Properties in XML Files for Data Pump Format Files
2.5.2.4 Example
2.5.3 Creating External Tables from Hive Tables
2.5.3.1 Hive Table Requirements
2.5.3.2 Required Properties
2.5.3.3 Optional Properties
2.5.3.4 Defining Properties in XML Files for Hive Tables
2.5.3.5 Example
2.5.4 Creating External Tables from Delimited Text Files
2.5.4.1 Required Properties
2.5.4.2 Optional Properties
2.5.4.3 Defining Properties in XML Files for Delimited Text Files
2.5.4.4 Example
2.5.5 Creating External Tables in SQL
2.6 Publishing the HDFS Data Paths
2.7 Listing Location File Metadata and Contents
2.8 Describing External Tables
2.9 More About External Tables Generated by the ExternalTable Tool
2.9.1 What Are Location Files?
2.9.2 Enabling Parallel Processing
2.9.3 Location File Management
2.9.4 Location File Names
2.10 Configuring Oracle SQL Connector for HDFS
2.10.1 Creating a Configuration File
2.10.2 Configuration Properties
2.11 Performance Tips for Querying Data in HDFS
3 Oracle Loader for Hadoop
3.1 What Is Oracle Loader for Hadoop?
3.2 About the Modes of Operation
3.2.1 Online Database Mode
3.2.2 Offline Database Mode
3.3 Getting Started With Oracle Loader for Hadoop
3.4 Creating the Target Table
3.4.1 Supported Data Types for Target Tables
3.4.2 Supported Partitioning Strategies for Target Tables
3.5 Creating a Job Configuration File
3.6 About the Target Table Metadata
3.6.1 Providing the Connection Details for Online Database Mode
3.6.2 Generating the Target Table Metadata for Offline Database Mode
3.6.2.1 OraLoaderMetadata Utility
3.7 About Input Formats
3.7.1 Delimited Text Input Format
3.7.1.1 About DelimitedTextInputFormat
3.7.1.2 Required Configuration Properties
3.7.1.3 Optional Configuration Properties
3.7.2 Complex Text Input Formats
3.7.2.1 About RegexInputFormat
3.7.2.2 Required Configuration Properties
3.7.2.3 Optional Configuration Properties
3.7.3 Hive Table Input Format
3.7.3.1 About HiveToAvroInputFormat
3.7.3.2 Required Configuration Properties
3.7.4 Avro Input Format
3.7.4.1 Configuration Properties
3.7.5 Oracle NoSQL Database Input Format
3.7.5.1 About KVAvroInputFormat
3.7.5.2 Configuration Properties
3.7.6 Custom Input Formats
3.7.6.1 About Implementing a Custom Input Format
3.7.6.2 About Error Handling
3.7.6.3 Supporting Data Sampling
3.7.6.4 InputFormat Source Code Example
3.8 Mapping Input Fields to Target Table Columns
3.8.1 Automatic Mapping
3.8.2 Manual Mapping
3.8.2.1 Creating a Loader Map
3.8.2.2 Example Loader Map
3.9 About Output Formats
3.9.1 JDBC Output Format
3.9.1.1 About JDBCOutputFormat
3.9.1.2 Configuration Properties
3.9.2 Oracle OCI Direct Path Output Format
3.9.2.1 About OCIOutputFormat
3.9.2.2 Configuration Properties
3.9.2.3 Additional Configuration Requirements
3.9.3 Delimited Text Output Format
3.9.3.1 About DelimitedTextOutputFormat
3.9.3.2 Configuration Properties
3.9.4 Oracle Data Pump Output Format
3.9.4.1 About DataPumpOutputFormat
3.10 Running a Loader Job
3.10.1 Specifying Hive Input Format JAR Files
3.10.2 Specifying Oracle NoSQL Database Input Format JAR Files
3.10.3 Job Reporting
3.11 Handling Rejected Records
3.11.1 Logging Rejected Records in Bad Files
3.11.2 Setting a Job Reject Limit
3.12 Balancing Loads When Loading Data into Partitioned Tables
3.12.1 Using the Sampling Feature
3.12.2 Tuning Load Balancing
3.12.3 Tuning Sampling Behavior
3.12.4 When Does Oracle Loader for Hadoop Use the Sampler's Partitioning Scheme?
3.12.5 Resolving Memory Issues
3.12.6 What Happens When a Sampling Feature Property Has an Invalid Value?
3.13 Optimizing Communications Between Oracle Engineered Systems
3.14 Oracle Loader for Hadoop Configuration Property Reference
3.15 Third-Party Licenses for Bundled Software
3.15.1 Apache Licensed Code
3.15.2 Apache Avro 1.7.3
3.15.3 Apache Commons Mathematics Library 2.2
3.15.4 Jackson JSON 1.8.8
4 Oracle Data Integrator Application Adapter for Hadoop
4.1 Introduction
4.1.1 Concepts
4.1.2 Knowledge Modules
4.1.3 Security
4.2 Setting Up the Topology
4.2.1 Setting Up File Data Sources
4.2.2 Setting Up Hive Data Sources
4.2.3 Setting Up the Oracle Data Integrator Agent to Execute Hadoop Jobs
4.2.4 Configuring Oracle Data Integrator Studio for Executing Hadoop Jobs on the Local Agent
4.3 Setting Up an Integration Project
4.4 Creating an Oracle Data Integrator Model from a Reverse-Engineered Hive Model
4.4.1 Creating a Model
4.4.2 Reverse Engineering Hive Tables
4.5 Designing the Interface
4.5.1 Loading Data from Files into Hive
4.5.2 Validating and Transforming Data Within Hive
4.5.2.1 IKM Hive Control Append
4.5.2.2 CKM Hive
4.5.2.3 IKM Hive Transform
4.5.3 Loading Data into an Oracle Database from Hive and HDFS
5 Oracle R Connector for Hadoop
5.1 About Oracle R Connector for Hadoop
5.2 Access to HDFS Files
5.3 Access to Hive
5.3.1 ORE Functions for Hive
5.3.2 Generic R Functions Supported in Hive
5.3.3 Support for Hive Data Types
5.3.4 Usage Notes for Hive Access
5.3.5 Example: Loading Hive Tables into Oracle R Connector for Hadoop
5.4 Access to Oracle Database
5.4.1 Usage Notes for Oracle Database Access
5.4.2 Scenario for Using Oracle R Connector for Hadoop with Oracle R Enterprise
5.5 Analytic Functions in Oracle R Connector for Hadoop
5.6 ORCH mapred.config Class
5.7 Examples and Demos of Oracle R Connector for Hadoop
5.7.1 Using the Demos
5.7.2 Using the Examples
5.8 Security Notes for Oracle R Connector for Hadoop
6 ORCH Library Reference
6.1 Functions in Alphabetical Order
6.2 Functions by Category
6.2.1 Making Connections
6.2.2 Copying Data
6.2.3 Exploring Files
6.2.4 Writing MapReduce Functions
6.2.5 Debugging Scripts
6.2.6 Using Hive Data
6.2.7 Writing Analytical Functions
hadoop.exec
hadoop.run
hdfs.attach
hdfs.cd
hdfs.cp
hdfs.describe
hdfs.download
hdfs.exists
hdfs.get
hdfs.head
hdfs.id
hdfs.ls
hdfs.mkdir
hdfs.mv
hdfs.parts
hdfs.pull
hdfs.push
hdfs.put
hdfs.pwd
hdfs.rm
hdfs.rmdir
hdfs.root
hdfs.sample
hdfs.setroot
hdfs.size
hdfs.tail
hdfs.upload
is.hdfs.id
orch.connect
orch.connected
orch.dbcon
orch.dbg.lasterr
orch.dbg.off
orch.dbg.on
orch.dbg.output
orch.dbinfo
orch.disconnect
orch.dryrun
orch.export
orch.keyval
orch.keyvals
orch.pack
orch.reconnect
orch.temp.path
orch.unpack
orch.version
Index