Contents
Title and Copyright Information
Preface
Audience
Documentation Accessibility
Related Documents
Conventions
Backus-Naur Form Syntax
Part I Administration
1 Introducing Oracle Big Data Appliance
1.1 What Is Big Data?
1.1.1 High Variety
1.1.2 High Complexity
1.1.3 High Volume
1.1.4 High Velocity
1.2 The Oracle Big Data Solution
1.3 Software for Big Data Appliance
1.3.1 Software Component Overview
1.4 Acquiring Data for Analysis
1.4.1 Hadoop Distributed File System
1.4.2 Apache Hive
1.4.3 Oracle NoSQL Database
1.5 Organizing Big Data
1.5.1 MapReduce
1.5.2 Oracle Big Data SQL
1.5.3 Oracle Big Data Connectors
1.5.3.1 Oracle SQL Connector for Hadoop Distributed File System
1.5.3.2 Oracle Loader for Hadoop
1.5.3.3 Oracle Data Integrator Enterprise Edition
1.5.3.4 Oracle XQuery for Hadoop
1.5.3.5 Oracle R Advanced Analytics for Hadoop
1.5.3.6 Oracle Shell for Hadoop Loaders
1.5.4 Oracle R Support for Big Data
1.6 Analyzing and Visualizing Big Data
2 Administering Oracle Big Data Appliance
2.1 Monitoring Multiple Clusters Using Oracle Enterprise Manager
2.1.1 Using the Enterprise Manager Web Interface
2.1.2 Using the Enterprise Manager Command-Line Interface
2.2 Managing Operations Using Cloudera Manager
2.2.1 Monitoring the Status of Oracle Big Data Appliance
2.2.2 Performing Administrative Tasks
2.2.3 Managing CDH Services With Cloudera Manager
2.3 Using Hadoop Monitoring Utilities
2.3.1 Monitoring MapReduce Jobs
2.3.2 Monitoring the Health of HDFS
2.4 Using Cloudera Hue to Interact With Hadoop
2.5 About the Oracle Big Data Appliance Software
2.5.1 Software Components
2.5.2 Unconfigured Software
2.5.3 Allocating Resources Among Services
2.6 About the CDH Software Services
2.6.1 Where Do the Services Run on a Three-Node, Development Cluster?
2.6.2 Where Do the Services Run on a Single-Rack CDH Cluster?
2.6.3 Where Do the Services Run on a Multirack CDH Cluster?
2.6.4 About MapReduce
2.6.5 Automatic Failover of the NameNode
2.6.6 Automatic Failover of the ResourceManager
2.6.7 Map and Reduce Resource Allocation
2.7 Effects of Hardware on Software Availability
2.7.1 Logical Disk Layout
2.7.2 Critical and Noncritical CDH Nodes
2.7.2.1 High Availability or Single Points of Failure?
2.7.2.2 Where Do the Critical Services Run?
2.7.3 First NameNode Node
2.7.4 Second NameNode Node
2.7.5 First ResourceManager Node
2.7.6 Second ResourceManager Node
2.7.7 Noncritical CDH Nodes
2.8 Managing a Hardware Failure
2.8.1 About Oracle NoSQL Database Clusters
2.8.2 Prerequisites for Managing a Failing Node
2.8.3 Managing a Failing CDH Critical Node
2.8.4 Managing a Failing Noncritical Node
2.9 Stopping and Starting Oracle Big Data Appliance
2.9.1 Prerequisites
2.9.2 Stopping Oracle Big Data Appliance
2.9.2.1 Stopping All Managed Services
2.9.2.2 Stopping Cloudera Manager Server
2.9.2.3 Stopping Oracle Data Integrator Agent
2.9.2.4 Dismounting NFS Directories
2.9.2.5 Stopping the Servers
2.9.2.6 Stopping the InfiniBand and Cisco Switches
2.9.3 Starting Oracle Big Data Appliance
2.9.3.1 Powering Up Oracle Big Data Appliance
2.9.3.2 Starting the HDFS Software Services
2.9.3.3 Starting Oracle Data Integrator Agent
2.10 Managing Oracle Big Data SQL
2.10.1 Adding and Removing the Oracle Big Data SQL Service
2.10.2 Allocating Resources to Oracle Big Data SQL
2.11 Security on Oracle Big Data Appliance
2.11.1 About Predefined Users and Groups
2.11.2 About User Authentication
2.11.3 About Fine-Grained Authorization
2.11.4 About HDFS Transparent Encryption
2.11.5 About HTTPS/Network Encryption
2.11.5.1 Configuring Web Browsers to Use Kerberos Authentication
2.11.6 Port Numbers Used on Oracle Big Data Appliance
2.11.7 About Puppet Security
2.12 Auditing Oracle Big Data Appliance
2.12.1 About Oracle Audit Vault and Database Firewall
2.12.2 Setting Up the Oracle Big Data Appliance Plug-in
2.12.3 Monitoring Oracle Big Data Appliance
2.13 Collecting Diagnostic Information for Oracle Customer Support
3 Supporting User Access to Oracle Big Data Appliance
3.1 About Accessing a Kerberos-Secured Cluster
3.2 Providing Remote Client Access to CDH
3.2.1 Prerequisites
3.2.2 Installing a CDH Client on Any Supported Operating System
3.2.3 Configuring a CDH Client for an Unsecured Cluster
3.2.4 Configuring a CDH Client for a Kerberos-Secured Cluster
3.2.5 Verifying Access to a Cluster from the CDH Client
3.3 Providing Remote Client Access to Hive
3.4 Managing User Accounts
3.4.1 Creating Hadoop Cluster Users
3.4.1.1 Creating Users on an Unsecured Cluster
3.4.1.2 Creating Users on a Secured Cluster
3.4.2 Providing User Login Privileges (Optional)
3.5 Recovering Deleted Files
3.5.1 Restoring Files from the Trash
3.5.2 Changing the Trash Interval
3.5.3 Disabling the Trash Facility
3.5.3.1 Completely Disabling the Trash Facility
3.5.3.2 Disabling the Trash Facility for Local HDFS Clients
3.5.3.3 Disabling the Trash Facility for a Remote HDFS Client
4 Configuring Oracle Exadata Database Machine for Use with Oracle Big Data Appliance
4.1 About Optimizing Communications
4.1.1 About Applications that Pull Data Into Oracle Exadata Database Machine
4.1.2 About Applications that Push Data Into Oracle Exadata Database Machine
4.2 Prerequisites for Optimizing Communications
4.3 Specifying the InfiniBand Connections to Oracle Big Data Appliance
4.4 Specifying the InfiniBand Connections to Oracle Exadata Database Machine
4.5 Enabling SDP on Exadata Database Nodes
4.6 Creating an SDP Listener on the InfiniBand Network
Part II Oracle Big Data Appliance Software
5 Optimizing MapReduce Jobs Using Perfect Balance
5.1 What is Perfect Balance?
5.1.1 About Balancing Jobs Across Map and Reduce Tasks
5.1.2 Ways to Use Perfect Balance Features
5.1.3 Perfect Balance Components
5.2 Application Requirements
5.3 Getting Started with Perfect Balance
5.4 Analyzing a Job's Reducer Load
5.4.1 About Job Analyzer
5.4.1.1 Methods of Running Job Analyzer
5.4.2 Running Job Analyzer as a Standalone Utility
5.4.2.1 Job Analyzer Utility Example
5.4.2.2 Job Analyzer Utility Syntax
5.4.3 Running Job Analyzer Using Perfect Balance
5.4.3.1 Running Job Analyzer Using Perfect Balance
5.4.3.2 Collecting Additional Metrics
5.4.4 Reading the Job Analyzer Report
5.5 About Configuring Perfect Balance
5.6 Running a Balanced MapReduce Job Using Perfect Balance
5.7 About Perfect Balance Reports
5.8 About Chopping
5.8.1 Selecting a Chopping Method
5.8.2 How Chopping Impacts Applications
5.9 Troubleshooting Jobs Running with Perfect Balance
5.10 Using the Perfect Balance API
5.10.1 Modifying Your Java Code to Use Perfect Balance
5.10.2 Running Your Modified Java Code with Perfect Balance
5.11 About the Perfect Balance Examples
5.11.1 About the Examples in This Chapter
5.11.2 Extracting the Example Data Set
5.12 Perfect Balance Configuration Property Reference
Part III Oracle Table Access for Hadoop and Spark
6 Oracle DataSource for Apache Hadoop (OD4H)
6.1 Operational Data, Big Data and Requirements
6.2 Overview of Oracle DataSource for Apache Hadoop (OD4H)
6.2.1 Opportunity with Hadoop 2.x
6.2.2 Oracle Tables as Hadoop Data Source
6.2.3 External Tables
6.2.3.1 TBLPROPERTIES
6.2.3.2 SERDE PROPERTIES
6.2.4 List of jars in the OD4H package
6.3 How does OD4H work?
6.3.1 Create a new Oracle Database Table or Reuse an Existing Table
6.3.2 Hive DDL
6.3.3 Creating External Tables in Hive
6.4 Features of OD4H
6.4.1 Performance And Scalability Features
6.4.1.1 Splitters
6.4.1.2 Choosing a Splitter
6.4.1.3 Predicate Pushdown
6.4.1.4 Projection Pushdown
6.4.1.5 Partition Pruning
6.4.2 Smart Connection Management
6.4.3 Security Features
6.4.3.1 Improved Authentication
6.5 Using HiveQL with OD4H
6.6 Using Spark SQL with OD4H
6.7 Writing Back to Oracle Database
Glossary
Index