Contents
Title and Copyright Information
Preface
Audience
Documentation Accessibility
Related Documents
Conventions
1 Introducing Oracle Big Data Appliance
1.1 What Is Big Data?
1.1.1 High Variety
1.1.2 High Complexity
1.1.3 High Volume
1.1.4 High Velocity
1.2 The Oracle Big Data Solution
1.3 Software for Big Data
1.3.1 Software Component Overview
1.4 Acquiring Data for Analysis
1.4.1 Hadoop Distributed File System
1.4.2 Hive
1.4.3 Oracle NoSQL Database
1.5 Organizing Big Data
1.5.1 MapReduce
1.5.2 Oracle R Support for Big Data
1.5.3 Oracle Big Data Connectors
1.5.3.1 Oracle SQL Connector for Hadoop Distributed File System
1.5.3.2 Oracle Loader for Hadoop
1.5.3.3 Oracle XQuery for Hadoop
1.5.3.4 Oracle R Advanced Analytics for Hadoop
1.5.3.5 Oracle Data Integrator Application Adapter for Hadoop
1.6 Analyzing and Visualizing Big Data
2 Administering Oracle Big Data Appliance
2.1 Monitoring a Cluster Using Oracle Enterprise Manager
2.1.1 Using the Enterprise Manager Web Interface
2.1.2 Using the Enterprise Manager Command-Line Interface
2.2 Managing CDH Operations Using Cloudera Manager
2.2.1 Monitoring the Status of Oracle Big Data Appliance
2.2.2 Performing Administrative Tasks
2.2.3 Managing Services With Cloudera Manager
2.3 Using Hadoop Monitoring Utilities
2.3.1 Monitoring the JobTracker
2.3.2 Monitoring the TaskTracker
2.4 Using Cloudera Hue to Interact With Hadoop
2.5 About the Oracle Big Data Appliance Software
2.5.1 Software Components
2.5.2 Logical Disk Layout
2.6 About the CDH Software Services
2.6.1 Monitoring the CDH Services
2.6.2 Where Do the CDH Services Run?
2.6.2.1 Service Locations on a Single Rack
2.6.2.2 Service Locations in Multirack Clusters
2.6.3 Automatic Failover of the NameNode
2.6.4 Automatic Failover of the JobTracker
2.6.5 Unconfigured Software
2.6.6 Map and Reduce Resource Configuration
2.7 Effects of Hardware on Software Availability
2.7.1 Critical and Noncritical Nodes
2.7.2 First NameNode
2.7.3 Second NameNode
2.7.4 First JobTracker
2.7.5 Second JobTracker
2.7.6 Noncritical Nodes
2.8 Stopping and Starting Oracle Big Data Appliance
2.8.1 Prerequisites
2.8.2 Stopping Oracle Big Data Appliance
2.8.3 Starting Oracle Big Data Appliance
2.9 Security on Oracle Big Data Appliance
2.9.1 About Predefined Users and Groups
2.9.2 About User Authentication
2.9.3 About Fine-Grained Authorization
2.9.4 About On-Disk Encryption
2.9.5 Port Numbers Used on Oracle Big Data Appliance
2.9.6 About Puppet Security
2.10 Auditing Oracle Big Data Appliance
2.10.1 About Oracle Audit Vault and Database Firewall
2.10.2 Setting Up the Oracle Big Data Appliance Plug-in
2.10.3 Monitoring Oracle Big Data Appliance
2.11 Collecting Diagnostic Information for Oracle Customer Support
3 Supporting User Access to Oracle Big Data Appliance
3.1 About Accessing a Kerberos-Secured Cluster
3.2 Providing Remote Client Access to CDH
3.2.1 Prerequisites
3.2.2 Installing CDH on Oracle Exadata Database Machine
3.2.3 Installing a CDH Client on Any Supported Operating System
3.2.4 Configuring a CDH Client for an Unsecured Cluster
3.2.5 Configuring a CDH Client for a Kerberos-Secured Cluster
3.2.6 Verifying Access to a Cluster from the CDH Client
3.3 Providing Remote Client Access to Hive
3.4 Managing User Accounts
3.4.1 Creating Hadoop Cluster Users
3.4.1.1 Creating Users on an Unsecured Cluster
3.4.1.2 Creating Users on a Secured Cluster
3.4.2 Providing User Login Privileges (Optional)
3.5 Recovering Deleted Files
3.5.1 Restoring Files from the Trash
3.5.2 Changing the Trash Interval
3.5.3 Disabling the Trash Facility
3.5.3.1 Completely Disabling the Trash Facility
3.5.3.2 Disabling the Trash Facility for Local HDFS Clients
3.5.3.3 Disabling the Trash Facility for a Remote HDFS Client
4 Optimizing MapReduce Jobs Using Perfect Balance
4.1 What is Perfect Balance?
4.1.1 About Balancing Jobs Across Map and Reduce Tasks
4.1.2 Ways to Use Perfect Balance Features
4.1.3 Perfect Balance Components
4.2 Application Requirements
4.3 Getting Started with Perfect Balance
4.4 Analyzing a Job Reducer Load
4.4.1 About Job Analyzer
4.4.1.1 Methods of Running Job Analyzer
4.4.2 Running Job Analyzer as a Standalone Utility
4.4.2.1 Job Analyzer Utility Example
4.4.2.2 Job Analyzer Utility Syntax
4.4.3 Running Job Analyzer Using Perfect Balance
4.4.3.1 Running Job Analyzer with Perfect Balance
4.4.3.2 Collecting Additional Metrics
4.4.4 Reading the Job Analyzer Report
4.5 About Configuring Perfect Balance
4.6 Running a Balanced MapReduce Job Using Perfect Balance
4.7 About Perfect Balance Reports
4.8 About Chopping
4.8.1 Selecting a Chopping Method
4.8.2 How Chopping Impacts Applications
4.9 Troubleshooting Jobs Running with Perfect Balance
4.9.1 Java GC Overhead Limit Exceeded Error
4.9.2 Java Out of Heap Space Errors
4.10 Using the Perfect Balance API
4.10.1 Modifying Your Java Code to Use Perfect Balance
4.10.2 Running Your Modified Java Code with Perfect Balance
4.11 About the Perfect Balance Examples
4.11.1 About the Examples in This Chapter
4.11.2 Extracting the Example Data Set
4.12 Perfect Balance Configuration Property Reference
5 Configuring Oracle Exadata Database Machine for Use with Oracle Big Data Appliance
5.1 About Optimizing Communications
5.1.1 About Applications that Pull Data Into Oracle Exadata Database Machine
5.1.2 About Applications that Push Data Into Oracle Exadata Database Machine
5.2 Prerequisites for Optimizing Communications
5.3 Specifying the InfiniBand Connections to Oracle Big Data Appliance
5.4 Specifying the InfiniBand Connections to Oracle Exadata Database Machine
5.5 Enabling SDP on Exadata Database Nodes
5.6 Configuring a JDBC Client for SDP
5.7 Creating an SDP Listener on the InfiniBand Network
Glossary
Index