This guide describes Oracle Big Data Appliance, which is used for acquiring, organizing, and analyzing very large data sets. It includes information about hardware operations, site planning and configuration, and physical, electrical, and environmental specifications.
This preface contains the following topics:
This guide is intended for Oracle Big Data Appliance customers and those responsible for data center site planning, installation, configuration, and maintenance of Oracle Big Data Appliance.
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.
Access to Oracle Support
Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
Oracle Big Data Appliance Documentation Library
In the Big Data Documentation Portal of the Oracle Help Center, you can find a link to the complete Oracle Big Data Appliance library for your release of the product. The library includes the following core documents as well as documents for products that are used in conjunction with Oracle Big Data Appliance:
Oracle Big Data Appliance Owner’s Guide (this guide)
Oracle Big Data Appliance Software User’s Guide
Oracle Big Data Connectors User’s Guide
Oracle Big Data Appliance Safety and Compliance Guide
Oracle Big Data Appliance Licensing Information User Manual
Note:
The Oracle Big Data Appliance Licensing Information User Manual is the consolidated reference for licensing information for Oracle and third-party software included in the Oracle Big Data Appliance product. Refer to this manual or contact Oracle Support if you have questions about licensing.
Documentation for Affiliated Products
The following Oracle libraries contain hardware information for Oracle Big Data Appliance. Links to these libraries are available through the Big Data Documentation Portal at https://docs.oracle.com/en/bigdata/
Oracle Server X5-2L library:
Sun Rack II 1042 and 1242 library:
Sun Network QDR InfiniBand Gateway Switch library:
Sun Datacenter InfiniBand Switch 36 library:
Oracle Integrated Lights Out Manager (ILOM) 3.1 library:
The following text conventions are used in this document:
Convention | Meaning |
---|---|
boldface | Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary. |
italic | Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values. |
monospace | Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter. |
|
The pound ( |
The syntax in this reference is presented in a simple variation of Backus-Naur Form (BNF) that uses the following symbols and conventions:
Symbol or Convention | Description |
---|---|
[ ] | Brackets enclose optional items. |
{ } | Braces enclose a choice of items, only one of which is required. |
\| | A vertical bar separates alternatives within brackets or braces. |
... | Ellipses indicate that the preceding syntactic element can be repeated. |
delimiters | Delimiters other than brackets, braces, and vertical bars must be entered as shown. |
boldface | Words appearing in boldface are keywords. They must be typed as shown. (Keywords are case-sensitive in some, but not all, operating systems.) Words that are not in boldface are placeholders for which you must substitute a name or value. |
Oracle Big Data Appliance 4.7 is focused on defect fixes and software stack version updates, including major updates to the UEK kernel, MySQL, and Cloudera (Cloudera Manager and CDH). There are no other customer-visible changes in this release.
Software Updates
CDH (Cloudera's Distribution including Apache Hadoop) 5.9
CM (Cloudera Manager) 5.9
Oracle Big Data Connectors 4.7
Big Data SQL 3.0.1 or 3.1 (both optional)
MySQL Enterprise Edition 5.6.32
Perfect Balance 2.9
Java JDK 8u111
Oracle Linux 6.8 UEK 4 (Oracle Linux Unbreakable Kernel, Release 4)
Oracle Data Integrator Agent 12.2.1.1 (for Oracle Big Data Connectors)
Oracle R Advanced Analytics for Hadoop (ORAAH) 2.7.0
Oracle's R Distribution (ORD) 3.2.0
Oracle Big Data Discovery 1.3.2 or 1.4 (1.4.0.37.1388 or greater). This is optional.
See the Cloudera Documentation for information about CDH and Cloudera Manager 5.9.
Oracle Big Data Appliance Release 4.7 includes Oracle Big Data SQL 3.0.1 as a Mammoth installation option. You do not need to download this package from the Oracle Software Delivery Cloud. Oracle Big Data SQL 3.1 is also compatible with Release 4.7 and is available for download from Oracle Software Delivery Cloud.
The following summarizes the change history of Oracle Big Data Appliance.
The following are changes in Oracle Big Data Appliance Release 4 (4.6):
Software Updates
CDH (Cloudera's Distribution including Apache Hadoop) 5.8
CM (Cloudera Manager) 5.8.1
Oracle Big Data Connectors 4.6
Oracle Data Integrator Agent 12.2.1.1 (for Oracle Big Data Connectors)
Oracle R Advanced Analytics for Hadoop (ORAAH) 2.6.0
Oracle's R Distribution (ORD) 3.2.0
Perfect Balance 2.8
Oracle Linux 6.8
Big Data Discovery 1.2.2 or 1.3.x (optional)
Big Data SQL 3.0.1 (optional)
Java JDK 8u101
New Features
Cloudera CDH 5.8 and Cloudera Manager 5.8.1
See the Cloudera Enterprise 5.8.x Documentation for information about CDH and Cloudera Manager 5.8.x.
Oracle Big Data SQL Updates
Oracle Big Data Appliance Release 4.6 includes Oracle Big Data SQL 3.0.1 as a Mammoth installation option. You do not need to download it from the Oracle Software Delivery Cloud to install it on Oracle Big Data Appliance 4.6.
Oracle Big Data SQL 3.1 will be downloadable from the Oracle Software Delivery Cloud when available.
Networking Changes for Greater Configuration Flexibility
The release provides more modular control over Oracle Big Data Appliance networks by separately storing the network configuration settings for each rack and cluster: <rack_name>-rack-network.json and <cluster-name>cluster-network.json. When you are using the Oracle Big Data Appliance Configuration Generation Utility, these changes enable you to reconfigure the client network or private network of a cluster without affecting the configuration of other servers on the rack. They also allow you to expand a cluster or rack without affecting servers that are not part of the configuration.
In previous releases, all such information was stored in a single network.json file. This file still exists for backward compatibility with some scripts.
Encryption for Data Spills and some Intermediate Files
Data spills to disk outside of HDFS during the following memory-intensive processes can now be protected by encryption.
Spark shuffle.
Creation of intermediate files in MapReduce encrypted shuffle and spillage during map and reduce operations.
Impala SQL queries that generate extremely large result sets.
When you enable Hadoop Network Encryption, either in Mammoth during a full Oracle Big Data Appliance installation or later via bdacli enable hadoop_network_encryption, encryption is also enabled for Spark, Impala, and MapReduce intermediate files and data spills.
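As a minimal sketch, the bdacli command named above could be wrapped in a small script. The command text is taken from this guide; the function name, the dry_run flag, and the PATH check are illustrative assumptions, and any prompts that bdacli may issue are not handled here.

```python
import shutil
import subprocess

# Command text as given in this guide.
ENABLE_CMD = ["bdacli", "enable", "hadoop_network_encryption"]


def enable_network_encryption(dry_run=True):
    """Return the command; run it only when dry_run is False.

    The dry_run flag and the PATH check are illustrative assumptions,
    not part of bdacli itself.
    """
    if not dry_run:
        if shutil.which(ENABLE_CMD[0]) is None:
            raise FileNotFoundError("bdacli not found on PATH")
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(ENABLE_CMD, check=True)
    return ENABLE_CMD


# Dry run: only show what would be executed.
print(" ".join(enable_network_encryption()))
```

With the default dry run, the script only prints the command, so it can be reviewed before anything is changed on the cluster.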
Note:
In the case of an Oracle Big Data Appliance upgrade to Release 4.6, the upgrade does not automatically enable this new extension of encryption to data spills, regardless of whether or not Hadoop Network Encryption is already enabled. If you want this feature on a system that has been upgraded to Release 4.6, run bdacli enable hadoop_network_encryption after the upgrade.
New reset Command in bdacli
The bdacli reset command selectively reconfigures Oracle Big Data Appliance networks. It pulls the new settings from the network configuration files generated by Oracle Big Data Appliance Configuration Generation Utility. The user controls the scope of the reset (server, network, cluster) and which networks are reset.
See "bdacli reset" in the Oracle Big Data Appliance Utilities section of this guide.
Support for Oracle NoSQL Database 4.0.9
Other Changes
Mammoth Installation Step Changes
Some Mammoth installation steps have been reorganized and renamed. An important change to note is that the Kerberos installation now consists of two separate pre- and post-cluster configuration steps in order to enable additional security setups on the cluster. See Mammoth Installation Steps.
X6-2 servers are shipped with an Oracle Big Data Appliance v4.5.0 base image.
The following are changes in Oracle Big Data Appliance Release 4 (4.5):
Software Updates
CDH (Cloudera's Distribution including Apache Hadoop) 5.7
CM (Cloudera Manager) 5.7
Cloudera Navigator 2.4.1
Java JDK 8u92
Oracle Big Data Connectors 4.5
Oracle Data Integrator Agent 12.2.1 (for Oracle Big Data Connectors)
Oracle NoSQL Database 4.0.5
Perfect Balance 2.7
Big Data Discovery 1.2.2 (optional)
Big Data SQL 3.0.1 (optional)
Hardware Updates
Oracle Big Data Appliance X6-2 server
2 x 22-core (2.2GHz) Intel® Xeon® E5-2699 v4 processors.
8 x 32 GB DDR4-2400 memory (expandable to maximum of 768 GB per node).
X6-2 nodes can be mixed with X5-2 nodes (and older Release 4.5–compatible nodes) in a CDH or NoSQL cluster. The X6-2 server is not compatible as a node of an Oracle Big Data Appliance cluster in releases prior to Oracle Big Data Appliance Release 4.4.
See the Oracle Big Data Appliance X6-2 Data Sheet for more details.
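The release requirement for X6-2 nodes can be expressed as a simple version comparison. The following is only a sketch: the function names are hypothetical, and the only fact taken from this guide is that X6-2 servers are not supported as cluster nodes in releases earlier than 4.4.

```python
# Earliest release that accepts X6-2 nodes, per this guide.
MIN_X6_2_RELEASE = (4, 4)


def parse_release(text):
    """Turn a release string such as '4.4' or '4.5.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def x6_2_supported(release):
    """Return True if the given release can include X6-2 nodes in a cluster."""
    return parse_release(release) >= MIN_X6_2_RELEASE


for release in ("4.3", "4.4", "4.5.0"):
    print(release, x6_2_supported(release))
```

Comparing tuples of integers rather than strings avoids ordering mistakes with multi-digit components, such as a hypothetical "4.10" sorting before "4.4".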
New Features
Cloudera CDH 5.7 and Cloudera Manager 5.7
See the Cloudera Enterprise 5.7.x Documentation for information about CDH 5.7 and Cloudera Manager 5.7.
Support for Either Local or Remote Key Trustee Servers
Oracle Big Data Appliance supports both local and remote Key Trustee Servers for HDFS Transparent Encryption. The Oracle Big Data Configuration Utility includes HDFS Transparent Encryption as a configuration option. You can either click a checkbox to automatically install and configure active and passive Key Trustee Servers locally on the Oracle Big Data Appliance or define an “off-board” configuration, including the address of the active and passive servers, the Key Trustee organization, and the authorization code. You can also enable HDFS Transparent Encryption via the bdacli utility at any time after Mammoth installation and will be prompted to make the same choice between remote or local key trustee services.
Although Oracle Big Data Appliance supports local Key Trustee Servers, remote servers are still the recommended choice.
Support for Oracle Big Data SQL 3.0.1
Oracle Big Data Appliance Release 4.5 includes Oracle Big Data SQL 3.0.1 as a Mammoth installation option. See the Oracle Big Data SQL User's Guide for Release 3.0.1 installation instructions.
Enhanced Networking
Release 4.5 provides more flexibility in the configuration of networks on the Oracle Big Data Appliance. This includes support for the following options:
Separate networks for each cluster in a rack (both client and private networks).
Multiple client networks on the same BDA cluster.
VLAN tagging for client networks.
Partition keys for private InfiniBand networks.
Lower Minimum Size for CDH Clusters
The minimum recommended CDH cluster size for a production environment is now five nodes. For development purposes, the Oracle Big Data Appliance Configuration Generation Utility now enables you to create three-node CDH clusters.
Note that Oracle Big Data Appliance Starter Rack is still sold with six servers.
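The size limits above can be captured in a small planning check. This is a sketch under stated assumptions: the function and the purpose labels are hypothetical, and only the thresholds (five nodes for production, three for development) come from this guide.

```python
# Minimum CDH cluster sizes stated in this guide for Release 4.5.
MIN_NODES = {"production": 5, "development": 3}


def cluster_size_ok(nodes, purpose="production"):
    """Return True if a planned CDH cluster meets the stated minimum size."""
    if purpose not in MIN_NODES:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return nodes >= MIN_NODES[purpose]


print(cluster_size_ok(3))                 # three nodes, production
print(cluster_size_ok(3, "development"))  # three nodes, development
```

The check does not replace the Oracle Big Data Appliance Configuration Generation Utility, which remains the tool that actually defines the cluster.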
The following are changes in Oracle Big Data Appliance Release 4 (4.4):
Software Updates
CDH (Cloudera's Distribution including Apache Hadoop) 5.5.1
CM (Cloudera Manager) 5.5.1
Cloudera Navigator 2.4.1
MySQL Database Enterprise Server - Advanced Edition 5.6
Oracle Big Data Connectors 4.4
Oracle Data Integrator Agent 12.2.1 (for Oracle Big Data Connectors)
Oracle NoSQL Database 3.5.2
Perfect Balance 2.6
Hardware Updates
Oracle Big Data Appliance X6-2 server
2 x 22-core (2.2GHz) Intel® Xeon® E5-2699 v4 processors.
8 x 32 GB DDR4-2400 memory (expandable to maximum of 768 GB per node).
In Oracle Big Data Appliance Release 4.4 or greater, X6-2 nodes can be mixed with X5-2 nodes (and older Release 4.4–compatible nodes) in a CDH or NoSQL cluster. The X6-2 server is not compatible as a node of an Oracle Big Data Appliance cluster in releases previous to 4.4.
See the Oracle Big Data Appliance X6-2 Data Sheet for more details.
New Features
Cloudera CDH 5.5.1 and Cloudera Manager 5.5.1
CDH 5.5.1 is a maintenance release on top of CDH 5.5. See the Cloudera CDH 5.5 Release Notes.
For information on Cloudera Manager 5.5 and 5.5.1, see New Features and Changes in Cloudera Manager 5.
Automated Installation for Cloudera Navigator
Mammoth now provides an automated installation for Cloudera Navigator in both a full Mammoth installation and Mammoth upgrade. No user intervention is required and the installation occurs transparently. If Cloudera Navigator is not already installed, Mammoth installs the software on node 3 of the cluster, which is where other Cloudera Management services are hosted. If Cloudera Navigator is already installed, Mammoth skips this step and does not overwrite the existing installation.
The Cloudera Navigator Metadata Server and Audit Server are automatically added to Cloudera Manager and auditing is enabled. Mammoth also enables Web UI encryption for the Audit Server.
Mammoth does not enable the Cloudera Navigator key management components.
Support for Oracle Big Data SQL 3.0
Oracle Big Data Appliance Release 4.4 includes Oracle Big Data SQL 2.0 as a Mammoth installation option. Oracle Big Data SQL 3.0 is also available for Release 4.4, as a patch. See the Oracle Big Data SQL User's Guide for Release 3.0 installation instructions.
Note:
If you want to install Oracle Big Data SQL 3.0, do not select Oracle Big Data SQL 2.0 in the Mammoth installation. If Oracle Big Data SQL 2.0 is installed, you must uninstall it prior to installing the 3.0 patch. The patch README file includes steps for removing 2.0 if you have previously installed it.
Release 4.4 as an Update to an Earlier Base Image
Mammoth 4.4.0 can run on top of any earlier Oracle Big Data Appliance 4.x base image and will update the base image software as needed.
The following are changes in Oracle Big Data Appliance release 4 (4.3):
Software Updates
CDH (Cloudera's Distribution including Apache Hadoop) 5.4.7
CM (Cloudera Manager) 5.4.7
Oracle Big Data Connectors 4.3
Oracle Big Data Discovery 1.1.1
Oracle Big Data SQL 2.0
Oracle NoSQL Database 1.3.4.7 (Community and Enterprise Edition)
Oracle Table Access for Hadoop and Spark
Perfect Balance 2.5
JDK 8u60
See Oracle Big Data Appliance Software User's Guide.
New Features
Automatic Installation for Oracle Big Data Discovery
Customers can download Big Data Discovery 1.1.1 and then use the bdacli command line utility to install the software on a designated node of the primary CDH cluster.
Oracle Table Access for Hadoop and Spark
Oracle Table Access for Hadoop and Spark is an Oracle Big Data Appliance feature that converts Oracle Database tables into Hadoop or Spark data sources. This feature enables fast and secure access to data in the Oracle Database.
HDFS Transparent Encryption
Oracle Big Data Appliance 4.3 provides the option to use HDFS Transparent Encryption. This replaces the eCryptfs on-disk encryption software provided with previous releases. Customers can enable HDFS Transparent Encryption for both new and pre-existing CDH clusters. When enabled, HDFS Transparent Encryption secures Hadoop operations running on the cluster (including HDFS, MapReduce on YARN, Spark on YARN, Hive, and HBase tasks).
The Oracle Big Data Appliance Configuration Generation Utility provides an option to include HDFS Transparent Encryption when a new cluster is created.
HDFS Transparent Encryption can be enabled or disabled on a cluster via the bdacli command line interface.
HTTPS / Network Encryption
Provides HTTPS for Cloudera Manager, Hue, Oozie, and Hadoop Web UIs.
Enables network encryption for other internal Hadoop data transfers, such as those made through YARN shuffle and RPC.
Like HDFS Transparent Encryption, HTTPS / Network Encryption is an option in the Oracle Big Data Appliance Configuration Generation Utility, and can also be enabled via bdacli.
Zero Downtime for Upgrades, One-Off Patches, and Cluster Extensions
In Release 4.3, Oracle Big Data Appliance leverages Cloudera’s Rolling Upgrades functionality to keep clusters operational during Mammoth upgrades, patches, and cluster extensions. This is an installation option that allows certain services on a cluster to remain continuously available while each node in the cluster is upgraded and rebooted. Zero Downtime is an option for the following tasks:
Upgrades of the Mammoth software (including Cloudera's Distribution Including Apache Hadoop, Cloudera Manager, and the Mammoth software itself).
One-off patches of Mammoth-installed software.
Cluster extensions. (For cluster extensions within a single rack, rolling upgrades are not optional. These extensions are always done as rolling upgrades.)
Deprecated Features
The following features are deprecated in this release, and may be desupported in a future release:
MapReduce 1 (MRv1)
YARN (MRv2) supersedes MRv1. Users who want to continue to use MRv1 on Oracle Big Data Appliance versions 3.x and 4.x should contact Oracle Support before using Mammoth to patch or upgrade the software.
Desupported Features
The following features are no longer supported as of this release:
eCryptfs On-Disk Encryption
This has been replaced by HDFS Transparent Encryption.
The following are changes in Oracle Big Data Appliance release 4 (4.2):
New Features
Software Upgrades
Cloudera's Distribution including Apache Hadoop 5.4.0
Cloudera Manager 5.4.0
Perfect Balance 2.4.0
Oracle Big Data SQL 1.1
Oracle NoSQL Database 3.2.5
Oracle Linux 6.6 and 5.11
JDK 8u45
Hardware Upgrades
Oracle Big Data Appliance is now shipped with 8 TB disk drives
Elastic Configuration
Oracle Big Data Appliance now provides the flexibility of adding one or more servers on a starter rack using Big Data Appliance X5-2 High Capacity Nodes plus InfiniBand Infrastructure. You can add up to 12 additional servers on a starter rack.
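The arithmetic behind the elastic configuration can be sketched as follows. The limit of 12 additional servers comes from the paragraph above; the starter rack size of six servers is taken from the Release 4.5 notes in this guide and is assumed to apply here as well. The function name is hypothetical.

```python
STARTER_RACK_SERVERS = 6      # starter rack size, per the Release 4.5 notes
MAX_ADDITIONAL_SERVERS = 12   # expansion limit stated above


def rack_size(additional):
    """Return the total server count after adding servers to a starter rack."""
    if not 0 <= additional <= MAX_ADDITIONAL_SERVERS:
        raise ValueError(
            f"additional servers must be between 0 and {MAX_ADDITIONAL_SERVERS}"
        )
    return STARTER_RACK_SERVERS + additional


print(rack_size(1))   # smallest expansion
print(rack_size(12))  # largest expansion
```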
Automatic Installation Support
Spark-on-YARN is deployed automatically
Oracle Spatial and Graph is installed and configured automatically
Oracle Big Data SQL 1.1
Copy to BDA
This utility enables you to copy relatively static tables from an Oracle database into Hadoop, with the purpose of improving query times.
Oracle NoSQL Database Support
Oracle databases on Oracle Exadata Database Machine can use Oracle Big Data SQL to connect to clusters running Oracle NoSQL Database.
Parquet Support
CDH 5.2 and later versions include Hive 0.13, which supports the Apache Parquet file format. This file format is used by Cloudera Impala and other Hadoop software.
Other Changes
Oracle Big Data Appliance X5-2
Oracle Big Data Appliance 4.2 software supports Oracle Big Data Appliance X5-2 and earlier version server hardware.
See "Server Components".
Oracle Big Data Appliance Configuration Generation Utility
This utility generates two new configuration files:
network.json: Supersedes BdaDeploy.json. For software upgrades, Mammoth converts the existing BdaDeploy.json to network.json. New installations must have network.json.
networkexpansion.json: Supersedes BdaExpansion.json.
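For scripts that still refer to the older file names, the renaming described above amounts to a simple lookup. The sketch below only maps names; it does not reproduce the conversion that Mammoth performs on the file contents, and the function name is hypothetical.

```python
# Superseded configuration files and their replacements, as listed above.
SUPERSEDED_BY = {
    "BdaDeploy.json": "network.json",
    "BdaExpansion.json": "networkexpansion.json",
}


def current_config_name(filename):
    """Return the current name for a configuration file, or the input if unchanged."""
    return SUPERSEDED_BY.get(filename, filename)


for name in ("BdaDeploy.json", "BdaExpansion.json", "network.json"):
    print(name, "->", current_config_name(name))
```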
CDH Deployment
Mammoth uses parcels instead of RPMs to deploy CDH.
Apache Sentry
Installation of Apache Sentry does not require sentry-provider.ini as a prerequisite.
Microsoft Active Directory Server in Mammoth
Mammoth supports using Microsoft Active Directory directly. In Mammoth, this option is named Active Directory Kerberos.
Oracle Linux Support
Oracle Linux 5 support for Oracle Big Data Appliance X5-2 servers.
Cloudera Navigator Trustee Server
Cloudera Navigator Trustee Server installer package and documentation are now shipped in Mammoth. It must be manually installed on a separate server.
Deprecated Features
The following features are deprecated in this release, and may be desupported in a future release:
Mammoth Reconfiguration Utility
The bdacli utility supersedes mammoth-reconfig. The mammoth-reconfig utility is only needed to change the disk encryption password.
See "bdacli".
MapReduce 1 (MRv1)
YARN (MRv2) supersedes MRv1. Users who want to continue to use MRv1 on Oracle Big Data Appliance versions 3.x and 4.x should contact Oracle Support before using Mammoth to patch or upgrade the software.
Disk Encryption
A new encryption system that is more flexible and robust will replace the current system in an upcoming release.
The following are changes in Oracle Big Data Appliance release 4 (4.1):
New Features
Software Upgrades
Cloudera's Distribution including Apache Hadoop 5.3.0
Cloudera Manager 5.3.0
Perfect Balance 2.3.0
Oracle Big Data SQL 1.1
Oracle Big Data Connectors 4.1
Oracle Linux 6.5
Oracle Big Data SQL 1.1
Copy to BDA
This utility enables you to copy relatively static tables from an Oracle database into Hadoop, with the purpose of improving query times.
Oracle NoSQL Database Support
Oracle databases on Oracle Exadata Database Machine can use Oracle Big Data SQL to connect to clusters running Oracle NoSQL Database.
Parquet Support
CDH 5.2 and later versions include Hive 0.13, which supports the Apache Parquet file format. This file format is used by Cloudera Impala and other Hadoop software.
Oracle NoSQL Database
The bdacli admin_cluster command supports Oracle NoSQL Database nodes that require repair or replacement.
Other Changes
Oracle Big Data Appliance X5-2
Oracle Big Data Appliance 4.1 software supports the Oracle Big Data Appliance X5-2 server hardware.
See "Server Components".
Oracle Big Data Appliance Configuration Generation Utility
This utility generates two new configuration files:
network.json: Supersedes BdaDeploy.json. For software upgrades, Mammoth converts the existing BdaDeploy.json to network.json. New installations must have network.json.
networkexpansion.json: Supersedes BdaExpansion.json.
CDH Deployment
Mammoth uses parcels instead of RPMs to deploy CDH.
Apache Sentry
Installation of Apache Sentry does not require sentry-provider.ini as a prerequisite.
Deprecated Features
The following features are deprecated in this release, and may be desupported in a future release:
Mammoth Reconfiguration Utility
The bdacli utility supersedes mammoth-reconfig. The mammoth-reconfig utility is only needed to change the disk encryption password.
See "bdacli".
MapReduce 1 (MRv1)
YARN (MRv2) supersedes MRv1. Users who want to continue to use MRv1 on Oracle Big Data Appliance versions 3.x and 4.x should contact Oracle Support before using Mammoth to patch or upgrade the software.
Disk Encryption
A new encryption system that is more flexible and robust will replace the current system in an upcoming release.
The following are changes in Oracle Big Data Appliance release 4 (4.0):
New Features
Oracle Big Data SQL 1.0.0
Oracle Big Data SQL supports queries against vast amounts of big data stored in multiple data sources, including HDFS and Hive. You can view and analyze data from various data stores together, as if it were all stored in an Oracle database. Support for Oracle Big Data SQL includes the following new features in Oracle Database:
DBMS_HADOOP PL/SQL package
Hive static data dictionary views
Access drivers for Hadoop and Hive
Oracle Big Data SQL is an installation option, which you can specify using the Oracle Big Data Appliance Configuration Generation Utility.
You can monitor and manage Oracle Big Data SQL using the bdacli command and Cloudera Manager.
See "bdacli" and Oracle Big Data Appliance Software User's Guide.
Service Migration
The bdacli utility can migrate services from a failing critical node to a healthy noncritical node. It can also remove failing critical and noncritical nodes from a cluster, and restore them to the cluster after repairs. See "bdacli" and Oracle Big Data Appliance Software User's Guide.
Software Upgrades
Cloudera's Distribution including Apache Hadoop 5.1.0
Cloudera Manager 5.1.1
Perfect Balance 2.2.0
Oracle Data Integrator Agent 12.1.3.0 (for Oracle Big Data Connectors)
Oracle NoSQL Database Zone Support
The Oracle Big Data Appliance Configuration Generation Utility and the mammoth -e command support multiple zones on Oracle NoSQL Database clusters. You can add nodes to an existing zone, or create new primary or secondary zones.
See "Oracle NoSQL Configuration" and "Mammoth Software Installation and Configuration Utility".
Multiple Rack Clusters
You can now install a cluster on multiple racks using one cluster_name-config.json file.