About Oracle Big Data Cloud Service

An entitlement to Oracle Big Data Cloud Service gives you access to the resources of a preconfigured Oracle Big Data environment, including a complete installation of the Cloudera Distribution Including Apache Hadoop (CDH) and Apache Spark. Use Oracle Big Data Cloud Service to capture and analyze the massive volumes of data generated by social media feeds, e-mail, web logs, photographs, smart meters, sensors, and similar devices.

Note:

Oracle Big Data Cloud Service is offered on Oracle Cloud and runs in Oracle-managed data centers. You can also choose Oracle Big Data Cloud at Customer, which provides Oracle Big Data Cloud Service hosted in your own data center.

When you set up Oracle Big Data Cloud Service, you can create a cluster of 3 to 60 nodes, consisting of Oracle Compute Units (OCPUs), memory, and storage. Every cluster must start with the 3-node starter pack and can have up to 57 additional permanent nodes. You can add those nodes when you create the cluster or later. You can also temporarily extend the processing power (OCPUs) and memory of the cluster by adding cluster compute nodes (“bursting”). Oracle manages the hardware and networking infrastructure and performs the initial setup, while you have full administrative control of the software.

All nodes in an Oracle Big Data Cloud Service instance form a cluster.
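
The sizing rules above reduce to simple arithmetic. The following Python sketch checks a requested number of permanent nodes against those rules. The constant and function names are illustrative assumptions and are not part of any Oracle API; temporary cluster compute nodes added through bursting are outside its scope.

```python
# Limits taken from the sizing rules described above.
STARTER_PACK_NODES = 3       # every cluster starts with the 3-node starter pack
MAX_ADDITIONAL_NODES = 57    # permanent nodes that can be added on top of it
MAX_PERMANENT_NODES = STARTER_PACK_NODES + MAX_ADDITIONAL_NODES  # 60


def additional_nodes_needed(total_permanent_nodes: int) -> int:
    """Return how many permanent nodes to add to the starter pack.

    Hypothetical helper: raises ValueError if the requested size falls
    outside the 3 to 60 node range.
    """
    if not STARTER_PACK_NODES <= total_permanent_nodes <= MAX_PERMANENT_NODES:
        raise ValueError(
            f"cluster size must be between {STARTER_PACK_NODES} and "
            f"{MAX_PERMANENT_NODES} permanent nodes, got {total_permanent_nodes}"
        )
    return total_permanent_nodes - STARTER_PACK_NODES
```

For example, a 10-node cluster consists of the starter pack plus 7 additional permanent nodes, so `additional_nodes_needed(10)` returns 7.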

Software

An Oracle Big Data Cloud Service entitlement includes the following:

  • Oracle Linux operating system

  • Cloudera Distribution Including Apache Hadoop (CDH)

    CDH provides a batch processing infrastructure that can store files and distribute work across a set of computers. Data is processed on the same computer where it is stored. In an Oracle Big Data Cloud Service cluster, CDH distributes the files and workload across the servers that make up the cluster. Each server is a node in the cluster.

    CDH includes:

    • File system: The Hadoop Distributed File System (HDFS) is a highly scalable file system that stores large files across multiple servers. It achieves reliability by replicating data across multiple servers without RAID technology. It runs on top of the Linux file system.

    • MapReduce engine: The MapReduce engine provides a platform for the massively parallel execution of algorithms written in Java. Oracle Big Data Cloud Service runs YARN by default.

    • Administrative framework: Cloudera Manager is a comprehensive administrative tool for CDH.

    • Apache projects: CDH includes Apache projects that work with MapReduce and HDFS, such as Hive, Pig, Oozie, ZooKeeper, HBase, Sqoop, and Spark.

    • Cloudera applications: Oracle Big Data Cloud Service includes all products in Cloudera Enterprise Data Hub Edition, such as Impala, Search, and Navigator.

    Several CDH utilities and other software available on Oracle Big Data Cloud Service provide graphical, web-based, and other language interfaces for ease of use.

  • Built-in utilities for managing data and resources

  • Oracle Big Data Connectors, which facilitate access to data stored in an Apache Hadoop cluster.

    Included are:

    • Oracle SQL Connector for Hadoop Distributed File System

    • Oracle Loader for Hadoop

    • Oracle XQuery for Hadoop

    • Oracle R Advanced Analytics for Hadoop

    • Oracle Data Integrator Enterprise Edition

  • Oracle Big Data Spatial and Graph, which provides advanced spatial and graph analytic capabilities on supported Apache Hadoop and NoSQL Database big data platforms.
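
To make the MapReduce model mentioned in the CDH description more concrete, the following Python sketch walks through the map, shuffle, and reduce phases for a word count. It is a single-process illustration only: the MapReduce engine in CDH runs Java code in parallel across cluster nodes, and the function names here are assumptions made for the example, not Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence within a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster, the map step would run on the nodes that hold each block of the input, which reflects the principle that data is processed where it is stored.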

Oracle Big Data SQL Cloud Service (Optional)

You can integrate Oracle Big Data SQL Cloud Service with Oracle Big Data Cloud Service. As a prerequisite, you must have an entitlement for the Oracle Big Data SQL Cloud Service add-on to Oracle Big Data Cloud Service and an entitlement for Oracle Database Exadata Cloud Service. For more information, contact your Oracle sales representative.

Oracle Cloud Infrastructure Object Storage Classic Integration (Optional)

If you have an entitlement for Oracle Cloud Infrastructure Object Storage Classic, you can integrate the storage with Oracle Big Data Cloud Service. For information about the storage service, see Oracle Cloud Infrastructure Object Storage Classic Get Started.