Big Data Overview

Big Data provides enterprise-grade Hadoop as a service, with end-to-end security, high performance, and ease of management and upgradeability.

Big Data is an Oracle Cloud Infrastructure service designed for a diverse set of big data use cases and workloads. From short-lived clusters used to tackle specific tasks to long-lived clusters that manage large data lakes, Big Data scales to meet an organization’s requirements at a low cost and with the highest levels of security.

Big Data includes:

  • A choice of Hadoop technology stacks. You can choose to create a cluster based on either of the following:
    • A Hadoop stack that includes an installation of Oracle Distribution including Apache Hadoop (ODH). ODH includes Apache Ambari, Apache Hadoop, Apache HBase, Apache Hive, Apache Spark, and other services for working with and securing big data.

      For a detailed list of what’s in ODH, see About Oracle Distribution including Apache Hadoop (ODH).

    • A Hadoop stack that includes complete installation of the Cloudera Distribution including Apache Hadoop (CDH). CDH includes Cloudera Manager, Apache Flume, Apache Hadoop, Apache HBase, Apache Hive, Apache Hue, Apache Kafka, Apache Pig, Apache Sentry, Apache Solr, Apache Spark, and other services for working with and securing big data.

      The current version of Big Data includes CDH 6.3.3. See CDH 6.3.3 Packaging in the "Cloudera Enterprise 6.x Release Notes" for a complete list of the included components.

  • Oracle Cloud Infrastructure features and resources, including identity management, networking, compute, storage, and monitoring.
  • A REST API for creating and managing clusters.
  • bda-oss-admin command line interface for managing storage providers.
  • odcp command line interface for copying and moving data.
    Note: ODCP is only available in clusters that use Cloudera Distribution including Hadoop.
  • The ability to create clusters of any size, based on native Oracle Cloud Infrastructure shapes. For example, you can create small, short-lived clusters in flexible virtual environments; very large, long-running clusters on dedicated hardware; or any combination in between.
  • Optional secure, high availability (HA) clusters.
  • Oracle Cloud SQL integration, for analyzing data across Apache Hadoop, Apache Kafka, NoSQL, and object stores using the Oracle SQL query language.
  • Full access to customize what is deployed on your Big Data clusters.

About Oracle Distribution including Apache Hadoop (ODH)

ODH is built from the ground up and natively integrated into Oracle's data platform. It is fully managed and includes the same Hadoop components you know and build on today.

The table below lists the components included in ODH and their versions.

Component                       Version
Apache Ambari                   2.7.5
Apache Hadoop                   3.1.2
Apache HBase                    2.2.6
Apache Hive                     3.1.2
Apache Oozie                    5.2.0
Apache Spark + HistoryServer    3.0.2
Apache Sqoop                    1.4.7
Apache Tez                      0.10.0
Apache ZooKeeper                3.4.14

Ways to Access Big Data

You access Big Data using the Console, OCI CLI, REST APIs, or SDKs.

  • The OCI Console is an easy-to-use, browser-based interface. To access the Console, you must use a supported browser.
  • The OCI CLI provides both quick access and full functionality without the need for programming. You can run CLI commands from the Cloud Shell environment.
  • The REST APIs provide the most functionality, but require programming expertise. API Reference and Endpoints provide endpoint details and links to the available API reference documents including the Big Data Service API.
  • OCI provides SDKs that let you interact with Big Data without having to build your own request framework.
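As an illustration of the REST option, a Big Data endpoint URL is assembled from the region and the service's API version. The following is a minimal sketch only: the regional endpoint format and the 20190531 API version are assumptions based on the Big Data Service API reference, and OCI request signing (required for real calls) is omitted.

```python
# Sketch only: assembles a ListBdsInstances URL. The endpoint format and
# API version are assumptions; real requests also need OCI request
# signing, which is omitted here.
from urllib.parse import urlencode

def list_bds_instances_url(region: str, compartment_id: str) -> str:
    """Build the URL for listing Big Data clusters in a compartment."""
    base = f"https://bigdataservice.{region}.oci.oraclecloud.com/20190531"
    return f"{base}/bdsInstances?{urlencode({'compartmentId': compartment_id})}"

print(list_bds_instances_url("us-ashburn-1", "ocid1.compartment.oc1..example"))
```

In practice the SDKs and CLI construct these URLs (and sign the requests) for you; building them by hand is only needed when calling the REST API directly.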

Additional Resources

Take a Getting Started Workshop to Learn Big Data

If you're new to Big Data and want to get up and running quickly, try one of the Using Cloudera Distribution including Hadoop with Big Data workshops. (There's one for a highly-available (HA) cluster and one for a non-HA cluster.) A series of step-by-step labs guide you through the process of setting up a simple environment and creating a small cluster.

Get started with Big Data (HA Cluster)
Learn about Big Data. Set up the Oracle Cloud Infrastructure environment and create a highly available (HA) and secure cluster with Cloud SQL support.
Get started with Big Data (Non-HA Cluster)
Learn about Big Data. Set up the Oracle Cloud Infrastructure environment and create a non-HA cluster with Cloud SQL support.
Use a load balancer to access services on Big Data (HA Cluster)
Create a load balancer to be used as a front end for securely accessing Cloudera Manager, Hue, and Oracle Data Studio on your highly available (HA) Big Data cluster.
Use a load balancer to access services on Big Data (Non-HA Cluster)
Create a load balancer to be used as a front end for securely accessing Cloudera Manager, Hue, and Oracle Data Studio on your non-highly-available (non-HA) Big Data cluster.
Connecting Oracle DataSource for Apache Hadoop and Big Data to Autonomous Data Warehouse
Learn how to connect Oracle DataSource for Apache Hadoop (OD4H) on Big Data to Autonomous Data Warehouse (ADW).

Learn Hadoop

Watch the videos on this Oracle Learning playlist to learn about Apache Hadoop and the Hadoop Ecosystem.

Resource Identifiers

Big Data resources, like most types of resources in Oracle Cloud Infrastructure, have a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID).

For information about the OCID format and other ways to identify your resources, see Resource Identifiers.

Service Limits

Big Data has the following default limits for paid accounts in all regions:

Resource           Monthly Universal Credits    Pay-as-You-Go
VM.Standard2.1     12 instances (12 OCPUs)      8 instances (8 OCPUs)
VM.Standard2.2     12 instances (24 OCPUs)      8 instances (16 OCPUs)
VM.Standard2.4     12 instances (48 OCPUs)      8 instances (32 OCPUs)
VM.Standard2.8     8 instances (64 OCPUs)       Contact us
VM.Standard2.16    8 instances (128 OCPUs)      Contact us
VM.Standard2.24    8 instances (192 OCPUs)      Contact us
VM.DenseIO2.8      Contact us                   Contact us
VM.DenseIO2.16     Contact us                   Contact us
VM.DenseIO2.24     Contact us                   Contact us
BM.HPC2.36         Contact us                   Contact us
BM.DenseIO2.52     Contact us                   Contact us
BM.Standard2.52    Contact us                   Contact us

Big Data Service also has the following limits for trial accounts. To obtain a free trial account, go to Oracle Free Tier.

Resource           Trial Accounts
VM.Standard2.1     3 instances (3 OCPUs)
VM.Standard2.4     2 instances (8 OCPUs)

For more information about service limits, see Service Limits.

To submit a request to increase your service limits, see Requesting a Service Limit Increase.
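The instance and OCPU figures in the tables above are related by each shape's per-instance OCPU count. A small illustrative helper (shape sizes taken from the limits table; this is not a service API):

```python
# Illustrative only: derive the OCPU total shown in the limits table
# from an instance limit and the OCPUs per instance of a VM shape.
SHAPE_OCPUS = {
    "VM.Standard2.1": 1,
    "VM.Standard2.2": 2,
    "VM.Standard2.4": 4,
    "VM.Standard2.8": 8,
    "VM.Standard2.16": 16,
    "VM.Standard2.24": 24,
}

def total_ocpus(shape: str, instance_limit: int) -> int:
    """Total OCPUs consumed when the instance limit is fully used."""
    return SHAPE_OCPUS[shape] * instance_limit

print(total_ocpus("VM.Standard2.2", 12))  # 12 instances of a 2-OCPU shape
```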

Service Quotas

Big Data administrators can set quota policies to enforce restrictions on users by limiting the resources that they can create.

For information about how Oracle Cloud Infrastructure handles quotas, see Compartment Quotas.

Use the following information to create quotas:

Service name: big-data

Quotas:
Quota Name                     Scope     Description
vm-standard-2-1-ocpu-count     Regional  Number of VM.Standard2.1 OCPUs
vm-standard-2-2-ocpu-count     Regional  Number of VM.Standard2.2 OCPUs
vm-standard-2-4-ocpu-count     Regional  Number of VM.Standard2.4 OCPUs
vm-standard-2-8-ocpu-count     Regional  Number of VM.Standard2.8 OCPUs
vm-standard-2-16-ocpu-count    Regional  Number of VM.Standard2.16 OCPUs
vm-standard-2-24-ocpu-count    Regional  Number of VM.Standard2.24 OCPUs
vm-dense-io-2-8-ocpu-count     Regional  Number of VM.DenseIO2.8 OCPUs
vm-dense-io-2-16-ocpu-count    Regional  Number of VM.DenseIO2.16 OCPUs
vm-dense-io-2-24-ocpu-count    Regional  Number of VM.DenseIO2.24 OCPUs
bm-hpc2-36-ocpu-count          Regional  Number of BM.HPC2.36 OCPUs
bm-dense-io-2-52-ocpu-count    Regional  Number of BM.DenseIO2.52 OCPUs
bm-standard-2-52-ocpu-count    Regional  Number of BM.Standard2.52 OCPUs

Big Data quota policy examples:

  • Limit the number of VM.Standard2.4 OCPUs that users can allocate to services they create in the mycompartment compartment to 40.

    Set big-data quota vm-standard-2-4-ocpu-count to 40 in Compartment mycompartment

  • Limit the number of BM.DenseIO2.52 OCPUs that users can allocate to services they create in the testcompartment compartment to 20.

    Set big-data quota bm-dense-io-2-52-ocpu-count to 20 in Compartment testcompartment

  • Don't allow users to create any VM.Standard2.4 OCPUs in the examplecompart compartment.

    Zero big-data quota vm-standard-2-4-ocpu-count in Compartment examplecompart
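The quota names follow directly from the table above. As an illustration, a hypothetical helper that looks up a shape's quota name and emits a Set statement in the same form as the examples:

```python
# Hypothetical helper: quota names copied from the table above, plus a
# formatter that emits a "Set" quota statement like the examples.
QUOTA_NAMES = {
    "VM.Standard2.1": "vm-standard-2-1-ocpu-count",
    "VM.Standard2.2": "vm-standard-2-2-ocpu-count",
    "VM.Standard2.4": "vm-standard-2-4-ocpu-count",
    "VM.Standard2.8": "vm-standard-2-8-ocpu-count",
    "VM.Standard2.16": "vm-standard-2-16-ocpu-count",
    "VM.Standard2.24": "vm-standard-2-24-ocpu-count",
    "VM.DenseIO2.8": "vm-dense-io-2-8-ocpu-count",
    "VM.DenseIO2.16": "vm-dense-io-2-16-ocpu-count",
    "VM.DenseIO2.24": "vm-dense-io-2-24-ocpu-count",
    "BM.HPC2.36": "bm-hpc2-36-ocpu-count",
    "BM.DenseIO2.52": "bm-dense-io-2-52-ocpu-count",
    "BM.Standard2.52": "bm-standard-2-52-ocpu-count",
}

def set_quota_statement(shape: str, limit: int, compartment: str) -> str:
    """Format a quota policy statement for the given shape and limit."""
    return (f"Set big-data quota {QUOTA_NAMES[shape]} "
            f"to {limit} in Compartment {compartment}")

print(set_quota_statement("VM.Standard2.4", 40, "mycompartment"))
```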

Integrated Services

Big Data is integrated with various services and features.

Service Events

Certain actions performed on Big Data clusters emit events.

You can define rules that trigger a specific action when an event occurs. For example, you might define a rule that sends a notification to administrators when someone deletes a resource. See Overview of Events and Getting Started with Events.

The following table lists Big Data event types.

Friendly Name               Event Type
Create Instance Begin       com.oraclecloud.bds.cp.createinstance.begin
Create Instance End         com.oraclecloud.bds.cp.createinstance.end
Terminate Instance Begin    com.oraclecloud.bds.cp.terminateinstance.begin
Terminate Instance End      com.oraclecloud.bds.cp.terminateinstance.end
Add Worker Node Begin       com.oraclecloud.bds.cp.addnode.begin
Add Worker Node End         com.oraclecloud.bds.cp.addnode.end
Add Block Storage Begin     com.oraclecloud.bds.cp.addblockstorage.begin
Add Block Storage End       com.oraclecloud.bds.cp.addblockstorage.end
Configure Cloud SQL Begin   com.oraclecloud.bds.cp.addcloudsql.begin
Configure Cloud SQL End     com.oraclecloud.bds.cp.addcloudsql.end
Disable Cloud SQL Begin     com.oraclecloud.bds.cp.removecloudsql.begin
Disable Cloud SQL End       com.oraclecloud.bds.cp.removecloudsql.end
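For example, an Events rule that reacts to cluster creation completing would match on the corresponding event type from the table. A minimal sketch of building such a matching condition (the "eventType" condition key follows the Events service's rule format; this helper itself is hypothetical):

```python
# Sketch: build an Events rule matching condition for one or more of
# the Big Data event types listed above.
import json

def rule_condition(*event_types: str) -> str:
    """Return an Events matching condition as a JSON string."""
    return json.dumps({"eventType": list(event_types)})

print(rule_condition("com.oraclecloud.bds.cp.createinstance.end"))
# {"eventType": ["com.oraclecloud.bds.cp.createinstance.end"]}
```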

Asynchronous Work Requests

The following Big Data operations create work requests. You can view these work requests in a Big Data cluster's detail page.

Big Data API                      Work Request Operation
CreateBdsInstance                 CREATE_BDS
UpdateBdsInstance                 UPDATE_BDS
DeleteBdsInstance                 DELETE_BDS
AddBlockStorage                   ADD_BLOCK_STORAGE
AddWorkerNodes                    ADD_WORKER_NODES
AddCloudSql                       ADD_CLOUD_SQL
RemoveCloudSql                    REMOVE_CLOUD_SQL
ChangeBdsInstanceCompartment      CHANGE_COMPARTMENT_FOR_BDS
ChangeShape                       CHANGE_SHAPE
RestartNode                       RESTART_NODE
AddAutoScalingConfiguration       UPDATE_INFRA
UpdateAutoScalingConfiguration    UPDATE_INFRA
RemoveAutoScalingConfiguration    UPDATE_INFRA

Each of these work requests reports one of the following statuses: ACCEPTED, IN_PROGRESS, FAILED, SUCCEEDED, CANCELING, CANCELED.
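A common pattern is to poll a work request until it reaches one of the terminal statuses (SUCCEEDED, FAILED, or CANCELED). A self-contained sketch of that loop, with `fetch_status` standing in for a real work request lookup through the CLI, SDK, or REST API (the helper and the simulated responses are hypothetical):

```python
# Sketch: poll until a work request reaches a terminal status.
# `fetch_status` is a stand-in for a real Big Data work request lookup.
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED"}

def wait_for_work_request(fetch_status, interval=0.0, max_polls=100):
    """Call fetch_status() until it returns a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError("work request did not reach a terminal status")

# Simulated status sequence standing in for successive API responses.
statuses = iter(["ACCEPTED", "IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
print(wait_for_work_request(lambda: next(statuses)))  # SUCCEEDED
```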

Typical Workflow

Describes the steps to start using Big Data.

Create and sign into your cloud account
Provide your information and sign up for Oracle Cloud Infrastructure.
Set up your infrastructure
Create and configure a network, create users and groups, and configure access controls and security.
Create a cluster
Create a Big Data cluster through the Cloud Console, SDK, or CLI. You can create a highly available and secure cluster with one extra click.
Access a cluster
Establish connections and work with the cluster through:
  • Cloud Console
  • Secure Shell (SSH)
  • Services such as Hue, Apache Ambari, and Cloudera Manager
Scale a cluster
Add worker nodes, add block storage, and change the shapes of nodes.
Explore and visualize data with notebooks
Use the Big Data Studio notebook application to explore and visualize data.
Query data with Cloud SQL
Add Cloud SQL to Big Data, and use it to make queries against non-relational data stored in multiple big data sources, including Apache Hive, HDFS, Oracle NoSQL Database, Apache Kafka, Apache HBase, and object stores.