Big Data Overview

Big Data provides enterprise-grade Hadoop as a service, with end-to-end security, high performance, and ease of management and upgradeability.

Big Data is an Oracle Cloud Infrastructure service designed for a diverse set of big data use cases and workloads. From short-lived clusters used to tackle specific tasks to long-lived clusters that manage large data lakes, Big Data scales to meet an organization’s requirements at a low cost and with the highest levels of security.

Big Data includes:

  • A choice of Hadoop technology stacks. You can choose to create a cluster based on either of the following:
    • A Hadoop stack that includes an installation of Oracle Distribution including Apache Hadoop (ODH). ODH includes Apache Ambari, Apache Hadoop, Apache HBase, Apache Hive, Apache Spark, and other services for working with and securing big data.

      For a detailed list of what’s in ODH, see About Oracle Distribution including Apache Hadoop (ODH).

    • A Hadoop stack that includes complete installation of the Cloudera Distribution including Apache Hadoop (CDH). CDH includes Cloudera Manager, Apache Flume, Apache Hadoop, Apache HBase, Apache Hive, Apache Hue, Apache Kafka, Apache Pig, Apache Sentry, Apache Solr, Apache Spark, and other services for working with and securing big data.

      The current version of Big Data includes CDH 6.3.3. See CDH 6.3.3 Packaging in the "Cloudera Enterprise 6.x Release Notes" for a complete list of the included components.

  • Oracle Cloud Infrastructure features and resources, including identity management, networking, compute, storage, and monitoring.
  • A REST API for creating and managing clusters.
  • bda-oss-admin command line interface for managing storage providers.
  • odcp command line interface for copying and moving data.
    Note: ODCP is only available in clusters that use Cloudera Distribution including Hadoop.
  • The ability to create clusters of any size, based on native Oracle Cloud Infrastructure shapes. For example, you can create small, short-lived clusters in flexible virtual environments; very large, long-running clusters on dedicated hardware; or any combination in between.
  • Optional secure, high availability (HA) clusters.
  • Oracle Cloud SQL integration, for analyzing data across Apache Hadoop, Apache Kafka, NoSQL, and object stores using the Oracle SQL query language.
  • Full access to customize what is deployed on your Big Data clusters.

About Oracle Distribution including Apache Hadoop (ODH)

ODH is built from the ground up and natively integrated into Oracle's data platform. It is fully managed and includes the same Hadoop components you know and build on today.

The table below lists the components included in ODH and their versions.

Component                       Version
Apache Ambari                   2.7.5
Apache Hadoop                   3.1.2
Apache HBase                    2.2.6
Apache Hive                     3.1.2
Apache Oozie                    5.2.0
Apache Spark + HistoryServer    3.0.2
Apache Sqoop                    1.4.7
Apache Tez                      0.10.0
Apache ZooKeeper                3.4.14

Ways to Access Big Data

You access Big Data using the Console, OCI CLI, REST APIs, or SDKs.

  • The OCI Console is an easy-to-use, browser-based interface. To access the Console, you must use a supported browser.
  • The OCI CLI provides both quick access and full functionality without the need for programming. You can run CLI commands from the Cloud Shell environment.
  • The REST APIs provide the most functionality, but require programming expertise. API Reference and Endpoints provide endpoint details and links to the available API reference documents including the Big Data Service API.
  • OCI provides SDKs that let you interact with Big Data without having to build your own request framework.
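As an illustration of the REST option, a Big Data endpoint URL is assembled from the region and the service's API version. The following is a minimal sketch only: the regional endpoint format and the 20190531 API version are assumptions based on the Big Data Service API reference, and OCI request signing (required for real calls) is omitted.

```python
# Sketch only: assembles a ListBdsInstances URL. The endpoint format and
# API version are assumptions; real requests also need OCI request
# signing, which is omitted here.
from urllib.parse import urlencode

def list_bds_instances_url(region: str, compartment_id: str) -> str:
    """Build the URL for listing Big Data clusters in a compartment."""
    base = f"https://bigdataservice.{region}.oci.oraclecloud.com/20190531"
    return f"{base}/bdsInstances?{urlencode({'compartmentId': compartment_id})}"

print(list_bds_instances_url("us-ashburn-1", "ocid1.compartment.oc1..example"))
```

In practice the SDKs and CLI construct these URLs (and sign the requests) for you; building them by hand is only needed when calling the REST API directly.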

Additional Resources

Take a Getting Started Workshop to Learn Big Data

If you're new to Big Data and want to get up and running quickly, try one of the Using Cloudera Distribution including Hadoop with Big Data workshops. (There's one for a highly-available (HA) cluster and one for a non-HA cluster.) A series of step-by-step labs guide you through the process of setting up a simple environment and creating a small cluster.

Get started with Big Data (HA Cluster)
Learn about Big Data. Set up the Oracle Cloud Infrastructure environment and create a highly available (HA) and secure cluster with Cloud SQL support.
Get started with Big Data (Non-HA Cluster)
Learn about Big Data. Set up the Oracle Cloud Infrastructure environment and create a non-HA cluster with Cloud SQL support.
Use a load balancer to access services on Big Data (HA Cluster)
Create a load balancer to be used as a front end for securely accessing Cloudera Manager, Hue, and Oracle Data Studio on your highly available (HA) Big Data cluster.
Use a load balancer to access services on Big Data (Non-HA Cluster)
Create a load balancer to be used as a front end for securely accessing Cloudera Manager, Hue, and Oracle Data Studio on your non-highly-available (non-HA) Big Data cluster.
Connecting Oracle DataSource for Apache Hadoop and Big Data to Autonomous Data Warehouse
Learn how to connect Oracle DataSource for Apache Hadoop (OD4H) on Big Data to Autonomous Data Warehouse (ADW).

Learn Hadoop

Watch the videos on this Oracle Learning playlist to learn about Apache Hadoop and the Hadoop Ecosystem.

Resource Identifiers

Big Data resources, like most types of resources in Oracle Cloud Infrastructure, have a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID).

For information about the OCID format and other ways to identify your resources, see Resource Identifiers.

Service Limits

Big Data has the following default limits for paid accounts in all regions:

Resource           Monthly Universal Credits    Pay-as-You-Go
VM.Standard2.1     12 instances (12 OCPUs)      8 instances (8 OCPUs)
VM.Standard2.2     12 instances (24 OCPUs)      8 instances (16 OCPUs)
VM.Standard2.4     12 instances (48 OCPUs)      8 instances (32 OCPUs)
VM.Standard2.8     8 instances (64 OCPUs)       Contact us
VM.Standard2.16    8 instances (128 OCPUs)      Contact us
VM.Standard2.24    8 instances (192 OCPUs)      Contact us
VM.DenseIO2.8      Contact us                   Contact us
VM.DenseIO2.16     Contact us                   Contact us
VM.DenseIO2.24     Contact us                   Contact us
BM.HPC2.36         Contact us                   Contact us
BM.DenseIO2.52     Contact us                   Contact us
BM.Standard2.52    Contact us                   Contact us

Big Data Service also has the following limits for trial accounts. To obtain a free trial account, go to Oracle Free Tier.

Resource           Trial Accounts
VM.Standard2.1     3 instances (3 OCPUs)
VM.Standard2.4     2 instances (8 OCPUs)

For more information about service limits, see Service Limits.

To submit a request to increase your service limits, see Requesting a Service Limit Increase.
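The instance and OCPU figures in the tables above are related by each shape's per-instance OCPU count. A small illustrative helper (shape sizes taken from the limits table; this is not a service API):

```python
# Illustrative only: derive the OCPU total shown in the limits table
# from an instance limit and the OCPUs per instance of a VM shape.
SHAPE_OCPUS = {
    "VM.Standard2.1": 1,
    "VM.Standard2.2": 2,
    "VM.Standard2.4": 4,
    "VM.Standard2.8": 8,
    "VM.Standard2.16": 16,
    "VM.Standard2.24": 24,
}

def total_ocpus(shape: str, instance_limit: int) -> int:
    """Total OCPUs consumed when the instance limit is fully used."""
    return SHAPE_OCPUS[shape] * instance_limit

print(total_ocpus("VM.Standard2.2", 12))  # 12 instances of a 2-OCPU shape
```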

Service Quotas

Big Data administrators can set quota policies to enforce restrictions on users by limiting the resources that they can create.

For information about how Oracle Cloud Infrastructure handles quotas, see Compartment Quotas.

Use the following information to create quotas:

Service name: big-data

Quotas:
Quota Name                     Scope     Description
vm-standard-2-1-ocpu-count     Regional  Number of VM.Standard2.1 OCPUs
vm-standard-2-2-ocpu-count     Regional  Number of VM.Standard2.2 OCPUs
vm-standard-2-4-ocpu-count     Regional  Number of VM.Standard2.4 OCPUs
vm-standard-2-8-ocpu-count     Regional  Number of VM.Standard2.8 OCPUs
vm-standard-2-16-ocpu-count    Regional  Number of VM.Standard2.16 OCPUs
vm-standard-2-24-ocpu-count    Regional  Number of VM.Standard2.24 OCPUs
vm-dense-io-2-8-ocpu-count     Regional  Number of VM.DenseIO2.8 OCPUs
vm-dense-io-2-16-ocpu-count    Regional  Number of VM.DenseIO2.16 OCPUs
vm-dense-io-2-24-ocpu-count    Regional  Number of VM.DenseIO2.24 OCPUs
bm-hpc2-36-ocpu-count          Regional  Number of BM.HPC2.36 OCPUs
bm-dense-io-2-52-ocpu-count    Regional  Number of BM.DenseIO2.52 OCPUs
bm-standard-2-52-ocpu-count    Regional  Number of BM.Standard2.52 OCPUs

Big Data quota policy examples:

  • Limit the number of VM.Standard2.4 OCPUs that users can allocate to services they create in the mycompartment compartment to 40.

    Set big-data quota vm-standard-2-4-ocpu-count to 40 in Compartment mycompartment

  • Limit the number of BM.DenseIO2.52 OCPUs that users can allocate to services they create in the testcompartment compartment to 20.

    Set big-data quota bm-dense-io-2-52-ocpu-count to 20 in Compartment testcompartment

  • Don't allow users to create any VM.Standard2.4 OCPUs in the examplecompart compartment.

    Zero big-data quota vm-standard-2-4-ocpu-count in Compartment examplecompart
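The quota names follow directly from the table above. As an illustration, a hypothetical helper that looks up a shape's quota name and emits a Set statement in the same form as the examples:

```python
# Hypothetical helper: quota names copied from the table above, plus a
# formatter that emits a "Set" quota statement like the examples.
QUOTA_NAMES = {
    "VM.Standard2.1": "vm-standard-2-1-ocpu-count",
    "VM.Standard2.2": "vm-standard-2-2-ocpu-count",
    "VM.Standard2.4": "vm-standard-2-4-ocpu-count",
    "VM.Standard2.8": "vm-standard-2-8-ocpu-count",
    "VM.Standard2.16": "vm-standard-2-16-ocpu-count",
    "VM.Standard2.24": "vm-standard-2-24-ocpu-count",
    "VM.DenseIO2.8": "vm-dense-io-2-8-ocpu-count",
    "VM.DenseIO2.16": "vm-dense-io-2-16-ocpu-count",
    "VM.DenseIO2.24": "vm-dense-io-2-24-ocpu-count",
    "BM.HPC2.36": "bm-hpc2-36-ocpu-count",
    "BM.DenseIO2.52": "bm-dense-io-2-52-ocpu-count",
    "BM.Standard2.52": "bm-standard-2-52-ocpu-count",
}

def set_quota_statement(shape: str, limit: int, compartment: str) -> str:
    """Format a quota policy statement for the given shape and limit."""
    return (f"Set big-data quota {QUOTA_NAMES[shape]} "
            f"to {limit} in Compartment {compartment}")

print(set_quota_statement("VM.Standard2.4", 40, "mycompartment"))
```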

Integrated Services

Big Data is integrated with various services and features.

Service Events

Certain actions performed on Big Data clusters emit events.

You can define rules that trigger a specific action when an event occurs. For example, you might define a rule that sends a notification to administrators when someone deletes a resource. See Overview of Events and Getting Started with Events.

The following table lists Big Data event types.

Friendly Name               Event Type
Create Instance Begin       com.oraclecloud.bds.cp.createinstance.begin
Create Instance End         com.oraclecloud.bds.cp.createinstance.end
Terminate Instance Begin    com.oraclecloud.bds.cp.terminateinstance.begin
Terminate Instance End      com.oraclecloud.bds.cp.terminateinstance.end
Add Worker Node Begin       com.oraclecloud.bds.cp.addnode.begin
Add Worker Node End         com.oraclecloud.bds.cp.addnode.end
Add Block Storage Begin     com.oraclecloud.bds.cp.addblockstorage.begin
Add Block Storage End       com.oraclecloud.bds.cp.addblockstorage.end
Configure Cloud SQL Begin   com.oraclecloud.bds.cp.addcloudsql.begin
Configure Cloud SQL End     com.oraclecloud.bds.cp.addcloudsql.end
Disable Cloud SQL Begin     com.oraclecloud.bds.cp.removecloudsql.begin
Disable Cloud SQL End       com.oraclecloud.bds.cp.removecloudsql.end
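For example, an Events rule that reacts to cluster creation completing would match on the corresponding event type from the table. A minimal sketch of building such a matching condition (the "eventType" condition key follows the Events service's rule format; this helper itself is hypothetical):

```python
# Sketch: build an Events rule matching condition for one or more of
# the Big Data event types listed above.
import json

def rule_condition(*event_types: str) -> str:
    """Return an Events matching condition as a JSON string."""
    return json.dumps({"eventType": list(event_types)})

print(rule_condition("com.oraclecloud.bds.cp.createinstance.end"))
# {"eventType": ["com.oraclecloud.bds.cp.createinstance.end"]}
```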

Asynchronous Work Requests

The following Big Data operations create work requests. You can view these work requests in a Big Data cluster's detail page.

Big Data API                      Work Request Operation
CreateBdsInstance                 CREATE_BDS
UpdateBdsInstance                 UPDATE_BDS
DeleteBdsInstance                 DELETE_BDS
AddBlockStorage                   ADD_BLOCK_STORAGE
AddWorkerNodes                    ADD_WORKER_NODES
AddCloudSql                       ADD_CLOUD_SQL
RemoveCloudSql                    REMOVE_CLOUD_SQL
ChangeBdsInstanceCompartment      CHANGE_COMPARTMENT_FOR_BDS
ChangeShape                       CHANGE_SHAPE
RestartNode                       RESTART_NODE
AddAutoScalingConfiguration       UPDATE_INFRA
UpdateAutoScalingConfiguration    UPDATE_INFRA
RemoveAutoScalingConfiguration    UPDATE_INFRA

Each of these work requests reports one of the following statuses: ACCEPTED, IN_PROGRESS, FAILED, SUCCEEDED, CANCELING, CANCELED.
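A common pattern is to poll a work request until it reaches one of the terminal statuses (SUCCEEDED, FAILED, or CANCELED). A self-contained sketch of that loop, with `fetch_status` standing in for a real work request lookup through the CLI, SDK, or REST API (the helper and the simulated responses are hypothetical):

```python
# Sketch: poll until a work request reaches a terminal status.
# `fetch_status` is a stand-in for a real Big Data work request lookup.
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED"}

def wait_for_work_request(fetch_status, interval=0.0, max_polls=100):
    """Call fetch_status() until it returns a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError("work request did not reach a terminal status")

# Simulated status sequence standing in for successive API responses.
statuses = iter(["ACCEPTED", "IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
print(wait_for_work_request(lambda: next(statuses)))  # SUCCEEDED
```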

Typical Workflow

Describes the steps to start using Big Data.

Create and sign into your cloud account
Provide your information and sign up for Oracle Cloud Infrastructure.
Set up your infrastructure
Create and configure a network, create users and groups, and configure access controls and security.
Create a cluster
Create a Big Data cluster through the Cloud Console, SDK, or CLI. You can create a highly available and secure cluster with one extra click.
Access a cluster
Establish connections and work with the cluster through:
  • Cloud Console
  • Secure Shell (SSH)
  • Services such as Hue, Apache Ambari, and Cloudera Manager
Scale a cluster
Add worker nodes, add block storage, and change the shapes of nodes.
Explore and visualize data with notebooks
Use the Big Data Studio notebook application to explore and visualize data.
Query data with Cloud SQL
Add Cloud SQL to Big Data, and use it to make queries against non-relational data stored in multiple big data sources, including Apache Hive, HDFS, Oracle NoSQL Database, Apache Kafka, Apache HBase, and object stores.