Creating and Modifying Clusters

Planning a Cluster

Before you create a cluster, understand your options.

Understand Instance Types and Shapes

Big Data cluster nodes run in Oracle Cloud Infrastructure compute instances (servers).

When you create a cluster, you choose an instance type, which determines whether the instance will run directly on the "bare metal" of the hardware or in a virtualized environment. You also choose a shape, which configures the resources assigned to the instance.

About Instance Types

There are two types of Oracle Cloud Infrastructure compute instances:

  • Bare Metal: A bare metal compute instance uses a dedicated physical server for the node, for highest performance and strongest isolation.

  • Virtual Machine (VM): Through virtualization, a VM compute instance can host multiple, isolated nodes that run on a single physical bare metal machine. VM instances are less expensive than bare metal instances and are useful for building less demanding clusters that don't require the performance and resources (CPU, memory, network bandwidth, storage) of an entire physical machine for each node.

VM instances run on the same hardware as bare metal instances, with the same firmware, software stack, and networking infrastructure.

For more information about compute instances, see Overview of Compute Service.

About Shapes

The shape determines the number of CPUs, amount of memory, and other resources allocated to the compute instance hosting the cluster node. See Supported Node Shapes in the Oracle Cloud Infrastructure documentation for the available shapes.

The shapes of the Big Data master nodes and the worker nodes don't have to match. But the shapes of all of the master nodes must match each other and the shapes of all the worker nodes must match each other.

Plan Your Cluster Layout, Shape, and Storage

Before you start the process to create a cluster, you should plan the layout of the cluster, the node shapes, and storage.

Cluster Layout

Nodes and services are organized differently on clusters, based on whether the cluster is highly available (HA) and secure, or not.

About Using HA Clusters

Use HA clusters for production environments, where resiliency and minimal downtime are required.

In this release, a cluster must be both HA and secure, or neither.

Types of Nodes

There are two types of nodes:

  • Master or utility nodes include the services required for the operation and management of the cluster. These nodes do not store or process data.
  • Worker nodes store and process data. The loss of a worker node doesn't affect the operation of the cluster, although it may affect performance.

High Availability (HA) Cluster Layout

A high availability (HA) cluster has two master nodes, two utility nodes, and three or more worker nodes.

First master node

  Services on ODH:
  • Ambari Metrics Monitor
  • HDFS Client
  • HDFS JournalNode
  • HDFS NameNode
  • HDFS ZKFailoverController
  • Hive Client
  • Kerberos Client
  • MapReduce2 Client
  • Spark3 Client
  • Spark3 History Server
  • YARN Client
  • YARN ResourceManager
  • ZooKeeper Server

  Services on CDH:
  • HDFS Failover Controller
  • HDFS JournalNode
  • HDFS NameNode
  • Hive Client
  • Key Trustee KMS Key Management Server Proxy
  • Key Trustee Server Active Database
  • Key Trustee Server Active Key Trustee Server
  • Spark Client
  • Spark History Server
  • YARN (MR2 Included) JobHistory Server
  • YARN (MR2 Included) ResourceManager
  • ZooKeeper Server

Second master node

  Services on ODH:
  • Ambari Metrics Monitor
  • HDFS Client
  • HDFS JournalNode
  • HDFS NameNode
  • HDFS ZKFailoverController
  • Kerberos Client
  • MapReduce2 Client
  • MapReduce2 History Server
  • Spark3 Client
  • Tez Client
  • YARN Client
  • YARN Registry DNS
  • YARN ResourceManager
  • YARN Timeline Service V1.5
  • ZooKeeper Server

  Services on CDH:
  • HDFS Balancer
  • HDFS Failover Controller
  • HDFS HttpFS
  • HDFS JournalNode
  • HDFS NameNode
  • Hive Client
  • Hue Load Balancer
  • Hue Server
  • Hue Kerberos Ticket Renewer
  • Key Trustee KMS Key Management Server Proxy
  • Key Trustee Server Passive Database
  • Key Trustee Server Passive Key Trustee Server
  • YARN (MR2 Included) ResourceManager
  • ZooKeeper Server

First utility node

  Services on ODH:
  • Ambari Metrics Monitor
  • Ambari Server
  • HDFS Client
  • HDFS JournalNode
  • Hive Metastore
  • HiveServer2
  • Kerberos Client
  • MapReduce2 Client
  • Oozie Server
  • Spark3 Client
  • Tez Client
  • YARN Client
  • ZooKeeper Client
  • ZooKeeper Server

  Services on CDH:
  • HDFS Client
  • HDFS JournalNode
  • Hive Client
  • Cloudera Management Service Alert Publisher
  • Cloudera Management Service Event Server
  • Cloudera Management Service Host Monitor
  • Cloudera Management Service Navigator Audit Server
  • Cloudera Management Service Navigator Metadata Server
  • Cloudera Management Service Reports Manager
  • Cloudera Management Service Service Monitor
  • Sentry Server
  • Spark Client
  • YARN (MR2 Included) Client
  • ZooKeeper Server

Second utility node

  Services on ODH:
  • Ambari Metrics Collector
  • Ambari Metrics Monitor
  • HDFS Client
  • Hive Client
  • Kerberos Client
  • MapReduce2 Client
  • Spark3 Client
  • YARN Client

  Services on CDH:
  • HDFS Client
  • Hive Client
  • Hive Metastore Server
  • HiveServer2
  • Hive WebHCat Server
  • Hue Load Balancer
  • Hue Server
  • Hue Kerberos Ticket Renewer
  • Oozie Server
  • Sentry Server
  • Spark Client
  • YARN (MR2 Included) Client

Worker nodes (3 minimum)

  Services on ODH:
  • Ambari Metrics Monitor
  • HDFS DataNode
  • HDFS Client
  • Hive Client
  • Kerberos Client
  • MapReduce2 Client
  • Oozie Client
  • Spark3 Client
  • Spark3 Thrift Server
  • Tez Client
  • YARN Client
  • YARN NodeManager
  • ZooKeeper Client

  Services on CDH:
  • HDFS DataNode
  • Hive Client
  • Spark Client
  • YARN (MR2 Included) NodeManager

Minimal (non-HA) Cluster Layout

A non-high-availability (non-HA) cluster has one master node, one utility node, and three or more worker nodes.

Master node

  Services on ODH:
  • Ambari Metrics Monitor
  • HDFS Client
  • HDFS NameNode
  • Hive Client
  • MapReduce2 Client
  • Spark3 Client
  • Spark3 History Server
  • YARN Client
  • YARN Registry DNS
  • YARN ResourceManager
  • ZooKeeper Server

  Services on CDH:
  • HDFS Balancer
  • HDFS NameNode
  • Hive Client
  • Spark Client
  • Spark History Server
  • YARN (MR2 Included) JobHistory Server
  • YARN (MR2 Included) ResourceManager
  • ZooKeeper Server

Utility node

  Services on ODH:
  • Ambari Metrics Collector
  • Ambari Metrics Monitor
  • Ambari Server
  • HDFS Client
  • HDFS Secondary NameNode
  • Hive Metastore
  • HiveServer2
  • MapReduce2 Client
  • MapReduce2 History Server
  • Oozie Server
  • Spark3 Client
  • Tez Client
  • YARN Client
  • YARN Timeline Service V1.5
  • ZooKeeper Client
  • ZooKeeper Server

  Services on CDH:
  • HDFS HttpFS
  • HDFS SecondaryNameNode
  • Hive Client
  • Hive Metastore Server
  • HiveServer2
  • Hive WebHCat Server
  • Hue Load Balancer
  • Hue Server
  • Cloudera Management Service Alert Publisher
  • Cloudera Management Service Event Server
  • Cloudera Management Service Host Monitor
  • Cloudera Management Service Navigator Audit Server
  • Cloudera Management Service Navigator Metadata Server
  • Cloudera Management Service Reports Manager
  • Cloudera Management Service Service Monitor
  • Oozie Server
  • Spark Client
  • YARN (MR2 Included) Client
  • ZooKeeper Server

Worker nodes

  Services on ODH:
  • Ambari Metrics Monitor
  • HDFS DataNode
  • HDFS Client
  • Hive Client
  • MapReduce2 Client
  • Oozie Client
  • Spark3 Client
  • Spark3 Thrift Server
  • Tez Client
  • YARN Client
  • YARN NodeManager
  • ZooKeeper Client
  • ZooKeeper Server

  Services on CDH:
  • HDFS DataNode
  • Hive Client
  • Spark Client
  • YARN (MR2 Included) NodeManager
  • ZooKeeper Server
Supported Node Shapes

The shape describes the resources allocated to the node.

The shapes used for master/utility nodes and worker nodes can be different. But all master/utility nodes must be of the same shape and all worker nodes must be of the same shape.

The following table shows what shapes can be used for the different node types. For a list of the resources provided by each shape, see Compute Shapes.

Master/utility nodes

  Available shapes: VM.Standard2.4, VM.Standard2.8, VM.Standard2.16, VM.Standard2.24, VM.DenseIO2.8, VM.DenseIO2.16, VM.DenseIO2.24, BM.Standard2.52, BM.DenseIO2.52, BM.HPC2.36

  Required number of Virtual Network Interface Cards (VNICs): 3 minimum, used for the cluster subnet, the DP access subnet, and the customer's subnet

Worker nodes

  Available shapes: VM.Standard2.1*, VM.Standard2.2*, VM.Standard2.4, VM.Standard2.8, VM.Standard2.16, VM.Standard2.24, VM.DenseIO2.8, VM.DenseIO2.16, VM.DenseIO2.24, BM.Standard2.52, BM.DenseIO2.52, BM.HPC2.36

  Required number of Virtual Network Interface Cards (VNICs): 2 minimum, used for the cluster subnet and the customer's subnet

  * Be aware that VM.Standard2.1 and VM.Standard2.2 are very small shapes and won't support running large workloads.

Cloud SQL query server

  Available shapes: VM.Standard2.4, VM.Standard2.8, VM.Standard2.16, VM.Standard2.24, VM.DenseIO2.8, VM.DenseIO2.16, VM.DenseIO2.24, BM.HPC2.36, BM.Standard2.52

  Required number of Virtual Network Interface Cards (VNICs): --

Not all shapes are available by default. To see what shapes are available by default through the Cloud Console, see Find the Limits for Your Tenancy. To submit a request to increase your service limits, see Request an Increase for Big Data Nodes.

Allocation of Block Storage for Nodes with Standard VM Shapes

Nodes based on standard VM shapes use network-attached block storage.

Note

Block storage isn't supported for nodes based on DenseIO shapes, which use directly attached storage.

All nodes have a boot volume of 150 GB.

The following limits and guidelines apply to block storage for worker nodes:

  • Minimum initial block storage: 150 GB
  • Default initial block storage: 150 GB
  • Minimum additional block storage: 150 GB
  • Default additional block storage: 1 TB
  • Incremental step for (initial and additional) block storage: 50 GB
  • Maximum block storage for a single node: 48 TB. The 48 TB total results from 12 volumes of 4 TB each. If you add block storage multiple times, the maximum remains 48 TB, but it may be spread across more than 12 volumes.
  • Maximum block volume size: 4 TB. If you specify the maximum 48 TB, 12 volumes of 4 TB each are created. If you specify a lower number, enough 4 TB volumes for that amount are created, and more volumes are created as you add more storage.

The figures below apply to master and utility nodes and show initial sizes only, because you can't add block storage to master or utility nodes.

  • Minimum initial block storage: 150 GB
  • Default initial block storage: 1 TB
  • Minimum additional block storage: 150 GB
  • Default additional block storage: 1 TB
  • Incremental step for (initial and additional) block storage: 50 GB
  • Maximum block storage for a single node: 32 TB
  • Maximum block volume size: 32 TB
  • MySQL placement: For utility nodes, move /var/lib/mysql to /u01 and create a symbolic link, to prevent filling up the boot volume (see the example commands after this list).
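The following commands are a minimal sketch of that MySQL relocation; they assume a systemd-managed MySQL service named mysqld and an already-mounted /u01 volume (both are assumptions — adjust the service name and paths to your environment):

sudo systemctl stop mysqld          # stop MySQL before moving its data directory
sudo mv /var/lib/mysql /u01/mysql   # relocate the data directory to the larger volume
sudo ln -s /u01/mysql /var/lib/mysql  # symlink so MySQL still finds its data at the old path
sudo systemctl start mysqld         # restart MySQL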
The following guidelines apply to block storage for the Cloud SQL query server node:

  • Default initial block storage: 2 TB
  • Minimum initial block storage: 150 GB

Query server storage is used as temporary table space to perform heavy JOIN and GROUP BY operations. 2 TB is recommended for typical processing. For small environments, for example development, this number can be adjusted down.

For best performance, consider these factors:

  • I/O throughput
  • Networking between the compute device and block storage device.

See Block Volume Performance in the Oracle Cloud Infrastructure documentation.

The table below describes how Big Data allocates block volume storage for nodes of different sizes.

  • Initial volume allocation for master nodes and utility nodes: 1 large volume.
  • Volume allocation for additional block storage for master nodes and utility nodes: 1 large volume.
  • Initial volume allocation for worker nodes:
    • Storage less than 6 TB: 500 GB volumes (the last volume may be smaller than 500 GB).
    • Storage 6 TB to 48 TB: divided evenly into 12 volumes, each of which is at least 500 GB.
    • Storage greater than 48 TB: not allowed.
  • Volume allocation for additional block storage for worker nodes: the minimum number of volumes that can accommodate the storage size, with a maximum volume size of 4 TB per volume (the last volume may be smaller than 4 TB).
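For example, a worker node created with 24 TB of initial block storage receives 12 volumes of 2 TB each. If you later add 6 TB to that node, two more volumes are created: one of 4 TB and one of 2 TB.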

It's recommended that you use edge nodes for staging.

Creating a Cluster

Create a Big Data cluster from the Oracle Cloud Console.

The cluster creation wizard (described below) asks you to provide information about your network and to make choices based on it. Before you start, have the name of your network, its compartment, and its regional subnet ready.

To create a cluster:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select a compartment to host the cluster.
  3. Click Create Cluster.
  4. Enter the following information:
    • Cluster name - Enter a name to identify the cluster.

    • Cluster Admin Password - Enter a string to be used as the cluster password. You'll need this password to sign in to Apache Ambari or Cloudera Manager, depending on your cluster version, and to perform certain actions on the cluster through the Cloud Console.

    • Secure and Highly Available (HA) - Check this box to make the cluster secure and highly available. A secure cluster has the full Hadoop security stack, including HDFS Transparent Encryption, Kerberos, and Apache Sentry. This setting can't be changed for the life of the cluster.

    • Cluster Version - Select a distribution and a version of the Hadoop distribution to use for your cluster. Choose one of the following:

      • Select Oracle Distribution including Apache Hadoop to use Oracle's implementation of Hadoop.
      • Select a version of the Cloudera Distribution Including Apache Hadoop (CDH) software to use for this cluster. The listed versions are fully supported. See the Cloudera documentation for descriptions of the features in each release.
  5. In the Hadoop Nodes section of the page, configure the types, shapes, and numbers of the compute instances (servers) to host the master and worker nodes of the cluster. For information about the choices you can make, see Understand Instance Types and Shapes. Not all shapes are available by default, although you can request those not listed. See Increasing Service Limits.

    Enter details for Master/Utility Nodes:

    • Choose Instance Type - Click the Virtual Machine box or the Bare Metal box to indicate what type of compute instances you want for the master and utility nodes.

    • Choose Master/Utility Node Shape - Select the shape for the master and utility nodes. See Compute Shapes and see Service Limits for details about the available shapes.

    • Block Storage Size per Master/Utility Node (in GB) - Enter the block storage size, in gigabytes (GB), for each master and utility node.

    • Number of Master and Utility Nodes - A high-availability (HA) cluster always has 4 master/utility nodes, and a non-HA cluster always has 2 master/utility nodes. Therefore, this read-only field shows 4 nodes for an HA cluster or 2 nodes for a non-HA cluster.

  6. Enter details for Worker Nodes:
    • Choose Instance Type - Click the Virtual Machine box or the Bare Metal box to indicate what kind of compute instances you want.

    • Worker Node Shape - Select the shape for the worker nodes. See Compute Shapes and Big Data Default Limits for details about the available shapes.

    • Block Storage Size per Worker Node - Enter the block storage size, in gigabytes (GB), for each worker node.

    • Number of Worker Nodes - Enter the number of worker nodes for the cluster, with a minimum of three nodes.

  7. In the Network Settings: Cluster Private Network section, enter a CIDR block for the cluster private network that will be created for the cluster.

    The cluster private network is created in the Oracle tenancy (not your customer tenancy), and it's used exclusively for private communication among the nodes of the cluster. No other traffic travels over this network, it isn't accessible by outside hosts, and you can't modify it once it's created. All ports are open on this network.

    In the CIDR Block field, enter a CIDR block to assign the range of contiguous IP addresses available for this private network, or accept the default 10.0.0.0/16. This CIDR block range cannot overlap the CIDR block range in your customer network, discussed in the next step. For example, if your customer network uses 192.168.0.0/16, the default 10.0.0.0/16 doesn't overlap it.

  8. In the Network Settings: Customer Network section, enter the information below to add the cluster to your Virtual Cloud Network (VCN) and a regional subnet in that VCN.
    • Choose VCN in <compartment> - Accept the current compartment, or click Change Compartment to select a different one. Then select the name of an existing VCN in that compartment to use for the cluster. The VCN must contain a regional subnet.

    • Choose Regional Subnet in <compartment> - Choose a regional subnet to use for the cluster.

      Important: If you plan to make any of the IP addresses in the subnet public (to allow access to a node from the internet), you must select a public subnet for the cluster.

    For more information, see VCNs and Subnets.

  9. In the Network Settings: Network Options section, select one of the following:
    • Deploy Oracle-managed Service gateway and NAT gateway (Quick Start)

      Select this option to simplify your network configuration by allowing Oracle to provide and manage these communication gateways. When you select this option, a service gateway and a Network Address Translation (NAT) gateway are deployed for private use by the cluster. These gateways are created in the Oracle tenancy and can't be modified after the cluster is created.

      • A NAT gateway enables nodes without public IP addresses to initiate connections to and receive responses from the internet but not to receive inbound connections initiated from the internet. See NAT Gateway.
      • A service gateway enables nodes without public IP addresses to privately access Oracle services, without exposing the data to an internet gateway or a NAT gateway. See Service Gateway.

      Follow these guidelines:

      • Choose this option to give all nodes in the cluster private network full outbound access to the public internet. When you select this option, you won't be able to limit that access in any way (for example by restricting egress to only a few IP ranges).

        If you select this option, your cluster won't be able to use service gateways or NAT gateways in your customer network.

      • If you don't choose this option, you must create gateways in your customer VCN. When you do this, you can also create security rules to limit egress to specified IP ranges.

      • If you map the private IP addresses of the cluster nodes to public IP addresses, then a NAT gateway isn't needed. See Map a Private IP Address to a Public IP Address.

    • Use the gateways in your selected Customer VCN (Customizable)

      Select this option to permit the cluster to use the gateways in your customer VCN. You must create and configure these resources yourself.

      Note

      If you create your network by using one of the network creation wizards in the console, gateways are created for you, but you may have to configure them further to suit your needs. See Virtual Networking Quickstart. For complete information about setting up a gateway, see Service Gateway.

  10. Under Additional Options, associate an SSH key with the cluster.

    SSH Key - Enter an SSH public key in any of the following ways:

    • Select Choose SSH Key File, then either:

      • drag and drop a public SSH key file into the box, or

      • click select one... and navigate to and choose a public SSH key file from your local file system.

    • Select Paste SSH Key and paste the contents of a public SSH key file into the box.
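    If you don't yet have a key pair, you can generate one with OpenSSH; this is a minimal sketch, and the file name bds_cluster_key is just an example:

    ssh-keygen -t rsa -b 4096 -f ~/.ssh/bds_cluster_key

    Provide the resulting ~/.ssh/bds_cluster_key.pub file (the public key) to the wizard, and keep the private key; you'll need it to connect to cluster nodes over SSH.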

  11. Tags (Optional) - Use tags to help you organize and list resources. Enter tags as described in Tagging Overview.
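As an alternative to the console wizard, you can create a cluster with the OCI command line interface. The exact parameter set varies by CLI version, so treat the following as a sketch; generate the full parameter skeleton from the CLI itself and fill it in before running the create command:

oci bds instance create --generate-full-command-json-input > create-cluster.json
# Edit create-cluster.json (compartment, cluster version, admin password, SSH key, node shapes, network), then:
oci bds instance create --from-json file://create-cluster.json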
Watch a video on creating a CDH cluster
In this video, you'll learn how to plan and create a simple Cloudera Distribution Including Apache Hadoop (CDH) cluster using Big Data Service.

Creating a Secure and Highly Available Cluster

You make a cluster secure and highly available by setting an option when you create it. A secure cluster has the full Hadoop security stack, including HDFS Transparent Encryption, plus Kerberos and Apache Sentry.

Make a new cluster secure and highly available by selecting the Secure and Highly Available (HA) option under General Settings in the Create Cluster wizard. You can't make an existing cluster secure and highly available if it wasn't created with those features, and you can't remove those features from an existing cluster. See Creating a Cluster for instructions.

Define Security Rules

An administrator must configure security rules to control network traffic to and from Big Data resources.

Background

In Oracle Cloud Infrastructure, two kinds of virtual firewalls are available for controlling traffic to and from your cloud resources. Security lists include security rules that apply to an entire subnet. Network security groups include security rules that apply to a defined set of resources that are organized into groups. Network security groups allow finer-grained control, while security lists are easier to set up and maintain.

Security lists and network security groups both include security rules. A security rule allows a particular type of traffic in or out of a Virtual Network Interface Card (VNIC).

Note

A VNIC is a networking component that enables a networked resource such as an instance (a node in Big Data Service) to connect to a virtual cloud network (VCN). The VNIC determines how the instance connects with endpoints inside and outside the VCN. Each VNIC resides in a subnet in a VCN. A security list defines a set of security rules that apply to all the VNICs in a subnet. A network security group defines a set of security rules that apply to a group of VNICs that you define.

It's important to understand the role of VNICs in your network architecture, but for the purposes of this documentation, it's usually sufficient to consider how security rules work in VCNs and subnets.

For more information, see Security Rules.

Creating Security Rules in Security Lists

Typically, Big Data uses security lists. That means you create security rules for a subnet, and any cluster in that subnet is subject to those rules. The following instructions describe how to create security rules in a security list defined for the subnet used by your cluster.

A security list can define both ingress rules (for incoming traffic) and egress rules (for outgoing traffic).

Each security rule specifies:
  • Direction (ingress or egress)
  • Stateful or stateless
  • Source type and source (ingress rules only)
  • IP protocol, and the source and destination port ranges

For complete documentation about security rules, see Parts of a Security Rule.

The following sections contain specific details about creating ingress and egress rules for Big Data clusters.

Create Ingress Rules (and Open Ports)

You must open certain ports on Big Data clusters to allow access to services like Apache Ambari, Cloudera Manager, and Hue. Configure these ports in the security ingress rules that apply to a cluster.

To set ingress rules in a security list:
  1. Open the navigation menu and click Networking. Then click Virtual Cloud Networks.
  2. Under Compartment in the panel on the left, select the compartment that hosts your VCN.
  3. Click the name of the VCN used by the cluster. (The VCN was associated with the cluster when the cluster was created.)
  4. Under Subnets in <compartment> Compartment, click the name of the subnet used by the cluster. (The subnet was associated with the cluster when the cluster was created.)
  5. Under Security Lists, click the name of a security list defined for the subnet. If you choose the Default security list for the subnet, it may already have some rules defined. For example, it may have a rule that allows incoming traffic on port 22, for Secure Shell (SSH) access.
  6. If the ingress rules aren't displayed, click Ingress Rules on the left side of the page, under Resources.
  7. Click the Add Ingress Rules button to configure the ingress rules for the subnet.
  8. In the Add Ingress Rules dialog box, set the following options to open port 22 for SSH access (if it isn't already open):
    • Stateless - Leave this box unchecked. This makes the rule stateful, which means that any response to the incoming traffic is allowed back to the originating host, regardless of any egress rules applicable to the instance.
    • Source Type - Select CIDR.
    • Source CIDR - Enter 0.0.0.0/0, which indicates that traffic from all sources on the internet is allowed.
    • IP Protocol - Select TCP.
    • Source Port Range - Accept the default All.
    • Destination Port Range - Enter 22, to allow access via SSH.
    • Description - Add an optional description.
  9. At the bottom of the dialog box, click +Additional Ingress Rules, and enter the values for another rule. Repeat as many times as necessary to create all the rules you need, and then click Add Ingress Rules.
    For a typical set of ingress rules for a cluster, create additional rules using the same values as above, but with different Destination Port Ranges:
    • Apache Ambari - port 7183
    • Cloudera Manager - port 7183
    • Hue - port 8888
    • Web Resource Manager - port 8090
    • Spark History Server - port 18088
    • Cloud SQL (if Cloud SQL is installed) - port 1521

    The ingress rules will look similar to the following:

    Stateless | Source | IP Protocol | Source Port Range | Destination Port Range | Type and Code | Allows | Description
    No | 0.0.0.0/0 | TCP | All | 22 | - | TCP traffic for ports: 22 SSH | SSH
    No | 0.0.0.0/0 | ICMP | - | - | 3, 4 | ICMP traffic for: 3, 4 | -
    No | 0.0.0.0/0 | ICMP | - | - | 3 | ICMP traffic for: 3 | -
    No | 0.0.0.0/0 | TCP | All | 7183 | - | TCP traffic for ports: 7183 | Cloudera Manager
    No | 0.0.0.0/0 | TCP | All | 8888 | - | TCP traffic for ports: 8888 | Hue
    No | 0.0.0.0/0 | TCP | All | 8090 | - | TCP traffic for ports: 8090 | Web Resource Manager
    No | 0.0.0.0/0 | TCP | All | 18088 | - | TCP traffic for ports: 18088 | Spark History Server
    No | 0.0.0.0/0 | TCP | All | 1521 | - | TCP traffic for ports: 1521 | Cloud SQL (query server)
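If you prefer to script these rules, you can update the security list with the OCI CLI. Be aware that oci network security-list update replaces the entire set of ingress rules, so the JSON file must include every rule you want to keep, not just the new ones. A minimal sketch (the security list OCID and the ingress.json file name are placeholders):

cat > ingress.json <<'EOF'
[
  {"protocol": "6", "source": "0.0.0.0/0", "isStateless": false,
   "tcpOptions": {"destinationPortRange": {"min": 22, "max": 22}}},
  {"protocol": "6", "source": "0.0.0.0/0", "isStateless": false,
   "tcpOptions": {"destinationPortRange": {"min": 7183, "max": 7183}}}
]
EOF
oci network security-list update --security-list-id ocid1.securitylist.oc1... --ingress-security-rules file://ingress.json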
Create Egress Rules

When creating a cluster, you have the option to use a NAT gateway or not. Whether or not you choose that option affects how you can control outbound traffic.

  • If you choose the NAT gateway option when creating a cluster, all nodes will have full outbound access to the public internet. You won't be able to limit that access in any way (for example by restricting egress to only a few IP ranges).

  • If you choose not to create a NAT gateway when creating a cluster, you can create a NAT gateway on the VCN you're using to access the cluster. You can also edit policies on this NAT gateway to limit egress to specified IP ranges.

  • If you map the private IP addresses of the cluster nodes to public IP addresses, then a NAT gateway isn't needed.

Establishing Connections to Nodes with Private IP Addresses

By default, cluster nodes are assigned private IP addresses and are therefore not publicly available on the internet. You can make them available in any of the ways described in the following topics:

Map a Private IP Address to a Public IP Address

Big Data nodes are by default assigned private IP addresses, which aren't accessible from the public internet. One way to make a node accessible from the internet is to map a node's private IP address to a public IP address.

The instructions below use the Oracle Cloud Infrastructure Cloud Shell, which is a web browser-based terminal accessible from the Oracle Cloud Console. You'll gather some information about your network and your cluster nodes, and then you'll pass that information to commands in the shell. To perform this task, you must have a cluster running in a VCN in your tenancy, and that cluster must have a regional, public subnet.

Required IAM Privileges for Mapping Private to Public IP Address

You must have appropriate Oracle Cloud Infrastructure Identity and Access Management (IAM) privileges to map private IP addresses to public IP addresses.

The tenancy administrator or a delegated administrator with the appropriate privileges must create a policy according to the following guidelines.

Group

The policy can assign privileges to any Big Data group, to give members of that group the rights to map IP addresses.

Verbs
The policy must contain policy statements with the following IAM verbs:
  • vnic_read
  • private_ip_read
  • public_ip_read
  • public_ip_delete
  • public_ip_create
  • public_ip_update
  • private_ip_assign_public_ip
  • private_ip_unassign_public_ip
  • public_ip_assign_private_ip
  • public_ip_unassign_private_ip
Resource

The policy must specify the tenancy or the compartment (<compartment_name>) containing the subnet used for the IP addresses.

Example
allow group bds_net_admins to vnic_read in tenancy
allow group bds_net_admins to private_ip_read in tenancy
allow group bds_net_admins to public_ip_read in tenancy
allow group bds_net_admins to public_ip_delete in tenancy
allow group bds_net_admins to public_ip_create in tenancy 
allow group bds_net_admins to public_ip_update in tenancy 
allow group bds_net_admins to private_ip_assign_public_ip in tenancy 
allow group bds_net_admins to private_ip_unassign_public_ip in tenancy 
allow group bds_net_admins to public_ip_assign_private_ip in tenancy
allow group bds_net_admins to public_ip_unassign_private_ip in tenancy
Gather Information About the Cluster
  1. In the Oracle Cloud Console, open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. On the Clusters page, click the name of your cluster, for example mycluster.
  3. On the Cluster Information tab of the Cluster Details page, under Network Information, click the Copy link next to Subnet OCID. Then paste that OCID into an editor or a file, so you can retrieve it later in this process.
  4. On the same page, under List of Cluster Nodes, find the node you want to map.
    Node names are constructed like clusterndp, where
    • cluster is the first 7 letters of the cluster name.
    • nd is the type of node: mn = master node, un = utility node, and wn = worker node.
    • p is the position in the list of the type of node identified by nd, where 0 is the first, 1 is the second, etc.

    For example, the name myclustun0 refers to the first (0) utility node (un) in a cluster named mycluster.

  5. In the IP Address column, find the private IP address for that node, for example, 192.0.2.1. Save the address so you can retrieve it later.
Map the Private IP Address to a Public IP Address
  1. In the Cloud Console, click the Cloud Shell icon at the top of the page. It may take a few moments to connect and authenticate you.

  2. At the prompt, enter the following.

    export DISPLAY_NAME=<display-name>

    export SUBNET_OCID=<subnet-ocid>

    export PRIVATE_IP=<ip-address>

    oci network public-ip create --display-name $DISPLAY_NAME --compartment-id `oci network private-ip list --subnet-id $SUBNET_OCID --ip-address $PRIVATE_IP | jq -r '.data[] | ."compartment-id"'` --lifetime "RESERVED" --private-ip-id `oci network private-ip list --subnet-id $SUBNET_OCID --ip-address $PRIVATE_IP | jq -r '.data[] | ."id"'`

    The three export statements above set variables that are used in the oci network command that follows. The variables are:

    • <display-name> (optional) is a "friendly name" that will be attached to the reserved public IP address. The name doesn't need to exist beforehand; it's created when you run this command.

      For convenience, you might want to use the name of the node whose private IP address you're mapping, for example myclusun0, which is the name of the first utility node in a cluster named mycluster.

    • <subnet-ocid> is the OCID of the customer public subnet used by the cluster, for example, ocid1.subnet.oc1.iad....

    • <ip-address> is the private IP address assigned to the node you want to map, for example, 192.0.2.1.

    Enter the command beginning with oci network public-ip create exactly as it's shown above, as a single command with no line breaks.

    For example:

    $ export DISPLAY_NAME="myclustun0"
    $ export SUBNET_OCID="ocid1.subnet.oc1.…"
    $ export PRIVATE_IP="192.0.2.1"
    $ oci network public-ip create --display-name $DISPLAY_NAME --compartment-id `oci network private-ip list --subnet-id $SUBNET_OCID --ip-address $PRIVATE_IP | jq -r '.data[] | ."compartment-id"'` --lifetime "RESERVED" --private-ip-id `oci network private-ip list --subnet-id $SUBNET_OCID --ip-address $PRIVATE_IP | jq -r '.data[] | ."id"'`
    The output returned is similar to the following:

    {
      "data": {
        "assigned-entity-id": "ocid1.privateip.oc1...",
        "assigned-entity-type": "PRIVATE_IP",
        "availability-domain": null,
        "compartment-id": "ocid1.compartment.oc1...",
        "defined-tags": {},
        "display-name": "publicip...",
        "freeform-tags": {},
        "id": "ocid1.publicip.oc1....",
        "ip-address": "203.0.113.1",
        "lifecycle-state": "ASSIGNED",
        "lifetime": "RESERVED",
        "private-ip-id": "ocid1.privateip....",
        "scope": "REGION",
        "time-created": "2020-04-13..."
      },
      "etag": "1234abcd"
    }
  3. In the output returned, find the value for ip-address. In the above example, it's 203.0.113.1. That is the new reserved public IP address that is mapped to the private IP address for the node.

    To see the reserved public IP address in the console, open the navigation menu. Under Core Infrastructure, point to Networking, and click Virtual Cloud Networks. Then, in the navigation list on the left, under Networking, click IP Management. The new reserved public IP address appears in the Reserved Public IP Addresses list. If you supplied a display name in the command you ran above, that name appears in the Name column. Otherwise, a name like publicipnnnnnnnnn is generated.
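    With the mapping in place and port 22 open (see Define Security Rules), you can connect to the node through its new public address. A sketch, assuming the default opc user on Big Data Service nodes; the key file path is a placeholder for the private key that matches the cluster's SSH public key:

    ssh -i <private_key_file> opc@203.0.113.1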

Delete a Public IP Address

To delete a public IP, you can use the following:

oci network public-ip delete --public-ip-id ocid1.publicip.oc1....

The value for --public-ip-id is the "id" value in the output returned by the create command above, for example "id": "ocid1.publicip.oc1....".

Alternatively, you can go to the Networking Reserved Public IP Addresses page in the Cloud Console and delete reserved public IPs there.

Open Ports to Make Services Available

Making the node publicly available isn't enough to make a service like Apache Ambari or Cloudera Manager available from the internet. You must also open the port for the service by adding an ingress rule to a security list. See Define Security Rules.

Use a Bastion Host to Connect to Your Service

You can use a bastion host to provide access to a cluster's private network from the public internet.

A bastion host is a compute instance that serves as the public entry point for accessing a private network from external networks like the internet. Traffic must flow through the bastion host to access the private network, and you can set up security mechanisms on the bastion to handle that traffic.
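For example, with OpenSSH you can connect through a bastion in a single command by using the -J (ProxyJump) option. This is a sketch; the opc user name and the addresses are placeholders for your environment:

ssh -J opc@<bastion_public_ip> opc@<node_private_ip>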

Use Oracle Cloud Infrastructure Site-to-Site VPN to Connect to Your Service

Site-to-Site VPN provides a site-to-site IPSec VPN between your on-premises network and your virtual cloud network (VCN). The IPSec protocol suite encrypts IP traffic before the packets are transferred from the source to the destination and decrypts the traffic when it arrives.

For details about connecting to Big Data with VPN, see Site-to-Site VPN.

Use Oracle Cloud Infrastructure FastConnect to Connect to Your Service

Use FastConnect to access public services in Oracle Cloud Infrastructure without using the internet, for example, access to Object Storage, or the Oracle Cloud Console and APIs. Without FastConnect, the traffic destined for public IP addresses would be routed over the internet. With FastConnect, that traffic goes over your private physical connection.

For details about connecting Big Data with Oracle Cloud Infrastructure FastConnect, see FastConnect Overview.

Adding Worker Nodes to a Cluster

When you add worker nodes to a cluster, you expand both compute and storage. The new nodes use the same instance shape and amount of block storage as the existing worker nodes in the cluster.

To add worker nodes to a cluster:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, find the cluster to which you want to add worker nodes.
  4. To the right of the name of the cluster, click the Action icon button, and select Add Nodes from the menu.
  5. Enter details in the Add Nodes dialog box, as follows:
    • Node shape - This read-only field shows the shape used for the existing worker nodes. The same shape will be used for any new nodes you add. For information about the shapes, see Understand Instance Types and Shapes.

    • Block Storage Per Node - This read-only field shows the block storage used for existing worker nodes. The same amount of storage will be used for any new nodes you add.

    • Number of Worker Nodes - Enter the number of worker nodes to add to the cluster. A cluster can have from 3 to 256 worker nodes.

    • Cluster Admin Password - Enter the administration password for the cluster.

    • Object Storage URL (Optional) - Enter the URL for Oracle Cloud Infrastructure Object Storage to be used with these nodes.

Adding Block Storage to Worker Nodes

Block storage is a network-attached storage volume that you can use like a regular hard drive. You can attach extra block storage to the worker nodes of a cluster.

Note

Nodes in a cluster can have remote, network-attached, block storage or local, direct-attached, Non-Volatile Memory Express (NVMe) storage. Remote block storage is flexible and economical, while local NVMe storage provides the highest performance. The default storage type is determined when the cluster is created, based on the shape chosen for the cluster. The high-performance bare metal nodes and dense I/O virtual machine nodes are created with NVMe storage. Other kinds of virtual machine nodes are created with block storage.

You can attach extra storage to any cluster. You can't remove storage.

To add a block volume to the cluster:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, find the cluster to which you want to add block storage.
  4. To the right of the name of the cluster, click Action icon, and select Add Block Storage from the menu.
  5. In the Add Block Storage dialog box, enter information, as follows:
    • Additional Block Storage per Node (in GB) - Enter a number to indicate how many gigabytes of block storage to add, between 150 GB and 32 TB, in increments of 50 GB.
    • Cluster Admin Password - Enter the administration password for the cluster.
  6. Click Add.

Adding Cloud SQL to a Cluster

You can add Oracle Cloud SQL to a cluster so you can use SQL to query your big data sources.

Note

Cloud SQL is not included with Big Data. You must pay an extra fee for use of Cloud SQL.

When you add Cloud SQL support to a cluster, a query server node is added and big data cell servers are created on all worker nodes.

For information about using Cloud SQL with Big Data, see Using Cloud SQL with Big Data.
Note

For clusters with Oracle Distribution including Apache Hadoop, Cloud SQL is only supported for non-HA clusters.
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, find the cluster to which you want to add Cloud SQL.
  4. To the right of the name of the cluster, click the action menu, and select Add Cloud SQL from the menu.
  5. Enter details in the Add Cloud SQL dialog box, as follows:
    • Query Server Node Shape Configuration - Select a shape to be used for the query server node. For information about the available shapes, see Understand Instance Types and Shapes.
    • Cluster Admin Password - Enter your cluster administration password.
  6. Click Add.

Install Ext JS to Support Oozie UI

Oozie UI depends on the Ext JS JavaScript framework, and you must manually install Ext JS on your cluster.

Note

These steps apply for clusters using Cloudera Distribution including Apache Hadoop.

Ext JS is a JavaScript framework for building data-intensive, cross-platform web and mobile applications. Big Data requires Ext JS 2.2 in the Oozie webapp directory for the Oozie UI to work. Ext JS is licensed under GPLv2.

To install Ext JS 2.2:

  1. Download ext-2.2.zip from http://archive.cloudera.com/gplextras/misc/ext-2.2.zip.
  2. Copy the zip file to a cluster node where the Oozie server is installed, and unzip it there.
  3. Place the unzipped directory, named ext-2.2, in /usr/lib/oozie/embedded-oozie-server/webapp/ and verify that the directory is owned by oozie:hadoop.
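The steps above can be run as shell commands on the Oozie server node; this sketch assumes wget and unzip are available there:

wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
unzip ext-2.2.zip
sudo mv ext-2.2 /usr/lib/oozie/embedded-oozie-server/webapp/
sudo chown -R oozie:hadoop /usr/lib/oozie/embedded-oozie-server/webapp/ext-2.2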

Modifying a Cluster

Renaming a Cluster

You can change the name of any cluster.

To rename the cluster:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. Click the action menu to the right of the name of the cluster you want to rename, and select Rename Cluster from the menu.
  4. In the Rename Cluster dialog box, enter a new name for the cluster and click Rename.

Changing the Shape of Cluster Nodes

You can change the shapes of the nodes of a cluster after the cluster is created, with the following restrictions:

  • All the master nodes and utility nodes in a cluster must use the same shape, and all worker nodes must use the same shape. However, the master nodes and utility nodes can use a different shape than the worker nodes. Therefore, when you change the shapes of nodes, you can change the shape of all the master and utility nodes together, and you can change the shape of all the worker nodes together.
  • You can change the shapes only of nodes using standard shapes, and you can only change them to other standard shapes. For information about standard shapes, see Compute Shapes. For information about shapes supported in Big Data Service, see Plan Your Cluster Layout, Shape, and Storage.
Note

You can only change the shape of clusters that use Cloudera Distribution Including Apache Hadoop (CDH).

To change the shape of a cluster:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. Click Change Shape.
  5. In the Change Shape dialog box, do the following:
    • Choose Your Node Type - From the list, select Master/utility to change the shape of all the master nodes and utility nodes, select Worker to change the shape of all the worker nodes, or choose Cloud SQL to change the shape of all the Cloud SQL nodes, if installed.
    • Existing Shape - This read-only field shows the current shape of the nodes of the type you selected for Choose your node type, above.
    • New Shape - Select a new shape for the nodes of the selected type.
    • Cluster Admin Password - Enter the admin password for the cluster. (The password was assigned when the cluster was created.) Then click Change Shape.

Autoscaling a Cluster

You can create an autoscale configuration for a cluster so that the compute shapes of the worker nodes are automatically increased or decreased, based on CPU utilization thresholds.

Note

You can only autoscale clusters that use Cloudera Distribution Including Apache Hadoop (CDH).

Autoscaling allows you to maintain optimum performance of your cluster, while keeping costs as low as possible. Autoscaling monitors your CPU utilization and automatically adjusts the CPU capacity, based on the configuration parameters you set. When thresholds are met, the shapes of all the worker nodes in the cluster are automatically scaled up to the next higher VM.Standard shape or scaled down to the next lower VM.Standard shape.

Note

When a cluster is scaled up or down via autoscaling, the new details should be reflected in Cloudera Manager. To register that change with Cloudera Manager, a new Cloudera Manager password is created when you create an autoscale configuration. The password is deleted when the autoscale configuration is deleted.
How Autoscaling Works

The autoscale feature collects data about the CPU utilization of the worker nodes in a cluster. An autoscale configuration includes the parameters for scaling up the nodes (switching to the next larger compute shape) and scaling down the nodes (switching to the next smaller compute shape). A scale-up configuration specifies a duration and a percentage, so that when the average CPU usage exceeds the specified percentage for the specified duration, the node is scaled up. A scale-down configuration specifies a duration and a percentage, so that when the average CPU usage falls below the specified percentage for the specified duration, the node is scaled down.

Be aware that the average usage is computed over the entire duration specified in the configuration, so the scale-up or scale-down action is triggered only at the end of that duration. If the scale-up configuration is set to 60% for 6 hours, the average CPU usage over the entire six hours must exceed 60%. Usage may fall below or rise above 60% for brief periods in that window; the scale-up action is triggered only after the data for the full six hours is evaluated and averaged, and that average exceeds the percentage specified in the configuration.

If you want the cluster to scale up and down more frequently, in response to shorter-term fluctuations in CPU activity, use shorter durations. Legal values for scale-up and scale-down durations are 5 to 60 minutes or 1 to 24 hours. Enter hours as multiples of 60 minutes, that is, 60, 120, 180, 240, and so on, up to 1440 minutes.

Autoscale durations are mapped to Oracle Cloud Infrastructure Monitoring Query Language (MQL) interval values, where the ranges of values allowed for interval are 1m-60m, 1h-24h, and 1d. (Notice that while the minimum MQL interval is one minute, the minimum Big Data Service interval is five minutes.) See the "Interval Query Component" section in Monitoring Query Language (MQL) Reference.
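For example, a scale-up rule of 80% CPU for 30 minutes corresponds roughly to an MQL expression of the following form (illustrative only; the autoscale feature constructs and evaluates the query for you):

CpuUtilization[30m].mean() > 80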

Autoscale takes advantage of Oracle Cloud Infrastructure alarms, and the autoscale duration value is also used as the notification interval for the autoscale alarm. (See Managing Alarms.) If the conditions for a scale-up or scale-down action are still in effect after another interval (6 hours in the example above), then the alarm will trigger another scale up or scale down.

Oracle recommends that you constantly tune your autoscale values to meet your needs. See the recommendations for tuning alarms in the "Routinely Tune Your Alarms" section in Best Practices for Your Alarms.

Prerequisites

Quota

Your tenancy must have a quota that allows you to scale up to the next larger VM.Standard shape for all nodes. If not, the operation will fail. See Viewing Your Service Limits, Quotas, and Usage.

Network

When your cluster was created, one of the following options was selected:

  • Deploy Oracle-managed Service gateway and NAT gateway (Quick Start)

    If the cluster was created with this option selected, you can configure and use autoscale.

  • Use the gateways in your selected Customer VCN (Customizable)

    If the cluster was created with this option selected, the gateways in your customer VCN must allow the cluster nodes to reach Oracle Cloud Infrastructure services (for example, through a service gateway), so that autoscale metrics and alarms can be collected and acted on.

Create an Autoscale Configuration

You can have one autoscale configuration per cluster.

To create an autoscale configuration:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Autoscale Configurations.
  5. Click Create Autoscale Configuration.
  6. On the Create Autoscale Configuration dialog, enter the following information:
    • Autoscale Configuration Name: Enter a name for this configuration.
    • Cluster Admin Password: Enter the admin password for the cluster. (You assign an admin password when you create a cluster.)
  7. Under Performance Metrics (CPU Utilization), set the conditions that will trigger the autoscaling.

    The scale-up rule sets the conditions for scaling up the cluster (use a larger compute shape), and the scale-down rule sets the conditions for scaling down the cluster (use a smaller compute shape).

    • Scale-up rule

      • Average CPU threshold percentage: Set a percentage of average CPU utilization, so that when the CPU operates at that average percentage (or higher) for the minimum duration in minutes (below), the cluster will be scaled up.
      • Minimum duration in minutes: Set a duration, so that when the average CPU threshold percentage (above) operates for that duration, the cluster will be scaled up.

      For example, if the scale-up rule is:

      • Threshold percentage = 80%
      • Minimum duration = 30 minutes

      and...

      • The shapes of the worker nodes are VM.Standard2.4

      then, when the CPU utilization averages 80% or more for 30 minutes, the worker nodes will be scaled up to VM.Standard2.8.

    • Scale-down Rule

      • Average CPU threshold percentage: Set a percentage of average CPU utilization, so that when the CPU operates at that average percentage (or lower) for the minimum duration in minutes (below), the cluster will be scaled down.
      • Minimum duration in minutes: Set a duration, so that when the average CPU threshold percentage (above) operates for that duration, the cluster will be scaled down.

      For example, if the scale-down rule is:

      • Threshold percentage = 20%
      • Minimum duration = 30 minutes

      and...

      • The shapes of the worker nodes are VM.Standard2.8

      then, when the CPU utilization averages 20% or less for 30 minutes, the worker nodes will be scaled down to VM.Standard2.4.

  8. Click Create.

    It will take a few minutes for the configuration to take effect. During that time, the cluster will be in an "Updating" state.

    When an autoscale event is triggered, the worker nodes are updated on a rolling basis; that is, one node is updated, then the next node is updated, and so forth.

Edit an Autoscale Configuration

To edit an autoscale configuration:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Autoscale Configurations.
  5. In the list of autoscale configurations, click the name of your autoscale configuration.
  6. Click Edit.
  7. On the Edit Autoscale Configuration panel, change any of the values, as described in Create an Autoscale Configuration.

    It will take a few minutes for the configuration to take effect.

Delete an Autoscale Configuration

To delete an autoscale configuration:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Autoscale Configurations.
  5. In the list of autoscale configurations, click the name of your autoscale configuration.
  6. Click Delete.
  7. To confirm, in the Delete Autoscale Configuration dialog box, enter the autoscale configuration name and the cluster admin password, and then click Delete.

Using Object Storage API keys

Big Data Service uses the OCI API signing key mechanism to connect to Object Storage.

Note

To use the Object Storage API keys, you must create a Big Data Service cluster with version 3.0.4 or later. The Big Data Service version is displayed on the Cluster Information tab of the cluster details page. For earlier versions, use the OCI HDFS connector for Object Storage.
Create access policy
Users in a tenancy's administrator group can manage API keys for any user. To allow other users to create and manage Object Storage API keys for themselves, create a policy with the following statement in the root compartment.
allow any-user to {USER_INSPECT, USER_READ, USER_UPDATE, USER_APIKEY_ADD, USER_APIKEY_REMOVE} in tenancy where request.principal.id = target.user.id
Create an Object Storage API key

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Object Storage API keys.
  5. Click Create key.
  6. In the Create API key panel, enter a key alias to uniquely identify this key in the cluster.
  7. Enter the OCID of the user who can use this API key. To retrieve the user OCID, from the Console navigate to Identity & Security → Users. From the actions menu for the user, click Copy OCID.
  8. Enter and confirm a passphrase. This passphrase is used to encrypt the API key and cannot be changed later.
  9. Select a default region that is used to establish the Object Storage endpoint name.
  10. Click Create.
The API key is listed on the Object Storage API keys page. When the API key is successfully created, its status changes to Active.
View the configuration file

You can view and copy the public key of the Object Storage API key from its configuration file.

  1. Access the cluster details page of the cluster that has the API key you want to view.
  2. In the left panel, under Resources click Object Storage API keys.
  3. From the actions menu of the API key you want to view, click View configuration file.
  4. In the View configuration file dialog, view the public key details of the API key, or copy the public key.
Test the connection to Object Storage

  1. Access the cluster details page of the cluster that has the API key you want to test.
  2. In the left panel, under Resources click Object Storage API keys.
  3. From the actions menu of the API key you want to test, click Test connection.
  4. Enter the Object Storage URI for the bucket you want to connect to in the URI format oci://MyBucket@MyNamespace/.
  5. Enter the passphrase of the API key. You specified this passphrase when you created the API key.
  6. Click Test connection. The status of the test connection is displayed.
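After a successful test, you can also check access from a cluster node with the HDFS CLI, using the same URI format; this assumes the API key is configured for the user running the command:

hadoop fs -ls oci://MyBucket@MyNamespace/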
Delete an Object Storage API key

When an Object Storage API key is deleted, all user access to run Object Storage jobs on the Big Data Service clusters is revoked.

  1. Access the cluster details page of the cluster that has the API key you want to delete.
  2. In the left panel, under Resources click Object Storage API keys.
  3. From the actions menu of the API key you want to delete, click Delete.
  4. To confirm the deletion, enter the key alias of the key you want to delete.
  5. Click Delete.

Applying Tags to a Cluster

You can use Oracle Cloud Infrastructure tags to help organize your resources.

Tags are named key/value pairs that you can associate with resources, and which you can use to organize and list the resources.

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. Click More Actions, and then Add Tags.
  5. In the Add One or More Tags to This Cluster dialog box, enter information, as described in Resource Tags.

Restarting a Cluster Node

You can restart a node in a running cluster.

To restart a node:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of the cluster with the node you want to restart.
  4. On the Cluster Details page, click the action menu to the right of the name of the node you want to restart. Select Restart Node from the menu.
  5. In the Restart Node dialog box, enter the name of the node to restart, and click Restart.

Removing Cloud SQL from a Cluster

Oracle Cloud SQL can be added to a Big Data cluster, for an extra fee. If Cloud SQL has been added to a cluster, you can remove it, and you'll no longer be charged for Cloud SQL on the cluster.

Note

Removing Cloud SQL from a cluster terminates the query server node and deletes any files on that node. This is an irreversible action.

Removing Cloud SQL from the cluster:

  • Removes Cloud SQL cells from the cluster worker nodes
  • Terminates the query server node and deletes any files or work that you have on that host. (The VM is terminated.)
  • Has no impact on Hive metadata or the sources that Cloud SQL accesses.
  • Ends the billing for Cloud SQL. You no longer pay for Cloud SQL once it is removed.

To remove Cloud SQL from a cluster:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. To the right of the name of the cluster, click action menu, and select Remove Cloud SQL from the menu.
  4. In the Remove Cloud SQL dialog box, enter the cluster admin password and click Remove.

Terminating a Cluster

You can terminate any cluster.

Caution

Terminating a cluster deletes the cluster and removes all the data contained in local storage or block storage. This is an irreversible action.

To terminate the cluster:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. Click the action menu to the right of the name of the cluster you want to terminate, and select Terminate Big Data Cluster from the menu.
  4. In the Terminate Big Data Cluster dialog box, enter the name of the cluster and click Terminate.