Creating a Cluster

Create a cluster in Big Data Service.

Before you can create a cluster, you must have the following prerequisites:

To create a cluster, provide information about your network and make choices based on your network. To prepare for those questions, have the name of your network, its compartment, and its regional subnet name ready.

You make a cluster secure and highly available by setting an option when you create it. A secure cluster has the full Hadoop security stack, including HDFS Transparent Encryption, plus Kerberos and Apache Sentry.

You can't make an existing cluster secure and highly available if it wasn't created with those features, and you can't remove those features from an existing cluster.

    1. Open the navigation menu and click Analytics & AI. Under Data Lake, click Big Data Service.
    2. Under Compartment, select a compartment to host the cluster.
    3. Click Create cluster.
    4. In the Create cluster panel, enter the following information:
      • Cluster name: Enter a name to identify the cluster.

      • Cluster admin password: Enter a string to be used as the cluster password. You need this password to sign in to Apache Ambari or Cloudera Manager, depending on the cluster version, and to perform certain actions on the cluster through the Oracle Cloud Console.

      • Confirm cluster admin password: Reenter the password.

      • Secure & highly available (HA): Select this check box to make the cluster secure and highly available. A secure cluster has the full Hadoop security stack, including HDFS Transparent Encryption, Kerberos, and Apache Sentry. This setting can't be changed for the life of the cluster.

      • Kerberos realm name: This field appears when you select Secure & highly available (HA). The default value is BDSCLOUDSERVICE.ORACLE.COM, but you can provide a different value. Typically, the realm name is the same as your DNS domain name, except that the realm name is in uppercase. This convention helps differentiate problems with the Kerberos service from problems with the DNS namespace, while keeping a name that's familiar.

        A valid Kerberos realm name must consist of 2 to 32 ASCII characters and must be a combination of uppercase letters, numbers, dashes (-), and dots (.). The realm name must also start and end with uppercase letters. If you plan to integrate the Big Data Service cluster with an existing Active Directory server, you must ensure that the Kerberos realm name of the Big Data Service cluster is different from the DNS names of the Active Directory domains.

      • Cluster version: Select a version of the Hadoop distribution to use for the cluster.

      • Cluster profile: Select the cluster profile for the cluster (available for ODH 2.0 and ODH 1.0 versions only). For more information, see Cluster Profile Types. If you select Kafka, step 7 is required.
    5. In the Hadoop nodes section of the page, configure the types, shapes, and numbers of the compute instances (servers) to host the master and worker nodes of the cluster. For information about the choices you can make, see Understanding Instance Types and Shapes. Not all shapes are available by default, although you can request those not listed. For more information, see Service Limits.

      In the Master/Utility nodes section, provide the following details:

      • Choose instance type: Select Virtual machine or Bare metal to indicate what kind of compute instances you want for the master nodes.

      • Choose Master/Utility node shape: Select the shape for the master and utility nodes. For information about shapes, see Understanding Instance Types and Shapes. For more information about available shapes, see Service Limits.

      • Block storage size per master/utility node: Enter the block storage size, in gigabytes (GB), for each master and utility node.

      • Number of OCPUs: This option is available only if you select Virtual machine. For more information, see Service Limits.

      • Amount of memory: This option is available only if you select Virtual machine. Enter the memory size, in gigabytes (GB), for each node.

    6. In the Worker Nodes section, provide the following details:
      • Choose instance type: Select Virtual machine or Bare metal to indicate what kind of compute instances you want for the worker nodes.

      • Choose Worker node shape: Select the shape for the worker nodes. For more information about available shapes, see Service Limits.

      • Block storage size per worker node: Enter the block storage size, in gigabytes (GB), for each worker node.

      • Number of OCPUs: This option is available only if you select Virtual machine. For more information, see Service Limits.

      • Amount of memory: This option is available only if you select Virtual machine. Enter the memory size, in gigabytes (GB), for each node.

      • Number of Worker nodes: Enter the number of worker nodes for the cluster, with a minimum of 3 nodes.

    7. If you selected the Kafka cluster profile in step 4, provide the following details in the Kafka broker nodes section. Otherwise, this section isn't applicable.
      • Choose instance type: Select Virtual machine or Bare metal to indicate what kind of compute instances you want for the Kafka broker nodes.

      • Choose Kafka broker node shape: Select the shape for Kafka broker nodes. For more information about available shapes, see Service Limits.

      • Block storage size per Kafka broker node: Enter the block storage size, in gigabytes (GB), for each Kafka broker node.

      • Number of OCPUs: This option is available only if you select Virtual machine. For more information, see Service Limits.

      • Amount of memory: This option is available only if you select Virtual machine. Enter the memory size, in gigabytes (GB), for each node.

      • Number of Kafka broker nodes: Enter the number of Kafka broker nodes for the cluster, with a minimum of 3 for secure clusters and 1 for nonsecure clusters.

    8. In the Network settings section, provide the network details for the cluster.
      • Cluster private network: Enter a CIDR block for the private network of the cluster.

        The cluster private network is created in the Oracle tenancy (not the customer tenancy), and it's used solely for private communication among the nodes of the cluster. No other traffic travels over this network, it isn't accessible by outside hosts, and you can't change it after it's created. All ports are open on this network.

      • CIDR block: Enter a CIDR block to assign the range of contiguous IP addresses available for this private network, or accept the default 10.0.0.0/16. This CIDR block range can't overlap the CIDR block range in your customer network, described under Customer network.

      • Customer network: Enter information to add the cluster to your virtual cloud network (VCN) and a regional subnet in that VCN.

        • Choose VCN in <compartment>: Accept the current compartment, or click Change Compartment to select a different one. Then select the name of an existing VCN in that compartment to use for the cluster. The VCN must contain a regional subnet.

        • Choose regional subnet in <compartment>: Choose a regional subnet to use for the cluster.

          Important: If you plan to make any of the IP addresses in the subnet public (to allow access to a node from the internet), you must select a public subnet for the cluster. For more information, see VCNs and Subnets.

      • Networking Options: Select one of the following:
        • Deploy Oracle-managed service gateway and NAT gateway (Quick start): Select this option to simplify the network configuration by allowing Oracle to provide and manage these communication gateways. When you select this option, a service gateway and a Network Address Translation (NAT) gateway are deployed for private use by the cluster. These gateways are created in the Oracle tenancy and can't be changed after the cluster is created.
          • A NAT gateway enables nodes without public IP addresses to initiate connections to and receive responses from the internet but not to receive inbound connections initiated from the internet.
          • A service gateway enables nodes without public IP addresses to privately access Oracle services, without exposing the data to an internet gateway or a NAT gateway.

          Follow these guidelines:

          • Choose this option to give all nodes in the cluster private network full outbound access to the public internet. When you select this option, you can't limit that access in any way (for example, by restricting egress to only a few IP address ranges).

            If you select this option, your cluster can't use service gateways or NAT gateways in your customer network.

          • If you don't choose this option, you must create gateways in your customer VCN. When you do this, you can also create security rules to limit egress to specified IP address ranges.

          • If you map the private IP addresses of the cluster nodes to public IP addresses, then a NAT gateway isn't needed. For more information, see Map a Private IP Address to a Public IP Address.

        • Use the gateways in your selected customer VCN (customizable): Select this option to permit the cluster to use the gateways in your customer VCN. You must create and configure these resources yourself.
          Note: If you create your network by using one of the network creation wizards in the console, gateways are created for you, but you might need to configure them further to suit your needs.

    9. In the Encryption section, select one of the following:
      • Encrypt using Oracle-managed keys: Select this option to leave all encryption-related matters to Oracle.

      • Encrypt using customer-managed keys: Select this option if you have access to and want to use a valid customer-managed encryption key. Select the following values:

        • Vault in <compartment>: Accept the current compartment, or click Change Compartment to select a different compartment. Then select the name of an existing vault in that compartment.
        • Master encryption key in <compartment>: Accept the current compartment, or click Change Compartment to select a different compartment. Then select an existing master encryption key in that compartment.

          For information about creating and managing vaults, see Creating a Vault and Managing Vaults. For more information on creating and managing master encryption keys, see Creating a Master Encryption Key and Managing Keys.

    10. Under Additional options, provide the following details:
      • SSH public key: Enter an SSH public key in any of the following ways:

        • Select Choose SSH public key file, and then do one of the following:

          • Drag a public SSH key file into the box.

          • Click Select one, and then browse to and choose a public SSH key file on your local file system.

        • Select Paste SSH public key and paste the contents from a public SSH key file into the box.

      • Bootstrap script URL: Enter a publicly accessible URL of the bootstrap script. The script runs on all the cluster nodes after a cluster is created, when the shape of a cluster changes, or when you add or remove nodes from a cluster. You can use this script to install, configure, and manage custom components in a cluster.
    11. Tags: Enter tags as described in Tagging Overview.
    12. Click Create cluster.
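
Several rules from the steps above can be checked before you submit the form: the Kerberos realm name format, the non-overlapping CIDR blocks, and the minimum node counts. The following Python sketch is illustrative only. The function names are hypothetical and aren't part of Big Data Service or any Oracle SDK, and the service performs its own validation.

```python
import ipaddress
import re
from typing import List, Optional

# Realm rule from the text: 2 to 32 ASCII characters drawn from uppercase
# letters, digits, dashes, and dots, starting and ending with an uppercase letter.
_REALM_RE = re.compile(r"[A-Z][A-Z0-9.-]{0,30}[A-Z]")


def is_valid_realm(name: str) -> bool:
    """Return True if name satisfies the Kerberos realm name rule."""
    return _REALM_RE.fullmatch(name) is not None


def cidrs_overlap(cluster_cidr: str, customer_cidr: str) -> bool:
    """Return True if the cluster private network overlaps the customer network.

    Raises ValueError if either argument isn't a well-formed CIDR block.
    """
    cluster_net = ipaddress.ip_network(cluster_cidr, strict=True)
    customer_net = ipaddress.ip_network(customer_cidr, strict=True)
    return cluster_net.overlaps(customer_net)


def node_count_errors(
    workers: int, kafka_brokers: Optional[int] = None, secure: bool = False
) -> List[str]:
    """Collect violations of the minimum node counts stated in the steps.

    Pass kafka_brokers=None when the cluster has no Kafka broker nodes.
    """
    errors = []
    if workers < 3:
        errors.append("at least 3 worker nodes are required")
    if kafka_brokers is not None:
        minimum = 3 if secure else 1
        if kafka_brokers < minimum:
            errors.append(
                f"at least {minimum} Kafka broker node(s) are required"
            )
    return errors
```

For example, is_valid_realm("BDSCLOUDSERVICE.ORACLE.COM") returns True, and cidrs_overlap raises ValueError for a malformed block such as 10.0.0/16.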
  • Use the oci bds instance create command and required parameters to create a cluster.

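    oci bds instance create --compartment-id <compartment_id> --display-name <display_name> --cluster-version <cluster_version> --cluster-admin-password <cluster_admin_password> --cluster-public-key <cluster_public_key> --is-high-availability <true_or_false> --is-secure <true_or_false> --nodes <nodes_json> [OPTIONS]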

    For a complete list of flags and variable options for CLI commands, see the Command Line Reference for Big Data.

  • Use the CreateBdsInstance operation to create Big Data Service clusters.
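
If you script the oci bds instance create command, it can help to assemble the argument list in one place. The following Python sketch is one possible approach, not an official helper. The function name build_create_command and the nodes.json file name are hypothetical, and the flag names are assumptions that you should confirm against the Command Line Reference for Big Data. Because the console sets security and high availability together, the sketch derives both flags from a single value.

```python
from typing import List


def build_create_command(
    compartment_id: str,
    display_name: str,
    cluster_version: str,
    admin_password: str,
    public_key: str,
    nodes_file: str,
    secure_and_ha: bool,
) -> List[str]:
    """Build the argument list for an 'oci bds instance create' call.

    Flag names are assumptions; check them against the CLI reference.
    """
    # One switch drives both flags, mirroring the single console option.
    flag = "true" if secure_and_ha else "false"
    return [
        "oci", "bds", "instance", "create",
        "--compartment-id", compartment_id,
        "--display-name", display_name,
        "--cluster-version", cluster_version,
        "--cluster-admin-password", admin_password,
        "--cluster-public-key", public_key,
        "--is-high-availability", flag,
        "--is-secure", flag,
        # The node list is read from a JSON file through the CLI's file:// syntax.
        "--nodes", f"file://{nodes_file}",
    ]
```

The returned list can be passed to subprocess.run without a shell, which keeps the password and key out of shell quoting. See the Command Line Reference for the expected format of each value.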