Create a Cluster

To create a cluster, use the Oracle Big Data Cloud wizard as described in the following procedure.

Before You Begin

When you create a cluster, you may need to provide information about other resources, such as the following:

  • An SSH public/private key pair

    An SSH public key is used for authentication when you use an SSH client to connect to a node associated with the cluster. When you connect, you must provide the private key that matches the public key.

    You can have the wizard create a public/private key pair for you, or you can create one beforehand and upload or paste its public key value. If you want to create a key pair beforehand, you can use a standard SSH key generation tool. See Generate a Secure Shell (SSH) Public/Private Key Pair for instructions.

  • A cloud storage location (Optional on Oracle Cloud Infrastructure Classic)

    The type of location you specify depends on the infrastructure the cluster is built on:

    • Oracle Cloud Infrastructure: Data consumed and generated by Big Data Cloud is stored in an Oracle Cloud Infrastructure Object Storage bucket. You must create a storage bucket before you create a cluster. See Prerequisites for Oracle Platform Services in the Oracle Cloud Infrastructure documentation.

    • Oracle Cloud Infrastructure Classic: Data consumed and generated by Big Data Cloud is stored in the Oracle Cloud Infrastructure Object Storage Classic container associated with a cluster when the cluster is created. Job logs are also stored in Oracle Cloud Infrastructure Object Storage Classic. You can create the container beforehand and provide the wizard with information about it, or you can have the wizard create the container for you. If you want to create the container beforehand, see Creating Containers in Using Oracle Cloud Infrastructure Object Storage Classic for instructions.

Also, before you create a cluster, review the information in About Cluster Topology. The size of a cluster when it's first created determines the cluster's topology, and even though the cluster can be scaled up or down later, the underlying cluster topology that defines master services remains unchanged.

Tutorial (Oracle Cloud Infrastructure)

Tutorial (Oracle Cloud Infrastructure Classic)

Procedure

To create a cluster:

  1. Open the service console. See Access the Service Console for Big Data Cloud.

  2. Click Create Instance.

    The Create Instance wizard starts and the Instance page is displayed. For information about the details on this page, see Service Console Create Instance: Instance Page.

  3. On the Instance page, provide cluster information, then click Next to advance to the Service Details page.

    Element Description

    Instance Name

    Name for the new cluster. The name:

    • Must not exceed 30 characters.

    • Must start with a letter.

    • For IDCS-enabled clusters: Must contain only letters and numbers.

    • For non-IDCS-enabled clusters: Can contain hyphens. Hyphens are the only special characters you can use.

    • Must be unique within the identity domain.

    Description

    (Optional) Description for the new cluster.

    Notification Email

    (Optional) Email address that provisioning status updates should be sent to.

    Region

    (Displayed only if your account has multiple regions)

    The region for the cluster. If you choose a region that supports Oracle Cloud Infrastructure, the Availability Domain and Subnet fields are displayed and populated, and the cluster will be created on Oracle Cloud Infrastructure. Otherwise, those fields are not displayed and the cluster will be created on Oracle Cloud Infrastructure Classic.

    To create your cluster on Oracle Cloud Infrastructure, select us-phoenix-1, us-ashburn-1, eu-frankfurt-1, or uk-london-1 if those regions are available to you (which regions are displayed depends on which default data region was selected during the subscription process). If you select any other region, the cluster will be created on Oracle Cloud Infrastructure Classic.

    Select No Preference to let Big Data Cloud choose an Oracle Cloud Infrastructure Classic region for you.

    Availability Domain

    (Displayed only on Oracle Cloud Infrastructure)

    The availability domain (within the region) where the cluster will be placed.

    Subnet

    (Displayed only on Oracle Cloud Infrastructure)

    The subnet (within the availability domain) that will determine network access to the cluster.

    Select a subnet from a virtual cloud network (VCN) that you created previously on Oracle Cloud Infrastructure. Select No Preference to let Big Data Cloud choose a subnet for you.

    IP Network

    (Not available on Oracle Cloud Infrastructure)

    (Available only if you have selected a region and you have defined one or more IP networks created in that region using Oracle Cloud Infrastructure Compute Classic.)

    Select the IP network where you want the cluster placed. Choose No Preference to use the default shared network provided by Oracle Cloud Infrastructure Compute Classic.

    For more information about IP networks, see About IP Networks and Creating an IP Network in Using Oracle Cloud Infrastructure Compute Classic.

    Metering Frequency

    (Displayed only if you have a traditional metered subscription)

    Metering frequency used to determine the billing for resources used by the cluster.

    Tags

    (Not available on Oracle Cloud at Customer)

    (Optional) Select existing tags or add tags to associate with the cluster.

    To select existing tags, select one or more check boxes from the list of tags that are displayed on the drop-down menu. If no tags are displayed, then no tags have been created.

    To create tags, click the Click to create a tag icon (plus sign) to display the Create Tags dialog box. In the New Tags field, enter one or more comma-separated tags. Each tag can be a key or a key:value pair.

    If you do not assign tags during provisioning, you can create and manage tags after the cluster is created. See Create, Assign, and Unassign Tags.
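The Instance Name rules listed in step 3 can be checked before you open the wizard. The following Python sketch is illustrative only and is not part of Big Data Cloud. The function name `validate_instance_name` is hypothetical, "letters" is assumed to mean ASCII letters, and uniqueness within the identity domain cannot be checked locally, so it is left out.

```python
import re
from typing import List

MAX_NAME_LENGTH = 30  # limit stated in the Instance Name rules


def validate_instance_name(name: str, idcs_enabled: bool) -> List[str]:
    """Return the rules the name violates; an empty list means it passed.

    Assumption: "letters" means ASCII letters A-Z and a-z.
    Uniqueness within the identity domain is not checked here.
    """
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("must not exceed 30 characters")
    if not re.match(r"[A-Za-z]", name):
        problems.append("must start with a letter")
    if idcs_enabled:
        # IDCS-enabled clusters: letters and numbers only.
        if not re.fullmatch(r"[A-Za-z0-9]*", name):
            problems.append("must contain only letters and numbers")
    else:
        # Non-IDCS clusters: hyphens are the only special characters allowed.
        if not re.fullmatch(r"[A-Za-z0-9-]*", name):
            problems.append("must contain only letters, numbers, and hyphens")
    return problems


if __name__ == "__main__":
    print(validate_instance_name("analytics-01", idcs_enabled=True))
    print(validate_instance_name("analytics-01", idcs_enabled=False))
```

Returning a list of violations rather than a single Boolean lets a script report every problem with a proposed name at once.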

  4. On the Service Details page, complete the Cluster Configuration section. For information about the details on this page, see Service Console Create Instance: Service Details Page.

    Element Description

    Deployment Profile

    Type of cluster you want to create based on its intended use. Deployment profiles are predefined sets of services optimized for specific uses. The deployment profile can’t be changed after the cluster is created.

    Choices are:

    • Full: (default) Provisions the cluster with Spark, Spark Thrift, Zeppelin, MapReduce, Hive, Alluxio, and Ambari Metrics. Use this profile if you want all of the features of Big Data Cloud.

    • Basic: Subset of the Full profile. Provisions the cluster with Spark, Zeppelin, MapReduce, and Ambari Metrics. Use this profile if you don’t need all of the features of Big Data Cloud and just want to run Spark or MapReduce jobs and use Notebooks. This profile does not include Alluxio (the in-memory cache), or Hive or JDBC connectivity for BI tools.

    • Snap: Provisions the cluster with SNAP, Spark, and Zeppelin. Once the SNAP cluster is provisioned, the SNAP service is started and can be viewed in the Ambari user interface. The SNAP service is started only on the master node. All lifecycle operations (start/stop/restart) can be performed on the SNAP service using Ambari. See Access Big Data Cloud Using Ambari. SNAP clusters can only be used for the SNAP application and cannot be used for general-purpose Spark processing. Use the Full or Basic profile for general-purpose Spark processing. For information about SNAP, see the SNAP documentation.

    Number of Nodes

    Number of nodes to be allocated to the cluster. Specify three or more nodes to provide high availability (HA), with multiple master nodes. If fewer than three nodes are specified, one node will be the master node with all critical services running on the same node in non-HA mode.

    Compute Shape

    Number of Oracle Compute Units (OCPUs) and amount of memory (RAM) for each node of the new cluster. Big Data Cloud offers many OCPU/RAM combinations.

    Queue Profile

    YARN capacity scheduler queue profile. Defines how queues and workloads are managed. Also determines which queues are created and available by default when the cluster is created. See Manage Work Queue Capacity.

    Note: The preemption setting can’t be changed after the cluster is created.

    • Preemption Off: Jobs can't consume more resources than a specific queue allows.

    • Preemption On: Jobs can consume more resources than a queue allows, but could lose those resources when another job comes in that has priority for those resources. If preemption is on, higher-priority applications don’t have to wait because lower-priority applications have taken up the available capacity.

    Spark Version

    Spark version to be deployed on the cluster: Spark 1.6 or Spark 2.1.

    Note: Oracle R Advanced Analytics for Hadoop (ORAAH) is installed for Spark 1.6 clusters only.
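The relationship between Number of Nodes and high availability described in step 4 reduces to a single threshold. The sketch below restates it in Python as a planning aid. The function name `master_mode` and the returned labels are hypothetical and do not correspond to any Big Data Cloud API.

```python
def master_mode(node_count: int) -> str:
    """Return "HA" or "non-HA" for a planned cluster size.

    Three or more nodes give multiple master nodes (HA). Fewer than three
    nodes give a single master node running all critical services (non-HA).
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return "HA" if node_count >= 3 else "non-HA"


if __name__ == "__main__":
    for planned_size in (1, 2, 3, 5):
        print(planned_size, master_mode(planned_size))
```

Because the topology is fixed at creation time, it is worth running this kind of check on the initial size rather than on a size you plan to scale to later.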

  5. On the Service Details page, complete the Credentials section. The user name and password credentials are used to log in to the cluster and run jobs.

    Element Description

    Use Identity Cloud Service to login to the console

    (Not available on Oracle Cloud Infrastructure)

    (Not displayed for all user accounts)

    Select this to use IDCS as the client authentication mechanism for the cluster. Users will access the cluster with their own IDCS identity and credentials.

    When this option is selected, cluster users and cluster access are managed through IDCS. If this option is not selected, HTTP Basic authentication is used and users access the cluster with the shared administrative user name and password specified below. For more information about cluster authentication, see Use Identity Cloud Service for Cluster Authentication.

    SSH Public Key

    Edit

    The SSH public key to be used for authentication when using an SSH client to connect to a node associated with your cluster.

    Click Edit to specify the public key. You can upload a file containing the public key value, paste in the value of a public key, or have the wizard generate a key pair for you.

    If you paste in the value, make sure the value does not contain line breaks or end with a line break.

    If you have the wizard generate a key pair for you, make sure you download the zip file containing the keys that the wizard generated.

    User Name

    Administrative user name. The user name cannot be admin.

    For clusters that use Basic authentication, the administrative user name and password are used to access the cluster console, REST APIs, and Apache Ambari.

    For clusters that use IDCS for authentication, the administrative user name and password are used only to access Ambari. Cluster access is managed through IDCS.

    Password

    Confirm Password

    Password of the user specified in User Name.
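If you paste the SSH public key value in step 5, it must not contain line breaks or end with one. The following Python sketch shows one way to prepare the contents of a public key file for pasting. The function name `prepare_public_key_for_paste` is hypothetical, and the key text in the example is a shortened placeholder, not a real key.

```python
def prepare_public_key_for_paste(file_text: str) -> str:
    """Strip trailing line breaks and reject values with embedded ones."""
    value = file_text.rstrip("\r\n")
    if "\n" in value or "\r" in value:
        raise ValueError("public key value contains a line break")
    return value


if __name__ == "__main__":
    # Placeholder key text for illustration only.
    raw = "ssh-rsa AAAAB3Nza user@host\n"
    print(repr(prepare_public_key_for_paste(raw)))
```

The sketch raises an error for embedded line breaks instead of silently joining the lines, because a wrapped key may also have been altered in other ways when it was copied.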

  6. On the Service Details page, complete the Associations section by selecting the services you’d like to associate with this cluster.

    You can associate a Big Data Cloud cluster with other Oracle Cloud services you've already provisioned. When you associate a cluster with another service, networking between the service instances is reconfigured so the instances can communicate with one another. This is helpful if you have Apache Spark jobs that require interaction between services or have some dependency. To associate a cluster with a service, you must already have an active subscription for that service.

  7. On the Service Details page, complete the Cloud Storage Credentials section. The fields in this section are different depending on whether the cluster is being created on Oracle Cloud Infrastructure or on Oracle Cloud Infrastructure Classic.

    On Oracle Cloud Infrastructure, provide the following information. Oracle Cloud Infrastructure Object Storage is used for object storage.

    Note:

    Oracle Big Data Cloud uses the native Oracle Cloud Infrastructure object storage API rather than the Swift API. As such, an API signing key is required for authentication to Oracle Cloud Infrastructure Object Storage, not a Swift user name and password.

    Element Description

    OCI Cloud Storage URL

    The Oracle Cloud Infrastructure Object Storage URL. For example:

    https://objectstorage.us-phoenix-1.oraclecloud.com

    For information about the object storage URL, see REST APIs in the Oracle Cloud Infrastructure documentation.

    OCI Cloud Storage Bucket URL

    The URL of an existing bucket in Oracle Cloud Infrastructure Object Storage.

    Format:

    oci://bucket@namespace/, where bucket is the default bucket where application binaries and application logs are stored, and namespace is your namespace.

    Note: The bucket URL must have a trailing slash. If it doesn’t, provisioning will fail.

    OCI Cloud Storage User OCID

    The Oracle Cloud Infrastructure Object Storage User OCID. See Where to Get the Tenancy's OCID and User's OCID in the Oracle Cloud Infrastructure documentation.

    OCI Cloud Storage PEM Key

    The Oracle Cloud Infrastructure Object Storage PEM key. This must be generated. See How to Generate an API Signing Key in the Oracle Cloud Infrastructure documentation.

    Note: In Big Data Cloud, the PEM key must be created without a password.

    OCI Cloud Storage PEM Key Fingerprint

    The Oracle Cloud Infrastructure Object Storage PEM key fingerprint. This must be generated. See How to Generate an API Signing Key in the Oracle Cloud Infrastructure documentation.
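Because a missing trailing slash in the bucket URL causes provisioning to fail, it can help to build or check the value programmatically. The Python sketch below follows the oci://bucket@namespace/ format given for OCI Cloud Storage Bucket URL. The function names and the sample bucket and namespace are hypothetical, and the pattern assumes that neither part contains "@", "/", or whitespace.

```python
import re

# Assumption: bucket and namespace contain no "@", "/", or whitespace.
_BUCKET_URL_PATTERN = re.compile(r"oci://[^@/\s]+@[^@/\s]+/")


def build_bucket_url(bucket: str, namespace: str) -> str:
    """Build a bucket URL in the oci://bucket@namespace/ format."""
    if not bucket or not namespace:
        raise ValueError("bucket and namespace are both required")
    return f"oci://{bucket}@{namespace}/"


def is_valid_bucket_url(url: str) -> bool:
    """Check the format, including the required trailing slash."""
    return _BUCKET_URL_PATTERN.fullmatch(url) is not None


if __name__ == "__main__":
    url = build_bucket_url("mybucket", "mynamespace")  # sample values
    print(url, is_valid_bucket_url(url))
    print(is_valid_bucket_url("oci://mybucket@mynamespace"))  # no slash
```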

    On Oracle Cloud Infrastructure Classic, provide the following information. Oracle Cloud Infrastructure Object Storage Classic is used for object storage.

    Element Description

    Cloud Storage Container

    Name of an existing Oracle Cloud Infrastructure Object Storage Classic container to be associated with the cluster, or a new one to be created. The container is used for writing application logs and reading application JARs and other supporting files.

    You must enter the complete (fully qualified) REST URL for Oracle Cloud Infrastructure Object Storage Classic, followed by the container name.

    Format:

    rest_endpoint_url/containerName

    You can find the REST endpoint URL of the Oracle Cloud Infrastructure Object Storage Classic service instance in the Infrastructure Classic Console. See Finding the REST Endpoint URL for Your Cloud Account in Using Oracle Cloud Infrastructure Object Storage Classic.

    Example:

    https://acme.storage.oraclecloud.com/v1/MyService-acme/MyContainer

    The same formatting requirement applies to the cloudStorageContainer attribute in the REST API.

    User Name

    User name of the user who has access to the specified Oracle Cloud Infrastructure Object Storage Classic container.

    Password

    Password of the user specified in User Name.

    Create Cloud Storage Container

    Select this to create a new Oracle Cloud Infrastructure Object Storage Classic container as part of cluster creation. Specify the container name and the user name and password in the preceding fields.

    The user specified in User Name and Password must have the privileges needed to create storage containers.

    If you select this option, the new storage container is created when you click Next on the Service Details page, and the storage container remains even if you cancel out of the wizard without creating a new cluster. If this happens, you can use the container in the future or manually delete it. See Deleting Containers in Using Oracle Cloud Infrastructure Object Storage Classic.
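The Cloud Storage Container value on Oracle Cloud Infrastructure Classic is the REST endpoint URL followed by a slash and the container name. The Python sketch below assembles that value and uses the example endpoint and container from this topic. The function name `build_container_url` is hypothetical, and the checks on the container name are an assumption made for the sketch rather than a documented rule.

```python
def build_container_url(rest_endpoint_url: str, container_name: str) -> str:
    """Join a REST endpoint URL and a container name with a single slash."""
    if not rest_endpoint_url or not container_name:
        raise ValueError("endpoint URL and container name are both required")
    if "/" in container_name:
        # Assumption for this sketch: the name is a single path segment.
        raise ValueError("container name must not contain a slash")
    return rest_endpoint_url.rstrip("/") + "/" + container_name


if __name__ == "__main__":
    endpoint = "https://acme.storage.oraclecloud.com/v1/MyService-acme"
    print(build_container_url(endpoint, "MyContainer"))
```

Stripping any trailing slash from the endpoint before joining avoids a doubled slash when the endpoint is copied from the console with one already attached.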

  8. On the Service Details page, complete the Block Storage Settings section, then click Next to advance to the Confirmation page.

    Element Description

    Use High Performance Storage

    (Not available on Oracle Cloud at Customer or Oracle Cloud Infrastructure)

    Select this to use high performance storage for HDFS. With this option the storage attached to nodes uses SSDs (solid state drives) instead of HDDs (hard disk drives). Use this option for performance-critical workloads. An additional cost is associated with this type of storage.

    Usable HDFS Storage (GB)

    Amount of HDFS storage to be allocated to the cluster.

    Usable BDFS Cache (GB)

    Amount of storage the Big Data File System (BDFS) will use as a cache to accelerate workloads. The total amount of cache provided by BDFS is the sum of RAM allocated to BDFS plus the total block storage allocated for spillover.

    The amount of memory allocated to BDFS is based on the compute shape selected for the cluster. For details about BDFS and memory allocation, see the information about BDFS Tiered Storage in About the Big Data File System (BDFS).

    Total Allocated Storage (GB)

    Total allocated storage for the cluster. You’re billed for this amount.
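The total BDFS cache described under Usable BDFS Cache (GB) is the RAM allocated to BDFS plus the block storage allocated for spillover. The Python sketch below restates that sum. The function name and the sample figures are hypothetical, and the actual RAM allocation depends on the compute shape, as described in About the Big Data File System (BDFS).

```python
def total_bdfs_cache_gb(ram_allocated_gb: float, spillover_block_gb: float) -> float:
    """Total BDFS cache = RAM allocated to BDFS + block storage for spillover."""
    if ram_allocated_gb < 0 or spillover_block_gb < 0:
        raise ValueError("storage amounts cannot be negative")
    return ram_allocated_gb + spillover_block_gb


if __name__ == "__main__":
    # Sample figures for illustration only, not values from the service.
    print(total_bdfs_cache_gb(ram_allocated_gb=16, spillover_block_gb=100))
```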

  9. On the Confirmation page, review the information listed. If you're satisfied with what you see, click Create to create the cluster.

    If you need to change something, click Previous at the top of the wizard to step back through the pages, or click Cancel to cancel out of the wizard without creating a new cluster.