Service Console Create Instance: Service Details Page

You can use the Create Instance: Service Details page to provide more details about the new Oracle Big Data Cloud cluster that you are about to create.

What You See in the Navigation Area

Element Description

< Previous

Click to navigate to the Create Instance: Instance page.

Cancel

Click to cancel creating an Oracle Big Data Cloud cluster.

Next >

Click to navigate to the Create Instance: Confirmation page.

Selection Summary

Click to see service details.

What You See in the Cluster Configuration Section

Element Description

Deployment Profile

Deployment profile for the cluster, based on its intended use. Deployment profiles are predefined sets of services optimized for specific uses. The deployment profile can’t be changed after the cluster is created. Choices are:

  • Full: (default) Provisions the cluster with Spark, Spark Thrift, Zeppelin, MapReduce, Hive, Alluxio, and Ambari Metrics. Use this profile if you want all of the features of Big Data Cloud.

  • Basic: Subset of the Full profile. Provisions the cluster with Spark, Zeppelin, MapReduce, and Ambari Metrics. Use this profile if you don’t need all of the features of Big Data Cloud and just want to run Spark or MapReduce jobs and use Notebooks. This profile does not include Alluxio (the in-memory cache), Hive, or JDBC connectivity for BI tools.

  • Snap: Provisions the cluster with SNAP, Spark, and Zeppelin. Once the SNAP cluster is provisioned, the SNAP service is started and can be viewed in the Ambari user interface. The SNAP service is started only on the master node. All lifecycle operations (start/stop/restart) can be performed on the SNAP service using Ambari. See Access Big Data Cloud Using Ambari. SNAP clusters can only be used for the SNAP application and cannot be used for general-purpose Spark processing. Use the Full or Basic profile for general-purpose Spark processing. For information about SNAP, see the SNAP documentation.

Number of Nodes

Total number of nodes to be allocated to the cluster.

Choosing three or more nodes provides high availability (HA) with multiple master nodes. If you choose fewer than three nodes, a single master node runs all critical services in non-HA mode.

Any node beyond the first four that is not designated as a compute-only slave node runs as a combined compute and storage node.
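
To make the sizing rules concrete, the following Python sketch assigns roles for a given cluster size. It is a hypothetical helper, not part of the service: the number of master nodes in HA mode is assumed to be three, and the role of non-master nodes among the first four is simplified.

# Illustrative sketch of the node-role rules described above.
# Assumptions: HA mode uses three master nodes; non-master nodes
# among the first four run as compute + storage.
def node_roles(total_nodes, compute_only=()):
    masters = 3 if total_nodes >= 3 else 1   # HA needs 3+ nodes
    roles = []
    for n in range(1, total_nodes + 1):
        if n <= masters:
            roles.append((n, "master"))
        elif n > 4 and n in compute_only:
            roles.append((n, "compute only"))
        else:
            roles.append((n, "compute + storage"))
    return roles

print(node_roles(5, compute_only={5}))
# [(1, 'master'), (2, 'master'), (3, 'master'),
#  (4, 'compute + storage'), (5, 'compute only')]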

Compute Shape

Number of Oracle Compute Units (OCPUs) and amount of memory (RAM) for each node of the new cluster. Big Data Cloud offers many OCPU/RAM combinations.

Queue Profile

YARN capacity scheduler queue profile. Defines how queues and workloads are managed. Also determines which queues are created and available by default when the cluster is created. See Manage Work Queue Capacity.

The queue profile defines job queues appropriate for different types of workloads. Each queue has a minimum guaranteed capacity and a maximum allowed capacity. The preemption setting, described below, can’t be changed after the cluster is created.

  • Preemption Off: Jobs can't consume more resources than their queue allows. This can lead to lower overall cluster utilization.

  • Preemption On: Jobs can consume more resources than their queue allows, but can lose those resources when another job with priority for them is submitted.

    If preemption is on, jobs submitted to one queue do not have to wait because jobs from another queue have taken up the available cluster capacity. If the cluster is otherwise unused, jobs from any queue can use 100% of the cluster capacity, which leads to better cluster utilization (see the sketch below).
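
The following toy Python model illustrates these semantics. It is not the actual YARN capacity scheduler; capacities are expressed as fractions of total cluster resources.

# Toy model of the preemption semantics described above; not the
# actual YARN capacity scheduler.
def usable_capacity(guaranteed, maximum, free, preemption_on):
    # Capacity a queue's jobs can use right now, as a cluster fraction.
    if preemption_on:
        # Jobs may borrow any idle capacity, up to the whole cluster,
        # but borrowed resources can be preempted later.
        return min(1.0, guaranteed + free)
    # Preemption off: jobs are hard-capped at the queue's maximum.
    return min(maximum, guaranteed + free)

# A queue guaranteed 30% with a 50% cap, on an otherwise idle cluster:
print(usable_capacity(0.3, 0.5, 0.7, preemption_on=False))  # 0.5
print(usable_capacity(0.3, 0.5, 0.7, preemption_on=True))   # 1.0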

Spark Version

Spark version to be deployed on the cluster, Spark 1.6 or 2.1.

Note: Oracle R Advanced Analytics for Hadoop (ORAAH) is installed for Spark 1.6 clusters only.

What You See in the Credentials Section

Element Description

Use Identity Cloud Service to login to the console

(Not available on Oracle Cloud Infrastructure)

(Not displayed for all user accounts)

Select this option to use Oracle Identity Cloud Service (IDCS) as the client authentication mechanism for the cluster. Users access the cluster with their own IDCS identities and credentials.

When this option is selected, cluster users and cluster access are managed through IDCS. If this option is not selected, HTTP Basic authentication is used and users access the cluster with the shared administrative user name and password specified below. For more information about cluster authentication, see Use Identity Cloud Service for Cluster Authentication.

SSH Public Key

The SSH public key to be used for authentication when using an SSH client to connect to a compute node of the new cluster.

Click Edit to specify the public key. You can upload a file containing the public key value, paste in the value of a public key, or create a system-generated key pair.

If you paste in the value, make sure the value does not contain line breaks or end with a line break.
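
If you prepare the value programmatically, you can normalize it first. A minimal Python sketch (the file name is an example):

# Sketch: ensure an OpenSSH public key value is a single line with
# no trailing line break, as required when pasting it.
def normalize_pubkey(raw):
    key = raw.strip()                  # drop any trailing line break
    if "\n" in key:
        raise ValueError("public key value must be a single line")
    return key

with open("id_rsa.pub") as f:          # example file name
    print(normalize_pubkey(f.read()), end="")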

User Name

Administrative user name. The user name cannot be admin.

For clusters that use Basic authentication, the administrative user name and password are used to access the cluster console, REST APIs, and Apache Ambari.

For clusters that use IDCS for authentication, the administrative user name and password are used only to access Ambari. Cluster access is managed through IDCS.
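
For a cluster that uses Basic authentication, REST calls carry these credentials using HTTP Basic authentication. A minimal Python sketch; the host, path, and credentials below are placeholders, not a documented endpoint:

# Minimal sketch: calling a cluster REST endpoint with HTTP Basic
# authentication. Host, path, and credentials are placeholders.
import requests

resp = requests.get(
    "https://cluster-host/api/endpoint",   # placeholder URL
    auth=("bdcs_admin", "Pa55word#"),      # User Name / Password from above
)
resp.raise_for_status()
print(resp.status_code)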

Password

Confirm Password

Password of the user specified in User Name. The password (see the validation sketch after this list):

  • Must be between 8 and 30 characters.

  • Must contain at least one lowercase letter.

  • Must contain at least one uppercase letter.

  • Must contain at least one number.

  • Must contain at least one special character.
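
A candidate password can be checked against these rules up front, as in the following Python sketch. Treating any non-alphanumeric character as special is an assumption; the console defines the exact set.

import re

# Sketch of the password rules listed above. Treating any
# non-alphanumeric character as "special" is an assumption.
def valid_password(pw):
    return (8 <= len(pw) <= 30
            and re.search(r"[a-z]", pw) is not None
            and re.search(r"[A-Z]", pw) is not None
            and re.search(r"[0-9]", pw) is not None
            and re.search(r"[^A-Za-z0-9]", pw) is not None)

print(valid_password("Pa55word#"))  # True
print(valid_password("password"))   # False: no uppercase, digit, or special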

What You See in the Associations Section

This section allows you to associate your new Oracle Big Data Cloud cluster with other cloud services, such as Oracle Event Hub Cloud Service, MySQL Cloud Service, and Oracle Database Cloud Service.

Select the cloud service that you want to associate with your Oracle Big Data Cloud cluster.

What You See in the Cloud Storage Credentials Section

The fields in this section are different depending on whether the cluster is being created on Oracle Cloud Infrastructure or on Oracle Cloud Infrastructure Classic.

Element Description

(Oracle Cloud Infrastructure)

OCI Cloud Storage URL

The Oracle Cloud Infrastructure Object Storage URL. For example:

https://objectstorage.us-phoenix-1.oraclecloud.com

For information about the object storage URL, see REST APIs in the Oracle Cloud Infrastructure documentation.
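
The example above follows the regional pattern https://objectstorage.<region>.oraclecloud.com. A trivial Python sketch composing it (the region name is an example):

# Sketch: compose the regional Object Storage URL.
def object_storage_url(region):
    return "https://objectstorage." + region + ".oraclecloud.com"

print(object_storage_url("us-phoenix-1"))
# https://objectstorage.us-phoenix-1.oraclecloud.com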

(Oracle Cloud Infrastructure)

OCI Cloud Storage Bucket URL

The URL of an existing bucket in Oracle Cloud Infrastructure Object Storage.

Format:

oci://bucket@namespace/, where bucket is the default bucket where application binaries and application logs are stored, and namespace is your namespace.

Note: The bucket URL must have a trailing slash. If it doesn’t, provisioning will fail.
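
Because the missing slash is an easy mistake, the value can be built defensively. A Python sketch (the bucket and namespace names are examples):

# Sketch: build the oci://bucket@namespace/ value with the trailing
# slash that provisioning requires.
def bucket_url(bucket, namespace):
    return "oci://" + bucket + "@" + namespace + "/"

print(bucket_url("bdc-logs", "mytenancy"))  # oci://bdc-logs@mytenancy/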

(Oracle Cloud Infrastructure)

OCI Cloud Storage User OCID

The Oracle Cloud Infrastructure Object Storage User OCID. See Where to Get the Tenancy's OCID and User's OCID in the Oracle Cloud Infrastructure documentation.

(Oracle Cloud Infrastructure)

OCI Cloud Storage PEM Key

The Oracle Cloud Infrastructure Object Storage PEM key. This must be generated. See How to Generate an API Signing Key in the Oracle Cloud Infrastructure documentation.

Note: In Big Data Cloud, the PEM key must be created without a password.

(Oracle Cloud Infrastructure)

OCI Cloud Storage PEM Key Fingerprint

The Oracle Cloud Infrastructure Object Storage PEM key fingerprint. This must be generated. See How to Generate an API Signing Key in the Oracle Cloud Infrastructure documentation.
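
Both the key and its fingerprint can also be generated programmatically. A Python sketch using the third-party cryptography package; the 2048-bit key size and the fingerprint form (colon-separated MD5 digest of the DER-encoded public key) are assumptions based on common OCI API-key practice:

# Sketch: generate an unencrypted RSA PEM key and an OCI-style
# fingerprint. Requires the third-party "cryptography" package.
import hashlib
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pem = key.private_bytes(
    serialization.Encoding.PEM,
    serialization.PrivateFormat.TraditionalOpenSSL,
    serialization.NoEncryption(),          # no password, as required above
)
der = key.public_key().public_bytes(
    serialization.Encoding.DER,
    serialization.PublicFormat.SubjectPublicKeyInfo,
)
digest = hashlib.md5(der).hexdigest()
print(pem.decode())
print(":".join(digest[i:i + 2] for i in range(0, len(digest), 2)))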

(Oracle Cloud Infrastructure Classic)

Cloud Storage Container

The name of the Oracle Cloud Infrastructure Object Storage Classic container that is associated with this cluster. The Oracle Cloud Infrastructure Object Storage Classic container is where the job logs are pushed upon completion.

You must enter the complete (fully qualified) REST URL for Oracle Cloud Infrastructure Object Storage Classic, appended by the container name.

Format:

rest_endpoint_url/containerName

You can find the REST endpoint URL of the Oracle Cloud Infrastructure Object Storage Classic service instance in the Infrastructure Classic Console. See Finding the REST Endpoint URL for Your Cloud Account in Using Oracle Cloud Infrastructure Object Storage Classic.

Example:

https://acme.storage.oraclecloud.com/v1/MyService-acme/MyContainer

The same formatting requirement applies to the cloudStorageContainer attribute in the REST API.
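
A trivial Python sketch composing the value, using the names from the example above:

# Sketch: compose the fully qualified Cloud Storage Container value.
rest_endpoint_url = "https://acme.storage.oraclecloud.com/v1/MyService-acme"
container_name = "MyContainer"
print(rest_endpoint_url.rstrip("/") + "/" + container_name)
# https://acme.storage.oraclecloud.com/v1/MyService-acme/MyContainer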

(Oracle Cloud Infrastructure Classic)

User Name

The user name of an Oracle Cloud user who has access to the container specified in Cloud Storage Container.

(Oracle Cloud Infrastructure Classic)

Password

The password of the user specified in User Name.

(Oracle Cloud Infrastructure Classic)

Create Cloud Storage Container

Select this option if you do not have an Oracle Cloud Infrastructure Object Storage Classic container or if you do not want to reuse an existing container.

Specify the credentials above and then select the Create Cloud Storage Container check box to create a new Oracle Cloud Infrastructure Object Storage Classic container with those credentials.

What You See in the Block Storage Settings Section

Element Description

Use High Performance Storage

(Not available on Oracle Cloud at Customer or Oracle Cloud Infrastructure)

Select this option to use high-performance storage for HDFS. With this option, the storage attached to the cluster nodes uses solid state drives (SSDs) instead of hard disk drives (HDDs). Use this option for performance-critical workloads. An additional cost is associated with this type of storage.

Usable HDFS Storage (GB)

The amount of storage in GB for HDFS.

Oracle Big Data Cloud uses a replication factor of 2 for HDFS, so the usable HDFS storage is roughly half of the total allocated storage.
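
For example, a cluster with 1000 GB of total allocated storage provides roughly 500 GB of usable HDFS storage.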

Usable BDFS Cache (GB)

The amount of storage in GB that the Big Data File System (BDFS) will use as a cache to accelerate workloads. The total amount of cache provided by BDFS is the sum of the RAM allocated to BDFS and the total block storage allocated for spillover.

The amount of memory allocated to BDFS is based on the compute shape selected when the cluster was created. For details about BDFS and memory allocation, see the information about BDFS Tiered Storage in About the Big Data File System (BDFS).
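
For example (illustrative numbers), if the selected shape allocates 64 GB of RAM to BDFS and 200 GB of block storage is allocated for spillover, the total BDFS cache is 264 GB.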

Total Allocated Storage (GB)

The amount of raw block storage in GB that will be allocated to the new cluster.