Create a Job

Use the following procedure to create and run a job. When you finish creating a job, it is automatically submitted for execution and runs once the cluster has enough capacity.

To create a job:
  1. Open the cluster console for the desired cluster. See Access the Big Data Cloud Console.
  2. Click Jobs.

    The Spark Jobs page is displayed, listing any jobs associated with the cluster. For information about the details on this page, see Big Data Cloud Console: Jobs Page.

    The Zeppelin entry represents a running Apache Spark job used for notebooks. Apache Zeppelin is the notebook interface and coding environment for Big Data Cloud.

  3. Click New Job.

    The New Job wizard starts and the Details page is displayed.

  4. On the Details page, specify the following and then click Next to advance to the Configuration page.
    • Name: Name for the job.

    • Description: (Optional) Description for the job.

    • Type: Type of job: Spark, Python Spark, or MapReduce. For Spark job submissions, the application can be written in any language as long as it can be executed on the Java Virtual Machine (JVM). For more information about submitting MapReduce jobs, see About MapReduce Jobs. A minimal Python Spark driver is sketched after this list.

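    For illustration only, the following is a minimal sketch of what a Python Spark driver might look like. The file name, paths, and logic are hypothetical; packaging and upload requirements are described on the Driver File page later in this procedure.

      # word_count.py -- hypothetical example of a driver for a Python Spark job
      import sys
      from pyspark.sql import SparkSession

      if __name__ == "__main__":
          # The cluster supplies the master; the application only names itself.
          spark = SparkSession.builder.appName("WordCountExample").getOrCreate()

          # The input path is passed as a job argument (see the Driver File page).
          input_path = sys.argv[1] if len(sys.argv) > 1 else "hdfs:///tmp/input.txt"

          lines = spark.read.text(input_path)   # single column named "value"
          words = lines.selectExpr("explode(split(value, ' ')) AS word")
          words.groupBy("word").count().show()

          spark.stop()
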
  5. On the Configuration page, configure the driver, executor, and queue settings for the job, then click Next to advance to the Driver File page. A sketch after this list shows Spark properties that roughly correspond to these settings.

    Note: For MapReduce jobs, you specify only the queue on the Configuration page.

    • Driver Cores: Number of CPU cores assigned to the Spark driver process.

    • Driver Memory: Amount of memory assigned to the Spark driver process, in GB or MB. This value cannot exceed the memory available on the driver host, which is dependent on cluster shape. Some memory is reserved for supporting processes.

    • Executor Cores: Number of CPU cores made available for each Spark executor.

    • Executor Memory: Amount of memory made available for each Spark executor, in GB or MB.

    • No. of Executors: Number of Spark executor processes used to execute the job.

    • Queue: Name of the resource queue to which the job is submitted. When a cluster is created, a set of queues is also created and configured by default. Which queues are created depends on the queue profile specified when the cluster was created and on whether preemption was set to Off or On. The preemption setting can't be changed after the cluster is created.

      If preemption was set to Off (disabled), the following queues are available by default:
      • dedicated: Queue used for all REST API and Zeppelin job submissions. Default capacity is 80, with a maximum capacity of 80.

      • default: Queue used for all Hive and Spark Thrift job submissions. Default capacity is 20, with a maximum capacity of 20.

      If preemption was set to On (enabled), the following queues are available by default:
      • api: Queue used for all REST API job submissions. Default capacity is 50, with a maximum capacity of 100.

      • interactive: Queue used for all Zeppelin job submissions. Default capacity is 40, with a maximum capacity of 100. To allocate more of the cluster's resources to Notebook, increase this queue's capacity.

      • default: Queue used for all Hive and Spark Thrift job submissions. Default capacity is 10, with a maximum capacity of 100.

      In addition to the queues created by default, you can also create and use custom queues. See Create Work Queues.

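    The Configuration page settings above correspond roughly to standard Spark-on-YARN properties. The console applies the settings for you at submission time, so you don't normally set them in application code; the sketch below only names the assumed equivalents, with example values.

      # Hypothetical mapping of the Configuration page fields to standard Spark
      # properties (shown for reference; in practice these values are supplied
      # at submission time by the console, not set inside the application).
      from pyspark.sql import SparkSession

      spark = (
          SparkSession.builder
          .appName("ConfiguredJob")
          .config("spark.driver.cores", "1")        # Driver Cores
          .config("spark.driver.memory", "2g")      # Driver Memory
          .config("spark.executor.cores", "2")      # Executor Cores
          .config("spark.executor.memory", "4g")    # Executor Memory
          .config("spark.executor.instances", "2")  # No. of Executors
          .config("spark.yarn.queue", "api")        # Queue (example: the api queue)
          .getOrCreate()
      )
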
  6. On the Driver File page, specify the job driver file and its main class (for Spark jobs), command line arguments, and any additional JARs or supporting files needed for executing the job. Then click Next to advance to the Confirmation page.
    • File Path: Path to the executable for the job. Click Browse to select a file in HDFS or Cloud Storage, or to upload a file from your local file system. The file must have a .jar or .zip extension. The Browse HDFS window also provides some examples that you can browse to and try.

    • Main Class: (Spark and MapReduce jobs only) Main class to run the job.

    • Arguments: (Optional) Any arguments used to invoke the main class. Specify one argument per line. The sketch after this list illustrates how these arguments might reach the application.

    • Additional Py Modules: (Python Spark jobs only) Any Python dependencies required for the application. You can specify more than one file. Click Browse to select a file in HDFS or Cloud Storage, or to upload a file from your local file system (.py file only).

    • Additional Jars: (Optional) Any JAR dependencies required for the application, such as Spark libraries. Multiple files can be specified. Use Browse to select a file (.jar or .zip file only).

    • Additional Support Files: (Optional) Any additional support files required for the application. Multiple files can be specified. Use Browse to select a file (.jar or .zip file only).

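    The following sketch assumes that values entered in the Arguments field (one per line) reach the application as ordinary program arguments, and that files listed under Additional Py Modules are importable by the driver. The file names and arguments are hypothetical.

      # driver.py -- hypothetical sketch of how wizard inputs reach the application
      import sys

      # Assumption: a module uploaded under "Additional Py Modules" (helpers.py)
      # could be imported like any other module:
      # from helpers import clean_record

      if __name__ == "__main__":
          # Assumption: the Arguments field contains two lines,
          # an input path and an output path.
          input_path, output_path = sys.argv[1], sys.argv[2]
          print("Reading from", input_path, "and writing to", output_path)
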
  7. On the Confirmation page, review the information listed. If you're satisfied with what you see, click Create to create the job and submit the job for execution.

    If you need to change something before creating and submitting the job, click Prev at the top of the wizard to step back through the pages, or click Cancel to exit the wizard.

    When you finish creating the job, it is automatically submitted for execution. It typically sits in the Accepted state for a short period before execution begins. If a job remains in the Accepted state for a long time, the cluster usually doesn't have enough resources available to satisfy the requirements defined in the job submission. You can address this by reducing the resource requirements of the job or by terminating existing jobs that aren't needed (such as Zeppelin).