About MapReduce Jobs

You can submit MapReduce jobs using the cluster console, the REST API, or the command line interface.

Note:

  • The MapReduce API is based on org.apache.hadoop.mapreduce.Job. Each job creates its own YARN application and therefore requires its own slots in the cluster. You must customize the job according to https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapred/jobcontrol/Job.html.

  • The Stocator Swift driver (swift2d://) is not supported with the MapReduce API. MapReduce applications that need access to Oracle Cloud Infrastructure Object Storage Classic should use the Hadoop OpenStack Swift driver (swift://) instead.

Use the Cluster Console

You can submit MapReduce jobs from the Jobs tab in the Big Data Cloud Console. See Create a Job.

Use the REST API

You can use the REST API to submit MapReduce jobs.

Example job submission:

{
  "job": {
    "applicationClass": "org.apache.hadoop.examples.ExampleDriver",
    "applicationFile": "hdfs:///mapred/examples/hadoop-mapreduce-examples.jar",
    "applicationArguments": [
      "wordcount",
      "hdfs:///mapred/data/one_word.txt",
      "hdfs:///tmp/one_word-1487791745.out"
    ],
    "hadoopConf": {
      "a.hadoop.conf.key": "a.hadoop.conf.value"
    },
    "queue": "api",
    "applicationName":"MapReduceWordCount"
  }
}

If the example payload is saved as payload_mr_job.json, the corresponding REST API request looks like this:

curl -k -s -X POST \
  "https://big_data_cluster_host:1080/bdcsce/api/v1.1/clustermgmt/identity_domain/instances/cluster_name/jobs/mapred" \
  -H "X-ID-TENANT-NAME: identity_domain" \
  -H "Content-Type: application/json; charset=utf-8" \
  --user "bdcsce_admin:csm_password" \
  -d @payload_mr_job.json
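
The request can also be wrapped in a small shell function so that the host, identity domain, and cluster name are set in one place. This is only a sketch, not part of the product: the names BDC_HOST, IDENTITY_DOMAIN, CLUSTER_NAME, BDC_PASSWORD, jobs_url, and submit_job are hypothetical, and the endpoint layout is assumed from the example request. The --fail option is an addition so that curl exits with a non-zero status when the server returns an HTTP error.

```shell
#!/bin/sh
# Placeholder values taken from the example request; replace them for your cluster.
BDC_HOST="${BDC_HOST:-big_data_cluster_host}"
IDENTITY_DOMAIN="${IDENTITY_DOMAIN:-identity_domain}"
CLUSTER_NAME="${CLUSTER_NAME:-cluster_name}"

# Print the MapReduce job submission endpoint (layout assumed from the example).
jobs_url() {
  printf 'https://%s:1080/bdcsce/api/v1.1/clustermgmt/%s/instances/%s/jobs/mapred\n' \
    "$BDC_HOST" "$IDENTITY_DOMAIN" "$CLUSTER_NAME"
}

# Submit the JSON payload file given as $1.
# BDC_PASSWORD (hypothetical variable) must hold the bdcsce_admin password.
submit_job() {
  curl -k -s --fail -X POST "$(jobs_url)" \
    -H "X-ID-TENANT-NAME: $IDENTITY_DOMAIN" \
    -H "Content-Type: application/json; charset=utf-8" \
    --user "bdcsce_admin:${BDC_PASSWORD:?set BDC_PASSWORD first}" \
    -d @"$1"
}
```

After exporting BDC_PASSWORD, running submit_job payload_mr_job.json sends the payload. As in the example request, -k disables TLS certificate verification, so it is best limited to test clusters.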

For information about using the REST API, see REST API for Oracle Big Data Cloud.
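
A MapReduce job fails if its output directory already exists, so each submission needs a fresh output path; the path in the example payload appears to embed a timestamp for this reason. The following sketch generates the payload file with a unique suffix. The function name make_payload and its arguments are hypothetical, and the payload fields are copied unchanged from the example.

```shell
#!/bin/sh
# Write a WordCount payload to the file named in $1.
# $2 is an optional suffix for the output path; it defaults to the current epoch seconds.
make_payload() {
  suffix="${2:-$(date +%s)}"
  cat > "$1" <<EOF
{
  "job": {
    "applicationClass": "org.apache.hadoop.examples.ExampleDriver",
    "applicationFile": "hdfs:///mapred/examples/hadoop-mapreduce-examples.jar",
    "applicationArguments": [
      "wordcount",
      "hdfs:///mapred/data/one_word.txt",
      "hdfs:///tmp/one_word-${suffix}.out"
    ],
    "hadoopConf": {
      "a.hadoop.conf.key": "a.hadoop.conf.value"
    },
    "queue": "api",
    "applicationName": "MapReduceWordCount"
  }
}
EOF
}
```

Calling make_payload payload_mr_job.json before each submission writes a payload whose output path differs from the previous run.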

Use the Command Line Interface

You can also run MapReduce jobs from the shell command line. To do so, SSH to any node in the cluster, and then submit the job. The following example switches to the oracle user, uploads an input file to HDFS, and runs the wordcount example job:

opc@host/>sudo su oracle
oracle@host/>hadoop fs -mkdir /user/oracle/mapredsmokeinput
oracle@host/>hadoop fs -put /tmp/ambari.properties.1 /user/oracle/mapredsmokeinput
oracle@host/>yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar wordcount \
    /user/oracle/mapredsmokeinput /user/oracle/mapredsmokeoutput
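
Running the job a second time fails if /user/oracle/mapredsmokeoutput already exists, because MapReduce does not overwrite an existing output directory. The following sketch shows one way to clear old output and inspect the results. The helper names clean_output and show_output are hypothetical, and the part-r-* pattern assumes the default naming of reducer output files.

```shell
#!/bin/sh
# Output directory used by the wordcount example.
OUT_DIR="/user/oracle/mapredsmokeoutput"

# Remove output from a previous run; -f suppresses the error if the path is absent.
clean_output() {
  hadoop fs -rm -r -f "$OUT_DIR"
}

# Print the first 20 lines of the word counts.
# The glob is quoted so that HDFS, not the local shell, expands it.
show_output() {
  hadoop fs -cat "$OUT_DIR/part-r-*" | head -n 20
}
```

Run clean_output as the oracle user before resubmitting the job, and show_output after the job completes.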