Implementing the MapReduce Application
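Implement the MapReduce application. In this example, the application is the WordCount job (class org.apache.hadoop.examples.WordCount), which counts how many times each word occurs in the input text.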
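A WordCount job typically follows the classic MapReduce pattern. The mapper splits each line into words and emits (word, 1), Hadoop groups and sorts those pairs by word, and the reducer sums the counts for each word. The following is a minimal in-memory sketch of that data flow that uses only the JDK and assumes simple whitespace tokenization. It does not use the Hadoop Mapper, Reducer, or Job classes, and the names WordCountSketch, map, shuffle, reduce, and run are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/**
 * Hypothetical, JDK-only illustration of the WordCount data flow
 * (map -> shuffle/sort -> reduce). This is NOT the Hadoop MapReduce API.
 */
class WordCountSketch {

    /** A (key, value) pair as emitted by the map phase. */
    record Pair(String word, int count) {}

    /** Map phase: emit (word, 1) for every whitespace-separated token in a line. */
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new Pair(tokens.nextToken(), 1));
        }
        return out;
    }

    /** Shuffle phase: group values by key; TreeMap keeps keys sorted, like Hadoop's sort by key. */
    static Map<String, List<Integer>> shuffle(List<Pair> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Pair p : pairs) {
            groups.computeIfAbsent(p.word(), k -> new ArrayList<>()).add(p.count());
        }
        return groups;
    }

    /** Reduce phase: sum all counts emitted for one word. */
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs map, shuffle, and reduce over the input lines. */
    static Map<String, Integer> run(List<String> lines) {
        List<Pair> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of("zenith zeal", "zelus zenith"));
        // Print "word<TAB>count", the same layout as Hadoop's default text output.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Running main on the two sample lines prints zeal 1, zelus 1, and zenith 2, in that order, which has the same shape as the job output shown later in this section.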
Building the Application JAR
Run the following command from the project root folder to build the application JAR in the target folder (for example, wordcountjava-1.0-SNAPSHOT.jar):
mvn clean package
This command cleans any previous build artifacts, downloads any dependencies that haven't already been installed, and then builds and packages the application.
After the command finishes, the wordcountjava/target directory contains a file named wordcountjava-1.0-SNAPSHOT.jar.
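To confirm that the build produced the JAR and that the job class was packaged into it, you can list its entries with jar tf target/wordcountjava-1.0-SNAPSHOT.jar. The sketch below performs the same check programmatically with java.util.jar.JarFile. JarInspector and its methods are hypothetical names, and the default path assumes the program is run from the project root.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: checks whether a built JAR contains a given class,
 * such as the WordCount driver or a class from a bundled dependency.
 */
class JarInspector {

    /** Converts a binary class name to its JAR entry path, e.g. a.b.C -> a/b/C.class. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the JAR at jarPath has an entry for className. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location of the build output, relative to the project root.
        String jarPath = args.length > 0 ? args[0] : "target/wordcountjava-1.0-SNAPSHOT.jar";
        String className = "org.apache.hadoop.examples.WordCount";
        System.out.println(className + " present: " + containsClass(jarPath, className));
    }
}
```

The same method can be pointed at a class from one of the job's dependencies to check that it was bundled into the JAR.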
Note
The wordcountjava-1.0-SNAPSHOT.jar file is an uberjar, which contains not only the WordCount job but also the dependencies that the job requires at runtime.
Running the MapReduce Application JAR
Copy the JAR to any node in the cluster, and then submit the job with the yarn jar command. For example, if the JAR was copied to /tmp:
yarn jar /tmp/wordcountjava-1.0-SNAPSHOT.jar org.apache.hadoop.examples.WordCount /example/data/input.txt /example/data/wordcountout
Parameters:
- /example/data/input.txt: HDFS path of the input text file
- /example/data/wordcountout: HDFS path for the result (the output of the reducer)
Sample output:
>> hdfs dfs -cat /example/data/wordcountout/*
zeal 1
zelus 1
zenith 2
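Hadoop's default text output writes one key and one value per line, separated by a tab, so the reducer output can be post-processed outside the cluster. As a minimal sketch, assuming each line holds exactly one word and its count (the parser also accepts spaces, as in the sample above), the hypothetical WordCountOutputParser class below reads such lines into a map that preserves their order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: parses WordCount reducer output ("word<TAB>count" per line)
 * into a map, e.g. to post-process the result of `hdfs dfs -cat .../*`.
 */
class WordCountOutputParser {

    /** Parses one line; a tab or any run of whitespace may separate word and count. */
    static Map.Entry<String, Integer> parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) on a bad count.
        return Map.entry(parts[0], Integer.parseInt(parts[1]));
    }

    /** Reads all non-blank lines, keeping the order in which they appear. */
    static Map<String, Integer> parse(Reader reader) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.isBlank()) {
                    Map.Entry<String, Integer> e = parseLine(line);
                    counts.put(e.getKey(), e.getValue());
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: hdfs dfs -cat /example/data/wordcountout/* | java WordCountOutputParser
        Map<String, Integer> counts =
            parse(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        counts.forEach((word, n) -> System.out.println(word + " -> " + n));
    }
}
```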
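Note
The wordcountjava-1.0-SNAPSHOT.jar file is a fat JAR when it is built with the maven-shade-plugin. Alternatively, use the -libjars option to add extra JARs to the job's class path; because -libjars is a Hadoop generic option, it takes effect only if the driver parses generic options (for example, through ToolRunner or GenericOptionsParser). In a secure (Kerberos-enabled) cluster, run kinit and make sure the appropriate Ranger policies are in place before you submit the job.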