Implementing the MapReduce Application
Implement a MapReduce application.
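To make the shape of such an application concrete, here is a minimal sketch of the word count logic in plain Java, with no Hadoop dependencies. It is not the source of org.apache.hadoop.examples.WordCount. The class name WordCountSketch and its methods are hypothetical and only mirror the map, shuffle and reduce steps that a real job would express through Hadoop's Mapper and Reducer classes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of word count; not the Hadoop API.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token in a line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all counts that were emitted for a single word.
    public static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by word (the "shuffle"), then reduce each group.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input lines, chosen only for illustration.
        Map<String, Integer> counts = countWords(Arrays.asList("zenith zeal", "zelus zenith"));
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Running main prints one word and its count per line, sorted by word: zeal 1, zelus 1, zenith 2. In a real job the grouping step is performed by the framework between the map and reduce phases rather than by application code.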
Building the Application JAR
Run mvn clean package from the project root folder to build the application JAR in the target folder, for example wordcountjava-1.0-SNAPSHOT.jar.
mvn clean package
This command cleans any previous build artifacts, downloads any dependencies that haven't already been installed, and then builds and packages the application.
After the command finishes, the wordcountjava/target directory contains a file named wordcountjava-1.0-SNAPSHOT.jar.
Note
The wordcountjava-1.0-SNAPSHOT.jar file is an uberjar, which contains not only the WordCount job but also the dependencies that the job requires at runtime.
Running the MapReduce Application JAR
Copy the JAR to any node of the cluster (the example assumes /tmp), then submit the job with yarn jar. For example:
- yarn jar /tmp/wordcountjava-1.0-SNAPSHOT.jar org.apache.hadoop.examples.WordCount /example/data/input.txt /example/data/wordcountout
Parameters:
/example/data/input.txt: HDFS path for the input text file
/example/data/wordcountout: HDFS path for the result (output of the reducer)
Sample output:
>> hdfs dfs -cat /example/data/wordcountout/*
zeal 1
zelus 1
zenith 2
Note
The wordcountjava-1.0-SNAPSHOT.jar file is a fat JAR if it is built through maven-shade-plugin. Alternatively, use the -libjars option to provide additional JARs to the job class path. In a secure cluster, kinit and appropriate Ranger policies are required before submitting the job.