Processes groups of records from two different data sets that belong to the same partition and determines which pairs of records meet the spatial interaction defined in the job's SpatialOperationConfig.
Each reducer task is meant to process all the records of one partition. The records are expected to be grouped using JoinGroupGroupComparator, and the values should be sorted using JoinGroupKeyComparator.
The output value is JSON-formatted text containing a pair of records (one record from each data set) that meet the defined spatial interaction.
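The per-partition work can be illustrated with a minimal, Hadoop-free sketch. Everything in it is hypothetical: `Rec`, `interacts`, and `reducePartition` are illustrative names, not part of this API. The sketch assumes the values of a partition arrive sorted so that all records of the first data set precede those of the second (the ordering role attributed above to JoinGroupKeyComparator), and it substitutes a simple bounding-box intersection test for whatever interaction SpatialOperationConfig actually defines.

```java
import java.util.ArrayList;
import java.util.List;

public class SpatialJoinSketch {
    // Hypothetical record: data set tag (0 or 1), an id, and an axis-aligned bounding box.
    record Rec(int dataSet, String id, double minX, double minY, double maxX, double maxY) {}

    // Hypothetical stand-in for the configured spatial interaction:
    // two records interact if their bounding boxes intersect.
    static boolean interacts(Rec a, Rec b) {
        return a.minX() <= b.maxX() && b.minX() <= a.maxX()
            && a.minY() <= b.maxY() && b.minY() <= a.maxY();
    }

    // Processes all records of one partition. Assumes the input is already sorted
    // so that every data-set-0 record precedes every data-set-1 record.
    static List<String> reducePartition(List<Rec> sorted) {
        List<Rec> left = new ArrayList<>();   // buffered records of the first data set
        List<String> out = new ArrayList<>();
        for (Rec r : sorted) {
            if (r.dataSet() == 0) {
                left.add(r);
            } else {
                // Compare each second-data-set record against every buffered record.
                for (Rec l : left) {
                    if (interacts(l, r)) {
                        out.add("{\"left\":\"" + l.id() + "\",\"right\":\"" + r.id() + "\"}");
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> partition = List.of(
            new Rec(0, "a1", 0, 0, 2, 2),
            new Rec(0, "a2", 5, 5, 6, 6),
            new Rec(1, "b1", 1, 1, 3, 3));
        // Prints {"left":"a1","right":"b1"}; a2 does not intersect b1.
        reducePartition(partition).forEach(System.out::println);
    }
}
```

Sorting by data set is what allows a single pass: only the first data set's records need to be buffered, and each record from the second data set is streamed past them.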
Because the same record may lie in more than one partition, duplicate pairs may be emitted; JoinOutputCommitter can be used to remove those duplicate joins.
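The duplicate-removal idea can be sketched as merging the outputs of several reducer tasks while keeping only the first occurrence of each pair. This is an in-memory illustration only, not a description of how JoinOutputCommitter works internally; `mergeDistinct` is a hypothetical name, and the sketch assumes every reducer serializes a given pair to the identical JSON string.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    // Merges per-partition join outputs and drops repeated pairs.
    // LinkedHashSet keeps the first-seen order of the distinct entries.
    static List<String> mergeDistinct(List<List<String>> perPartitionOutputs) {
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> output : perPartitionOutputs) {
            seen.addAll(output);
        }
        return List.copyOf(seen);
    }

    public static void main(String[] args) {
        // The pair a1-b1 was found in both partitions because its records span both.
        List<String> p0 = List.of("{\"left\":\"a1\",\"right\":\"b1\"}");
        List<String> p1 = List.of("{\"left\":\"a1\",\"right\":\"b1\"}",
                                  "{\"left\":\"a3\",\"right\":\"b2\"}");
        mergeDistinct(List.of(p0, p1)).forEach(System.out::println);
    }
}
```

A set-based merge like this only works when the whole result fits in memory; at scale, deduplication is usually done during or after the commit step rather than by collecting everything in one place.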