public class Partitioning extends MultipleInputsJob
setSamplingRatio(double) property (by default it is set to 0.1).
Modifier and Type | Field and Description |
---|---|
static java.lang.String | PARTITION_RESULT_FILE |
protected double | samplingRatio |
inputDataSets, miConf
Constructor and Description |
---|
Partitioning() |
Modifier and Type | Method and Description |
---|---|
void | configure(org.apache.hadoop.mapred.JobConf job) Validates and adds the current parameters to the job configuration. |
protected void | defineGlobaBounds() Defines the dimension boundaries for the partitioning space based on the SpatialOperationConfig and the SpatialConfig defined for each input data set. |
java.lang.String | getCmdOptions() Gets a description of the arguments expected from the command line. |
protected long | getPathsLength(org.apache.hadoop.fs.Path[] paths, org.apache.hadoop.conf.Configuration conf) |
protected org.apache.hadoop.fs.Path | getSamplePath(org.apache.hadoop.fs.Path sampleDir, org.apache.hadoop.conf.Configuration conf) |
double | getSamplingRatio() Gets the ratio of the sample size to the input data size. |
static void | main(java.lang.String[] args) |
void | processArgs(java.lang.String[] args, org.apache.hadoop.conf.Configuration conf) Extracts and validates arguments from the command line. |
int | run(java.lang.String[] args) |
boolean | runFullPartitioningProcess(org.apache.hadoop.mapred.JobConf job) Runs the full partitioning process. |
void | setSamplingRatio(double samplingRatio) Sets the ratio of the sample size to the input data size so only a fraction of the whole input data is used for partitioning. |
addInputDataSet, configure, configureInputs, configureInputs, getInputListCmdOptions, getInputs, getMultipleInputDataSetsParams, removeInputDataSet, setInputDataSets
createJob, createJob, createJob, createJob, createJobConf, createJobConf, createJobConf, getInput, getInputFormatClass, getJarClass, getOutput, getRecordInfoProviderClass, getSpatialConfig, setInput, setInputFormatClass, setJarClass, setOutput, setRecordInfoProviderClass, setSpatialConfig
public static final java.lang.String PARTITION_RESULT_FILE
protected double samplingRatio
public void setSamplingRatio(double samplingRatio)
Parameters:
samplingRatio -
public double getSamplingRatio()
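As a rough illustration of what the sampling ratio means, the following self-contained sketch mirrors the getter/setter pair and the 0.1 default documented for this class. `SamplingRatioSketch`, its range check, and `estimateSampleBytes` are hypothetical stand-ins and not part of this API; the real class may validate or apply the ratio differently.

```java
/**
 * Hypothetical stand-in for the sampling-ratio property of Partitioning.
 * Not part of the documented API: it only illustrates the documented
 * default (0.1) and the getter/setter pair.
 */
public class SamplingRatioSketch {
    // Documented default value of the property.
    protected double samplingRatio = 0.1;

    public double getSamplingRatio() {
        return samplingRatio;
    }

    // Assumption: the ratio must lie in (0, 1]; the real class may differ.
    public void setSamplingRatio(double samplingRatio) {
        if (!(samplingRatio > 0.0 && samplingRatio <= 1.0)) {
            throw new IllegalArgumentException(
                "samplingRatio must be in (0, 1]: " + samplingRatio);
        }
        this.samplingRatio = samplingRatio;
    }

    // Hypothetical helper: approximate sample size for an input size in bytes.
    public long estimateSampleBytes(long inputBytes) {
        return Math.round(inputBytes * samplingRatio);
    }

    public static void main(String[] args) {
        SamplingRatioSketch sketch = new SamplingRatioSketch();
        System.out.println(sketch.estimateSampleBytes(1_000_000L));
        sketch.setSamplingRatio(0.25);
        System.out.println(sketch.estimateSampleBytes(1_000_000L));
    }
}
```

With a ratio below 1, only the corresponding fraction of the input would be read for partitioning, which is the trade-off the property exposes: a smaller sample is cheaper to process but gives a coarser picture of the data distribution.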
public void processArgs(java.lang.String[] args, org.apache.hadoop.conf.Configuration conf) throws java.lang.Exception
Description copied from class: BaseJob
Extracts and validates arguments from the command line.
Overrides:
processArgs in class MultipleInputsJob
Parameters:
args - arguments from the command line
conf - the job configuration
Throws:
java.lang.Exception
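public java.lang.String getCmdOptions()
Description copied from class: BaseJob
Gets a description of the arguments expected from the command line.
Overrides:
getCmdOptions in class MultipleInputsJob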
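public void configure(org.apache.hadoop.mapred.JobConf job) throws java.lang.Exception
Description copied from class: BaseJob
Validates and adds the current parameters to the job configuration.
Overrides:
configure in class MultipleInputsJob
Parameters:
job - the job configuration
Throws:
java.lang.Exception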
public boolean runFullPartitioningProcess(org.apache.hadoop.mapred.JobConf job) throws java.lang.Exception
Parameters:
job -
Throws:
java.lang.Exception
protected void defineGlobaBounds()
Defines the dimension boundaries for the partitioning space based on the SpatialOperationConfig and the SpatialConfig defined for each input data set.
public int run(java.lang.String[] args) throws java.lang.Exception
Throws:
java.lang.Exception
protected long getPathsLength(org.apache.hadoop.fs.Path[] paths, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Throws:
java.io.IOException
protected org.apache.hadoop.fs.Path getSamplePath(org.apache.hadoop.fs.Path sampleDir, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Throws:
java.io.IOException
public static void main(java.lang.String[] args) throws java.lang.Exception
Throws:
java.lang.Exception