A FileSplitInputFormat subclass that can take more than one data set. The input data sets are defined with MultipleInputsConfig in the following way:
```java
//create MultipleInputsConfig
MultipleInputsConfig miConf = new MultipleInputsConfig();

//setup and add dataset 1 (read from a path with an explicit InputFormat and RecordInfoProvider)
InputDataSet dataSet1 = new InputDataSet();
dataSet1.setInputString("/user/someUser/dataset1/*");
dataSet1.setInputFormatClass(GeoJsonInputFormat.class);
dataSet1.setRecordInfoProviderClass(GeoJsonRecordInfoProvider.class);
miConf.addInputDataSet(dataSet1, jobConf);

//setup and add dataset 2 (read from an existing spatial index)
InputDataSet dataSet2 = new InputDataSet();
dataSet2.setIndexName("dataset2_index");
miConf.addInputDataSet(dataSet2, jobConf);

//save MultipleInputsConfig to the job's configuration
miConf.store(jobConf);
```
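Once the configuration has been stored, the job itself still needs to read its input through this class. The line below is a minimal sketch rather than part of the example above; it assumes an old-API (org.apache.hadoop.mapred) driver and the same jobConf as above:

```java
// Sketch only: make the job read its input through the multiple-inputs format.
// Assumes jobConf is the org.apache.hadoop.mapred.JobConf populated above.
jobConf.setInputFormat(MultipleInputsFileSplitInputFormat.class);
```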
public class MultipleInputsFileSplitInputFormat extends FileSplitInputFormat
| Modifier and Type | Class and Description |
|---|---|
| static class | MultipleInputsFileSplitInputFormat.MultipleInputsHelper Deprecated. |
| static class | MultipleInputsFileSplitInputFormat.SplitSelector Deprecated. |
| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | CONF_CURRENT_DS_ID Deprecated. |
| Constructor and Description |
|---|
| MultipleInputsFileSplitInputFormat() Deprecated. |
| Modifier and Type | Method and Description |
|---|---|
| void | configure(org.apache.hadoop.mapred.JobConf conf) Deprecated. |
| int | getDataSetId(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Deprecated. Gets the id which identifies the path's data set |
| org.apache.hadoop.mapred.InputFormat<?,?> | getInputFormatForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Deprecated. Gets an instance of the InputFormat class used to read the data set from the given path |
| RecordInfoProvider<?,?> | getRecordInfoProviderForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Deprecated. Gets an instance of the RecordInfoProvider class used to interpret records of the data set from the given path |
| org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit> | getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter) Deprecated. |
| org.apache.hadoop.mapred.InputSplit[] | getSplits(org.apache.hadoop.mapred.JobConf conf, int splitCount) Deprecated. |
| static boolean | isMultipleInputsJob(org.apache.hadoop.mapred.JobConf jobConf) Deprecated. Returns true if the current job uses more than one input data set |
| Methods inherited from class FileSplitInputFormat |
|---|
| getDelegateRecordReader, getFittingInputSplit, getInternalInputFormat, getInternalInputFormatClass, getRecordInfoProvider, getRecordInfoProviderClass, setInternalInputFormatClass, setRecordInfoProviderClass |

| Methods inherited from class org.apache.hadoop.mapred.FileInputFormat |
|---|
| addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, makeSplit, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize |

public static final java.lang.String CONF_CURRENT_DS_ID
public MultipleInputsFileSplitInputFormat()
public static boolean isMultipleInputsJob(org.apache.hadoop.mapred.JobConf jobConf)
Returns true if the current job uses more than one input data set.
Parameters:
jobConf - the job configuration
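For illustration, a driver could use this method as a sanity check before submitting a job that is expected to read several data sets; the guard below is an assumption about usage, not something this API requires:

```java
// Hypothetical guard: fail fast if the stored MultipleInputsConfig did not
// actually register more than one input data set.
if (!MultipleInputsFileSplitInputFormat.isMultipleInputsJob(jobConf)) {
    throw new IllegalStateException("job is not configured with multiple input data sets");
}
```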
public org.apache.hadoop.mapred.InputFormat<?,?> getInputFormatForPath(org.apache.hadoop.fs.Path path,
                                                                       org.apache.hadoop.conf.Configuration conf)
                                                                throws java.io.IOException
Gets an instance of the InputFormat class used to read the data set from the given path.
Parameters:
path - a data set path
conf - the job configuration
Throws:
java.io.IOException

public RecordInfoProvider<?,?> getRecordInfoProviderForPath(org.apache.hadoop.fs.Path path,
                                                            org.apache.hadoop.conf.Configuration conf)
                                                     throws java.io.IOException
Gets an instance of the RecordInfoProvider class used to interpret records of the data set from the given path.
Parameters:
path - a data set path
conf - the job configuration
Returns:
a RecordInfoProvider instance or null if the given path does not belong to a configured input data set.
Throws:
java.io.IOException
public int getDataSetId(org.apache.hadoop.fs.Path path,
                        org.apache.hadoop.conf.Configuration conf)
                 throws java.io.IOException
Gets the id which identifies the path's data set.
Parameters:
path - a data set path
conf - the job configuration
Throws:
java.io.IOException
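The three path-based lookups documented above (getDataSetId, getInputFormatForPath, getRecordInfoProviderForPath) can be combined when code outside this class needs to know how a given split's file should be read. The fragment below is a hypothetical sketch; fileSplit and the surrounding driver context are assumptions, not part of this API, and checked exceptions are omitted:

```java
// Sketch only: resolve the per-data-set reader components for one split's path.
MultipleInputsFileSplitInputFormat inputFormat = new MultipleInputsFileSplitInputFormat();
inputFormat.configure(jobConf); // jobConf must already contain the stored MultipleInputsConfig

org.apache.hadoop.fs.Path path = fileSplit.getPath(); // fileSplit: an assumed org.apache.hadoop.mapred.FileSplit

// Numeric id of the configured data set this path belongs to.
int dataSetId = inputFormat.getDataSetId(path, jobConf);

// Delegate InputFormat used to read that data set.
org.apache.hadoop.mapred.InputFormat<?, ?> delegate =
    inputFormat.getInputFormatForPath(path, jobConf);

// RecordInfoProvider used to interpret its records; may be null if the path
// does not belong to any configured input data set.
RecordInfoProvider<?, ?> provider =
    inputFormat.getRecordInfoProviderForPath(path, jobConf);
```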
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
                                             org.apache.hadoop.mapred.JobConf conf,
                                             org.apache.hadoop.mapred.Reporter reporter)
                                      throws java.io.IOException
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit>
Overrides:
getRecordReader in class CompositeInputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit>
Throws:
java.io.IOException
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf,
                                                       int splitCount)
                                                throws java.io.IOException
Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit>
Overrides:
getSplits in class CompositeInputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit>
Throws:
java.io.IOException

public void configure(org.apache.hadoop.mapred.JobConf conf)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class FileSplitInputFormat

Copyright © 2016 Oracle and/or its affiliates. All Rights Reserved.