A FileSplitInputFormat subclass that can take more than one data set. Data sets are defined and added to the job configuration using MultipleInputsConfig in the following way:
// create MultipleInputsConfig
MultipleInputsConfig miConf = new MultipleInputsConfig();
// set up and add data set 1
InputDataSet dataSet1 = new InputDataSet();
dataSet1.setInputString("/user/someUser/dataset1/*");
dataSet1.setInputFormatClass(GeoJsonInputFormat.class);
dataSet1.setRecordInfoProviderClass(GeoJsonRecordInfoProvider.class);
miConf.addInputDataSet(dataSet1, jobConf);
// set up and add data set 2 (identified by index name)
InputDataSet dataSet2 = new InputDataSet();
dataSet2.setIndexName("dataset2_index");
miConf.addInputDataSet(dataSet2, jobConf);
// save MultipleInputsConfig to the job's configuration
miConf.store(jobConf);
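The registration pattern above - each data set carrying its own input location plus the classes that should read and interpret it, added in order - can be modeled with plain Java collections. The sketch below is a hypothetical stand-in for illustration only, not Oracle's MultipleInputsConfig implementation; the DataSetEntry record and its field names are invented here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of MultipleInputsConfig-style bookkeeping: each entry
// keeps the path pattern plus the class names that handle records under it.
public class MultipleInputsSketch {
    public record DataSetEntry(String inputPattern, String inputFormatClass,
                               String recordInfoProviderClass) {}

    private final List<DataSetEntry> entries = new ArrayList<>();

    // Analogous to miConf.addInputDataSet(dataSet, jobConf): here a data
    // set's id is simply its position in registration order.
    public int addInputDataSet(DataSetEntry e) {
        entries.add(e);
        return entries.size() - 1;
    }

    public DataSetEntry get(int id) {
        return entries.get(id);
    }

    public static void main(String[] args) {
        MultipleInputsSketch conf = new MultipleInputsSketch();
        int id1 = conf.addInputDataSet(new DataSetEntry(
            "/user/someUser/dataset1/*",
            "GeoJsonInputFormat", "GeoJsonRecordInfoProvider"));
        System.out.println(id1 + " " + conf.get(id1).inputFormatClass());
    }
}
```

Keeping data sets in registration order is what lets a single ordinal id (see getDataSetId below) route each file path back to the right InputFormat and RecordInfoProvider.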
public class MultipleInputsFileSplitInputFormat
extends FileSplitInputFormat<java.lang.Object,java.lang.Object>

Nested classes inherited from class FileSplitInputFormat:
FileSplitInputFormat.FileSplitRecordReader
| Constructor and Description |
|---|
| MultipleInputsFileSplitInputFormat() Deprecated. |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapreduce.lib.input.FileSplit> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) Deprecated. |
| int | getDataSetId(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Deprecated. Gets the id which identifies the path's data set. |
| org.apache.hadoop.mapreduce.InputFormat<?,?> | getInputFormatForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Deprecated. Gets an instance of the InputFormat class used to read the data set from the given path. |
| RecordInfoProvider<?,?> | getRecordInfoProviderForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Deprecated. Gets an instance of the RecordInfoProvider class used to interpret records of the data set from the given path. |
| java.util.List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context) Deprecated. |
Methods inherited from class FileSplitInputFormat:
createInternalInputFormat, getFittingInputSplit, getInternalInputFormat, getInternalInputFormatClass, getRecordInfoProvider, getRecordInfoProviderClass, setInternalInputFormatClass, setRecordInfoProviderClass

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
public MultipleInputsFileSplitInputFormat()
public org.apache.hadoop.mapreduce.InputFormat<?,?> getInputFormatForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Gets an instance of the InputFormat class used to read the data set from the given path.

Parameters:
path - a data set path
conf - the job configuration

Throws:
java.io.IOException
public RecordInfoProvider<?,?> getRecordInfoProviderForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Gets an instance of the RecordInfoProvider class used to interpret records of the data set from the given path.

Parameters:
path - a data set path
conf - the job configuration

Returns:
a RecordInfoProvider instance, or null if the given path does not belong to a configured input data set.

Throws:
java.io.IOException
public int getDataSetId(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Gets the id which identifies the path's data set.

Parameters:
path - a data set path
conf - the job configuration

Throws:
java.io.IOException
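getDataSetId maps an input file path back to the data set it was registered under, which is what lets a single input format dispatch each split to the right reader. A minimal sketch of that resolution, assuming glob-style patterns like the /user/someUser/dataset1/* string used earlier; the DataSetIdResolver helper below is hypothetical and is not the class's actual logic:

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: return the index of the first registered glob
// pattern that matches the given path, or -1 if none does.
public class DataSetIdResolver {
    public static int getDataSetId(String path, List<String> patterns) {
        for (int i = 0; i < patterns.size(); i++) {
            boolean matches = FileSystems.getDefault()
                .getPathMatcher("glob:" + patterns.get(i))
                .matches(Paths.get(path));
            if (matches) {
                return i; // the data set's id is its registration order
            }
        }
        return -1; // path does not belong to any configured data set
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("/user/someUser/dataset1/*",
                                        "/user/someUser/dataset2/*");
        // A part file under dataset2 resolves to id 1.
        System.out.println(getDataSetId("/user/someUser/dataset2/part-00000",
                                        patterns));
    }
}
```

With an id in hand, getInputFormatForPath and getRecordInfoProviderForPath can look up the matching data set's configured classes, returning null (as documented above) when no data set claims the path.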
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapreduce.lib.input.FileSplit> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws java.io.IOException, java.lang.InterruptedException
Overrides:
createRecordReader in class FileSplitInputFormat<java.lang.Object,java.lang.Object>

Throws:
java.io.IOException
java.lang.InterruptedException
public java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws java.io.IOException
Overrides:
getSplits in class WrapperInputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapreduce.lib.input.FileSplit,java.lang.Object,java.lang.Object>

Throws:
java.io.IOException
Copyright © 2016 Oracle and/or its affiliates. All Rights Reserved.