public class MultipleInputsFileSplitInputFormat extends FileSplitInputFormat

A FileSplitInputFormat subclass that can take more than one data set. Data sets are defined and added to the job using a MultipleInputsConfig in the following way:
//create MultipleInputsConfig
MultipleInputsConfig miConf = new MultipleInputsConfig();

//setup and add dataset 1
InputDataSet dataSet1 = new InputDataSet();
dataSet1.setInputString("/user/someUser/dataset1/*");
dataSet1.setInputFormatClass(GeoJsonInputFormat.class);
dataSet1.setRecordInfoProviderClass(GeoJsonRecordInfoProvider.class);
miConf.addInputDataSet(dataSet1, jobConf);

//setup and add dataset 2
InputDataSet dataSet2 = new InputDataSet();
dataSet2.setIndexName("dataset2_index");
miConf.addInputDataSet(dataSet2, jobConf);

//save MultipleInputsConfig to the job's configuration
miConf.store(jobConf);
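To illustrate what store(jobConf) conceptually does, the following is a minimal pure-Java sketch (no Hadoop dependency) of flattening per-data-set settings into a key-value configuration. All key names here are hypothetical and chosen for illustration only; the real property names written by MultipleInputsConfig are internal to the library.

```java
import java.util.Properties;

// Sketch of serializing several input data sets into a flat
// configuration, analogous to MultipleInputsConfig.store(jobConf).
public class MultiInputConfigSketch {

    // Hypothetical key prefix; not the library's real property name.
    static final String PREFIX = "sketch.multipleinputs.dataset.";

    // Record one data set's settings under an indexed key and keep
    // the data set count up to date.
    static void storeDataSet(Properties conf, int id,
                             String inputString, String inputFormatClass) {
        conf.setProperty(PREFIX + id + ".input", inputString);
        conf.setProperty(PREFIX + id + ".inputformat", inputFormatClass);
        conf.setProperty("sketch.multipleinputs.count", String.valueOf(id + 1));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        storeDataSet(conf, 0, "/user/someUser/dataset1/*", "GeoJsonInputFormat");
        storeDataSet(conf, 1, "/user/someUser/dataset2/*", "OtherInputFormat");
        System.out.println(conf.getProperty("sketch.multipleinputs.count")); // prints 2
    }
}
```

Because every data set lives under its own indexed key, the input format can later recover each data set's settings independently from the single job configuration.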
Modifier and Type | Class and Description
---|---
static class | MultipleInputsFileSplitInputFormat.MultipleInputsHelper
static class | MultipleInputsFileSplitInputFormat.SplitSelectetor
Modifier and Type | Field and Description
---|---
static java.lang.String | CONF_CURRENT_DS_ID
Constructor and Description
---
MultipleInputsFileSplitInputFormat()
Modifier and Type | Method and Description
---|---
void | configure(org.apache.hadoop.mapred.JobConf conf)
int | getDataSetId(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Gets the id which identifies the path's data set
org.apache.hadoop.mapred.InputFormat<?,?> | getInputFormatForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Gets an instance of the InputFormat class used to read the data set from the given path
RecordInfoProvider<?,?> | getRecordInfoProviderForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) Gets an instance of the RecordInfoProvider class used to interpret records of the data set from the given path
org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit> | getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter)
org.apache.hadoop.mapred.InputSplit[] | getSplits(org.apache.hadoop.mapred.JobConf conf, int splitCount)
static boolean | isMultipleInputsJob(org.apache.hadoop.mapred.JobConf jobConf) Returns true if the current job uses more than one input data set
Methods inherited from superclasses (CompositeInputFormat, FileSplitInputFormat, and org.apache.hadoop.mapred.FileInputFormat):

getDelegateRecordReader

getFittingInputSplit, getInternalInputFormat, getInternalInputFormatClass, getRecordInfoProvider, getRecordInfoProviderClass, setInternalInputFormatClass, setRecordInfoProviderClass

addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, makeSplit, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
public static final java.lang.String CONF_CURRENT_DS_ID
public static boolean isMultipleInputsJob(org.apache.hadoop.mapred.JobConf jobConf)
Returns true if the current job uses more than one input data set
Parameters:
jobConf
public org.apache.hadoop.mapred.InputFormat<?,?> getInputFormatForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Gets an instance of the InputFormat class used to read the data set from the given path
Parameters:
path - a data set path
conf - the job configuration
Throws:
java.io.IOException
public RecordInfoProvider<?,?> getRecordInfoProviderForPath(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Gets an instance of the RecordInfoProvider class used to interpret records of the data set from the given path
Parameters:
path - a data set path
conf - the job configuration
Returns:
a RecordInfoProvider instance or null if the given path does not belong to a configured input data set.
Throws:
java.io.IOException
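The per-path lookup described above can be pictured with a small stand-alone sketch: a table maps each data set's path prefix to a provider class, and the lookup instantiates the matching class or returns null. The Provider interface, the prefix table, and providerForPath are all hypothetical stand-ins for RecordInfoProvider and the configuration-driven resolution the library actually performs.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of resolving a provider instance from a path, in the spirit
// of getRecordInfoProviderForPath (hypothetical, no Hadoop dependency).
public class ProviderLookupSketch {

    // Hypothetical stand-in for RecordInfoProvider.
    interface Provider { String describe(); }

    static class GeoJsonProvider implements Provider {
        public String describe() { return "geojson"; }
    }

    // Hypothetical mapping from a data set's path prefix to its provider
    // class; the real class reads this from the job configuration.
    static final Map<String, Class<? extends Provider>> PROVIDERS = new HashMap<>();
    static {
        PROVIDERS.put("/user/someUser/dataset1", GeoJsonProvider.class);
    }

    // Instantiate the provider registered for the path's data set, or
    // return null when the path belongs to no configured data set.
    static Provider providerForPath(String path) throws Exception {
        for (Map.Entry<String, Class<? extends Provider>> e : PROVIDERS.entrySet()) {
            if (path.startsWith(e.getKey() + "/")) {
                return e.getValue().getDeclaredConstructor().newInstance();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(providerForPath("/user/someUser/dataset1/f.json").describe()); // prints geojson
    }
}
```

Returning null rather than throwing mirrors the documented contract: a path outside every configured data set simply has no provider.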
public int getDataSetId(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Gets the id which identifies the path's data set
Parameters:
path - a data set path
conf - the job configuration
Throws:
java.io.IOException
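Conceptually, getDataSetId matches the path against each configured data set's input pattern and returns the first match's id. A minimal stand-alone sketch of that dispatch, using java.nio glob matching in place of the library's configuration-driven logic (the glob table and the -1 sentinel are illustrative assumptions):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of mapping a file path to the id of the data set it belongs to.
public class DataSetIdSketch {

    // Hypothetical glob-to-id table; the real implementation derives
    // this from the input data sets stored in the job configuration.
    static final Map<String, Integer> GLOBS = new LinkedHashMap<>();
    static {
        GLOBS.put("/user/someUser/dataset1/**", 0);
        GLOBS.put("/user/someUser/dataset2/**", 1);
    }

    // Return the id of the first data set whose pattern matches the
    // path, or -1 if the path belongs to no configured data set.
    static int getDataSetId(String path) {
        for (Map.Entry<String, Integer> e : GLOBS.entrySet()) {
            PathMatcher m = FileSystems.getDefault()
                    .getPathMatcher("glob:" + e.getKey());
            if (m.matches(Paths.get(path))) {
                return e.getValue();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(getDataSetId("/user/someUser/dataset1/part-00000")); // prints 0
    }
}
```

An id of this kind is what lets the record reader and provider lookups (see getInputFormatForPath and getRecordInfoProviderForPath) pick the right per-data-set classes for each split.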
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter) throws java.io.IOException
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit>
Overrides:
getRecordReader in class CompositeInputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit>
Throws:
java.io.IOException
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf, int splitCount) throws java.io.IOException
Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit>
Overrides:
getSplits in class CompositeInputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.mapred.FileSplit>
Throws:
java.io.IOException
public void configure(org.apache.hadoop.mapred.JobConf conf)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class FileSplitInputFormat