Class TableInputFormat
- java.lang.Object
  - org.apache.hadoop.mapreduce.InputFormat<K,V>
    - oracle.kv.hadoop.table.TableInputFormat
public class TableInputFormat extends InputFormat<K,V>
A Hadoop InputFormat class for reading data from an Oracle NoSQL Database. Map/reduce keys and values are returned as PrimaryKey and Row objects, respectively. For information on the parameters that may be passed to this class, refer to the javadoc for its parent class, TableInputFormatBase.
A simple example demonstrating the Oracle NoSQL DB Hadoop oracle.kv.hadoop.table.TableInputFormat class can be found in the KVHOME/example/table/hadoop directory. It demonstrates how, in a MapReduce job, to read records from an Oracle NoSQL Database that were written using the Table API. The javadoc for that program describes the simple Map/Reduce processing as well as how to invoke the program in Hadoop.
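Each static setter on this class corresponds to a Hadoop property (for example oracle.kv.kvstore, oracle.kv.hosts and oracle.kv.tableName in the Method Detail entries), so the same settings can also be supplied as `-D` options on the command line, as the oracle.kv.fieldRange entry shows. The JDK-only sketch below assembles such options. The class KvGenericOptions and every value in it are hypothetical; passing several helper hosts as one comma-separated value is an assumption, as is a job driver that parses generic options (for example, one launched through Hadoop's ToolRunner).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that turns property name/value pairs into Hadoop "-D" options. */
final class KvGenericOptions {
    private KvGenericOptions() {}

    /** Returns one "-Dname=value" argument per non-empty entry, in map iteration order. */
    static List<String> toArgs(Map<String, String> props) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                continue; // leave unset properties to the InputFormat's defaults
            }
            // Each option is its own argv element, so no shell quoting is added here.
            args.add("-D" + e.getKey() + "=" + value);
        }
        return args;
    }

    /** Placeholder settings using property names documented on this page. */
    static Map<String, String> exampleProps() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("oracle.kv.kvstore", "kvstore");                   // cf. setKVStoreName
        props.put("oracle.kv.hosts", "kvhost01:5000,kvhost02:5000"); // cf. setKVHelperHosts; comma list is an assumption
        props.put("oracle.kv.tableName", "vehicleTable");            // cf. setTableName
        return props;
    }
}
```

For example, `toArgs(exampleProps())` yields three arguments, the first being `-Doracle.kv.kvstore=kvstore`. Empty values are skipped so that the corresponding defaults apply.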
Since:
3.1
Constructor Summary
- TableInputFormat()
Method Summary
- RecordReader<PrimaryKey,Row> createRecordReader(InputSplit split, TaskAttemptContext context): Returns the RecordReader for the given InputSplit.
- List<InputSplit> getSplits(JobContext context)
- static void setBatchSize(int batchSize): Specifies the suggested number of keys to fetch during each network round trip by the InputFormat.
- static void setConsistency(Consistency consistency): Specifies the read consistency associated with the lookup of the child KV pairs.
- static void setDirection(Direction newDirection): Specifies the order in which records are returned by the InputFormat.
- static void setFieldRangeProperty(String newProperty): Sets the String to use for the property value whose contents are used to construct the field range to employ when iterating the table.
- static void setKVHadoopHosts(String[] newHadoopHosts): Sets the KV Hadoop data node host name(s) for this InputFormat to operate on.
- static void setKVHelperHosts(String[] newHelperHosts): Sets the KV Helper host:port pair(s) for this InputFormat to operate on.
- static void setKVSecurity(String loginFile, PasswordCredentials userPasswordCredentials, String trustFile): Sets the login properties file and the public trust file (keys and/or certificates), as well as the PasswordCredentials for authentication.
- static void setKVStoreName(String newStoreName): Sets the KV Store name for this InputFormat to operate on.
- static void setMaxBatches(int newMaxBatches): Specifies the maximum number of result batches that can be held in memory on the client side before processing on the server side pauses.
- static void setMaxRequests(int newMaxRequests): Specifies the maximum number of client-side threads to use when running an iteration. A value of 1 causes the iteration to be performed using only the current thread, and a value of 0 causes the client to base the number of threads to employ on the current store topology.
- static void setPrimaryKeyProperty(String newProperty): Sets the String to use for the property value whose contents are used to construct the primary key to employ when iterating the table.
- void setQueryInfo(int newQueryBy, String newWhereClause, Integer newPartitionId)
- static void setTableName(String newTableName): Sets the name of the table in the KV store that this InputFormat will operate on.
- static void setTimeout(long timeout): Specifies an upper bound on the time interval for processing a particular KV retrieval.
- static void setTimeoutUnit(TimeUnit timeoutUnit): Specifies the unit of the timeout parameter.
Method Detail
createRecordReader
public RecordReader<PrimaryKey,Row> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException
Returns the RecordReader for the given InputSplit.
Specified by:
createRecordReader in class InputFormat<PrimaryKey,Row>
Throws:
IOException
InterruptedException
getSplits
public List<InputSplit> getSplits(JobContext context) throws IOException, InterruptedException
Specified by:
getSplits in class InputFormat<K,V>
Parameters:
context - job configuration.
Returns:
a list of InputSplits for the job.
Throws:
IOException
InterruptedException
setKVStoreName
public static void setKVStoreName(String newStoreName)
Sets the KV Store name for this InputFormat to operate on. This is equivalent to passing the oracle.kv.kvstore Hadoop property.
Parameters:
newStoreName - the new KV Store name to set.
setKVHelperHosts
public static void setKVHelperHosts(String[] newHelperHosts)
Sets the KV Helper host:port pair(s) for this InputFormat to operate on. This is equivalent to passing the oracle.kv.hosts Hadoop property.
Parameters:
newHelperHosts - array of hostname:port strings, each identifying a host in the KV Store.
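Because each helper-host entry must be a hostname:port string, a driver may want to reject malformed entries before calling setKVHelperHosts. The sketch below is a hypothetical JDK-only validator (HelperHosts is not part of the Oracle NoSQL API); it accepts exactly one colon, a non-empty host and a port from 1 to 65535, and deliberately does not handle IPv6 literals.

```java
/** Hypothetical validator for "hostname:port" helper-host strings; not part of the KV API. */
final class HelperHosts {
    private HelperHosts() {}

    /** True if the array is non-empty and every entry is a well-formed hostname:port. */
    static boolean allValid(String[] helperHosts) {
        if (helperHosts == null || helperHosts.length == 0) {
            return false;
        }
        for (String entry : helperHosts) {
            if (!isValid(entry)) {
                return false;
            }
        }
        return true;
    }

    /** True if entry has exactly one ':' with a non-empty host before it and a port 1..65535 after it. */
    static boolean isValid(String entry) {
        if (entry == null) {
            return false;
        }
        int colon = entry.indexOf(':');
        if (colon <= 0 || colon != entry.lastIndexOf(':')) {
            return false; // missing host, missing colon, or extra colons (IPv6 literals not handled)
        }
        String port = entry.substring(colon + 1);
        if (port.isEmpty() || port.length() > 5) {
            return false;
        }
        for (int i = 0; i < port.length(); i++) {
            char c = port.charAt(i);
            if (c < '0' || c > '9') {
                return false; // ASCII digits only
            }
        }
        int p = Integer.parseInt(port);
        return p >= 1 && p <= 65535;
    }
}
```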
setKVHadoopHosts
public static void setKVHadoopHosts(String[] newHadoopHosts)
Sets the KV Hadoop data node host name(s) for this InputFormat to operate on. This is equivalent to passing the oracle.kv.hadoop.hosts property.
Parameters:
newHadoopHosts - array of hostname strings, one for each Hadoop data node host in the Hadoop cluster that this InputFormat will use to support MapReduce jobs and/or service Hive queries.
setTableName
public static void setTableName(String newTableName)
Sets the name of the table in the KV store that this InputFormat will operate on. This is equivalent to passing the oracle.kv.tableName property.
Parameters:
newTableName - the new table name to set.
setPrimaryKeyProperty
public static void setPrimaryKeyProperty(String newProperty)
Sets the String to use for the property value whose contents are used to construct the primary key to employ when iterating the table. The String input to this method must be a comma-separated String of the form:
fieldName,fieldValue,fieldType,fieldName,fieldValue,fieldType,...
where the number of comma-separated elements must be a multiple of 3, and each fieldType must be 'STRING', 'INTEGER', 'LONG', 'FLOAT', 'DOUBLE', or 'BOOLEAN'. Additionally, the values referenced by the various fieldType and fieldValue components of this String must satisfy the semantics of PrimaryKey for the given table; that is, they must represent a first-to-last subset of the table's primary key fields, and they must be specified in the same order as those primary key fields. If the String referenced by this property does not satisfy these requirements, a full primary key wildcard is used when iterating the table.
This is equivalent to passing the oracle.kv.primaryKey Hadoop property.
Parameters:
newProperty - the new primary key property to set.
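Since a malformed value silently falls back to a full primary key wildcard, it can help to build and check the String in the driver. The hypothetical helper below (PrimaryKeyProperty is not part of the KV API) joins fieldName, fieldValue, fieldType triples and checks only the syntactic rules stated above: an element count that is a multiple of 3 and a known fieldType. The table-specific rules (a first-to-last subset of the primary key fields, in order) need the table's metadata and are not checked. Treating commas inside a component as unrepresentable is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for the comma-separated oracle.kv.primaryKey value; not part of the KV API. */
final class PrimaryKeyProperty {
    private PrimaryKeyProperty() {}

    /** The fieldType values listed in the description above. */
    static final Set<String> TYPES = new HashSet<>(Arrays.asList(
            "STRING", "INTEGER", "LONG", "FLOAT", "DOUBLE", "BOOLEAN"));

    /**
     * Joins {fieldName, fieldValue, fieldType} triples as "fieldName,fieldValue,fieldType,...".
     * Pass the triples in primary key order, starting from the first primary key field.
     */
    static String build(String[]... triples) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String[] t : triples) {
            if (t.length != 3 || !TYPES.contains(t[2])) {
                throw new IllegalArgumentException("bad triple: " + Arrays.toString(t));
            }
            for (String part : t) {
                // Assumption: the format has no escaping, so commas inside a component are rejected.
                if (part.indexOf(',') >= 0) {
                    throw new IllegalArgumentException("comma in component: " + part);
                }
                if (!first) {
                    sb.append(',');
                }
                sb.append(part);
                first = false;
            }
        }
        return sb.toString();
    }

    /** Syntactic check only: element count is a multiple of 3 and every fieldType is known. */
    static boolean isWellFormed(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        String[] parts = value.split(",", -1);
        if (parts.length % 3 != 0) {
            return false;
        }
        for (int i = 2; i < parts.length; i += 3) {
            if (!TYPES.contains(parts[i])) {
                return false;
            }
        }
        return true;
    }
}
```

For a table whose primary key starts with hypothetical fields type and make, `build(new String[] {"type", "truck", "STRING"}, new String[] {"make", "Ford", "STRING"})` returns `type,truck,STRING,make,Ford,STRING`.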
setFieldRangeProperty
public static void setFieldRangeProperty(String newProperty)
Sets the String to use for the property value whose contents are used to construct the field range to employ when iterating the table. The value of this property must be a JSON object of name:value pairs like the following:
-Doracle.kv.fieldRange="{\"name\":\"fieldName\", \"start\":\"startVal\", [\"startInclusive\":true|false], \"end\":\"endVal\", [\"endInclusive\":true|false]}"
where, for the given field over which to range, the 'start' and 'end' components are required, and the 'startInclusive' and 'endInclusive' components are optional, defaulting to 'true' if not included. Note that the object as a whole is enclosed in unescaped double quotes and a pair of curly braces, while each name component and each string-typed value component is enclosed in ESCAPED double quotes.
In addition to the JSON format requirement above, the values referenced by the components of this property's value must also satisfy the semantics of FieldRange; that is,
- the values associated with the target key must correspond to a valid primary key in the table
- the value associated with the fieldName must be the name of a valid field of the primary key over which iteration will be performed
- the values associated with the start and end of the range must correspond to valid values of the given fieldName
- the value associated with either of the inclusive components must be either 'true' or 'false'
This is equivalent to passing the oracle.kv.fieldRange Hadoop property.
Parameters:
newProperty - the new field range property to set.
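Hand-escaping the quotes in this value is error-prone, so the sketch below generates it. FieldRangeProperty is a hypothetical JDK-only helper, not part of the KV API. Following the rule above, String start and end values are quoted while other values (such as an Integer) are written unquoted; toShellArg wraps the result in the command-line form shown in the example, assuming a POSIX shell and values free of backslashes, '$' and '`'.

```java
/** Hypothetical helper that renders the oracle.kv.fieldRange JSON value; not part of the KV API. */
final class FieldRangeProperty {
    private FieldRangeProperty() {}

    /**
     * Builds {"name":..,"start":..,"startInclusive":..,"end":..,"endInclusive":..}.
     * String start/end values are quoted; other values (such as Integer) are written unquoted.
     */
    static String toJson(String fieldName, Object start, boolean startInclusive,
                         Object end, boolean endInclusive) {
        return "{\"name\":" + quote(fieldName)
                + ",\"start\":" + value(start)
                + ",\"startInclusive\":" + startInclusive
                + ",\"end\":" + value(end)
                + ",\"endInclusive\":" + endInclusive
                + "}";
    }

    /**
     * Wraps the JSON as a -D option for a POSIX shell command line: outer quotes unescaped,
     * inner quotes escaped. Assumes values contain no backslashes, '$', or '`'.
     */
    static String toShellArg(String json) {
        return "-Doracle.kv.fieldRange=\"" + json.replace("\"", "\\\"") + "\"";
    }

    private static String value(Object v) {
        return (v instanceof String) ? quote((String) v) : String.valueOf(v);
    }

    /** Minimal JSON string quoting: escapes backslashes and double quotes only. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

With a hypothetical field named color, `toJson("color", "blue", true, "red", false)` returns `{"name":"color","start":"blue","startInclusive":true,"end":"red","endInclusive":false}`. Both inclusive flags are always written here, which is harmless since they default to true when omitted.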
setDirection
public static void setDirection(Direction newDirection)
Specifies the order in which records are returned by the InputFormat. Note that when performing PrimaryKey iteration, only Direction.UNORDERED is allowed.
Parameters:
newDirection - the direction in which to retrieve data.
setConsistency
public static void setConsistency(Consistency consistency)
Specifies the read consistency associated with the lookup of the child KV pairs. Version- and Time-based consistency may not be used. If null, the default consistency is used. This is equivalent to passing the oracle.kv.consistency Hadoop property.
Parameters:
consistency - the consistency.
setTimeout
public static void setTimeout(long timeout)
Specifies an upper bound on the time interval for processing a particular KV retrieval. A best effort is made to not exceed the specified limit. If zero, the default request timeout is used. This is equivalent to passing the oracle.kv.timeout Hadoop property.
Parameters:
timeout - the timeout.
setTimeoutUnit
public static void setTimeoutUnit(TimeUnit timeoutUnit)
Specifies the unit of the timeout parameter. It may be null only if timeout is zero. This is equivalent to passing the oracle.kv.timeout Hadoop property.
Parameters:
timeoutUnit - the timeout unit.
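The rules for setTimeout and setTimeoutUnit interact: zero selects the default request timeout, and only then may the unit be null. The hypothetical sketch below (TimeoutSetting is not part of the KV API) encodes those two rules with java.util.concurrent.TimeUnit and normalizes the pair to milliseconds; rejecting negative timeouts is an extra assumption of this sketch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical check mirroring the setTimeout/setTimeoutUnit rules described above. */
final class TimeoutSetting {
    private TimeoutSetting() {}

    /**
     * Returns the timeout in milliseconds, or 0 meaning "use the default request timeout".
     * A null unit is accepted only when timeout is zero.
     */
    static long toMillis(long timeout, TimeUnit unit) {
        if (timeout < 0) {
            // Assumption: this sketch treats negative timeouts as invalid.
            throw new IllegalArgumentException("timeout must not be negative");
        }
        if (timeout == 0) {
            return 0L; // zero selects the default request timeout; unit may be null
        }
        if (unit == null) {
            throw new IllegalArgumentException("timeoutUnit may be null only if timeout is 0");
        }
        return unit.toMillis(timeout);
    }
}
```

For instance, a timeout of 5 with TimeUnit.SECONDS normalizes to 5000 milliseconds, while a timeout of 5 with a null unit is rejected.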
setMaxRequests
public static void setMaxRequests(int newMaxRequests)
Specifies the maximum number of client-side threads to use when running an iteration. A value of 1 causes the iteration to be performed using only the current thread, and a value of 0 causes the client to base the number of threads to employ on the current store topology. This is equivalent to passing the oracle.kv.maxRequests Hadoop property.
Parameters:
newMaxRequests - the suggested number of threads to employ when running an iteration.
setBatchSize
public static void setBatchSize(int batchSize)
Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. If 0, an internally determined default is used. This is equivalent to passing the oracle.kv.batchSize Hadoop property.
Parameters:
batchSize - the suggested number of keys to fetch during each network round trip.
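As a rough worked estimate of what batchSize trades off, fetching N keys at B keys per round trip takes about ceil(N / B) round trips; for example, 1,000,000 keys at a batch size of 500 is about 2,000 round trips. This is only a back-of-the-envelope figure: batchSize is a suggestion, and the sketch ignores parallel requests (see setMaxRequests) and how keys are spread across the store. BatchMath is a hypothetical name.

```java
/** Hypothetical back-of-the-envelope helper; not how the InputFormat actually schedules requests. */
final class BatchMath {
    private BatchMath() {}

    /** Estimated round trips to fetch `keys` keys at `batchSize` keys per trip: ceil(keys / batchSize). */
    static long roundTrips(long keys, int batchSize) {
        if (keys < 0) {
            throw new IllegalArgumentException("keys must not be negative");
        }
        if (batchSize <= 0) {
            // 0 tells the InputFormat to pick its own default, so there is no size to estimate with.
            throw new IllegalArgumentException("batchSize must be positive for this estimate");
        }
        return (keys + batchSize - 1) / batchSize; // integer ceiling division
    }
}
```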
setMaxBatches
public static void setMaxBatches(int newMaxBatches)
Specifies the maximum number of result batches that can be held in memory on the client side before processing on the server side pauses. This parameter can be used to prevent the client-side memory from being exceeded if the client cannot consume results as fast as the server side generates them. This is equivalent to passing the oracle.kv.maxBatches Hadoop property.
Parameters:
newMaxBatches - the maximum number of result batches to hold in client-side memory.
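To pick a value for maxBatches, one rough approach is to bound the buffered data as maxBatches × batchSize × average row size. This assumes each batch holds about batchSize rows, which this page does not guarantee, so treat the result as an order-of-magnitude figure. For example, 10 batches of 500 rows at about 2,048 bytes per row is 10,240,000 bytes, roughly 10 MB. BufferEstimate is a hypothetical name.

```java
/** Hypothetical estimate of client-side memory held by buffered result batches. */
final class BufferEstimate {
    private BufferEstimate() {}

    /**
     * maxBatches * batchSize * avgRowBytes, assuming each batch holds about batchSize rows.
     * Math.multiplyExact throws ArithmeticException instead of silently overflowing.
     */
    static long maxBufferedBytes(int maxBatches, int batchSize, long avgRowBytes) {
        if (maxBatches <= 0 || batchSize <= 0 || avgRowBytes <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        return Math.multiplyExact(Math.multiplyExact((long) maxBatches, (long) batchSize), avgRowBytes);
    }
}
```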
setKVSecurity
public static void setKVSecurity(String loginFile, PasswordCredentials userPasswordCredentials, String trustFile) throws IOException
Sets the login properties file and the public trust file (keys and/or certificates), as well as the PasswordCredentials for authentication. The value of each of the loginFile and trustFile parameters must be either a fully qualified path referencing a file located on the local file system, or the name of a file (no path) whose contents can be retrieved as a resource from the current VM's classpath.
Note that this class provides the getSplits method, which must be able to contact a secure store and so needs access to local copies of the login properties and trust files. As a result, if the values input for the loginFile and trustFile parameters are simple file names rather than fully qualified paths, this method retrieves the contents of each from the classpath and generates private, local copies of the associated files for use by the getSplits method.
Throws:
IOException
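The file-resolution rule described for setKVSecurity can be sketched with the JDK alone. The code below is a hypothetical illustration of that rule, not the library's implementation (SecurityFileResolver is an invented name): an absolute path is used as-is, and any other value is treated as a classpath resource name whose contents are copied to a local temporary file. Treating every non-absolute value as a resource name is a simplification of "a file name (no path)".

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical sketch of the loginFile/trustFile rule: an absolute path is used directly;
 * otherwise the value is read from the classpath and copied to a local temporary file.
 */
final class SecurityFileResolver {
    private SecurityFileResolver() {}

    static Path resolve(String nameOrPath) throws IOException {
        if (nameOrPath == null || nameOrPath.isEmpty()) {
            throw new IllegalArgumentException("file name or path required");
        }
        Path p = Paths.get(nameOrPath);
        if (p.isAbsolute()) {
            return p; // fully qualified path on the local file system
        }
        // Simplification: any non-absolute value is looked up as a classpath resource.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = SecurityFileResolver.class.getClassLoader();
        }
        try (InputStream in = cl.getResourceAsStream(nameOrPath)) {
            if (in == null) {
                throw new FileNotFoundException("not found on classpath: " + nameOrPath);
            }
            Path copy = Files.createTempFile("kvsec-", "-" + p.getFileName());
            copy.toFile().deleteOnExit(); // local copy is removed when the JVM exits
            Files.copy(in, copy, StandardCopyOption.REPLACE_EXISTING);
            return copy;
        }
    }
}
```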