Oracle® Big Data Connectors User's Guide
Release 1 (1.1)

Part Number E36049-03

3 Oracle Loader for Hadoop

This chapter explains how to use Oracle Loader for Hadoop to copy data from Hadoop files into tables in an Oracle database. It contains the following sections:

  • What Is Oracle Loader for Hadoop?

  • Using Oracle Loader for Hadoop

  • Output Modes During OraLoader Invocation

  • Balancing Loads When Loading Data into Partitioned Tables

  • Primary Configuration Properties for the Load Balancing Feature

  • OraLoader Configuration Properties

  • Example of Using Oracle Loader for Hadoop

  • Target Table Characteristics

  • Loader Map XML Schema Definition

  • OraLoader for Hadoop Configuration Properties

3.1 What Is Oracle Loader for Hadoop?

Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. Oracle Loader for Hadoop prepartitions the data if necessary and transforms it into a database-ready format. It optionally sorts records by primary key before loading the data or creating output files. Oracle Loader for Hadoop is a MapReduce application that is invoked as a command-line utility.

Note:

Partitioning is a database feature for managing and efficiently querying very large tables. It provides a way to decompose a large table into smaller and more manageable pieces called partitions, in a manner entirely transparent to applications. For more information about partitioning, see the Oracle Database VLDB and Partitioning Guide.

After the prepartitioning and transforming steps, there are two modes for loading the data into an Oracle database from a Hadoop cluster:

  • Online database mode, in which the loader connects to the database and loads the data directly, using either the JDBC or the Oracle OCI Direct Path output format

  • Offline database mode, in which the loader writes delimited text files or Oracle Data Pump files that are later copied to the database system and loaded into the target table

3.2 Using Oracle Loader for Hadoop

This section describes the following steps for using Oracle Loader for Hadoop:

  1. Implementing InputFormat

  2. Creating the loaderMap Document

  3. Accessing Table Metadata

  4. Invoking OraLoader

  5. Loading Files Into an Oracle Database (Offline Loads Only)

See "Oracle Loader for Hadoop Setup" for installation instructions.

3.2.1 Implementing InputFormat

Oracle Loader for Hadoop is a MapReduce application that gets its input from an org.apache.hadoop.mapreduce.RecordReader implementation as provided by the org.apache.hadoop.mapreduce.InputFormat class that is specified in the mapreduce.inputformat.class configuration property. Oracle Loader for Hadoop requires that the RecordReader return an Avro IndexedRecord input object from the getCurrentKey() method. The method signature should be:

public org.apache.avro.generic.IndexedRecord getCurrentKey()     
throws IOException, InterruptedException;

Oracle Loader for Hadoop uses the schema of the IndexedRecord input object to discover the names of the input fields and map them to the columns of the table to load. This mapping is discussed in more detail in the following sections.

Oracle Loader for Hadoop comes with two built-in input formats; it also provides the source code for two InputFormat examples. The example source code is located in the jsrc/ directory. Table 3-1 lists the class names for all these input formats, along with the types of input they handle and how they generate the Avro schema field names. (Understanding how an InputFormat generates field names is critical to getting the data loaded into the target table.)

The built-in input format classes are described in the next subsections. For the InputFormat examples, consult the source code and Javadoc for these classes for more information.

Table 3-1 InputFormat Classes, Types, and Field Names

oracle.hadoop.loader.lib.input.HiveToAvroInputFormat
  Input type: Hive table sources
  Avro schema field names: The Hive table's column names (uppercased)

oracle.hadoop.loader.lib.input.DelimitedTextInputFormat
  Input type: Delimited text files
  Avro schema field names: Comma-separated list from the property oracle.hadoop.loader.input.fieldNames (or F0, F1, ... if the property is not defined)

oracle.hadoop.loader.examples.CSVInputFormat
  Input type: Simple, delimited text files
  Avro schema field names: F0, F1, ...

oracle.hadoop.loader.examples.AvroInputFormat
  Input type: Binary format Avro record files
  Avro schema field names: Field names from the input files' Avro schemas


3.2.1.1 HiveToAvroInputFormat

This class presents an input format that reads data from a Hive table. It requires that the Hive database and table names be specified using the following configuration properties:

  • oracle.hadoop.loader.input.hive.tableName

  • oracle.hadoop.loader.input.hive.databaseName
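
For example, assuming a Hive table named sales_data in the default database (both names are illustrative, not part of the product), a sketch of a configuration that selects HiveToAvroInputFormat and points it at the table might look like this:

<property>
  <name>mapreduce.inputformat.class</name>
  <value>oracle.hadoop.loader.lib.input.HiveToAvroInputFormat</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.hive.databaseName</name>
  <!-- illustrative database name -->
  <value>default</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.hive.tableName</name>
  <!-- illustrative table name -->
  <value>sales_data</value>
</property>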

HiveToAvroInputFormat contacts the HiveMetaStoreClient to retrieve information about the table's columns, location, InputFormat, SerDe, and so on. Depending on the way in which Hive was configured, additional Hive-specific properties must be set (such as hive.metastore.uris and hive.metastore.local).

Note that Oracle Loader for Hadoop does not currently support Hive partitioned tables.

HiveToAvroInputFormat imports the entire table (all the files in the Hive table's directory). All other (file-based) input formats discussed in this document allow "globbing" (that is, appending wildcard patterns to the input directories to restrict the input).

Rows in the Hive table are transformed into Avro records whose field names are the Hive table's column names in uppercase. This makes it more likely that the fields will coincide with the database column names. See "Creating the loaderMap Document".

3.2.1.2 DelimitedTextInputFormat

This is an InputFormat class for delimited text files, such as comma-separated values files or tab-separated values files. DelimitedTextInputFormat requires that records be separated by newline characters and that fields be delimited using single-character markers.

The DelimitedTextInputFormat class is meant to emulate the "terminated by t [optionally enclosed by ie [and te]]" behavior of SQL*Loader. The t is the field terminator, ie is the initial field encloser, and te is the trailing field encloser.

DelimitedTextInputFormat uses a parser based on the following grammar:

  • Line = Token t Line | Token\n

  • Token = EnclosedToken | UnenclosedToken

  • EnclosedToken = (white-space)* ie [(non-te)* te te]* (non-te)* te (white-space)*

  • UnenclosedToken = (white-space)* (non-t)*

  • white-space = {c | Character.isWhitespace(c) and c!=t}

Any trailing field encloser character contained inside an enclosed token must be encoded by "doubling it up" (that is, printing it twice).

White space around enclosed tokens is discarded. For unenclosed tokens the leading white space is discarded, but not the trailing white space (if any).

Any empty-string token (either enclosed or unenclosed) is replaced with a null.

This implementation allows custom encloser and terminator characters (see Table 3-2), but hard-codes the record terminator (to newline) and white space (to Java's Character.isWhitespace()). The enclosers must be different from the terminator character and white space characters (but the two enclosers can be equal to each other). The terminator can be a white space character (but that character is then removed from the class of white space characters).

Table 3-2 describes the delimiters available for DelimitedTextInputFormat. In the table, HHHH is a big-endian hexadecimal representation of the character in UTF-16.

Table 3-2 Delimiters for DelimitedTextInputFormat

Field terminator
  Property: oracle.hadoop.loader.input.fieldTerminator
  Possible values: one character, or \uHHHH
  Default: , (comma)

Initial field encloser
  Property: oracle.hadoop.loader.input.initialFieldEncloser
  Possible values: one character, \uHHHH, or nothing
  Default: No default

Trailing field encloser
  Property: oracle.hadoop.loader.input.trailingFieldEncloser
  Possible values: one character, \uHHHH, or nothing
  Default: No default


The field enclosers must be either both set or both not set. If they are not set, then the EnclosedToken non-terminal element is essentially removed from the grammar of the parser that DelimitedTextInputFormat uses. If field enclosers are set, then the parser attempts to read each field as an EnclosedToken first before reading it as an UnenclosedToken. Note that if the initial field encloser is set, then the trailing field encloser must also be set, even when the two are the same.

DelimitedTextInputFormat reads the field names as a comma-separated list from the configuration property oracle.hadoop.loader.input.fieldNames. If parsing a line results in more tokens (fields) than field names, the extra tokens are discarded. If there are fewer tokens than field names, the missing trailing tokens are set to null.

If the oracle.hadoop.loader.input.fieldNames property is not set, then the DelimitedTextInputFormat's RecordReader uses F0, F1,… Fn as field names (where n is the largest number of tokens encountered by that RecordReader in any line so far).
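
Putting these properties together, the following is a sketch (not a definitive configuration) for reading comma-separated input whose fields may be enclosed in double quotation marks; the field names reuse the employee fields from the loaderMap example later in this chapter:

<property>
  <name>mapreduce.inputformat.class</name>
  <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.fieldTerminator</name>
  <value>,</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.initialFieldEncloser</name>
  <value>"</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.trailingFieldEncloser</name>
  <value>"</value>
</property>
<property>
  <name>oracle.hadoop.loader.input.fieldNames</name>
  <value>empId,lastName,email,hireDate,jobId</value>
</property>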

3.2.2 Creating the loaderMap Document

Oracle Loader for Hadoop loads data into a single database table, referred to as the target table. You can specify the target table, the columns to load, and how input fields map to database columns in the following ways:

  • To indicate that all columns of the database table will be loaded and that the names of the input fields exactly match the database column names, use the configuration property oracle.hadoop.loader.targetTable. It allows you to define a schema-qualified name for the target load table. For each database column, the loader uses the column name to discover an input field with the same name. The value of the field is then loaded into the column (a configuration sketch follows this list).

  • To load a subset of the target table columns or to create explicit mappings when the input field names are not exactly the same as the database column names, create a loaderMap document to specify the target table, columns, and how the input fields should be mapped to the database columns. The location of the loaderMap document is specified using the oracle.hadoop.loader.loaderMapFile configuration property.
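
The following sketch illustrates both approaches; the HR.EMPLOYEES table is the example used throughout this chapter, and the loaderMap file path is illustrative:

<!-- First approach: load all columns; input field names match column names -->
<property>
  <name>oracle.hadoop.loader.targetTable</name>
  <value>HR.EMPLOYEES</value>
</property>

<!-- Second approach: explicit mappings via a loaderMap document.
     Omit oracle.hadoop.loader.targetTable in this case, because it
     takes precedence over the loaderMapFile property. -->
<property>
  <name>oracle.hadoop.loader.loaderMapFile</name>
  <value>file:///home/oracle/MyLoaderMap.xml</value>
</property>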


3.2.2.1 Example loaderMap Document

The following example loaderMap document specifies a list of columns in the HR.EMPLOYEES table to load. It includes a mapping of input data field names to table column names and, for the HIRE_DATE column, the format of the input date values.

<?xml version="1.0" encoding="UTF-8"?>
<LOADER_MAP>
<SCHEMA>HR</SCHEMA>
<TABLE>EMPLOYEES</TABLE>
<COLUMN field="empId">EMPLOYEE_ID</COLUMN>
<COLUMN field="lastName">LAST_NAME</COLUMN>
<COLUMN field="email">EMAIL</COLUMN>
<COLUMN field="hireDate" format="MM-dd-yyyy">HIRE_DATE</COLUMN>
<COLUMN field="jobId">JOB_ID</COLUMN>
</LOADER_MAP>

Note:

If all the columns in the target table are used for loading, and if the input data field names in the IndexedRecord input object match the column names exactly, then the loaderMap file is not needed unless a table column is a DATE data type. Input fields mapped to DATE columns are parsed using the default Java date format. If the input is in a different format, then you must create a loaderMap document and use the format attribute to specify the Java date format string to use when parsing input values.

3.2.3 Accessing Table Metadata

Oracle Loader for Hadoop uses table metadata from Oracle Database to control the execution of a loader job. The loader automatically fetches the metadata whenever a JDBC connection can be established. Sometimes it may be impossible for the loader job to access the database. For example, the Hadoop cluster may be on a different network than the database. In this case, the OraLoaderMetadata utility program is used to extract table metadata from the database into an XML document. The metadata document is then transferred to the Hadoop cluster. The configuration property oracle.hadoop.loader.tableMetadataFile is used to specify the location of the metadata document. When the loader job runs, it accesses this document to discover all necessary metadata information about the target table.
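
For example, after transferring the metadata document to the system where the job is submitted, the job configuration might reference it as follows (the file path is illustrative):

<property>
  <name>oracle.hadoop.loader.tableMetadataFile</name>
  <!-- illustrative path to the document created by OraLoaderMetadata -->
  <value>file:///home/oracle/employees_metadata.xml</value>
</property>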

3.2.3.1 Running the OraLoaderMetadata Utility

To run the OraLoaderMetadata Java utility, add the following JAR files to the CLASSPATH variable:

  • $OLH_HOME/jlib/oraloader.jar

  • $OLH_HOME/jlib/ojdbc6.jar

  • $OLH_HOME/jlib/oraclepki.jar

Note:

The oraclepki.jar library is required only if you are connecting to the database using credentials stored in an Oracle wallet.

Then run the following command:

java oracle.hadoop.loader.metadata.OraLoaderMetadata \ 
-user <username> -connection_url <connection URL> [-schema <schemaName>] \
-table <tableName> -output <output filename>

OraLoaderMetadata Parameters

The OraLoaderMetadata utility accepts the following parameters:

  • -user is the Oracle database user name. The user is prompted for the password.

  • -connection_url is the connection URL to connect to Oracle Database.

  • -schema is the name of the schema containing the target table. If this parameter is not specified, then the target table is assumed to be in the user schema specified in the connect URL.

  • -table is the name of the target table.

  • -output is the output file name used to store the metadata document.

3.2.4 Invoking OraLoader

OraLoader is a Hadoop job that you execute using the standard Hadoop tools. OraLoader implements the org.apache.hadoop.util.Tool interface and follows the standard Hadoop methods for building MapReduce applications. OraLoader performs the following actions:

  1. Reads and verifies input configuration parameters.

  2. Retrieves and verifies table and column metadata information for the target table. Metadata is retrieved from the database whenever a JDBC connection can be made. Otherwise, the loader looks for metadata stored in the location specified by the oracle.hadoop.loader.tableMetadataFile property.

  3. Prepares internal configuration information for the MapReduce tasks of OraLoader and stores table metadata information and dependent Java libraries in the distributed cache so that they are available to the map and reduce tasks throughout the cluster.

  4. Submits the MapReduce job to Hadoop.

  5. Consolidates reporting information from individual tasks to create a common log file for the job after the map and reduce tasks are complete. The log file is written to the job output directory and is named oraloader-report.txt.

OraLoader is invoked from the command line and accepts any of the generic command-line options. The following is an example invocation:

HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$OLH_HOME/jlib/*"

bin/hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \ 
-conf MyConf.xml

When using one of the example input formats shown in Table 3-1, you must include the -libjars option in the OraLoader invocation:

bin/hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
-conf MyConf.xml -libjars $OLH_HOME/jlib/oraloader-examples.jar

See Also:

  • The Apache Hadoop documentation for information about where to find the Hadoop executable and the setting for the HADOOP_CLASSPATH variable

  • Javadoc for the generic options, which is located at the following Apache site: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/GenericOptionsParser.html

3.2.5 Loading Files Into an Oracle Database (Offline Loads Only)

For offline loads, Oracle Loader for Hadoop produces files that must be copied to the database server and loaded into an Oracle database. The following section describes the available offline load method.

3.2.5.1 Loading From Delimited Text Files Into an Oracle Database

After you copy the delimited text files to the database system, use the generated control files to invoke SQL*Loader and load the data from the delimited text files into the database. Alternatively, you can use the generated SQL scripts to perform external table loads into the database. See "Delimited Text Output".

3.3 Output Modes During OraLoader Invocation

This section describes the following output options:

  • JDBC Output

  • Oracle OCI Direct Path Output

  • Delimited Text Output

  • Oracle Data Pump Output

3.3.1 JDBC Output

JDBC is an output option in online database mode. The output records of the loader job are loaded directly into the target table by map or reduce tasks as part of the OraLoader process. There is no need to execute additional steps to load the data. A JDBC connection between the Hadoop system and Oracle Database is required for this output option.

The JDBC output option uses standard JDBC batching to increase performance and efficiency. If an error occurs during batch execution (for example, a constraint is violated), the JDBC driver stops execution at the first error. Thus, if there are 100 rows in a batch and the tenth row causes an error, then nine rows are inserted and 91 rows are not. Moreover, the JDBC driver does not identify the row that caused the error, so Oracle Loader for Hadoop does not know the insert status of any row in the batch. It counts all rows in the batch as "in question" and continues loading the next batch. A load report produced at the end of the job details the number of batch errors incurred and the number of rows whose insert status is in question.

One way to handle this problem is to use a unique key on the data. After the data is loaded, the key can be enabled and used to discover missing key values. The missing rows must be located in the input data and reloaded after determining why they failed to load.

To select the JDBC output format, set the following Hadoop property:

<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
</property>

The relevant property for configuring JDBC output is oracle.hadoop.loader.connection.defaultExecuteBatch, which controls the size of the batch.
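
For example, the following sketch lowers the batch size; the value shown is illustrative. A smaller batch narrows the set of rows left "in question" after a batch error, at some cost in throughput:

<property>
  <name>oracle.hadoop.loader.connection.defaultExecuteBatch</name>
  <!-- illustrative value; the default is 100 -->
  <value>50</value>
</property>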

3.3.2 Oracle OCI Direct Path Output

The Oracle OCI Direct Path output format is available in online database mode. This output format uses the OCI Direct Path interface to load rows into the target table. Parallel direct path load is possible because each reducer loads into a distinct database partition.

To select the Oracle OCI Direct Path output format, set the following Hadoop property:

<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.OCIOutputFormat</value>
</property>

The size of the direct path stream buffer can be controlled using the following property:

<property>
  <name>oracle.hadoop.loader.output.dirpathBufsize</name>
  <value>131072</value>
  <description>
   This property is used to set the size, in bytes, of the direct path 
   stream buffer for OCIOutputFormat.  If needed, values are rounded 
   up to the next nearest multiple of 8k.
  </description>
</property>

The Oracle OCI Direct Path output format has the following restrictions:

  • It is available only on the Linux x86_64 platform.

  • The load target table must be partitioned.

  • The number of reducers must be greater than zero.

  • OCI Direct Path output cannot load a composite interval partitioned table where the subpartition key contains a CHAR, VARCHAR2, NCHAR, or NVARCHAR2 column. The loader checks for this condition and stops with an error if the target load table meets this condition. Composite interval partitions where the subpartition key does not contain a character type column are supported.

The Oracle OCI Direct Path output format requires the following configuration steps. These steps enable the loader to locate the C shared libraries that implement the output format. These libraries are automatically distributed to compute nodes using the Hadoop Distributed Cache mechanism.

  1. Set the environment variable JAVA_LIBRARY_PATH to point to the directory $OLH_HOME/lib. This environment variable is required only on the node where the job is submitted. The $HADOOP_HOME/bin/hadoop command in CDH automatically injects this value into the Java system property java.library.path when the job is created. For the Apache Hadoop distribution, you must edit the $HADOOP_HOME/bin/hadoop command so that it concatenates new values to an existing value, because the Apache hadoop command begins with an empty JAVA_LIBRARY_PATH value and does not import a value from the environment.

  2. Add $OLH_HOME/lib to the LD_LIBRARY_PATH variable on the client where the loader job is submitted.

3.3.3 Delimited Text Output

Delimited text is an output option in offline database mode. Comma-separated value (CSV) format files, or other delimited text files, are generated by map or reduce tasks. These files are then loaded into the target table by using either SQL*Loader or external tables.

To select the Delimited Text output format, set the following Hadoop property:

<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.DelimitedTextOutputFormat</value>
</property>

Each output task generates a delimited text format file and a SQL*Loader control file (see Example 3-1).

Delimited text files have the following template:

oraloader-${taskId}-csv-${partitionId}.dat

SQL*Loader control file names have the following template:

oraloader-${taskId}-csv-${partitionId}.ctl

Definitions of the template parameters are as follows:

${taskId}: The mapper or reducer task identifier

${partitionId}: The database partition identifier

If the table is not partitioned, or if oracle.hadoop.loader.loadByPartition=false, then a single SQL*Loader control file, oraloader-csv.ctl, is generated for the entire job.

The SQL script for loading through an external table is named oraloader-csv.sql.

The formatting of records and fields in the delimited text file is controlled by the following properties (a configuration sketch follows the list):

  • oracle.hadoop.loader.output.fieldTerminator: Identifies a single character that delimits the fields.

  • oracle.hadoop.loader.output.initialFieldEncloser: Identifies the beginning of a field. It indicates that the fields are always enclosed between this character and the trailingFieldEncloser character.

  • oracle.hadoop.loader.output.trailingFieldEncloser: Identifies the end of a field that begins with the initialFieldEncloser character. If this property is not set, then the value of the initialFieldEncloser is also used as the trailing field encloser.

  • oracle.hadoop.loader.output.escapeEnclosers: Identifies the character that is used to escape embedded trailing field encloser characters.
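
As a sketch, the following settings produce comma-separated output with double-quote enclosers and escaping of embedded trailing enclosers; the character choices are illustrative:

<property>
  <name>oracle.hadoop.loader.output.fieldTerminator</name>
  <value>,</value>
</property>
<property>
  <name>oracle.hadoop.loader.output.initialFieldEncloser</name>
  <value>"</value>
</property>
<property>
  <name>oracle.hadoop.loader.output.trailingFieldEncloser</name>
  <value>"</value>
</property>
<property>
  <name>oracle.hadoop.loader.output.escapeEnclosers</name>
  <value>true</value>
</property>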

Example 3-1 shows a sample SQL*Loader control file that might be generated by an output task.

Example 3-1 Sample SQL*Loader Control File

LOAD DATA CHARACTERSET AL32UTF8
INFILE 'oraloader-csv-1-0.dat'
BADFILE 'oraloader-csv-1-0.bad'
DISCARDFILE 'oraloader-csv-1-0.dsc'
INTO TABLE "SCOTT"."CSV_PART" PARTITION(10) APPEND
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
"ID"      DECIMAL EXTERNAL,
"NAME"    CHAR,
"DOB"     DATE 'SYYYY-MM-DD HH24:MI:SS'
)

3.3.4 Oracle Data Pump Output

The Oracle Data Pump output format is available in offline database mode. The loader produces binary format files that can be loaded into the target table using an external table and the ORACLE_DATAPUMP access driver. The output files must be copied from the HDFS file system to a local file system that is accessible to Oracle Database.

To select the Oracle Data Pump output format, set the following Hadoop property:

<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.DataPumpOutputFormat</value>
</property>

Oracle Data Pump output file names have the following template:

oraloader-${taskId}-dp-${partitionId}.dat

Oracle Loader for Hadoop also produces a SQL file that contains commands to perform the following tasks:

  1. Create an external table definition using the ORACLE_DATAPUMP access driver. The binary format Oracle Data Pump output files are listed in the LOCATION clause of the external table.

  2. Create a directory object that is used by the external table. This command must be uncommented before it can be used. To specify the directory name that is produced in the SQL file, set the following property:

    <property>
      <name>oracle.hadoop.loader.extTabDirectoryName</name>
      <value>OLH_EXTTAB_DIR</value>
      <description>
       The name of the Oracle directory object for the external table's
       LOCATION data files. This property applies only to the CSV and 
       DataPump output formats.
      </description>
    </property>
     
    
  3. Insert the rows from the external table into the target table. This command must be uncommented before it can be used.


3.4 Balancing Loads When Loading Data into Partitioned Tables

To balance loads across reducers when data is loaded into a partitioned database table, use the sampling feature of Oracle Loader for Hadoop.

The execution time of a reducer is usually proportional to the number of records that it processes—the more records, the longer the execution time. When the sampling feature is disabled, all records from a given database partition are sent to one reducer. This can result in unbalanced reducer loads because some database partitions may have more records than others. Because the execution time of a Hadoop job is the execution time of its slowest reducer, unbalanced reducer loads can slow down the entire job.

Although hashing records uniformly across reducers can generate balanced reducer loads, it does not necessarily group records by database partition before inserting them into the database.

The sampling feature of Oracle Loader for Hadoop generates an efficient MapReduce partitioning scheme that groups records by database partition while also balancing reducer load.

3.4.1 Using the Sampling Feature

To enable the sampling feature, set the configuration property oracle.hadoop.loader.sampler.enableSampling to true.

Even if the enableSampling property is set to true, the loader automatically disables the sampling feature if sampling is not necessary or if the loader determines that a good sample cannot be made. For example, sampling is automatically disabled if the table is not partitioned, the number of reducer tasks is less than two, or there is too little input data to compute a good load balance. In those cases, the loader returns an informational message.

Note:

The sampler is multithreaded, and each sampler thread instantiates its own copy of the supplied InputFormat class. Any new InputFormat implementations provided to Oracle Loader for Hadoop should ensure that data structures that are static and mutable are synchronized for multiple thread access.

It is possible for the sampler to return an out-of-memory error on the client node where the loader job is submitted. This can occur when the input splits returned by the InputFormat do not fit in memory.

The following are possible solutions to this problem:

  • Increase the heap size of the JVM where the job is submitted.

  • Adjust the following properties (see the sketch after this list):

    oracle.hadoop.loader.sampler.hintMaxSplitSize
    oracle.hadoop.loader.sampler.hintNumMapTasks
    

    See "OraLoader for Hadoop Configuration Properties" for descriptions of these properties.

3.4.2 Tuning Load Balancing and Sampling Behavior

Oracle Loader for Hadoop provides properties that you can use to tune load balancing and sampling behavior. These properties are summarized in Table 3-3.

3.4.2.1 Properties to Tune Load Balancing

The goal of load balancing is to generate a MapReduce partitioning scheme that assigns approximately the same amount of work to all reducers. This scheme is used in the partitioning step during Oracle Loader for Hadoop job execution.

Two properties control the quality of load balancing: maxLoadFactor and loadCI. The sampler uses the expected reducer load factor to evaluate the quality of its partitioning scheme. Load factor is a metric that indicates how much a reducer's load deviates from a perfectly balanced reducer load. A load factor of one indicates a perfectly balanced load (no overload).

Small load factors indicate better load balancing. The maxLoadFactor default of 0.05 means that no reducer is ever overloaded by more than 5%. The sampler guarantees this maxLoadFactor with a statistical confidence level determined by the value of loadCI. The default value of loadCI is 0.95, which means that any reducer's load factor exceeds maxLoadFactor in only 5% of the cases.

There is a trade-off between the execution time of the sampler and the quality of load balancing. Lower values of maxLoadFactor and higher values of loadCI result in more balanced reducer loads at the expense of longer sampling times. The default values of maxLoadFactor=0.05 and loadCI=0.95 provide a good trade-off between load balancing quality and execution time.

3.4.2.2 Properties to Tune Sampling Behavior

By default, the sampler runs until it collects just enough samples to generate a partitioning scheme that satisfies the maxLoadFactor and loadCI criteria.

However, you can limit the sampler's running time by using the maxSamplesPct property, which caps the sample size as a percentage of the total number of input records.

3.4.3 Does Oracle Loader for Hadoop Always Use the Sampler's Partitioning Scheme?

Oracle Loader for Hadoop uses the generated partitioning scheme only if sampling is successful. Sampling is successful if it generates a partitioning scheme with a maximum reducer load factor of (1 + maxLoadFactor), guaranteed at a statistical confidence level of loadCI. The default values of maxLoadFactor, loadCI, and maxSamplesPct allow the sampler to generate high-quality partitioning schemes for a variety of input data distributions. However, in some cases the sampler might be unable to generate a scheme that satisfies these constraints, for example, if the constraints are too rigid or if the required number of samples exceeds the user-specified maximum of maxSamplesPct. In such cases, Oracle Loader for Hadoop prints a log message saying that there were not enough samples. It then defaults to partitioning records by database partition and provides no load balancing guarantees (as described in "Tuning Load Balancing and Sampling Behavior").

An alternative approach is to reset the configuration properties to less rigid values, either by increasing maxSamplesPct, by decreasing maxLoadFactor or loadCI, or both.
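
For example, a relaxed configuration might look like the following sketch; the values are illustrative and trade load balancing quality for shorter sampling time:

<property>
  <name>oracle.hadoop.loader.sampler.maxSamplesPct</name>
  <!-- allow sampling up to 5% of records; the default is 0.01 -->
  <value>0.05</value>
</property>
<property>
  <name>oracle.hadoop.loader.sampler.maxLoadFactor</name>
  <!-- tolerate up to 10% reducer overload; the default is 0.05 -->
  <value>0.1</value>
</property>
<property>
  <name>oracle.hadoop.loader.sampler.loadCI</name>
  <!-- lower confidence level; the default is 0.95 -->
  <value>0.9</value>
</property>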

3.4.4 What Happens When a Sampling Feature Property Has an Invalid Value?

If any configuration properties of the sampling feature are set to values outside the accepted range, an exception is not returned. Instead, the sampler prints a warning message, resets the property to its default value, and continues executing.

3.5 Primary Configuration Properties for the Load Balancing Feature

Table 3-3 describes the primary properties available to tune sampling behavior. See "OraLoader for Hadoop Configuration Properties" for a complete list of properties.

Table 3-3 Configuration Properties for the Oracle Loader for Hadoop Sampling Feature

oracle.hadoop.loader.sampler.maxSamplesPct
  Type: Float
  Default: 0.01
  Accepted range: [0, 1]; a value of <= 0 disables this property
  Description: The maximum sample size as a percentage of the number of records in the input data. A value of 0.05 indicates that the sampler never samples more than 5% of the total number of records. The sampler may collect fewer samples than this amount.

oracle.hadoop.loader.sampler.maxLoadFactor
  Type: Float
  Default: 0.05
  Accepted range: >= 0; a value of <= 0 resets the property to the default
  Description: Maximum acceptable load factor for reducer workload.

oracle.hadoop.loader.sampler.loadCI
  Type: Float
  Default: 0.95
  Accepted range: >= 0.5 and < 1; recommended values are >= 0.9; a value of < 0.5 resets the property to the default
  Description: The statistical confidence level for the maximum reducer load factor. Commonly used values other than the default are 0.90 and 0.99.


3.6 OraLoader Configuration Properties

OraLoader uses the standard method in Hadoop for specifying configuration properties. These properties can be specified in a configuration file or by using the -D property=value option to GenericOptionsParser and ToolRunner.

Table 3-4 provides brief descriptions of the primary job configuration properties, and Table 3-5 describes the general properties for Oracle Loader for Hadoop. For a complete list and detailed descriptions of all configuration properties, see the oraloader-conf.xml document in "OraLoader for Hadoop Configuration Properties".

Table 3-4 Primary Job Configuration Properties for Oracle Loader for Hadoop

oracle.hadoop.loader.jobName
  Type: String
  Default: OraLoader
  Description: A Hadoop job name for this Oracle loader job. Used as input for the Job.setJobName() method.

oracle.hadoop.loader.targetTable
  Type: String
  Default: Not defined
  Description: A schema-qualified name for the table to be loaded. Use this property to indicate that all columns of the table are to be loaded and that the names of the input fields match the column names. This property takes precedence over the oracle.hadoop.loader.loaderMapFile property.

oracle.hadoop.loader.loaderMapFile
  Type: String
  Default: Not defined
  Description: Path to the loader map file.

oracle.hadoop.loader.tableMetadataFile
  Type: String
  Default: Not defined
  Description: Path to the target table metadata file. Use this property when running in disconnected mode. The table metadata file is created by running the OraLoaderMetadata utility.

oracle.hadoop.loader.olhcachePath
  Type: String
  Default: ${mapred.output.dir}/../olhcache
  Description: Path to a directory where Oracle Loader for Hadoop can create files that are loaded into the DistributedCache. In distributed mode, the value must be an HDFS path.

oracle.hadoop.loader.extTabDirectoryName
  Type: String
  Default: OLH_EXTTAB_DIR
  Description: The name of the database directory object for the external table's LOCATION data files. This property applies only to the Delimited Text and Data Pump output formats.

oracle.hadoop.loader.sampler.enableSampling
  Type: Boolean
  Default: true
  Description: Indicates whether the sampling feature is enabled.

oracle.hadoop.loader.enableSorting
  Type: Boolean
  Default: true
  Description: Indicates whether output records within each reducer group should be sorted by the primary key for the table.

oracle.hadoop.loader.connection.url
  Type: String
  Default: Not defined
  Description: Specifies the URL of the database connection string. This property takes precedence over all other connection properties. If an Oracle wallet is configured as an external password store, then the property value must start with the driver prefix jdbc:oracle:thin:@ and the db_connect_string must exactly match the credential defined in the wallet.

oracle.hadoop.loader.connection.user
  Type: String
  Default: Not defined
  Description: Name for the database login.

oracle.hadoop.loader.connection.password
  Type: String
  Default: Not defined
  Description: Password for the connecting user.

oracle.hadoop.loader.connection.wallet_location
  Type: String
  Default: Not defined
  Description: File path to an Oracle wallet where the connection information is stored. This property is used only for JDBC connections.

  For the JDBC output format, when an Oracle wallet is used as an external password store, set the following two properties:

    • oracle.hadoop.loader.connection.wallet_location
    • oracle.hadoop.loader.connection.url

  Or, set the following three properties:

    • oracle.hadoop.loader.connection.wallet_location
    • oracle.hadoop.loader.connection.tnsEntryName
    • oracle.hadoop.loader.connection.tns_admin

  For the OCI output format, set the oracle.hadoop.loader.connection.tns_admin property to indicate the wallet location.

  Note that JDBC connections are always made for online loads, even when the OCI Direct Path output format is specified. The same wallet can be used for both connection types.

oracle.hadoop.loader.connection.tnsEntryName
  Type: String
  Default: Not defined
  Description: Specifies a TNS entry name defined in the tnsnames.ora file. This property is used with the oracle.hadoop.loader.connection.tns_admin property.

oracle.hadoop.loader.connection.tns_admin
  Type: String
  Default: Not defined
  Description: File path to a directory containing Oracle Net configuration files such as sqlnet.ora and tnsnames.ora. If this property is not set, then the value (if any) of the environment variable TNS_ADMIN is used. Define this property in order to use TNS entry names in database connect strings. This property must be defined when using an Oracle wallet with OCI connections.

oracle.hadoop.loader.connection.defaultExecuteBatch
  Type: Integer
  Default: 100
  Description: Applicable only to the JDBC and OCI Direct Path output formats. The default number of records to be inserted in a batch for each trip to the database. Specify a value greater than or equal to 1 to override the default. If the specified value is less than 1, then this property assumes the default value. Although the maximum value is unlimited, very large batch sizes are not recommended because they result in a large memory footprint without much increase in performance.

oracle.hadoop.loader.connection.sessionTimeZone
  Type: String
  Default: LOCAL
  Description: Alters the session time zone for database connections. Valid values are as follows:

    • [+|-]hh:mm: Hours and minutes before or after UTC
    • LOCAL: The default time zone of the JVM
    • time_zone_region: A valid time zone region

  This property also determines the default time zone used when parsing input data that is loaded into the following database column types: TIMESTAMP, TIMESTAMP WITH TIME ZONE, and TIMESTAMP WITH LOCAL TIME ZONE.

oracle.hadoop.loader.output.dirpathBufsize
  Type: Integer
  Default: 131072
  Description: Sets the size, in bytes, of the direct path stream buffer for OCIOutputFormat. If needed, values are rounded up to the next nearest multiple of 8 KB.

oracle.hadoop.loader.output.fieldTerminator
  Type: String
  Default: , (comma)
  Description: A single character to delimit fields for DelimitedTextOutputFormat. Alternate representation: \uHHHH (where HHHH is the character's UTF-16 encoding).

oracle.hadoop.loader.output.initialFieldEncloser
  Type: String
  Default: None
  Description: When this value is set, fields are always enclosed between the specified character and ${oracle.hadoop.loader.output.trailingFieldEncloser}. If set, the value must be either a single character or \uHHHH (where HHHH is the character's UTF-16 encoding). A zero-length value means that there are no enclosers (the default). ${oracle.hadoop.loader.output.initialFieldEncloser} and ${oracle.hadoop.loader.output.trailingFieldEncloser} must be either both set or both not set. Use these properties when a field may contain the fieldTerminator. If a field may also contain the trailingFieldEncloser, then set the escapeEnclosers property to true.

oracle.hadoop.loader.output.trailingFieldEncloser
  Type: String
  Default: None
  Description: When this value is set, fields are always enclosed between ${oracle.hadoop.loader.output.initialFieldEncloser} and the specified character. If set, the value must be either a single character or \uHHHH (where HHHH is the character's UTF-16 encoding). A zero-length value means that there are no enclosers (the default). ${oracle.hadoop.loader.output.initialFieldEncloser} and ${oracle.hadoop.loader.output.trailingFieldEncloser} must be either both set or both not set. Use these properties when a field may contain the fieldTerminator. If a field may also contain the trailingFieldEncloser, then set the escapeEnclosers property to true.

oracle.hadoop.loader.output.escapeEnclosers
  Type: Boolean
  Default: false
  Description: When this property is set to true and both initial and trailing field enclosers are set, fields are scanned and embedded trailing encloser characters are escaped. Use this option when some field values may contain the trailing encloser character.

oracle.hadoop.loader.input.fieldTerminator
  Type: String
  Default: , (comma)
  Description: A single character to delimit fields for DelimitedTextInputFormat. Alternate representation: \uHHHH (where HHHH is the character's UTF-16 encoding).

oracle.hadoop.loader.input.initialFieldEncloser
  Type: String
  Default: None
  Description: When this value is set, fields can be enclosed between the specified character and ${oracle.hadoop.loader.input.trailingFieldEncloser}. If set, the value must be either a single character or \uHHHH (where HHHH is the character's UTF-16 encoding). A zero-length value means no enclosers (the default). ${oracle.hadoop.loader.input.initialFieldEncloser} and ${oracle.hadoop.loader.input.trailingFieldEncloser} must be either both set or both not set.

oracle.hadoop.loader.input.trailingFieldEncloser
  Type: String
  Default: None
  Description: When this value is set, fields can be enclosed between ${oracle.hadoop.loader.input.initialFieldEncloser} and the specified character. If set, the value must be either a single character or \uHHHH (where HHHH is the character's UTF-16 encoding). A zero-length value means no enclosers (the default). ${oracle.hadoop.loader.input.initialFieldEncloser} and ${oracle.hadoop.loader.input.trailingFieldEncloser} must be either both set or both not set.

oracle.hadoop.loader.input.fieldNames
  Type: Comma-separated list of strings
  Default: F0,F1,F2,...
  Description: Names to assign to input fields. The names are used to create the Avro schema for the record. The strings must be valid JSON name strings.


Table 3-5 General Properties for Oracle Loader for Hadoop

mapreduce.inputformat.class
  Description: Name of the class implementing InputFormat.

mapreduce.outputformat.class
  Description: The output option used by Oracle Loader for Hadoop. The values can be:

    • oracle.hadoop.loader.lib.output.DelimitedTextOutputFormat: Writes data records to delimited text format files such as comma-separated values (CSV) format files

    • oracle.hadoop.loader.lib.output.JDBCOutputFormat: Inserts data records into the target table using JDBC

    • oracle.hadoop.loader.lib.output.OCIOutputFormat: Inserts rows into the target table using the Oracle OCI Direct Path interface

    • oracle.hadoop.loader.lib.output.DataPumpOutputFormat: Writes rows into binary format files that can be loaded into the target table using an external table


3.7 Example of Using Oracle Loader for Hadoop

The example in this section uses Oracle Loader for Hadoop in online database mode with JDBC output. It involves the following steps:

  1. Create a table in the database. This example uses the HR.EMPLOYEES table available as part of the HR sample schema in Oracle Database.

  2. Implement an InputFormat class similar to the examples in the oracle.hadoop.loader.examples package.

  3. Set the configuration properties. The MyLoaderMap.xml document contains the mapping of input data fields to columns in the HR.EMPLOYEES table, as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <LOADER_MAP>
    <SCHEMA>HR</SCHEMA>
    <TABLE>EMPLOYEES</TABLE>
    <COLUMN field="empId">EMPLOYEE_ID</COLUMN>
    <COLUMN field="lastName">LAST_NAME</COLUMN>
    <COLUMN field="email">EMAIL</COLUMN>
    <COLUMN field="hireDate" format="MM-dd-yyyy">HIRE_DATE</COLUMN>
    <COLUMN field="jobId">JOB_ID</COLUMN>
    </LOADER_MAP>
    

    The configuration properties in MyConf.xml are as follows:

    <configuration>
      <property>
        <name>mapreduce.inputformat.class</name>
        <value><full_class_name>.MyInputFormat</value>
        <description> Name of the class implementing InputFormat </description>
      </property>
     
      <property>
        <name>mapreduce.outputformat.class</name>
        <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
        <description> Output mode after the loader job executes on Hadoop  </description>
      </property>
     
      <property>
        <name>oracle.hadoop.loader.loaderMapFile</name>
        <value>MyLoaderMap.xml</value>
        <description> The loaderMap file specifying the mapping of input data
         fields to the table columns </description>
      </property>
     
     <property>
       <name>oracle.hadoop.loader.connection.user</name>
       <value>HR</value>
       <description> Name of the user connecting to the database</description>
     </property>
    
    <property> 
      <name>oracle.hadoop.loader.connection.password</name>
      <value>[HR password]</value>
      <description>Password of the user connecting to the database</description>
    </property>
     
     <property>
       <name>oracle.hadoop.loader.connection.url</name>
       <value>jdbc:oracle:thin:@//example.com:1521/serviceName</value>
       <description> Database connection string </description>
     </property>
    </configuration>
    
  4. Invoke OraLoader.

    bin/hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader -libjars \
    avro-1.4.1.jar,MyInputFormat.jar -conf MyConf.xml \
    -fs [<local|namenode:port>] \
    -jt [<local|jobtracker:port>]
    

3.8 Target Table Characteristics

Oracle Loader for Hadoop supports loads into a single table, which is referred to as the target table. The target table must exist in the Oracle database. It can contain data or it can be empty.

3.8.1 Supported Data Types

Oracle Loader for Hadoop supports the following database built-in data types:

  • VARCHAR2

  • CHAR

  • NVARCHAR2

  • NCHAR

  • NUMBER

  • FLOAT

  • RAW

  • BINARY_FLOAT

  • BINARY_DOUBLE

  • DATE

  • TIMESTAMP

  • TIMESTAMP WITH TIME ZONE

  • TIMESTAMP WITH LOCAL TIME ZONE

  • INTERVAL YEAR TO MONTH

  • INTERVAL DAY TO SECOND

The target table can contain columns with unsupported data types, but these columns must be nullable, or otherwise set to a value.

3.8.2 Supported Partitioning Strategies

Oracle Loader for Hadoop supports the following single-level partitioning and composite-level partitioning strategies:

  • Range

  • List

  • Hash

  • Interval

  • Range-Range

  • Range-Hash

  • Range-List

  • List-Range

  • List-Hash

  • List-List

  • Hash-Range

  • Hash-Hash

  • Hash-List

  • Interval-Range

  • Interval-Hash

  • Interval-List

Oracle Loader for Hadoop does not support reference partitioning or virtual column-based partitioning.

3.9 Loader Map XML Schema Definition

This section contains the XML schema definition (XSD) for the loader map that specifies the columns to be loaded into the target table:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema elementFormDefault="qualified" attributeFormDefault="unqualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attributeGroup name="columnAttrs">
    <xs:annotation>
      <xs:documentation>Column attributes define how to map input fields to the
                        database column. field - is the name of the field in the
                        IndexedRecord input object. The field name need not be
                        unique. This means that the same input field can map to
                        different columns in the database table. format - is a
                        format string for interpreting the input. For example,
                        if the field is a date then the format is a date format
                        string suitable for interpreting dates</xs:documentation>
    </xs:annotation>
    <xs:attribute name="field" type="xs:token" use="optional"/>
    <xs:attribute name="format" type="xs:token" use="optional"/>
  </xs:attributeGroup>
  <xs:simpleType name="TOKEN_T">
    <xs:restriction base="xs:token">
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="LOADER_MAP">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="SCHEMA" type="TOKEN_T" minOccurs="0"/>
        <xs:element name="TABLE" type="TOKEN_T" nillable="false"/>
        <xs:element name="COLUMN" maxOccurs="unbounded" minOccurs="0">
          <xs:annotation>
            <xs:documentation>specifies the database column name that will be
                              loaded. Each column name must be unique.
            </xs:documentation>
          </xs:annotation>
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="TOKEN_T">
                <xs:attributeGroup ref="columnAttrs"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
 </xs:schema>

3.10 OraLoader for Hadoop Configuration Properties

This is the oraloader-conf.xml document, which describes the configuration properties for Oracle Loader for Hadoop:

<?xml version="1.0"?>
<!-- 
 Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved. 
 
   NAME
     oraloader-conf.xml 
 
   DESCRIPTION
     Config properties for OLH.  
     
     This file is loaded as the very first conf resource.
     Properties without default values are commented out.
-->
<configuration>
  <property>
    <name>oracle.hadoop.loader.libjars</name>
    <value>${oracle.hadoop.loader.olh_home}/jlib/ojdbc6.jar,
${oracle.hadoop.loader.olh_home}/jlib/orai18n.jar,
${oracle.hadoop.loader.olh_home}/jlib/orai18n-utility.jar,
${oracle.hadoop.loader.olh_home}/jlib/orai18n-mapping.jar,
${oracle.hadoop.loader.olh_home}/jlib/orai18n-collation.jar,
${oracle.hadoop.loader.olh_home}/jlib/oraclepki.jar,
${oracle.hadoop.loader.olh_home}/jlib/osdt_cert.jar,
${oracle.hadoop.loader.olh_home}/jlib/osdt_core.jar,
${oracle.hadoop.loader.olh_home}/jlib/commons-math-2.2.jar,
${oracle.hadoop.loader.olh_home}/jlib/jackson-core-asl-1.5.2.jar,
${oracle.hadoop.loader.olh_home}/jlib/jackson-mapper-asl-1.5.2.jar,
${oracle.hadoop.loader.olh_home}/jlib/avro-1.5.4.jar,
${oracle.hadoop.loader.olh_home}/jlib/avro-mapred-1.5.4.jar</value> 
    <description>Comma separated list of library jar files. These jars get 
                 injected into the command-line arguments under the 
                 GenericOptionsParser's "-libjars" option. When a "-libjars" 
                 option is used as a command-line argument, then this list of
                 jars is prepended to the list following "-libjars". Users can
                 distribute their application jars using this property in place
                 of, or in combination with, the "-libjars" option.</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.sharedLibs</name>
    <value>${oracle.hadoop.loader.olh_home}/lib/libolh11.so,
${oracle.hadoop.loader.olh_home}/lib/libclntsh.so.11.1,
${oracle.hadoop.loader.olh_home}/lib/libnnz11.so,
${oracle.hadoop.loader.olh_home}/lib/libociei.so</value>
  </property>

  <property>
    <name>oracle.hadoop.loader.olh_home</name>
    <value/>
    <description>
      A path to the OLH_HOME on the node where the OraLoader job
      is initiated. OraLoader uses this path to locate required libraries.
      If this property is not set, OraLoader will use the value in the environment
      variable OLH_HOME.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.jobName</name>
    <value>OraLoader</value>
    <description>
      Hadoop job name for this Oracle loader job. Used as input for
      the Job.setJobName() method.
    </description>
  </property>

  <property> 
    <name>oracle.hadoop.loader.targetTable</name>
    <value/>
    <description>
      A schema qualified name for the table to be loaded. Use this 
      property to indicate that all columns of the table will be 
      loaded and that the names of the input fields match the 
      column names. This property takes precedence over the
      oracle.hadoop.loader.loaderMapFile property. The default 
      value is null.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.loaderMapFile</name>
    <value/>
    <description>
      Path to the loader map file. Use a file:// schema to indicate a local file.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.tableMetadataFile</name>
    <value/>
    <description>
      Path to the target table metadata file. Use this property when
      running in disconnected mode. The table metadata file is
      created by running the OraLoaderMetadata utility. 
      Use a file:// schema to indicate a local file.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.olhcachePath</name>
    <value>${mapred.output.dir}/../olhcache</value>
    <description>
      Path to a directory where Oracle Loader for Hadoop can create
      files that will be loaded into the DistributedCache.
      Unique file names are generated every time; one may want to empty it,
      or it will grow bigger and bigger if jobs are run 
      using the same olhcache directory.
   
      The default value is a directory called 'olhcache' in the parent directory
      of the job's output directory (i.e. ${mapred.output.dir}).
      
      In distributed mode, the value must be a hdfs path
      (see javaDoc for org.apache.hadoop.filecache.DistributedCache).    
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.loadByPartition</name>
    <value>true</value>
    <description>
      Instructs the output format to perform a partition-aware load.
      For DelimitedText output format, this option controls whether the 
      keyword "PARTITION" appears in the generated .ctl file(s).
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.extTabDirectoryName</name>
    <value>OLH_EXTTAB_DIR</value>
    <description>
      The name of the Oracle directory object for the external table's
      LOCATION data files. This property applies only to the DelimitedText
      and DataPump output formats.
    </description>
  </property> 
 
  <property>
    <name>oracle.hadoop.loader.sampler.enableSampling</name>
    <value>true</value>
    <description>
      Indicates whether the sampling feature is enabled. 
      Set the value to false to disable this feature.
    </description>
  </property>
    
  <property>
    <name>oracle.hadoop.loader.enableSorting</name>
    <value>true</value>
    <description>
      Indicates whether output records within each reducer group 
      should be sorted by the primary key for the table.
    </description>
  </property>
  
  <property>
     <name>oracle.hadoop.loader.configuredCounters</name>
     <value>MAPPER,OUTPUT</value>
     <description>
       Turns ON Oracle Loader for Hadoop counters by category. The value is a
       comma separated list of zero or more of the following keywords: 
       MAPPER, REDUCER, OUTPUT, and SAMPLER.
       
       Note that the input error counters (displayed in the 
       "map phase counters" section of the final report) are always on, 
       regardless of the presence of the keyword MAPPER in this list.
 
       Newer releases of Hadoop (0.20.203, cdh3u3) impose a hard limit on the
       total number of counters a job can use (see property 
       mapreduce.job.counters.limit in mapred-site.xml). Note that this 
       limit cannot be changed on a per-job basis, and the cluster needs
       to be restarted after the property has been updated on all nodes.
 
       In order to turn off all the Oracle Loader for Hadoop specific counters, 
       set this property's value to an empty list using either:
 
       -D oracle.hadoop.loader.configuredCounters= 
 
       or
 
       <property>
         <name>oracle.hadoop.loader.configuredCounters</name>
         <value>,</value>
       </property>
 
     </description>
  </property>
  
  <!-- CONNECTION properties -->
  
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value/>
    <description>
      Specifies the URL of the database connection string. This property 
      takes precedence and overrides all other connection properties.
    
      If Oracle Wallet is configured as an external password store,
      the property value must start with the driver prefix: jdbc:oracle:thin:@ 
      and the db_connect_string must exactly match the credential defined in the 
      wallet.
 
        Example 1: ( using oracle net syntax) 
        jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=
            (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521)))
                     (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
        
        Example 2: ( using TNS entry)
          jdbc:oracle:thin:@myTNS
        
        - Also see documentation for
          oracle.hadoop.loader.connection.wallet_location          
    
      If Oracle Wallet is NOT used, then set the following conf properties:      
      oracle.hadoop.loader.connection.url
 
        Examples of connection URL styles:
          thin-style: 
            jdbc:oracle:thin:@//myhost:1521/my_db_service_name  
            jdbc:oracle:thin:user/password@//myhost:1521/my_db_service_name
         
          Oracle Net:
            jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=
                (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521)))
                         (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
        
          TNSEntry Name:
            jdbc:oracle:thin:@myTNSEntryName
        
     AND 
     oracle.hadoop.loader.connection.user  
     oracle.hadoop.loader.connection.password 
 
     If OCIOutputFormat is configured and Oracle Wallet is not used, then the
     username and password must be specified in these separate properties. 
       
    </description>
  </property>
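  <!-- The following is an illustrative sketch, not part of the shipped
       configuration: a minimal non-wallet connection setup. The host, port,
       service name, user, and password shown here are placeholders only.

       <property>
         <name>oracle.hadoop.loader.connection.url</name>
         <value>jdbc:oracle:thin:@//myhost:1521/my_db_service_name</value>
       </property>
       <property>
         <name>oracle.hadoop.loader.connection.user</name>
         <value>SCOTT</value>
       </property>
       <property>
         <name>oracle.hadoop.loader.connection.password</name>
         <value>password</value>
       </property>
  -->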

  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value/>
    <description>Name for the database login.</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value/>
    <description>Password for the connecting user.</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.wallet_location</name>
    <value/>
    <description>File path to an Oracle wallet where the connection information
     is stored. This property is used only for JDBC connections. For the JDBC
     output format, when using Oracle Wallet as an external password store,
     set the following two properties:
 
     - oracle.hadoop.loader.connection.wallet_location
     - oracle.hadoop.loader.connection.url

     Or, set the following three properties:

     - oracle.hadoop.loader.connection.wallet_location
     - oracle.hadoop.loader.connection.tnsEntryName
     - oracle.hadoop.loader.connection.tns_admin 

     For the OCI output format, set the 
     oracle.hadoop.loader.connection.tns_admin property to indicate 
     wallet location.

     Note that JDBC connections are always made for online loads, even when  
     the OCI Direct Path output format is specified. The same wallet can be 
     used for both connection types.
    </description>
  </property>
  
  <property>
    <name>oracle.hadoop.loader.connection.tnsEntryName</name>
    <value/>
    <description>Specifies a TNS entry name defined in the tnsnames.ora file.
    This property is used together with the 
    oracle.hadoop.loader.connection.tns_admin property.
    </description>
  </property>  
  
  <property>
    <name>oracle.hadoop.loader.connection.tns_admin</name>
    <value/>
    <description>File path to a directory containing
      SQL*Net configuration files such as sqlnet.ora and tnsnames.ora.
      If this property is not set, the value of the environment
      variable TNS_ADMIN is used. Define this property to
      use TNS entry names in database connect strings. 
      This property must be defined when using an Oracle Wallet with OCI
      connections.
    </description>
  </property>  
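  <!-- Illustrative sketch (all values are placeholders): connecting through
       an Oracle Wallet by combining the three wallet-related properties
       described above.

       <property>
         <name>oracle.hadoop.loader.connection.wallet_location</name>
         <value>/u01/app/oracle/wallet</value>
       </property>
       <property>
         <name>oracle.hadoop.loader.connection.tnsEntryName</name>
         <value>myTNSEntryName</value>
       </property>
       <property>
         <name>oracle.hadoop.loader.connection.tns_admin</name>
         <value>/u01/app/oracle/network/admin</value>
       </property>
  -->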
 
  <property>
    <name>oracle.hadoop.loader.connection.defaultExecuteBatch</name>
    <value>100</value>
    <description>
       Applicable only to the JDBC and OCI output formats. The number of
       records to be inserted in a batch on each trip to the database.
       Specify a value >= 1 to override the default; if the specified
       value is less than 1, the default value is used. Although the
       maximum value is unlimited, very large batch sizes are not
       recommended, because they result in a large memory footprint
       without much increase in performance.
     </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.sessionTimezone</name>
    <value>LOCAL</value>
    <description>
      This property is used to alter the session time zone for 
      database connections.  Valid values are:
      
        [+|-] hh:mm      - hours and minutes before or after UTC
        LOCAL            - the default timezone of the JVM 
        time_zone_region - a valid time zone region
      
      This property also determines the default time zone used when parsing
      input data to be loaded into columns of type TIMESTAMP, TIMESTAMP 
      WITH TIME ZONE, and TIMESTAMP WITH LOCAL TIME ZONE.
    </description>
  </property>
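  <!-- Examples of valid sessionTimezone values (the region name is only an
       illustration): an offset such as "-05:00", the literal "LOCAL", or a
       time zone region such as "America/Los_Angeles".
  -->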
  
  <!-- properties for OCIOutputFormat -->
  <property>
    <name>oracle.hadoop.loader.output.dirpathBufsize</name>
    <value>131072</value>
    <description>
      Sets the size, in bytes, of the direct path stream buffer for
      OCIOutputFormat. If needed, values are rounded up to the next 
      multiple of 8 KB.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.compressionFactors</name>
    <value>BASIC=5.0,OLTP=5.0,QUERY_LOW=10.0,QUERY_HIGH=10.0,
      ARCHIVE_LOW=10.0,ARCHIVE_HIGH=10.0</value>
    <description>
      Defines the compression factor for different types of compression. 
      The format is a comma-separated list of name=value pairs,
      where name is one of BASIC, OLTP, QUERY_LOW, QUERY_HIGH, ARCHIVE_LOW, or
      ARCHIVE_HIGH. Value is a decimal number.
    </description>
  </property>
  
  <!-- properties for DelimitedTextOutputFormat -->
    <property>
      <name>oracle.hadoop.loader.output.fieldTerminator</name>
      <value>,</value>
      <description>
        A single character to delimit fields for DelimitedTextOutputFormat.
        Alternate representation: \uHHHH (where HHHH is the character's UTF-16 
        encoding).
      </description>
    </property>

    <property>
      <name>oracle.hadoop.loader.output.initialFieldEncloser</name>
      <value></value>
      <description>
        When this value is set, fields are always enclosed between the
        specified character and 
        ${oracle.hadoop.loader.output.trailingFieldEncloser}.
        
        If this value is set, it must be either a single character, or \uHHHH 
        (where HHHH is the character's UTF-16 encoding).
        
        ${oracle.hadoop.loader.output.initialFieldEncloser} and 
        ${oracle.hadoop.loader.output.trailingFieldEncloser} must be either 
        both not set, or both set.
        A zero length value means no enclosers (default value).
                           
        Use this when some field may contain the fieldTerminator. 
        If some field may also contain the trailingFieldEncloser, then
        the escapeEnclosers property should be set to true.
      </description>
    </property>

    <property>
      <name>oracle.hadoop.loader.output.trailingFieldEncloser</name>
      <value></value>
      <description>
        When this value is set, fields are always enclosed between 
        ${oracle.hadoop.loader.output.initialFieldEncloser} and the
        specified character for this property.
 
        If this value is set, it must be either a single character, or \uHHHH 
        (where HHHH is the character's UTF-16 encoding).
        
        ${oracle.hadoop.loader.output.initialFieldEncloser} and 
        ${oracle.hadoop.loader.output.trailingFieldEncloser} must be either 
        both not set, or both set.
        A zero length value means no enclosers (default value).
                           
        Use this when some field may contain the fieldTerminator. 
        If some field may also contain the trailingFieldEncloser, then
        the escapeEnclosers property should be set to true.
      </description>
    </property>
 
    <property>
      <name>oracle.hadoop.loader.output.escapeEnclosers</name>
      <value>false</value>
      <description>
        When this is set to true and both initial and trailing field enclosers 
        are set, fields will be scanned, and embedded trailing encloser 
        characters will be escaped. Use this option when some of the field
        values may contain the trailing encloser character.
      </description>
    </property>
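    <!-- Illustrative sketch (the encloser choice is an assumption): wrapping
         every output field in double quotation marks (\u0022) and escaping
         any embedded quotes, for data in which field values may contain the
         field terminator or the quote character itself.

         <property>
           <name>oracle.hadoop.loader.output.initialFieldEncloser</name>
           <value>\u0022</value>
         </property>
         <property>
           <name>oracle.hadoop.loader.output.trailingFieldEncloser</name>
           <value>\u0022</value>
         </property>
         <property>
           <name>oracle.hadoop.loader.output.escapeEnclosers</name>
           <value>true</value>
         </property>
    -->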
 
  <!-- properties for DelimitedTextInputFormat -->
    <property>
      <name>oracle.hadoop.loader.input.fieldNames</name>
      <value/>
      <description>
        Comma-separated list of names to assign to input fields.
        The names are used to create the Avro schema for the record.
        The strings must be valid JSON name strings.
 
        If this property is not set, the names F0,F1,F2,... will be used
        (consistent with oracle.hadoop.loader.examples.CSVInputFormat).
      </description>
    </property>
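    <!-- Illustrative sketch (the field names are placeholders): naming three
         input fields so that they map to like-named columns of the target
         table.

         <property>
           <name>oracle.hadoop.loader.input.fieldNames</name>
           <value>EMPLOYEE_ID,LAST_NAME,EMAIL</value>
         </property>
    -->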
 
    <property>
      <name>oracle.hadoop.loader.input.fieldTerminator</name>
      <value>,</value>
      <description>
        A single character to delimit fields for DelimitedTextInputFormat.
        Alternate representation: \uHHHH (where HHHH is the character's UTF-16 
        encoding).
      </description>
    </property>

    <property>
      <name>oracle.hadoop.loader.input.initialFieldEncloser</name>
      <value></value>
      <description>
        When this value is set, fields are allowed to be enclosed
        between the specified character and 
        ${oracle.hadoop.loader.input.trailingFieldEncloser}.
        
        If this value is set, it must be either a single character, or \uHHHH 
        (where HHHH is the character's UTF-16 encoding).
        
        ${oracle.hadoop.loader.input.initialFieldEncloser} and 
        ${oracle.hadoop.loader.input.trailingFieldEncloser} must be either 
        both not set, or both set.
        A zero length value means no enclosers (default value).
      </description>
    </property>

    <property>
      <name>oracle.hadoop.loader.input.trailingFieldEncloser</name>
      <value></value>
      <description>
        When this value is set, fields are allowed to be enclosed
        between ${oracle.hadoop.loader.input.initialFieldEncloser} 
        and the specified character.
        
        If this value is set, it must be either a single character, or \uHHHH 
        (where HHHH is the character's UTF-16 encoding).
        
        ${oracle.hadoop.loader.input.initialFieldEncloser} and 
        ${oracle.hadoop.loader.input.trailingFieldEncloser} must be either 
        both not set, or both set.
        A zero length value means no enclosers (default value).      
      </description>
    </property>
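    <!-- Illustrative sketch (the encloser choice is an assumption): accepting
         input fields optionally enclosed in double quotation marks (\u0022).
         Unlike the output enclosers, which always wrap every field, the
         input enclosers permit but do not require enclosing.

         <property>
           <name>oracle.hadoop.loader.input.initialFieldEncloser</name>
           <value>\u0022</value>
         </property>
         <property>
           <name>oracle.hadoop.loader.input.trailingFieldEncloser</name>
           <value>\u0022</value>
         </property>
    -->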
 
    <!--Properties for tuning the sampler-->
    <!-- set numThreads > 1 for large datasets -->
    <property>
      <name>oracle.hadoop.loader.sampler.numThreads</name>
      <value>5</value>
      <description>Number of sampler threads.  
      This value should be set based on the processor and memory resources 
      available to the job tracker node. A higher number of sampler threads 
      implies higher concurrency in sampling.
      The default value is 5 threads.  
      </description>
    </property>
 
  <property>
     <name>oracle.hadoop.loader.sampler.maxLoadFactor</name>
     <value>0.05</value>
     <description> 
       The maximum acceptable reducer load factor.
       In a perfectly load balanced job, every reducer is assigned 
       an equal amount of work (or load). 
       The load factor is the relative overload per reducer:
       (assigned load - ideal load) / ideal load.
       For example, a value of 0.05 indicates that it is acceptable for 
       reducers to be assigned up to 5% more data than their ideal load. 
       If load balancing is successful, this maximum load factor is 
       guaranteed at the specified confidence level
       (see oracle.hadoop.loader.sampler.loadCI).
       Default = 0.05; another common value is 0.1.
     </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.loadCI</name>
    <value>0.95</value>
    <description> 
      The confidence level for the specified 
      maximum reducer load factor.
      (See oracle.hadoop.loader.sampler.maxLoadFactor)
      Default = 0.95; other common values are 0.90 and 0.99.
    </description>
  </property>
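  <!-- Worked example combining the two properties above: with
       maxLoadFactor = 0.05 and loadCI = 0.95, a successful load-balancing
       run guarantees, with 95% confidence, that no reducer is assigned
       more than 1.05 times its ideal (perfectly balanced) share of the data.
  -->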
 
  <property>
    <name>oracle.hadoop.loader.sampler.minSplits</name>
    <value>5</value>
    <description>
      The minimum number of splits that will be 
      read by the sampler. If the total number of splits 
      is less than this value, then the sampler will read
      all splits. Splits may be read partially. 
      A non-positive value is equivalent to minSplits=1. 
      The default value is 5.
    </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.hintMaxSplitSize</name>
    <value>1048576</value>
    <description> 
      The sampler sets Hadoop configuration parameter
      mapred.max.split.size to this value before it calls the InputFormat's 
      getSplits() method.
      The value of mapred.max.split.size is set to this value only for the 
      duration of sampling; it is not changed in the actual job 
      configuration. Some InputFormats (e.g. FileInputFormat) use the 
      maximum split size as a hint to determine the number of splits 
      returned by getSplits(). Smaller split sizes imply that more
      chunks of data will be sampled at random (good). While large splits are 
      better for IO performance, they are not necessarily better for sampling. 
      Set this value to be small enough for good sampling performance, 
      but not any smaller: extremely small values can cause inefficient IO 
      performance and cause getSplits() to run out of memory by returning too
      many splits. 
      The recommended minimum value for this property is 1048576 bytes (1 MB).
      This value can be increased for larger datasets (e.g. tens of terabytes) 
      or if the InputFormat's getSplits() method throws an OutOfMemoryError.       
      If the specified value is less than 1, this property is ignored. 
      The default value is 1048576 bytes (1 MB).
    </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.hintNumMapTasks</name>
    <value>100</value>
    <description> 
      The sampler sets Hadoop configuration parameter
      mapred.map.tasks to this value for the duration of sampling. 
      The value of mapred.map.tasks is not changed in the actual job 
      configuration. Some InputFormats (e.g. DBInputFormat) use the 
      number of map tasks parameter as a hint to determine the number of 
      splits returned by getSplits(). Higher values imply that more chunks 
      of data will be sampled at random (good). 
      This value should typically be increased for large datasets (e.g. more 
      than a million rows), while keeping in mind that extremely large values
      can cause the InputFormat's getSplits() method to run out of memory by
      returning too many splits.
      If the specified value is less than 1, this property is ignored.  
      The default value is 100.
    </description>
  </property>
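  <!-- Illustrative sketch (the values are assumptions for a very large
       dataset): the sampler hints can also be overridden on the command
       line, in the same -D style shown earlier in this file:

       -D oracle.hadoop.loader.sampler.hintMaxSplitSize=8388608
       -D oracle.hadoop.loader.sampler.hintNumMapTasks=1000
  -->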
 
  <property>
    <name>oracle.hadoop.loader.sampler.maxSamplesPct</name>
    <value>0.01</value>
    <description> 
      Specifies the maximum amount of data to sample, as a
      fraction of the total amount of data. In general, the 
      sampler stops sampling if any one of the following is true:
      (1) it has collected the minimum number of samples 
          required for optimal load-balancing, or 
      (2) the fraction of data sampled exceeds 
          oracle.hadoop.loader.sampler.maxSamplesPct, or 
      (3) the number of bytes sampled exceeds 
          oracle.hadoop.loader.sampler.maxHeapBytes.
      If this property is set to a negative value, 
      condition (2) is not imposed.
      The default value is 0.01 (1%).
    </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.maxHeapBytes</name>
    <value>-1</value>
    <description> 
      Specifies the maximum memory available to 
      the sampler, in bytes. In general, the sampler 
      stops sampling when any one of these conditions is true:
      (1) it has collected the minimum number of samples 
          required for optimal load-balancing, or 
      (2) the fraction of data sampled exceeds 
          oracle.hadoop.loader.sampler.maxSamplesPct, or 
      (3) the number of bytes sampled exceeds 
          oracle.hadoop.loader.sampler.maxHeapBytes.
      If this property is set to a negative value, 
      condition (3) is not imposed.
      Default = -1 (no memory restrictions on the sampler).
    </description>
  </property>
  
</configuration>
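
For reference, the following sketch shows how the connection properties above might be combined in a user-supplied job configuration file. It is illustrative only: the host, port, service name, user, and password are placeholders, and a complete job also requires the output format and table-mapping properties described earlier in this chapter.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.inputformat.class</name>
    <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@//myhost:1521/my_db_service_name</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value>SCOTT</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value>password</value>
  </property>
</configuration>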

3.11 Third-Party Licenses for Bundled Software

Oracle Loader for Hadoop installs the following third-party products:

Oracle Loader for Hadoop includes Oracle 11g Release 2 (11.2) client libraries. For information about third-party products included with Oracle Database 11g Release 2 (11.2), see Oracle Database Licensing Information.

Unless otherwise specifically noted, or as required under the terms of the third party license (e.g., LGPL), the licenses and statements herein, including all statements regarding Apache-licensed code, are intended as notices only.

3.11.1 Apache Licensed Code

The following is included as a notice in compliance with the terms of the Apache 2.0 License, and applies to all programs licensed under the Apache 2.0 license:

You may not use the identified files except in compliance with the Apache License, Version 2.0 (the "License").

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

A copy of the license is also reproduced below.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and limitations under the License.

Apache License


Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

  1. Definitions

    "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

    "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

    "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

    "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

    "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

    "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

    "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

    "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

    "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

    "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

  2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

  3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

  4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

    1. You must give any other recipients of the Work or Derivative Works a copy of this License; and

    2. You must cause any modified files to carry prominent notices stating that You changed the files; and

    3. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

    4. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

    You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

  5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

  6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

  7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

  8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

  9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Do not include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software developed by The Apache Software Foundation (http://www.apache.org/) (listed below):

3.11.2 Apache Avro avro-1.5.4.jar

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

3.11.3 Apache Commons Mathematics Library 2.2

Copyright 2001-2011 The Apache Software Foundation

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

3.11.4 Jackson JSON Library 1.5.2

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.