Oracle Database Adapter

The Oracle Database adapter provides custom functions for loading data into tables in Oracle Database.

A custom put function supported by this adapter automatically calls Oracle Loader for Hadoop at run time, either to load the data immediately or to output it to HDFS. You can declare and use multiple custom Oracle Database adapter put functions within a single query. For example, you might load data into different tables or into different Oracle databases with a single query.
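As a sketch only, a single query might declare two put functions, each writing to a different table. The table names, parameters, and input path below are hypothetical, and the connection properties are assumed to be supplied in a configuration file rather than as annotations:

import module "oxh:text";

(: Sketch only: the connection properties (user, password, url) are assumed to be
   supplied in a Hadoop configuration file passed with the -conf option. :)

declare
   %oracle:put
   %oracle-property:targetTable('visits')
function local:putVisit($time, $name, $page, $code) external;

declare
   %oracle:put
   %oracle-property:targetTable('page_errors')
function local:putError($time, $page, $code) external;

(: Assumes each input line has four comma-separated fields:
   time, user name, page, and HTTP status code. :)
for $line in text:collection("weblogs/*.log")
let $split := fn:tokenize($line, "\s*,\s*")
return (
   local:putVisit(xs:dateTime($split[1]), $split[2], $split[3], $split[4]),
   if (xs:integer($split[4]) ge 400) then
      local:putError(xs:dateTime($split[1]), $split[3], $split[4])
   else ()
)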

Ensure that Oracle Loader for Hadoop is installed on your system and that the OLH_HOME environment variable is set to the installation directory. See Steps 1 to 3 of "Installing Oracle Loader for Hadoop." Although it is not required, you might find it helpful to familiarize yourself with Oracle Loader for Hadoop before using this adapter.


Custom Functions for Writing to Oracle Database

You can use the following annotations to define functions that write to tables in an Oracle database either directly or by generating binary or text files for subsequent loading with another utility, such as SQL*Loader.

Signature

Custom functions for writing to Oracle database tables must have the following signature:

declare %oracle:put(["jdbc" | "oci" | "text" | "datapump"])
   [%oracle:columns(col1 [, col2...])] [%oracle-property annotations]
   function local:myPut($column1 [as xs:allowed_type_name[?]], [$column2 [as xs:allowed_type_name[?]], ...]) external;

Annotations

%oracle:put("output_mode"?)

Declares the put function and the output mode. Required.

The optional output_mode parameter can be one of the following string literal values:

  • jdbc: Writes directly to an Oracle database table using a JDBC connection.

  • oci: Writes directly to an Oracle database table using the Oracle Call Interface (OCI).

  • datapump: Writes Oracle Data Pump binary files to HDFS for subsequent loading into the database with another utility.

  • text: Writes delimited text files to HDFS for subsequent loading into the database with another utility, such as SQL*Loader.

To write directly to an Oracle database table using either JDBC or OCI, all systems involved in processing the query must be able to connect to the Oracle Database system. See "About the Modes of Operation."
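For example, a put function intended to generate Oracle Data Pump files rather than write directly to the database might be declared as in the following sketch. The table name and metadata file path are placeholders, and the properties required for each loading scenario are described in "%oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties."

declare
   %oracle:put("datapump")
   %oracle-property:targetTable('visits')
   %oracle-property:tableMetadataFile('visits_metadata.xml')
function local:myPut($c1, $c2, $c3, $c4) external;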

%oracle:columns(col1 [, col2...])

Identifies a selection of one or more column names in the target table. The order of column names corresponds to the order of the function parameters. See "Parameters." Optional.

This annotation supports loading a subset of the table columns. If omitted, the put function attempts to load all columns of the target table.
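For example, the following declaration sketch (connection annotations omitted) loads only the TIME and NAME columns of a target table; values passed for $c1 go to the TIME column and values passed for $c2 go to the NAME column:

declare
   %oracle:put
   %oracle:columns('time', 'name')
   %oracle-property:targetTable('visits')
function local:myPut($c1, $c2) external;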

%oracle-property:property_name(value)

Controls various aspects of connecting to the database and writing data. See "%oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties" for descriptions of the supported properties. You can specify multiple %oracle-property annotations.

These annotations correspond to the Oracle Loader for Hadoop configuration properties. Every %oracle-property annotation has an equivalent Oracle Loader for Hadoop configuration property, which you can specify in a configuration file passed to Hadoop with the generic -conf option. See "Running a Query."

The %oracle-property annotations are optional. However, the various loading scenarios require you to specify some of them or their equivalent configuration properties. For example, to load data into an Oracle database using JDBC or OCI, you must specify the target table and the connection information.

The following example specifies a target table named VISITS, a user name of db, a password of password, and the URL connection string:

%oracle-property:targetTable('visits')
%oracle-property:connection.user('db')
%oracle-property:connection.password('password')
%oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com')

Parameters

$column1 [as xs:allowed_type_name[?]], [$column2 [as xs:allowed_type_name[?]],...]

Enter a parameter for each column in the same order as the Oracle table columns to load all columns, or use the %oracle:columns annotation to load selected columns.

Because the correlation between parameters and database columns is positional, the parameter name ($column1 in the parameter syntax) does not need to match the name of the database column.

You can omit the explicit as xs:allowed_type_name type declaration for any parameter. For example, you can declare the parameter corresponding to a NUMBER column simply as $column1. In this case, the parameter is automatically assigned an XQuery type of item()*. At run time, the input value is cast to the allowed XQuery type for the corresponding table column type, as described in Table 6-2. For example, data values that are mapped to a column with a NUMBER data type are automatically cast as xs:decimal. An error is raised if the cast fails.

Alternatively, you can specify the type or its subtype for any parameter. In this case, compile-time type checking is performed. For example, you can declare a parameter corresponding to a NUMBER column as $column as xs:decimal. You can also declare it as any subtype of xs:decimal, such as xs:integer.

You can include the ? optional occurrence indicator for each specified parameter type. This indicator allows the empty sequence to be passed as a parameter value at run time, so that a null is inserted into the database table. Any occurrence indicator other than ? raises a compile-time error.
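For example, the following declaration sketch (connection annotations omitted) gives each parameter an explicit type with the ? occurrence indicator, so that an empty sequence can be passed at run time to insert a null:

declare
   %oracle:put
   %oracle-property:targetTable('visits')
function local:myPut(
   $c1 as xs:dateTime?,   (: TIMESTAMP column :)
   $c2 as xs:string?,     (: VARCHAR2 column :)
   $c3 as xs:string?,     (: VARCHAR2 column :)
   $c4 as xs:integer?     (: NUMBER column; xs:integer is a subtype of xs:decimal :)
) external;

A call such as local:myPut($time, (), $page, $code) then inserts a null into the second column.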

Table 6-2 describes the mappings between the supported Oracle Database data types and XQuery data types. In addition to the listed XQuery data types, you can also use their subtypes, such as xs:integer instead of xs:decimal. Oracle data types are more restrictive than XQuery data types, and these restrictions are identified in the table.

Table 6-2 Data Type Mappings Between Oracle Database and XQuery

Database Type                     XQuery Type
--------------------------------  --------------------
VARCHAR2                          xs:string
  Limited by the VARCHAR2 maximum size of 4000 bytes.

CHAR                              xs:string
  Limited by the CHAR maximum size of 2000 bytes.

NVARCHAR2                         xs:string
  Limited by the NVARCHAR2 maximum size of 4000 bytes.

NCHAR                             xs:string
  Limited by the NCHAR maximum size of 2000 bytes.

DATE                              xs:dateTime
  Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. If a time zone is specified in the xs:dateTime value, then the time zone information is dropped. Fractional seconds are also dropped. A time value of 24:00:00 is not valid.

TIMESTAMP                         xs:dateTime
  Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. If a time zone is specified in the xs:dateTime value, then the time zone information is dropped. Fractional seconds are limited to a precision of 0 to 9 digits. A time value of 24:00:00 is not valid.

TIMESTAMP WITH LOCAL TIME ZONE    xs:dateTime
  Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. In the offset from UTC, the time-zone hour field is limited to -12:00 to 14:00. Fractional seconds are limited to a precision of 0 to 9 digits. See "About Session Time Zones."

TIMESTAMP WITH TIME ZONE          xs:dateTime
  Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. In the offset from UTC, the time-zone hour field is limited to -12:00 to 14:00. Fractional seconds are limited to a precision of 0 to 9 digits. See "About Session Time Zones."

INTERVAL DAY TO SECOND            xs:dayTimeDuration
  The day and fractional seconds are limited to a precision of 0 to 9 digits each. The hour is limited to a range of 0 to 23, and minutes and seconds are limited to a range of 0 to 59.

INTERVAL YEAR TO MONTH            xs:yearMonthDuration
  The year is limited to a precision of 0 to 9 digits, and the month is limited to a range of 0 to 11.

BINARY_FLOAT                      xs:float

BINARY_DOUBLE                     xs:double

NUMBER                            xs:decimal
  Limited by the NUMBER precision of 1 to 38 decimal digits and scale of -84 to 127 decimal digits.

FLOAT                             xs:decimal
  Limited by the FLOAT precision of 1 to 126 binary digits.

RAW                               xs:hexBinary
  Limited by the RAW maximum size of 2000 bytes.


About Session Time Zones

If an xs:dateTime value with no time zone is loaded into a TIMESTAMP WITH TIME ZONE or TIMESTAMP WITH LOCAL TIME ZONE column, then the time zone is set to the value of the sessionTimeZone parameter, which defaults to the JVM time zone. Using Oracle XQuery for Hadoop, you can set the sessionTimeZone property, as described in "%oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties."
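For example, to control the time zone assigned to such values, you might set the session time zone in the put function declaration or with the equivalent -D option. The region name shown here is only an illustration:

%oracle-property:connection.sessionTimeZone('America/New_York')

-D oracle.hadoop.loader.connection.sessionTimeZone=America/New_York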

Notes

With JDBC or OCI output modes, the Oracle Database Adapter loads data directly into the database table. It also creates a directory with the same name as the custom put function name, under the query output directory. For example, if your query output directory is myoutput, and your custom function is myPut, then the myoutput/myPut directory is created.

For every custom Oracle Database Adapter put function, a separate directory is created. This directory contains output produced by the Oracle Loader for Hadoop job. When you use datapump or text output modes, the data files are written to this directory. The control and SQL scripts for loading the files are written to the _olh subdirectory, such as myoutput/myPut/_olh.

For descriptions of the generated files, see "Delimited Text Output Format" and "Oracle Data Pump Output Format."

Examples of Oracle Database Adapter Functions

These examples use the following text files in HDFS. The files contain a log of visits to different web pages. Each line represents one page visit and lists the time of the visit, the user name, the page visited, and the HTTP status code.

mydata/visits1.log  
 
2013-10-28T06:00:00, john, index.html, 200
2013-10-28T08:30:02, kelly, index.html, 200
2013-10-28T08:32:50, kelly, about.html, 200
2013-10-30T10:00:10, mike, index.html, 401

mydata/visits2.log  

2013-10-30T10:00:01, john, index.html, 200
2013-10-30T10:05:20, john, about.html, 200
2013-11-01T08:00:08, laura, index.html, 200
2013-11-04T06:12:51, kelly, index.html, 200
2013-11-04T06:12:40, kelly, contact.html, 200

The examples also use the following file in HDFS, which contains anonymous page visits.

mydata/anonvisits.log
 
2011-10-30T10:01:01, index.html, 401
2011-11-04T06:15:40, contact.html, 401

The VISITS table in the Oracle database is created with this SQL command:

CREATE TABLE visits (time TIMESTAMP, name VARCHAR2(15), page VARCHAR2(15), code NUMBER)

Example 1   Loading All Columns

The first query loads all information related to each page visit (time of visit, user name, page visited, and status code) into the VISITS table. For anonymous visits, the user name is missing, so the query passes the empty sequence () to insert a null into the NAME column. The target table name, user name, password, and connection URL are specified with %oracle-property annotations.

While the example uses a clear-text user name and password, Oracle recommends that you use a wallet instead for security. You can configure an Oracle wallet using either Oracle Loader for Hadoop properties or their equivalent %oracle-property annotations. The specific properties that you must set are described in "Providing the Connection Details for Online Database Mode."
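Although the exact set of required wallet properties is covered in that section, a sketch of the corresponding annotations might look like this; the wallet directory and TNS entry name are placeholders:

%oracle-property:connection.wallet_location('/path/to/wallet')
%oracle-property:connection.tns_admin('/path/to/wallet')
%oracle-property:connection.tnsEntryName('orcl_alias')

The query for this example, which uses clear-text credentials for simplicity, follows: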

import module "oxh:text";
 
declare
   %oracle:put
   %oracle-property:targetTable('visits')
   %oracle-property:connection.user('db')
   %oracle-property:connection.password('password')
   %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com')
function local:myPut($c1, $c2, $c3, $c4) external;
 
for $line in text:collection("*visits*.log")
let $split := fn:tokenize($line, "\s*,\s*")
return
   if (count($split) > 3) then
      local:myPut(xs:dateTime($split[1]), $split[2], $split[3], $split[4])
   else
      local:myPut(xs:dateTime($split[1]), (), $split[2], $split[3])
 

The VISITS table contains the following data after the query runs:

TIME                           NAME            PAGE                  CODE
------------------------------ --------------- --------------- ----------
30-OCT-13 10.00.01.000000 AM   john            index.html             200
30-OCT-13 10.05.20.000000 AM   john            about.html             200
01-NOV-13 08.00.08.000000 AM   laura           index.html             200
04-NOV-13 06.12.51.000000 AM   kelly           index.html             200
04-NOV-13 06.12.40.000000 AM   kelly           contact.html           200
28-OCT-13 06.00.00.000000 AM   john            index.html             200
28-OCT-13 08.30.02.000000 AM   kelly           index.html             200
28-OCT-13 08.32.50.000000 AM   kelly           about.html             200
30-OCT-13 10.00.10.000000 AM   mike            index.html             401
30-OCT-11 10.01.01.000000 AM                   index.html             401
04-NOV-11 06.15.40.000000 AM                   contact.html           401

Example 2   Loading Selected Columns

This example uses the %oracle:columns annotation to load only the time and name columns of the table. It also loads only visits by john.

The column names specified in %oracle:columns are positionally correlated to the put function parameters. Data values provided for the $c1 parameter are loaded into the TIME column, and data values provided for the $c2 parameter are loaded into the NAME column.

import module "oxh:text";

declare
   %oracle:put
   %oracle:columns('time', 'name')
   %oracle-property:targetTable('visits')
   %oracle-property:connection.user('db')
   %oracle-property:connection.password('password')
   %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com')
function local:myPut($c1, $c2) external;
 
for $line in text:collection("*visits*.log")
let $split := fn:tokenize($line, "\s*,\s*")
where $split[2] eq 'john'
return
   local:myPut(xs:dateTime($split[1]), $split[2])
 

If the VISITS table is empty before the query runs, then it contains the following data afterward:

TIME                           NAME            PAGE                 CODE
------------------------------ --------------- --------------- ----------
30-OCT-13 10.00.01.000000 AM   john
30-OCT-13 10.05.20.000000 AM   john
28-OCT-13 06.00.00.000000 AM   john

%oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties

When you use the Oracle Database adapter of Oracle XQuery for Hadoop, you indirectly use Oracle Loader for Hadoop. Oracle Loader for Hadoop defines configuration properties that control various aspects of connecting to Oracle Database and writing data. Oracle XQuery for Hadoop supports many of these properties. All supported properties are listed in Table 6-3.

In Oracle XQuery for Hadoop, you can specify these properties with the generic -conf and -D Hadoop command-line options, just like you specify them when you use Oracle Loader for Hadoop directly. Alternatively, you can specify these properties as custom put function annotations with the %oracle-property prefix.

When you specify a property with the Hadoop options, it applies to all custom Oracle Database Adapter put functions in your query. When you specify it as an %oracle-property annotation, it applies only to the particular custom put function that has this annotation.

For example, this annotation sets the target table to VISITS only for the particular put function that has the annotation in the declaration.

%oracle-property:targetTable('visits')

Alternatively, you can specify the table with -D by using the corresponding property name. This property sets the target table to VISITS for all Oracle Database adapter put functions in your query:

-D oracle.hadoop.loader.targetTable=visits

This flexibility is provided for convenience. For example, if a query has multiple put functions, each writing to a different table in the same database, you might specify the properties like this:

  • Specify the database connection URL in the oracle.hadoop.loader.connection.url property in a configuration file, and identify that file to Hadoop with the -conf option, thereby applying it to all put functions in the query (see the configuration file sketch below).

  • Specify the different table names by using the %oracle-property:targetTable annotation in each put function declaration.

If you already use Oracle Loader for Hadoop and have a configuration file, you can simply copy-and-paste these properties into the Oracle XQuery for Hadoop configuration file and identify it to Hadoop with the -conf option.
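For example, a configuration file that supplies the shared connection details might look like the following sketch, reusing the placeholder values from the earlier annotation example. Pass it to the query with the -conf option, and let each put function declare only its own %oracle-property:targetTable annotation:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@//localhost:1521/orcl.example.com</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value>db</value>
  </property>
  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value>password</value>
  </property>
</configuration>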

Table 6-3 identifies the annotations and their equivalent Oracle Loader for Hadoop properties. Oracle XQuery for Hadoop only supports the Oracle Loader for Hadoop properties listed in this table.

Table 6-3 %oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties

Annotation                                          Oracle Loader for Hadoop Configuration Property
--------------------------------------------------  -----------------------------------------------
%oracle-property:targetTable                        oracle.hadoop.loader.targetTable
%oracle-property:connection.user                    oracle.hadoop.loader.connection.user
%oracle-property:connection.password                oracle.hadoop.loader.connection.password
%oracle-property:connection.url                     oracle.hadoop.loader.connection.url
%oracle-property:connection.wallet_location         oracle.hadoop.loader.connection.wallet_location
%oracle-property:connection.tns_admin               oracle.hadoop.loader.connection.tns_admin
%oracle-property:connection.tnsEntryName            oracle.hadoop.loader.connection.tnsEntryName
%oracle-property:connection.defaultExecuteBatch     oracle.hadoop.loader.connection.defaultExecuteBatch
%oracle-property:connection.oci_url                 oracle.hadoop.loader.connection.oci_url
%oracle-property:connection.sessionTimeZone         oracle.hadoop.loader.connection.sessionTimeZone
%oracle-property:extTabDirectoryName                oracle.hadoop.loader.extTabDirectoryName
%oracle-property:input.initialFieldEncloser         oracle.hadoop.loader.input.initialFieldEncloser
%oracle-property:output.escapeEnclosers             oracle.hadoop.loader.output.escapeEnclosers
%oracle-property:output.fieldTerminator             oracle.hadoop.loader.output.fieldTerminator
%oracle-property:output.granuleSize                 oracle.hadoop.loader.output.granuleSize
%oracle-property:output.initialFieldEncloser        oracle.hadoop.loader.output.initialFieldEncloser
%oracle-property:output.trailingFieldEncloser       oracle.hadoop.loader.output.trailingFieldEncloser
%oracle-property:sampler.enableSampling             oracle.hadoop.loader.sampler.enableSampling
%oracle-property:sampler.hintMaxSplitSize           oracle.hadoop.loader.sampler.hintMaxSplitSize
%oracle-property:sampler.hintNumMapTask             oracle.hadoop.loader.sampler.hintNumMapTasks
%oracle-property:sampler.loadCI                     oracle.hadoop.loader.sampler.loadCI
%oracle-property:sampler.maxHeapBytes               oracle.hadoop.loader.sampler.maxHeapBytes
%oracle-property:sampler.maxLoadFactor              oracle.hadoop.loader.sampler.maxLoadFactor
%oracle-property:sampler.maxSamplesPct              oracle.hadoop.loader.sampler.maxSamplesPct
%oracle-property:sampler.minSplits                  oracle.hadoop.loader.sampler.minSplits
%oracle-property:sampler.numThreads                 oracle.hadoop.loader.sampler.numThreads
%oracle-property:tableMetadataFile                  oracle.hadoop.loader.tableMetadataFile
%oracle-property:loadByPartition                    oracle.hadoop.loader.loadByPartition
%oracle-property:enableSorting                      oracle.hadoop.loader.enableSorting
%oracle-property:sortKey                            oracle.hadoop.loader.sortKey
%oracle-property:rejectLimit                        oracle.hadoop.loader.rejectLimit
%oracle-property:badRecordFlushInterval             oracle.hadoop.loader.badRecordFlushInterval
%oracle-property:compressionFactors                 oracle.hadoop.loader.compressionFactors
%oracle-property:logBadRecords                      oracle.hadoop.loader.logBadRecords
%oracle-property:dirpathBufsize                     oracle.hadoop.loader.output.dirpathBufsize