Oracle Database Adapter

The Oracle Database adapter provides custom functions for loading data into tables in Oracle Database.

A custom put function supported by this adapter automatically calls Oracle Loader for Hadoop at run time, either to load the data immediately or to output it to HDFS. You can declare and use multiple custom Oracle Database adapter put functions within a single query. For example, you might load data into different tables or into different Oracle databases with a single query.

Ensure that Oracle Loader for Hadoop is installed on your system, and that the OLH_HOME environment variable is set to the installation directory. See Step 2 of "Installing Oracle XQuery for Hadoop." Although not required, you might find it helpful to familiarize yourself with Oracle Loader for Hadoop before using this adapter.

The Oracle Database adapter is described in the following topics:

Custom Functions for Writing to Oracle Database
Examples of Oracle Database Adapter Functions
Oracle Loader for Hadoop Configuration Properties and Corresponding %oracle-property Annotations

Custom Functions for Writing to Oracle Database

You can use the following annotations to define functions that write to tables in an Oracle database either directly or by generating binary or text files for subsequent loading with another utility, such as SQL*Loader.

Signature

Custom functions for writing to Oracle database tables must have the following signature:

declare %oracle:put(["jdbc" | "oci" | "text" | "datapump"])
   [%oracle:columns(col1 [, col2...])] [%oracle-property annotations]
   function local:myPut($column1 [as xs:allowed_type_name[?]], [$column2 [as xs:allowed_type_name[?]], ...]) external;

Annotations

%oracle:put("output_mode"?)

Declares the put function and the output mode. Required.

The optional output_mode parameter can be one of the following string literal values:

jdbc: Writes to an Oracle database table using a JDBC connection. Default.

See "JDBC Output Format."
oci: Writes to an Oracle database table using an Oracle Call Interface (OCI) connection.

See "Oracle OCI Direct Path Output Format."
datapump: Creates Data Pump files and associated scripts in HDFS for subsequent loading by another utility.

See "Oracle Data Pump Output Format."
text: Creates delimited text files and associated scripts in HDFS.

See "Delimited Text Output Format."

For Oracle XQuery for Hadoop to write directly to an Oracle database table using either JDBC or OCI, all systems involved in processing the query must be able to connect to the Oracle Database system. See "About the Modes of Operation."

%oracle:columns(col1 [, col2...])

Identifies a selection of one or more column names in the target table. The order of column names corresponds to the order of the function parameters. See "Parameters." Optional.

This annotation enables loading a subset of the table columns. If omitted, the put function attempts to load all columns of the target table.

%oracle-property:property_name (value)

Controls various aspects of connecting to the database and writing data. You can specify multiple %oracle-property annotations. These annotations correspond to the Oracle Loader for Hadoop configuration properties. Every %oracle-property annotation has an equivalent Oracle Loader for Hadoop configuration property. "Oracle Loader for Hadoop Configuration Properties and Corresponding %oracle-property Annotations" explains this relationship in detail.

The %oracle-property annotations are optional. However, the various loading scenarios require you to specify some of them or their equivalent configuration properties. For example, to load data into an Oracle database using JDBC or OCI, you must specify the target table and the connection information.

The following example specifies a target table named VISITS, an Oracle wallet containing the user name and password, and the URL connection string. Oracle recommends that you use a wallet instead of clear text when providing your credentials.

%oracle-property:targetTable('visits')
%oracle-property:connection.tns_admin=/opt/sharedir/oracle/network/admin
%oracle-property:connection.wallet_location=/opt/sharedir/oracle/wallet
%oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com')

The next example show how to specify a user name and password. Use clear text credentials only in a development environment; it is not secure for a production system.

%oracle-property:connection.user('db')
%oracle-property:connection.password('password')

Parameters

$column1 [as xs:allowed_type_name[?]], [$column2 [as xs:allowed_type_name[?]],...]

Enter a parameter for each column in the same order as the Oracle table columns to load all columns, or use the %oracle:columns annotation to load selected columns.

Because the correlation between parameters and database columns is positional, the name of the parameter (column1 in the parameter syntax) is not required to match the name of the database column.

You can omit the explicit as xs:allowed_type_name type declaration for any parameter. For example, you can declare the parameter corresponding to a NUMBER column simply as $column1. In this case, the parameter is automatically assigned an XQuery type of item()*. At run time, the input value is cast to the allowed XQuery type for the corresponding table column type, as described in Table 6-3. For example, data values that are mapped to a column with a NUMBER data type are automatically cast as xs:decimal. An error is raised if the cast fails.

Alternatively, you can specify the type or its subtype for any parameter. In this case, compile-time type checking is performed. For example, you can declare a parameter corresponding to a NUMBER column as $column as xs:decimal. You can also declare it as any subtype of xs:decimal, such as xs:integer.

You can include the ? optional occurrence indicator for each specified parameter type. This indicator allows the empty sequence to be passed as a parameter value at run time, so that a null is inserted into the database table. Any occurrence indicator other than ? raises a compile-time error.

Table 6-3 describes the appropriate mappings of XQuery data types with the supported Oracle Database data types. In addition to the listed XQuery data types, you can also use the subtypes, such as xs:integer instead of xs:decimal. Oracle data types are more restrictive than XQuery data types, and these restrictions are identified in the table.

Table 6-3 Data Type Mappings Be tween Oracle Database and XQuery

Database Type	XQuery Type
`VARCHAR2`	`xs:string` Limited by the `VARCHAR2` maximum size of 4000 bytes.
`CHAR`	`xs:string` Limited by the `CHAR` maximum size of 2000 bytes.
`NVARCHAR2`	`xs:string` Limited by the `NVARCHAR2` maximum size of 4000 bytes.
`NCHAR`	`xs:string` Limited by the `NCHAR` maximum size of 2000 bytes.
`DATE`	`xs:dateTime` Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. If a time zone is specified in the `xs:dateTime` value, then the time zone information is dropped. Fractional seconds are also dropped. A time value of 24:00:00 is not valid.
`TIMESTAMP`	`xs:dateTime` Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. If a time zone is specified in the `xs:dateTime` value, then the time zone information is dropped. Fractional seconds are limited to a precision of 0 to 9 digits. A time value of 24:00:00 is not valid.
`TIMESTAMP W LOCAL TIME ZONE`	`xs:dateTime` Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. In the offset from UTC, the time-zone hour field is limited to -12:00 to 14:00. Fractional seconds are limited to a precision of 0 to 9 digits. See "About Session Time Zones."
`TIMESTAMP W TIME ZONE`	`xs:dateTime` Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. In the offset from UTC, the time-zone hour field is limited to -12:00 to 14:00. Fractional seconds are limited to a precision of 0 to 9 digits. See "About Session Time Zones."
`INTERVAL DAY TO SECOND`	`xs:dateTimeDuration` The day and fractional seconds are limited by a precision of 0 to 9 digits each. The hour is limited to a range of 0 to 23, and minutes and seconds are limited to a range of 0 to 59.
`INTERVAL YEAR TO MONTH`	`xs:yearMonthDuration` The year is limited by a precision of 0 to 9 digits, and the month is limited to a range of 0 to 11.
`BINARY_FLOAT`	`xs:float`
`BINARY_DOUBLE`	`xs:double`
`NUMBER`	`xs:decimal` Limited by the `NUMBER` precision of 1 to 38 decimal digits and scale of -84 to 127 decimal digits.
`FLOAT`	`xs:decimal` Limited by the `FLOAT` precision of 1 to 126 binary digits.
`RAW`	`xs:hexBinary` Limit by the `RAW` maximum size of 2000 bytes.

About Session Time Zones

If an xs:dateTime value with no time zone is loaded into TIMESTAMP W TIME ZONE or TIMESTAMP W LOCAL TIME ZONE, then the time zone is set to the value of the sessionTimeZone parameter, which defaults to the JVM time zone. Using Oracle XQuery for Hadoop, you can set the sessionTimeZone property, as described in "Oracle Loader for Hadoop Configuration Properties and Corresponding %oracle-property Annotations."

Notes

With JDBC or OCI output modes, the Oracle Database Adapter loads data directly into the database table. It also creates a directory with the same name as the custom put function name, under the query output directory. For example, if your query output directory is myoutput, and your custom function is myPut, then the myoutput/myPut directory is created.

For every custom Oracle Database Adapter put function, a separate directory is created. This directory contains output produced by the Oracle Loader for Hadoop job. When you use datapump or text output modes, the data files are written to this directory. The control and SQL scripts for loading the files are written to the _olh subdirectory, such as myoutput/myPut/_olh.

For descriptions of the generated files, see "Delimited Text Output Format" and "Oracle Data Pump Output Format."

Examples of Oracle Database Adapter Functions

These examples use the following text files in HDFS. The files contain a log of visits to different web pages. Each line represents a visit to a web page and contains the time, user name, and page visited:

mydata/visits1.log  
 
2013-10-28T06:00:00, john, index.html, 200
2013-10-28T08:30:02, kelly, index.html, 200
2013-10-28T08:32:50, kelly, about.html, 200
2013-10-30T10:00:10, mike, index.html, 401

mydata/visits2.log  

2013-10-30T10:00:01, john, index.html, 200
2013-10-30T10:05:20, john, about.html, 200
2013-11-01T08:00:08, laura, index.html, 200
2013-11-04T06:12:51, kelly, index.html, 200
2013-11-04T06:12:40, kelly, contact.html, 200

The examples also use the following file in HDFS, which contains anonymous page visits:

mydata/anonvisits.log
 
2011-10-30T10:01:01, index.html, 401
2011-11-04T06:15:40, contact.html, 401

This SQL command creates the VISITS table in the Oracle database:

CREATE TABLE visits (time TIMESTAMP, name VARCHAR2(15), page VARCHAR2(15), code NUMBER)

Example 1 Loading All Columns

The first query loads all information related to the page visit (time of visit, user name, page visited, and status code) to the VISITS table. For anonymous access, the user name is missing, therefore the query specifies () to insert a null into the table. The target table name, wallet location, and connection URL are specified with %oracle-property annotations.

Oracle recommends that you use a wallet instead for security, especially in a production application. You can configure an Oracle wallet using either Oracle Loader for Hadoop properties or their equivalent %oracle-property annotations. To use clear text credentials in a development environment, see %oracle-property:property_name (value).

import module "oxh:text";
 
declare
   %oracle:put
   %oracle-property:targetTable('visits')
   %oracle-property:connection.tns_admin=/opt/sharedir/oracle/network/admin
   %oracle-property:connection.wallet_location=/opt/sharedir/oracle/wallet
   %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com')
function local:myPut($c1, $c2, $c3, $c4) external;
 
for $line in text:collection("mydata/*visits*.log")
let $split := fn:tokenize($line, "\s*,\s*")
return
   if (count($split) > 3) then
      local:myPut($split[1], $split[2], $split[3], $split[4])
   else
      local:myPut($split[1], (), $split[2], $split[3])

The VISITS table contains the following data after the query runs:

TIME                           NAME            PAGE                  CODE
------------------------------ --------------- --------------- ----------
30-OCT-13 10.00.01.000000 AM   john            index.html             200
30-OCT-13 10.05.20.000000 AM   john            about.html             200
01-NOV-13 08.00.08.000000 AM   laura           index.html             200
04-NOV-13 06.12.51.000000 AM   kelly           index.html             200
04-NOV-13 06.12.40.000000 AM   kelly           contact.html           200
28-OCT-13 06.00.00.000000 AM   john            index.html             200
28-OCT-13 08.30.02.000000 AM   kelly           index.html             200
28-OCT-13 08.32.50.000000 AM   kelly           about.html             200
30-OCT-13 10.00.10.000000 AM   mike            index.html             401
30-OCT-11 10.01.01.000000 AM                   index.html             401
04-NOV-11 06.15.40.000000 AM                   contact.html           401

Example 2 Loading Selected Columns

This example uses the %oracle:columns annotation to load only the time and name columns of the table. It also loads only visits by john.

The column names specified in %oracle:columns are positionally correlated to the put function parameters. Data values provided for the $c1 parameter are loaded into the TIME column, and data values provided for the $c2 parameter are loaded into the NAME column.

import module "oxh:text";

declare
   %oracle:put
   %oracle:columns('time', 'name')
   %oracle-property:targetTable('visits')
   %oracle-property:connection.tns_admin=/opt/sharedir/oracle/network/admin
   %oracle-property:connection.wallet_location=/opt/sharedir/oracle/wallet
   %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com')
function local:myPut($c1, $c2) external;
 
for $line in text:collection("mydata/*visits*.log")
let $split := fn:tokenize($line, "\s*,\s*")
where $split[2] eq 'john'
return
   local:myPut($split[1], $split[2])

If the VISITS table is empty before the query runs, then it contains the following data afterward:

TIME                           NAME            PAGE                 CODE
------------------------------ --------------- --------------- ----------
30-OCT-13 10.00.01.000000 AM   john
30-OCT-13 10.05.20.000000 AM   john
28-OCT-13 06.00.00.000000 AM   john

Oracle Loader for Hadoop Configuration Properties and Corresponding %oracle-property Annotations

When you use the Oracle Database adapter of Oracle XQuery for Hadoop, you indirectly use Oracle Loader for Hadoop. Oracle Loader for Hadoop defines configuration properties that control various aspects of connecting to Oracle Database and writing data. Oracle XQuery for Hadoop supports many of these properties, which are listed in the last column of Table 6-4.

You can specify these properties with the generic -conf and -D hadoop command-line options in Oracle XQuery for Hadoop. Properties specified using this method apply to all Oracle Database adapter put functions in your query. See "Running Queries" and especially "Generic Options" for more information about the hadoop command-line options.

Alternatively, you can specify these properties as Oracle Database adapter put function annotations with the %oracle-property prefix. These annotations are listed in the second column of Table 6-4. Annotations apply only to the particular Oracle Database adapter put function that contains them in its declaration.

For example, you can set the target table to VISITS by adding the following lines to the configuration file, and identifying the configuration file with the -conf option:

<property>
   <name>oracle.hadoop.loader.targetTable</name>
   <value>visits</value>
</property>

You can also set the target table to VISITS with the -D option, using the same Oracle Loader for Hadoop property:

-D oracle.hadoop.loader.targetTable=visits

Both methods set the target table to VISITS for all Oracle Database adapter put functions in your query.

Alternatively, this annotation sets the target table to VISITS only for the particular put function that has the annotation in the declaration:

%oracle-property:connection.url('visits')

This flexibility is provided for convenience. For example, if a query has multiple Oracle Database adapter put functions, each writing to a different table in the same database, then the most convenient way to specify the necessary information is like this:

Use the oracle.hadoop.loader.connection.url property in the configuration file to specify the database connection URL. Then identify the configuration file using the -conf option. This option sets the same database connection URL for all Oracle Database adapter put functions in your query.
Set a different table name using the %oracle-property:targetTable annotation in each Oracle Database adapter put function declaration.

Table 6-4 identifies the Oracle Loader for Hadoop properties and their equivalent Oracle XQuery for Hadoop annotations by functional category. Oracle XQuery for Hadoop supports only the Oracle Loader for Hadoop properties listed in this table.

Table 6-4 Configuration Properties and Corresponding %oracle-property Annotations

Category	Property	Annotation
Connection	`oracle.hadoop.loader.connection.defaultExecuteBatch`	`%oracle-property:connection.defaultExecuteBatch`
Connection	`oracle.hadoop.loader.connection.oci_url`	`%oracle-property:connection.oci_url`
Connection	`oracle.hadoop.loader.connection.password`	`%oracle-property:connection.password`
Connection	`oracle.hadoop.loader.connection.sessionTimeZone`	`%oracle-property:connection.sessionTimeZone`
Connection	`oracle.hadoop.loader.connection.tns_admin`	`%oracle-property:connection.tns_admin`
Connection	`oracle.hadoop.loader.connection.tnsEntryName`	`%oracle-property:connection.tnsEntryName`
Connection	`oracle.hadoop.loader.connection.url`	`%oracle-property:connection.url`
Connection	`oracle.hadoop.loader.connection.user`	`%oracle-property:connection.user`
Connection	`oracle.hadoop.loader.connection.wallet_location`	`%oracle-property:connection.wallet_location`
General	`oracle.hadoop.loader.badRecordFlushInterval`	`%oracle-property:badRecordFlushInterva`l
General	`oracle.hadoop.loader.compressionFactors`	`%oracle-property:compressionFactors`
General	`oracle.hadoop.loader.enableSorting`	`%oracle-property:enableSorting`
General	`oracle.hadoop.loader.extTabDirectoryName`	`%oracle-property:extTabDirectoryName`
General	`oracle.hadoop.loader.loadByPartition`	`%oracle-property:loadByPartition`
General	`oracle.hadoop.loader.logBadRecords`	`%oracle-property:logBadRecords`
General	`oracle.hadoop.loader.rejectLimit`	`%oracle-property:rejectLimit`
General	`oracle.hadoop.loader.sortKey`	`%oracle-property:sortKey`
General	`oracle.hadoop.loader.tableMetadataFile`	`%oracle-property:tableMetadataFile`
General	`oracle.hadoop.loader.targetTable`	`%oracle-property:targetTable`
Output	`oracle.hadoop.loader.output.dirpathBufsize`	`%oracle-property:dirpathBufsize`
Output	`oracle.hadoop.loader.output.escapeEnclosers`	`%oracle-property:output.escapeEnclosers`
Output	`oracle.hadoop.loader.output.fieldTerminator`	`%oracle-property:output.fieldTerminator`
Output	`oracle.hadoop.loader.output.granuleSize`	`%oracle-property:output.granuleSize`
Output	`oracle.hadoop.loader.output.initialFieldEncloser`	`%oracle-property:output.initialFieldEncloser`
Output	`oracle.hadoop.loader.output.trailingFieldEncloser`	`%oracle-property:output.trailingFieldEncloser`
Sampler	`oracle.hadoop.loader.sampler.enableSampling`	`%oracle-property:sampler.enableSampling`
Sampler	`oracle.hadoop.loader.sampler.hintMaxSplitSize`	`%oracle-property:sampler.hintMaxSplitSize`
Sampler	`oracle.hadoop.loader.sampler.hintNumMapTasks`	`%oracle-property:sampler.hintNumMapTask`
Sampler	`oracle.hadoop.loader.sampler.loadCI`	`%oracle-property:sampler.loadCI`
Sampler	`oracle.hadoop.loader.sampler.maxHeapBytes`	`%oracle-property:sampler.maxHeapBytes`
Sampler	`oracle.hadoop.loader.sampler.maxLoadFactor`	`%oracle-property:sampler.maxLoadFactor`
Sampler	`oracle.hadoop.loader.sampler.maxSamplesPct`	`%oracle-property:sampler.maxSamplesPct`
Sampler	`oracle.hadoop.loader.sampler.minSplits`	`%oracle-property:sampler.minSplits`
Sampler	`oracle.hadoop.loader.sampler.numThreads`	`%oracle-property:sampler.numThreads`