The Oracle Database adapter provides custom functions for loading data into tables in Oracle Database.
A custom put
function supported by this adapter automatically calls Oracle Loader for Hadoop at run time, either to load the data immediately or to output it to HDFS. You can declare and use multiple custom Oracle Database adapter put
functions within a single query. For example, you might load data into different tables or into different Oracle databases with a single query.
Ensure that Oracle Loader for Hadoop is installed on your system, and that the OLH_HOME
environment variable is set to the installation directory. See Steps 1 to 3 "Installing Oracle Loader for Hadoop." Although not required, you might find it helpful to familiarize yourself with Oracle Loader for Hadoop before using this adapter.
The Oracle Database adapter is described in the following topics:
See Also:
"Software Requirements" for the supported versions of Oracle Database
You can use the following annotations to define functions that write to tables in an Oracle database either directly or by generating binary or text files for subsequent loading with another utility, such as SQL*Loader.
Custom functions for writing to Oracle database tables must have the following signature:
declare %oracle:put(["jdbc" | "oci" | "text" | "datapump"]) [%oracle:columns(col1 [, col2...])] [%oracle-property annotations] function local:myPut($column1 [as xs:allowed_type_name[?]], [$column2 [as xs:allowed_type_name[?]], ...]) external;
Declares the put
function and the output mode. Required.
The optional output_mode parameter can be one of the following string literal values:
jdbc
: Writes to an Oracle database table using a JDBC connection. Default.
oci
: Writes to an Oracle database table using an Oracle Call Interface (OCI) connection.
datapump
: Creates Data Pump files and associated scripts in HDFS for subsequent loading with another utility.
text
: Creates delimited text files and associated scripts in HDFS
To write directly to an Oracle database table using either JDBC or OCI, all systems involved in processing the query must be able to connect to the Oracle Database system. See "About the Modes of Operation."
Identifies a selection of one or more column names in the target table. The order of column names corresponds to the order of the function parameters. See "Parameters." Optional.
This annotation supports loading a subset of the table columns. If omitted, the put function attempts to load all columns of the target table.
Controls various aspects of connecting to the database and writing data. See "%oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties" for descriptions of the supported properties. You can specify multiple %oracle-property
annotations.
These annotations correspond to the Oracle Loader for Hadoop configuration properties. Every %oracle-property
annotation has an equivalent Oracle Loader for Hadoop configuration property, which you can specify in a configuration file passed to Hadoop with the generic -conf
option. See "Running a Query."
The %oracle-property
annotations are optional. However, the various loading scenarios require you to specify some of them or their equivalent configuration properties. For example, to load data into an Oracle database using JDBC or OCI, you must specify the target table and the connection information.
The following example specifies a target table named VISITS
, a user name of db
, a password of password
, and the URL connection string:
%oracle-property:targetTable('visits') %oracle-property:connection.user('db') %oracle-property:connection.password('password') %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com')
Enter a parameter for each column in the same order as the Oracle table columns to load all columns, or use the %oracle:columns
annotation to load selected columns.
Because the correlation between parameters and database columns is positional, the name of the parameter (column1 in the parameter syntax) is not required to match the name of the database column.
You can omit the explicit as xs:allowed_type_name
type declaration for any parameter. For example, you can declare the parameter corresponding to a NUMBER
column simply as $column1
. In this case, the parameter is automatically assigned an XQuery type of item()*
. At run time, the input value is cast to the allowed XQuery type for the corresponding table column type, as described in Table 6-2. For example, data values that are mapped to a column with a NUMBER
data type are automatically cast as xs:decimal
. An error is raised if the cast fails.
Alternatively, you can specify the type or its subtype for any parameter. In this case, compile-time type checking is performed. For example, you can declare a parameter corresponding to a NUMBER
column as $column as xs:decimal
. You can also declare it as any subtype of xs:decimal
, such as xs:integer
.
You can include the ?
optional occurrence indicator for each specified parameter type. This indicator allows the empty sequence to be passed as a parameter value at run time, so that a null is inserted into the database table. Any occurrence indicator other than ?
raises a compile-time error.
Table 6-2 describes the appropriate mappings of XQuery data types with the supported Oracle Database data types. In addition to the listed XQuery data types, you can also use the subtypes, such as xs:integer
instead of xs:decimal
. Oracle data types are more restrictive than XQuery data types, and these restrictions are identified in the table.
Table 6-2 Data Type Mappings Between Oracle Database and XQuery
Database Type | XQuery Type |
---|---|
|
Limited by the |
|
Limited by the |
|
Limited by the |
|
Limited by the |
|
Limited to the range of January 1, 4712 BC to December 31, 9999 CE. If a time zone is specified in the |
|
Limited to the range of January 1, 4712 BC to December 31, 9999 CE. If a time zone is specified in the |
|
Limited to the range of January 1, 4712 BC to December 31, 9999 CE. In the offset from UTC, the time-zone hour field is limited to -12:00 to 14:00. Fractional seconds are limited to a precision of 0 to 9 digits. |
|
Limited to the range of January 1, 4712 BC to December 31, 9999 CE. In the offset from UTC, the time-zone hour field is limited to -12:00 to 14:00. Fractional seconds are limited to a precision of 0 to 9 digits. |
|
The day and fractional seconds are limited by a precision of 0 to 9 digits each. The hour is limited to a range of 0 to 23, and minutes and seconds are limited to a range of 0 to 59. |
|
The year is limited by a precision of 0 to 9 digits, and the month is limited to a range of 0 to 11. |
|
|
|
|
|
Limited by the |
|
Limited by the |
|
Limit by the |
If an xs:dateTime
value with no time zone is loaded into TIMESTAMP W TIME ZONE
or TIMESTAMP W LOCAL TIME ZONE
, then the time zone is set to the value of the sessionTimeZone
parameter, which defaults to the JVM time zone. Using Oracle XQuery for Hadoop, you can set the sessionTimeZone
property, as described in "%oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties."
With JDBC or OCI output modes, the Oracle Database Adapter loads data directly into the database table. It also creates a directory with the same name as the custom put
function name, under the query output directory. For example, if your query output directory is myoutput, and your custom function is myPut
, then the myoutput/myPut directory is created.
For every custom Oracle Database Adapter put
function, a separate directory is created. This directory contains output produced by the Oracle Loader for Hadoop job. When you use datapump
or text
output modes, the data files are written to this directory. The control and SQL scripts for loading the files are written to the _olh
subdirectory, such as myoutput/myPut/_olh.
For descriptions of the generated files, see "Delimited Text Output Format" and "Oracle Data Pump Output Format."
Examples of Oracle Database Adapter Functions
These examples use following text files in HDFS. The files contain a log of visits to different web pages. Each line represents a visit to a web page and contains the time, user name, and page visited.
mydata/visits1.log 2013-10-28T06:00:00, john, index.html, 200 2013-10-28T08:30:02, kelly, index.html, 200 2013-10-28T08:32:50, kelly, about.html, 200 2013-10-30T10:00:10, mike, index.html, 401 mydata/visits2.log 2013-10-30T10:00:01, john, index.html, 200 2013-10-30T10:05:20, john, about.html, 200 2013-11-01T08:00:08, laura, index.html, 200 2013-11-04T06:12:51, kelly, index.html, 200 2013-11-04T06:12:40, kelly, contact.html, 200
The examples also uses the following file in HDFS, which contains anonymous page visits.
mydata/anonvisits.log 2011-10-30T10:01:01, index.html, 401 2011-11-04T06:15:40, contact.html, 401
The VISITS
table in the Oracle database is created with this SQL command:
CREATE TABLE visits (time TIMESTAMP, name VARCHAR2(15), page VARCHAR2(15), code NUMBER)
The first query loads all information related to the page visit (time of visit, user name, page visited, and status code) to the VISITS
table. In case of anonymous access, the user name is missing, so the query specifies ()
to insert a null into the table. The target table name, user name, password, and connection URL are specified with %oracle-property
annotations.
While the example uses a clear-text user name and password, Oracle recommends that you use a wallet instead for security. You can configure an Oracle wallet using either Oracle Loader for Hadoop properties or their equivalent %oracle-property
annotations. The specific properties that you must set are described in "Providing the Connection Details for Online Database Mode."
import module "oxh:text"; declare %oracle:put %oracle-property:targetTable('visits') %oracle-property:connection.user('db') %oracle-property:connection.password('password') %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com') function local:myPut($c1, $c2, $c3, $c4) external; for $line in text:collection("*visits*.log") let $split := fn:tokenize($line, "\s*,\s*") return if (count($split) > 3) then local:myPut(xs:dateTime($split[1]), $split[2], $split[3], $split[4]) else local:myPut(xs:dateTime($split[1]), (), $split[2], $split[3])
The VISITS
table contains the following data after the query runs:
TIME NAME PAGE CODE ------------------------------ --------------- --------------- ---------- 30-OCT-13 10.00.01.000000 AM john index.html 200 30-OCT-13 10.05.20.000000 AM john about.html 200 01-NOV-13 08.00.08.000000 AM laura index.html 200 04-NOV-13 06.12.51.000000 AM kelly index.html 200 04-NOV-13 06.12.40.000000 AM kelly contact.html 200 28-OCT-13 06.00.00.000000 AM john index.html 200 28-OCT-13 08.30.02.000000 AM kelly index.html 200 28-OCT-13 08.32.50.000000 AM kelly about.html 200 30-OCT-13 10.00.10.000000 AM mike index.html 401 30-OCT-11 10.01.01.000000 AM index.html 401 04-NOV-11 06.15.40.000000 AM contact.html 401
This example uses the %oracle:columns
annotation to load only the time and name columns of the table. It also loads only visits by john
.
The column names specified in %oracle:columns
are positionally correlated to the put function parameters. Data values provided for the $c1
parameter are loaded into the TIME
column, and data values provided for the $c2
parameter are loaded into the NAME
column.
import module "oxh:text"; declare %oracle:put %oracle:columns('time', 'name') %oracle-property:targetTable('visits') %oracle-property:connection.user('db') %oracle-property:connection.password('password') %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com') function local:myPut($c1, $c2) external; for $line in text:collection("*visits*.log") let $split := fn:tokenize($line, "\s*,\s*") where $split[2] eq 'john' return local:myPut(xs:dateTime($split[1]), $split[2])
If the VISITS
table is empty before the query runs, then it contains the following data afterward:
TIME NAME PAGE CODE ------------------------------ --------------- --------------- ---------- 30-OCT-13 10.00.01.000000 AM john 30-OCT-13 10.05.20.000000 AM john 28-OCT-13 06.00.00.000000 AM john
When you use the Oracle Database adapter of Oracle XQuery for Hadoop, you indirectly use Oracle Loader for Hadoop. Oracle Loader for Hadoop defines configuration properties that control various aspects of connecting to Oracle Database and writing data. Oracle XQuery for Hadoop supports many of these properties. All supported properties are listed in Table 6-3.
In Oracle XQuery for Hadoop, you can specify these properties with the generic -conf
and -D
Hadoop command-line options, just like you specify them when you use Oracle Loader for Hadoop directly. Alternatively, you can specify these properties as custom put
function annotations with the %oracle-property
prefix.
When you specify a property with the Hadoop options, it applies to all custom Oracle Database Adapter put functions in your query. When you specify it as an %oracle-property
annotation, it applies only to the particular custom put function that has this annotation.
For example, this annotation sets the target table to VISITS
only for the particular put
function that has the annotation in the declaration.
%oracle-property:connection.url('visits')
Alternatively, you can specify the table with -D
by using the corresponding property name. This property sets the target table to VISITS
for all Oracle Database adapter put
functions in your query,
-D oracle.hadoop.loader.targetTable=visits
This flexibility is provided for convenience. For example, if a query has multiple put
functions, each writing to a different table in the same database, you might specify the properties like this:
The database connection URL in the oracle.hadoop.loader.connection.url
property in the configuration file. Then identify the configuration file using the -conf
option, thereby applying it to all put functions in the query.
The different table names using the %oracle-property:targetTable
annotation in each put
function.
If you already use Oracle Loader for Hadoop and have a configuration file, you can simply copy-and-paste these properties into the Oracle XQuery for Hadoop configuration file and identify it to Hadoop with the -conf
option.
Table 6-3 identifies the annotations and their equivalent Oracle Loader for Hadoop properties. Oracle XQuery for Hadoop only supports the Oracle Loader for Hadoop properties listed in this table.
Table 6-3 %oracle-property Annotations and Corresponding Oracle Loader for Hadoop Configuration Properties
Annotation | Property |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|