The Oracle Database adapter provides custom functions for loading data into tables in Oracle Database.
A custom put function supported by this adapter automatically calls Oracle Loader for Hadoop at run time, either to load the data immediately or to output it to HDFS. You can declare and use multiple custom Oracle Database adapter put functions within a single query. For example, you might load data into different tables or into different Oracle databases with a single query.
Ensure that Oracle Loader for Hadoop is installed on your system, and that the OLH_HOME
environment variable is set to the installation directory. See Step 2 of "Installing Oracle XQuery for Hadoop." Although not required, you might find it helpful to familiarize yourself with Oracle Loader for Hadoop before using this adapter.
The Oracle Database adapter is described in the following topics:
See Also:
"Software Requirements" for the versions of Oracle Database that Oracle Loader for Hadoop supports
You can use the following annotations to define functions that write to tables in an Oracle database either directly or by generating binary or text files for subsequent loading with another utility, such as SQL*Loader.
Custom functions for writing to Oracle database tables must have the following signature:
declare %oracle:put(["jdbc" | "oci" | "text" | "datapump"]) [%oracle:columns(col1 [, col2...])] [%oracle-property annotations] function local:myPut($column1 [as xs:allowed_type_name[?]], [$column2 [as xs:allowed_type_name[?]], ...]) external;
Declares the put function and the output mode. Required.
The optional output_mode parameter can be one of the following string literal values:
jdbc
: Writes to an Oracle database table using a JDBC connection. Default.
oci
: Writes to an Oracle database table using an Oracle Call Interface (OCI) connection.
datapump
: Creates Data Pump files and associated scripts in HDFS for subsequent loading by another utility.
text
: Creates delimited text files and associated scripts in HDFS.
For Oracle XQuery for Hadoop to write directly to an Oracle database table using either JDBC or OCI, all systems involved in processing the query must be able to connect to the Oracle Database system. See "About the Modes of Operation."
Identifies a selection of one or more column names in the target table. The order of column names corresponds to the order of the function parameters. See "Parameters." Optional.
This annotation enables loading a subset of the table columns. If omitted, the put function attempts to load all columns of the target table.
Controls various aspects of connecting to the database and writing data. You can specify multiple %oracle-property
annotations. These annotations correspond to the Oracle Loader for Hadoop configuration properties. Every %oracle-property
annotation has an equivalent Oracle Loader for Hadoop configuration property. "Oracle Loader for Hadoop Configuration Properties and Corresponding %oracle-property Annotations" explains this relationship in detail.
The %oracle-property
annotations are optional. However, the various loading scenarios require you to specify some of them or their equivalent configuration properties. For example, to load data into an Oracle database using JDBC or OCI, you must specify the target table and the connection information.
The following example specifies a target table named VISITS
, an Oracle wallet containing the user name and password, and the URL connection string. Oracle recommends that you use a wallet instead of clear text when providing your credentials.
%oracle-property:targetTable('visits') %oracle-property:connection.tns_admin=/opt/sharedir/oracle/network/admin %oracle-property:connection.wallet_location=/opt/sharedir/oracle/wallet %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com')
The next example show how to specify a user name and password. Use clear text credentials only in a development environment; it is not secure for a production system.
%oracle-property:connection.user('db')
%oracle-property:connection.password('password')
Enter a parameter for each column in the same order as the Oracle table columns to load all columns, or use the %oracle:columns
annotation to load selected columns.
Because the correlation between parameters and database columns is positional, the name of the parameter (column1 in the parameter syntax) is not required to match the name of the database column.
You can omit the explicit as xs:allowed_type_name
type declaration for any parameter. For example, you can declare the parameter corresponding to a NUMBER
column simply as $column1
. In this case, the parameter is automatically assigned an XQuery type of item()*
. At run time, the input value is cast to the allowed XQuery type for the corresponding table column type, as described in Table 6-3. For example, data values that are mapped to a column with a NUMBER
data type are automatically cast as xs:decimal
. An error is raised if the cast fails.
Alternatively, you can specify the type or its subtype for any parameter. In this case, compile-time type checking is performed. For example, you can declare a parameter corresponding to a NUMBER
column as $column as xs:decimal
. You can also declare it as any subtype of xs:decimal
, such as xs:integer
.
You can include the ?
optional occurrence indicator for each specified parameter type. This indicator allows the empty sequence to be passed as a parameter value at run time, so that a null is inserted into the database table. Any occurrence indicator other than ?
raises a compile-time error.
Table 6-3 describes the appropriate mappings of XQuery data types with the supported Oracle Database data types. In addition to the listed XQuery data types, you can also use the subtypes, such as xs:integer
instead of xs:decimal
. Oracle data types are more restrictive than XQuery data types, and these restrictions are identified in the table.
Table 6-3 Data Type Mappings Between Oracle Database and XQuery
Database Type | XQuery Type |
---|---|
|
Limited by the |
|
Limited by the |
|
Limited by the |
|
Limited by the |
|
Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. If a time zone is specified in the |
|
Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. If a time zone is specified in the |
|
Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. In the offset from UTC, the time-zone hour field is limited to -12:00 to 14:00. Fractional seconds are limited to a precision of 0 to 9 digits. |
|
Limited to the range of January 1, 4712 BC, to December 31, 9999 CE. In the offset from UTC, the time-zone hour field is limited to -12:00 to 14:00. Fractional seconds are limited to a precision of 0 to 9 digits. |
|
The day and fractional seconds are limited by a precision of 0 to 9 digits each. The hour is limited to a range of 0 to 23, and minutes and seconds are limited to a range of 0 to 59. |
|
The year is limited by a precision of 0 to 9 digits, and the month is limited to a range of 0 to 11. |
|
|
|
|
|
Limited by the |
|
Limited by the |
|
Limit by the |
If an xs:dateTime
value with no time zone is loaded into TIMESTAMP W TIME ZONE
or TIMESTAMP W LOCAL TIME ZONE
, then the time zone is set to the value of the sessionTimeZone
parameter, which defaults to the JVM time zone. Using Oracle XQuery for Hadoop, you can set the sessionTimeZone
property, as described in "Oracle Loader for Hadoop Configuration Properties and Corresponding %oracle-property Annotations."
With JDBC or OCI output modes, the Oracle Database Adapter loads data directly into the database table. It also creates a directory with the same name as the custom put
function name, under the query output directory. For example, if your query output directory is myoutput, and your custom function is myPut
, then the myoutput/myPut
directory is created.
For every custom Oracle Database Adapter put function, a separate directory is created. This directory contains output produced by the Oracle Loader for Hadoop job. When you use datapump
or text
output modes, the data files are written to this directory. The control and SQL scripts for loading the files are written to the _olh
subdirectory, such as myoutput/myPut/_olh
.
For descriptions of the generated files, see "Delimited Text Output Format" and "Oracle Data Pump Output Format."
These examples use the following text files in HDFS. The files contain a log of visits to different web pages. Each line represents a visit to a web page and contains the time, user name, and page visited:
mydata/visits1.log 2013-10-28T06:00:00, john, index.html, 200 2013-10-28T08:30:02, kelly, index.html, 200 2013-10-28T08:32:50, kelly, about.html, 200 2013-10-30T10:00:10, mike, index.html, 401 mydata/visits2.log 2013-10-30T10:00:01, john, index.html, 200 2013-10-30T10:05:20, john, about.html, 200 2013-11-01T08:00:08, laura, index.html, 200 2013-11-04T06:12:51, kelly, index.html, 200 2013-11-04T06:12:40, kelly, contact.html, 200
The examples also use the following file in HDFS, which contains anonymous page visits:
mydata/anonvisits.log 2011-10-30T10:01:01, index.html, 401 2011-11-04T06:15:40, contact.html, 401
This SQL command creates the VISITS
table in the Oracle database:
CREATE TABLE visits (time TIMESTAMP, name VARCHAR2(15), page VARCHAR2(15), code NUMBER)
The first query loads all information related to the page visit (time of visit, user name, page visited, and status code) to the VISITS
table. For anonymous access, the user name is missing, therefore the query specifies ()
to insert a null into the table. The target table name, wallet location, and connection URL are specified with %oracle-property
annotations.
Oracle recommends that you use a wallet instead for security, especially in a production application. You can configure an Oracle wallet using either Oracle Loader for Hadoop properties or their equivalent %oracle-property
annotations. To use clear text credentials in a development environment, see %oracle-property:property_name (value)
.
import module "oxh:text"; declare %oracle:put %oracle-property:targetTable('visits') %oracle-property:connection.tns_admin=/opt/sharedir/oracle/network/admin %oracle-property:connection.wallet_location=/opt/sharedir/oracle/wallet %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com') function local:myPut($c1, $c2, $c3, $c4) external; for $line in text:collection("mydata/*visits*.log") let $split := fn:tokenize($line, "\s*,\s*") return if (count($split) > 3) then local:myPut($split[1], $split[2], $split[3], $split[4]) else local:myPut($split[1], (), $split[2], $split[3])
The VISITS
table contains the following data after the query runs:
TIME NAME PAGE CODE ------------------------------ --------------- --------------- ---------- 30-OCT-13 10.00.01.000000 AM john index.html 200 30-OCT-13 10.05.20.000000 AM john about.html 200 01-NOV-13 08.00.08.000000 AM laura index.html 200 04-NOV-13 06.12.51.000000 AM kelly index.html 200 04-NOV-13 06.12.40.000000 AM kelly contact.html 200 28-OCT-13 06.00.00.000000 AM john index.html 200 28-OCT-13 08.30.02.000000 AM kelly index.html 200 28-OCT-13 08.32.50.000000 AM kelly about.html 200 30-OCT-13 10.00.10.000000 AM mike index.html 401 30-OCT-11 10.01.01.000000 AM index.html 401 04-NOV-11 06.15.40.000000 AM contact.html 401
This example uses the %oracle:columns
annotation to load only the time
and name
columns of the table. It also loads only visits by john
.
The column names specified in %oracle:columns
are positionally correlated to the put function parameters. Data values provided for the $c1
parameter are loaded into the TIME
column, and data values provided for the $c2
parameter are loaded into the NAME
column.
import module "oxh:text"; declare %oracle:put %oracle:columns('time', 'name') %oracle-property:targetTable('visits') %oracle-property:connection.tns_admin=/opt/sharedir/oracle/network/admin %oracle-property:connection.wallet_location=/opt/sharedir/oracle/wallet %oracle-property:connection.url('jdbc:oracle:thin:@//localhost:1521/orcl.example.com') function local:myPut($c1, $c2) external; for $line in text:collection("mydata/*visits*.log") let $split := fn:tokenize($line, "\s*,\s*") where $split[2] eq 'john' return local:myPut($split[1], $split[2])
If the VISITS
table is empty before the query runs, then it contains the following data afterward:
TIME NAME PAGE CODE ------------------------------ --------------- --------------- ---------- 30-OCT-13 10.00.01.000000 AM john 30-OCT-13 10.05.20.000000 AM john 28-OCT-13 06.00.00.000000 AM john
When you use the Oracle Database adapter of Oracle XQuery for Hadoop, you indirectly use Oracle Loader for Hadoop. Oracle Loader for Hadoop defines configuration properties that control various aspects of connecting to Oracle Database and writing data. Oracle XQuery for Hadoop supports many of these properties, which are listed in the last column of Table 6-4.
You can specify these properties with the generic -conf
and -D
hadoop
command-line options in Oracle XQuery for Hadoop. Properties specified using this method apply to all Oracle Database adapter put functions in your query. See "Running Queries" and especially "Generic Options" for more information about the hadoop
command-line options.
Alternatively, you can specify these properties as Oracle Database adapter put function annotations with the %oracle-property
prefix. These annotations are listed in the second column of Table 6-4. Annotations apply only to the particular Oracle Database adapter put function that contains them in its declaration.
For example, you can set the target table to VISITS
by adding the following lines to the configuration file, and identifying the configuration file with the -conf
option:
<property> <name>oracle.hadoop.loader.targetTable</name> <value>visits</value> </property>
You can also set the target table to VISITS
with the -D
option, using the same Oracle Loader for Hadoop property:
-D oracle.hadoop.loader.targetTable=visits
Both methods set the target table to VISITS
for all Oracle Database adapter put functions in your query.
Alternatively, this annotation sets the target table to VISITS
only for the particular put function that has the annotation in the declaration:
%oracle-property:connection.url('visits')
This flexibility is provided for convenience. For example, if a query has multiple Oracle Database adapter put functions, each writing to a different table in the same database, then the most convenient way to specify the necessary information is like this:
Use the oracle.hadoop.loader.connection.url
property in the configuration file to specify the database connection URL. Then identify the configuration file using the -conf
option. This option sets the same database connection URL for all Oracle Database adapter put functions in your query.
Set a different table name using the %oracle-property:targetTable
annotation in each Oracle Database adapter put function declaration.
Table 6-4 identifies the Oracle Loader for Hadoop properties and their equivalent Oracle XQuery for Hadoop annotations by functional category. Oracle XQuery for Hadoop supports only the Oracle Loader for Hadoop properties listed in this table.
Table 6-4 Configuration Properties and Corresponding %oracle-property Annotations
Category | Property | Annotation |
---|---|---|
Connection |
|
|
Connection |
|
|
Connection |
|
|
Connection |
|
|
Connection |
|
|
Connection |
|
|
Connection |
|
|
Connection |
|
|
Connection |
|
|
General |
|
|
General |
|
|
General |
|
|
General |
|
|
General |
|
|
General |
|
|
General |
|
|
General |
|
|
General |
|
|
General |
||
Output |
|
|
Output |
|
|
Output |
|
|
Output |
|
|
Output |
|
|
Output |
|
|
Sampler |
|
|
Sampler |
|
|
Sampler |
|
|
Sampler |
|
|
Sampler |
|
|
Sampler |
|
|
Sampler |
|
|
Sampler |
|
|
Sampler |
|