Package oracle.kv.exttab
Generally, the Oracle Database External Table feature allows a table to be
backed by one or more flat files (e.g. text or CSV), also known as
"location" files. When a SQL SELECT
statement is executed
against an External Table, the Oracle Database server reads the location
files, parses them according to the ACCESS PARAMETERS
specified in the External Table CREATE TABLE
definition, and
materializes them as rows in the External Table.
Any External Table definition can specify a preprocessor program which is invoked by the Oracle Database server during the read process. The preprocessor is passed the name of the location file as a command line argument. Its contract is to read the data relevant to the location file and write the records to stdout where the Oracle Database reads, parses, and materializes the records as rows in the table.
To use the Oracle Database External Table feature to read Oracle NoSQL Database data, you must use the
<KVHOME>/exttab/bin/nosql_streampreprocessor to populate your Oracle tables with the data. The remainder of this document describes how to configure an Oracle database with External Tables so that you can read data from an Oracle NoSQL Database.
Using External Tables to perform write operations against Oracle NoSQL Database is not supported.
Usage Overview
To read data from Oracle NoSQL Database into an Oracle Database External Table, you define an External Table with one or more location files; the location files will not actually contain any data, but instead contain configuration information related to connecting to the Oracle Database and Oracle NoSQL Database, query restrictions, and formatting. Specifying multiple location files in the External Table definition allows for a degree of parallelism during data reading.
Once the External Table is defined, you run the Publish
utility to "publish" information about the
External Table and how to access the data in Oracle NoSQL Database. This
information is written as an XML document into each of the location
file(s).
Note that you only have to run the publish utility once. After that, you only need to run the publish utility again if you want to change the way the NoSQL Database is accessed (for example, using a different key prefix, or if you change the port or the rep nodes where you access the database).
When the publish step is completed, a SQL SELECT
may be
executed against the External Table. This will cause
the <KVHOME>/exttab/bin/nosql_stream
script to be invoked,
which in turn causes the Preproc
class to be
invoked. The preprocessor does the following:
- Reads the configuration and NoSQL Database access information from the location file.
- Reads the data from the NoSQL Database to load records from the store.
-
Formats the data using either a user defined
formatter
or a default format. If loading rows from a table, the data is formatted using either a user definedtableFormatter
, or a default table format. - Writes it to stdout.
An External Table "cookbook" can be found in
the <KVHOME>/examples/externaltables
directory.
See the External Tables Cookbook
for more information.
External Table Configuration
Before you can run the publish utility, you must configure the Oracle Database External Table. You must also grant various permissions to any Oracle Database user who needs to access it.-
Declare two directories to the Oracle database. One will hold the
External Table location files and one which will refer to the
<KVHOME>/exttab/bin/
directory where thenosql_stream
script resides. For example, the following commands might be used to specify these two directories to Oracle Database:SQL> CONNECT / AS sysdba; SQL> CREATE DIRECTORY ext_tab AS '/data'; SQL> CREATE DIRECTORY nosql_bin_dir AS '/export/home/kvhome/exttab/bin';
NOTE: Consult with your Oracle DBA regarding the appropriate locations and permissions for these directories. It may be desirable from a security point of view to copy the
nosql_stream
script from the<KVHOME>/exttab/bin/
directory to some other location. -
Grant the appropriate permissions to any Oracle Database users who
will be accessing the External Table. For example, if a new user
"nosqluser" will be accessing an External Table, the following
commands might be used to create the user and grant appropriate
permissions:
SQL> CONNECT / AS sysdba; SQL> CREATE USER nosqluser IDENTIFIED BY password; SQL> GRANT CREATE SESSION TO nosqluser; SQL> GRANT EXECUTE ON SYS.UTL_FILE TO nosqluser; SQL> GRANT READ, WRITE ON DIRECTORY ext_tab TO nosqluser; SQL> GRANT READ, EXECUTE ON DIRECTORY nosql_bin_dir TO nosqluser; SQL> GRANT CREATE TABLE TO nosqluser;
-
Create the External Table, specifying one or more location files
and the
nosql_stream
preprocessor. For example:SQL> CONNECT nosqluser/password SQL> CREATE TABLE [schema.]tableName (column datatype, ...)) 2 ORGANIZATION EXTERNAL 3 (TYPE ORACLE_LOADER 4 DEFAULT DIRECTORY ext_tab 5 ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE 6 PREPROCESSOR nosql_bin_dir:'nosql_stream' 7 FIELDS TERMINATED BY 'delimiter' missing field values are null) 8 LOCATION ('filename','filename', ...)) 9 REJECT LIMIT UNLIMITED 10 PARALLEL nn;
where nn is the number of files you specified for LOCATION on line 8.
Publish Utility Configuration
Before you can run thepublish
utility,
you must create a configuration file containing parameters regarding:
- Connecting to the Oracle Database.
- The External Table name.
- Connecting to the Oracle NoSQL Database.
- The NoSQL Database name.
- Various NoSQL Database record and row selection restrictions.
- The name of a Formatter class.
<config version="1"> <component name="publish" type="params"> <property name="..." value="..." type="..."/> <property name="..." value="..." type="..."/> ... </component> <component name="nosql_stream" type="params"> <property name="..." value="..." type="..."/> <property name="..." value="..." type="..."/> ... </component> <component name="nosql_table" type="params"> <property name="..." value="..." type="..."/> <property name="..." value="..." type="..."/> ... </component> </config>Only one, not both, of the
nosql_stream
and
the nosql_table
sections can be configured. The section nosql_stream
is used to configure Oracle NoSQL Database to read (Key,Value) pairs from store
into Oracle External Table. The nosql_table
section is used to
configure Oracle NoSQL Database tables so that rows can be read from the
table, then loaded into the Oracle External Table. Specifying both sections
results in a failure to publish location files to Oracle Database.
Notice that there are two major sections in the file: 1) the
publish
section and 2) either the nosql_stream
section or
nosql_table
section.
In the publish
section, the following parameters are required:
-
oracle.kv.exttab.connection.url
: The connection URL used by the Publish utility to connect to the Oracle instance containing the External Table. This property overrides all other connection properties. The connecting database user must have the privileges described above.If you are using a Wallet as an external password store, then the property value must have this form:
jdbc:oracle:thin:@db_connect_string
Thedb_connect_string
must exactly-match the credential in the wallet. This example uses Oracle Net Services syntax:jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
This example uses a TNSNAMES entry:jdbc:oracle:thin:@my_tns_entry
Also seeoracle.kv.exttab.connection.wallet_location
, which is described later in this page.If you are not using an Oracle wallet, then one of the following URL connection styles is used:
-
Thin Connection
jdbc:oracle:thin:@//myhost:1521/my_db_service_name
-
Oracle Net Services
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
- TNS Entry Name
jdbc:oracle:thin:@myTNSEntryName
-
Thin Connection
-
oracle.kv.exttab.connection.user
: required when not using a wallet for connections. In this case, the user will be prompted for a password. -
oracle.kv.exttab.tableName
: The schema-qualified table name of the external table in the format:schemaName.tableName
.
publish
section, the following parameters are optional:
-
oracle.kv.exttab.connection.user
: An Oracle user with appropriate privileges. Optional only when you are using a wallet for a connection. -
oracle.kv.exttab.connection.tns_admin
: File path to a directory containing SQL*Net configuration files, such as sqlnet.ora and tnsnames.ora. The value of the TNS_ADMIN environment variable is used for this property by default. Define this property to use TNS entry names in database connect strings. -
oracle.kv.exttab.connection.tnsEntryName
: A TNS entry name defined in the tnsnames.ora file. This property is used withoracle.kv.exttab.connection.tns_admin
. -
oracle.kv.exttab.connection.wallet_location
: Pathname of an Oracle wallet where the connection information is stored. When using Oracle wallet as an external password store, set the following additional properties.-
For a URL connection:
oracle.kv.exttab.connection.wallet_location
oracle.kv.exttab.connection.url
-
For TNS names:
oracle.kv.exttab.connection.wallet_location
oracle.kv.exttab.connection.tns_admin
oracle.kv.exttab.connection.tnsEntryName
-
For a URL connection:
nosql_stream
section, the following parameters are
required:
-
oracle.kv.kvstore
: The name of the NoSQL Database. -
oracle.kv.hosts
: One or morehostname:port
pairs, separated by commas. The utility uses thesehostname:port
pairs to contact Rep Nodes and obtain information about the organization of the NoSQL Database. Note that there are no restrictions on which online Rep Nodes in the store can be identified here; all online Rep Nodes are equally valid candidates.
nosql_stream
section, the following parameters are
optional:
-
oracle.kv.direction
: The direction to read for each of the partitions. This value must be "forward". "reverse" and "unordered" may be supported in the future. Presently this parameter is not meaningful, but is included for completeness. -
oracle.kv.batchSize
: The number of records to fetch during each round trip. -
oracle.kv.parentKey
: The parent key whose child KV pairs are to be fetched, specified in the canonical key format described inKey.toString()
. -
oracle.kv.subRange
: Used to further restrict the range under theparentKey
. This parameter is specified in the canonicalKeyRange.toString()
format. -
oracle.kv.depth
: One ofCHILDREN_ONLY
,PARENT_AND_CHILDREN
,DESCENDANTS_ONLY
, orPARENT_AND_DESCENDANTS
. -
oracle.kv.consistency
: One ofNONE_REQUIRED
,NONE_REQUIRED_NO_MASTER
,ABSOLUTE
, orTIME(permissibleLag, timeout)
, wherepermissibleLag
andtimeout
are a value and aTimeUnit
pair (e.g. "10 secs"). -
oracle.kv.timeout
: The timeout to use for querying the NoSQL Database. This is in timeout format (e.g. "10 seconds"). -
oracle.kv.exttbl.stream.keyDelimiter
: If present, the value is a single character. It indicates that:- Each record's key should be written out in the canonical Key format, and
- the character to be written between the Key and the Value.
-
oracle.kv.exttbl.stream.variableSizeBytes
: If this parameter is 0 or not specified, ororacle.kv.formatterClass
is not specified, then records are output in Oracle Data Loader "fixed" format. If this parameter is non-0, it specifies the number of prefix bytes per the Oracle Data Loader format described in External Tables Access Parameters. If both this andoracle.kv.formatterClass
are specified, then an exception is thrown when the Publish utility is run. -
oracle.kv.formatterClass
: the name of a class which can format key/value pairs to an appropriate format for the Oracle Loader. This parameter may not be specified iforacle.kv.exttbl.stream.variableSizeBytes
is also specified.
nosql_table
section, the following parameters are
required:
-
oracle.kv.kvstore
: The name of the NoSQL Database. -
oracle.kv.hosts
: One or morehostname:port
pairs, separated by commas. Eachhostname:port
pair represents a single Replication Node in the Oracle NoSQL Database store. The utility uses thesehostname:port
pairs to contact the Rep Nodes and obtain information about the organization of the NoSQL Database. Note that there are no restrictions on which online Rep Nodes in the store can be identified here; all online Rep Nodes are equally valid candidates. -
oracle.kv.table
: the Oracle NoSQL Database table from which rows will be read and formatted before they can be consumed by Oracle External Table.
nosql_table
section, the following parameters are
optional:
-
oracle.kv.consistency
: One ofNONE_REQUIRED
,NONE_REQUIRED_NO_MASTER
,ABSOLUTE
, orTIME(permissibleLag, timeout)
, wherepermissibleLag
andtimeout
are a value and aTimeUnit
pair (e.g. "10 secs"). -
oracle.kv.timeout
: The timeout to use for querying the NoSQL Database. This is in timeout format (e.g. "10 seconds"). -
oracle.kv.batchSize
: The number of rows to fetch during each round trip. -
oracle.kv.formatterClass
: the name of a class which can format each row to an appropriate format for the Oracle Loader.
The Publish Utility
ThePublish Utility
:
- Accepts the parameters described above in an XML config file.
- Opens a connection to the Oracle Database instance.
-
Reads the schema information for the specified External Table and
extracts the names of External Table files from the schema using
the
ALL_EXTERNAL_LOCATIONS
view. - Opens a connection to the NoSQL Database to verify correctness of the parameters.
- Writes the relative number of the External Table file into the file.
- Writes information about how to contact the Oracle NoSQL DB Rep Nodes (the "publish" component parameters) into the External Table files.
- Writes information about how to restrict the query on each of the Oracle NoSQL DB Rep Nodes (the "nosql_stream" component parameters) into the External Table files.
The Publish Utility accepts the following command line parameters:
-
-noexecute
: Validate the parameters and test the connections to Oracle Database and Oracle NoSQL DB, but do not publish (write) to the External Table files. Either-noexecute
or-publish
must be specified. -
-publish
: The opposite of-noexecute
. Publish (write) the relevant information to the External Table files. Either-noexecute
or-publish
must be specified. -
-verbose
: Be verbose during execution. Useful for debugging. -
-config <file>
: Specifies an XML config file in the format shown above.
Since the Publish Utility connects to the Oracle Database using JDBC,
when you invoke it, you must include
$ORACLE_HOME/jdbc/lib/ojdbc6.jar
in the classpath. If you are using
the Oracle wallet as an external password store, then you must also include
$ORACLE_HOME/jlib/oraclepki.jar
.
The Preprocessor (nosql_stream
)
The preprocessor is invoked by the Oracle Database instance when a
SQL SELECT
statement is executed on the Oracle Database
server. The Oracle Database server invokes the preprocessor once for
each External Table location file. Therefore, the number of location
files determines the level of parallelism when retrieving records. The
preprocessor(s) are invoked on the same machine that the Oracle
Database instance is running.
In general, the user does not invoke this utility directly (except
perhaps for debugging) as that will be done by Oracle Database when it
invokes the nosql_stream
script which is specified as the
PREPROCESSOR
program in the External Table declaration.
The nosql_stream
Preprocessor
performs the following operations:
- Takes the name of the External Table location file on which it will operate (passed to it by Oracle Database as a command line argument), opens that file and reads the configuration XML document written by the Publish utility.
- Opens a connection to the Oracle NoSQL Database and reads the topology.
- Determines the number of shards, and then based on a simple partitioning of the shards based on the file number, determines which shards it will operate on.
- If load (Key, Value) pairs from the Oracle NoSQL Database store, generates a list of partitions on each target shard. For loading rows from Oracle NoSQL Database table, this step is skipped.
- Reads the records from each of the target partitions, or reads the rows from the specific Oracle NoSQL Database table.
- Writes those records or rows to stdout for reading by the Oracle Database instance.
-
If
oracle.kv.exttbl.stream.keyDelimiter
is specified, the key is written to stdout using the canonical Key format, followed by the keyDelimiter, followed by the value, followed by newline. -
When loading (Key, Value) from the Oracle NoSQL Database store, and if
no Formatter class is specified, values are formatted by calling
new String()
. -
When loading (Key, Value) from the Oracle NoSQL Database store, and if
a Formatter class is specified, then
oracle.kv.exttbl.stream.keyDelimiter
is ignored. - When loading from Oracle NoSQL Database tables, the complete row will be converted to a CSV-formatted string and written to stdout if no Formatter class is specified for the table, otherwise, the row will be formatted by the specified Formatter class.
-
nosql_stream
is a shell script which requires that thePATH
,KVHOME
andCLASSPATH
be set correctly. You should edit this script to include appropriate settings. Be sure that it has execute permission when you are finished editing.
Formatters
For many users, it may not be possible to use the External TableACCESS PARAMETERS
to sufficiently map
NoSQL Database records to Oracle Database records. Formatters
(or TableFormatters
)
allow you to provide a class
which accepts a record
(or
row
) and returns a
String which can be mapped to the External Table fields. The Formatter.toOracleLoaderFormat(oracle.kv.KeyValueVersion, oracle.kv.KVStore)
(or TableFormatter.toOracleLoaderFormat(oracle.kv.table.Row, oracle.kv.KVStore)
) class is invoked once per record by the
preprocessor. If toOracleLoaderFormat()
returns null,
then that record is ignored and not imported to the External
Table. This provides a user-programmable filtering mechanism.
Exceptions
EverySELECT
statement against an External Table writes a
log to the file specified by
the
LOGFILE clause of the External Table definition (the default
logfile is <table_name>_<pid>.log in the External Table
location file directory). The BADFILE
clause names the
file to which records are written when they cannot be loaded because
of errors. For example, a record is written to the bad file when a
field in the datafile could not be converted to the datatype of a
column in the External Table.
If the preprocessor writes anything to stderr, the SELECT
fails and the output of the exception is written to the .log file.
For example, if your formatter throws a
RuntimeException("Couldn't parse field")
, you would see something
like this:
SQL> SELECT * FROM nosql_data; SELECT * FROM nosql_data * ERROR at line 1: ORA-29913: error in executing ODCIEXTTABLEFETCH callout ORA-29400: data cartridge error KUP-04095: preprocessor command /export/home/kvhome/exttab/bin/nosql_stream encountered error "Exception in thread "main" java.lang.RuntimeException: Couldn't parse field at Preproc.main(Preproc.java:12) " SQL> quit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64 bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options bash-3.00$ ls -l ../data total 4 -rw-r--r-- 1 oracle oinstall 597 May 10 09:57 EMP_LOAD_13150.log -rw-rw-rw- 1 me other 27 May 2 14:49 mylocation.dat bash-3.00$ more ../data/EMP_LOAD_13150.log LOG file opened at 05/10/12 09:57:21 Field Definitions for table NOSQL_DATA Record format DELIMITED BY NEWLINE Data in file has same endianness as the platform Rows with all null fields are accepted Fields in Data Source: F1 CHAR (2) Trim whitespace same as SQL Loader F2 CHAR (10) Trim whitespace same as SQL Loader KUP-04095: preprocessor command /export/home/kvhome/exttab/bin/nosql_stream encountered error "Exception in thread "main" java.lang.RuntimeException: Couldn't parse field at Preproc.main(Preproc.java:12) "
Parallelism
If you specify multiple Location files, be sure to enable parallel query processing using these three commands:ALTER SESSION ENABLE PARALLEL DDL; ALTER SESSION ENABLE PARALLEL DML; ALTER SESSION ENABLE PARALLEL QUERY;as well as adding a
PARALLEL nn
directive in the External
Table declaration, where nn
is the number of LOCATION files
you specified.
Additional External Tables
Additional External Tables can be added with aCREATE
TABLE
command for each one. You should change the Oracle
Database directory objects (ext_tab and nosql_bin_dir as
necessary). If you have a different classpath for your Formatter, then
be sure to add it to the nosql_stream
script.-
ClassDescriptionRuntime Exception for anything related to ExternalTables.Formatter is an interface to a user-specified class used for formatting NoSQL Database records into Strings which are appropriate for importing into a specific Oracle External Table.The Preproc utility is invoked by Oracle Database whenever a
SELECT
is performed on an External Table that has been associated with an Oracle NoSQL Database using thePublish
utility.The Publish utility writes information about how to connect to an Oracle NoSQL Database from an Oracle Database External Table.TableFormatter is an interface to a user-specified class used for formatting rows in NoSQL Database tables into Strings which are appropriate for importing into a specific Oracle External Table.