Package oracle.kv.exttab
Generally, the Oracle Database External Table feature allows a table to be
backed by one or more flat files (e.g. text or CSV), also known as
"location" files. When a SQL SELECT statement is executed
against an External Table, the Oracle Database server reads the location
files, parses them according to the ACCESS PARAMETERS
specified in the External Table CREATE TABLE definition, and
materializes them as rows in the External Table.
Any External Table definition can specify a preprocessor program which is invoked by the Oracle Database server during the read process. The preprocessor is passed the name of the location file as a command line argument. Its contract is to read the data relevant to the location file and write the records to stdout where the Oracle Database reads, parses, and materializes the records as rows in the table.
To use the Oracle Database External Table feature to read Oracle NoSQL Database data, you must use the
<KVHOME>/exttab/bin/nosql_streampreprocessor to populate your Oracle tables with the data. The remainder of this document describes how to configure an Oracle database with External Tables so that you can read data from an Oracle NoSQL Database.
Using External Tables to perform write operations against Oracle NoSQL Database is not supported.
Usage Overview
To read data from Oracle NoSQL Database into an Oracle Database External Table, you define an External Table with one or more location files; the location files will not actually contain any data, but instead contain configuration information related to connecting to the Oracle Database and Oracle NoSQL Database, query restrictions, and formatting. Specifying multiple location files in the External Table definition allows for a degree of parallelism during data reading.
Once the External Table is defined, you run the Publish utility to "publish" information about the
External Table and how to access the data in Oracle NoSQL Database. This
information is written as an XML document into each of the location
file(s).
Note that you only have to run the publish utility once. After that, you only need to run the publish utility again if you want to change the way the NoSQL Database is accessed (for example, using a different key prefix, or if you change the port or the rep nodes where you access the database).
When the publish step is completed, a SQL SELECT may be
executed against the External Table. This will cause
the <KVHOME>/exttab/bin/nosql_stream script to be invoked,
which in turn causes the Preproc class to be
invoked. The preprocessor does the following:
- Reads the configuration and NoSQL Database access information from the location file.
- Reads the data from the NoSQL Database to load records from the store.
- 
        Formats the data using either a user defined formatteror a default format. If loading rows from a table, the data is formatted using either a user definedtableFormatter, or a default table format.
- Writes it to stdout.
An External Table "cookbook" can be found in
the <KVHOME>/examples/externaltables directory.
See the External Tables Cookbook
for more information.
External Table Configuration
Before you can run the publish utility, you must configure the Oracle Database External Table. You must also grant various permissions to any Oracle Database user who needs to access it.- 
        Declare two directories to the Oracle database. One will hold the
        External Table location files and one which will refer to the
        <KVHOME>/exttab/bin/directory where thenosql_streamscript resides. For example, the following commands might be used to specify these two directories to Oracle Database:SQL> CONNECT / AS sysdba; SQL> CREATE DIRECTORY ext_tab AS '/data'; SQL> CREATE DIRECTORY nosql_bin_dir AS '/export/home/kvhome/exttab/bin'; NOTE: Consult with your Oracle DBA regarding the appropriate locations and permissions for these directories. It may be desirable from a security point of view to copy the nosql_streamscript from the<KVHOME>/exttab/bin/directory to some other location.
- 
        Grant the appropriate permissions to any Oracle Database users who
        will be accessing the External Table. For example, if a new user
        "nosqluser" will be accessing an External Table, the following
        commands might be used to create the user and grant appropriate
        permissions:
SQL> CONNECT / AS sysdba; SQL> CREATE USER nosqluser IDENTIFIED BY password; SQL> GRANT CREATE SESSION TO nosqluser; SQL> GRANT EXECUTE ON SYS.UTL_FILE TO nosqluser; SQL> GRANT READ, WRITE ON DIRECTORY ext_tab TO nosqluser; SQL> GRANT READ, EXECUTE ON DIRECTORY nosql_bin_dir TO nosqluser; SQL> GRANT CREATE TABLE TO nosqluser; 
- 
        Create the External Table, specifying  one or more location files
        and the nosql_streampreprocessor. For example:SQL> CONNECT nosqluser/password SQL> CREATE TABLE [schema.]tableName (column datatype, ...)) 2 ORGANIZATION EXTERNAL 3 (TYPE ORACLE_LOADER 4 DEFAULT DIRECTORY ext_tab 5 ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE 6 PREPROCESSOR nosql_bin_dir:'nosql_stream' 7 FIELDS TERMINATED BY 'delimiter' missing field values are null) 8 LOCATION ('filename','filename', ...)) 9 REJECT LIMIT UNLIMITED 10 PARALLEL nn;where nn is the number of files you specified for LOCATION on line 8. 
Publish Utility Configuration
Before you can run thepublish utility,
you must create a configuration file containing parameters regarding:
- Connecting to the Oracle Database.
- The External Table name.
- Connecting to the Oracle NoSQL Database.
- The NoSQL Database name.
- Various NoSQL Database record and row selection restrictions.
- The name of a Formatter class.
<config version="1">
  <component name="publish" type="params">
    <property name="..."
              value="..."
              type="..."/>
    <property name="..."
              value="..."
              type="..."/>
    ...
  </component>
  <component name="nosql_stream" type="params">
    <property name="..."
              value="..."
              type="..."/>
    <property name="..."
              value="..."
              type="..."/>
    ...
  </component>
  <component name="nosql_table" type="params">
    <property name="..."
              value="..."
              type="..."/>
    <property name="..."
              value="..."
              type="..."/>
    ...
  </component>
</config>
Only one, not both, of the nosql_stream and
the nosql_table sections can be configured. The section nosql_stream
 is used to configure Oracle NoSQL Database to read (Key,Value) pairs from store
into Oracle External Table. The nosql_table section is used to
configure Oracle NoSQL Database tables so that rows can be read from the
table, then loaded into the Oracle External Table. Specifying both sections 
results in a failure to publish location files to Oracle Database.
Notice that there are two major sections in the file: 1) the
publish section and 2) either the nosql_stream section or
nosql_table section.
In the publish section, the following parameters are required:
- 
        oracle.kv.exttab.connection.url: The connection URL used by the Publish utility to connect to the Oracle instance containing the External Table. This property overrides all other connection properties. The connecting database user must have the privileges described above.If you are using a Wallet as an external password store, then the property value must have this form: jdbc:oracle:thin:@db_connect_string Thedb_connect_stringmust exactly-match the credential in the wallet. This example uses Oracle Net Services syntax:jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))This example uses a TNSNAMES entry:jdbc:oracle:thin:@my_tns_entry Also seeoracle.kv.exttab.connection.wallet_location, which is described later in this page.If you are not using an Oracle wallet, then one of the following URL connection styles is used: - 
                Thin Connection
 jdbc:oracle:thin:@//myhost:1521/my_db_service_name
 
- 
                Oracle Net Services
 jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
- TNS Entry Name
 jdbc:oracle:thin:@myTNSEntryName
 
- 
                Thin Connection
- 
        oracle.kv.exttab.connection.user: required when not using a wallet for connections. In this case, the user will be prompted for a password.
- 
        oracle.kv.exttab.tableName: The schema-qualified table name of the external table in the format:schemaName.tableName.
publish section, the following parameters are optional:
- 
        oracle.kv.exttab.connection.user: An Oracle user with appropriate privileges. Optional only when you are using a wallet for a connection.
- 
        oracle.kv.exttab.connection.tns_admin: File path to a directory containing SQL*Net configuration files, such as sqlnet.ora and tnsnames.ora. The value of the TNS_ADMIN environment variable is used for this property by default. Define this property to use TNS entry names in database connect strings.
- 
        oracle.kv.exttab.connection.tnsEntryName: A TNS entry name defined in the tnsnames.ora file. This property is used withoracle.kv.exttab.connection.tns_admin.
- 
        oracle.kv.exttab.connection.wallet_location: Pathname of an Oracle wallet where the connection information is stored. When using Oracle wallet as an external password store, set the following additional properties.- 
                For a URL connection:
 oracle.kv.exttab.connection.wallet_location
 oracle.kv.exttab.connection.url
- 
                For TNS names:
 oracle.kv.exttab.connection.wallet_location
 oracle.kv.exttab.connection.tns_admin
 oracle.kv.exttab.connection.tnsEntryName
 
- 
                For a URL connection:
nosql_stream section, the following parameters are
required:
- 
        oracle.kv.kvstore: The name of the NoSQL Database.
- 
         oracle.kv.hosts: One or morehostname:portpairs, separated by commas. The utility uses thesehostname:portpairs to contact Rep Nodes and obtain information about the organization of the NoSQL Database. Note that there are no restrictions on which online Rep Nodes in the store can be identified here; all online Rep Nodes are equally valid candidates.
nosql_stream section, the following parameters are
optional:
- 
        oracle.kv.direction: The direction to read for each of the partitions. This value must be "forward". "reverse" and "unordered" may be supported in the future. Presently this parameter is not meaningful, but is included for completeness.
- 
        oracle.kv.batchSize: The number of records to fetch during each round trip.
- 
        oracle.kv.parentKey: The parent key whose child KV pairs are to be fetched, specified in the canonical key format described inKey.toString().
- 
        oracle.kv.subRange: Used to further restrict the range under theparentKey. This parameter is specified in the canonicalKeyRange.toString()format.
- 
        oracle.kv.depth: One ofCHILDREN_ONLY,PARENT_AND_CHILDREN,DESCENDANTS_ONLY, orPARENT_AND_DESCENDANTS.
- 
        oracle.kv.consistency: One ofNONE_REQUIRED,NONE_REQUIRED_NO_MASTER,ABSOLUTE, orTIME(permissibleLag, timeout), wherepermissibleLagandtimeoutare a value and aTimeUnitpair (e.g. "10 secs").
- 
        oracle.kv.timeout: The timeout to use for querying the NoSQL Database. This is in timeout format (e.g. "10 seconds").
- 
        oracle.kv.exttbl.stream.keyDelimiter: If present, the value is a single character. It indicates that:- Each record's key should be written out in the canonical Key format, and
- the character to be written between the Key and the Value.
 
- 
        oracle.kv.exttbl.stream.variableSizeBytes: If this parameter is 0 or not specified, ororacle.kv.formatterClassis not specified, then records are output in Oracle Data Loader "fixed" format. If this parameter is non-0, it specifies the number of prefix bytes per the Oracle Data Loader format described in External Tables Access Parameters. If both this andoracle.kv.formatterClassare specified, then an exception is thrown when the Publish utility is run.
- 
        oracle.kv.formatterClass: the name of a class which can format key/value pairs to an appropriate format for the Oracle Loader. This parameter may not be specified iforacle.kv.exttbl.stream.variableSizeBytesis also specified.
nosql_table section, the following parameters are
required:
- 
        oracle.kv.kvstore: The name of the NoSQL Database.
- 
         oracle.kv.hosts: One or morehostname:portpairs, separated by commas. Eachhostname:portpair represents a single Replication Node in the Oracle NoSQL Database store. The utility uses thesehostname:portpairs to contact the Rep Nodes and obtain information about the organization of the NoSQL Database. Note that there are no restrictions on which online Rep Nodes in the store can be identified here; all online Rep Nodes are equally valid candidates.
- 
      oracle.kv.table: the Oracle NoSQL Database table from which rows will be read and formatted before they can be consumed by Oracle External Table.
nosql_table section, the following parameters are
optional:
- 
        oracle.kv.consistency: One ofNONE_REQUIRED,NONE_REQUIRED_NO_MASTER,ABSOLUTE, orTIME(permissibleLag, timeout), wherepermissibleLagandtimeoutare a value and aTimeUnitpair (e.g. "10 secs").
- 
        oracle.kv.timeout: The timeout to use for querying the NoSQL Database. This is in timeout format (e.g. "10 seconds").
- 
        oracle.kv.batchSize: The number of rows to fetch during each round trip.
- 
        oracle.kv.formatterClass: the name of a class which can format each row to an appropriate format for the Oracle Loader.
The Publish Utility
ThePublish Utility:
- Accepts the parameters described above in an XML config file.
- Opens a connection to the Oracle Database instance.
- 
        Reads the schema information for the specified External Table and
        extracts the names of External Table files from the schema using
        the ALL_EXTERNAL_LOCATIONSview.
- Opens a connection to the NoSQL Database to verify correctness of the parameters.
- Writes the relative number of the External Table file into the file.
- Writes information about how to contact the Oracle NoSQL DB Rep Nodes (the "publish" component parameters) into the External Table files.
- Writes information about how to restrict the query on each of the Oracle NoSQL DB Rep Nodes (the "nosql_stream" component parameters) into the External Table files.
The Publish Utility accepts the following command line parameters:
- 
        -noexecute: Validate the parameters and test the connections to Oracle Database and Oracle NoSQL DB, but do not publish (write) to the External Table files. Either-noexecuteor-publishmust be specified.
- 
        -publish: The opposite of-noexecute. Publish (write) the relevant information to the External Table files. Either-noexecuteor-publishmust be specified.
- 
        -verbose: Be verbose during execution. Useful for debugging.
- 
        -config <file>: Specifies an XML config file in the format shown above.
Since the Publish Utility connects to the Oracle Database using JDBC,
when you invoke it, you must include
$ORACLE_HOME/jdbc/lib/ojdbc6.jar in the classpath. If you are using
the Oracle wallet as an external password store, then you must also include
$ORACLE_HOME/jlib/oraclepki.jar.
The Preprocessor (nosql_stream)
The preprocessor is invoked by the Oracle Database instance when a
SQL SELECT statement is executed on the Oracle Database
server. The Oracle Database server invokes the preprocessor once for
each External Table location file. Therefore, the number of location
files determines the level of parallelism when retrieving records. The
preprocessor(s) are invoked on the same machine that the Oracle
Database instance is running.
In general, the user does not invoke this utility directly (except
perhaps for debugging) as that will be done by Oracle Database when it
invokes the nosql_stream script which is specified as the
PREPROCESSOR program in the External Table declaration.
The nosql_stream Preprocessor performs the following operations:
- Takes the name of the External Table location file on which it will operate (passed to it by Oracle Database as a command line argument), opens that file and reads the configuration XML document written by the Publish utility.
- Opens a connection to the Oracle NoSQL Database and reads the topology.
- Determines the number of shards, and then based on a simple partitioning of the shards based on the file number, determines which shards it will operate on.
- If load (Key, Value) pairs from the Oracle NoSQL Database store, generates a list of partitions on each target shard. For loading rows from Oracle NoSQL Database table, this step is skipped.
- Reads the records from each of the target partitions, or reads the rows from the specific Oracle NoSQL Database table.
- Writes those records or rows to stdout for reading by the Oracle Database instance.
- 
        If oracle.kv.exttbl.stream.keyDelimiteris specified, the key is written to stdout using the canonical Key format, followed by the keyDelimiter, followed by the value, followed by newline.
- 
        When loading (Key, Value) from the Oracle NoSQL Database store, and if
        no Formatter class is specified, values are formatted by calling
        new String().
- 
        When loading (Key, Value) from the Oracle NoSQL Database store, and if
        a Formatter class is specified, then
        oracle.kv.exttbl.stream.keyDelimiteris ignored.
- When loading from Oracle NoSQL Database tables, the complete row will be converted to a CSV-formatted string and written to stdout if no Formatter class is specified for the table, otherwise, the row will be formatted by the specified Formatter class.
- 
        nosql_streamis a shell script which requires that thePATH,KVHOMEandCLASSPATHbe set correctly. You should edit this script to include appropriate settings. Be sure that it has execute permission when you are finished editing.
Formatters
For many users, it may not be possible to use the External TableACCESS PARAMETERS to sufficiently map
NoSQL Database records to Oracle Database records. Formatters (or TableFormatters)
allow you to provide a class
which accepts a record (or
row) and returns a
String which can be mapped to the External Table fields. The Formatter.toOracleLoaderFormat(oracle.kv.KeyValueVersion, oracle.kv.KVStore)
(or TableFormatter.toOracleLoaderFormat(oracle.kv.table.Row, oracle.kv.KVStore)) class is invoked once per record by the
preprocessor. If toOracleLoaderFormat() returns null,
then that record is ignored and not imported to the External
Table. This provides a user-programmable filtering mechanism.
Exceptions
EverySELECT statement against an External Table writes a
log to the file specified by
the 
LOGFILE clause of the External Table definition (the default
logfile is <table_name>_<pid>.log in the External Table
location file directory).  The BADFILE clause names the
file to which records are written when they cannot be loaded because
of errors. For example, a record is written to the bad file when a
field in the datafile could not be converted to the datatype of a
column in the External Table.
If the preprocessor writes anything to stderr, the SELECT
fails and the output of the exception is written to the .log file.
For example, if your formatter throws a
RuntimeException("Couldn't parse field"), you would see something
like this:
SQL> SELECT * FROM nosql_data;
SELECT * FROM nosql_data
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04095: preprocessor command /export/home/kvhome/exttab/bin/nosql_stream
encountered error "Exception in thread "main" java.lang.RuntimeException: Couldn't parse field
at Preproc.main(Preproc.java:12)
"
SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64
bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
bash-3.00$ ls -l ../data
total 4
-rw-r--r--   1 oracle   oinstall     597 May 10 09:57 EMP_LOAD_13150.log
-rw-rw-rw-   1 me       other         27 May  2 14:49 mylocation.dat
bash-3.00$ more ../data/EMP_LOAD_13150.log
 LOG file opened at 05/10/12 09:57:21
Field Definitions for table NOSQL_DATA
  Record format DELIMITED BY NEWLINE
  Data in file has same endianness as the platform
  Rows with all null fields are accepted
  Fields in Data Source:
    F1                              CHAR (2)
      Trim whitespace same as SQL Loader
    F2                              CHAR (10)
      Trim whitespace same as SQL Loader
KUP-04095: preprocessor command /export/home/kvhome/exttab/bin/nosql_stream encountered error
"Exception in thread "main" java.lang.RuntimeException: Couldn't parse field
        at Preproc.main(Preproc.java:12)
"
Parallelism
If you specify multiple Location files, be sure to enable parallel query processing using these three commands:ALTER SESSION ENABLE PARALLEL DDL; ALTER SESSION ENABLE PARALLEL DML; ALTER SESSION ENABLE PARALLEL QUERY;as well as adding a
PARALLEL nn directive in the External
Table declaration, where nn is the number of LOCATION files
you specified.
Additional External Tables
Additional External Tables can be added with aCREATE
TABLE command for each one. You should change the Oracle
Database directory objects (ext_tab and nosql_bin_dir as
necessary). If you have a different classpath for your Formatter, then
be sure to add it to the nosql_stream script.
- 
ClassDescriptionRuntime Exception for anything related to ExternalTables.Formatter is an interface to a user-specified class used for formatting NoSQL Database records into Strings which are appropriate for importing into a specific Oracle External Table.The Preproc utility is invoked by Oracle Database whenever aSELECTis performed on an External Table that has been associated with an Oracle NoSQL Database using thePublishutility.The Publish utility writes information about how to connect to an Oracle NoSQL Database from an Oracle Database External Table.TableFormatter is an interface to a user-specified class used for formatting rows in NoSQL Database tables into Strings which are appropriate for importing into a specific Oracle External Table.