Package oracle.kv.exttab


package oracle.kv.exttab
This package contains the public API and Utilities for accessing Oracle NoSQL Database data through Oracle Database External Tables. This capability allows NoSQL Database data to be read into Oracle Database using Oracle External Tables. NoSQL Database data can not be modified using External Table access.

Generally, the Oracle Database External Table feature allows a table to be backed by one or more flat files (e.g. text or CSV), also known as "location" files. When a SQL SELECT statement is executed against an External Table, the Oracle Database server reads the location files, parses them according to the ACCESS PARAMETERS specified in the External Table CREATE TABLE definition, and materializes them as rows in the External Table.

Any External Table definition can specify a preprocessor program which is invoked by the Oracle Database server during the read process. The preprocessor is passed the name of the location file as a command line argument. Its contract is to read the data relevant to the location file and write the records to stdout where the Oracle Database reads, parses, and materializes the records as rows in the table.

To use the Oracle Database External Table feature to read Oracle NoSQL Database data, you must use the

<KVHOME>/exttab/bin/nosql_stream
preprocessor to populate your Oracle tables with the data. The remainder of this document describes how to configure an Oracle database with External Tables so that you can read data from an Oracle NoSQL Database.

Using External Tables to perform write operations against Oracle NoSQL Database is not supported.

Usage Overview

To read data from Oracle NoSQL Database into an Oracle Database External Table, you define an External Table with one or more location files; the location files will not actually contain any data, but instead contain configuration information related to connecting to the Oracle Database and Oracle NoSQL Database, query restrictions, and formatting. Specifying multiple location files in the External Table definition allows for a degree of parallelism during data reading.

Once the External Table is defined, you run the Publish utility to "publish" information about the External Table and how to access the data in Oracle NoSQL Database. This information is written as an XML document into each of the location file(s).

Note that you only have to run the publish utility once. After that, you only need to run the publish utility again if you want to change the way the NoSQL Database is accessed (for example, using a different key prefix, or if you change the port or the rep nodes where you access the database).

When the publish step is completed, a SQL SELECT may be executed against the External Table. This will cause the <KVHOME>/exttab/bin/nosql_stream script to be invoked, which in turn causes the Preproc class to be invoked. The preprocessor does the following:

  1. Reads the configuration and NoSQL Database access information from the location file.
  2. Reads the data from the NoSQL Database to load records from the store.
  3. Formats the data using either a user defined formatter or a default format. If loading rows from a table, the data is formatted using either a user defined tableFormatter, or a default table format.
  4. Writes it to stdout.

An External Table "cookbook" can be found in the <KVHOME>/examples/externaltables directory. See the External Tables Cookbook for more information.

External Table Configuration

Before you can run the publish utility, you must configure the Oracle Database External Table. You must also grant various permissions to any Oracle Database user who needs to access it.
  • Declare two directories to the Oracle database. One will hold the External Table location files and one which will refer to the <KVHOME>/exttab/bin/ directory where the nosql_stream script resides. For example, the following commands might be used to specify these two directories to Oracle Database:
    SQL> CONNECT / AS sysdba;
    SQL> CREATE DIRECTORY ext_tab AS '/data';
    SQL> CREATE DIRECTORY nosql_bin_dir AS '/export/home/kvhome/exttab/bin';
    

    NOTE: Consult with your Oracle DBA regarding the appropriate locations and permissions for these directories. It may be desirable from a security point of view to copy the nosql_stream script from the <KVHOME>/exttab/bin/ directory to some other location.

  • Grant the appropriate permissions to any Oracle Database users who will be accessing the External Table. For example, if a new user "nosqluser" will be accessing an External Table, the following commands might be used to create the user and grant appropriate permissions:
    SQL> CONNECT / AS sysdba;
    SQL> CREATE USER nosqluser IDENTIFIED BY password;
    SQL> GRANT CREATE SESSION TO nosqluser;
    SQL> GRANT EXECUTE ON SYS.UTL_FILE TO nosqluser;
    SQL> GRANT READ, WRITE ON DIRECTORY ext_tab TO nosqluser;
    SQL> GRANT READ, EXECUTE ON DIRECTORY nosql_bin_dir TO nosqluser;
    SQL> GRANT CREATE TABLE TO nosqluser;
    
  • Create the External Table, specifying one or more location files and the nosql_stream preprocessor. For example:
    SQL> CONNECT nosqluser/password
    SQL> CREATE TABLE [schema.]tableName (column datatype, ...))
      2      ORGANIZATION EXTERNAL
      3          (TYPE ORACLE_LOADER
      4           DEFAULT DIRECTORY ext_tab
      5           ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE
      6           PREPROCESSOR nosql_bin_dir:'nosql_stream'
      7           FIELDS TERMINATED BY 'delimiter' missing field values are null)
      8      LOCATION ('filename','filename', ...))
      9      REJECT LIMIT UNLIMITED
      10     PARALLEL nn;
    

    where nn is the number of files you specified for LOCATION on line 8.

For more information on External Tables and the parameters, see External Tables Access Parameters.

Publish Utility Configuration

Before you can run the publish utility, you must create a configuration file containing parameters regarding:
  • Connecting to the Oracle Database.
  • The External Table name.
  • Connecting to the Oracle NoSQL Database.
  • The NoSQL Database name.
  • Various NoSQL Database record and row selection restrictions.
  • The name of a Formatter class.
The config file is an XML document and consists of two sections: parameters related to the publish step, and parameters related to the preprocessor step. In general, the file looks like this:
<config version="1">
  <component name="publish" type="params">
    <property name="..."
              value="..."
              type="..."/>
    <property name="..."
              value="..."
              type="..."/>
    ...
  </component>
  <component name="nosql_stream" type="params">
    <property name="..."
              value="..."
              type="..."/>
    <property name="..."
              value="..."
              type="..."/>
    ...
  </component>
  <component name="nosql_table" type="params">
    <property name="..."
              value="..."
              type="..."/>
    <property name="..."
              value="..."
              type="..."/>
    ...
  </component>
</config>
Only one, not both, of the nosql_stream and the nosql_table sections can be configured. The section nosql_stream is used to configure Oracle NoSQL Database to read (Key,Value) pairs from store into Oracle External Table. The nosql_table section is used to configure Oracle NoSQL Database tables so that rows can be read from the table, then loaded into the Oracle External Table. Specifying both sections results in a failure to publish location files to Oracle Database. Notice that there are two major sections in the file: 1) the publish section and 2) either the nosql_stream section or nosql_table section. In the publish section, the following parameters are required:
  • oracle.kv.exttab.connection.url: The connection URL used by the Publish utility to connect to the Oracle instance containing the External Table. This property overrides all other connection properties. The connecting database user must have the privileges described above.

    If you are using a Wallet as an external password store, then the property value must have this form:

    jdbc:oracle:thin:@db_connect_string
    
    The db_connect_string must exactly-match the credential in the wallet. This example uses Oracle Net Services syntax:
       jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=
       (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521)))
          (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
    
    This example uses a TNSNAMES entry:
    jdbc:oracle:thin:@my_tns_entry
    
    Also see oracle.kv.exttab.connection.wallet_location, which is described later in this page.

    If you are not using an Oracle wallet, then one of the following URL connection styles is used:

    • Thin Connection
      jdbc:oracle:thin:@//myhost:1521/my_db_service_name
    • Oracle Net Services
         jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=
         (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521)))
            (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
      
    • TNS Entry Name
      jdbc:oracle:thin:@myTNSEntryName

  • oracle.kv.exttab.connection.user: required when not using a wallet for connections. In this case, the user will be prompted for a password.
  • oracle.kv.exttab.tableName: The schema-qualified table name of the external table in the format: schemaName.tableName.
In the publish section, the following parameters are optional:
  • oracle.kv.exttab.connection.user: An Oracle user with appropriate privileges. Optional only when you are using a wallet for a connection.
  • oracle.kv.exttab.connection.tns_admin: File path to a directory containing SQL*Net configuration files, such as sqlnet.ora and tnsnames.ora. The value of the TNS_ADMIN environment variable is used for this property by default. Define this property to use TNS entry names in database connect strings.
  • oracle.kv.exttab.connection.tnsEntryName: A TNS entry name defined in the tnsnames.ora file. This property is used with oracle.kv.exttab.connection.tns_admin.
  • oracle.kv.exttab.connection.wallet_location: Pathname of an Oracle wallet where the connection information is stored. When using Oracle wallet as an external password store, set the following additional properties.
    • For a URL connection:
      oracle.kv.exttab.connection.wallet_location
      oracle.kv.exttab.connection.url
    • For TNS names:
      oracle.kv.exttab.connection.wallet_location
      oracle.kv.exttab.connection.tns_admin
      oracle.kv.exttab.connection.tnsEntryName
In the nosql_stream section, the following parameters are required:
  • oracle.kv.kvstore: The name of the NoSQL Database.
  • oracle.kv.hosts: One or more hostname:port pairs, separated by commas. The utility uses these hostname:port pairs to contact Rep Nodes and obtain information about the organization of the NoSQL Database. Note that there are no restrictions on which online Rep Nodes in the store can be identified here; all online Rep Nodes are equally valid candidates.
In the nosql_stream section, the following parameters are optional:
  • oracle.kv.direction: The direction to read for each of the partitions. This value must be "forward". "reverse" and "unordered" may be supported in the future. Presently this parameter is not meaningful, but is included for completeness.
  • oracle.kv.batchSize: The number of records to fetch during each round trip.
  • oracle.kv.parentKey: The parent key whose child KV pairs are to be fetched, specified in the canonical key format described in Key.toString().
  • oracle.kv.subRange: Used to further restrict the range under the parentKey. This parameter is specified in the canonical KeyRange.toString() format.
  • oracle.kv.depth: One of CHILDREN_ONLY, PARENT_AND_CHILDREN, DESCENDANTS_ONLY, or PARENT_AND_DESCENDANTS.
  • oracle.kv.consistency: One of NONE_REQUIRED, NONE_REQUIRED_NO_MASTER, ABSOLUTE, or TIME(permissibleLag, timeout), where permissibleLag and timeout are a value and a TimeUnit pair (e.g. "10 secs").
  • oracle.kv.timeout: The timeout to use for querying the NoSQL Database. This is in timeout format (e.g. "10 seconds").
  • oracle.kv.exttbl.stream.keyDelimiter: If present, the value is a single character. It indicates that:
    1. Each record's key should be written out in the canonical Key format, and
    2. the character to be written between the Key and the Value.
  • oracle.kv.exttbl.stream.variableSizeBytes: If this parameter is 0 or not specified, or oracle.kv.formatterClass is not specified, then records are output in Oracle Data Loader "fixed" format. If this parameter is non-0, it specifies the number of prefix bytes per the Oracle Data Loader format described in External Tables Access Parameters. If both this and oracle.kv.formatterClass are specified, then an exception is thrown when the Publish utility is run.
  • oracle.kv.formatterClass: the name of a class which can format key/value pairs to an appropriate format for the Oracle Loader. This parameter may not be specified if oracle.kv.exttbl.stream.variableSizeBytes is also specified.
In the nosql_table section, the following parameters are required:
  • oracle.kv.kvstore: The name of the NoSQL Database.
  • oracle.kv.hosts: One or more hostname:port pairs, separated by commas. Each hostname:port pair represents a single Replication Node in the Oracle NoSQL Database store. The utility uses these hostname:port pairs to contact the Rep Nodes and obtain information about the organization of the NoSQL Database. Note that there are no restrictions on which online Rep Nodes in the store can be identified here; all online Rep Nodes are equally valid candidates.
  • oracle.kv.table: the Oracle NoSQL Database table from which rows will be read and formatted before they can be consumed by Oracle External Table.
In the nosql_table section, the following parameters are optional:
  • oracle.kv.consistency: One of NONE_REQUIRED, NONE_REQUIRED_NO_MASTER, ABSOLUTE, or TIME(permissibleLag, timeout), where permissibleLag and timeout are a value and a TimeUnit pair (e.g. "10 secs").
  • oracle.kv.timeout: The timeout to use for querying the NoSQL Database. This is in timeout format (e.g. "10 seconds").
  • oracle.kv.batchSize: The number of rows to fetch during each round trip.
  • oracle.kv.formatterClass: the name of a class which can format each row to an appropriate format for the Oracle Loader.

The Publish Utility

The Publish Utility:
  1. Accepts the parameters described above in an XML config file.
  2. Opens a connection to the Oracle Database instance.
  3. Reads the schema information for the specified External Table and extracts the names of External Table files from the schema using the ALL_EXTERNAL_LOCATIONS view.
  4. Opens a connection to the NoSQL Database to verify correctness of the parameters.
  5. Writes the relative number of the External Table file into the file.
  6. Writes information about how to contact the Oracle NoSQL DB Rep Nodes (the "publish" component parameters) into the External Table files.
  7. Writes information about how to restrict the query on each of the Oracle NoSQL DB Rep Nodes (the "nosql_stream" component parameters) into the External Table files.

The Publish Utility accepts the following command line parameters:

  • -noexecute: Validate the parameters and test the connections to Oracle Database and Oracle NoSQL DB, but do not publish (write) to the External Table files. Either -noexecute or -publish must be specified.
  • -publish: The opposite of -noexecute. Publish (write) the relevant information to the External Table files. Either -noexecute or -publish must be specified.
  • -verbose: Be verbose during execution. Useful for debugging.
  • -config <file>: Specifies an XML config file in the format shown above.

Since the Publish Utility connects to the Oracle Database using JDBC, when you invoke it, you must include $ORACLE_HOME/jdbc/lib/ojdbc6.jar in the classpath. If you are using the Oracle wallet as an external password store, then you must also include $ORACLE_HOME/jlib/oraclepki.jar.

The Preprocessor (nosql_stream)

The preprocessor is invoked by the Oracle Database instance when a SQL SELECT statement is executed on the Oracle Database server. The Oracle Database server invokes the preprocessor once for each External Table location file. Therefore, the number of location files determines the level of parallelism when retrieving records. The preprocessor(s) are invoked on the same machine that the Oracle Database instance is running.

In general, the user does not invoke this utility directly (except perhaps for debugging) as that will be done by Oracle Database when it invokes the nosql_stream script which is specified as the PREPROCESSOR program in the External Table declaration. The nosql_stream Preprocessor performs the following operations:

  • Takes the name of the External Table location file on which it will operate (passed to it by Oracle Database as a command line argument), opens that file and reads the configuration XML document written by the Publish utility.
  • Opens a connection to the Oracle NoSQL Database and reads the topology.
  • Determines the number of shards, and then based on a simple partitioning of the shards based on the file number, determines which shards it will operate on.
  • If load (Key, Value) pairs from the Oracle NoSQL Database store, generates a list of partitions on each target shard. For loading rows from Oracle NoSQL Database table, this step is skipped.
  • Reads the records from each of the target partitions, or reads the rows from the specific Oracle NoSQL Database table.
  • Writes those records or rows to stdout for reading by the Oracle Database instance.
When configuring the preprocessor, consider the following:
  • If oracle.kv.exttbl.stream.keyDelimiter is specified, the key is written to stdout using the canonical Key format, followed by the keyDelimiter, followed by the value, followed by newline.
  • When loading (Key, Value) from the Oracle NoSQL Database store, and if no Formatter class is specified, values are formatted by calling new String().
  • When loading (Key, Value) from the Oracle NoSQL Database store, and if a Formatter class is specified, then oracle.kv.exttbl.stream.keyDelimiter is ignored.
  • When loading from Oracle NoSQL Database tables, the complete row will be converted to a CSV-formatted string and written to stdout if no Formatter class is specified for the table, otherwise, the row will be formatted by the specified Formatter class.
  • nosql_stream is a shell script which requires that the PATH, KVHOME and CLASSPATH be set correctly. You should edit this script to include appropriate settings. Be sure that it has execute permission when you are finished editing.

Formatters

For many users, it may not be possible to use the External Table ACCESS PARAMETERS to sufficiently map NoSQL Database records to Oracle Database records. Formatters (or TableFormatters) allow you to provide a class which accepts a record (or row) and returns a String which can be mapped to the External Table fields. The Formatter.toOracleLoaderFormat(oracle.kv.KeyValueVersion, oracle.kv.KVStore) (or TableFormatter.toOracleLoaderFormat(oracle.kv.table.Row, oracle.kv.KVStore)) class is invoked once per record by the preprocessor. If toOracleLoaderFormat() returns null, then that record is ignored and not imported to the External Table. This provides a user-programmable filtering mechanism.

Exceptions

Every SELECT statement against an External Table writes a log to the file specified by the LOGFILE clause of the External Table definition (the default logfile is <table_name>_<pid>.log in the External Table location file directory). The BADFILE clause names the file to which records are written when they cannot be loaded because of errors. For example, a record is written to the bad file when a field in the datafile could not be converted to the datatype of a column in the External Table.

If the preprocessor writes anything to stderr, the SELECT fails and the output of the exception is written to the .log file. For example, if your formatter throws a RuntimeException("Couldn't parse field"), you would see something like this:

SQL> SELECT * FROM nosql_data;
SELECT * FROM nosql_data
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04095: preprocessor command /export/home/kvhome/exttab/bin/nosql_stream
encountered error "Exception in thread "main" java.lang.RuntimeException: Couldn't parse field
at Preproc.main(Preproc.java:12)
"


SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64
bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
bash-3.00$ ls -l ../data
total 4
-rw-r--r--   1 oracle   oinstall     597 May 10 09:57 EMP_LOAD_13150.log
-rw-rw-rw-   1 me       other         27 May  2 14:49 mylocation.dat
bash-3.00$ more ../data/EMP_LOAD_13150.log


 LOG file opened at 05/10/12 09:57:21

Field Definitions for table NOSQL_DATA
  Record format DELIMITED BY NEWLINE
  Data in file has same endianness as the platform
  Rows with all null fields are accepted

  Fields in Data Source:

    F1                              CHAR (2)
      Trim whitespace same as SQL Loader
    F2                              CHAR (10)
      Trim whitespace same as SQL Loader
KUP-04095: preprocessor command /export/home/kvhome/exttab/bin/nosql_stream encountered error
"Exception in thread "main" java.lang.RuntimeException: Couldn't parse field
        at Preproc.main(Preproc.java:12)
"

Parallelism

If you specify multiple Location files, be sure to enable parallel query processing using these three commands:
ALTER SESSION ENABLE PARALLEL DDL;
ALTER SESSION ENABLE PARALLEL DML;
ALTER SESSION ENABLE PARALLEL QUERY;
as well as adding a PARALLEL nn directive in the External Table declaration, where nn is the number of LOCATION files you specified.

Additional External Tables

Additional External Tables can be added with a CREATE TABLE command for each one. You should change the Oracle Database directory objects (ext_tab and nosql_bin_dir as necessary). If you have a different classpath for your Formatter, then be sure to add it to the nosql_stream script.
  • Class
    Description
    Runtime Exception for anything related to ExternalTables.
     
    Formatter is an interface to a user-specified class used for formatting NoSQL Database records into Strings which are appropriate for importing into a specific Oracle External Table.
    The Preproc utility is invoked by Oracle Database whenever a SELECT is performed on an External Table that has been associated with an Oracle NoSQL Database using the Publish utility.
    The Publish utility writes information about how to connect to an Oracle NoSQL Database from an Oracle Database External Table.
    TableFormatter is an interface to a user-specified class used for formatting rows in NoSQL Database tables into Strings which are appropriate for importing into a specific Oracle External Table.