See: Description
Interface | Description |
---|---|
Formatter |
Formatter is an interface to a user-specified class used for formatting
NoSQL Database records into Strings which are appropriate for importing into
a specific Oracle External Table.
|
TableFormatter |
TableFormatter is an interface to a user-specified class used for formatting
rows in NoSQL Database tables into Strings which are appropriate for
importing into a specific Oracle External Table.
|
Class | Description |
---|---|
ExternalTableUtils | |
Preproc |
The Preproc utility is invoked by Oracle Database whenever a
SELECT is performed on an External Table that has been
associated with an Oracle NoSQL Database using the Publish utility. |
Publish |
The Publish utility writes information about how to connect to an Oracle
NoSQL Database from an Oracle Database External Table.
|
Exception | Description |
---|---|
ExternalTableException |
Runtime Exception for anything related to ExternalTables.
|
Generally, the Oracle Database External Table feature allows a table to be
backed by one or more flat files (e.g. text or CSV), also known as
"location" files. When a SQL SELECT
statement is executed
against an External Table, the Oracle Database server reads the location
files, parses them according to the ACCESS PARAMETERS
specified in the External Table CREATE TABLE
definition, and
materializes them as rows in the External Table.
Any External Table definition can specify a preprocessor program which is invoked by the Oracle Database server during the read process. The preprocessor is passed the name of the location file as a command line argument. Its contract is to read the data relevant to the location file and write the records to stdout where the Oracle Database reads, parses, and materializes the records as rows in the table.
To use the Oracle Database External Table feature to read Oracle NoSQL Database data, you must use the
<KVHOME>/exttab/bin/nosql_streampreprocessor to populate your Oracle tables with the data. The remainder of this document describes how to configure an Oracle database with External Tables so that you can read data from an Oracle NoSQL Database.
Using External Tables to perform write operations against Oracle NoSQL Database is not supported.
Once the External Table is defined, you run the Publish
utility to "publish" information about the
External Table and how to access the data in Oracle NoSQL Database. This
information is written as an XML document into each of the location
file(s).
Note that you only have to run the publish utility once. After that, you only need to run the publish utility again if you want to change the way the NoSQL Database is accessed (for example, using a different key prefix, or if you change the port or the rep nodes where you access the database).
When the publish step is completed, a SQL SELECT
may be
executed against the External Table. This will cause
the <KVHOME>/exttab/bin/nosql_stream
script to be invoked,
which in turn causes the Preproc
class to be
invoked. The preprocessor does the following:
formatter
or a default format. If
loading rows from a table, the data is formatted using either a
user defined tableFormatter
,
or a default table format.
An External Table "cookbook" can be found in
the <KVHOME>/examples/externaltables
directory.
See the External Tables Cookbook
for more information.
<KVHOME>/exttab/bin/
directory where the
nosql_stream
script resides. For example, the
following commands might be used to specify these two directories
to Oracle Database:
SQL> CONNECT / AS sysdba; SQL> CREATE DIRECTORY ext_tab AS '/data'; SQL> CREATE DIRECTORY nosql_bin_dir AS '/export/home/kvhome/exttab/bin';
NOTE: Consult with your Oracle DBA regarding the appropriate
locations and permissions for these directories. It may be
desirable from a security point of view to copy the
nosql_stream
script from the
<KVHOME>/exttab/bin/
directory to some other
location.
SQL> CONNECT / AS sysdba; SQL> CREATE USER nosqluser IDENTIFIED BY password; SQL> GRANT CREATE SESSION TO nosqluser; SQL> GRANT EXECUTE ON SYS.UTL_FILE TO nosqluser; SQL> GRANT READ, WRITE ON DIRECTORY ext_tab TO nosqluser; SQL> GRANT READ, EXECUTE ON DIRECTORY nosql_bin_dir TO nosqluser; SQL> GRANT CREATE TABLE TO nosqluser;
nosql_stream
preprocessor. For example:
SQL> CONNECT nosqluser/password SQL> CREATE TABLE [schema.]tableName (column datatype, ...)) 2 ORGANIZATION EXTERNAL 3 (TYPE ORACLE_LOADER 4 DEFAULT DIRECTORY ext_tab 5 ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE 6 PREPROCESSOR nosql_bin_dir:'nosql_stream' 7 FIELDS TERMINATED BY 'delimiter' missing field values are null) 8 LOCATION ('filename','filename', ...)) 9 REJECT LIMIT UNLIMITED 10 PARALLEL nn;
where nn is the number of files you specified for LOCATION on line 8.
publish
utility,
you must create a configuration file containing parameters regarding:
<config version="1"> <component name="publish" type="params"> <property name="..." value="..." type="..."/> <property name="..." value="..." type="..."/> ... </component> <component name="nosql_stream" type="params"> <property name="..." value="..." type="..."/> <property name="..." value="..." type="..."/> ... </component> <component name="nosql_table" type="params"> <property name="..." value="..." type="..."/> <property name="..." value="..." type="..."/> ... </component> </config>Only one, not both, of the
nosql_stream
and
the nosql_table
sections can be configured. The section nosql_stream
is used to configure Oracle NoSQL Database to read (Key,Value) pairs from store
into Oracle External Table. The nosql_table
section is used to
configure Oracle NoSQL Database tables so that rows can be read from the
table, then loaded into the Oracle External Table. Specifying both sections
results in a failure to publish location files to Oracle Database.
Notice that there are two major sections in the file: 1) the
publish
section and 2) either the nosql_stream
section or
nosql_table
section.
In the publish
section, the following parameters are required:
oracle.kv.exttab.connection.url
: The connection URL
used by the Publish utility to connect to the Oracle instance
containing the External Table. This property overrides all other
connection properties. The connecting database user must have the
privileges described above.
If you are using a Wallet as an external password store, then the property value must have this form:
jdbc:oracle:thin:@db_connect_stringThe
db_connect_string
must exactly-match the credential in the
wallet. This example uses Oracle Net Services syntax:
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))This example uses a TNSNAMES entry:
jdbc:oracle:thin:@my_tns_entryAlso see
oracle.kv.exttab.connection.wallet_location
, which is
described later in this page.
If you are not using an Oracle wallet, then one of the following URL connection styles is used:
jdbc:oracle:thin:@//myhost:1521/my_db_service_name
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST= (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
jdbc:oracle:thin:@myTNSEntryName
oracle.kv.exttab.connection.user
: required when not
using a wallet for connections. In this case, the user will be prompted
for a password.
oracle.kv.exttab.tableName
: The schema-qualified table
name of the external table in the format:
schemaName.tableName
.
publish
section, the following parameters are optional:
oracle.kv.exttab.connection.user
: An Oracle user with
appropriate privileges. Optional only when you are using a wallet
for a connection.
oracle.kv.exttab.connection.tns_admin
: File path to a
directory containing SQL*Net configuration files, such as sqlnet.ora
and tnsnames.ora. The value of the TNS_ADMIN environment variable is
used for this property by default. Define this property to use TNS
entry names in database connect strings.
oracle.kv.exttab.connection.tnsEntryName
: A TNS entry name
defined in the tnsnames.ora file. This property is used with
oracle.kv.exttab.connection.tns_admin
.
oracle.kv.exttab.connection.wallet_location
: Pathname of
an Oracle wallet where the connection information is stored. When using
Oracle wallet as an external password store, set the following
additional properties.
oracle.kv.exttab.connection.wallet_location
oracle.kv.exttab.connection.url
oracle.kv.exttab.connection.wallet_location
oracle.kv.exttab.connection.tns_admin
oracle.kv.exttab.connection.tnsEntryName
nosql_stream
section, the following parameters are
required:
oracle.kv.kvstore
: The name of the NoSQL Database.
oracle.kv.hosts
: One or more
hostname:port
pairs, separated by commas. The utility
uses these hostname:port
pairs to contact Rep Nodes
and obtain information about the organization of the NoSQL
Database. Note that there are no restrictions on which online Rep
Nodes in the store can be identified here; all online Rep Nodes
are equally valid candidates.
nosql_stream
section, the following parameters are
optional:
oracle.kv.direction
: The direction to
read for each of the partitions. This value must be
"forward". "reverse" and "unordered" may be supported in the
future. Presently this parameter is not meaningful, but is included
for completeness.
oracle.kv.batchSize
: The number of records to fetch during each round trip.
oracle.kv.parentKey
: The parent key
whose child KV pairs are to be fetched, specified in the canonical
key format described in Key.toString()
.
oracle.kv.subRange
: Used to further
restrict the range under the parentKey
. This parameter
is specified in the canonical KeyRange.toString()
format.
oracle.kv.depth
: One of
CHILDREN_ONLY
, PARENT_AND_CHILDREN
,
DESCENDANTS_ONLY
, or
PARENT_AND_DESCENDANTS
.
oracle.kv.consistency
: One of
NONE_REQUIRED
,
NONE_REQUIRED_NO_MASTER
, ABSOLUTE
, or
TIME(permissibleLag, timeout)
, where
permissibleLag
and timeout
are a value
and a TimeUnit
pair (e.g. "10 secs").
oracle.kv.timeout
: The timeout to use for
querying the NoSQL Database. This is in timeout format (e.g. "10
seconds").
oracle.kv.exttbl.stream.keyDelimiter
: If present, the
value is a single character. It indicates that:
oracle.kv.exttbl.stream.variableSizeBytes
: If this
parameter is 0 or not specified, or
oracle.kv.formatterClass
is not
specified, then records are output in Oracle Data Loader "fixed"
format. If this parameter is non-0, it specifies the number of
prefix bytes per the Oracle Data Loader format described in
External Tables Access Parameters. If both this and
oracle.kv.formatterClass
are specified,
then an exception is thrown when the Publish utility is run.
oracle.kv.formatterClass
: the name of a
class which can format key/value pairs to an appropriate format for
the Oracle Loader. This parameter may not be specified if
oracle.kv.exttbl.stream.variableSizeBytes
is also
specified.
nosql_table
section, the following parameters are
required:
oracle.kv.kvstore
: The name of the NoSQL Database.
oracle.kv.hosts
: One or more
hostname:port
pairs, separated by commas. Each
hostname:port
pair represents a single Replication
Node in the Oracle NoSQL Database store. The utility
uses these hostname:port
pairs to contact the Rep Nodes
and obtain information about the organization of the NoSQL
Database. Note that there are no restrictions on which online Rep
Nodes in the store can be identified here; all online Rep Nodes
are equally valid candidates.
oracle.kv.table
: the Oracle NoSQL Database table from which
rows will be read and formatted before they can be consumed by Oracle
External Table.
nosql_table
section, the following parameters are
optional:
oracle.kv.consistency
: One of
NONE_REQUIRED
,
NONE_REQUIRED_NO_MASTER
, ABSOLUTE
, or
TIME(permissibleLag, timeout)
, where
permissibleLag
and timeout
are a value
and a TimeUnit
pair (e.g. "10 secs").
oracle.kv.timeout
: The timeout to use for
querying the NoSQL Database. This is in timeout format (e.g. "10
seconds").
oracle.kv.batchSize
: The number of rows to fetch during each round trip.
oracle.kv.formatterClass
: the name of a
class which can format each row to an appropriate format for
the Oracle Loader.
Publish Utility
:
ALL_EXTERNAL_LOCATIONS
view.
The Publish Utility accepts the following command line parameters:
-noexecute
: Validate the parameters and test the
connections to Oracle Database and Oracle NoSQL DB, but do not publish (write)
to the External Table files. Either -noexecute
or
-publish
must be specified.
-publish
: The opposite
of -noexecute
. Publish (write) the relevant information
to the External Table files. Either -noexecute
or -publish
must be specified.
-verbose
: Be verbose during execution. Useful for debugging.
-config <file>
: Specifies an XML config file in
the format shown above.
Since the Publish Utility connects to the Oracle Database using JDBC,
when you invoke it, you must include
$ORACLE_HOME/jdbc/lib/ojdbc6.jar
in the classpath. If you are using
the Oracle wallet as an external password store, then you must also include
$ORACLE_HOME/jlib/oraclepki.jar
.
nosql_stream
)SELECT
statement is executed on the Oracle Database
server. The Oracle Database server invokes the preprocessor once for
each External Table location file. Therefore, the number of location
files determines the level of parallelism when retrieving records. The
preprocessor(s) are invoked on the same machine that the Oracle
Database instance is running.
In general, the user does not invoke this utility directly (except
perhaps for debugging) as that will be done by Oracle Database when it
invokes the nosql_stream
script which is specified as the
PREPROCESSOR
program in the External Table declaration.
The nosql_stream
Preprocessor
performs the following operations:
oracle.kv.exttbl.stream.keyDelimiter
is specified, the
key is written to stdout using the canonical Key format, followed by
the keyDelimiter, followed by the value, followed by newline.
new String()
.
oracle.kv.exttbl.stream.keyDelimiter
is ignored.
nosql_stream
is a shell script which requires that the
PATH
, KVHOME
and CLASSPATH
be set correctly. You should edit this script to include
appropriate settings. Be sure that it has execute permission when
you are finished editing.
ACCESS PARAMETERS
to sufficiently map
NoSQL Database records to Oracle Database records. Formatters
(or TableFormatters
)
allow you to provide a class
which accepts a record
(or
row
) and returns a
String which can be mapped to the External Table fields. The Formatter.toOracleLoaderFormat(oracle.kv.KeyValueVersion,
oracle.kv.KVStore)
(or TableFormatter.toOracleLoaderFormat(oracle.kv.table.Row,
oracle.kv.KVStore)
) class is invoked once per record by the
preprocessor. If toOracleLoaderFormat()
returns null,
then that record is ignored and not imported to the External
Table. This provides a user-programmable filtering mechanism.
SELECT
statement against an External Table writes a
log to the file specified by
the
LOGFILE clause of the External Table definition (the default
logfile is <table_name>_<pid>.log in the External Table
location file directory). The BADFILE
clause names the
file to which records are written when they cannot be loaded because
of errors. For example, a record is written to the bad file when a
field in the datafile could not be converted to the datatype of a
column in the External Table.
If the preprocessor writes anything to stderr, the SELECT
fails and the output of the exception is written to the .log file.
For example, if your formatter throws a
RuntimeException("Couldn't parse field")
, you would see something
like this:
SQL> SELECT * FROM nosql_data; SELECT * FROM nosql_data * ERROR at line 1: ORA-29913: error in executing ODCIEXTTABLEFETCH callout ORA-29400: data cartridge error KUP-04095: preprocessor command /export/home/kvhome/exttab/bin/nosql_stream encountered error "Exception in thread "main" java.lang.RuntimeException: Couldn't parse field at Preproc.main(Preproc.java:12) " SQL> quit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64 bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options bash-3.00$ ls -l ../data total 4 -rw-r--r-- 1 oracle oinstall 597 May 10 09:57 EMP_LOAD_13150.log -rw-rw-rw- 1 me other 27 May 2 14:49 mylocation.dat bash-3.00$ more ../data/EMP_LOAD_13150.log LOG file opened at 05/10/12 09:57:21 Field Definitions for table NOSQL_DATA Record format DELIMITED BY NEWLINE Data in file has same endianness as the platform Rows with all null fields are accepted Fields in Data Source: F1 CHAR (2) Trim whitespace same as SQL Loader F2 CHAR (10) Trim whitespace same as SQL Loader KUP-04095: preprocessor command /export/home/kvhome/exttab/bin/nosql_stream encountered error "Exception in thread "main" java.lang.RuntimeException: Couldn't parse field at Preproc.main(Preproc.java:12) "
ALTER SESSION ENABLE PARALLEL DDL; ALTER SESSION ENABLE PARALLEL DML; ALTER SESSION ENABLE PARALLEL QUERY;as well as adding a
PARALLEL nn
directive in the External
Table declaration, where nn
is the number of LOCATION files
you specified.
CREATE
TABLE
command for each one. You should change the Oracle
Database directory objects (ext_tab and nosql_bin_dir as
necessary). If you have a different classpath for your Formatter, then
be sure to add it to the nosql_stream
script.Copyright (c) 2011, 2017 Oracle and/or its affiliates. All rights reserved.