This adapter provides functions to read and write values stored in Oracle NoSQL Database.
This adapter is described in the following topics:
Built-in Functions for Reading from and Writing to Oracle NoSQL Database
Custom Functions for Reading Values from Oracle NoSQL Database
Custom Functions for Retrieving Single Values from Oracle NoSQL Database
Before you write queries that use the Oracle NoSQL Database adapter, you must configure Oracle XQuery for Hadoop to use your Oracle NoSQL Database server.
You must set the following:
The KVHOME
environment variable to the local directory containing the Oracle NoSQL database lib directory.
The oracle.kv.hosts
and oracle.kv.kvstore
configuration properties.
You can set the configuration properties using either the -D
or -conf
options in the hadoop
command when you run the query. See "Running a Query."
This example sets KVHOME
and uses the hadoop
-D
option in a query to set oracle.kv.kvstore
:
$ export KVHOME=/local/path/to/kvstore/ $ hadoop jar $OXH_HOME/lib/oxh.jar -D oracle.kv.hosts=example.com:5000 -D oracle.kv.kvstore=kvstore ./myquery.xq -output ./myoutput
See "Oracle NoSQL Database Adapter Configuration Properties."
To use the built-in functions in your query, you must import the Oracle NoSQL Database module as follows
import module "oxh:kv";
The Oracle NoSQL Database module contains the following functions:
Accesses a collection of values in the database. Each value is decoded as UTF-8 and returned as a string.
declare %kv:collection("text") function kv:collection-text($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as xs:string* external; declare %kv:collection("text") function kv:collection-text($parent-key as xs:string?, $depth as xs:int?) as xs:string* external; declare %kv:collection("text") function kv:collection-text($parent-key as xs:string?) as xs:string* external;
See "Parameters." Omitting $subrange
is the same as specifying $subrange()
. Likewise, omitting $depth
is the same as specifying $depth()
.
One string for each value
Accesses a collection of values in the database. Each value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."
declare %kv:collection("avroxml") function kv:collection-avroxml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as element()* external; declare %kv:collection("avroxml") function kv:collection-avroxml($parent-key as xs:string?, $depth as xs:int?) as element()* external; declare %kv:collection("avroxml") function kv:collection-avroxml($parent-key as xs:string?) as element()* external;
See "Parameters." Omitting $subrange
is the same as specifying $subrange()
. Likewise, omitting $depth
is the same as specifying $depth()
.
One XML element for each Avro record
Accesses a collection of values in the database. Each value is read as a sequence of bytes and parsed as XML.
declare %kv:collection("xml") function kv:collection-xml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external; declare %kv:collection("xml") function kv:collection-xml($parent-key as xs:string?, $depth as xs:int?) as document-node()* external; declare %kv:collection("xml") function kv:collection-xml($parent-key as xs:string?) as document-node()* external;
See "Parameters." Omitting $subrange
is the same as specifying $subrange()
. Likewise, omitting $depth
is the same as specifying $depth()
.
One XML document for each value.
Accesses a collection of values in the database. Each value is read as XDK binary XML and returned as an XML document.
declare %kv:collection("binxml") function kv:collection-binxml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external;
declare %kv:collection("binxml") function kv:collection-binxml($parent-key as xs:string?, $depth as xs:int?) as document-node()* external;
declare %kv:collection("binxml") function kv:collection-binxml($parent-key as xs:string?) as document-node()* external;
See "Parameters." Omitting $subrange
is the same as specifying $subrange()
. Likewise, omitting $depth
is the same as specifying $depth()
.
One XML document for each value.
Writes a key-value pair. The $value
is encoded as UTF-8.
declare %kv:put("text") function kv:put-text($key as xs:string, $value as xs:string) external;
Writes a key/value pair. The $xml
is serialized and encoded as UTF-8.
declare %kv:put("xml") function kv:put-xml($key as xs:string, $xml as node()) external;
Puts a key/value pair. The $xml
is encoded as XDK binary XML. See Oracle XML Developer's Kit Programmer's Guide.
declare %kv:putkv:put-binxml("binxml") function ($key as xs:string, $xml as node()) external;
Obtains the value associated with the key. The value is decoded as UTF-8 and returned as a string.
declare %kv:get("text") function kv:get-text($key as xs:string) as xs:string? external;
Obtains the value associated with the key. The value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records.".
declare %kv:get("avroxml") function kv:get-avroxml($key as xs:string) as element()? external;
Obtains the value associated with the key. The value is read as a sequence of bytes and parsed as XML.
declare %kv:get("xml") function kv:get-xml($key as xs:string) as document-node()? external;
Obtains the value associated with the key. The value is read as XDK binary XML and returned as an XML document.
declare %kv:get("binxml") function kv:get-binxml($key as xs:string) as document-node()? external;
Defines a prefix range. The prefix defines both the lower and upper inclusive boundaries.
Use this function as the subrange argument of a kv:collection
function.
kv:key-range($prefix as xs:string) as xs:string;
Specifies a key range.
Use this function as the subrange argument of a kv:collection
function.
kv:key-range($start as xs:string, $start-inclusive as xs:boolean, $end as xs:string, $end-inclusive as xs:boolean) as xs:string;
$start
: Defines the lower boundary of the key range.
$start-inclusive
: A value of true
includes $start
in the range, or false
omits it.
$end
: Defines the upper boundary of the key range. It must be greater than $start
.
$end-inclusive
: A value of true
includes $end
in the range, or false omits it.
You can use the following functions to read values from Oracle NoSQL Database. These annotations provide additional functionality that is not available using the built-in functions.
Custom functions for reading collections of NoSQL values must have one of the following signatures:
declare %kv:collection("text") [additional annotations] function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as xs:string* external; declare %kv:collection(["xml"|"binxml"]) [additional annotations] function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external; declare %kv:collection("avroxml") [additional annotations] function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as element()* external;
Declares the NoSQL Database collection function. Required.
The method parameter is one of the following values:
text
: Each value is decoded using the character set specified by the %output:encoding
annotation.
avroxml
: Each value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."
binxml
: Each value is read as XDK binary XML and returned as an XML document.
xml
: Each value is parsed as XML, and returned as an XML document.
Controls whether the key of a key-value pair is set as the document-uri
of the returned value. Specify true
to return the key.
The default setting is true when method is xml
, avroxml
, or binxml
, and false
when it is text. Text functions with this annotation set to true
must be declared to return text()?
instead of xs:string?
. Atomic xs:string
values are not associated with a document node, but text nodes are. For example:
declare %kv:collection("text") %kv:key("true") function local:col($parent-key as xs:string?) as text()* external;
When the key is returned, you can obtain its string representation by using the kv:key()
function. For example:
for $value in local:col(...) let $key := $value/kv:key() return ...
Specifies the Avro reader schema. This annotation is valid only when method is avroxml
. Optional.
The schema-name is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog. The record value is mapped to the reader schema. For example, %avro:schema-kv("org.example.PersonRecord")
.
See Also:
For information about Avro schemas, the Oracle NoSQL Database Getting Started Guide athttp://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/schemaevolution.html
Specifies the character encoding of text values. UTF-8 is assumed when this annotation is not used. The valid encodings are those supported by the JVM.
This annotation currently only applies to the text method. For XML files, the document's encoding declaration is used if it is available.
See Also:
"Supported Encodings" in the Oracle Java SE documentation athttp://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
Specifies the parent key whose child KV pairs are returned by the function. The major key path must be a partial path and the minor key path must be empty. An empty sequence results in fetching all keys in the store.
See Also:
For the format of the key, Oracle NoSQL Database Java Reference athttp://docs.oracle.com/cd/NOSQL/html/javadoc/oracle/kv/Key.html#toString
Specifies whether parents, children, descendants, or a combination are returned. The following values are valid:
kv:depth-parent-and-descendants()
: Selects the parents and all descendants.
kv:depth-children-only()
: Selects only the immediately children, but not the parent.
kv:depth-descendants-only()
: Selects all descendants, but not the parent.
kv:depth-parent-and-children()
: Selects the parent and the immediate children.
An empty sequence implies kv:depth-parent-and-descendants()
.
This example selects all the descendants, but not the parent:
kv:collection-text("/parent/key", kv:depth-descendants-only(), ...
Specifies a subrange to further restrict the range under parentKey
to the major path components. The format of the string is:
<startType>/<start>/<end>/<endType>
The startType
and endType
are either I
for inclusive or E
for exclusive.
The start
and end
are the starting and ending key strings.
If the range does not have a lower boundary, then omit the leading startType/start
specification from the string representation. Similarly, if the range does not have an upper boundary, then omit the trailing end/endType
specification. A KeyRange
requires at least one boundary, thus at least one specification must appear in the string representation.
The kv:key-range
function provides a convenient way to create a range string.
The value can also be the empty sequence.
The following examples are valid subrange specifications:
Example | Description |
---|---|
I/alpha/beta/E |
From alpha inclusive to beta exclusive |
E//0123/I |
From "" exclusive to 0123 inclusive |
I/chi/ |
From chi inclusive to infinity |
E// |
From "" exclusive to infinity |
/chi/E |
From negative infinity to chi exclusive |
//I |
From negative infinity to "" inclusive |
The Oracle NoSQL Database adapter has get functions, which enable you to retrieve a single value from the database. Unlike collection functions, calls to get functions are not distributed across the cluster. When a get function is called, the value is retrieved by a single task.
Custom get
functions must have one of the following signatures:
declare %kv:get("text") [additional annotations] function local:myFunctionName($key as xs:string) as xs:string? external; declare %kv:get("avroxml") [additional annotations] function local:myFunctionName($key as xs:string) as element()? external; declare %kv:get(["xml"|"binxml"]) [additional annotations] function local:myFunctionName($key as xs:string) as document-node()?
Declares the NoSQL Database get function. Required.
The method parameter is one of the following values:
text
: The value is decoded using the character set specified by the %output:encoding
annotation.
avroxml
: The value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."
binxml
: The value is read as XDK binary XML and returned as an XML document.
xml
: The value is parsed as XML and returned as an XML document.
Controls whether the key of a key-value pair is set as the document-uri
of the returned value. Specify true
to return the key.
The default setting is true
when method is xml
, avroxml
, or binxml
, and false
when it is text. Text functions with this annotation set to true
must be declared to return text()?
instead of xs:string?
. Atomic xs:string
values are not associated with a document node, but text nodes are.
When the key is returned, you can obtain its string representation by using the kv:key()
function.
Specifies the Avro reader schema. This annotation is valid only when method is avroxml
. Optional.
The schema-name is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog. The record value is mapped to the reader schema. For example, %avro:schema-kv("org.example.PersonRecord")
.
See Also:
For information about Avro schemas, the Oracle NoSQL Database Getting Started Guide athttp://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/schemaevolution.html
Specifies the character encoding of text values. UTF-8 is assumed when this annotation is not used. The valid encodings are those supported by the JVM.
This annotation currently only applies to the text method. For XML files, the document encoding declaration is used, if it is available.
See Also:
"Supported Encodings" in the Oracle Java SE documentation athttp://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
You can use the following annotations to define functions that write to Oracle NoSQL Database.
Custom functions for writing to Oracle NoSQL Database must have one of the following signatures:
declare %kv:put("text") function local:myFunctionName($key as xs:string, $value as xs:string) external; declare %kv:put(["xml"|"binxml"|"avroxml"]) function local:myFunctionName($key as xs:string, $xml as node()) external;
Declares the NoSQL Database module put function. Required.
The method determines how the value is stored. It must be one of the following values:
text
: $value
is serialized and encoded using the character set specified by the %output:encoding
annotation.
avroxml
: $xml
is mapped to an instance of the Avro record specified by the %avro:schema-kv
annotation. See "Writing XML as Avro."
binxml
: $xml is encoded as XDK binary XML.
xml
: $xml
is serialized and encoded using the character set specified by the %output:encoding
annotation. You can specify other XML serialization parameters using %output:*
.
Specifies the record schema of the values to be written. The annotation value is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog.
For example: %avro:schema-kv("org.example.PersonRecord")
A standard XQuery serialization parameter for the output method (text or XML) specified in %kv:put
. See "Serialization Annotations."
See Also:
"The Influence of Serialization Parameters" sections for XML and text output methods in XSLT and XQuery Serialization 3.0 atThis example uses the following text file is in HDFS. The file contains user profile information such as user ID, full name, and age, separated by colons (:).
mydata/users.txt john:John Doe:45 kelly:Kelly Johnson:32 laura:Laura Smith: phil:Phil Johnson:27
The first query stores the lines of this text file in Oracle NoSQL Database as text values.
import module "oxh:text"; import module "oxh:kv"; for $line in text:collection("mydata/users.txt") let $split := fn:tokenize($line, ":") let $key := "/users/text/" || $split[1] return kv:put-text($key, $line)
The next query reads the values from the database:
import module "oxh:text"; import module "oxh:kv"; for $value in kv:collection-text("/users/text") let $split := fn:tokenize($value, ":") where $split[2] eq "Phil Johnson" return text:put($value)
The query creates a text file that contains the following line:
phil:Phil Johnson:27
In this example, the following Avro schema is registered with Oracle NoSQL Database:
{ "type": "record", "name": "User", "namespace": "com.example", "fields" : [ {"name": "id", "type": "string"}, {"name": "full_name", "type": "string"}, {"name": "age", "type": ["int", "null"] } ] }
The next query writes the user names to the database as Avro records.
import module "oxh:text"; declare %kv:put("avroxml") %avro:schema-kv("com.example.User") function local:put-user($key as xs:string, $value as node()) external; for $line in text:collection("mydata/users.txt") let $split := fn:tokenize($line, ":") let $id := $split[1] let $key := "/users/avro/" || $id return local:put-user( $key, <user> <id>{$id}</id> <full_name>{$split[2]}</full_name> { if ($split[3] castable as xs:int) then <age>{$split[3]}</age> else () } </user> )
This query reads the values from the database:
import module "oxh:text"; import module "oxh:kv"; for $user in kv:collection-avroxml("/users/avro") where $user/age gt 30 return text:put($user/full_name)
The query creates a text files with the following lines:
John Doe Kelly Johnson
Oracle XQuery for Hadoop uses the generic options for specifying configuration properties in the Hadoop command. You can use the -conf
option to identify configuration files, and the -D
option to specify individual properties. See "Running a Query."
You can set various configuration properties for the Oracle NoSQL Database adapter that control the durability characteristics and timeout periods. You must set oracle.kv.hosts
and oracle.kv.kvstore
.The following properties configure the Oracle NoSQL Database adapter.
Type: String
Default Value: NO_SYNC, NO_SYNC, SIMPLE_MAJORITY
Description: Defines the durability characteristics associated with %kv:put
operations. The value consists of three parts, which you specify in order and separate with commas (,):
MasterPolicy, ReplicaPolicy, ReplicaAck
MasterPolicy: The synchronization policy used when committing a transaction to the master database. Set this part to one of the following constants:
NO_SYNC
: Do not write or synchronously flush the log on a transaction commit.
SYNC
: Write and synchronously flush the log on a transaction commit.
WRITE_NO_SYNC
: Write but do not synchronously flush the log on a transaction commit.
ReplicaPolicy: The synchronization policy used when committing a transaction to the replica databases. Set this part to NO_SYNC
, SYNC
, or WRITE_NO_SYNC
, as described under MasterPolicy.
ReplicaAck: The acknowledgment policy used to obtain transaction acknowledgments from the replica databases. Set this part to one of the following constants:
ALL
: All replicas must acknowledge that they have committed the transaction.
NONE
: No transaction commit acknowledgments are required, and the master does not wait for them.
SIMPLE_MAJORITY
: A simple majority of replicas (such as 3 of 5) must acknowledge that they have committed the transaction.
See Also:
"Durability Guarantees" in Getting Started with Oracle NoSQL Database athttp://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/durability.html
Type: Comma-separated list of integers
Default Value: 100, 90, 80
Description: Limits the number of simultaneous requests to prevent nodes with long service times from consuming all threads in the KV store client. The value consists of three integers, which you specify in order and separate with commas:
maxActiveRequests, requestThresholdPercent, nodeLimitPercent
maxActiveRequests: The maximum number of active requests permitted by the KV client. This number is typically derived from the maximum number of threads that the client has set aside for processing requests.
requestThresholdPercent: The percentage of maxActiveRequests at which requests are limited.
nodeLimitPercent: The maximum number of active requests that can be associated with a node when the number of active requests exceeds the threshold specified by requestThresholdPercent.
Type: Long
Default Value: 5000 ms
Description: Configures the request timeout period in milliseconds. The value must be greater than zero (0).
Type: Long
Default Value: 5000 ms
Description: Configures the open timeout used when establishing sockets for client requests, in milliseconds. Shorter timeouts result in more rapid failure detection and recovery. The default open timeout is adequate for most applications. The value must be greater than zero (0).
Type: Long
Default Value: 30000 ms
Description: Configures the read timeout period associated with the sockets that make client requests, in milliseconds. Shorter timeouts result in more rapid failure detection and recovery. Nonetheless, the timeout period should be sufficient to allow the longest timeout associated with a request.
Type: Key
Default Value: Not defined
Description: The desired number of keys for the InputFormat to fetch during each network round trip. A value of zero (0) sets the property to a default value.
Type: Consistency
Default Value: NONE_REQUIRED
Description: The consistency guarantee for reading child key-value pairs. The following keywords are valid values:
ABSOLUTE
: Requires the master to service the transaction so that consistency is absolute.
NONE_REQUIRED
: Allows replicas to service the transaction, regardless of the state of the replicas relative to the master.
Type: String
Default Value: Not defined
Description: An array of one or more hostname:port pairs that identify the hosts in the KV store with the source data. Separate multiple pairs with commas.
Type: String
Default Value: Not defined
Description: The name of the KV store with the source data.
Type: Long
Default Value: Not defined
Description: Sets a maximum time interval in milliseconds for retrieving a selection of key-value pairs. A value of zero (0) sets the property to its default value.
See Also:
Oracle NoSQL Database Java API Reference athttp://docs.oracle.com/cd/NOSQL/html/javadoc/oracle/kv/hadoop/KVInputFormatBase.html