This adapter provides functions to read and write values stored in Oracle NoSQL Database.
This adapter is described in the following topics:
Built-in Functions for Reading from and Writing to Oracle NoSQL Database
Custom Functions for Reading Values from Oracle NoSQL Database
Custom Functions for Retrieving Single Values from Oracle NoSQL Database
Before you write queries that use the Oracle NoSQL Database adapter, you must configure Oracle XQuery for Hadoop to use your Oracle NoSQL Database server.
You must set the following:
The KVHOME environment variable to the local directory containing the Oracle NoSQL Database lib directory.
The oracle.kv.hosts and oracle.kv.kvstore configuration properties.
You can set the configuration properties using either the -D or -conf arguments in the hadoop command when you run the query. See "Running a Query."
This example sets KVHOME and uses the Hadoop -D argument to set oracle.kv.hosts and oracle.kv.kvstore when running a query:
$ export KVHOME=/local/path/to/kvstore/
$ hadoop jar $OXH_HOME/lib/oxh.jar -D oracle.kv.hosts=example.com:5000 -D oracle.kv.kvstore=kvstore ./myquery.xq -output ./myoutput
See "Oracle NoSQL Database Adapter Configuration Properties."
To use the built-in functions in your query, you must import the Oracle NoSQL Database module as follows:
import module "oxh:kv";
The Oracle NoSQL Database module contains the following functions:
Accesses a collection of values in the database. Each value is decoded as UTF-8 and returned as a string.
declare %kv:collection("text") function kv:collection-text($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as xs:string* external;
See "Parameters."
Accesses a collection of values in the database. Each value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."
declare %kv:collection("avroxml") function kv:collection-avroxml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as element()* external;
See "Parameters."
This function is equivalent to kv:collection-avroxml($parent-key, $depth, ()).
Accesses a collection of values in the database. Each value is read as a sequence of bytes and parsed as XML.
declare %kv:collection("xml") function kv:collection-xml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external;
See "Parameters."
Accesses a collection of values in the database. Each value is read as XDK binary XML and returned as an XML document.
declare %kv:collection("binxml") function kv:collection-binxml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external;
See "Parameters."
Puts a key/value pair. The $xml value is encoded as XDK binary XML. See Oracle XML Developer’s Kit Programmer’s Guide.
Obtains the value associated with the key. The value is decoded as UTF-8 and returned as a string.
Obtains the value associated with the key. The value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."
Obtains the value associated with the key. The value is read as a sequence of bytes and parsed as XML.
declare %kv:get("xml") function kv:get-xml($key as xs:string) as document-node()? external;
Obtains the value associated with the key. The value is read as XDK binary XML and returned as an XML document.
declare %kv:get("binxml") function kv:get-binxml($key as xs:string) as document-node()? external;
Defines a prefix range. The prefix defines both the lower and upper inclusive boundaries.
Use this function as the subrange argument of a kv:collection function.
Specifies a key range.
Use this function as the subrange argument of a kv:collection function.
kv:key-range($start as xs:string, $start-inclusive as xs:boolean, $end as xs:string, $end-inclusive as xs:boolean) as xs:string;
$start: Defines the lower boundary of the key range.
$start-inclusive: A value of true includes $start in the range; false omits it.
$end: Defines the upper boundary of the key range. It must be greater than $start.
$end-inclusive: A value of true includes $end in the range; false omits it.
This example uses the following text file in HDFS. The file contains user profile information such as user ID, full name, and age, separated by colons (:).
mydata/users.txt

john:John Doe:45
kelly:Kelly Johnson:32
laura:Laura Smith:
phil:Phil Johnson:27
The first query stores the lines of this text file in Oracle NoSQL Database as text values.
import module "oxh:text";
import module "oxh:kv";

for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
let $key := "/users/text/" || $split[1]
return kv:put-text($key, $line)
The next query reads the values from the database:
import module "oxh:text";
import module "oxh:kv";

for $value in kv:collection-text("/users/text")
let $split := fn:tokenize($value, ":")
where $split[2] eq "Phil Johnson"
return text:put($value)
The query creates a text file that contains the following line:
phil:Phil Johnson:27
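As a variation on the query above, the following sketch restricts the read to part of the key space by passing a depth and a key range to kv:collection-text. It assumes the keys written by the first query (/users/text/ followed by the user ID); the boundary strings "j" and "l" are illustrative choices, not values from the product documentation.

```xquery
import module "oxh:text";
import module "oxh:kv";

(: Read only the immediate children of /users/text whose next key
   component lies between "j" (inclusive) and "l" (exclusive).
   The boundaries are illustrative assumptions. :)
for $value in kv:collection-text(
    "/users/text",
    kv:depth-children-only(),
    kv:key-range("j", fn:true(), "l", fn:false()))
return text:put($value)
```

With the four sample users, this range would be expected to cover the keys ending in john and kelly, because laura and phil sort after "l".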
In this example, the following Avro schema is registered with Oracle NoSQL Database:
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "full_name", "type": "string"},
    {"name": "age", "type": ["int", "null"]}
  ]
}
The next query writes the user profiles to the database as Avro records.
import module "oxh:text";

declare %kv:put("avroxml") %avro:schema-kv("com.example.User")
function local:put-user($key as xs:string, $value as node()) external;

for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
let $id := $split[1]
let $key := "/users/avro/" || $id
return local:put-user(
  $key,
  <user>
    <id>{$id}</id>
    <full_name>{$split[2]}</full_name>
    {
      if ($split[3] castable as xs:int) then
        <age>{$split[3]}</age>
      else
        ()
    }
  </user>
)
This query reads the values from the database:
import module "oxh:text";
import module "oxh:kv";

for $user in kv:collection-avroxml("/users/avro")
where $user/age gt 30
return text:put($user/full_name)
The query creates a text file with the following lines:
John Doe
Kelly Johnson
You can use the following annotations to define functions that read collections of values from Oracle NoSQL Database. These annotations provide additional functionality that is not available using the built-in functions.
Custom functions for reading collections of NoSQL values must have one of the following signatures:
declare %kv:collection("text") [additional annotations]
function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as xs:string* external;

declare %kv:collection(["xml"|"binxml"]) [additional annotations]
function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external;

declare %kv:collection("avroxml") [additional annotations]
function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as element()* external;
The %kv:collection annotation declares the NoSQL Database collection function. Required.
The method parameter is one of the following values:
text: Each value is decoded using the character set specified by the %output:encoding annotation.
avroxml: Each value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."
binxml: Each value is read as XDK binary XML and returned as an XML document.
xml: Each value is parsed as XML and returned as an XML document.
The %kv:key annotation controls whether the key of a key-value pair is set as the document-uri of the returned value. Specify true to return the key.
The default setting is true when method is xml, avroxml, or binxml, and false when it is text. Text functions with this annotation set to true must be declared to return text()* instead of xs:string*, because atomic xs:string values are not associated with a document node, but text nodes are. For example:
declare %kv:collection("text") %kv:key("true") function local:col($parent-key as xs:string?) as text()* external;
When the key is returned, you can obtain its string representation by using the kv:key() function. For example:
for $value in local:col(...)
let $key := $value/kv:key()
return ...
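The fragments above can be combined into a complete query. The following is a minimal sketch, assuming the /users/text keys from the earlier text example; the output layout (key, a space, then the value) is an arbitrary choice made for illustration.

```xquery
import module "oxh:text";
import module "oxh:kv";

(: Custom collection function that returns text nodes, so that each
   value carries its key as the document-uri. :)
declare %kv:collection("text") %kv:key("true")
function local:col($parent-key as xs:string?) as text()* external;

(: "/users/text" is assumed from the earlier example. :)
for $value in local:col("/users/text")
let $key := $value/kv:key()
return text:put($key || " " || $value)
```

Declaring the return type as text()* rather than xs:string* is what allows kv:key() to be applied to each returned value.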
The %avro:schema-kv annotation specifies the Avro reader schema. This annotation is valid only when method is avroxml. Optional.
The schema-name is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog. The record value is mapped to the reader schema. For example, %avro:schema-kv("org.example.PersonRecord").
See Also:
For information about Avro schemas, see the Oracle NoSQL Database Getting Started Guide at http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/schemaevolution.html
The %output:encoding annotation specifies the character encoding of text values. UTF-8 is assumed when this annotation is not used. The valid encodings are those supported by the JDK.
This annotation currently applies only to the text method. For XML files, the document's encoding declaration is used if it is available.
See Also:
"Supported Encodings" in the Oracle Java SE documentation at http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
The $parent-key parameter specifies the parent key whose child KV pairs are returned by the function. The major key path must be a partial path and the minor key path must be empty. An empty sequence results in fetching all keys in the store.
See Also:
For the format of the key, see the Oracle NoSQL Database Java Reference at http://docs.oracle.com/cd/NOSQL/html/javadoc/oracle/kv/Key.html#toString
The $depth parameter specifies whether parents, children, descendants, or a combination are returned. The following values are valid:
kv:depth-parent-and-descendants(): Selects the parent and all descendants.
kv:depth-children-only(): Selects only the immediate children, but not the parent.
kv:depth-descendants-only(): Selects all descendants, but not the parent.
kv:depth-parent-and-children(): Selects the parent and the immediate children.
An empty sequence implies kv:depth-parent-and-descendants().
This example selects all the descendants, but not the parent:
kv:collection-text("/parent/key", kv:depth-descendants-only(), ...
The $subrange parameter specifies a subrange to further restrict the range under the parent key to the major path components. The format of the string is:
<startType>/<start>/<end>/<endType>
The startType and endType are either I for inclusive or E for exclusive.
The start and end are the starting and ending key strings.
If the range does not have a lower boundary, then omit the leading startType/start specification from the string representation. Similarly, if the range does not have an upper boundary, then omit the trailing end/endType specification. A KeyRange requires at least one boundary, so at least one specification must be present in the string representation.
The kv:key-range function provides a convenient way to create a range string.
The value can also be the empty sequence.
The following examples are valid subrange specifications:
Example | Description |
---|---|
I/alpha/beta/E | From alpha inclusive to beta exclusive |
E//0123/I | From "" exclusive to 0123 inclusive |
I/chi/ | From chi inclusive to infinity |
E// | From "" exclusive to infinity |
/chi/E | From negative infinity to chi exclusive |
//I | From negative infinity to "" inclusive |
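A subrange string from the table can be passed directly as the third argument of a collection function. The following sketch uses the first table entry; the parent key /parent/key is the placeholder used earlier in this section, not a real key.

```xquery
import module "oxh:text";
import module "oxh:kv";

(: "I/alpha/beta/E" selects from alpha inclusive to beta exclusive.
   "/parent/key" is a placeholder parent key. :)
for $value in kv:collection-text(
    "/parent/key",
    kv:depth-descendants-only(),
    "I/alpha/beta/E")
return text:put($value)
```

Calling kv:key-range("alpha", fn:true(), "beta", fn:false()) in place of the literal string should describe the same range, and avoids assembling the string by hand.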
The Oracle NoSQL Database adapter has get functions, which enable you to retrieve a single value from the database. Unlike collection functions, calls to get functions are not distributed across the cluster. When a get function is called, the value is retrieved by a single task.
Custom get functions must have one of the following signatures:
declare %kv:get("text") [additional annotations]
function local:myFunctionName($key as xs:string) as xs:string? external;

declare %kv:get("avroxml") [additional annotations]
function local:myFunctionName($key as xs:string) as element()? external;

declare %kv:get(["xml"|"binxml"]) [additional annotations]
function local:myFunctionName($key as xs:string) as document-node()? external;
The %kv:get annotation declares the NoSQL Database get function. Required.
The method parameter is one of the following values:
text: The value is decoded using the character set specified by the %output:encoding annotation.
avroxml: The value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."
binxml: The value is read as XDK binary XML and returned as an XML document.
xml: The value is parsed as XML and returned as an XML document.
The %kv:key annotation controls whether the key of a key-value pair is set as the document-uri of the returned value. Specify true to return the key.
The default setting is true when method is xml, avroxml, or binxml, and false when it is text. Text functions with this annotation set to true must be declared to return text()? instead of xs:string?, because atomic xs:string values are not associated with a document node, but text nodes are.
When the key is returned, you can obtain its string representation by using the kv:key() function.
The %avro:schema-kv annotation specifies the Avro reader schema. This annotation is valid only when method is avroxml. Optional.
The schema-name is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog. The record value is mapped to the reader schema. For example, %avro:schema-kv("org.example.PersonRecord").
See Also:
For information about Avro schemas, see the Oracle NoSQL Database Getting Started Guide at http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/schemaevolution.html
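As an illustration of a custom get function that uses a reader schema, the following sketch looks up one Avro record per input line. It assumes the com.example.User schema and the /users/avro keys from the earlier Avro example; the function name local:get-user and the output layout are hypothetical.

```xquery
import module "oxh:text";
import module "oxh:kv";

(: Hypothetical custom get function. The schema name is taken from
   the earlier Avro example. :)
declare %kv:get("avroxml") %avro:schema-kv("com.example.User")
function local:get-user($key as xs:string) as element()? external;

(: Look up the stored record for each user ID found in the text file. :)
for $line in text:collection("mydata/users.txt")
let $id := fn:tokenize($line, ":")[1]
let $user := local:get-user("/users/avro/" || $id)
where fn:exists($user)
return text:put($id || ":" || $user/full_name)
```

The where clause skips IDs that have no stored record, because a get function returns the empty sequence when the key is not found in its declared optional return type.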
The %output:encoding annotation specifies the character encoding of text values. UTF-8 is assumed when this annotation is not used. The valid encodings are those supported by the JDK.
This annotation currently applies only to the text method. For XML files, the document's encoding declaration is used if it is available.
See Also:
"Supported Encodings" in the Oracle Java SE documentation at http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
You can use the following annotations to define functions that write to Oracle NoSQL Database.
Custom functions for writing to Oracle NoSQL Database must have one of the following signatures:
declare %kv:put("text")
function local:myFunctionName($key as xs:string, $value as xs:string) external;

declare %kv:put(["xml"|"binxml"|"avroxml"])
function local:myFunctionName($key as xs:string, $xml as node()) external;
The %kv:put annotation declares the NoSQL Database module put function. Required.
The method determines how the value is stored. It must be one of the following values:
text: $value is serialized and encoded using the character set specified by the %output:encoding annotation.
avroxml: $xml is mapped to an instance of the Avro record specified by the %avro:schema-kv annotation. See "Writing XML as Avro."
binxml: $xml is encoded as XDK binary XML.
xml: $xml is serialized and encoded using the character set specified by the %output:encoding annotation. You can specify other XML serialization parameters using %output:*.
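The following sketch shows one way a custom put function for the xml method might be declared and called. The function name local:put-user-xml, the /users/xml/ key prefix, and the element layout are hypothetical; the input file is the one from the earlier examples.

```xquery
import module "oxh:text";
import module "oxh:kv";

(: Hypothetical custom put function that stores each value as
   serialized XML, encoded as UTF-8. :)
declare %kv:put("xml") %output:encoding("UTF-8")
function local:put-user-xml($key as xs:string, $xml as node()) external;

for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
let $id := $split[1]
return local:put-user-xml(
  "/users/xml/" || $id,
  <user id="{$id}">
    <full_name>{$split[2]}</full_name>
  </user>
)
```

Values written this way could then be read back with a collection or get function that uses the xml method.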
The %avro:schema-kv annotation specifies the record schema of the values to be written. The annotation value is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog.
For example: %avro:schema-kv("org.example.PersonRecord")
The %output:* annotations specify a standard XQuery serialization parameter for the output method (text or XML) specified in %kv:put. See "Serialization Annotations."
See Also:
"The Influence of Serialization Parameters" sections for XML and text output methods in XSLT and XQuery Serialization 3.0
Oracle XQuery for Hadoop uses the generic options for specifying configuration properties in the hadoop command. You can use the -conf option to identify configuration files, and the -D option to specify individual properties. See "Running a Query."
You can set various configuration properties for the Oracle NoSQL Database adapter that control the durability characteristics and timeout periods. You must set oracle.kv.hosts and oracle.kv.kvstore.
The following properties configure the Oracle NoSQL Database adapter.
Type: String
Default Value: NO_SYNC, NO_SYNC, SIMPLE_MAJORITY
Description: Defines the durability characteristics associated with %kv:put operations. The value consists of three parts, which you specify in order and separate with commas:
MasterPolicy, ReplicaPolicy, ReplicaAck
MasterPolicy: The synchronization policy used when committing a transaction to the master database. Set this part to one of the following constants:
NO_SYNC: Do not write or synchronously flush the log on a transaction commit.
SYNC: Write and synchronously flush the log on a transaction commit.
WRITE_NO_SYNC: Write but do not synchronously flush the log on a transaction commit.
ReplicaPolicy: The synchronization policy used when committing a transaction to the replica databases. Set this part to NO_SYNC, SYNC, or WRITE_NO_SYNC, as described under MasterPolicy.
ReplicaAck: The acknowledgment policy used to obtain transaction acknowledgments from the replica databases. Set this part to one of the following constants:
ALL: All replicas must acknowledge that they have committed the transaction.
NONE: No transaction commit acknowledgments are required, and the master does not wait for them.
SIMPLE_MAJORITY: A simple majority of replicas (such as 3 of 5) must acknowledge that they have committed the transaction.
See Also:
"Durability Guarantees" in Getting Started with Oracle NoSQL Database at http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/durability.html
Type: Comma-separated list of integers
Default Value: 100, 90, 80
Description: Limits the number of simultaneous requests to prevent nodes with long service times from consuming all threads in the KV store client. The value consists of three integers, which you specify in order and separate with commas:
maxActiveRequests, requestThresholdPercent, nodeLimitPercent
maxActiveRequests: The maximum number of active requests permitted by the KV client. This number is typically derived from the maximum number of threads that the client has set aside for processing requests.
requestThresholdPercent: The percentage of maxActiveRequests at which requests are limited. With the default values, limiting starts when 90 requests are active.
nodeLimitPercent: The maximum number of active requests that can be associated with a node, expressed as a percentage of maxActiveRequests, when the number of active requests exceeds the threshold specified by requestThresholdPercent.
Type: Long
Default Value: 5000 ms
Description: Configures the request timeout period in milliseconds. The value must be greater than zero (0).
Type: Long
Default Value: 5000 ms
Description: Configures the open timeout used when establishing sockets for client requests, in milliseconds. Shorter timeouts result in more rapid failure detection and recovery. The default open timeout is adequate for most applications. The value must be greater than zero (0).
Type: Long
Default Value: 30000 ms
Description: Configures the read timeout period associated with the sockets that make client requests, in milliseconds. Shorter timeouts result in more rapid failure detection and recovery. Nonetheless, the timeout period should be sufficient to allow the longest timeout associated with a request.
Type: String
Default Value: Not defined
Description: The name of the KV store with the source data.
Type: String
Default Value: Not defined
Description: An array of one or more hostname:port pairs that identify the hosts in the KV store with the source data. Separate multiple pairs with commas.
Type: Integer
Default Value: Not defined
Description: The desired number of keys for the InputFormat to fetch during each network round trip. A value of zero (0) sets the property to a default value.
Type: Consistency
Default Value: NONE_REQUIRED
Description: The consistency guarantee for reading child key-value pairs. The following keywords are valid values:
ABSOLUTE: Requires the master to service the transaction so that consistency is absolute.
NONE_REQUIRED: Allows replicas to service the transaction, regardless of the state of the replicas relative to the master.
Type: Long
Default Value: Not defined
Description: Sets a maximum time interval in milliseconds for retrieving a selection of key-value pairs. A value of zero (0) sets the property to its default value.
See Also:
Oracle NoSQL Database Java API Reference at http://docs.oracle.com/cd/NOSQL/html/javadoc/oracle/kv/hadoop/KVInputFormatBase.html