Oracle NoSQL Database Adapter

This adapter provides functions to read and write values stored in Oracle NoSQL Database.

This adapter is described in the following topics:

Prerequisites for Using the Oracle NoSQL Database Adapter

Before you write queries that use the Oracle NoSQL Database adapter, you must configure Oracle XQuery for Hadoop to use your Oracle NoSQL Database server.

You must set the following:

  • The KVHOME environment variable to the local directory containing the Oracle NoSQL database lib directory.

  • The oracle.kv.hosts and oracle.kv.kvstore configuration properties.

You can set the configuration properties using either the -D or -conf options in the hadoop command when you run the query. See "Running a Query."

This example sets KVHOME and uses the hadoop -D option in a query to set oracle.kv.kvstore:

$ export KVHOME=/local/path/to/kvstore/
$ hadoop jar $OXH_HOME/lib/oxh.jar -D oracle.kv.hosts=example.com:5000 -D oracle.kv.kvstore=kvstore ./myquery.xq -output ./myoutput

See "Oracle NoSQL Database Adapter Configuration Properties."

Built-in Functions for Reading from and Writing to Oracle NoSQL Database

To use the built-in functions in your query, you must import the Oracle NoSQL Database module as follows

import module "oxh:kv";

The Oracle NoSQL Database module contains the following functions:

kv:collection-text

Accesses a collection of values in the database. Each value is decoded as UTF-8 and returned as a string.

Signature

declare %kv:collection("text") function 
   kv:collection-text($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as xs:string* external;

declare %kv:collection("text") function 
   kv:collection-text($parent-key as xs:string?, $depth as xs:int?) as xs:string* external;

declare %kv:collection("text") function 
   kv:collection-text($parent-key as xs:string?) as xs:string* external;

Parameters

See "Parameters." Omitting $subrange is the same as specifying $subrange(). Likewise, omitting $depth is the same as specifying $depth().

Returns

One string for each value

kv:collection-avroxml

Accesses a collection of values in the database. Each value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."

Signature

declare %kv:collection("avroxml") function 
   kv:collection-avroxml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as element()* external;

declare %kv:collection("avroxml") function 
   kv:collection-avroxml($parent-key as xs:string?, $depth as xs:int?) as element()* external;

declare %kv:collection("avroxml") function 
   kv:collection-avroxml($parent-key as xs:string?) as element()* external;

Parameters

See "Parameters." Omitting $subrange is the same as specifying $subrange(). Likewise, omitting $depth is the same as specifying $depth().

Returns

One XML element for each Avro record

kv:collection-xml

Accesses a collection of values in the database. Each value is read as a sequence of bytes and parsed as XML.

Signature

declare %kv:collection("xml") function 
   kv:collection-xml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external;

declare %kv:collection("xml") function 
   kv:collection-xml($parent-key as xs:string?, $depth as xs:int?) as document-node()* external;

declare %kv:collection("xml") function 
   kv:collection-xml($parent-key as xs:string?) as document-node()* external;

Parameters

See "Parameters." Omitting $subrange is the same as specifying $subrange(). Likewise, omitting $depth is the same as specifying $depth().

Returns

One XML document for each value.

kv:collection-binxml

Accesses a collection of values in the database. Each value is read as XDK binary XML and returned as an XML document.

Signature

declare %kv:collection("binxml") function 
   kv:collection-binxml($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external;
declare %kv:collection("binxml") function 
   kv:collection-binxml($parent-key as xs:string?, $depth as xs:int?) as document-node()* external;
declare %kv:collection("binxml") function
   kv:collection-binxml($parent-key as xs:string?) as document-node()* external;

Parameters

See "Parameters." Omitting $subrange is the same as specifying $subrange(). Likewise, omitting $depth is the same as specifying $depth().

Returns

One XML document for each value.

See Also

Oracle XML Developer's Kit Programmer's Guide

kv:put-text

Writes a key-value pair. The $value is encoded as UTF-8.

Signature

declare %kv:put("text") function
   kv:put-text($key as xs:string, $value as xs:string) external;

kv:put-xml

Writes a key/value pair. The $xml is serialized and encoded as UTF-8.

Signature

declare %kv:put("xml") function 
   kv:put-xml($key as xs:string, $xml as node()) external;

kv:put-binxml

Puts a key/value pair. The $xml is encoded as XDK binary XML. See Oracle XML Developer's Kit Programmer's Guide.

Signature

declare %kv:putkv:put-binxml("binxml") function
   ($key as xs:string, $xml as node()) external;

kv:get-text

Obtains the value associated with the key. The value is decoded as UTF-8 and returned as a string.

Signature

declare %kv:get("text") function 
   kv:get-text($key as xs:string) as xs:string? external;

kv:get-avroxml

Obtains the value associated with the key. The value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records.".

Signature

declare %kv:get("avroxml") function 
   kv:get-avroxml($key as xs:string) as element()? external;

kv:get-xml

Obtains the value associated with the key. The value is read as a sequence of bytes and parsed as XML.

Signature

declare %kv:get("xml")  function 
   kv:get-xml($key as xs:string) as document-node()? external;

kv:get-binxml

Obtains the value associated with the key. The value is read as XDK binary XML and returned as an XML document.

Signature

declare %kv:get("binxml") function
   kv:get-binxml($key as xs:string) as document-node()? external;

See Also

Oracle XML Developer's Kit Programmer's Guide

kv:key-range

Defines a prefix range. The prefix defines both the lower and upper inclusive boundaries.

Use this function as the subrange argument of a kv:collection function.

Signature

kv:key-range($prefix as xs:string) as xs:string;

kv:key-range

Specifies a key range.

Use this function as the subrange argument of a kv:collection function.

Signature

kv:key-range($start as xs:string, $start-inclusive as xs:boolean, $end as xs:string, $end-inclusive as xs:boolean) as xs:string;

Parameters

$start: Defines the lower boundary of the key range.

$start-inclusive: A value of true includes $start in the range, or false omits it.

$end: Defines the upper boundary of the key range. It must be greater than $start.

$end-inclusive: A value of true includes $end in the range, or false omits it.

Custom Functions for Reading Values from Oracle NoSQL Database

You can use the following functions to read values from Oracle NoSQL Database. These annotations provide additional functionality that is not available using the built-in functions.

Signature

Custom functions for reading collections of NoSQL values must have one of the following signatures:

declare %kv:collection("text") [additional annotations] 
   function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as xs:string* external;

declare %kv:collection(["xml"|"binxml"]) [additional annotations] 
   function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as document-node()* external;

declare %kv:collection("avroxml") [additional annotations]
   function local:myFunctionName($parent-key as xs:string?, $depth as xs:int?, $subrange as xs:string?) as element()* external;

Annotations

%kv:collection("method")

Declares the NoSQL Database collection function. Required.

The method parameter is one of the following values:

  • text: Each value is decoded using the character set specified by the %output:encoding annotation.

  • avroxml: Each value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."

  • binxml: Each value is read as XDK binary XML and returned as an XML document.

  • xml: Each value is parsed as XML, and returned as an XML document.

%kv:key("true" | "false")

Controls whether the key of a key-value pair is set as the document-uri of the returned value. Specify true to return the key.

The default setting is true when method is xml, avroxml, or binxml, and false when it is text. Text functions with this annotation set to true must be declared to return text()? instead of xs:string?. Atomic xs:string values are not associated with a document node, but text nodes are. For example:

declare %kv:collection("text") %kv:key("true")
   function local:col($parent-key as xs:string?) as text()* external;

When the key is returned, you can obtain its string representation by using the kv:key() function. For example:

for $value in local:col(...)
let $key := $value/kv:key()
return ...
%avro:schema-kv("schema-name")

Specifies the Avro reader schema. This annotation is valid only when method is avroxml. Optional.

The schema-name is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog. The record value is mapped to the reader schema. For example, %avro:schema-kv("org.example.PersonRecord").

See Also:

For information about Avro schemas, the Oracle NoSQL Database Getting Started Guide at

http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/schemaevolution.html

%output:encoding

Specifies the character encoding of text values. UTF-8 is assumed when this annotation is not used. The valid encodings are those supported by the JVM.

This annotation currently only applies to the text method. For XML files, the document's encoding declaration is used if it is available.

See Also:

"Supported Encodings" in the Oracle Java SE documentation at

http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html

Parameters

Parameter 1: $parent-key as xs:string?

Specifies the parent key whose child KV pairs are returned by the function. The major key path must be a partial path and the minor key path must be empty. An empty sequence results in fetching all keys in the store.

See Also:

For the format of the key, Oracle NoSQL Database Java Reference at

http://docs.oracle.com/cd/NOSQL/html/javadoc/oracle/kv/Key.html#toString

Parameter 2: $depth as xs:int?

Specifies whether parents, children, descendants, or a combination are returned. The following values are valid:

  • kv:depth-parent-and-descendants(): Selects the parents and all descendants.

  • kv:depth-children-only(): Selects only the immediately children, but not the parent.

  • kv:depth-descendants-only(): Selects all descendants, but not the parent.

  • kv:depth-parent-and-children(): Selects the parent and the immediate children.

An empty sequence implies kv:depth-parent-and-descendants().

This example selects all the descendants, but not the parent:

kv:collection-text("/parent/key", kv:depth-descendants-only(), ...
Parameter 3: $subRange as xs:string?

Specifies a subrange to further restrict the range under parentKey to the major path components. The format of the string is:

<startType>/<start>/<end>/<endType>

The startType and endType are either I for inclusive or E for exclusive.

The start and end are the starting and ending key strings.

If the range does not have a lower boundary, then omit the leading startType/start specification from the string representation. Similarly, if the range does not have an upper boundary, then omit the trailing end/endType specification. A KeyRange requires at least one boundary, thus at least one specification must appear in the string representation.

The kv:key-range function provides a convenient way to create a range string.

The value can also be the empty sequence.

The following examples are valid subrange specifications:

Example Description
I/alpha/beta/E From alpha inclusive to beta exclusive
E//0123/I From "" exclusive to 0123 inclusive
I/chi/ From chi inclusive to infinity
E// From "" exclusive to infinity
/chi/E From negative infinity to chi exclusive
//I From negative infinity to "" inclusive

Custom Functions for Retrieving Single Values from Oracle NoSQL Database

The Oracle NoSQL Database adapter has get functions, which enable you to retrieve a single value from the database. Unlike collection functions, calls to get functions are not distributed across the cluster. When a get function is called, the value is retrieved by a single task.

Signature

Custom get functions must have one of the following signatures:

declare %kv:get("text") [additional annotations]
   function local:myFunctionName($key as xs:string) as xs:string? external;

declare %kv:get("avroxml") [additional annotations] 
   function local:myFunctionName($key as xs:string) as element()? external;

declare %kv:get(["xml"|"binxml"]) [additional annotations]
   function local:myFunctionName($key as xs:string) as document-node()?

Annotations

%kv:get("method")

Declares the NoSQL Database get function. Required.

The method parameter is one of the following values:

  • text: The value is decoded using the character set specified by the %output:encoding annotation.

  • avroxml: The value is read as an Avro record and returned as an XML element. The records are converted to XML as described in "Reading Records."

  • binxml: The value is read as XDK binary XML and returned as an XML document.

  • xml: The value is parsed as XML and returned as an XML document.

%kv:key("true" | "false")

Controls whether the key of a key-value pair is set as the document-uri of the returned value. Specify true to return the key.

The default setting is true when method is xml, avroxml, or binxml, and false when it is text. Text functions with this annotation set to true must be declared to return text()? instead of xs:string?. Atomic xs:string values are not associated with a document node, but text nodes are.

When the key is returned, you can obtain its string representation by using the kv:key() function.

%avro:schema-kv("schema-name")

Specifies the Avro reader schema. This annotation is valid only when method is avroxml. Optional.

The schema-name is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog. The record value is mapped to the reader schema. For example, %avro:schema-kv("org.example.PersonRecord").

See Also:

For information about Avro schemas, the Oracle NoSQL Database Getting Started Guide at

http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/schemaevolution.html

%output:encoding

Specifies the character encoding of text values. UTF-8 is assumed when this annotation is not used. The valid encodings are those supported by the JVM.

This annotation currently only applies to the text method. For XML files, the document encoding declaration is used, if it is available.

See Also:

"Supported Encodings" in the Oracle Java SE documentation at

http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html

Custom Functions for Writing to Oracle NoSQL Database

You can use the following annotations to define functions that write to Oracle NoSQL Database.

Signature

Custom functions for writing to Oracle NoSQL Database must have one of the following signatures:

declare %kv:put("text") function
   local:myFunctionName($key as xs:string, $value as xs:string) external;

declare %kv:put(["xml"|"binxml"|"avroxml"]) function 
   local:myFunctionName($key as xs:string, $xml as node()) external;

Annotations

%kv:put("method")

Declares the NoSQL Database module put function. Required.

The method determines how the value is stored. It must be one of the following values:

  • text: $value is serialized and encoded using the character set specified by the %output:encoding annotation.

  • avroxml: $xml is mapped to an instance of the Avro record specified by the %avro:schema-kv annotation. See "Writing XML as Avro."

  • binxml: $xml is encoded as XDK binary XML.

  • xml: $xml is serialized and encoded using the character set specified by the %output:encoding annotation. You can specify other XML serialization parameters using %output:*.

%avro:schema-kv("schema-name")

Specifies the record schema of the values to be written. The annotation value is a fully qualified record name. The record schema is retrieved from the Oracle NoSQL Database catalog.

For example: %avro:schema-kv("org.example.PersonRecord")

%output:*

A standard XQuery serialization parameter for the output method (text or XML) specified in %kv:put. See "Serialization Annotations."

See Also:

"The Influence of Serialization Parameters" sections for XML and text output methods in XSLT and XQuery Serialization 3.0 at

http://www.w3.org/TR/xslt-xquery-serialization-30/

Examples of Oracle NoSQL Database Adapter Functions

Example 1   Writing and Reading Text in Oracle NoSQL Database

This example uses the following text file is in HDFS. The file contains user profile information such as user ID, full name, and age, separated by colons (:).

mydata/users.txt  
 
john:John Doe:45
kelly:Kelly Johnson:32
laura:Laura Smith:
phil:Phil Johnson:27

The first query stores the lines of this text file in Oracle NoSQL Database as text values.

import module "oxh:text";
import module "oxh:kv";
 
for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
let $key := "/users/text/" || $split[1]
return
   kv:put-text($key, $line)

The next query reads the values from the database:

import module "oxh:text";
import module "oxh:kv";

for $value in kv:collection-text("/users/text")
let $split := fn:tokenize($value, ":")
where $split[2] eq "Phil Johnson"
return
   text:put($value)

The query creates a text file that contains the following line:

phil:Phil Johnson:27
Example 2   Writing and Reading Avro in Oracle NoSQL Database

In this example, the following Avro schema is registered with Oracle NoSQL Database:

{
   "type": "record",
   "name": "User",
   "namespace": "com.example",
   "fields" : [
      {"name": "id", "type": "string"},
      {"name": "full_name", "type": "string"},
      {"name": "age", "type": ["int", "null"] }
   ]
 }

The next query writes the user names to the database as Avro records.

import module "oxh:text";

declare %kv:put("avroxml") %avro:schema-kv("com.example.User")
   function local:put-user($key as xs:string, $value as node()) external;

for $line in text:collection("mydata/users.txt")
let $split := fn:tokenize($line, ":")
let $id := $split[1]
let $key := "/users/avro/" || $id
return
   local:put-user(
      $key,
      <user>
         <id>{$id}</id>
         <full_name>{$split[2]}</full_name>
         {
            if ($split[3] castable as xs:int) then
               <age>{$split[3]}</age>
            else
               ()
         }
      </user>
   )

This query reads the values from the database:

import module "oxh:text";
import module "oxh:kv";

for $user in kv:collection-avroxml("/users/avro")
where $user/age gt 30
return
   text:put($user/full_name)

The query creates a text files with the following lines:

John Doe
Kelly Johnson

Oracle NoSQL Database Adapter Configuration Properties

Oracle XQuery for Hadoop uses the generic options for specifying configuration properties in the Hadoop command. You can use the -conf option to identify configuration files, and the -D option to specify individual properties. See "Running a Query."

You can set various configuration properties for the Oracle NoSQL Database adapter that control the durability characteristics and timeout periods. You must set oracle.kv.hosts and oracle.kv.kvstore.The following properties configure the Oracle NoSQL Database adapter.

oracle.hadoop.xquery.kv.config.durability

Type: String

Default Value: NO_SYNC, NO_SYNC, SIMPLE_MAJORITY

Description: Defines the durability characteristics associated with %kv:put operations. The value consists of three parts, which you specify in order and separate with commas (,):

MasterPolicy, ReplicaPolicy, ReplicaAck

  • MasterPolicy: The synchronization policy used when committing a transaction to the master database. Set this part to one of the following constants:

    NO_SYNC: Do not write or synchronously flush the log on a transaction commit.

    SYNC: Write and synchronously flush the log on a transaction commit.

    WRITE_NO_SYNC: Write but do not synchronously flush the log on a transaction commit.

  • ReplicaPolicy: The synchronization policy used when committing a transaction to the replica databases. Set this part to NO_SYNC, SYNC, or WRITE_NO_SYNC, as described under MasterPolicy.

  • ReplicaAck: The acknowledgment policy used to obtain transaction acknowledgments from the replica databases. Set this part to one of the following constants:

    ALL: All replicas must acknowledge that they have committed the transaction.

    NONE: No transaction commit acknowledgments are required, and the master does not wait for them.

    SIMPLE_MAJORITY: A simple majority of replicas (such as 3 of 5) must acknowledge that they have committed the transaction.

    See Also:

    "Durability Guarantees" in Getting Started with Oracle NoSQL Database at

    http://docs.oracle.com/cd/NOSQL/html/GettingStartedGuide/durability.html

oracle.hadoop.xquery.kv.config.requestLimit

Type: Comma-separated list of integers

Default Value: 100, 90, 80

Description: Limits the number of simultaneous requests to prevent nodes with long service times from consuming all threads in the KV store client. The value consists of three integers, which you specify in order and separate with commas:

maxActiveRequests, requestThresholdPercent, nodeLimitPercent

  • maxActiveRequests: The maximum number of active requests permitted by the KV client. This number is typically derived from the maximum number of threads that the client has set aside for processing requests.

  • requestThresholdPercent: The percentage of maxActiveRequests at which requests are limited.

  • nodeLimitPercent: The maximum number of active requests that can be associated with a node when the number of active requests exceeds the threshold specified by requestThresholdPercent.

oracle.hadoop.xquery.kv.config.requestTimeout

Type: Long

Default Value: 5000 ms

Description: Configures the request timeout period in milliseconds. The value must be greater than zero (0).

oracle.hadoop.xquery.kv.config.socketOpenTimeout

Type: Long

Default Value: 5000 ms

Description: Configures the open timeout used when establishing sockets for client requests, in milliseconds. Shorter timeouts result in more rapid failure detection and recovery. The default open timeout is adequate for most applications. The value must be greater than zero (0).

oracle.hadoop.xquery.kv.config.socketReadTimeout

Type: Long

Default Value: 30000 ms

Description: Configures the read timeout period associated with the sockets that make client requests, in milliseconds. Shorter timeouts result in more rapid failure detection and recovery. Nonetheless, the timeout period should be sufficient to allow the longest timeout associated with a request.

oracle.kv.batchSize

Type: Key

Default Value: Not defined

Description: The desired number of keys for the InputFormat to fetch during each network round trip. A value of zero (0) sets the property to a default value.

oracle.kv.consistency

Type: Consistency

Default Value: NONE_REQUIRED

Description: The consistency guarantee for reading child key-value pairs. The following keywords are valid values:

  • ABSOLUTE: Requires the master to service the transaction so that consistency is absolute.

  • NONE_REQUIRED: Allows replicas to service the transaction, regardless of the state of the replicas relative to the master.

oracle.kv.hosts

Type: String

Default Value: Not defined

Description: An array of one or more hostname:port pairs that identify the hosts in the KV store with the source data. Separate multiple pairs with commas.

oracle.kv.kvstore

Type: String

Default Value: Not defined

Description: The name of the KV store with the source data.

oracle.kv.timeout

Type: Long

Default Value: Not defined

Description: Sets a maximum time interval in milliseconds for retrieving a selection of key-value pairs. A value of zero (0) sets the property to its default value.

See Also:

Oracle NoSQL Database Java API Reference at

http://docs.oracle.com/cd/NOSQL/html/javadoc/oracle/kv/hadoop/KVInputFormatBase.html