JSON File Adapter

The JSON file adapter provides access to JSON files stored in HDFS. It also contains functions for working with JSON data embedded in other file formats. For example, you can query JSON that is stored as lines in a large text file by using json:parse-as-xml with the text:collection function.

Processing a single JSON file in parallel is not currently supported. A set of JSON files can be processes in parallel, with sequential processing of each file.

The JSON module is described in the following topics:

Built-in Functions for Reading JSON

To use the built-in functions in your query, you must import the JSON file adapter as follows:

import module "oxh:json";

The JSON module contains the following functions:

json:collection-jsonxml

Accesses a collection of JSON files in HDFS. Multiple files can be processed concurrently, but each individual file is parsed by a single task.

The JSON file adapter automatically decompresses files compressed with a Hadoop-supported compression codec.

Signature

json:collection-jsonxml($uris as xs:string*) as element()* external;

Parameters

$uris: The JSON file URIs

Returns

XML elements that model the JSON values. See "About Converting JSON Data Formats to XML."

json:parse-as-xml

Parses a JSON value as XML.

Signature

json:parse-as-xml($arg as xs:string?) as element(*)?

Parameters

$arg: Can be the empty sequence.

Returns

An XML element that models the JSON value. An empty sequence if $arg is an empty sequence. See "About Converting JSON Data Formats to XML."

json:get

Retrieves an entry from a JSON object modeled as XML. See "About Converting JSON Data Formats to XML."

Signature

json:get($key as xs:string?, $obj as node()?) as element(oxh:entry)?

json:get($key as xs:string?) as element(oxh:entry)?

Parameters

$key: The JSON data key

$obj: The JSON object value

Returns

The value of the following XPath expression:

$obj/oxh:entry[@key eq $key]

If $input not present, then the behavior is identical to calling the two-argument function using the context item for $obj. See the Notes.

Notes

These function calls are equivalent:

$var/json:get("key")

json:get("key", $var)

$var/oxh:entry[@key eq "key"]
 

$var is a JSON object modeled as XML. See "Reading Maps."

Custom Functions for Reading JSON Files

You can use the following annotations to define functions that read collections of JSON files in HDFS. These annotations provide additional functionality that is not available using the built-in functions.

Signature

Custom functions for reading JSON files must have the following signature:

declare %json:collection("jsonxml") [additional annotations]
   function local:myFunctionName($uris as xs:string*) as element()* external;

Annotations

%json:collection("jsonxml")

Declares the collection function. The annotation parameter must be jsonxml.

%output:encoding("charset")

Identifies the text encoding of the input files.

The valid encodings are those supported by the JVM. If this annotation is omitted, then the encoding is automatically detected from the JSON file as UTF-8, UTF-16 big-endian serialization (BE) or little-endian serialization (LE), or UTF-32 (BE or LE).

For better performance, omit the encoding annotation if the actual file encoding is specified by JSON Request for Comment 4627, Section 3 "Encoding," on the Internet Engineering Task Force (IETF) website at

http://www.ietf.org/rfc/rfc4627.txt

Parameters

$uris as xs:string*

Lists the JSON file URIs. Required.

Returns

A collection of XML elements. Each element models the corresponding JSON value. See "About Converting JSON Data Formats to XML."

Examples of JSON Functions

Example 1 uses the following JSON text files stored in HDFS:

mydata/users1.json
[
{ "user" : "john", "full name" : "John Doe", "age" : 45 },
{ "user" : "kelly", "full name" : "Kelly Johnson", "age" : 32 }
]
 
mydata/users2.json 
[
{ "user" : "laura", "full name" : "Laura Smith", "age" : null },
{ "user" : "phil", "full name" : "Phil Johnson", "age" : 27 }
]
 

The remaining examples query the following text file in HDFS:

mydata/users-json.txt
 
{ "user" : "john", "full name" : "John Doe", "age" : 45 }
{ "user" : "kelly", "full name" : "Kelly Johnson", "age" : 32 }
{ "user" : "laura", "full name" : "Laura Smith", "age" : null }
{ "user" : "phil", "full name" : "Phil Johnson", "age" : 27 }
Example 1   

The following query selects names of users whose last name is Johnson from users1.json and users2.json:

import module "oxh:text";
import module "oxh:json";
 
for $user in json:collection-jsonxml("mydata/users*.json")/oxh:item
let $fullname := $user/json:get("full name")
where tokenize($fullname, "\s+")[2] eq "Johnson"
return 
   text:put-text($fullname)

This query generates text files that contain the following lines:

Phil Johnson
Kelly Johnson
Example 2   

The following query selects the names of users that are older than 30 from users-json.txt:

import module "oxh:text";
import module "oxh:json";

for $line in text:collection("mydata/users-json.txt")
let $user := json:parse-as-xml($line)
where $user/json:get("age") gt 30
return 
   text:put($user/json:get("full name"))

This query generates text files that contain the following lines:

John Doe
Kelly Johnson
Example 3   

The next query selects the names of employees that have a null value for age from users-json.txt:

import module "oxh:text";
import module "oxh:json";

for $line in text:collection("mydata/users-json.txt")
let $user := json:parse-as-xml($line)
where $user/json:get("age")/nilled()
return 
   text:put($user/json:get("full name"))
 

This query generates a text file that contains the following line:

Laura Smith

JSON File Adapter Configuration Properties

Oracle XQuery for Hadoop uses the generic options for specifying configuration properties in the hadoop command. You can use the -conf option to identify configuration files, and the -D option to specify individual properties. See "Running Queries."

The following configuration properties are equivalent to the Jackson parser options with the same names. You can enter the option name in either upper or lower case. For example, oracle.hadoop.xquery.json.parser.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER and oracle.hadoop.xquery.json.parser.allow_backslash_escaping_any_character are equal.

oracle.hadoop.xquery.json.parser.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER

Type: Boolean

Default Value: false

Description: Enables any character to be escaped with a backslash (\). Otherwise, only the following characters can be escaped: quotation mark("), slash (/), backslash (\), backspace, form feed (f), new line (n), carriage return (r), horizontal tab (t), and hexadecimal representations (unnnn)

oracle.hadoop.xquery.json.parser.ALLOW_COMMENTS

Type: Boolean

Default Value: false

Description: Allows Java and C++ comments (/* and //) within the parsed text.

oracle.hadoop.xquery.json.parser.ALLOW_NON_NUMERIC_NUMBERS

Type: Boolean

Default Value: false

Description: Allows Not a Number (NaN) tokens to be parsed as floating number values.

oracle.hadoop.xquery.json.parser.ALLOW_NUMERIC_LEADING_ZEROS

Type: Boolean

Default Value: false

Description: Allows integral numbers to start with zeroes, such as 00001. The zeros do not change the value and can be ignored.

oracle.hadoop.xquery.json.parser.ALLOW_SINGLE_QUOTES

Type: Boolean

Default Value: false

Description: Allow single quotes (') to delimit string values.

oracle.hadoop.xquery.json.parser.ALLOW_UNQUOTED_CONTROL_CHARS

Type: Boolean

Default Value: false

Description: Allows JSON strings to contain unquoted control characters (that is, ASCII characters with a decimal value less than 32, including the tab and line feed).

oracle.hadoop.xquery.json.parser.ALLOW_UNQUOTED_FIELD_NAMES

Type: Boolean

Default Value: false

Description: Allows unquoted field names, which are allowed by Javascript but not the JSON specification.

About Converting JSON Data Formats to XML

This section describes how JSON data formats are converted to XML. It contains the following topics:

About Converting JSON Objects to XML

JSON objects are similar to Avro maps and are converted to the same XML structure. See "Reading Maps."

For example, the following JSON object is converted to an XML element:

{ 
   "user" : "john", 
   "full_name" : "John Doe", 
   "age" : 45 
}

The object is modeled as the following element:

<oxh:item>
    <oxh:entry key="user">john</oxh:entry>
    <oxh:entry key="full_name">John Doe</oxh:entry>
    <oxh:entry key="age">45</oxh:entry>
</oxh:item>

About Converting JSON Arrays to XML

JSON arrays are similar to Avro arrays and are converted to the same XML structure. See "Reading Arrays."

For example, the following JSON array is converted to an XML element:

[ "red", "blue", "green" ]

The array is modeled as the following element:

<oxh:item>
   <oxh:item>red</oxh:item>
   <oxh:item>blue</oxh:item>
   <oxh:item>green</oxh:item>
</oxh:item>

About Converting Other JSON Types

The other JSON values are mapped as shown in Table 6-2.

Table 6-2 JSON Type Conversions

JSON XML

null

An empty (nilled) element

true/false

xs:boolean

number

xs:decimal

string

xs:string