The JSON file adapter provides access to JSON files stored in HDFS. It also contains functions for working with JSON data embedded in other file formats. For example, you can query JSON that is stored as lines in a large text file by using json:parse-as-xml
with the text:collection
function.
Processing a single JSON file in parallel is not currently supported. A set of JSON files can be processes in parallel, with sequential processing of each file.
The JSON module is described in the following topics:
To use the built-in functions in your query, you must import the JSON file adapter as follows:
import module "oxh:json";
The JSON module contains the following functions:
Accesses a collection of JSON files in HDFS. Multiple files can be processed concurrently, but each individual file is parsed by a single task.
The JSON file adapter automatically decompresses files compressed with a Hadoop-supported compression codec.
json:collection-jsonxml($uris as xs:string*) as element()* external;
$uris
: The JSON file URIs
XML elements that model the JSON values. See "About Converting JSON Data Formats to XML."
Parses a JSON value as XML.
json:parse-as-xml($arg as xs:string?) as element(*)?
$arg
: Can be the empty sequence.
An XML element that models the JSON value. An empty sequence if $arg
is an empty sequence. See "About Converting JSON Data Formats to XML."
Retrieves an entry from a JSON object modeled as XML. See "About Converting JSON Data Formats to XML."
json:get($key as xs:string?, $obj as node()?) as element(oxh:entry)? json:get($key as xs:string?) as element(oxh:entry)?
$key
: The JSON data key
$obj
: The JSON object value
The value of the following XPath expression:
$obj/oxh:entry[@key eq $key]
If $input
not present, then the behavior is identical to calling the two-argument function using the context item for $obj
. See the Notes.
These function calls are equivalent:
$var/json:get("key") json:get("key", $var) $var/oxh:entry[@key eq "key"]
$var
is a JSON object modeled as XML. See "Reading Maps."
You can use the following annotations to define functions that read collections of JSON files in HDFS. These annotations provide additional functionality that is not available using the built-in functions.
Custom functions for reading JSON files must have the following signature:
declare %json:collection("jsonxml") [additional annotations] function local:myFunctionName($uris as xs:string*) as element()* external;
Declares the collection function. The annotation parameter must be jsonxml
.
Identifies the text encoding of the input files.
The valid encodings are those supported by the JVM. If this annotation is omitted, then the encoding is automatically detected from the JSON file as UTF-8, UTF-16 big-endian serialization (BE) or little-endian serialization (LE), or UTF-32 (BE or LE).
For better performance, omit the encoding annotation if the actual file encoding is specified by JSON Request for Comment 4627, Section 3 "Encoding," on the Internet Engineering Task Force (IETF) website at
A collection of XML elements. Each element models the corresponding JSON value. See "About Converting JSON Data Formats to XML."
Example 1 uses the following JSON text files stored in HDFS:
mydata/users1.json [ { "user" : "john", "full name" : "John Doe", "age" : 45 }, { "user" : "kelly", "full name" : "Kelly Johnson", "age" : 32 } ] mydata/users2.json [ { "user" : "laura", "full name" : "Laura Smith", "age" : null }, { "user" : "phil", "full name" : "Phil Johnson", "age" : 27 } ]
The remaining examples query the following text file in HDFS:
mydata/users-json.txt { "user" : "john", "full name" : "John Doe", "age" : 45 } { "user" : "kelly", "full name" : "Kelly Johnson", "age" : 32 } { "user" : "laura", "full name" : "Laura Smith", "age" : null } { "user" : "phil", "full name" : "Phil Johnson", "age" : 27 }
The following query selects names of users whose last name is Johnson from users1.json
and users2.json
:
import module "oxh:text"; import module "oxh:json"; for $user in json:collection-jsonxml("mydata/users*.json")/oxh:item let $fullname := $user/json:get("full name") where tokenize($fullname, "\s+")[2] eq "Johnson" return text:put-text($fullname)
This query generates text files that contain the following lines:
Phil Johnson Kelly Johnson
The following query selects the names of users that are older than 30 from users-json.txt
:
import module "oxh:text"; import module "oxh:json"; for $line in text:collection("mydata/users-json.txt") let $user := json:parse-as-xml($line) where $user/json:get("age") gt 30 return text:put($user/json:get("full name"))
This query generates text files that contain the following lines:
John Doe Kelly Johnson
The next query selects the names of employees that have a null value for age from users-json.txt
:
import module "oxh:text"; import module "oxh:json"; for $line in text:collection("mydata/users-json.txt") let $user := json:parse-as-xml($line) where $user/json:get("age")/nilled() return text:put($user/json:get("full name"))
This query generates a text file that contains the following line:
Laura Smith
Oracle XQuery for Hadoop uses the generic options for specifying configuration properties in the hadoop
command. You can use the -conf
option to identify configuration files, and the -D
option to specify individual properties. See "Running Queries."
The following configuration properties are equivalent to the Jackson parser options with the same names. You can enter the option name in either upper or lower case. For example, oracle.hadoop.xquery.json.parser.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER
and oracle.hadoop.xquery.json.parser.allow_backslash_escaping_any_character
are equal.
Type: Boolean
Default Value: false
Description: Enables any character to be escaped with a backslash (\). Otherwise, only the following characters can be escaped: quotation mark("), slash (/), backslash (\), backspace, form feed (f), new line (n), carriage return (r), horizontal tab (t), and hexadecimal representations (u
nnnn
)
Type: Boolean
Default Value: false
Description: Allows Java and C++ comments (/* and //) within the parsed text.
Type: Boolean
Default Value: false
Description: Allows Not a Number (NaN) tokens to be parsed as floating number values.
Type: Boolean
Default Value: false
Description: Allows integral numbers to start with zeroes, such as 00001. The zeros do not change the value and can be ignored.
Type: Boolean
Default Value: false
Description: Allow single quotes (') to delimit string values.
Type: Boolean
Default Value: false
Description: Allows JSON strings to contain unquoted control characters (that is, ASCII characters with a decimal value less than 32, including the tab and line feed).
Type: Boolean
Default Value: false
Description: Allows unquoted field names, which are allowed by Javascript but not the JSON specification.
This section describes how JSON data formats are converted to XML. It contains the following topics:
JSON objects are similar to Avro maps and are converted to the same XML structure. See "Reading Maps."
For example, the following JSON object is converted to an XML element:
{ "user" : "john", "full_name" : "John Doe", "age" : 45 }
The object is modeled as the following element:
<oxh:item> <oxh:entry key="user">john</oxh:entry> <oxh:entry key="full_name">John Doe</oxh:entry> <oxh:entry key="age">45</oxh:entry> </oxh:item>
JSON arrays are similar to Avro arrays and are converted to the same XML structure. See "Reading Arrays."
For example, the following JSON array is converted to an XML element:
[ "red", "blue", "green" ]
The array is modeled as the following element:
<oxh:item> <oxh:item>red</oxh:item> <oxh:item>blue</oxh:item> <oxh:item>green</oxh:item> </oxh:item>