Returns the result of a query as an array of STRING
values.
An XQuery or XPath expression. It must be a constant value, because it is only read the first time the function is evaluated. The initial query string is compiled and reused in all subsequent calls.
You can access files that are stored in the Hadoop distributed cache and HTTP resources (http://...). Use the XQuery fn:doc
function for XML documents, and the fn:unparsed-text
and fn:parsed-text-lines
functions to access plain text files. See Example 4.
If an error occurs while compiling the query, the function raises an error. If an error occurs while evaluating the query, the error is logged (not raised) and an empty array is returned.
The input that the query processes. The value can be an XML STRING
or a STRUCT
of variable values:
STRING
: The string is bound to the initial context item of the query as XML. See Example 1.
STRUCT
: A STRUCT
with an even number of fields. Each pair of fields defines a variable binding (name, value) for the query. The name fields must be type STRING
, and the value fields can be any supported primitive. See "About Data Type Conversions" and Example 2.
A Hive array of STRING
values, which are the result of the query converted to a sequence of atomic values. If the result of the query is empty, then the return value is an empty array.
This example parses and binds the input XML string to the initial context item of the query "x/y":
> SELECT xml_query("x/y", "<x><y>hello</y><z/><y>world</y></x>") FROM src LIMIT 1;
["hello", "world"]
In this example, the second argument is a STRUCT
that defines two query variables, $data
and $value
. The values of the variables in the STRUCT
are converted to XML schema types as described in "About Data Type Conversions."
> SELECT xml_query( "fn:parse-xml($data)/x/y[@id = $value]", struct( "data", "<x><y id='1'>hello</y><z/><y id='2'>world</y></x>", "value", 1 ) ) FROM src LIMIT 1; ["hello"]
This example uses the fn:serialize
function to return serialized XML:
> SELECT xml_query( "for $y in x/y return fn:serialize($y) ", "<x><y>hello</y><z/><y>world</y></x>" ) FROM src LIMIT 1; ["<y>hello</y>","<y>world</y>"]
This example adds a file named test.xml to the distributed cache, and then queries it using the fn:doc
function. The file contains the value <x><y>hello</y><z/><y>world</y></x>
.
> ADD FILE test.xml; > SELECT xml_query("fn:doc('test.xml')/x/y", NULL) FROM src LIMIT 1; ["hello","world"]
The next example returns an empty array because the input XML is invalid. The XML parsing error will be written to the log:
> SELECT xml_query("x/y", "<x><y>hello</y></invalid") FROM src LIMIT 1;
[]