xml_query

Returns the result of a query as an array of STRING values.

Signature

xml_query(
   STRING query, 
   { STRING | STRUCT } bindings
) as ARRAY<STRING>

Description

query

An XQuery or XPath expression. It must be a constant value, because it is only read the first time the function is evaluated. The initial query string is compiled and reused in all subsequent calls.

You can access files that are stored in the Hadoop distributed cache and HTTP resources (http://...). Use the XQuery fn:doc function for XML documents, and the fn:unparsed-text and fn:parsed-text-lines functions to access plain text files. See Example 4.

If an error occurs while compiling the query, the function raises an error. If an error occurs while evaluating the query, the error is logged (not raised) and an empty array is returned.

bindings

The input that the query processes. The value can be an XML STRING or a STRUCT of variable values:

  • STRING: The string is bound to the initial context item of the query as XML. See Example 1.

  • STRUCT: A STRUCT with an even number of fields. Each pair of fields defines a variable binding (name, value) for the query. The name fields must be type STRING, and the value fields can be any supported primitive. See "About Data Type Conversions" and Example 2.

Return Value

A Hive array of STRING values, which are the result of the query converted to a sequence of atomic values. If the result of the query is empty, then the return value is an empty array.

Examples

Example 1   Using a STRING Binding

This example parses and binds the input XML string to the initial context item of the query "x/y":

> SELECT xml_query("x/y", "<x><y>hello</y><z/><y>world</y></x>") FROM src LIMIT 1;
["hello", "world"]
Example 2   Using a STRUCT Binding

In this example, the second argument is a STRUCT that defines two query variables, $data and $value. The values of the variables in the STRUCT are converted to XML schema types as described in "About Data Type Conversions."

> SELECT xml_query(
  "fn:parse-xml($data)/x/y[@id = $value]", 
  struct(
     "data", "<x><y id='1'>hello</y><z/><y id='2'>world</y></x>",
     "value", 1
      )
  ) FROM src LIMIT 1;
["hello"]
Example 3   Obtaining Serialized XML

This example uses the fn:serialize function to return serialized XML:

> SELECT xml_query(
  "for $y in x/y 
  return fn:serialize($y)
  ",
  "<x><y>hello</y><z/><y>world</y></x>"
   ) FROM src LIMIT 1;
["<y>hello</y>","<y>world</y>"]
Example 4   Accessing the Hadoop Distributed Cache

This example adds a file named test.xml to the distributed cache, and then queries it using the fn:doc function. The file contains the value <x><y>hello</y><z/><y>world</y></x>.

> ADD FILE test.xml;
> SELECT xml_query("fn:doc('test.xml')/x/y", NULL) FROM src LIMIT 1;
["hello","world"]
Example 5   Results of a Failed Query

The next example returns an empty array because the input XML is invalid. The XML parsing error will be written to the log:

> SELECT xml_query("x/y", "<x><y>hello</y></invalid") FROM src LIMIT 1;
 []