xml_query_as_primitive

Returns the result of a query as a Hive primitive value. Each Hive primitive data type has a separate function named for it:

  • xml_query_as_string

  • xml_query_as_boolean

  • xml_query_as_tinyint

  • xml_query_as_smallint

  • xml_query_as_int

  • xml_query_as_bigint

  • xml_query_as_double

  • xml_query_as_float

Signature

xml_query_as_primitive (
   STRING query,
   {STRUCT | STRING} bindings,
} as primitive

Description

query

An XQuery or XPath expression. It must be a constant value, because it is only read the first time the function is evaluated. The initial query string is compiled and reused in all subsequent calls.

You can access files that are stored in the Hadoop distributed cache and HTTP resources (http://...). Use the XQuery fn:doc function for XML documents, and the fn:unparsed-text and fn:parsed-text-lines functions to access plain text files. See Example 4.

If an error occurs while compiling the query, the function raises an error. If an error occurs while evaluating the query, the error is logged (not raised), and an empty array is returned.

bindings

The input that the query processes. The value can be an XML STRING or a STRUCT of variable values:

  • STRING: The string is bound to the initial context item of the query as XML. See Example 1.

  • STRUCT: A STRUCT with an even number of fields. Each pair of fields defines a variable binding (name, value) for the query. The name fields must be type STRING, and the value fields can be any supported primitive. See "Data Type Conversions" and Example 2.

    The first item in the result of the query is cast to the XML schema type that maps to the primitive type of the function. If the query returns multiple items, then all but the first are ignored.

Return Value

A Hive primitive value, which is the first item returned by the query, converted to an atomic value. If the result of the query is empty, then the return value is NULL.

Examples

Example 1   Using a STRING Binding

This example parses and binds the input XML string to the initial context item of the query x/y:

hive> SELECT xml_query_as_string("x/y", "<x><y>hello</y></x>") FROM src LIMIT 1;
     .
     .
     .
"hello"

The following are string binding examples that use other primitive functions:

hive> SELECT xml_query_as_int("x/y", "<x><y>123</y></x>") FROM src LIMIT 1;
     .
     .
     .
123
 
hive> SELECT xml_query_as_double("x/y", "<x><y>12.3</y></x>") FROM src LIMIT 1;
     .
     .
     .
12.3
 
hive> SELECT xml_query_as_boolean("x/y", "<x><y>true</y></x>") FROM src LIMIT 1;
     .
     .
     .
true
Example 2   Using a STRUCT Binding

In this example, the second argument is a STRUCT that defines two query variables, $data and $value. The values of the variables in the STRUCT are converted to XML schema types as described in "Data Type Conversions."

hive>
SELECT xml_query_as_string(
   "fn:parse-xml($data)/x/y[@id = $value]", 
   struct(
      "data", "<x><y id='1'>hello</y><z/><y id='2'>world</y></x>",
      "value", 2
   )
) FROM src LIMIT 1;
     .
     .
     .
world
Example 3   Returning Multiple Query Results

This example returns only the first item (hello) from the query. The second item (world) is discarded.

hive> SELECT xml_query_as_string("x/y", "<x><y>hello</y><z/><y>world</y></x>") FROM src LIMIT 1;
     .
     .
     .
hello
Example 4   Returning Empty Query Results

This example returns NULL because the result of the query is empty:

hive> SELECT xml_query_as_string("x/foo", "<x><y>hello</y><z/><y>world</y></x>") FROM src LIMIT 1;
     .
     .
     .
NULL
Example 5   Obtaining Serialized XML

These examples use the fn:serialize function to return complex XML elements as a STRING value:

hive> SELECT xml_query_as_string("fn:serialize(x/y[1])", "<x><y>hello</y><z/><y>world</y></x>") FROM src LIMIT 1;
     .
     .
     .
"<y>hello</y>"

hive> SELECT xml_query_as_string(
   "fn:serialize(<html><head><title>{$desc}</title></head><body>Name: {$name}</body></html>)", 
   struct(
      "desc", "Employee Details",
      "name", "John Doe"
   )
) FROM src LIMIT 1;
...
<html><head><title>Employee Details</title></head><body>Name: John Doe</body></html>
Example 6   Accessing the Hadoop Distributed Cache

This example adds a file named test.xml to the distributed cache, and then queries it using the fn:doc function. The file contains this value:

<x><y>hello</y><z/><y>world</y></x>
Hive> ADD FILE test.xml;
Added resource: test.xml
Hive> SELECT xml_query_as_string("fn:doc('test.xml')/x/y[1]", NULL) FROM src LIMIT 1;
     .
     .
     .
hello
Example 7   Results of a Failed Query

This example returns NULL because </invalid is missing an angle bracket. An XML parsing error is written to the log:

Hive> SELECT xml_query_as_string("x/y", "<x><y>hello</invalid") FROM src LIMIT 1;
     .
     .
     .
NULL

This example returns NULL because foo cannot be cast as xs:float. A cast error is written to the log:

Hive> SELECT xml_query_as_float("x/y", "<x><y>foo</y></x>") FROM src LIMIT 1;
     .
     .
     .
NULL