XML Function Library for Apache Hive

This section describes the functions provided with the XML Extensions for Hive, such as xml_query. It covers the online documentation of the functions, Hive access to external files, and data type conversions.


Online Documentation of Functions

You can get online Help for the Hive extension functions by using this command:

DESCRIBE FUNCTION [EXTENDED] function_name;

This example provides a brief description of the xml_query function:

hive> describe function xml_query;         
OK
xml_query(query, bindings) - Returns the result of the query as a STRING array

The EXTENDED option provides a detailed description and examples:

hive> describe function extended xml_query;
OK
xml_query(query, bindings) - Returns the result of the query as a STRING array
Evaluates an XQuery expression with the specified bindings. The query argument must be a STRING and the bindings argument must be a STRING or a STRUCT. If the bindings argument is a STRING, it is parsed as XML and bound to the initial context item of the query. For example:
  
  > SELECT xml_query("x/y", "<x><y>hello</y><z/><y>world</y></x>") FROM src LIMIT 1;
  ["hello", "world"]
     .
     .
     .
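Because the bindings argument only needs to be a STRING, a column that holds XML text can be passed in place of the literal string. The following is a minimal sketch of that usage; the table xml_docs and its STRING column xml_str are hypothetical names chosen for illustration and do not come from this guide.

```sql
-- Hypothetical table: xml_docs(xml_str STRING), one XML document per row.
-- Each row's XML text is parsed and bound to the initial context item of the query.
SELECT xml_query("x/y", xml_str) FROM xml_docs;
```

Each row yields a STRING array with one entry per matching y element.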

About Hive Access to External Files

The Hive functions can access external file resources, such as XML schemas.

You can address these files by URI, either over HTTP (using the http://... syntax) or on the local file system (using the file://... syntax). Relative file locations are resolved against the local working directory of the task, so URIs such as bar.xsd can be used to access files that were added to the distributed cache, as in this example:

xml_query("
   import schema namespace tns='http://example.org' at 'bar.xsd';
   validate { ... }
        ",
           .
           .
           .

To access a local file, first add it to the Hadoop distributed cache using the Hive ADD FILE command. For example:

ADD FILE /local/mydir/thisfile.xsd;

If you do not add the file to the distributed cache, you must ensure that it is available on all nodes of the cluster, for example by mounting the same network drive on every node or by copying the file to every node. The default base URI is set to the local working directory.
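The two steps can be combined as in the following sketch. It assumes that an XML document can be read with the standard XQuery fn:doc function in the same way that a schema is imported; the file lookup.xml and its element names are hypothetical and serve only to illustrate the relative URI.

```sql
-- Hypothetical file: /local/mydir/lookup.xml is an illustrative path, not from this guide.
ADD FILE /local/mydir/lookup.xml;

-- The relative URI 'lookup.xml' is resolved against the local working directory
-- of the task, where files from the distributed cache are made available.
-- Assumption: the document is readable through the standard fn:doc function.
SELECT xml_query("fn:doc('lookup.xml')/items/item/name", "<x/>") FROM src LIMIT 1;
```

The second argument is a placeholder context item, because the query reads its data from the added file instead.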

See Also:

For information about the default base URI, see XQuery 1.0: An XML Query Language at

http://www.w3.org/TR/xquery/#dt-base-uri


About Data Type Conversions

Table 7-1 shows the conversions that occur automatically between Hive primitives and XML schema types.

Table 7-1 Data Type Equivalents

Hive        XML schema
TINYINT     xs:byte
SMALLINT    xs:short
INT         xs:int
BIGINT      xs:long
BOOLEAN     xs:boolean
FLOAT       xs:float
DOUBLE      xs:double
STRING      xs:string
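The following sketch illustrates how these conversions might appear in a query. It assumes the xml_query_as_primitive function from the same library, which this section does not describe; the described return behavior is an assumption and should be checked with DESCRIBE FUNCTION EXTENDED before you rely on it.

```sql
-- Assumption: xml_query_as_primitive(query, bindings) returns a single Hive
-- primitive whose type corresponds to the XML schema type of the query result.

-- The query result is xs:int, which corresponds to the Hive INT type.
SELECT xml_query_as_primitive("xs:int(x/y)", "<x><y>123</y></x>") FROM src LIMIT 1;

-- The query result is xs:boolean, which corresponds to the Hive BOOLEAN type.
SELECT xml_query_as_primitive("xs:boolean(x/y)", "<x><y>true</y></x>") FROM src LIMIT 1;
```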