Chapter 5. Using XQuery with BDB XML

Table of Contents

XQuery: A Brief Introduction
Referencing Portions of Documents using XQuery
Predicates
Context
Wildcards
Case Insensitive Searches
Navigation Functions
Using FLWOR with BDB XML
Retrieving BDB XML Documents using XQuery
The Query Context
Performing Queries
Working with External Functions
Implementing XmlExternalFunction
Implementing XmlResolver
Calling External Functions from XQuery
Examining Query Results
Examining Document Values
Examining Metadata
Copying Result Sets
Using Event Readers

Documents are retrieved from BDB XML containers using XQuery expressions. XQuery is a language designed to query XML documents. Using XQuery, you can retrieve entire documents, subsections of documents, or values from one or more individual document nodes. You can also use XQuery to manipulate or transform values returned by document queries.

Note that XQuery represents a superset of XPath 2.0, which in turn is based on XPath 1.0. If you have prior experience with BDB XML 1.x, then you should be familiar with XPath as that was the query language offered by that library.

BDB XML implements XQuery 1.0. BDB XML will be updated to track any changes to the specification that may occur. You can find the XQuery specification at http://www.w3.org/XML/Query.

Beyond the W3C specifications, there are several good books on the market today that fully describe XQuery. In addition, there are many freely available resources on the web that provide a good introduction to the language. Searching for 'XQuery' in the Web search engine of your choice ought to return a wealth of information and pointers on the language.

That said, this chapter begins with a very thin introduction to XQuery that should be enough for you to understand any BDB XML concepts required to proceed with usage of the library. In particular, the next section of this manual highlights those aspects of XQuery that have unique meanings relative to BDB XML usage. Be aware, however, that the following introduction is not meant to be complete — a full treatment of XQuery is beyond the scope of an introductory manual such as this.

We follow this brief introduction to XQuery with a general description of querying documents stored in BDB XML containers, and examining the results of those queries. See Retrieving BDB XML Documents using XQuery for that information.

XQuery: A Brief Introduction

XQuery can be used to:

  1. Query for a document. Note that queries can be formed against an individual document, or against multiple documents.

  2. Query for document subsections, including values found on individual document nodes.

  3. Manipulate and transform the results of a query.

  4. Modify a document (see Modifying XML Documents for more information).

To do this, XQuery views an XML document as a collection of element, text, and attribute nodes. For example, consider the following XML document:

<?xml version="1.0"?>
<Node0>
<Node1 class="myValue1">Node1 text </Node1>
<Node2>
    <Node3>Node3 text</Node3>
    <Node3>Node3 text 2</Node3>
    <Node3>Node3 text 3</Node3>
    <Node4>300</Node4>
    </Node2>
</Node0>

In the above document, <Node0> is the document's root node, and <Node1> is an element node. Further, the element node, <Node1>, contains a single attribute node whose name is class and whose value is myValue1. Finally, <Node1> contains a text node whose value is Node1 text.

Referencing Portions of Documents using XQuery

A document's root can always be referenced using a single forward slash:

/.

Subsequent element nodes in the document can be referenced using Unix-style path notation:

/Node0/Node1

To reference an attribute node, prefix the attribute node's name with '@':

/Node0/Node1/@class

To return the value contained in a node's text node (remember that not all element nodes contain a text node), use the distinct-values() function:

distinct-values(/Node0/Node1)

To return the value assigned to an attribute node, you also use the distinct-values() function:

distinct-values(/Node0/Node1/@class)
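The path notation above can be tried without a BDB XML installation. The following is a minimal sketch that uses the JDK's built-in XPath 1.0 engine (javax.xml.xpath) as a stand-in, on the assumption that simple paths behave the same way there. The class and method names are illustrative only. Because distinct-values() is not part of XPath 1.0, the sketch reads the string value of the first matching node instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class PathDemo {
    // The sample document from this chapter, written on one logical line.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<Node0>"
        + "<Node1 class=\"myValue1\">Node1 text </Node1>"
        + "<Node2>"
        + "<Node3>Node3 text</Node3>"
        + "<Node3>Node3 text 2</Node3>"
        + "<Node3>Node3 text 3</Node3>"
        + "<Node4>300</Node4>"
        + "</Node2>"
        + "</Node0>";

    // Evaluates an absolute path against the sample document and returns
    // the string value of the first node that matches.
    static String valueOf(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(path, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueOf("/Node0/Node1/@class")); // prints myValue1
        System.out.println(valueOf("/Node0/Node1"));        // the text of Node1
    }
}
```

Note that each path starts at the root element, Node0. A path that omits it matches nothing in this document.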

Predicates

When you provide an XQuery path, what you receive back is a result set. You can further filter this result set by using predicates. Predicates are always contained in brackets ([]) and there are two types of predicates that you can use: numeric and boolean.

Numeric Predicates

Numeric predicates allow you to select a node based on its position relative to another node in the document (that is, based on its context).

For example, consider the document presented in XQuery: A Brief Introduction. This document contains three <Node3> elements. If you simply enter the XQuery expression:

/Node0/Node2/Node3

all <Node3> elements in the document are returned. To return, say, the second <Node3> element, use a predicate:

/Node0/Node2/Node3[2]

Boolean Predicates

Boolean predicates filter a query result so that only those elements for which the predicate expression evaluates to true are kept. For example, suppose you want to select a node only if the text of one of its child nodes is equal to some value. Then:

/Node0/Node2[Node3="Node3 text 3"]
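Both kinds of predicate can be checked against the sample document. This is a sketch only: it again uses the JDK's XPath 1.0 engine rather than BDB XML, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PredicateDemo {
    // The sample document from this chapter.
    static final String XML =
        "<Node0>"
        + "<Node1 class=\"myValue1\">Node1 text </Node1>"
        + "<Node2>"
        + "<Node3>Node3 text</Node3>"
        + "<Node3>Node3 text 2</Node3>"
        + "<Node3>Node3 text 3</Node3>"
        + "<Node4>300</Node4>"
        + "</Node2>"
        + "</Node0>";

    // Returns every node selected by the given absolute path.
    static NodeList select(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No predicate: all three Node3 elements.
        System.out.println(select("/Node0/Node2/Node3").getLength());
        // Numeric predicate: only the second Node3 element.
        System.out.println(
            select("/Node0/Node2/Node3[2]").item(0).getTextContent());
        // Boolean predicate: Node2 is kept because one Node3 child matches.
        System.out.println(
            select("/Node0/Node2[Node3=\"Node3 text 3\"]").getLength());
    }
}
```

Positions in numeric predicates start at 1, not 0, so [2] picks the second matching element.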

Context

The meaning of an XQuery expression can change depending on the current context. Within XQuery expressions, context is usually only important if you want to use relative paths or if your documents use namespaces. Do not confuse XQuery contexts with BDB XML contexts. While BDB XML contexts are related to XQuery contexts, they differ in that BDB XML contexts are a data structure that allows you to define namespaces, define variables, and to identify the type of information that is returned as the result of a query (all of these topics are discussed later in this chapter).

Relative Paths

Just like Unix filesystem paths, any path that does not begin with a slash (/) is relative to your current location in a document. Your current location in a document is determined by your context. Thus, if in the document presented in XQuery: A Brief Introduction your context is set to Node2, you can refer to Node3 with the simple notation:

Node3

Further, you can refer to a parent node using the following familiar notation:

..

and to the current node using:

.
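To see how the context changes what a relative path means, the sketch below first selects Node2 and then evaluates relative paths from that node. It relies on the JDK's XPath 1.0 engine as a stand-in for BDB XML, and the helper name fromNode2 is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelativePathDemo {
    // The sample document from this chapter.
    static final String XML =
        "<Node0>"
        + "<Node1 class=\"myValue1\">Node1 text </Node1>"
        + "<Node2>"
        + "<Node3>Node3 text</Node3>"
        + "<Node3>Node3 text 2</Node3>"
        + "<Node3>Node3 text 3</Node3>"
        + "<Node4>300</Node4>"
        + "</Node2>"
        + "</Node0>";

    // Evaluates a relative path with Node2 as the context node.
    static NodeList fromNode2(String relativePath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node2 =
                (Node) xpath.evaluate("/Node0/Node2", doc, XPathConstants.NODE);
            return (NodeList)
                xpath.evaluate(relativePath, node2, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromNode2("Node3").getLength());         // children of Node2
        System.out.println(fromNode2("..").item(0).getNodeName());  // the parent
        System.out.println(fromNode2(".").item(0).getNodeName());   // the context node
    }
}
```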

Namespaces

Natural language and, therefore, tag names can be imprecise. Two different tags can have identical names and yet hold entirely different sorts of information. Namespaces are intended to resolve any such sources of confusion.

Consider the following document:

<?xml version="1.0"?>
<definition>
    <ring>
        Jewelry that you wear.
    </ring>
    <ring>
        A sound that a telephone makes.
    </ring>
    <ring>
        A circular space for exhibitions.              
    </ring>
</definition> 

As constructed, this document makes it difficult (though not impossible) to select the node for, say, a ringing telephone.

To resolve any potential confusion in your schema or supporting code, you can introduce namespaces to your documents. For example:

<?xml version="1.0"?>
<definition>
    <jewelry:ring xmlns:jewelry="http://myDefinition.dbxml/jewelry">
        Jewelry that you wear.
    </jewelry:ring>
    <sounds:ring xmlns:sounds="http://myDefinition.dbxml/sounds">
        A sound a telephone makes.
    </sounds:ring>
    <showplaces:ring 
        xmlns:showplaces="http://myDefinition.dbxml/showplaces">
        A circular space for exhibitions.
    </showplaces:ring>
</definition> 

Now that the document has defined namespaces, you can precisely query any given node:

/definition/sounds:ring

Note

In order to perform queries against a document stored in BDB XML that makes use of namespaces, you must declare the namespace to your query. You do this using XmlQueryContext.setNamespace(). See Defining Namespaces for more information.

By identifying the namespace to which the node belongs, you are declaring a context for the query.
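The same requirement, that the query be told which URI a prefix stands for, can be illustrated with the JDK's XPath engine. This is only an analogy for what XmlQueryContext.setNamespace() does in BDB XML: here the binding is supplied through a javax.xml.namespace.NamespaceContext, and the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    static final String XML =
        "<definition>"
        + "<jewelry:ring xmlns:jewelry=\"http://myDefinition.dbxml/jewelry\">"
        + "Jewelry that you wear.</jewelry:ring>"
        + "<sounds:ring xmlns:sounds=\"http://myDefinition.dbxml/sounds\">"
        + "A sound a telephone makes.</sounds:ring>"
        + "<showplaces:ring"
        + " xmlns:showplaces=\"http://myDefinition.dbxml/showplaces\">"
        + "A circular space for exhibitions.</showplaces:ring>"
        + "</definition>";

    // Declares the "sounds" prefix to the query; no other prefix is bound.
    static final NamespaceContext NAMESPACES = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "sounds".equals(prefix)
                ? "http://myDefinition.dbxml/sounds"
                : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return null; // reverse lookup is not needed for this sketch
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    // Returns the trimmed text of the first node matching the path.
    static String textOf(String path) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed names
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(NAMESPACES);
            return xpath.evaluate(path, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("/definition/sounds:ring"));
    }
}
```

In this sketch, the unprefixed path /definition/ring matches nothing, because every ring element in the document belongs to a namespace.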

The URI used in the namespace definition is not required to actually resolve to anything. The only criterion is that it be unique within the scope of any document set(s) in which it might be used.

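Also, a namespace needs to be declared only once for the part of the document that uses it: a declaration is in scope for the element that carries it and for all of that element's descendants. Anything within that scope need only use the relevant prefix. For example, if the jewelry and showplaces namespaces were declared on the <definition> element instead, we could have added the following to our previous document: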

<jewelry:diamond>
    The centerpiece of many rings.
</jewelry:diamond>
<showplaces:diamond>
    A place where baseball is played.     
</showplaces:diamond>

Finally, namespaces can be used with attributes too. For example:

<clubMembers
    xmlns:school="http://myExampleDefinitions.dbxml/school"
    xmlns:social="http://myExampleDefinitions.dbxml/social">
    <surveyResults school:class="English" 
        number="200"/>
    <surveyResults school:class="Mathematics" 
        number="165"/>
    <surveyResults social:class="Middle" 
        number="543"/>
</clubMembers>

Once you have declared a namespace for an attribute, you can query the attribute in the following way:

/clubMembers/surveyResults/@school:class

And to retrieve the value set for the attribute:

distinct-values(/clubMembers/surveyResults/@school:class)

Wildcards

XQuery allows you to use wildcards when document elements are unknown. For example:

/Node0/*/Node6

selects all the Node6 nodes that are 3 nodes deep in the document and whose path starts with Node0. Other wildcard matches are:

  • Selects all of the element nodes in the document:

    //*

  • Selects all of the Node6 nodes that have three ancestors:

    /*/*/*/Node6

  • Selects all the nodes immediately beneath Node5:

    /Node0/Node5/*

  • Selects all of Node5's attributes:

    /Node0/Node5/@*
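The wildcard expressions above can be compared side by side by counting what each one selects. The document in this sketch is hypothetical (the chapter does not show one containing Node5 and Node6), and the JDK's XPath 1.0 engine stands in for BDB XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class WildcardDemo {
    // A made-up document with Node6 elements at two different depths.
    static final String XML =
        "<Node0>"
        + "<Node5 id=\"a\" kind=\"b\"><Node6/><Node7/></Node5>"
        + "<Node8><Node6/><Node9><Node6/></Node9></Node8>"
        + "</Node0>";

    // Counts the nodes selected by the given path.
    static int count(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] paths = {
            "/Node0/*/Node6",   // Node6 elements exactly two steps below Node0
            "//*",              // every element in the document
            "/*/*/*/Node6",     // Node6 elements with three element ancestors
            "/Node0/Node5/*",   // element children of Node5
            "/Node0/Node5/@*"   // attributes of Node5
        };
        for (String path : paths) {
            System.out.println(path + " selects " + count(path) + " node(s)");
        }
    }
}
```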

Case Insensitive Searches

It is possible to perform a case-insensitive and diacritic-insensitive match using BDB XML's built-in function, dbxml:contains(). This function takes two parameters, both strings. The first identifies the attribute or element that you want to examine, and the second provides the string you want to match.

For example, the search:

collection('myCollection.dbxml')/book[dbxml:contains(title, "Résumé")]

matches "resume", "Resume", "Resumé" and so forth.

Note that searches performed using dbxml:contains() can be backed by BDB XML's substring indexes.
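To make concrete what "case-insensitive and diacritic-insensitive" means, the following sketch folds both strings before comparing them. It is not how dbxml:contains() is implemented, which this manual does not describe. It is a plain-Java approximation that uses java.text.Normalizer, and the class and method names are illustrative.

```java
import java.text.Normalizer;
import java.util.Locale;

final class FoldedContains {
    // Removes diacritics and case differences from a string.
    static String fold(String s) {
        // NFD splits an accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining marks, then lower-case what remains.
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // True if 'text' contains 'sought' once both have been folded.
    static boolean containsFolded(String text, String sought) {
        return fold(text).contains(fold(sought));
    }

    public static void main(String[] args) {
        String sought = "R\u00e9sum\u00e9"; // "Résumé" written with escapes
        System.out.println(containsFolded("My resume", sought));        // true
        System.out.println(containsFolded("RESUM\u00c9 tips", sought)); // true
        System.out.println(containsFolded("A summary", sought));        // false
    }
}
```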

Navigation Functions

XQuery provides several functions that can be used for global navigation to a specific document or collection of documents. From the perspective of this manual, two of these are interesting because they have specific meaning within the context of BDB XML.

collection()

Within XQuery, collection() is a function that allows you to create a named sequence. From within BDB XML, however, it is also used to navigate to a specific container. In this case, you must identify to collection() the literal name of the container. You do this either by passing the container name directly to the function, or by declaring a default container name using the XmlQueryContext.setDefaultCollection() method.

Note that the container must already have been opened by the XmlManager in order for collection() to reference it. The exception is when the XmlManager was created with a configuration for which XmlManagerConfig.setAllowAutoOpen() was set to true.

For example, suppose you want to perform a query against a container named container1.dbxml. In this case, first open the container using XmlManager.openContainer() and then specify the collection() function on the query. For example:

collection("container1.dbxml")/Node0

Note that this is actually short-hand for:

collection("dbxml:/container1.dbxml")/Node0

dbxml:/ is the default base URI for BDB XML. You can change the base URI using XmlQueryContext.setBaseURI().

If you want to perform a query against multiple containers, use the union ("|") operator. For example, to query against containers c1.dbxml and c2.dbxml, you would use the following expression:

(collection("c1.dbxml") | collection("c2.dbxml"))/Node0
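When the set of containers is only known at run time, the union expression can be assembled as a string before it is handed to BDB XML. The helper below is a hypothetical convenience, not part of the BDB XML API. It assumes the container names contain no double quote characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class CollectionQuery {
    // Builds "(collection("a") | collection("b"))<path>" for the given
    // container names. Names are inserted as-is, so they must not contain
    // a double quote.
    static String acrossContainers(List<String> containers, String path) {
        String union = containers.stream()
            .map(name -> "collection(\"" + name + "\")")
            .collect(Collectors.joining(" | "));
        return "(" + union + ")" + path;
    }

    public static void main(String[] args) {
        String query =
            acrossContainers(Arrays.asList("c1.dbxml", "c2.dbxml"), "/Node0");
        System.out.println(query);
    }
}
```

Every container named in the resulting expression must still be opened first, as described above.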

See Retrieving BDB XML Documents using XQuery for more information on how to prepare and perform queries.

doc()

XQuery provides the doc() function so that you can trivially navigate to the root of a named document. doc() is required to take a URI.

To use doc() to navigate to a specific document stored in BDB XML, provide an XQuery path that uses the dbxml: base URI, and that identifies the container in which the document can be found. The actual document name that you provide is the same name that was set for the document when it was added to the container (see Adding Documents for more information).

For example, suppose you have a document named "mydoc1.xml" in container "container1.dbxml". Then to perform a query against that specific document, first open container1.dbxml and then provide a query something like this:

doc("dbxml:/container1.dbxml/mydoc1.xml")/Node0

See Retrieving BDB XML Documents using XQuery for more information on how to prepare and perform queries.

Using FLWOR with BDB XML

XQuery offers iterative and transformative capabilities through FLWOR (pronounced "flower") expressions. FLWOR is an acronym that stands for the five major clauses in a FLWOR expression: for, let, where, order by and return. Using FLWOR expressions, you can iterate over sequences (frequently result sets in BDB XML), use variables, and filter, group, and sort sequences. You can even use FLWOR to perform joins of different data sources.

For example, suppose you had documents in your container that looked like this:

<product>
    <name>Widget A</name>
    <price>0.83</price>
</product>

In this case, queries against the container for these documents return the documents in order by their document name. But suppose you wanted to see all such documents in your container, ordered by price. You can do this with a FLWOR expression:

for $i in collection("myContainer.dbxml")/product
order by $i/price descending
return $i 

Note that from within BDB XML, you must provide FLWOR expressions in a single string. Lines can be separated either by a newline ("\n") or by a space. Thus, the above expression would become:

String flwor="for $i in collection('myContainer.dbxml')/product\n";
flwor += "order by $i/price descending\n";
flwor += "return $i";
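To make the effect of the order by clause concrete, the sketch below reproduces the same iteration and sort in plain Java over a small in-memory document. It is an emulation, not a BDB XML query: the <products> wrapper element, the extra widgets and their prices, and the class name are all assumptions made for illustration. The sketch compares prices as numbers. In XQuery itself, untyped element content in an order by clause is compared as a string unless the query casts it, for example with xs:decimal($i/price).

```java
import java.io.StringReader;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class OrderByPriceDemo {
    // Stand-in for a container: several product documents in one wrapper.
    static final String XML =
        "<products>"
        + "<product><name>Widget A</name><price>0.83</price></product>"
        + "<product><name>Widget B</name><price>12.50</price></product>"
        + "<product><name>Widget C</name><price>3.10</price></product>"
        + "</products>";

    // Returns product names ordered by price, highest first.
    static List<String> namesByPriceDescending() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "for $i in .../product": iterate over every product element.
            NodeList products = (NodeList)
                xpath.evaluate("/products/product", doc, XPathConstants.NODESET);
            List<Map.Entry<String, Double>> rows = new ArrayList<>();
            for (int i = 0; i < products.getLength(); i++) {
                Node product = products.item(i);
                rows.add(new AbstractMap.SimpleEntry<>(
                    xpath.evaluate("name", product),
                    Double.parseDouble(xpath.evaluate("price", product))));
            }
            // "order by $i/price descending", with prices treated as numbers.
            rows.sort(Map.Entry.<String, Double>comparingByValue().reversed());
            // "return $i": here only the product name is kept.
            List<String> names = new ArrayList<>();
            for (Map.Entry<String, Double> row : rows) {
                names.add(row.getKey());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesByPriceDescending());
    }
}
```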