Retrieving BDB XML Documents using XQuery

The Query Context
Performing Queries

Documents are retrieved from BDB XML when they match an XQuery path expression. Queries are either performed or prepared using an XmlManager object, but the query itself usually restricts its scope to a single container or document using one of the XQuery Navigation Functions.

When you perform a query, you must provide:

  1. The XQuery expression to be used for the query, contained in a single string object.

  2. An XmlQueryContext object that identifies contextual information about the query, such as the namespaces in use and what you want for results (entire documents, or document values).

What you then receive back is a result set that is returned in the form of an XmlResults object. You iterate over this result set in order to obtain the individual documents or values returned as a result of the query.
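As a rough sketch of that iteration, the helper below drains a result set into a vector of strings. It is written as a template so that it compiles without the BDB XML headers. The assumption is that it would be instantiated with XmlResults and XmlValue, relying on XmlResults::next() and XmlValue::asString(). The name collectStrings is hypothetical and is not part of the library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of BDB XML. Results is expected to offer
// bool next(Value &), and Value is expected to offer asString(), which is
// the shape of XmlResults and XmlValue.
template <class Value, class Results>
std::vector<std::string> collectStrings(Results &results)
{
    std::vector<std::string> out;
    Value value;
    // next() fills in value and returns false once the set is exhausted.
    while (results.next(value)) {
        out.push_back(value.asString());
    }
    return out;
}
```

With the library available, a call would presumably look like collectStrings<XmlValue>(results), where results is the XmlResults object returned by the query.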

The Query Context

Context is a term that is heavily used in both BDB XML and XQuery. While overlap exists in how the term is used between the two, it is important to understand that differences exist between what BDB XML means by context and what the XQuery language means by it.

In XQuery, the context defines aspects of the query that aid in query navigation. For example, the XQuery context defines things like the namespace(s) and variables used by the query, the query's focus (which changes over the course of executing the query), and the functions and collations used by the query. Most thorough descriptions of XQuery will describe these things in detail.

In BDB XML, however, the context is a physical object (XmlQueryContext) that is used for very limited things (compared to what is meant by the XQuery context). You can use XmlQueryContext to control only part of the XQuery context. You also use XmlQueryContext to control BDB XML's behavior toward the query in ways that have no corresponding concept for XQuery contexts.

Specifically, you use XmlQueryContext to:

  • Define the namespaces to be used by the query.

  • Define any variables that might be needed for the query, although these are not the same as the variables used by XQuery FLWOR expressions (see Defining Variables).

  • Define whether the query is processed "eagerly" or "lazily" (see Defining the Evaluation Type).

Note that BDB XML also uses the XmlQueryContext to identify the query's focus as you iterate over a result set. See Examining Document Values for more information.

Defining Namespaces

In order for you to use a namespace prefix in your query, you must first declare that namespace to BDB XML. When you do this, you must identify the URI that corresponds to the prefix, and this URI must match the URI in use on your documents.

You can declare as many namespaces as are needed for your query.

To declare a namespace, use XmlQueryContext::setNamespace(). For example:

#include "DbXml.hpp"
...

using namespace DbXml;

...

// Get a manager object.
XmlManager myManager;

// Open a container
XmlContainer myContainer = 
    myManager.openContainer("exampleData.dbxml");

// Get a query context
XmlQueryContext context = myManager.createQueryContext();

// Declare a namespace
context.setNamespace("fruits", "http://groceryItem.bdbxml/fruits");
context.setNamespace("vegetables", 
                     "http://groceryItem.bdbxml/vegetables"); 

Note

If you pass an empty prefix to setNamespace(), the URI you provide is set as the default URI.
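To see why it is the URI, and not the prefix, that has to match your documents, it can help to model prefix resolution by hand. The following is a toy model in standard C++, not BDB XML code. The names PrefixMap and expandName are hypothetical, and the {uri}local notation is only a common shorthand for an expanded name.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical toy model of a prefix-to-URI table; this is not the
// XmlQueryContext implementation.
typedef std::map<std::string, std::string> PrefixMap;

// Expand "prefix:local" to "{uri}local". An unprefixed name is looked up
// under the empty prefix, which models the default namespace.
std::string expandName(const PrefixMap &ns, const std::string &qname)
{
    const std::string::size_type colon = qname.find(':');
    const std::string prefix =
        (colon == std::string::npos) ? std::string() : qname.substr(0, colon);
    const std::string local =
        (colon == std::string::npos) ? qname : qname.substr(colon + 1);

    const PrefixMap::const_iterator it = ns.find(prefix);
    if (it == ns.end()) {
        if (prefix.empty())
            return local;   // no default namespace declared
        throw std::runtime_error("undeclared prefix: " + prefix);
    }
    return "{" + it->second + "}" + local;
}
```

In this model, two different prefixes that are bound to the same URI expand to the same name, while an undeclared prefix is an error.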

Defining Variables

In XQuery FLWOR expressions, you can set variables using the let clause. In addition to this, you can use variables that are defined by BDB XML. You define these variables using XmlQueryContext::setVariableValue().

You can declare as many variables using XmlQueryContext::setVariableValue() as you need.

#include "DbXml.hpp"
...

using namespace DbXml;

...

// Get a manager object.
XmlManager myManager;

// Open a container
XmlContainer myContainer = 
    myManager.openContainer("exampleData.dbxml");

// Get a query context
XmlQueryContext context = myManager.createQueryContext();

// Declare a variable. Note that this method really wants an XmlValue
// object as the variable's argument. However, we just give it a
// string here and allow XmlValue's string constructor to create
// the XmlValue object for us.
context.setVariableValue("myVar", "Tarragon"); 

// Declare the query string
std::string myQuery = 
    "collection('exampleData.dbxml')/product[item=$myVar]";

Defining the Evaluation Type

The evaluation type defines how much work BDB XML performs as a part of the query, and how much it defers until the results are evaluated. There are two evaluation types:

  • Eager: The query is executed and its resultant values are derived and stored in-memory before the query returns. This is the default.

  • Lazy: Minimal processing is performed before the query returns, and the remaining processing is deferred until you iterate over the result set.

You use XmlQueryContext::setEvaluationType() to set a query's evaluation type. For example:

#include "DbXml.hpp"
...

using namespace DbXml;

...

// Get a manager object.
XmlManager myManager;

// Open a container
XmlContainer myContainer = 
    myManager.openContainer("exampleData.dbxml");

// Get a query context
XmlQueryContext context = myManager.createQueryContext();

// Set the evaluation type to Lazy.
context.setEvaluationType(XmlQueryContext::Lazy); 
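The difference between the two evaluation types can be illustrated with a toy model. The class below is a sketch in standard C++ and is not how BDB XML implements XmlResults. ToyResults and its Predicate type are hypothetical names, and a simple string filter stands in for a real query.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a result set; this is NOT the BDB XML XmlResults class.
class ToyResults {
public:
    typedef std::function<bool(const std::string &)> Predicate;

    ToyResults(const std::vector<std::string> &source, Predicate matches,
               bool eager)
        : source_(source), matches_(matches), eager_(eager), pos_(0)
    {
        if (eager_) {
            // Eager: do all of the matching now and keep the hits in memory.
            for (std::size_t i = 0; i < source_.size(); ++i)
                if (matches_(source_[i]))
                    hits_.push_back(source_[i]);
        }
    }

    // Hand back the next match; returns false when there are no more.
    bool next(std::string &out)
    {
        if (eager_) {
            if (pos_ >= hits_.size())
                return false;
            out = hits_[pos_++];
            return true;
        }
        // Lazy: resume scanning only as far as the next match.
        while (pos_ < source_.size()) {
            const std::string &item = source_[pos_++];
            if (matches_(item)) {
                out = item;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<std::string> source_;   // copy of the data being queried
    Predicate matches_;
    bool eager_;
    std::size_t pos_;
    std::vector<std::string> hits_;     // filled only in eager mode
};
```

In the eager case all of the matching cost is paid in the constructor, whereas in the lazy case it is spread across the calls to next(). A caller that stops iterating early therefore does less work in lazy mode.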

Performing Queries

You perform queries using an XmlManager object. When you perform a query, you can either:

  1. Perform a one-off query using XmlManager::query(). This is useful if you are performing queries that you know you will never repeat within the process scope. For example, if you are writing a command line utility to perform a query, display the results, then shut down, you may want to use this method.

  2. Perform the same query repeatedly by using XmlManager::prepare() to obtain an XmlQueryExpression object. You can then run the query repeatedly by calling XmlQueryExpression::execute().

    Creation of a query expression is fairly expensive, so any time you believe you will perform a given query more than one time, you should use this approach over the query() method.

Regardless of how you want to run your query, you must restrict the scope of your query to one or more containers, documents, or nodes. Usually you use one of the XQuery navigation functions to do this. See Navigation Functions for more information.

Note

You can configure the query to be performed lazily. If it is performed lazily, then only those portions of the document that are actually required to satisfy the query are returned in the result set immediately. All other portions of the document may then be retrieved by BDB XML as you iterate over and use the items in the result set.

If you are using node-level storage, then a lazy query may result in only the document being returned, but not its metadata, or the metadata but not the document itself. In this case, use XmlDocument::fetchAllData() to ensure that you have both the document and its metadata.

To specify laziness for the query, use DBXML_LAZY_DOCS as a flag value to either XmlManager::query() or XmlQueryExpression::execute().

Be aware that lazy docs is different from lazy evaluation. Lazy docs determines whether all document data and document metadata are returned as a result of the query. Lazy evaluation determines how much query processing is deferred until the result set is actually examined.

For example, the following executes a query against an XmlContainer using XmlManager::prepare().

#include "DbXml.hpp"
...

using namespace DbXml;

...

// Get a manager object.
XmlManager myManager;

// Open a container
XmlContainer myContainer = 
    myManager.openContainer("exampleData.dbxml");

// Get a query context
XmlQueryContext context = myManager.createQueryContext();

// Declare a namespace
context.setNamespace("fruits", "http://groceryItem.bdbxml/fruits");

// Declare the query string
std::string myQuery = 
    "collection('exampleData.dbxml')/fruits:product[item=$myVar]";

// Prepare (compile) the query
XmlQueryExpression qe = myManager.prepare(myQuery, context);

// Run the query. Note that you can perform this query many times
// without suffering the overhead of re-creating the query expression.
// Notice that the only thing we are changing is the variable value,
// which allows us to control exactly what gets returned for the query.
context.setVariableValue("myVar", "Tarragon");
XmlResults results = qe.execute(context);

// Do something with the results

context.setVariableValue("myVar", "Oranges");
results = qe.execute(context);

// Do something with the results

context.setVariableValue("myVar", "Kiwi");
results = qe.execute(context);

// Do something with the results

Finally, note that when you perform a query, by default BDB XML will read and validate the document and any attached schema or DTDs. This can cause performance problems, so to avoid it you can pass the DBXML_WELL_FORMED_ONLY flag to XmlQueryExpression::execute(). This can improve performance by causing the scanner to examine only the XML document itself, but it can also cause parsing errors if the document references information that might have come from a schema or DTD.

Metadata Based Queries

You can query for documents based on the metadata that you set for them. To do so, do the following:

  • Define a namespace for the query that uses the URI that you set for the metadata against which you will perform the query. If you did not specify a namespace for your metadata when you added it to the document, then use an empty string.

  • Perform the query using the special dbxml:metadata() function from within a predicate.

For example, suppose you placed a timestamp in your metadata using the URI 'http://dbxmlExamples/timestamp' and the attribute name 'timeStamp'. Then you can query for documents that use a specific timestamp as follows:

#include "DbXml.hpp"
...

using namespace DbXml;

...

// Get a manager object.
XmlManager myManager;

// Open a container
XmlContainer myContainer = 
    myManager.openContainer("exampleData.dbxml");
std::string col = "collection('exampleData.dbxml')";

// Get a query context
XmlQueryContext context = myManager.createQueryContext();

// Declare a namespace. The first argument, 'ts', is the
// namespace prefix and in this case it can be anything so
// long as it is not reused with another URI within the same
// query.
context.setNamespace("ts", "http://dbxmlExamples/timestamp");

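// Declare the query string. The timestamp is quoted so that it is a
// valid XQuery string literal; this assumes the metadata value was
// stored as a string.
std::string myQuery = col;
myQuery += "/*[dbxml:metadata('ts:timeStamp')='00:28:38']";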

// Prepare (compile) the query
XmlQueryExpression qe = myManager.prepare(myQuery, context);

// Run the query. 
XmlResults results = qe.execute(context, 0);
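Because a metadata query is assembled as a string, quoting mistakes are easy to make. The sketch below shows one way the query text might be built. The helpers xqueryLiteral and metadataQuery are hypothetical and not part of BDB XML, and the sketch assumes that the metadata value is compared as a string.

```cpp
#include <string>

// Hypothetical helper: wrap text in an XQuery single-quoted string literal.
// XQuery escapes an apostrophe inside such a literal by doubling it. Other
// special characters, such as '&', are not handled in this sketch.
std::string xqueryLiteral(const std::string &text)
{
    std::string out = "'";
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\'')
            out += "''";
        else
            out += text[i];
    }
    out += "'";
    return out;
}

// Hypothetical helper: build a query that selects the root elements in a
// container whose metadata item equals the given string value.
std::string metadataQuery(const std::string &container,
                          const std::string &metadataName,
                          const std::string &value)
{
    return "collection(" + xqueryLiteral(container) + ")/*[dbxml:metadata("
        + xqueryLiteral(metadataName) + ")=" + xqueryLiteral(value) + "]";
}
```

The string returned by metadataQuery("exampleData.dbxml", "ts:timeStamp", "00:28:38") could then be passed to XmlManager::prepare() together with a query context that declares the ts prefix.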