Chapter 7. Using BDB XML Indices

Table of Contents

Index Types
Uniqueness
Path Types
Node Types
Key Types
Syntax Types
Specifying Index Strategies
Specifying Index Nodes
Indexer Processing Notes
Automatic Indexes
Managing BDB XML Indices
Adding Indices
Deleting Indices
Replacing Indices
Examining Container Indices
Working with Default Indices
Looking Up Indexed Documents
Verifying Indices using Query Plans
Query Plans
Using the dbxml Shell to Examine Query Plans

BDB XML provides a robust and flexible indexing mechanism that can greatly improve the performance of your BDB XML queries. Designing your indexing strategy is one of the most important aspects of designing a BDB XML-based application.

To make the most effective use of BDB XML indices, design them for your most frequently occurring XQuery queries. BDB XML indices can be updated or deleted in place, so if your application's queries change over time, you can modify your indices to meet the application's shifting requirements.

Note

The time it takes to re-index a container is proportional to the container's size. Re-indexing a container can be an extremely expensive and time-consuming operation. If you have large containers in use in a production setting, you should not expect container re-indexing to be a routine operation.

You can define indices for both document content and for metadata. You can also define default indices that are used for portions of your documents for which no other index is defined.

When you declare an index, you must identify its type and its syntax. You do this by providing the API with a string that identifies the type and syntax for the index. See Syntax Types for information on specifying the index syntax.
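As a rough preview of what such a string can look like, the following Python sketch composes one from the pieces described in the rest of this chapter. It is an illustration only: the helper name index_strategy is hypothetical, the hyphen-separated layout (optional "unique", then path type, node type, key type, and syntax) is assumed here rather than taken from this section, and the syntax names "string" and "none" are likewise assumptions. See Specifying Index Strategies for the authoritative form.

```python
# Hypothetical helper, not part of any BDB XML API.
# Assumed layout: [unique-]{path type}-{node type}-{key type}-{syntax}
PATH_TYPES = {"node", "edge"}
NODE_TYPES = {"element", "attribute", "metadata"}
KEY_TYPES = {"equality", "presence", "substring"}


def index_strategy(path_type, node_type, key_type, syntax, unique=False):
    """Join the parts of an index declaration into one hyphenated string."""
    if path_type not in PATH_TYPES:
        raise ValueError(f"unknown path type: {path_type!r}")
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type!r}")
    if key_type not in KEY_TYPES:
        raise ValueError(f"unknown key type: {key_type!r}")
    # Metadata nodes cannot be combined with the edge path type.
    if node_type == "metadata" and path_type == "edge":
        raise ValueError("metadata nodes cannot use the edge path type")
    parts = [path_type, node_type, key_type, syntax]  # syntax is passed through as-is
    if unique:
        parts.insert(0, "unique")
    return "-".join(parts)


# "string" and "none" are assumed syntax names used only for illustration.
print(index_strategy("node", "element", "equality", "string"))
print(index_strategy("edge", "element", "presence", "none"))
```

Validating the parts before joining them catches mistakes such as a metadata index with an edge path type before the string ever reaches the API.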

Finally, by default BDB XML automatically indexes your containers, regardless of whether you have added indices yourself. You can turn this feature off if it gets in your way. See Automatic Indexes for more information.

Index Types

The index type is defined by the following four types of information:

Uniqueness

Uniqueness indicates whether the indexed value must be unique within the container. For example, you can index an attribute and declare that index to be unique. This means the value indexed for the attribute must be unique within the container.

By default, indexed values are not unique; you must explicitly declare uniqueness for your indexing strategy in order for it to be enforced.
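To make the idea of an enforced unique index concrete, here is a minimal Python model. It is not how BDB XML implements uniqueness; the class UniqueIndex, the document names, and the value "V-1001" are all hypothetical and exist only to show a second document being rejected when it repeats an indexed value.

```python
class UniqueIndex:
    """Toy index that refuses to store the same value twice."""

    def __init__(self):
        self._entries = {}  # indexed value -> name of the document holding it

    def add(self, value, doc_name):
        # A unique index rejects any value that is already present.
        if value in self._entries:
            owner = self._entries[value]
            raise ValueError(f"value {value!r} is already indexed for {owner!r}")
        self._entries[value] = doc_name


index = UniqueIndex()
index.add("V-1001", "vendor1.xml")        # hypothetical attribute value
try:
    index.add("V-1001", "vendor2.xml")    # same value in another document
except ValueError as err:
    print("rejected:", err)
```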

Path Types

If you think of an XML document as a tree of nodes, then there are two types of path elements in the tree. One type is just a node, such as an element or attribute within the document. The other type is any location in a path where two nodes meet. The path type, then, identifies the path element type that you want indexed. Path type node indicates that you want to index a single node in the path. Path type edge indicates that you want to index the portion of the path where two nodes meet.

Of these two, the BDB XML query processor prefers edge-type indices because they are more specific than node-type indices. This means that the query processor will use an edge-type index over a node-type index if both provide similar information.

Consider the following document:

<vendor type="wholesale">
    <name>TriCounty Produce</name>
    <address>309 S. Main Street</address>
    <city>Middle Town</city>
    <state>MN</state>
    <zipcode>55432</zipcode>
    <phonenumber>763 555 5761</phonenumber>
    <salesrep>
        <name>Mort Dufresne</name>
        <phonenumber>763 555 5765</phonenumber>     
    </salesrep>
</vendor>

Suppose you want to declare an index for the name node in the preceding document. In that case:

Path Type Description
node

There are two locations in the document where the name node appears. The first of these has a value of "TriCounty Produce," while the second has a value of "Mort Dufresne." The result is that the name node will require two index entries, each with a different value. Queries based on a name node may have to examine both index entries in order to satisfy the query.

edge

There are two edges in the document that involve the name node:

/vendor/name

and

salesrep/name

Indices that use this path type are more specific because queries that cross these edge boundaries only have to examine one index entry for the document instead of two.

Given this, use:

  • node path types to improve queries where there can be no overlap in the node name. That is, if the query is based on an element or attribute that appears in only one context within the document, then use node path types.

    In the preceding sample document, you would want to use node-type indices with the address, city, state, zipcode, and salesrep elements because they appear in only one context within the document.

  • edge path types to improve query performance when a node name is used in multiple contexts within the document. In the preceding document, use edge path types for the name and phonenumber elements because each appears in two contexts within the document.
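The difference between the two path types can be sketched with Python's standard xml.etree.ElementTree module. This is a conceptual model, not BDB XML's storage format: the function build_indices and the dictionary layout are assumptions of this sketch. A node entry is keyed by the node name alone, while an edge entry is keyed by the (parent, child) pair.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<vendor type="wholesale">
    <name>TriCounty Produce</name>
    <address>309 S. Main Street</address>
    <city>Middle Town</city>
    <state>MN</state>
    <zipcode>55432</zipcode>
    <phonenumber>763 555 5761</phonenumber>
    <salesrep>
        <name>Mort Dufresne</name>
        <phonenumber>763 555 5765</phonenumber>
    </salesrep>
</vendor>"""


def build_indices(root):
    """Return (node_index, edge_index) for elements that carry text."""
    node_index = defaultdict(list)   # "name" -> values
    edge_index = defaultdict(list)   # ("vendor", "name") -> values

    def visit(elem, parent):
        text = (elem.text or "").strip()
        if text:
            node_index[elem.tag].append(text)
            if parent is not None:
                edge_index[(parent.tag, elem.tag)].append(text)
        for child in elem:
            visit(child, elem)

    visit(root, None)
    return node_index, edge_index


node_index, edge_index = build_indices(ET.fromstring(DOC))
print(node_index["name"])                  # both name values share one key
print(edge_index[("vendor", "name")])      # only the vendor's own name
print(edge_index[("salesrep", "name")])    # only the sales rep's name
```

A lookup by node name has to consider both name values, whereas a lookup by edge lands on a single value, which is the sense in which edge indices are more specific.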

Node Types

BDB XML can index three types of nodes: element, attribute, or metadata. The metadata node type is used for indices set on a document's metadata content.

Element and Attribute Nodes

Element and attribute nodes are only found in document content. In the following document:

<vendor type="wholesale">
    <name>TriCounty Produce</name>     
</vendor> 

vendor and name are element nodes, while type is an attribute node.

Use the element node type to improve queries that test the value of an element node. Use the attribute node type to improve any query that examines an attribute or attribute value.
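The distinction can be checked with a few lines of Python using the standard xml.etree.ElementTree module. This sketch only inspects the sample document; it does not touch BDB XML, and the variable names are this example's own.

```python
import xml.etree.ElementTree as ET

DOC = """<vendor type="wholesale">
    <name>TriCounty Produce</name>
</vendor>"""

root = ET.fromstring(DOC)

# Element nodes: every tag in the document, in document order.
element_nodes = [elem.tag for elem in root.iter()]

# Attribute nodes: every attribute name on any element.
attribute_nodes = [attr for elem in root.iter() for attr in elem.attrib]

print(element_nodes)
print(attribute_nodes)

# A test on an element value (helped by an element index) ...
print(root.find("name").text == "TriCounty Produce")
# ... versus a test on an attribute value (helped by an attribute index).
print(root.get("type") == "wholesale")
```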

Metadata Nodes

Metadata nodes are found only in a document's metadata content. Indices on metadata nodes improve the performance of queries that select documents based on metadata information. If you declare an index on a metadata node, you cannot use a path type of edge.

Key Types

The key type identifies what sort of test the index supports. You can use one of three key types:

Key Type Description
equality

Improves the performance of tests that look for nodes with a specific value.

presence

Improves the performance of tests that look for the existence of a node, regardless of its value.

substring

Improves the performance of tests that look for a node whose value contains a given substring. This key type is best used when your queries use the XQuery contains() substring function.
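The three key types can be pictured as three different lookup structures. The Python sketch below is a toy model under stated assumptions: the document names are hypothetical, and indexing every three-character window for substring lookups is this sketch's own choice, not a description of BDB XML internals.

```python
from collections import defaultdict

# Hypothetical container: document name -> value of its name node (None if absent).
NAMES = {
    "vendor1.xml": "TriCounty Produce",
    "vendor2.xml": "Mort Dufresne",
    "vendor3.xml": None,
}

presence = set()                # documents that have the node at all
equality = defaultdict(set)     # exact value -> documents
substring = defaultdict(set)    # 3-character window -> documents

for doc, value in NAMES.items():
    if value is None:
        continue
    presence.add(doc)
    equality[value].add(doc)
    for i in range(len(value) - 2):
        substring[value[i:i + 3]].add(doc)


def contains(fragment):
    """Return the documents whose value contains fragment."""
    if len(fragment) < 3:
        # Too short for the window index: fall back to scanning.
        return {doc for doc in presence if fragment in NAMES[doc]}
    grams = [fragment[i:i + 3] for i in range(len(fragment) - 2)]
    candidates = set.intersection(*(substring.get(g, set()) for g in grams))
    # Confirm each candidate, since shared windows do not guarantee a match.
    return {doc for doc in candidates if fragment in NAMES[doc]}


print(sorted(presence))
print(equality["Mort Dufresne"])
print(contains("County"))
```

Each structure answers only its own kind of test cheaply, which is why the key type should match the comparison your queries actually perform.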