Specifying Index Nodes


It is possible to have BDB XML build indices at a node granularity rather than a document granularity. The difference is that document granularity is good for retrieving large documents while node granularity is good for retrieving nodes from within documents.

Node indexing is available only if your containers use node-level storage. Consider using node indices if you have a few large documents stored in your containers and your queries are intended to retrieve subsections of those documents. Otherwise, use document-level indices.

Node indices can actually harm your application's performance, depending on the read/write activity on your containers. Expect to experiment with your indexing strategy to find out whether node or document indices work best for you.

Node indices contain a little more information, so they may take more space on disk and could also potentially take longer to write. For example, consider the following document:

<names>
    <name>joe</name>
    <name>joe</name>
    <name>fred</name>
</names>

If you are using document-level indexing, then there is one index entry for each unique value occurring in the document for a given index. So if you have a string index on the name element, the above document would result in two index entries: one for joe and another for fred.

However, for node-level indices, there is one index entry for each node regardless of whether its value is unique. Therefore, given a string index on the name element, the above document would result in three index entries.

Given this, imagine that the document in use had 1000 name elements, 500 of which contained joe and 500 of which contained fred. For document-level indexing, you would still only have two index entries, while for node-level indexing you would have 1000 index entries per stored document. Whether the considerably larger size of the node-level index is worthwhile is something that you would have to evaluate based on the number of documents you are storing and the nature of your query patterns.
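The arithmetic above can be sketched in a few lines of C++. This is only a simplified model of how many entries each granularity produces for a single index on a single document; it does not use the BDB XML API, and the function names (documentLevelEntries, nodeLevelEntries, makeNames) are hypothetical helpers introduced for illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified model, not the BDB XML implementation.
// Each string stands for the value of one indexed <name> node in a document.

// Document-level granularity: one entry per unique value in the document.
std::size_t documentLevelEntries(const std::vector<std::string>& values) {
    std::set<std::string> unique(values.begin(), values.end());
    return unique.size();
}

// Node-level granularity: one entry per indexed node, duplicates included.
std::size_t nodeLevelEntries(const std::vector<std::string>& values) {
    return values.size();
}

// Hypothetical helper: builds the values of a document holding the given
// number of "joe" nodes followed by the given number of "fred" nodes.
std::vector<std::string> makeNames(std::size_t joes, std::size_t freds) {
    std::vector<std::string> values(joes, std::string("joe"));
    values.insert(values.end(), freds, std::string("fred"));
    return values;
}
```

Under this model, makeNames(500, 500) gives 2 entries from documentLevelEntries() and 1000 from nodeLevelEntries(), matching the figures above. Multiplying each count by the number of stored documents gives a rough sense of how the two index sizes diverge as a container grows.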

Note that by default, containers of type NodeContainer use node-level indices, and containers of type WholedocContainer use document-level indices. You can change the default indexing strategy for a container by passing XmlContainerConfig::On (for node-level indices) or XmlContainerConfig::Off (for document-level indices) to XmlContainerConfig::setIndexNodes().

You can tell whether a container is using node-level indices by calling the XmlContainer::getIndexNodes() method. If the container is creating node-level indices, this method returns true.

You can switch between node-level indices and document-level indices using the XmlManager::reindexContainer() method. In the configuration you supply to this method, pass XmlContainerConfig::On to XmlContainerConfig::setIndexNodes() to cause the container to use node-level indices, or pass XmlContainerConfig::Off to cause it to use document-level indices. Note that this method causes your container to be completely re-indexed. Therefore, for containers holding large amounts of data, large numbers of indices, or both, this method should not be used routinely, as it may take some time to write the new indices.
