It is possible to have BDB XML build indices at a node granularity rather than a document granularity. The difference is that document granularity is good for retrieving large documents while node granularity is good for retrieving nodes from within documents.
Nodes can only be indexed if your containers use node-level storage. You should consider using node indices if you have a few large documents stored in your containers and you will be performing queries intended to retrieve subsections of those documents. Otherwise, you should use document-level indices.
Because node indices can actually be harmful to your application's performance, depending on the actual read/write activity on your containers, expect to experiment with your indexing strategy to find out whether node or document indexes work best for you.
Node indices contain a little more information, so they may take more space on disk and could also potentially take longer to write. For example, consider the following document:
<names>
    <name>joe</name>
    <name>joe</name>
    <name>fred</name>
</names>
If you are using document-level indexing, then there is one index entry for each unique value occurring in the document for a given index. So if you have a string index on the name element, the above document would result in two index entries: one for joe and another for fred.
However, for node-level indices, there is one index entry for each node, regardless of whether its value is unique. Therefore, given a string index on the name element, the above document would result in three index entries.
Given this, imagine that the document in use had 1000 name elements, 500 of which contained joe and 500 of which contained fred. For document-level indexing, you would still only have two index entries, while for node-level indexing you would have 1000 index entries per stored document. Whether the considerably larger size of the node-level index is worthwhile is something that you would have to evaluate based on the number of documents you are storing and the nature of your query patterns.
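The entry counts described above can be sketched with a small, self-contained C++ model. This is not BDB XML code and it makes no claim about how the library stores its indices internally. The type NameValues and the functions documentLevelEntries(), nodeLevelEntries(), makeSmallExample() and makeLargeExample() are hypothetical names introduced only for illustration. The model assumes a single string index on the name element and reduces a document to the list of values held by its name elements.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not part of the BDB XML API: a document is reduced
// to the string values of its <name> elements, in document order.
using NameValues = std::vector<std::string>;

// Document-level indexing: one entry per unique value in the document.
std::size_t documentLevelEntries(const NameValues &names) {
    return std::set<std::string>(names.begin(), names.end()).size();
}

// Node-level indexing: one entry per node, duplicates included.
std::size_t nodeLevelEntries(const NameValues &names) {
    return names.size();
}

// The three-element document: joe, joe, fred.
NameValues makeSmallExample() {
    return NameValues{std::string("joe"), std::string("joe"),
                      std::string("fred")};
}

// The larger document: 500 "joe" values followed by 500 "fred" values.
NameValues makeLargeExample() {
    NameValues names(500, std::string("joe"));
    names.insert(names.end(), 500, std::string("fred"));
    return names;
}
```

Under these assumptions, the three-element document yields 2 document-level entries and 3 node-level entries, and the larger document yields 2 and 1000 respectively. The node-level count grows with the number of nodes, while the document-level count grows only with the number of distinct values.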
Note that by default, containers of type NodeContainer use node-level indices. Containers of type WholedocContainer use document-level indices by default. You can change the default indexing strategy for a container by calling XmlContainerConfig::setIndexNodes() with XmlContainerConfig::On (for node-level indices) or XmlContainerConfig::Off (for document-level indices).
You can tell whether a container is using node-level indices using the XmlContainer::getIndexNodes() method. If the container is creating node-level indices, this method will return true.
You can switch between node-level indices and document-level indices using the XmlManager::reindexContainer() method. Pass XmlContainerConfig::On to XmlContainerConfig::setIndexNodes() to cause the container to use node-level indices, or XmlContainerConfig::Off to cause it to use document-level indices. Note that this method causes your container to be completely re-indexed. Therefore, for containers holding large amounts of data, large numbers of indices, or both, this method should not be used routinely, as it may take some time to write the new indices.