Chapter 1. Introduction to Berkeley DB XML

Table of Contents

Architecture
Document Storage

Berkeley DB XML (BDB XML) is a programmatic toolkit specifically designed to store and manage XML data in its native format. BDB XML is built on top of the existing Berkeley DB database product, which provides fast, reliable, scalable, and mission-critical database support. Application developers can choose the version of Berkeley DB that is most suitable for a particular application: Berkeley DB Data Store, Berkeley DB Concurrent Data Store, Berkeley DB Transactional Data Store, or Berkeley DB High Availability.

BDB XML provides document query support through XQuery 1.0 and, by extension, XPath 2.0. XQuery is a W3C Recommendation. BDB XML also supports the XQuery Update Facility for modification of document content. In addition, BDB XML is tested against the XQuery Test Suite, version 1.0, and results have been published to the W3C.

This document provides a very high-level introduction to BDB XML. Users of BDB XML are assumed to have existing knowledge of XML, XQuery, XPath, either C++ or Java, and Berkeley DB.

This document also provides instructions on how to build the library, and instructions on how to compile and link the library with your application.

For a brief tour of Berkeley DB XML, see Introduction to Berkeley DB XML. For a complete introduction to BDB XML, see either the C++ or Java version of the Berkeley DB XML Getting Started Guide. For a complete description of the BDB XML API, see either the C++ API Reference or the Javadoc.

Architecture

Berkeley DB XML is implemented as a C++ library on top of Berkeley DB. BDB XML is distributed as a shared library that is embedded into the client application. The BDB XML library exposes APIs that enable C++ and Java applications to interact with the XML data containers. Figure 1 illustrates the Berkeley DB XML system architecture.

BDB XML uses Berkeley DB for data storage and transaction management. Client applications can also store data directly to a Berkeley DB database. Although BDB XML hides much of the internal use of Berkeley DB, some understanding of the underlying Berkeley DB API is required, because some BDB XML API methods accept Berkeley DB object handles as parameters. In particular, transactional applications need to fully understand the Berkeley DB database management interfaces for operations such as backup and restore, archiving, and database recovery.

The BDB XML library comprises several main components: document storage, XML indexing and index management, query optimization, and query execution.

Document Storage

Within Berkeley DB XML, documents are stored in containers. A container is a named file that holds a number of Berkeley DB databases for information such as documents, indices and index statistics, the data dictionary, and other system metadata. A container is the scope for indices, document names, container type, and other container-specific information. A client application can operate on multiple containers concurrently, and controls the placement of documents within containers. The client application can also store data to Berkeley DB databases. A client application can perform the following actions against a container:

  • Create or remove a container.

  • Add or drop an index in a container.

  • Open a container for use within the application.

  • Insert or delete a document in a container.

  • Retrieve a document from a container.

  • Update an existing document entirely or in part.

  • Set, modify, or remove document metadata.

  • Query a container using an XQuery or XPath expression.

  • Close a container.

  • Rename documents and containers.

  • Dump a container to a text file.

  • Load a container from a text file that was generated by a container dump.

  • Verify that a container is internally consistent.

For a complete description and examples of how to use the BDB XML API to perform these tasks, see either the C++ or Java version of the Berkeley DB XML Getting Started Guide.
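
To make the idea of a container as the scope for document names more concrete, the following is a minimal sketch of a toy in-memory model written in standard C++. It is not the BDB XML API: the names ToyDocument, ToyContainer, and demoContainerScope, and every method shown, are hypothetical and exist only for illustration. The sketch assumes that a document is just a string with name/value metadata, and it does not parse, index, query, or persist XML. It mimics a few of the actions listed above (insert, retrieve, update, set metadata, rename, delete).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a stored document: its content plus
// name/value metadata. Not a BDB XML type.
struct ToyDocument {
    std::string content;
    std::map<std::string, std::string> metadata;
};

// Hypothetical in-memory model of a container. It only mimics the idea
// that a container is the scope for document names.
class ToyContainer {
public:
    // Insert a document. Returns false if the name is already in use
    // within this container.
    bool insertDocument(const std::string& name, const std::string& content) {
        return docs_.emplace(name, ToyDocument{content, {}}).second;
    }

    // Retrieve a document, or nullptr if no document has that name.
    const ToyDocument* getDocument(const std::string& name) const {
        auto it = docs_.find(name);
        return it == docs_.end() ? nullptr : &it->second;
    }

    // Replace the content of an existing document; metadata is kept.
    bool updateDocument(const std::string& name, const std::string& content) {
        auto it = docs_.find(name);
        if (it == docs_.end()) return false;
        it->second.content = content;
        return true;
    }

    // Set or overwrite one metadata item on an existing document.
    bool setMetadata(const std::string& name, const std::string& key,
                     const std::string& value) {
        auto it = docs_.find(name);
        if (it == docs_.end()) return false;
        it->second.metadata[key] = value;
        return true;
    }

    // Rename a document. Fails if the source is missing or the target
    // name is already in use.
    bool renameDocument(const std::string& from, const std::string& to) {
        auto it = docs_.find(from);
        if (it == docs_.end() || docs_.count(to) != 0) return false;
        ToyDocument doc = std::move(it->second);
        docs_.erase(it);
        docs_.emplace(to, std::move(doc));
        return true;
    }

    // Delete a document. Returns false if it did not exist.
    bool deleteDocument(const std::string& name) {
        return docs_.erase(name) == 1;
    }

    std::size_t size() const { return docs_.size(); }

private:
    std::map<std::string, ToyDocument> docs_;
};

// Exercises the model: two containers can each hold a "doc1" because
// document names are scoped to a single container.
void demoContainerScope() {
    ToyContainer orders, invoices;
    const bool first = orders.insertDocument("doc1", "<order id='1'/>");
    const bool other = invoices.insertDocument("doc1", "<invoice id='7'/>");
    const bool duplicate = orders.insertDocument("doc1", "<order id='2'/>");
    assert(first && other && !duplicate);

    const bool tagged = orders.setMetadata("doc1", "status", "open");
    const bool renamed = orders.renameDocument("doc1", "order-1");
    assert(tagged && renamed);

    // The rename affects only the container it was applied to, and the
    // metadata travels with the document.
    assert(orders.getDocument("doc1") == nullptr);
    assert(orders.getDocument("order-1")->metadata.at("status") == "open");
    assert(invoices.getDocument("doc1")->content == "<invoice id='7'/>");
}
```

The boolean return values are a design choice of this sketch only, and the real library may report failures differently. For the actual classes and method signatures, consult the Getting Started Guide and the API reference.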