Upgrading Berkeley DB XML 1.2.X applications to Berkeley DB XML 2.0

New and Changed Features in 2.0
Migrating Berkeley DB XML C++ Applications
Migrating Berkeley DB XML Java Applications
Migrating Berkeley DB XML Data to 2.0

If you are new to Berkeley DB XML or are already using a 2.x release, you can skip this migration documentation. Berkeley DB XML Release 2.0 is a significant upgrade from earlier Berkeley DB XML releases. This document discusses the changes and how applications written for earlier releases can be upgraded to 2.0. There are sections for both C++ and Java. The Java section discusses only a few Java-specific issues, so Java users should read the C++ section as well. It is assumed that the reader is familiar with the 1.2.X release of Berkeley DB XML. While reading this document, it will be helpful to refer to the C++ API reference or the Javadoc.

Features that are new in Berkeley DB XML 2.0 are not discussed, except as they affect code changes between 1.2.X and 2.0. This document is not intended to be an introduction to Berkeley DB XML Release 2.0. For a complete introduction to BDB XML, see either the C++ or Java version of the Berkeley DB XML Getting Started Guide. For a complete description of the BDB XML API, see either the C++ API reference or the Javadoc.

New and Changed Features in 2.0

The number of new features added in Berkeley DB XML Release 2.0 has led to a new API, as well as a new database format. Applications must change in order to take advantage of the features of 2.0. The major features and changes visible to an application include:

  • XQuery support

  • Introduction of an XmlManager object to manage XmlContainers, and act as a factory object for operation contexts, including queries

  • Addition of a node storage container type

  • XmlDocument metadata handling improvements

  • Flexible interfaces for getting XML documents in and out of Berkeley DB XML

  • Extended indexing, including additional types, metadata indices, and unique indices

  • General API improvements

A good way to become familiar with the new interface is to examine some of the 2.0 example programs. The next sections of this document use before/after code comparisons for common Berkeley DB XML operations to demonstrate how a 1.2.X program must change to work with 2.0.

Migrating Berkeley DB XML C++ Applications

XmlManager

The XmlManager is a new object in 2.0. It serves as a factory for many Berkeley DB XML objects and provides the context for operations such as queries. Some of the common operations on XmlManager are:

  • XmlManager::createContainer()

  • XmlManager::openContainer()

  • XmlManager::createTransaction()

  • XmlManager::query()

Many of the operations that were previously methods on XmlContainer are now methods on XmlManager.

XmlContainer Management

The following is a comparison of 1.2.1 and 2.0 code to create an XmlContainer and insert a new document:

// Create a container, insert a document
// Do not use environment or transactions
//
// In 1.2.1
    XmlContainer container(0, "test.dbxml");
    container.open(0, DB_CREATE|DB_EXCL, 0);
    XmlDocument doc;
    doc.setContent("<root>newdoc</root>");
    container.putDocument(0, doc, 0);
    container.close(0);
//
// In 2.0
    XmlManager mgr;
    XmlContainer container = mgr.createContainer("test.dbxml");
    // createContainer and openContainer return opened containers
    XmlUpdateContext uc = mgr.createUpdateContext();
    container.putDocument("doc1", "<root>newdoc</root>", uc);
    // container and manager are closed when objects go out of scope

The points to notice are:

  • XmlManager is a factory object

  • 2.0 requires an XmlUpdateContext object for all modifications. This was happening under the covers in 1.2.X.

  • 2.0 does not require creation of an XmlDocument object in order to insert content. It is still an option.

  • 2.0 no longer exposes numeric document IDs. It requires names for documents, and the names must be unique within a container. The DBXML_GEN_NAME flag can be used to tell the system to generate a unique name if names are not important to an application.

  • Valid XmlContainer objects are implicitly opened when created.

  • Object scoping is used for automatic cleanup and exception safety. Internally, the 2.0 XmlManager and XmlContainer objects are reference counted, and are closed when the last reference is released.

  • XmlManager has an openContainer() method that must be used to open existing containers. It can also be used to create new containers.
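
The reference-counting behavior noted above can be illustrated without the Berkeley DB XML library. The following is a minimal sketch that uses std::shared_ptr; the Handle class, its Impl member, and the closesAfterCopyReleased() function are hypothetical stand-ins and are not part of the Berkeley DB XML API. It only shows the general pattern: copies of a handle share one underlying resource, which is released once, when the last copy goes out of scope.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a reference-counted handle such as
// XmlContainer. This is NOT the Berkeley DB XML implementation.
class Handle {
public:
    // closeCount is incremented when the shared resource is released.
    Handle(const std::string &name, int &closeCount)
        : impl_(std::make_shared<Impl>(name, closeCount)) {}

    const std::string &name() const { return impl_->name; }

private:
    struct Impl {
        Impl(const std::string &n, int &c) : name(n), closeCount(c) {}
        ~Impl() { ++closeCount; }  // the "close" happens exactly once
        std::string name;
        int &closeCount;
    };
    std::shared_ptr<Impl> impl_;  // all copies of a Handle share one Impl
};

// Copies a handle into an inner scope, lets the copy go out of scope,
// and returns the number of closes seen while the original is alive.
int closesAfterCopyReleased(int &closeCount) {
    Handle outer("test.dbxml", closeCount);
    {
        Handle copy = outer;  // second reference to the same resource
        (void)copy;
    }
    return closeCount;  // outer still holds a reference at this point
}
```

In this sketch, releasing the copy does not close anything; the close is recorded only after the function returns and the original handle is destroyed.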

XmlManager and Berkeley DB DbEnv

In the 1.2.X API, the XmlContainer constructor takes a DbEnv * parameter which is used if a DbEnv is required. In 2.0, the DbEnv (Berkeley DB environment) is associated with the XmlManager object. In 1.2.X, the DbEnv, if provided, is managed externally to Berkeley DB XML. In 2.0, the DBXML_ADOPT_DBENV flag can optionally be passed to the XmlManager constructor. If the DbEnv is adopted, it is owned by the XmlManager object, and is closed when the XmlManager destructor runs:

DbEnv *env = new DbEnv(0);
env->open("path", DB_INIT_MPOOL|DB_CREATE, 0);
XmlManager mgr(env, DBXML_ADOPT_DBENV);
// XmlManager will close and delete the DbEnv
// object when it goes out of scope 

Queries

The addition of the XmlManager object and the introduction of the XQuery query language in 2.0 change how queries are performed in two ways:

  1. The query language is XQuery, and no longer XPath 1.0. Most XPath 1.0 queries are valid in XQuery, although XQuery usually requires some additional syntax.

  2. The query context for 2.0 is the XmlManager object, and is not constrained to a specific XmlContainer. A single XQuery can reference more than one container, and even reference specific documents by name.

The following code compares a simple query in 1.2.X and 2.0:

// Assume an open container and XmlManager
// Assume container name is "test.dbxml"       
// Do not use environment or transactions
//
// In 1.2.1
       XmlResults results(container.queryWithXPath(0, "/vendor", 0));
       XmlValue value;
       while (results.next(value)) {
           // do something
       }      
//
// In 2.0
       // XmlQueryContext is required
       XmlQueryContext qc = mgr.createQueryContext();
       XmlResults results = 
            mgr.query("collection('test.dbxml')/vendor", qc);
       XmlValue value;
       while (results.next(value)) {
           // do something
       } 

The points to notice are:

  • XmlQueryContext is required in 2.0. In 1.2.X, one was created internally when the argument was defaulted.

  • The 2.0 query requires the string "collection('test.dbxml')" to point to a specific container. There are a number of ways to control query context in 2.0, both in the XQuery expression itself, and through the query interfaces.
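
When many 1.2.X query strings need the same treatment, the prefixing can be done mechanically. The following is a minimal sketch of such a helper; toCollectionQuery() is a hypothetical name and is not part of the Berkeley DB XML API. It assumes the 1.2.X expression is an absolute path and that the container name contains no single quote. Anything more complex should be reviewed by hand.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical migration helper (not part of the Berkeley DB XML API).
// Prefixes a 1.2.X-style absolute XPath expression with a collection()
// call so that the query names its container explicitly.
std::string toCollectionQuery(const std::string &containerName,
                              const std::string &xpath) {
    // Only absolute paths ("/vendor", "//item") can be prefixed this simply.
    if (xpath.empty() || xpath[0] != '/')
        throw std::invalid_argument("expected an absolute XPath expression");
    // A single quote in the name would end the string literal early.
    if (containerName.find('\'') != std::string::npos)
        throw std::invalid_argument("container name contains a quote");
    return "collection('" + containerName + "')" + xpath;
}
```

For example, toCollectionQuery("test.dbxml", "/vendor") produces the query string used in the 2.0 code above.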

Transactions

2.0 introduces a new object, XmlTransaction, which wraps the Berkeley DB DbTxn object and aids in internal transaction management. Rather than using an optional DbTxn * argument to a single interface, 2.0 defines two separate interfaces for each operation that may be transacted. One takes an XmlTransaction & argument, and the other does not. The following code compares 1.2.X and 2.0 code that performs a simple, transacted operation:

// Create a container, insert a document
// Use environment and transactions
// Assume DbEnv* has been constructed as dbEnv;
//
// In 1.2.1
       DbEnv *dbEnv;
       ...
       XmlContainer container(dbEnv, "test.dbxml");
       DbTxn *txn;
       dbEnv->txn_begin(0, &txn, 0);
       container.open(txn, DB_CREATE|DB_EXCL, 0);
       txn->commit(0);
       // new transaction for insert
       DbTxn *txn1;
       dbEnv->txn_begin(0, &txn1, 0);
       XmlDocument doc;
       doc.setContent("<root>newdoc</root>");
       container.putDocument(txn1, doc, 0);
       txn1->commit(0);
       ...
       container.close(0);
       dbEnv->close(0);
//
// In 2.0
       DbEnv *dbEnv;
       ...
       XmlManager mgr(dbEnv, DBXML_ADOPT_DBENV); // adopt env
       // create a transacted container
       XmlContainer container =
              mgr.createContainer("test.dbxml", DBXML_TRANSACTIONAL);
       // createContainer and openContainer return opened containers
       XmlTransaction txn = mgr.createTransaction();
       XmlUpdateContext uc = mgr.createUpdateContext();
       container.putDocument(txn, "doc1", "<root>newdoc</root>", uc);
       txn.commit(); 

The points to notice are:

  • 2.0 adds a DBXML_TRANSACTIONAL flag that can be passed to createContainer() and openContainer() to avoid the necessity of creating and committing a transaction for this purpose.

  • The DBXML_ADOPT_DBENV flag simplifies cleanup in 2.0.

  • The use of XmlTransaction and XmlManager::createTransaction() allows an application to ignore Berkeley DB objects for most operations.

  • There is an XmlManager::createTransaction() method that takes a DbTxn * argument, allowing a DbTxn to be wrapped.

Migrating Berkeley DB XML Java Applications

The Java interface to Berkeley DB XML is similar to the C++ interface in spirit. In addition to the changes to the interface due to functional changes in Berkeley DB XML, the Java interface has changed to be more compatible with the Berkeley DB Java interface.

XmlManager and Environment

The Berkeley DB Java interface replaces the DbEnv object with an Environment object. It also replaces the DbTxn object with Transaction. The interface also replaces the use of integer flags with configuration objects. Berkeley DB XML has adopted this mechanism as well.

Configuration Object

There are three new configuration objects in Berkeley DB XML, replacing the corresponding use of flags:

  1. XmlManagerConfig

    Use this object to configure a new XmlManager object.

  2. XmlContainerConfig

    Use this object to configure XmlContainer objects. A default XmlContainerConfig object can be set on an XmlManager object that affects all containers it creates and opens. This object extends DatabaseConfig, and inherits state, such as encryption, read isolation level, threading configuration, and read-only.

  3. XmlDocumentConfig

    Use this object to configure XmlDocument-level state, such as DBXML_LAZY_DOCS or DBXML_GEN_NAME (C++ flags).

Delete, GC and Object Life Cycle

Because Java objects in Berkeley DB XML are wrappers for native (C++) objects, the Java VM is not aware of memory consumed by the native objects. Therefore, garbage collection of these objects may not happen in a timely manner, if at all. This can result in out-of-memory conditions outside the Java VM's control.

For this reason, it is necessary to explicitly delete most Berkeley DB XML Java objects when they are no longer required. This is especially true for the XmlResults, XmlValue, and XmlDocument objects, of which there can be many. All of these objects include delete() methods for this purpose.

Migrating Berkeley DB XML Data to 2.0

The database format is new in Berkeley DB XML release 2.0, and there is no upgrade utility at this time. If it is not possible to reload data from external files, it is possible to write a small, custom application to dump 1.2.X data, and load it into 2.0. The Berkeley DB XML dbxml_dump and dbxml_load programs will not work for this purpose.

Information Necessary for Load into 2.0

Migrating data is best thought about in terms of what information is needed to load into 2.0. A load comprises the following operations:

  1. Create a container. Choose a name, and type of container (node storage vs whole document storage — a new feature in 2.0).

  2. Specify indices. The same indices from a 1.2.X container will work; however, 2.0 introduces a number of new options and index types that can be used.

  3. Load XML documents. 2.0 requires names, and 1.2.X XmlDocument objects have numeric IDs, not names. The numeric IDs can serve as names, or the system can generate unique names, using the DBXML_GEN_NAME flag.

  4. Load XML document metadata. 2.0 has changed and extended the handling of metadata, including metadata indices. Also, in 2.0, metadata is no longer part of the document.
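
For step 3, if the 1.2.X numeric IDs are to serve as 2.0 document names, they need to be turned into strings. The following is a minimal sketch; docNameFromId() and the "doc_" prefix are hypothetical choices, not part of the Berkeley DB XML API, and the sketch assumes the ID fits in a 32-bit unsigned integer. IDs are zero-padded to ten digits so that the names sort in the same order as the original IDs.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the Berkeley DB XML API).
// Derives a document name from a numeric document ID. The ID is
// zero-padded to 10 digits, the width of the largest 32-bit value,
// so that lexical order of names matches numeric order of IDs.
std::string docNameFromId(std::uint32_t id,
                          const std::string &prefix = "doc_") {
    const std::string digits = std::to_string(id);
    return prefix + std::string(10 - digits.size(), '0') + digits;
}
```

Because distinct IDs always produce distinct names, the result satisfies the requirement that names be unique within a container, provided no other documents in the container use the same prefix.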

Information to Dump from 1.2.X

The remaining task is to decide how to dump a 1.2.X container so that the information above is available:

  1. The XmlContainer name and type are an application choice, based on expected usage. If the application performed well with 1.2.X, then using whole document storage may be preferred.

  2. XmlIndexSpecification information can be extracted from a 1.2.X container.

  3. XML documents can be dumped to local files, for reloading.

  4. Obtaining 1.2.X metadata is more difficult. In this case, the application needs to know that the metadata exists, and acquire it and dump it to a format that can be used in the load step.
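
One possible intermediate format for the dumped metadata is a text file with one item per line. The following is a minimal sketch of such a format; the MetaDataRecord struct, its field names, and the tab-separated layout are all assumptions made for illustration, and none of them come from Berkeley DB XML. The dump program would write lines with formatRecord(), and the load program would read them back with parseRecord() before setting the metadata through the 2.0 API.

```cpp
#include <initializer_list>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record for one dumped metadata item. The field names are
// illustrative and do not correspond to Berkeley DB XML types.
struct MetaDataRecord {
    std::string docName;  // name the document will have in 2.0
    std::string uri;      // metadata namespace URI
    std::string name;     // metadata item name
    std::string value;    // metadata value, as text
};

// Writes one record as a tab-separated line. Fields containing a tab or
// newline are rejected rather than escaped, to keep the sketch short.
std::string formatRecord(const MetaDataRecord &r) {
    for (const std::string *f : {&r.docName, &r.uri, &r.name, &r.value})
        if (f->find_first_of("\t\n") != std::string::npos)
            throw std::invalid_argument("field contains tab or newline");
    return r.docName + '\t' + r.uri + '\t' + r.name + '\t' + r.value;
}

// Parses a line produced by formatRecord().
MetaDataRecord parseRecord(const std::string &line) {
    std::istringstream in(line);
    MetaDataRecord r;
    if (!std::getline(in, r.docName, '\t') ||
        !std::getline(in, r.uri, '\t') ||
        !std::getline(in, r.name, '\t'))
        throw std::invalid_argument("malformed metadata line");
    std::getline(in, r.value);  // rest of the line; may be empty
    return r;
}
```

A real dump would also need to record the value type of each metadata item if the application stores anything other than strings; that is left out of this sketch.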