Compressing XML Documents

Turning Compression Off
Using Custom Compression

By default, all documents stored in BDB XML whole document containers are compressed when they are stored and uncompressed when they are retrieved. This adds a small amount of overhead to document storage and retrieval, but it saves disk space.

Note that only documents are compressed; metadata and indexes are not compressed.

You can turn compression off, or you can implement your own custom compression routine.

Note that whatever compression you use when you initially add documents to your container must be used for the lifetime of the container. You cannot, for example, turn compression off for some documents in the container and leave it on for others. You also cannot use more than one compression technique for the container.

Turning Compression Off

You turn compression off by passing XmlContainerConfig::NO_COMPRESSION to the XmlContainerConfig::setCompressionName() method. Note that you must do this every time you open the container; otherwise, an XmlException is thrown when you attempt to retrieve a document from the container.

For example:

try {
    // Set the container type as WholedocContainer and turn off
    // compression
    XmlContainerConfig contConf;
    contConf.setAllowCreate(true);
    contConf.setContainerType(XmlContainer::WholedocContainer);
    contConf.setCompressionName(XmlContainerConfig::NO_COMPRESSION);

    // Open container

    // mgr is the XmlManager, opened at some point prior to this
    // code fragment.
    XmlContainer cont = mgr.openContainer("container.dbxml", contConf);

    // From here you store and retrieve documents exactly in the same
    // way as you always would.  
} catch (XmlException &e) {
    // If you are turning off compression for a container that has
    // already stored compressed documents, BDB XML will not notice
    // until you try to retrieve a document that is compressed.
}

Using Custom Compression

You can implement a custom compression routine for use with your BDB XML whole document containers. When you do this, you must register the compression routine when you create and open your container, and you must use the same compression routine for all subsequent uses of the container.

You create a custom compression routine by providing an implementation of XmlCompression. You must implement methods that both compress and decompress your documents. Each of these methods must return true on success and false on failure.
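The return-value contract described above can be tried out without BDB XML. The following is a toy sketch only, under stated assumptions: std::string stands in for XmlData, the XmlTransaction parameter is omitted, and the function names rleCompress and rleDecompress are hypothetical and not part of the BDB XML API. It uses simple run-length encoding to show a compress step that reports success and a decompress step that returns false when its input is malformed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical run-length encoder. Each run of identical bytes is
// written to dest as a (count, byte) pair, with count in 1..255.
// Always succeeds, so it always returns true.
bool rleCompress(const std::string &source, std::string &dest)
{
    dest.clear();
    std::size_t i = 0;
    while (i < source.size()) {
        std::size_t run = 1;
        while (i + run < source.size() &&
               source[i + run] == source[i] && run < 255)
            run++;
        dest.push_back(static_cast<char>(static_cast<unsigned char>(run)));
        dest.push_back(source[i]);
        i += run;
    }
    return true;
}

// Hypothetical run-length decoder. Returns false, instead of guessing,
// when the input cannot have been produced by rleCompress(): an odd
// number of bytes or a zero-length run.
bool rleDecompress(const std::string &source, std::string &dest)
{
    dest.clear();
    if (source.size() % 2 != 0)
        return false;
    for (std::size_t i = 0; i < source.size(); i += 2) {
        std::size_t run = static_cast<unsigned char>(source[i]);
        if (run == 0)
            return false;
        dest.append(run, source[i + 1]);
    }
    return true;
}
```

In an actual XmlCompression subclass, the same true or false result would be returned from compress() and decompress(), with the input and output carried by the XmlData parameters rather than by strings.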

The following is the class definition for a custom compression routine:

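#include <iostream>
#include "dbxml/DbXml.hpp"

using namespace std;
using namespace DbXml;

// Custom compression routine for whole document containers
class MyCompression : public XmlCompression
{
public:
    // Both methods return true on success and false on failure
    bool compress(XmlTransaction &txn,
                  const XmlData &source,
                  XmlData &dest);
    bool decompress(XmlTransaction &txn,
                    const XmlData &source,
                    XmlData &dest);
};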

A true custom compression implementation is beyond the scope of this manual, but the following example implementation simulates compression by reversing the order of the bytes in the document (an inverse permutation). Notice that these member methods do not perform actual container activity; rather, they operate on the data found in the source XmlData parameter, and store the results in the destination XmlData parameter.

bool MyCompression::compress(XmlTransaction &txn,
                             const XmlData &source,
                             XmlData &dest)
{
    try {
        // Get the data to compress
        const char *pSrc = (const char *)source.get_data();
        size_t size = source.get_size();

        // Use inverse permutation (byte reversal) to simulate the
        // compression process
        dest.reserve(size);
        char *buf = (char *)dest.get_data();
        for (size_t i = 0; i < size; i++)
            buf[i] = pSrc[size - 1 - i];
        dest.set_size(size);
    } catch (XmlException &xe) {
        cout << "XmlException: " << xe.what() << endl;
        return false;
    }
    return true;
}

bool MyCompression::decompress(XmlTransaction &txn,
                               const XmlData &source,
                               XmlData &dest)
{
    try {
        // Get the data to decompress
        const char *pSrc = (const char *)source.get_data();
        size_t size = source.get_size();

        // Use inverse permutation (byte reversal) to simulate the
        // decompression process
        dest.reserve(size);
        char *buf = (char *)dest.get_data();
        for (size_t i = 0; i < size; i++)
            buf[i] = pSrc[size - 1 - i];
        dest.set_size(size);
    } catch (XmlException &xe) {
        cout << "XmlException: " << xe.what() << endl;
        return false;
    }
    return true;
}
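Because decompress() must exactly undo compress(), it can be worth checking the round trip on plain buffers before registering a compression class with BDB XML. The following is a minimal sketch, assuming std::string as a stand-in for XmlData; the helper names reverseBytes and roundTripsCleanly are hypothetical and are not part of the BDB XML API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the loop used by both methods above:
// copies the bytes of source into dest in reverse order.
bool reverseBytes(const std::string &source, std::string &dest)
{
    const std::size_t size = source.size();
    dest.resize(size);
    for (std::size_t i = 0; i < size; i++)
        dest[i] = source[size - 1 - i];
    return true;
}

// Returns true if reversing the reversed form of original gives back
// exactly the original bytes, that is, if the "decompress" step
// undoes the "compress" step.
bool roundTripsCleanly(const std::string &original)
{
    std::string packed;
    std::string unpacked;
    if (!reverseBytes(original, packed))
        return false;
    if (!reverseBytes(packed, unpacked))
        return false;
    return unpacked == original;
}
```

For example, roundTripsCleanly("<root><a></a></root>") returns true, because reversing the bytes twice restores the original document. A routine that fails such a check would return documents that differ from the ones stored.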

To use this class implementation, you register it with BDB XML, giving it a unique name as you do so. You then set that compression name on the container's configuration before opening the container. All other container operations are performed as normal.

void useCompression(XmlManager& mgr,
                    const string& containerName,
                    XmlUpdateContext& uc,
                    XmlCompression& myCompression)
{
    string docName = "doc1.xml";
    string content = "<root><a></a></root>";

    // Set up the document
    XmlDocument xdoc1 = mgr.createDocument();
    xdoc1.setName(docName);
    xdoc1.setContent(content);

    // Define a unique name to use for registering the compression
    string compressionName = "myCompression";

    // Register custom class
    mgr.registerCompression(compressionName.c_str(), myCompression);

    // Set the container type as WholedocContainer
    // and use the custom compression
    XmlContainerConfig contConf;
    contConf.setAllowCreate(true);
    contConf.setContainerType(XmlContainer::WholedocContainer);
    contConf.setCompressionName(compressionName.c_str());

    // Create container
    XmlContainer cont = mgr.createContainer(containerName, contConf);

    // Put Document
    cont.putDocument(xdoc1, uc);

    // Get the Document
    string content1;
    cont.getDocument(docName).getContent(content1);
}