Compressing XML Documents

By default, all documents stored in BDB XML whole document containers are compressed when they are stored and uncompressed when they are retrieved. This adds a small amount of overhead to document storage and retrieval, but it saves disk space.
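To get a rough feel for the space side of this trade-off, the following sketch measures how many bytes a sample document occupies after zlib compression. It uses the JDK's java.util.zip.Deflater purely as a stand-in; it does not call BDB XML, and it makes no claim about which codec BDB XML uses by default. The class name, method name, and sample document are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper, not part of the BDB XML API.
final class CompressionSavings {

    // Returns the number of bytes the input occupies after zlib compression.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();

        byte[] buf = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end(); // release native zlib memory
        return total;
    }

    public static void main(String[] args) {
        // Build a repetitive sample document; real documents will differ.
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 200; i++) {
            sb.append("<item><name>widget</name><price>10</price></item>");
        }
        sb.append("</root>");

        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println("raw bytes:      " + raw.length);
        System.out.println("deflated bytes: " + deflatedSize(raw));
    }
}
```

Highly repetitive markup like this sample compresses far better than typical documents, so treat the numbers as an upper bound on the saving rather than a prediction.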

Note that only documents are compressed; metadata and indexes are not compressed.

You can turn compression off, or you can implement your own custom compression routine.

Note that whatever compression you use when you initially add documents to your container must be used for the lifetime of the container. You cannot, for example, turn compression off for some documents in the container and leave it on for others. You also cannot use more than one compression technique for the container.

Turning Compression Off

You turn compression off by passing XmlContainerConfig.NO_COMPRESSION to the XmlContainerConfig.setCompression() method. You must do this every time you open the container; otherwise, an XmlException is thrown when you attempt to retrieve a document from the container.

For example:

try {
    // Set the container type as WholedocContainer and turn off
    // compression
    XmlContainerConfig contConf = new XmlContainerConfig();
    contConf.setAllowCreate(true);
    contConf.setContainerType(XmlContainer.WholedocContainer);
    contConf.setCompression(XmlContainerConfig.NO_COMPRESSION);

    // Open container

    // mgr is the XmlManager, opened at some point prior to this
    // code fragment.
    XmlContainer cont = mgr.openContainer("container.dbxml", contConf);

    // From here you store and retrieve documents exactly in the same
    // way as you always would.
} catch (XmlException e) {
    // If you are turning off compression for a container that has
    // already stored compressed documents, BDB XML will not notice
    // until you try to retrieve a document that is compressed.
}
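The reason a mismatch only surfaces at retrieval time can be illustrated without a container at all. The sketch below is an analogy built on JDK classes only: it shows that bytes written one way (compressed or raw) cannot be read back the other way. The class and method names are hypothetical, zlib is used as a stand-in codec, and how BDB XML itself reports the problem is as described above, not as shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

// Hypothetical demo class; it uses only JDK classes and never touches
// a BDB XML container.
final class CompressionMismatchDemo {

    // Compress bytes with zlib (stand-in for "documents stored compressed").
    static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Returns true only if the bytes inflate as a complete zlib stream.
    static boolean inflatesCleanly(byte[] stored) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    return false; // truncated or otherwise unusable stream
                }
            }
            return true;
        } catch (DataFormatException e) {
            return false; // bytes were never zlib-compressed
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<root><a></a></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println("compressed bytes readable: "
            + inflatesCleanly(deflate(xml)));
        System.out.println("raw bytes readable:        "
            + inflatesCleanly(xml));
    }
}
```

Nothing goes wrong when the mismatched setting is chosen; the failure appears only when stored bytes are actually interpreted, which is why the rule about a single compression setting per container matters.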

Using Custom Compression

You can implement a custom compression routine for use with your BDB XML whole document containers. When you do this, you must register the compression routine when you create and open your container, and you must use that same compression routine every time you subsequently use the container.

You create a custom compression routine by providing an implementation of XmlCompression. You must implement methods that both compress and decompress your documents. Each of these methods must return true on success and false on failure.

Notice that these member methods do not perform actual container activity; rather, they operate on the data found in the source XmlData parameter, and store the results in the destination XmlData parameter.

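import com.sleepycat.dbxml.XmlCompression;
import com.sleepycat.dbxml.XmlData;
import com.sleepycat.dbxml.XmlTransaction;

class MyCompression extends XmlCompression {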

    public boolean compress(XmlTransaction txn, 
        XmlData source, XmlData dest) {

        try {
            // Get the data to compress
            byte[] src = source.get_data();

            // Use JDK's ZLIB compress 
            java.util.zip.Deflater compressor = 
                new java.util.zip.Deflater();
            compressor.setInput(src);
            compressor.finish();

            java.io.ByteArrayOutputStream bos = 
                new java.io.ByteArrayOutputStream(src.length);

            // Compress the data
            byte[] buf = new byte[1024];
            while (!compressor.finished()) {
                int count = compressor.deflate(buf);
                bos.write(buf, 0, count);
            }
            bos.close();

            byte[] data = bos.toByteArray();

            // Set the compressed data
            dest.set(data);

        } catch (Exception e) {
            // If any exception, return false
            return false;
        }

        // Successful return true
        return true;
    }

    public boolean decompress(XmlTransaction txn, 
        XmlData source, XmlData dest) {

        try {
            // Get the data to decompress
            byte[] src = source.get_data();

            // Use JDK's ZLIB decompress
            java.util.zip.Inflater decompressor = 
                new java.util.zip.Inflater();
            decompressor.setInput(src);

            java.io.ByteArrayOutputStream bos = 
                new java.io.ByteArrayOutputStream(src.length);

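            // Decompress the data
            byte[] buf = new byte[1024];
            while (!decompressor.finished()) {
                int count = decompressor.inflate(buf);
                if (count == 0 && !decompressor.finished()
                    && (decompressor.needsInput()
                        || decompressor.needsDictionary())) {
                    // Truncated or unusable input: fail rather than
                    // loop forever
                    return false;
                }
                bos.write(buf, 0, count);
            }
            bos.close();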

            byte[] data = bos.toByteArray();

            // Set the decompressed data
            dest.set(data);

        } catch (Exception e) {
            // If any exception, return false
            return false;
        }

        // Successful return true
        return true;
    }
} 
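Because the compress and decompress methods only transform bytes, their logic can be checked before any container is involved. The following sketch mirrors the zlib calls used in MyCompression but works on plain byte arrays instead of XmlData, so it runs with the JDK alone. The class RoundTripCheck and its methods are hypothetical and are not part of BDB XML.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical test harness; it mirrors the zlib logic of MyCompression
// but operates on plain byte arrays.
final class RoundTripCheck {

    static byte[] deflate(byte[] src) {
        Deflater compressor = new Deflater();
        compressor.setInput(src);
        compressor.finish();

        ByteArrayOutputStream bos = new ByteArrayOutputStream(src.length);
        byte[] buf = new byte[1024];
        while (!compressor.finished()) {
            bos.write(buf, 0, compressor.deflate(buf));
        }
        compressor.end(); // release native zlib memory
        return bos.toByteArray();
    }

    static byte[] inflate(byte[] src) {
        Inflater decompressor = new Inflater();
        decompressor.setInput(src);

        ByteArrayOutputStream bos = new ByteArrayOutputStream(src.length);
        byte[] buf = new byte[1024];
        try {
            while (!decompressor.finished()) {
                int count = decompressor.inflate(buf);
                if (count == 0 && !decompressor.finished()
                        && (decompressor.needsInput()
                            || decompressor.needsDictionary())) {
                    throw new IllegalArgumentException(
                        "truncated or unsupported zlib stream");
                }
                bos.write(buf, 0, count);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a zlib stream", e);
        } finally {
            decompressor.end();
        }
        return bos.toByteArray();
    }

    // True if decompressing the compressed form yields the original bytes.
    static boolean roundTrips(byte[] original) {
        return Arrays.equals(original, inflate(deflate(original)));
    }

    public static void main(String[] args) {
        byte[] doc = "<root><a></a></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip preserved content: " + roundTrips(doc));
    }
}
```

A check like this is worth running against any custom routine, since a compress method that cannot be exactly reversed would leave documents unreadable once they are stored.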

To use this class implementation, you register it with BDB XML, giving it a unique name as you do so. You then set that compression name on the container's configuration before creating or opening the container. All other container operations are performed as normal.

...

        String containerName = "compressionContainer.dbxml";
        String docName = "doc1.xml";
        String content = "<root><a></a></root>";
        String compressionName = "myCompression";

        try {

            XmlManager mgr = new XmlManager();

            // Register user's compression object into XmlManager
            mgr.registerCompression(compressionName, new MyCompression());

            XmlContainerConfig containerConfig = new XmlContainerConfig();
            containerConfig.setContainerType(
                                        XmlContainer.WholedocContainer);

            // Set XmlContainerConfig custom compression
            containerConfig.setCompression(compressionName);

            XmlContainer cont =
                mgr.createContainer(containerName, containerConfig);

            // Put a document in
            XmlDocument doc = mgr.createDocument();
            doc.setName(docName);
            doc.setContent(content);
            cont.putDocument(doc);

            // Get the content
            System.out.println("Content of the document: "+
                cont.getDocument(docName).getContentAsString());

            // Clean up
            cont.delete();
            mgr.delete();

        } catch (XmlException e) {
            System.out.println("Exception: " + e.getMessage());
        }