BLOB support

The BLOB threshold
Creating BLOBs
BLOB access
BLOB storage
BLOBs and Replication

Binary Large Object (BLOB) support is designed for efficient storage of large objects. An object is considered large if it is more than a third of the size of a page. Without BLOB support, large objects must be broken into smaller pieces when they are written, and then reassembled every time the record is read or updated. Berkeley DB BLOB support avoids this disassembly and reassembly by storing the large object in a special directory set aside for the purpose. The data itself is not kept in the database, nor is it placed into the in-memory cache.

BLOBs can only be stored using the data portion of a key/data pair. They are supported only for Btree, Hash, and Heap databases, and only so long as the database is not configured for checksums, encryption, duplicate records, or duplicate sorted records. In addition, the DBT that you use to access the BLOB data cannot be configured as a partial DBT if you want to access the data using the BLOB's streaming interface (introduced below).

Note that if the environment is transactionally protected, then all access to the BLOB is also transactionally protected.

The BLOB threshold

The BLOB threshold is a non-negative integer, in bytes, that indicates how large an object must be before it is considered a BLOB. By default, the BLOB threshold for any given database is 0, which means that no object is ever considered a BLOB. As a result, the BLOB feature is not used by default for Berkeley DB databases.

In order to use the BLOB feature, you must set the BLOB threshold to a non-zero, positive integer value. You do this for a given database using the DB->set_blob_threshold() method. Note that this value must be set before you create the database. At any point after database creation time, this method is ignored.

In addition, if you are using an environment, you can change the default threshold for databases created in that environment to something other than 0 by using the DB_ENV->set_blob_threshold() method.

You can retrieve the BLOB threshold set for a database using the DB->get_blob_threshold() method. You can retrieve the default BLOB threshold set for your environment using the DB_ENV->get_blob_threshold() method.
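
The threshold rules in this section can be sketched as a small model in plain C. This is not Berkeley DB code: struct db_model, model_new() and model_set_threshold() are hypothetical names used only for illustration. The sketch also assumes that a new database simply starts from the environment default, and from 0 when there is none.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a database handle; not a Berkeley DB type. */
struct db_model {
    uint32_t blob_threshold; /* In bytes; 0 means BLOBs are disabled. */
    bool created;            /* True once the database has been created. */
};

/* A new handle starts from the environment default (0 if none was set). */
static struct db_model model_new(uint32_t env_default)
{
    struct db_model db = { env_default, false };
    return db;
}

/* Mirrors the rule above: the setting is ignored after creation. */
static void model_set_threshold(struct db_model *db, uint32_t bytes)
{
    assert(db != NULL);
    if (!db->created)
        db->blob_threshold = bytes;
}
```

In this model, calling model_set_threshold() on a handle whose created field is true leaves the threshold unchanged, which is the behavior described above for a database that already exists.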

Creating BLOBs

There are two ways to create a BLOB. Before you can use either mechanism, you must set the BLOB threshold to a non-zero positive integer value (see the previous section for details). Once the BLOB threshold has been set, you create a BLOB using one of the two following mechanisms:

  • Configure the DBT used to access the BLOB data (that is, the DBT used for the data portion of the record) with the DB_DBT_BLOB flag. This causes the data to be stored as a BLOB regardless of its size, so long as the database otherwise supports BLOBs.

  • Alternatively, creating a data item with a size greater than the BLOB threshold will cause that data item to be automatically stored as a BLOB.
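
The two mechanisms, together with the restrictions listed at the start of this chapter, amount to a simple decision rule. The following is a minimal sketch of that rule in plain C. It is a model for illustration only, not Berkeley DB code: struct db_config, supports_blobs() and stored_as_blob() are hypothetical names. It follows the wording above, so a data item must be strictly greater than the threshold; how the library treats an item exactly equal to the threshold is not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access method tags; not Berkeley DB constants. */
enum access_method { AM_BTREE, AM_HASH, AM_HEAP, AM_QUEUE, AM_RECNO };

/* Hypothetical summary of the settings that affect BLOB eligibility. */
struct db_config {
    enum access_method method;
    bool checksums;
    bool encryption;
    bool duplicates;         /* Covers sorted duplicates as well. */
    uint32_t blob_threshold; /* In bytes; 0 disables BLOBs. */
};

/* True if the database configuration allows BLOBs at all. */
static bool supports_blobs(const struct db_config *c)
{
    bool method_ok = c->method == AM_BTREE || c->method == AM_HASH ||
        c->method == AM_HEAP;
    return method_ok && !c->checksums && !c->encryption && !c->duplicates;
}

/*
 * True if a data item of the given size would be stored as a BLOB.
 * dbt_blob_flag stands for the DB_DBT_BLOB flag on the data DBT.
 */
static bool stored_as_blob(const struct db_config *c, uint32_t size,
    bool dbt_blob_flag)
{
    if (!supports_blobs(c) || c->blob_threshold == 0)
        return false;
    return dbt_blob_flag || size > c->blob_threshold;
}
```

With a Btree configuration and a threshold of 1000 bytes, a 10-byte item is stored as a BLOB in this model only when dbt_blob_flag is set, while a 5000-byte item is stored as a BLOB either way.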

BLOB access

BLOBs may be accessed in the same way as other DBT data, so long as the data itself will fit into memory. More likely, you will find it necessary to use the BLOB streaming API to read and write BLOB data. You open a BLOB stream using the DBC->db_stream() method, close it with the DB_STREAM->close() method, write to it using the DB_STREAM->write() method, and read it using the DB_STREAM->read() method.

The following example code fragment can be found in your DB distribution at .../db/examples/c/ex_blob.c.

...
    /* Some necessary variable declarations */
    DBC *dbc;       /* Cursor handle */
    DB_ENV *dbenv;  /* Environment handle */
    DB *dbp;        /* Database handle */
    DB_STREAM *dbs; /* Stream handle */
    DB_TXN *txn;    /* Transaction handle */
    DBT data, key;  /* DBT handles */
    int ret;
    db_off_t size;

    ...

    /* Environment creation skipped for brevity's sake */

    ...

    /* Enable blob files and set the size threshold. */
    if ((ret = dbenv->set_blob_threshold(dbenv, 1000, 0)) != 0) {
        dbenv->err(dbenv, ret, "set_blob_threshold");
        goto err;
    }

    ...

    /* Database and DBT creation skipped for brevity's sake */

    ...

    /* Access the BLOB using the DB_STREAM API. */
    if ((ret = dbenv->txn_begin(dbenv, NULL, &txn, 0)) != 0){
        dbenv->err(dbenv, ret, "txn");
        goto err;
    }

    if ((ret = dbp->cursor(dbp, txn, &dbc, 0)) != 0) {
        dbenv->err(dbenv, ret, "cursor");
        goto err;
    }

    /*
     * Set the cursor to a blob.  Use DB_DBT_PARTIAL with
     * dlen == 0 to avoid getting any blob data.
     */
    data.flags = DB_DBT_USERMEM | DB_DBT_PARTIAL;
    data.dlen = 0;
    if ((ret = dbc->get(dbc, &key, &data, DB_FIRST)) != 0) {
        dbenv->err(dbenv, ret, "Not a blob");
        goto err;
    }
    data.flags = DB_DBT_USERMEM;

    /* Create a stream on the blob the cursor points to. */
    if ((ret = dbc->db_stream(dbc, &dbs, DB_STREAM_WRITE)) != 0) {
        dbenv->err(dbenv, ret, "Creating stream.");
        goto err;
    }

    /* Get the size of the blob. */
    if ((ret = dbs->size(dbs, &size, 0)) != 0) {
        dbenv->err(dbenv, ret, "Stream size.");
        goto err;
    }
    /* Read the whole blob, starting at offset 0. */
    if ((ret = dbs->read(dbs, &data, 0, (u_int32_t)size, 0)) != 0) {
        dbenv->err(dbenv, ret, "Stream read.");
        goto err;
    }
    /* Write the data back at offset size/2, increasing the blob's size. */
    if ((ret = dbs->write(dbs, &data, size/2, 0)) != 0) {
        dbenv->err(dbenv, ret, "Stream write.");
        goto err;
    }
    /* Close the stream. */
    if ((ret = dbs->close(dbs, 0)) != 0) {
        dbenv->err(dbenv, ret, "Stream close.");
        goto err;
    }
    dbs = NULL;
    dbc->close(dbc);
    dbc = NULL;
    txn->commit(txn, 0);
    txn = NULL;
    free(data.data);
    data.data = NULL; 

    ...

    /* Handle clean up skipped. */ 

BLOB storage

BLOBs are not stored in the normal database files on disk in the way that other data managed by DB is. Instead, they are stored as binary files in a special directory set aside for the purpose.

If you are not using environments, this special BLOB directory is created relative to the current working directory from which your application is running. You can modify this default location using the DB->set_blob_dir() method, and retrieve the current BLOB directory using DB->get_blob_dir().

If you are using an environment, then by default the BLOB directory is created within the environment's home directory. You can change this default location using DB_ENV->set_blob_dir() and retrieve the current default location using DB_ENV->get_blob_dir(). (Note that DB_ENV->get_blob_dir() can successfully retrieve the BLOB directory only if DB_ENV->set_blob_dir() was previously called.)

Note that because BLOBs are stored outside of the Berkeley DB database files, they are not confined by the four gigabyte limit that applies to Berkeley DB key and data items. The BLOB size limit is system dependent. It is the maximum value, in bytes, of a signed 32-bit integer (if the Berkeley DB-defined type db_off_t is four bytes in size), or of a signed 64-bit integer (if db_off_t is eight bytes in size).
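
As a worked illustration of the limit described above, the following sketch maps the width of db_off_t to the corresponding maximum BLOB size. It is plain C with no Berkeley DB dependency, and max_blob_bytes() is a hypothetical helper name, not a library function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the largest BLOB size, in bytes, for a db_off_t of the given
 * width, or -1 if the width is neither four nor eight bytes.
 */
static int64_t max_blob_bytes(size_t off_t_size)
{
    switch (off_t_size) {
    case 4:
        return INT32_MAX; /* 2^31 - 1 bytes */
    case 8:
        return INT64_MAX; /* 2^63 - 1 bytes */
    default:
        return -1;
    }
}
```

In an application that includes the Berkeley DB header, the argument would be sizeof(db_off_t).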

BLOBs and Replication

Replication supports BLOBs without any special requirements. However, enabling BLOBs in a replicated environment can result in long synchronization times between the client and master sites. To avoid this, execute a transaction checkpoint after updating or deleting one or more BLOB records.