Chapter 2. Database Environments

Table of Contents

Opening Database Environments
Multiple Environments
Multiple Environment Subdirectories
Configuring a Shared Cache for Multiple Environments
Closing Database Environments
Environment Properties
The EnvironmentConfig Class
EnvironmentMutableConfig
Environment Statistics
Database Environment Management Example

Regardless of whether you are using the DPL or the base API, you must use a database environment. Database environments encapsulate one or more databases. This encapsulation provides your threads with efficient access to your databases by allowing a single in-memory cache to be used for each of the databases contained in the environment. This encapsulation also allows you to group operations performed against multiple databases inside a single transaction (see the Berkeley DB, Java Edition Getting Started with Transaction Processing guide for more information).

If you are using the base API, most commonly you use database environments to create and open databases (you close individual databases using the individual database handles). You can also use environments to delete and rename databases. For transactional applications, you use the environment to start transactions. For non-transactional applications, you use the environment to sync your in-memory cache to disk.

If you are using the DPL, all of these things are still being done, but the DPL takes care of it for you. Under the DPL, the most common thing you will explicitly use an environment for is to obtain transaction handles.

Regardless of the API that you use, you also use the database environment for administrative and configuration activities related to your database log files and the in-memory cache. See Administering Berkeley DB Java Edition Applications for more information.

To find out how to use environments with a transaction-protected application, see the Berkeley DB, Java Edition Getting Started with Transaction Processing guide.

Opening Database Environments

You open a database environment by instantiating an Environment object. You must provide to the constructor the name of the on-disk directory where the environment is to reside. This directory location must exist or the open will fail.

By default, the environment is not created for you if it does not exist. To have it created, set the creation property to true by calling EnvironmentConfig.setAllowCreate(true). For example:

package je.gettingStarted;
    
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

import java.io.File;

...

// Open the environment. Allow it to be created if it does not 
// already exist.
Environment myDbEnvironment = null;

try {
    EnvironmentConfig envConfig = new EnvironmentConfig();
    envConfig.setAllowCreate(true);
    myDbEnvironment = new Environment(new File("/export/dbEnv"), 
                                      envConfig);
} catch (DatabaseException dbe) {
    // Exception handling goes here
} 

Opening an environment usually causes some background threads to be started. JE uses these threads for log file cleaning and some administrative tasks. However, these threads are started only once per process, so if you open the same environment more than once from within the same process, there is no performance impact on your application. Also, if you open the environment as read-only, then the background threads (with the exception of the evictor thread) are not started.

Note that opening your environment causes normal recovery to be run. This causes your databases to be brought into a consistent state relative to the changed data found in your log files. See Databases and Log Files for more information.

Multiple Environments

Most JE applications only need a single database environment because any number of databases can be created in a single environment, and the total size of the data in an environment is not limited. That said, your application can open and use as many environments as you have disk and memory to manage. Also, you can instantiate multiple Environment objects for the same physical environment.

The main reason for multiple environments is that an application must manage multiple unique data sets. By placing each data set in a separate environment, the application can gain real advantages in both data manageability and application performance. Each environment creates and maintains its own set of log files in its own directory, so you can manipulate the log files for each data set separately. That is, you can:

  • Back up, restore or delete a single data set separately by copying or removing the files for its environment.

  • Balance the load between machines by moving the files for a single data set from one machine to another.

  • Improve I/O performance by placing each data set on a separate physical disk.

  • Delete individual data sets very efficiently by removing the environment's log files. This is much more efficient than deleting individual database records and is also more efficient than removing databases, and so can be a real benefit if you are managing large temporary data sets that must be frequently deleted.

Be aware that there is a downside to using multiple environments. In particular, understand that a single transaction cannot include changes made in more than one environment. If you need to perform a set of operations in more than one data set atomically (with a single transaction), use a single environment and distinguish the data sets using some other method.

For example, an application running a hosted service for multiple clients may wish to keep each client's data set separate. You can do this with multiple environments, but then you cannot operate on all data sets atomically. If you need to wrap operations for multiple data sets in a single transaction, consider some other approach to keeping the data sets separate.

You can, for example, distinguish each data set using a unique key range within a single database. Or you can create a secondary key that identifies the data set. Or you could use a separate database for each data set. All of these approaches allow you to maintain multiple distinct data sets within a single environment, but each adds a level of complexity to your code over what is required to simply use a unique environment for each data set.
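The key range approach can be sketched as follows. This is only an illustration under stated assumptions: the class DataSetKeys, the "dataSetId/recordKey" key layout, and the helper names are hypothetical, and a TreeMap ordered by unsigned byte comparison stands in for a JE database with its default key ordering. In a real application the byte arrays would be wrapped in DatabaseEntry objects and the scan would be driven by a cursor.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: keeps several data sets apart inside one sorted key space.
final class DataSetKeys {

    // Assumed key layout: "<dataSetId>/<recordKey>" as UTF-8 bytes.
    // The data set ID must not itself contain the '/' separator.
    static byte[] key(String dataSetId, String recordKey) {
        return (dataSetId + "/" + recordKey).getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for a database: keys sorted by unsigned byte-wise comparison.
    static TreeMap<byte[], String> newStore() {
        Comparator<byte[]> order = Arrays::compareUnsigned;
        return new TreeMap<>(order);
    }

    // Collect the values of every record whose key begins with "<dataSetId>/".
    static List<String> scan(TreeMap<byte[], String> store, String dataSetId) {
        byte[] prefix = key(dataSetId, "");
        List<String> values = new ArrayList<>();
        // Start at the first key >= prefix and stop once the prefix no longer matches.
        for (Map.Entry<byte[], String> e : store.tailMap(prefix, true).entrySet()) {
            if (!startsWith(e.getKey(), prefix)) {
                break;
            }
            values.add(e.getValue());
        }
        return values;
    }

    private static boolean startsWith(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(key, prefix.length), prefix);
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> store = newStore();
        store.put(key("clientA", "1"), "a1");
        store.put(key("clientB", "1"), "b1");
        store.put(key("clientA", "2"), "a2");
        System.out.println(scan(store, "clientA"));
    }
}
```

Because every key for a data set shares the same prefix, the records for that data set are contiguous in the sort order, and updates to several data sets can still be grouped in one transaction since they all live in one environment.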

Multiple Environment Subdirectories

You can spread your JE environment across multiple subdirectories. This allows you to improve data throughput by spreading disk I/O across multiple disks or filesystems. Environment subdirectories reside in the environment home directory and are named data001/ through dataNNN/, consecutively, where NNN is the number of subdirectories that you want to use. Typically, each of the dataNNN/ names is a symbolic link to an actual directory that resides on a separate filesystem or disk. Alternatively, each subdirectory can be a mount point for a filesystem that resides on a different disk drive.

You control the number of subdirectories you want to use with the je.log.nDataDirectories property in the je.properties file. This value must be set prior to opening the environment, and the subdirectories must already exist at that time. The value set for this property cannot change over the course of the environment's lifetime; if it does, an exception is thrown when you attempt to open the environment.

The default value for je.log.nDataDirectories is 0, and this means no subdirectories are in use for the environment. A value greater than 0 indicates the number of subdirectories to use, and that number of subdirectories must exist prior to opening the environment.

For example, if you set je.log.nDataDirectories to 3, then the first time you open the environment (and for every environment open after that) your environment home directory must contain three subdirectories named data001, data002 and data003. This causes your JE log files (the *.jdb files) to be spread evenly across those three subdirectories. Finally, if you change the value of je.log.nDataDirectories without first completely deleting your environment, then your application will throw exceptions when you open your environment.
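The preparation step described above could be scripted roughly as follows. This is a minimal sketch, not part of the JE API: the class DataDirSetup and its prepare method are hypothetical names, and the code creates plain directories, whereas in production each dataNNN entry would more likely be a symbolic link or mount point set up by an administrator. It relies only on the directory naming and the je.log.nDataDirectories property described in this section.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper: run once, before the environment is opened for the first time.
final class DataDirSetup {

    // Creates data001..dataNNN under envHome and records the count in je.properties.
    static List<String> prepare(File envHome, int nDirs) {
        if (nDirs < 1) {
            throw new IllegalArgumentException("nDirs must be at least 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= nDirs; i++) {
            String name = String.format(Locale.ROOT, "data%03d", i);
            File dir = new File(envHome, name);
            if (!dir.isDirectory() && !dir.mkdirs()) {
                throw new UncheckedIOException(new IOException("Cannot create " + dir));
            }
            names.add(name);
        }

        File propsFile = new File(envHome, "je.properties");
        Properties props = new Properties();
        try {
            // Keep any settings already present in je.properties.
            if (propsFile.isFile()) {
                try (InputStream in = new FileInputStream(propsFile)) {
                    props.load(in);
                }
            }
            props.setProperty("je.log.nDataDirectories", Integer.toString(nDirs));
            try (OutputStream out = new FileOutputStream(propsFile)) {
                props.store(out, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // The environment home is taken from the command line; no default is assumed.
        System.out.println(prepare(new File(args[0]), 3));
    }
}
```

Note that Properties.store() rewrites the whole file, so comments in an existing je.properties are lost. Because the value cannot change during the environment's lifetime, a helper like this is only appropriate for a brand-new environment.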

Configuring a Shared Cache for Multiple Environments

By default, each distinct JE environment has a separate, private in-memory cache. If a single JVM process will keep multiple environments open at the same time, it is strongly recommended that all such environments be configured to use a shared cache. A shared cache makes much more efficient use of memory than separate private caches.

For example, imagine that you open 5 environments in a single process and a total of 500 MB of memory is available for caching. Using private caches, you could configure each cache to be 100 MB. If one of the environments has a larger active data set than the others, it will not be able to take advantage of unused memory in the other environment caches. By using a shared cache, multiple open environments will make better use of memory because the cache LRU algorithm is applied across all information in all environments sharing the cache.
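The arithmetic behind this example can be made concrete with a deliberately simplified model. The class CacheModel and the working-set sizes below are hypothetical, and the model ignores cache overhead and the details of LRU eviction. It only compares how much active data fits when each environment is capped at its own limit versus when all environments draw from one pool.

```java
// Hypothetical, simplified model of private versus shared cache capacity.
final class CacheModel {

    // With private caches, each environment can cache at most its own limit,
    // no matter how much memory the other environments leave unused.
    static long cachedWithPrivateCaches(long[] workingSetsMb, long perEnvCacheMb) {
        long total = 0;
        for (long workingSet : workingSetsMb) {
            total += Math.min(workingSet, perEnvCacheMb);
        }
        return total;
    }

    // With a shared cache, only the combined size is capped.
    static long cachedWithSharedCache(long[] workingSetsMb, long totalCacheMb) {
        long combined = 0;
        for (long workingSet : workingSetsMb) {
            combined += workingSet;
        }
        return Math.min(combined, totalCacheMb);
    }

    public static void main(String[] args) {
        // Assumed active data sets: one busy environment and four small ones.
        long[] workingSetsMb = {300, 50, 50, 50, 50};
        System.out.println("private: " + cachedWithPrivateCaches(workingSetsMb, 100) + " MB cached");
        System.out.println("shared: " + cachedWithSharedCache(workingSetsMb, 500) + " MB cached");
    }
}
```

Under these assumed sizes, the five private 100 MB caches hold 300 MB of active data and leave 200 MB idle, while a single 500 MB shared cache holds all 500 MB.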

In order to configure an environment to use a shared cache, set EnvironmentConfig.setSharedCache() to true. This must be set for every environment in the process that you want to use the shared cache. For example:

package je.gettingStarted;
    
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

import java.io.File;

...

// Open two environments that share a single cache. Allow each
// to be created if it does not already exist.
Environment myEnv1 = null;
Environment myEnv2 = null;

try {
    EnvironmentConfig envConfig = new EnvironmentConfig();
    envConfig.setAllowCreate(true);
    envConfig.setSharedCache(true);

    myEnv1 = new Environment(new File("/export/dbEnv1"), envConfig);
    myEnv2 = new Environment(new File("/export/dbEnv2"), envConfig);
} catch (DatabaseException dbe) {
    // Exception handling goes here
}