Chapter 1. Introduction to Berkeley DB Java Edition

Table of Contents

Features
DPL Features
Base API Features
Which API Should You Use?
The JE Application
Database Environments
Key-Data Pairs
Storing Data
Duplicate Data
Replacing and Deleting Entries
Secondary Keys
Transactions
JE Resources
Application Considerations
JE Backup and Restore
JCA Support
JConsole and JMX Support
Getting and Using JE
JE Exceptions
Six Things Everyone Should Know about JE Log Files

Welcome to Berkeley DB Java Edition (JE). JE is a general-purpose, transaction-protected, embedded database written in 100% Java (JE makes no JNI calls). As such, it offers the Java developer safe and efficient in-process storage and management of arbitrary data.

You use JE through a series of Java APIs which give you the ability to read and write your data, manage your database(s), and perform other more advanced activities such as managing transactions. The Java APIs that you use to interact with JE come in two basic flavors. The first is a high-level API that allows you to make Java classes persistent. The second is a lower-level API which provides additional flexibility when interacting with JE databases.

Note

For long-time users of JE, the lower-level API is the traditional API that you are probably accustomed to using.

Regardless of the API set that you choose, a number of concepts and APIs are common across the product. This manual starts with a high-level examination of JE. It then describes the APIs that you use with either API set. Next, it provides information on using the Direct Persistence Layer (DPL) API, followed by information on using the more extensive "base" API. Finally, it provides some database administration information.

Note that the information provided here focuses only on introductory API usage. Other books describe more advanced topics, such as transactional usage. See the For More Information section for a listing of other titles in the JE documentation set.

Features

JE provides an enterprise-class Java-based data management solution. All you need to get started is to add a single jar file to your application's classpath. See Getting and Using JE for more information.

JE offers the following major features:

  • Large database support. JE databases efficiently scale from one to millions of records. The size of your JE databases is likely to be limited more by hardware resources than by any limits imposed upon you by JE.

    Databases are described in Databases.

  • Database environments. Database environments provide a unit of encapsulation and management for one or more databases. Environments are also the unit of management for internal resources such as the in-memory cache and the background threads. Finally, you use environments to manage concurrency and transactions. Note that all applications using JE are required to use database environments.

    Database environments are described in Database Environments.

  • Multiple thread and process support. JE is designed for multiple threads of control. Both read and write operations can be performed by multiple threads. JE uses record-level locking for high concurrency in threaded applications. Further, JE uses timeouts for deadlock detection to help you ensure that two threads of control do not deadlock indefinitely.

    Moreover, JE allows multiple processes to access the same databases. However, in this configuration JE allows at most one process to write to the database. Read-only processes are guaranteed a consistent, although potentially out of date, view of the stored data as of the time that the environment is opened.

  • Transactions. Transactions allow you to treat one or more operations on one or more databases as a single unit of work. JE transactions offer you recoverability, atomicity, and isolation for your database operations.

    Note that transaction protection is optional. Transactions are described in the Berkeley DB, Java Edition Getting Started with Transaction Processing guide.

  • In-memory cache. The cache allows for high-speed database access for both read and write operations by avoiding unnecessary disk I/O. The cache will grow on demand up to a pre-configured maximum size. To improve your application's performance immediately after startup, you can preload your cache in order to avoid disk I/O for production requests of your data.

    Cache management is described in Sizing the Cache.

  • Indexes. JE allows you to easily create and maintain secondary indices for your primary data. In this way, you can obtain rapid access to your data through the use of an alternative, or secondary, key.

    How indices work is dependent upon the API you are using. If you are using the DPL, see Working with Indices. Otherwise, see Secondary Databases.

  • Log files. JE databases are stored in one or more numerically-named log files in the environment directory. The log files are write-once and are portable across platforms with different endian-ness.

    Unlike other database implementations, there is no distinction between database files (that is, the "material database") and log files. Instead JE employs a log-based storage system to protect database modifications. Before any change is made to a database, JE writes information about the change to the log file.

    Note that JE's log files are not binary compatible with Berkeley DB's database files. However, both products provide dump and load utilities, and the files that these operate on are compatible across product lines.

    JE's log files are described in more detail in Backing up and Restoring Berkeley DB Java Edition Applications. For information on using JE's dump and load utilities, see The Command Line Tools. Finally, for a short list of things to know about log files while you are learning JE, see Six Things Everyone Should Know about JE Log Files.

  • Background threads. JE provides several threads that manage internal resources for you. The checkpointer flushes to disk any database data that was written to the cache as the result of a transaction commit (this is done in order to shorten recovery time). The compressor thread removes subtrees from the database that are empty because of deletion activity. Finally, the cleaner thread is responsible for cleaning and removing unneeded log files, thereby helping you to save on disk space.

    Background thread management is described in Managing the Background Threads.

  • Backup and restore. JE's backup procedure consists of simply copying JE's log files to a safe location for storage. To recover from a catastrophic failure, you copy your archived log files back to your production location on disk and reopen the JE environment.

    Note that JE always performs normal recovery when it opens a database environment. Normal recovery brings the database to a consistent state based on change information found in the database log files.

    JE's backup and recovery mechanisms are described in Backing up and Restoring Berkeley DB Java Edition Applications.
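
The backup step described in the list above (copying JE's log files to a safe location) can be sketched with plain java.nio calls. This is a minimal sketch, not a JE API: the class name LogFileBackupSketch and its copyLogFiles method are hypothetical, and the code assumes that JE log files carry the .jdb suffix and that no process is writing to the environment while the copy runs. For backing up an environment that is in use, follow the procedure in Backing up and Restoring Berkeley DB Java Edition Applications.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative helper (not part of JE): copies log files to a backup directory. */
class LogFileBackupSketch {

    /**
     * Copies every *.jdb file found directly in envDir into backupDir and
     * returns the copied file names in sorted order.
     * Assumption: nothing is writing to the environment during the copy.
     */
    static List<String> copyLogFiles(Path envDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        List<String> copied = new ArrayList<>();
        // The glob limits the copy to log files; other files are skipped.
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(envDir, "*.jdb")) {
            for (Path log : logs) {
                Path target = backupDir.resolve(log.getFileName());
                Files.copy(log, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(log.getFileName().toString());
            }
        }
        Collections.sort(copied);
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LogFileBackupSketch <envDir> <backupDir>");
            return;
        }
        List<String> copied = copyLogFiles(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + copied.size() + " log file(s): " + copied);
    }
}
```

Restoring is the same copy in the opposite direction, followed by reopening the environment so that JE can run normal recovery.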

DPL Features

The DPL is one of two APIs that JE provides for interaction with JE databases. The DPL lets you make any Java type persistent without implementing special interfaces. The only real requirement is that each persistent class have a default constructor.

The DPL provides all of the features previously identified in this chapter. In addition, the DPL offers you:

  • A type safe, convenient way to access persistent objects.

  • No hand-coding of bindings is required. A binding is a way of transforming data types into a format which can be stored in a JE database. If you do not use the DPL, you may be required to create custom bindings for your data types.

    See Using the BIND APIs for more information on creating data bindings.

    Note that Java byte code enhancement is used by the DPL API to provide fully optimized bindings that do not use Java reflection.

  • No external schema is required to define primary and secondary index keys. Java annotations are used to define all metadata.

  • Interoperability with external components is supported using the Java collections framework. Any index can be accessed using a standard java.util collection.

  • Class evolution is explicitly supported. This means that compatible changes, such as adding fields or widening types, are handled automatically and transparently.

    You can also perform many incompatible class changes, such as renaming fields or refactoring a single class. This is done using a built-in DPL mechanism called mutations. Mutations are automatically applied as data is accessed so as to avoid downtime to convert large databases during a software upgrade.

  • Persistent class fields can be private, package-private, protected or public. The DPL can access persistent fields either by bytecode enhancement or by reflection.

  • The performance of the underlying JE engine is safeguarded. All DPL operations are mapped directly to the underlying APIs, object bindings are lightweight, and all engine tuning parameters are available.

  • Java 1.5 generic types and annotations are supported.
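
To see why a default constructor matters and how private fields can still be reached, consider the reflection route mentioned in the list above. The following is a minimal sketch using only java.lang.reflect; it is not how the DPL is implemented, and the names InventoryItemSketch and ReflectiveLoaderSketch are hypothetical. It copies an object's fields into a map and then rebuilds an equivalent object by calling the no-argument constructor and setting each field.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical persistent class: private fields plus a default constructor. */
class InventoryItemSketch {
    private String sku;
    private int quantity;

    InventoryItemSketch() { }  // default constructor used when rebuilding

    InventoryItemSketch(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }

    String getSku() { return sku; }
    int getQuantity() { return quantity; }
}

/** Illustrative only; a stand-in for a persistence layer, not the DPL. */
class ReflectiveLoaderSketch {

    /** Reads every non-static declared field of obj into a name-to-value map. */
    static Map<String, Object> toFieldMap(Object obj) throws ReflectiveOperationException {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Field field : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            field.setAccessible(true);  // allows access to private fields
            values.put(field.getName(), field.get(obj));
        }
        return values;
    }

    /** Creates an instance with the default constructor, then fills in its fields. */
    static <T> T fromFieldMap(Class<T> type, Map<String, Object> values)
            throws ReflectiveOperationException {
        Constructor<T> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        T instance = ctor.newInstance();
        for (Map.Entry<String, Object> entry : values.entrySet()) {
            Field field = type.getDeclaredField(entry.getKey());
            field.setAccessible(true);
            field.set(instance, entry.getValue());
        }
        return instance;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        InventoryItemSketch original = new InventoryItemSketch("A-100", 7);
        Map<String, Object> stored = toFieldMap(original);
        InventoryItemSketch restored = fromFieldMap(InventoryItemSketch.class, stored);
        System.out.println(stored + " -> " + restored.getSku() + ", " + restored.getQuantity());
    }
}
```

Without a no-argument constructor, fromFieldMap would have no general way to create the instance before its fields are known, which is one plausible reason for the requirement stated above.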

Base API Features

If you are not using the DPL, then the following concepts and features are likely to be of interest to you:

  • Database records. All database records are organized as simple key/data pairs. Both keys and data can be anything from primitive Java types to the most complex of Java objects.

    Database records are described in Database Records.

  • Direct database read and write. You can use methods of a Database object to read and write database records. Reading and writing using Database objects are described in Database Records.

  • Cursors. Cursors give you the ability to sequentially move through a database. Using cursors, you can seek to a specific point in the database (using search criteria applied to the key and/or the data portion of a database record) and then either step forward or step backwards through the database.

    Cursors are described in detail in Using Cursors.

  • JCA. JE provides support for the Java Connector Architecture. See JCA Support for more information.

  • JMX. JE provides support for Java Management Extensions. See JConsole and JMX Support for more information.
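
The cursor behavior described in the list above (seek to a point, then step forward or backward through key/data pairs) can be modeled with a sorted map. This is a conceptual sketch only: SortedCursorSketch is a hypothetical class built on java.util.TreeMap, not the JE Cursor class, and it simplifies keys to String values while keeping data as byte arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Stand-in for a cursor over sorted key/data pairs; not the JE Cursor class. */
class SortedCursorSketch {
    private final TreeMap<String, byte[]> records;
    private String currentKey;  // null until the cursor has been positioned

    SortedCursorSketch(TreeMap<String, byte[]> records) {
        this.records = records;
    }

    /** Positions on the smallest key that is >= searchKey; false if there is none. */
    boolean seekAtOrAfter(String searchKey) {
        return moveTo(records.ceilingEntry(searchKey));
    }

    /** Steps forward one record, or to the first record if not yet positioned. */
    boolean next() {
        return moveTo(currentKey == null
                ? records.firstEntry() : records.higherEntry(currentKey));
    }

    /** Steps backward one record, or to the last record if not yet positioned. */
    boolean prev() {
        return moveTo(currentKey == null
                ? records.lastEntry() : records.lowerEntry(currentKey));
    }

    String key() {
        return currentKey;
    }

    String dataAsString() {
        return new String(records.get(currentKey), StandardCharsets.UTF_8);
    }

    /** Moves to entry if it exists; otherwise leaves the position unchanged. */
    private boolean moveTo(Map.Entry<String, byte[]> entry) {
        if (entry == null) {
            return false;
        }
        currentKey = entry.getKey();
        return true;
    }

    /** Builds a small sample data set for the demonstration in main. */
    static TreeMap<String, byte[]> sampleRecords() {
        TreeMap<String, byte[]> sample = new TreeMap<>();
        sample.put("apple", "red".getBytes(StandardCharsets.UTF_8));
        sample.put("banana", "yellow".getBytes(StandardCharsets.UTF_8));
        sample.put("cherry", "dark red".getBytes(StandardCharsets.UTF_8));
        return sample;
    }

    public static void main(String[] args) {
        SortedCursorSketch cursor = new SortedCursorSketch(sampleRecords());
        // Seek to the first key at or after "b", then walk to the end.
        if (cursor.seekAtOrAfter("b")) {
            do {
                System.out.println(cursor.key() + " = " + cursor.dataAsString());
            } while (cursor.next());
        }
    }
}
```

The sketch returns a boolean from each move so that the caller can detect the end of the data, and a failed move leaves the cursor where it was.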

Which API Should You Use?

Of the two APIs that JE makes available to you, we recommend that you use the DPL if all you want to do is make classes with a relatively static schema persistent.

However, if you are porting an application between Berkeley DB and Berkeley DB Java Edition, then you should not use the DPL, because the base API is a much closer match to the Berkeley DB Java API.

Additionally, if your application uses a highly dynamic schema, then the DPL is probably a poor choice for your application, although the use of Java annotations can make the DPL work a little better for you in this situation.