Cache Concepts

This section describes concepts unique to the TopLink cache, including the following:

  • Cache Type and Object Identity

  • Querying and the Cache

  • Handling Stale Data

  • Explicit Query Refreshes

  • Cache Invalidation

  • Cache Coordination

  • Cache Isolation

  • Cache Locking and Transaction Isolation

  • Cache Optimization

Cache Type and Object Identity

TopLink preserves object identity through its cache using the primary key attributes of a persistent entity. These attributes may or may not be assigned through sequencing (see "Projects and Sequencing"). In a Java application, object identity is preserved if each object in memory is represented by one, and only one, object instance. Multiple retrievals of the same object return references to the same object instance, not multiple copies of the same object.

Maintaining object identity is extremely important when the application's object model contains circular references between objects: you must ensure that the objects reference each other directly, rather than referencing copies of each other. Object identity is also important when multiple parts of the application may be modifying the same object simultaneously.

Oracle recommends that you always maintain object identity. Disable object identity only if absolutely necessary, for example, for read-only objects (see "Configuring Read-Only Descriptors").

You can configure how object identity is managed on a class-by-class basis. The Descriptor object provides the cache and identity map options described in Table 90-1.

Table 90-1 Cache and Identity Map Options

Option (Identity Map)                   Caching   Guaranteed Identity   Memory Use   Client/Server Transaction Save
Full Identity Map                       Yes       Yes                   High         Yes
Weak Identity Map                       Yes       Yes                   Low          No
Soft and Hard Cache Weak Identity Maps  Yes       Yes                   Lower        Yes
No Identity Map                         No        No                    None         No

For more information, see "Guidelines for Configuring the Cache and Identity Maps".
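
For example, the identity map options in Table 90-1 can be selected in code through a descriptor amendment method. The following is a minimal sketch only: the class and method structure are illustrative, and the ClassDescriptor package name and the useSoftCacheWeakIdentityMap and setIdentityMapSize calls are assumed to be available as shown in this release.

import oracle.toplink.descriptors.ClassDescriptor;

public class EmployeeCacheAmendment {

    // Hypothetical descriptor amendment method for a persistent Employee class.
    public static void addToDescriptor(ClassDescriptor descriptor) {
        // Keep a subcache of the most frequently used objects and hold
        // the rest weakly so they can be garbage-collected.
        descriptor.useSoftCacheWeakIdentityMap();

        // Size the cache to roughly the maximum number of objects of
        // this class referenced within a single transaction.
        descriptor.setIdentityMapSize(500);
    }
}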

Full Identity Map

This option provides full caching and guaranteed identity: all objects are cached and are never flushed from memory unless they are deleted. The cache doubles in size whenever the maximum size is reached, so this option can be memory-intensive when many objects are read. Do not use this option for batch operations.

Oracle recommends using this identity map when the data set size is small and memory is in large supply.

Weak Identity Map

This option is similar to the full identity map, except that the map holds the objects by using weak references. This method allows full garbage collection and provides full caching and guaranteed identity.

The weak identity map uses less memory than the full identity map, but it does not provide a durable caching strategy across client/server transactions. Objects are available for garbage collection when the application no longer references them on the server side (that is, from within the server JVM).

Oracle recommends using this identity map for transactions that, once started, stay on the server side. Do not use this option for applications that expect objects to remain cached across client/server invocations.

Soft and Hard Cache Weak Identity Maps

This option is similar to the weak identity map except that it maintains a most frequently used subcache. The subcache uses soft or hard references to ensure that these objects are garbage-collected only if the system is low on memory.

The soft cache weak identity map and hard cache weak identity map provide more efficient memory use. They release objects as they are garbage-collected, except for a fixed number of most recently used objects. Note that weakly cached objects might be flushed if the transaction spans multiple client/server invocations. The size of the subcache is proportional to the size of the identity map as specified by the Descriptor method setIdentityMapSize. You should set this cache size to be as large as the maximum number of objects (of the same type) referenced within a transaction (see "Configuring Cache Type and Size at the Descriptor Level").

Oracle recommends using this identity map in most circumstances as a means to control memory used by the cache.

For more information, see "Understanding the Internals of Soft and Hard Cache Weak Identity Map".

No Identity Map

This option does not preserve object identity and does not cache objects.

Oracle does not recommend using the no identity map option.

Guidelines for Configuring the Cache and Identity Maps

You can configure the cache at the project ("Configuring Cache Type and Size at the Project Level") or descriptor ("Configuring Cache Type and Size at the Descriptor Level") level.

Use the following guidelines when configuring your cache and identity map:

  • If you are using a Java 2-compatible Virtual Machine (VM), your objects have a long life span, and object identity is important, use a SoftCacheWeakIdentityMap or HardCacheWeakIdentityMap policy. For more information on when to choose one or the other, see "Understanding the Internals of Soft and Hard Cache Weak Identity Map".

  • If you are using a Java 2-compatible VM, and object identity is important but caching is not, use a WeakIdentityMap policy.

  • If an object has a long life span or requires frequent access, and identity is important, use a FullIdentityMap policy.

  • If an object has a short life span or requires frequent access, and identity is not important, use a CacheIdentityMap policy.

  • If objects are discarded immediately after being read from the database, such as in a batch operation, use a NoIdentityMap policy. The NoIdentityMap does not preserve object identity.

Understanding the Internals of Soft and Hard Cache Weak Identity Map

The SoftCacheWeakIdentityMap and HardCacheWeakIdentityMap types of identity map contain two caches:

  • Reference cache: implemented as a LinkedList that contains soft or hard references, respectively

  • Weak cache: implemented as a HashMap that contains weak references

When you create a SoftCacheWeakIdentityMap or HardCacheWeakIdentityMap with a specified size, the reference cache LinkedList is initialized to 50 percent of that size: the reference cache will never grow beyond this size. The weak cache HashMap is initialized to 100 percent of the specified size: the weak cache will grow when more objects than the specified size are read in. Because TopLink does not control garbage collection, the JVM can reap the weakly held objects whenever it sees fit.

Because the reference cache is implemented as a LinkedList, new objects are added to the end of the list. The reference cache is therefore, by nature, a least recently used (LRU) cache: it has a fixed size, and once the maximum size has been reached, the object at the top of the list is deleted when a new object is added.
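
To make this structure concrete, the following is a purely illustrative sketch, not TopLink's implementation: a fixed-size, insertion-ordered reference cache of soft references alongside a weak-reference map keyed by primary key. All names are invented, and real identity maps also handle cache keys, locking, and cleanup that this sketch omits.

import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Illustrative only -- not TopLink's SoftCacheWeakIdentityMap.
public class TwoLevelCacheSketch {

    // Reference cache: fixed-size, insertion-ordered list of soft
    // references to the most recently added objects.
    private final LinkedList<SoftReference<Object>> referenceCache =
            new LinkedList<SoftReference<Object>>();

    // Weak cache: keyed by primary key, initialized to the full
    // configured size and allowed to grow; entries disappear only when
    // the JVM garbage-collects the weakly held objects.
    private final Map<Object, WeakReference<Object>> weakCache;

    private final int referenceCacheSize;

    public TwoLevelCacheSketch(int identityMapSize) {
        // The reference cache is held to 50 percent of the configured size.
        this.referenceCacheSize = identityMapSize / 2;
        this.weakCache = new HashMap<Object, WeakReference<Object>>(identityMapSize);
    }

    public void put(Object primaryKey, Object object) {
        weakCache.put(primaryKey, new WeakReference<Object>(object));
        referenceCache.addLast(new SoftReference<Object>(object));
        if (referenceCache.size() > referenceCacheSize) {
            // The oldest entry falls out of the reference cache and then
            // survives only as long as its weak reference in weakCache.
            referenceCache.removeFirst();
        }
    }

    public Object get(Object primaryKey) {
        WeakReference<Object> reference = weakCache.get(primaryKey);
        return (reference == null) ? null : reference.get();
    }
}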

The SoftCacheWeakIdentityMap and HardCacheWeakIdentityMap are essentially the same type of identity map. The HardCacheWeakIdentityMap was constructed to work around an issue with some JVMs.

If your application frequently reaches a low system memory condition, or if your platform's JVM treats weak and soft references the same, the objects in the reference cache may be garbage-collected so often that you do not benefit from the performance improvement the reference cache provides. If this is the case, Oracle recommends that you use the HardCacheWeakIdentityMap. It is identical to the SoftCacheWeakIdentityMap, except that it uses hard references in the reference cache, which guarantees that your application benefits from the reference cache's performance improvement.

When an object in a HardCacheWeakIdentityMap or SoftCacheWeakIdentityMap is pushed out of the reference cache, it is moved to the weak cache. Although it is still cached, TopLink cannot guarantee that it will remain there for any length of time, because the JVM can garbage-collect weak references at any time.

TopLink cleans up dead cache keys after every nth access to a HardCacheWeakIdentityMap or SoftCacheWeakIdentityMap. For example, if you set the size to 100, then TopLink locks the cache and cleans up these cache keys on the 100th access, 200th access, 300th access, and so on. Although this may appear to be a memory leak, it is not: it is a compromise that provides improved performance at the expense of holding on to memory until the nth access. While reducing n frees memory more frequently, it does so at the unacceptable performance cost of performing an enumeration of the cache too frequently.

If you are querying in memory and you absolutely need the guarantee that the objects are in memory, then you should have those classes set to FullIdentityMap (see "Full Identity Map"). Querying in memory with SoftCacheWeakIdentityMap or HardCacheWeakIdentityMap does not 100 percent guarantee that the query will retrieve the objects that you expect.

When you query in memory, TopLink does not reorder objects to maintain a strict LRU; objects are reordered only when you merge or query the database. Not maintaining LRU order for in-memory queries is a compromise that enhances performance: to maintain it, the identity map would have to be locked for the duration of each move, degrading performance.

Querying and the Cache

A query that is run against the shared session cache is known as an in-memory query. Careful configuration of in-memory querying can improve performance (see "Using In-Memory Queries").

By default, a query that looks for a single object based on primary key attempts to retrieve the required object from the cache first, and searches the data source only if the object is not in the cache. All other query types search the database first by default. You can specify whether a given query runs against the in-memory cache, the database, or both.
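
For example, the following sketch configures a query to check the session cache before the database. Employee is a hypothetical persistent class, and the ReadObjectQuery and checkCacheThenDatabase names are assumed to be available as shown in this release.

import oracle.toplink.expressions.Expression;
import oracle.toplink.expressions.ExpressionBuilder;
import oracle.toplink.queryframework.ReadObjectQuery;
import oracle.toplink.sessions.Session;

public class InMemoryQueryExample {

    // Employee is a hypothetical persistent class.
    public static Object findEmployeeById(Session session, int id) {
        ReadObjectQuery query = new ReadObjectQuery(Employee.class);
        ExpressionBuilder employee = query.getExpressionBuilder();
        Expression criteria = employee.get("id").equal(id);
        query.setSelectionCriteria(criteria);

        // Check the session cache first; go to the database only if the
        // object is not already cached.
        query.checkCacheThenDatabase();

        return session.executeQuery(query);
    }
}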

For more information, see "Queries and the Cache".

Handling Stale Data

Stale data is an artifact of caching, in which an object in the cache is not the most recent version committed to the data source. To avoid stale data, implement an appropriate cache locking strategy.

By default, TopLink optimizes concurrency to minimize cache locking during read or write operations. Use the default TopLink isolation level, unless you have a very specific reason to change it. For more information on isolation levels in TopLink, see "Database Transaction Isolation Levels".

Cache locking regulates when processes read or write an object. Depending on how you configure it, cache locking determines whether a process can read or write an object that is in use within another process.

A well-managed cache makes your application more efficient. There are very few cases in which you should turn the cache off entirely, because the cache reduces database access and is an important part of managing object identity.

To make the most of your cache strategy and to minimize your application's exposure to stale data, Oracle recommends the following:

Configure a Locking Policy

Make sure you configure a locking policy so that you can prevent, or at least detect, changes that another process has already made to an object you are modifying. Typically, this is done using optimistic locking. TopLink offers several locking policies, including locking on a numeric version field, a timestamp version field, or some or all of the mapped fields.
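
For example, a descriptor amendment method might enable version-based optimistic locking. This is a sketch only: the class name is illustrative, it assumes a numeric VERSION column on the mapped table, and the ClassDescriptor.useVersionLocking call is assumed to be available in this release.

import oracle.toplink.descriptors.ClassDescriptor;

public class EmployeeLockingAmendment {

    // Hypothetical amendment method for the Employee descriptor.
    public static void addToDescriptor(ClassDescriptor descriptor) {
        // Optimistic locking against a numeric version column: a commit
        // fails with an optimistic lock exception if another process has
        // already updated the row.
        descriptor.useVersionLocking("VERSION");
    }
}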

For more information, see "Configuring Locking Policy".

Configure the Cache on a Per-Class Basis

If other applications can modify the data used by a particular class, use a weaker style of cache for the class. For example, the SoftCacheWeakIdentityMap or WeakIdentityMap minimizes the length of time the cache maintains an object whose reference has been removed.

For more information, see "Configuring Cache Type and Size at the Descriptor Level".

Force a Cache Refresh When Required on a Per-Query Basis

Any query can include a flag that forces TopLink to go to the data source for the most up-to-date version of selected objects and update the cache with this information.
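
For example, assuming a TopLink Session and an already-cached object, the following sketch shows the query flag described above (refreshIdentityMapResult), with Session.refreshObject as a per-object alternative. The class and method structure are minimal illustrations, not a complete refresh strategy.

import oracle.toplink.queryframework.ReadObjectQuery;
import oracle.toplink.sessions.Session;

public class CacheRefreshExample {

    // Configure a query so that it goes to the data source and overwrites
    // the cached copy with the returned data.
    public static Object refreshThroughQuery(Session session, Object cachedObject) {
        ReadObjectQuery query = new ReadObjectQuery(cachedObject);
        query.refreshIdentityMapResult();
        return session.executeQuery(query);
    }

    // Alternatively, refresh a single cached instance directly.
    public static Object refreshDirectly(Session session, Object cachedObject) {
        return session.refreshObject(cachedObject);
    }
}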

For more information, see the following:

Configure Cache Invalidation

Using the descriptor API, you can designate an object as invalid: when any query attempts to read an invalid object, TopLink goes to the data source for the most up-to-date version of that object and updates the cache with this information. You can manually designate an object as invalid, or use a CacheInvalidationPolicy to control the conditions under which an object is designated invalid.

For more information, see "Cache Invalidation".

Configure Cache Coordination

If your application is primarily read-based and the changes are all being performed by the same Java application operating with multiple, distributed sessions, you may consider using the TopLink cache coordination feature. Although this will not prevent stale data, it should greatly minimize it.

For more information, see "Cache Coordination".

Explicit Query Refreshes

Some distributed systems require only a small number of objects to be consistent across the servers in the system. Conversely, other systems require that several specific objects must always be guaranteed to be up-to-date, regardless of the cost. If you build such a system, you can explicitly refresh selected objects from the database at appropriate intervals, without incurring the full cost of distributed cache coordination.

To implement this type of strategy, do the following:

  1. Configure a set of queries that refresh the required objects.

  2. Establish an appropriate refresh policy.

  3. Invoke the queries as required to refresh the objects.

Refresh Policy

When you execute a query, if the required objects are in the cache, TopLink returns the cached objects without checking the database for a more recent version. This reduces the number of objects that TopLink must build from database results, and is optimal for noncoordinated cache environments. However, this may not always be the best strategy for a coordinated cache environment.

To override this behavior, set a refresh policy that specifies that the objects from the database always take precedence over objects in the cache. This updates the cached objects with the data from the database.

You can implement this type of refresh policy on each TopLink descriptor, or just on certain queries, depending upon the nature of the application.
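
For example, the descriptor-level approach might look like the following sketch; it assumes the ClassDescriptor methods alwaysRefreshCache and disableCacheHits behave as described here. For the query-level approach, see the refresh example earlier in this section.

import oracle.toplink.descriptors.ClassDescriptor;

public class AlwaysRefreshAmendment {

    // Hypothetical amendment method: database results always take
    // precedence over the cached copies of this class.
    public static void addToDescriptor(ClassDescriptor descriptor) {
        // Refresh cached objects whenever a query for this class returns
        // rows from the database.
        descriptor.alwaysRefreshCache();

        // Do not answer primary key queries from the cache alone, so the
        // refresh actually reaches the database.
        descriptor.disableCacheHits();
    }
}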

For more information, see the following:

EJB Finders and Refresh Policy

When you invoke a findByPrimaryKey finder, if the object exists in the cache, TopLink returns that copy. This is the default behavior, regardless of the refresh policy. To force a database query, configure the query to refresh by calling the refreshIdentityMapResult method on it.

For more information, see "Queries and the Cache".

Cache Invalidation

By default, objects remain in the session cache until they are explicitly deleted (see "Deleting Objects") or garbage collected when using a weak identity map (see "Configuring Cache Type and Size at the Project Level").

Alternatively, you can configure any object with a CacheInvalidationPolicy that lets you specify, either automatically or manually, under what circumstances a cached object is invalid: when any query attempts to read an invalid object, TopLink will go to the data source for the most up-to-date version of that object, and update the cache with this information.

You can use any of the following CacheInvalidationPolicy instances (a configuration sketch follows this list):

  • DailyCacheInvalidationPolicy: the object is automatically flagged as invalid at a specified time of day.

  • NoExpiryCacheInvalidationPolicy: the object can be flagged as invalid only by explicitly calling the oracle.toplink.sessions.IdentityMapAccessor method invalidateObject.

  • TimeToLiveCacheInvalidationPolicy: the object is automatically flagged as invalid after a specified time period has elapsed since the object was read.
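
The following sketch shows one way to apply these policies in code: a hypothetical descriptor amendment method installs a TimeToLiveCacheInvalidationPolicy, and a helper invalidates a single cached instance through IdentityMapAccessor. The setCacheInvalidationPolicy method and the policy class's package are assumptions for this release.

import oracle.toplink.descriptors.ClassDescriptor;
import oracle.toplink.descriptors.invalidation.TimeToLiveCacheInvalidationPolicy; // package name assumed
import oracle.toplink.sessions.Session;

public class InvalidationExample {

    // Hypothetical amendment method: cached objects of this class become
    // invalid five minutes after they are read.
    public static void addToDescriptor(ClassDescriptor descriptor) {
        descriptor.setCacheInvalidationPolicy(
                new TimeToLiveCacheInvalidationPolicy(5 * 60 * 1000));
    }

    // Manually flag one cached instance as invalid; the next query that
    // reads it goes back to the data source.
    public static void invalidate(Session session, Object cachedObject) {
        session.getIdentityMapAccessor().invalidateObject(cachedObject);
    }
}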

You can configure a cache invalidation policy in the following ways:

If you configure a query to cache results in its own internal cache (see "Caching Query Results"), the cache invalidation policy you configure at the query level applies to the query's internal cache in the same way it would apply to the session cache.

If you are using a coordinated cache (see "Cache Coordination"), you can customize how TopLink communicates the fact that an object has been declared invalid. For more information, see "Configuring Cache Coordination Change Propagation at the Descriptor Level".

Cache Coordination

The need to maintain up-to-date data for all applications is a key design challenge for building a distributed application. The difficulty of this increases as the number of servers within an environment increases. TopLink provides a distributed cache coordination feature that ensures data in distributed applications remains current.

Cache coordination reduces the number of optimistic lock exceptions encountered in a distributed architecture, and decreases the number of failed or repeated transactions in an application. However, cache coordination in no way eliminates the need for an effective locking policy: to ensure that you are working with up-to-date data, you must use cache coordination with optimistic or pessimistic locking. Oracle recommends that you use cache coordination with an optimistic locking policy (see "Configuring Locking Policy").

You can use cache invalidation to improve cache coordination efficiency. For more information, see "Cache Invalidation".

For more information, see "Understanding Cache Coordination".

Cache Isolation

When your application acquires multiple instances of the same session within a given JVM, it is possible for one session cache to reference objects contained in another session cache. Similarly, if your application acquires multiple instances of the same session across distributed JVMs and uses cache coordination (see "Cache Coordination"), object changes made in one session are broadcast to all other sessions.

In some applications, you may want more control over which sessions can have visibility of another session's cache. To exercise this control, you can configure an isolated session that is guaranteed to use a dedicated connection to a data source and forbids any references into its isolated session cache.
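
For example, a class that should never be visible in the shared cache might be marked isolated in a descriptor amendment method. This sketch assumes a ClassDescriptor.setIsIsolated method is available in this release; the class name is illustrative.

import oracle.toplink.descriptors.ClassDescriptor;

public class IsolatedCacheAmendment {

    // Hypothetical amendment method: instances of this class are cached
    // only in the isolated session cache, never in the shared cache.
    public static void addToDescriptor(ClassDescriptor descriptor) {
        descriptor.setIsIsolated(true); // API availability assumed for this release
    }
}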

For more information, see "Isolated Client Sessions".

Cache Locking and Transaction Isolation

By default, TopLink optimizes concurrency to minimize cache locking during read or write operations. Use the default TopLink transaction isolation configuration unless you have a very specific reason to change it.

For more information, see "Database Transaction Isolation Levels".

Cache Optimization

Tune the TopLink cache for each class to help eliminate the need for distributed cache coordination. Always tune these settings before implementing cache coordination.

For more information, see "Cache Optimization".