14.3. Datastore Cache

14.3.1. Overview of Kodo JDO Datastore Caching
14.3.2. Kodo JDO Cache Usage
14.3.3. Query Caching
14.3.4. DataCache Integrations
14.3.5. Cache Extension
14.3.6. Important Notes
14.3.7. Known Issues and Limitations

14.3.1. Overview of Kodo JDO Datastore Caching

Kodo JDO includes support for an optional datastore cache that operates at the PersistenceManagerFactory level. This cache is designed to significantly increase performance while remaining in full compliance with the JDO standard. This means that turning on the caching option can transparently increase the performance of your application, with no changes to your code.

Kodo JDO's datastore cache is not related to the PersistenceManager cache dictated by the JDO specification. The JDO specification mandates behavior for the PersistenceManager cache aimed at guaranteeing transaction isolation when operating on persistent objects. Kodo JDO's datastore cache is designed to provide significant performance increases over cacheless operation, while guaranteeing that all JDO behavior will be identical in both cache-enabled and cacheless operation.

There are five ways to access data via the JDO APIs: standard relation traversal, large result set relation traversal, JDOQL queries, direct invocation of PersistenceManager.getObjectById, and iteration over an extent's iterator. Kodo JDO's cache plugin accelerates three of these mechanisms. It does not provide any caching of large result set relations or extent iterators. If you find yourself in need of higher-performance extent iteration, see Example 14.16, “Query Replaces Extent”.

Table 14.1. Data access methods

Access method                          Uses cache
Standard relation traversal            Yes
Large result set relation traversal    No
JDOQL query                            Yes
PersistenceManager.getObjectById       Yes
Iteration over an extent               No

When enabled, the cache is checked before making a trip to the data store. Data is stored in the cache when objects are committed and when persistent objects are loaded from the datastore.

Kodo's datastore cache can operate both in a single-JVM environment and in a multi-JVM environment. Multi-JVM caching is achieved through the use of the distributed event notification framework, described in Section 14.4, “Remote Event Notification Framework”.

The single JVM mode of operation maintains and shares a data cache across all PersistenceManager instances obtained from a particular PersistenceManagerFactory. This is not appropriate for use in a distributed environment, as caches in different JVMs or created from different PersistenceManagerFactory objects will not be synchronized.

When used in conjunction with a kodo.event.RemoteCommitProvider, commit information is communicated to other JVMs via JMS or TCP, and remote caches are invalidated based on this information.

See the descriptions of the different remote commit providers in Section 14.4.1, “Remote Commit Provider Configuration” for details on multi-JVM cache synchronization options.

When using a Tangosol Coherence cache plug-in, all remote updating of cache information is delegated to the Coherence cache.

14.3.2. Kodo JDO Cache Usage

To enable the basic single-PersistenceManagerFactory cache, set the kodo.DataCache property to true, and set the kodo.RemoteCommitProvider property to sjvm:

kodo.DataCache: true
kodo.RemoteCommitProvider: sjvm
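
These settings can live in your properties file or be supplied programmatically. The following is a minimal sketch of the programmatic route; it assumes that the remaining factory and connection settings are loaded from an existing kodo.properties file.

import java.io.*;
import java.util.*;
import javax.jdo.*;

...

Properties props = new Properties ();
props.load (new FileInputStream ("kodo.properties"));

// enable the single-JVM data cache
props.setProperty ("kodo.DataCache", "true");
props.setProperty ("kodo.RemoteCommitProvider", "sjvm");

PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory (props);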

To configure the PersistenceManagerFactory cache to remain up-to-date in a distributed environment, set the kodo.RemoteCommitProvider property appropriately. This process is described in greater depth in Section 14.4, “Remote Event Notification Framework”.

The default cache implementations maintain a least-recently-used map of object ids to cache data. By default, 1000 elements are kept in cache. This can be adjusted by setting the CacheSize property in your plugin string -- see below for an example. Objects that are pinned into the cache are not counted when determining if the cache size exceeds the maximum.

Expired objects are moved to a soft reference map, so they may stick around for a little while longer. You can control the number of soft references Kodo keeps with the SoftReferenceSize property. Soft references are unlimited by default. Set to 0 to disable soft references completely.

kodo.DataCache: true(CacheSize=5000, SoftReferenceSize=0)

A cache timeout value can be specified for a class by setting the data-cache-timeout metadata extension to a positive number representing the amount of time in milliseconds for which a class's data is valid. Use a value of -1 for no expiration. This is the default value.

Example 14.4. Specifying a DataCache Timeout

<class name="Employee">
    <!-- time out employee objects after 10 seconds -->
    <extension vendor-name="kodo" key="data-cache-timeout" value="10000"/>
</class>

A specific cache can also be configured to be cleared at specific times instead of invalidating values after a period of time. The default cache implementations accept a cron-formatted eviction schedule through the EvictionSchedule property. The value of this property is a whitespace-separated list of five tokens, where the * symbol (asterisk) matches all values. The tokens are, in order:

  • Minute

  • Hour of Day

  • Day of Month

  • Month

  • Day of Week

For example, the following schedules the default cache to evict values at 3:15 PM and 3:45 PM on Sunday:

kodo.DataCache: true(EvictionSchedule="15,45 15 * * 1")

It is also possible for different persistence-capable classes to use different caches. This is achieved by specifying a cache name in the data-cache metadata extension.

Example 14.5. Specifying a Non-Default DataCache

<class name="Employee">
    <extension vendor-name="kodo" key="data-cache" value="small-cache"/>
</class>

This will cause instances of the Employee class to be stored in a cache named small-cache. This small-cache cache can be explicitly configured in the kodo.DataCache plugin string, or can be implicitly defined, in which case it will take on the same default configuration properties as the default cache identified in the kodo.DataCache property.

Example 14.6. Configuring and Acquiring a Named DataCache

kodo.DataCache: true, true(Name=small-cache, CacheSize=100)

import kodo.datacache.*;
import kodo.runtime.*;

... 

KodoPersistenceManager kpm = (KodoPersistenceManager) pm; 
DataCache smallCache = kpm.getConfiguration ().getDataCacheManager ().
    getDataCache ("small-cache");

The DataCache API provides a mechanism for pinning objects into memory by creating hard references to them. Caching algorithms are not permitted to remove objects that have been pinned unless an explicit remove call is made. To pin an object into memory, obtain a reference to the cache and invoke pin on it:

Example 14.7. Pinning an Object into the DataCache

import javax.jdo.*;
import kodo.datacache.*;
import kodo.runtime.*;

... 

DataCache cache = KodoHelper.getDataCache (o);
cache.pin (JDOHelper.getObjectId (o));

A previously pinned object can later be unpinned by invoking DataCache.unpin:

Example 14.8. Unpinning an Object from the DataCache

import javax.jdo.*;
import kodo.datacache.*;
import kodo.runtime.*;

... 

DataCache cache = KodoHelper.getDataCache (o);
cache.unpin (JDOHelper.getObjectId (o));

It is also possible to explicitly evict data from the cache.

Example 14.9. Evicting an Object from the DataCache

import javax.jdo.*;
import kodo.datacache.*;
import kodo.runtime.*;

... 

DataCache cache = KodoHelper.getDataCache (o);
cache.remove (JDOHelper.getObjectId (o));

Rather than evicting objects from the data cache directly, you can also configure Kodo to automatically evict objects from the data cache when you use the persistence manager's eviction APIs.

Example 14.10. Data Cache Eviction Through the Persistence Manager

kodo.PersistenceManagerImpl: EvictFromDataCache=true
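
With this option set, the standard PersistenceManager eviction calls also remove the corresponding entries from the data cache. A brief sketch, assuming o is a persistent instance:

import javax.jdo.*;

...

// with EvictFromDataCache=true, these calls also drop the corresponding
// data from the PersistenceManagerFactory-level data cache
pm.evict (o);
pm.evictAll ();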

14.3.3. Query Caching

Query caching is enabled by default when datastore caching is enabled. The cache stores the object IDs returned by invocations of the Query.execute methods. When a query is executed, Kodo assembles a key based on the query properties and the parameters used at execution time, and checks for a cached query result. If one is found, the object IDs in the cached result are looked up, and the resultant persistence-capable objects are returned. Otherwise, the query is executed against the database, and the object IDs loaded by the query are put into the cache. The object ID list is not cached until the list returned at query execution time is fully traversed.
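
As an illustration, consider executing the same query twice with identical parameters. The sketch below assumes an Employee class with a salary field; the second execution can be answered from the query cache because the first result list was fully traversed.

import java.util.*;
import javax.jdo.*;

...

Query query = pm.newQuery (Employee.class, "salary > min");
query.declareParameters ("double min");

// the first execution goes to the database; traversing the entire
// result allows the object ID list to be cached
Collection results = (Collection) query.execute (new Double (50000));
for (Iterator itr = results.iterator (); itr.hasNext ();)
    itr.next ();

// an identical execution can now be answered from the query cache
Collection cached = (Collection) query.execute (new Double (50000));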

The default query cache implementation caches 100 query executions in a least-recently-used cache. This can be changed by setting the cache size in the CacheSize plugin property. Like the data cache, the query cache also has a backing soft reference map. The SoftReferenceSize property controls the size of this map. It defaults to no limit.

Example 14.11. Setting the Size of the Query Cache

kodo.QueryCache: CacheSize=1000, SoftReferenceSize=0

To disable the query cache completely, set the kodo.QueryCache property to false:

Example 14.12. Disabling the Query Cache

kodo.QueryCache: false

There are certain situations in which the query cache is bypassed:

  • Caching is not used for in-memory queries (queries in which the candidates are a collection instead of a class or extent).

  • Caching is not used in transactions that have IgnoreCache set to false and in which modifications to classes in the query's access path have occurred. If none of the classes in the access path have been touched, then cached results are still valid and are used. (See the sketch after this list.)

  • Caching is not used in pessimistic transactions, since Kodo must go to the database to lock the appropriate rows.

  • Caching is not used when the data cache does not have any cached data for an ID in a query result.

  • Queries that use custom result classes, groupings, aggregates, or projections are not cached.
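
The following sketch illustrates the IgnoreCache condition mentioned above. It assumes an Employee class with salary accessors and an oid identifying an existing Employee.

import java.util.*;
import javax.jdo.*;

...

pm.currentTransaction ().begin ();
pm.setIgnoreCache (false);

// dirtying an instance of a class in the query's access path...
Employee e = (Employee) pm.getObjectById (oid, true);
e.setSalary (e.getSalary () + 1000);

// ... means that this query bypasses the query cache for the
// remainder of the transaction
Query query = pm.newQuery (Employee.class, "salary > 50000");
Collection results = (Collection) query.execute ();

pm.currentTransaction ().commit ();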

Cache results are removed from the cache when instances of classes in a cached query's access path are touched. That is, if a query accesses data in class A, and instances of class A are modified, deleted, or inserted, then the cached query data is dropped from the cache.

It is possible to tell the query cache that a class has been altered. This is only necessary when the changes occur via direct modification of the database outside of Kodo's control.

Example 14.13. Notifying the Query Cache of Altered Classes

import java.util.*;
import kodo.datacache.*;
import kodo.runtime.*;

...

KodoPersistenceManager kpm = (KodoPersistenceManager) pm;
QueryCache cache = kpm.getConfiguration ().getDataCacheManager ().
    getQueryCache ();
Class[] changed = new Class[] { A.class, B.class };
cache.classesChanged (Arrays.asList (changed));

When using one of Kodo's distributed cache implementations, it is necessary to perform this in every JVM -- the change notification is not propagated automatically. When using a coherent cache implementation such as Kodo's Tangosol cache implementation, it is not necessary to do this in every JVM (although it won't hurt to do so), as the cache results are stored directly in the coherent cache.

Data can manually be dropped from the cache or pinned into the cache, as well. To do so, you must first create a QueryKey for the query invocation in question.

Example 14.14. Dropping or Pinning Query Results

import kodo.datacache.*;
import kodo.runtime.*;

...

KodoPersistenceManager kpm = (KodoPersistenceManager) pm;
QueryCache cache = kpm.getConfiguration ().getDataCacheManager ().
   getQueryCache ();

QueryKey key1 = QueryKey.newInstance (query, params1);
cache.pin (key1);

QueryKey key2 = QueryKey.newInstance (query, params2);
cache.remove (key2);

Pinning data into the cache instructs the cache to not expire the pinned results when cache flushing occurs. However, pinned results will be removed from the cache if an event occurs that invalidates the results.

Caching can be disabled on a per-persistence manager or per-query basis:

Example 14.15. Disabling and Enabling Query Caching

import kodo.query.*;
import kodo.runtime.*;

...

// temporarily disable query caching for all queries created from pm
KodoPersistenceManager kpm = (KodoPersistenceManager) pm;
kpm.getFetchConfiguration ().setQueryCacheEnabled (false);

// re-enable caching for a particular query
KodoQuery kq = (KodoQuery) pm.newQuery (A.class);
kq.getFetchConfiguration ().setQueryCacheEnabled (true);

14.3.4. DataCache Integrations

Several integrations to third party cache solutions for Kodo's JDO data cache exist. For details, see Appendix F, DataCache Integrations.

14.3.5. Cache Extension

The provided data cache classes can be easily extended to add additional functionality. If you are adding new behavior, you should extend kodo.datacache.DataCacheImpl. To use your own storage mechanism, extend kodo.datacache.AbstractDataCache, or implement kodo.datacache.DataCache directly. If you want to implement a distributed cache that uses an unsupported method for communications, create an implementation of kodo.event.RemoteCommitProvider. This process is described in greater detail in Section 14.4.2, “Customization”.

The query cache is just as easy to extend. Add functionality by extending the default kodo.datacache.QueryCacheImpl. Implement your own storage mechanism for query results by extending kodo.datacache.AbstractQueryCache or implementing the kodo.datacache.QueryCache interface directly.

14.3.6. Important Notes

  • The default cache implementations do not automatically refresh objects in other persistence managers when the cache is updated or invalidated. This behavior would not be compliant with the JDO specification.

  • Invoking PersistenceManager.evict does not result in the corresponding data being dropped from the data cache, unless you have set the proper configuration options as explained above (see Example 14.10, “Data Cache Eviction Through the Persistence Manager”). Other methods related to the persistence manager cache also do not affect the datastore cache. The datastore cache assumes that it is up-to-date with respect to the data store, so it is effectively an in-memory extension of the data store. To manipulate the datastore cache, you should generally use its APIs directly.

  • A kodo.event.RemoteCommitProvider must be specified (via the kodo.RemoteCommitProvider property) in order to use the data cache, even when using the cache in a single-JVM mode. When using it in a single-JVM context, the property can be set to sjvm.

14.3.7. Known Issues and Limitations

  • When using data store (pessimistic) transactions in concert with the distributed caching implementations, it is possible to read stale data when reading data outside a transaction.

    For example, if you have two JVMs (JVM A and JVM B) both communicating with each other, and JVM A obtains a data store lock on a particular object's underlying data, it is possible for JVM B to load the data from the cache without going to the data store, and therefore load data that should be locked. This will only happen if JVM B attempts to read data that is already in its cache during the period between when JVM A locked the data and JVM B received and processed the invalidation notification.

    This problem is impossible to solve without putting together a two-phase commit system for cache notifications, which would add significant overhead to the caching implementation. As a result, we recommend using optimistic transactions when data caching is enabled (see the sketch below). If you do not, be aware that some of your non-transactional data may not be consistent with the data store.

    Note that when loading objects within a transaction, the appropriate data store locks will be obtained, so transactional code will maintain its integrity.
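
    A minimal sketch of this recommendation, using the standard JDO transaction API:

    // use optimistic transactions so that reads do not rely on data
    // store locks that remote caches cannot see
    pm.currentTransaction ().setOptimistic (true);
    pm.currentTransaction ().begin ();
    // ... read and modify objects ...
    pm.currentTransaction ().commit ();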

  • Extents are not cached. So, if you plan on iterating over a list of all the objects in an extent on a regular basis, you will only benefit from caching if you do so with a query instead:

    Example 14.16. Query Replaces Extent

    Extent extent = pm.getExtent (A.class, false);
    
    // This iterator does not benefit from caching...
    Iterator uncachedIterator = extent.iterator ();
    
    // ... but this one does.
    Query extentQuery = pm.newQuery (extent);
    Iterator cachedIterator = ((Collection) extentQuery.execute ()).iterator ();
    

  • Queries that use first-class object (FCO) parameters are not cached.