9 Oracle TopLink (EclipseLink) JPA Performance Tuning

This chapter describes some of the available performance tuning features for EclipseLink, an open-source persistence framework used with Oracle TopLink. The chapter includes the following topics:

9.1 About Oracle TopLink and EclipseLink

Oracle TopLink includes the open source EclipseLink as the Java Persistence API (JPA) implementation. Oracle TopLink extends EclipseLink with advanced integration into the Oracle Application Server.

The Java Persistence API is a specification for persistence in Java EE and Java SE applications. In JPA, a persistent class is referred to as an entity. An entity is a plain old Java object (POJO) class that is mapped to the database and configured for usage through JPA using annotations, persistence XML, or both. This chapter focuses on tuning JPA in the context of EJB3.0 and a Java EE environment.

The information in this chapter assumes that you are familiar with the basic functionality of EclipseLink. Before you begin tuning, consider reviewing the following introductory information:

For more information on Oracle TopLink, see the TopLink page on OTN http://www.oracle.com/technology/products/ias/toplink/index.html.[Note that as of Oracle TopLink Release 11g, the older Toplink APIs have been deprecated. For more information, see the TopLink Release Notes at http://www.oracle.com/technology/products/ias/toplink/doc/11110/relnotes/toplink-relnotes.html#CHDGAEDJ]

Note:

This chapter serves as a 'Quick Start' guide to performance tuning JPA in the context of a Java EE environment. While the chapter provides common performance tuning considerations and related documentation resources, it is not meant to be comprehensive list of areas to tune.

9.2 Basic Tuning Considerations

The following tuning recommendations are applicable to most deployments. Always consult your own usecase scenarios before implementing any of these configurations.

9.2.1 Creating Efficient SQL Statements and Queries

This section covers using efficient SQL statements and SQL querying. Table 9-1 and Table 9-2 show tuning parameters and performance recommendations related to SQL statements and querying.

Table 9-1 EJB/JPA Using Efficient SQL Statements and Querying

Tuning Parameter Description Performance Notes

Parameterized SQL Binding

Using parameterized SQL and prepared statement caching, you can improve performance by reducing the number of times the database SQL engine parses and prepares SQL for a frequently called query. EclipseLink enables parameterized SQL by default. However, not all databases and JDBC drivers support these options. Note that the Oracle JDBC driver bundled with Oracle Application Server does support this option. The persistence property in persistence.xml "eclipselink.jdbc.bind-parameters" is used to configure this.

See Also: "Caching" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching and "Querying" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Querying

Default Value: PERSISTENCE_UNIT_DEFAULT (which is true by default)

Leave parameterized SQL binding enabled for selected databases and JDBC drivers that support these options.

JDBC Statement Caching

Statement caching is used to lower the performance impact of repeated cursor creation and repeated statement parsing and creation; this can improve performance for applications using a database.

Note: For Java EE applications, use the data source's statement caching (and do not use EclipseLink Statement Caching for EJB3.0/JPA, for example: eclipselink.jdbc.cache-statements"="true").

Set this option in an Oracle Weblogic data-source by setting Statement Cached Type and Statement Cached Size configuration options.

See also "Increasing Performance with the Statement Cache" in Administering JDBC Data Sources for Oracle WebLogic Server.

Default Value: The Oracle Weblogic Server data source default statement cache size is 10 statements per connection.

You should always enable statement caching if your JDBC driver supports this option. The Oracle JDBC driver supports this option.

Fetch Size

The JDBC fetch size gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed.

For large queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number database hits required to satisfy the selection criteria.

Most JDBC drivers use a default fetch size of 10. If you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results.

Note: The default value means use the JDBC driver default value, which is typically 10 rows for the Oracle JDBC driver.

To configure this, use query hint "eclipselink.jdbc.fetch-size".

Default Value: 0

The optimal fetch size is not always obvious. Usually, a fetch size of one half or one quarter of the total expected result size is optimal. Note that if you are unsure of the result set size, incorrectly setting a fetch size too large or too small can decrease performance.

Batch Writing

Batch writing can improve database performance by sending groups of INSERT, UPDATE, and DELETE statements to the database in a single transaction, rather than individually.

The persistence property in persistence.xml "eclipselink.jdbc.batch-writing"="JDBC" is used to configure this.

Default Value: Off

Enable for the persistence unit.

Change Tracking

This is an optimization feature that lets you tune the way EclipseLink detects changes in an Entity.

Default Value: AttributeLevel if using weaving (Java EE default), otherwise Deferred.

Leave at default AttributeLevel for best performance.

Weaving

Can disable through persistence.xml properties "eclipselink.weaving"

Default Value: On

Leave on for best performance.

Read Only

Setting an EJB3.0 JPA Entity to read-only ensures that the entity cannot be modified and enables EclipseLink to optimize unit of work performance.

Set through query hint "eclipselink.read-only".

Can also be set at entity level using @ReadOnly class annotation.

Default Value: False

For optimal performance use read-only on any query where the resulting objects are not changed.

firstResult and maxRows

These are JPA query properties that are used for paging large queries. Typically, these properties can be used when the entire result set of a query returning a large number of rows is not needed. For example, when a user scans the result set (a page at a time) looking for a particular result and then discards the rest of the data after the record is found.

Use on queries that can have a large result set and only a subset of the objects is needed.

Sequence number pre-allocation

Sequence number pre-allocation enables a batch of ids to be queried from the database simultaneously in order to avoid accessing the database for an id on every insert.

Default Value: 50

Always use sequence number pre-allocation for best performance for inserts. SEQUENCE or TABLE sequencing should be used for optimal performance, not IDENTITY which does not allow pre-allocation.


9.2.1.1 Tuning Entity Relationships Query Parameters

Table 9-2 shows the Entity relationship query parameters for performance tuning.

Table 9-2 EJB3.0 Entity Relationship Query Performance Options

Tuning Parameter Description Performance Notes

Batch Fetching

The eclipselink.batch hint supplies EclipseLink with batching information so subsequent queries of related objects can be optimized in batches instead of being retrieved one-by-one or in one large joined read.

Batch fetching has three types: JOIN, EXISTS and IN. The type is set through the query hint "eclipselink.batch.type"

Note that batching is only allowed on queries that have a single object in their select clause. The query hint to configure this is "eclipselink.batch". Batch fetching can also be set using the @BatchFetch annotation.

Default Value: Off

Use for queries of tables with columns mappings to table data you need.You should only use either batch fetching or joining if you know that you are going to access all of the data; if you do not intend to access the relationships, then just let indirection defer their loading.

Batch fetching is more efficient than joining because it avoids reading duplicate data; therefore for best performance for queries where batch fetching is supported, consider using batch fetching instead of join reading.

Join Fetching

Join fetching is a query optimization feature that enables a single query for a class to return the data to build the instances of that class and its related objects.

Use this feature to improve query performance by reducing database access. By default, relationships are not join-read: each relationship is fetched separately when accessed if you are using lazy-loading, or as a separate database query if you are not using lazy-loading.

You can specify the use of join in JPQL (JOIN FETCH), or you can set it multi-level in a query hint, "eclipselink.join-fetch". It also can be set in the mapping annotation @JoinFetch.

Joining is part of the JPA specification, whereas batch fetching is not. And, joining works on queries that not work with batch fetching. For example, joining works on queries with multiple objects in the select clause, queries with a single result, and for cursors and first/max results, whereas batch fetching does not.

See Also: "Join Fetch" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Querying/Query_Hints#Join_Fetch

Default Value: Not Used

Use for queries of tables with columns mappings to table data you need.You should only use either batch fetching or joining if you know that you are going to access all of the data; if you do not intend to access the relationships, then just let indirection defer their loading.For the best performance of selects, where batch fetching is not supported, a join is recommended

Lazy loading

Without lazy loading on, when EclipseLink retrieves a persistent object, it retrieves all of the dependent objects to which it refers. When you configure lazy reading (also known as indirection, lazy loading, or just-in-time reading) for an attribute mapped with a relationship mapping, EclipseLink uses an indirection object as a place holder for the referenced object.

EclipseLink defers reading the dependent object until you access that specific attribute. This can result in a significant performance improvement, especially if the application is interested only in the contents of the retrieved object, rather than the objects to which it is related.

See Also: "Lazy Loading" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Mapping/Basic_Mappings/Lazy_Basics

Default Value: On for collection mapping (ToMany mappings, @OneToMany, @ManyToMany)

Default Value: Off for reference (ToOne mappings, @OneToOne, @ManyToOne)

(Note that setting lazy loading On for @OneToOne, @ManyToOne requires weaving, which is On by default for Java Java EE.)

Use lazy loading for all mappings. Using lazy loading and querying the referenced objects using batch fetching or Join is more efficient than Eager loading.

You may also consider using optimized loading with LoadGroups which allows a query to force instantiation of relationships.


9.2.2 Tuning Cache Configuration

This section describes tuning the default internal cache that is provided by EclipseLink. Oracle Toplink/EclipseLink can also be integrated with Oracle Coherence. For information on configuring and tuning an EclipseLink Entity Cache using Oracle Coherence, see Section 9.3.1, "Integrating with Oracle Coherence".

The default settings for EJB3.0/JPA used with the EclipseLink persistence manager and cache are no locking, no cache refresh, and cache-usage DoNotCheckCache. To ensure that your application uses the cache and does not read stale data from the cache (when you do not have exclusive access), you must configure these and other isolation related settings appropriately. Table 9-3 shows the cache configuration options.

For more information on cache configuration, see "Caching" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching.

Note:

By default, EclipseLink assumes that your application has exclusive access to the data it is using (that is, there are no external, non-EclipseLink, applications modifying the data). If your application does not have exclusive access to the data, then you must change some of the defaults from Table 9-3.

Table 9-3 EJB3.0 JPA Entities and Cache Configuration Options

Tuning Parameter Description Performance Notes

Object Cache

EclipseLink sessions provide an object cache. EJB3.0 JPA applications that use the EclipseLink persistence manager create EclipseLink sessions that by default use this cache. This cache, known as the session cache, retains information about objects that are read from or written to the database, and is a key element for improving the performance of an EclipseLink application.

Typically, a server session's object cache is shared by all client sessions acquired from it. Isolated sessions provide their own session cache isolated from the shared object cache.

The annotation type @Cacheable specifies whether an entity should be cached. Caching is enabled when the value of the persistence.xml caching element is ENABLE_SELECTIVE or DISABLE_SELECTIVE. The value of the Cacheable annotation is inherited by subclasses; it can be overridden by specifying Cacheable on a subclass.

Cacheable(false) means that the entity and its state must not be cached by the provider.

Default Value: Enabled (shared is True)

Generally it is recommended that you leave caching enabled. If you have an object that is always read from the database, as in a pessimistic locked object, then the cache for that entity should be disabled. Also, consider disabling the cache for infrequently accessed entities

Query Result Set Cache

In addition to the object cache in EclipseLink, EclipseLink also supports a query cache:

  • The object cache indexes objects by their primary key, allowing primary key queries to obtain cache hits. By using the object cache, queries that access the data source can avoid the cost of building the objects and their relationships if the object is already present.

  • The query cache is distinct from the object cache. The query cache is indexed by the query and the query parameters - not the object's primary key. This enables any query executed with the same parameters to obtain a query cache hit and return the same result set.

The query hints for a query cache are:

"eclipselink.query-cache"

"eclipselink.query-cache.size"

"eclipselink.query-cache.invalidation"

See Also: "Caching" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching and "EclipseLink JPA Query Hints" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Querying/Query_Hints

Default Value: Not Used

Use for frequently executed non-primary key queries with infrequently changing result sets.Use with a cache invalidation time out to refresh as needed.

Cache Size

Cache size can be configured through persistence properties: "eclipselink.cache.size.<entity>"

"eclipselink.cache.size.default"

"eclipselink.cache.type.default"

See Also: "Configuring Persistence Units Using persistence.xml" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Configuration/JPA/persistence.xml and 'Class PersistenceUnitProperties" at http://www.eclipse.org/eclipselink/api/2.3/org/eclipse/persistence/config/PersistenceUnitProperties.html

Default Value: Type SoftWeak, Size 100 (per Entity).

Set the cache size relative to how much memory you have available, how many instances of the class you have, the frequency the entities are accessed, and how much caching you want based on your tolerance for stale data.

Consider creating larger cache sizes for entities that have many instances that are frequently accessed and stale data is not a big issue.

Consider using smaller cache sizes or no cache for frequently updated entities that must always have fresh data, or infrequently accessed entities.

Locking

Oracle supports the locking policies shown in Table 9-4: no locking, optimistic, pessimistic, and read-only.

Locking is set through JPA @Version annotation, eclipselink.read-only

How to Use EclipseLink Locking at http://wiki.eclipse.org/EclipseLink/Examples/JPA/Locking

Default Value: No Locking

For entities that can be updated concurrently, consider using the locking policy to prevent a user from writing over another users changes. To optimize performance for read-only entities, consider defining the entity as read-only or use a read-only query hint.

Cache Usage

By default, all query types search the database first and then synchronize with the cache. Unless refresh has been set on the query, the cached objects can be returned without being refreshed from the database. You can specify whether a given query runs against the in-memory cache, the database, or both.

To get performance gains by avoiding the database lookup for objects already in the cache, you can configure that the search attempts to retrieve the required object from the cache first, and then search the data source only if the object is not in the cache. For a query that looks for a single object based on a primary key, this is done by setting the query hint "eclipselink.cache-usage" to CheckCacheByExactPrimaryKey.

Default Value: DoNotCheckCache

For faster performance on primary key queries, where the data is typically in the cache and does not require a lot of refreshing, it is recommended to check the cache first on these queries (using CheckCacheByExactPrimaryKey).

This avoids the default behavior of retrieving the object from the database first and then for objects already in the cache, returning the cached values (not updated from the database access, unless refresh has been set on the query).

Isolation

There is not a single tuning parameter that sets a particular database transaction isolation level in a JPA application that uses EclipseLink.

In a typical EJB3.0 JPA application, a variety of factors affect when database transaction isolation levels apply and to what extent a particular database transaction isolation can be achieved, including the following:

  • Locking mode

  • Use of the Session Cache

  • External Applications

  • Database Login method setTransactionIsolation

See Also: "Shared and Isolated Cache" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching/Shared_and_Isolated

 

Cache Refreshing

By default, EclipseLink caches objects read from a data source. Subsequent queries for these objects access the cache and thus improve performance by reducing data source access and avoiding the cost of rebuilding object's and their relationships. Even if a query accesses the data source, if the objects corresponding to the records returned are in the cache, EclipseLink uses the cached objects. This default caching policy can lead to stale data in the application.

Refreshing can be enabled at the entity level (alwaysRefresh or refreshOnlyIfNewer and expiry) and at the query level (with the eclipselink.refresh query hint). You can also force queries to go to the database with (disableHits). Using an appropriate locking policy is the only way to ensure that stale or conflicting data does not get committed to the database.

For more information see: Section 9.2.2.1, "Cache Refreshing Scenarios"

See Also: "Caching Overview" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching/Caching_Overview

Default Value: No Cache Refreshing

Try to avoid entity level cache refresh and instead, consider configuring the following:

  • cache refresh on a query-by-query basis

  • cache expiration

  • isolated caching


9.2.2.1 Cache Refreshing Scenarios

There are a few scenarios to consider for data refreshing in the cache, all with performance implications:

  • In the case where you never want cached data and always want fresh data, consider using an isolated cache (Shared=False). This is the case when certain data in the application changes so frequently that it is desirable to always refresh the data, instead of only refreshing the data when a conflict is detected.

  • In the case when you want to avoid stale data, but getting stale data is not a major issue, then using a cache expiry policy would be the recommended solution. In this case you should also use optimistic locking, which automatically refresh stale objects when a locking error occurs. If using optimistic locking, you could also enable the entity @Cache attributes alwaysRefresh and refreshOnlyIfNewer to allow queries that access the database to refresh any stale objects returned, and avoid refreshing invalid objects when unchanged. You may also want to enable refreshing on certain query operations when you know you want refreshed data, or even provide the option of refreshing something from the client that would call a refreshing query.

  • In the case when you are not concerned about stale data, you should use optimistic locking; this automatically refresh stale objects in the cache on locking errors.

9.2.2.2 Tuning the Locking Mode Policies

The locking modes, as shown in Table 9-4, along with EclipseLink cache-usage and query refreshing options, ensures data consistency for EJB entities using JPA. The different combinations have both functional and performance implications, but often the functional requirements for up-to-date data and data consistency lead to the settings for these options, even when it may be at the expense of performance.

For more information, see "Locking" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Mapping/Locking.

Table 9-4 Locking Mode Policies

Locking Option Description Performance Notes

No Locking

The application does not prevent users overwriting each other's changes. This is the default locking mode. Use this mode if the Entity is never updated concurrently or concurrent reads and updates to the same rows with read-committed semantics is sufficient.

Default Value: No Locking

In general, no locking is faster, but may not meet your needs for data consistency.

Optimistic

All users have read access to the data. When a user attempts to make a change, the application checks to ensure the data has not changed since the user read the data.

See Also: "Optimistic Locking" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Mapping/Locking/Optimistic_Locking

If infrequent concurrent updates to the same rows are expected, then optimistic locking may provide the best performance while providing data consistency guarantees.

Pessimistic

The first user who accesses the data with the purpose of updating it locks the data until completing the update.

If frequent concurrent updates to the same rows are expected, pessimistic locking may be faster than optimistic locking that is getting a lot of concurrent access exceptions and retries.

When using pessimistic locking at the entity level, it is recommended that you use it with an isolated cache (Shared=False) for best performance.

Read Only

Setting an EJB3.0 JPA Entity to read-only ensures that the entity cannot be modified and enables EclipseLink to optimize unit of work performance.

Set at the entity level using @ReadOnly class annotation. Can also be set at the query level through query hint "eclipselink.read-only".

Defining an entity as read-only can perform better than an entity that is not defined as read-only, yet does no inserts, updates, or deletes, since it enables EclipseLink to optimize the unit of work performance. Always use read-only for all read-only operations


9.2.3 Tuning the Mapping and Descriptor Configurations

EclipseLink can transform data between an object representation and a representation specific to a data source. This transformation is called mapping and it is the core of a EclipseLink project.

A mapping corresponds to a single data member of a domain object. It associates the object data member with its data source representation and defines the means of performing the two-way conversion between object and data source.

For information on Mapping see, "Configuring Mappings" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Mapping.

9.2.4 Using Data Partitioning

EclipseLink allows you to configure data partioning using the @Partitioned annotation. Partitioning enables an application to scale information across multiple databases; including clustered databases. For more information on using @Partioned and other partitioning policy annotations, see "Data Partitioning" at http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Advanced_JPA_Development/Data_Partitioning.

9.3 Advanced Tuning Considerations

After you have performed the modifications recommended in the previous section, you can make additional changes that are specific to your deployment. Consider carefully whether the recommendations in this section are appropriate for your environment.

9.3.1 Integrating with Oracle Coherence

Oracle Toplink can be integrated with Oracle Coherence. This integration is provided through the Oracle TopLink Grid feature. With TopLink Grid, there are several types of integration with EclipseLink JPA features.

For example:

  • Replace the default EclipseLink L2 cache with Coherence. This provides support for very large L2 caches that span cluster nodes. EclipseLink's default L2 cache improves performance for multi-threaded and Java EE server hosted applications running in a single JVM, and requires configuring special cache coordination features if used across a cluster.

  • Configure entities to execute queries in the Coherence data grid instead of the database. This allows clustered application deployments to scale beyond database-bound operations.

For more information on using EclipseLink JPA with a Coherence Cache, see "JPA on the Grid" Approach at http://www.oracle.com/technology/products/ias/toplink/doc/11110/grid/tlgug003.htm

For more information on Oracle Toplink integration with Oracle Coherence, see "Oracle TopLink Integration with Coherence Grid Guide" at http://www.oracle.com/technology/products/ias/toplink/doc/11110/grid/toc.htm

9.3.2 Analyzing EclipseLink JPA Entity Performance

This section lists a few features in EclipseLink that can help you analyze your JPA application performance:

  • Form monitoring performance, see "Performance Monitoring" in the EclipseLink User's Guide. Note that this tool is intended to profile and monitor information in a multithreaded server environment.

  • For profiling performance, see "Measuring EclipseLink Performance with the EclipseLink Profiler" in the EclipseLink User's Guide. Note that this tool is intended for use with single-threaded finite use cases.

  • For debugging performance issues and testing, you can view the SQL generated from EclipseLink. To view the SQL, increase the logging level to "FINE" by using the EclipseLink JPA extensions for logging.

    For best performance, remember to restore the logging levels to the default levels when you are done profiling or debugging.