Skip Headers
Oracle® TopLink Developer's Guide
10g Release 3 (10.1.3.1.0)

Part Number B28218-01
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

11 Optimization

TopLink provides a diverse set of features to measure and optimize application performance. You can enable or disable most features in the descriptors or session, making any resulting performance gains global.

This chapter includes the following sections:

Understanding Optimization

Performance considerations are present at every step of the development cycle. Although this implies an awareness of performance issues in your design and implementation, it does not mean that you should expect to achieve the best possible performance in your first pass.

For example, if optimization complicates the design, leave it until the final development phase. You should still plan for these optimizations from your first iteration, to make them easier to integrate later.

The most important concept associated with tuning your TopLink application is the idea of an iterative approach. The most effective way to tune your application is to do the following:

  1. Measure application performance using the TopLink profiler (see "Measuring TopLink Performance With the TopLink Profiler") or the Oracle Dynamic Monitoring System (DMS) profiler (see "Measuring TopLink Performance With the Oracle Dynamic Monitoring System (DMS)")

  2. Modify application components (see "Sources of Application Performance Problems")

  3. Measure performance again.

To identify the changes that improve your application performance, modify only one or two components at a time. You should also tune your application in a nonproduction environment before you deploy the application.

Sources of Application Performance Problems

For various parts of a TopLink enabled application, this section describes the performance problems most commonly encountered and provides suggestions for improving performance. Areas of an application where performance problems can occur include:

Measuring TopLink Performance With the TopLink Profiler

The most important challenge to performance tuning is knowing what to optimize. To improve the performance of your application, identify the areas of your application that do not operate at peak efficiency. The TopLink performance profiler helps you identify performance problems by logging performance statistics for every executed query in a given session.


Note:

You should also consider using general performance profilers such as JDeveloper or JProbe to analyze performance problems. These tools can provide more detail that may be required to properly diagnose a problem.

The TopLink performance Profiler logs the following information to the TopLink log file (for general information about TopLink logging, see "Logging"):

This section includes information on the following topics:

Configuring the TopLink Performance Profiler

To enable the TopLink performance profiler, select the TopLink profiler option when configuring your session (see "Configuring a Performance Profiler").

The TopLink performance profiler is an instance of oracle.toplink.tools.profiler.PerformanceProfiler class. It provides the following public API:

  • logProfile–enables the profiler

  • dontLogProfile–disables the profiler

  • logProfileSummary–organizes the profiler log into a summary of all the individual operation profiles including operation statistics like the shortest time of all the operations that were profiled, the total time of all the operations, the number of objects returned by profiled queries, and the total time that was spent in each kind of operation that was profiled.

  • logProfileSummaryByQuery–organizes the profiler log as query summaries. This is the default profiler behavior.

  • logProfileSummaryByClass–organizes the profiler log as class summaries. This is an alternative to the default behavior implemented by logProfileSummaryByQuery method.

Accessing the TopLink Profiler Results

The simplest way to view TopLink profiler results is to read the TopLink log files with a text editor. For general information about TopLink logging, such as logging file location, see "Logging".

Alternatively, you can use the graphical performance profiler that the TopLink Web client provides. For more information, refer to the Web client online Help and README files.

Example 11-1 shows an example of the TopLink profiler output.

Example 11-1 Performance Profiler Output

Begin Profile of{
ReadAllQuery(oracle.toplink.demos.employee.domain.Employee)
Profile(ReadAllQuery,# of obj=12, time=1399,sql execute=217, prepare=495, row fetch=390, time/obj=116,obj/sec=8)
} End Profile

The second line of the profile contains the following information about a query:

  • ReadAllQuery(oracle.toplink.demos.employee.domain.Employee): specific query profiled, and its arguments.

  • Profile(ReadAllQuery: start of the profile and the type of query.

  • # of obj=12: number of objects involved in the query.

  • time=1399: total execution time of the query (in milliseconds).

  • sql execute=217: total time spent preparing the SQL script.

  • prepare=495: total time spent preparing the SQL script.

  • row fetch=390: total time spent fetching rows from the database.

  • time/obj=116: number of milliseconds spent on each object.

  • obj/sec=8) */: number of objects handled per second.

Measuring TopLink Performance With the Oracle Dynamic Monitoring System (DMS)

Oracle DMS is a library that enables application and system developers to use a variety of DMS sensors to measure and export customized performance metrics for specific software components (called nouns).

TopLink includes DMS instrumentation in essential objects to provide efficient monitoring of run-time data in TopLink-enabled applications, including both J2EE and non-J2EE applications.

By enabling DMS profiling in a TopLink application (see "Configuring the Oracle DMS Profiler"), you can collect and easily access run-time data that can help you with application administration tasks and performance tuning.


Note:

You should also consider using general performance profilers such as JDeveloper or JProbe to analyze performance problems. These tools can provide more detail that may be required to properly diagnose a problem.

Table 11-1 lists the many performance and status metrics TopLink provides through DMS.

Table 11-2 lists the various profiling levels you can use to adjust the level of profiling to the amount of monitoring information you require. Levels are listed in order of increasing system overhead.

You can easily access DMS data at run time using a management application that supports the Java Management Extensions (JMX) API (see "Accessing Oracle DMS Profiler Data Using JMX") or using any Web browser and the DMS Spy servlet (see "Accessing Oracle DMS Profiler Data Using the DMS Spy Servlet").

Table 11-1 TopLink DMS Metrics

DMS Noun NameFoot 1  Sensor Name LevelFoot 2  Description

Cache

CacheHits

HEAVY

Number of times an object looked up in the cache was found.


CacheMisses

HEAVY

Number of times an object looked up in the cache was not found.


Caching

ALL

Time spent adding, looking up, and removing objects in the cache.

RCMFoot 3 

ChangesNotProcessed

ALL

Number of coordinated cache ObjectChangeSets discarded because the object was not found in the cache and was not merged.


ChangesProcessed

ALL

Number of coordinated cache ObjectChangeSets for which an object was found in the cache and merged.


MessagesReceived

HEAVY

Number of cache coordination messages received.


MessagesSent

HEAVY

Number of cache coordination messages sent.


RemoteChangeSets

HEAVY

Number of change sets received from remote machines and processed.


RCMStatus

HEAVY

Cache coordination status: one of not configured, started, or stopped.

Connection

ConnectCalls

HEAVY

Total number of connect calls made.


ConnectionsInUse(POOL_NAME)

HEAVY

Number of connections in use for the given connection pool.


DisconnectCalls

HEAVY

Total number of disconnect calls made.


ConnectionManagement

ALL

Time spent managing connections including connecting, reconnecting, and disconnecting from a data source.

Query

DatabaseExecute

ALL

Time spent in calls to the JDBC Statement. Includes time spent in calls to: close, executeUpdate, and executeQuery.


DeleteQueries

HEAVY

Time spent executing delete queries, including time spent in JDBC calls.


ObjectBuilding

ALL

Time spent building persistent objects from database rows.


QueryPreparation

ALL

Time spent preparing a query. Does not include time spent doing SQL prepare.


ReadQueries

HEAVY

Time spent executing read queries, including time spent in JDBC calls.


RowFetch

ALL

Time spent fetching the JDBC result set from the database and building DatabaseRecord objects from the JDBC result set. Includes regular SQL calls and stored procedure calls.


SqlGeneration

ALL

Time spent generating SQL. In the case of TopLink expressions, includes time spent converting Expression to SQL.


SqlPrepare

ALL

Time spent in JDBC preparing the Statement. Includes the time spent in EIS creating an Interaction associated with a connection, and creating input and output Record objects.


UpdateQueries

HEAVY

Time spent executing update queries, including time spent in JDBC calls.


WriteQueries

HEAVY

Time spent executing write queries, including time spent in JDBC calls.

Session

ClientSession

HEAVY

Number of ClientSessions currently logged in.


loginTime

NORMAL

Time at which the session was logged in. Once the session is logged out, the sensor no longer appears.


SessionName

NORMAL

Name of the session.


UnitOfWork

HEAVY

Number of UnitOfWork objects acquired from this session.

Transaction

DistributedMerge

ALL

Time spent merging remote transaction changes into the local shared cache.


JtsAfterCompletion

ALL

Time spent on JTS afterCompletion method.


JtsBeforeCompletion

ALL

Time spent on JTS beforeCompletion method.


MergeTime

ALL

Time spent merging changes into the shared cache.


OptimisticLocks

HEAVY

Number of optimistic lock exceptions thrown.


Sequencing

ALL

Time spent maintaining the sequence number mechanism and setting the sequence number on objects.


UnitOfWorkRegister

ALL

Time spent in registering objects with the UnitOfWork.


UnitOfWorkCommits

ALL

Time spent in the UnitOfWork commit process.


UnitOfWorkRollBacks

HEAVY

Number of UnitOfWork commits that were rolled back.

Miscellanous

DescriptorEvents

ALL

Time spent by the DescriptorEventManager executing a descriptor event.


Logging

ALL

Time spent logging TopLink activities.


SessionEvents

ALL

Time spent by the SessionEvent manager executing a session event.


Footnote 1 DMS noun names are followed by the name of the session to which they belong. For example, Cache(SESSION_NAME).

Footnote 2 See Table 11-2 for a description of each level setting.

Footnote 3 Cache Coordination

Table 11-2 DMS Metric Collection Levels

Level Description

NONE

Disable collection of all DMS metrics.

NORMAL

Enable collection of TopLink DMS metrics. Adds very low overhead. This is the default setting.

HEAVY

Enable collection of basic TopLink DMS metrics. Adds about 1 percent overhead.

ALL

Enable all possible TopLink DMS metrics. Adds about 3 percent overhead.


Configuring the Oracle DMS Profiler

You configure DMS support in your TopLink application differently depending on the type of application it is:

OC4J Applications

By default, DMS metric collection is enabled for TopLink CMP applications deployed to OC4J. For BMP or non-CMP applications deployed to OC4J, you must configure DMS metric collection (see "Configuring a Performance Profiler").

TopLink EJB deployed in OC4J are subject to the DMS configuration specified by the OC4J command line-property -Doracle.dms.sensors=<level> where <level> is one of the values listed in Table 11-2.

Non-OC4J Applications

To enable DMS metric collection for TopLink applications deployed to an application sever other than OC4J do the following:

  1. Ensure that the dms.jar file is in your application classpath.

    By default, the dms.jar file is located in <ORACLE_HOME>\lib directory.

  2. Set system property oracle.dms.sensors=<level> where <level> is one of the values listed in Table 11-2.

  3. To enable the DMS profiler, select the DMS profiler option when configuring your TopLink session (see "Configuring a Performance Profiler").

Accessing Oracle DMS Profiler Data Using JMX

Using the Java Management Extensions (JMX) API, you can publish DMS profiler run-time data from a managed application (TopLink) to a JMX-compliant management application, by way of EJB-like MBean components.

When you configure your TopLink application to enable run-time services (see "Configuring a Performance Profiler") and you deploy your application to OC4J, the TopLink runtime will deploy a JMX MBean so that a JMX management application can access the DMS profiler run-time data your application publishes.

For code examples that illustrates how to use DMS and JMX, see http://www.oracle.com/technology/tech/java/oc4j/1003/how_to/jmx-enabled-demo.html.

Accessing Oracle DMS Profiler Data Using the DMS Spy Servlet

Once your DMS enabled TopLink application is running, you can access the DMS data it is collecting.

The DMS Spy servlet is available in all Java processes that use DMS. It lets you monitor metrics for a single Java process from a Web browser.

To access DMS data directly using the DMS Spy servlet, do the following:

  1. Ensure that the dms.jar file is in your application classpath.

    By default, the dms.jar file is located in <ORACLE_HOME>\lib.

  2. Set the following system properties for the DMS enabled Java process you want to monitor:

    oracle.dms.publisher.classes=oracle.dms.http.Httpd
    oracle.dms.httpd.port.start=<port>
    
    

    where <port> is the HTTP port on which DMS accepts requests (the default value is 46080).

  3. Apply the system property changes by restarting the Java process you want to monitor.

  4. Using a Web browser, connect to the Java process and access the Spy servlet by entering the following URL:

    http://<host>:<port>/dms0/Spy
    
    

    where <host> is the host name of your Java process and <port> is the value specified by the oracle.dms.httpd.port.start system property.

    The Spy servlet displays all TopLink DMS-enabled objects appropriate for the current DMS level setting. Figure 11-1 shows an example of the DMS Spy servlet display.

    Figure 11-1 DMS Spy Servlet Display

    Description of Figure 11-1 follows
    Description of "Figure 11-1 DMS Spy Servlet Display"

General Performance Optimization

Do not override TopLink default behavior unless your application absolutely requires it. Because TopLink default behavior is set for optimum results with the most common applications, the default is usually the most efficient choice for any given option. This is especially important for query or cache behavior.

Use TopLink Workbench rather than manual coding. TopLink Workbench is not only easy to use. The default configuration it exports to deployment XML (and the code it generates, if required) represents best practices optimized for most applications.

Schema Optimization

Optimization is an important consideration when you design your database schema and object model. Most performance issues occur when the object model or database schema is too complex, which can make the database slow and difficult to query. This is most likely to happen if you derive your database schema directly from a complex object model.

To optimize performance, design the object model and database schema together. However, allow each model to be designed optimally: do not require a direct one-to-one correlation between the two.

This section includes the following schema optimization examples:

Schema Case 1: Aggregation of Two Tables into One

A common schema optimization technique is to aggregate two tables into a single table. This improves read and write performance by requiring only one database operation instead of two.

Table 11-3 and Table 11-4 illustrate the table aggregation technique.

Table 11-3 Original Schema (Aggregation of Two Tables Case)

Elements Details

Title

ACME Member Location Tracking System

Classes

Member, Address

Tables

MEMBER, ADDRESS

Relationships

Source, Instance Variable, Mapping, Target, Member, address, one-to-one, Address


The nature of this application dictates that developers always look up employees and addresses together. As a result, querying a member based on address information requires a database join, and reading a member and its address requires two read statements. Writing a member requires two write statements. This adds unnecessary complexity to the system, and results in poor performance.

A better solution is to combine the MEMBER and ADDRESS tables into a single table, and change the one-to-one relationship to an aggregate relationship. This lets you read all information with a single operation, and doubles the update and insert speed, because only a single row in one table requires modifications.

Table 11-4 Optimized Schema (Aggregation of Two Tables Case)

Elements Details

Classes

Member, Address

Tables

MEMBER

Relationships

Source, Instance Variable, Mapping, Target, Member, address, aggregate, Address


Schema Case 2: Splitting One Table Into Many

To improve overall performance of the system, split large tables into two or more smaller tables. This significantly reduces the amount of data traffic required to query the database.

For example, the system illustrated in Table 11-5 assigns employees to projects within an organization. The most common operation reads a set of employees and projects, assigns employees to projects, and update the employees. The employee's address or job classification is also occasionally used to determine the project on which the employee is placed.

Table 11-5 Original Schema (Splitting One Table into Many Case)

Elements Details Instance Variable Mapping Target

Title

ACME Employee Workflow System




Classes

Employee, Address, PhoneNumber, EmailAddress, JobClassification, Project




Tables

EMPLOYEE, PROJECT, PROJ_EMP




Relationships

Employee

address

aggregate

Address


Employee

phoneNumber

aggregate

EmailAddress


Employee

emailAddress

aggregate

EmailAddress


Employee

job

aggregate

JobClassification


Employee

projects

many-to-many

Project


When you read a large volume of employee records from the database, you must also read their aggregate parts. Because of this, the system suffers from general read performance issues. To resolve this, break the EMPLOYEE table into the EMPLOYEE, ADDRESS, PHONE, EMAIL, and JOB tables, as illustrated in Table 11-6.

Because you usually read only the employee information, splitting the table reduces the amount of data transferred from the database to the client. This improves your read performance by reducing the amount of data traffic by 25 percent.

Table 11-6 Optimized Schema (Splitting One Table into Many Case)

Elements Details Instance Variable Mapping Target

Title

ACME Employee Workflow System




Classes

Employee, Address, PhoneNumber, EmailAddress, JobClassification, Project




Tables

EMPLOYEE, ADDRESS, PHONE, EMAIL, JOB, PROJECT, PROJ_EMP




Relationships

Employee

address

one-to-one

Address


Employee

phoneNumber

one-to-one

EmailAddress


Employee

emailAddress

one-to-one

EmailAddress


Employee

job

one-to-one

JobClassification


Employee

projects

many-to-many

Project


Schema Case 3: Collapsed Hierarchy

A common mistake when you transform an object-oriented design into a relational model, is to build a large hierarchy of tables on the database. This makes querying difficult, because queries against this type of design can require a large number of joins. It is usually a good idea to collapse some of the levels in your inheritance hierarchy into a single table.

Table 11-7 represents a system that assigns clients to a company's sales representatives. The managers also track the sales representatives that report to them.

Table 11-7 Original Schema (Collapsed Hierarchy Case)

Elements Details

Title

ACME Sales Force System

Classes

Tables

Person

PERSON

Employee

PERSON, EMPLOYEE

SalesRep

PERSON, EMPLOYEE, REP

Staff

PERSON, EMPLOYEE, STAFF

Client

PERSON, CLIENT

Contact

PERSON, CONTACT


The system suffers from complexity issues that hinder system development and performance. Nearly all queries against the database require large, resource-intensive joins. If you collapse the three-level table hierarchy into a single table, as illustrated in Table 11-8, you substantially reduce system complexity. You eliminate joins from the system, and simplify queries.

Table 11-8 Optimized Schema (Collapsed Hierarchy Case)

Elements Details

Classes

Tables

Person

none

Employee

EMPLOYEE

SalesRep

EMPLOYEE

Staff

EMPLOYEE

Client

CLIENT

Contact

CLIENT


Schema Case 4: Choosing One out of Many

In a one-to-many relationship, a single source object has a collection of other objects. In some cases, the source object frequently requires one particular object in the collection, but requires the other objects only infrequently. You can reduce the size of the returned result set in this type of case by adding an instance variable for the frequently required object. This lets you access the object without instantiating the other objects in the collection.

Table 11-9 represents a system by which an international shipping company tracks the location of packages in transit. When a package moves from one location to another, the system creates a new a location entry for the package in the database. The most common query against any given package is for its current location.

Table 11-9 Original Schema (Choosing One out of Many Case)

Elements Details Instance Variable Mapping Target

Title

ACME Shipping Package Location Tracking system




Classes

Package, Location




Tables

PACKAGE, LOCATION




Relationships

Package

locations

one-to-many

Location


A package in this system can accumulate several location values in its LOCATION collection as it travels to its destination. Reading all locations from the database is resource intensive, especially when the only location of interest is the current location.

To resolve this type of problem, add a specific instance variable that represents the current location. You then add a one-to-one mapping for the instance variable, and use the instance variable to query for the current location. As illustrated in Table 11-9, because you can now query for the current location without reading all locations associated with the package, this dramatically improves the performance of the system.

Table 11-10 Optimized Schema (Choosing One out of Many Case)

Elements Details Instance Variable Mapping Target

Classes

Package, Location




Tables

PACKAGE, LOCATION




Relationships

Package

locations

one-to-many

Location


Package

currentLocation

one-to-many

Location


Mapping and Descriptor Optimization

Always use indirection. It is not only critical in optimizing database access, but also allows TopLink to make several other optimizations including optimizing its cache access and unit of work processing. See "Configuring Indirection".

Avoid using the existence checking option checkCacheThenDatabase on descriptors (see "Configuring Cache Existence Checking at the Descriptor Level"), unless required by the application. The default existence checking behavior offers better performance.

Avoid expensive initialization in the default constructor that TopLink uses to instantiate objects. Instead, use lazy initialization or use a TopLink instantiation policy (see "Configuring Instantiation Policy") to configure the descriptor to use a different constructor.

Avoid using method access in your TopLink mappings (see "Configuring Method Accessing"), especially if you have expensive or potentially dangerous side-effect code in your get or set methods; use the default direct attribute access instead.

Session Optimization

Use a Server session in a server environment, not a DatabaseSession.

Use the TopLink client session instead of remote session. A client session is appropriate for most multiuser J2EE application server environments.

Do not pool client sessions. Pooling sessions offers no performance gains.

For more information, see "Server and Client Sessions" and "OC4J Applications".

Cache Optimization

Cache coordination (see "Understanding Cache Coordination") is one way to allow multiple, possibly distributed, instances of a session to broadcast object changes among each other so that each session's cache can be kept up-to-date.

However, cache coordination is best suited to applications with specific characteristics (see "When to use Cache Coordination"). Before implementing cache coordination, tune the TopLink cache for each class using alternatives such as object identity type (see "Configuring Cache Type and Size at the Descriptor Level"), cache invalidation (see "Cache Invalidation"), or cache isolation (see "Cache Isolation"). Doing so lets you configure the optimal cache configuration for each type of class (see Table 11-11) and may eliminate the need for distributed cache coordination altogether.

Table 11-11 Identity Map and Cache Configuration by Class Type

Class Type Identity Map Options Cache Options

read-only

soft, hard, or fullFoot 1 


read-mostly

soft or hard

cache invalidation or cache coordination

write-mostly

weak

cache invalidation


Footnote 1 If the number of instances is finite.

If you do use cache coordination, use JMS for cache coordination rather than RMI. JMS is more robust, easier to configure, and runs asynchronously. If you require synchronous cache coordination, use RMI.

You can configure a descriptor to control when the TopLink runtime will refresh the session cache when an instance of this object type is queried (see "Configuring Cache Refreshing"). Oracle does not recommend the use of Always Refresh or Disable Cache Hits.

Using Always Refresh may result in refreshing the cache on queries when not required or desired. As an alternative, consider configuring cache refresh on a query by query basis (see "Refreshing the Cache").

Using Disable Cache Hits instructs TopLink to bypass the cache for object read queries based on primary key. This results in a database round trip every time an object read query based on primary key is executed on this object type, negating the performance advantage of the cache. When used in conjunction with Always Refresh, this option ensures that all queries go to the database. This can have a significant impact on performance. These options should only be used in specialized circumstances.

Data Access Optimization

Depending on the type of data source your application accesses, TopLink offers a variety of Login options that you can use to tune the performance of low level data reads and writes.

You can use several techniques to improve data access performance for your application. This section discusses some of the more common approaches, including:

JDBC Driver Properties Optimization

Consider the default behavior of the JDBC driver you choose for your application. Some JDBC driver options can affect data access performance.

Some important JDBC driver properties can be configured directly using TopLink Workbench or TopLink API (for example, see "JDBC Fetch Size").

JDBC driver properties that are not supported directly by TopLink Workbench or TopLink API can still be configured as generic JDBC properties that TopLink passes to the JDBC driver.

For example, some JDBC drivers, such as Sybase JConnect, perform a database round trip to test whether or not a connection is closed: that is, calling the JDBC driver method isClosed results in a stored procedure call or SQL select. This database round-trip can cause a significant performance reduction. To avoid this, you can disable this behavior: for Sybase JConnect, you can set property name CLOSED_TEST to value INTERNAL.

For more information about configuring general JDBC driver properties from within your TopLink application, see "Configuring Properties".

Data Format Optimization

By default, TopLink optimizes data access by accessing the data from JDBC in the format the application requires. For example, TopLink retrieves long data types from JDBC instead of having the driver return a BigDecimal that TopLink would then have to convert into a long.

Some older JDBC drivers do not perform data conversion correctly and conflict with this optimization. In this case, you can disable this optimization (see "Configuring Advanced Options").

Batch Writing

Batch writing can improve database performance by sending groups of INSERT, UPDATE, and DELETE statements to the database in a single transaction, rather than individually.

When used without parameterized SQL, this is known as dynamic batch writing.

When used with parameterized SQL (see "Parameterized SQL (Binding) and Prepared Statement Caching"), this is known as parameterized batch writing. This allows a repeatedly executed statement, such as a group of inserts of the same type, to be executed as a single statement and a set of bind parameters. This can provide a large performance benefit as the database does not have to parse the batch.

When using batch writing, you can tune the maximum batch writing size using setMaxBatchWritingSize method of the Login interface. The meaning of this value depends on whether or not you are using parameterized SQL:

  • If you are using parameterized SQL (you configure your Login by calling Login method bindAllParameters), the maximum batch writing size is the number of statements to batch (default: 100).

  • If you are using dynamic SQL, the maximum batch writing size is the size of the SQL string buffer in characters (default: 32000).

By default, TopLink does not enable batch writing because not all databases and JDBC drivers support it. Oracle recommends that you enable batch writing for selected databases and JDBC drivers that support this option (see "Configuring JDBC Options").

For a more detailed example of using batch writing to optimize write queries, see "Batch Writing and Parameterized SQL".

Parameterized SQL (Binding) and Prepared Statement Caching

Using parameterized SQL, you can keep the overall length of an SQL query from exceeding the statement length limit that your JDBC driver or database server imposes.

Using parameterized SQL and prepared statement caching, you can improve performance by reducing the number of times the database SQL engine parses and prepares SQL for a frequently called query.

By default, TopLink does not enable parameterized SQL and prepared statement caching, because not all databases and JDBC drivers support it. Oracle recommends that you enable parameterized SQL and prepared statement caching for selected databases and JDBC drivers that support these options.

Not all JDBC drivers support all JDBC binding options (see "Configuring JDBC Options"). Selecting a combination of options may result in different behavior from one driver to another. Before selecting JDBC options, consult your JDBC driver documentation. When choosing binding options, consider the following approach:

  1. Try binding all parameters with all other binding options disabled.

  2. If this fails to bind some large parameters, consider enabling one of the following options, depending on the parameter's data type and the binding options that your JDBC driver supports:

    1. To bind large String parameters, try enabling string binding.

      If large String parameters still fail to bind, consider adjusting the maximum String size. TopLink sets the maximum String size to 32000 characters by default.

    2. To bind large Byte array parameters, try enabling byte array binding.

  3. If this fails to bind some large parameters, try enabling streams for binding.

    Typically, configuring string or byte array binding will invoke streams for binding. If not, explicitly configuring streams for binding may help.

For J2EE applications that use TopLink external connection pools, you must configure parameterized SQL in TopLink but you cannot configure prepared statement caching in TopLink. In this case, you must configure prepared statement caching in the application server connection pool. For example, in OC4J, if you configure your data-source.xml file with a managed data-source (where connection-driver is oracle.jdbc.OracleDriver and class is oracle.j2ee.sql.DriverManagerDataSource), you can configure a non-zero num-cached-statements that enables JDBC statement caching and defines the maximum number of statements cached.

For applications that use TopLink internal connection pools, you can configure parameterized SQL and prepared statement caching.

You can configure parameterized SQL and prepared statement caching at the following levels:

Query Optimization

TopLink provides an extensive query API for reading, writing, and updating data. This section describes ways of optimizing query performance in various circumstances.

Before optimizing queries, consider the optimization suggestions in "Data Access Optimization".

This section includes information on the following:

Parameterized SQL and Prepared Statement Caching

These features lets you cache and reuse a query's pre-parsed database statement when the query is re-executed.

For more information, see"Parameterized SQL (Binding) and Prepared Statement Caching".

Named Queries

Whenever possible, use named queries in your application. Named queries help you avoid duplication, are easy to maintain and reuse, and easily add complex query behavior to the application. Using named queries also allows for the query to be prepared once, and for the SQL generation to be cached.

For more information, see "Named Queries".

Batch and Join Reading

To optimize database read operations, TopLink supports both batch and join reading. When you use these techniques, you dramatically decrease the number of times you access the database during a read operation, especially when your result set contains a large number of objects.

For more information, see the following:

Partial Object Queries and Fetch Groups

Partial object queries lets you retrieve partially populated objects from the database rather than complete objects.

For CMP applications, you can use fetch groups to accomplish the same performance optimization.

For more information about partial object reading, see "Partial Object Queries".

For more information about fetch groups, see "Fetch Groups".

JDBC Fetch Size

The JDBC fetch size gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed.

For large queries that return a large number of objects you can configure the row fetch size used in the query to improve performance by reducing the number database hits required to satisfy the selection criteria.

Most JDBC drivers default to a fetch size of 10, so if you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results. The optimal fetch size is not always obvious. Usually, a fetch size of one half or one quarter of the total expected result size is optimal. Note that if you are unsure of the result set size, incorrectly setting a fetch size too large or too small can decrease performance.

Set the query fetch size with ReadQuery method setFetchSize as Example 11-2 shows. Alternatively, you can use ReadQuery method setMaxRows to set the limit for the maximum number of rows that any ResultSet can contain.

Example 11-2 JDBC Driver Fetch Size

// Create query and set Employee as its reference class
ReadAllQuery query = new ReadAllQuery(Employee.class);
ExpressionBuilder builder = query.getExpressionBuilder(); 
query.setSelectionCriteria(builder.get("id").greaterThan(100));

// Set the JDBC fetch size
query.setFetchSize(50);

// Configure the query to return results as a ScrollableCursor
query.useScrollableCursor();

// Execute the query
ScrollableCursor cursor = (ScrollableCursor) session.executeQuery(query);

// Iterate over the results
while (cursor.hasNext()) {
    System.out.println(cursor.next().toString());
}
cursor.close();

In this example, when you execute the query, the JDBC driver retrieves the first 50 rows from the database (or all rows if less than 50 rows satisfy the selection criteria). As you iterate over the first 50 rows, each time you call cursor.next(), the JDBC driver returns a row from local memory–it does not need to retrieve the row from the database. When you try to access the fifty first row (assuming there are more than 50 rows that satisfy the selection criteria), the JDBC driver again goes to the database and retrieves another 50 rows. In this way, 100 rows are returned with only two database hits.

If you specify a value of zero (default; means the fetch size is not set), then the hint is ignored and the JDBC driver's default is used.

For more information about configuring JDBC driver properties from within your TopLink application, see "Configuring Properties".

Cursored Streams and Scrollable Cursors

You can configure a query to retrieve data from the database using a cursored Java stream or scrollable cursor. This lets you view a result set in manageable increments rather than as a complete collection. This is useful when you have a large result set. You can further tune performance by configuring the JDBC driver fetch size used (see "JDBC Fetch Size").

For more information about scrollable cursors, see "Handling Cursor and Stream Query Results".

Read Optimization Examples

TopLink provides the read optimization features listed in Table 11-12.

This section includes the following read optimization examples:

Table 11-12 Read Optimization Features

Feature Function Performance Technique

Unit of Work

Tracks object changes within the Unit of Work.

To minimize the amount of tracking required, registers only those objects that will change.

For more information, see Chapter 97, "Understanding TopLink Transactions".

Indirection

Uses indirection objects to defer the loading and processing of relationships.

Provides a major performance benefit. It allows database access to be optimized and allows TopLink to internally make several optimizations in caching and unit of work.

Soft cache, weak identity map

Offers client-side caching for objects read from database, and drops objects from the cache when memory becomes low.

Reduces database calls and improves memory performance.

For more information, see "Cache Type and Object Identity".

Weak identity map

Offers client-side caching for objects.

Reduces database access and maintains a cache of all referenced objects.

For more information, see "Cache Type and Object Identity".

Batch reading and joining

Reduces database access by batching many queries into a single query that reads more data.

Dramatically reduces the number of database accesses required to perform a read query.

For more information, see "Using Batch Reading" and "Using Join Reading".

Partial object reading and fetch groups.

Allows reading of a subset of a result set of the object's attributes.

Reduces the amount of data read from the database.

For more information, see "Partial Object Queries".

For more information about fetch groups, see "Fetch Groups".

Report query

Similar to partial object reading, but returns only the data instead of the objects.

Supports complex reporting functions such as aggregation and group-by functions. Also lets you compute complex results on the database, instead of reading the objects into the application and computing the results locally.

For more information, see "Report Query".

JDBC fetch size and ReadQuery maximum rows

Reduces the number of database hits required to return all the rows that satisfy selection criteria.

For more information, see "JDBC Fetch Size".

Cursors

Lets you view a large result set in manageable increments rather than as a complete collection

For more information, see "Cursored Streams and Scrollable Cursors"

Inheritance views

Allows a view to be used for queries against an inheritance superclass that can read all of its subclasses in a single query, instead of multiple queries

For more information, see "Reading Case 5: Inheritance Views".


Reading Case 1: Displaying Names in a List

An application may ask the user to choose an element from a list. Because the list displays only a subset of the information contained in the objects, it is not necessary to query for all information for objects from the database.

TopLink features that optimize these types of operations include:

These features let you query only the information required to display the list. The user can then select an object from the list.

Partial Object Reading

Partial object reading is a query designed to extract only the required information from a selected record in a database, rather than all the information the record contains. Because partial object reading does not fully populate objects, you can neither cache nor edit partially read objects.

For more information about partial object queries, see "Partial Object Queries".

In Example 11-3, the query builds complete employee objects, even though the list displays only employee last names. With no optimization, the query reads all the employee data.

Example 11-3 No Optimization

/* Read all the employees from the database, ask the user to choose one and return it. This must read in all the information for all the employees */
List list;

// Fetch data from database and add to list box
Vector employees = (Vector) session.readAllObjects(Employee.class);
list.addAll(employees);

// Display list box
....

// Get selected employee from list
Employee selectedEmployee = (Employee) list.getSelectedItem();

return selectedEmployee;

Example 11-4 demonstrates the use of partial object reading. It reads only the last name and primary key for the employee data. This reduces the amount of data read from the database.

Example 11-4 Optimization Through Partial Object Reading

/* Read all the employees from the database, ask the user to choose one and return it. This uses partial object reading to read just the last names of the employees. Since TopLink automatically includes the primary key of the object, the full object can easily be read for editing */
List list;

// Fetch data from database and add to list box
ReadAllQuery query = new ReadAllQuery(Employee.class);
query.addPartialAttribute("lastName");

// The next line avoids a query exception
query.dontMaintainCache();
Vector employees = (Vector) session.executeQuery(query);
list.addAll(employees);

// Display list box
....

// Get selected employee from list
Employee selectedEmployee = (Employee)session.readObject(list.getSelectedItem());
return selectedEmployee;

Report Query

Report query lets you retrieve data from a set of objects and their related objects. Report query supports database reporting functions and features.

For more information, see "Report Query Results".

Example 11-5 demonstrates the use of report query to read only the last name of the employees. This reduces the amount of data read from the database compared to the code in Example 11-3, and avoids instantiating employee instances.

Example 11-5 Optimization Through Report Query

/* Read all the employees from the database, ask the user to choose one and return it. The report query is used to read just the last name of the employees. Then the primary key stored in the report query result to read the real object */
List list;

// Fetch data from database and add to list box
ExpressionBuilder builder = new ExpressionBuilder();
ReportQuery query = new ReportQuery (Employee.class, builder);
query.addAttribute("lastName");
query.retrievePrimaryKeys();
Vector reportRows = (Vector) session.executeQuery(query);
list.addAll(reportRows);

// Display list box
....

// Get selected employee from list
ReportQueryResult result = (ReportQueryResult) list.getSelectedItem();
Employee selectedEmployee =     (Employee)result.readobject(Employee.Class,session);

Although the differences between the unoptimized example (Example 11-3) and the report query optimization in Example 11-5 appear to be minor, report queries offer a substantial performance improvement.

Fetch Groups

Fetch groups, applicable only to CMP projects, are similar to partial object reading, but does allow caching of the objects read. For objects with many attributes or reference attributes to complex graphs (or both), you can define a fetch group that determines what attributes are returned when an object is read. Because TopLink will automatically execute additional queries when the get method is called for attributes not in the fetch group, ensure that the unfetched data is not required: refetching data can become a performance issue.

For more information about querying with fetch groups, see "Using Queries With Fetch Groups".

Example 11-6 demonstrates the use of a static fetch group.

Example 11-6 Configuring a Query with a FetchGroup Using the FetchGroupManager

// Create static fetch group at the descriptor level
FetchGroup group = new FetchGroup("nameOnly");
group.addAttribute("firstName");
group.addAttribute("lastName");
descriptor.getFetchGroupManager().addFetchGroup(group);

// Use static fetch group at query level
ReadAllQuery query = new ReadAllQuery(Employee.class);
query.setFetchGroupName("nameOnly");

// Only Employee attributes firstName and lastName are fetched.
// If you call the Employee get method for any other attribute, TopLink executes
// another query to retrieve all unfetched attribute values. Thereafter, calling that get
// method will return the value directly from the object

Reading Case 2: Batch Reading Objects

The way your application reads data from the database affects performance. For example, reading a collection of rows from the database is significantly faster than reading each row individually.

A common performance challenge is to read a collection of objects that have a one-to-one reference to another object. This typically requires one read operation to read in the source rows, and one call for each target row in the one-to-one relationship.

To reduce the number of read operations required, use join and batch reading. Example 11-7 illustrates the unoptimized code required to retrieve a collection of objects with a one-to-one reference to another object. Example 11-8 and Example 11-9 illustrate the use of joins and batch reading to improve efficiency.

Example 11-7 No Optimization

/* Read all the employees, and collect their address' cities. This takes N + 1   
   queries if not optimized */

// Read all the employees from the database. This requires 1 SQL call
Vector employees = session.readAllObjects(Employee.class,
   new ExpressionBuilder().get("lastName").equal("Smith"));

//SQL: Select * from Employee where l_name = 'Smith'

// Iterate over employees and get their addresses.
// This requires N SQL calls
Enumeration enum = employees.elements();
Vector cities = new Vector();
while(enum.hasMoreElements()) 
   Employee employee = (Employee) enum.nextElement();
   cities.addElement(employee.getAddress().getCity());

//SQL: Select * from Address where address_id = 123, etc 
}

Example 11-8 Optimization Through Joining

/* Read all the employees, and collect their address' cities. Although the code
   is almost identical because joining optimization is used it takes only 1 
   query */

// Read all the employees from the database, using joining. 
// This requires 1 SQL call
ReadAllQuery query = new ReadAllQuery(Employee.class);
ExpressionBuilder builder = query.getExpressionBuilder(); 
query.setSelectionCriteria(builder.get("lastName").equal("Smith"));
query.addJoinedAttribute("address");
Vector employees = session.executeQuery(query);

/* SQL: Select E.*, A.* from Employee E, Address A where E.l_name = 'Smith' and 
   E.address_id = A.address_id Iterate over employees and get their addresses. 
   The previous SQL already read all the addresses, so no SQL is required */
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
}

Example 11-9 Optimization Through Batch Reading

/* Read all the employees, and collect their address' cities. Although the code 
   is almost identical because batch reading optimization is used it takes only 
   2 queries */ 

// Read all the employees from the database, using batch reading. 
// This requires 1 SQL call, note that only the employees are read 
ReadAllQuery query = new ReadAllQuery(Employee.class);
ExpressionBuilder builder = query.getExpressionBuilder(); 
query.setSelectionCriteria(bulder.get("lastName").equal("Smith"));
query.addBatchReadAttribute("address");
Vector employees = (Vector)session.executeQuery(query);

// SQL: Select * from Employee where l_name = 'Smith'

// Iterate over employees and get their addresses.
// The first address accessed will cause all the addresses
// to be read in a single SQL call 
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
    // SQL: Select distinct A.* from Employee E, Address A 
    // where E.l_name = 'Smith' and E.address_id = A.address_i
}

Because the two-phase approach to the query (Example 11-8 and Example 11-9) accesses the database only twice, it is significantly faster than the approach illustrated in Example 11-7.

Joins offer a significant performance increase under most circumstances. Batch reading offers a further performance advantage in that it allows for delayed loading through value holders, and has much better performance where the target objects are shared.

For example, if employees in Example 11-7, Example 11-8, and Example 11-9 are at the same address, batch reading reads much less data than joining, because batch reading uses a SQL DISTINCT call to filter duplicate data.

Batch reading is available for one-to-one, one-to-many, many-to-many, direct collection, direct map and aggregate collection mappings. Joining is only available for one-to-one and one-to-many mappings. Note that one-to-many joining will return a large amount of duplicate data and so is normally less efficient than batch reading.


WARNING:

Allowing an unverified SQL string to be passed into methods (for example: readAllObjects(Class class, String sql) method) makes your application vulnerable to SQL injection attacks.


Reading Case 3: Using Complex Custom SQL Queries

TopLink provides a high-level query mechanism. However, if your application requires a complex query, a direct SQL or stored procedure call may be the best solution.

For more information about executing SQL calls, see "SQLCall".

Reading Case 4: Using View Objects

Some application operations require information from several objects rather than from just one. This can be difficult to implement, and resource-intensive. Example 11-10 illustrates unoptimized code that reads information from several objects.

Example 11-10 No Optimization

/* Gather the information to report on an employee and return the summary of the 
   information. In this situation, a hash table is used to hold the report 
   information. Notice that this reads a lot of objects from the database, but 
   uses very little of the information contained in the objects. This may take 5 
   queries and read in a large number of objects */

public Hashtable reportOnEmployee(String employeeName) {
    Vector projects, associations;
    Hashtable report = new Hashtable();
    // Retrieve employee from database
    Employee employee = session.readObject(Employee.class,
       new ExpressionBuilder.get("lastName").equal(employeeName)); 
    // Get all the projects affiliated with the employee
    projects = session.readAllObjects(Project.class, 
      "SELECT P.* FROM PROJECT P," + 
      "EMPLOYEE E WHERE P.MEMBER_ID = E.EMP_ID AND E.L_NAME = " + 
      employeeName);
    // Get all the associations affiliated with the employee
    associations = session.readAllObjects(Association.class, "SELECT A.* " +
      "FROM ASSOC A, EMPLOYEE E WHERE A.MEMBER_ID = E.EMP_ID AND E.L_NAME = "
      + employeeName);
    report.put("firstName", employee.getFirstName());
    report.put("lastName", employee.getLastName());
    report.put("manager", employee.getManager());
    report.put("city", employee.getAddress().getCity());
    report.put("projects", projects);
    report.put("associations", associations);
    return report;
}

To improve application performance in these situations, define a new read-only object to encapsulate this information, and map it to a view on the database. To set the object to be read-only, use the addDefaultReadOnlyClass API in the oracle.toplink.sessions.Project class.

Example 11-11 Optimization Through View Object

CREATE VIEW NAMED EMPLOYEE_VIEW AS (SELECT F_NAME = E.F_NAME, L_NAME = E.L_NAME,EMP_ID = E.EMP_ID, MANAGER_NAME = E.NAME, CITY = A.CITY, NAME = E.NAME 
FROM EMPLOYEE E, EMPLOYEE M, ADDRESS A 
WHERE E.MANAGER_ID = M.EMP_ID
AND E.ADDRESS_ID = A.ADDRESS_ID)

Define a descriptor for the EmployeeReport class as follows:

  • Define the descriptor as usual, but specify the tableName as EMPLOYEE_VIEW.

  • Map only the attributes required for the report. In the case of the numberOfProjects and associations, use a transformation mapping to retrieve the required data.

You can now query the report from the database in the same way as any other object enabled by TopLink.

Example 11-12 View the Report from Example 11-11

/* Return the report for the employee */
public EmployeeReport reportOnEmployee(String employeeName) {
    EmployeeReport report;
    report = (EmployeeReport) session.readObject(EmployeeReport.class, 
      new ExpressionBuilder.get("lastName").equal(employeeName));
    return report;
}

WARNING:

Allowing an unverified SQL string to be passed into methods (for example: readAllObjects(Class class, String sql) and readObject(Class class, String sql) method) makes your application vulnerable to SQL injection attacks.


Reading Case 5: Inheritance Views

If you have an inheritance hierarchy that spans multiple tables and frequently query for the root class, consider defining an inheritance all-subclasses view. This allows a view to be used for queries against an inheritance superclass that can read all of its subclasses in a single query instead of multiple queries.

For more information about inheritance, see "Descriptors and Inheritance".

For more information about querying on inheritance, see "Querying on an Inheritance Hierarchy".

Write Optimization Examples

TopLink provides the write optimization features listed in Table 11-13.

This section includes the following write optimization examples:

Table 11-13 Write Optimization Features

Feature Effect on Performance

Unit of Work

Improves performance by updating only the changed fields and objects.

Minimizes the amount of tracking required (which can be expensive) by registering only those objects that will change.

For more information, see Chapter 97, "Understanding TopLink Transactions").

Note: The Unit of Work supports marking classes as read-only (see "Configuring Read-Only Descriptors" and "Declaring Read-Only Classes"). This avoids tracking of objects that do not change.

Batch writing

Lets you group all insert, update, and delete commands from a transaction into a single database call. This dramatically reduces the number of calls to the database (see "Cursors" and "Batch Writing and Parameterized SQL").

Parameterized SQL

Improves performance for frequently executed SQL statements (see "Parameterized SQL and Prepared Statement Caching").

Sequence number preallocation

Dramatically improves insert performance. (see "Sequence Number Preallocation").

Multiprocessing

Splitting a batch job across threads lets you synchronize reads from a cursored stream and use parallel Units of Work for performance improvements even on a single machine (see "Multiprocessing").

Does exist alternatives

The does exist call on write object can be avoided in certain situations by checking the cache for does exist, or assuming the existence of the object (see "Configuring Existence Checking at the Project Level" or "Configuring Cache Existence Checking at the Descriptor Level" and "Using Registration and Existence Checking").


Writing Case: Batch Writes

The most common write performance problem occurs when a batch job inserts a large volume of data into the database. For example, consider a batch job that loads a large amount of data from one database, and then migrates the data into another. The objects involved:

  • Are simple individual objects with no relationships

  • Use generated sequence numbers as their primary key

  • Have an address that also uses a sequence number

The batch job loads 10,000 employee records from the first database and inserts them into the target database. With no optimization, the batch job reads all the records from the source database, acquires a Unit of Work from the target database, registers all objects, and commits the Unit of Work.

Example 11-13 No Optimization

/* Read all the employees, acquire a Unit of Work, and register them */

// Read all the employees from the database. This requires 1 SQL call,
// but will be very memory intensive as 10,000 objects will be read
Vector employees = sourceSession.readAllObjects(Employee.class);

//SQL: Select * from Employee

// Acquire a Unit of Work and register the employees
UnitOfWork uow = targetSession.acquireUnitOfWork();
uow.registerAllObjects(employees);
uow.commit();

// SQL: Begin transaction
// SQL: Update Sequence set count = count + 1 where name = 'EMP'
// SQL: Select count from Sequence
// SQL: ... repeat this 10,000 times + 10,000 times for the addresses ...
// SQL: Commit transaction
// SQL: Begin transaction
// SQL: Insert into Address (...) values (...)
// SQL: ... repeat this 10,000 times
// SQL: Insert into Employee (...) values (...)
// SQL: ... repeat this 10,000 times
// SQL: Commit transaction

This batch job performs poorly, because it requires 60,000 SQL executions. It also reads huge amounts of data into memory, which can raise memory performance issues. TopLink offers several optimization features to improve the performance of this batch job.

To improve this operation, do the following:

Cursors

To optimize the query in Example 11-13, use a cursored stream to read the Employees from the source database. You can also employ a weak identity map instead of a hard or soft cache identity map in both the source and target databases.

To address the potential for memory problems, use the releasePrevious method after each read to stream the cursor in groups of 100. Register each batch of 100 employees in a new Unit of Work and commit them.

Although this does not reduce the amount of executed SQL, it does address potential out-of-memory issues. When your system runs out of memory, the result is performance degradation that increases over time, and excessive disk activity caused by memory swapping on disk.

For more information, see "Cursored Streams and Scrollable Cursors".

Batch Writing and Parameterized SQL

Batch writing lets you combine a group of SQL statements into a single statement and send it to the database as a single database execution. This feature reduces the communication time between the application and the server, and substantially improves performance.

You can enable batch writing alone (dynamic batch writing) using Login method useBatchWriting. If you add batch writing to Example 11-13, you execute each batch of 100 employees as a single SQL execution. This reduces the number of SQL executions from 20,200 to 300.

You can also enable batch writing and parameterized SQL (parameterized batch writing) and prepared statement caching. Parameterized SQL avoids the prepare component of SQL execution. This improves write performance because it avoids the prepare cost of an SQL execution. For parameterized batch writing you would get one statement per Employee, and one for Address: this reduces the number of SQL executions from 20,200 to 400. Although this is more than dynamic batch writing alone, parameterized batch writing also avoids all parsing, so it is much more efficient overall.

Although parameterized SQL avoids the prepare component of SQL execution, it does not reduce the number of executions. Because of this, parameterized SQL alone may not offer as big of a gain as batch writing. However, if your database does not support batch writing, parameterized SQL will improve performance. If you add parameterized SQL in Example 11-13, you must still execute 20,200 SQL executions, but parameterized SQL reduces the number of SQL PREPAREs to 4.

For more information, see "Batch Writing".

Sequence Number Preallocation

SQL select calls are more resource-intensive than SQL modify calls, so you can realize large performance gains by reducing the number of select calls you issue. The code in Example 11-13 uses the select calls to acquire sequence numbers. You can substantially improve performance if you use sequence number preallocation.

In TopLink, you can configure the sequence preallocation size on the login object (the default size is 50). Example 11-13 uses a preallocation size of 1 to demonstrate this point. If you stream the data in batches of 100 as suggested in "Cursors", set the sequence preallocation size to 100. Because employees and addresses in the example both use sequence numbering, you further improve performance by letting them share the same sequence. If you set the preallocation size to 200, this reduces the number of SQL execution from 60,000 to 20,200.

For more information about sequencing preallocation, see "Sequencing and Preallocation Size".

Multiprocessing

You can use multiple processes or multiple machines to split the batch job into several smaller jobs. In this example, splitting the batch job across threads enables you to synchronize reads from the cursored stream, and use parallel Units of Work on a single machine.

This leads to a performance increase, even if the machine has only a single processor, because it takes advantage of the wait times inherent in SQL execution. While one thread waits for a response from the server, another thread uses the waiting cycles to process its own database operation.

Example 11-14 illustrates the optimized code for this example. Note that it does not illustrate multiprocessing.

Example 11-14 Fully Optimized

/* Read each batch of employees, acquire a Unit of Work, and register them */
targetSession.getLogin().useBatchWriting();
targetSession.getLogin().setSequencePreallocationSize(200);
targetSession.getLogin().bindAllParameters();
targetSession.getLogin().cacheAllStatements();
targetSession.getLogin().setMaxBatchWritingSize(200);

// Read all the employees from the database into a stream. 
// This requires 1 SQL call, but none of the rows will be fetched.
ReadAllQuery query = new ReadAllQuery(Employee.class);
query.useCursoredStream();
CursoredStream stream;
stream = (CursoredStream) sourceSession.executeQuery(query);
//SQL: Select * from Employee. Process each batch
while (! stream.atEnd()) {
    Vector employees = stream.read(100);
    // Acquire a unit of work to register the employees
    UnitOfWork uow = targetSession.acquireUnitOfWork();
    uow.registerAllObjects(employees);
    uow.commit();
}
//SQL: Begin transaction
//SQL: Update Sequence set count = count + 200 where name = 'SEQ'
//SQL: Select count from Sequence where name = 'SEQ'
//SQL: Commit transaction
//SQL: Begin transaction
//BEGIN BATCH SQL: Insert into Address (...) values (...)
//... repeat this 100 times
//Insert into Employee (...) values (...)
//... repeat this 100 times
//END BATCH SQL:
//SQL: Commit transactionJava optimization

Unit of Work Optimization

For best performance when using a unit of work, consider the following tips:

If your performance measurements show that you have a performance problem during unit of work commit, consider using object level or attribute level change tracking, depending on the type of objects involved and how they typically change. For more information, see "Unit of Work and Change Policy".

Application Server and Database Optimization

Configuring your application server and database correctly can have a big impact on performance and scalabilty. Ensure that you correctly optimize these key components of your application in addition to your TopLink application and persistence.

For your application or J2EE server, ensure your memory, thread pool and connection pool sizes are sufficient for your server's expected load, and that your JVM has been configured optimally.

Ensure that your database has been configured correctly for optimal performance and its expected load.