10
Tuning for Performance

Oracle Application Server TopLink applications are generally quite complex, and offer many opportunities for optimization. When you take an iterative approach to tuning, and you design your applications for peak efficiency, the result is an OracleAS TopLink application that is fast, smooth, and robust.

This chapter illustrates different methods to improve application performance. It discusses:

Introduction to Tuning Concepts

The most important concept associated with tuning your OracleAS TopLink application is the idea of an iterative approach. The most effective way to tune your application is to:

Use a profiling tool, such as the OracleAS TopLink Performance Profiler, to measure the application's performance.
Modify application components.
Measure performance again.

To identify the changes that improve your application performance, modify only one or two components at a time. You should also tune your application in a non-production environment before you deploy the application.

OracleAS TopLink as Part of a Larger Application

An OracleAS TopLink application is part of a larger application infrastructure that can include Web servers, external cache managers, external transactions controllers, and so on. To tune the OracleAS TopLink application most effectively, consider how the application interacts with the larger infrastructure, and include those considerations in performance testing.

An Effective Tuning Approach

To optimize performance, first check to see if a standard OracleAS TopLink feature addresses the problem you are trying to solve. The OracleAS TopLink documentation discusses the most common optimizations in the context of features they support. For example, "Query Object Performance Options" offers information on how to improve query performance.

After you implement the basic optimizations, consider the more complex optimizations provided in this chapter, which include:

Profiling Performance

The most important challenge to performance tuning is knowing what to optimize. To improve your application's performance, identify the areas of your application that do not operate at peak efficiency. The OracleAS TopLink Performance Profiler helps you identify performance problems.

The OracleAS TopLink Performance Profiler logs a summary of the performance statistics for every query you execute. The Profiler also logs a summary of all queries executed in a given session.

The Profiler logs the following information:

Query class
Domain class
Total time, total execution time of the query (in milliseconds)
Local time, the amount of time spent on the user's workstation (in milliseconds)
Number of objects, the total number of objects affected
Number of objects handled per second
Logging, the amount of time spent printing logging messages (in milliseconds)
SQL prepare, the amount of time spent preparing the SQL (in milliseconds)
SQL execute, the amount of time spent executing the SQL (in milliseconds)
Row fetch, the amount of time spent fetching rows from the database (in milliseconds)
Cache, the amount of time spent searching or updating the object cache (in milliseconds)
Object build, the amount of time spent building the domain object (in milliseconds)
Query prepare, the amount of time spent to prepare the query prior to execution (in milliseconds)
SQL generation, the amount of time spent to generate the SQL before it is sent to the database (in milliseconds)

Using the Profiler in the Web Client

The OracleAS TopLink Web Client also includes a graphical Performance Profiler.

For more information, see "Using the Performance Profiler".

Using the Profiler in Java

The Performance Profiler is an instance of the PerformanceProfiler class, found in oracle.toplink.tools.profiler. To access the Profiler, call the session's getProfiler() method.

To enable the Profiler, invoke the setProfiler(new PerformanceProfiler()) method on the session. To end a profiling session, invoke the clearProfiler() method. The Profiler supports the following public API:

logProfile(): enables the profiler
dontLogProfile(): disables the profile
logProfileSummaryByQuery(): organizes the profiler log as query summaries. This is the default profiler behavior.
logProfileSummaryByClass(): organizes the profiler log as class summaries. This is an alternative to the default behavior implemented by logProfileSummaryByQuery().

Example 10-1 Executing a Read Query with the Profiler

session.setProfiler(new PerformanceProfiler());
Vector employees = session.readAllObjects(Employee.class);

Example 10-2 Implementing the Performance Profiler in the sessions.xml File

<session>
    ...
    <profiler-class>oracle.toplink.tools.profiler.PerformanceProfiler</profiler-class>
    ...
</session>

Example 10-3 Performance Profiler Output

Begin Profile of{
ReadAllQuery(oracle.toplink.demos.employee.domain.Employee)Profile(ReadAllQuery,
# of obj=12, time=1399,sql execute=217, prepare=495, row fetch=390, 
time/obj=116,obj/sec=8) */
} End Profile

The second line of the profile contains the following information about a query:

ReadAllQuery(oracle.toplink.demos.employee.domain.Employee): specific query profiled, and its arguments.
Profile(ReadAllQuery: start of the profile and the type of query.
# of obj=12: number of objects involved in the query.
time=1399: total execution time of the query (in milliseconds).
sql execute=217: total time spent preparing the SQL.
prepare=495: total time spent preparing the SQL.
row fetch=390: total time spent fetching rows from the database.
time/obj=116: number of milliseconds spent on each object.
obj/sec=8) */: number of objects handled per second.

Browsing the Profiler Results

To view profiler results, use the graphical Profile Browser. From your application code, launch the browser, located in the oracle.toplink.tools.sessionconsole package.

Example 10-4 Launching the Profile Browser

ProfileBrowser.browseProfiler(session.getProfiler());

General Tuning Tips

To substantially improve your application efficiency and throughput, Table 10-1 lists several tuning areas and offers tips to obtain the best performance from your OracleAS TopLink application.

Table 10-1 Tips for Building Efficient OracleAS TopLink Applications

Area	Recommendations	Related Information
General	Do not override OracleAS TopLink default behavior unless your application absolutely requires it. Because OracleAS TopLink default behavior is set for optimum results with the most common applications, the default is usually the most efficient choice for any given option. This is especially important for query or cache behavior.
Mapping	Use indirection whenever possible, especially in cases where a class is normally used without its related objects.	See "Indirection"
Descriptors	Do not use `checkCacheThenDatabase` on descriptors unless required by the application. Query default behavior offers better performance.	See "Cache Usage" See "Advanced Finder Options"
	Use `conformResults` on queries only when required. This avoids unnecessary resource overhead.	See "Validating a Unit of Work" See "Cache Usage" See "Advanced Finder Options"
Queries	If possible, use named queries in your application. Named queries help you avoid duplication, are easy to maintain and reuse, and easily add complex query behavior to the application.	See "Predefined Queries"
	Use parameterized SQL to improve write performance. Parameterized SQL improves performance by reusing the same prepared statement for multiple executions. This reduces overhead.	See "Binding and Parameterized SQL" See "Parameterized SQL"
Sessions	Do not pool client sessions. Pooling sessions offers no performance gains.	See "Client Session"
	With JTA transactions, use `getActiveSession()` to access the active session for the current external transaction.	See "J2EE Integration"
	Use the OracleAS TopLink client session instead of remote session. client session is appropriate for most multi-user J2EE application server environments.	See "Client Session" See "J2EE Integration"
Unit of Work	When you read objects, use the Unit of Work only when the objects returned from a query will be modified.	See "Transactions"
Cache	Tune the OracleAS TopLink cache for each class to help eliminate the need for distributed cache synchronization. Always tune these settings before implementing cache synchronization.	See "Setting Class Information" in the Oracle Application Server TopLink Mapping Workbench User's Guide
	Use Weak Cache for particularly volatile objects.	See "Working with Identity Maps" in the Oracle Application Server TopLink Mapping Workbench User's Guide
Cache Synchronization	Do not use distributed cache synchronization unless it is required by your application. Distributed cache synchronization offers performance benefits only in clustered environments in which several servers in the cluster regularly request and update the same objects.	See "Distributed Cache Synchronization"
	Use Java Message Service (JMS) for cache synchronization rather than Remote Method Invocation (RMI). JMS is more robust, easier to configure, and runs asynchronously. If you require synchronous cache synchronization, use RMI.	See "Distributed Cache Synchronization"
Code	Use the OracleAS TopLink Mapping Workbench rather than hand-coding. The OracleAS TopLink Mapping Workbench is easy to use, and implements many OracleAS TopLink features for you automatically.
	Use instance or static variables to cache the results of resource intensive computations.
	If you use RMI or CORBA, avoid fine grain remote message sends.

Basic Performance Optimization

Performance considerations are present at every step of the development cycle. Although this implies an awareness of performance issues in your design and implementation, it does not mean that you should expect to achieve the best possible performance in your first pass.

For example, if an optimization complicates the design, leave it until the final development phase. You should still plan for these optimizations from your first iteration, to make them easier to integrate later.

OracleAS TopLink provides a diverse set of features to optimize performance. You enable or disable most features in the descriptors or database session, making any resulting performance gains global.

OracleAS TopLink Reading Optimization Features

You can optimize certain read and write operations in an OracleAS TopLink application. To optimize reading, you can tune:

The amount of data read from the database
The way OracleAS TopLink queries data on the database

OracleAS TopLink provides the read optimization features listed in Table 10-2.

Table 10-2 Read Optimization Features

Feature	Function	Performance Technique
Unit of Work	Tracks object changes within the Unit of Work.	To minimize the amount of tracking required register only those objects that will change.
Object indirection	Uses valueholders as a stand-in for domain objects.	`Valueholders` can provide a major performance benefit, because they minimize database reads.
Soft cache weak identity map	Offers client-side caching for objects read from database, and drops objects from the cache when memory becomes low.	Reduces database calls and improves memory performance.
Weak identity map	Offers client-side caching for objects.	Reduces database access and maintains a cache of all referenced objects.
Full identity map	Offers client side caching for objects.	Avoids database calls for objects that have already been read. Limit the cache size. A large cache can impact system performance.
Cache identity map	Offers a fixed size client side cache.	Leverages a moderate caching strategy, and controls the impact on memory.
No identity map	Disables cache lookup.	Useful if you prefer database access over cached objects.
Batch reading and joining	Reduces database access by batching many queries into a single query that reads more data.	Dramatically reduces the number of database accesses required to perform a READ query.
Partial object reading	Allows reading of a subset of a result set of the object's attributes.	Reduces the amount of data read from the database at any one time. Reducing connection time for each read improves performance.
Report query	Similar to partial object reading, but returns only the data instead of the objects.	Supports complex reporting functions such as aggregation and group-by functions. Also enables you to compute complex results on the database, instead of reading the objects into the application and computing the results locally.

Reading Case 1: Displaying Names in a List

An application may ask the user to choose an element from a list. Because the list displays only a subset of the information contained in the objects, it is not necessary to query for all information for objects from the database.

Partial object reading and report query are two OracleAS TopLink features that optimize these types of operations. They enable you to query only the information required to display the list. The user can then select an object from the list.

Partial Object Reading

Partial object reading is a query designed to extract only the required information from a selected record in a database, rather than all the information the record contains. Because partial object reading does not fully populate objects, you can neither cache nor edit partially-read objects. Also note that the primary key is required to re-query the object (so it can be edited, for example). OracleAS TopLink does not automatically include the primary key information in a partially populated object. If you want to edit the object, specify the primary key as a required partial attribute.

In Example 10-5, the query builds complete employee objects, even though the list displays only employee last names. With no optimization, the query reads employee data.

Example 10-5 No Optimization

/* Read all the employees from the database, ask the user to choose one and 
return it. This must read in all the information for all the employees.*/
List list;

// Fetch data from database and add to list box.
Vector employees = (Vector) session.readAllObjects(Employee.class);
list.addAll(employees);

// Display list box.
....

// Get selected employee from list.
Employee selectedEmployee = (Employee) list.getSelectedItem();

return selectedEmployee;

Example 10-6 demonstrates the use of partial object reading. It reads only the last name and primary key for the employees. This reduces the amount of data read from the database.

Example 10-6 Optimization Through Partial Object Reading

/* Read all the employees from the database, ask the user to choose one and 
return it. This uses partial object reading to read just the last name of the 
employees. Note that OracleAS TopLink does not automatically include the primary 
key of the object. If this is needed to select the object for a query, it must 
be specified as a partial attribute so that it can be included. In this way, the 
object can easily be read for editing. */
List list;
// Fetch data from database and add to list box.
ReadAllQuery query = new ReadAllQuery(Employee.class);
query.addPartialAttribute("lastName");
/* OracleAS TopLink does not automatically include the primary key of the 
object. If this is needed to select the object for a query, it must be specified 
as a partial attribute so that it can be included.*/
query.addPartialAttribute("id");
// The next line avoids a query exception
query.dontMaintainCache();
Vector employees = (Vector) session.executeQuery(query);
list.addAll(employees);

// Display list box.
....
// Get selected employee from list.
Employee selectedEmployee = 
(Employee)session.readObject(list.getSelectedItem());
return selectedEmployee;

ReportQuery

Report query enables you to retrieve data from a set of objects and their related objects. Report query supports database reporting functions and features.

For more information, see "ReportQuery".

Example 10-7 demonstrates the use of report query to read only the last name of the employees. This reduces the amount of data read from the database compared to the code in Example 10-5, and avoids instantiating employee instances.

Example 10-7 Optimization Through Report Query

/* Read all the employees from the database, ask the user to choose one and 
return it. This uses the report query to read just the last name of the 
employees. It then uses the primary key stored in the report query result to 
read the real object.*/
List list;
// Fetch data from database and add to list box.
ExpressionBuilder builder = new ExpressionBuilder();
ReportQuery query = new ReportQuery (Employee.class, builder);
query.addAttribute("lastName");
query.retrievePrimaryKeys();
Vector reportRows = (Vector) session.executeQuery(query);
list.addAll(reportRows);

// Display list box.
....

// Get selected employee from list.
ReportQueryResult result = (ReportQueryResult) list.getSelectedItem();
Employee selectedEmployee = (Employee) 
   result.readobject(Employee.Class,session);

Although the differences between the unoptimized example (Example 10-5) and the report query optimization in Example 10-7 appear to be minor, report queries offer a substantial performance improvement.

Reading Case 2: Batch Reading Objects

The way your application reads data from the database affects performance. For example, reading a collection of rows from the database is significantly faster than reading each row individually.

A common performance challenge is to read a collection of objects that have a one-to-one reference to another object. This normally requires one read operation to read in the source rows, and one call for each target row in the one-to-one relationship.

To reduce the number of reads required, use join and batch reading. Example 10-8 illustrates the unoptimized code required to retrieve a collection of objects with a one-to-one reference to another object. Example 10-9 and Example 10-10 illustrate the use of joins and batch reading to improve efficiency.

Example 10-8 No Optimization

/*Read all the employees, and collect their address' cities. This takes N + 1 
queries if not optimized. */

// Read all the employees from the database. This requires 1 SQL call.
Vector employees = session.readAllObjects(Employee.class,new 
   ExpressionBuilder().get("lastName").equal("Smith"));

//SQL: Select * from Employee where l_name = `Smith'

// Iterate over employees and get their addresses.
// This requires N SQL calls.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while(enum.hasMoreElements()) Employee employee = (Employee) enum.nextElement();
   cities.addElement(employee.getAddress().getCity());

//SQL: Select * from Address where address_id = 123, etc }

Example 10-9 Optimization Through Joining

/* Read all the employees, and collect their address' cities. Although the code 
is almost identical because joining optimization is used it only takes 1 query. 
*/

// Read all the employees from the database, using joining. This requires 1 SQL 
call.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.setSelectionCriteria(new 
   ExpressionBuilder().get("lastName").equal("Smith"));
query.addJoinedAttribute("address");
Vector employees = session.executeQuery(query);

// SQL: Select E.*, A.* from Employee E, Address A where E.l_name = `Smith' and 
E.address_id = A.address_id Iterate over employees and get their addresses. The 
previous SQL already read all the addresses so no SQL is required.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());

Example 10-10 Optimization Through Batch Reading

/* Read all the employees, and collect their address' cities. Although the code 
is almost identical because batch reading optimization is used it only takes 2 
queries. */

// Read all the employees from the database, using batch reading. This requires 
1 SQL call, note that only the employees are read. 
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.setSelectionCriteria(new 
   ExpressionBuilder().get("lastName").equal("Smith"));
query.addBatchReadAttribute("address");
Vector employees = (Vector)session.executeQuery(query);

// SQL: Select * from Employee where l_name = `Smith'

// Iterate over employees and get their addresses.
// The first address accessed will cause all the addresses to be read in a 
single SQL call.
Enumeration enum = employees.elements();
Vector cities = new Vector();
while (enum.hasMoreElements()) {
    Employee employee = (Employee) enum.nextElement();
    cities.addElement(employee.getAddress().getCity());
    // SQL: Select distinct A.* from Employee E, Address A 
      where E.l_name = `Smith' and E.address_id = A.address_i
}

Because the two-phase approach to the query (Example 10-9 and Example 10-10) accesses the database only twice, it is significantly faster than the approach illustrated in Example 10-8.

Joins offer a significant performance increase under most circumstances. Batch reading offers further performance advantage in that it allows for delayed loading through valueholders, and has much better performance where the target objects are shared.

For example, if employees in Example 10-8, Example 10-9, and Example 10-10 live at the same address, batch reading reads much less data than joining, because batch reading uses a SQL DISTINCT call to filter duplicate data. Batch reading is also available for one-to-many relationships, but joining is available only for one-to-one relationships.

Reading Case 3: Using Complex Custom SQL Queries

OracleAS TopLink provides a high-level query mechanism. However, if your application requires a complex query, a direct SQL call may be the best solution.

For more information about executing SQL calls, see "Custom SQL".

Reading Case 4: Using View Objects

Some application operations require information from several objects rather than from just one. This can be difficult to implement, and resource intensive. Example 10-11 illustrates unoptimized code that reads information from several objects.

Example 10-11 No Optimization

/* Gather the information to report on an employee and return the summary of the 
information. In this situation a hashtable is used to hold the report 
information. Notice that this reads a lot of objects from the database, but uses 
very little of the information contained in the objects. This may take 5 queries 
and read in a large number of objects.*/

public Hashtable reportOnEmployee(String employeeName)
    {
    Vector projects, associations;
    Hashtable report = new Hashtable();
    // Retrieve employee from database.
    Employee employee = session.readObject(Employee.class, new
       ExpressionBuilder.get("lastName").equal(employeeName)); 
    // Get all the projects affiliated with the employee.
    projects = session.readAllObjects(Project.class, "SELECT P.* FROM PROJECT P, 
      EMPLOYEE E WHERE P.MEMBER_ID = E.EMP_ID AND E.L_NAME = " + employeeName);
    // Get all the associations affiliated with the employee.
    associations = session.readAllObjects(Association.class, "SELECT A.*
      FROM ASSOC A, EMPLOYEE E WHERE A.MEMBER_ID = E.EMP_ID AND E.L_NAME = 
      " + employeeName);
}
    report.put("firstName", employee.getFirstName());
    report.put("lastName", employee.getLastName());
    report.put("manager", employee.getManager());
    report.put("city", employee.getAddress().getCity());
    report.put("projects", projects);
    report.put("associations", associations);
    return report;
}

To improve application performance in these situations, define a new read-only object to encapsulate this information, and map it to a view on the database. To set the object to be read-only, use the addDefaultReadOnlyClass() API in the oracle.toplink.sessions.Project class.

Example 10-12 Optimization Through View Object

CREATE VIEW NAMED EMPLOYEE_VIEW AS (SELECT F_NAME = E.F_NAME, L_NAME = E.L_
NAME,EMP_ID = E.EMP_ID, MANAGER_NAME = E.NAME, CITY = A.CITY, NAME = E.NAME 
FROM EMPLOYEE E, EMPLOYEE M, ADDRESS A 
WHERE E.MANAGER_ID = M.EMP_ID
AND E.ADDRESS_ID = A.ADDRESS_ID)

Define a descriptor for the EmployeeReport class:

Define the descriptor normally, but specify tableName as EMPLOYEE_VIEW.
Map only the attributes required for the report. In the case of numberOfProjects and associations, use a transformation mapping to retrieve the required data.

You can now query the report from the database like any other OracleAS TopLink-enabled object.

Example 10-13 View the Report from Example 10-12

/* Return the report for the employee.*/
public EmployeeReport reportOnEmployee(String employeeName) 
{
    EmployeeReport report;
    report = (EmployeeReport) session.readObject(EmployeeReport.class, 
      new ExpressionBuilder.get("lastName").equal(employeeName));
    return report;}

OracleAS TopLink Writing Optimization Features

Table 10-3 lists OracleAS TopLink's write optimization features.

Table 10-3 Write Optimization Features

Feature	Effect on Performance
Unit of Work	Improves performance by updating only the changed fields and objects. Minimizes the amount of tracking required (which can be expensive) by registering only those objects that will change. Note: The Unit of Work supports marking classes as read-only. This avoids tracking of objects that do not change.
Parameterized SQL	Improves performance for frequently executed SQL statements.
Batch writing	Allows you to group all insert, update, and delete commands from a transaction into a single database call. This dramatically reduces the number of calls to the database.
Sequence number preallocation	Dramatically improves insert performance.
Does exist alternatives	The does exist call on write object can be avoided in certain situations by checking the cache for does exist, or assuming the existence of the object.

Writing Case 1: Batch Writes

The most common write performance problem occurs when a batch job inserts a large volume of data into the database. For example, consider a batch job that loads a large amount of data from one database, and then migrates the data into another. The objects involved:

Are simple individual objects with no relationships
Use generated sequence numbers as their primary key
Have an address that also uses a sequence number

The batch job loads 10,000 employees from the first database and inserts them into the target database. With no optimization, the batch job reads all the records from the source database, acquires a Unit of Work from the target database, registers all objects, and commits the Unit of Work.

Example 10-14 No Optimization

/* Read all the employees, acquire a Unit of Work and register them. */

// Read all the employees from the database. This requires 1 SQL call, but will 
be very memory intensive as 10,000 objects will be read.
Vector employees = sourceSession.readAllObjects(Employee.class);

//SQL: Select * from Employee

// Acquire a Unit of Work and register the employees.
UnitOfWork uow = targetSession.acquireUnitOfWork();
uow.registerAllObjects(employees);
uow.commit();

//SQL: Begin transaction
//SQL: Update Sequence set count = count + 1 where name = 'EMP'
//SQL: Select count from Sequence
//SQL: ... repeat this 10,000 times + 10,000 times for the addresses ...
//SQL: Commit transaction
//SQL: Begin transaction
//SQL: Insert into Address (...) values (...)
//SQL: ... repeat this 10,000 times
//SQL: Insert into Employee (...) values (...)
//SQL: ... repeat this 10,000 times
//SQL: Commit transaction}

This batch job performs poorly, because it requires 60,000 SQL executions. It also reads huge amounts of data into memory, which can raise memory performance issues. OracleAS TopLink offers several optimization features to improve the performance of this batch job.

To improve this operation:

Leverage OracleAS TopLink batch reads and cursor support.
Implement sequence number preallocation.
Use batch writing to write to the database.

If your database does not support batch writing, use parameterized SQL to implement the write query.
Implement multiprocessing.

Cursors and Batch Writes

To optimize the query in Example 10-14, use a cursored stream to read the employees from the source database. You can also employ a cache identity map rather than a full identity map in both the source and target databases.

To address the potential for memory problems, use the releasePrevious() method after each read to stream the cursor in groups of 100. Register each batch of 100 employees in a new Unit of Work and commit them.

Although this does not reduce the amount of executed SQL, it does address potential out-of-memory issues. When your system runs out of memory, the result is performance degradation that increases over time, and excessive disk activity caused by memory swapping on disk.

Sequence Number Preallocation

SQL select calls are more resource-intensive than SQL modify calls, so you can realize large performance gains by reducing the number of select calls you issue. The code in Example 10-14 uses the select calls to acquire sequence numbers. You can substantially improve performance if you use sequence number preallocation.

In OracleAS TopLink, you can configure the sequence preallocation size on the login object (the default size is 50). Example 10-14 uses a preallocation size of 1 to demonstrate this point. If you stream the data in batches of 100 as suggested in "Cursors and Batch Writes", set the sequence preallocation size to 100. Because employees and addresses in the example both use sequence numbering, you further improve performance by letting them share the same sequence. If you set the preallocation size to 200, this reduces the number of SQL execution from 60,000 to 20,200.

Batch Writing

Batch writing enables you to combine a group of SQL statements into a single statement and send it to the database as a single database execution. This feature reduces the communication time between the application and the server, and substantially improves performance.

You can enable batch writing on the login object with the useBatchWriting() method. If you add batch writing to Example 10-14, you execute each batch of 100 employees as a single SQL execution. This reduces the number of SQL execution from 20,200 to 300.

Parameterized SQL

OracleAS TopLink supports parameterized SQL and prepared statement caching. Using parameterized SQL improves write performance, because it avoids the prepare cost of a SQL execution.

You cannot use batch writing and parameterized SQL together, because batch writing does not use individual statements. Because the performance benefits of batch writing are much greater than those of parameterized SQL, use batch writing if it is supported by your database.

Parameterized SQL avoids the prepare component of SQL execution, but does not reduce the number of executions. Because of this, it normally offers only moderate performance gains. However, if your database does not support batch writing, parameterized SQL improves performance. If you add parameterized SQL in Example 10-14, you must still execute 20,200 SQL executions, but parameterized SQL reduces the number of SQL PREPAREs to 4.

Multiprocessing

You can use multiple processes or multiple machines to split the batch job into several smaller jobs. In this example, splitting the batch job across threads enables you to synchronize reads from the cursored stream, and use parallel Units of Work on a single machine.

This leads to a performance increase, even if the machine has only a single processor, because it takes advantage of the wait times inherent in SQL execution. While one thread waits for a response from the server, another thread uses the waiting cycles to process its own database operation.

Example 10-15 illustrates the optimized code for this example. Note that it does not illustrate multiprocessing.

Example 10-15 Fully Optimized

/* Read each batch of employees, acquire a Unit of Work and register them. */
targetSession.getLogin().useBatchWriting();
targetSession.getLogin().setSequencePreallocationSize(200);

// Read all the employees from the database, into a stream. This requires 1 SQL 
call, but none of the rows will be fetched.
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.useCursoredStream();
CursoredStream stream;
stream = (CursoredStream) sourceSession.executeQuery(query);
//SQL: Select * from Employee. Process each batch
while (! stream.atEnd()) {
    Vector employees = stream.read(100);
    // Acquire a Unit of Work to register the employees
    UnitOfWork uow = targetSession.acquireUnitOfWork();
    uow.registerAllObjects(employees);
    uow.commit();
}
//SQL: Begin transaction
//SQL: Update Sequence set count = count + 200 where name = 'SEQ'
//SQL: Select count from Sequence where name = 'SEQ'
//SQL: Commit transaction
//SQL: Begin transaction
//BEGIN BATCH SQL: Insert into Address (...) values (...)
//... repeat this 100 times
//Insert into Employee (...) values (...)
//... repeat this 100 times
//END BATCH SQL:
//SQL: Commit transactionJava optimization

Schema Optimization

Optimization is an important consideration when you design your database schema and object model. Most performance issues occur when the object model or database schema is too complex, which can make the database slow and difficult to query. This is most likely to happen if you derive your database schema directly from a complex object model.

To optimize performance, we recommend you design the object model and database schema together however, ensure there is no direct one-to-one correlation between the two.

Schema Case 1: Aggregation of Two Tables into One

A common schema optimization technique is to aggregate two tables into a single table. This improves read and write performance by requiring only one database operation instead of two.

Table 10-4 and Table 10-5 illustrate the table aggregation technique.

Table 10-4 Original Schema

Elements	Details
Title	ACME Member Location Tracking System
Classes	Member, Address
Tables	MEMBER, ADDRESS
Relationships	Source, Instance Variable, Mapping, Target, Member, address, one-to-one, Address

The nature of this application dictates that developers always look up employees and addresses together. As a result, querying a member based on address information requires a database join, and reading a member and its address requires two read statements. Writing a member requires two write statements. This adds unnecessary complexity to the system, and results in poor performance.

A better solution is to combine the MEMBER and ADDRESS tables into a single table, and change the one-to-one relationship to an aggregate relationship. This enables you to read all information with a single operation, and doubles the speed of updates and inserts, because they must modify only a single row in one table.

Table 10-5 Optimized Schema

Elements	Details
Classes	Member, Address
Tables	MEMBER
Relationships	Source, Instance Variable, Mapping, Target, Member, address, aggregate, Address

Schema Case 2: Splitting One Table into Many

To improve overall performance of the system, split large tables into two or more smaller tables. This significantly reduces the amount of data traffic required to query the database.

For example, the system illustrated in Table 10-6 assigns employees to projects within an organization. The most common operation reads a set of employees and projects, assigns employees to projects, and update the employees. The employee's address or job classification is also occasionally used to determine the project on which the employee is placed.

Table 10-6 Original Schema

Elements	Details	Instance Variable	Mapping	Target
Title	ACME Employee Workflow System
Classes	Employee, Address, PhoneNumber, EmailAddress, JobClassification, Project
Tables	EMPLOYEE, PROJECT, PROJ_EMP
Relationships	Employee	address	aggregate	Address
	Employee	phoneNumber	aggregate	EmailAddress
	Employee	emailAddress	aggregate	EmailAddress
	Employee	job	aggregate	JobClassification
	Employee	projects	many-to-many	Project

When you read a large volume of employees from the database, you must also read their aggregate parts. Because of this, the system suffers from general read performance issues. To resolve this, break the EMPLOYEE table into the EMPLOYEE, ADDRESS, PHONE, EMAIL, and JOB tables, as illustrated in Table 10-7.

Because you normally read only the employee information, splitting the table reduces the amount of data transferred from the database to the client. This improves your read performance by reducing the amount of data traffic by 25%.

Table 10-7 Optimized Schema

Elements	Details	Instance Variable	Mapping	Target
Title	ACME Employee Workflow System
Classes	Employee, Address, PhoneNumber, EmailAddress, JobClassification, Project
Tables	EMPLOYEE, ADDRESS, PHONE, EMAIL, JOB, PROJECT, PROJ_EMP
Relationships	Employee	address	one-to-one	Address
	Employee	phoneNumber	one-to-one	EmailAddress
	Employee	emailAddress	one-to-one	EmailAddress
	Employee	job	one-to-one	JobClassification
	Employee	projects	many-to-many	Project

Schema Case 3: Collapsed Hierarchy

A common mistake when you transform an object oriented design into a relational model, is to build a large hierarchy of tables on the database. This makes querying difficult, because queries against this type of design can require a large number of joins. It is usually a good idea to collapse some of the levels in your inheritance hierarchy into a single table.

Table 10-8 represents a system that assigns clients to a company's sales representatives. The managers also track the sales representatives that report to them.

Table 10-8 Original Schema

Elements	Details
Title	ACME Sales Force System
Classes	Tables
Person	PERSON
Employee	PERSON, EMPLOYEE
SalesRep	PERSON, EMPLOYEE, REP
Staff	PERSON, EMPLOYEE, STAFF
Client	PERSON, CLIENT
Contact	PERSON, CONTACT

The system suffers from complexity issues that hinder system development and performance. Nearly all queries against the database require large, resource intensive joins. If you collapse the three-level table hierarchy into a single table, as illustrated in Table 10-9, you substantially reduce system complexity. You eliminate joins from the system, and simplify queries.

Table 10-9 Optimized Schema

Elements	Details
Classes	Tables
Person	none
Employee	EMPLOYEE
SalesRep	EMPLOYEE
Staff	EMPLOYEE
Client	CLIENT
Contact	CLIENT

Schema Case 4: Choosing One Out of Many

In a one-to-many relationship, a single source object has a collection of other objects. In some cases, the source object frequently requires one particular object in the collection, but requires the other objects only infrequently. You can reduce the size of the returned result set in this type of case by adding an instance variable for the frequently required object. This enables you to access the object without instantiating the other objects in the collection.

Table 10-10 represents a system by which an international shipping company tracks the location of packages in transit. When a package moves from one location to another, the system creates a new a location entry for the package in the database. The most common query against any given package is for its current location.

Table 10-10 Original Schema

Elements	Details	Instance Variable	Mapping	Target
Title	ACME Shipping Package Location Tracking System
Classes	Package, Location
Tables	PACKAGE, LOCATION
Relationships	Package	locations	one-to-many	Location

A package in this system can accumulate several location values in its LOCATION collection as it travels to its destination. Reading all locations from the database is resource intensive, especially when the only location of interest is the current location.

To resolve this type of problem, add a specific instance variable that represents the current location. You then add a one-to-one mapping for the instance variable, and use the instance variable to query for the current location. As illustrated in Table 10-11, because you can now query for the current location without reading all locations associated with the package, this dramatically improves the performance of the system.

Table 10-11 Optimized Schema

Elements	Details	Instance Variable	Mapping	Target
Classes	Package, Location
Tables	PACKAGE, LOCATION
Relationships	Package	locations	one-to-many	Location
	Package	currentLocation	one-to-one	Location

10Tuning for Performance