
Performance Tuning Your JDBC Application

 

The following sections explain how to get the most out of your JDBC applications:

Overview of JDBC Performance

WebLogic Performance-Enhancing Features

Designing Your Application for Best Performance


Overview of JDBC Performance

The concepts involved with Java, JDBC, and DBMS processing are new to many programmers. As Java becomes more widely used, database access and database applications will become increasingly easy to implement. This document provides some tips on how to obtain the best performance from JDBC applications.

 


WebLogic Performance-Enhancing Features

WebLogic has several features that enhance performance for JDBC applications.

How Connection Pools Enhance Performance

Establishing a JDBC connection with a DBMS can be very slow. If your application requires database connections that are repeatedly opened and closed, this can become a significant performance issue. WebLogic connection pools offer an efficient solution to this problem.

When WebLogic Server starts, connections from the connection pools are opened and are available to all clients. When a client closes a connection from a connection pool, the connection is returned to the pool and becomes available for other clients; the connection itself is not closed. There is little cost to opening and closing pool connections.

How many connections should you create in the pool? A connection pool can grow and shrink according to configured parameters, between a minimum and a maximum number of connections. Performance is generally best when the connection pool has as many connections as there are concurrent users: a smaller pool makes clients wait for a free connection, while a larger pool ties up DBMS resources that are never used.
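For example, a client borrows a pooled connection through a DataSource object bound in the WebLogic Server JNDI tree. The following minimal sketch assumes a DataSource bound at the JNDI name myDataSource; substitute the name you configured.

import java.sql.Connection;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class PoolClient {
    public void doWork() throws Exception {
        // Look up the DataSource that fronts the connection pool.
        // "myDataSource" is a placeholder for your configured JNDI name.
        Context ctx = new InitialContext();
        DataSource ds = (DataSource) ctx.lookup("myDataSource");

        Connection conn = ds.getConnection(); // borrow a connection from the pool
        try {
            // ... use the connection ...
        } finally {
            conn.close(); // returns the connection to the pool; the physical connection stays open
        }
    }
}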

Caching Data

DBMS access uses considerable resources. If your program accesses frequently used data that can be shared among applications or can persist between connections, you can cache the data, for example in the WebLogic Server JNDI tree or in read-only entity EJBs, instead of retrieving it from the DBMS on each request.

 


Designing Your Application for Best Performance

Most of the performance gained or lost in a DBMS application comes not from the application language, but from how the application is designed. The number and location of clients, the size and structure of DBMS tables and indexes, and the number and types of queries all affect application performance.

Below are general hints that apply to all DBMSs. It is also important to be familiar with the performance documentation of the specific DBMS that you use in your application.

1. Process as Much Data as Possible Inside the Database

Most serious performance problems in DBMS applications come from moving raw data around needlessly, whether across the network or just in and out of cache in the DBMS. A good way to minimize this waste is to put your logic where the data is: in the DBMS, not in the client, even if the client runs on the same machine as the DBMS. In fact, for some DBMSs, a fat client and a fat DBMS sharing one CPU is a performance disaster.

Most DBMSs provide stored procedures, an ideal tool for putting your logic where your data is. There is a significant difference in performance between a client that calls a stored procedure to update 10 rows, and another client that fetches those rows, alters them, and sends update statements to save the changes to the DBMS.
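For example, with JDBC the client can invoke such a stored procedure through a CallableStatement, so the row-by-row work happens entirely inside the DBMS. The procedure name give_raises below is hypothetical; create whatever procedure fits your schema.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class StoredProcClient {
    // One server-side call replaces fetching rows, altering them in the
    // client, and sending individual update statements back to the DBMS.
    public void giveRaises(Connection conn, int percent) throws SQLException {
        CallableStatement cs = conn.prepareCall("{call give_raises(?)}");
        try {
            cs.setInt(1, percent);
            cs.execute(); // every affected row is updated inside the DBMS
        } finally {
            cs.close();
        }
    }
}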

You should also review the DBMS documentation on managing cache memory in the DBMS. Some DBMSs (Sybase, for example) provide the means to partition the virtual memory allotted to the DBMS, and to guarantee certain objects exclusive use of some fixed areas of cache. This means that an important table or index can be read once from disk and remain available to all clients without having to access the disk again.

2. Use Built-in DBMS Set-based Processing

SQL is a set-processing language, and DBMSs are designed from the ground up for set-based processing. Accessing a database one row at a time is always slower than set-based processing, and on some DBMSs row-at-a-time access is poorly implemented. For example, it is faster to update each of four tables once, applying the change for all 100 employees in a single statement, than to update each table 100 times, once per employee, as the sketch below illustrates.
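The sketch below contrasts the two styles against a hypothetical payroll table: one set-based UPDATE versus one round trip per employee.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class SetBasedExample {
    // Set-based: a single statement updates all 100 employees at once.
    public void raiseSetBased(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement();
        try {
            stmt.executeUpdate("UPDATE payroll SET salary = salary * 1.05");
        } finally {
            stmt.close();
        }
    }

    // Row-at-a-time: 100 round trips and 100 single-row updates; always slower.
    public void raiseRowAtATime(Connection conn, int[] empIds) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "UPDATE payroll SET salary = salary * 1.05 WHERE emp_id = ?");
        try {
            for (int i = 0; i < empIds.length; i++) {
                ps.setInt(1, empIds[i]);
                ps.executeUpdate();
            }
        } finally {
            ps.close();
        }
    }
}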

Understanding set-based methodology can be very useful. Many complicated processes that were originally thought too complex to do any way other than row-at-a-time have been rewritten using set-based processing, with improved performance. For example, a major payroll application was converted from a huge, slow COBOL application to four stored procedures running in series; what took hours on a multi-CPU machine now takes fifteen minutes using far fewer resources.

3. Make Your Queries Smart

Frequently customers ask how to tell how many rows will be coming back in a given result set. This is a valid question, but there is no easy answer. The only way to find out without fetching all the rows is by issuing the same query using the count keyword:

SELECT count(*) from myTable, yourTable where ...

This returns the number of rows the original query would have returned. The actual count may differ by the time the original query is issued, if other DBMS activity alters the relevant data in the meantime.

You should be aware, however, that this is a resource-intensive operation. Depending on the original query, the DBMS will have to perform nearly as much work to count the rows as it will to send them.
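In JDBC, the count query comes back as a one-row result set. The sketch below reuses the tables from the query above; the join condition is a hypothetical stand-in for your real WHERE clause.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RowCountExample {
    // Ask the DBMS how many rows the real query would return.
    // Remember: counting can cost nearly as much as fetching the rows.
    public int countMatches(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement();
        try {
            ResultSet rs = stmt.executeQuery(
                "SELECT count(*) FROM myTable, yourTable " +
                "WHERE myTable.id = yourTable.id"); // hypothetical join; use your real WHERE clause
            rs.next();            // count(*) always returns exactly one row
            return rs.getInt(1);
        } finally {
            stmt.close();
        }
    }
}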

Your application should tailor its queries to be as specific as possible about what data it actually wants. Tricks include first selecting into temporary tables, returning only the count, and then sending a refined second query to return only a subset of the rows in the temporary table.

Learning to select only the data you really want at the client is crucial. Some applications ported from ISAM (a pre-relational database architecture) unnecessarily send a query selecting all the rows in a table when only the first few rows are wanted. Some applications use an ORDER BY clause simply to get the rows they want to come back first. Database queries like this cause unnecessary performance degradation.

Proper use of SQL can avoid these performance problems. For example, if you only want data about the top 3 earners on the payroll, the proper way to make this query is with a correlated subquery. Table 3-1 shows the entire table returned by the SQL statement

select * from payroll 

Table 3-1 Full Results Returned

Name     Salary
Joe      10
Mikes    20
Sam      30
Tom      40
Jan      50
Ann      60
Sue      70
Hal      80
May      80


 

Here, a correlated subquery

select p.name, p.salary from payroll p 
where 3 >= (select count(*) from payroll pp 
where pp.salary >= p.salary);

returns a much smaller result, shown in Table 3-2.

Table 3-2 Results from Subquery

Name     Salary
Sue      70
Hal      80
May      80


 

This query returns only 3 rows, with the name and salary of the top 3 earners. It scans through the payroll table, and for every row it goes through the whole payroll table again in an inner loop to count how many salaries are greater than or equal to the salary in the current row of the outer scan. This may look complicated, but DBMSs are designed to use SQL efficiently for this type of operation.

4. Make Transactions Single-batch

Whenever possible, collect a set of data operations and submit them as a single update transaction, in the form:

BEGIN TRANSACTION
UPDATE TABLE1...
INSERT INTO TABLE2...
DELETE FROM TABLE3...
COMMIT

This approach results in better performance than using separate statements and commits. Even with conditional logic and temporary tables in the batch, it is preferable because the DBMS will obtain all the locks necessary on the various rows and tables, and will use them and release them in one step. Using separate statements and commits results in many more client-to-DBMS transmissions and holds the locks in the DBMS for much longer. These locks will block out other clients from accessing this data, and, depending on whether different updates can alter tables in different orders, may cause deadlocks.

Warning: If any individual statement in the above transaction might fail, for instance by violating a unique key constraint, you should add conditional SQL logic that detects the failure and rolls back the transaction rather than committing. If the insert in the above example failed, most DBMSs would send back an error message about the failed insert, but would behave as if you got the message between the second and third statements and decided to commit anyway! Microsoft SQL Server offers a useful connection option, enabled by executing the SQL statement set xact_abort on, which automatically rolls back the transaction if any statement fails.
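In JDBC, one way to express such a single-batch transaction is with statement batching under an explicit commit, rolling back if any statement fails. The table names and WHERE clauses below are hypothetical placeholders for the TABLE1, TABLE2, and TABLE3 of the example above.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class BatchTransactionExample {
    public void runBatch(Connection conn) throws SQLException {
        conn.setAutoCommit(false);  // take explicit control of the transaction
        Statement stmt = conn.createStatement();
        try {
            // Queue all the work, then send it to the DBMS in one round trip.
            stmt.addBatch("UPDATE table1 SET status = 'closed' WHERE id = 1");
            stmt.addBatch("INSERT INTO table2 VALUES (1, 'closed')");
            stmt.addBatch("DELETE FROM table3 WHERE id = 1");
            stmt.executeBatch();
            conn.commit();          // locks are acquired, used, and released in one step
        } catch (SQLException e) {
            conn.rollback();        // any failure undoes the whole batch
            throw e;
        } finally {
            stmt.close();
            conn.setAutoCommit(true);
        }
    }
}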

5. Never Have a DBMS Transaction Span User Input

If an application sends a 'BEGIN TRAN' and some SQL which locks rows or tables for an update, do not write your application so that it must wait on the user to press a key before committing the transaction. That user may go to lunch first and lock up a whole DBMS table until the user returns.

If user input is needed to form or complete a transaction, use optimistic locking. Briefly, optimistic locking employs timestamps and triggers (some DBMSs will generate these automatically with tables set up for it) in queries and updates. Queries select data with timestamp values and prepare a transaction based on that data, without locking the data in a transaction.

When an update transaction is finally defined by the user input, it is sent as a single submission that includes timestamped safeguards to make sure the data is the same as originally fetched. A successful transaction automatically updates the relevant timestamps for changed data. If any interceding update from another client has altered any of the data on which the current transaction is based, the timestamps will have changed, and the current transaction will be rejected. Most of the time no relevant data has been changed, so transactions usually succeed. When a transaction fails, the application can refetch the updated data and present it to the user to re-form the transaction if desired.
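A minimal sketch of the final check, assuming a hypothetical account table with an integer version column (an integer serves the same purpose as a timestamp): the update succeeds only if the version is unchanged since the data was read.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticLockExample {
    // Apply the user's change only if no other client altered the row
    // since "oldVersion" was read; no locks are held across user input.
    public boolean updateBalance(Connection conn, int id, double newBalance,
                                 int oldVersion) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "UPDATE account SET balance = ?, version = version + 1 " +
            "WHERE id = ? AND version = ?");
        try {
            ps.setDouble(1, newBalance);
            ps.setInt(2, id);
            ps.setInt(3, oldVersion);
            int rows = ps.executeUpdate();
            return rows == 1; // 0 means another client got there first: refetch and retry
        } finally {
            ps.close();
        }
    }
}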

Refer to your DBMS documents for a full description of this technique.

6. Use In-place Updates

Changing a data row in place is much faster than moving a row, which may be required if an update needs more space than the table design allows for. If you design your rows with the space they need from the start, updates will be faster. The trade-off is that your table requires more disk space. Since disk space is cheap, using a little more of it can be a worthwhile investment to improve performance.

7. Keep Operational Data Sets Small

Some applications store operational data in the same table as historical data. As this historical data accumulates over time, all operational queries must read through large amounts of data that is useless on a day-to-day basis to reach the current data. Move non-current data to other tables, and join to those tables for the rarer historical queries. If this is not possible, index and cluster your table so that the most frequently used data is logically and physically localized.
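As a hedged illustration, old rows can be moved to a history table in a single transaction; the orders and orders_history tables and the cutoff date below are hypothetical.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class ArchiveExample {
    // Keep the operational table small by moving old rows to a history
    // table that only the rarer historical queries need to join against.
    public void archiveOldOrders(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        Statement stmt = conn.createStatement();
        try {
            stmt.executeUpdate(
                "INSERT INTO orders_history SELECT * FROM orders " +
                "WHERE order_date < '2001-01-01'");
            stmt.executeUpdate(
                "DELETE FROM orders WHERE order_date < '2001-01-01'");
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            stmt.close();
            conn.setAutoCommit(true);
        }
    }
}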

8. Use Pipelining and Parallelism

DBMSs are designed to work best when very busy with lots of different things to do. The worst way to use a DBMS is as dumb file storage for one big single-threaded application. If you can design your application and data to support lots of parallel processes working on easily distinguished subsets of the work, your application will be much faster. If there are multiple steps to processing, try to design your application so that subsequent steps can start working on the portion of data that any prior process has finished, instead of having to wait until the prior process is complete. This may not always be possible, but you can dramatically improve performance by designing your program with this in mind.

 
