These areas of the Java system interface level, where tuning can often result in significant performance gains, are discussed here:
I/O
Strings
Arrays
Vectors
Painting/drawing
Hashing
Images
Memory usage
Threads
The optimizations for these compilers are listed:
Java compiler
JIT compiler
Code tuning in these areas may be used to increase performance:
Loops
Convert expr to table lookup
Caching
Result pre-computation
Lazy evaluation
Class vs. object initialization
The biggest and most common performance problem in Java applications is often inefficient I/O. Therefore, I/O issues should generally be the first thing to look at when performance-tuning a Java application. Fixing these problems often results in greater performance gains than all the other possible optimizations combined. It is not unusual to see a speed improvement of one order of magnitude achieved by using efficient I/O techniques.
If an application performs a significant amount of I/O, then it is a candidate for I/O performance tuning. This conclusion can be confirmed by profiling the application. To learn how to profile an application, you can use the Java WorkShop (JWS) product. JWS can be obtained from:
In JWS, select Help->Help Contents and click Profiling Projects. The following benchmark reads a 150,000-line file using four different methods:
DataInputStream.readLine() alone (unbuffered).
DataInputStream.readLine() with a BufferedInputStream underneath, which has a buffer size of 2048 bytes.
BufferedReader.readLine() with a buffer size of 8192 bytes.
BufferedFileReader(fileName).
The results were as follows (times in seconds):
DataInputStream | 178.740
DataInputStream(BufferedInputStream) | 21.559
BufferedReader | 11.150
BufferedFileReader | 6.991
Note that methods 1 and 2 do not properly handle Unicode characters, while methods 3 and 4 handle them correctly. This makes methods 1 and 2 unacceptable for most product uses. Also, DataInputStream.readLine() is deprecated as of JDK 1.1. Method 1 is used in JWS and other programs.
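As a rough illustration of method 3, the sketch below reads a file line by line through a BufferedReader. The class name LineCounter and the method countLines are made up for this example and are not part of the benchmark; the buffer size simply mirrors the 8192 figure quoted above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper: counts lines using buffered character input.
final class LineCounter {
    static int countLines(String fileName) throws IOException {
        // The second argument is the buffer size, matching the figure quoted for method 3.
        BufferedReader in = new BufferedReader(new FileReader(fileName), 8192);
        try {
            int lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return lines;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(countLines(args[0]));
        }
    }
}
```

Actual timings will depend on the JVM, the file system, and the file contents, so treat this only as a starting point for your own measurements.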
Another way to spot Solaris I/O problems is to use truss(1) to look for read(2) and write(2) system calls.
When using strings, the most important thing to remember is to use char arrays for all character processing in loops, instead of using the String or StringBuffer classes. Accessing an array element is much faster than using the charAt() method to access a character in a string. Also, remember that string constants ("...") are already string objects.
//DON'T
String s = new String("hello");
//DO
String s = "hello";
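The char-array advice can be sketched as follows. Both methods count spaces in a string; the second copies the characters into a char array once and then indexes the array. The class and method names are illustrative only, and how large the difference is will depend on the JVM in use.

```java
// Hypothetical example comparing charAt() in a loop with char[] indexing.
final class CharScan {
    // Pattern the text advises against: one charAt() call per character.
    static int countSpacesCharAt(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                n++;
            }
        }
        return n;
    }

    // Recommended pattern: copy once into a char[] and index the array.
    static int countSpacesArray(String s) {
        char[] chars = s.toCharArray();
        int n = 0;
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == ' ') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "tune the inner loop";
        System.out.println(countSpacesCharAt(text) + " " + countSpacesArray(text));
    }
}
```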
In addition:
class String
Do not use this class for mutable strings, for character processing, or with the charAt() method inside a loop.
class StringBuffer
Use this class only when a string is mutable, accessed concurrently by multiple threads, and no character processing is performed. Do not use it for immutable strings, for character processing, or with the charAt() and setCharAt() methods inside a loop. The default buffer capacity is 16 characters. This class is automatically used by the compiler for string concatenation. Set the initial buffer size to the maximum string length, if it is known.
class StringTokenizer
This class is useful for simple parsing or scanning, but very inefficient. It can be optimized by storing the string and delimiter in a character array instead of in String, or by storing the highest delimiter character to allow a quicker check. This will result in a 1.6x to 10x performance increase (2.4x is typical), depending on the delimiter list and target string.
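A minimal sketch of the StringBuffer sizing advice, assuming the maximum length can be computed up front. JoinWords and join are hypothetical names chosen for the example.

```java
// Hypothetical example: build a comma-separated string in a pre-sized StringBuffer.
final class JoinWords {
    static String join(String[] words) {
        // Upper bound on the result length: every word plus one separator each.
        int maxLength = 0;
        for (int i = 0; i < words.length; i++) {
            maxLength += words[i].length() + 1;
        }
        // Sizing the buffer once avoids repeated internal reallocation.
        StringBuffer buf = new StringBuffer(maxLength);
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                buf.append(',');
            }
            buf.append(words[i]);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] { "io", "strings", "arrays" }));
    }
}
```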
Arrays are bounds-checked, which degrades performance. However, accessing an array is still much faster than accessing a Vector, String, or StringBuffer. Use System.arraycopy() to copy array contents. It is a native method, and much faster than copying elements one at a time in a loop.
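For instance, growing an array can be done with a single System.arraycopy() call rather than a hand-written copy loop. The helper below is a sketch; ArrayCopyDemo and grow are names invented for the example.

```java
// Hypothetical example: enlarge an int array using System.arraycopy().
final class ArrayCopyDemo {
    // Returns a new array of newLength holding the contents of src.
    // Assumes newLength >= src.length.
    static int[] grow(int[] src, int newLength) {
        int[] dest = new int[newLength];
        System.arraycopy(src, 0, dest, 0, src.length);
        return dest;
    }

    public static void main(String[] args) {
        int[] bigger = grow(new int[] { 1, 2, 3 }, 5);
        System.out.println(bigger.length);
    }
}
```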
Vector is convenient to use, but inefficient. For best performance, use it only when the structure size is unknown, and efficiency is not a concern. When using Vector, ensure that elementAt() is not used inside a loop, as performance will degrade. Use Vector only when you have an array with the following characteristics:
Accessed concurrently by multiple threads
Dynamic size
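One way to keep elementAt() out of a loop is to copy the Vector into an array first and iterate over the array. The sketch below assumes a Vector of Integer values and uses generics, which postdate the JDK releases discussed here; VectorSum is a hypothetical name.

```java
import java.util.Vector;

// Hypothetical example: snapshot a Vector into an array before looping.
final class VectorSum {
    static int sum(Vector<Integer> values) {
        Integer[] snapshot;
        // Hold the Vector's lock so size() and copyInto() see the same contents.
        synchronized (values) {
            snapshot = new Integer[values.size()];
            values.copyInto(snapshot);
        }
        int total = 0;
        for (int i = 0; i < snapshot.length; i++) {
            total += snapshot[i].intValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Vector<Integer> v = new Vector<Integer>();
        v.addElement(Integer.valueOf(1));
        v.addElement(Integer.valueOf(2));
        v.addElement(Integer.valueOf(3));
        System.out.println(sum(v));
    }
}
```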
Hashtable has these tunable parameters:
Capacity (usually a prime number), initialCapacity; if this is not set large enough, collisions result and lookups degrade from hashing to a linear search of the colliding entries.
Load factor (0.0-1.0), loadFactor, which is the fraction of capacity beyond which the table will expand.
Hashtable calls hashCode(). These classes have pre-defined hashCode() methods:
Color, Font, Point
File
Boolean, Byte, Character, Double, Float, Integer, Long, Short, String
URL
BitSet, Date, GregorianCalendar, Locale, SimpleTimeZone
Note that String.hashCode() does not always sample all the characters, depending on the length:
Length from 1 to 15: all n characters
Length from 16 to 23: every other character
Length from 24 to 31: every third character
And so on.
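The two tunable parameters map directly onto the two-argument Hashtable constructor. The sketch below picks an initial capacity from an expected entry count so that the table should not need to expand while it is filled; SizedTable, create, and the sizing formula are assumptions made for this example, and the code uses generics for brevity.

```java
import java.util.Hashtable;

// Hypothetical example: size a Hashtable from an expected number of entries.
final class SizedTable {
    static Hashtable<String, Integer> create(int expectedEntries, float loadFactor) {
        // Choose a capacity large enough that expectedEntries stays under
        // capacity * loadFactor, the point at which the table expands.
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;
        return new Hashtable<String, Integer>(initialCapacity, loadFactor);
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = create(100, 0.75f);
        table.put("answer", Integer.valueOf(42));
        System.out.println(table.get("answer"));
    }
}
```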
To improve performance in these areas, use the following techniques:
Double buffering (for instance, for animation, draw the image off-screen and load it all at once).
Overriding the default update() method, which clears the background before calling paint():
public void update(Graphics g) { paint(g); }
Custom layout managers. If you want custom behavior, GUI performance is best if you write your own.
Events. The JDK 1.1 has a more efficient event model than JDK 1.0.
Repaint only the damaged regions (use clipRect()).
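Double buffering and the update() override can be combined roughly as shown below. This is a sketch under stated assumptions: the class name BufferedCanvas and the shapes drawn are invented for illustration, and a BufferedImage stands in for the off-screen image that older code would obtain from createImage(width, height).

```java
import java.awt.Canvas;
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferedImage;

// Hypothetical example: a canvas that draws off-screen, then copies in one step.
class BufferedCanvas extends Canvas {
    private BufferedImage offscreen; // reused between repaints

    // Skip the default background clear, which causes flicker.
    public void update(Graphics g) {
        paint(g);
    }

    public void paint(Graphics g) {
        int w = Math.max(getWidth(), 1);
        int h = Math.max(getHeight(), 1);
        // Recreate the buffer only when the component size changes.
        if (offscreen == null || offscreen.getWidth() != w || offscreen.getHeight() != h) {
            offscreen = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        }
        Graphics og = offscreen.getGraphics();
        try {
            og.setColor(Color.white);
            og.fillRect(0, 0, w, h);
            og.setColor(Color.blue);
            og.fillOval(w / 4, h / 4, w / 2, h / 2);
        } finally {
            og.dispose();
        }
        // Copy the finished frame to the screen all at once.
        g.drawImage(offscreen, 0, 0, null);
    }
}
```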
To improve (asynchronous) loading performance, override the default imageUpdate() method with your own. The default imageUpdate() can cause more repainting than desired.
//Polling: wait for the width information to be loaded
while (image.getWidth(null) == -1) {
    try {
        Thread.sleep(200);
    } catch (InterruptedException e) { }
}
...
//Without polling: block until imageUpdate() reports the width
if (!haveWidth) {
    synchronized (im) {
        if (im.getWidth(this) == -1) {
            try {
                im.wait();
            } catch (InterruptedException e) { }
        }
    }
    //If we got this far, the width is loaded, we will never go thru
    // all that checking again.
    haveWidth = true;
}
...
public boolean imageUpdate(Image img, int flags, int x, int y,
                           int width, int height) {
    boolean moreUpdatesNeeded = true;
    if ((flags & ImageObserver.WIDTH) != 0) {
        synchronized (img) {
            img.notifyAll();
            moreUpdatesNeeded = false;
        }
    }
    return moreUpdatesNeeded;
}
Pre-decoding the image and storing it in an array will improve performance, because image decoding time is greater than loading time. Pre-decode using PixelGrabber and MemoryImageSource, and combine multiple images into one file for maximum speed. These techniques are more efficient than polling.
You can dramatically improve application performance by reducing the amount of garbage collection performed during execution. The following practices can also increase performance:
Increase the initial heap size from the 1 MByte default with
java -ms number
and the maximum heap size with
java -mx number
The default maximum heap size is 16 MBytes.
Find areas where too much memory is being used with
java -verbosegc
Take size into account when allocating arrays (for instance, if short is big enough, use it instead of int).
Avoid allocating objects in loops (readLine() is a common example)
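To illustrate the point about allocation in loops, the sketch below counts lines without calling readLine(), which creates a new String per line. It reads into one char array allocated before the loop. ReuseBuffer and countNewlines are hypothetical names, and the 8192 buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example: reuse one buffer instead of allocating per iteration.
final class ReuseBuffer {
    static int countNewlines(Reader in) throws IOException {
        char[] buf = new char[8192]; // allocated once, outside the loop
        int lines = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countNewlines(new StringReader("one\ntwo\nthree\n")));
    }
}
```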
As discussed in "Java Threads In The Solaris Environment - Earlier Releases", performance is increased dramatically by using native threads. Green threads are not time-sliced and may require calls to Thread.yield() in loops, slowing execution.
Other techniques to avoid:
Overuse of synchronization increases the possibility of deadlock (due to coding errors) and increases the likelihood of delays due to lock contention. Also, the overhead of synchronizing might frequently overcome the advantages. Minimizing synchronization may take work, but can pay off well.
Polling: it is only acceptable when waiting for outside events and should be performed in a "side" thread. Use wait()/notify() instead.
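The wait()/notify() alternative to polling can be sketched as a small flag object. A waiting thread blocks inside waitUntilReady() and consumes no CPU until another thread calls markReady(). ReadyFlag and its method names are invented for this example.

```java
// Hypothetical example: block on a condition instead of polling for it.
final class ReadyFlag {
    private boolean ready = false;

    // Blocks the caller until markReady() has been called.
    synchronized void waitUntilReady() throws InterruptedException {
        while (!ready) { // loop guards against spurious wakeups
            wait();
        }
    }

    // Sets the flag and wakes every waiting thread.
    synchronized void markReady() {
        ready = true;
        notifyAll();
    }

    synchronized boolean isReady() {
        return ready;
    }
}
```

The synchronized blocks are kept as short as possible, in line with the advice on minimizing synchronization.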
The following compilers automatically perform the listed optimizations.
Inlining
Constant folding
Elimination of some array bounds checking.
Elimination of common sub-expressions within blocks
Empty method elimination
Some register allocation for locals
No flow analysis
Limited inlining
Use these techniques for performance improvements:
Move loop invariants outside the loop.
Make the tests as simple as possible.
Use only local variables inside a loop; assign class fields to local variables before the loop.
Move constant conditionals outside loops.
Combine similar loops.
If loops are interchangeable, nest the busiest one.
As a last resort, unroll the loop.
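Several of these loop techniques can be seen side by side in the sketch below. The first method, as written, reads class fields and recomputes an invariant on every pass; the second copies the fields to locals and hoists the invariant. LoopTuning and its members are hypothetical, and a JIT compiler may already perform some of these transformations.

```java
// Hypothetical example: hoisting invariants and using locals inside a loop.
final class LoopTuning {
    private final int[] data;
    private final int scale;

    LoopTuning(int[] data, int scale) {
        this.data = data;
        this.scale = scale;
    }

    // Before: field accesses and the invariant scale * 2 sit inside the loop.
    int sumBefore() {
        int total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i] * (scale * 2);
        }
        return total;
    }

    // After: fields copied to locals, invariant computed once, simple loop test.
    int sumAfter() {
        int[] local = data;
        int n = local.length;
        int factor = scale * 2;
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += local[i] * factor;
        }
        return total;
    }
}
```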
Convert expr to Table Lookup
When a value is being selected based on a single expression with a range of small integers, convert it to a table lookup. Conditional branches defeat many compiler optimizations.
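As a small illustration, the number of days in a month (ignoring leap years, with months numbered 0 to 11) can be selected either with a chain of conditionals or with an array lookup. DaysInMonth and its methods are names chosen for this sketch.

```java
// Hypothetical example: replace conditional branches with a table lookup.
final class DaysInMonth {
    private static final int[] DAYS = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

    // Branching version: several comparisons per call.
    static int daysBranch(int month) {
        if (month == 1) {
            return 28;
        }
        if (month == 3 || month == 5 || month == 8 || month == 10) {
            return 30;
        }
        return 31;
    }

    // Table version: one array access, no conditional branches in this method.
    static int daysLookup(int month) {
        return DAYS[month];
    }
}
```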
Though caching takes more memory, it can be used for performance improvement. Use the technique of caching values that are expensive to fetch or compute.
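A minimal caching sketch is shown below. Math.sqrt() is only a stand-in for a computation that is genuinely expensive, and SquareRootCache is a hypothetical class; the counter exists purely so the effect of the cache can be observed.

```java
import java.util.Hashtable;

// Hypothetical example: remember results that are expensive to compute.
final class SquareRootCache {
    private final Hashtable<Integer, Double> cache = new Hashtable<Integer, Double>();
    private int computations = 0;

    synchronized double sqrt(int n) {
        Integer key = Integer.valueOf(n);
        Double cached = cache.get(key);
        if (cached == null) {
            // Cache miss: compute once and store the result.
            cached = Double.valueOf(Math.sqrt(n));
            cache.put(key, cached);
            computations++;
        }
        return cached.doubleValue();
    }

    // Number of times the underlying computation actually ran.
    synchronized int computations() {
        return computations;
    }
}
```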
Increase performance by precomputing values known at compile time.
Save startup time by delaying computation of results until they are needed.
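Lazy evaluation can be sketched as a field that stays null until it is first requested. LazyReport, getSummary, and the placeholder string are assumptions made for the example; the builds counter is there only to show that the work happens once.

```java
// Hypothetical example: delay building a result until someone asks for it.
final class LazyReport {
    private String summary; // null until first requested
    private int builds = 0;

    synchronized String getSummary() {
        if (summary == null) {
            summary = buildSummary();
        }
        return summary;
    }

    // Stand-in for a computation too costly to run at startup.
    private String buildSummary() {
        builds++;
        return "report ready";
    }

    synchronized int builds() {
        return builds;
    }
}
```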
Improve performance by putting all one-time initializations into a class initializer.
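For example, a lookup table that never changes can be filled once in a static initializer rather than in every constructor call. The sketch below precomputes sine values for whole degrees; SineTable is a hypothetical class.

```java
// Hypothetical example: one-time setup in a class (static) initializer.
final class SineTable {
    private static final double[] SIN = new double[360];

    // Runs once, when the class is first initialized.
    static {
        for (int deg = 0; deg < 360; deg++) {
            SIN[deg] = Math.sin(Math.toRadians(deg));
        }
    }

    // Returns the sine of an angle given in whole degrees.
    static double sin(int degrees) {
        int d = degrees % 360;
        if (d < 0) {
            d += 360;
        }
        return SIN[d];
    }
}
```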