Sun Java System Application Server Standard and Enterprise Edition 7 2004Q2 Performance and Tuning Guide 

Chapter 4
Tuning the Java Runtime System

The Solaris operating environment, by default, supports a two-level thread model (up to Solaris 8). Application-level Java threads are mapped to user-level Solaris threads, which are multiplexed onto a limited pool of lightweight processes (LWPs). Often, only as many LWPs as there are processors on the system are needed, which conserves kernel resources and improves system efficiency. This helps when an application has hundreds of user-level threads. You can choose from multiple threading models and from different methods of synchronization within a model, but the choices vary from VM to VM. Adding to the complexity, the threads library changes in the transition from Solaris 8 to Solaris 9, eliminating many of the choices. Although Solaris 8 provides a two-level model, the 1.4 VM has an effectively one-to-one thread/LWP model because the VM uses LWP-based synchronization by default.

The following topics are discussed in this module:

    Using Alternate Threads
    Managing Memory and Allocation
    HotSpot Virtual Machine Tuning Options

Using Alternate Threads

You can try loading the alternate libthread.so in /usr/lib/lwp/ on Solaris 8 by changing your LD_LIBRARY_PATH so that /usr/lib/lwp appears before /usr/lib. The alternate library gives better throughput and system utilization for certain applications, especially those using fewer threads.

By default, the Application Server uses /usr/lib/lwp. You can change the default settings so that the LWP library is not used by removing /usr/lib/lwp from the LD_LIBRARY_PATH setting in the startserv script, but this should be avoided unless required.
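
For example, the relevant lines in a startserv script typically look something like the following sketch (the actual variable contents and ordering in your installation may differ):

LD_LIBRARY_PATH=/usr/lib/lwp:/usr/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

Removing /usr/lib/lwp from this value causes the server to fall back to the default /usr/lib threads library.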

For applications using many threads, /usr/lib/libthread.so is the best library to use. Also consider the -Xconcurrentio option for applications with many threads: in addition to turning on LWP-based synchronization (the default in 1.4), it turns off thread-local allocation buffers (TLABs), which can consume heap space and cause premature garbage collections.
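
If you decide to experiment with -Xconcurrentio, it can be passed to the server JVM through a jvm-options entry in server.xml, like the other options shown in this chapter (verify that your JVM version supports the flag before relying on it):

<jvm-options> -Xconcurrentio </jvm-options>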

To further examine the threading issues on Solaris with Java, see http://java.sun.com/docs/hotspot/threads/threads.html

For additional information on tuning the HotSpot JVM, see HotSpot Virtual Machine Tuning Options.


Managing Memory and Allocation

The efficient running of any application depends on how well memory and garbage collection are managed. The following sections provide information on optimizing memory and allocation functions:

    Tuning the Garbage Collector
    Tracing Garbage Collection
    Specifying Other Garbage Collector Settings
    Tuning the Java Heap

Tuning the Garbage Collector

Garbage collection reclaims the heap space previously allocated to objects that are no longer needed. The process of locating and removing these dead objects can stall an application while consuming as much as 25 percent of throughput.

Almost all Java Runtime Environments come with a generational object memory system and sophisticated garbage collection algorithms. A generational memory system divides the heap into a few carefully sized partitions called generations. Its efficiency is based on the observation that most objects are short lived. As objects accumulate in a generation, a low-memory condition occurs, forcing garbage collection to take place. The heap space is divided into the old and the new generation.

The new generation includes the new object space (eden) and two survivor spaces. New objects are allocated in eden. Longer-lived objects are moved from the new generation and tenured to the old generation.

The young generation uses a fast copying garbage collector that employs two semi-spaces (survivor spaces) in addition to eden, copying surviving objects from one survivor space to the other. Objects that survive multiple young-space collections are tenured, that is, copied to the tenured (old) generation. The tenured generation is larger and fills up less quickly, so it is garbage collected less frequently, and each collection takes longer than a young-space-only collection. Collecting the tenured space is also referred to as a full garbage collection (full GC).

The frequent young-space collections are quick (a few milliseconds), while the occasional full GC takes relatively longer (tens of milliseconds to even a few seconds, depending on the heap size).

Other garbage collection algorithms, such as the Train algorithm, are incremental: they break the full GC into several incremental pieces. This provides a high probability of short garbage collection pauses even while a full GC is in effect, but it comes with an overhead and is not required for enterprise web applications.

When the new generation fills up, it triggers a minor collection in which the surviving objects are moved to the old generation. When the old generation fills up, it triggers a major collection which involves the entire object heap.

Both the HotSpot and Solaris JDKs use thread-local object allocation pools for lock-free, fast, and scalable object allocation. Application-level object pooling may have been more beneficial on earlier-generation Java Virtual Machines. Consider pooling only if object construction cost is very high and shows up as significant in execution profiles.

Choosing the Garbage Collection Algorithm

Long pauses during a Full Garbage Collection spanning more than 4 seconds can produce intermittent failures in persisting Session data into HADB.

While garbage collection is going on, the Application Server is not running. If the pause is long enough, HADB times out the existing connections. Then, when the Application Server resumes its activities, HADB generates errors when the Application Server attempts to use those connections to persist session data (errors such as "Failed to store session data", "Transaction Aborted", or "Failed to connect to HADB server").

To prevent this problem, use the CMS collector as the garbage collection algorithm. This collector can cause a drop in throughput for heavily utilized systems, because it runs more or less constantly. But it prevents the long pauses that can occur when the garbage collector runs infrequently.

To use the CMS collector:

  1. Make sure that the system isn't maxed out with respect to CPU utilization.
  2. Configure HADB timeouts, as described in the System Deployment Guide, Appendix X.
  3. Configure the CMS collector in the server instance by adding the following entry to server.xml:

     <jvm-options>
     -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=1
     </jvm-options>

Additional Information

Use the jvmstat utility to monitor Hotspot garbage collection. (See HotSpot Virtual Machine Tuning Options.)

For detailed information on tuning the garbage collector, see http://java.sun.com/docs/hotspot/gc1.4.2/index.html

Tracing Garbage Collection

Two primary measures of garbage collection performance are throughput and pauses. Throughput is the percentage of total time not spent on garbage collection.

Pauses are the times when an application appears unresponsive because garbage collection is occurring. Users can have different requirements of garbage collection: a server-centric application may consider throughput the important metric, while even a short pause may upset a graphical program. There are two other considerations: footprint and promptness.

Footprint

Footprint is the working set of a process, measured in pages and cache lines. Promptness is the time between when an object becomes dead, and when the memory becomes available. This is an important consideration for distributed systems.

A particular generation sizing chooses a trade-off between these considerations. A very large young generation may maximize the throughput, but does so at the expense of footprint and promptness. Pauses can be minimized by using a small young generation and incremental collection.

Pauses due to garbage collection can be inspected in the diagnostic output of the Java Virtual Machine. The command-line argument -verbose:gc prints information at every collection. Below is a sample of the output generated when this flag is passed to the Java Virtual Machine.

[GC 50650K->21808K(76868K), 0.0478645 secs]
[GC 51197K->22305K(76868K), 0.0478645 secs]
[GC 52293K->23867K(76868K), 0.0478645 secs]
[Full GC 52970K->1690K(76868K), 0.54789968 secs]

The numbers before and after the arrows indicate the combined size of live objects before and after the collection. The number in parentheses is the total available space, which is the total heap minus one of the survivor spaces. In this sample, there are three minor collections and one major collection. In the first GC, 50650 KB of objects existed before collection and 21808 KB after collection, which means that 28842 KB of objects were dead and collected. The total available heap space is 76868 KB. The collection process required 0.0478645 seconds.

Other useful monitoring options include -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps, which add per-generation detail and timestamps to the -verbose:gc output.
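
For example, assuming a HotSpot JVM recent enough to support them (1.4.1 or later), these options can be added to server.xml alongside -verbose:gc; the entries below are only an illustrative sketch:

<jvm-options> -verbose:gc </jvm-options>
<jvm-options> -XX:+PrintGCDetails </jvm-options>
<jvm-options> -XX:+PrintGCTimeStamps </jvm-options>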

Specifying Other Garbage Collector Settings

For applications that do not dynamically generate and load classes, the permanent generation does not affect GC performance. For applications that dynamically generate and load classes (for example, JSPs), the permanent generation does affect GC performance, because filling the permanent generation can trigger a full GC. The maximum permanent generation size can be tuned with the -XX:MaxPermSize option.
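
For example, to allow a larger permanent generation for an application that compiles many JSPs, an entry like the following could be added to server.xml (128 MB is an assumed, illustrative value, not a recommendation):

<jvm-options> -XX:MaxPermSize=128m </jvm-options>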

Applications can interact with garbage collection by invoking collections explicitly through the System.gc() call. However, relying on the application to manage resources in this way is a bad idea, since explicit calls force major collections and inhibit scalability on large systems. Explicit collections can be disabled with the flag -XX:+DisableExplicitGC.
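
For example, explicit collections can be turned off by adding the flag to server.xml:

<jvm-options> -XX:+DisableExplicitGC </jvm-options>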

The Application Server uses RMI in the Administration module for monitoring. Garbage cannot be collected in RMI-based distributed applications without occasional local collections, so RMI forces a periodic full collection. The frequency of these collections can be controlled with the property sun.rmi.dgc.client.gcInterval. For example, java -Dsun.rmi.dgc.client.gcInterval=3600000 specifies explicit collection once per hour instead of the default rate of once per minute.

To specify these attributes for the Java virtual machine, add them as jvm-options entries in the server.xml file, as shown in the examples in this chapter.

Tuning the Java Heap

This section discusses topics related to tuning the Java Heap for performance.

Guidelines for Java Heap Sizing

Maximum heap size depends on the maximum address space per process. Here are the maximum per-process address space values for the various platforms:

Platform                                 Maximum address space
x86 / Red Hat Linux (32-bit)             2 GB
x86 / Red Hat Linux (64-bit)             3 GB
x86 / Windows 98/2000/NT/Me/XP           2 GB
x86 / Solaris x86 (32-bit)               4 GB
SPARC / Solaris (32-bit)                 4 GB
SPARC / Solaris (64-bit)                 terabytes

Of course, the maximum heap space is always smaller than the maximum address space per process, because the process also needs space for the stack, libraries, and so on. To determine the maximum heap space that can be allocated, use a profiling tool to examine the way memory is used. Gauge the maximum stack space the process uses and the amount of memory taken up by libraries and other memory structures. The difference between the maximum address space and the total of those values is the amount of memory that can be allocated to the heap.
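
As a rough, purely illustrative calculation for 32-bit Solaris SPARC (the non-heap figure below is an assumption, not a measurement):

maximum address space per process:              4 GB (4096 MB)
stack, libraries, and other non-heap memory:    about 512 MB (assumed)
memory available for the Java heap:             about 3584 MB

This is consistent with the -Xmx3584m value used in the sample heap configuration shown later in this chapter.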

Heap Tuning Parameters

You control the heap size using a variety of parameters.

The -Xms and -Xmx parameters define the minimum and maximum heap sizes. Since collections occur when the generations fill up, throughput is inversely proportional to the amount of memory available. By default, the JVM grows or shrinks the heap at each collection to try to keep the proportion of free space to live objects within a specific range. This range is set as a percentage by the parameters -XX:MinHeapFreeRatio=<minimum> and -XX:MaxHeapFreeRatio=<maximum>, with the total size bounded by -Xms and -Xmx.
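
For example, the free-space range could be set explicitly with entries such as the following in server.xml (the percentages are illustrative assumptions only):

<jvm-options> -XX:MinHeapFreeRatio=40 </jvm-options>
<jvm-options> -XX:MaxHeapFreeRatio=70 </jvm-options>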

For server-side applications, set -Xms and -Xmx to the same value to get a fixed heap size. When the heap grows or shrinks, the JVM must recalculate the old and new generation sizes to maintain a predefined NewRatio.

The NewSize and MaxNewSize parameters control the new generation's minimum and maximum size. You can fix the new generation size by setting these parameters equal. The bigger the young generation, the less often minor collections occur. By default, the young generation size is controlled by NewRatio. For example, setting -XX:NewRatio=3 means that the ratio between the young and old generation is 1:3; the combined size of eden and the survivor spaces will be one fourth of the heap. To be safe, set NewSize and MaxNewSize to the same value.
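
For instance, to fix the young generation at one quarter of a 2 GB heap, entries such as the following could be used (all sizes are illustrative assumptions, not recommendations):

<jvm-options> -Xms2048m </jvm-options>
<jvm-options> -Xmx2048m </jvm-options>
<jvm-options> -XX:NewSize=512m </jvm-options>
<jvm-options> -XX:MaxNewSize=512m </jvm-options>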

By default, the Application Server is invoked with the Java HotSpot Server JVM. The default NewRatio for the Server JVM is 2: the old generation occupies 2/3 of the heap while the new generation occupies 1/3. The larger new generation can accommodate many more short-lived objects, decreasing the need for slow major collections, while the old generation is still sufficiently large to hold many long-lived objects.

The following are important guidelines for sizing the Java heap.

Survivor Ratio Sizing

The SurvivorRatio parameter controls the size of the two survivor spaces. For example, -XX:SurvivorRatio=6 sets the ratio between each survivor space and eden to 1:6, so each survivor space is one eighth of the young generation. With JDK 1.4, the Solaris default is 32. If survivor spaces are too small, the copying collection overflows directly into the old generation. If survivor spaces are too large, they will be empty. At each garbage collection, the JVM chooses a threshold number of times an object can be copied before it is tenured. This threshold is chosen to keep the survivors half full.

The option -XX:+PrintTenuringDistribution can be used to show the threshold and ages of the objects in the new generation. It is useful for observing the lifetime distribution of an application.
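
A server.xml fragment combining these options might look like the following (the survivor ratio value is an assumption for illustration):

<jvm-options> -XX:SurvivorRatio=6 </jvm-options>
<jvm-options> -XX:+PrintTenuringDistribution </jvm-options>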

For up-to-date defaults, refer to http://java.sun.com/docs/hotspot/VMOptions.html

Sample Heap Configuration on Solaris

This is a sample heap configuration used by the Application Server for heavy server-centric applications on Solaris, as set in the server.xml file.

<jvm-options> -Xms3584m </jvm-options>

<jvm-options> -Xmx3584m </jvm-options>

<jvm-options> -verbose:gc </jvm-options>

<jvm-options> -Dsun.rmi.dgc.client.gcInterval=3600000 </jvm-options>

HotSpot Virtual Machine Tuning Options

HotSpot is an adaptive just-in-time byte-code compiler for Java applications; it compiles frequently executed code and interprets seldom-used code to avoid unnecessary optimization overhead. It has excellent performance characteristics out of the box, but it also has a variety of tuning options that can be used to maximize performance of the Application Server.

Start by using the jvmstat utility to monitor HotSpot, so you can see where improvements are needed and observe the effect of changes on performance. Then consult the garbage collection and VM option documents referenced in this chapter, such as http://java.sun.com/docs/hotspot/gc1.4.2/index.html and http://java.sun.com/docs/hotspot/VMOptions.html, to tune for maximum results.




