Every application behaves differently and places its own requirements on the JVM for gaining maximum application throughput. The “out of the box” behavior of the Oracle JRockit JVM gives good performance for most applications, but you can often tune the JVM further to gain some extra application throughput, which means that the application will run faster.
This chapter describes how to tune the JRockit JVM for improved application throughput.
In this document, “application throughput” denotes the speed at which a Java application runs. If your application is a transaction-based system, high throughput means that more transactions are executed during a given amount of time. You can also measure throughput by timing how long it takes to perform a specific task or calculation.
To measure the throughput of your application you need a benchmark. The benchmark should simulate several realistic use cases of the application and run long enough to allow the JVM to warm up and perform several garbage collections. You also need a way to measure the results, either by timing the entire run of a specific set of actions or by counting the number of transactions performed during a specific amount of time. For an accurate throughput assessment, the benchmark should run under high load and not depend on any external input, such as database connections.
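As an illustration, a minimal benchmark of this kind might look as follows. All class and method names here are made up for this sketch, and the transaction body is a placeholder that stands in for one of your application's real use cases:

```java
// Minimal throughput-benchmark sketch (class and method names are illustrative,
// not part of the JRockit documentation). It warms up the JVM first, then counts
// how many "transactions" complete in a fixed measurement window.
public class ThroughputBenchmark {

    // Stand-in for one unit of application work (replace with a realistic use case).
    static long transaction(long seed) {
        long h = seed;
        for (int i = 0; i < 1_000; i++) {
            h = h * 6364136223846793005L + 1442695040888963407L; // simple LCG step
        }
        return h;
    }

    public static void main(String[] args) {
        final long WARMUP_NS = 200_000_000L;   // let the JIT compiler and GC settle
        final long MEASURE_NS = 500_000_000L;  // fixed measurement window
        long sink = 0;

        // Warm-up phase: run the workload without measuring.
        long warmupEnd = System.nanoTime() + WARMUP_NS;
        while (System.nanoTime() < warmupEnd) {
            sink += transaction(sink);
        }

        // Measurement phase: count completed transactions in the window.
        long count = 0;
        long end = System.nanoTime() + MEASURE_NS;
        while (System.nanoTime() < end) {
            sink += transaction(sink);
            count++;
        }

        // Throughput = completed transactions per second of wall-clock time.
        System.out.println("transactions/s: " + count * 1_000_000_000L / MEASURE_NS);
        System.out.println("checksum: " + Long.toHexString(sink)); // keep work observable
    }
}
```

A real benchmark would run much longer than the windows above, so that several garbage collections occur during the measurement phase.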
When you have a benchmark set up, you can monitor the behavior of the JVM using one of the following methods:
Setting the -Xverbose command-line option, for example -Xverbose:memdbg,gcpause,gcreport, will show memory management data such as garbage collection frequency and duration. From the JRockit JVM R27.1 and forward, setting -Xverbose:memdbg
will also show the reason why each garbage collection was started, which helps you study the garbage collection behavior. With tools for measuring the throughput of your Java application in place, you can start to tune the JVM for better application throughput.
The first step of tuning the JRockit JVM for maximum application throughput is to select an appropriate garbage collection mode or strategy.
This is the default garbage collection mode for the JRockit JVM. This mode selects the optimal garbage collection strategy for maximum application throughput.
This static garbage collector is a good alternative if you do not want to use a dynamic garbage collection mode. The generational parallel garbage collector provides high throughput for applications that allocate a lot of temporary objects.
This is another alternative if you do not want to use a dynamic garbage collection mode. The single-spaced parallel garbage collector provides high throughput for applications that allocate mostly large objects.
For more information about different garbage collector options, see Selecting and Tuning a Garbage Collector.
The default garbage collection mode in the JRockit JVM (assuming that you run in server mode, which is also the default) tunes the memory management for maximum application throughput. Depending on the behavior of your application, it will select either a generational or a non-generational parallel garbage collection strategy. It will also tune the nursery size if the garbage collection strategy is generational.
Be aware that if you use the dynamic garbage collection mode optimized for throughput, the garbage collection pauses will not have any strict time limits. If your application is sensitive to long latencies, you should tune for low latencies rather than for maximum throughput, or find a middle path that gives you acceptable latencies.
The dynamic garbage collection mode optimized for throughput is the default garbage collector for the JRockit JVM. You can also turn it on explicitly like this:
java -XgcPrio:throughput myApplication
If you want to use a static garbage collector, you should use a parallel garbage collector in order to maximize application throughput. If the large/small object allocation ratio is high, use a single-spaced garbage collector (-Xgc:singlepar). You can see the ratio between large and small object allocation by making a JRA recording of your application.
To improve throughput with a static garbage collector, you may also need to set other -X or -XX options.
If you want to maximize application throughput and the large/small object allocation ratio is low, use a generational parallel garbage collector (-Xgc:genpar). A generational parallel garbage collector might be the right choice even when the large/small object allocation ratio is high, if you are using a very small nursery. You can see the ratio between large and small object allocation by making a JRA recording of your application.
To improve throughput with a static garbage collector, you may also need to set other -X or -XX options.
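The two static parallel collectors are selected on the command line; for example, using the myApp placeholder from this chapter and an illustrative 2 GB heap:

```shell
# Mostly large objects (high large/small allocation ratio): single-spaced parallel GC
java -Xgc:singlepar -Xms:2g -Xmx:2g myApp

# Mostly small, short-lived objects: generational parallel GC
java -Xgc:genpar -Xms:2g -Xmx:2g myApp
```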
The default heap size starts at 64 MB and can increase up to 1 GB. Most server applications need a large heap (at least larger than 1 GB) to optimize throughput. For such applications you will need to set the heap size manually, using the -Xms (initial heap size) and -Xmx (maximum heap size) command-line options. Setting -Xms to the same size as -Xmx has regularly proven to be the best configuration for improving throughput; for example:
java -Xms:2g -Xmx:2g myApp
For more information on setting the initial and maximum heap sizes, including guidelines for setting these values, please see Optimizing Memory Allocation Performance.
The nursery, or young generation, is the area of free chunks in the heap where objects are allocated when running a generational garbage collector (-XgcPrio:throughput, -Xgc:genpar, or -Xgc:gencon). A nursery is valuable because most objects in a Java application die young. Collecting garbage from the young space is preferable to collecting the entire heap, as it is a less expensive process and most objects in the young space will already be dead when the garbage collection is started.
If you are using a generational garbage collector you might need to change the nursery setting to accommodate more young objects.
-XgcPrio:throughput and -Xgc:genpar change the nursery size dynamically at runtime. -XgcPrio:throughput might even turn off the nursery (that is, switch to a non-generational, single-spaced garbage collector). In some cases manual tuning might result in a more efficient nursery size. -Xgc:gencon has a fairly low and static nursery size setting; for many applications, you may want to tune the nursery size manually when using this garbage collector.

An efficient nursery size is one where as much memory as possible is freed by young collections (garbage collections of the nursery) rather than by old collections (garbage collections of the entire heap). To achieve this, set the nursery size close to half of the free heap size after an old collection.
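This rule of thumb can be turned into a quick calculation; the figures below are purely illustrative, and the class and method names are made up for this sketch:

```java
// Rule-of-thumb sketch for picking -Xns: aim near half of the free heap
// measured right after an old (full) collection. Figures are illustrative.
public class NurserySizing {

    static long recommendedNurseryBytes(long freeHeapAfterOldCollection) {
        return freeHeapAfterOldCollection / 2;
    }

    public static void main(String[] args) {
        long freeAfterOldGc = 1_200L * 1024 * 1024; // e.g. 1200 MB free on a 2 GB heap
        long nursery = recommendedNurseryBytes(freeAfterOldGc);
        System.out.println("suggested -Xns ~ " + nursery / (1024 * 1024) + "m"); // prints 600m
    }
}
```

The free heap after an old collection can be read from the -Xverbose:memdbg output described earlier in this chapter.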
To set the nursery size manually, use the -Xns command-line option; for example:
java -Xgc:gencon -Xms:2g -Xmx:2g -Xns:512m myApp
Compaction is the process of moving chunks of allocated space toward the lower end of the heap, helping to create contiguous free memory at the other end. The JRockit JVM does partial compaction of the heap at each old collection.
The default compaction setting for static garbage collectors (-Xgc or -XXsetGC) uses a dynamic compaction scheme that tries to avoid “peaks” in compaction times. This is a compromise between keeping garbage collection pauses even and maintaining good throughput, so it does not necessarily give the best possible throughput. Tuning the compaction can pay off well, depending on the application's characteristics.
There are two ways to tune compaction for better throughput: increasing the size of the compaction area and increasing the compact set limit. Increasing the size of the compaction area helps reduce fragmentation on the heap. Increasing the compact set limit implicitly allows larger areas to be compacted at each garbage collection, which reduces the garbage collection frequency and makes allocation of large objects faster, thus improving throughput.
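As a sketch, these two knobs map to JRockit R27 command-line options roughly as shown below; the option names and values are assumptions here and should be verified against Tuning the Compaction of Memory for your release:

```shell
# Hedged example: widen the compaction area and raise the compact set limit.
# -XXcompactRatio:<percent>        portion of the heap compacted at each old collection
# -XXcompactSetLimit:<references>  max references to compacted objects per collection
java -Xgc:genpar -Xms:2g -Xmx:2g -XXcompactRatio:20 -XXcompactSetLimit:200000 myApp
```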
For information on tuning these compaction options, please refer to Tuning the Compaction of Memory.
Thread Local Areas (TLAs) are chunks of free memory used for object allocation. The TLAs are reserved from the heap and given to the Java threads on demand, so that the Java threads can allocate objects without having to synchronize with the other Java threads for each object allocation.
Increasing the preferred TLA size speeds up allocation of small objects when each Java thread allocates a lot of small objects, as the threads won’t have to synchronize to get a new TLA as often.
In Oracle JRockit JVM R27.3 and later releases the preferred TLA size also determines the size limit for objects allocated in the nursery. Increasing the TLA size will thus also allow larger objects to be allocated in the nursery, which is beneficial for applications that allocate a lot of large objects. In older versions you need to set both the TLA size and the Large Object Limit to allow larger objects to be allocated in the nursery. A JRA recording will show you statistics on the sizes of large objects allocated by your application. For good performance you can try setting the preferred TLA size at least as large as the largest object allocated by your application.
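A hedged example of raising the preferred TLA size follows; the -XXtlaSize syntax shown is assumed from JRockit R27, so check Setting the Thread Local Area Size for the exact form in your release:

```shell
# Larger preferred TLA: cheaper small-object allocation and, in R27.3 and later,
# larger objects allowed in the nursery.
java -Xgc:genpar -XXtlaSize:min=2k,preferred=64k myApp
```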
For more information on how to set the TLA size, see Setting the Thread Local Area Size.