3 Tuning Real Time Applications for Deterministic Garbage Collection

This section contains the following guidelines for tuning your applications for the Oracle JRockit JVM deterministic garbage collector that is included with Oracle JRockit Real Time 4.0.

Note:

For more information on adjusting other non-standard start-up commands available with JRockit, see the Oracle JRockit Command-Line Reference on the Oracle Technology Network.

3.1 Basic Environment Tuning

Use these guidelines for configuring your environment to use JRockit Real Time.

  • Ensure that CPUs are not running at maximum capacity on servers or clients

    If an application consumes most of the CPU, deterministic GC may actually degrade average latency. The deterministic GC collects continuously, so it competes with the application for CPU cycles. For the best latency, keep the CPU from being fully utilized. A best practice is to run your benchmarks at various loads (with and without deterministic GC) to determine the optimal load.

  • Too many active threads can cause increased latency due to context switching

    The "sweet-spot" number is generally one thread per virtual CPU (that is, counting dual-core and Hyper-Threading as separate CPUs), while leaving one CPU free for background GC work. However, if you make external calls (for example, to a database), then it makes sense to allocate a few extra threads to utilize idle cycles.

    For information on tuning JRockit garbage collection threads, see Section 3.5.9, "Adjust the Amount of Garbage Collection Threads for Processors".
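
    The thread-count guideline above can be sketched in Java as follows. This is a minimal illustration, not part of JRockit: the class name WorkerPoolSizing, the method workerThreads, and the extraForBlockingCalls parameter are hypothetical, and the value of 2 extra threads is an assumed starting point to be checked by benchmarking.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: size a worker pool at one thread per virtual CPU, minus one CPU
// reserved for background GC work, plus optional extras for blocking calls.
class WorkerPoolSizing {

    // extraForBlockingCalls is a hypothetical tuning knob, not a JRockit setting.
    static int workerThreads(int virtualCpus, int extraForBlockingCalls) {
        int base = Math.max(1, virtualCpus - 1); // leave one CPU free for the GC
        return base + Math.max(0, extraForBlockingCalls);
    }

    public static void main(String[] args) {
        // availableProcessors() reports the virtual CPUs visible to the JVM.
        int cpus = Runtime.getRuntime().availableProcessors();
        int threads = workerThreads(cpus, 2); // 2 extras: assumed, tune by benchmarking
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println("virtual CPUs=" + cpus + ", worker threads=" + threads);
        pool.shutdown();
    }
}
```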

3.2 Basic Application Tuning

Use these guidelines when designing your applications for JRockit Real Time.

  • Understand your application code and how to measure latency.

  • Avoid making synchronous calls to slow back-office systems as part of a transaction, as this defeats the purpose of real time. Instead, make sure any non-critical calls are handled asynchronously through worker thread pools, or by using JMS.

  • Minimize memory allocation. If possible, allocate and free memory for a single transaction in a chunk, as this helps avoid fragmentation of the Java heap. Also, minimize the number and size of your objects.

  • Control memory utilization by avoiding rampant memory allocation and the allocation of many large arrays.

  • Drop references to objects as soon as possible; otherwise, objects that become unreferenced during a garbage collection might still be marked alive if they were referenced when the deterministic GC marked all live objects.

  • Avoid long critical sections in your code, as synchronized blocks of Java code may cause a transaction to block.

  • Avoid long linked structures; the deterministic GC needs to iterate through these objects.

  • If transactions span more than one highly-active JVM, each such JVM may need to run Deterministic GC. For example, if a transaction is initiated by a Java client JVM, and the transaction includes both JMS server and J2EE server operations, all three JVMs may require Deterministic GC to reliably meet maximum latency criteria.
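
The guideline about handling non-critical calls asynchronously can be illustrated with a small sketch. It assumes a hypothetical audit call that stands in for a slow back-office system; the class AsyncAuditExample, its handle method, and the pool size of 2 are illustrative choices and do not come from JRockit or WebLogic.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: keep the latency-critical path short by handing a non-critical
// call to a worker thread pool instead of making it synchronously.
class AsyncAuditExample {
    // Hypothetical counter standing in for a slow back-office system.
    final AtomicInteger auditRecords = new AtomicInteger();

    // Pool size of 2 is an assumed value for illustration.
    private final ExecutorService background = Executors.newFixedThreadPool(2);

    String handle(final String orderId) {
        String result = "ACK:" + orderId;        // latency-critical work
        background.execute(new Runnable() {
            @Override
            public void run() {
                auditRecords.incrementAndGet();  // non-critical call runs off the critical path
            }
        });
        return result;                           // returns without waiting for the audit call
    }

    // Stops accepting work and waits (up to 5 seconds) for queued calls to finish.
    void shutdown() {
        background.shutdown();
        try {
            background.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncAuditExample example = new AsyncAuditExample();
        System.out.println(example.handle("42"));
        example.shutdown();
        System.out.println("audit records written: " + example.auditRecords.get());
    }
}
```

The transaction thread pays only for queuing the task. The trade-off is that the non-critical call may complete after the response has been returned.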

3.3 J2EE Application Tuning

Use these guidelines when tuning your J2EE applications for JRockit Real Time.

  • For server-side EJBs, MDBs, and Servlets, ensure that there are enough concurrent instances configured to respond immediately to client requests (if all instances are active, this is a sign that client requests are queuing up behind each other on the server).

  • Make sure that resource pools contain enough instances so that threads are not forced to wait for resources. In J2EE, for example, tune the EJB max-beans-in-free-pool property and tune thread pool sizes.

3.4 JMS Application Tuning

Use these guidelines when using Oracle WebLogic JMS applications with JRockit Real Time.

  • Consider using asynchronous consumers rather than synchronous consumers.

    For more information on JMS consumers, see "Application Design" in Programming WebLogic JMS.

  • Tune all JMS connection factory Messages Maximum settings to 1. This can potentially provide better latency at the expense of possibly lowering throughput. Similarly, configure your MDBs to refer to a custom connection factory with the following settings:

    • Messages Maximum = 1

    • XA Connection Factory Enabled = enabled

    • Client Acknowledge Policy = ACKNOWLEDGE_PREVIOUS

  • For more information on configuring JMS connection factories, see the Administration Console Online Help.

  • For consumers of non-persistent messages from queues, consider using the WebLogic JMS WLSession NO_ACKNOWLEDGE extension.

  • For improved non-persistent messaging performance, consider using the WebLogic JMS "One-Way Send Mode" feature available with WebLogic Server 9.2. See "Tuning WebLogic JMS" in WebLogic Server Performance and Tuning.

Ensure that your Spring JMS Templates leverage resource reference pooling (otherwise, they negatively impact response times as they implicitly create and close JMS connections, sessions, and producers once per message).

Note:

Resource reference pooling is not suitable if the target destination changes with each call. In that case, change the application code to use regular JMS and cache the JMS connections, sessions, producers, and consumers.

3.5 JVM Tuning for Real-Time Applications

These tuning suggestions can further improve performance and decrease pause times when using the JRockit JVM deterministic garbage collector. For more information on the deterministic garbage collector, see the Oracle JRockit Performance Tuning Guide available on the Oracle Technology Network.

3.5.1 Allow For a Warm-up Period

There may be a warm-up period before response times achieve desired levels. During this warm-up, JRockit JVM will optimize the critical code paths. The warm-up period is application and hardware dependent, as follows:

  • For smaller applications (in terms of amount of Java code) with high loads that are running on fast hardware, there may be a warm-up period of one to three minutes.

  • For large applications (in terms of amount of Java code) with low loads that are running on slow hardware (in particular, most SPARC hardware), there may be a warm-up period of approximately thirty minutes.
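
When benchmarking, it helps to exclude the warm-up period from the measured results. The following sketch shows one way to do that. It is only an illustration: the transaction method is a hypothetical stand-in for application work, and warming up by iteration count rather than by elapsed time is an assumption that a real benchmark would need to validate against the warm-up periods described above.

```java
// Sketch: run untimed warm-up iterations first, then measure latency.
class WarmUpBenchmark {

    // Hypothetical stand-in for one application transaction.
    static long transaction(int seed) {
        long acc = seed;
        for (int i = 0; i < 1000; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    // Runs warmUpIterations without timing them, then returns the worst
    // observed latency in nanoseconds over measuredIterations.
    static long worstLatencyNanos(int warmUpIterations, int measuredIterations) {
        long sink = 0;
        for (int i = 0; i < warmUpIterations; i++) {
            sink += transaction(i); // lets the JVM optimize the hot path
        }
        long worst = 0;
        for (int i = 0; i < measuredIterations; i++) {
            long start = System.nanoTime();
            sink += transaction(i);
            worst = Math.max(worst, System.nanoTime() - start);
        }
        if (sink == 42) {
            System.out.println(); // uses the result so the work is not optimized away
        }
        return worst;
    }

    public static void main(String[] args) {
        // Iteration counts are assumed values for illustration.
        long worst = worstLatencyNanos(100000, 10000);
        System.out.println("worst latency after warm-up (ns): " + worst);
    }
}
```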

3.5.2 Adjust Min/Max Heap Sizes

Setting the minimum heap size (-Xms) smaller or the maximum heap size (-Xmx) larger affects how often garbage collection will occur and determines the approximate amount of live data an application can have. To begin with, try using the following heap sizes:

   java -Xms1024m -Xmx1024m -XgcPrio:deterministic -XpauseTarget=30

For more information, see "-XX Command-line Options" in the Oracle JRockit Command Line Reference.
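
To confirm that the heap settings took effect, an application can print the heap limits that the JVM reports at startup. The sketch below uses only the standard java.lang.Runtime API; the class name HeapSettingsCheck is hypothetical, and the reported maximum may differ slightly from the -Xmx value depending on the JVM.

```java
// Sketch: print the heap limits the running JVM reports.
class HeapSettingsCheck {

    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): upper bound the heap may grow to (relates to -Xmx).
        System.out.println("max heap (MB):          " + toMegabytes(rt.maxMemory()));
        // totalMemory(): heap currently committed (starts near -Xms).
        System.out.println("committed heap (MB):    " + toMegabytes(rt.totalMemory()));
        // freeMemory(): unused part of the committed heap.
        System.out.println("free in committed (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```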

3.5.3 Increase or Decrease Pause Targets

  • If you specify -XgcPrio:deterministic without the -XpauseTarget option, the pause target is set to a default value, which in this release is 30 milliseconds.

  • Running on slower hardware with a different heap size and/or with more live data may break the deterministic behavior. In these cases, you might need to increase the default pause time target (30 milliseconds) by using the -XpauseTarget option. The maximum allowable value for the -XpauseTarget option is currently 5000 milliseconds.

  • Conversely, if you want to test your application for the lowest possible pause time, you can lower the -XpauseTarget value to its minimum, which in this release is 10 milliseconds.

For more information, see "-XX Command-line Options" in the Oracle JRockit Command Line Reference.

3.5.4 Set the Page Size

Increasing the page size (-XXlargePages) can increase performance and lower pause times by limiting cache misses in the translation look-aside buffer (TLB). See "-XX Command-line Options" in the Oracle JRockit JVM Command-Line Reference.

3.5.5 Determine Optimal Load

Do not be overcautious in terms of load. The deterministic garbage collector can handle a fair amount of load without breaking its determinism guarantees. Too little load means the JVM's optimizer and GC heuristics have too little information to work with, resulting in sub-par performance. A best practice is to run your benchmarks at various loads (with and without deterministic GC) to determine the optimal load.
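
When comparing benchmark runs at different loads, it is usually more informative to look at latency percentiles and the maximum than at the average alone. The following sketch shows one way to compute them from recorded samples, using the nearest-rank percentile method. The class LatencyStats and the placeholder work inside main are hypothetical and are not part of any JRockit tooling.

```java
import java.util.Arrays;

// Sketch: summarize recorded latency samples by percentile and maximum.
class LatencyStats {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesNanos, double p) {
        if (samplesNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    static long max(long[] samplesNanos) {
        return percentile(samplesNanos, 100);
    }

    public static void main(String[] args) {
        long[] samples = new long[10000];
        long sink = 0;
        for (int i = 0; i < samples.length; i++) {
            long start = System.nanoTime();
            sink += Integer.toString(i).hashCode(); // placeholder for one transaction
            samples[i] = System.nanoTime() - start;
        }
        System.out.println("p50 (ns): " + percentile(samples, 50));
        System.out.println("p99 (ns): " + percentile(samples, 99));
        System.out.println("max (ns): " + max(samples));
        System.out.println("(checksum, ignore) " + sink);
    }
}
```

Running the same measurement at several load levels, with and without deterministic GC, gives a set of percentile figures that can be compared directly.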

3.5.6 Analyze GC With JRockit Verbose Output

JRockit JVM verbose output normally does not incur a measurable performance impact, and is quite useful for analyzing JVM memory and GC activity. Table 3-1 lists the recommended verbose options for analyzing JVM memory and GC activity.

Table 3-1 JRockit JVM Verbose Output Options

   Option                                                 What it does
   -Xverbose:opt,memory,memdbg,gcpause,compact,license    For GC and memory analysis.
   -Xverboselog:verbose-jrockit.log                       Redirects verbose output to the designated file.
   -Xverbosetimestamp                                     Prints a formatted date before each verbose line.
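
As a complement to verbose output, an application can also read cumulative GC counters through the standard java.lang.management API. The sketch below is an assumption about how you might do this and is not described in this guide; the class name GcActivityReport is hypothetical. The collector names are JVM-specific, and the figures are running totals rather than individual pause times, so they do not replace the gcpause verbose output.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: report cumulative GC counts and times from the platform MXBeans.
class GcActivityReport {

    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", total time (ms)=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```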


3.5.7 Limit Amount of Finalizers and Reference Objects

Try to limit the number of finalizers and reference objects that are used, such as soft, weak, and phantom references. These types require special handling, and if they occur in large numbers, pause times can become longer than 30 ms.
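
One common way to reduce the number of finalizers is to release resources explicitly instead of overriding finalize(). The following sketch illustrates the idea. The Connection class is a hypothetical stand-in for a resource such as a socket or file handle, and the pattern is a general Java practice rather than something specific to JRockit.

```java
// Sketch: release a resource explicitly in a finally block so that the
// class needs no finalize() method for the GC to process.
class ExplicitCleanupExample {

    // Hypothetical resource; stands in for a socket, file handle, and so on.
    static class Connection {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        void send(String message) {
            if (!open) {
                throw new IllegalStateException("connection is closed");
            }
            // real I/O would happen here
        }

        // Explicit release; there is deliberately no finalize() override.
        void close() {
            open = false;
        }
    }

    static Connection useAndRelease(String message) {
        Connection connection = new Connection();
        try {
            connection.send(message);
        } finally {
            connection.close(); // runs even if send() throws
        }
        return connection;
    }

    public static void main(String[] args) {
        Connection c = useAndRelease("hello");
        System.out.println("still open after use: " + c.isOpen());
    }
}
```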

3.5.8 Adjust the Garbage Collection Trigger

Try adjusting the garbage collection trigger (-XXgctrigger) to limit the amount of heap space used. This way, you can force more frequent garbage collections without modifying your applications. The garbage collection trigger is somewhat deterministic, since garbage collection starts each time the trigger limit is hit. See the Oracle JRockit Performance Tuning Guide available on the Oracle Technology Network.

Note:

If the trigger value is set too low, the heap might get full before the garbage collection is finished, causing even longer pauses for threads, since they have to wait for the garbage collection to complete before getting new memory. Typically, memory is always available because a portion of the heap is free, and the only pauses are the small ones that occur when the garbage collection stops the Java application.

3.5.9 Adjust the Amount of Garbage Collection Threads for Processors

With the variety of sophisticated processing hardware currently available (Hyper-Threading, strands, dual core, and so on), the JRockit JVM may not be able to determine the appropriate number of GC threads it should start. The current recommendation is to start one thread per physical CPU; that is, one thread per chip, not per core. Too many garbage collection threads can hurt application latency, because more threads run on the system, which might saturate the CPUs and thus affect the Java application. Conversely, too few threads can lengthen the mark phase of the GC, since less parallelism is possible. For example, on a machine with two dual-core Intel Woodcrest processors (four cores in total), the recommended number of GC threads is two, which is the same as the number of processors in the machine.

To see how many garbage collection threads the JRockit JVM uses on your machine, start the JRockit JVM with -Xverbose:memdbg and then check for the following lines, which are printed during startup:

   [memdbg ] number of oc threads: <num>
   [memdbg ] number of yc threads: <num>

If necessary, adjust the number of GC threads using the -XXgcthreads:<# threads> parameter.

For more information, see "-XX Command-line Options" in the Oracle JRockit Command Line Reference.
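
If verbose output is redirected to a file, the GC thread counts can be extracted automatically. The sketch below assumes that the startup lines have exactly the form shown in this section; the class GcThreadLogParser is hypothetical, and the thread counts in the sample input are made-up values, not real JRockit output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pull the "oc" and "yc" GC thread counts out of memdbg log lines.
class GcThreadLogParser {

    // Assumes the line format shown in this section.
    private static final Pattern LINE =
            Pattern.compile("number of (oc|yc) threads:\\s*(\\d+)");

    // Maps "oc" and "yc" to the thread counts found; other lines are ignored.
    static Map<String, Integer> parse(String[] logLines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                counts.put(m.group(1), Integer.valueOf(m.group(2)));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input with made-up counts, for illustration only.
        String[] sample = {
            "[memdbg ] number of oc threads: 2",
            "[memdbg ] number of yc threads: 2"
        };
        System.out.println(parse(sample));
    }
}
```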

3.6 More Tuning Information

For additional performance and tuning information, see the Oracle JRockit Performance Tuning Guide on the Oracle Technology Network.