Use these guidelines for configuring your environment to use WebLogic Real Time, Core Edition.
Ensure that CPUs are not running at maximum capacity on either servers or clients. If an application consumes a majority of the CPU, deterministic GC may actually degrade average latency: the deterministic collector runs continuous GC, which competes with the application for CPU cycles. For the best latency, keep CPU utilization below full capacity. A best practice is to run your benchmarks at various loads (with and without deterministic GC) to determine the optimal load.
Too many active threads can cause increased latency due to context switching. The sweet spot is generally one thread per virtual CPU (counting each core and each hyper-threaded logical processor as a separate CPU), while leaving one CPU free for background GC work. However, if your application makes external calls (for example, to a database), it does make sense to allocate a few extra threads to utilize otherwise idle cycles.
Use these guidelines when designing your applications for WebLogic Real Time, Core Edition.
Understand your application code and how to measure latency.
Avoid making synchronous calls to slow back-office systems as part of a transaction, as this defeats the purpose of real-time. Instead, make sure any non-critical calls are handled asynchronously through work thread pools or by using JMS.
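The asynchronous pattern above can be sketched as follows. This is an illustrative standalone example using a plain executor; in a J2EE container you would instead use the server's managed work thread pools or a JMS queue. All names (AsyncAudit, auditLog) are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncAudit {
    // Illustrative background pool for non-critical work; in a container,
    // prefer a managed work thread pool or a JMS destination instead.
    private static final ExecutorService BACKGROUND = Executors.newFixedThreadPool(2);

    static String processTransaction(String payload) {
        String result = payload.toUpperCase();      // latency-critical path
        BACKGROUND.submit(() -> auditLog(result));  // non-critical call kept off the critical path
        return result;                              // returns without waiting for the audit
    }

    static void auditLog(String entry) {
        // Stands in for a slow back-office call (database, logging system, etc.)
    }

    public static void main(String[] args) {
        System.out.println(processTransaction("order-42"));
        BACKGROUND.shutdown();
    }
}
```

The transaction's response time no longer depends on the latency of the back-office call, only on the cost of handing the work off.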
Minimize memory allocation. If possible, allocate and free memory for a single transaction in a single chunk, as this helps avoid fragmentation of the Java heap. Also, minimize the number and size of your objects.
Control memory utilization by avoiding rampant memory allocation and the allocation of many large arrays.
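One common way to apply this advice is to reuse a pre-sized, per-thread buffer rather than allocating a fresh array for each transaction. The sketch below is illustrative (ReusableBuffer and encode are hypothetical names, not a WebLogic API):

```java
public class ReusableBuffer {
    // One pre-sized buffer per worker thread, reused across transactions;
    // avoids per-transaction array allocation and heap fragmentation.
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[8 * 1024]);

    static int encode(String message) {
        byte[] buf = BUFFER.get();           // reused allocation
        byte[] src = message.getBytes();
        System.arraycopy(src, 0, buf, 0, src.length);
        return src.length;                   // bytes written; no new array escapes
    }

    public static void main(String[] args) {
        System.out.println(encode("hello"));
    }
}
```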
Free all objects as soon as possible; otherwise, objects that become unreferenced during a garbage collection might still be marked alive if they were referenced when the deterministic GC marked all live objects.
Avoid long critical sections in your code, as synchronized blocks of Java code may cause a transaction to block.
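A simple way to keep critical sections short is to do the expensive work outside the lock and hold the lock only for the brief shared-state update. A minimal sketch (class and method names are illustrative):

```java
public class ShortCriticalSection {
    private final Object lock = new Object();
    private long total;

    void record(long[] samples) {
        long sum = 0;
        for (long s : samples) {   // expensive part runs with no lock held
            sum += s;
        }
        synchronized (lock) {      // critical section reduced to one update
            total += sum;
        }
    }

    long total() {
        synchronized (lock) {
            return total;
        }
    }

    public static void main(String[] args) {
        ShortCriticalSection c = new ShortCriticalSection();
        c.record(new long[] {1, 2, 3});
        System.out.println(c.total());
    }
}
```

Threads contending on the lock now block only for the duration of a single addition, not for the whole computation.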
Avoid long linked structures; the deterministic GC needs to iterate through these objects.
If transactions span more than one highly-active JVM, each such JVM may need to run Deterministic GC. For example, if a transaction is initiated by a Java client JVM, and the transaction includes both JMS server and J2EE server operations, all three JVMs may require Deterministic GC to reliably meet maximum latency criteria.
J2EE Application Tuning
Use these guidelines when tuning your J2EE applications for WebLogic Real Time, Core Edition.
For server-side EJBs, MDBs, and Servlets, ensure that enough concurrent instances are configured to respond immediately to client requests (if all instances are active, this is a sign that client requests are queuing up behind each other on the server).
Make sure that resource pools contain enough instances so that threads are not forced to wait for resources. In J2EE, for example, tune the EJB max-beans-in-free-pool property and the thread pool sizes.
JMS Application Tuning
Use these guidelines when using WebLogic JMS applications with WebLogic Real Time, Core Edition.
Consider using asynchronous consumers rather than synchronous consumers.
Tune all JMS connection factory Messages Maximum settings to 1. This can potentially provide better latency at the expense of possibly lowering throughput. Similarly, configure your MDBs to refer to a custom connection factory whose Messages Maximum setting is 1.
For consumers of non-persistent messages from queues, consider using the WebLogic JMS WLSession NO_ACKNOWLEDGE extension.
Ensure that your Spring JMS Templates leverage resource reference pooling; otherwise, they negatively impact response times because they implicitly create and close a JMS connection, session, and producer for every message.
Resource reference pooling is not suitable if the target destination changes with each call, in which case change application code to use regular JMS and cache the JMS connections, sessions, producers, and consumers.
JVM Tuning for Real-Time Applications
These tuning suggestions can further improve performance and decrease pause times when using the JRockit deterministic garbage collector. For more information on the deterministic garbage collector, see Minimal Transaction Latency (The Deterministic Garbage Collector) in the BEA JRockit Diagnostics Guide.
Allow For a Warm-up Period
There may be a warm-up period before response times achieve desired levels. During this warm-up, JRockit will optimize the critical code paths. The warm-up period is application and hardware dependent, as follows:
For smaller applications (in terms of amount of Java code) with high loads that are running on fast hardware, there may be a warm-up period of one-to-three minutes.
For large applications (in terms of amount of Java code) with low loads that are running on slow hardware (in particular, most SPARC hardware), there may be a warm-up period of approximately thirty minutes.
Adjust Min/Max Heap Sizes
Setting the minimum heap size (-Xms) smaller or the maximum heap size (-Xmx) larger affects how often garbage collection occurs and determines the approximate amount of live data an application can have. To begin with, run your benchmarks against several heap sizes to find the smallest heap that still meets your latency targets.
If you specify -Xgcprio:deterministic without the pauseTarget option, it will be set to a default value, which in this release is 30 milliseconds.
Running on slower hardware with a different heap size and/or with more live data may break the deterministic behavior. In these cases, you might need to increase the default pause time target (30 milliseconds) by using the -XpauseTarget option. The maximum allowable value for the pauseTarget option is currently 5000 milliseconds.
Conversely, if you want to test your application for the lowest possible pause time, you can lower the default -XpauseTarget value down to a minimum value. In this release, the minimum value is 10 milliseconds.
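Put together, a JRockit launch line using these options might look like the following. The heap size, the 30 ms target, and the class name are illustrative placeholders; verify the exact option syntax against your JRockit version's reference documentation.

```shell
# Illustrative JRockit invocation: deterministic GC with an explicit
# pause target. Heap sizes and the application class are placeholders.
java -Xms1g -Xmx1g -Xgcprio:deterministic -XpauseTarget=30ms MyApp
```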
Increasing the page size (-XXlargePages) can increase performance and lower pause times by limiting cache misses in the translation look-aside buffer (TLB). See -XX Command-line Options in the BEA JRockit Reference Manual.
Determine Optimal Load
Do not be overcautious in terms of load. The deterministic garbage collector can handle a fair amount of load without breaking its determinism guarantees. Too little load means the JVM's optimizer and GC heuristics have too little information to work with, resulting in sub-par performance. A best practice is to run your benchmarks at various loads (with and without deterministic GC) to determine the optimal load.
Analyze GC With JRockit Verbose Output
JRockit verbose output normally doesn't incur a measurable performance impact and is quite useful for analyzing JVM memory and GC activity. Table 2-1 defines the recommended verbose options.
Try to limit the number of finalizers and reference objects that are used, such as Soft, Weak, and Phantom references. These types require special handling during garbage collection, and if they occur in large numbers, pause times can become longer than 30 ms.
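For reference, the reference-object types mentioned above come from the java.lang.ref package; each instance adds GC bookkeeping work, which is why large numbers of them stretch pause times. A minimal illustration:

```java
import java.lang.ref.WeakReference;

public class ReferenceCost {
    public static void main(String[] args) {
        Object payload = new Object();
        // Each Soft/Weak/PhantomReference must be specially processed by
        // the collector; keep their count low in latency-sensitive code.
        WeakReference<Object> ref = new WeakReference<>(payload);
        System.out.println(ref.get() == payload);  // true while strongly reachable
    }
}
```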
Adjust the GC Trigger
Try adjusting the garbage collection trigger (-XXgctrigger) to limit the amount of heap space used. This way, you can force more frequent garbage collections without modifying your applications. The garbage collection trigger is somewhat deterministic, since garbage collection starts each time the trigger limit is hit. See
Adjust the Garbage Collection Trigger in the BEA JRockit Diagnostics Guide.
If the trigger value is set too low, the heap might fill before a garbage collection finishes, causing even longer pauses, since threads must wait for the collection to complete before they can allocate new memory. Typically, memory is always available because a portion of the heap remains free, and the only pauses are the short ones when the garbage collector stops the Java application.
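For example, a launch line adjusting the trigger might look like the following. The trigger value and class name are illustrative; the flag name is taken from this document, but confirm its exact argument syntax in the JRockit reference for your version.

```shell
# Illustrative: raise the GC trigger so collections start earlier,
# keeping more free heap in reserve. Value and class are placeholders.
java -Xms1g -Xmx1g -XXgctrigger=25 MyApp
```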
Adjust the Number of GC Threads for Your Processors
With the variety of sophisticated processing hardware currently available (hyper-threading, strands, dual core, and so on), JRockit may not be able to determine the appropriate number of GC threads to start. The current recommendation is one GC thread per physical CPU; that is, one thread per chip, not per core. Having too many GC threads can increase application latency, because the additional threads might saturate the CPUs and interfere with the Java application. Conversely, setting the count too low can lengthen the mark phase of the GC, since less parallelism is possible. For example, on a machine with two dual-core Intel Woodcrest processors (four cores total), the recommended number of GC threads is two, matching the number of physical processors.
To see how many GC threads JRockit uses on your machine, start JRockit with -verbose:memdbg and check for the following lines printed during startup:
[memdbg ] number of oc threads: <num>
[memdbg ] number of yc threads: <num>
If necessary, adjust the number of GC threads using the -XXgcthreads:<# threads> parameter.
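For example, on the two-processor machine described above, a launch line combining both options might look like this (the thread count and class name are illustrative):

```shell
# Print GC thread counts at startup and pin the number of GC threads to 2
# (one per physical processor on the example machine).
java -verbose:memdbg -XXgcthreads:2 MyApp
```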