8 Garbage-First Garbage Collector Tuning
This section describes how to adapt Garbage-First garbage collector (G1 GC) behavior in case it does not meet your requirements.
General Recommendations for G1
The general recommendation is to use G1 with its default settings, optionally giving it a different pause-time goal and setting a maximum Java heap size by using -Xmx if desired.
G1 defaults have been balanced differently than either of the other collectors. G1's goals in the default configuration are neither maximum throughput nor lowest latency, but to provide relatively small, uniform pauses at high throughput. However, G1's mechanisms to incrementally reclaim space in the heap and the pause-time control incur some overhead in both the application threads and in the space-reclamation efficiency.
If you prefer high throughput, then relax the pause-time goal by using -XX:MaxGCPauseMillis or provide a larger heap. If latency is the main requirement, then modify the pause-time target. Avoid limiting the young generation size to particular values by using options like -Xmn, -XX:NewRatio and others, because adjusting the young generation size is the main means by which G1 meets the pause-time goal. Setting the young generation size to a single value overrides and practically disables pause-time control.
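As a rough illustration of this recommendation, a launcher script might set only the maximum heap size and the pause-time goal and leave young generation sizing to G1. The variable name G1_OPTS, the jar name app.jar, and the values 4g and 500 are placeholders chosen for this sketch, not recommended settings.

```shell
# Sketch of a launcher fragment; G1_OPTS and app.jar are hypothetical names.
# Only the maximum heap size and the pause-time goal are set. -Xmn and
# -XX:NewRatio are deliberately left out so G1 can size the young generation.
G1_OPTS="-Xmx4g -XX:MaxGCPauseMillis=500"

# Print the resulting command line; run it without echo to start the application.
echo "java $G1_OPTS -jar app.jar"
```

A larger pause-time goal such as the one assumed here trades latency for throughput; a smaller value does the opposite.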
Moving to G1 from Other Collectors
Generally, when moving to G1 from other collectors, particularly the Concurrent Mark Sweep collector, start by removing all options that affect garbage collection, and only set the pause-time goal and overall heap size by using -Xmx and optionally -Xms.
Many options that make other collectors respond in some particular way either have no effect at all on G1, or even decrease throughput and the likelihood of meeting the pause-time target. An example is setting young generation sizes that completely prevent G1 from adjusting the young generation size to meet pause-time goals.
Improving G1 Performance
G1 is designed to provide good overall performance without the need to specify additional options. However, there are cases when the default heuristics or their default configurations provide suboptimal results. This section gives some guidelines for diagnosing and improving performance in these cases. This guide describes only the possibilities that G1 provides to improve garbage collector performance in a selected metric for a given application. On a case-by-case basis, application-level optimizations could be more effective than trying to tune the VM to perform better, for example, by creating fewer long-lived objects and so avoiding some problematic situations altogether.
For diagnosis purposes, G1 provides comprehensive logging. A good start is to use the -Xlog:gc*=debug option and then refine the output from that if necessary. The log provides a detailed overview of garbage collection activity during and outside the pauses. This includes the type of collection and a breakdown of time spent in particular phases of the pause.
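When this option is typed in a shell, the asterisk in the selector needs quoting. The sketch below also sends the output to a file through the file= output of unified logging; the file name gc.log and the jar name app.jar are assumptions made for the example.

```shell
# Sketch: gc.log and app.jar are hypothetical names.
# Single quotes keep the shell from expanding the "*" in the selector.
LOG_OPT='-Xlog:gc*=debug:file=gc.log'

# Print the resulting command line. When launching for real, keep the
# option quoted: java "$LOG_OPT" -jar app.jar
echo "java $LOG_OPT -jar app.jar"
```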
The following subsections explore some common performance issues.
Observing Full Garbage Collections
A full heap garbage collection (Full GC) is often very time consuming. Full GCs caused by too high heap occupancy in the old generation can be detected by finding the words Pause Full (G1 Compaction Pause) in the log. Full GCs are typically preceded by garbage collections that encounter an evacuation failure, indicated by (Evacuation Failure) tags.
A Full GC occurs because the application allocates too many objects that can't be reclaimed quickly enough. Often concurrent marking has not been able to complete in time to start a space-reclamation phase. The probability of running into a Full GC can be compounded by the allocation of many humongous objects. Due to the way these objects are allocated in G1, they may take up much more memory than expected.
The goal should be to ensure that concurrent marking completes on time. This can be achieved either by decreasing the allocation rate in the old generation, or giving the concurrent marking more time to complete.
G1 gives you several options to handle this situation better:
- You can determine the number of regions occupied by humongous objects on the Java heap using the gc+heap=info logging. Y in the lines "Humongous regions: X->Y" gives you the number of regions occupied by humongous objects. If this number is high compared to the number of old regions, the best option is to try to decrease the number of humongous objects. You can achieve this by increasing the region size using the -XX:G1HeapRegionSize option. The currently selected heap region size is printed at the beginning of the log.
- Increase the size of the Java heap. This typically increases the amount of time marking has to complete.
- Increase the number of concurrent marking threads by setting -XX:ConcGCThreads explicitly.
- Force G1 to start marking earlier. G1 automatically determines the Initiating Heap Occupancy Percent (IHOP) threshold based on earlier application behavior. If the application behavior changes, these predictions might be wrong. There are two options: lower the target occupancy for when to start space-reclamation by increasing the buffer used in the adaptive IHOP calculation by modifying -XX:G1ReservePercent; or disable the adaptive calculation of the IHOP and set it manually by using -XX:-G1UseAdaptiveIHOP and -XX:InitiatingHeapOccupancyPercent.
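Several of the options in the list above can be combined on one command line. The following is a minimal sketch with made-up values; FULLGC_OPTS and app.jar are hypothetical names, and suitable numbers depend on what the log shows for your application.

```shell
# Sketch only: every value below is an illustrative assumption, not a
# recommendation. app.jar is a hypothetical application.
FULLGC_OPTS="-XX:G1HeapRegionSize=16m"                            # larger regions, fewer humongous objects
FULLGC_OPTS="$FULLGC_OPTS -XX:ConcGCThreads=4"                    # more concurrent marking threads
FULLGC_OPTS="$FULLGC_OPTS -XX:-G1UseAdaptiveIHOP"                 # turn off the adaptive IHOP calculation
FULLGC_OPTS="$FULLGC_OPTS -XX:InitiatingHeapOccupancyPercent=35"  # fixed threshold; lower starts marking earlier

echo "java $FULLGC_OPTS -jar app.jar"
```

Changing one option at a time and comparing logs makes it easier to see which change actually removed the Full GCs.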
Causes other than Allocation Failure for a Full GC typically indicate that either the application or some external tool causes a full heap collection. If the cause is System.gc(), and there is no way to modify the application sources, the effect of Full GCs can be mitigated by using -XX:+ExplicitGCInvokesConcurrent, or the VM can be told to ignore these calls completely by setting -XX:+DisableExplicitGC. External tools may still force Full GCs; they can be removed only by not requesting them.
Humongous Object Fragmentation
A Full GC could occur before all Java heap memory has been exhausted because humongous objects require a contiguous set of regions. Potential options in this case are increasing the heap region size by using the option -XX:G1HeapRegionSize to decrease the number of humongous objects, or increasing the size of the heap. In extreme cases, there might not be enough contiguous space available for G1 to allocate the object even if available memory indicates otherwise. This would lead to a VM exit if that Full GC cannot reclaim enough contiguous space. As a result, there are no other options than either decreasing the number of humongous object allocations as mentioned previously, or increasing the heap.
Tuning for Latency
This section discusses hints to improve G1 behavior in case of common latency problems, that is, when pause times are too high.
Unusual System or Real-Time Usage
For every garbage collection pause, the gc+cpu=info log output contains a line with information from the operating system that breaks down where time was spent during the pause. An example of such output is User=0.19s Sys=0.00s Real=0.01s.
User time is time spent in VM code, system time is the time spent in the operating system, and real time is the amount of absolute time passed during the pause. If the system time is relatively high, then most often the environment is the cause.
Common known causes of high system time are:
- The VM allocating memory from, or giving memory back to, the operating system may cause unnecessary delays. Avoid the delays by setting minimum and maximum heap sizes to the same value using the options -Xms and -Xmx, and pre-touching all memory using -XX:+AlwaysPreTouch to move this work to the VM startup phase.
- Particularly in Linux, coalescing of small pages into huge pages by the Transparent Huge Pages (THP) feature tends to stall random processes, not just during a pause. Because the VM allocates and maintains a lot of memory, there is a higher than usual risk that the VM will be the process that stalls for a long time. Refer to the documentation of your operating system on how to disable the Transparent Huge Pages feature.
- Writing the log output may stall for some time because of some background task intermittently taking up all I/O bandwidth for the hard disk the log is written to. Consider using a separate disk for your logs or some other storage, for example a memory-backed file system, to avoid this.
Another situation to look out for is real time being a lot larger than the sum of the others; this may indicate that the VM did not get enough CPU time on a possibly overloaded machine.
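The two checks described above can be applied mechanically to a gc+cpu line. The sketch below uses the example output quoted earlier as input. The comparison rules (system time above user time, real time above user plus system time) are simplifying assumptions for illustration, not thresholds defined by G1.

```shell
# Sample taken from the example output quoted in this section.
line='User=0.19s Sys=0.00s Real=0.01s'

# Split each key=value pair, strip the trailing "s", then apply two rough
# checks. Both comparison rules are assumptions made for this sketch.
verdict=$(printf '%s\n' "$line" | awk '{
  for (i = 1; i <= NF; i++) {
    split($i, kv, "=")
    sub(/s$/, "", kv[2])
    t[kv[1]] = kv[2] + 0
  }
  if (t["Sys"] > t["User"]) {
    print "high system time"
  } else if (t["Real"] > t["User"] + t["Sys"]) {
    print "real time exceeds user + system"
  } else {
    print "ok"
  }
}')
echo "$verdict"
```

For the sample line neither rule fires, so the sketch prints ok. User time well above real time, as here, is what parallel GC threads normally produce.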
Reference Object Processing Takes Too Long
Information about the time taken for processing of Reference Objects is shown in the Reference Processing phase. During the Reference Processing phase, G1 updates the referents of Reference Objects according to the requirements of the particular type of Reference Object. By default, G1 tries to parallelize the sub-phases of Reference Processing using the following heuristic: for every -XX:ReferencesPerThread Reference Objects, start a single thread, bounded by the value in -XX:ParallelGCThreads. This heuristic can be disabled by setting -XX:ReferencesPerThread to 0 to use all available threads by default, or parallelization can be disabled completely by using -XX:-ParallelRefProcEnabled.
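The heuristic above amounts to a small calculation. The sketch below works it through with example numbers; the reference count, both option values, and the choice to round down are assumptions made for illustration and may differ from what the VM does internally.

```shell
# Sketch of the heuristic described above; all three numbers are example values.
refs=4500              # Reference Objects discovered in a pause (hypothetical)
refs_per_thread=1000   # value of -XX:ReferencesPerThread
parallel_gc_threads=8  # value of -XX:ParallelGCThreads

# One thread per refs_per_thread references, at least one thread, and never
# more than parallel_gc_threads. Rounding down is an assumption.
threads=$(( refs / refs_per_thread ))
if [ "$threads" -lt 1 ]; then threads=1; fi
if [ "$threads" -gt "$parallel_gc_threads" ]; then threads=$parallel_gc_threads; fi

echo "threads used for reference processing: $threads"
```

With these numbers the sketch yields 4 threads, half of the 8 that would be used if the heuristic were disabled by setting -XX:ReferencesPerThread to 0.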
Young-Only Collections Within the Young-Only Phase Take Too Long
A normal young collection, and in general any young collection, takes time roughly proportional to the size of the young generation, or more specifically, to the number of live objects within the collection set that need to be copied. If the Evacuate Collection Set phase takes too long, in particular the Object Copy sub-phase, decrease -XX:G1NewSizePercent. This decreases the minimum size of the young generation, allowing for potentially shorter pauses.
Another problem with sizing of the young generation may occur if application behavior, and in particular the number of objects surviving a collection, suddenly changes. This may cause spikes in garbage collection pause time. It might be useful to decrease the maximum young generation size by using -XX:G1MaxNewSizePercent. This limits the maximum size of the young generation and so the number of objects that need to be processed during the pause.
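A minimal sketch of how the two young generation bounds might be set together follows. The percentages are illustrative assumptions. On JDK builds where these two options are flagged experimental, they are only accepted after -XX:+UnlockExperimentalVMOptions; running java -XX:+PrintFlagsFinal shows how a given build classifies them.

```shell
# Sketch only: the percentages are illustrative assumptions, and app.jar is
# a hypothetical application. The unlock flag is included on the assumption
# that this JDK build treats both sizing options as experimental.
YOUNG_OPTS="-XX:+UnlockExperimentalVMOptions"
YOUNG_OPTS="$YOUNG_OPTS -XX:G1NewSizePercent=2"      # smaller minimum young generation
YOUNG_OPTS="$YOUNG_OPTS -XX:G1MaxNewSizePercent=40"  # smaller maximum young generation

echo "java $YOUNG_OPTS -jar app.jar"
```

Unlike -Xmn, these options only bound the young generation, so G1 can still resize it within the range to meet the pause-time goal.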
Mixed Collections Take Too Long
Mixed collections are used to reclaim space in the old generation. The collection set of mixed collections contains young and old generation regions. You can obtain information about how much time evacuation of either young or old generation regions contributes to the pause-time by enabling the gc+ergo+cset=debug log output. Look for the following log message:
Added young regions to CSet. [...] predicted eden time: 4.86ms,
predicted base time: 9.98ms, target pause time: 200.00ms, [...]
Eden time and base time together give the predicted young region time, that is, the time G1 expects evacuating the young generation will take.
The log message for predicting old region time looks as follows:
Finish choosing collection set old regions. [...] predicted initial
time: 147.70ms, predicted optional time: 15.45ms, [...]
Here, predicted initial time represents predicted old region time, i.e. the time G1 expects evacuating the minimum set of old generation regions will take.
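The predicted young region time can be computed directly from the first message. The sketch below uses the message quoted above, joined onto one line, as its input; the sed pattern assumes the field wording shown there and would need adjusting if a JDK version words the message differently.

```shell
# Input: the log message quoted above, joined onto a single line.
msg='Added young regions to CSet. [...] predicted eden time: 4.86ms, predicted base time: 9.98ms, target pause time: 200.00ms, [...]'

# Extract the eden and base predictions with sed, then add them with awk.
young_ms=$(printf '%s\n' "$msg" \
  | sed -n 's/.*predicted eden time: \([0-9.]*\)ms, predicted base time: \([0-9.]*\)ms.*/\1 \2/p' \
  | awk '{ printf "%.2f", $1 + $2 }')

echo "predicted young region time: ${young_ms}ms"
```

For the sample message this prints 14.84ms, which is small next to the 200.00ms target, so in that example the old generation regions are the part worth tuning.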
If the predicted young region time is too long, then see Young-Only Collections Within the Young-Only Phase Take Too Long for options. Otherwise, to reduce the contribution of the old generation regions to the pause-time, G1 provides three options:
- Spread the old generation region reclamation across more garbage collections by increasing -XX:G1MixedGCCountTarget.
- Avoid collecting regions that take a proportionally large amount of time to collect by not putting them into the candidate collection set by using -XX:G1MixedGCLiveThresholdPercent. In many cases, highly occupied regions take a lot of time to collect.
- Stop old generation space reclamation earlier so that G1 won't collect as many highly occupied regions. In this case, increase -XX:G1HeapWastePercent.
Note that the last two options decrease the number of collection set candidate regions where space can be reclaimed in the current space-reclamation phase. As a result, G1 may not be able to reclaim enough space in the old generation for sustained operation. However, later space-reclamation phases may be able to garbage collect them.
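The three options can be sketched on one command line as follows. All values are illustrative assumptions. The unlock flag is included on the assumption that -XX:G1MixedGCLiveThresholdPercent is experimental on the JDK build in use, and the threshold is lowered on the understanding that regions whose live occupancy exceeds it are kept out of the candidate set.

```shell
# Sketch only: values are illustrative assumptions; app.jar is hypothetical.
MIXED_OPTS="-XX:G1MixedGCCountTarget=16"   # spread old region reclamation over more collections
# Lower live threshold: highly occupied regions stay out of the candidate set.
MIXED_OPTS="$MIXED_OPTS -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=75"
MIXED_OPTS="$MIXED_OPTS -XX:G1HeapWastePercent=10"   # stop space reclamation earlier

echo "java $MIXED_OPTS -jar app.jar"
```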
Collections Occur Back to Back
G1 default MMU settings allow back-to-back garbage collections. The default value of -XX:GCPauseIntervalMillis is just slightly higher than -XX:MaxGCPauseMillis. If you observe continuous back-to-back garbage collections that result in the application not progressing, increase the value of -XX:GCPauseIntervalMillis to an acceptable value. G1 will then try to space out garbage collections more.
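To make the relationship between the two options concrete, the sketch below sets both and computes the share of each interval that the pause-time goal would allow for pauses. The values 200 and 1000 are assumptions for the example, and the percentage describes the goal G1 aims for, not a guarantee.

```shell
# Sketch only: both values are illustrative assumptions; app.jar is hypothetical.
max_pause_ms=200    # -XX:MaxGCPauseMillis
interval_ms=1000    # -XX:GCPauseIntervalMillis
PAUSE_OPTS="-XX:MaxGCPauseMillis=$max_pause_ms -XX:GCPauseIntervalMillis=$interval_ms"

# Share of each interval that the goal allows to be spent in pauses.
gc_share=$(( 100 * max_pause_ms / interval_ms ))

echo "java $PAUSE_OPTS -jar app.jar"
echo "pause-time goal allows up to ${gc_share}% of each interval in GC pauses"
```

With these numbers the goal leaves at least 80% of every one-second interval to the application, whereas an interval only slightly above 200 ms would leave almost none.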
High Merge Heap Roots and Scan Heap Roots Times
One way to reduce the time spent in these phases is to decrease the number of remembered set entries in the combined remembered sets. Increasing the size of the heap regions by using the option -XX:G1HeapRegionSize decreases the number of cross-region references and with it the size of the remembered set. Larger regions tend to have fewer cross-region references, so the relative amount of work spent in processing them decreases, although at the same time, larger regions may mean more live objects to evacuate per region, increasing the time for other phases.
If a significant amount of the garbage collection time, that is, more than 60%, is spent in these two phases, one option could be making the remembered set entries finer-grained by decreasing the value of the -XX:GCCardSizeInBytes option: finer granularity decreases the amount of work to find references, at the cost of some additional memory.
Spurious high Scan Heap Roots times in combination with the application allocating large objects may be caused by an optimization that tries to reduce concurrent remembered set update work by batching it. If the application creates such a batch just before a garbage collection, this might have a negative impact on Merge Heap Roots time. Use -XX:-ReduceInitialCardMarks to disable this optimization and potentially avoid this situation.
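The three adjustments discussed in this subsection could be sketched together as shown below. The values are illustrative assumptions. Whether a particular card size such as 256 is accepted depends on the JDK version and platform, so treat that value in particular as a guess to verify against your JDK.

```shell
# Sketch only: values are illustrative assumptions; app.jar is hypothetical.
RSET_OPTS="-XX:G1HeapRegionSize=32m"                # larger regions, fewer cross-region references
RSET_OPTS="$RSET_OPTS -XX:GCCardSizeInBytes=256"    # smaller cards, finer-grained entries
RSET_OPTS="$RSET_OPTS -XX:-ReduceInitialCardMarks"  # disable batching of initial card marks

echo "java $RSET_OPTS -jar app.jar"
```

These options pull in different directions for other phases, so it is usually clearer to try them one at a time and compare the Merge Heap Roots and Scan Heap Roots times in the log.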
Tuning for Throughput
G1's default policy tries to maintain a balance between throughput and latency; however, there are situations where higher throughput is desirable. Apart from decreasing the overall pause-times as described in the previous sections, the frequency of the pauses could be decreased. The main idea is to increase the maximum pause time by using -XX:MaxGCPauseMillis. The generation sizing heuristics will automatically adapt the size of the young generation, which directly determines the frequency of pauses. If that does not result in the expected behavior, particularly during the space-reclamation phase, increasing the minimum young generation size using -XX:G1NewSizePercent forces G1 to use a larger young generation.
In some cases, -XX:G1MaxNewSizePercent, the maximum allowed young generation size, may limit throughput by limiting young generation size. This can be diagnosed by looking at the region summary output of gc+heap=info logging. In this case the combined percentage of Eden regions and Survivor regions is close to -XX:G1MaxNewSizePercent percent of the total number of regions. Consider increasing -XX:G1MaxNewSizePercent in this case.
Another option to increase throughput is to try to decrease the amount of concurrent work; in particular, concurrent remembered set updates often require a lot of CPU resources. Increasing -XX:G1RSetUpdatingPauseTimePercent moves work from concurrent operation into the garbage collection pause. In the worst case, concurrent remembered set updates can be disabled by setting -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementGreenZone=2G -XX:G1ConcRefinementThreads=0. This mostly disables this mechanism and moves all remembered set update work into the next garbage collection pause.
Enabling the use of large pages by using -XX:+UseLargePages may also improve throughput. Refer to your operating system documentation on how to set up large pages.
You can minimize heap resizing work by disabling it; set the options -Xms and -Xmx to the same value. In addition, you can use -XX:+AlwaysPreTouch to move the operating system work of backing virtual memory with physical memory to VM startup time. Both of these measures can be particularly desirable in order to make pause-times more consistent.
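A minimal sketch of a fixed-size, pre-touched heap follows. The heap size 8g and the names TP_OPTS and app.jar are assumptions for the example, and -XX:+UseLargePages only helps once large pages have been set up in the operating system.

```shell
# Sketch only: the heap size is an illustrative assumption; app.jar is hypothetical.
heap=8g

# Same value for -Xms and -Xmx disables heap resizing; AlwaysPreTouch moves
# the cost of backing the heap with physical memory to VM startup.
TP_OPTS="-Xms$heap -Xmx$heap -XX:+AlwaysPreTouch -XX:+UseLargePages"

echo "java $TP_OPTS -jar app.jar"
```

Expect a longer startup with this configuration, since the whole heap is touched before the application begins to run.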
Tuning for Heap Size
Like other collectors, G1 aims to size the heap so that the time spent in garbage collection is below the ratio determined by the -XX:GCTimeRatio option. Adjust this option to make G1 meet your requirements.
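As a worked example, the sketch below converts a -XX:GCTimeRatio value into a target percentage of time spent in garbage collection. It assumes the target fraction is 1 / (1 + GCTimeRatio), the formula commonly given for this option; the value 12 is simply an example input.

```shell
# Sketch: assumes the target fraction of time in GC is 1 / (1 + GCTimeRatio).
gc_time_ratio=12   # example value for -XX:GCTimeRatio

target_pct=$(awk -v r="$gc_time_ratio" 'BEGIN { printf "%.1f", 100 / (1 + r) }')

echo "-XX:GCTimeRatio=$gc_time_ratio targets about ${target_pct}% of time in GC"
```

Under that assumption, a larger ratio means a smaller share of time allowed for garbage collection, which pushes G1 toward growing the heap sooner.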
Tunable Defaults
This section describes the default values and some additional information about command-line options that are introduced in this topic.
Table 8-1 Tunable Defaults G1 GC
| Option and Default Value | Description |
|---|---|
| | The concurrent remembered set update (refinement) uses these options to control the work distribution of concurrent refinement threads. G1 chooses the ergonomic values for these options so that |
| | This batches together concurrent remembered set update (refinement) work for initial object allocations. |
| | This determines whether processing of java.lang.Ref.* instances should be done in parallel by multiple threads. |
| | This determines the percentage of total garbage collection time G1 should spend in the Update RS phase updating any remaining remembered sets. G1 controls the amount of concurrent remembered set updates using this setting. |
| | This is the period in a number of GCs that G1 generates remembered set summary reports. Set this to zero to disable. Generating remembered set summary reports is a costly operation, so it should be used only if necessary, and with a reasonably high value. Use |
| | This is the divisor for the target ratio of time that should be spent in garbage collection as opposed to the application. The actual formula for determining the target fraction of time that can be spent in garbage collection before increasing the heap is |
| -XX:G1PeriodicGCInterval=0 | The interval in ms to check whether G1 should trigger a periodic garbage collection. Set to zero to disable. |
| -XX:+G1PeriodicGCInvokesConcurrent | If set, periodic garbage collections trigger a concurrent marking or continue the existing collection cycle, otherwise trigger a Full GC. |
| -XX:G1PeriodicGCSystemLoadThreshold=0.0 | Threshold for the current system load as returned by the host's getloadavg() call to determine whether a periodic garbage collection should be triggered. A current system load higher than this value prevents periodic garbage collections. A value of zero indicates that this threshold check is disabled. |
Note: <ergo> means that the actual value is determined ergonomically depending on the environment.