Garbage-First Garbage Collector Tuning

8 Garbage-First Garbage Collector Tuning

This section describes how to adapt Garbage-First garbage collector (G1 GC) behavior in case it does not meet your requirements.

Topics

General Recommendations for G1

The general recommendation is to use G1 with its default settings, eventually giving it a different pause-time goal and setting a maximum Java heap size by using -Xmx if desired.

G1 defaults have been balanced differently than either of the other collectors. G1's goals in the default configuration are neither maximum throughput nor lowest latency, but to provide relatively small, uniform pauses at high throughput. However, G1's mechanisms to incrementally reclaim space in the heap and the pause-time control incur some overhead in both the application threads and in the space-reclamation efficiency.

If you prefer high throughput, then relax the pause-time goal by using -XX:MaxGCPauseMillis or provide a larger heap. If latency is the main requirement, then modify the pause-time target. Avoid limiting the young generation size to particular values by using options like -Xmn, -XX:NewRatio and others because the young generation size is the main means for G1 to allow it to meet the pause-time. Setting the young generation size to a single value overrides and practically disables pause-time control.

Moving to G1 from Other Collectors

Generally, when moving to G1 from other collectors, particularly the Concurrent Mark Sweep collector, start by removing all options that affect garbage collection, and only set the pause-time goal and overall heap size by using -Xmx and optionally -Xms.

Many options that are useful for other collectors to respond in some particular way, have either no effect at all, or even decrease throughput and the likelihood to meet the pause-time target. An example could be setting young generation sizes that completely prevent G1 from adjusting the young generation size to meet pause-time goals.

Improving G1 Performance

G1 is designed to provide good overall performance without the need to specify additional options. However, there are cases when the default heuristics or default configurations for them provide suboptimal results. This section gives some guidelines about diagnosing and improving in these cases. This guide describes only the possibilities that G1 provides to improve garbage collector performance in a selected metric, when given a set application. On a case-by-case basis, application-level optimizations could be more effective than trying to tune the VM to perform better, for example, by avoiding some problematic situations by less long-lived objects altogether.

For diagnosis purposes, G1 provides comprehensive logging. A good start is to use the -Xlog:gc*=debug option and then refine the output from that if necessary. The log provides a detailed overview during and outside the pauses about garbage collection activity. This includes the type of collection and a breakdown of time spent in particular phases of the pause.

The following subsections explore some common performance issues.

Observing Full Garbage Collections

A full heap garbage collection (Full GC) is often very time consuming. Full GCs caused by too high heap occupancy in the old generation can be detected by finding the words Pause Full (Allocation Failure) in the log. Full GCs are typically preceded by garbage collections that encounter an evacuation failure indicated by to-space exhausted tags.

The reason that a Full GC occurs is because the application allocates too many objects that can't be reclaimed quickly enough. Often concurrent marking has not been able to complete in time to start a space-reclamation phase. The probability to run into a Full GC can be compounded by the allocation of many humongous objects. Due to the way these objects are allocated in G1, they may take up much more memory than expected.

The goal should be to ensure that concurrent marking completes on time. This can be achieved either by decreasing the allocation rate in the old generation, or giving the concurrent marking more time to complete.

G1 gives you several options to handle this situation better:

You can determine the number of regions occupied by humongous objects on the Java heap using the gc+heap=info logging. Y in the lines "Humongous regions: X->Y” give you the amount of regions occupied by humongous objects. If this number is high compared to the number of old regions, the best option is to try to decrease this number of objects. You can achieve this by increasing the region size using the -XX:G1HeapRegionSize option. The currently selected heap region size is printed at the beginning of the log.
Increase the size of the Java heap. This typically increases the amount of time marking has to complete.
Increase the number of concurrent marking threads by setting -XX:ConcGCThreads explicitly.
Force G1 to start marking earlier. G1 automatically determines the Initiating Heap Occupancy Percent (IHOP) threshold based on earlier application behavior. If the application behavior changes, these predictions might be wrong. There are two options: Lower the target occupancy for when to start space-reclamation by increasing the buffer used in an adaptive IHOP calculation by modifying -XX:G1ReservePercent; or, disable the adaptive calculation of the IHOP by setting it manually using -XX:-G1UseAdaptiveIHOP and -XX:InitiatingHeapOccupancyPercent.

Other causes than Allocation Failure for a Full GC typically indicate that either the application or some external tool causes a full heap collection. If the cause is System.gc(), and there is no way to modify the application sources, the effect of Full GCs can be mitigated by using -XX:+ExplicitGCInvokesConcurrent or let the VM completely ignore them by setting -XX:+DisableExplicitGC. External tools may still force Full GCs; they can be removed only by not requesting them.

Humongous Object Fragmentation

A Full GC could occur before all Java heap memory has been exhausted due to the necessity of finding a contiguous set of regions for them. Potential options in this case are increasing the heap region size by using the option -XX:G1HeapRegionSize to decrease the number of humongous objects, or increasing size of the heap. In extreme cases, there might not be enough contiguous space available for G1 to allocate the object even if available memory indicates otherwise. This would lead to a VM exit if that Full GC can not reclaim enough contiguous space. As a result, there are no other options than either decreasing the amount of humongous object allocations as mentioned previously, or increasing the heap.

Tuning for Latency

This section discusses hints to improve G1 behavior in case of common latency problems that is, if the pause-time is too high.

Unusual System or Real-Time Usage

For every garbage collection pause, the gc+cpu=info log output contains a line including information from the operating system with a breakdown about where during the pause-time has been spent. An example for such output is User=0.19s Sys=0.00s Real=0.01s.

User time is time spent in VM code, system time is the time spent in the operating system, and real time is the amount of absolute time passed during the pause. If the system time is relatively high, then most often the environment is the cause.

Common known issues for high system time are:

The VM allocating or giving back memory from the operating system memory may cause unnecessary delays. Avoid the delays by setting minimum and maximum heap sizes to the same value using the options -Xms and -Xmx, and pre-touching all memory using -XX:+AlwaysPreTouch to move this work to the VM startup phase.
Particularly in Linux, coalescing of small pages into huge pages by the Transparent Huge Pages (THP) feature tends to stall random processes, not just during a pause. Because the VM allocates and maintains a lot of memory, there is a higher than usual risk that the VM will be the process that stalls for a long time. Refer to the documentation of your operating system on how to disable the Transparent Huge Pages feature.
Writing the log output may stall for some time because of some background task intermittently taking up all I/O bandwidth for the hard disk the log is written to. Consider using a separate disk for your logs or some other storage, for example memory-backed file system to avoid this.

Another situation to look out for is real time being a lot larger than the sum of the others this may indicate that the VM did not get enough CPU time on a possibly overloaded machine.

Reference Object Processing Takes Too Long

Information about the time taken for processing of Reference Objects is shown in the Reference Processing phase. During the Reference Processing phase, G1 updates the referents of Reference Objects according to the requirements of the particular type of Reference Object. By default, G1 tries to parallelize the sub-phases of Reference Processing using the following heuristic: for every -XX:ReferencesPerThread reference Objects start a single thread, bounded by the value in -XX:ParallelGCThreads. This heuristic can be disabled by setting -XX:ReferencesPerThread to 0 to use all available threads by default, or parallelization disabled completely by -XX:-ParallelRefProcEnabled.

Young-Only Collections Within the Young-Only Phase Take Too Long

Normal young and, in general any young collection roughly takes time proportional to the size of the young generation, or more specifically, the number of live objects within the collection set that needs to be copied. If the Evacuate Collection Set phase takes too long, in particular, the Object Copy sub-phase, decrease -XX:G1NewSizePercent. This decreases the minimum size of the young generation, allowing for potentially shorter pauses.

Another problem with sizing of the young generation may occur if application performance, and in particular the amount of objects surviving a collection, suddenly changes. This may cause spikes in garbage collection pause time. It might be useful to decrease the maximum young generation size by using -XX:G1MaxNewSizePercent. This limits the maximum size of the young generation and so the number of objects that need to be processed during the pause.

Mixed Collections Take Too Long

Mixed collections are used to reclaim space in the old generation. The collection set of mixed collections contains young and old generation regions. You can obtain information about how much time evacuation of either young or old generation regions contribute to the pause-time by enabling the gc+ergo+cset=trace log output. Look at the predicted young region time and predicted old region time for young and old generation regions respectively.

If the predicted young region time is too long, then see Young-Only Collections Within the Young-Only Phase Take Too Long for options. Otherwise, to reduce the contribution of the old generation regions to the pause-time, G1 provides three options:

Spread the old generation region reclamation across more garbage collections by increasing -XX:G1MixedGCCountTarget.
Avoid collecting regions that take a proportionally large amount of time to collect by not putting them into the candidate collection set by using -XX:G1MixedGCLiveThresholdPercent. In many cases, highly occupied regions take a lot of time to collect.
Stop old generation space reclamation earlier so that G1 won't collect as many highly occupied regions. In this case, increase -XX:G1HeapWastePercent.

Note that the last two options decrease the amount of collection set candidate regions where space can be reclaimed for the current space-reclamation phase. This may mean that G1 may not be able to reclaim enough space in the old generation for sustained operation. However, later space-reclamation phases may be able to garbage collect them.

Collections Occur Back to Back

G1 default MMU settings allow back-to-back garbage collections. The default value of -XX:GCPauseIntervalMillis is just slightly higher than -XX:MaxGCPauseMillis. In case you observe continuous back-to-back garbage collections, which results in the application not progressing, increase the value of -XX:GCPauseIntervalMillis to an acceptable value. G1 will then try to space out garbage collections more.

High Update RS and Scan RS Times

To enable G1 to evacuate single old generation regions, G1 tracks locations of cross-region references, that is references that point from one region to another. The set of cross-region references pointing into a given region is called that region's remembered set. The remembered sets must be updated when moving the contents of a region. Maintenance of the regions' remembered sets is mostly concurrent. For performance purposes, G1 doesn't immediately update the remembered set of a region when the application installs a new cross-region reference between two objects. Remembered set update requests are delayed and batched for efficiency.

G1 requires complete remembered sets for garbage collection, so the Update RS phase of the garbage collection processes any outstanding remembered set update requests. The Scan RS phase searches for object references in remembered sets, moves region contents, and then updates these object references to the new locations. Depending on the application, these two phases may take a significant amount of time.

Adjusting the size of the heap regions by using the option -XX:G1HeapRegionSize affects the number of cross-region references and as well as the size of the remembered set. Handling the remembered sets for regions may be a significant part of garbage collection work, so this has a direct effect on the achievable maximum pause time. Larger regions tend to have fewer cross-region references, so the relative amount of work spent in processing them decreases, although at the same time, larger regions may mean more live objects to evacuate per region, increasing the time for other phases.

G1 tries to schedule concurrent processing of the remembered set updates so that the Update RS phase takes approximately -XX:G1RSetUpdatingPauseTimePercent percent of the allowed maximum pause time. By decreasing this value, G1 usually performs more remembered set update work concurrently.

Spurious high Update RS times in combination with the application allocating large objects may be caused by an optimization that tries to reduce concurrent remembered set update work by batching it. If the application that created such a batch happens just before a garbage collection, then the garbage collection must process all this work in the Update RS times part of the pause. Use -XX:-ReduceInitialCardMarks to disable this behavior and potentially avoid these situations.

Scan RS Time is also determined by the amount of compression that G1 performs to keep remembered set storage size low. The more compact the remembered set is stored in memory, the more time it takes to retrieve the stored values during garbage collection. G1 automatically performs this compression, called remembered set coarsening, while updating the remembered sets depending on the current size of that region's remembered set. Particularly at the highest compression level, retrieving the actual data can be very slow. The option -XX:G1SummarizeRSetStatsPeriod in combination with gc+remset=trace level logging shows if this coarsening occurs. If so, then the X in the line Did <X> coarsenings in the Before GC Summary section shows a high value. The -XX:G1RSetRegionEntries option could be increased significantly to decrease the amount of these coarsenings. Avoid using this detailed remembered set logging in production environments as collecting this data can take a significant amount of time.

Tuning for Throughput

G1's default policy tries to maintain a balance between throughput and latency; however, there are situations where higher throughput is desirable. Apart from decreasing the overall pause-times as described in the previous sections, the frequency of the pauses could be decreased. The main idea is to increase the maximum pause time by using -XX:MaxGCPauseMillis. The generation sizing heuristics will automatically adapt the size of the young generation, which directly determines the frequency of pauses. If that does not result in expected behavior, particularly during the space-reclamation phase, increasing the minimum young generation size using -XX:G1NewSizePercent will force G1 to do that.

In some cases, -XX:G1MaxNewSizePercent, the maximum allowed young generation size, may limit throughput by limiting young generation size. This can be diagnosed by looking at region summary output of gc+heap=info logging. In this case the combined percentage of Eden regions and Survivor regions is close to -XX:G1MaxNewSizePercent percent of the total number of regions. Consider increasing-XX:G1MaxNewSizePercent in this case.

Another option to increase throughput is to try to decrease the amount of concurrent work in particular, concurrent remembered set updates often require a lot of CPU resources. Increasing -XX:G1RSetUpdatingPauseTimePercent moves work from concurrent operation into the garbage collection pause. In the worst case, concurrent remembered set updates can be disabled by setting -XX:-G1UseAdaptiveConcRefinement -XX:G1ConcRefinementGreenZone=2G -XX:G1ConcRefinementThreads=0. This mostly disables this mechanism and moves all remembered set update work into the next garbage collection pause.

Enabling the use of large pages by using -XX:+UseLargePages may also improve throughput. Refer to your operating system documentation on how to set up large pages.

You can minimize heap resizing work by disabling it; set the options -Xms and -Xmx to the same value. In addition, you can use -XX:+AlwaysPreTouch to move the operating system work to back virtual memory with physical memory to VM startup time. Both of these measures can be particularly desirable in order to make pause-times more consistent.

Tuning for Heap Size

Like other collectors, G1 aims to size the heap so that the time spent in garbage collection is below the ratio determined by the -XX:GCTimeRatio option. Adjust this option to make G1 meet your requirements.

Tunable Defaults

This section describes the default values and some additional information about command-line options that are introduced in this topic.

Table 8-1 Tunable Defaults G1 GC

Option and Default Value	Description
`-XX:+G1UseAdaptiveConcRefinement` `-XX:G1ConcRefinementGreenZone=`<ergo> `-XX:G1ConcRefinementYellowZone=`<ergo> `-XX:G1ConcRefinementRedZone=`<ergo> `-XX:G1ConcRefinementThreads=`<ergo>	The concurrent remembered set update (refinement) uses these options to control the work distribution of concurrent refinement threads. G1 chooses the ergonomic values for these options so that `-XX:G1RSetUpdatingPauseTimePercent` time is spent in the garbage collection pause for processing any remaining work, adaptively adjusting them as needed. Change with caution because this may cause extremely long pauses.
`-XX:+ReduceInitialCardMarks`	This batches together concurrent remembered set update (refinement) work for initial object allocations.
`-XX:+ParallelRefProcEnabled` `-XX:ReferencesPerThread=1000`	`-XX:ReferencesPerThread` determines the degree of parallelization: for every N Reference Objects one thread will participate in the sub-phases of Reference Processing, limited by `-XX:ParallelGCThreads`. A value of 0 indicates that the maximum number of threads as indicated by the value of `-XX:ParallelGCThreads` will always be used. This determines whether processing of java.lang.Ref.* instances should be done in parallel by multiple threads.
`-XX:G1RSetUpdatingPauseTimePercent=10`	This determines the percentage of total garbage collection time G1 should spend in the Update RS phase updating any remaining remembered sets. G1 controls the amount of concurrent remembered set updates using this setting.
`-XX:G1SummarizeRSetStatsPeriod=0`	This is the period in a number of GCs that G1 generates remembered set summary reports. Set this to zero to disable. Generating remembered set summary reports is a costly operation, so it should be used only if necessary, and with a reasonably high value. Use `gc+remset=trace` to print anything.
`-XX:GCTimeRatio=12`	This is the divisor for the target ratio of time that should be spent in garbage collection as opposed to the application. The actual formula for determining the target fraction of time that can be spent in garbage collection before increasing the heap is `1 / (1 + GCTimeRatio)`. This default value results in a target with about 8% of the time to be spent in garbage collection.

Note:

<ergo> means that the actual value is determined ergonomically depending on the environment.