Beta Draft: 2017-03-27
This section describes how to adapt Garbage-First garbage collector (G1 GC) behavior in case it does not meet your requirements.
The general recommendation is to use G1 with its default settings, eventually giving it a different pause-time goal and setting a maximum Java heap size by using
-Xmx if desired.
G1 defaults have been balanced differently than either of the other collectors. The G1's goals in the default configuration are neither maximum throughput nor lowest latency, but to provide relatively small, uniform pauses at high throughput. However, the G1's mechanisms to incrementally reclaim space in the heap and the pause-time control incur some overhead in both the application threads and in the space-reclamation efficiency.
If you prefer high throughput, then relax the pause-time goal using
-XX:MaxGCPauseMillis or provide a larger heap. If latency is the main requirement, then modify the pause-time target. Avoid limiting the young generation size to particular values by using options like
-XX:NewRatio and others because the young generation size is the main means for G1 to allow it to meet the pause-time. Setting the young generation size to a single value overrides and practically disables pause-time control.
Generally, when moving to G1 from other collectors, particularly the Concurrent Mark Sweep collector, start by removing all options that affect garbage collection, and only set the pause-time goal and overall heap size by using
-Xmx and optionally
Many options that are useful for other collectors to respond in some particular way, have either no effect at all, or even decrease throughput and the likelihood to meet the pause-time target. An example could be setting young generation sizes that completely prevent G1 from adjusting the young generation size to meet pause-time goals.
G1 is designed to provide good overall performance without the need to specify additional options. However, there are cases when the default heuristics or default configurations for them provide suboptimal results. This section gives some guidelines about diagnosing and improving in these cases. This guide describes only the possibilities that G1 provides to improve garbage collector performance in a selected metric, when given a set application. On a case-by-case basis, application-level optimizations could be more effective than trying to tune the VM to perform better, for example, by avoiding some problematic situations by less long-lived objects altogether.
For diagnosis purposes, G1 provides comprehensive logging. You can enable this logging with the Unified Logging options
-Xlog:gc*=debug. The log provides a detailed overview during and outside the pauses about garbage collection activity. This includes the type of collection and a breakdown of time spent in particular phases of the pause.
The following subsections explore some common performance issues.
A full heap garbage collection (Full GC) is often very time consuming. Full GCs caused by too high heap occupancy in the old generation can be detected by finding the words Pause Full (Allocation Failure) in the log. Full GCs are typically preceded by garbage collections that encounter an evacuation failure indicated by
to-space exhausted tags.
The reason that a Full GC can be time consuming is because the application allocates too many objects that can’t be reclaimed quickly enough. One reason could be that concurrent marking has not been able to complete in time to start a space-reclamation phase. The probability to run into a Full GC can be compounded by the allocation of many
humongous objects. Due to the way these objects are allocated in G1, they may take up much more memory than expected.
The goal should be to ensure that concurrent marking completes on time. This can be achieved either by decreasing the allocation rate in the old generation, or giving the concurrent marking more time to complete.
G1 gives you several options to handle this situation better:
gc+heap=infologging shows the number next to
Humongousregions. After every garbage collection, the best option is to try to reduce the number of objects. You can achieve this by increasing the region size using the
-XX:G1HeapRegionSizeoption. The currently selected heap region size is printed at the beginning of the log.
-XX:G1ReservePercent; or, disable the adaptive calculation of the IHOP by setting it manually using
Other causes than Allocation Failure for a Full GC typically indicate that either the application or some external tool causes a full heap collection. If the cause is
System.gc(), and there is no way to modify the application sources, the effect of Full GCs can be mitigated by using
-XX:+ExplicitGCInvokesConcurrent or these calls completely ignored by setting
-XX:+DisableExplicitGC. External tools may still force Full GCs; they can be removed only by not requesting them.
A Full GC could occur before all Java heap memory has been exhausted due to the necessity of finding a contiguous set of regions for them. Potential options in this case are increasing the heap region size by using the option,
-XX:G1HeapRegionSize to decrease the number of humongous objects, or increase size of the heap .In extreme cases, there might not be enough contiguous space available for G1 to allocate the object even if available memory indicates otherwise. This would lead to a VM exit if that Full GC can not reclaim enough contiguous space. As a result there are no other options than either decreasing the amount of
humongous object allocations as mentioned previously, or increasing the heap.
This section discusses hints to improve G1 behavior in case of common latency problems that is if the pause-time is too high.
The starting point for these investigations is the log. It can provide information on which parts of the garbage collection pauses take how long. A good start is to enable
gc*=debug logging and then refine the output from that if necessary.
For every garbage collection pause, the
gc+cpu=info log output contains a line including information from the operating system with a breakdown about where during the pause-time has been spent. An example for such output is
User=0.19s Sys=0.00s Real=0.01s.
User time is time spent in VM code, System Time is the time spent in the operating system, and Real Timeis the amount of absolute time passed during the pause. If the system time is relatively high, then most often the environment is the cause.
Common known issues for high system time are:
-Xmx, and pre-touching all memory using
-XX:+AlwaysPreTouchto move this work to the VM startup phase.
Another situation to look out for is Real Time being a lot larger than the sum of the others this may indicate that the VM did not get enough CPU time on a possibly overloaded machine.
If the Ref Proc or Ref Enq phases take too long, then consider enabling parallelization of these phases using the option
Young-only and in general, any young collection roughly takes time proportional to the size of the young generation, or more specifically the number of live objects within the collection set that needs to be copied. If the Evacuate Collection Set phase takes too long, then in particular the Object Copy sub-phase, decrease
-XX:G1NewSizePercent. This decreases the minimum size of the young generation, allowing for potentially shorter pauses.
Another problem with sizing of the young generation may occur if application performance, and in particular the amount of objects surviving a collection, suddenly changes. This may cause spikes in garbage collection pause time. It might be useful to decrease the maximum young generation size using
Mixed collections are used to reclaim space in the old generation. The collection set of mixed collections contains young and old generation regions. You can obtain information about how much time evacuation of either young or old generation regions contributes to the pause-time by enabling the
gc+ergo+cset=trace log output. Look at the predicted young region time and predicted old region time for young and old generation regions respectively.
If the predicted young region time is too long, then see Young-Only Collections Take Too Long for options. Otherwise, to reduce the contribution of the old generation regions to the pause-time, G1 provides three options:
Spread the old generation region reclamation across more garbage collections by increasing
Avoid collecting regions that take a proportionally large amount of time to collect by not putting them into the candidate collection set by using -
XX:G1MixedGCLiveThresholdPercent. In many cases highly occupied regions take a lot of time to collect.
Stop old generation space reclamation earlier so that G1 won't collect as many highly occupied regions. In this case, increase
Note that the last two options decrease the amount of collection set candidate regions where space can be reclaimed for the current space-reclamation phase. This may mean that G1 may not be able to reclaim enough space in the old generation for sustained operation. However, later space-reclamation phases may be able to garbage collect them.
To allow G1 to evacuate single old generation regions, G1 tracks locations of cross-region references that is references that point from one region to another. The set of cross-region references pointing into a given region is called that region's remembered set. The remembered sets must be updated when moving the contents of a region. Maintenance of the regions' remembered sets is mostly concurrent. For performance purposes, G1 doesn’t immediately update the remembered set of a region when the application installs a new cross-region reference between two objects. Remembered set update requests are delayed and batched for efficiency.
G1 requires complete remembered sets for garbage collection, so the Update RS phase of the garbage collection processes any outstanding remembered set update requests. The Scan RS phase searches for object references in remembered sets, moves region contents, and then updates these object references to the new locations. Depending on the application these two phases may take a significant amount of time.
Adjusting the size of the heap regions by using the option
-XX:G1HeapRegionSize affects the number of cross-region references and as well as the size of the remembered set. Handling the remembered sets for regions may be a significant part of garbage collection work, so this has a direct effect on the achievable maximum pause time. Larger regions tend to have fewer cross-region references, so the relative amount of work spent in processing them decreases, although at the same time, larger regions may mean more live objects to evacuate per region, increasing the time for other phases.
G1 tries to schedule concurrent processing of the remembered set updates so that the Update RS phase takes approximately
-XX:G1RSetUpdatingPauseTimePercent percent of the allowed maximum pause time. By decreasing this value, G1 usually performs more remembered set update work concurrently.
Spurious high Update RS times in combination with the application allocating large objects may be caused by an optimization that tries to reduce concurrent remembered set update work by batching it. If the application that created such a batch happens just before a garbage collection, then the garbage collection must process all this work in the Update RS times part of the pause. Use
-XX:-ReduceInitialCardMarks to disable this behavior and potentially avoid these situations.
Scan RS Times is also determined by the amount of compression that G1 performs to keep remembered set storage size low. The more compact remembered set that's stored in memory, the more time it takes to retrieve the stored values during garbage collection. G1 automatically performs this compression, called remembered set coarsening, while updating the remembered sets depending on the current size of that region's remembered set. Particularly at the highest compression level, retrieving the actual data can be very slow. The option
-XX:G1SummarizeRSetStatsPeriod in combination with
gc+remset=trace level logging shows if this coarsening occurs. If so, then the
X in the line
Did <X> coarsenings in the Before GC Summary shows a high value. The
-XX:G1RSetRegionEntries option could be increased significantly to decrease the amount of these coarsenings. Avoid using this detailed remembered set logging in production environments as collecting this data can take a significant amount of time.
G1's default policy tries to maintain a balance between throughput and latency; however there are situations where higher throughput is desirable. Apart from decreasing the overall pause-times as described in the previous sections, the frequency of the pauses could be decreased. The main idea is to increase the maximum pause time by using
-XX:MaxGCPauseMillis. The generation sizing heuristics will automatically adapt the size of the young generation which directly determines the frequency of pauses. If that does not result in expected behavior, particularly during the space-reclamation phase, increasing the minimum young generation size using
-XX:G1NewSizePercent will force G1 to do that.
In some cases
-XX:G1MaxNewSizePercent, the maximum allowed young generation size, may limit throughput by limiting young generation size. This can be diagnosed by looking at region summary output of
gc+heap=info logging. In this case the combined percentage of Eden regions and Survivor regions is close to
-XX:G1MaxNewSizePercent percent of the total number of regions. Consider increasing
-XX:G1MaxNewSizePercent in this case.
Another option to increase throughput is to try to decrease the amount of concurrent work, in particular concurrent remembered set updates often require a lot of CPU resources. Increasing
-XX:G1RSetUpdatingPauseTimePercent moves work from concurrent operation into the garbage collection pause. In the worst case, concurrent remembered set updates can be disabled by setting
0. This mostly disables this mechanism and moves all remembered set update work into the next garbage collection pause.
Enabling the use of large pages by using
-XX:+UseLargePages may also improve throughput. Refer to your operating system manual on how to set up large pages.
You can minimize heap resizing work by disabling it; set the options
-Xmx to the same value. In addition, you can use
-XX:+AlwaysPreTouch to move the operating system work to back virtual memory with physical memory to VM startup time. Both of these measures can be particularly desirable in order to make pause-times more consistent.
Like other collectors, G1 aims to size the heap so that the time spent in garbage collection is below the ratio determined by the
-XX:GCTimeRatio option. Adjust this option to make G1 meet your requirements.
This topic describes the default values and some additional information about command-line options that are introduced in this chapter.
Table 10-1 Tunable Defaults G1 GC
|Option and Default Value||Description|
The concurrent remembered set update (refinement) uses these options to control the work distribution of concurrent refinement threads. G1 chooses the ergonomic values for these options so that
This batches together concurrent remembered set update (refinement) work for initial object allocations.
This determines whether processing of java.lang.Ref.* instances should be done in parallel by multiple threads.
This determines the percentage of total garbage collection time G1 should spend in the Update RS phase updating any remaining remembered sets. G1 controls the amount of concurrent remembered set updates using this setting.
This is the period in a number of GCs that G1 generates remembered set summary reports. Set this to zero to disable. Generating remembered set summary reports is a costly operation, so it should be used only if necessary, and with a reasonably high value. Use
This is the divisor for the target ratio of time that should be spent in garbage collection as opposed to the application. The actual formula for determining the target fraction of time that can be spent in garbage collection before increasing the heap is
<ergo>means that the actual value is determined ergonomically depending on the environment.