Garbage Collector Implementation

3 Garbage Collector Implementation

One strength of the Java SE platform is that it shields the developer from the complexity of memory allocation and garbage collection.

However, when garbage collection is the principal bottleneck, it's useful to understand some aspects of the implementation. Garbage collectors make assumptions about the way applications use objects, and these are reflected in tunable parameters that can be adjusted for improved performance without sacrificing the power of the abstraction.

Topics

Generational Garbage Collection

An object is considered garbage and its memory can be reused by the VM when it can no longer be reached from any reference of any other live object in the running program.

A theoretical, most straightforward garbage collection algorithm iterates over every reachable object every time it runs. Any leftover objects are considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.

The Java HotSpot VM incorporates a number of different garbage collection algorithms that all except ZGC use a technique called generational collection. While naive garbage collection examines every live object in the heap every time, generational collection exploits several empirically observed properties of most applications to minimize the work required to reclaim unused (garbage) objects. The most important of these observed properties is the weak generational hypothesis, which states that most objects survive for only a short period of time.

The blue area in Figure 3-1 is a typical distribution for the lifetimes of objects. The x-axis shows object lifetimes measured in bytes allocated. The byte count on the y-axis is the total bytes in objects with the corresponding lifetime. The sharp peak at the left represents objects that can be reclaimed (in other words, have "died") shortly after being allocated. For example, iterator objects are often only alive for the duration of a single loop.

Figure 3-1 Typical Distribution for Lifetimes of Objects

Description of "Figure 3-1 Typical Distribution for Lifetimes of Objects"

Some objects do live longer, and so the distribution stretches out to the right. For instance, there are typically some objects allocated at initialization that live until the VM exits. Between these two extremes are objects that live for the duration of some intermediate computation, seen here as the lump to the right of the initial peak. Some applications have very different looking distributions, but a surprisingly large number possess this general shape. Efficient collection is made possible by focusing on the fact that a majority of objects "die young."

Generations

To optimize for this scenario, memory is managed in generations (memory pools holding objects of different ages). Garbage collection occurs in each generation when the generation fills up.

The vast majority of objects are allocated in a pool dedicated to young objects (the young generation), and most objects die there. When the young generation fills up, it causes a minor collection in which only the young generation is collected; garbage in other generations isn't reclaimed. The costs of such collections are, to the first order, proportional to the number of live objects being collected; a young generation full of dead objects is collected very quickly. Typically, some fraction of the surviving objects from the young generation are moved to the old generation during each minor collection. Eventually, the old generation fills up and must be collected, resulting in a major collection, in which the entire heap is collected. Major collections usually last much longer than minor collections because a significantly larger number of objects are involved. Figure 3-2 shows the default arrangement of generations in the serial garbage collector:

Figure 3-2 Default Arrangement of Generations in the Serial Collector

Description of "Figure 3-2 Default Arrangement of Generations in the Serial Collector"

At startup, the Java HotSpot VM reserves the entire Java heap in the address space, but doesn't allocate any physical memory for it unless needed. The entire address space covering the Java heap is logically divided into young and old generations. The complete address space reserved for object memory can be divided into the young and old generations.

The young generation consists of eden and two survivor spaces. Most objects are initially allocated in eden. One survivor space is empty at any time, and serves as the destination of live objects in eden and the other survivor space during garbage collection; after garbage collection, eden and the source survivor space are empty. In the next garbage collection, the purpose of the two survivor spaces are exchanged. The one space recently filled is a source of live objects that are copied into the other survivor space. Objects are copied between survivor spaces in this way until they've been copied a certain number of times or there isn't enough space left there. These objects are copied into the old region. This process is also called aging.

Performance Considerations

The primary measures of garbage collection are throughput and latency.

Throughput is the percentage of total time not spent in garbage collection considered over long periods of time. Throughput includes time spent in allocation (but tuning for speed of allocation generally isn't needed).
Latency is the responsiveness of an application. Garbage collection pauses affect the responsiveness of applications.

Users have different requirements of garbage collection. For example, some consider the right metric for a web server to be throughput because pauses during garbage collection may be tolerable or simply obscured by network latencies. However, in an interactive graphics program, even short pauses may negatively affect the user experience.

Some users are sensitive to other considerations. Footprint is the working set of a process, measured in pages and cache lines. On systems with limited physical memory or many processes, footprint may dictate scalability. Promptness is the time between when an object becomes dead and when the memory becomes available, an important consideration for distributed systems, including Remote Method Invocation (RMI).

In general, choosing the size for a particular generation is a trade-off between these considerations. For example, a very large young generation may maximize throughput, but does so at the expense of footprint, promptness, and pause times. Young generation pauses can be minimized by using a small young generation at the expense of throughput. The sizing of one generation doesn't affect the collection frequency and pause times for another generation.

There is no one right way to choose the size of a generation. The best choice is determined by the way the application uses memory as well as user requirements. Thus the virtual machine's choice of a garbage collector isn't always optimal and may be overridden with command-line options; see Factors Affecting Garbage Collection Performance.

Throughput and Footprint Measurement

Throughput and footprint are best measured using metrics particular to the application.

For example, the throughput of a web server may be tested using a client load generator. However, pauses due to garbage collection are easily estimated by inspecting the diagnostic output of the virtual machine itself. The command-line option -verbose:gc prints information about the heap and garbage collection at each collection. Here is an example:

[15,651s][info ][gc] GC(36) Pause Young (G1 Evacuation Pause) 239M->57M(307M) (15,646s, 15,651s) 5,048ms
[16,162s][info ][gc] GC(37) Pause Young (G1 Evacuation Pause) 238M->57M(307M) (16,146s, 16,162s) 16,565ms
[16,367s][info ][gc] GC(38) Pause Full (System.gc()) 69M->31M(104M) (16,202s, 16,367s) 164,581ms

The output shows two young collections followed by a full collection that was initiated by the application with a call to System.gc(). The lines start with a time stamp indicating the time from when the application was started. Next comes information about the log level (info) and tag (gc) for this line. This is followed by a GC identification number. In this case, there are three GCs with the numbers 36, 37, and 38. Then the type of GC and the cause for stating the GC is logged. After this, some information about the memory consumption is logged. That log uses the format "used before GC" -> "used after GC" ("heap size").

In the first line of the example this is 239M->57M(307M), which means that 239 MB were used before the GC and the GC cleared up most of that memory, but 57 MB survived. The heap size is 307 MB. Note in this example that the full GC shrinks the heap from 307 MB to 104 MB. After the memory usage information, the start and end times for the GC are logged as well as the duration (end - start).

The -verbose:gc command is an alias for -Xlog:gc. -Xlog is the general logging configuration option for logging in the HotSpot JVM. It's a tag-based system where gc is one of the tags. To get more information about what a GC is doing, you can configure logging to print any message that has the gc tag and any other tag. The command line option for this is -Xlog:gc*.

Here's an example of one G1 young collection logged with -Xlog:gc* :

[10.178s][info][gc,start ] GC(36) Pause Young (G1 Evacuation Pause) 
[10.178s][info][gc,task ] GC(36) Using 28 workers of 28 for evacuation 
[10.191s][info][gc,phases ] GC(36) Pre Evacuate Collection Set: 0.0ms
[10.191s][info][gc,phases ] GC(36) Evacuate Collection Set: 6.9ms 
[10.191s][info][gc,phases ] GC(36) Post Evacuate Collection Set: 5.9ms 
[10.191s][info][gc,phases ] GC(36) Other: 0.2ms 
[10.191s][info][gc,heap ] GC(36) Eden regions: 286->0(276) 
[10.191s][info][gc,heap ] GC(36) Survivor regions: 15->26(38)
[10.191s][info][gc,heap ] GC(36) Old regions: 88->88 
[10.191s][info][gc,heap ] GC(36) Humongous regions: 3->1 
[10.191s][info][gc,metaspace ] GC(36) Metaspace: 8152K->8152K(1056768K)
[10.191s][info][gc ] GC(36) Pause Young (G1 Evacuation Pause) 391M->114M(508M) 13.075ms 
[10.191s][info][gc,cpu ] GC(36) User=0.20s Sys=0.00s Real=0.01s

Note:

The format of the output produced by -Xlog:gc* is subject to change in future releases.