5 Java HotSpot Virtual Machine Performance Enhancements
This chapter describes the performance enhancements in Oracle's Java HotSpot Virtual Machine technology.
Compact Strings
The compact strings feature introduces a space-efficient internal representation for strings.
Data gathered from many different applications suggests that strings are a major component of Java heap usage and that most java.lang.String objects contain only Latin-1 characters. Such characters require only one byte of storage, so half of the space in the internal character arrays of java.lang.String objects is unused. The compact strings feature, introduced in Java SE 9, reduces the memory footprint and garbage collection activity. You can disable this feature if you observe performance regressions in an application.
The compact strings feature does not introduce new public APIs or interfaces. It changes the internal representation of the java.lang.String class from a UTF-16 (two bytes per character) character array to a byte array with an additional field that identifies the character encoding. Other string-related classes, such as AbstractStringBuilder, StringBuilder, and StringBuffer, are updated to use a similar internal representation.
In Java SE 9, the compact strings feature is enabled by default. Therefore, the java.lang.String class stores characters as one byte each, encoded as Latin-1, and the additional character encoding field indicates the encoding that is used. The HotSpot VM string intrinsics are updated and optimized to support this internal representation.
You can disable the compact strings feature by using the -XX:-CompactStrings flag on the java command line. When the feature is disabled, the java.lang.String class stores characters as two bytes, encoded as UTF-16, and the HotSpot VM string intrinsics use UTF-16 encoding as well.
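Whether a string benefits from the compact representation depends only on its characters. The following sketch (the class and method names are illustrative, not JDK internals) checks the condition that lets the VM store one byte per character, namely that every character fits in Latin-1:

```java
public class CompactStringCheck {
    // True if every char fits in Latin-1 (value <= 0xFF) -- the condition
    // under which compact strings can store one byte per character.
    static boolean isLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(isLatin1("café"));   // true: é is U+00E9, within Latin-1
        System.out.println(isLatin1("日本語")); // false: these characters need UTF-16
    }
}
```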
Tiered Compilation
Tiered compilation, introduced in Java SE 7, brings client VM startup speeds to the server VM. Without tiered compilation, a server VM uses the interpreter to collect profiling information about methods, which is then fed to the compiler. With tiered compilation, the server VM also uses the client compiler to generate compiled versions of methods that collect profiling information about themselves. Because the compiled code is substantially faster than the interpreter, the program executes with greater performance during the profiling phase. Often, startup is even faster than the client VM's, because the final code produced by the server compiler may already be available during the early stages of application initialization. Tiered compilation can also achieve better peak performance than a regular server VM, because the faster profiling phase allows a longer period of profiling, which can yield better optimization.
Tiered compilation is enabled by default for the server VM. Both 64-bit mode and compressed ordinary object pointers are supported. You can disable tiered compilation by using the -XX:-TieredCompilation flag with the java command.
To accommodate the additional profiled code generated with tiered compilation, the default size of the code cache is five times larger. To organize and manage the larger space effectively, a segmented code cache is used.
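The effect can be observed with a small hot method. In this sketch (the class name and iteration counts are illustrative), running with -XX:+PrintCompilation shows the same method being compiled first at the profiled client-compiler tiers and later by the server compiler:

```java
public class TieredDemo {
    // A small hot method; under tiered compilation it is first compiled by the
    // client compiler (with profiling) and later by the server compiler.
    static long accumulate(long seed, int n) {
        long acc = seed;
        for (int i = 0; i < n; i++) {
            acc += i * 31L;
        }
        return acc;
    }

    public static void main(String[] args) {
        long result = 0;
        for (int i = 0; i < 20_000; i++) { // enough iterations to trigger compilation
            result = accumulate(result, 100);
        }
        System.out.println(result);
        // Run with: java -XX:+PrintCompilation TieredDemo
        // and watch accumulate() appear at increasing tier levels.
    }
}
```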
Segmented Code Cache
The code cache is the area of memory where the Java Virtual Machine stores generated native code. Formerly, it was organized as a single heap data structure on top of a contiguous chunk of memory. Instead of a single code heap, the code cache is now divided into segments, each containing compiled code of a particular type. This segmentation provides better control of the JVM memory footprint, shortens the scanning time for compiled methods, significantly decreases code cache fragmentation, and improves performance.
The code cache is divided into the following three segments:

Table 5-1 Segmented Code Cache

Code Cache Segment | Description | JVM Command-Line Argument
---|---|---
Non-method | This code heap contains non-method code, such as compiler buffers and the bytecode interpreter. This code type stays in the code cache forever. The code heap has a fixed size of 3 MB; the remaining code cache is distributed evenly between the profiled and non-profiled code heaps. | -XX:NonMethodCodeHeapSize
Profiled | This code heap contains lightly optimized, profiled methods with a short lifetime. | -XX:ProfiledCodeHeapSize
Non-profiled | This code heap contains fully optimized, non-profiled methods with a potentially long lifetime. | -XX:NonProfiledCodeHeapSize
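On a HotSpot VM with a segmented code cache, the three code heaps are visible as distinct memory pools through the standard management API. The following sketch (the pool names are HotSpot-specific and assume tiered compilation is enabled) lists them:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeHeapPools {
    public static void main(String[] args) {
        // With a segmented code cache, this prints pools such as
        // "CodeHeap 'non-nmethods'", "CodeHeap 'profiled nmethods'",
        // and "CodeHeap 'non-profiled nmethods'".
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().startsWith("CodeHeap")) {
                System.out.println(pool.getName() + ": "
                        + pool.getUsage().getUsed() + " bytes used");
            }
        }
    }
}
```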
Graal: a Java-Based JIT Compiler
Graal is a high-performance, optimizing, just-in-time compiler written in Java that integrates with Java HotSpot VM. It’s a customizable dynamic compiler that you can invoke from Java.
Some of the features and benefits of Graal include:
- Flexible speculative optimizations
- Better inlining
- Partial escape analysis
- Benefits from Java tooling and IDE support
- A metacircular approach that allows tighter control of code generation
You can also use Graal in a static context: the Ahead-of-Time (AOT) compiler is based on the Graal framework.
Graal is part of the JDK build and is delivered as an internal module, jdk.internal.vm.compiler. It communicates with the JVM through the JVM Compiler Interface (JVMCI), which is also part of the JDK build and is contained in the internal module jdk.internal.vm.ci.
To enable Graal as the JIT compiler, use the following option on the java
command line:
-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler
Note:
Graal is an experimental feature and is supported only on Linux-x64.

Ahead-of-Time Compilation
Ahead-of-time (AOT) compilation improves the startup time of small and large Java applications by compiling the Java classes to native code before launching the virtual machine.
Though just-in-time (JIT) compilers are fast, it takes time to compile large Java programs. Also, when certain Java methods are never compiled and instead interpreted repeatedly, performance suffers. AOT compilation addresses these issues.
The jaotc tool is used for AOT compilation. The syntax of the jaotc tool is as follows:
jaotc <options> <list of classes or jar files>
jaotc <options> --module <module name>
For example:
jaotc --output libHelloWorld.so HelloWorld.class
jaotc --output libjava.base.so --module java.base
The jaotc tool is part of the JDK installation, similar to javac. Specify the generated AOT libraries during application execution:
java -XX:AOTLibrary=./libHelloWorld.so,./libjava.base.so HelloWorld
During JVM startup, the AOT initialization code looks for the libraries specified with the AOTLibrary flag. If the libraries are not found, AOT is turned off for that JVM instance.
See Java Platform, Standard Edition Tools Reference for details about the jaotc tool.
Note:
Ahead-of-Time (AOT) compilation is an experimental feature and is supported only on Linux-x64.

Compressed Ordinary Object Pointer
An ordinary object pointer (oop), in Java HotSpot parlance, is a managed pointer to an object. Typically, an oop is the same size as a native machine pointer, which is 64 bits on an LP64 system. On an ILP32 system, the maximum heap size is less than 4 gigabytes, which is insufficient for many applications. On an LP64 system, the heap used by a given program might have to be around 1.5 times larger than when it is run on an ILP32 system, because of the expanded size of managed pointers. Memory is inexpensive, but these days bandwidth and cache are in short supply, so significantly increasing the size of the heap only to get past the 4-gigabyte limit is undesirable.
Managed pointers in the Java heap point to objects that are aligned on 8-byte address boundaries. Compressed oops represent managed pointers (in many but not all places in the Java Virtual Machine (JVM) software) as 32-bit object offsets from the 64-bit Java heap base address. Because they're object offsets rather than byte offsets, oops can be used to address up to four billion objects (not bytes), or a heap size of up to about 32 gigabytes. To use them, they must be scaled by a factor of 8 and added to the Java heap base address to find the object to which they refer. Object sizes using compressed oops are comparable to those in ILP32 mode.
The term decode refers to the operation by which a 32-bit compressed oop is converted to a 64-bit native address into the managed heap; the term encode refers to the inverse operation.
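The encode and decode arithmetic can be sketched as follows; the heap base value here is hypothetical, since the actual base is chosen by the VM at startup:

```java
public class OopDecode {
    static final long HEAP_BASE = 0x0000_0008_0000_0000L; // hypothetical base chosen by the VM
    static final int ALIGN_SHIFT = 3;                     // objects aligned on 8-byte boundaries

    // Decode: 32-bit compressed oop -> 64-bit native address in the managed heap.
    // The oop is an object offset, so it is scaled by 8 and added to the base.
    static long decode(int compressedOop) {
        return HEAP_BASE + ((compressedOop & 0xFFFF_FFFFL) << ALIGN_SHIFT);
    }

    // Encode: 64-bit native address -> 32-bit compressed oop.
    static int encode(long address) {
        return (int) ((address - HEAP_BASE) >>> ALIGN_SHIFT);
    }

    public static void main(String[] args) {
        long addr = decode(12345);
        System.out.println(encode(addr) == 12345); // round trip preserves the oop
    }
}
```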
Compressed oops are supported and enabled by default in Java SE 6u23 and later. In Java SE 7, compressed oops are enabled by default for 64-bit JVM processes when -Xmx isn't specified and for values of -Xmx less than 32 gigabytes. For JDK releases earlier than 6u23, use the -XX:+UseCompressedOops flag with the java command to enable compressed oops.
Zero-Based Compressed Ordinary Object Pointers
When the JVM uses compressed ordinary object pointers (oops) in a 64-bit JVM process, the JVM software sends a request to the operating system to reserve memory for the Java heap starting at virtual address zero. If the operating system supports such a request and can reserve memory for the Java heap at virtual address zero, then zero-based compressed oops are used.
When zero-based compressed oops are used, a 64-bit pointer can be decoded from a 32-bit object offset without including the Java heap base address. For heap sizes less than 4 gigabytes, the JVM software can use a byte offset instead of an object offset and thus also avoid scaling the offset by 8. Encoding a 64-bit address into a 32-bit offset is correspondingly efficient.
For Java heap sizes up to 26 gigabytes, the Solaris, Linux, and Windows operating systems typically can allocate the Java heap at virtual address zero.
Escape Analysis
Escape analysis is a technique by which the Java HotSpot Server Compiler can analyze the scope of a new object's uses and decide whether to allocate the object on the Java heap.
Escape analysis is supported and enabled by default in Java SE 6u23 and later.
The Java HotSpot Server Compiler implements the flow-insensitive escape analysis algorithm described in:
[Choi99] Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam C. Sreedhar, Sam Midkiff, "Escape Analysis for Java", Proceedings of the ACM SIGPLAN OOPSLA Conference, November 1, 1999
An object's escape state, based on escape analysis, can be one of the following:

- GlobalEscape: The object escapes the method and the thread. For example, an object stored in a static field, stored in a field of an escaped object, or returned as the result of the current method.
- ArgEscape: The object is passed as an argument or referenced by an argument, but does not globally escape during a call. This state is determined by analyzing the bytecode of the called method.
- NoEscape: The object is scalar replaceable, which means that its allocation could be removed from the generated code.
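The three states can be illustrated with a small sketch (the class and method names are hypothetical; the comments indicate the state the analysis would assign to each allocation):

```java
public class EscapeStates {
    static Object global;

    // GlobalEscape: the object is stored in a static field, so it escapes
    // both the method and the thread.
    static Object globalEscape() {
        Object o = new Object();
        global = o;
        return o;
    }

    // ArgEscape: the object is passed as an argument but, as the bytecode of
    // helper() shows, it does not escape globally during the call.
    static int argEscape() {
        StringBuilder sb = new StringBuilder();
        helper(sb);
        return sb.length();
    }

    static void helper(StringBuilder sb) {
        sb.append('x');
    }

    // NoEscape: the array never leaves this method, so it is scalar
    // replaceable and its allocation can be removed from generated code.
    static int noEscape() {
        int[] box = { 41 };
        box[0]++;
        return box[0];
    }
}
```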
After escape analysis, the server compiler eliminates the scalar replaceable object allocations and the associated locks from generated code. The server compiler also eliminates locks for objects that do not globally escape. It does not replace a heap allocation with a stack allocation for objects that do not globally escape.
The following examples describe some scenarios for escape analysis:
- The server compiler might eliminate certain object allocations. For example, a method makes a defensive copy of an object and returns the copy to the caller.

public class Person {
    private String name;
    private int age;

    public Person(String personName, int personAge) {
        name = personName;
        age = personAge;
    }

    public Person(Person p) {
        this(p.getName(), p.getAge());
    }

    public String getName() { return name; }
    public int getAge() { return age; }
}

public class Employee {
    private Person person;

    // Makes a defensive copy to protect against modification by the caller.
    public Person getPerson() {
        return new Person(person);
    }

    public void printEmployeeDetail(Employee emp) {
        Person person = emp.getPerson();
        // This caller does not modify the object, so the defensive copy was unnecessary.
        System.out.println("Employee's name: " + person.getName() + "; age: " + person.getAge());
    }
}

The method makes a copy to prevent modification of the original object by the caller. If the compiler determines that the getPerson method is being invoked in a loop, the compiler inlines the method. If escape analysis then determines that the original object is never modified, the compiler can optimize away the call that makes the copy.

- The server compiler might eliminate synchronization blocks (lock elision) if it determines that an object is thread-local. For example, methods of classes such as StringBuffer and Vector are synchronized because they can be accessed by different threads. However, in most scenarios they are used in a thread-local manner. In such cases, the compiler can optimize and remove the synchronization blocks.
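As a further illustrative sketch (the names are hypothetical), the following method allocates an object that never escapes, so the server compiler can scalar-replace the allocation; the program's result is unchanged either way:

```java
public class EscapeDemo {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never escapes sum(): after escape analysis the compiler can
    // scalar-replace it, keeping x and y in registers with no heap allocation.
    static int sum(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) { // hot loop, so sum() gets compiled
            total += sum(i, i + 1);
        }
        System.out.println(total);
    }
}
```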