
Optimize Cloud Native Java Applications with GraalVM Enterprise PGO

Introduction

This lab shows how to run a Java Microbenchmark Harness (JMH) benchmark as a native executable, built with GraalVM Native Image, and apply Profile-Guided Optimization (PGO) to improve the performance.

GraalVM Native Image enables you to compile a Java application into a native executable that starts almost instantaneously and requires less memory and CPU.

Profile-Guided Optimization (PGO) is a commonly used technique in the Java ecosystem to mitigate the absence of just-in-time optimization in ahead-of-time compiled code: you gather execution profiles during one run and then use them to optimize subsequent compilation(s). With PGO you collect the profiling data and then feed it to the native-image tool, which uses this information to further optimize the performance of the resulting executable.

Notes on Using JMH with GraalVM Native Image

When running on the JVM, JMH will fork a new JVM for each benchmark to ensure there is no interference in the measurements for each benchmark. This approach is not possible when using GraalVM Native Image so you should consider the following guidance when building JMH benchmarks that are meant to be run as native executables:

Note: Oracle Cloud Infrastructure (OCI) provides GraalVM Enterprise at no additional cost.

Lab Objectives

In this lab you will:

Estimated lab time: 30-45 minutes

NOTE: If you see the laptop icon in the instructions, it means you need to enter a command. Keep an eye out for it.

# This is where you will need to do something

To copy a command, hover over the field and then click the Copy to clipboard icon.

To paste a copied command in a terminal window, right click and select the Paste option from the context menu. If you prefer keyboard shortcuts instead, use CTRL+SHIFT+V.

STEP 1: Connect to a Remote Host and Check the Development Environment

Your development environment is provided by a remote host: an OCI Compute Instance with Oracle Linux 8, 4 CPUs, and 32 GB of memory.

The desktop environment will display before the remote host is ready, which can take up to two minutes.

Visual Studio Code (VS Code) will open and automatically connect to the VM instance that has been provisioned for you. Click Continue to accept the machine fingerprint.

VS Code Accept

If you do not click Continue, VS Code will pop up a dialog box, shown below. Click Retry. VS Code will then ask you to accept the machine fingerprint; click Continue.

VS Code Retry Connection

Issues With Connecting to the Remote Development Environment

If you encounter any other issues in which VS Code fails to connect to the remote development environment that are not covered above, try the following:

Congratulations, you are now connected to a remote host in Oracle Cloud!

The script will open VS Code, connected to your remote host, with the source code for the lab opened.

Next, open a Terminal within VS Code. The Terminal enables you to interact with the remote host. A terminal can be opened in VS Code via the menu: Terminal > New Terminal, as shown below.

VS Code Terminal

Note on the Development Environment

You will use GraalVM Enterprise as a Java runtime environment for this lab. GraalVM is a high-performance JDK distribution from Oracle built on Oracle Java SE.
Your development environment comes preconfigured with GraalVM Enterprise and the Native Image tooling required for this lab. You can check that by running these commands in the terminal:

java -version

native-image --version

You can proceed to the next step.

STEP 2: Compile and Run a JMH Benchmark on JVM

The source code for the application, a JMH benchmark, is available on your remote host. The benchmark originates from the Computer Language Benchmarks Game. It creates binary trees - before any tree nodes are garbage collected - using at-minimum the number of allocations.
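To make the workload more concrete, the following is a minimal plain-Java sketch of the kind of allocation-heavy tree building described above. It is not the lab's benchmark source: the class name BinaryTreesSketch, the TreeNode type, and the bottomUpTree method are hypothetical, and the depth of 14 is only assumed to correspond to the binaryTreesN parameter that appears in the benchmark results.

```java
// Hypothetical stand-in for the benchmark workload; not the lab's actual source.
public class BinaryTreesSketch {

    // A tree node with two children; a leaf has null children.
    static final class TreeNode {
        final TreeNode left;
        final TreeNode right;

        TreeNode(TreeNode left, TreeNode right) {
            this.left = left;
            this.right = right;
        }

        // Counts this node plus every node below it.
        int countNodes() {
            return left == null ? 1 : 1 + left.countNodes() + right.countNodes();
        }
    }

    // Allocates a complete binary tree of the given depth (depth 0 is a single leaf).
    static TreeNode bottomUpTree(int depth) {
        if (depth == 0) {
            return new TreeNode(null, null);
        }
        return new TreeNode(bottomUpTree(depth - 1), bottomUpTree(depth - 1));
    }

    public static void main(String[] args) {
        int depth = 14; // assumption: mirrors the binaryTreesN value in the results
        TreeNode tree = bottomUpTree(depth);
        System.out.println("depth " + depth + " -> " + tree.countNodes() + " nodes");
    }
}
```

A complete tree of depth d holds 2^(d+1) - 1 nodes, so every call allocates many short-lived objects. That makes this style of workload sensitive to both compiler optimizations and the garbage collector.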

This JMH benchmark uses Java reflection. The native-image tool operates under what is known as the “closed world” assumption and will not include any reflectively accessed elements in the native executable unless the tool is provided with the necessary configuration at build time. So, to build a native executable of this JMH benchmark, you need to run the Tracing Agent to supply the reflection configuration to native-image. This has already been done for you to save time, and the generated configuration can be found in src/main/resources/META-INF/native-image/. For more information on the reflection configuration, see the Luna Lab on GraalVM Native Image and Reflection.
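As an illustration of why reflection needs extra configuration, here is a small sketch of reflective access where the class name is only known at run time. It is not taken from the lab's source code; the class ReflectionSketch and its describe method are hypothetical. On the JVM the lookup is resolved when the program runs. Under the closed-world assumption, a build-time analysis cannot tell which class will be requested, so in a native executable such a lookup may fail unless the class is covered by reflection configuration.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of reflective access; not part of the lab's source code.
public class ReflectionSketch {

    // Looks up a class by name, creates an instance through its public
    // no-argument constructor, and calls toString() on it reflectively.
    static String describe(String className) {
        try {
            Class<?> type = Class.forName(className);
            Object instance = type.getDeclaredConstructor().newInstance();
            Method toString = type.getMethod("toString");
            return (String) toString.invoke(instance);
        } catch (ReflectiveOperationException e) {
            // Typical failure mode when the class or method is not available.
            throw new IllegalStateException("Reflective access failed for " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name comes from the command line, so it is unknown at build time.
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(className + " -> " + describe(className));
    }
}
```

The Tracing Agent mentioned above observes lookups like this one while the application runs on the JVM and records them, which is how the configuration under META-INF/native-image/ was produced.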

Build and then run the benchmark as a Java application by running the following command:

mvn clean package exec:exec

Note that the pom.xml file contains instructions to explicitly turn off the GraalVM JIT compiler using the option -XX:-UseJVMCICompiler. This means that the benchmark will run using the C2 JIT compiler.

The application will run the benchmark in three iterations and display the results in the terminal. The run should take less than four minutes to complete. The final result is the most significant. You should see something like:

Benchmark          (binaryTreesN)   Mode  Cnt    Score   Error  Units
BinaryTrees.bench              14  thrpt    3  180.819 ± 8.301  ops/s

You can now proceed to the next step.

STEP 3: Build and Run a JMH Benchmark as a Native Executable

Now build a native executable using GraalVM Enterprise Native Image.

The JMH benchmark is built with Maven and applies the Maven plugin for GraalVM Native Image building (open pom.xml to see the native-maven-plugin plugin registration). The plugin figures out which JAR file it needs to pass to native-image and what the executable main class should be.

  1. Build a native executable. The build should take approximately one minute:

    mvn package -Pnative
    

    The -Pnative Maven profile turns on building a native executable. It will generate a native executable in the target directory, called benchmark-binary-tree.

  2. Then run the benchmark as a native executable:

    ./target/benchmark-binary-tree
    

    These are the results obtained with GraalVM Enterprise Native Image 22.2.0:

    Benchmark          (binaryTreesN)   Mode  Cnt    Score    Error  Units
    BinaryTrees.bench              14  thrpt    3  174.135 ± 10.020  ops/s
    

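    The native executable's score may be similar to, or better than, the score of the previous (non-native) run. In the sample results shown in this lab, the two scores (about 174 and 181 ops/s) are within each other's error margins. The results will vary depending on the hardware you run the benchmark on.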
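When comparing two JMH results, it helps to look at the Error column as well as the Score. The sketch below treats each result as the interval Score ± Error and checks whether two intervals overlap. It assumes the Error value is the half-width of a confidence interval around the mean, and the class ScoreIntervals and its overlap method are hypothetical helpers, not part of JMH or of this lab.

```java
// Hypothetical helper for reading JMH "Score ± Error" lines; not part of the lab.
public class ScoreIntervals {

    // Returns true when [scoreA - errorA, scoreA + errorA] and
    // [scoreB - errorB, scoreB + errorB] share at least one point.
    static boolean overlap(double scoreA, double errorA, double scoreB, double errorB) {
        double lowA = scoreA - errorA;
        double highA = scoreA + errorA;
        double lowB = scoreB - errorB;
        double highB = scoreB + errorB;
        return lowA <= highB && lowB <= highA;
    }

    public static void main(String[] args) {
        // Values copied from the sample JVM and native results in this lab.
        boolean jvmVsNative = overlap(180.819, 8.301, 174.135, 10.020);
        System.out.println("JVM and native intervals overlap: " + jvmVsNative);
    }
}
```

With the sample values the intervals overlap, so the small difference between the JVM run and the native run should not be read as a clear win for either. With only three iterations per run, this check is a rough guide rather than a statistical test.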

You can now proceed to the next step.

STEP 4: Optimize a Native Executable with PGO and Run

Now optimize your native executable using Profile-Guided Optimization (PGO). It is a two-step process. First, build an instrumented version of the native executable and run it to trace its execution and collect a performance profile. When the execution finishes, it will generate a profile file, default.iprof, in the project’s root directory. Then build an optimized executable containing the profile data about the benchmark, and run it.

The Profile-Guided Optimization (PGO) feature is available with GraalVM Enterprise Edition.

  1. Build an instrumented native executable by passing the -Pinstrumented Maven profile:

    mvn package -Pinstrumented
    

    It generates a binary in the target directory, called benchmark-binary-tree-instr.

  2. Run it to collect the code-execution-frequency profiles:

    ./target/benchmark-binary-tree-instr
    

    Profiles collected from this run are stored in the default.iprof file in the current working directory, if nothing else is specified.

    Note: You can specify where to collect the profiles when running an instrumented native executable by passing the -XX:ProfilesDumpFile=YourFileName option at runtime. You can also collect multiple profile files, by specifying different names, and pass them to the native-image tool at build time.

  3. Now that you have generated the profile file, build the optimized version:

    mvn package -Poptimized
    

    It generates an optimized binary in the target directory, called benchmark-binary-tree-opt.

  4. Finally, run the optimized native executable:

    ./target/benchmark-binary-tree-opt
    

These are the results obtained with GraalVM Enterprise Native Image 22.2.0 on the host machine:

Benchmark          (binaryTreesN)   Mode  Cnt    Score   Error  Units
BinaryTrees.bench              14  thrpt    3  223.241 ± 3.578  ops/s

The average score, in operations per second, increased from about 181 when running as a Java application to about 223 when running as an optimized native executable, a gain of roughly 23%. The results will vary depending on the hardware you run the benchmark on.
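The size of the improvement can be worked out directly from the sample scores in this lab. The following sketch computes the relative throughput gain of the PGO-optimized executable over the JVM run and over the native executable built without PGO. The class ThroughputGain and its percentGain method are hypothetical helpers, and your own numbers will differ.

```java
import java.util.Locale;

// Hypothetical helper for comparing benchmark scores; not part of the lab.
public class ThroughputGain {

    // Relative gain of 'improved' over 'baseline', expressed as a percentage.
    static double percentGain(double baseline, double improved) {
        return (improved - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double jvm = 180.819;        // JVM run (ops/s)
        double nativeImage = 174.135; // native executable without PGO (ops/s)
        double optimized = 223.241;   // native executable with PGO (ops/s)

        System.out.println(String.format(Locale.ROOT,
                "PGO vs JVM: %.1f%%", percentGain(jvm, optimized)));
        System.out.println(String.format(Locale.ROOT,
                "PGO vs native without PGO: %.1f%%", percentGain(nativeImage, optimized)));
    }
}
```

For the sample results this works out to roughly 23% over the JVM run and roughly 28% over the native executable built without PGO.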

Summary

This lab showed how you can optimize native executables with Profile-Guided Optimization (PGO) to get higher throughput compared to the Java version while still preserving the other benefits of native executables: instantaneous startup and lower CPU and memory usage. With PGO you can “train” your application for specific workloads and transform it into an optimized binary without sacrificing any performance.

Learn More

Congratulations! You have successfully completed this lab.

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.