Run LLVM Bitcode with GraalVM Enterprise

Oracle GraalVM Enterprise Edition ships with the LLVM runtime. It provides an implementation of the lli tool to directly execute programs from LLVM bitcode. The LLVM runtime is written in Java on top of the Truffle Language Implementation Framework. In contrast to the static compilation normally used for LLVM-based languages, lli first interprets the bitcode and then dynamically compiles the hot parts of the program using the GraalVM compiler. You can execute C/C++, Rust, and any other programming language that can be compiled to LLVM bitcode by an LLVM front end such as clang.

Note: LLVM bitcode is platform dependent. The program must be compiled to bitcode for the appropriate platform.

To run programs in the LLVM bitcode format:

lli [LLI option] filename.bc [program args]

Here, filename.bc is a single program file in LLVM bitcode format. Mandatory arguments to long options are mandatory for short options too.

GraalVM and Polyglot options are to be placed after LLI options, but before the bitcode file name:

lli [LLI Options] [GraalVM Options] [Polyglot Options] filename.bc [program args]
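For example, a hypothetical invocation that combines an LLI option, a VM option, and program arguments could look like this (the path, file, and argument names are placeholders):

lli --llvm.libraryPath=/path/to/libs --vm.Xmx1g program.bc arg1 arg2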

Compiling to LLVM Bitcode and Running It

GraalVM can execute C/C++, Rust, and other languages that can be compiled to LLVM bitcode. As a first step, you have to compile the program to LLVM bitcode using the LLVM frontend such as clang. C/C++ code can be compiled to LLVM bitcode using the clang shipped with GraalVM.

To install a pre-built LLVM toolchain for GraalVM Enterprise, download the GraalVM LLVM Toolchain Plugin from the Oracle Technology Network page, matching your operating system and the underlying Java SE version. Having downloaded the component JAR, install it and export the toolchain path:

gu -L install component.jar
export LLVM_TOOLCHAIN=$(lli --print-toolchain-path)
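You can verify that the toolchain is available by printing the version of the bundled clang (assuming the export above succeeded):

$LLVM_TOOLCHAIN/clang --version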

Here is some example C code named hello.c:

#include <stdio.h>

int main() {
    printf("Hello from GraalVM!\n");
    return 0;
}

You can compile hello.c to an executable with embedded LLVM bitcode as follows:

$LLVM_TOOLCHAIN/clang hello.c -o hello

You can then run hello on GraalVM Enterprise like this:

lli hello
Hello from GraalVM!

External Library Dependencies

If the bitcode file depends on external libraries, GraalVM Enterprise will automatically pick up the dependencies from the binary headers.

For example, hello-curses.c:

#include <unistd.h>
#include <ncurses.h>

int main() {
    initscr();
    printw("Hello, Curses!");
    refresh();
    sleep(1);
    endwin();
    return 0;
}

This can be run with:

$LLVM_TOOLCHAIN/clang hello-curses.c -lncurses -o hello-curses
lli hello-curses

Running C++

For running C++ code, the GraalVM LLVM runtime requires the libc++ standard library from the LLVM project. The LLVM toolchain shipped with GraalVM automatically links against libc++. Here is an example, hello-c++.cpp:

#include <iostream>

int main() {
    std::cout << "Hello, C++ World!" << std::endl;
}

Compile the code with clang++:

$LLVM_TOOLCHAIN/clang++ hello-c++.cpp -o hello-c++
lli hello-c++
Hello, C++ World!

Running Rust

The LLVM toolchain bundled for GraalVM Enterprise does not come with the Rust compiler. To install Rust, run the following in your terminal, then follow the onscreen instructions:

curl https://sh.rustup.rs -sSf | sh

Here is an example Rust program, hello-rust.rs:

fn main() {
    println!("Hello Rust!");
}

This can be compiled to bitcode with the --emit=llvm-bc flag:

rustc --emit=llvm-bc hello-rust.rs

To run the Rust program, tell GraalVM where to find the Rust standard library:

lli --lib $(rustc --print sysroot)/lib/libstd-* hello-rust.bc
Hello Rust!

Interoperability

GraalVM Enterprise supports several other programming languages, including JavaScript, Python, Ruby, and R. While LLI is designed to run LLVM bitcode, it also provides an API for programming language interoperability that lets you execute code from any other language that GraalVM supports.

Dynamic languages like JavaScript usually access object members by name. Since names are normally not preserved in LLVM bitcode, the bitcode must be compiled with debug information enabled (the LLVM toolchain shipped with GraalVM enables debug information automatically).

The following example demonstrates how you can use the API for interoperability with other programming languages.

Let us define a C struct for points and implement allocation functions:

// cpart.c
#include <polyglot.h>

#include <stdlib.h>
#include <stdio.h>

struct Point {
    double x;
    double y;
};

POLYGLOT_DECLARE_STRUCT(Point)

void *allocNativePoint() {
    struct Point *ret = malloc(sizeof(*ret));
    return polyglot_from_Point(ret);
}

void *allocNativePointArray(int length) {
    struct Point *ret = calloc(length, sizeof(*ret));
    return polyglot_from_Point_array(ret, length);
}

void freeNativePoint(struct Point *p) {
    free(p);
}

void printPoint(struct Point *p) {
    printf("Point<%f,%f>\n", p->x, p->y);
}

Make sure LLVM_TOOLCHAIN points to the GraalVM LLVM toolchain (lli --print-toolchain-path), then compile cpart.c as follows. The polyglot-mock library defines the polyglot API functions used in the example:

$LLVM_TOOLCHAIN/clang -shared cpart.c -lpolyglot-mock -o cpart.so

You can access your C/C++ code from other languages like JavaScript:

// jspart.js

// Load and parse the LLVM bitcode into GraalVM
var cpart = Polyglot.evalFile("llvm", "cpart.so");

// Allocate a light-weight C struct
var point = cpart.allocNativePoint();

// Access it as if it were a JS object
point.x = 5;
point.y = 7;

// Pass it back to a native function
cpart.printPoint(point);

// We can also allocate an array of structs
var pointArray = cpart.allocNativePointArray(15);

// We can access this array like it was a JS array
for (var i = 0; i < pointArray.length; i++) {
    var p = pointArray[i];
    p.x = i;
    p.y = 2*i;
}

cpart.printPoint(pointArray[3]);

// We can also pass a JS object to a native function
cpart.printPoint({x: 17, y: 42});

// Don't forget to free the unmanaged data objects
cpart.freeNativePoint(point);
cpart.freeNativePoint(pointArray);

Run this JavaScript file with:

js --polyglot jspart.js
Point<5.000000,7.000000>
Point<3.000000,6.000000>
Point<17.000000,42.000000>

Polyglot C API

There are also lower-level API functions for directly accessing polyglot values from C. See the Polyglot Reference and the documentation comments in polyglot.h for more details.

For example, this program, polyglot.c, allocates and accesses a Java array from C:

#include <stdio.h>
#include <polyglot.h>

int main() {
    void *arrayType = polyglot_java_type("int[]");
    void *array = polyglot_new_instance(arrayType, 4);
    polyglot_set_array_element(array, 2, 24);
    int element = polyglot_as_i32(polyglot_get_array_element(array, 2));
    printf("%d\n", element);
    return element;
}

Compile it to LLVM bitcode:

$LLVM_TOOLCHAIN/clang polyglot.c -lpolyglot-mock -o polyglot

Then run it, using the --jvm argument to run in JVM mode, since the program uses a Java type:

lli --jvm polyglot
24

Embedding in Java

GraalVM can also be used to embed LLVM bitcode in Java host programs.

For example, let us write a Java class Polyglot.java that embeds GraalVM to run the previous example:

import java.io.*;
import org.graalvm.polyglot.*;

class Polyglot {
    public static void main(String[] args) throws IOException {
        Context polyglot = Context.newBuilder().allowAllAccess(true).build();
        File file = new File("polyglot");
        Source source = Source.newBuilder("llvm", file).build();
        Value cpart = polyglot.eval(source);
        cpart.execute();
    }
}

Compiling and running it:

javac Polyglot.java
java Polyglot
24

See the Embedding documentation for more information.

Source-Level Debugging

You can use GraalVM’s Debugger to debug programs compiled to LLVM bitcode. To use this feature, make sure to compile your program with debug information by specifying the -g argument when compiling with clang (the LLVM toolchain shipped with GraalVM enables debug information automatically). This lets you step through the program’s source code and set breakpoints in it.

The --llvm.enableLVI=true option is needed to inspect variable values during debugging. It is not enabled by default because it decreases the program’s run-time performance.
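As a sketch, assuming the generic GraalVM --inspect flag is used to attach the Chrome DevTools based debugger, a debugging session could be started like this (file names are placeholders):

$LLVM_TOOLCHAIN/clang -g -O1 hello.c -o hello
lli --llvm.enableLVI=true --inspect hello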

LLVM Compatibility

GraalVM works with LLVM bitcode versions 3.8 to 9.0. It is recommended to use the version of LLVM that is shipped with GraalVM Enterprise.

Optimization Flags

In contrast to the static compilation model of LLVM languages, GraalVM Enterprise does not produce machine code directly from the LLVM bitcode; instead, there is an additional dynamic compilation step performed by the GraalVM compiler.

In this scenario, first the LLVM frontend (e.g., clang) does optimizations on the bitcode level, and then the GraalVM compiler does its own optimizations on top of that during dynamic compilation. Some optimizations are better when done ahead-of-time on the bitcode, while other optimizations are better left for the dynamic compilation of the GraalVM compiler, when profiling information is available.

The LLVM toolchain that is shipped with GraalVM automatically selects the recommended flags by default.

In principle, all optimization levels should work, but for best results compile the bitcode with optimization level -O1.

Cross-language interoperability will only work when the bitcode is compiled with debug information enabled (-g), and the -mem2reg optimization is performed on the bitcode (compiled with at least -O1, or explicitly using the opt tool).
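As a sketch, a translation unit intended for cross-language interoperability could be compiled to bitcode with a plain clang like this (mylib.c is a placeholder; the LLVM toolchain shipped with GraalVM selects suitable flags automatically):

clang -c -emit-llvm -O1 -g mylib.c -o mylib.bc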

Command Line Options

LLI Options

Option Description
-L / --llvm.libraryPath= A list of paths where the LLVM runtime searches for library dependencies. Paths are delimited by :.
--lib / --llvm.libraries= A list of libraries to load. The list can contain precompiled native libraries (.so/.dylib) and bitcode libraries (.bc). Files with a relative path are looked up relative to llvm.libraryPath. Entries are delimited by :.
--llvm.managed Enable managed execution mode for LLVM IR code, which means memory allocations from LLVM bitcode are done on the managed heap.
--llvm.stackSize= Set the stack size. End the value with one of k, m, g, or t; the value is interpreted in bytes if no suffix is given.
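For example, a hypothetical invocation that loads an extra library from a custom path could look like this (the library, path, and program names are placeholders):

lli --llvm.libraryPath=/path/to/libs --lib libexample.so program.bc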

GraalVM Options

Option Description
--jvm Execute an application in the JVM mode.
--vm.<option> Pass JVM options to GraalVM. List available JVM options with --jvm.help.
--vm.Dgraal.<option> Pass settings to the GraalVM compiler. For example, --vm.Dgraal.DumpOnError=true sends the compiler intermediate representation (IR) to dump handlers if errors occur.
--engine.Mode=default Configure the execution mode of the engine. The execution mode automatically tunes the polyglot engine towards latency or throughput. throughput - collects the maximum amount of profiling information and compiles using the maximum number of optimizations; this mode results in slower application startup but better throughput, and uses the compiler configuration community or enterprise if not specified otherwise. default - uses a balanced engine configuration, and uses the compiler configuration community or enterprise if not specified otherwise. latency - collects only minimal profiling information and compiles as fast as possible with less optimal generated code; this mode results in faster application startup but less optimal throughput, and uses the compiler configuration economy if not specified otherwise.
--engine.CompilerConfiguration=name Select the GraalVM compiler configuration to use for polyglot application code. If omitted, the compiler configuration with the highest auto-selection priority is used. To see the set of available configurations, supply the value help to this option. The current configurations and their semantics are: enterprise - produces highly optimized code with a possible trade-off to compilation time. This value is only available in GraalVM Enterprise. community - produces reasonably optimized code with a faster compilation time. economy - compiles as fast as possible with less optimal throughput of the generated code.
--engine.TraceCompilation=false Print an informational line to the console for each completed compilation.
--experimental-options Unlock experimental features that are behind flags. For example, to use --memtracer, the memory allocations profiling tool, run lli --experimental-options --memtracer program.bc. The option is experimental.
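For example, a hypothetical run in JVM mode that passes a JVM heap option and enables compilation tracing could look like this (the heap size and program name are placeholders):

lli --jvm --vm.Xmx2g --engine.TraceCompilation=true program.bc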

Polyglot Options

Option Description
--polyglot Run with all other guest languages accessible.
--<languageID>.<property>=<value> Pass properties to guest languages through the GraalVM Polyglot SDK.

The --<languageID>.<property>=<value> syntax allows any language launcher to set the options of other GraalVM-supported languages.
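For example, a hypothetical invocation that makes other guest languages accessible and passes an option to JavaScript could look like this; the js.strict property and the program name are used only for illustration and may differ between GraalVM versions:

lli --polyglot --js.strict=true program.bc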

Limitations and Differences to Native Execution

LLVM code interpreted or compiled with the default configuration of the GraalVM Community or Enterprise editions does not have the same characteristics as the same code interpreted or compiled in the managed environment, which is enabled with the --llvm.managed option on top of GraalVM Enterprise. The behavior of the lli interpreter tool used to directly execute programs in LLVM bitcode format differs between the native and managed modes. The difference lies in safety guarantees and cross-language interoperability.

In the default configuration, cross-language interoperability requires bitcode to be compiled with debug information enabled (-g) and with the -mem2reg optimization performed on the bitcode (compiled with at least -O1, or explicitly using the opt tool). These requirements are lifted in the managed environment of GraalVM Enterprise, which allows native code to participate in polyglot programs, passing data to and receiving data from any other supported language. In terms of security, executing native code in the managed environment adds safety features such as catching illegal pointer accesses and out-of-bounds array accesses.

There are certain limitations and differences to native execution depending on the GraalVM edition. They are described below.

Limitations and Differences to Native Execution on Top of GraalVM Community

The LLVM interpreter in the GraalVM Community Edition environment allows executing LLVM bitcode within a multilingual context. Even though it aspires to be a generic LLVM runtime, there are certain fundamental and implementation limitations that users need to be aware of.

The following restrictions and differences to native execution (i.e., bitcode compiled down to native code) exist when LLVM bitcode is executed with the LLVM interpreter on top of GraalVM Community:

Limitations and Differences to Managed Execution on Top of GraalVM Enterprise

Managed execution of LLVM intermediate representation code is a GraalVM Enterprise Edition feature and can be enabled with the --llvm.managed command-line option. In managed mode, the GraalVM LLVM runtime prevents access to unmanaged memory and uncontrolled calls to native code and operating system functionality. Allocations are performed on the managed Java heap, and accesses to the surrounding system are routed through the appropriate Truffle API and Java API calls.
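For example, a bitcode program (placeholder name) can be run in managed mode like this:

lli --llvm.managed program.bc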

All the restrictions from the default native LLVM execution on GraalVM apply to the managed execution, with the following differences and changes: