Pre-General Availability: 2017-05-23

5 Troubleshoot System Crashes

Information and guidance on some specific procedures for troubleshooting system crashes.

A crash, or fatal error, causes a process to terminate abnormally. There are various possible reasons for a crash. For example, a crash can occur due to a bug in the Java HotSpot VM, in a system library, in a Java SE library or an API, in application native code, or even in the operating system (OS). External factors, such as resource exhaustion in the OS can also cause a crash.

Crashes caused by bugs in the Java HotSpot VM or in the Java SE library code are rare. This chapter provides suggestions on how to examine a crash and work around some of the issues (if possible) until the cause of the bug is diagnosed and fixed.

In general the first step with any crash is to locate the fatal error log. This is a text file that the Java HotSpot VM generates in the event of a crash. See Fatal Error Log for an explanation of how to locate this file, as well as a detailed description of the file.

This chapter contains the following sections:

Determine Where the Crash Occurred

Examples which demonstrate how the error log can be used to find the cause of the crash, and suggests some tips for troubleshooting the problem depending on the cause.

The error log header indicates the type of error and the problematic frame, while the thread stack indicates the current thread and stack trace. See Header Format.

The following are possible causes for the crash.

Crash in Native Code

Analyze the crash dump file or core file to identify if the crash occurred in the native code or the Java Native Interface (JNI) library code.

If the fatal error log indicates the problematic frame to be a native library, there might be a bug in native code or the Java Native Interface (JNI) library code. The crash could of course be caused by something else, but analysis of the library and any core file or crash dump is a good starting place. Consider the extract in the following example from the header of a fatal error log.

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x417789d7, pid=21139, tid=1024
#
# Java VM: Java HotSpot(TM) Server VM (6-beta2-b63 mixed mode)
# Problematic frame:
# C  [libApplication.so+0x9d7]

In this case a SIGSEGV occurred with a thread executing in the library libApplication.so.

In some cases a bug in a native library manifests itself as a crash in Java VM code. Consider the crash in the following example where a JavaThread fails while in the _thread_in_vm state (meaning that it is executing in Java VM code).

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x08083d77, pid=3700, tid=2896
#
# Java VM: Java HotSpot(TM) Client VM (1.5-internal mixed mode)
# Problematic frame:
# V  [jvm.dll+0x83d77]

---------------  T H R E A D  ---------------

Current thread (0x00036960):  JavaThread "main" [_thread_in_vm, id=2896]
 :
Stack: [0x00040000,0x00080000),  sp=0x0007f9f8,  free space=254k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x83d77]
C  [App.dll+0x1047]          <========= C/native frame
j  Test.foo()V+0
j  Test.main([Ljava/lang/String;)V+0
v  ~StubRoutines::call_stub
V  [jvm.dll+0x80f13]
V  [jvm.dll+0xd3842]
V  [jvm.dll+0x80de4]
V  [jvm.dll+0x87cd2]
C  [java.exe+0x14c0]
C  [java.exe+0x64cd]
C  [kernel32.dll+0x214c7]
 :

In this case, although the problematic frame is a VM frame, the thread stack shows that a native routine in App.dll has called into the VM (probably with JNI).

The first step to solving a crash in a native library is to investigate the source of the native library where the crash occurred.

  • If the native library is provided by your application, then investigate the source code of your native library. A significant number of issues with JNI code can be identified by running the application with the -Xcheck:jni option added to the command line. See The -Xcheck:jni Option.

  • If the native library has been provided by another vendor and is used by your application, then file a bug report against this third-party application and provide the fatal error log information.

  • If the native library where the crash occurred is part of the Java Runtime Environment (JRE) (for example awt.dll, net.dll, and so forth), then it is possible that you have encountered a library or API bug. If so, gather as much data as possible and submit a bug or report, indicating the library name. You can find JRE libraries in the jre/lib or jre/bin directories of the JRE distribution. See Submit a Bug Report.

You can troubleshoot a crash in a native application library by attaching the native debugger to the core file or crash dump, if it is available. Depending on the OS, the native debugger is dbx, gdb, or windbg. See Native Operating System Tools.

Crash in Compiled Code

Analyze the fatal error log to identify if the crash occurred in the compiled code.

If the fatal error log indicates that the crash occurred in compiled code, then it is possible that you have encountered a compiler bug that has resulted in incorrect code generation. You can recognize a crash in compiled code if the type of the problematic frame is J (meaning a compiled Java frame). The following example shows such a crash.

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x0000002a99eb0c10, pid=6106, tid=278546
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0-beta-b51 mixed mode)
# Problematic frame:
# J  org.foobar.Scanner.body()V
#
:
Stack: [0x0000002aea560000,0x0000002aea660000),  sp=0x0000002aea65ddf0,
  free space=1015k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J  org.foobar.Scanner.body()V

[error occurred during error reporting, step 120, id 0xb]

Note:

A complete thread stack is not available. The output line "error occurred during error reporting" means that a problem arose trying to obtain the stack trace (this might indicate stack corruption).

It might be possible to temporarily work around the issue by switching the compiler (for example, by using the HotSpot Client VM instead of the HotSpot Server VM, or visa versa) or by excluding from compilation the method that provoked the crash. In this specific example it might not be possible to switch the compiler as it was taken from a 64-bit Server VM and hence it might not be feasible to switch to a 32-bit Client VM.

See Working Around Crashes in the HotSpot Compiler Thread or Compiled Code.

Crash in HotSpot Compiler Thread

Analyze the fatal error log to identify if the crash occurred in the HotSpot compiler thread.

If the fatal error log output shows that the current thread is JavaThread named CompilerThread0, CompilerThread1, or AdapterCompiler, then it is possible that you have encountered a compiler bug. In this case it might be necessary to temporarily work around the issue by switching the compiler (for example, by using the HotSpot Client VM instead of the HotSpot Server VM, or visa versa), or by excluding from compilation the method that provoked the crash.

See Working Around Crashes in the HotSpot Compiler Thread or Compiled Code.

Crash in VM Thread

Analyze the fatal error log to identify if the crash occurred in the VMThread.

If the fatal error log output shows that the current thread is VMThread, then look for the line containing VM_Operation in the THREAD section. VMThread is a special thread in the HotSpot VM. It performs special tasks in the VM such as garbage collection (GC). If the VM_Operation suggests that the operation is a GC, then it is possible that you have encountered an issue such as heap corruption.

Besides a GC issue, it could equally be something else (such as a compiler or runtime bug) that leaves object references in the heap in an inconsistent or incorrect state. In this case, collect as much information as possible about the environment and try possible workarounds. If the issue is related to GC, you might be able to temporarily work around the issue by changing the GC configuration.

See Working Around Crashes during Garbage Collection.

Crash Due to Stack Overflow

A stack overflow in Java language code will normally result in the offending thread throwing the java.lang.StackOverflowError exception.

On the other hand, C and C++ write past the end of the stack and provoke a stack overflow. This is a fatal error which causes the process to terminate.

In the HotSpot implementation, Java methods share stack frames with C/C++ native code, namely user native code and the virtual machine itself. Java methods generate code that checks whether stack space is available a fixed distance towards the end of the stack so that the native code can be called without exceeding the stack space. This distance towards the end of the stack is called "Shadow Pages". The size of the shadow pages is between 3 and 20 pages, depending on the platform. This distance is tunable, so that applications with native code needing more than the default distance can increase the shadow page size. The option to increase shadow pages is -XX:StackShadowPages=n, where n is greater than the default stack shadow pages for the platform.

If your application gets a segmentation fault without a core file or fatal error log file see Fatal Error Log, or a STACK_OVERFLOW_ERROR on Windows, or the message "An irrecoverable stack overflow has occurred", this indicates that the value of StackShadowPages was exceeded and more space is needed.

If you increase the value of StackShadowPages, you might also need to increase the default thread stack size using the -Xss parameter. Increasing the default thread stack size might decrease the number of threads that can be created, so be careful in choosing a value for the thread stack size. The thread stack size varies by platform from 256 KB to 1024 KB.

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  EXCEPTION_STACK_OVERFLOW (0xc00000fd) at pc=0x10001011, pid=296, tid=2940
#
# Java VM: Java HotSpot(TM) Client VM (1.6-internal mixed mode, sharing)
# Problematic frame:
# C  [App.dll+0x1011]
#

---------------  T H R E A D  ---------------

Current thread (0x000367c0):  JavaThread "main" [_thread_in_native, id=2940]
:
Stack: [0x00040000,0x00080000),  sp=0x00041000,  free space=4k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [App.dll+0x1011]
C  [App.dll+0x1020]
C  [App.dll+0x1020]
:
C  [App.dll+0x1020]
C  [App.dll+0x1020]
...<more frames>...

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  Test.foo()V+0
j  Test.main([Ljava/lang/String;)V+0
v  ~StubRoutines::call_stub

You can interpret the following information from the above example.

  • The exception is EXCEPTION_STACK_OVERFLOW.

  • The thread state is _thread_in_native, which means that the thread is executing native or JNI code.

  • In the stack information, the free space is only 4 KB (a single page on a Windows system). In addition, the stack pointer (sp) is at 0x00041000, which is close to the end of the stack at 0x00040000.

  • The printout of the native frames shows that a recursive native function is the issue in this case. The output notation ...<more frames>... indicates that additional frames exist but were not printed. The output is limited to 100 frames.

Find a Workaround

Possible workarounds if a crash occurs with a critical application.

If a crash occurs with a critical application, and the crash appears to be caused by a bug in the HotSpot VM, then it might be desirable to quickly find a temporary workaround. If the crash occurs with an application that is deployed with the most recent release of the JDK, then the crash should always be reported to Oracle.

WARNING:

Even if a workaround in this section successfully eliminates a crash, the workaround is not a fix for the problem, but merely a temporary solution. Submit a support call or bug report with the original configuration that demonstrated the issue.

The following are three scenarios to find workarounds for system crashes.

Working Around Crashes in the HotSpot Compiler Thread or Compiled Code

Possible workarounds if the crash occurred in the hotspot compiler thread.

If the fatal error log indicates that the crash occurred in a compiler thread, then it is possible (but not always the case) that you have encountered a compiler bug. Similarly, if the crash is in compiled code then it is possible that the compiler has generated incorrect code.

In case of the HotSpot Client VM (-client option), the compiler thread appears in the error log as CompilerThread0. With the HotSpot Server VM there are multiple compiler threads and these appear in the error log file as CompilerThread0, CompilerThread1, and AdapterThread.

Since JDK 7u5 release, the HotSpot compiler is ignored by default. A command line option is available to simulate the old behavior, which is useful when multiple methods have been excluded. See notable bug fixes in JDK 7u5.

To exclude methods from being compiled by using a JVM flag instead of .hotspot_compile file, see -XX:CompileCommand in Advanced JIT Compiler Options. in the Java Platform, Standard Edition Tools Reference.

The following example shows a fragment of an error log for a compiler bug that was encountered and fixed during the development. The log file shows that the HotSpot Server VM is used and the crash occurred in CompilerThread1. In addition, the log file shows that the current CompileTask was the compilation of the java.lang.Thread.setPriority method.

# An unexpected error has been detected by HotSpot Virtual Machine:
#
:
# Java VM: Java HotSpot(TM) Server VM (1.5-internal-debug mixed mode)
:
---------------  T H R E A D  ---------------

Current thread (0x001e9350): JavaThread "CompilerThread1" daemon [_thread_in_vm, id=20]

Stack: [0xb2500000,0xb2580000),  sp=0xb257e500,  free space=505k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xc3b13c]
:

Current CompileTask:
opto: 11      java.lang.Thread.setPriority(I)V (53 bytes)

---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x00229930 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=21]
=>0x001e9350 JavaThread "CompilerThread1" daemon [_thread_in_vm, id=20]
 :

In this case there are two potential workarounds:

  • The brute force approach: change the configuration so that the application is run with the -client option to specify the HotSpot Client VM.

  • The subtle approach: assume that the bug only occurs during the compilation of the java.lang.Thread.setPriority method and exclude this method from compilation.

The first approach (to use the -client option) might be trivial to configure in some environments. In others, it might be more difficult if the configuration is complex or if the command line to configure the VM is not readily accessible. In general, switching from the HotSpot Server VM to the HotSpot Client VM also reduces the peak performance of an application. Depending on the environment, this might be acceptable until the actual issue is diagnosed and fixed.

The second approach (exclude the method from compilation) requires creating the file .hotspot_compiler in the working directory of the application. The following example shows this approach.

exclude java/lang/Thread setPriority

In general the format of this file is exclude class method, where class is the class (fully qualified with the package name) and method is the name of the method. Constructor methods are specified as <init> and static initializers are specified as <clinit>.

Note:

The .hotspot_compiler file is an unsupported interface. It is documented here solely for the purposes of troubleshooting and finding a temporary workaround.

Once the application is restarted, the compiler will not attempt to compile any of the methods excluded in the .hotspot_compiler file. In some cases this can provide temporary relief until the root cause of the crash is diagnosed and the bug is fixed.

In order to verify that the HotSpot VM correctly located and processed the .hotspot_compiler file that is shown in the above example from the second approach, look for the log information at runtime.

Note:

The file name separator is a dot, not a slash.

Working Around Crashes during Garbage Collection

Possible workaround if the crash occurs during garbage collection.

If a crash occurs during garbage collection (GC), then the fatal error log reports that a VM_Operation is in progress. For the purposes of this discussion, assume that the mostly concurrent GC (-XX:+UseConcMarkSweep) is not in use. The VM_Operation is shown in the THREAD section of the log and indicates one of the following situations:

  • Generation collection for allocation

  • Full generation collection

  • Parallel gc failed allocation

  • Parallel gc failed permanent allocation

  • Parallel gc system gc

Most likely the current thread reported in the log is the VMThread. This is the special thread used to execute special tasks in the HotSpot VM. The following example is a fragment of the fatal error log from a crash in the serial garbage collector.

---------------  T H R E A D  ---------------

Current thread (0x002cb720):  VMThread [id=3252]

siginfo: ExceptionCode=0xc0000005, reading address 0x00000000

Registers:
EAX=0x0000000a, EBX=0x00000001, ECX=0x00289530, EDX=0x00000000
ESP=0x02aefc2c, EBP=0x02aefc44, ESI=0x00289530, EDI=0x00289530
EIP=0x0806d17a, EFLAGS=0x00010246

Top of Stack: (sp=0x02aefc2c)
0x02aefc2c:   00289530 081641e8 00000001 0806e4b8
0x02aefc3c:   00000001 00000000 02aefc9c 0806e4c5
0x02aefc4c:   081641e8 081641c8 00000001 00289530
0x02aefc5c:   00000000 00000000 00000001 00000001
0x02aefc6c:   00000000 00000000 00000000 08072a9e
0x02aefc7c:   00000000 00000000 00000000 00035378
0x02aefc8c:   00035378 00280d88 00280d88 147fee00
0x02aefc9c:   02aefce8 0806e0f5 00000001 00289530
Instructions: (pc=0x0806d17a)
0x0806d16a:   15 08 83 3d c0 be 15 08 05 53 56 57 8b f1 75 0f
0x0806d17a:   0f be 05 00 00 00 00 83 c0 05 a3 c0 be 15 08 8b 

Stack: [0x02ab0000,0x02af0000),  sp=0x02aefc2c,  free space=255k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x6d17a]
V  [jvm.dll+0x6e4c5]
V  [jvm.dll+0x6e0f5]
V  [jvm.dll+0x71771]
V  [jvm.dll+0xfd1d3]
V  [jvm.dll+0x6cd99]
V  [jvm.dll+0x504bf]
V  [jvm.dll+0x6cf4b]
V  [jvm.dll+0x1175d5]
V  [jvm.dll+0x1170a0]
V  [jvm.dll+0x11728f]
V  [jvm.dll+0x116fd5]
C  [MSVCRT.dll+0x27fb8]
C  [kernel32.dll+0x1d33b]

VM_Operation (0x0373f71c): generation collection for allocation, mode:
 safepoint, requested by thread 0x02db7108

Note:

A crash during garbage collection does not imply a bug in the garbage collection implementation. It could also indicate a compiler or runtime bug, or some other issue.

You can try the following workarounds if you get a repeated crash during garbage collection:

  • Switch GC configuration. For example, if you are using the serial collector, try the throughput collector, or visa versa.

  • If you are using the HotSpot Server VM, try the HotSpot Client VM.

If you are not sure which garbage collector is in use, you can use the jmap utility on Oracle Solaris and Linux operating systems. See The jmap Utility to obtain the heap information from the core file, if the core file is available. In general if the GC configuration is not specified on the command line, then the serial collector will be used on Windows. On Oracle Solaris and Linux operating systems it depends on the machine configuration. If the machine has at least 2 GB of memory and has at least 2 CPUs, then the throughput collector (Parallel GC) will be used. For smaller machines the serial collector is the default. The option to select the serial collector is -XX:+UseSerialGC and the option to select the throughput collector is -XX:+UseParallelGC. If, as a workaround, you switch from the throughput collector to the serial collector, then you might experience some performance degradation on multi-processor systems. This might be acceptable until the root issue is diagnosed and resolved.

Working Around Crashes Caused by Class Data Sharing

Possible workaround if the crash is caused by class data sharing.

Class data sharing (CDS) was a new feature in J2SE 5.0. When the JRE is installed on 32-bit platforms using the Sun-provided installer, the installer loads a set of classes from the system JAR file into a private internal representation and dumps that representation to a file called a shared archive. When the VM is started, the shared archive is memory-mapped in. This saves on class loading and allows much of the metadata associated with the classes to be shared across multiple VM instances. In J2SE 5.0, CDS is enabled only when the HotSpot Client VM is used. In addition, sharing is supported only with the serial garbage collector.

The fatal error log prints the version string in the header of the log. If sharing is enabled, it is indicated by the text "sharing", as shown in the following example.

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x08083d77, pid=3572, tid=784
#
# Java VM: Java HotSpot(TM) Client VM (1.5-internal mixed mode, sharing)
# Problematic frame:
# V  [jvm.dll+0x83d77]

CDS can be disabled by providing the -Xshare:off option on the command line. If the crash only occurs with sharing enabled, then it is possible that you have encountered a bug in this feature. In that case gather as much information as possible and submit a bug report.

Microsoft Visual C++ Version Considerations

The JDK software is built on Windows using Microsoft Visual Studio 2010 Professional for both 32-bit and 64-bit platforms.

If you experience a crash with a Java application and if you have native or JNI libraries that are compiled with a different release of the compiler, then you must consider compatibility issues between the runtimes. Specifically, your environment is supported only if you follow the Microsoft guidelines when dealing with multiple runtimes. For example, if you allocate memory using one runtime, then you must release it using the same runtime. Unpredictable behavior or crashes can arise if you release a resource using a different library than the one that allocated the resource.