JRockit Runtime Analyzer User Guide


About Thin, Fat, Recursive, and Contended Locks in BEA JRockit

This is a description of the different kinds of locks in BEA JRockit.

Let’s start with the easiest part: recursive locks. A recursive lock occurs in the following scenario:

synchronized(foo) {  // first time thread takes lock
  // ... 
  synchronized(foo) {  // this time, the lock is taken recursively
    // ...
  }
}

The recursive lock taking may also occur in a method call several levels down—it doesn’t matter. Recursive locks are not necessarily any sign of bad programming, at least not if the recursive lock taking is done by a separate method.
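As a sketch of recursive lock taking through a method call (class and method names are invented for illustration), a synchronized method that calls itself re-enters the monitor it already holds:

```java
public class RecursiveLockDemo {
    private int depth = 0;

    // Each recursive call re-takes the monitor on "this"; only the first
    // acquisition is a normal lock take, the rest are recursive and cheap.
    public synchronized int countDown(int n) {
        depth++;                              // we already hold the lock here
        int result = (n <= 0) ? depth : countDown(n - 1);
        depth--;
        return result;
    }

    public static void main(String[] args) {
        RecursiveLockDemo demo = new RecursiveLockDemo();
        // Lock taken four times in total, three of them recursively.
        System.out.println(demo.countDown(3));
    }
}
```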

The good news is that recursive lock taking in JRockit is extremely fast; the cost of taking a lock recursively is almost negligible. This holds regardless of whether the lock was originally taken as a thin or a fat lock (both are explained in detail below).

Now let’s talk a bit about contention. Contention occurs whenever a thread tries to take a lock, and that lock is not available (that is, it is held by another thread). Let me be clear: contention always costs in terms of performance. The exact cost depends on many factors. I’ll get to some more details on the costs later on.

So if performance is an issue, you should strive to avoid contention. Unfortunately, in many cases it is not possible to avoid contention—if your application requires several threads to access a single, shared resource at the same time, contention is unavoidable. Some designs are better than others, though. Be careful that you don’t overuse synchronized-blocks. Minimize the code that has to be run while holding a highly-contended lock. Don't use a single lock to protect unrelated resources, if that lock proves to be easily contended.
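As a sketch of that advice (all names here are invented for illustration), the class below guards two unrelated resources with two separate lock objects, and keeps each synchronized block as small as possible, so contention on one resource never blocks threads that only need the other:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLocks {
    // One dedicated lock object per unrelated resource.
    private final Object statsLock = new Object();
    private final Object logLock = new Object();

    private long requestCount = 0;
    private final List<String> logLines = new ArrayList<String>();

    public void recordRequest() {
        synchronized (statsLock) {   // hold only while touching the counter
            requestCount++;
        }
    }

    public void log(String line) {
        synchronized (logLock) {     // unrelated resource, unrelated lock
            logLines.add(line);
        }
    }

    public long requestCount() {
        synchronized (statsLock) {
            return requestCount;
        }
    }
}
```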

In principle, that is all you can do as an application developer: design your program to avoid contention, if possible. There are some experimental flags that change some of the JRockit locking behavior, but I strongly discourage anyone from using them. The default values are carefully tuned, and changing them is likely to result in worse, rather than better, performance.

Still, I understand if you’re curious about what JRockit is doing with your application. I’ll give some more details about the locking strategies in JRockit.

All objects in Java are potential locks (monitors). This potential is realized as an actual lock as soon as any thread enters a synchronized block on that object. When a lock is born this way, it starts out as what is known as a “thin lock.” A thin lock has the following characteristics:

The most costly part of taking a thin lock is a CAS (compare-and-swap) operation. It is an atomic instruction, which means that, as far as CPU instructions go, it is slow. Compared to the other costs of locking (contention in general, and taking fat locks in particular), it is still very fast.
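The CAS primitive itself can be illustrated with the java.util.concurrent.atomic classes. The sketch below is a deliberately simplified model of a lock word (0 means free, otherwise the owner’s id), not JRockit’s actual implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasSketch {
    private static final int FREE = 0;

    // Toy model of a thin-lock word. Real JVM lock words carry more
    // state (recursion count, inflation bit, etc.) than shown here.
    private final AtomicInteger lockWord = new AtomicInteger(FREE);

    /** Try to take the lock with a single CAS; true on success. */
    public boolean tryLock(int threadId) {
        return lockWord.compareAndSet(FREE, threadId);
    }

    /** Release the lock if this thread owns it. */
    public void unlock(int threadId) {
        lockWord.compareAndSet(threadId, FREE);
    }
}
```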

For locks that are mostly uncontended, thin locks are great. There is little overhead compared to no locking, which is good since a lot of Java code (especially in the class library) uses a lot of synchronization.

However, as soon as a lock becomes contended, it is no longer obvious which strategy is most efficient. If a lock is held for just a very short time, and JRockit is running on a multi-CPU (SMP) machine, the best strategy is to “spin-lock.” This means that the thread that wants the lock continuously checks whether the lock is still taken, “spinning” in a tight loop. This of course costs some performance: no actual user code is running, and the CPU is “wasting” time that could have been spent on other threads. Still, if the lock is released by the other thread after just a few cycles in the spin loop, spinning is preferable. This is what is meant by a “contended thin lock.”
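The spinning idea can be sketched in plain Java with an AtomicBoolean. This is only a model of the concept; JRockit’s spin loop is a few machine instructions inside the JVM, not Java code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() {
        // Spin in a tight loop while the lock is taken: no user code runs,
        // and the CPU is "wasted" until the CAS finally succeeds.
        while (!held.compareAndSet(false, true)) {
            // busy-wait
        }
    }

    public void unlock() {
        held.set(false);
    }
}
```

Two threads incrementing a shared counter under this lock never lose an update, since the CAS hands the lock over atomically.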

If the lock is not going to be released quickly, spinning on contention would lead to bad performance. In that case, the lock is “inflated” to a “fat lock.” A fat lock has the following characteristics:

A thread that encounters contention on a fat lock registers itself as blocking on that lock and goes to sleep, giving up the rest of the time quantum the OS has given it. While this means that the CPU will be used to run real user code on another thread, the extra context switch is still expensive compared to spin locking. When a thread does this, we have a “contended fat lock.”

When the last contending thread releases a fat lock, the lock normally remains fat. Taking a fat lock, even without contention, is more expensive than taking a thin lock (but less expensive than converting a lock between thin and fat). If JRockit believes that the lock would benefit from being thin (basically, if the contention was pure “bad luck” and the lock is normally uncontended), it might “deflate” it to a thin lock again.

A special note regarding locks: if wait, notify, or notifyAll is called on an object, its lock will automatically inflate to a fat lock. Good advice (not only for this reason) is therefore not to mix wait/notify signalling with other locking schemes on the same object.
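For example, the classic wait/notify idiom below inflates the Mailbox object’s monitor to a fat lock as soon as wait or notifyAll is called on it (the class is invented for illustration):

```java
public class Mailbox {
    private String message;          // shared state, guarded by "this"

    public synchronized void put(String m) {
        message = m;
        notifyAll();                 // calling notifyAll inflates this lock
    }

    public synchronized String take() throws InterruptedException {
        while (message == null) {    // loop guards against spurious wakeups
            wait();                  // calling wait also inflates the lock
        }
        String m = message;
        message = null;
        return m;
    }
}
```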

JRockit uses a complex set of heuristics to determine, amongst other things, when to inflate a thin lock to a fat lock, when to deflate a fat lock back to a thin lock, and how long a thread should spin on a contended thin lock before giving up.

These heuristics are dynamically adaptive, which means that they will automatically change to what’s best suited for the actual application that is being run.

Since JRockit switches between thin and fat locks automatically, choosing the kind of lock that maximizes the performance of the application, the relative difference in performance between thin and fat locks should not really concern the user. It is impossible to give a general answer to this question anyhow, since it differs from system to system, depending on how many CPUs you have, what kind of CPUs, the performance of other parts of the system (memory, cache, and so on), and similar factors. It is also very hard to give a good answer even for a specific system. It is especially tricky to determine with any accuracy the time spent spinning on contended thin locks, since JRockit loops over just a few machine instructions a few times before giving up, and profiling this is likely to heavily influence the timing, giving a skewed image of the performance.

To summarize: if you are concerned about performance and can change your application to avoid contention on a lock, then do so. If you cannot avoid contention, try to keep the code that runs while holding a contended lock to a minimum. JRockit will then do whatever is in its power to run your application as fast as possible. Use the lock information provided by JRA as a hint: fat locks are likely to have been contended often or for a long time. Put your coding efforts into minimizing contention on them.

