Summary

This page describes adding support for Async Monitor Deflation to OpenJDK. The primary goal of this project is to reduce the time spent in safepoint cleanup operations.

RFE: 8153224 Monitor deflation prolong safepoints
         https://bugs.openjdk.java.net/browse/JDK-8153224

Full Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/

Inc Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/

Background

This patch for Async Monitor Deflation is based on Carsten Varming's

http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/

which has been ported to work with monitor lists. Monitor lists were optional via the '-XX:+MonitorInUseLists' option in JDK8, the option became default 'true' in JDK9, the option became deprecated in JDK10 via JDK-8180768, and the option became obsolete in JDK12 via JDK-8211384. Carsten's webrev is based on JDK10 so there was a bit of porting work needed to merge his code and/or algorithms with jdk/jdk.

Carsten also submitted a JEP back in the JDK10 time frame:

JDK-8183909 Concurrent Monitor Deflation

https://bugs.openjdk.java.net/browse/JDK-8183909

The OpenJDK JEP process has evolved a bit since JDK10 and a JEP is no longer required for a project that is well defined to be within one area of responsibility. Async Monitor Deflation is clearly defined to be in the JVM Runtime team's area of responsibility so it is likely that the JEP (JDK-8183909) will be withdrawn and the work will proceed via the RFE (JDK-8153224).

Introduction

The current idle monitor deflation mechanism executes at a safepoint during cleanup operations. Due to this execution environment, the current mechanism does not have to worry about interference from concurrently executing JavaThreads. Async Monitor Deflation uses the ServiceThread to deflate idle monitors so the new mechanism has to detect interference and adapt as appropriate. In other words, data races are natural part of Async Monitor Deflation and the algorithms have to detect the races and react without data loss or corruption.

Key Parts of the Algorithm

1) Deflation With Interference Detection

ObjectSynchronizer::deflate_monitor_using_JT() is the new counterpart to ObjectSynchronizer::deflate_monitor() and does the heavy lifting of asynchronously deflating a monitor using a three part prototcol:

  1. Setting a NULL owner field to DEFLATER_MARKER with cmpxchg() forces any contending thread through the slow path. A racing thread would be trying to set the owner field.
  2. Making a zero ref_count field a large negative value with cmpxchg() forces racing threads to retry. A racing thread would would be trying to increment the ref_count field.
  3. If the owner field is still equal to DEFLATER_MARKER, then we have won all the races and can deflate the monitor.

If we lose any of the races, the monitor cannot be deflated at this time.

Once we know it is safe to deflate the monitor (which is mostly field resetting and monitor list management), we have to restore the object's header. That's another racy operation that is described below in "Restoring the Header With Interference Detection".

2) Restoring the Header With Interference Detection

ObjectMonitor::install_displaced_markword_in_object() is the new piece of code that handles all the racy situations with restoring an object's header asynchronously. The function is called from two places (deflation and saving an ObjectMonitor* in an ObjectMonitorHandle). The restoration protocol for the object's header uses the mark bit along with the hash() value staying at zero to indicate that the object's header is being restored. Only one of the possible racing scenarios can win and the losing scenarios all adapt to the winning scenario's object header value.

3) Using "owner" or "ref_count" With Interference Detection

Various code paths have been updated to recognize an owner field equal to DEFLATER_MARKER or a negative ref_count field and those code paths will retry their operation. This is the shortest "Key Part" description, but don't be fooled. See "Gory Details" below.

An Example of ObjectMonitor Interference Detection

ObjectMonitor::save_om_ptr() is used to safely save an ObjectMonitor* in an ObjectMonitorHandle. ObjectSynchronizer::deflate_monitor_using_JT() is used to asynchronously deflate an idle monitor. save_om_ptr() and deflate_monitor_using_JT() can interfere with each other. The thread calling save_om_ptr() (T-save) is potentially racing with another JavaThread (T-deflate) so both threads have to check the results of the races.

Start of the Race

    T-save                  ObjectMonitor              T-deflate
----------------------  +-----------------------+  ----------------------------------------
save_om_ptr() {   | owner=NULL            | deflate_monitor_using_JT() {
   1> atomic inc ref_count | ref_count=0          | 1> cmpxchg(DEFLATER_MARKER, &owner, NULL)
                    +-----------------------+

Racing Threads

    T-save                  ObjectMonitor              T-deflate
    ---------------------- +-----------------------+  ------------------------------------------
    save_om_ptr() { | owner=DEFLATER_MARKER | deflate_monitor_using_JT() {
   1> atomic inc ref_count | ref_count=0          |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
      +-----------------------+  :
1> prev = cmpxchg(-max_jint, &ref_count, 0)

T-deflate Wins

    T-save                             ObjectMonitor              T-deflate
    --------------------------------- +-----------------------+  ------------------------------------------
    save_om_ptr() {   | owner=DEFLATER_MARKER |  deflate_monitor_using_JT() {
    atomic inc ref_count    | ref_count=-max_jint+1 |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> if (owner == DEFLATER_MARKER && +-----------------------+  :
    ref_count <= 0) {                 ||              prev = cmpxchg(-max_jint, &ref_count, 0)
        restore obj header             \/          1> if (prev == 0 &&
      atomic dec ref_count           +-----------------------+        owner == DEFLATER_MARKER) {
     2> return false to force retry    | owner=DEFLATER_MARKER | restore obj header
    }                                | ref_count=-max_jint |   2> finish the deflation
+-----------------------+ }

T-save Wins

    T-save                             ObjectMonitor              T-deflate
    --------------------------------- +-----------------------+  ------------------------------------------
    save_om_ptr() { | owner=DEFLATER_MARKER |  deflate_monitor_using_JT() {
    atomic inc ref_count    | ref_count=1          |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> if (owner == DEFLATER_MARKER && +-----------------------+  :
    ref_count <= 0) {              ||                prev = cmpxchg(-max_jint, &ref_count, 0)
    } else {        \/         1> if (prev == 0 &&
    save om_ptr in the          +-----------------------+         owner == DEFLATER_MARKER) {
ObjectMonitorHandle | owner=NULL | } else {
2> return true | ref_count=1          | cmpxchg(NULL, &owner, DEFLATER_MARKER)
+-----------------------+ 2> return

T-enter Wins By A-B-A

    T-enter                                       ObjectMonitor                T-deflate
    -------------------------------------------- +-------------------------+  ------------------------------------------
    ObjectMonitor::enter() { | owner=DEFLATER_MARKER |  deflate_monitor_using_JT() {
    <owner is contended>   | ref_count=1            |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> EnterI() {   +-------------------------+ 1> :
  if (owner == DEFLATER_MARKER && || 2> : <thread_stalls>
      cmpxchg(Self, &owner,                    \/ :
    DEFLATER_MARKER) +-------------------------+ :
== DEFLATER_MARKER) { | owner=Self/T-enter | :
// EnterI is done | ref_count=0 | : <thread_resumes>
return +-------------------------+ prev = cmpxchg(-max_jint, &ref_count, 0)
} || if (prev == 0 &&
} // enter() is done \/ 3> owner == DEFLATER_MARKER) {
~OMH: atomic dec ref_count +-------------------------+ } else {
2> : <does app work> | owner=Self/T-enter|NULL | cmpxchg(NULL, &owner, DEFLATER_MARKER)
3> : | ref_count=-max_jint | atomic add max_jint to ref_count
exit() monitor +-------------------------+ 4> bailout on deflation
4> owner = NULL || }
\/
+-------------------------+
| owner=Self/T-enter|NULL |
| ref_count=0 |
+-------------------------+

An Example of Object Header Interference

After T-deflate has won the race for deflating an ObjectMonitor it has to restore the header in the associated object. Of course another thread can be trying to do something to the object's header at the same time. Isn't asynchronous work exciting?!?!

ObjectMonitor::install_displaced_markword_in_object() is called from two places so we can have a race between a T-save thread and a T-deflate thread:

Start of the Race

    T-save                                       object           T-deflate
    -------------------------------------------  +-------------+  --------------------------------------------
install_displaced_markword_in_object() { | mark=om_ptr |  install_displaced_markword_in_object() {
    dmw = header()                    +-------------+  dmw = header()
    if (!dmw->is_marked() &&                                     if (!dmw->is_marked() &&
      dmw->hash() == 0) {                                          dmw->hash() == 0) {
      create marked_dmw                    create marked_dmw
    dmw = cmpxchg(marked_dmw, &header, dmw)                      dmw = cmpxchg(marked_dmw, &header, dmw)
} }

T-deflate Wins First Race

    T-save                                       object            T-deflate
    -------------------------------------------  +-------------+   -------------------------------------------
    install_displaced_markword_in_object() {   | mark=om_ptr |  install_displaced_markword_in_object() {
     dmw = header()                    +-------------+  dmw = header()
if (!dmw->is_marked() && if (!dmw->is_marked() &&
         dmw->hash() == 0) {                                           dmw->hash() == 0) {
       create marked_dmw                                             create marked_dmw
       dmw = cmpxchg(marked_dmw, &header, dmw)                       dmw = cmpxchg(marked_dmw, &header, dmw)
     }                                                             }
     // dmw == marked_dmw here                                     // dmw == original dmw here
     if (dmw->is_marked())                                         if (dmw->is_marked())
      unmark dmw                                                    unmark dmw
    obj = object()                                                obj = object()
    obj->cas_set_mark(dmw, this)                                  obj->cas_set_mark(dmw, this)

T-save Wins First Race

    T-save                                       object            T-deflate
    -------------------------------------------  +-------------+   -------------------------------------------
    install_displaced_markword_in_object() {    | mark=om_ptr |  install_displaced_markword_in_object() {
    dmw = header()                    +-------------+  dmw = header()
if (!dmw->is_marked() && if (!dmw->is_marked() &&
         dmw->hash() == 0) {                                           dmw->hash() == 0) {
       create marked_dmw                                             create marked_dmw
       dmw = cmpxchg(marked_dmw, &header, dmw)                       dmw = cmpxchg(marked_dmw, &header, dmw)
    }                                                             }
    // dmw == original dmw here                                   // dmw == marked_dmw here
    if (dmw->is_marked())                                         if (dmw->is_marked())
       unmark dmw                                                    unmark dmw
    obj = object()                                                obj = object()
    obj->cas_set_mark(dmw, this)                                  obj->cas_set_mark(dmw, this)

Either Wins the Second Race

    T-save                                       object            T-deflate
    -------------------------------------------  +-------------+   -------------------------------------------
    install_displaced_markword_in_object() {   | mark=dmw    |  install_displaced_markword_in_object() {
     dmw = header()                   +-------------+  dmw = header()
if (!dmw->is_marked() && if (!dmw->is_marked() &&
         dmw->hash() == 0) {                                           dmw->hash() == 0) {
       create marked_dmw                                             create marked_dmw
       dmw = cmpxchg(marked_dmw, &header, dmw)                       dmw = cmpxchg(marked_dmw, &header, dmw)
     }                                                             }
     // dmw == ...                                    // dmw == ...
    if (dmw->is_marked())                                         if (dmw->is_marked())
       unmark dmw                                                    unmark dmw
     obj = object()                                                obj = object()
     obj->cas_set_mark(dmw, this)                                  obj->cas_set_mark(dmw, this)

Please notice that install_displaced_markword_in_object() does not do any retries on any code path:

Hashcodes and Object Header Interference

If we have a race between a T-deflate thread and a thread trying to get/set a hashcode (T-hash), then the race is between the ObjectMonitorHandle.save_om_ptr(obj, mark) call in T-hash and deflation protocol in T-deflate.

Start of the Race

    T-hash                  ObjectMonitor              T-deflate
    ----------------------  +-----------------------+  ----------------------------------------
    save_om_ptr() {         | owner=NULL            |  deflate_monitor_using_JT() {
      :                     | ref_count=0          | 1> cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> atomic inc ref_count  +-----------------------+

Racing Threads

    T-hash                  ObjectMonitor              T-deflate
    ----------------------  +-----------------------+  ------------------------------------------
    save_om_ptr() {         | owner=DEFLATER_MARKER | deflate_monitor_using_JT() {
      :   | ref_count=0   |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> atomic inc ref_count  +-----------------------+ if (contentions != 0 || waiters != 0) {
                              }
1> prev = cmpxchg(-max_jint, &ref_count, 0)

T-deflate Wins

If T-deflate wins the race, then T-hash will have to retry at most once.

    T-hash                      ObjectMonitor              T-deflate
    -------------------------  +-----------------------+  ------------------------------------------
    save_om_ptr() {           | owner=DEFLATER_MARKER |  deflate_monitor_using_JT() {
   1> atomic inc ref_count    | ref_count=-max_jint |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
   if (owner ==           +-----------------------+  if (contentions != 0 || waiters != 0) {
          DEFLATER_MARKER &&  ||   }
        ref_count <= 0) {              \/              prev = cmpxchg(-max_jint, &ref_count, 0)
        restore obj header +-----------------------+ 1> if (prev == 0 &&
     atomic dec ref_count  | owner=DEFLATER_MARKER |     owner == DEFLATER_MARKER) {
     2> return false to   | ref_count=-max_jint |    restore obj header
     cause a retry      +-----------------------+  2> finish the deflation
} }

T-hash Wins

If T-hash wins the race, then the ref_count will cause T-deflate to bail out on deflating the monitor.

Note: header is not mentioned in any of the previous sections for simplicity.

    T-hash                      ObjectMonitor              T-deflate
    -------------------------  +-----------------------+  ------------------------------------------
    save_om_ptr() {           | header=dmw_no_hash | deflate_monitor_using_JT() {
      atomic inc ref_count    | owner=DEFLATER_MARKER |   cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> if (owner ==            | ref_count=1      | if (contentions != 0 || waiters != 0) {
          DEFLATER_MARKER && +-----------------------+   }
         ref_count <= 0) {  ||  1> prev = cmpxchg(-max_jint, &ref_count, 0)
      } else {               \/              if (prev == 0 &&
   2> save om_ptr in the +-----------------------+ owner == DEFLATER_MARKER) {
       ObjectMonitorHandle | header=dmw_no_hash | } else {
        return true | owner=NULL | cmpxchg(NULL, &owner, DEFLATER_MARKER)
      } | ref_count=1 | 2> bailout on deflation
    } +-----------------------+ }
    if save_om_ptr() { ||
      if no hash \/
      gen hash & merge +-----------------------+
   hash = hash(header) | header=dmw_hash |
   } | owner=NULL |
3> atomic dec ref_count | ref_count=1 |
return hash +-----------------------+

Please note that in Carsten's original prototype, there was another race in ObjectSynchronizer::FastHashCode() when the object's monitor had to be inflated. The setting of the hashcode in the ObjectMonitor's header/dmw could race with T-deflate. That race is resolved in this version by the use of an ObjectMonitorHandle in the call to ObjectSynchronizer::inflate(). The ObjectMonitor* returned by ObjectMonitorHandle.om_ptr() has a non-zero ref_count so no additional races with T-deflate are possible.

Lock-Free Monitor List Management

Use of specialized measurement code with the CR5/v2.05/8-for-jdk13 bits revealed that the gListLock contention is responsible for much of the performance degradation observed with SPECjbb2015. Consequently the primary focus of the next round of changes is/was on switching to lock-free monitor list management. Of course, since the Java Monitor subsystem is full of special cases, the lock-free list management code has to have a number of special cases which will be described here.

The Simple Case

There is one simple case of lock-free list management with the Java Monitor subsystem so we'll start with that code as a way to introduce the lock-free concepts:

    L1:  while (true) {
    L2: PaddedObjectMonitor* cur = OrderAccess::load_acquire(&g_block_list);
    L3: OrderAccess::release_store(&new_blk[0]._next_om, cur);
    L4: if (Atomic::cmpxchg(new_blk, &g_block_list, cur) == cur) {
   L5: Atomic::add(_BLOCKSIZE - 1, &g_om_population);
    L6: break;
    L7: }
    L8:  }

What the above block of code does is:

The above block of code can be called by multiple threads in parallel and does not lose track of any blocks. Of course, the "does not lose track of any blocks" part is where all the details come in:

At the point that cmpxchg has published the new 'g_block_list' value, 'new_blk' is now first block in the list and the 0th element's next field is used to find the previous first block; all of the monitor list blocks are chained together via the next field in the block's 0th element. It is the use of cmpxchg to update 'g_block_list' and the checking of the return value from cmpxchg that insures that we don't lose track of any blocks.

This example is considered to be the "simple case" because we only prepend to the list (no deletes) and we only use:

to achieve the safe update of the 'g_block_list' value; the atomic increment of the 'g_om_population' counter is considered to be just accounting (pun intended).

The concepts introduced here are:

Note: The above code snippet comes from ObjectSynchronizer::prepend_block_to_lists(); see that function for more complete context (and comments).

Housekeeping Parts of the Algorithm

The devil is in the details! Housekeeping or administrative stuff are usually detailed, but necessary.

Monitor Deflation Invocation Details


 JDK-8181859 Monitor deflation is not checked in cleanup path

   ((gMonitorPopulation - gMonitorFreeCount) / gMonitorPopulation) > NN%


(gMonitorPopulation - gMonitorFreeCount) > MonitorBound




Gory Details