Comment: Update for CR14/v2.14/17-for-jdk15 rebased to jdk-15+25.

...

RFE: 8153224 Monitor deflation prolong safepoints
         https://bugs.openjdk.java.net/browse/JDK-8153224

Full Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/12-for-jdk1417-for-jdk15+24.v2.0915.full/

Inc Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/1217-for-jdk14jdk15+24.v2.0915.inc/

Background

This patch for Async Monitor Deflation is based on Carsten Varming's

...

The current idle monitor deflation mechanism executes at a safepoint during cleanup operations. Due to this execution environment, the current mechanism does not have to worry about interference from concurrently executing JavaThreads. Async Monitor Deflation uses the ServiceThread to deflate idle monitors so the new mechanism has to detect interference and adapt as appropriate. In other words, data races are a natural part of Async Monitor Deflation and the algorithms have to detect the races and react without data loss or corruption.

Async Monitor Deflation is performed in two stages: stage one performs the two part protocol described in "Deflation With Interference Detection" below and moves the async deflated ObjectMonitors from an in-use list to a global wait list; the ServiceThread performs a handshake (or a safepoint) with all other JavaThreads after stage one is complete and that forces any racing threads to make forward progress; stage two moves the ObjectMonitors from the global wait list to the global free list. The special values that mark an ObjectMonitor as async deflated remain in their fields until the ObjectMonitor is moved from the global free list to a per-thread free list which is sometime after stage two has completed.

Key Parts of the Algorithm

1) Deflation With Interference Detection

ObjectSynchronizer::deflate_monitor_using_JT() is the new counterpart to ObjectSynchronizer::deflate_monitor() and does the heavy lifting of asynchronously deflating a monitor using a two part protocol:

  1. Setting a NULL owner field to DEFLATER_MARKER with cmpxchg() forces any contending thread through the slow path. A racing thread would be trying to set the owner field.
  2. Making a zero contentions field a large negative value with cmpxchg() forces racing threads to retry. A racing thread would be trying to increment the contentions field.

If we lose any of the races, the monitor cannot be deflated at this time.
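
The two part protocol and its bail-out path can be sketched with std::atomic. This is only an illustration under stated assumptions, not HotSpot code: ToyMonitor, kDeflaterMarker, kMaxJint and try_deflate() are hypothetical stand-ins for ObjectMonitor, DEFLATER_MARKER, max_jint and deflate_monitor_using_JT(). The restore of the owner field on bail-out follows the "T-enter Wins" example later in this document.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <cstdint>

// Toy stand-in for ObjectMonitor; only the two fields the protocol touches.
struct ToyMonitor {
    std::atomic<std::uintptr_t> owner{0};  // 0 plays the role of NULL
    std::atomic<int> contentions{0};
};

constexpr std::uintptr_t kDeflaterMarker = 1;  // assumed encoding of DEFLATER_MARKER
constexpr int kMaxJint = INT_MAX;              // assumed value of max_jint

// Returns true if the caller has won both races and may finish the deflation.
bool try_deflate(ToyMonitor& m) {
    // Part 1: NULL -> DEFLATER_MARKER on the owner field.
    std::uintptr_t expected_owner = 0;
    if (!m.owner.compare_exchange_strong(expected_owner, kDeflaterMarker)) {
        return false;  // the monitor is owned, so it is busy
    }
    // Part 2: 0 -> -max_jint on the contentions field.
    int expected_contentions = 0;
    if (m.contentions.compare_exchange_strong(expected_contentions, -kMaxJint)) {
        return true;   // won both races
    }
    // Lost the second race: restore owner to NULL if it is still the marker.
    std::uintptr_t marker = kDeflaterMarker;
    m.owner.compare_exchange_strong(marker, 0);
    return false;
}
```

Calling try_deflate() on an idle ToyMonitor leaves owner at the marker and contentions at -kMaxJint; calling it on a monitor with a positive contentions value leaves both fields as they were.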

Once we know it is safe to deflate the monitor (which is mostly field resetting and monitor list management), we have to restore the object's header. That's another racy operation that is described below in "Restoring the Header With Interference Detection".

The setting of the special values that mark an ObjectMonitor as async deflated and the restoration of the object's header comprise the first stage of Async Monitor Deflation.

2) Restoring the Header With Interference Detection

ObjectMonitor::install_displaced_markword_in_object() is the new piece of code that handles all the racy situations with restoring an object's header asynchronously. The function is called from three places (deflation, ObjectMonitor::enter(), and FastHashCode). Only one of the possible racing scenarios can win and the losing scenarios all adapt to the winning scenario's object header value.

3) Using "owner" or "contentions" With Interference Detection

Various code paths have been updated to recognize an owner field equal to DEFLATER_MARKER or a negative contentions field and those code paths will retry their operation. This is the shortest "Key Part" description, but don't be fooled. See "Gory Details" below.
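
A code path that has to recognize a negative contentions field might look roughly like the sketch below. It is a simplified model, not the HotSpot implementation: ToyMonitor, try_enter() and the header_restored flag are hypothetical, and is_being_async_deflated() is assumed here to mean "contentions is negative".

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Toy stand-in for ObjectMonitor (hypothetical; not the HotSpot class).
struct ToyMonitor {
    std::atomic<int> contentions{0};
    bool header_restored = false;  // placeholder for "restore obj header"
};

// Assumption for this sketch: a negative contentions value marks async deflation.
bool is_being_async_deflated(const ToyMonitor& m) {
    return m.contentions.load() < 0;
}

// Returns false when the caller has to retry its operation.
bool try_enter(ToyMonitor& m) {
    m.contentions.fetch_add(1);          // announce interest in the monitor
    if (is_being_async_deflated(m)) {
        m.header_restored = true;        // help restore the object's header
        m.contentions.fetch_sub(1);      // undo our increment
        return false;                    // lost the race, caller retries
    }
    // Won the race; the increment stays in place for the contended enter
    // and is removed by the caller when the enter is done.
    return true;
}
```

Because the deflater sets contentions to a large negative value, a single increment by a racing thread still leaves the field negative, which is what lets the racing thread detect that it lost.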

An Example of ObjectMonitor Interference Detection

ObjectMonitor::enter() can change an idle monitor into a busy monitor. ObjectSynchronizer::deflate_monitor_using_JT() is used to asynchronously deflate an idle monitor. enter() and deflate_monitor_using_JT() can interfere with each other. The thread calling enter() (T-enter) is potentially racing with another JavaThread (T-deflate) so both threads have to check the results of the races.

Start of the Race

    T-enter                    ObjectMonitor              T-deflate
    -------------------------  +-----------------------+  ----------------------------------------
    enter() {                  | owner=NULL            |  deflate_monitor_using_JT() {
    1> atomic inc contentions  | contentions=0         |  1> cmpxchg(DEFLATER_MARKER, &owner, NULL)
                               +-----------------------+
    • The data fields are at their starting values.
    • The "1>" markers are showing where each thread is at for the ObjectMonitor box:
      • T-deflate is about to execute cmpxchg().
      • T-enter is about to increment the contentions field.

Racing Threads

    T-enter                    ObjectMonitor              T-deflate
    -------------------------  +-----------------------+  ---------------------------------------------
    enter() {                  | owner=DEFLATER_MARKER |  deflate_monitor_using_JT() {
    1> add_to_contentions(1)   | contentions=0         |    try_set_owner_from(NULL, DEFLATER_MARKER)
                               +-----------------------+    :
                                                          1> prev = cmpxchg(&contentions, 0, -max_jint)
    • T-deflate has executed cmpxchg() and set owner to DEFLATER_MARKER.
    • T-enter still hasn't done anything yet.
    • The "1>" markers are showing where each thread is at for the ObjectMonitor box:
      • T-enter and T-deflate are racing to update the contentions field.

T-deflate Wins

    T-enter                                ObjectMonitor                T-deflate
    -------------------------------------  +-------------------------+  ---------------------------------------------
    enter() {                              | owner=DEFLATER_MARKER   |  deflate_monitor_using_JT() {
      add_to_contentions(1)                | contentions=-max_jint+1 |    try_set_owner_from(NULL, DEFLATER_MARKER)
   1> if (is_being_async_deflated()) {     +-------------------------+    :
        restore obj header                             ||                 prev = cmpxchg(&contentions, 0, -max_jint)
        add_to_contentions(-1)                         \/              1> if (prev == 0) {
   2>   return false to force retry        +-------------------------+      restore obj header
      }                                    | owner=DEFLATER_MARKER   | 2>   finish the deflation
                                           | contentions=-max_jint   |    }
                                           +-------------------------+  }
    • This diagram starts after "Racing Threads".
    • The "1>" markers are showing where each thread is at for that ObjectMonitor box:
      • T-enter and T-deflate both observe owner == DEFLATER_MARKER and a negative contentions field.
    • T-enter has lost the race: it restores the obj header (not shown) and decrements contentions.
    • T-deflate restores the obj header (not shown).
    • The "2>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-enter returns false to cause the caller to retry.
    • T-deflate finishes the deflation.
    • Note: The owner == DEFLATER_MARKER and contentions < 0 values that are set by T-deflate (stage one of async deflation) remain in place until after T-deflate does a handshake (or safepoint) operation with all JavaThreads. This handshake forces T-enter to make forward progress and see that the ObjectMonitor is being async deflated before T-enter checks in for the handshake.

T-enter Wins

    T-enter                                ObjectMonitor                T-deflate
    -------------------------------------  +-------------------------+  ---------------------------------------------
    enter() {                              | owner=DEFLATER_MARKER   |  deflate_monitor_using_JT() {
      add_to_contentions(1)                | contentions=1           |    try_set_owner_from(NULL, DEFLATER_MARKER)
   1> if (is_being_async_deflated()) {     +-------------------------+    :
      } else {                                         ||                 prev = cmpxchg(&contentions, 0, -max_jint)
   2>   <continue contended enter>                     \/              1> if (prev == 0) {
                                           +-------------------------+    } else {
                                           | owner=NULL              |      try_set_owner_from(DEFLATER_MARKER, NULL)
                                           | contentions=1           | 2>   return
                                           +-------------------------+
    • This diagram starts after "Racing Threads".
    • The "1>" markers are showing where each thread is at for the ObjectMonitor box:
      • T-enter and T-deflate both observe a contentions field > 0.
    • T-enter has won the race and it continues with the contended enter protocol.
    • T-deflate detects that it has lost the race (prev != 0) and bails out on deflating the ObjectMonitor:
      • Before bailing out T-deflate tries to restore the owner field to NULL if it is still DEFLATER_MARKER.
    • The "2>" markers are showing where each thread is at for that ObjectMonitor box.

T-save Complication with C2

Sorry in advance for the sudden deep dive into really gory C2 details, but this is related to a majority of save_om_ptr() so this is the right place to talk about the complication.

As of CR7/v2.07/10-for-jdk14, we have added C2 inc_om_ref_count() on X64 to implement the ref_count management parts of save_om_ptr():

    • inc_om_ref_count() does not implement the "restore obj header" part nor the "save om_ptr in the ObjectMonitorHandle" part mentioned in the previous two subsections.
    • inc_om_ref_count() is used by C2 fast_lock(), C2 fast_unlock() and C2 rtm_inflated_locking() on the LP64 X64 platform.
    • The v2.05 version of C2 fast_lock() has code to detect a deflated and recycled ObjectMonitor after acquiring ownership of the ObjectMonitor. The solution to the race was to drop ownership and take the slow enter path. We have spent a lot of time and energy analyzing this race and the solution to this race and have convinced ourselves that the solution introduces theoretical problems with succession. The proper solution is to switch to using inc_om_ref_count() to protect the ObjectMonitor* for the duration of C2 fast_lock().
    • Robbin wrote a new test called MoCrazy that is targeted at the C2 optimizations. This test revealed a race in the baseline C2 fast_unlock() where ownership was reacquired in order to ensure proper succession. So baseline C2 fast_unlock() had a similar version of the race that we thought we fixed in C2 fast_lock(). The proper solution is to switch to using inc_om_ref_count() to protect the ObjectMonitor* for the duration of C2 fast_unlock().
    • C2 rtm_inflated_locking() is similarly exposed to races with async deflation so inc_om_ref_count() is used to protect the ObjectMonitor* for the duration of C2 rtm_inflated_locking().

T-enter Wins By A-B-A

    T-enter                                       ObjectMonitor                T-deflate
    --------------------------------------------  +-------------------------+  ------------------------------------------
    ObjectMonitor::enter() {                      | owner=DEFLATER_MARKER   |  deflate_monitor_using_JT() {
      <owner is contended>                        | ref_count=1             |    cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> EnterI() {                                  +-------------------------+ 1> :
        if (owner == DEFLATER_MARKER &&                       ||              2> : <thread_stalls>
            cmpxchg(Self, &owner,                             \/                 :
                    DEFLATER_MARKER)              +-------------------------+    :
            == DEFLATER_MARKER) {                 | owner=Self/T-enter      |    :
          // EnterI is done                       | ref_count=0             |    : <thread_resumes>
          return                                  +-------------------------+    prev = cmpxchg(-max_jint, &ref_count, 0)
        }                                                     ||                 if (prev == 0 &&
      } // enter() is done                                    \/              3>     owner == DEFLATER_MARKER) {
      ~OMH: atomic dec ref_count                  +-------------------------+    } else {
   2> : <does app work>                           | owner=Self/T-enter|NULL |      cmpxchg(NULL, &owner, DEFLATER_MARKER)
   3> :                                           | ref_count=-max_jint     |      atomic add max_jint to ref_count
      exit() monitor                              +-------------------------+ 4>   bailout on deflation
   4> owner = NULL                                            ||                 }
                                                              \/
                                                  +-------------------------+
                                                  | owner=Self/T-enter|NULL |
                                                  | ref_count=0             |
                                                  +-------------------------+
    • T-deflate has executed cmpxchg() and set owner to DEFLATER_MARKER.
    • T-enter has called ObjectMonitor::enter() with "ref_count == 1", noticed that the owner is contended and is about to call ObjectMonitor::EnterI().
    • The first ObjectMonitor box is showing the fields at this point and the "1>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-deflate stalls after setting the owner field to DEFLATER_MARKER.
    • T-enter calls EnterI() to do the contended enter work:
      • EnterI() observes owner == DEFLATER_MARKER and uses cmpxchg() to set the owner field to Self/T-enter.
      • T-enter owns the monitor, returns from EnterI(), and returns from enter().
      • The ObjectMonitorHandle destructor decrements the ref_count.
    • T-enter is now ready to do work that requires the monitor to be owned.
    • The second ObjectMonitor box is showing the fields at this point and the "2>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-enter is doing app work (but it also could have finished and exited the monitor).
    • T-deflate resumes, calls cmpxchg() to set the ref_count field to -max_jint, and passes the first part of the bailout expression because "prev == 0".
    • The third ObjectMonitor box is showing the fields at this point and the "3>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-deflate performs the A-B-A check which observes that "owner != DEFLATER_MARKER" and bails out on deflation:
      • Depending on when T-deflate resumes after the stall, it will see "owner == T-enter" or "owner == NULL".
      • Both of those values will cause deflation to bail out so we have to conditionally undo work:
        • restore the owner field to NULL if it is still DEFLATER_MARKER (it's not DEFLATER_MARKER)
        • undo setting ref_count to -max_jint by atomically adding max_jint to ref_count which will restore ref_count to its proper value.
      • If the T-enter thread has managed to enter but not exit the monitor during the T-deflate stall, then our owner field A-B-A transition is:
        • NULL → DEFLATER_MARKER → Self/T-enter
      • so we really have A1-B-A2, but the A-B-A principle still holds.
      • If the T-enter thread has managed to enter and exit the monitor during the T-deflate stall, then our owner field A-B-A transition is:
        • NULL → DEFLATER_MARKER → Self/T-enter → NULL
      • so we really have A-B1-B2-A, but the A-B-A principle still holds.

    • T-enter finished doing app work and is about to exit the monitor (or it has already exited the monitor).

    • The fourth ObjectMonitor box is showing the fields at this point and the "4>" markers are showing where each thread is at for that ObjectMonitor box.

An Example of Object Header Interference

After T-deflate has won the race for deflating an ObjectMonitor it has to restore the header in the associated object. Of course another thread can be trying to do something to the object's header at the same time. Isn't asynchronous work exciting?!?!

ObjectMonitor::install_displaced_markword_in_object() is called from two places so we can have a race between a T-save thread and a T-deflate thread:

Start of the Race

    T-save                                       object           T-deflate
    -------------------------------------------  +-------------+  -------------------------------------------
    install_displaced_markword_in_object() {     | mark=om_ptr |  install_displaced_markword_in_object() {
      dmw = header()                             +-------------+    dmw = header()
      if (!dmw->is_marked() &&                                      if (!dmw->is_marked() &&
          dmw->hash() == 0) {                                           dmw->hash() == 0) {
        create marked_dmw                                             create marked_dmw
        dmw = cmpxchg(marked_dmw, &header, dmw)                       dmw = cmpxchg(marked_dmw, &header, dmw)
      }                                                             }
    • The data field (mark) is at its starting value.
    • 'dmw' and 'marked_dmw' are local copies in each thread.
    • T-save and T-deflate are both calling install_displaced_markword_in_object() at the same time.
    • Both threads are poised to call cmpxchg() at the same time.

T-deflate Wins First Race

    T-save                                       object            T-deflate
    -------------------------------------------  +-------------+   -------------------------------------------
    install_displaced_markword_in_object() {   | mark=om_ptr |  install_displaced_markword_in_object() {
     dmw = header()                    +-------------+  dmw = header()
     if (!dmw->is_marked() &&                                      if (!dmw->is_marked() &&
         dmw->hash() == 0) {                                           dmw->hash() == 0) {
       create marked_dmw                                             create marked_dmw
       dmw = cmpxchg(marked_dmw, &header, dmw)                       dmw = cmpxchg(marked_dmw, &header, dmw)
     }                                                             }
     // dmw == marked_dmw here                                     // dmw == original dmw here
     if (dmw->is_marked())                                         if (dmw->is_marked())
      unmark dmw                                                    unmark dmw
    obj = object()                                                obj = object()
    obj->cas_set_mark(dmw, this)                                  obj->cas_set_mark(dmw, this)
    • The return value from cmpxchg() in each thread will be different.
    • Since T-deflate won the race, its 'dmw' variable contains the header/dmw from the ObjectMonitor.
    • Since T-save lost the race, its 'dmw' variable contains the 'marked_dmw' set by T-deflate.
      • T-save will unmark its 'dmw' variable.
    • Both threads are poised to call cas_set_mark() at the same time.

T-save Wins First Race

    T-save                                       object            T-deflate
    -------------------------------------------  +-------------+   -------------------------------------------
    install_displaced_markword_in_object() {    | mark=om_ptr |  install_displaced_markword_in_object() {
    dmw = header()                    +-------------+  dmw = header()
     if (!dmw->is_marked() &&                                      if (!dmw->is_marked() &&
         dmw->hash() == 0) {                                           dmw->hash() == 0) {
       create marked_dmw                                             create marked_dmw
       dmw = cmpxchg(marked_dmw, &header, dmw)                       dmw = cmpxchg(marked_dmw, &header, dmw)
    }                                                             }
    // dmw == original dmw here                                   // dmw == marked_dmw here
    if (dmw->is_marked())                                         if (dmw->is_marked())
       unmark dmw                                                    unmark dmw
    obj = object()                                                obj = object()
    obj->cas_set_mark(dmw, this)                                  obj->cas_set_mark(dmw, this)
    • This diagram is the same as "T-deflate Wins First Race" except we've swapped the post cmpxchg() comments.
    • Since T-save won the race, its 'dmw' variable contains the header/dmw from the ObjectMonitor.
    • Since T-deflate lost the race, its 'dmw' variable contains the 'marked_dmw' set by T-save.
      • T-deflate will unmark its 'dmw' variable.
    • Both threads are poised to call cas_set_mark() at the same time.

Either Wins the Second Race

    T-save                                       object           T-deflate
    -------------------------------------------  +-------------+  -------------------------------------------
    install_displaced_markword_in_object() {     | mark=dmw    |  install_displaced_markword_in_object() {
      dmw = header()                             +-------------+    dmw = header()
      if (!dmw->is_marked() &&                                      if (!dmw->is_marked() &&
          dmw->hash() == 0) {                                           dmw->hash() == 0) {
        create marked_dmw                                             create marked_dmw
        dmw = cmpxchg(marked_dmw, &header, dmw)                       dmw = cmpxchg(marked_dmw, &header, dmw)
      }                                                             }
      // dmw == ...                                                 // dmw == ...
      if (dmw->is_marked())                                         if (dmw->is_marked())
        unmark dmw                                                    unmark dmw
      obj = object()                                                obj = object()
      obj->cas_set_mark(dmw, this)                                  obj->cas_set_mark(dmw, this)
    • It does not matter whether T-save or T-deflate won the cmpxchg() call so the comment does not say who won.
    • It does not matter whether T-save or T-deflate won the cas_set_mark() call; in this scenario both were trying to restore the same value.
    • The object's mark field has changed from 'om_ptr' → 'dmw'.

Please notice that install_displaced_markword_in_object() does not do any retries on any code path:

    • Instead the code adapts to being the loser in a cmpxchg() by unmarking its copy of the dmw.
    • In the second race, if a thread loses the cas_set_mark() race, there is also no need to retry because the object's header has been restored by the other thread.
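
The no-retry shape of the marked-dmw protocol in the diagrams above can be sketched as follows. This is a toy model with loudly assumed details: ToyObject, ToyMonitor, kMarkedBit and the integer encoding of mark words are all hypothetical, and the hash() == 0 check is left out to keep the sketch short.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMarkedBit = 1;  // hypothetical encoding of the "marked" bit

struct ToyObject {
    std::atomic<std::uint64_t> mark{0};    // holds om_ptr while inflated
};

struct ToyMonitor {
    std::atomic<std::uint64_t> header{0};  // the displaced mark word (dmw)
    ToyObject* obj = nullptr;
};

// om_ptr is the value expected in the object's mark while it is inflated.
void install_displaced_markword(ToyMonitor& m, std::uint64_t om_ptr) {
    std::uint64_t dmw = m.header.load();
    if ((dmw & kMarkedBit) == 0) {
        // First race: try to install the marked dmw. On failure,
        // compare_exchange_strong() stores the winner's value in 'dmw'.
        m.header.compare_exchange_strong(dmw, dmw | kMarkedBit);
    }
    dmw &= ~kMarkedBit;  // winner or loser, unmark the local copy
    // Second race: whoever loses has nothing to do, because the other
    // thread has already restored the same value.
    std::uint64_t expected = om_ptr;
    m.obj->mark.compare_exchange_strong(expected, dmw);
}
```

Neither compare-exchange result is checked for a retry: a failed first exchange just adopts the winner's value, and a failed second exchange means the object's header has already been restored.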

Hashcodes and Object Header Interference

If we have a race between a T-deflate thread and a thread trying to get/set a hashcode (T-hash), then the race is between the ObjectMonitorHandle.save_om_ptr(obj, mark) call in T-hash and the deflation protocol in T-deflate.

Start of the Race

    T-hash                     ObjectMonitor              T-deflate
    -------------------------  +-----------------------+  ----------------------------------------
    save_om_ptr() {            | owner=NULL            |  deflate_monitor_using_JT() {
      :                        | ref_count=0           | 1> cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> atomic inc ref_count     +-----------------------+
    • The data fields are at their starting values.
    • T-deflate is about to execute cmpxchg().
    • T-hash is about to increment ref_count.
    • The "1>" markers are showing where each thread is at for the ObjectMonitor box.

Racing Threads

    T-hash                     ObjectMonitor              T-deflate
    -------------------------  +-----------------------+  ------------------------------------------
    save_om_ptr() {            | owner=DEFLATER_MARKER |  deflate_monitor_using_JT() {
      :                        | ref_count=0           |    cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> atomic inc ref_count     +-----------------------+    if (contentions != 0 || waiters != 0) {
                                                            }
                                                         1> prev = cmpxchg(-max_jint, &ref_count, 0)
    • T-deflate has set the owner field to DEFLATER_MARKER.
    • The "1>" markers are showing where each thread is at for the ObjectMonitor box:
      • T-deflate is about to execute cmpxchg().
      • T-hash is about to increment the ref_count.

T-deflate Wins

If T-deflate wins the race, then T-hash will have to retry at most once.

    T-hash                     ObjectMonitor              T-deflate
    -------------------------  +-----------------------+  ------------------------------------------
    save_om_ptr() {            | owner=DEFLATER_MARKER |  deflate_monitor_using_JT() {
   1> atomic inc ref_count     | ref_count=-max_jint   |    cmpxchg(DEFLATER_MARKER, &owner, NULL)
      if (owner ==             +-----------------------+    if (contentions != 0 || waiters != 0) {
          DEFLATER_MARKER &&              ||                }
          ref_count <= 0) {               \/                prev = cmpxchg(-max_jint, &ref_count, 0)
        restore obj header     +-----------------------+ 1> if (prev == 0 &&
        atomic dec ref_count   | owner=DEFLATER_MARKER |        owner == DEFLATER_MARKER) {
   2>   return false to        | ref_count=-max_jint   |      restore obj header
        cause a retry          +-----------------------+ 2>   finish the deflation
      }                                                     }
    }                                                     }
    • T-deflate made it past the cmpxchg() of ref_count before T-hash incremented it.
    • T-deflate set the ref_count field to -max_jint and is about to make the last of the protocol checks.
    • The first ObjectMonitor box is showing the fields at this point and the "1>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-deflate sees "prev == 0 && owner == DEFLATER_MARKER" so it knows that it has won the race.
    • T-deflate restores obj header (not shown).
    • T-hash increments the ref_count.
    • T-hash observes "owner == DEFLATER_MARKER && ref_count <= 0" so it restores obj header (not shown) and decrements ref_count.
    • The second ObjectMonitor box is showing the fields at this point and the "2>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-deflate finishes the deflation work.
    • T-hash returns false to cause a retry and when T-hash retries:
      • it observes the restored object header (done by T-hash or T-deflate):
        • if the object's header does not have a hash, then generate a hash and merge it with the object's header.
        • Otherwise, extract the hash from the object's header and return it.

T-hash Wins

If T-hash wins the race, then the ref_count will cause T-deflate to bail out on deflating the monitor.

Note: header is not mentioned in any of the previous sections for simplicity.

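    T-hash                     ObjectMonitor              T-deflate
    -------------------------  +-----------------------+  ------------------------------------------
    save_om_ptr() {            | header=dmw_no_hash    |  deflate_monitor_using_JT() {
      atomic inc ref_count     | owner=DEFLATER_MARKER |    cmpxchg(DEFLATER_MARKER, &owner, NULL)
   1> if (owner ==             | ref_count=1           |    if (contentions != 0 || waiters != 0) {
          DEFLATER_MARKER &&   +-----------------------+    }
          ref_count <= 0) {               ||             1> prev = cmpxchg(-max_jint, &ref_count, 0)
      } else {                            \/                if (prev == 0 &&
   2>   save om_ptr in the     +-----------------------+        owner == DEFLATER_MARKER) {
        ObjectMonitorHandle    | header=dmw_no_hash    |    } else {
        return true            | owner=NULL            |      cmpxchg(NULL, &owner, DEFLATER_MARKER)
      }                        | ref_count=1           | 2>   bailout on deflation
    }                          +-----------------------+    }
    if save_om_ptr() {                    ||
      if no hash                          \/
        gen hash & merge       +-----------------------+
      hash = hash(header)      | header=dmw_hash       |
    }                          | owner=NULL            |
 3> atomic dec ref_count       | ref_count=1           |
    return hash                +-----------------------+

T-enter Wins By Cancellation Via DEFLATER_MARKER Swap

    T-enter                                       ObjectMonitor                T-deflate
    --------------------------------------------  +-------------------------+  --------------------------------------------
    ObjectMonitor::enter() {                      | owner=DEFLATER_MARKER   |  deflate_monitor_using_JT() {
      add_to_contentions(1)                       | contentions=1           |    try_set_owner_from(NULL, DEFLATER_MARKER)
   1> EnterI() {                                  +-------------------------+ 1> :
        if (try_set_owner_from(DEFLATER_MARKER,               ||              2> : <thread_stalls>
            Self) == DEFLATER_MARKER) {                       \/                 :
          // Add marker for cancellation          +-------------------------+    :
          add_to_contentions(1)                   | owner=Self/T-enter      |    :
          // EnterI is done                       | contentions=2           |    : <thread_resumes>
          return                                  +-------------------------+    prev = cmpxchg(&contentions, 0, -max_jint)
        }                                                     ||                 if (prev == 0) {
   2> add_to_contentions(-1)                                  \/              3> } else {
      } // enter() is done                        +-------------------------+      if (try_set_owner_from(DEFLATER_MARKER,
      : <does app work>                           | owner=Self/T-enter|NULL |          NULL) != DEFLATER_MARKER) {
   3> :                                           | contentions=1           |        add_to_contentions(-1)
      exit() monitor                              +-------------------------+      }
   4> owner = NULL                                            ||              4>   bailout on deflation
                                                              \/                 }
                                                  +-------------------------+
                                                  | owner=Self/T-enter|NULL |
                                                  | contentions=0           |
                                                  +-------------------------+
    • T-deflate has set owner to DEFLATER_MARKER.
    • T-enter has called ObjectMonitor::enter(), noticed that the owner is contended, incremented contentions, and is about to call ObjectMonitor::EnterI().
    • The first ObjectMonitor box is showing the fields at this point and the "1>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-deflate stalls after setting the owner field to DEFLATER_MARKER.
    • T-enter calls EnterI() to do the contended enter work:
      • EnterI() sets the owner field from DEFLATER_MARKER to Self/T-enter.
      • EnterI() increments contentions one extra time since it cancelled async deflation via a DEFLATER_MARKER swap.
      • Note: The extra increment also makes the return value from is_being_async_deflated() stable; the previous A-B-A algorithm would allow the contentions field to flicker from 0 → -max_jint and back to zero. With the current algorithm, a negative contentions field value is a linearization point so once it is negative, we are committed to performing async deflation.
      • T-enter owns the monitor and returns from EnterI() (contentions still has both increments).
    • The second ObjectMonitor box is showing the fields at this point and the "2>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-enter decrements contentions and returns from enter() (contentions still has the extra increment).
    • T-enter is now ready to do work that requires the monitor to be owned.
    • T-enter is doing app work (but it also could have finished and exited the monitor and it still has the extra increment).
    • T-deflate resumes, tries to set the contentions field to -max_jint and fails because contentions == 1 (the extra increment comes into play!).
    • The third ObjectMonitor box is showing the fields at this point and the "3>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-deflate tries to restore the owner field from DEFLATER_MARKER to NULL:
      • If it does not succeed, then the EnterI() call managed to cancel async deflation via a DEFLATER_MARKER swap so T-deflate decrements contentions to get rid of the extra increment that EnterI() did as a marker for this type of cancellation.
      • If it does succeed, then EnterI() did not cancel async deflation via a DEFLATER_MARKER swap and we don't have an extra increment to get rid of.
      • Note: For the previous bullet, async deflation is still cancelled because the ObjectMonitor is now busy with a contended enter.
    • T-enter finished doing app work and is about to exit the monitor (or it has already exited the monitor).
    • The fourth ObjectMonitor box is showing the fields at this point and the "4>" markers are showing where each thread is at for that ObjectMonitor box.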
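
The cancellation via DEFLATER_MARKER swap can be replayed step by step with a small model. The sketch below is only an illustration of the bullets above: ToyMonitor and the three helper functions are hypothetical names that split the T-deflate and T-enter work into the pieces that interleave, and the interleaving itself is driven by the caller rather than by real threads.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <cstdint>

// Toy stand-in for ObjectMonitor (hypothetical; not the HotSpot class).
struct ToyMonitor {
    std::atomic<std::uintptr_t> owner{0};  // 0 plays the role of NULL
    std::atomic<int> contentions{0};
};

constexpr std::uintptr_t kDeflaterMarker = 1;  // assumed encoding of DEFLATER_MARKER

// T-deflate, before the stall: NULL -> DEFLATER_MARKER.
bool deflate_claim_owner(ToyMonitor& m) {
    std::uintptr_t expected = 0;
    return m.owner.compare_exchange_strong(expected, kDeflaterMarker);
}

// T-enter, inside EnterI(): DEFLATER_MARKER -> self, plus the extra increment.
bool enter_by_marker_swap(ToyMonitor& m, std::uintptr_t self) {
    std::uintptr_t expected = kDeflaterMarker;
    if (m.owner.compare_exchange_strong(expected, self)) {
        m.contentions.fetch_add(1);  // marker for this type of cancellation
        return true;
    }
    return false;
}

// T-deflate, after it resumes. Returns true if deflation may proceed.
bool deflate_finish(ToyMonitor& m) {
    int expected = 0;
    if (m.contentions.compare_exchange_strong(expected, -INT_MAX)) {
        return true;
    }
    std::uintptr_t marker = kDeflaterMarker;
    if (!m.owner.compare_exchange_strong(marker, 0)) {
        // The owner is no longer DEFLATER_MARKER, so EnterI() cancelled the
        // deflation; remove the extra increment it left behind.
        m.contentions.fetch_sub(1);
    }
    return false;  // bail out on deflation
}
```

Replaying the diagram with these helpers (claim the owner, increment contentions for enter(), swap the marker, decrement at the end of enter(), then resume the deflater) ends with the monitor owned by the entering thread and contentions back at zero.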

An Example of Object Header Interference

After T-deflate has won the race for deflating an ObjectMonitor it has to restore the header in the associated object. Of course another thread can be trying to do something to the object's header at the same time. Isn't asynchronous work exciting?!?!

ObjectMonitor::install_displaced_markword_in_object() is called from two places so we can have a race between a T-enter thread and a T-deflate thread:

Start of the Race

    T-enter                                          object           T-deflate
    -----------------------------------------------  +-------------+  -----------------------------------------------
    install_displaced_markword_in_object(oop obj) {  | mark=om_ptr |  install_displaced_markword_in_object(oop obj) {
      dmw = header()                                 +-------------+    dmw = header()
      obj->cas_set_mark(dmw, this)                                      obj->cas_set_mark(dmw, this)
    }                                                                 }
    • The data field (mark) is at its starting value.
    • 'dmw' is a local copy in each thread.
    • T-enter and T-deflate are both calling install_displaced_markword_in_object() at the same time.
    • Both threads are poised to call cas_set_mark() at the same time.

Either Thread Wins the Race

    T-enter                                          object           T-deflate
    -----------------------------------------------  +-------------+  -----------------------------------------------
    install_displaced_markword_in_object(oop obj) {  | mark=dmw    |  install_displaced_markword_in_object(oop obj) {
      dmw = header()                                 +-------------+    dmw = header()
      obj->cas_set_mark(dmw, this)                                      obj->cas_set_mark(dmw, this)
    • It does not matter whether T-enter or T-deflate won the cas_set_mark() call; in this scenario both were trying to restore the same value.
    • The object's mark field has changed from 'om_ptr' → 'dmw'.

Please notice that install_displaced_markword_in_object() does not do any retries on any code path:

    • If a thread loses the cas_set_mark() race, there is no need to retry because the object's header has been restored by the other thread.
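
The no-retry property can be illustrated outside of HotSpot. The following is a minimal sketch and not the project code: FakeObject, restore_header() and the numeric mark values are hypothetical stand-ins, and std::atomic's compare_exchange_strong() plays the role of cas_set_mark(). Both callers try to swing the mark from 'om_ptr' to the same 'dmw' value, so the loser of the compare-and-swap has nothing left to do.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an object header; not HotSpot's markWord.
struct FakeObject {
  std::atomic<std::uintptr_t> mark;
  explicit FakeObject(std::uintptr_t m) : mark(m) {}
};

// Try to swing the mark from 'om_ptr' back to 'dmw'. Returns true if this
// caller did the restore. A false return needs no retry: the mark no longer
// holds 'om_ptr', which in this scenario means the other thread restored it.
inline bool restore_header(FakeObject& obj, std::uintptr_t om_ptr,
                           std::uintptr_t dmw) {
  std::uintptr_t expected = om_ptr;
  return obj.mark.compare_exchange_strong(expected, dmw);
}
```

Calling restore_header() twice with the same arguments models the winner followed by the loser: the first call returns true, the second returns false, and the mark holds 'dmw' either way.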

Hashcodes and Object Header Interference

There are a few races that can occur between a T-deflate thread and a thread trying to get/set a hashcode (T-hash) in an ObjectMonitor:

  1. If the object has an ObjectMonitor (i.e., is inflated) and if the ObjectMonitor has a hashcode, then the hashcode value can be carefully fetched from the ObjectMonitor and returned to the caller (T-hash). If there is a race with async deflation, then we have to retry.
  2. There are several reasons why we might have to inflate the ObjectMonitor in order to set the hashcode:
    1. The object is neutral, does not contain a hashcode and we (T-hash) lost the race to try to install a hashcode in the mark word.
    2. The object is stack locked and does not contain a hashcode in the mark word.
    3. The object has an ObjectMonitor and the ObjectMonitor does not have a hashcode.
      Note: In this case, the inflate() call on the common fall thru code path is almost always a no-op since the existing ObjectMonitor is not likely to be async deflated before inflate() sees that the object already has an ObjectMonitor and bails out.

The common fall thru code path (executed by T-hash) that inflates the ObjectMonitor in order to set the hashcode can race with an async deflation (T-deflate). After the hashcode has been stored in the ObjectMonitor, we (T-hash) check if the ObjectMonitor has been async deflated (by T-deflate). If it has, then we (T-hash) retry because we don't know if the hashcode was stored in the ObjectMonitor before the object's header was restored (by T-deflate). Retrying (by T-hash) will result in the hashcode being stored in either the object's header or in the re-inflated ObjectMonitor's header as appropriate.

Spin-Lock Monitor List Management In Theory

Use of specialized measurement code with the CR5/v2.05/8-for-jdk13 bits revealed that the gListLock contention is responsible for much of the performance degradation observed with SPECjbb2015. Consequently the primary focus of the next round of changes is/was on switching from coarse grained Thread::muxAcquire(&gListLock) and Thread::muxRelease(&gListLock) pairs to spin-lock monitor list management. Of course, since the Java Monitor subsystem is full of special cases, the spin-lock list management code has to have a number of special cases which are described here.

The Spin-Lock Monitor List management code was pushed to JDK15 using the following bug id:

JDK-8235795 replace monitor list mux{Acquire,Release}(&gListLock) with spin locks

The Async Monitor Deflation project makes a few additional changes on top of what was pushed via JDK-8235795.

The Simple Case

There is one simple case of spin-lock list management with the Java Monitor subsystem so we'll start with that code as a way to introduce the spin-lock concepts:

     L1:    while (true) {
     L2:      PaddedObjectMonitor* cur = Atomic::load(&g_block_list);
     L3:      Atomic::store(&new_blk[0]._next_om, cur);
     L4:      if (Atomic::cmpxchg(&g_block_list, cur, new_blk) == cur) {
     L5:        Atomic::add(&om_list_globals.population, _BLOCKSIZE - 1);
     L6:        break;
     L7:      }
     L8:    }

...

    • prepends a 'new_blk' to the front of 'g_block_list'
    • increments the 'om_list_globals.population' counter to include the number of new elements

...

    • L2 loads the current 'g_block_list' value into 'cur'.
    • L3 stores 'cur' into the 0th element's next field for 'new_blk'.
    • L4 is the critical decision point for this list update. cmpxchg will change 'g_block_list' to 'new_blk' iff 'g_block_list' == 'cur' (publish it).
      • if the cmpxchg return value is 'cur', then we succeeded with the list update and we atomically update 'om_list_globals.population' to match.
      • Otherwise we loop around and do everything again from L2. This is the "spin" part of spin-lock. (smile)

...

to achieve the safe update of the 'g_block_list' value; the atomic increment of the 'om_list_globals.population' counter is considered to be just accounting (pun intended).
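
The load, link, cmpxchg, retry shape of the simple case can be tried outside of HotSpot. This is a minimal sketch under stated assumptions: Block, BlockList and prepend_block() are hypothetical names, std::atomic stands in for HotSpot's Atomic class, and the 'size - 1' accounting assumes (as _BLOCKSIZE - 1 suggests) that one element of each block is a header that is not counted.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a block of monitors. The block header carries
// the link to the next block.
struct Block {
  Block* next = nullptr;
  std::size_t size = 0;  // number of elements, including the header
};

struct BlockList {
  std::atomic<Block*> head{nullptr};
  std::atomic<std::size_t> population{0};
};

inline void prepend_block(BlockList& list, Block* new_blk) {
  Block* cur = list.head.load();
  do {
    // Link the new block to the head we last observed.
    new_blk->next = cur;
    // Publish only if the head is still 'cur'. On failure 'cur' is
    // refreshed with the current head and we go around again.
  } while (!list.head.compare_exchange_weak(cur, new_blk));
  // Just accounting: the list itself was already updated safely.
  list.population.fetch_add(new_blk->size - 1);
}
```

The plain store to 'next' is safe here only because the block is not reachable by other threads until the compare-and-swap publishes it.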

...

Note: This subsection is talking about "Simple Take" and "Simple Prepend" in abstract terms. The purpose of this code and A-B-A example is to introduce the race concepts. The code shown here is not an exact match for the project code and the specific A-B-A example is not (currently) found in the project code.

...

The purpose of this subsection is to provide background information about how ObjectMonitors move between the various lists. This project changes the way these movements are implemented, but does not change the movements themselves. For example, newly allocated blocks of ObjectMonitors are always prepended to the global free list; this is true in the baseline and is true in this project. One exception is the optional addition of the global wait list (see below).

...

    • ObjectMonitors are deflated at a safepoint by:
          ObjectSynchronizer::deflate_monitor_list() calling ObjectSynchronizer::deflate_monitor()
      And when Async Monitor Deflation is enabled, they are deflated by:
          ObjectSynchronizer::deflate_monitor_list_using_JT() calling ObjectSynchronizer::deflate_monitor_using_JT()

    • Idle ObjectMonitors are deflated by the ServiceThread when Async Monitor Deflation is enabled. They can also be deflated at a safepoint by the VMThread or by a task worker thread. Safepoint deflation is used when Async Monitor Deflation is disabled or when there is a special deflation request, e.g., System.gc().

    • An idle ObjectMonitor is deflated and extracted from its in-use list and prepended to the global wait list. The in-use list can be either the global in-use list or a per-thread in-use list. Deflated ObjectMonitors are always prepended to the global wait list.

      • The om_list_globals.wait_list allows ObjectMonitors to be safely deflated without reuse races.
      • After a handshake/safepoint with all JavaThreads, the ObjectMonitors on the om_list_globals.wait_list are prepended to the global free list.

...

    • global free list:
      • prepended to by JavaThreads that allocated a new block of ObjectMonitors (malloc time)
      • prepended to by JavaThreads that are exiting (and have a non-empty per-thread free list)
      • taken from the head by JavaThreads that need to allocate ObjectMonitor(s) for their per-thread free list (reprovision)
      • prepended to by deflation done by:
        • either the VMThread or a worker thread for safepoint based
        • or the ServiceThread for async monitor deflation
    • global in-use list:
      • prepended to by JavaThreads that are exiting (and have a non-empty per-thread in-use list)
      • extracted from by deflation done by:
        • either the VMThread or a worker thread for safepoint based
        • or the ServiceThread for async monitor deflation
    • global wait list:
      • prepended to by the ServiceThread during async deflation
      • entire list detached and prepended to the global free list by the ServiceThread during async deflation
      • Note: The global wait list serves the same function as Carsten's gFreeListNextSafepoint list in his prototype.
    • per-thread free list:
      • prepended to by a JavaThread when it needs to allocate new ObjectMonitor(s) (reprovision)
      • taken from the head by a JavaThread when it needs to allocate a new ObjectMonitor (inflation)
      • prepended to by a JavaThread when it isn't able to link the object to the ObjectMonitor (failed inflation)
      • entire list detached and prepended to the global free list when the JavaThread is exiting
    • per-thread in-use list:
      • prepended to by a JavaThread when it allocates a new ObjectMonitor (inflation, optimistically in-use)
      • extracted from by deflation done by:
        • either the VMThread or a worker thread for safepoint based
        • or the ServiceThread for async monitor deflation
      • entire list detached and prepended to the global in-use list when the JavaThread is exiting

...

    L01:    while (true) {
    L02:      om_lock(m);  // Lock m so we can safely update its next field.
    L03:      ObjectMonitor* cur = NULL;
    L04:      // Lock the list head to guard against A-B-A race:
    L05:      if ((cur = get_list_head_locked(list_p)) != NULL) {
    L06:        // List head is now locked so we can safely switch it.
    L07:        m->set_next_om(cur);  // m now points to cur (and unlocks m)
    L08:        Atomic::store(list_p, m);  // Switch list head to unlocked m.
    L09:        om_unlock(cur);
    L10:        break;
    L11:      }
    L12:      // The list is empty so try to set the list head.
    L13:      assert(cur == NULL, "cur must be NULL: cur=" INTPTR_FORMAT, p2i(cur));
    L14:      m->set_next_om(cur);  // m now points to NULL (and unlocks m)
    L15:      if (Atomic::cmpxchg(list_p, cur, m) == cur) {
    L16:        // List head is now unlocked m.
    L17:        break;
    L18:      }
    L19:      // Implied else: try it all again
    L20:    }
    L21:    Atomic::inc(count_p);

...

ObjectMonitor 'm' is safely on the list at the point that we have updated 'list_p' to refer to 'm'. In this subsection's block of code, we also called three new functions: om_lock(), get_list_head_locked() and set_next_om(), that are explained in the next few subsections about helper functions.

Note: The above code snippet comes from prepend_to_common(); see that function for more context and a few more comments.

try_om_lock(), mark_om_ptr(), and set_next_om() Helper Functions

Managing spin-locks on ObjectMonitors has been abstracted into a few helper functions. try_om_lock() is the first interesting one:

    L1:  static bool try_om_lock(ObjectMonitor* om) {
    L2:    // Get current next field without any OM_LOCK_BIT value.
    L3:    ObjectMonitor* next = unmarked_next(om);
    L4:    if (om->try_set_next_om(next, mark_om_ptr(next)) != next) {
    L5:      return false;  // Cannot lock the ObjectMonitor.
    L6:    }
    L7:    return true;
    L8:  }

...

    • L2 casts the ObjectMonitor* into a type that will allow the '|' operator to be used.
    • We use the 0x1 (OM_LOCK_BIT) bit as our locking value because ObjectMonitors are aligned on a cache line so the low order bit is not used by the normal addressing of an ObjectMonitor*.
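
The low-order-bit trick can be sketched in isolation. This is an illustration under stated assumptions, not the HotSpot code: Node, kLockBit and the free functions below are hypothetical, alignas(64) stands in for the cache line alignment of ObjectMonitors, and unlock() uses fetch_and() rather than whatever sequence the project's om_unlock() uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kLockBit = 0x1;  // plays the role of OM_LOCK_BIT

// Hypothetical node type. The alignment guarantees that the low-order
// address bits of any Node* are zero, so bit 0 is free to use as a lock.
struct alignas(64) Node {
  std::atomic<std::uintptr_t> next_bits{0};  // next pointer plus lock bit
};

// Read the next pointer with the lock bit stripped.
inline Node* unmarked_next(const Node& n) {
  return reinterpret_cast<Node*>(n.next_bits.load() & ~kLockBit);
}

inline bool is_locked(const Node& n) {
  return (n.next_bits.load() & kLockBit) != 0;
}

// Succeeds only if the field currently holds the unmarked next value.
inline bool try_lock(Node& n) {
  std::uintptr_t expected = reinterpret_cast<std::uintptr_t>(unmarked_next(n));
  return n.next_bits.compare_exchange_strong(expected, expected | kLockBit);
}

// Clear the lock bit and keep the next pointer.
inline void unlock(Node& n) {
  n.next_bits.fetch_and(~kLockBit);
}

// Storing an unmarked pointer replaces the next value and drops the lock.
inline void set_next(Node& n, Node* value) {
  n.next_bits.store(reinterpret_cast<std::uintptr_t>(value));
}
```

A second try_lock() on an already locked node fails because the stored value no longer equals the unmarked pointer, which is what makes the cmpxchg usable as a spin-lock attempt.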

set_next_om() is the next interesting function and it also only needs a quick explanation:

    L1:  inline void ObjectMonitor::set_next_om(ObjectMonitor* value) {
    L2:    Atomic::store(&_next_om, value);
    L3:  }

...

    • This function is simply a wrapper around a store of an ObjectMonitor* into the next field in an ObjectMonitor.
    • The typical "cur->set_next_om(next)" call sequence is easier to read than "OrderAccess::release_store(&cur->_next_om, next)".

...

    L01:    if (from_per_thread_alloc) {
    L02:      if ((mid = get_list_head_locked(&self->om_in_use_list)) == NULL) {
    L03:        fatal("thread=" INTPTR_FORMAT " in-use list must not be empty.", p2i(self));
    L04:      }
    L05:      next = unmarked_next(mid);
    L06:      if (m == mid) {
    L07:        Atomic::store(&self->om_in_use_list, next);
    L08:      } else if (m == next) {
    L09:        mid = next;
    L10:        om_lock(mid);
    L11:        next = unmarked_next(mid);
    L12:        self->om_in_use_list->set_next_om(next);
    L13:      } else {
    L14:        ObjectMonitor* anchor = next;
    L15:        om_lock(anchor);
    L16:        om_unlock(mid);
    L17:        while ((mid = unmarked_next(anchor)) != NULL) {
    L18:          if (m == mid) {
    L19:            next = unmarked_next(mid);
    L20:            anchor->set_next_om(next);
    L21:            break;
    L22:          } else {
    L23:            om_lock(mid);
    L24:            om_unlock(anchor);
    L25:            anchor = mid;
    L26:          }
    L27:        }
    L28:      }
    L29:      Atomic::dec(&self->om_in_use_count);
    L30:      om_unlock(mid);
    L31:    }
    L32:    prepend_to_om_free_list(self, m);

...

    • L02 is used to lock self's in-use list head:
      • 'mid' is self's in-use list head and it is locked.
    • L05 'next' is the unmarked next field from 'mid'.
    • L06 → L07: handle first special case where the target ObjectMonitor 'm' matches the list head.
    • L08 → L12: handle second special case where the target ObjectMonitor 'm' matches next after the list head.
    • L14 → L30: self's in-use list is traversed looking for the target ObjectMonitor 'm':
      • L18: if the current 'mid' matches 'm':
        • L19: get the next after 'm'
        • L20: update the anchor to refer to the next after 'm'
        • L21: break out since we found a match
      • else
        • L23: lock the current 'mid'
        • L24-L25: unlock the current anchor and advance to the new anchor
        • loop around and try again
    • L29 → L30: we've successfully extracted 'm' from self's in-use list so we decrement self's in-use counter, unlock 'mid' and we're done.

The last line of the code block (L32) prepends 'm' to self's free list.
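
The lock-ahead-then-unlock-behind walk can be modelled with much simpler machinery. The sketch below is an assumption-laden illustration, not the project code: Node, lock(), unlock(), lock_list_head() and extract() are hypothetical, each node carries a separate std::atomic<bool> spin-lock instead of a lock bit in its next field, and the second special case from the code above is folded into the general walk.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical list node with its own spin-lock flag.
struct Node {
  std::atomic<bool> locked{false};
  Node* next = nullptr;  // only read or written while this node is locked
};

inline void lock(Node* n) {
  while (n->locked.exchange(true, std::memory_order_acquire)) { /* spin */ }
}

inline void unlock(Node* n) {
  n->locked.store(false, std::memory_order_release);
}

// Lock and return the current list head, or nullptr if the list is empty.
inline Node* lock_list_head(std::atomic<Node*>& head) {
  while (true) {
    Node* h = head.load();
    if (h == nullptr) return nullptr;
    lock(h);
    if (head.load() == h) return h;  // still the head and now locked
    unlock(h);                       // the head moved; try again
  }
}

// Unlink 'target' from the list. Returns true if it was found.
inline bool extract(std::atomic<Node*>& head, Node* target) {
  Node* mid = lock_list_head(head);
  if (mid == nullptr) return false;
  if (mid == target) {
    head.store(mid->next);  // special case: target is the list head
  } else {
    Node* anchor = mid;     // locked node just behind the one we inspect
    while ((mid = anchor->next) != nullptr) {
      lock(mid);            // lock ahead before letting go behind
      if (mid == target) {
        anchor->next = mid->next;  // unlink target
        break;
      }
      unlock(anchor);
      anchor = mid;         // advance to the new anchor
    }
    unlock(anchor);
    if (mid == nullptr) return false;  // reached the end without a match
  }
  unlock(mid);  // target is off the list; its lock no longer matters
  return true;
}
```

Because a walker never releases the node behind it until the node ahead is locked, another walker that starts later cannot pass it, which is the property the anchor handling above relies on.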

...

    L01:  int ObjectSynchronizer::deflate_monitor_list(ObjectMonitor** list_p,
    L02:                                               int* count_p,
    L03:                                               ObjectMonitor** free_head_p,
    L04:                                               ObjectMonitor** free_tail_p) {
    L05:    ObjectMonitor* cur_mid_in_use = NULL;
    L06:    ObjectMonitor* mid = NULL;
    L07:    ObjectMonitor* next = NULL;
    L08:    int deflated_count = 0;
    L09:    if ((mid = get_list_head_locked(list_p)) == NULL) {
    L10:      return 0;  // The list is empty so nothing to deflate.
    L11:    }
    L12:    next = unmarked_next(mid);
    L13:    while (true) {
    L14:      oop obj = (oop) mid->object();
    L15:      if (obj != NULL && deflate_monitor(mid, obj, free_head_p, free_tail_p)) {
    L16:        if (cur_mid_in_use == NULL) {
    L17:          Atomic::store(list_p, next);
    L18:        } else {
    L19:          cur_mid_in_use->set_next_om(next);
    L20:        }
    L21:        deflated_count++;
    L22:        Atomic::dec(count_p);
    L23:        mid->set_next_om(NULL);
    L24:      } else {
    L25:        om_unlock(mid);
    L26:        cur_mid_in_use = mid;
    L27:      }
    L28:      mid = next;
    L29:      if (mid == NULL) {
    L30:        break;  // Reached end of the list so nothing more to deflate.
    L31:      }
    L32:      om_lock(mid);
    L33:      next = unmarked_next(mid);
    L34:    }
    L35:    return deflated_count;
    L36:  }

Note: The above version of deflate_monitor_list() uses locking, but those changes were dropped during the code review cycle for JDK-8235795. The locking is only needed when additional calls to audit_and_print_stats() are used during debugging so it was decided that the pushed version would be simpler.

The above is not an exact copy of the code block from deflate_monitor_list(), but it is the highlights. What the above code block needs to do is pretty simple:

...

ObjectSynchronizer::deflate_monitor_list_using_JT() is responsible for asynchronously deflating idle ObjectMonitors using a JavaThread. This function uses the more complicated lock-cur_mid_in_use-and-mid-as-we-go protocol because om_release() can do list deletions in parallel. We also lock-next-next-as-we-go to prevent an om_flush() that is behind this thread from passing us. Because this function can asynchronously interact with so many other functions, this is the largest clip of code:

    L01:  int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor** list_p,
    L02:                                                        int* count_p,
    L03:                                                        ObjectMonitor** free_head_p,
    L04:                                                        ObjectMonitor** free_tail_p,
    L05:                                                        ObjectMonitor** saved_mid_in_use_p) {
    L06:    JavaThread* self = JavaThread::current();
    L07:    ObjectMonitor* cur_mid_in_use = NULL;
    L08:    ObjectMonitor* mid = NULL;
    L09:    ObjectMonitor* next = NULL;
    L10:    ObjectMonitor* next_next = NULL;
    L11:    int deflated_count = 0;
    L12:    NoSafepointVerifier nsv;
    L13:    if (*saved_mid_in_use_p == NULL) {
    L14:      if ((mid = get_list_head_locked(list_p)) == NULL) {
    L15:        return 0;  // The list is empty so nothing to deflate.
    L16:      }
    L17:      next = unmarked_next(mid);
    L18:    } else {
    L19:      cur_mid_in_use = *saved_mid_in_use_p;
    L20:      om_lock(cur_mid_in_use);
    L21:      mid = unmarked_next(cur_mid_in_use);
    L22:      if (mid == NULL) {
    L23:        om_unlock(cur_mid_in_use);
    L24:        *saved_mid_in_use_p = NULL;
    L25:        return 0;  // The remainder is empty so nothing more to deflate.
    L26:      }
    L27:      om_lock(mid);
    L28:      next = unmarked_next(mid);
    L29:    }
    L30:    while (true) {
    L31:      if (next != NULL) {
    L32:        om_lock(next);
    L33:        next_next = unmarked_next(next);
    L34:      }
    L35:      if (mid->object() != NULL && mid->is_old() &&
    L36:          deflate_monitor_using_JT(mid, free_head_p, free_tail_p)) {
    L37:        if (cur_mid_in_use == NULL) {
    L38:          Atomic::store(list_p, next);
    L39:        } else {
    L40:          ObjectMonitor* locked_next = mark_om_ptr(next);
    L41:          cur_mid_in_use->set_next_om(locked_next);
    L42:        }
    L43:        deflated_count++;
    L44:        Atomic::dec(count_p);
    L45:        mid->set_next_om(NULL);
    L46:        mid = next;  // mid keeps non-NULL next's locked state
    L47:        next = next_next;
    L48:      } else {
    L49:        if (cur_mid_in_use != NULL) {
    L50:          om_unlock(cur_mid_in_use);
    L51:        }
    L52:        cur_mid_in_use = mid;
    L53:        mid = next;  // mid keeps non-NULL next's locked state
    L54:        next = next_next;
    L55:        if (SafepointMechanism::should_block(self) &&
    L56:            cur_mid_in_use != Atomic::load(list_p) && cur_mid_in_use->is_old()) {
    L57:          *saved_mid_in_use_p = cur_mid_in_use;
    L58:          om_unlock(cur_mid_in_use);
    L59:          if (mid != NULL) {
    L60:            om_unlock(mid);
    L61:          }
    L62:          return deflated_count;
    L63:        }
    L64:      }
    L65:      if (mid == NULL) {
    L66:        if (cur_mid_in_use != NULL) {
    L67:          om_unlock(cur_mid_in_use);
    L68:        }
    L69:        break;  // Reached end of the list so nothing more to deflate.
    L70:      }
    L71:    }
    L72:    *saved_mid_in_use_p = NULL;
    L73:    return deflated_count;
    L74:  }

The above is not an exact copy of the code block from deflate_monitor_list_using_JT(), but it is the highlights. What the above code block needs to do is pretty simple:

...

Since we're using the more complicated lock-cur_mid_in_use-and-mid-as-we-go protocol and also the lock-next-next-as-we-go protocol, there is a mind numbing amount of detail:

    • L1[3-7]: Handle the initial setup if we are not resuming after a safepoint or a handshake:
      • L14: locks the 'list_p' head (if it is not empty).
      • L17: 'next' is the unmarked next field from 'mid'.
    • L18-L28: Handle the initial setup if we are resuming after a safepoint or a handshake:
      • L20: lock 'cur_mid_in_use'
      • L21: update 'mid'
      • L22-L25: If 'mid' == NULL, then we've resumed context at the end of the list so we're done.
      • L27: lock 'mid'
      • L28: update 'next'
    • L30-L71: We walk each 'mid' in the list and determine if it can be deflated:
      • L3[1-3]: if next != NULL, then lock 'next' and update 'next_next'
      • L35-L47: if 'mid' is associated with an object, 'mid' is old, and can be deflated:
        • L37: if cur_mid_in_use is NULL, we're still processing the head of the in-use list so...
          • L38: we store the list head to 'next'.
        • else
          • L40: make a locked copy of 'next'
          • L41: we set cur_mid_in_use's next field to 'locked_next'.
        • L43 → L45: we've successfully extracted 'mid' from 'list_p's list so we increment 'deflated_count', decrement the counter referred to by 'count_p', set 'mid's next field to NULL and we're done.
          Note: 'mid' is the current tail in the 'free_head_p' list so we have to NULL terminate it (which also unlocks it).
        • L46: advance 'mid' to 'next'.
          Note: 'mid' keeps non-NULL 'next's locked state
        • L47: advance 'next' to 'next_next'.
      • L48-L63: 'mid' can't be deflated so we have to carefully advance the list pointers:
        • L49-L50: if cur_mid_in_use != NULL, then unlock 'cur_mid_in_use'.
        • L52: advance 'cur_mid_in_use' to 'mid'.
          Note: 'mid' is still locked and 'cur_mid_in_use' keeps that state.
        • L53: advance 'mid' to 'next'.
          Note: A non-NULL 'next' is still locked and 'mid' keeps that state.
        • L54: advance 'next' to 'next_next'.
        • L55-L62: Handle a safepoint or a handshake if one has started and it is safe to do so.
      • L65-L69: we reached the end of the list:
        • L6[67]: if cur_mid_in_use != NULL, then unlock 'cur_mid_in_use'.
        • L69: break out of the loop because we are done
    • L72: not pausing for a safepoint or handshake so clear saved state.
    • L73: all done so return 'deflated_count'.
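
Stripped of all locking, the save-and-resume part of the walk looks roughly like the sketch below. It is a single-threaded illustration only: Node, deflate_list() and the should_pause callback are hypothetical, 'idle' stands in for "can be deflated", and the real function's extra conditions on when it is safe to pause (cur_mid_in_use is not the list head and is old) are left out.

```cpp
#include <cassert>
#include <functional>

// Hypothetical list node; 'idle' marks a node that can be deflated.
struct Node {
  bool idle;
  Node* next;
};

// Unlink idle nodes from the list. If should_pause() reports a pending
// request after a node has been kept, remember that node in *saved_p and
// return early; a later call resumes with the node that follows it.
inline int deflate_list(Node** list_p, Node** saved_p,
                        const std::function<bool()>& should_pause) {
  Node* prev = *saved_p;  // last node kept on the list, if resuming
  Node* mid = (prev == nullptr) ? *list_p : prev->next;
  int deflated = 0;
  while (mid != nullptr) {
    Node* next = mid->next;
    if (mid->idle) {
      if (prev == nullptr) {
        *list_p = next;     // still at the head of the list
      } else {
        prev->next = next;  // unlink 'mid' from behind
      }
      mid->next = nullptr;
      ++deflated;
    } else {
      prev = mid;
      if (should_pause()) {
        *saved_p = prev;    // resume after this node next time
        return deflated;
      }
    }
    mid = next;
  }
  *saved_p = nullptr;       // finished the list; clear the saved state
  return deflated;
}
```

The saved node is always one that stays on the list, which is why the walk only pauses in the branch where 'mid' was not deflated.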

...

ObjectSynchronizer::deflate_idle_monitors() handles deflating idle monitors at a safepoint from the global in-use list using ObjectSynchronizer::deflate_monitor_list(). There are only a few things that are worth mentioning:

    • Atomic::load(&om_list_globals.in_use_list) is used to get the latest global in-use list.
    • Atomic::load(&om_list_globals.in_use_count) is used to get the latest global in-use count.
    • prepend_list_to_global_free_list(free_head_p, free_tail_p, deflated_count) is used to prepend the deflated ObjectMonitors on the global free list.

...

ObjectSynchronizer::deflate_common_idle_monitors_using_JT() handles asynchronously deflating idle monitors from either the global in-use list or a per-thread in-use list using ObjectSynchronizer::deflate_monitor_list_using_JT(). There are only a few things that are worth mentioning:

    • Atomic::load(&om_list_globals.in_use_count) is used to get the latest global in-use count.
    • Atomic::load(&target->om_in_use_count) is used to get the latest per-thread in-use count.
    • prepend_list_to_global_free_list(free_head_p, free_tail_p, local_deflated_count) is used to prepend the deflated ObjectMonitors on the global free list.

...

  • New diagnostic option '-XX:AsyncDeflateIdleMonitors' that is default 'true' so that the new mechanism is used by default, but it can be disabled for potential failure diagnosis.
  • ObjectMonitor deflation is still initiated or signaled as needed at a safepoint. When Async Monitor Deflation is in use, flags are set so that the work is done by the ServiceThread which offloads the safepoint cleanup mechanism.
    • Having the ServiceThread deflate a potentially long list of in-use monitors could potentially delay the start of a safepoint. This is detected in ObjectSynchronizer::deflate_monitor_list_using_JT() which will save the current state when it is safe to do so and return to its caller to drop locks as needed before honoring the safepoint request.
  • New diagnostic option '-XX:AsyncDeflationInterval' that is default 250 millis; this option controls how frequently we async deflate idle monitors when MonitorUsedDeflationThreshold is exceeded.
  • Everything else is just monitor list management, infrastructure, logging, debugging and the like. :-)

...

    • For this option, exceeded means:

   ((om_list_globals.population - om_list_globals.free_count) / om_list_globals.population) > NN%
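
As a worked example of the "exceeded" test, here is a small sketch. The function name, the integer percentage arithmetic and the treatment of an empty population or a non-positive threshold as "not exceeded" are assumptions made for illustration; only the ratio itself comes from the formula above.

```cpp
#include <cassert>

// Returns true when the percentage of ObjectMonitors that are not on the
// free list is strictly greater than threshold_percent.
inline bool monitors_used_above_threshold(int population, int free_count,
                                          int threshold_percent) {
  if (population <= 0 || threshold_percent <= 0) {
    return false;  // assumption: nothing to measure or the check is off
  }
  long long in_use = static_cast<long long>(population) - free_count;
  int used_percent = static_cast<int>((in_use * 100) / population);
  return used_percent > threshold_percent;
}
```

With a population of 1000 and 50 free ObjectMonitors, 95% are in use, so a threshold of 90 is exceeded; with 100 free the usage is exactly 90% and the threshold is not exceeded.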


...

  • Changes to the safepoint deflation mechanism by the Async Monitor Deflation project (when async deflation is enabled):
    • If System.gc() is called, then a special deflation request is made which invokes the safepoint deflation mechanism.
    • Added the AsyncDeflationInterval diagnostic option (default 250 millis, 0 means off) to prevent MonitorUsedDeflationThreshold requests from swamping the ServiceThread.
      • Description: Async deflate idle monitors every so many milliseconds when MonitorUsedDeflationThreshold is exceeded (0 is off).
      • A special deflation request can cause an async deflation to happen sooner than AsyncDeflationInterval.
    • SafepointSynchronize::is_cleanup_needed() now calls:
      • ObjectSynchronizer::is_safepoint_deflation_needed() instead of ObjectSynchronizer::is_cleanup_needed().
      • is_safepoint_deflation_needed() returns true only if a special deflation request is made (see above).
    • SafepointSynchronize::do_cleanup_tasks() now (indirectly) calls:
      • ObjectSynchronizer::do_safepoint_work() instead of ObjectSynchronizer::deflate_idle_monitors().
      • do_cleanup_tasks() can be called for non deflation related cleanup reasons and that will still result in a call to do_safepoint_work().
    • ObjectSynchronizer::do_safepoint_work() only does the safepoint cleanup tasks if there is a special deflation request. Otherwise it just sets the is_async_deflation_requested flag and notifies the ServiceThread.
    • ObjectSynchronizer::deflate_idle_monitors() and ObjectSynchronizer::deflate_thread_local_monitors() do nothing unless there is a special deflation request.
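
The request flow described in this list can be modeled with a few flags. The sketch below is a simplified model, not the HotSpot implementation: DeflationState, its fields and the counters are hypothetical, clearing the special request after it is serviced is an assumption, and so is letting a pending request bypass the interval check.

```cpp
#include <cassert>

// Hypothetical model of the flags described above.
struct DeflationState {
  bool special_deflation_requested = false;  // e.g. set by System.gc()
  bool async_deflation_requested = false;    // picked up by the ServiceThread
  int safepoint_deflations = 0;              // safepoint-based passes performed
  int service_thread_notifications = 0;      // times the ServiceThread was notified
};

// Models is_safepoint_deflation_needed(): only a special request qualifies.
bool is_safepoint_deflation_needed(const DeflationState& s) {
  return s.special_deflation_requested;
}

// Models do_safepoint_work(): deflate at the safepoint only for a special
// request; otherwise set the async flag and notify the ServiceThread.
void do_safepoint_work(DeflationState& s) {
  if (s.special_deflation_requested) {
    ++s.safepoint_deflations;
    s.special_deflation_requested = false;  // assumption: request is consumed
  } else {
    s.async_deflation_requested = true;
    ++s.service_thread_notifications;
  }
}

// Models the AsyncDeflationInterval throttle: a pending request is honored
// immediately (assumption); otherwise the threshold must be exceeded and
// more than interval_ms must have elapsed. An interval of 0 means off.
bool is_async_deflation_needed(const DeflationState& s, long elapsed_ms,
                               long interval_ms, bool threshold_exceeded) {
  assert(elapsed_ms >= 0 && interval_ms >= 0);
  if (s.async_deflation_requested) {
    return true;
  }
  return interval_ms > 0 && elapsed_ms > interval_ms && threshold_exceeded;
}
```

In this model an ordinary cleanup safepoint never deflates anything itself; it only leaves a request behind for the ServiceThread, which is the offloading the list describes.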

...

  • Other invocation changes by the Async Monitor Deflation project (when async deflation is enabled):

    • VM_Exit::doit_prologue() will request a special cleanup to reduce the noise in 'monitorinflation' logging at VM exit time.

    • Before the final safepoint in a non-System.exit() end to the VM, we will request a special cleanup to reduce the noise in 'monitorinflation' logging at VM exit time.

    • The following whitebox test functions will request a special cleanup:
      • WB_G1StartMarkCycle()

      • WB_FullGC()
      • WB_ForceSafepoint()

Gory Details

  • Counterpart function mapping for those that know the existing code:
    • ObjectSynchronizer class:
      • deflate_idle_monitors() has deflate_idle_monitors_using_JT(), deflate_global_idle_monitors_using_JT(), deflate_per_thread_idle_monitors_using_JT(), and deflate_common_idle_monitors_using_JT().
      • deflate_monitor_list() has deflate_monitor_list_using_JT()
      • deflate_monitor() has deflate_monitor_using_JT()
    • ObjectMonitor class:
      • clear() has clear_using_JT()
  • These functions recognize the Async Monitor Deflation protocol and adapt their operations:
    • ObjectMonitor::enter()
    • ObjectMonitor::EnterI()
    • ObjectMonitor::ReenterI()
    • ObjectSynchronizer::quick_enter()
    • ObjectSynchronizer::deflate_monitor()
    • Note: These changes include handling the lingering owner == DEFLATER_MARKER value.
  • Also these functions had to adapt and retry their operations:
    • ObjectSynchronizer::FastHashCode()
    • ObjectSynchronizer::current_thread_holds_lock()
    • ObjectSynchronizer::query_lock_ownership()
    • ObjectSynchronizer::get_lock_owner()
    • ObjectSynchronizer::monitors_iterate()
    • ObjectSynchronizer::inflate_helper()
    • ObjectSynchronizer::inflate() 
  • Various assertions had to be modified to pass without their real check when AsyncDeflateIdleMonitors is true; this is due to the change in semantics for the ObjectMonitor owner field.
  • ObjectMonitor has a new allocation_state field that supports three states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied to ObjectMonitors that have reached the 'Old' state.
    • Note: Prior to CR1/v2.01/4-for-jdk13, the allocation state was transitioned from 'New' to 'Old' in deflate_monitor_via_JT(). This meant that deflate_monitor_via_JT() had to see an ObjectMonitor twice before deflating it. This policy was intended to prevent oscillation from 'New' → 'Old' and back again.
    • In CR1/v2.01/4-for-jdk13, the allocation state is transitioned from 'New' → 'Old' in inflate(). This makes ObjectMonitors available for deflation earlier. So far there have been no signs of oscillation from 'New' → 'Old' and back again.
  • ObjectMonitor has a new ref_count field that is used as part of the async deflation protocol and to indicate that an ObjectMonitor* is in use so the ObjectMonitor should not be deflated; this is needed for operations on non-busy monitors so that ObjectMonitor values don't change while they are being queried. There is a new ObjectMonitorHandle helper to manage the ref_count.
  • The ObjectMonitor::owner() accessor detects DEFLATER_MARKER and returns NULL in that case to minimize the places that need to understand the new DEFLATER_MARKER value.
  • System.gc()/JVM_GC() causes a special monitor list cleanup request which uses the safepoint based monitor list mechanism. So even if AsyncDeflateIdleMonitors is enabled, the safepoint based mechanism is still used by this special case.
    • This is necessary for those tests that do something to cause an object's monitor to be inflated, clear the only reference to the object and then expect that enough System.gc() calls will eventually cause the object to be GC'ed even when the thread never inflates another object's monitor. Yes, we have several tests like that. :-)
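
The owner() accessor behavior noted in the list above (returning NULL when the field holds DEFLATER_MARKER) can be sketched as follows. This is an illustration under stated assumptions, not the ObjectMonitor code: MonitorModel, owner_raw() and try_set_owner_from() are hypothetical names, and the sentinel here is simply the address of a private static object because the real DEFLATER_MARKER value is not given in this text.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sentinel: any address that can never be a real owner.
static char deflater_marker_storage;
static void* const DEFLATER_MARKER = &deflater_marker_storage;

class MonitorModel {
 public:
  // Hides the marker from ordinary callers: a monitor that is being (or has
  // been) async deflated looks unowned instead of owned by a bogus thread.
  void* owner() const {
    void* v = owner_.load(std::memory_order_acquire);
    return v == DEFLATER_MARKER ? nullptr : v;
  }

  // Raw view for the few code paths that must recognize the marker itself.
  void* owner_raw() const {
    return owner_.load(std::memory_order_acquire);
  }

  // Installs new_value only if the field currently holds expected.
  bool try_set_owner_from(void* expected, void* new_value) {
    assert(expected != new_value);
    return owner_.compare_exchange_strong(expected, new_value);
  }

 private:
  std::atomic<void*> owner_{nullptr};
};
```

Keeping the translation inside one accessor means most callers can keep treating "no owner" as NULL, while code that participates in the deflation protocol uses the raw value.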