
Note: Updating the wiki for the upcoming CR7/v2.07/10-for-jdk14 review cycle. Changes have been made, but not yet sanity checked.

Table of Contents

...

RFE: 8153224 Monitor deflation prolong safepoints
         https://bugs.openjdk.java.net/browse/JDK-8153224

Full Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.full/

Inc Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/10-for-jdk14.v2.07.inc/

Background

This patch for Async Monitor Deflation is based on Carsten Varming's

...

    • This diagram starts after "Racing Threads".
    • The "1>" markers are showing where each thread is at for that ObjectMonitor box:
      • T-save and T-deflate both observe owner == DEFLATER_MARKER and a negative ref_count field.
    • T-save has lost the race: it restores the obj header (not shown) and decrements the ref_count.
    • T-deflate restores the obj header (not shown).
    • The "2>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-save returns false to cause the caller to retry.
    • T-deflate finishes the deflation.
    • Note: As of CR5/v2.05/8-for-jdk13, the owner == DEFLATER_MARKER value is allowed to linger until a deflated ObjectMonitor is reused for an enter operation. This narrows the race window between the C2 ObjectMonitor enter optimization and async deflation.

T-save Wins

...

    • This diagram starts after "Racing Threads".
    • The "1>" markers are showing where each thread is at for the ObjectMonitor box:
      • T-save and T-deflate both observe a ref_count field > 0.
    • T-save has won the race and it saves the ObjectMonitor* in the ObjectMonitorHandle (not shown).
    • T-deflate detects that it has lost the race (prev != 0) and bails out on deflating the ObjectMonitor:
      • Before bailing out T-deflate tries to restore the owner field to NULL if it is still DEFLATER_MARKER.
    • The "2>" markers are showing where each thread is at for that ObjectMonitor box.

T-save Complication with C2

Sorry in advance for the sudden deep dive into really gory C2 details, but this complication affects the majority of save_om_ptr() so this is the right place to talk about it.

As of CR7/v2.07/10-for-jdk14, we have added C2 inc_om_ref_count() on X64 to implement the ref_count management parts of save_om_ptr():

    • inc_om_ref_count() does not implement the "restore obj header" part nor the "save om_ptr in the ObjectMonitorHandle" part mentioned in the previous two subsections.
    • inc_om_ref_count() is used by C2 fast_lock(), C2 fast_unlock() and C2 rtm_inflated_locking() on LP64.
    • The v2.05 version of C2 fast_lock() has code to detect a deflated and recycled ObjectMonitor after acquiring ownership of the ObjectMonitor. The solution to the race was to drop ownership and take the slow enter path. We spent a lot of time and energy analyzing this race and its solution, and convinced ourselves that the solution introduces theoretical problems with succession. The proper solution is to switch to using inc_om_ref_count() to protect the ObjectMonitor* for the duration of C2 fast_lock().
    • Robbin wrote a new test called MoCrazy that is targeted at the C2 optimizations. This test revealed a race in the baseline C2 fast_unlock() where ownership was reacquired in order to ensure proper succession. So baseline C2 fast_unlock() had a similar version of the race that we thought we fixed in C2 fast_lock(). The proper solution is to switch to using inc_om_ref_count() to protect the ObjectMonitor* for the duration of C2 fast_unlock().
    • C2 rtm_inflated_locking() is similarly exposed to races with async deflation so inc_om_ref_count() is used to protect the ObjectMonitor* for the duration of C2 rtm_inflated_locking().
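The ref_count protocol that inc_om_ref_count() implements in C2-generated assembly can be sketched in ordinary C++. This is illustrative only: the OMSketch struct, the std::atomic types, and the helper names are stand-ins, not the project code.

```cpp
#include <atomic>

// Illustrative stand-in for an ObjectMonitor's ref_count field.
struct OMSketch {
  std::atomic<int> ref_count{0};
};

// Try to increment ref_count to protect the ObjectMonitor* for the
// duration of a C2 fast path. A negative ref_count means T-deflate has
// claimed the monitor, so the fast path must bail to the slow path.
inline bool inc_om_ref_count(OMSketch* om) {
  int cur = om->ref_count.load();
  while (cur >= 0) {
    if (om->ref_count.compare_exchange_weak(cur, cur + 1)) {
      return true;   // protected: the caller must decrement when done
    }
    // on failure, compare_exchange_weak reloaded 'cur'; loop and retry
  }
  return false;      // async deflation in progress: take the slow path
}

inline void dec_om_ref_count(OMSketch* om) {
  om->ref_count.fetch_add(-1);
}
```

A caller that gets true back must pair it with dec_om_ref_count() when the fast path is done with the ObjectMonitor*.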

T-enter Wins By A-B-A

    T-enter                                     ObjectMonitor                T-deflate
    ------------------------------------------  +-------------------------+  ------------------------------------------
    ObjectMonitor::enter() {                    | owner=DEFLATER_MARKER   |  deflate_monitor_using_JT() {
      <owner is contended>                      | ref_count=1             |    cmpxchg(DEFLATER_MARKER, &owner, NULL)
    1> EnterI() {                               +-------------------------+ 1> :
         if (owner == DEFLATER_MARKER &&                    ||            2> : <thread_stalls>
             cmpxchg(Self, &owner,                          \/               :
                     DEFLATER_MARKER)           +-------------------------+  :
       } // EnterI() is done                    | owner=Self/T-enter      |  :
     } // enter() is done                       | ref_count=0             |  :
     ~OMH: atomic dec ref_count                 +-------------------------+  :
    2> : <does app work>                                    ||            3> : <thread_resumes>
       :                                                    \/               prev = cmpxchg(-max_jint, &ref_count, 0)
    3> :                                        +-------------------------+  if (prev == 0 &&
       :                                        | owner=Self/T-enter|NULL |      owner == DEFLATER_MARKER) {
       :                                        | ref_count=-max_jint     |  } else {
       exit() monitor                           +-------------------------+    cmpxchg(NULL, &owner, DEFLATER_MARKER)
    4> owner = NULL                                         ||                 atomic add max_jint to ref_count
                                                            \/            4> bailout on deflation
                                                +-------------------------+  }
                                                | owner=Self/T-enter|NULL |
                                                | ref_count=0             |
                                                +-------------------------+
    • T-deflate has executed cmpxchg() and set owner to DEFLATER_MARKER.
    • T-enter has called ObjectMonitor::enter() with "ref_count == 1", noticed that the owner is contended and is about to call ObjectMonitor::EnterI().
    • The first ObjectMonitor box is showing the fields at this point and the "1>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-deflate stalls after setting the owner field to DEFLATER_MARKER.
    • T-enter calls EnterI() to do the contended enter work:
      • EnterI() observes owner == DEFLATER_MARKER and uses cmpxchg() to set the owner field to Self/T-enter.
      • T-enter owns the monitor, returns from EnterI(), and returns from enter().
      • The ObjectMonitorHandle destructor decrements the ref_count.
    • T-enter is now ready to do work that requires the monitor to be owned.
    • The second ObjectMonitor box is showing the fields at this point and the "2>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-enter is doing app work (but it also could have finished and exited the monitor).
    • T-deflate resumes, calls cmpxchg() to set the ref_count field to -max_jint, and passes the first part of the bailout expression because "prev == 0".
    • The third ObjectMonitor box is showing the fields at this point and the "3>" markers are showing where each thread is at for that ObjectMonitor box.
    • T-deflate performs the A-B-A check which observes that "owner != DEFLATER_MARKER" and bails out on deflation:
      • Depending on when T-deflate resumes after the stall, it will see "owner == T-enter" or "owner == NULL".
      • Both of those values will cause deflation to bail out so we have to conditionally undo work:
        • restore the owner field to NULL if it is still DEFLATER_MARKER (it's not DEFLATER_MARKER)
        • undo setting ref_count to -max_jint by atomically adding max_jint to ref_count which will restore ref_count to its proper value.
      • If the T-enter thread has managed to enter but not exit the monitor during the T-deflate stall, then our owner field A-B-A transition is:
        • NULL → DEFLATER_MARKER → Self/T-enter

        so we really have A1-B-A2, but the A-B-A principle still holds.

      • If the T-enter thread has managed to enter and exit the monitor during the T-deflate stall, then our owner field A-B-A transition is:
        • NULL → DEFLATER_MARKER → Self/T-enter → NULL

        so we really have A-B1-B2-A, but the A-B-A principle still holds.

    • T-enter finished doing app work and is about to exit the monitor (or it has already exited the monitor).

    • The fourth ObjectMonitor box is showing the fields at this point and the "4>" markers are showing where each thread is at for that ObjectMonitor box.
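The bullets above can be replayed as a single-threaded C++ sketch of T-deflate's protocol. This is illustrative only: the OM struct, the std::atomic fields, and the single-call shape (which cannot model the mid-protocol stall) are stand-ins for the real deflate_monitor_using_JT().

```cpp
#include <atomic>
#include <climits>

typedef void* Owner;
static void* const DEFLATER_MARKER = (void*)-1;  // illustrative sentinel

// Illustrative stand-in for the ObjectMonitor fields in the diagram.
struct OM {
  std::atomic<Owner> owner{nullptr};
  std::atomic<int>   ref_count{0};
};

// Returns true if deflation succeeds, false on bailout (the A-B-A case).
bool deflate_monitor_using_JT(OM* om) {
  Owner no_owner = nullptr;
  om->owner.compare_exchange_strong(no_owner, DEFLATER_MARKER);
  // ... T-deflate can stall here while T-enter takes ownership ...
  int prev = 0;
  // cmpxchg(-max_jint, &ref_count, 0); 'prev' holds the old value
  om->ref_count.compare_exchange_strong(prev, -INT_MAX);
  if (prev == 0 && om->owner.load() == DEFLATER_MARKER) {
    return true;                                      // deflation wins
  }
  // Bailout: conditionally undo both updates.
  Owner marker = DEFLATER_MARKER;
  om->owner.compare_exchange_strong(marker, nullptr); // only if still marked
  if (prev == 0) {
    om->ref_count.fetch_add(INT_MAX);                 // restore ref_count
  }
  return false;
}
```

With a fresh, unowned monitor the function wins the race; with an owner already installed (T-enter won EnterI() during the stall) it bails out and both fields are restored.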

An Example of Object Header Interference


After T-deflate has won the race for deflating an ObjectMonitor it has to restore the header in the associated object. Of course another thread can be trying to do something to the object's header at the same time. Isn't asynchronous work exciting?!?!

ObjectMonitor::install_displaced_markword_in_object() is called from two places so we can have a race between a T-save thread and a T-deflate thread:

Start of the Race

    T-save                                       object           T-deflate
    -------------------------------------------  +-------------+  --------------------------------------------
    install_displaced_markword_in_object() {     | mark=om_ptr |  install_displaced_markword_in_object() {
      dmw = header()                             +-------------+    dmw = header()
      if (!dmw->is_marked() &&                                      if (!dmw->is_marked() &&
          dmw->hash() == 0) {                                           dmw->hash() == 0) {
        create marked_dmw                                             create marked_dmw
        dmw = cmpxchg(marked_dmw, &header, dmw)                       dmw = cmpxchg(marked_dmw, &header, dmw)
      }                                                             }
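The race shown above can be sketched in C++: both threads compute the same marked_dmw and cmpxchg it into the ObjectMonitor's header field, so exactly one wins and the loser observes the already-marked value. This is a sketch only (the hash() == 0 check is omitted, and the markword type and MARKED_BIT encoding are illustrative assumptions, not the project code).

```cpp
#include <atomic>
#include <cstdint>

typedef uintptr_t markword;
const markword MARKED_BIT = 1;  // illustrative mark encoding

// Both T-save and T-deflate try to mark the ObjectMonitor's header field
// before using the displaced mark word. The winner gets back the unmarked
// dmw it loaded; the loser gets back the already-marked value.
markword mark_dmw(std::atomic<markword>* header_p) {
  markword dmw = header_p->load();
  if ((dmw & MARKED_BIT) == 0) {
    markword marked_dmw = dmw | MARKED_BIT;
    // on failure, compare_exchange_strong stores the current value in 'dmw'
    header_p->compare_exchange_strong(dmw, marked_dmw);
  }
  return dmw;
}
```

Because both racers would install the same marked_dmw, the losing cmpxchg fails harmlessly and no header information is lost.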

...

Note: The above code snippet comes from ObjectSynchronizer::prepend_block_to_lists(); see that function for more complete context (and comments).

Note: In v2.06, L2 uses OrderAccess::load_acquire() and L3 uses OrderAccess::release_store(). David H. pointed out that a regular load and a regular store can be used. I've made the change for the upcoming v2.07 and will remove this note when the project rolls forward to v2.07.

The Not So Simple Case or Taking and Prepending on the Same List Leads to A-B-A Races

...

    T1: Simple Take:                    |                                            | T2: Simple Prepend:
    ----------------                    | T1 and T3 see this initial list:           | -------------------
    while (true) {                      |          +---+    +---+    +---+           | :
      cur = head;                       |  head -> | A | -> | X | -> | Y |           | :
      next = cur->next;                 |          +---+    +---+    +---+           | :
      :                                 | T3 takes "A", T2 sees this list:           | :
      :                                 |          +---+    +---+                    | :
      :                                 |  head -> | X | -> | Y |                    | :
      :                                 |          +---+    +---+                    | while (true) {
      :                                 | T2 prepends "B":                           |   cur = head;
      :                                 |          +---+    +---+    +---+           |   new->next = cur;
      :                                 |  head -> | B | -> | X | -> | Y |           |   if (cmpxchg(new, &head, cur) == cur) {
      :                                 |          +---+    +---+    +---+           |     break;
      :                                 | T3 prepends "A":                           |   }
      :                                 |          +---+    +---+    +---+    +---+  | }
      :                                 |  head -> | A | -> | B | -> | X | -> | Y |  |
      :                                 |          +---+    +---+    +---+    +---+  |
      :                                 | T1 takes "A", loses "B":                   |
      :                                 |          +---+                             |
      :                                 |          | B | ----+                       |
      :                                 |          +---+     |                       |
      :                                 |                    V                       |
      :                                 |          +---+    +---+                    |
      if (cmpxchg(next, &head, cur)     |  head -> | X | -> | Y |                    |
          == cur) {                     |          +---+    +---+                    |
      }                                 |          +---+                             |
    }                                   |  cur ->  | A |                             |
    return cur;                         |          +---+                             |

So the simple algorithms are not sufficient when we allow simultaneous take and prepend operations.
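The losing interleaving in the diagram can be replayed deterministically in a single thread. This sketch (illustrative only; the Node struct and helper names are stand-ins) splits T1's simple take into a "load then stall" half and a resumed cmpxchg, exactly as in the diagram:

```cpp
#include <atomic>

// Minimal singly linked list node for the A-B-A demonstration.
struct Node { const char* name; Node* next; };

// T1's first half: read head and its next, then "stall" before the cmpxchg.
Node* simple_take_begin(std::atomic<Node*>* head, Node** next_out) {
  Node* cur = head->load();
  *next_out = cur->next;
  return cur;
}

// The simple prepend from the diagram.
void simple_prepend(std::atomic<Node*>* head, Node* n) {
  while (true) {
    Node* cur = head->load();
    n->next = cur;
    if (head->compare_exchange_strong(cur, n)) break;
  }
}

// The simple take from the diagram (assumes a non-empty list).
Node* simple_take(std::atomic<Node*>* head) {
  while (true) {
    Node* cur = head->load();
    Node* next = cur->next;
    if (head->compare_exchange_strong(cur, next)) return cur;
  }
}
```

Replaying the diagram: T1 begins a take of "A" and stalls; T3 takes "A"; T2 prepends "B"; T3 re-prepends "A"; T1's stale cmpxchg then succeeds because head == "A" again, and "B" is unlinked from the list.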

Marking to Solve the A-B-A Race

Note: This subsection is talking about "Marking" as a solution to the A-B-A race in abstract terms. The purpose of this marking code and A-B-A example is to introduce the solution concepts. The code shown here is not an exact match for the project code.

One solution to the A-B-A race is to mark the next field in a node to indicate that the node is busy. Only one thread can successfully mark the next field in a node at a time and other threads must loop around and retry their marking operation until they succeed. Each thread that marks the next field in a node must unmark the next field when it is done with the node so that other threads can proceed.

Here's the take algorithm modified with marking (still ignores the empty list for clarity):

    // "take" a node with marking:
    while (true) {
      cur = head;
      if (!mark_next(cur, &next)) {
        // could not mark cur so try again
        continue;
      }
      if (head != cur) {
        // head changed while marking cur so try again
        unmark_next(cur);
        continue;
      }
      // list head is now marked so switch it to next which also makes list head unmarked
      OrderAccess::release_store(&head, next);
      unmark_next(cur);  // unmark cur and return it
      return cur;
    }

The modified take algorithm does not change the list head pointer until it has successfully marked the list head node. Notice that after we mark the list head node we have to verify that the list head pointer hasn't changed in the mean time. Only after we have verified that the node we marked is still the list head is it safe to modify the list head pointer. The marking of the list head prevents the take algorithm from executing in parallel with a prepend algorithm and losing a node.

Also notice that we update the list head pointer with release-store instead of with cmpxchg. Since we have the list head marked, we are not racing with other threads to change the list head pointer so we can use the smaller release-store hammer instead of the heavier cmpxchg hammer.

Here's the prepend algorithm modified with marking (ignores the empty list for clarity):

    // "prepend" a node with marking:
    while (true) {
      cur = head;
      if (!mark_next(cur, &next)) {
        // could not mark cur so try again
        continue;
      }
      if (head != cur) {
        // head changed while marking cur so try again
        unmark_next(cur);
        continue;
      }
      new->next = cur;  // link 'new' to the current list head
      // list head is now marked so switch it to 'new' which also makes list head unmarked
      Atomic::release_store(&head, new);
      unmark_next(cur);  // unmark the previous list head
      break;
    }

The modified prepend algorithm does not change the list head pointer until it has successfully marked the list head node. Notice that after we mark the list head node we have to verify that the list head pointer hasn't changed in the mean time. Only after we have verified that the node we marked is still the list head is it safe to modify the list head pointer. The marking of the list head prevents the prepend algorithm from executing in parallel with the take algorithm and losing a node.

Also notice that we update the list head pointer with release-store instead of with cmpxchg for the same reasons as the previous algorithm.
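The two marking algorithms above can be modeled as runnable C++, under the assumption (made explicit here; the real encoding is shown later in the project code) that the busy mark is the low bit of a node's next field. Names and types are illustrative:

```cpp
#include <atomic>
#include <cstdint>

// Node whose next field carries the busy mark in its low bit.
struct Node {
  std::atomic<uintptr_t> next{0};
};

static Node* unmarked(uintptr_t v) { return (Node*)(v & ~(uintptr_t)1); }

// Try once to mark n's next field; on success *next_out gets the
// unmarked successor. The caller loops on failure, as in the text.
bool mark_next(Node* n, Node** next_out) {
  uintptr_t v = n->next.load();
  if (v & 1) return false;                     // already marked: retry
  if (!n->next.compare_exchange_strong(v, v | 1)) return false;
  *next_out = unmarked(v);
  return true;
}

void unmark_next(Node* n) {
  n->next.fetch_and(~(uintptr_t)1);
}

// "take" with marking (ignores the empty list, as in the text):
Node* take(std::atomic<Node*>* head) {
  while (true) {
    Node* cur = head->load();
    Node* next;
    if (!mark_next(cur, &next)) continue;      // could not mark cur
    if (head->load() != cur) {                 // head changed while marking
      unmark_next(cur);
      continue;
    }
    head->store(next);                         // the release-store in the text
    unmark_next(cur);
    return cur;
  }
}

// "prepend" with marking (ignores the empty list, as in the text):
void prepend(std::atomic<Node*>* head, Node* new_node) {
  while (true) {
    Node* cur = head->load();
    Node* next;
    if (!mark_next(cur, &next)) continue;
    if (head->load() != cur) {
      unmark_next(cur);
      continue;
    }
    new_node->next.store((uintptr_t)cur);      // link before publishing
    head->store(new_node);
    unmark_next(cur);                          // unmark the previous list head
    return;
  }
}
```

Because pointers are at least 2-byte aligned, the low bit is free to serve as the mark; a plain store publishes the new head since the mark already excludes other writers.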

Background: ObjectMonitor Movement Between the Lists

The purpose of this subsection is to provide background information about how ObjectMonitors move between the various lists. This project changes the way these movements are implemented, but does not change the movements themselves. For example, newly allocated blocks of ObjectMonitors are always prepending to the global free list; this is true in the baseline and is true in this project.

ObjectMonitor Allocation Path

    • ObjectMonitors are allocated by ObjectSynchronizer::om_alloc().
    • Assume that the calling JavaThread has an empty free list and the global free list is also empty:
      • A block of ObjectMonitors is allocated by the calling JavaThread and prepended to the global free list.
      • ObjectMonitors are taken from the front of the global free list by the calling JavaThread and prepended to the JavaThread's free list by ObjectSynchronizer::om_release().
      • An ObjectMonitor is taken from the front of the JavaThread's free list and prepended to the JavaThread's in-use list (optimistically).

ObjectMonitor Deflation Path

    • ObjectMonitors are deflated at a safepoint by:
          ObjectSynchronizer::deflate_monitor_list() calling ObjectSynchronizer::deflate_monitor()
      And when Async Monitor Deflation is enabled, they are deflated by:
          ObjectSynchronizer::deflate_monitor_list_using_JT() calling ObjectSynchronizer::deflate_monitor_using_JT()

    • Idle ObjectMonitors are deflated by the ServiceThread when Async Monitor Deflation is enabled. They can also be deflated at a safepoint by the VMThread or by a task worker thread. Safepoint deflation is used when Async Monitor Deflation is disabled or when a special deflation request is made, e.g., System.gc().

    • An idle ObjectMonitor is deflated and extracted from its in-use list and prepended to the global free list. The in-use list can be either the global in-use list or a per-thread in-use list. Deflated ObjectMonitors are always prepended to the global free list.

ObjectMonitor Flush Path

    • ObjectMonitors are flushed by ObjectSynchronizer::om_flush().
    • When a JavaThread exits, the ObjectMonitors on its in-use list are prepended on the global in-use list and the ObjectMonitors on its free list are prepended on the global free list.

ObjectMonitor Linkage Path

    • ObjectMonitors are linked with objects by ObjectSynchronizer::inflate().
    • An inflate() call by one JavaThread can race with an inflate() call by another JavaThread for the same object.
    • When inflate() realizes that it failed to link an ObjectMonitor with the target object, it calls ObjectSynchronizer::om_release() to extract the ObjectMonitor from the JavaThread's in-use list and prepends it on the JavaThread's free list.
      Note: Remember that ObjectSynchronizer::om_alloc() optimistically added the newly allocated ObjectMonitor to the JavaThread's in-use list.
    • When inflate() successfully links an ObjectMonitor with the target object, that ObjectMonitor stays on the JavaThread's in-use list.

Lock-Free Monitor List Management In Reality

Prepending To A List That Also Allows Deletes

It is now time to switch from algorithms to real snippets from the code.

The next case to consider for lock-free list management with the Java Monitor subsystem is prepending to a list that also allows deletes. As you might imagine, the possibility of a prepend racing with a delete makes things more complicated. The solution is to "mark" the next field in the ObjectMonitor at the head of the list we're trying to prepend to. A successful mark tells other prependers or deleters that the marked ObjectMonitor is busy and they will need to retry their own mark operation.

Note: This is the v2.06 version of code and associated notes:

    L01:  while (true) {
    L02:    ObjectMonitor* cur = OrderAccess::load_acquire(list_p);
    L03:    ObjectMonitor* next = NULL;
    L04:    if (!mark_next(m, &next)) {
    L05:      continue;  // failed to mark next field so try it all again
    L06:    }
    L07:    set_next(m, cur);  // m now points to cur (and unmarks m)
    L08:    if (cur == NULL) {
    L09:      // No potential race with other prependers since *list_p is empty.
    L10:      if (Atomic::cmpxchg(m, list_p, cur) == cur) {
    L11:        // Successfully switched *list_p to 'm'.
    L12:        Atomic::inc(count_p);
    L13:        break;
    L14:      }
    L15:      // Implied else: try it all again
    L16:    } else {
    L17:      // Try to mark next field to guard against races:
    L18:      if (!mark_next(cur, &next)) {
    L19:        continue;  // failed to mark next field so try it all again
    L20:      }
    L21:      // We marked the next field so try to switch *list_p to 'm'.
    L22:      if (Atomic::cmpxchg(m, list_p, cur) != cur) {
    L23:        // The list head has changed so unmark the next field and try again:
    L24:        set_next(cur, next);
    L25:        continue;
    L26:      }
    L27:      Atomic::inc(count_p);
    L28:      set_next(cur, next);  // unmark next field
    L29:      break;
    L30:    }
    L31:  }

What the above block of code does is:

    • prepends an ObjectMonitor 'm' to the front of the list referred to by list_p
      • mark 'm's next field and update 'm' to refer to the list head
      • mark the list head's next field
      • update 'list_p' to refer to 'm'
      • unmark the next field in the previous list head
    • increments the counter referred to by 'count_p' by one

The above block of code can be called by multiple prependers in parallel or with deleters running in parallel and does not lose track of any ObjectMonitor. Of course, the "does not lose track of any ObjectMonitor" part is where all the details come in:

    • L02 load-acquires the current 'list_p' value into 'cur'; the use of load-acquire is necessary to get the latest value release-stored by another thread; the current 'list_p' is updated by either a release-store or a cmpxchg depending on the algorithm that made the update; only the release-store needs to match up with a load-acquire, but this code doesn't know whether release-store or cmpxchg was used.
    • L04 tries to mark 'm's next field; if marking fails, then another thread (T2) has 'm' marked and we try again until it is unmarked.
      You might be asking yourself: why does T2 have 'm' marked?
      • Before T1 was trying to prepend 'm' to the in-use list, T1 and T2 were racing to take an ObjectMonitor off the free list.
      • T1 won the race, marked 'm', removed 'm' from the free list and unmarked 'm'; T2 stalled before trying to mark 'm'.
      • T2 resumed and marked 'm', realized that 'm' was no longer the head of the free list, unmarked 'm' and tried it all again.
      • If our thread (T1) does not mark 'm' before it tries to prepend it to the in-use list, then T2's unmarking of 'm' could erase the next value that T1 wants to put in 'm'.
    • L07 sets 'm's next field to the current list head 'cur' (which also unmarks 'm').
    • L08 → L13 recognizes that the current list is empty and tries to cmpxchg 'list_p' to 'm':
      • if cmpxchg works, then the counter referred to by 'count_p' is incremented by one and we're done.
      • Otherwise, another prepender won the race to update the list head so we have to try again.
    • L16 → L29 is where we handle a non-empty current list:
      • L18 tries to mark the current list head 'cur'; if marking fails, then another thread (a prepender or a deleter) has 'cur' marked and we try again until it is unmarked.
      • Once our thread has 'cur' marked, another prepender or deleter will have to retry until we have unmarked 'cur'.
      • L22 tries to cmpxchg 'list_p' to 'm':
        • if cmpxchg does not work, then we unmark 'cur' and try again; the cmpxchg can fail if another thread has managed to change the list head 'list_p' and unmarked 'cur' after we load-acquired list_p on L02 and before we tried to cmpxchg it on L22.
        • Otherwise, the counter referred to by 'count_p' is incremented by one, we unmark 'cur' and we're done.

ObjectMonitor 'm' is safely on the list at the point that we have updated 'list_p' to refer to 'm'. In this subsection's block of code, we also called two new functions, mark_next() and set_next(), that are explained in the next subsection.
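As a preview, a plausible model of those two helpers as used on L04, L07, L24 and L28 above, assuming the busy mark is the low bit of the next field (a sketch only, not the project code):

```cpp
#include <atomic>
#include <cstdint>

// Illustrative stand-in: only the next field matters for these helpers.
struct ObjectMonitorSketch {
  std::atomic<uintptr_t> next_om{0};
};

// Mark om's next field; on success *next_p receives the unmarked next
// value. On failure the caller loops and retries, as in L04/L18 above.
bool mark_next(ObjectMonitorSketch* om, ObjectMonitorSketch** next_p) {
  uintptr_t v = om->next_om.load();
  if (v & 1) return false;                          // already marked
  if (!om->next_om.compare_exchange_strong(v, v | 1)) return false;
  *next_p = (ObjectMonitorSketch*)v;
  return true;
}

// Store an unmarked next value, which also unmarks om (L07/L24/L28).
void set_next(ObjectMonitorSketch* om, ObjectMonitorSketch* next) {
  om->next_om.store((uintptr_t)next);
}
```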

      • In CR7/v2.07/10-for-jdk14, the HandshakeAfterDeflateIdleMonitors diagnostic option is added to enable a new g_wait_list that tracks deflated ObjectMonitors until after a handshake/safepoint with all JavaThreads.
      • The g_wait_list allows ObjectMonitors to be safely deflated on platforms that do not have C2 inc_om_ref_count() implemented. See the "T-save Complication with C2" subsection above for the gory C2 details.
      • So when the option is enabled, idle ObjectMonitors are deflated and extracted from an in-use list and prepended to g_wait_list; after the handshake/safepoint with all JavaThreads, the ObjectMonitors on the g_wait_list are prepended to the global free list.

ObjectMonitor Flush Path

    • ObjectMonitors are flushed by ObjectSynchronizer::om_flush().
    • When a JavaThread exits, the ObjectMonitors on its in-use list are prepended on the global in-use list and the ObjectMonitors on its free list are prepended on the global free list.

ObjectMonitor Linkage Path

    • ObjectMonitors are linked with objects by ObjectSynchronizer::inflate().
    • An inflate() call by one JavaThread can race with an inflate() call by another JavaThread for the same object.
    • When inflate() realizes that it failed to link an ObjectMonitor with the target object, it calls ObjectSynchronizer::om_release() to extract the ObjectMonitor from the JavaThread's in-use list and prepends it on the JavaThread's free list.
      Note: Remember that ObjectSynchronizer::om_alloc() optimistically added the newly allocated ObjectMonitor to the JavaThread's in-use list.
    • When inflate() successfully links an ObjectMonitor with the target object, that ObjectMonitor stays on the JavaThread's in-use list.

Lock-Free Monitor List Management In Reality

Prepending To A List That Also Allows Deletes

It is now time to switch from algorithms to real snippets from the code.

The next case to consider for lock-free list management with the Java Monitor subsystem is prepending to a list that also allows deletes. As you might imagine, the possibility of a prepend racing with a delete makes things more complicated. The solution is to "mark" the next field in the ObjectMonitor at the head of the list we're trying to prepend to. A successful mark tells other prependers or deleters that the marked ObjectMonitor is busy and they will need to retry their own mark operation.

Note: This is the v2.07 version of the code and associated notes:

    L01:  while (true) {
    L02:    (void)mark_next_loop(m);  // mark m so we can safely update its next field
    L03:    ObjectMonitor* cur = NULL;
    L04:    ObjectMonitor* next = NULL;
    L05:    // Mark the list head to guard against A-B-A race:
    L06:    if (mark_list_head(list_p, &cur, &next)) {
    L07:      // List head is now marked so we can safely switch it.
    L08:      set_next(m, cur);  // m now points to cur (and unmarks m)
    L09:      OrderAccess::release_store(list_p, m);  // Switch list head to unmarked m.
    L10:      set_next(cur, next);  // Unmark the previous list head.
    L11:      break;
    L12:    }
    L13:    // The list is empty so try to set the list head.
    L14:    assert(cur == NULL, "cur must be NULL: cur=" INTPTR_FORMAT, p2i(cur));
    L15:    set_next(m, cur);  // m now points to NULL (and unmarks m)
    L16:    if (Atomic::cmpxchg(m, list_p, cur) == cur) {
    L17:      // List head is now unmarked m.
    L18:      break;
    L19:    }
    L20:    // Implied else: try it all again
    L21:  }
    L22:  Atomic::inc(count_p);

...

    L01:  if (from_per_thread_alloc) {
    L02:    mark_list_head(&self->om_in_use_list, &mid, &next);
    L03:    while (true) {
    L04:      if (m == mid) {
    L05:        if (cur_mid_in_use == NULL) {
    L06:          OrderAccess::release_store(&self->om_in_use_list, next);
    L07:        } else {
    L08:          OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
    L09:        }
    L10:        extracted = true;
    L11:        Atomic::dec(&self->om_in_use_count);
    L12:        set_next(mid, next);
    L13:        break;
    L14:      }
    L15:      if (cur_mid_in_use != NULL) {
    L16:        set_next(cur_mid_in_use, mid);  // umark cur_mid_in_use
    L17:      }
    L18:      cur_mid_in_use = mid;
    L19:      mid = next;
    L20:      next = mark_next_loop(mid);
    L21:    }
    L22:  }
    L23:  prepend_to_om_free_list(self, m);


Most of the above code block extracts 'm' from self's in-use list; it is not an exact quote from om_release(), but it is the highlights:

    • L02 is used to mark self's in-use list head:
      • 'mid' is self's in-use list head and its next field is marked.
      • 'next' is the unmarked next field from 'mid'.
    • L03 → L21: self's in-use list is traversed looking for the target ObjectMonitor 'm':
      • L04: if the current 'mid' matches 'm':
        • L05: if cur_mid_in_use is NULL, we're still processing the head of self's in-use list so...
          • L06: we release-store self's in-use list head to 'next'.
        • else
          • L08: we release-store cur_mid_in_use's next field to 'next'.
        • L10 → L13: we've successfully extracted 'm' from self's in-use list so we decrement self's in-use counter, unmark the next field in 'mid' and we're done.
      • L1[56]: if cur_mid_in_use != NULL, then unmark its next field.
      • L18: set 'cur_mid_in_use' to 'mid'
        Note: cur_mid_in_use keeps the marked next field so that it remains stable for a possible next field change. It cannot be deflated while it is marked.
      • L19: set 'mid' to 'next'.
      • L20: mark next field in the new 'mid' and update 'next'; loop around and do it all again.

The last line of the code block (L23) prepends 'm' to self's free list.

...

    L01:  int ObjectSynchronizer::deflate_monitor_list(ObjectMonitor* volatile * list_p,
    L02:                                               int volatile * count_p,
    L03:                                               ObjectMonitor** free_head_p,
    L04:                                               ObjectMonitor** free_tail_p) {
    L05:    ObjectMonitor* cur_mid_in_use = NULL;
    L06:    ObjectMonitor* mid = NULL;
    L07:    ObjectMonitor* next = NULL;
    L08:    int deflated_count = 0;
    L09:    if (!mark_list_head(list_p, &mid, &next)) {
    L10:      return 0;  // The list is empty so nothing to deflate.
    L11:    }
    L12:    while (true) {
    L13:      oop obj = (oop) mid->object();
    L14:      if (obj != NULL && deflate_monitor(mid, obj, free_head_p, free_tail_p)) {
    L15:        if (cur_mid_in_use == NULL) {
    L16:          OrderAccess::release_store(list_p, next);
    L17:        } else {
    L18:          OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
    L19:        }
    L20:        deflated_count++;
    L21:        Atomic::dec(count_p);
    L22:        set_next(mid, NULL);
    L23:        mid = next;
    L24:      } else {
    L25:        set_next(mid, next);  // unmark next field
    L26:        cur_mid_in_use = mid;
    L27:        mid = next;
    L28:      }
    L29:      if (mid == NULL) {
    L30:        break;  // Reached end of the list so nothing more to deflate.
    L31:      }
    L32:      next = mark_next_loop(mid);
    L33:    }
    L34:    return deflated_count;
    L35:  }

The above is not an exact copy of the code block from deflate_monitor_list(), but it is the highlights. What the above code block needs to do is pretty simple:

...

    • L09 marks the 'list_p' head (if it is not empty):
      • 'mid' is 'list_p's head and its next field is marked.
      • 'next' is the unmarked next field from 'mid'.
    • L12-L34: We walk each 'mid' in the list and determine if it can be deflated:
      • L14: if 'mid' is associated with an object and can be deflated:
        • L15: if cur_mid_in_use is NULL, we're still processing the head of the in-use list so...
          • L16: we release-store 'list_p' so that 'next' becomes the new list head.
        • else
          • L18: we release-store cur_mid_in_use's next field to 'next'.
        • L20 → L23: we've successfully extracted 'mid' from 'list_p's list so we increment 'deflated_count', decrement the counter referred to by 'count_p', set 'mid's next field to NULL and we're done.
          Note: 'mid' is the current tail in the 'free_head_p' list so we have to NULL terminate it (which also unmarks it).
      • L2[4-7]: 'mid' can't be deflated so unmark mid's next field and advance both 'cur_mid_in_use' and 'mid'.
      • L29 → L30: we reached the end of the list so break out of the loop.
      • L32: mark next field in the new 'mid' and update 'next'; loop around and do it all again.
    • L34: all done so return 'deflated_count'.

...

    L01:  int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor* volatile * list_p,
    L02:                                                        int volatile * count_p,
    L03:                                                        ObjectMonitor** free_head_p,
    L04:                                                        ObjectMonitor** free_tail_p,
    L05:                                                        ObjectMonitor** saved_mid_in_use_p) {
    L06:    ObjectMonitor* cur_mid_in_use = NULL;
    L07:    ObjectMonitor* mid = NULL;
    L08:    ObjectMonitor* next = NULL;
    L09:    ObjectMonitor* next_next = NULL;
    L10:    int deflated_count = 0;
    L11:    if (*saved_mid_in_use_p == NULL) {
    L12:      if (!mark_list_head(list_p, &mid, &next)) {
    L13:        return 0;  // The list is empty so nothing to deflate.
    L14:      }
    L15:    } else {
    L16:      cur_mid_in_use = *saved_mid_in_use_p;
    L17:      mid = mark_next_loop(cur_mid_in_use);
    L18:      if (mid == NULL) {
    L19:        set_next(cur_mid_in_use, NULL);  // unmark next field
    L20:        *saved_mid_in_use_p = NULL;
    L21:        return 0;  // The remainder is empty so nothing more to deflate.
    L22:      }
    L23:      next = mark_next_loop(mid);
    L24:    }
    L25:    while (true) {
    L26:      if (next != NULL) {
    L27:        next_next = mark_next_loop(next);
    L28:      }
    L29:      if (mid->object() != NULL && mid->is_old() &&
    L30:          deflate_monitor_using_JT(mid, free_head_p, free_tail_p)) {
    L31:        if (cur_mid_in_use == NULL) {
    L32:          OrderAccess::release_store(list_p, next);
    L33:        } else {
    L34:          ObjectMonitor* marked_next = mark_om_ptr(next);
    L35:          OrderAccess::release_store(&cur_mid_in_use->_next_om, marked_next);
    L36:        }
    L37:        deflated_count++;
    L38:        Atomic::dec(count_p);
    L39:        set_next(mid, NULL);
    L40:        mid = next;  // mid keeps non-NULL next's marked next field
    L41:        next = next_next;
    L42:      } else {
    L43:        if (cur_mid_in_use != NULL) {
    L44:          set_next(cur_mid_in_use, mid);  // umark cur_mid_in_use
    L45:        }
    L46:        cur_mid_in_use = mid;
    L47:        mid = next;  // mid keeps non-NULL next's marked next field
    L48:        next = next_next;
    L49:        if (SafepointSynchronize::is_synchronizing() &&
    L50:            cur_mid_in_use != OrderAccess::load_acquire(list_p) &&
    L51:            cur_mid_in_use->is_old()) {
    L52:          *saved_mid_in_use_p = cur_mid_in_use;
    L53:          set_next(cur_mid_in_use, mid);  // umark cur_mid_in_use
    L54:          if (mid != NULL) {
    L55:            set_next(mid, next);  // umark mid
    L56:          }
    L57:          return deflated_count;
    L58:        }
    L59:      }
    L60:      if (mid == NULL) {
    L61:        if (cur_mid_in_use != NULL) {
    L62:          set_next(cur_mid_in_use, mid);  // umark cur_mid_in_use
    L63:        }
    L64:        break;  // Reached end of the list so nothing more to deflate.
    L65:      }
    L66:    }
    L67:    *saved_mid_in_use_p = NULL;
    L68:    return deflated_count;
    L69:  }


...

    • L1[1-3]: Handle the initial setup if we are not resuming after a safepoint:
      • L12 marks the 'list_p' head (if it is not empty):
        • 'mid' is 'list_p's head and its next field is marked.
        • 'next' is the unmarked next field from 'mid'.
    • L15-L23: Handle the initial setup if we are resuming after a safepoint:
      • L17: mark next field in 'cur_mid_in_use' and update 'mid'
      • L18-L21: If 'mid' == NULL, then we've resumed context at the end of the list so we're done.
      • L23: mark next field in 'mid' and update 'next'
    • L25-L64: We walk each 'mid' in the list and determine if it can be deflated:
      • L2[67]: if next != NULL, then mark next field in 'next' and update 'next_next'
      • L29-L41: if 'mid' is associated with an object, 'mid' is old, and can be deflated:
        • L31: if cur_mid_in_use is NULL, we're still processing the head of the in-use list so...
          • L32: we release-store 'list_p' so that 'next' becomes the new list head.
        • else
          • L34: make a marked copy of 'next'
          • L35: we release-store cur_mid_in_use's next field to 'marked_next'.
        • L37 → L39: we've successfully extracted 'mid' from 'list_p's list so we increment 'deflated_count', decrement the counter referred to by 'count_p', set 'mid's next field to NULL and we're done.
          Note: 'mid' is the current tail in the 'free_head_p' list so we have to NULL terminate it (which also unmarks it).
        • L40: advance 'mid' to 'next'.
          Note: 'mid' keeps non-NULL 'next's marked next field.
        • L41: advance 'next' to 'next_next'.
      • L42-L57: 'mid' can't be deflated so we have to carefully advance the list pointers:
        • L4[34]: if cur_mid_in_use != NULL, then unmark next field in 'cur_mid_in_use'.
        • L46: advance 'cur_mid_in_use' to 'mid'.
          Note: The next field in 'mid' is still marked and 'cur_mid_in_use' keeps that.
        • L47: advance 'mid' to 'next'.
          Note: The next field in a non-NULL 'next' is still marked and 'mid' keeps that.
        • L48: advance 'next' to 'next_next'.
        • L49-L57: Handle a safepoint if one has started and it is safe to do so.
      • L60-L64: we reached the end of the list:
        • L6[12]: if cur_mid_in_use != NULL, then unmark next field in 'cur_mid_in_use'.
        • L64: break out of the loop because we are done.
    • L67: not pausing for a safepoint so clear saved state.
    • L68: all done so return 'deflated_count'.

...

  • New diagnostic option '-XX:AsyncDeflateIdleMonitors' that is default 'true' so that the new mechanism is used by default, but it can be disabled for potential failure diagnosis.
  • ObjectMonitor deflation is still initiated or signaled as needed at a safepoint. When Async Monitor Deflation is in use, flags are set so that the work is done by the ServiceThread which offloads the safepoint cleanup mechanism.
    • Having the ServiceThread deflate a potentially long list of in-use monitors could delay the start of a safepoint. This is detected in ObjectSynchronizer::deflate_monitor_list_using_JT() which will save the current state when it is safe to do so and return to its caller to drop locks as needed before honoring the safepoint request.
  • New diagnostic option '-XX:AsyncDeflationInterval' that defaults to 250 millis; this option controls how frequently we async deflate idle monitors when MonitorUsedDeflationThreshold is exceeded.
  • New diagnostic option '-XX:HandshakeAfterDeflateIdleMonitors' that defaults to false on the LP64 X64 platform (which implements C2 inc_om_ref_count()) and defaults to true on other platforms that implement C2.
  • Everything else is just monitor list management, infrastructure, logging, debugging and the like. :-)

...

  • The existing safepoint deflation mechanism is still invoked at safepoint "cleanup" time when '-XX:AsyncDeflateIdleMonitors' is false or when a special cleanup request is made.
  • SafepointSynchronize::do_cleanup_tasks() calls:
    • ObjectSynchronizer::prepare_deflate_idle_monitors()
    • A ParallelSPCleanupTask is used to perform the tasks (possibly using parallel tasks):
      • A ParallelSPCleanupThreadClosure is used to perform the per-thread tasks:
        • ObjectSynchronizer::deflate_thread_local_monitors() to deflate per-thread idle monitors
      • ObjectSynchronizer::deflate_idle_monitors() to deflate global idle monitors
    •  ObjectSynchronizer::finish_deflate_idle_monitors()
  • If MonitorUsedDeflationThreshold is exceeded (default is 90%, 0 means off), then the ServiceThread will invoke a cleanup safepoint when '-XX:AsyncDeflateIdleMonitors' is false. When '-XX:AsyncDeflateIdleMonitors' is true, the ServiceThread will call ObjectSynchronizer::deflate_idle_monitors_using_JT().
    • This experimental flag was added in JDK10 via:

...

    • For this option, exceeded means:

   ((g_om_population - g_om_free_count) / g_om_population) > NN%

  • If MonitorBound is exceeded (default is 0 which means off), cleanup safepoint will be induced.
  • For this option, exceeded means:

(g_om_population - g_om_free_count) > MonitorBound

  • This is a very difficult option to use correctly as it does not scale.
  • The MonitorBound option has been deprecated via JDK-8230938.
  • Changes to the safepoint deflation mechanism by the Async Monitor Deflation project (when async deflation is enabled):
    • If System.gc() is called, then a special deflation request is made which invokes the safepoint deflation mechanism.
    • Added the AsyncDeflationInterval diagnostic option (default 250 millis, 0 means off) to prevent MonitorUsedDeflationThreshold requests from swamping the ServiceThread.
      • Description: Async deflate idle monitors every so many milliseconds when MonitorUsedDeflationThreshold is exceeded (0 is off).
      • A special deflation request can cause an async deflation to happen sooner than AsyncDeflationInterval.
    • SafepointSynchronize::do_cleanup_tasks() now calls:
      • ObjectSynchronizer::is_safepoint_deflation_needed() instead of ObjectSynchronizer::is_cleanup_needed().
      • is_safepoint_deflation_needed() returns true only if a special deflation request is made (see above).
    • ObjectSynchronizer::do_safepoint_work() only does the safepoint cleanup tasks if there is a special deflation request. Otherwise it just sets the is_async_deflation_requested flag and notifies the ServiceThread.
    • ObjectSynchronizer::deflate_idle_monitors() and ObjectSynchronizer::deflate_thread_local_monitors() do nothing unless there is a special deflation request.
  • Changes to the ServiceThread mechanism by the Async Monitor Deflation project (when async deflation is enabled):

    • The ServiceThread will wake up every GuaranteedSafepointInterval to check for cleanup tasks.

      • This allows is_async_deflation_needed() to be checked at the same interval.

    • The ServiceThread handles deflating global idle monitors and deflating the per-thread idle monitors by calling ObjectSynchronizer::deflate_idle_monitors_using_JT().

  • Other invocation changes by the Async Monitor Deflation project (when async deflation is enabled):

    • VM_Exit::doit_prologue() will request a special cleanup to reduce the noise in 'monitorinflation' logging at VM exit time.

    • Before the final safepoint in a non-System.exit() end to the VM, we will request a special cleanup to reduce the noise in 'monitorinflation' logging at VM exit time.

...

  • Counterpart function mapping for those that know the existing code:
    • ObjectSynchronizer class:
      • deflate_idle_monitors() has deflate_idle_monitors_using_JT(), deflate_global_idle_monitors_using_JT(), deflate_per_thread_idle_monitors_using_JT(), and deflate_common_idle_monitors_using_JT().
      • deflate_monitor_list() has deflate_monitor_list_using_JT()
      • deflate_monitor() has deflate_monitor_using_JT()
    • ObjectMonitor class:
      • clear() has clear_using_JT()
  • These functions recognize the Async Monitor Deflation protocol and adapt their operations:
    • ObjectMonitor::enter()
    • ObjectMonitor::EnterI()
    • ObjectMonitor::ReenterI()
    • ObjectSynchronizer::quick_enter()
    • ObjectSynchronizer::deflate_monitor()
    • Note: These changes include handling the lingering owner == DEFLATER_MARKER value.
  • Also these functions had to adapt and retry their operations:
    • ObjectSynchronizer::FastHashCode()
    • ObjectSynchronizer::current_thread_holds_lock()
    • ObjectSynchronizer::query_lock_ownership()
    • ObjectSynchronizer::get_lock_owner()
    • ObjectSynchronizer::monitors_iterate()
    • ObjectSynchronizer::inflate_helper()
    • ObjectSynchronizer::inflate() 
  • Various assertions had to be modified to pass without their real check when AsyncDeflateIdleMonitors is true; this is due to the change in semantics for the ObjectMonitor owner field.
  • ObjectMonitor has a new allocation_state field that supports three states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied to ObjectMonitors that have reached the 'Old' state.
    • Note: Prior to CR1/v2.01/4-for-jdk13, the allocation state was transitioned from 'New' to 'Old' in deflate_monitor_via_JT(). This meant that deflate_monitor_via_JT() had to see an ObjectMonitor twice before deflating it. This policy was intended to prevent oscillation from 'New' → 'Old' and back again.
    • In CR1/v2.01/4-for-jdk13, the allocation state is transitioned from 'New' → 'Old' in inflate(). This makes ObjectMonitors available for deflation earlier. So far there have been no signs of oscillation from 'New' → 'Old' and back again.
  • ObjectMonitor has a new ref_count field that is used as part of the async deflation protocol and to indicate that an ObjectMonitor* is in use, so the ObjectMonitor should not be deflated. This is needed for operations on non-busy monitors so that ObjectMonitor values don't change while they are being queried. There is a new ObjectMonitorHandle helper to manage the ref_count.
  • The ObjectMonitor::owner() accessor detects DEFLATER_MARKER and returns NULL in that case to minimize the places that need to understand the new DEFLATER_MARKER value.
  • System.gc()/JVM_GC() causes a special monitor list cleanup request which uses the safepoint based monitor list mechanism. So even if AsyncDeflateIdleMonitors is enabled, the safepoint based mechanism is still used by this special case.
    • This is necessary for those tests that do something to cause an object's monitor to be inflated, clear the only reference to the object and then expect that enough System.gc() calls will eventually cause the object to be GC'ed even when the thread never inflates another object's monitor. Yes, we have several tests like that. :-)