
...

Full Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/12-for-jdk14.v2.09.full/

Inc Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/12-for-jdk14.v2.09.inc/

Background

This patch for Async Monitor Deflation is based on Carsten Varming's

...

Please note that in Carsten's original prototype, there was another race in ObjectSynchronizer::FastHashCode() when the object's monitor had to be inflated. The setting of the hashcode in the ObjectMonitor's header/dmw could race with T-deflate. That race is resolved in this version by the use of an ObjectMonitorHandle in the call to ObjectSynchronizer::inflate(). The ObjectMonitor* returned by ObjectMonitorHandle.om_ptr() has a non-zero ref_count so no additional races with T-deflate are possible.
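Here is a minimal standalone sketch (not the project code; the class and member names are illustrative) of the ObjectMonitorHandle idea: the handle bumps a ref_count before handing out the pointer, so a deflater that requires ref_count == 0 must leave the monitor alone while the caller is using it:

    #include <atomic>

    struct Monitor {
      std::atomic<int> ref_count{0};
    };

    class MonitorHandle {
      Monitor* _m = nullptr;
     public:
      // Publish our interest in 'm'; a deflater that insists on
      // ref_count == 0 must now skip this Monitor. The real code also
      // has to verify, after the increment, that the monitor was not
      // deflated in the meantime; that detail is omitted here.
      void set(Monitor* m) {
        _m = m;
        _m->ref_count.fetch_add(1);
      }
      Monitor* ptr() const { return _m; }  // non-zero ref_count while held
      ~MonitorHandle() {
        if (_m != nullptr) _m->ref_count.fetch_sub(1);  // allow deflation again
      }
    };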

...

Spin-Lock Monitor List Management In Theory

Use of specialized measurement code with the CR5/v2.05/8-for-jdk13 bits revealed that the gListLock contention is responsible for much of the performance degradation observed with SPECjbb2015. Consequently the primary focus of the next round of changes is/was on switching from coarse-grained Thread::muxAcquire(&gListLock) and Thread::muxRelease(&gListLock) pairs to spin-lock monitor list management. Of course, since the Java Monitor subsystem is full of special cases, the spin-lock list management code has to have a number of special cases which are described here.

...

There is one simple case of spin-lock list management with the Java Monitor subsystem so we'll start with that code as a way to introduce the spin-lock concepts:

        L1:     while (true) {
        L2:       PaddedObjectMonitor* cur = Atomic::load(&g_block_list);
        L3:       Atomic::store(&new_blk[0]._next_om, cur);
        L4:       if (Atomic::cmpxchg(&g_block_list, cur, new_blk) == cur) {
        L5:         Atomic::add(&LVars.population, _BLOCKSIZE - 1);
        L6:         break;
        L7:       }
        L8:     }

What the above block of code does is:

    • prepends a 'new_blk' to the front of 'g_block_list'
    • increments the 'LVars.population' counter to include the number of new elements

...

    • L2 loads the current 'g_block_list' value into 'cur'; g_block_list is only modified by cmpxchg so a load (instead of a load-acquire) is sufficient.
    • L3 stores 'cur' into the 0th element's next field for 'new_blk'; g_block_list is only modified by cmpxchg so a store (instead of a release-store) is sufficient.
    • L4 is the critical decision point for this lock-free list update. cmpxchg will change 'g_block_list' to 'new_blk' iff 'g_block_list' == 'cur' (publish it).
      • if the cmpxchg return value is 'cur', then we succeeded with the lock-free list update and we atomically update 'LVars.population' to match.
      • Otherwise we loop around and do everything again from L2. This is the "spin" part of spin-lock. (smile)

At the point that cmpxchg has published the new 'g_block_list' value, 'new_blk' is now the first block in the list and the 0th element's next field is used to find the previous first block; all of the monitor list blocks are chained together via the next field in the block's 0th element. It is the use of cmpxchg to update 'g_block_list' and the checking of the return value from cmpxchg that ensures that we don't lose track of any blocks.

...

to achieve the safe update of the 'g_block_list' value; the atomic increment of the 'LVars.population' counter is considered to be just accounting (pun intended).
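The same publish-with-cmpxchg pattern can be sketched outside of HotSpot with std::atomic; this is illustrative only (Block, g_block_list and g_population are stand-ins, not the real types):

    #include <atomic>

    struct Block {
      Block* next = nullptr;
    };

    std::atomic<Block*> g_block_list{nullptr};
    std::atomic<int>    g_population{0};

    void prepend_block(Block* new_blk, int new_elems) {
      while (true) {
        Block* cur = g_block_list.load();  // plain load; only cmpxchg publishes
        new_blk->next = cur;               // link to the current head first
        if (g_block_list.compare_exchange_strong(cur, new_blk)) {
          g_population.fetch_add(new_elems);  // just accounting
          break;  // published: new_blk is the new head
        }
        // else: another thread updated the head first; spin and retry
      }
    }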

...

The left hand column shows "T1" taking a node "A" from the front of the list and it shows the simple code that does that operation. The right hand column shows "T2" prepending a node "B" to the front of the list and it shows the simple code that does that operation. We have a third thread, "T3", that does a take followed by a prepend, but we don't show a column for "T3". Instead we have a column in the middle that shows the results of the interleaved operations of all three threads:

...

When we allow simultaneous take and prepend operations on the same list, the simple algorithms are exposed to A-B-A races. An A-B-A race is a situation where the head of the list can change from node "A" to node "B" and back to node "A" again without the simple algorithm being aware that critical state has changed. In the middle column of the above diagram, we show what happens when T3 causes the head of the list to change from node "A" to node "B" (a take operation) and back to node "A" (a prepend operation). That A-B-A race causes T1 to lose node "B" when it updates the list head to node "X" instead of node "B" because T1 was unaware that its local 'next' value was stale.

Here's the diagram again with the code in T1 and T2 lined up with the effects of the A-B-A race executed by T3:

    T1: Simple Take:                          |                                          | T2: Simple Prepend:
    ----------------                          | T1 and T3 see this initial list:         | -------------------
    while (true) {                            |          +---+    +---+    +---+         | :
      cur = head;                             | head ->  | A | -> | X | -> | Y |         | :
      next = cur->next;                       |          +---+    +---+    +---+         | :
      :                                       | T3 takes "A", T2 sees this list:         | :
      :                                       |          +---+    +---+                  | :
      :                                       | head ->  | X | -> | Y |                  | :
      :                                       |          +---+    +---+                  | while (true) {
      :                                       | T2 prepends "B":                         |   cur = head;
      :                                       |          +---+    +---+    +---+         |   new->next = cur;
      :                                       | head ->  | B | -> | X | -> | Y |         |   if (cmpxchg(new, &head, cur) == cur) {
      :                                       |          +---+    +---+    +---+         |     break;
      :                                       | T3 prepends "A":                         |   }
      :                                       |          +---+    +---+    +---+    +---+| }
      :                                       | head ->  | A | -> | B | -> | X | -> | Y ||
      :                                       |          +---+    +---+    +---+    +---+|
      :                                       | T1 takes "A", loses "B":                 |
      :                                       |          +---+                           |
      :                                       |          | B | ----+                     |
      :                                       |          +---+     |                     |
      :                                       |                    V                     |
      :                                       |          +---+    +---+                  |
      if (cmpxchg(next, &head, cur) == cur) { | head ->  | X | -> | Y |                  |
        break;                                |          +---+    +---+                  |
      }                                       |          +---+                           |
    }                                         | cur ->   | A |                           |
    return cur;                               |          +---+                           |

So the simple algorithms are not sufficient when we allow simultaneous take and prepend operations.
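The race is easy to replay deterministically in a standalone program (illustrative only, not project code) by performing the three threads' steps from one thread in the order shown in the diagram:

    #include <atomic>
    #include <cassert>

    struct Node { const char* name; Node* next; };

    std::atomic<Node*> head{nullptr};

    int main() {
      Node Y{"Y", nullptr}, X{"X", &Y}, A{"A", &X}, B{"B", nullptr};
      head.store(&A);                 // head -> A -> X -> Y

      // T1 starts a simple take and stalls: cur = head; next = cur->next;
      Node* t1_cur  = head.load();    // A
      Node* t1_next = t1_cur->next;   // X (this value is about to go stale)

      head.store(&X);                 // T3 takes "A":    head -> X -> Y
      B.next = &X; head.store(&B);    // T2 prepends "B": head -> B -> X -> Y
      A.next = &B; head.store(&A);    // T3 prepends "A": head -> A -> B -> X -> Y

      // T1 resumes: the cmpxchg succeeds because head == A again (A-B-A),
      // but the saved next (X) is stale, so node B is silently lost.
      bool ok = head.compare_exchange_strong(t1_cur, t1_next);
      assert(ok && head.load() == &X);  // head -> X -> Y; B is unreachable
      return 0;
    }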

...

Spin-Locking to Solve the A-B-A Race

Note: This subsection is talking about "Spin-Locking" as a solution to the A-B-A race in abstract terms. The purpose of this spin-locking code and A-B-A example is to introduce the solution concepts. The code shown here is not an exact match for the project code.

One solution to the A-B-A race is to spin-lock the next field in a node to indicate that the node is busy. Only one thread can successfully spin-lock the next field in a node at a time and other threads must loop around and retry their spin-locking operation until they succeed. Each thread that spin-locks the next field in a node must unlock the next field when it is done with the node so that other threads can proceed.
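To make the pseudo-code below concrete, here is one self-contained way (an illustrative sketch, not the project's code) to implement try_om_lock(), om_unlock() and unmarked_next() on a node's next field, using the low bit of the pointer as the lock bit:

    #include <atomic>
    #include <cstdint>

    const uintptr_t LOCK_BIT = 0x1;

    struct Node {
      std::atomic<uintptr_t> next{0};  // pointer bits plus the lock bit
    };

    static Node* unmarked_next(Node* n) {  // next pointer without the lock bit
      return (Node*)(n->next.load() & ~LOCK_BIT);
    }

    static bool try_om_lock(Node* n) {     // try to set the lock bit exactly once
      uintptr_t unlocked = n->next.load() & ~LOCK_BIT;
      return n->next.compare_exchange_strong(unlocked, unlocked | LOCK_BIT);
    }

    static void om_unlock(Node* n) {       // clear the lock bit, keep the pointer
      n->next.fetch_and(~LOCK_BIT);
    }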

Here's the take algorithm modified with spin-locking (still ignores the empty list for clarity):

    // "take" a node with markinglocking:
while (true) {
cur = head;
if (!marktry_om_nextlock(cur, &next)) {
// could not marklock cur so try again
continue;
}
if (head != cur) {
// head changed while markinglocking cur so try again
unmarkom_nextunlock(cur);
continue;
}
next = unmarked_next(cur);
// list head is now markedlocked so switch it to next which also makes list head unmarkedunlocked
OrderAccessAtomic::release_store(&head, next);
unmarkom_nextunlock(cur); // unmarkunlock cur and return it
return cur;
}

The modified take algorithm does not change the list head pointer until it has successfully locked the list head node. Notice that after we lock the list head node we have to verify that the list head pointer hasn't changed in the meantime. Only after we have verified that the node we locked is still the list head is it safe to modify the list head pointer. The locking of the list head prevents the take algorithm from executing in parallel with a prepend algorithm and losing a node.

Also notice that we update the list head pointer with a simple store instead of with cmpxchg. Since we have the list head locked, we are not racing with other threads to change the list head pointer so we can use a simple store instead of the heavy cmpxchg hammer.

Here's the prepend algorithm modified with locking (ignores the empty list for clarity):

    // "prepend" a node with markinglocking:
while (true) {
cur = head;
if (!marktry_om_nextlock(cur, &next)) {
// could not marklock cur so try again
continue;
}
if (head != cur) {
// head changed while markinglocking cur so try again
unmarkom_nextunlock(cur);
continue;
}
next = unmarked_next(cur);
// list head is now markedlocked so switch it to 'new' which also makes list head unmarkedunlocked
Atomic::release_store(&head, new);
unmarkom_nextunlock(cur); // unmarkunlock the previous list head
}

The modified prepend algorithm does not change the list head pointer until it has successfully locked the list head node. Notice that after we lock the list head node we have to verify that the list head pointer hasn't changed in the meantime. Only after we have verified that the node we locked is still the list head is it safe to modify the list head pointer. The locking of the list head prevents the prepend algorithm from executing in parallel with the take algorithm and losing a node.

Also notice that we update the list head pointer with a store instead of with cmpxchg for the same reasons as the previous algorithm.

...

    • ObjectMonitors are deflated at a safepoint by:
          ObjectSynchronizer::deflate_monitor_list() calling ObjectSynchronizer::deflate_monitor()
      And when Async Monitor Deflation is enabled, they are deflated by:
          ObjectSynchronizer::deflate_monitor_list_using_JT() calling ObjectSynchronizer::deflate_monitor_using_JT()

    • Idle ObjectMonitors are deflated by the ServiceThread when Async Monitor Deflation is enabled. They can also be deflated at a safepoint by the VMThread or by a task worker thread. Safepoint deflation is used when Async Monitor Deflation is disabled or when there is a special deflation request, e.g., System.gc().

    • An idle ObjectMonitor is deflated and extracted from its in-use list and prepended to the global free list. The in-use list can be either the global in-use list or a per-thread in-use list. Deflated ObjectMonitors are always prepended to the global free list.

      • In CR7/v2.07/10-for-jdk14, the HandshakeAfterDeflateIdleMonitors diagnostic option is added to enable a new g_wait_list that tracks deflated ObjectMonitors until after a handshake/safepoint with all JavaThreads; in CR9/v2.09/12-for-jdk14, g_wait_list was renamed to LVars.wait_list.
      • The LVars.wait_list allows ObjectMonitors to be safely deflated on platforms that do not have C2 inc_om_ref_count() implemented. See the "T-save Complication with C2" subsection above for the gory C2 details.
      • So when the option is enabled, idle ObjectMonitors are deflated and extracted from an in-use list and prepended to LVars.wait_list; after the handshake/safepoint with all JavaThreads, the ObjectMonitors on the LVars.wait_list are prepended to the global free list.

...

    • global free list:
      • prepended to by JavaThreads that allocate a new block of ObjectMonitors (malloc time)
      • prepended to by JavaThreads that are exiting (and have a non-empty per-thread free list)
      • taken from the head by JavaThreads that need to allocate ObjectMonitor(s) for their per-thread free list (reprovision)
      • prepended to by deflation done by:
        • either the VMThread or a worker thread for safepoint based
        • or the ServiceThread for async monitor deflation
    • global in-use list:
      • prepended to by JavaThreads that are exiting (and have a non-empty per-thread in-use list)
      • extracted from by deflation done by:
        • either the VMThread or a worker thread for safepoint based
        • or the ServiceThread for async monitor deflation
    • global wait list:
      • only used when HandshakeAfterDeflateIdleMonitors == true
      • prepended by the ServiceThread during async deflation
      • entire list detached and prepended to the global free list by the ServiceThread during async deflation
      • Note: The global wait list serves the same function as Carsten's gFreeListNextSafepoint list in his prototype.
    • per-thread free list:
      • prepended to by a JavaThread when it needs to allocate new ObjectMonitor(s) (reprovision)
      • taken from the head by a JavaThread when it needs to allocate a new ObjectMonitor (inflation)
      • prepended to by a JavaThread when it isn't able to link the object to the ObjectMonitor (failed inflation)
      • entire list detached and prepended to the global free list when the JavaThread is exiting
    • per-thread in-use list:
      • prepended to by a JavaThread when it allocates a new ObjectMonitor (inflation, optimistically in-use)
      • extracted from by deflation done by:
        • either the VMThread or a worker thread for safepoint based
        • or the ServiceThread for async monitor deflation
      • entire list detached and prepended to the global in-use list when the JavaThread is exiting

...

Spin-Lock Monitor List Management In Reality

Prepending To A List That Also Allows Deletes

...

The next case to consider for spin-lock list management with the Java Monitor subsystem is prepending to a list that also allows deletes. As you might imagine, the possibility of a prepend racing with a delete makes things more complicated. The solution is to lock the ObjectMonitor at the head of the list we're trying to prepend to. A successful lock tells other prependers or deleters that the locked ObjectMonitor is busy and they will need to retry their own lock operation.

    L01:     while (true) {
    L02:       om_lock(m);  // Lock m so we can safely update its next field.
    L03:       ObjectMonitor* cur = NULL;
    L04:       // Lock the list head to guard against A-B-A race:
    L05:       if ((cur = get_list_head_locked(list_p)) != NULL) {
    L06:         // List head is now locked so we can safely switch it.
    L07:         set_next(m, cur);  // m now points to cur (and unlocks m)
    L08:         Atomic::store(list_p, m);  // Switch list head to unlocked m.
    L09:         om_unlock(cur);  // Unlock the previous list head.
    L10:         break;
    L11:       }
    L12:       // The list is empty so try to set the list head.
    L13:       assert(cur == NULL, "cur must be NULL: cur=" INTPTR_FORMAT, p2i(cur));
    L14:       set_next(m, cur);  // m now points to NULL (and unlocks m)
    L15:       if (Atomic::cmpxchg(list_p, cur, m) == cur) {
    L16:         // List head is now unlocked m.
    L17:         break;
    L18:       }
    L19:       // Implied else: try it all again
    L20:     }
    L21:     Atomic::inc(count_p);

What the above block of code does is:

    • prepends an ObjectMonitor 'm' to the front of the list referred to by list_p
      • lock 'm's next field
      • lock the list head's next field
      • update 'm' to refer to the list head
      • update 'list_p' to refer to 'm'
      • unlock the previous list head
    • increments the counter referred to by 'count_p' by one

The above block of code can be called by multiple prependers in parallel or with deleters running in parallel and must not lose track of any ObjectMonitor. Of course, the "must not lose track of any ObjectMonitor" part is where all the details come in:

    • L02 locks 'm's next field; internally we have to loop because another thread (T2) might have 'm' locked and we try again until we have locked it.
      You might be asking yourself: why does T2 have 'm' locked?
      • Before T1 was trying to prepend 'm' to an in-use list, T1 and T2 were racing to take an ObjectMonitor off the free list.
      • T1 won the race, locked 'm', removed 'm' from the free list and unlocked 'm'; T2 stalled before trying to lock 'm'.
      • T2 resumed and locked 'm', realized that 'm' was no longer the head of the free list, unlocked 'm' and tried it all again.
      • If our thread (T1) does not lock 'm' before it tries to prepend it to an in-use list, then T2's unlocking of 'm' could erase the next value that T1 wants to put in 'm'.
    • L05 tries to lock the list head 'list_p'; if get_list_head_locked() returns non-NULL, we have the list head locked and can safely update it:
      • L07: Update 'm's next field to point to the current list head (which unlocks 'm').
      • L08: store 'm' into 'list_p' which switches the list head to an unlocked 'm'.
      • L09: We unlock the previous list head.
    • If get_list_head_locked() returned NULL, we have an empty list:
      • L14: Update 'm's next field to NULL (which unlocks 'm').
      • L15: Try to cmpxchg 'list_p' to 'm':
        • if cmpxchg works, then we're done.
        • Otherwise, another prepender won the race to update the list head so we have to try again.
    • The counter referred to by 'count_p' is incremented by one.

ObjectMonitor 'm' is safely on the list at the point that we have updated 'list_p' to refer to 'm'. In this subsection's block of code, we also called three new functions, om_lock(), get_list_head_locked() and set_next(), that are explained in the next few subsections about helper functions.

Note: The above code snippet comes from prepend_to_common(); see that function for more context and a few more comments.

...

try_om_lock(), mark_om_ptr(), and set_next() Helper Functions

Managing spin-locks on ObjectMonitors has been abstracted into a few helper functions. try_om_lock() is the first interesting one:

    L1:  static bool try_om_lock(ObjectMonitor* om) {
    L2:    // Get current next field without any OM_LOCK_BIT value.
    L3:    ObjectMonitor* next = (ObjectMonitor*)
               ((intptr_t)Atomic::load(&om->_next_om) & ~OM_LOCK_BIT);
    L4:    if (Atomic::cmpxchg(&om->_next_om, next, mark_om_ptr(next)) != next) {
    L5:      return false;  // Cannot lock the ObjectMonitor.
    L6:    }
    L7:    return true;
    L8:  }

The above function tries to lock the next field in an ObjectMonitor:

    • If locking is successful, then true is returned.
    • Otherwise, false is returned.

The function can be called by multiple threads at the same time and only one thread will succeed in the locking operation (return == true) and all other threads will get return == false. Of course, the "only one thread will succeed" part is where all the details come in:

    • L3 loads the ObjectMonitor's next field and strips the locking bit:
      • The unlocked value is saved in 'next'.
      • We need the unlocked next value in order to properly detect if the next field was already locked.
    • L4 tries to cmpxchg a locked 'next' value into the ObjectMonitor's next field:
      • if cmpxchg does not work, then we return false:
        • The cmpxchg will not work if the next field changes after we loaded the value on L3.
        • The cmpxchg will not work if the next field is already locked.
      • Otherwise, we return true.

The try_om_lock() function calls another helper function, mark_om_ptr(), that needs a quick explanation:

    L1:  static ObjectMonitor* mark_om_ptr(ObjectMonitor* om) {
    L2:    return (ObjectMonitor*)((intptr_t)om | OM_LOCK_BIT);
    L3:  }

This function encapsulates the setting of the locking bit in an ObjectMonitor* for the purpose of hiding the details and making the calling code easier to read:

    • L2 casts the ObjectMonitor* into a type that will allow the '|' operator to be used.
    • We use the 0x1 (OM_LOCK_BIT) bit as our locking value because ObjectMonitors are aligned on a cache line so the low order bit is not used by the normal addressing of an ObjectMonitor*.
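A quick standalone check (illustrative, not project code) shows why the low-order bit is available: anything aligned to a cache line has zero low-order address bits, so bit 0 can carry the lock flag and can be stripped to recover the pointer:

    #include <cassert>
    #include <cstdint>

    struct alignas(64) PaddedMonitor {  // cache-line alignment frees bit 0
      char body[64];
    };

    int main() {
      static PaddedMonitor m;
      uintptr_t p = (uintptr_t)&m;
      assert((p & 0x1) == 0);                   // bit 0 unused by a real address
      uintptr_t marked = p | 0x1;               // what mark_om_ptr() does
      assert((marked & ~(uintptr_t)0x1) == p);  // stripping recovers the pointer
      return 0;
    }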

...

    L1:  static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
    L2:    Atomic::store(&om->_next_om, value);
    L3:  }

...

    • This function is simply a wrapper around a store of an ObjectMonitor* into the next field in an ObjectMonitor.
    • The typical "set_next(cur, next)" call sequence is easier to read than "Atomic::store(&cur->_next_om, next)".

...

om_lock() Helper Function

om_lock() is the next interesting helper function:

    L1:  static void om_lock(ObjectMonitor* om) {
    L2:    while (true) {
    L3:      if (try_om_lock(om)) {
    L4:        return;
    L5:      }
    L6:    }
    L7:  }

The above function loops until it locks the target ObjectMonitor. There is nothing particularly special about this function so we don't need any line specific annotations.

Debugging Tip: If there's a bug where an ObjectMonitor's next field is not properly unlocked, then this function will loop forever and the caller will be stuck.

...

get_list_head_locked() Helper Function

get_list_head_locked() is the next interesting helper function:

    L01:  static ObjectMonitor* get_list_head_locked(ObjectMonitor* volatile * list_p) {
    L02:    while (true) {
    L03:      ObjectMonitor* mid = Atomic::load(list_p);
    L04:      if (mid == NULL) {
    L05:        return NULL;  // The list is empty.
    L06:      }
    L07:      if (try_om_lock(mid)) {
    L08:        if (Atomic::load(list_p) != mid) {
    L09:          // The list head changed so we have to retry.
    L10:          om_unlock(mid);
    L11:          continue;
    L12:        }
    L13:        return mid;
    L14:      }
    L15:    }
    L16:  }

The above function tries to lock the list head's ObjectMonitor:

    • If the list is empty, NULL is returned.
    • Otherwise, the locked list head's ObjectMonitor* is returned.

The function can be called by more than one thread on the same 'list_p' at a time. NULL is only returned when 'list_p' refers to an empty list. Otherwise only one thread will return an ObjectMonitor* at a time. Since that ObjectMonitor is locked, any parallel callers to get_list_head_locked() will loop until the list head's ObjectMonitor is no longer locked. That typically happens when the list head's ObjectMonitor is taken off the list and 'list_p' is advanced to the next ObjectMonitor on the list. Of course, making sure that "only one thread will return an ObjectMonitor* at a time" is where all the details come in:

    • L03 loads the current 'list_p' value into 'mid'; the current 'list_p' value is only modified by store or cmpxchg so a load (instead of a load-acquire) is sufficient.
    • L0[45] is the empty list check and the only time that NULL is returned by this function.
    • L07 tries to lock 'mid':
      • If locking is not successful, we loop around to try it all again (the "spin" part of spin-lock).
      • L08 loads the current 'list_p' value to see if it still matches 'mid':
        • If the list head has changed, then we unlock mid on L10 and try it all again.
        • Otherwise, we return 'mid' (which is still locked).

When this function returns a non-NULL ObjectMonitor*, the ObjectMonitor is locked and any parallel callers of get_list_head_locked() on the same list will be looping until the list head's ObjectMonitor is no longer locked. The caller that just got the ObjectMonitor* needs to finish up its work with it quickly.

Debugging Tip: If there's a bug where the list head ObjectMonitor is not properly unlocked, then this function will loop forever and the caller will be stuck.

...

The next case to consider for spin-lock list management with the Java Monitor subsystem is taking an ObjectMonitor from the start of a list. Taking an ObjectMonitor from the start of a list is a specialized form of delete that is guaranteed to interact with a thread that is prepending to the same list at the same time. Again, the core of the solution is to lock the ObjectMonitor at the head of the list we're trying to take the ObjectMonitor from, but we use slightly different code because we have fewer linkages to make than a prepend.

    L01:  static ObjectMonitor* take_from_start_of_common(ObjectMonitor* volatile * list_p,
    L02:                                                  int volatile * count_p) {
    L03:    ObjectMonitor* take = NULL;
    L04:    // Lock the list head to guard against A-B-A race:
    L05:    if ((take = get_list_head_locked(list_p)) == NULL) {
    L06:      return NULL;  // None are available.
    L07:    }
    L08:    ObjectMonitor* next = unmarked_next(take);
    L09:    // Switch locked list head to next (which unlocks the list head, but
    L10:    // leaves take locked):
    L11:    Atomic::store(list_p, next);
    L12:    Atomic::dec(count_p);
    L13:    // Unlock take, but leave the next value for any lagging list
    L14:    // walkers. It will get cleaned up when take is prepended to
    L15:    // the in-use list:
    L16:    om_unlock(take);
    L17:    return take;
    L18:  }

What the above function does is:

    • Tries to lock the ObjectMonitor at the head of the list:
      • Locking will only fail if the list is empty so that NULL can be returned.
      • Otherwise get_list_head_locked() will loop until the ObjectMonitor at the list head has been locked.
    • Gets the next pointer from the taken ObjectMonitor.
    • Updates the list head to refer to the next ObjectMonitor.
    • Decrements the counter referred to by 'count_p'.
    • Unlocks the taken ObjectMonitor.

The function can be called by more than one thread at a time and each thread will take a unique ObjectMonitor from the start of the list (if one is available) without losing any other ObjectMonitors on the list. Of course, the "take a unique ObjectMonitor" and "without losing any other ObjectMonitors" parts are where all the details come in:

    • L05 tries to lock the list head:
      • get_list_head_locked() returns NULL if the list is empty so we return NULL on L06.
      • Otherwise, 'take' is a pointer to the locked list head.
    • L08 gets the next pointer from 'take'.
    • L11 stores 'next' into 'list_p'.
      You might be asking yourself: Why store instead of cmpxchg?
      • get_list_head_locked() only returns to the caller when it has locked the ObjectMonitor at the head of the list.
      • Because of that guarantee, any prepender or deleter thread that is running in parallel must loop until we have stored 'next' into 'list_p' which unlocks the list head.
    • L12 decrements the counter referred to by 'count_p'.
    • L16 unlocks 'take', leaving the 'next' value in its next field:
      • Keeping the 'next' value in take's next field allows any lagging list walker to get to the next ObjectMonitor on that list.
      • take's next field will get cleaned up when take is prepended to its target in-use list.
    • L17 returns 'take' to the caller.

Using The New Spin-Lock Monitor List Functions

...

ObjectSynchronizer::om_alloc(Thread* self, ...)

...

2) Try to allocate from the global free list (up to self->om_free_provision times):

    • take_from_start_of_global_free_list() takes an ObjectMonitor from the global free list (if possible).
    • om_release(self, take, false) prepends the newly taken ObjectMonitor to self's free list.
    • Retry the allocation from step 1.

...

ObjectSynchronizer::om_release() is responsible for putting an ObjectMonitor on self's free list. If 'from_per_thread_alloc' is true, then om_release() is also responsible for extracting the ObjectMonitor from self's in-use list. The extraction from self's in-use list must happen first:

    L01:     if (from_per_thread_alloc) {
    L02:       if ((mid = get_list_head_locked(&self->om_in_use_list)) == NULL) {
    L03:         fatal("thread=" INTPTR_FORMAT " in-use list must not be empty.", p2i(self));
    L04:       }
    L05:       next = unmarked_next(mid);
    L06:       while (true) {
    L07:         if (m == mid) {
    L08:           if (cur_mid_in_use == NULL) {
    L09:             Atomic::store(&self->om_in_use_list, next);
    L10:           } else {
    L11:             set_next(cur_mid_in_use, next);
    L12:           }
    L13:           extracted = true;
    L14:           Atomic::dec(&self->om_in_use_count);
    L15:           om_unlock(mid);
    L16:           break;
    L17:         }
    L18:         if (cur_mid_in_use != NULL) {
    L19:           om_unlock(cur_mid_in_use);
    L20:         }
    L21:         cur_mid_in_use = mid;
    L22:         mid = next;
    L23:         if (mid == NULL) {
    L24:           fatal("must find m=" INTPTR_FORMAT " on om_in_use_list=" INTPTR_FORMAT,
    L25:                 p2i(m), p2i(self->om_in_use_list));
    L26:         }
    L27:         om_lock(mid);
    L28:         next = unmarked_next(mid);
    L29:       }
    L30:     }
    L31:     prepend_to_om_free_list(self, m);

Most of the above code block extracts 'm' from self's in-use list; it is not an exact quote from om_release(), but it is the highlights:

    • L02 is used to lock self's in-use list head:
      • 'mid' is self's in-use list head and it is locked.
    • L05: 'next' is the unmarked next field from 'mid'.
    • L06 → L28: self's in-use list is traversed looking for the target ObjectMonitor 'm':
      • L07: if the current 'mid' matches 'm':
        • L08: if cur_mid_in_use is NULL, we're still processing the head of the thread's in-use list so...
          • L09: we store 'next' into the list head.
        • else
          • L11: we set cur_mid_in_use's next field to 'next'.
        • L13 → L16: we've successfully extracted 'm' from self's in-use list so we decrement self's in-use counter, unlock 'mid' and we're done.
      • L1[89]: if cur_mid_in_use != NULL, then unlock cur_mid_in_use.
      • L21: set 'cur_mid_in_use' to 'mid'
        Note: cur_mid_in_use keeps the locked 'mid' so that it remains stable for a possible next field change. It cannot be deflated while it is locked.
      • L22: set 'mid' to 'next'.
      • L2[78]: lock the new 'mid' and update 'next'; loop around and do it all again.

The last line of the code block (L31) prepends 'm' to self's free list.

...

ObjectSynchronizer::om_flush() is responsible for flushing self's in-use list to the global in-use list and self's free list to the global free list during self's thread exit processing. om_flush() starts with self's in-use list:

    L01:     if ((in_use_list = get_list_head_locked(&self->om_in_use_list)) != NULL) {
    L02:       in_use_tail = in_use_list;
    L03:       in_use_count++;
    L04:       for (ObjectMonitor* cur_om = unmarked_next(in_use_list); cur_om != NULL;) {
    L05:         if (is_locked(cur_om)) {
    L06:           while (is_locked(cur_om)) {
    L07:             os::naked_short_sleep(1);
    L08:           }
    L09:           cur_om = unmarked_next(in_use_tail);
    L10:           continue;
    L11:         }
    L12:         if (cur_om->is_free()) {
    L13:           cur_om = unmarked_next(in_use_tail);
    L14:           continue;
    L15:         }
    L16:         in_use_tail = cur_om;
    L17:         in_use_count++;
    L18:         cur_om = unmarked_next(cur_om);
    L19:       }
    L20:       guarantee(in_use_tail != NULL, "invariant");
    L21:       int l_om_in_use_count = Atomic::load(&self->om_in_use_count);
    L22:       ADIM_guarantee(l_om_in_use_count == in_use_count, "in-use counts don't match: "
    L23:                      "l_om_in_use_count=%d, in_use_count=%d", l_om_in_use_count, in_use_count);
    L24:       Atomic::store(&self->om_in_use_count, 0);
    L25:       Atomic::store(&self->om_in_use_list, (ObjectMonitor*)NULL);
    L26:       om_unlock(in_use_list);
    L27:     }

The above is not an exact copy of the code block from om_flush(), but it is the highlights. What the above code block needs to do is pretty simple:

...

However, in this case, there are a lot of details:

    • L01 locks the in-use list head (if it is not empty):
      • 'in_use_list' is self's in-use list head and it is locked.
      • The in-use list head is kept locked to prevent an async deflation thread from entering the list behind this thread.
        Note: An async deflation thread does check to see if the target thread is exiting, but if it has made it past that check before this thread started exiting, then we're racing.
    • L04-L19: loops over the in-use list counting and advancing 'in_use_tail'.
      • L05-L10: 'cur_om's next field is locked so there must be an async deflater thread or a list walker thread ahead of us so we delay to give it a chance to finish and refetch 'in_use_tail's (possibly changed) next field and try again.
      • L12-L14: 'cur_om' was deflated and its allocation state was changed to Free while it was locked. We just happened to be lucky enough to see it just after it was unlocked (and added to the free list). We refetch 'in_use_tail's (possibly changed) next field and try again.
      • L1[67]: finally 'cur_om' has been completely vetted so we can update 'in_use_tail' and increment 'in_use_count'.
      • L18: advance 'cur_om' to the next ObjectMonitor and do it all again.
    • L24: store self's in-use count to zero.
      Note: We clear self's in-use count before unlocking self's in-use list head to avoid races.
    • L25: store self's in-use list head to NULL.
    • L26: unlock the disconnected list head.
      Note: Yes, self's in-use list head was kept locked for the whole loop to keep any racing async deflater thread or list walker thread out of the in-use list. After L26, the racing async deflater thread will loop around and see self's in-use list is empty and bail out. Similarly, a racing list walker thread will retry and see self's in-use list is empty and bail out.

The code to process self's free list is much, much simpler because we don't have any races with an async deflater thread like we have on self's in-use list. The only interesting bits (see the sketch after this list):

    • load self's free list head.
    • store self's free list count to zero.
    • store self's free list head to NULL.
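A minimal standalone sketch of that free list flush (illustrative only; om_flush() itself has more checks) walks the list once to find the tail and the count, then clears the per-thread fields in the order described above:

    #include <atomic>

    struct Node { Node* next = nullptr; };

    struct PerThread {
      std::atomic<Node*> free_list{nullptr};
      std::atomic<int>   free_count{0};
    };

    // Returns the detached chain; tail_out/count_out feed the global prepend.
    Node* flush_free_list(PerThread* self, Node*& tail_out, int& count_out) {
      Node* list = self->free_list.load();  // load self's free list head
      tail_out = nullptr;
      count_out = 0;
      for (Node* s = list; s != nullptr; s = s->next) {
        count_out++;
        tail_out = s;  // remember the last node for the global prepend
      }
      self->free_count.store(0);            // clear the count first...
      self->free_list.store(nullptr);       // ...then detach the list head
      return list;
    }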

The last interesting bits for this function are prepending the local lists to the right global places:

    • prepend_list_to_global_free_list(free_list, free_tail, free_count);
    • prepend_list_to_global_om_in_use_list(in_use_list, in_use_tail, in_use_count);

...

ObjectSynchronizer::deflate_monitor_list() is responsible for deflating idle ObjectMonitors at a safepoint. This function can use the simpler lock-mid-as-we-go protocol since there can be no parallel list deletions due to the safepoint:

    L01:  int ObjectSynchronizer::deflate_monitor_list(ObjectMonitor* volatile * list_p,
    L02:                                               int volatile * count_p,
    L03:                                               ObjectMonitor** free_head_p,
    L04:                                               ObjectMonitor** free_tail_p) {
    L05:    ObjectMonitor* cur_mid_in_use = NULL;
    L06:    ObjectMonitor* mid = NULL;
    L07:    ObjectMonitor* next = NULL;
    L08:    int deflated_count = 0;
    L09:    if ((mid = get_list_head_locked(list_p)) == NULL) {
    L10:      return 0;  // The list is empty so nothing to deflate.
    L11:    }
    L12:    next = unmarked_next(mid);
    L13:    while (true) {
    L14:      oop obj = (oop) mid->object();
    L15:      if (obj != NULL && deflate_monitor(mid, obj, free_head_p, free_tail_p)) {
    L16:        if (cur_mid_in_use == NULL) {
    L17:          Atomic::store(list_p, next);
    L18:        } else {
    L19:          set_next(cur_mid_in_use, next);
    L20:        }
    L21:        deflated_count++;
    L22:        Atomic::dec(count_p);
    L23:        set_next(mid, NULL);
    L24:      } else {
    L25:        om_unlock(mid);
    L26:        cur_mid_in_use = mid;
    L27:      }
    L28:      mid = next;
    L29:      if (mid == NULL) {
    L30:        break;  // Reached end of the list so nothing more to deflate.
    L31:      }
    L32:      om_lock(mid);
    L33:      next = unmarked_next(mid);
    L34:    }
    L35:    return deflated_count;
    L36:  }

The above is not an exact copy of the code block from deflate_monitor_list(), but it is the highlights. What the above code block needs to do is pretty simple:

...

Since we're using the simpler lock-mid-as-we-go protocol, there are not too many details:

    • L09: locks the 'list_p' head (if it is not empty):
      • 'mid' is 'list_p's head and it is locked.
    • L12: 'next' is the unmarked next field from 'mid'.
    • L13-L34: We walk each 'mid' in the list and determine if it can be deflated:
      • L15: if 'mid' is associated with an object and can be deflated:
        • L16: if cur_mid_in_use is NULL, we're still processing the head of the in-use list so...
          • L17: we store 'next' into the list head.
        • else
          • L19: we set cur_mid_in_use's next field to 'next'.
        • L21 → L23: we've successfully extracted 'mid' from 'list_p's list so we increment 'deflated_count', decrement the counter referred to by 'count_p', set 'mid's next field to NULL and we're done.
          Note: 'mid' is the current tail in the 'free_head_p' list so we have to NULL terminate it (which also unlocks it).
      • L2[4-6]: else 'mid' can't be deflated so we unlock mid and advance 'cur_mid_in_use'.
      • L28: advance 'mid'.
      • L29 → L30: we reached the end of the list so break out of the loop.
      • L32: lock the new 'mid'.
      • L33: update 'next'; loop around and do it all again.
    • L35: all done so return 'deflated_count'.

...

ObjectSynchronizer::deflate_monitor_list_using_JT() is responsible for asynchronously deflating idle ObjectMonitors using a JavaThread. This function uses the more complicated lock-cur_mid_in_use-and-mid-as-we-go protocol because om_release() can do list deletions in parallel. We also lock-next-next-as-we-go to prevent an om_flush() that is behind this thread from passing us. Because this function can asynchronously interact with so many other functions, this is the largest clip of code:

    L01:  int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor* volatile * list_p,
    L02:                                                        int volatile * count_p,
    L03:                                                        ObjectMonitor** free_head_p,
    L04:                                                        ObjectMonitor** free_tail_p,
    L05:                                                        ObjectMonitor** saved_mid_in_use_p) {
    L06:    JavaThread* self = JavaThread::current();
    L07:    ObjectMonitor* cur_mid_in_use = NULL;
    L08:    ObjectMonitor* mid = NULL;
    L09:    ObjectMonitor* next = NULL;
    L10:    ObjectMonitor* next_next = NULL;
    L11:    int deflated_count = 0;
    L12:    if (*saved_mid_in_use_p == NULL) {
    L13:      if ((mid = get_list_head_locked(list_p)) == NULL) {
    L14:        return 0;  // The list is empty so nothing to deflate.
    L15:      }
    L16:      next = unmarked_next(mid);
    L17:    } else {
    L18:      cur_mid_in_use = *saved_mid_in_use_p;
    L19:      om_lock(cur_mid_in_use);
    L20:      mid = unmarked_next(cur_mid_in_use);
    L21:      if (mid == NULL) {
    L22:        om_unlock(cur_mid_in_use);
    L23:        *saved_mid_in_use_p = NULL;
    L24:        return 0;  // The remainder is empty so nothing more to deflate.
    L25:      }
    L26:      om_lock(mid);
    L27:      next = unmarked_next(mid);
    L28:    }
    L29:    while (true) {
    L30:      if (next != NULL) {
    L31:        om_lock(next);
    L32:        next_next = unmarked_next(next);
    L33:      }
    L34:      if (mid->object() != NULL && mid->is_old() &&
    L35:          deflate_monitor_using_JT(mid, free_head_p, free_tail_p)) {
    L36:        if (cur_mid_in_use == NULL) {
    L37:          Atomic::store(list_p, next);
    L38:        } else {
    L39:          ObjectMonitor* locked_next = mark_om_ptr(next);
    L40:          set_next(cur_mid_in_use, locked_next);
    L41:        }
    L42:        deflated_count++;
    L43:        Atomic::dec(count_p);
    L44:        set_next(mid, NULL);
    L45:        mid = next;  // mid keeps non-NULL next's locked state
    L46:        next = next_next;
    L47:      } else {
    L48:        if (cur_mid_in_use != NULL) {
    L49:          om_unlock(cur_mid_in_use);
    L50:        }
    L51:        cur_mid_in_use = mid;
    L52:        mid = next;  // mid keeps non-NULL next's locked state
    L53:        next = next_next;
    L54:        if (SafepointMechanism::should_block(self) &&
    L55:            cur_mid_in_use != Atomic::load(list_p) && cur_mid_in_use->is_old()) {
    L56:          *saved_mid_in_use_p = cur_mid_in_use;
    L57:          om_unlock(cur_mid_in_use);
    L58:          if (mid != NULL) {
    L59:            om_unlock(mid);
    L60:          }
    L61:          return deflated_count;
    L62:        }
    L63:      }
    L64:      if (mid == NULL) {
    L65:        if (cur_mid_in_use != NULL) {
    L66:          om_unlock(cur_mid_in_use);
    L67:        }
    L68:        break;  // Reached end of the list so nothing more to deflate.
    L69:      }
    L70:    }
    L71:    *saved_mid_in_use_p = NULL;
    L72:    return deflated_count;
    L73:  }

...

The above is not an exact copy of the code block from deflate_monitor_list_using_JT(), but it is the highlights. What the above code block needs to do is pretty simple:

...

Since we're using the more complicated lock-cur_mid_in_use-and-mid-as-we-go protocol and also the lock-next-next-as-we-go protocol, there is a mind-numbing amount of detail:

    • L1[2-6]: Handle the initial setup if we are not resuming after a safepoint or a handshake:
      • L13: locks the 'list_p' head (if it is not empty):
        • 'mid' is 'list_p's head and it is locked.
      • L16: 'next' is the unmarked next field from 'mid'.
    • L17-L27: Handle the initial setup if we are resuming after a safepoint or a handshake:
      • L19: lock 'cur_mid_in_use' and
      • L20: update 'mid'
      • L21-L24: If 'mid' == NULL, then we've resumed context at the end of the list so we're done.
      • L26: lock 'mid' and
      • L27: update 'next'
    • L29-L70: We walk each 'mid' in the list and determine if it can be deflated:
      • L3[0-2]: if next != NULL, then lock 'next' and update 'next_next'
      • L34-L46: if 'mid' is associated with an object, 'mid' is old, and can be deflated:
        • L36: if cur_mid_in_use is NULL, we're still processing the head of the in-use list so...
          • L37: we store 'next' into the list head.
        • else
          • L39: make a locked copy of 'next'
          • L40: we set cur_mid_in_use's next field to 'locked_next'.
        • L42 → L44: we've successfully extracted 'mid' from 'list_p's list so we increment 'deflated_count', decrement the counter referred to by 'count_p', set 'mid's next field to NULL and we're done.
          Note: 'mid' is the current tail in the 'free_head_p' list so we have to NULL terminate it (which also unlocks it).
        • L45: advance 'mid' to 'next'.
          Note: 'mid' keeps non-NULL 'next's locked state.
        • L46: advance 'next' to 'next_next'.
      • L47-L62: 'mid' can't be deflated so we have to carefully advance the list pointers:
        • L4[89]: if cur_mid_in_use != NULL, then unlock 'cur_mid_in_use'.
        • L51: advance 'cur_mid_in_use' to 'mid'.
          Note: 'mid' is still locked and 'cur_mid_in_use' keeps that state.
        • L52: advance 'mid' to 'next'.
          Note: A non-NULL 'next' is still locked and 'mid' keeps that state.
        • L53: advance 'next' to 'next_next'.
        • L54-L61: Handle a safepoint or a handshake if one has started and it is safe to do so.
      • L64-L68: we reached the end of the list:
        • L6[56]: if cur_mid_in_use != NULL, then unlock 'cur_mid_in_use'.
        • L68: break out of the loop because we are done.
    • L71: not pausing for a safepoint or handshake so clear saved state.
    • L72: all done so return 'deflated_count'.

...

ObjectSynchronizer::deflate_idle_monitors() handles deflating idle monitors at a safepoint from the global in-use list using ObjectSynchronizer::deflate_monitor_list(). There are only a few things that are worth mentioning:

    • Atomic::load(&LVars.in_use_list) is used to get the latest global in-use list.
    • Atomic::load(&LVars.in_use_count) is used to get the latest global in-use count.
    • prepend_list_to_global_free_list(free_head_p, free_tail_p, deflated_count) is used to prepend the deflated ObjectMonitors to the global free list.

...

ObjectSynchronizer::deflate_common_idle_monitors_using_JT() handles asynchronously deflating idle monitors from either the global in-use list or a per-thread in-use list using ObjectSynchronizer::deflate_monitor_list_using_JT(). There are only a few things that are worth mentioning:

    • Atomic::load(&LVars.in_use_count) is used to get the latest global in-use count.
    • Atomic::load(&target->om_in_use_count) is used to get the latest per-thread in-use count.
    • prepend_list_to_global_free_list(free_head_p, free_tail_p, local_deflated_count) is used to prepend the deflated ObjectMonitors to the global free list.

...

    • For this option, exceeded means:

   ((LVars.population - LVars.free_count) / LVars.population) > NN%

  • If MonitorBound is exceeded (default is 0 which means off), a cleanup safepoint will be induced.
  • For this option, exceeded means:

(LVars.population - LVars.free_count) > MonitorBound
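Both "exceeded" checks are simple arithmetic; here is a standalone illustration (not project code; the parameter names are stand-ins for LVars.population, LVars.free_count, the NN% threshold and MonitorBound):

    static bool async_deflation_needed(int population, int free_count, int nn_pct) {
      if (population == 0) return false;
      // percentage of the population that is in use
      int in_use_pct = ((population - free_count) * 100) / population;
      return in_use_pct > nn_pct;
    }

    static bool monitor_bound_exceeded(int population, int free_count, int monitor_bound) {
      if (monitor_bound == 0) return false;  // default 0 means off
      return (population - free_count) > monitor_bound;
    }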

...