This page describes adding support for Async Monitor Deflation to OpenJDK. The primary goal of this project is to reduce the time spent in safepoint cleanup operations.
RFE: 8153224 Monitor deflation prolong safepoints
https://bugs.openjdk.java.net/browse/JDK-8153224
Full Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.full/
Inc Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/9-for-jdk14.v2.06.inc/
This patch for Async Monitor Deflation is based on Carsten Varming's
http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
which has been ported to work with monitor lists. Monitor lists were optional via the '-XX:+MonitorInUseLists' option in JDK8, the option became default 'true' in JDK9, the option became deprecated in JDK10 via JDK-8180768, and the option became obsolete in JDK12 via JDK-8211384. Carsten's webrev is based on JDK10 so there was a bit of porting work needed to merge his code and/or algorithms with jdk/jdk.
Carsten also submitted a JEP back in the JDK10 time frame:
JDK-8183909 Concurrent Monitor Deflation
https://bugs.openjdk.java.net/browse/JDK-8183909
The OpenJDK JEP process has evolved a bit since JDK10 and a JEP is no longer required for a project that is well defined to be within one area of responsibility. Async Monitor Deflation is clearly defined to be in the JVM Runtime team's area of responsibility so it is likely that the JEP (JDK-8183909) will be withdrawn and the work will proceed via the RFE (JDK-8153224).
The current idle monitor deflation mechanism executes at a safepoint during cleanup operations. Due to this execution environment, the current mechanism does not have to worry about interference from concurrently executing JavaThreads. Async Monitor Deflation uses the ServiceThread to deflate idle monitors so the new mechanism has to detect interference and adapt as appropriate. In other words, data races are a natural part of Async Monitor Deflation and the algorithms have to detect the races and react without data loss or corruption.
ObjectSynchronizer::deflate_monitor_using_JT() is the new counterpart to ObjectSynchronizer::deflate_monitor() and does the heavy lifting of asynchronously deflating a monitor using a three part protocol:
If we lose any of the races, the monitor cannot be deflated at this time.
Once we know it is safe to deflate the monitor (which is mostly field resetting and monitor list management), we have to restore the object's header. That's another racy operation that is described below in "Restoring the Header With Interference Detection".
ObjectMonitor::install_displaced_markword_in_object() is the new piece of code that handles all the racy situations with restoring an object's header asynchronously. The function is called from two places (deflation and saving an ObjectMonitor* in an ObjectMonitorHandle). The restoration protocol for the object's header uses the mark bit along with the hash() value staying at zero to indicate that the object's header is being restored. Only one of the possible racing scenarios can win and the losing scenarios all adapt to the winning scenario's object header value.
Various code paths have been updated to recognize an owner field equal to DEFLATER_MARKER or a negative ref_count field and those code paths will retry their operation. This is the shortest "Key Part" description, but don't be fooled. See "Gory Details" below.
ObjectMonitor::save_om_ptr() is used to safely save an ObjectMonitor* in an ObjectMonitorHandle. ObjectSynchronizer::deflate_monitor_using_JT() is used to asynchronously deflate an idle monitor. save_om_ptr() and deflate_monitor_using_JT() can interfere with each other. The thread calling save_om_ptr() (T-save) is potentially racing with another JavaThread (T-deflate) so both threads have to check the results of the races.
T-save ObjectMonitor T-deflate
---------------------- +-----------------------+ ----------------------------------------
save_om_ptr() { | owner=NULL | deflate_monitor_using_JT() {
1> atomic inc ref_count | ref_count=0 | 1> cmpxchg(DEFLATER_MARKER, &owner, NULL)
+-----------------------+
T-save ObjectMonitor T-deflate
---------------------- +-----------------------+ ------------------------------------------
save_om_ptr() { | owner=DEFLATER_MARKER | deflate_monitor_using_JT() {
1> atomic inc ref_count | ref_count=0 | cmpxchg(DEFLATER_MARKER, &owner, NULL)
+-----------------------+ :
1> prev = cmpxchg(-max_jint, &ref_count, 0)
T-save ObjectMonitor T-deflate
--------------------------------- +-----------------------+ ------------------------------------------
save_om_ptr() { | owner=DEFLATER_MARKER | deflate_monitor_using_JT() {
atomic inc ref_count | ref_count=-max_jint+1 | cmpxchg(DEFLATER_MARKER, &owner, NULL)
1> if (owner == DEFLATER_MARKER && +-----------------------+ :
ref_count <= 0) { || prev = cmpxchg(-max_jint, &ref_count, 0)
restore obj header \/ 1> if (prev == 0 &&
atomic dec ref_count +-----------------------+ owner == DEFLATER_MARKER) {
2> return false to force retry | owner=DEFLATER_MARKER | restore obj header
} | ref_count=-max_jint | 2> finish the deflation
+-----------------------+ }
T-save ObjectMonitor T-deflate
--------------------------------- +-----------------------+ ------------------------------------------
save_om_ptr() { | owner=DEFLATER_MARKER | deflate_monitor_using_JT() {
atomic inc ref_count | ref_count=1 | cmpxchg(DEFLATER_MARKER, &owner, NULL)
1> if (owner == DEFLATER_MARKER && +-----------------------+ :
ref_count <= 0) { || prev = cmpxchg(-max_jint, &ref_count, 0)
} else { \/ 1> if (prev == 0 &&
save om_ptr in the +-----------------------+ owner == DEFLATER_MARKER) {
ObjectMonitorHandle | owner=NULL | } else {
2> return true | ref_count=1 | cmpxchg(NULL, &owner, DEFLATER_MARKER)
+-----------------------+ 2> return
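The two outcomes in the diagrams above can be replayed with a small stand-alone sketch. It uses std::atomic instead of HotSpot's Atomic::cmpxchg(), and MonitorSketch, try_deflate() and save_ref() are hypothetical names that only model the owner and ref_count fields; the header restoration step is left out.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical stand-in for ObjectMonitor: only the two fields the protocol uses.
struct MonitorSketch {
  std::atomic<void*> owner{nullptr};
  std::atomic<int> ref_count{0};
};

// Any unique non-null address can play the role of DEFLATER_MARKER here.
static char deflater_marker_storage;
static void* const DEFLATER_MARKER = &deflater_marker_storage;

// T-deflate side. Returns true when the monitor may be deflated.
bool try_deflate(MonitorSketch& m) {
  // Claim an unowned monitor by swinging owner from NULL to the marker.
  void* expected_owner = nullptr;
  if (!m.owner.compare_exchange_strong(expected_owner, DEFLATER_MARKER)) {
    return false;  // the monitor is owned, so it is not idle
  }
  // Swing ref_count from 0 to -max_jint; this fails if a reference is held.
  int expected_count = 0;
  if (m.ref_count.compare_exchange_strong(expected_count, -INT_MAX)) {
    // The owner must still be the marker, otherwise another thread got in.
    if (m.owner.load() == DEFLATER_MARKER) {
      return true;
    }
    m.ref_count.fetch_add(INT_MAX);  // undo the ref_count change
  }
  // Lost a race: remove the marker if it is still in place and bail out.
  expected_owner = DEFLATER_MARKER;
  m.owner.compare_exchange_strong(expected_owner, nullptr);
  return false;
}

// T-save side. Returns false when the caller has to retry.
bool save_ref(MonitorSketch& m) {
  m.ref_count.fetch_add(1);
  if (m.owner.load() == DEFLATER_MARKER && m.ref_count.load() <= 0) {
    // Deflation is in progress or done. (The real code also helps to
    // restore the object header at this point.)
    m.ref_count.fetch_sub(1);
    return false;
  }
  return true;  // the positive ref_count makes T-deflate bail out
}
```

Calling try_deflate() before save_ref() on a fresh MonitorSketch replays the "T-deflate wins" diagram; calling them in the opposite order replays the "T-save wins" diagram.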
T-enter ObjectMonitor T-deflate
-------------------------------------------- +-------------------------+ ------------------------------------------
ObjectMonitor::enter() { | owner=DEFLATER_MARKER | deflate_monitor_using_JT() {
<owner is contended> | ref_count=1 | cmpxchg(DEFLATER_MARKER, &owner, NULL)
1> EnterI() { +-------------------------+ 1> :
if (owner == DEFLATER_MARKER && || 2> : <thread_stalls>
cmpxchg(Self, &owner, \/ :
DEFLATER_MARKER) +-------------------------+ :
== DEFLATER_MARKER) { | owner=Self/T-enter | :
// EnterI is done | ref_count=0 | : <thread_resumes>
return +-------------------------+ prev = cmpxchg(-max_jint, &ref_count, 0)
} || if (prev == 0 &&
} // enter() is done \/ 3> owner == DEFLATER_MARKER) {
~OMH: atomic dec ref_count +-------------------------+ } else {
2> : <does app work> | owner=Self/T-enter|NULL | cmpxchg(NULL, &owner, DEFLATER_MARKER)
3> : | ref_count=-max_jint | atomic add max_jint to ref_count
exit() monitor +-------------------------+ 4> bailout on deflation
4> owner = NULL || }
\/
+-------------------------+
| owner=Self/T-enter|NULL |
| ref_count=0 |
+-------------------------+
NULL → DEFLATER_MARKER → Self/T-enter
so we really have A1-B-A2, but the A-B-A principle still holds.
If the T-enter thread has managed to enter and exit the monitor during the T-deflate stall, then our owner field A-B-A transition is:
NULL → DEFLATER_MARKER → Self/T-enter → NULL
so we really have A-B1-B2-A, but the A-B-A principle still holds.
T-enter finished doing app work and is about to exit the monitor (or it has already exited the monitor).
The fourth ObjectMonitor box is showing the fields at this point and the "4>" markers are showing where each thread is at for that ObjectMonitor box.
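The stall scenario can be replayed deterministically by splitting the deflater into two steps. This is only a sketch under stated assumptions: std::atomic replaces HotSpot's Atomic class, and MonitorSketch, deflate_claim(), enter_steal() and deflate_finish() are hypothetical names, not functions from the patch.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical stand-in for ObjectMonitor with just the two relevant fields.
struct MonitorSketch {
  std::atomic<void*> owner{nullptr};
  std::atomic<int> ref_count{0};
};

static char deflater_marker_storage;
static void* const DEFLATER_MARKER = &deflater_marker_storage;

// T-deflate before the stall: owner goes NULL -> DEFLATER_MARKER.
bool deflate_claim(MonitorSketch& m) {
  void* expected = nullptr;
  return m.owner.compare_exchange_strong(expected, DEFLATER_MARKER);
}

// T-enter: EnterI()-style takeover of a monitor that is marked for deflation.
bool enter_steal(MonitorSketch& m, void* self) {
  m.ref_count.fetch_add(1);  // the ObjectMonitorHandle holds a reference
  void* expected = DEFLATER_MARKER;
  bool entered = m.owner.compare_exchange_strong(expected, self);
  m.ref_count.fetch_sub(1);  // ~ObjectMonitorHandle drops the reference
  return entered;
}

// T-deflate after the stall. Returns true only if deflation may proceed.
bool deflate_finish(MonitorSketch& m) {
  int expected_count = 0;
  if (m.ref_count.compare_exchange_strong(expected_count, -INT_MAX)) {
    if (m.owner.load() == DEFLATER_MARKER) {
      return true;  // no interference was detected
    }
    // ref_count was back at 0, but the owner field moved on: the A-B-A case.
    m.ref_count.fetch_add(INT_MAX);
  }
  void* expected_owner = DEFLATER_MARKER;
  m.owner.compare_exchange_strong(expected_owner, nullptr);  // harmless if it fails
  return false;  // bail out on deflation
}
```

The point of the sketch is that ref_count alone cannot detect the interference, because it is back at zero by the time T-deflate resumes; the second check of the owner field is what catches it.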
After T-deflate has won the race for deflating an ObjectMonitor it has to restore the header in the associated object. Of course another thread can be trying to do something to the object's header at the same time. Isn't asynchronous work exciting?!?!
ObjectMonitor::install_displaced_markword_in_object() is called from two places so we can have a race between a T-save thread and a T-deflate thread:
T-save object T-deflate
------------------------------------------- +-------------+ --------------------------------------------
install_displaced_markword_in_object() { | mark=om_ptr | install_displaced_markword_in_object() {
dmw = header() +-------------+ dmw = header()
if (!dmw->is_marked() && if (!dmw->is_marked() &&
dmw->hash() == 0) { dmw->hash() == 0) {
create marked_dmw create marked_dmw
dmw = cmpxchg(marked_dmw, &header, dmw) dmw = cmpxchg(marked_dmw, &header, dmw)
} }
T-save object T-deflate
------------------------------------------- +-------------+ -------------------------------------------
install_displaced_markword_in_object() { | mark=om_ptr | install_displaced_markword_in_object() {
dmw = header() +-------------+ dmw = header()
if (!dmw->is_marked() && if (!dmw->is_marked() &&
dmw->hash() == 0) { dmw->hash() == 0) {
create marked_dmw create marked_dmw
dmw = cmpxchg(marked_dmw, &header, dmw) dmw = cmpxchg(marked_dmw, &header, dmw)
} }
// dmw == marked_dmw here // dmw == original dmw here
if (dmw->is_marked()) if (dmw->is_marked())
unmark dmw unmark dmw
obj = object() obj = object()
obj->cas_set_mark(dmw, this) obj->cas_set_mark(dmw, this)
T-save object T-deflate
------------------------------------------- +-------------+ -------------------------------------------
install_displaced_markword_in_object() { | mark=om_ptr | install_displaced_markword_in_object() {
dmw = header() +-------------+ dmw = header()
if (!dmw->is_marked() && if (!dmw->is_marked() &&
dmw->hash() == 0) { dmw->hash() == 0) {
create marked_dmw create marked_dmw
dmw = cmpxchg(marked_dmw, &header, dmw) dmw = cmpxchg(marked_dmw, &header, dmw)
} }
// dmw == original dmw here // dmw == marked_dmw here
if (dmw->is_marked()) if (dmw->is_marked())
unmark dmw unmark dmw
obj = object() obj = object()
obj->cas_set_mark(dmw, this) obj->cas_set_mark(dmw, this)
T-save object T-deflate
------------------------------------------- +-------------+ -------------------------------------------
install_displaced_markword_in_object() { | mark=dmw | install_displaced_markword_in_object() {
dmw = header() +-------------+ dmw = header()
if (!dmw->is_marked() && if (!dmw->is_marked() &&
dmw->hash() == 0) { dmw->hash() == 0) {
create marked_dmw create marked_dmw
dmw = cmpxchg(marked_dmw, &header, dmw) dmw = cmpxchg(marked_dmw, &header, dmw)
} }
// dmw == ... // dmw == ...
if (dmw->is_marked()) if (dmw->is_marked())
unmark dmw unmark dmw
obj = object() obj = object()
obj->cas_set_mark(dmw, this) obj->cas_set_mark(dmw, this)
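The restore sequence in the diagrams above can be sketched with std::atomic. The bit layout below is a simplification assumed for illustration (low two bits as lock bits, remaining bits as hash), and ObjectSketch, MonitorSketch and the helper functions are hypothetical names rather than the HotSpot types.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed, simplified mark word layout (for this sketch only): the low two bits
// are lock bits (01 = unlocked, 10 = has monitor, 11 = marked); the rest is the hash.
constexpr std::uintptr_t LOCK_MASK = 0x3;
constexpr std::uintptr_t UNLOCKED = 0x1;
constexpr std::uintptr_t HAS_MONITOR = 0x2;
constexpr std::uintptr_t MARKED = 0x3;

struct ObjectSketch {
  std::atomic<std::uintptr_t> mark{0};
};

struct MonitorSketch {
  std::atomic<std::uintptr_t> header{0};  // the displaced mark word (dmw)
  ObjectSketch* object = nullptr;
};

bool is_marked(std::uintptr_t w) { return (w & LOCK_MASK) == MARKED; }
std::uintptr_t hash_of(std::uintptr_t w) { return w >> 2; }
std::uintptr_t set_marked(std::uintptr_t w) { return (w & ~LOCK_MASK) | MARKED; }
std::uintptr_t set_unmarked(std::uintptr_t w) { return (w & ~LOCK_MASK) | UNLOCKED; }
std::uintptr_t encode(MonitorSketch* m) {
  return reinterpret_cast<std::uintptr_t>(m) | HAS_MONITOR;
}

void install_displaced_markword(MonitorSketch& mon) {
  std::uintptr_t dmw = mon.header.load();
  if (!is_marked(dmw) && hash_of(dmw) == 0) {
    // Try to publish "restore in progress". Win or lose, dmw ends up holding
    // the value that was in header just before the attempt.
    mon.header.compare_exchange_strong(dmw, set_marked(dmw));
  }
  if (is_marked(dmw)) {
    dmw = set_unmarked(dmw);  // the loser saw the winner's marked value
  }
  // No retry: if this CAS fails, a racing caller already restored the header.
  std::uintptr_t expected = encode(&mon);
  mon.object->mark.compare_exchange_strong(expected, dmw);
}
```

Calling the function twice on the same monitor models the late racer: the second call finds the header already marked, computes the same unmarked value, and its final CAS fails without changing the object.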
Please notice that install_displaced_markword_in_object() does not do any retries on any code path:
If we have a race between a T-deflate thread and a thread trying to get/set a hashcode (T-hash), then the race is between the ObjectMonitorHandle.save_om_ptr(obj, mark) call in T-hash and the deflation protocol in T-deflate.
T-hash ObjectMonitor T-deflate
---------------------- +-----------------------+ ----------------------------------------
save_om_ptr() { | owner=NULL | deflate_monitor_using_JT() {
: | ref_count=0 | 1> cmpxchg(DEFLATER_MARKER, &owner, NULL)
1> atomic inc ref_count +-----------------------+
T-hash ObjectMonitor T-deflate
---------------------- +-----------------------+ ------------------------------------------
save_om_ptr() { | owner=DEFLATER_MARKER | deflate_monitor_using_JT() {
: | ref_count=0 | cmpxchg(DEFLATER_MARKER, &owner, NULL)
1> atomic inc ref_count +-----------------------+ if (contentions != 0 || waiters != 0) {
}
1> prev = cmpxchg(-max_jint, &ref_count, 0)
If T-deflate wins the race, then T-hash will have to retry at most once.
T-hash ObjectMonitor T-deflate
------------------------- +-----------------------+ ------------------------------------------
save_om_ptr() { | owner=DEFLATER_MARKER | deflate_monitor_using_JT() {
1> atomic inc ref_count | ref_count=-max_jint | cmpxchg(DEFLATER_MARKER, &owner, NULL)
if (owner == +-----------------------+ if (contentions != 0 || waiters != 0) {
DEFLATER_MARKER && || }
ref_count <= 0) { \/ prev = cmpxchg(-max_jint, &ref_count, 0)
restore obj header +-----------------------+ 1> if (prev == 0 &&
atomic dec ref_count | owner=DEFLATER_MARKER | owner == DEFLATER_MARKER) {
2> return false to | ref_count=-max_jint | restore obj header
cause a retry +-----------------------+ 2> finish the deflation
} }
If T-hash wins the race, then the ref_count will cause T-deflate to bail out on deflating the monitor.
Note: header is not mentioned in any of the previous sections for simplicity.
T-hash ObjectMonitor T-deflate
------------------------- +-----------------------+ ------------------------------------------
save_om_ptr() { | header=dmw_no_hash | deflate_monitor_using_JT() {
atomic inc ref_count | owner=DEFLATER_MARKER | cmpxchg(DEFLATER_MARKER, &owner, NULL)
1> if (owner == | ref_count=1 | if (contentions != 0 || waiters != 0) {
DEFLATER_MARKER && +-----------------------+ }
ref_count <= 0) { || 1> prev = cmpxchg(-max_jint, &ref_count, 0)
} else { \/ if (prev == 0 &&
2> save om_ptr in the +-----------------------+ owner == DEFLATER_MARKER) {
ObjectMonitorHandle | header=dmw_no_hash | } else {
return true | owner=NULL | cmpxchg(NULL, &owner, DEFLATER_MARKER)
} | ref_count=1 | 2> bailout on deflation
} +-----------------------+ }
if save_om_ptr() { ||
if no hash \/
gen hash & merge +-----------------------+
hash = hash(header) | header=dmw_hash |
} | owner=NULL |
3> atomic dec ref_count | ref_count=1 |
return hash +-----------------------+
Please note that in Carsten's original prototype, there was another race in ObjectSynchronizer::FastHashCode() when the object's monitor had to be inflated. The setting of the hashcode in the ObjectMonitor's header/dmw could race with T-deflate. That race is resolved in this version by the use of an ObjectMonitorHandle in the call to ObjectSynchronizer::inflate(). The ObjectMonitor* returned by ObjectMonitorHandle.om_ptr() has a non-zero ref_count so no additional races with T-deflate are possible.
Use of specialized measurement code with the CR5/v2.05/8-for-jdk13 bits revealed that the gListLock contention is responsible for much of the performance degradation observed with SPECjbb2015. Consequently the primary focus of the next round of changes is/was on switching to lock-free monitor list management. Of course, since the Java Monitor subsystem is full of special cases, the lock-free list management code has to have a number of special cases which will be described here.
There is one simple case of lock-free list management with the Java Monitor subsystem so we'll start with that code as a way to introduce the lock-free concepts:
L1: while (true) {
L2: PaddedObjectMonitor* cur = OrderAccess::load_acquire(&g_block_list);
L3: OrderAccess::release_store(&new_blk[0]._next_om, cur);
L4: if (Atomic::cmpxchg(new_blk, &g_block_list, cur) == cur) {
L5: Atomic::add(_BLOCKSIZE - 1, &g_om_population);
L6: break;
L7: }
L8: }
What the above block of code does is:
The above block of code can be called by multiple threads in parallel and does not lose track of any blocks. Of course, the "does not lose track of any blocks" part is where all the details come in:
At the point that cmpxchg has published the new 'g_block_list' value, 'new_blk' is now the first block in the list and the 0th element's next field is used to find the previous first block; all of the monitor list blocks are chained together via the next field in the block's 0th element. It is the use of cmpxchg to update 'g_block_list' and the checking of the return value from cmpxchg that ensures that we don't lose track of any blocks.
This example is considered to be the "simple case" because we only prepend to the list (no deletes) and we only use:
to achieve the safe update of the 'g_block_list' value; the atomic increment of the 'g_om_population' counter is considered to be just accounting (pun intended).
The concepts introduced here are:
Note: The above code snippet comes from ObjectSynchronizer::prepend_block_to_lists(); see that function for more complete context (and comments).
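For readers who want to experiment outside of HotSpot, the same prepend-only loop can be written with std::atomic. This is a sketch, not the patch's code: Block, BlockList, prepend_block() and the BLOCK_SIZE value are stand-ins chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr int BLOCK_SIZE = 128;  // assumed value, for illustration only

// Stands in for the 0th element of a block, which carries the chain pointer.
struct Block {
  std::atomic<Block*> next{nullptr};
};

struct BlockList {
  std::atomic<Block*> head{nullptr};
  std::atomic<int> population{0};
};

void prepend_block(BlockList& list, Block* new_blk) {
  while (true) {
    Block* cur = list.head.load(std::memory_order_acquire);
    new_blk->next.store(cur, std::memory_order_release);
    // Publish new_blk only if the head is still the value we linked to.
    if (list.head.compare_exchange_strong(cur, new_blk)) {
      list.population.fetch_add(BLOCK_SIZE - 1);  // just accounting
      break;
    }
    // Another thread changed the head first, so try it all again.
  }
}

// Prepends from several threads at once and counts the blocks on the list.
std::size_t count_after_parallel_prepends(std::size_t n_threads, std::size_t per_thread) {
  BlockList list;
  std::vector<Block> blocks(n_threads * per_thread);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < n_threads; t++) {
    workers.emplace_back([&list, &blocks, t, per_thread] {
      for (std::size_t i = 0; i < per_thread; i++) {
        prepend_block(list, &blocks[t * per_thread + i]);
      }
    });
  }
  for (std::thread& w : workers) {
    w.join();
  }
  std::size_t seen = 0;
  for (Block* b = list.head.load(); b != nullptr; b = b->next.load()) {
    seen++;
  }
  assert(list.population.load() == static_cast<int>(seen) * (BLOCK_SIZE - 1));
  return seen;
}
```

Walking the list after all threads have joined should find every block that was prepended, which is the "does not lose track of any blocks" property.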
The next case to consider for lock-free list management with the Java Monitor subsystem is prepending to a list that also allows deletes. As you might imagine, the possibility of a prepend racing with a delete makes things more complicated. The solution is to "mark" the next field in the ObjectMonitor at the head of the list we're trying to prepend to. A successful mark tells other prependers or deleters that the marked ObjectMonitor is busy and they will need to retry their own mark operation.
L01: while (true) {
L02: ObjectMonitor* cur = OrderAccess::load_acquire(list_p);
L03: ObjectMonitor* next = NULL;
L04: if (!mark_next(m, &next)) {
L05: continue; // failed to mark next field so try it all again
L06: }
L07: set_next(m, cur); // m now points to cur (and unmarks m)
L08: if (cur == NULL) {
L09: // No potential race with other prependers since *list_p is empty.
L10: if (Atomic::cmpxchg(m, list_p, cur) == cur) {
L11: // Successfully switched *list_p to 'm'.
L12: Atomic::inc(count_p);
L13: break;
L14: }
L15: // Implied else: try it all again
L16: } else {
L17: // Try to mark next field to guard against races:
L18: if (!mark_next(cur, &next)) {
L19: continue; // failed to mark next field so try it all again
L20: }
L21: // We marked the next field so try to switch *list_p to 'm'.
L22: if (Atomic::cmpxchg(m, list_p, cur) != cur) {
L23: // The list head has changed so unmark the next field and try again:
L24: set_next(cur, next);
L25: continue;
L26: }
L27: Atomic::inc(count_p);
L28: set_next(cur, next); // unmark next field
L29: break;
L30: }
L31: }
What the above block of code does is:
The above block of code can be called by multiple prependers in parallel or with deleters running in parallel and does not lose track of any ObjectMonitor. Of course, the "does not lose track of any ObjectMonitor" part is where all the details come in:
ObjectMonitor 'm' is safely on the list at the point that we have updated 'list_p' to refer to 'm'. In this subsection's block of code, we also called two new functions, mark_next() and set_next(), that are explained in the next subsection.
Note: The above code snippet comes from prepend_to_common(); see that function for more context and a few more comments.
Managing marks on ObjectMonitors has been abstracted into a few helper functions. mark_next() is the first interesting one:
L01: static bool mark_next(ObjectMonitor* om, ObjectMonitor** next_p) {
L02: // Get current next field without any marking value.
L03: ObjectMonitor* next = (ObjectMonitor*)
L04: ((intptr_t)OrderAccess::load_acquire(&om->_next_om) & ~0x1);
L05: if (Atomic::cmpxchg(mark_om_ptr(next), &om->_next_om, next) != next) {
L06: return false; // Could not mark the next field or it was already marked.
L07: }
L08: *next_p = next;
L09: return true;
L10: }
The above function tries to mark the next field in an ObjectMonitor:
The function can be called by multiple threads at the same time and only one thread will succeed in the marking operation (return == true) and all other threads will get return == false. Of course, the "only one thread will succeed" part is where all the details come in:
The mark_next() function calls another helper function, mark_om_ptr(), that needs a quick explanation:
L1: static ObjectMonitor* mark_om_ptr(ObjectMonitor* om) {
L2: return (ObjectMonitor*)((intptr_t)om | 0x1);
L3: }
This function encapsulates the setting of the marking bit in an ObjectMonitor* for the purpose of hiding the details and making the calling code easier to read:
set_next() is the next interesting function and it also only needs a quick explanation:
L1: static void set_next(ObjectMonitor* om, ObjectMonitor* value) {
L2: OrderAccess::release_store(&om->_next_om, value);
L3: }
This function encapsulates the setting of the next field in an ObjectMonitor for the purpose of hiding the details and making the calling code easier to read:
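The three helpers can be tried in isolation with a std::atomic based sketch. Node is a hypothetical stand-in for ObjectMonitor, and is_marked() and unmark_ptr() are extra illustrative helpers that do not appear in the listings above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ObjectMonitor; only the next field matters here.
struct Node {
  std::atomic<Node*> next{nullptr};
};

// The low bit of an aligned pointer is always zero, so it can carry the mark.
Node* mark_ptr(Node* p) {
  return reinterpret_cast<Node*>(reinterpret_cast<std::uintptr_t>(p) | 0x1);
}
Node* unmark_ptr(Node* p) {
  return reinterpret_cast<Node*>(reinterpret_cast<std::uintptr_t>(p) & ~std::uintptr_t(0x1));
}
bool is_marked(Node* p) {
  return (reinterpret_cast<std::uintptr_t>(p) & 0x1) != 0;
}

// Tries to mark n's next field; on success the unmarked next is stored in *next_p.
bool mark_next(Node* n, Node** next_p) {
  Node* next = unmark_ptr(n->next.load(std::memory_order_acquire));
  Node* expected = next;
  // The CAS expects the unmarked value, so it fails if the field is already
  // marked or if the next pointer changed after the load.
  if (!n->next.compare_exchange_strong(expected, mark_ptr(next))) {
    return false;
  }
  *next_p = next;
  return true;
}

// Storing an unmarked value both updates the link and releases the mark.
void set_next(Node* n, Node* value) {
  n->next.store(value, std::memory_order_release);
}
```

A second mark_next() on the same node fails until set_next() stores an unmarked value, which is how a mark behaves like a very small lock on one list link.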
The next case to consider for lock-free list management with the Java Monitor subsystem is taking an ObjectMonitor from the start of a list. Taking an ObjectMonitor from the start of a list is a specialized form of delete that is guaranteed to interact with a thread that is prepending to the same list at the same time. Again, the core of the solution is to "mark" the next field in the ObjectMonitor at the head of the list we're trying to take the ObjectMonitor from, but we use slightly different code because we have fewer linkages to make than a prepend.
L01: static ObjectMonitor* take_from_start_of_common(ObjectMonitor* volatile * list_p,
L02: int volatile * count_p) {
L03: ObjectMonitor* next = NULL;
L04: ObjectMonitor* take = NULL;
L05: // Mark the list head to guard against A-B-A race:
L06: if (!mark_list_head(list_p, &take, &next)) {
L07: return NULL; // None are available.
L08: }
L09: // Switch marked list head to next (which unmarks the list head, but
L10: // leaves take marked):
L11: OrderAccess::release_store(list_p, next);
L12: Atomic::dec(count_p);
L13: // Unmark take, but leave the next value for any lagging list
L14: // walkers. It will get cleaned up when take is prepended to
L15: // the in-use list:
L16: set_next(take, next);
L17: return take;
L18: }
What the above function does is:
The function can be called by more than one thread at a time and each thread will take a unique ObjectMonitor from the start of the list (if one is available) without losing any other ObjectMonitors on the list. Of course, the "take a unique ObjectMonitor" and "without losing any other ObjectMonitors" parts are where all the details come in:
The take_from_start_of_common() function calls another helper function, mark_list_head(), that is explained in the next subsection.
mark_list_head() is the next interesting helper function:
L01: static bool mark_list_head(ObjectMonitor* volatile * list_p,
L02: ObjectMonitor** mid_p, ObjectMonitor** next_p) {
L03: while (true) {
L04: ObjectMonitor* mid = OrderAccess::load_acquire(list_p);
L05: if (mid == NULL) {
L06: return false; // The list is empty so nothing to mark.
L07: }
L08: if (mark_next(mid, next_p)) {
L09: if (OrderAccess::load_acquire(list_p) != mid) {
L10: // The list head changed so we have to retry.
L11: set_next(mid, *next_p); // unmark mid
L12: continue;
L13: }
L14: // We marked next field to guard against races.
L15: *mid_p = mid;
L16: return true;
L17: }
L18: }
L19: }
The above function tries to mark the next field in the list head's ObjectMonitor:
The function can be called by more than one thread on the same 'list_p' at a time. False is only returned when 'list_p' refers to an empty list. Otherwise only one thread will return true at a time with the 'mid_p' and 'next_p' return parameters set. Since the next field in 'mid_p' is marked, any parallel callers to mark_list_head() will loop until the next field in the list head's ObjectMonitor is no longer marked. That typically happens when the list head's ObjectMonitor is taken off the list and 'list_p' is advanced to the next ObjectMonitor on the list. Of course, making sure that "only one thread will return true at a time" is where all the details come in:
When this function returns true, the next field in 'mid_p' is marked and any parallel callers of mark_list_head() on the same list will be looping until the next field in the list head's ObjectMonitor is no longer marked. The caller that just got the 'true' return needs to finish up its work with 'mid_p' quickly.
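The claim that each caller takes a unique ObjectMonitor can be exercised with a stand-alone sketch of mark_list_head() and take_from_start_of_common(). It assumes std::atomic in place of OrderAccess and Atomic, covers only racing takers (no prependers or deflaters), and Node with its taken counter is a hypothetical test type.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct Node {
  std::atomic<Node*> next{nullptr};
  std::atomic<int> taken{0};  // test bookkeeping only
};

Node* mark_ptr(Node* p) {
  return reinterpret_cast<Node*>(reinterpret_cast<std::uintptr_t>(p) | 0x1);
}
Node* unmark_ptr(Node* p) {
  return reinterpret_cast<Node*>(reinterpret_cast<std::uintptr_t>(p) & ~std::uintptr_t(0x1));
}

bool mark_next(Node* n, Node** next_p) {
  Node* next = unmark_ptr(n->next.load(std::memory_order_acquire));
  Node* expected = next;
  if (!n->next.compare_exchange_strong(expected, mark_ptr(next))) {
    return false;
  }
  *next_p = next;
  return true;
}

bool mark_list_head(std::atomic<Node*>& list, Node** mid_p, Node** next_p) {
  while (true) {
    Node* mid = list.load(std::memory_order_acquire);
    if (mid == nullptr) {
      return false;  // the list is empty so there is nothing to mark
    }
    if (mark_next(mid, next_p)) {
      if (list.load(std::memory_order_acquire) != mid) {
        // The head changed while we were marking: unmark and retry.
        mid->next.store(*next_p, std::memory_order_release);
        continue;
      }
      *mid_p = mid;
      return true;
    }
  }
}

Node* take_from_start(std::atomic<Node*>& list, std::atomic<int>& count) {
  Node* take = nullptr;
  Node* next = nullptr;
  if (!mark_list_head(list, &take, &next)) {
    return nullptr;
  }
  list.store(next, std::memory_order_release);        // advance the head
  count.fetch_sub(1);
  take->next.store(next, std::memory_order_release);  // unmark take
  return take;
}

// Drains a list from several threads; returns how many nodes were taken exactly once.
std::size_t parallel_take_all(std::size_t n_nodes, std::size_t n_threads) {
  std::vector<Node> nodes(n_nodes);
  for (std::size_t i = 0; i + 1 < n_nodes; i++) {
    nodes[i].next.store(&nodes[i + 1]);
  }
  std::atomic<Node*> list{n_nodes > 0 ? &nodes[0] : nullptr};
  std::atomic<int> count{static_cast<int>(n_nodes)};
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < n_threads; t++) {
    workers.emplace_back([&list, &count] {
      while (Node* n = take_from_start(list, count)) {
        n->taken.fetch_add(1);
      }
    });
  }
  for (std::thread& w : workers) {
    w.join();
  }
  assert(count.load() == 0);
  std::size_t exactly_once = 0;
  for (Node& n : nodes) {
    if (n.taken.load() == 1) {
      exactly_once++;
    }
  }
  return exactly_once;
}
```

If two threads could ever take the same node, or a node could be skipped, the returned count would differ from the number of nodes on the list.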
ObjectSynchronizer::om_alloc() is responsible for allocating an ObjectMonitor and returning it to the caller. It has a three step algorithm:
1) Try to allocate from self's local free-list:
2) Try to allocate from the global free list (up to self->om_free_provision times):
3) Allocate a block of new ObjectMonitors:
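The overall shape of the three steps can be sketched single-threaded, with plain containers standing in for the lock-free lists. Everything in this block is an assumption made for illustration: the type names, the provision value of 32, the block size of 128, and the choice to put a new block on the global free list before retrying.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct Monitor {
  int unused = 0;  // placeholder payload
};

// Hypothetical per-thread state.
struct ThreadSketch {
  std::vector<Monitor*> free_list;
  std::size_t free_provision = 32;  // assumed value
};

// Hypothetical global state; a deque keeps element addresses stable as it grows.
struct GlobalSketch {
  std::deque<Monitor> storage;
  std::vector<Monitor*> free_list;
  std::size_t block_size = 128;  // assumed value
};

Monitor* om_alloc_sketch(ThreadSketch& self, GlobalSketch& global) {
  while (true) {
    // 1) Try to allocate from self's local free list.
    if (!self.free_list.empty()) {
      Monitor* m = self.free_list.back();
      self.free_list.pop_back();
      return m;
    }
    // 2) Try to refill the local free list from the global free list,
    //    taking at most free_provision monitors, then retry step 1.
    if (!global.free_list.empty()) {
      for (std::size_t i = 0; i < self.free_provision && !global.free_list.empty(); i++) {
        self.free_list.push_back(global.free_list.back());
        global.free_list.pop_back();
      }
      continue;
    }
    // 3) Allocate a block of new monitors, make them globally available,
    //    then retry from the top.
    for (std::size_t i = 0; i < global.block_size; i++) {
      global.storage.emplace_back();
      global.free_list.push_back(&global.storage.back());
    }
  }
}
```

On the first call with empty lists the sketch falls through to step 3, refills the local list in step 2, and finally returns a monitor from step 1.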
ObjectSynchronizer::om_release() is responsible for putting an ObjectMonitor on self's free list. If 'from_per_thread_alloc' is true, then om_release() is also responsible for extracting the ObjectMonitor from self's in-use list. The extraction from self's in-use list must happen first:
L01: if (from_per_thread_alloc) {
L02: mark_list_head(&self->om_in_use_list, &mid, &next);
L03: while (true) {
L04: if (m == mid) {
L05: if (Atomic::cmpxchg(next, &self->om_in_use_list, mid) != mid) {
L06: ObjectMonitor* marked_mid = mark_om_ptr(mid);
L07: Atomic::cmpxchg(next, &cur_mid_in_use->_next_om, marked_mid);
L08: }
L09: extracted = true;
L10: Atomic::dec(&self->om_in_use_count);
L11: set_next(mid, next);
L12: break;
L13: }
L14: if (cur_mid_in_use != NULL) {
L15: set_next(cur_mid_in_use, mid); // unmark cur_mid_in_use
L16: }
L17: cur_mid_in_use = mid;
L18: mid = next;
L19: next = mark_next_loop(mid);
L20: }
L21: }
L22: prepend_to_om_free_list(self, m);
Most of the above code block extracts 'm' from self's in-use list; it is not an exact quote from om_release(), but it is the highlights:
The last line of the code block (L22) prepends 'm' to self's free list.
mark_next_loop() is the next interesting helper function:
L1: static ObjectMonitor* mark_next_loop(ObjectMonitor* om) {
L2: ObjectMonitor* next;
L3: while (true) {
L4: if (mark_next(om, &next)) {
L5: // Marked om's next field so return the unmarked value.
L6: return next;
L7: }
L8: }
L9: }
The above function loops until it marks the next field of the target ObjectMonitor. The unmarked value of the next field is returned by the function. There is nothing particularly special about this function so we don't need any line specific annotations.
ObjectSynchronizer::om_flush() is responsible for flushing self's in-use list to the global in-use list and self's free list to the global free list during self's thread exit processing. om_flush() starts with self's in-use list:
L01: if (mark_list_head(&self->om_in_use_list, &in_use_list, &next)) {
L02: in_use_tail = in_use_list;
L03: in_use_count++;
L04: for (ObjectMonitor* cur_om = unmarked_next(in_use_list); cur_om != NULL;) {
L05: if (is_next_marked(cur_om)) {
L06: while (is_next_marked(cur_om)) {
L07: os::naked_short_sleep(1);
L08: }
L09: cur_om = unmarked_next(in_use_tail);
L10: continue;
L11: }
L12: if (!cur_om->is_active()) {
L13: cur_om = unmarked_next(in_use_tail);
L14: continue;
L15: }
L16: in_use_tail = cur_om;
L17: in_use_count++;
L18: cur_om = unmarked_next(cur_om);
L19: }
L20: OrderAccess::release_store(&self->om_in_use_count, 0);
L21: OrderAccess::release_store(&self->om_in_use_list, (ObjectMonitor*)NULL);
L22: set_next(in_use_list, next);
L23: }
The above is not an exact copy of the code block from om_flush(), but it is the highlights. What the above code block needs to do is pretty simple:
However, in this case, there are a lot of details:
The code to process self's free list is much, much simpler because we don't have any races with an async deflater thread as we do with self's in-use list. The only interesting bits:
The last interesting bits for this function are prepending the local lists to the right global places:
ObjectSynchronizer::deflate_monitor_list() is responsible for deflating idle ObjectMonitors at a safepoint. This function can use the simpler mark-mid-as-we-go protocol since there can be no parallel list deletions due to the safepoint:
L01: int ObjectSynchronizer::deflate_monitor_list(ObjectMonitor* volatile * list_p,
L02: int volatile * count_p,
L03: ObjectMonitor** free_head_p,
L04: ObjectMonitor** free_tail_p) {
L05: ObjectMonitor* cur_mid_in_use = NULL;
L06: ObjectMonitor* mid = NULL;
L07: ObjectMonitor* next = NULL;
L08: int deflated_count = 0;
L09: if (!mark_list_head(list_p, &mid, &next)) {
L10: return 0; // The list is empty so nothing to deflate.
L11: }
L12: while (true) {
L13: oop obj = (oop) mid->object();
L14: if (obj != NULL && deflate_monitor(mid, obj, free_head_p, free_tail_p)) {
L15: if (Atomic::cmpxchg(next, list_p, mid) != mid) {
L16: Atomic::cmpxchg(next, &cur_mid_in_use->_next_om, mid);
L17: }
L18: deflated_count++;
L19: Atomic::dec(count_p);
L20: set_next(mid, NULL);
L21: mid = next;
L22: } else {
L23: set_next(mid, next); // unmark next field
L24: cur_mid_in_use = mid;
L25: mid = next;
L26: }
L27: if (mid == NULL) {
L28: break; // Reached end of the list so nothing more to deflate.
L29: }
L30: next = mark_next_loop(mid);
L31: }
L32: return deflated_count;
L33: }
The above is not an exact copy of the code block from deflate_monitor_list(), but it is the highlights. What the above code block needs to do is pretty simple:
Since we're using the simpler mark-mid-as-we-go protocol, there are not too many details:
ObjectSynchronizer::deflate_monitor_list_using_JT() is responsible for asynchronously deflating idle ObjectMonitors using a JavaThread. This function uses the more complicated mark-cur_mid_in_use-and-mid-as-we-go protocol because om_release() can do list deletions in parallel. We also mark-next-next-as-we-go to prevent an om_flush() that is behind this thread from passing us. Because this function can asynchronously interact with so many other functions, this is the largest clip of code:
L01: int ObjectSynchronizer::deflate_monitor_list_using_JT(ObjectMonitor* volatile * list_p,
L02: int volatile * count_p,
L03: ObjectMonitor** free_head_p,
L04: ObjectMonitor** free_tail_p,
L05: ObjectMonitor** saved_mid_in_use_p) {
L06: ObjectMonitor* cur_mid_in_use = NULL;
L07: ObjectMonitor* mid = NULL;
L08: ObjectMonitor* next = NULL;
L09: ObjectMonitor* next_next = NULL;
L10: int deflated_count = 0;
L11: if (*saved_mid_in_use_p == NULL) {
L12: if (!mark_list_head(list_p, &mid, &next)) {
L13: return 0; // The list is empty so nothing to deflate.
L14: }
L15: } else {
L16: cur_mid_in_use = *saved_mid_in_use_p;
L17: mid = mark_next_loop(cur_mid_in_use);
L18: if (mid == NULL) {
L19: set_next(cur_mid_in_use, NULL); // unmark next field
L20: *saved_mid_in_use_p = NULL;
L21: return 0; // The remainder is empty so nothing more to deflate.
L22: }
L23: next = mark_next_loop(mid);
L24: }
L25: while (true) {
L26: if (next != NULL) {
L27: next_next = mark_next_loop(next);
L28: }
L29: if (mid->object() != NULL && mid->is_old() &&
L30: deflate_monitor_using_JT(mid, free_head_p, free_tail_p)) {
L31: if (Atomic::cmpxchg(next, list_p, mid) != mid) {
L32: ObjectMonitor* marked_mid = mark_om_ptr(mid);
L33: ObjectMonitor* marked_next = mark_om_ptr(next);
L34: Atomic::cmpxchg(marked_next, &cur_mid_in_use->_next_om, marked_mid);
L35: }
L36: deflated_count++;
L37: Atomic::dec(count_p);
L38: set_next(mid, NULL);
L39: mid = next; // mid keeps non-NULL next's marked next field
L40: next = next_next;
L41: } else {
L42: if (cur_mid_in_use != NULL) {
L43: set_next(cur_mid_in_use, mid); // unmark cur_mid_in_use
L44: }
L45: cur_mid_in_use = mid;
L46: mid = next; // mid keeps non-NULL next's marked next field
L47: next = next_next;
L48: if (SafepointSynchronize::is_synchronizing() &&
L49: cur_mid_in_use != OrderAccess::load_acquire(list_p) &&
L50: cur_mid_in_use->is_old()) {
L51: *saved_mid_in_use_p = cur_mid_in_use;
L52: set_next(cur_mid_in_use, mid); // unmark cur_mid_in_use
L53: if (mid != NULL) {
L54: set_next(mid, next); // unmark mid
L55: }
L56: return deflated_count;
L57: }
L58: }
L59: if (mid == NULL) {
L60: if (cur_mid_in_use != NULL) {
L61: set_next(cur_mid_in_use, mid); // unmark cur_mid_in_use
L62: }
L63: break; // Reached end of the list so nothing more to deflate.
L64: }
L65: }
L66: *saved_mid_in_use_p = NULL;
L67: return deflated_count;
L68: }
The above is not an exact copy of the code block from deflate_monitor_list_using_JT(), but it is the highlights. What the above code block needs to do is pretty simple:
Since we're using the more complicated mark-cur_mid_in_use-and-mid-as-we-go protocol and also the mark-next-next-as-we-go protocol, there is a mind numbing amount of detail:
ObjectSynchronizer::deflate_idle_monitors() handles deflating idle monitors at a safepoint from the global in-use list using ObjectSynchronizer::deflate_monitor_list(). There are only a few things that are worth mentioning:
ObjectSynchronizer::deflate_common_idle_monitors_using_JT() handles asynchronously deflating idle monitors from either the global in-use list or a per-thread in-use list using ObjectSynchronizer::deflate_monitor_list_using_JT(). There are only a few things that are worth mentioning:
The devil is in the details! Housekeeping and administrative work is usually detailed, but necessary.
JDK-8181859 Monitor deflation is not checked in cleanup path
((gMonitorPopulation - gMonitorFreeCount) / gMonitorPopulation) > NN%
(gMonitorPopulation - gMonitorFreeCount) > MonitorBound
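The two expressions above can be read as a pair of usage checks. The sketch below is one way to write them with integer arithmetic; the function names and the percent parameter (standing in for NN) are hypothetical, and treating a bound of zero as "no bound" is an assumption.

```cpp
#include <cassert>

// Hypothetical helper for the percentage form of the check.
bool usage_above_percent(int population, int free_count, int percent) {
  if (population <= 0) {
    return false;  // nothing allocated yet, so avoid dividing by zero
  }
  int in_use = population - free_count;
  // Multiply before dividing: with integer arithmetic, (in_use / population)
  // would truncate to 0 whenever some monitors are free.
  return (static_cast<long long>(in_use) * 100) / population > percent;
}

// Hypothetical helper for the absolute form of the check.
// Assumption: a bound of 0 (or less) means that no bound is configured.
bool usage_above_bound(int population, int free_count, int monitor_bound) {
  return monitor_bound > 0 && (population - free_count) > monitor_bound;
}
```

For example, with a population of 1000 and 100 free monitors, 90% are in use, so a 80% threshold is exceeded and a bound of 800 is exceeded as well.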
Changes to the ServiceThread mechanism by the Async Monitor Deflation project (when async deflation is enabled):
The ServiceThread will wake up every GuaranteedSafepointInterval to check for cleanup tasks.
This allows is_async_deflation_needed() to be checked at the same interval.
The ServiceThread handles deflating global idle monitors and deflating the per-thread idle monitors.
Other invocation changes by the Async Monitor Deflation project (when async deflation is enabled):
VM_Exit::doit_prologue() will request a special cleanup to reduce the noise in 'monitorinflation' logging at VM exit time.
Before the final safepoint in a non-System.exit() end to the VM, we will request a special cleanup to reduce the noise in 'monitorinflation' logging at VM exit time.