Summary
This page describes adding support for Async Monitor Deflation to OpenJDK. The primary goal of this project is to reduce the time spent in safepoint cleanup operations.
RFE: 8153224 Monitor deflation prolong safepoints
https://bugs.openjdk.java.net/browse/JDK-8153224
Webrev: http://cr.openjdk.java.net/~dcubed/8153224-webrev/3-for-jdk13
Background
This patch for Async Monitor Deflation is based on Carsten Varming's
http://cr.openjdk.java.net/~cvarming/monitor_deflate_conc/0/
which has been ported to work with monitor lists. Monitor lists were optional via the '-XX:+MonitorInUseLists' option in JDK8, the option became default 'true' in JDK9, the option became deprecated in JDK10 via JDK-8180768, and the option became obsolete in JDK12 via JDK-8211384. Carsten's webrev is based on JDK10 so there was a bit of porting work needed to merge his code and/or algorithms with jdk/jdk.
Carsten also submitted a JEP back in the JDK10 time frame:
JDK-8183909 Concurrent Monitor Deflation
https://bugs.openjdk.java.net/browse/JDK-8183909
The OpenJDK JEP process has evolved a bit since JDK10 and a JEP is no longer required for a project that is well defined to be within one area of responsibility. Async Monitor Deflation is clearly defined to be in the JVM Runtime team's area of responsibility so it is likely that the JEP (JDK-8183909) will be withdrawn and the work will proceed via the RFE (JDK-8153224).
Introduction
The current idle monitor deflation mechanism executes at a safepoint during cleanup operations. Due to this execution environment, the current mechanism does not have to worry about interference from concurrently executing JavaThreads. Async Monitor Deflation uses JavaThreads and the ServiceThread to deflate idle monitors so the new mechanism has to detect interference and adapt as appropriate. In other words, data races are a natural part of Async Monitor Deflation and the algorithms have to detect the races and react without data loss or corruption.
Key Parts of the Algorithm
1) Deflation With Interference Detection
ObjectSynchronizer::deflate_monitor_using_JT() is the new counterpart to ObjectSynchronizer::deflate_monitor() and does the heavy lifting of asynchronously deflating a monitor using a three-part protocol:
- Setting a NULL owner field to DEFLATER_MARKER with cmpxchg() forces any contending thread through the slow path. A racing thread would be trying to set the owner field.
- Making a zero count a large negative value with cmpxchg() forces racing threads to retry. A racing thread would have set the owner field (after we stored DEFLATER_MARKER) and would be trying to increment the count field.
- If the owner field is still equal to DEFLATER_MARKER, then we have won all the races and can deflate the monitor.
If we lose any of the races, the monitor cannot be deflated at this time.
Once we know it is safe to deflate the monitor (which is mostly field resetting and monitor list management), we have to restore the object's header. That's another racy operation that is described below in "Restoring the Header With Interference Detection".
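The three-part protocol above can be sketched with standard C++ atomics. This is a minimal sketch under stated assumptions, not the HotSpot code: `MonitorSketch`, `try_deflate()`, `MAX_JINT`, the address-based `DEFLATER_MARKER` and the use of `std::atomic` in place of HotSpot's cmpxchg() are all hypothetical stand-ins, and header restoration and monitor list management are left out.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for HotSpot types; names are illustrative only.
struct Thread {};
static Thread deflater_marker_storage;
static Thread* const DEFLATER_MARKER = &deflater_marker_storage;
constexpr std::int32_t MAX_JINT = std::numeric_limits<std::int32_t>::max();

struct MonitorSketch {
  std::atomic<Thread*> owner{nullptr};
  std::atomic<std::int32_t> count{0};
};

// Returns true only if the caller won every race and may deflate the monitor.
bool try_deflate(MonitorSketch& m) {
  // Part 1: claim an unowned monitor so contending threads take the slow path.
  Thread* expected_owner = nullptr;
  if (!m.owner.compare_exchange_strong(expected_owner, DEFLATER_MARKER)) {
    return false;  // the monitor has a real owner
  }
  // Part 2: turn a zero count into a large negative value.
  std::int32_t expected_count = 0;
  if (!m.count.compare_exchange_strong(expected_count, -MAX_JINT)) {
    // A contender incremented count first. Per the page there is nothing
    // to undo: the contender replaces DEFLATER_MARKER with itself.
    return false;
  }
  // Part 3: the owner field must still hold the marker.
  if (m.owner.load() != DEFLATER_MARKER) {
    m.count.fetch_add(MAX_JINT);  // undo part 2 (the A-B-A case)
    return false;
  }
  return true;  // safe to reset fields and restore the object's header
}
```

Each failed step returns early, which mirrors the rule that losing any race means the monitor cannot be deflated at this time.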
2) Restoring the Header With Interference Detection
ObjectMonitor::install_displaced_markword_in_object() is the new piece of code that handles all the racy situations with restoring an object's header asynchronously. The function is called from a couple of places (deflation and object monitor entry) and can also race with installation of a hash for the object. The restoration protocol for the object's header uses the mark bit along with the hash() value staying at zero to indicate that the object's header is being restored. Only one of the three possible racing scenarios can win and the losing scenarios all adapt to the winning scenario's object header value.
3) Using "owner" or "count" With Interference Detection
Various code paths have been updated to recognize an owner field equal to DEFLATER_MARKER or a negative count field and those code paths will retry their operation. This is the shortest "Key Part" description, but don't be fooled. See "Gory Details" below.
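As a rough illustration of what "recognize and retry" means on the entering side, the check below follows the T-enter column of the diagrams in the next section. It is only a sketch: `MonitorSketch`, `EnterStep` and `begin_contended_enter()` are hypothetical names, and the real code paths do considerably more (header restoration, parking, and so on).

```cpp
#include <atomic>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for HotSpot types; names are illustrative only.
struct Thread {};
static Thread deflater_marker_storage;
static Thread* const DEFLATER_MARKER = &deflater_marker_storage;
constexpr std::int32_t MAX_JINT = std::numeric_limits<std::int32_t>::max();

struct MonitorSketch {
  std::atomic<Thread*> owner{nullptr};
  std::atomic<std::int32_t> count{0};
};

enum class EnterStep { DoContendedEnter, RetryFromScratch };

// Called after the entering thread has seen a contended owner field.
EnterStep begin_contended_enter(MonitorSketch& m) {
  // Atomically announce that the monitor is busy.
  std::int32_t count_now = m.count.fetch_add(1) + 1;
  if (count_now <= 0 && m.owner.load() == DEFLATER_MARKER) {
    // The deflater already made count negative: this monitor is going
    // away, so the caller restores the header and retries the enter.
    return EnterStep::RetryFromScratch;
  }
  return EnterStep::DoContendedEnter;  // count > 0: deflation must bail out
}
```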
An Example of ObjectMonitor Interference
For example, when ObjectMonitor::enter() detects genuine contention via the owner field, it atomically increments the count field to indicate that the ObjectMonitor is busy. The thread calling enter() (T-enter) is potentially racing with an Async Monitor Deflation by another JavaThread (T-deflate) so both threads have to check the result of the race.
Start of the Race
                  ObjectMonitor              T-deflate
T-enter           +-----------------------+  --------------------------------------
----------------  | owner=NULL            |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
                  | count=0               |
                  +-----------------------+
- The data fields are at their starting values.
- T-deflate is about to execute cmpxchg()
- T-enter hasn't done anything yet.
Racing Threads
                  ObjectMonitor              T-deflate
T-enter           +-----------------------+  --------------------------------------
----------------  | owner=DEFLATER_MARKER |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
owner contended   | count=0               |  :
atomic inc count  +-----------------------+  prev = cmpxchg(-max_jint, &count, 0)
- T-deflate has executed cmpxchg() and set owner to DEFLATER_MARKER.
- T-enter has observed the contended owner field.
- T-enter and T-deflate are racing to update the count field.
T-deflate Wins
                  ObjectMonitor              T-deflate
T-enter           +-----------------------+  --------------------------------------
----------------  | owner=DEFLATER_MARKER |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
owner contended   | count=-max_jint       |  :
atomic inc count  +-----------------------+  prev = cmpxchg(-max_jint, &count, 0)
if (count <= 0 && owner                      if (prev == 0 &&
    == DEFLATER_MARKER) {                        owner == DEFLATER_MARKER) {
  restore header                               restore header
  retry enter                                  finish the deflation
}                                            }
- This diagram starts after "Racing Threads".
- T-enter and T-deflate both observe owner == DEFLATER_MARKER and a negative count field.
- T-enter has lost the race and it retries.
- T-deflate finishes deflation of the ObjectMonitor
T-enter Wins
                  ObjectMonitor              T-deflate
T-enter           +-----------------------+  ----------------------------------------
----------------  | owner=DEFLATER_MARKER |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
owner contended   | count=1               |  :
atomic inc count  +-----------------------+  prev = cmpxchg(-max_jint, &count, 0)
if (count > 0)                               if (prev != 0 ||
  do contended                                   owner != DEFLATER_MARKER)
  enter work                                   bailout on deflation (nothing to undo)
- This diagram starts after "Racing Threads".
- T-enter and T-deflate both observe a count field > 0.
- T-enter has won the race and it proceeds with the normal contended enter work.
- T-deflate detects that it has lost the race and bails out on deflating the ObjectMonitor.
- In this example, T-deflate never reaches the DEFLATER_MARKER check and it has nothing to undo.
T-enter Wins By A-B-A
                                            ObjectMonitor                T-deflate
T-enter                                     +-------------------------+  ----------------------------------------
------------------------------------------  | owner=DEFLATER_MARKER   |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
owner contended                             | count=1                 |  : <thread_stalls>
atomic inc count                            +-------------------------+  :
if (count > 0) {                                        ||               :
  EnterI()                                              \/               :
    cmpxchg(Self, &owner, DEFLATER_MARKER)  +-------------------------+  :
  atomic dec count                          | owner=Self/T-enter      |  : <thread_resumes>
}                                           | count=0                 |  prev = cmpxchg(-max_jint, &count, 0)
// finished with enter                      +-------------------------+  if (prev != 0)
: <does app work>                                       ||                 bailout on deflation (nothing to undo)
exit() monitor                                          \/               else if (owner != DEFLATER_MARKER) {
  owner = NULL                              +-------------------------+    atomic add max_jint to count
                                            | owner=Self/T-enter|NULL |    bailout on deflation
                                            | count=0                 |  }
                                            +-------------------------+
- This diagram starts after "Racing Threads".
- T-enter observes a count field > 0.
- T-deflate stalls after setting the owner field to DEFLATER_MARKER.
- T-enter has won the race and calls EnterI() to do the contended enter work.
- EnterI() observes owner == DEFLATER_MARKER and uses cmpxchg() to set the owner field to Self/T-enter.
- T-enter decrements the count field because it is no longer contending for the monitor; it owns the monitor.
- The second ObjectMonitor box is showing the fields at this point.
- T-deflate resumes, sets the count field to -max_jint, and passes the first part of the bailout expression because "prev == 0".
- T-deflate observes that "owner != DEFLATER_MARKER" and bails out on deflation.
- Depending on when T-deflate resumes after the stall, it will see "owner == T-enter" or "owner == NULL".
- Both of those values will cause deflation to bail out, but in this case we have to undo setting count to -max_jint by atomically adding max_jint to count, which will restore count to its proper value.
- The third ObjectMonitor box is showing the fields at this point.
- If the T-enter thread has managed to enter but not exit the monitor during the T-deflate stall, then our owner field A-B-A transition is:
NULL → DEFLATER_MARKER → Self/T-enter
so we really have A1-B-A2, but the A-B-A principle still holds.
- If the T-enter thread has managed to enter and exit the monitor during the T-deflate stall, then our owner field A-B-A transition is:
NULL → DEFLATER_MARKER → Self/T-enter → NULL
so we really have A-B1-B2-A, but the A-B-A principle still holds.
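The A-B-A scenario above can be replayed deterministically on a single thread by splitting T-deflate into a "before the stall" half and an "after the stall" half. This is a hedged sketch only: `AbaMonitor` and the four helper functions are hypothetical names, and `std::atomic` stands in for HotSpot's atomic primitives.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for HotSpot types; names are illustrative only.
struct Thread {};
static Thread deflater_marker_storage;
static Thread* const DEFLATER_MARKER = &deflater_marker_storage;
constexpr std::int32_t MAX_JINT = std::numeric_limits<std::int32_t>::max();

struct AbaMonitor {
  std::atomic<Thread*> owner{nullptr};
  std::atomic<std::int32_t> count{0};
};

// T-deflate, before the stall: NULL -> DEFLATER_MARKER.
bool deflate_claim_owner(AbaMonitor& m) {
  Thread* expected = nullptr;
  return m.owner.compare_exchange_strong(expected, DEFLATER_MARKER);
}

// T-enter, during the stall: contended enter that takes over the marker.
bool enter_during_stall(AbaMonitor& m, Thread* self) {
  m.count.fetch_add(1);  // count > 0, so T-enter has won
  Thread* expected = DEFLATER_MARKER;
  bool owned = m.owner.compare_exchange_strong(expected, self);  // EnterI()
  if (owned) {
    m.count.fetch_sub(1);  // no longer contending; it owns the monitor
  }
  return owned;
}

// T-enter, optionally also during the stall.
void exit_monitor(AbaMonitor& m) { m.owner.store(nullptr); }

// T-deflate, after the stall. Returns true if deflation may proceed.
bool deflate_after_stall(AbaMonitor& m) {
  std::int32_t expected = 0;
  if (!m.count.compare_exchange_strong(expected, -MAX_JINT)) {
    return false;  // nothing to undo
  }
  if (m.owner.load() != DEFLATER_MARKER) {
    m.count.fetch_add(MAX_JINT);  // undo: restore count to its proper value
    return false;
  }
  return true;
}
```

Calling `deflate_claim_owner()`, then `enter_during_stall()` (optionally followed by `exit_monitor()`), then `deflate_after_stall()` walks through the three ObjectMonitor boxes in the diagram and leaves count at 0 in both variants.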
An Example of Object Header Interference
After T-deflate has won the race for deflating an ObjectMonitor it has to restore the header in the associated object. Of course another thread can be trying to do something to the object's header at the same time. Isn't asynchronous work exciting?!?!
ObjectMonitor::install_displaced_markword_in_object() is called from two places so we can have a race between a T-enter thread and a T-deflate thread:
Start of the Race
T-enter                                    object           T-deflate
-----------------------------------------  +-------------+  -----------------------------------------
dmw = header()                             | mark=om_ptr |  dmw = header()
if (!dmw->is_marked() &&                   +-------------+  if (!dmw->is_marked() &&
    dmw->hash() == 0) {                                         dmw->hash() == 0) {
  create marked_dmw                                           create marked_dmw
  dmw = cmpxchg(marked_dmw, &header, dmw)                     dmw = cmpxchg(marked_dmw, &header, dmw)
}                                                           }
- The data field (mark) is at its starting value.
- 'dmw' and 'marked_dmw' are local copies in each thread.
- T-enter and T-deflate are both calling install_displaced_markword_in_object() at the same time.
- Both threads are poised to call cmpxchg() at the same time.
T-deflate Wins First Race
T-enter                                    object           T-deflate
-----------------------------------------  +-------------+  -----------------------------------------
dmw = header()                             | mark=om_ptr |  dmw = header()
if (!dmw->is_marked() &&                   +-------------+  if (!dmw->is_marked() &&
    dmw->hash() == 0) {                                         dmw->hash() == 0) {
  create marked_dmw                                           create marked_dmw
  dmw = cmpxchg(marked_dmw, &header, dmw)                     dmw = cmpxchg(marked_dmw, &header, dmw)
}                                                           }
// dmw == marked_dmw here                                   // dmw == original dmw here
if (dmw->is_marked())                                       if (dmw->is_marked())
  unmark dmw                                                  unmark dmw
obj = object()                                              obj = object()
obj->cas_set_mark(dmw, this)                                obj->cas_set_mark(dmw, this)
- The return value from cmpxchg() in each thread will be different.
- Since T-deflate won the race, its 'dmw' variable contains the header/dmw from the ObjectMonitor.
- Since T-enter lost the race, its 'dmw' variable contains the 'marked_dmw' set by T-deflate.
- T-enter will unmark its 'dmw' variable
- Both threads are poised to call cas_set_mark() at the same time.
T-enter Wins First Race
T-enter                                    object           T-deflate
-----------------------------------------  +-------------+  -----------------------------------------
dmw = header()                             | mark=om_ptr |  dmw = header()
if (!dmw->is_marked() &&                   +-------------+  if (!dmw->is_marked() &&
    dmw->hash() == 0) {                                         dmw->hash() == 0) {
  create marked_dmw                                           create marked_dmw
  dmw = cmpxchg(marked_dmw, &header, dmw)                     dmw = cmpxchg(marked_dmw, &header, dmw)
}                                                           }
// dmw == original dmw here                                 // dmw == marked_dmw here
if (dmw->is_marked())                                       if (dmw->is_marked())
  unmark dmw                                                  unmark dmw
obj = object()                                              obj = object()
obj->cas_set_mark(dmw, this)                                obj->cas_set_mark(dmw, this)
- This diagram is the same as "T-deflate Wins First Race" except we've swapped the post cmpxchg() comments.
- Since T-enter won the race, its 'dmw' variable contains the header/dmw from the ObjectMonitor.
- Since T-deflate lost the race, its 'dmw' variable contains the 'marked_dmw' set by T-enter.
- T-deflate will unmark its 'dmw' variable
- Both threads are poised to call cas_set_mark() at the same time.
Either Wins the Second Race
T-enter                                    object           T-deflate
-----------------------------------------  +-------------+  -----------------------------------------
dmw = header()                             | mark=dmw    |  dmw = header()
if (!dmw->is_marked() &&                   +-------------+  if (!dmw->is_marked() &&
    dmw->hash() == 0) {                                         dmw->hash() == 0) {
  create marked_dmw                                           create marked_dmw
  dmw = cmpxchg(marked_dmw, &header, dmw)                     dmw = cmpxchg(marked_dmw, &header, dmw)
}                                                           }
// dmw == ...                                               // dmw == ...
if (dmw->is_marked())                                       if (dmw->is_marked())
  unmark dmw                                                  unmark dmw
obj = object()                                              obj = object()
obj->cas_set_mark(dmw, this)                                obj->cas_set_mark(dmw, this)
- It does not matter whether T-enter or T-deflate won the cmpxchg() call so the comment does not say who won.
- It does not matter whether T-enter or T-deflate won the cas_set_mark() call; in this scenario both were trying to restore the same value.
- The object's mark field has changed from 'om_ptr' → 'dmw'.
Please notice that install_displaced_markword_in_object() does not do any retries on any code path:
- Instead the code adapts to being the loser in a cmpxchg() by unmarking its copy of the dmw.
- In the second race, if a thread loses the cas_set_mark() race, there is also no need to retry because the object's header has been restored by the other thread.
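The retry-free shape of this function can be sketched as follows. The bit layout here (low two bits as lock bits, `0x3` meaning "marked", hash stored above bit 8) and the names `HeaderMonitorSketch`, `ObjectSketch`, `word_t` and the helper functions are assumptions made for illustration, not the real mark word definition. `std::atomic::compare_exchange_strong` stands in for cmpxchg(): on failure it writes the value it observed back into its first argument, which plays the role of cmpxchg()'s return value.

```cpp
#include <atomic>
#include <cstdint>

using word_t = std::uintptr_t;

// Hypothetical bit layout, for illustration only.
constexpr word_t LOCK_MASK = 0x3;
constexpr word_t UNLOCKED_BITS = 0x1;
constexpr word_t MONITOR_BITS = 0x2;
constexpr word_t MARKED_BITS = 0x3;
constexpr unsigned HASH_SHIFT = 8;

inline bool is_marked(word_t w) { return (w & LOCK_MASK) == MARKED_BITS; }
inline word_t hash_of(word_t w) { return w >> HASH_SHIFT; }
inline word_t set_marked(word_t w) { return w | MARKED_BITS; }
inline word_t set_unmarked(word_t w) { return (w & ~LOCK_MASK) | UNLOCKED_BITS; }

struct ObjectSketch {
  std::atomic<word_t> mark{0};
};

struct HeaderMonitorSketch {
  std::atomic<word_t> header{UNLOCKED_BITS};  // the displaced mark word (dmw)
  ObjectSketch* object = nullptr;

  // The value the object's mark holds while it points at this monitor.
  word_t as_mark() const {
    return reinterpret_cast<word_t>(this) | MONITOR_BITS;
  }

  void install_displaced_markword_in_object() {
    word_t dmw = header.load();
    if (!is_marked(dmw) && hash_of(dmw) == 0) {
      // First race: the winner keeps the original dmw in 'dmw'; a loser
      // gets the winner's value (marked dmw, or a dmw with a hash).
      header.compare_exchange_strong(dmw, set_marked(dmw));
    }
    if (is_marked(dmw)) {
      dmw = set_unmarked(dmw);  // adapt instead of retrying
    }
    // Second race: losing is fine, the other thread restored the header.
    word_t expected = as_mark();
    object->mark.compare_exchange_strong(expected, dmw);
  }
};
```

Calling the function twice on the same monitor exercises both the winner path and the loser path; the object's mark ends up with the same restored value either way.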
Hash Codes and Object Header Interference
If we have a race between a T-deflate thread and a thread trying to get/set a hash code (T-hash), then the first race is between the ObjectMonitorHandle.save_om_ptr(obj, mark) call in T-hash and the deflation protocol in T-deflate.
Start of the Race
T-hash                  ObjectMonitor              T-deflate
----------------------  +-----------------------+  --------------------------------------
save_om_ptr() {         | owner=NULL            |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
  :                     | count=0               |
  atomic inc ref_count  | ref_count=0           |
                        +-----------------------+
Racing Threads
T-hash                  ObjectMonitor              T-deflate
----------------------  +-----------------------+  --------------------------------------
save_om_ptr() {         | owner=DEFLATER_MARKER |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
  :                     | count=0               |  if (waiters != 0 or ref_count != 0) {
  atomic inc ref_count  | ref_count=1           |  }
                        +-----------------------+  prev = cmpxchg(-max_jint, &count, 0)
T-deflate Wins
T-hash                    ObjectMonitor              T-deflate
------------------------  +-----------------------+  --------------------------------------
save_om_ptr() {           | owner=DEFLATER_MARKER |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
  atomic inc ref_count    | count=-max_jint       |  if (waiters != 0 or ref_count != 0) {
  if (owner ==            | ref_count=0           |  }
      DEFLATER_MARKER) {  +-----------------------+  prev = cmpxchg(-max_jint, &count, 0)
    atomic dec ref_count                             if (prev == 0 &&
    return false to                                      owner == DEFLATER_MARKER) {
    cause a retry                                      restore header
  }                                                    finish the deflation
                                                     }
T-hash Wins
T-hash                    ObjectMonitor              T-deflate
------------------------  +-----------------------+  ----------------------------------------
save_om_ptr() {           | owner=NULL            |  cmpxchg(DEFLATER_MARKER, &owner, NULL)
  atomic inc ref_count    | count=0               |  if (waiters != 0 or ref_count != 0) {
  if (owner ==            | ref_count=1           |    cmpxchg(NULL, &owner, DEFLATER_MARKER)
      DEFLATER_MARKER) {  +-----------------------+    return false to cause a retry
  }                                                  }
  if (object no longer                               prev = cmpxchg(-max_jint, &count, 0)
      has a monitor or
      is a different
      monitor) {
    atomic dec ref_count
    return false to
    cause a retry
  }
  save om_ptr in the
  ObjectMonitorHandle
}
If T-hash wins the first race, then the ref_count will cause T-deflate to bail out on deflating the monitor; ref_count is not mentioned in any of the previous examples for simplicity. If T-deflate wins the race, then T-hash will retry which will bring us to the second race.
In the second T-hash versus T-deflate race, T-deflate is trying to restore the object's header/mark from the ObjectMonitor*'s header/dmw field and T-hash is trying to read a stable mark value from the object's header/mark that will allow it to get/set a hash code. When T-hash reads a stable mark value:
- If T-hash sees the ObjectMonitor* that T-deflate is deflating, then T-hash will retry again.
- If T-hash sees the restored object header set by T-deflate, then T-hash will proceed with its normal hash code processing.
Please note that in Carsten's original prototype, there was another race in ObjectSynchronizer::FastHashCode() when the object's monitor had to be inflated. The setting of the hash code in the ObjectMonitor's header/dmw could race with T-deflate. That race is resolved in this version by the use of an ObjectMonitorHandle in the call to ObjectSynchronizer::inflate(). The ObjectMonitor* returned by ObjectMonitorHandle.om_ptr() has a non-zero ref_count so no additional races with T-deflate are possible.
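The ref_count side of the first race can be sketched with a small RAII helper. Everything here is an assumption for illustration: `MonitorHandleSketch`, `HandleMonitor`, `HandleObject` and `deflater_must_bail_out()` are hypothetical, the object's mark is reduced to a plain monitor pointer, and releasing the reference in the destructor is a guess at how a handle might manage the count rather than a statement about ObjectMonitorHandle itself.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-ins for HotSpot types; names are illustrative only.
struct Thread {};
static Thread deflater_marker_storage;
static Thread* const DEFLATER_MARKER = &deflater_marker_storage;

struct HandleMonitor {
  std::atomic<Thread*> owner{nullptr};
  std::atomic<std::int32_t> ref_count{0};
};

// Simplification: the object's mark is modeled as a monitor pointer.
struct HandleObject {
  std::atomic<HandleMonitor*> monitor{nullptr};
};

class MonitorHandleSketch {
 public:
  MonitorHandleSketch() = default;
  MonitorHandleSketch(const MonitorHandleSketch&) = delete;
  MonitorHandleSketch& operator=(const MonitorHandleSketch&) = delete;
  ~MonitorHandleSketch() {
    if (om_ != nullptr) om_->ref_count.fetch_sub(1);  // release the pin
  }

  // 'seen' is the monitor the caller decoded from the object's mark.
  // Returns false when the caller has to retry.
  bool save_om_ptr(HandleObject& obj, HandleMonitor* seen) {
    seen->ref_count.fetch_add(1);  // pin first, then validate
    if (seen->owner.load() == DEFLATER_MARKER) {
      seen->ref_count.fetch_sub(1);  // deflation is in progress
      return false;
    }
    if (obj.monitor.load() != seen) {
      seen->ref_count.fetch_sub(1);  // no monitor, or a different one
      return false;
    }
    om_ = seen;
    return true;
  }

  HandleMonitor* om_ptr() const { return om_; }

 private:
  HandleMonitor* om_ = nullptr;
};

// Deflater side: a non-zero ref_count means "do not deflate".
bool deflater_must_bail_out(const HandleMonitor& m) {
  return m.ref_count.load() != 0;
}
```

Pinning before validating is what makes the check safe in this sketch: once ref_count is non-zero the deflater will bail out, so a successful validation stays valid for as long as the handle is alive.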
Housekeeping Parts of the Algorithm
The devil is in the details! Housekeeping or administrative stuff is usually detailed, but necessary.
- New diagnostic option '-XX:AsyncDeflateIdleMonitors' that is default 'true' so that the new mechanism is used by default, but it can be disabled for potential failure diagnosis.
- ObjectMonitor deflation is still initiated or signaled as needed at a safepoint. When Async Monitor Deflation is in use, flags are set so that the work is done by JavaThreads and the ServiceThread which offloads the safepoint cleanup mechanism.
- ObjectSynchronizer::omAlloc() is modified to call (as needed) ObjectSynchronizer::deflate_per_thread_idle_monitors_using_JT(). Having the JavaThread clean up its own per-thread monitor list permits this work to happen without any per-thread list locking or critical sections.
- Having a JavaThread deflate a potentially long list of in-use monitors could delay the start of a safepoint. This is detected in ObjectSynchronizer::deflate_monitor_list_using_JT() which will save the current state when it is safe to do so and return to its caller to drop locks as needed before honoring the safepoint request.
- ObjectSynchronizer::inflate() has to be careful how omAlloc() is called. If the inflation cause is inflate_cause_vm_internal, then it is not safe to deflate monitors on the per-thread lists so we skip that. When monitor deflation is done, inflate() has to do the oop refresh dance that is common to any code that can go to a safepoint while holding a naked oop. And no, you can't use a Handle here either. :-)
- Everything else is just monitor list management, infrastructure, logging, debugging and the like. :-)
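The "save the current state and return" behaviour described for deflate_monitor_list_using_JT() can be pictured as a resumable list walk. The sketch below is hypothetical throughout: `MonNode`, `ScanState`, `deflate_list_sketch()` and the `safepoint_pending` callback are invented for illustration, and "deflating" is reduced to counting idle nodes.

```cpp
// Hypothetical list node; the real in-use lists hold ObjectMonitors.
struct MonNode {
  bool idle;
  MonNode* next;
};

// Saved progress, so a later call can resume where this one stopped.
struct ScanState {
  MonNode* next = nullptr;
  bool started = false;
  int deflated = 0;
};

// Returns true when the whole list has been walked, or false when the
// walk paused because a safepoint is pending. In the false case the
// caller would drop its locks, honor the safepoint and call again.
template <typename Poll>
bool deflate_list_sketch(MonNode* head, ScanState& s, Poll safepoint_pending) {
  if (!s.started) {
    s.next = head;
    s.started = true;
  }
  while (s.next != nullptr) {
    if (safepoint_pending()) {
      return false;  // progress is already saved in 's'
    }
    if (s.next->idle) {
      ++s.deflated;  // stand-in for the real deflation work
    }
    s.next = s.next->next;
  }
  return true;
}
```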
Gory Details
- Counterpart function mapping for those that know the existing code:
- ObjectSynchronizer class:
- deflate_idle_monitors() has deflate_global_idle_monitors_using_JT() and deflate_per_thread_idle_monitors_using_JT()
- deflate_monitor_list() has deflate_monitor_list_using_JT()
- deflate_monitor() has deflate_monitor_using_JT()
- ObjectMonitor class:
- is_busy() has is_busy_async()
- clear() has clear_using_JT()
- These functions recognize the Async Monitor Deflation protocol and adapt their operations:
- ObjectMonitor::enter()
- ObjectMonitor::EnterI()
- ObjectMonitor::ReenterI()
- Most callers to enter() had to indirectly adapt to the protocol and retry their operations.
- Also these functions had to adapt and retry their operations:
- ObjectSynchronizer::quick_enter()
- ObjectSynchronizer::slow_enter()
- ObjectSynchronizer::reenter()
- ObjectSynchronizer::jni_enter()
- ObjectSynchronizer::FastHashCode()
- ObjectSynchronizer::current_thread_holds_lock()
- ObjectSynchronizer::query_lock_ownership()
- ObjectSynchronizer::get_lock_owner()
- ObjectSynchronizer::monitors_iterate()
- ObjectSynchronizer::inflate_helper()
- ObjectSynchronizer::inflate()
- Various assertions had to be modified to pass without their real check when AsyncDeflateIdleMonitors is true; this is due to the change in semantics for the ObjectMonitor owner and count fields.
- ObjectMonitor has a new allocation_state field that supports three states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied to ObjectMonitors that have reached the 'Old' state. When the Async Monitor Deflation code sees an ObjectMonitor in the 'New' state, it is changed to the 'Old' state, but is not deflated. This prevents a newly allocated ObjectMonitor from being immediately deflated which could cause an inflation<->deflation oscillation.
- ObjectMonitor has a new ref_count field that is used to indicate that an ObjectMonitor* is in use so the ObjectMonitor should not be deflated; this is needed for operations on non-busy monitors so that ObjectMonitor values don't change while they are being queried. There is a new ObjectMonitorHandle helper to manage the ref_count.
- The ObjectMonitor::owner() accessor detects DEFLATER_MARKER and returns NULL in that case to minimize the places that need to understand the new DEFLATER_MARKER value.
- System.gc()/JVM_GC() causes a special monitor list cleanup request which uses the safepoint based monitor list mechanism. So even if AsyncDeflateIdleMonitors is enabled, the safepoint based mechanism is still used by this special case.
- This is necessary for those tests that do something to cause an object's monitor to be inflated, clear the only reference to the object and then expect that enough System.gc() calls will eventually cause the object to be GC'ed even when the thread never inflates another object's monitor. Yes, we have several tests like that. :-)
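Two of the smaller items in the list above, the allocation_state aging rule and the owner() accessor that hides DEFLATER_MARKER, are simple enough to sketch together. The names `StateMonitorSketch`, `raw_owner` and `old_enough_to_deflate()` are hypothetical, and the sketch ignores any synchronization the real allocation_state field may need.

```cpp
#include <atomic>

// Hypothetical stand-ins for HotSpot types; names are illustrative only.
struct Thread {};
static Thread deflater_marker_storage;
static Thread* const DEFLATER_MARKER = &deflater_marker_storage;

enum class AllocationState { Free, New, Old };

struct StateMonitorSketch {
  AllocationState allocation_state = AllocationState::New;
  std::atomic<Thread*> raw_owner{nullptr};

  // Callers that only ask "who owns this?" never see the marker.
  Thread* owner() const {
    Thread* o = raw_owner.load();
    return o == DEFLATER_MARKER ? nullptr : o;
  }
};

// A 'New' monitor is aged to 'Old' on the first pass and only becomes a
// deflation candidate on a later pass; 'Free' monitors are never candidates.
bool old_enough_to_deflate(StateMonitorSketch& m) {
  if (m.allocation_state == AllocationState::New) {
    m.allocation_state = AllocationState::Old;
    return false;
  }
  return m.allocation_state == AllocationState::Old;
}
```

The two-pass aging is what keeps a freshly inflated monitor from being deflated straight away, which is the inflation<->deflation oscillation the list item mentions.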