...
RFE: 8153224 Monitor deflation prolong safepoints
https://bugs.openjdk.java.net/browse/JDK-8153224
Full Webrev: 1517-for-jdk15+2324.v2.1215.full
Inc Webrev: 1517-for-jdk15+2324.v2.1215.inc
Background
This patch for Async Monitor Deflation is based on Carsten Varming's
...
The current idle monitor deflation mechanism executes at a safepoint during cleanup operations. Due to this execution environment, the current mechanism does not have to worry about interference from concurrently executing JavaThreads. Async Monitor Deflation uses the ServiceThread to deflate idle monitors so the new mechanism has to detect interference and adapt as appropriate. In other words, data races are a natural part of Async Monitor Deflation and the algorithms have to detect the races and react without data loss or corruption.
Async Monitor Deflation is performed in two stages: stage one performs the two part protocol described in "Deflation With Interference Detection" below and moves the async deflated ObjectMonitors from an in-use list to a global wait list; the ServiceThread performs a handshake (or a safepoint) with all other JavaThreads after stage one is complete and that forces any racing threads to make forward progress; stage two moves the ObjectMonitors from the global wait list to the global free list. The special values that mark an ObjectMonitor as async deflated remain in their fields until the ObjectMonitor is moved from the global free list to a per-thread free list which is sometime after stage two has completed.
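The two stages can be pictured with a small list-shuffling sketch. This is only an illustration under stated assumptions: SketchMonitor, MonitorLists, stage_one(), stage_two() and handshake_all_threads() are hypothetical names, the lists are plain vectors rather than HotSpot's linked lists, and the handshake is an empty placeholder.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an ObjectMonitor; not the HotSpot type.
struct SketchMonitor {
  int  id;
  bool idle;            // assumed result of the interference-detecting protocol
  bool async_deflated;  // models the "special values" left in the fields
};

struct MonitorLists {
  std::vector<SketchMonitor> in_use;
  std::vector<SketchMonitor> wait;       // global wait list
  std::vector<SketchMonitor> free_list;  // global free list
};

// Stage one: deflate idle monitors and park them on the wait list.
void stage_one(MonitorLists& lists) {
  std::vector<SketchMonitor> still_in_use;
  for (SketchMonitor m : lists.in_use) {
    if (m.idle) {
      m.async_deflated = true;
      lists.wait.push_back(m);
    } else {
      still_in_use.push_back(m);
    }
  }
  lists.in_use.swap(still_in_use);
}

// Placeholder for the handshake (or safepoint) with all other JavaThreads.
void handshake_all_threads() {}

// Stage two: move everything from the wait list to the free list.
// The async-deflated marks are deliberately left in place.
void stage_two(MonitorLists& lists) {
  for (const SketchMonitor& m : lists.wait) {
    lists.free_list.push_back(m);
  }
  lists.wait.clear();
}
```

A caller would run stage_one(), then handshake_all_threads(), then stage_two(); the marks stay set until a monitor is later handed to a per-thread free list, which this sketch does not model.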
Key Parts of the Algorithm
...
ObjectSynchronizer::deflate_monitor_using_JT() is the new counterpart to ObjectSynchronizer::deflate_monitor() and does the heavy lifting of asynchronously deflating a monitor using a two part protocol:
- Setting a NULL owner field to DEFLATER_MARKER with cmpxchg() forces any contending thread through the slow path. A racing thread would be trying to set the owner field.
- Making a zero contentions field a large negative value with cmpxchg() forces racing threads to retry. A racing thread would be trying to increment the contentions field.
If
...
If we lose any of the races, the monitor cannot be deflated at this time.
Once we know it is safe to deflate the monitor (which is mostly field resetting and monitor list management), we have to restore the object's header. That's another racy operation that is described below in "Restoring the Header With Interference Detection".
The setting of the special values that mark an ObjectMonitor as async deflated and the restoration of the object's header comprise the first stage of Async Monitor Deflation.
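As a rough illustration of the two part protocol, the sketch below uses std::atomic in place of HotSpot's Atomic class. SketchMonitor, Thread and try_async_deflate() are hypothetical names, -INT32_MAX stands in for -max_jint, and the header restoration and list management are not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; these are not the HotSpot definitions.
struct Thread {};
static Thread deflater_marker_storage;
static Thread* const DEFLATER_MARKER = &deflater_marker_storage;

struct SketchMonitor {
  std::atomic<Thread*> owner{nullptr};
  std::atomic<int32_t> contentions{0};
};

// Returns true only if both parts of the protocol win their races.
bool try_async_deflate(SketchMonitor& m) {
  // Part 1: NULL -> DEFLATER_MARKER forces contending threads onto the slow path.
  Thread* expected_owner = nullptr;
  if (!m.owner.compare_exchange_strong(expected_owner, DEFLATER_MARKER)) {
    return false;  // the monitor is owned, so it is not idle
  }
  // Part 2: 0 -> large negative value forces racing threads to retry.
  int32_t expected_contentions = 0;
  if (!m.contentions.compare_exchange_strong(expected_contentions, -INT32_MAX)) {
    // Lost the race: put the owner field back if it is still DEFLATER_MARKER.
    // (The extra bookkeeping for a cancelled deflation is omitted here.)
    Thread* marker = DEFLATER_MARKER;
    m.owner.compare_exchange_strong(marker, nullptr);
    return false;
  }
  return true;  // both special values are now in place
}
```

Both steps are single compare-and-swap operations, so a racing thread either sees the field before the deflater touched it or sees the special value; there is no intermediate state to reason about.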
ObjectMonitor::install_displaced_markword_in_object() is the new piece of code that handles all the racy situations with restoring an object's header asynchronously. The function is called from three places (deflation, ObjectMonitor::enter(), and FastHashCode). Only one of the possible racing scenarios can win and the losing scenarios all adapt to the winning scenario's object header value.
...
T-enter                             ObjectMonitor        T-deflate
------------------------      +-----------------------+  --------------------------------------------
enter() {                     | owner=DEFLATER_MARKER |  deflate_monitor_using_JT() {
1> add_to_contentions(1)      | contentions=0         |     try_set_owner_from(NULL, DEFLATER_MARKER)
                              +-----------------------+     :
                                                         1> prev = cmpxchg(&contentions, 0, -max_jint)
- T-deflate has executed cmpxchg() and set owner to DEFLATER_MARKER.
- T-enter still hasn't done anything yet
- The "1>" markers are showing where each thread is at for the ObjectMonitor box:
- T-enter and T-deflate are racing to update the contentions field.
...
T-enter                                        ObjectMonitor         T-deflate
----------------------------------      +-------------------------+  --------------------------------------------
enter() {                               | owner=DEFLATER_MARKER   |  deflate_monitor_using_JT() {
   add_to_contentions(1)                | contentions=-max_jint+1 |     try_set_owner_from(NULL, DEFLATER_MARKER)
1> if (is_being_async_deflated()) {     +-------------------------+     :
     restore obj header                             ||                  prev = cmpxchg(&contentions, 0, -max_jint)
     add_to_contentions(-1)                         \/               1> if (prev == 0) {
2>   return false to force retry        +-------------------------+       restore obj header
   }                                    | owner=DEFLATER_MARKER   |  2>   finish the deflation
}                                       | contentions=-max_jint   |     }
                                        +-------------------------+  }
- This diagram starts after "Racing Threads".
- The "1>" markers are showing where each thread is at for that ObjectMonitor box:
- T-enter and T-deflate both observe owner == DEFLATER_MARKER and a negative contentions field.
- T-enter has lost the race: it restores the obj header (not shown) and decrements contentions.
- T-deflate restores the obj header (not shown).
- The "2>" markers are showing where each thread is at for that ObjectMonitor box.
- T-enter returns false to cause the caller to retry.
- T-deflate finishes the deflation.
T-enter Wins
T-enter                                        ObjectMonitor         T-deflate
----------------------------------      +-------------------------+  ---------------------------------------------
enter() {                               | owner=DEFLATER_MARKER   |  deflate_monitor_using_JT() {
   add_to_contentions(1)                | contentions=1           |     try_set_owner_from(NULL, DEFLATER_MARKER)
1> if (is_being_async_deflated()) {     +-------------------------+     :
   }                                                ||                  prev = cmpxchg(&contentions, 0, -max_jint)
2> <continue contended enter>                       \/               1> if (prev == 0) {
                                        +-------------------------+     } else {
                                        | owner=NULL              |       try_set_owner_from(DEFLATER_MARKER, NULL)
                                        | contentions=1           |  2>   return
                                        +-------------------------+
- This diagram starts after "Racing Threads".
- The "1>" markers are showing where each thread is at for the ObjectMonitor box:
- T-enter and T-deflate both observe a contentions field > 0.
- T-enter has won the race and it continues with the contended enter protocol.
- T-deflate detects that it has lost the race (prev != 0) and bails out on deflating the ObjectMonitor:
- Before bailing out T-deflate tries to restore the owner field to NULL if it is still DEFLATER_MARKER.
- The "2>" markers are showing where each thread is at for that ObjectMonitor box.
...
- Note: The owner == DEFLATER_MARKER and contentions < 0 values that are set by T-deflate (stage one of async deflation) remain in place until after T-deflate does a handshake (or safepoint) operation with all JavaThreads. This handshake forces T-enter to make forward progress and see that the ObjectMonitor is being async deflated before T-enter checks in for the handshake.
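The T-enter side of the two diagrams above can be sketched as follows. This is a minimal illustration, assuming that is_being_async_deflated() amounts to a "contentions < 0" check; SketchMonitor and enter_prologue() are hypothetical names, std::atomic stands in for HotSpot's Atomic class, and the header restoration is not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in; only the contentions field is modeled.
struct SketchMonitor {
  std::atomic<int32_t> contentions{0};
};

// Assumption: a negative contentions value means async deflation has committed.
bool is_being_async_deflated(const SketchMonitor& m) {
  return m.contentions.load() < 0;
}

// Returns true if the caller may continue with the contended enter,
// false if the caller has to retry because T-deflate won the race.
bool enter_prologue(SketchMonitor& m) {
  m.contentions.fetch_add(1);     // add_to_contentions(1)
  if (is_being_async_deflated(m)) {
    // T-deflate won: (restore obj header would go here), undo our increment.
    m.contentions.fetch_sub(1);   // add_to_contentions(-1)
    return false;                 // force retry
  }
  return true;                    // T-enter won: contentions stays > 0
}
```

Because the deflater swings contentions from 0 to a large negative value, a single increment by a racing enter cannot make the field non-negative again, so the check after the increment is reliable.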
T-enter Wins By Cancellation Via DEFLATER_MARKER Swap
T-enter                                                ObjectMonitor         T-deflate
--------------------------------------------    +-------------------------+  --------------------------------------------
ObjectMonitor::enter() {                        | owner=DEFLATER_MARKER   |  deflate_monitor_using_JT() {
     add_to_contentions(1)                      | contentions=1           |       try_set_owner_from(NULL, DEFLATER_MARKER)
1>   EnterI() {                                 +-------------------------+  1>   :
       if (try_set_owner_from(DEFLATER_MARKER,              ||               2>   : <thread_stalls>
           Self) == DEFLATER_MARKER) {                      \/                    :
         // Add marker for cancellation         +-------------------------+       :
         add_to_contentions(1)                  | owner=Self/T-enter      |       :
         // EnterI is done                      | contentions=2           |       : <thread_resumes>
         return                                 +-------------------------+       prev = cmpxchg(&contentions, 0, -max_jint)
       }                                                    ||                    if (prev == 0) {
2>   add_to_contentions(-1)                                 \/               3>   } else {
   } // enter() is done                         +-------------------------+         if (try_set_owner_from(DEFLATER_MARKER,
   : <does app work>                            | owner=Self/T-enter|NULL |             NULL) != DEFLATER_MARKER) {
3> :                                            | contentions=1           |           add_to_contentions(-1)
   exit() monitor                               +-------------------------+         }
4> owner = NULL                                             ||               4>     bailout on deflation
                                                            \/                    }
                                                +-------------------------+
                                                | owner=Self/T-enter|NULL |
                                                | contentions=0           |
                                                +-------------------------+
- T-deflate has executed cmpxchg() and set owner to DEFLATER_MARKER.
- T-enter has called ObjectMonitor::enter(), noticed that the owner is contended, increments contentions, and is about to call ObjectMonitor::EnterI().
- The first ObjectMonitor box is showing the fields at this point and the "1>" markers are showing where each thread is at for that ObjectMonitor box.
- T-deflate stalls after setting the owner field to DEFLATER_MARKER.
- T-enter calls EnterI() to do the contended enter work:
- EnterI() sets the owner field from DEFLATER_MARKER to Self/T-enter.
- T-enter increments contentions one extra time since it cancelled async deflation via a DEFLATER_MARKER swap.
- Note: The extra increment also makes the return value from is_being_async_deflated() stable; the previous A-B-A algorithm would allow the contentions field to flicker from 0 → -max_jint and back to zero. With the current algorithm, a negative contentions field value is a linearization point so once it is negative, we are committed to performing async deflation.
- T-enter owns the monitor and returns from EnterI() (contentions still has both increments).
- The second ObjectMonitor box is showing the fields at this point and the "2>" markers are showing where each thread is at for that ObjectMonitor box.
- T-enter decrements contentions and returns from enter() (contentions still has the extra increment).
- T-enter is now ready to do work that requires the monitor to be owned.
- T-enter is doing app work (but it also could have finished and exited the monitor and it still has the extra increment).
- T-deflate resumes, tries to set the contentions field to -max_jint, and fails because contentions == 1 (the extra increment comes into play!).
- The third ObjectMonitor box is showing the fields at this point and the "3>" markers are showing where each thread is at for that ObjectMonitor box.
- T-deflate tries to restore the owner field from DEFLATER_MARKER to NULL:
- If it does not succeed, then the EnterI() call managed to cancel async deflation via a DEFLATER_MARKER swap so T-deflate decrements contentions to get rid of the extra increment that EnterI() did as a marker for this type of cancellation.
- If it does succeed, then EnterI() did not cancel async deflation via a DEFLATER_MARKER swap and we don't have an extra increment to get rid of.
- Note: For the previous bullet, async deflation is still cancelled because the ObjectMonitor is now busy with a contended enter.
- T-enter finished doing app work and is about to exit the monitor (or it has already exited the monitor).
- The fourth ObjectMonitor box is showing the fields at this point and the "4>" markers are showing where each thread is at for that ObjectMonitor box.
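The cancellation scenario can be replayed step by step with two small helpers. This is a sketch under stated assumptions: SketchMonitor, Thread, enter_i_try_cancel() and deflater_resume() are hypothetical names, std::atomic replaces HotSpot's Atomic class, and -INT32_MAX stands in for -max_jint.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; these are not the HotSpot definitions.
struct Thread {};
static Thread deflater_marker_storage;
static Thread* const DEFLATER_MARKER = &deflater_marker_storage;

struct SketchMonitor {
  std::atomic<Thread*> owner{nullptr};
  std::atomic<int32_t> contentions{0};
};

// T-enter inside EnterI(): try to take the monitor away from the deflater.
// Returns true if async deflation was cancelled via the DEFLATER_MARKER swap.
bool enter_i_try_cancel(SketchMonitor& m, Thread* self) {
  Thread* expected = DEFLATER_MARKER;
  if (m.owner.compare_exchange_strong(expected, self)) {
    m.contentions.fetch_add(1);  // extra increment: marker for cancellation
    return true;
  }
  return false;
}

// T-deflate after the stall: second part of the protocol plus the bailout path.
// Returns true if deflation commits, false if it bails out.
bool deflater_resume(SketchMonitor& m) {
  int32_t expected = 0;
  if (m.contentions.compare_exchange_strong(expected, -INT32_MAX)) {
    return true;
  }
  Thread* marker = DEFLATER_MARKER;
  if (!m.owner.compare_exchange_strong(marker, nullptr)) {
    // Owner was swapped by EnterI(): drop the extra increment it left behind.
    m.contentions.fetch_sub(1);
  }
  return false;
}
```

Replaying the diagram with these helpers gives contentions values of 1, 2, 1 and 0 at the four ObjectMonitor boxes, with the owner field left as T-enter (or NULL once T-enter exits).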
...
- If the object has an ObjectMonitor (i.e., is inflated) and if the ObjectMonitor has a hashcode, then the hashcode value can be carefully fetched from the ObjectMonitor and returned to the caller (T-hash). If there is a race with async deflation, then we have to retry.
- There are several reasons why we might have to inflate the ObjectMonitor in order to set the hashcode:
- The object is neutral, does not contain a hashcode and we (T-hash) lost the race to try to install a hashcode in the mark word.
- The object is stack locked and does not contain a hashcode in the mark word.
- The object has an ObjectMonitor and the ObjectMonitor does not have a hashcode.
Note: In this case, the inflate() call on the common fall thru code path is almost always a no-op since the existing ObjectMonitor is not likely to be async deflated before inflate() sees that the object already has an ObjectMonitor and bails out.
...
((om_list_globals.population - om_list_globals.free_count) / om_list_globals.population) > NN%
- If MonitorBound is exceeded (default is 0 which means off), a cleanup safepoint will be induced.
- For this option, exceeded means:
(om_list_globals.population - om_list_globals.free_count) > MonitorBound
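The two "exceeded" tests above reduce to simple arithmetic on the list counters. The sketch below is illustrative only: OmListCounts, used_threshold_exceeded() and monitor_bound_exceeded() are hypothetical names, the NN% value is passed in as an integer percentage, and treating a threshold of 0 as "off" is an assumption for the percentage check.

```cpp
#include <cassert>

// Hypothetical mirror of the population and free_count counters.
struct OmListCounts {
  int population;
  int free_count;
};

// ((population - free_count) / population) > NN%, done in integer math.
bool used_threshold_exceeded(const OmListCounts& c, int threshold_percent) {
  if (c.population <= 0 || threshold_percent <= 0) {
    return false;  // nothing allocated, or the check is assumed to be off
  }
  long long in_use = static_cast<long long>(c.population) - c.free_count;
  return in_use * 100 > static_cast<long long>(threshold_percent) * c.population;
}

// (population - free_count) > MonitorBound, where 0 means off.
bool monitor_bound_exceeded(const OmListCounts& c, int monitor_bound) {
  if (monitor_bound <= 0) {
    return false;
  }
  return (c.population - c.free_count) > monitor_bound;
}
```

For example, with a population of 1000 and 100 free monitors, 90% are in use: an 89% threshold is exceeded, a 90% threshold is not, and a bound of 899 is exceeded while a bound of 900 is not.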
...
- The MonitorBound option has been deprecated via JDK-8230938.
- Changes to the safepoint deflation mechanism by the Async Monitor Deflation project (when async deflation is enabled):
- If System.gc() is called, then a special deflation request is made which invokes the safepoint deflation mechanism.
- Added the AsyncDeflationInterval diagnostic option (default 250 millis, 0 means off) to prevent MonitorUsedDeflationThreshold requests from swamping the ServiceThread.
- Description: Async deflate idle monitors every so many milliseconds when MonitorUsedDeflationThreshold is exceeded (0 is off).
- A special deflation request can cause an async deflation to happen sooner than AsyncDeflationInterval.
- SafepointSynchronize::is_cleanup_needed() now calls:
- ObjectSynchronizer::is_safepoint_deflation_needed() instead of ObjectSynchronizer::is_cleanup_needed().
- is_safepoint_deflation_needed() returns true only if a special deflation request is made (see above).
- SafepointSynchronize::do_cleanup_tasks() now (indirectly) calls:
- ObjectSynchronizer::do_safepoint_work() instead of ObjectSynchronizer::deflate_idle_monitors().
- do_cleanup_tasks() can be called for non deflation related cleanup reasons and that will still result in a call to do_safepoint_work().
- ObjectSynchronizer::do_safepoint_work() only does the safepoint cleanup tasks if there is a special deflation request. Otherwise it just sets the is_async_deflation_requested flag and notifies the ServiceThread.
- ObjectSynchronizer::deflate_idle_monitors() and ObjectSynchronizer::deflate_thread_local_monitors() do nothing unless there is a special deflation request.
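The request routing described in the bullets above can be sketched as a small state machine. Everything here is an assumption made for illustration: DeflationRequestState and its counters are hypothetical, the functions only mimic the names used above, and whether an explicit async request bypasses AsyncDeflationInterval is inferred from the text rather than taken from the code.

```cpp
#include <cassert>

// Hypothetical state; the counters exist only so the sketch can be observed.
struct DeflationRequestState {
  bool special_deflation_requested = false;
  bool is_async_deflation_requested = false;
  int  service_thread_notifications = 0;
  int  safepoint_deflation_passes = 0;
};

// True only when a special deflation request (e.g. from System.gc()) is pending.
bool is_safepoint_deflation_needed(const DeflationRequestState& s) {
  return s.special_deflation_requested;
}

// Called at cleanup safepoints, possibly for unrelated cleanup reasons.
void do_safepoint_work(DeflationRequestState& s) {
  if (s.special_deflation_requested) {
    s.safepoint_deflation_passes++;      // safepoint based deflation runs
    return;
  }
  s.is_async_deflation_requested = true; // hand the work to the ServiceThread
  s.service_thread_notifications++;
}

// ServiceThread side: decide whether to run an async deflation pass now.
bool is_async_deflation_needed(const DeflationRequestState& s,
                               long interval_ms, long ms_since_last_pass,
                               bool used_threshold_exceeded) {
  if (s.is_async_deflation_requested) {
    return true;                         // explicit request: do not wait
  }
  return interval_ms > 0 &&              // 0 means off
         ms_since_last_pass > interval_ms &&
         used_threshold_exceeded;
}
```

The interval check is what keeps threshold-driven requests from swamping the ServiceThread: with the default of 250 ms, a threshold that stays exceeded triggers at most one pass per interval.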
...
Gory Details
- Counterpart function mapping for those that know the existing code:
- ObjectSynchronizer class:
- deflate_idle_monitors() has deflate_idle_monitors_using_JT(), deflate_global_idle_monitors_using_JT(), deflate_per_thread_idle_monitors_using_JT(), and deflate_common_idle_monitors_using_JT().
- deflate_monitor_list() has deflate_monitor_list_using_JT()
- deflate_monitor() has deflate_monitor_using_JT()
- ObjectMonitor class:
- clear() has clear_using_JT()
- These functions recognize the Async Monitor Deflation protocol and adapt their operations:
- ObjectMonitor::enter()
- ObjectMonitor::EnterI()
- ObjectSynchronizer::quick_enter()
- ObjectSynchronizer::deflate_monitor()
- Note: These changes include handling the lingering owner == DEFLATER_MARKER value.
- Also these functions had to adapt and retry their operations:
- ObjectSynchronizer::FastHashCode()
- ObjectSynchronizer::inflate()
- Various assertions had to be modified to pass without their real check when AsyncDeflateIdleMonitors is true; this is due to the change in semantics for the ObjectMonitor owner field.
- ObjectMonitor has a new allocation_state field that supports three states: 'Free', 'New', 'Old'. Async Monitor Deflation is only applied to ObjectMonitors that have reached the 'Old' state.
- Note: Prior to CR1/v2.01/4-for-jdk13, the allocation state was transitioned from 'New' to 'Old' in deflate_monitor_via_JT(). This meant that deflate_monitor_via_JT() had to see an ObjectMonitor twice before deflating it. This policy was intended to prevent oscillation from 'New' → 'Old' and back again.
- In CR1/v2.01/4-for-jdk13, the allocation state is transitioned from 'New' → 'Old' in inflate(). This makes ObjectMonitors available for deflation earlier. So far there have been no signs of oscillation from 'New' → 'Old' and back again.
- The ObjectMonitor::owner() accessor detects DEFLATER_MARKER and returns NULL in that case to minimize the places that need to understand the new DEFLATER_MARKER value.
- System.gc()/JVM_GC() causes a special monitor list cleanup request which uses the safepoint based monitor list mechanism. So even if AsyncDeflateIdleMonitors is enabled, the safepoint based mechanism is still used by this special case.
- This is necessary for those tests that do something to cause an object's monitor to be inflated, clear the only reference to the object and then expect that enough System.gc() calls will eventually cause the object to be GC'ed even when the thread never inflates another object's monitor. Yes, we have several tests like that. :-)
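Two of the details above, the filtering owner() accessor and the allocation_state gate, can be sketched together. This is a minimal illustration with hypothetical names (SketchMonitor, raw_owner, is_deflation_candidate()); std::atomic stands in for HotSpot's own field access, and the real class has many more fields.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; these are not the HotSpot definitions.
struct Thread {};
static Thread deflater_marker_storage;
static Thread* const DEFLATER_MARKER = &deflater_marker_storage;

enum class AllocationState { Free, New, Old };

struct SketchMonitor {
  std::atomic<Thread*> raw_owner{nullptr};
  AllocationState allocation_state = AllocationState::Free;

  // Hides DEFLATER_MARKER so most callers only ever see a real owner or NULL.
  Thread* owner() const {
    Thread* o = raw_owner.load();
    return o == DEFLATER_MARKER ? nullptr : o;
  }

  // Async deflation is only applied to monitors that have reached 'Old'.
  bool is_deflation_candidate() const {
    return allocation_state == AllocationState::Old;
  }
};
```

Keeping the marker check inside the accessor means only the code paths that take part in the deflation protocol need to read the raw field.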