...
Please note that in Carsten's original prototype, there was another race in ObjectSynchronizer::FastHashCode() when the object's monitor had to be inflated. The setting of the hashcode in the ObjectMonitor's header/dmw could race with T-deflate. That race is resolved in this version by the use of an ObjectMonitorHandle in the call to ObjectSynchronizer::inflate(). The ObjectMonitor* returned by ObjectMonitorHandle.om_ptr() has a non-zero ref_count so no additional races with T-deflate are possible.
Lock-Free Monitor List Management In Theory
Use of specialized measurement code with the CR5/v2.05/8-for-jdk13 bits revealed that the gListLock contention is responsible for much of the performance degradation observed with SPECjbb2015. Consequently the primary focus of the next round of changes is/was on switching to lock-free monitor list management. Of course, since the Java Monitor subsystem is full of special cases, the lock-free list management code has to have a number of special cases which are described here.
...
The Not So Simple Case or Taking and Prepending on the Same List Leads to A-B-A Races
Note: This subsection is talking about "Simple Take" and "Simple Prepend" in abstract terms. The purpose of this code and A-B-A example is to introduce the race concepts. The code shown here is not an exact match for the project code and the specific A-B-A example is not found in the project code.
The left hand column shows "Thread1" taking a node "A" from the front of the list and it shows the simple code that does that operation. The right hand column shows "Thread2" prepending a node "B" to the front of the list and it shows the simple code that does that operation. We have a third thread, "Thread3", that does a take followed by a prepend, but we don't show a column for "Thread3". Instead we have a column in the middle that shows the results of the interleaved operations of all three threads:
...
Marking to Solve the A-B-A Race
Note: This subsection is talking about "Marking" as a solution to the A-B-A race in abstract terms. The purpose of this marking code and A-B-A example is to introduce the solution concepts. The code shown here is not an exact match for the project code.
One solution to the A-B-A race is to mark the next field in a node to indicate that the node is busy. Only one thread can successfully mark the next field in a node at a time and other threads must loop around and retry their marking operation until they succeed. Each thread that marks the next field in a node must unmark the next field when it is done with the node so that other threads can proceed.
...
Also notice that we update the list head pointer with release-store instead of with cmpxchg for the same reasons as the previous algorithm.the previous algorithm.
Background: ObjectMonitor Movement Between the Lists
The purpose of this subsection is to provide background information about how ObjectMonitors move between the various lists. This project changes the way these movements are implemented, but does not change the movements themselves. For example, newly allocated blocks of ObjectMonitors are always prepending to the global free list; this is true in the baseline and is true in this project.
ObjectMonitor Allocation Path
- ObjectMonitors are allocated by ObjectSynchronizer::om_alloc().
- Assume that the calling JavaThread has an empty free list and the global free list is also empty:
- A block of ObjectMonitors is allocated by the calling JavaThread and prepended to the global free list.
- ObjectMonitors are taken from the front of the global free list by the calling JavaThread and prepended to the JavaThread's free list by ObjectSynchronizer::om_release().
- An ObjectMonitor is taken from the front of the JavaThread's free list and prepended to the JavaThread's in-use list (optimistically).
ObjectMonitor Deflation Path
ObjectMonitors are deflated at a safepoint by:
ObjectSynchronizer::deflate_monitor_list() calling ObjectSynchronizer::deflate_monitor()
And when Async Monitor Deflation is enabled, the are deflated by:
ObjectSynchronizer::deflate_monitor_list_using_JT() calling ObjectSynchronizer::deflate_monitor_using_JT()
Idle ObjectMonitors are deflated by the ServiceThread when Async Monitor Deflation is enabled. They can also be deflated at a safepoint by the VMThread or by a task worker thread. Safepoint deflation is used when Async Monitor Deflation is disabled or when there is a special deflation request made, e.g., System.gc().
An idle ObjectMonitor is deflated and extracted from its in-use list and prepended to the global free list. The in-use list can be either the global in-use list or a per-thread in-use list. Deflated ObjectMonitors are always prepended to the global free list.
ObjectMonitor Flush Path
- ObjectMonitors are flushed by ObjectSynchronizer::om_flush().
- When a JavaThread exits, the ObjectMonitors on its in-use list are prepended on the global in-use list and the ObjectMonitors on its free list are prepended on the global free list.
ObjectMonitor Linkage Path
- ObjectMonitors are linked with objects by ObjectSynchronizer::inflate().
- An inflate() call by one JavaThread can race with an inflate() call by another JavaThread for the same object.
- When inflate() realizes that it failed to link an ObjectMonitor with the target object, it calls ObjectSynchronizer::om_release() to extract the ObjectMonitor from the JavaThread's in-use list and prepends it on the JavaThread's free list.
Note: Remember that ObjectSynchronizer::om_alloc() optimistically added the newly allocated ObjectMonitor to the JavaThread's in-use list. - When inflate() successfully links an ObjectMonitor with the target object, that ObjectMonitor stays on the JavaThread's in-use list.
Lock-Free Monitor List Management In Reality
Prepending To A List That Also Allows Deletes
...
The above function loops until it marks the next field of the target ObjectMonitor. The unmarked value of the next field is returned by the function. There is nothing particularly special about this function so we don't need any line specific annotations.
Debugging Tip: If there's a bug where an ObjectMonitor's next field is not properly unmarked, then this function will loop forever and the caller will be stuck.
mark_list_head() Helper Function
...
When this function returns true, the next field in 'mid_p' is marked and any parallel callers of mark_list_head() on the same list will be looping until the next field in the list head's ObjectMonitor is no longer marked. The caller that just got the 'true' return needs to finish up its work with 'mid_p' quickly.
Debugging Tip: If there's a bug where the list head ObjectMonitor's next field is not properly unmarked, then this function will loop forever and the caller will be stuck.
Taking From The Start Of A List
...
The function can be called by more than one thread at a time and each thread will take a unique ObjectMonitor from the start of the list (if one is available) without losing any other ObjectMonitors on the list. Of course, the "take a unique ObjectMonitor" and "without losing any other ObjectMonitors" parts are where all the details come in:
- L03 L06 tries to mark the list head:
- mark_list_head() returns false if the list is empty so we return NULL on L07.
- Otherwise, 'take' is a pointer to the marked list head and 'next' is the unmarked next field in the list head.
- L11 release-stores 'next' into 'list_p'.
You might be asking yourself: Why release-store instead of cmpxchg?- mark_list_head() only returns to the caller when it has marked the next field in the ObjectMonitor at the head of the list.
- Because of that guarantee, any prepender or deleter thread that is running in parallel must loop until we have release-stored 'next' into 'list_p' which unmarks the list head.
- L12 decrements the counter referred to by 'count_p'.
- L16 unmarks 'take' using the unmarked 'next' value we got from mark_list_head():
- Keeping the 'next' value in take's next field allows any lagging list walker to get to the next ObjectMonitor on that list.
- take's next field will get cleaned up when take is prepended to its target in-use list.
- L17 returns 'take' to the caller.
...
Most of the above code block extracts 'm' from self's in-use list; it is not an exact quote from om_release(), but it is the highlights:
- L02 is used to mark self's in-use list head:
- 'mid' is self's in-use list head and its next field is marked.
- 'next' is the unmarked next field from 'mid'.
- L03 → L20: self's in-use list is traversed looking for the target ObjectMonitor 'm':
- L04: if the current 'mid' matches 'm':
- L05: if we can't cmpxchg self's in-use list head to the 'next' ObjectMonitor*:
Note: This cmpxchg only works if 'm' is self's in-use list head and no-harm-no-foul if 'm' is not self's in-use list head. This is faster than doing a load-acquire of self's in-use list head, checking the value and then calling cmpxchg.
- L06: make a marked copy of 'mid'
- L07: we cmpxchg cur_mid_in_use's next field from 'marked_mid' to 'next'.
Note: We use cmpxchg here instead of release-store so that we can sanity check the result; see the real code. - Note: In v2.07, I figured out a simpler way to do L05-L08 so the above bullets will be updated.
- L09 → L12: we've successfully extracted 'm' from self's in-use list so we decrement self's in-use counter, unmark the next field in 'mid' and we're done.
- L1[45]: if cur_mid_in_use != NULL, then unmark its next field.
- L17: set 'cur_mid_in_use' to 'mid'
Note: cur_mid_in_use keeps the marked next field so that it remains stable for a possible next field change. It cannot be deflated while it is marked. - L18: set 'mid' to 'next'.
- L19: mark next field in the new 'mid' and update 'next'; loop around and do it all again.
...
Note: In v2.07, I figured out a simpler way to do L15-L17L16:
L15: if (cur_mid_in_use == NULL) {
L16: OrderAccess::release_store(list_p, next);
L17: } else {
L18: OrderAccess::release_store(&cur_mid_in_use->_next_om, next);
L19: }...
- L09 marks the 'list_p' head (if it is not empty):
- 'mid' is 'list_p's head and its next field is marked.
- 'next' is the unmarked next field from 'mid'.
- L12-L32: We walk each 'mid' in the list and determine if it can be deflated:
- L14: if 'mid' is associated with an object and can be deflated:
- L15: if we can't cmpxchg 'list_p' to the 'next' ObjectMonitor*:
Note: This cmpxchg only works if 'mid' is 'list_p's list head and no-harm-no-foul if 'mid' is not 'list_p's list head. This is faster than doing a load-acquire of 'list_p', checking the value and then calling cmpxchg.
- L16: we cmpxchg cur_mid_in_use's next field from 'mid' to 'next'.
Note: We use cmpxchg here instead of release-store so that we can sanity check the result; see the real code. - Note: In v2.07, I figured out a simpler way to do L15-L16 so the above bullets will be updated.
- L18 → L21: we've successfully extracted 'mid' from 'list_p's list so we increment 'deflated_count', decrement the counter referred to by 'count_p', set 'mid's next field to NULL and we're done.
Note: 'mid' is the current tail in the 'free_head_p' list so we have to NULL terminate it (which also unmarks it).
...
Note: In v2.07, I figured out a simpler way to do L31-L35L34:
L31: if (cur_mid_in_use == NULL) {
L32: OrderAccess::release_store(list_p, next);
L33: } else {
L34: ObjectMonitor* marked_next = mark_om_ptr(next);
L35: OrderAccess::release_store(&cur_mid_in_use->_next_om, marked_next);
L36: }...
Since we're using the more complicated mark-cur_mid_in_use-and-mid-as-we-go protocol and also the mark-next-next-as-we-go protocol, there is a mind numbing amount of detail:
- L1[141-3]: Handle the initial setup if we are not resuming after a safepoint:
- L12 marks the 'list_p' head (if it is not empty):
- 'mid' is 'list_p's head and its next field is marked.
- 'next' is the unmarked next field from 'mid'.
- L15-L23: Handle the initial setup if we are resuming after a safepoint:
- L17: mark next field in 'cur_mid_in_use' and update 'mid'
- L18-L21: If 'mid' == NULL, then we've resumed context at the end of the list so we're done.
- L23: mark next field in 'mid' and update 'next'
- L25-L65L63: We walk each 'mid' in the list and determine if it can be deflated:
- L2[67]: if next != NULL, then mark next field in 'next' and update 'next_next'
- L29-L30L40: if 'mid' is associated with an object, 'mid' is old, and can be deflated:
- L31: if we can't cmpxchg 'list_p' to the 'next' ObjectMonitor*:
Note: This cmpxchg only works if 'mid' is 'list_p's list head and no-harm-no-foul if 'mid' is not 'list_p's list head. This is faster than doing a load-acquire of 'list_p', checking the value and then calling cmpxchg.- L32: make a marked copy of 'mid'
- L33: make a marked copy of 'next'
- L34: we cmpxchg cur_mid_in_use's next field from 'marked_mid' to 'marked_next'.
Note: We use cmpxchg here instead of release-store so that we can sanity check the result; see the real code. - Note: In v2.07, I figured out a simpler way to do L31-L34 so the above bullets will be updated.
- L36 → L38: we've successfully extracted 'mid' from 'list_p's list so we increment 'deflated_count', decrement the counter referred to by 'count_p', set 'mid's next field to NULL and we're done.
Note: 'mid' is the current tail in the 'free_head_p' list so we have to NULL terminate it (which also unmarks it). - L39: advance 'mid' to 'next'.
Note: 'mid' keeps non-NULL 'next's marked next field. - L40: advance 'next' to 'next_next'.
- L41-L57L56: 'mid' can't be deflated so we have to carefully advance the list pointers:
- L4[23]: if cur_mid_in_use != NULL, then unmark next field in 'cur_mid_in_use'.
- L45: advance 'cur_mid_in_use' to 'mid'.
Note: The next field in 'mid' is still marked and 'cur_mid_in_use' keeps that. - L46: advance 'mid' to 'next'.
Note: The next field in a non-NULL 'next' is still marked and 'mid' keeps that. - L47: advance 'next' to 'next_next'.
- L48-L56: Handle a safepoint if one has started and it is safe to do so.
- L59-L63: we reached the end of the list:
- L6[01]: if cur_mid_in_use != NULL, then unmark next field in 'cur_mid_in_use'.
- L63: break out of the loop because we are done
- L66: not pausing for a safepoint so clear saved state.
- L67: all done so return 'deflated_count'.
...