- Loading...
...
Use of specialized measurement code with the CR5/v2.05/8-for-jdk13 bits revealed that the gListLock contention is responsible for much of the performance degradation observed with SPECjbb2015. Consequently the primary focus of the next round of changes is/was on switching to lock-free monitor list management. Of course, since the Java Monitor subsystem is full of special cases, the lock-free list management code has to have a number of special cases which will be are described here.
...
L1: while (true) {
L2: PaddedObjectMonitor* cur = OrderAccess::load_acquire(&g_block_list);
L3: OrderAccess::release_store(&new_blk[0]._next_om, = cur);
L4: if (Atomic::cmpxchg(new_blk, &g_block_list, cur) == cur) {
L5: Atomic::add(_BLOCKSIZE - 1, &g_om_population);
L6: break;
L7: }
L8: }...
The above block of code can be called by multiple threads in parallel and does must not lose track of any blocks. Of course, the "does must not lose track of any blocks" part is where all the details come in:
...
This example is considered to be the "simple case" because we only prepend to the list (no deletes) and we only use:
to achieve the safe update of the 'g_block_list' value; the atomic increment of the 'g_om_population' counter is considered to be just accounting (pun intended).
...
Note: The above code snippet comes from ObjectSynchronizer::prepend_block_to_lists(); see that function for more complete context (and comments).
The left hand column shows "Thread1" taking a node "A" from the front of the list and it shows the simple code that does that operation. The right hand column shows "Thread2" prepending a node "B" to the front of the list and it shows the simple code that does that operation. We have a third thread, "Thread3", that does a take followed by a prepend, but we don't show a column for "Thread3". Instead we have a column in the middle that shows the results of the interleaved operations of all three threads:
T1: Simple Take: | | T2: Simple Prepend:
---------------- | T1 and T3 see this initial list: | -------------------
+---+ +---+ +---+ | +---+ +---+ +---+ | +---+ +---+
head -> | A | -> | X | -> | Y | | head -> | A | -> | X | -> | Y | | head -> | X | -> | Y |
+---+ +---+ +---+ | +---+ +---+ +---+ | +---+ +---+
| T3 takes "A", T2 sees this list: |
Take a node || | +---+ +---+ | Prepend a node ||
from the front || | head -> | X | -> | Y | | to the front ||
of the list || | +---+ +---+ | of the list ||
\/ | T2 prepends "B": | \/
| +---+ +---+ +---+ |
+---+ +---+ | head -> | B | -> | X | -> | Y | | +---+ +---+ +---+
head -> | X | -> | Y | | +---+ +---+ +---+ | head -> | B | -> | X | -> | Y |
+---+ +---+ | T3 prepends "A": | +---+ +---+ +---+
+---+ | +---+ +---+ +---+ +---+ |
cur -> | A | | head -> | A | -> | B | -> | X | -> | Y | |
+---+ | +---+ +---+ +---+ +---+ |
| T1 takes "A", loses "B": |
// "take" a node: | +---+ | // "prepend" a node:
while (true) { | | B | ----+ | while (true) {
cur = head; | +---+ | | cur = head;
next = cur->next; | V | new->next = cur;
if (cmpxchg(next, &head, cur) == cur) { | +---+ +---+ | if (cmpxchg(new, &head, cur) == cur) {
break; // success changing head | head -> | X | -> | Y | | break; // success changing head
} | +---+ +---+ | }
} | +---+ | }
return cur; | cur -> | A | |
| +---+ |
The "Simple Take" and "Simple Prepend" algorithms are just fine by themselves. The "Simple Prepend" algorithm is almost identical to the algorithm in the "The Simple Case" and just like that algorithm, it works fine if we are only doing prepend operations on the list. Similarly, the "Simple Take" algorithm works just fine if we are only doing take operations on the list; the only thing missing is an empty list check, but that would have clouded the example.
When we allow simultaneous take and prepend operations on the same list, the simple algorithms are exposed to A-B-A races. An A-B-A race is a situation where the head of the list can change from node "A" to node "B" and back to node "A" again without the simple algorithm being aware that critical state has changed. In the middle column of the above diagram, we show what happens when Thread3 causes the head of the list to change from node "A" to node "B" (a take operation) and back to node "A" (a prepend operation). That A-B-A race causes Thread1 to lose node "B" when it updates the list head to node "X" instead of node "B" because Thread1 was unaware that the 'next' value in its code was stale.
Here's the diagram again with the code in Thread1 and Thread2 lined up with the effects of the A-B-A race executed by Thread3:
T1: Simple Take: | | T2: Simple Prepend:
---------------- | T1 and T3 see this initial list: | -------------------
while (true) { | +---+ +---+ +---+ | :
cur = head; | head -> | A | -> | X | -> | Y | | :
next = cur->next; | +---+ +---+ +---+ | :
: | T3 takes "A", T2 sees this list: | :
: | +---+ +---+ | :
: | head -> | X | -> | Y | | :
: | +---+ +---+ | while (true) {
: | T2 prepends "B": | cur = head;
: | +---+ +---+ +---+ | new->next = cur;
: | head -> | B | -> | X | -> | Y | | if (cmpxchg(new, &head, cur) == cur) {
: | +---+ +---+ +---+ | break;
: | T3 prepends "A": | }
: | +---+ +---+ +---+ +---+ | }
: | head -> | A | -> | B | -> | X | -> | Y | |
: | +---+ +---+ +---+ +---+ |
: | T1 takes "A", loses "B": |
: | +---+ |
: | | B | ----+ |
: | +---+ | |
: | V |
: | +---+ +---+ |
if (cmpxchg(next, &head, cur) == cur) { | head -> | X | -> | Y | |
} | +---+ +---+ |
} | +---+ |
return cur; | cur -> | A | |
| +---+ |
So the simple algortihms are not sufficient when we allow simultaneous take and prepend operations.
One solution to the A-B-A race is to mark the next field in a node to indicate that the node is busy. Only one thread can successfully mark the next field in a node at a time and other threads must loop around and retry their marking operation until they succeed. Each thread that marks the next field in a node must unmark the next field when it is done with the node so that other threads can proceed.
Here's the take algorithm modified with marking (still ignores the empty list for clarity):
// "take" a node with marking:
while (true) {
cur = head;
if (!mark_next(cur, &next)) {
// could not mark cur so try again
continue;
}
if (head != cur) {
// head changed while marking cur so try again
unmark_next(cur);
continue;
}
// list head is now marked so switch it to next which also makes list head unmarked
Atomic::release_store(head, next);
unmark_next(cur); // unmark cur and return it
return cur;
}
The modified take algorithm does not change the list head pointer until it has successfully marked the list head node. Notice that after we mark the list head node we have to verify that the list head pointer hasn't changed in the mean time. Only after we have verified that the node we marked is still the list head is it safe to modify the list head pointer. The marking of the list head prevents the take algorithm from executing in parallel with a prepend algorithm and losing a node.
Here's the prepend algorithm modified with marking (ignores the empty list for clarity):
// "prepend" a node with marking:
while (true) {
cur = head;
if (!mark_next(cur, &next)) {
// could not mark cur so try again
continue;
}
if (head != cur) {
// head changed while marking cur so try again
unmark_next(cur);
continue;
}
// list head is now marked so switch it to 'new' which also makes list head unmarked
Atomic::release_store(head, new);
unmark_next(cur); // unmark the previous list head
}
The modified prepend algorithm does not change the list head pointer until it has successfully marked the list head node. Notice that after we mark the list head node we have to verify that the list head pointer hasn't changed in the mean time. Only after we have verified that the node we marked is still the list head is it safe to modify the list head pointer. The marking of the list head prevents the prepend algorithm from executing in parallel with the take algorithm and losing a node.
...