Page History

...

Use of specialized measurement code with the CR5/v2.05/8-for-jdk13 bits revealed that the gListLock contention is responsible for much of the performance degradation observed with SPECjbb2015. Consequently the primary focus of the next round of changes is/was on switching to lock-free monitor list management. Of course, since the Java Monitor subsystem is full of special cases, the lock-free list management code has to have a number of special cases which will be are described here.

The Simple Case

...

    L1:  while (true) {
    L2:    PaddedObjectMonitor* cur = OrderAccess::load_acquire(&g_block_list);
    L3:    OrderAccess::release_store(&new_blk[0]._next_om, = cur);
    L4:    if (Atomic::cmpxchg(new_blk, &g_block_list, cur) == cur) {
    L5:      Atomic::add(_BLOCKSIZE - 1, &g_om_population);
    L6:      break;
    L7:    }
    L8:  }

...

The above block of code can be called by multiple threads in parallel and does must not lose track of any blocks. Of course, the "does must not lose track of any blocks" part is where all the details come in:

- L2 load-acquires loads the current 'g_block_list' value into 'cur'; the use of g_block_list is only modified by cmpxchg so a load (instead of a load-acquire is necessary to get the latest value cmpxchg'ed by another thread) is sufficient.
- L3 release-stores 'cur' into the 0th element's next field for 'new_blk'; the use of release-store is necessary to make sure that proper next field value is visible as soon as ' g_block_list ' is updated to refer to 'new_blk' (publish it)is only modified by cmpxchg so a store (instead of a release-store) is sufficient.
- L4 is the critical decision point for this lock-free update. cmpxchg will change 'g_block_list' to 'new_blk' iff 'g_block_list' == 'cur' (publish it).
  - if the cmpxchg return value is 'cur', then we succeeded with the lock-free update and we atomically update 'g_om_population' to match.
  - Otherwise we loop around and do everything again from L2.

...

This example is considered to be the "simple case" because we only prepend to the list (no deletes) and we only use:

- one load-acquire
- one release-store and
- one cmpxchg

to achieve the safe update of the 'g_block_list' value; the atomic increment of the 'g_om_population' counter is considered to be just accounting (pun intended).

...

Note: The above code snippet comes from ObjectSynchronizer::prepend_block_to_lists(); see that function for more complete context (and comments).

The Not So Simple Case or Taking and Prepending on the Same List Leads to A-B-A Races

The left hand column shows "Thread1" taking a node "A" from the front of the list and it shows the simple code that does that operation. The right hand column shows "Thread2" prepending a node "B" to the front of the list and it shows the simple code that does that operation. We have a third thread, "Thread3", that does a take followed by a prepend, but we don't show a column for "Thread3". Instead we have a column in the middle that shows the results of the interleaved operations of all three threads:

    T1: Simple Take:                           |                                            | T2: Simple Prepend:
    ----------------                           |  T1 and T3 see this initial list:          | -------------------
            +---+    +---+    +---+            |          +---+    +---+    +---+           |          +---+    +---+
    head -> | A | -> | X | -> | Y |            |  head -> | A | -> | X | -> | Y |           |  head -> | X | -> | Y |
            +---+    +---+    +---+            |          +---+    +---+    +---+           |          +---+    +---+
                                               |  T3 takes "A", T2 sees this list:          |
    Take a node     ||                         |          +---+    +---+                    |  Prepend a node  ||
    from the front  ||                         |  head -> | X | -> | Y |                    |  to the front    ||
    of the list     ||                         |          +---+    +---+                    |  of the list     ||
                    \/                         |  T2 prepends "B":                          |                  \/
                                               |          +---+    +---+    +---+           |
            +---+    +---+                     |  head -> | B | -> | X | -> | Y |           |          +---+    +---+    +---+
    head -> | X | -> | Y |                     |          +---+    +---+    +---+           |  head -> | B | -> | X | -> | Y |
            +---+    +---+                     |  T3 prepends "A":                          |          +---+    +---+    +---+
           +---+                               |          +---+    +---+    +---+    +---+  |
    cur -> | A |                               |  head -> | A | -> | B | -> | X | -> | Y |  |
           +---+                               |          +---+    +---+    +---+    +---+  |
                                               |  T1 takes "A", loses "B":                  |
    // "take" a node:                          |  +---+                                     |  // "prepend" a node:
    while (true) {                             |  | B | ----+                               |  while (true) {
      cur = head;                              |  +---+     |                               |    cur = head;
      next = cur->next;                        |            V                               |    new->next = cur;
      if (cmpxchg(next, &head, cur) == cur) {  |          +---+    +---+                    |    if (cmpxchg(new, &head, cur) == cur) {
        break;  // success changing head       |  head -> | X | -> | Y |                    |      break; // success changing head
      }                                        |          +---+    +---+                    |    }
    }                                          |         +---+                              |  }
    return cur;                                |  cur -> | A |                              |
                                               |         +---+                              |

The "Simple Take" and "Simple Prepend" algorithms are just fine by themselves. The "Simple Prepend" algorithm is almost identical to the algorithm in the "The Simple Case" and just like that algorithm, it works fine if we are only doing prepend operations on the list. Similarly, the "Simple Take" algorithm works just fine if we are only doing take operations on the list; the only thing missing is an empty list check, but that would have clouded the example.

When we allow simultaneous take and prepend operations on the same list, the simple algorithms are exposed to A-B-A races. An A-B-A race is a situation where the head of the list can change from node "A" to node "B" and back to node "A" again without the simple algorithm being aware that critical state has changed. In the middle column of the above diagram, we show what happens when Thread3 causes the head of the list to change from node "A" to node "B" (a take operation) and back to node "A" (a prepend operation). That A-B-A race causes Thread1 to lose node "B" when it updates the list head to node "X" instead of node "B" because Thread1 was unaware that the 'next' value in its code was stale.

Here's the diagram again with the code in Thread1 and Thread2 lined up with the effects of the A-B-A race executed by Thread3:

    T1: Simple Take:                           |                                            | T2: Simple Prepend:
    ----------------                           |  T1 and T3 see this initial list:          | -------------------
    while (true) {                             |          +---+    +---+    +---+           |  :
      cur = head;                              |  head -> | A | -> | X | -> | Y |           |  :
      next = cur->next;                        |          +---+    +---+    +---+           |  :
      :                                        |  T3 takes "A", T2 sees this list:          |  :
      :                                        |          +---+    +---+                    |  :
      :                                        |  head -> | X | -> | Y |                    |  :
      :                                        |          +---+    +---+                    |  while (true) {
      :                                        |  T2 prepends "B":                          |    cur = head; 
      :                                        |          +---+    +---+    +---+           |    new->next = cur;
      :                                        |  head -> | B | -> | X | -> | Y |           |    if (cmpxchg(new, &head, cur) == cur) {
      :                                        |          +---+    +---+    +---+           |      break;
      :                                        |  T3 prepends "A":                          |    }
      :                                        |          +---+    +---+    +---+    +---+  |  }
      :                                        |  head -> | A | -> | B | -> | X | -> | Y |  |
      :                                        |          +---+    +---+    +---+    +---+  |
      :                                        |  T1 takes "A", loses "B":                  |
      :                                        |  +---+                                     |
      :                                        |  | B | ----+                               |
      :                                        |  +---+     |                               |
      :                                        |            V                               |
      :                                        |          +---+    +---+                    |
      if (cmpxchg(next, &head, cur) == cur) {  |  head -> | X | -> | Y |                    |
      }                                        |          +---+    +---+                    |
    }                                          |         +---+                              |
    return cur;                                |  cur -> | A |                              |
                                               |         +---+                              |

So the simple algortihms are not sufficient when we allow simultaneous take and prepend operations.

Marking to Solve the A-B-A Race

One solution to the A-B-A race is to mark the next field in a node to indicate that the node is busy. Only one thread can successfully mark the next field in a node at a time and other threads must loop around and retry their marking operation until they succeed. Each thread that marks the next field in a node must unmark the next field when it is done with the node so that other threads can proceed.

Here's the take algorithm modified with marking (still ignores the empty list for clarity):

    // "take" a node with marking:
    while (true) {
      cur = head;
      if (!mark_next(cur, &next)) {
        // could not mark cur so try again
        continue;
      }
      if (head != cur) {
        // head changed while marking cur so try again
        unmark_next(cur);
        continue;
      }
      // list head is now marked so switch it to next which also makes list head unmarked
      Atomic::release_store(head, next);
      unmark_next(cur);  // unmark cur and return it
      return cur;
    }

The modified take algorithm does not change the list head pointer until it has successfully marked the list head node. Notice that after we mark the list head node we have to verify that the list head pointer hasn't changed in the mean time. Only after we have verified that the node we marked is still the list head is it safe to modify the list head pointer. The marking of the list head prevents the take algorithm from executing in parallel with a prepend algorithm and losing a node.

Here's the prepend algorithm modified with marking (ignores the empty list for clarity):

    // "prepend" a node with marking:
    while (true) {
      cur = head;
      if (!mark_next(cur, &next)) {
        // could not mark cur so try again
        continue;
      }
      if (head != cur) {
        // head changed while marking cur so try again
        unmark_next(cur);
        continue;
      }
      // list head is now marked so switch it to 'new' which also makes list head unmarked
      Atomic::release_store(head, new);
      unmark_next(cur);  // unmark the previous list head
    }

The modified prepend algorithm does not change the list head pointer until it has successfully marked the list head node. Notice that after we mark the list head node we have to verify that the list head pointer hasn't changed in the mean time. Only after we have verified that the node we marked is still the list head is it safe to modify the list head pointer. The marking of the list head prevents the prepend algorithm from executing in parallel with the take algorithm and losing a node.

Prepending To A List That Also Allows Deletes

...

Child pages

Versions Compared

Old Version 52

New Version 53

Key

The Simple Case

The Not So Simple Case or Taking and Prepending on the Same List Leads to A-B-A Races

Marking to Solve the A-B-A Race

Prepending To A List That Also Allows Deletes