
...

Please note that in Carsten's original prototype, there was another race in ObjectSynchronizer::FastHashCode() when the object's monitor had to be inflated. The setting of the hashcode in the ObjectMonitor's header/dmw could race with T-deflate. That race is resolved in this version by the use of an ObjectMonitorHandle in the call to ObjectSynchronizer::inflate(). The ObjectMonitor* returned by ObjectMonitorHandle.om_ptr() has a non-zero ref_count so no additional races with T-deflate are possible.
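The idea behind that fix can be sketched with portable C++ atomics. This is a minimal illustration of the ref_count mechanism only, not the actual HotSpot code: the names Monitor, Handle, and try_deflate are invented here, and the real ObjectMonitorHandle and deflation protocol do considerably more work.

```cpp
#include <atomic>

// Hypothetical sketch: a monitor that a deflater thread may only reclaim
// while no thread holds a reference to it.
struct Monitor {
  std::atomic<int> ref_count{0};
};

// RAII handle: while it lives, the monitor's ref_count is non-zero, so
// the deflater must leave the monitor alone and the raw pointer returned
// by om_ptr() stays safe to use.
class Handle {
  Monitor* _m;
 public:
  explicit Handle(Monitor* m) : _m(m) {
    _m->ref_count.fetch_add(1);          // pin the monitor against deflation
  }
  ~Handle() {
    _m->ref_count.fetch_sub(1);          // unpin when the handle goes away
  }
  Monitor* om_ptr() const { return _m; } // safe for the handle's lifetime
};

// A deflater skips any monitor with a non-zero ref_count. (The real code
// also has to atomically restore the object's header, among other things.)
bool try_deflate(Monitor* m) {
  return m->ref_count.load() == 0;
}
```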

Lock-Free Monitor List Management

Use of specialized measurement code with the CR5/v2.05/8-for-jdk13 bits revealed that gListLock contention is responsible for much of the performance degradation observed with SPECjbb2015. Consequently the primary focus of the next round of changes was on switching to lock-free monitor list management. Of course, since the Java Monitor subsystem is full of special cases, the lock-free list management code has to have a number of special cases, which will be described here.

The Simple Case

There is one simple case of lock-free list management with the Java Monitor subsystem so we'll start with that code as a way to introduce the lock-free concepts:

    L1:   while (true) {
    L2:     PaddedObjectMonitor* cur = OrderAccess::load_acquire(&g_block_list);
    L3:     OrderAccess::release_store(&new_blk[0]._next_om, cur);
    L4:     if (Atomic::cmpxchg(new_blk, &g_block_list, cur) == cur) {
    L5:       Atomic::add(_BLOCKSIZE - 1, &g_om_population);
    L6:       break;
    L7:     }
    L8:   }

What the above block of code does is:

    • prepends a 'new_blk' to the front of 'g_block_list'
    • increments the 'g_om_population' counter to include the number of new elements

The above block of code can be called by multiple threads in parallel and does not lose track of any blocks. Of course, the "does not lose track of any blocks" part is where all the details come in:

    • L2 load-acquires the current 'g_block_list' value into 'cur'; the use of load-acquire is necessary to get the latest value cmpxchg'ed by another thread.
    • L3 release-stores 'cur' into the 0th element's next field for 'new_blk'; the use of release-store is necessary to make sure that the proper next field value is visible as soon as 'g_block_list' is updated to refer to 'new_blk'.
    • L4 is the critical decision point for this lock-free update. cmpxchg will change 'g_block_list' to 'new_blk' iff 'g_block_list' == 'cur' (publish it).
      • if the cmpxchg return value is 'cur', then we succeeded with the lock-free update and we atomically update 'g_om_population' to match.
      • Otherwise we loop around and do everything again from L2.

At the point that cmpxchg has published the new 'g_block_list' value, 'new_blk' is now the first block in the list and the 0th element's next field is used to find the previous first block; all of the monitor list blocks are chained together via the next field in each block's 0th element. It is the use of cmpxchg to update 'g_block_list' and the checking of the return value from cmpxchg that ensures that we don't lose track of any blocks.

This example is considered to be the "simple case" because we only prepend to the list (no deletes) and we only use:

    • one load-acquire
    • one release-store and
    • one cmpxchg

to achieve the safe update of the 'g_block_list' value; the atomic increment of the 'g_om_population' counter is considered to be just accounting (pun intended).

The concepts introduced here are:

    • update the new thing to refer to head of existing list
    • try to update the head of the existing list to refer to the new thing
    • retry as needed

Note: The above code snippet comes from ObjectSynchronizer::prepend_block_to_lists(); see that function for more complete context (and comments).
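For readers more familiar with standard C++ atomics, the same prepend-retry pattern can be sketched in portable C++11. This is a standalone illustration, not HotSpot code: OrderAccess::load_acquire corresponds to an acquire load, and the release-store plus cmpxchg pair corresponds to a release compare-exchange; the names Block, g_head, g_population, and prepend_block are invented for this sketch, and the counter is bumped by one per block rather than by _BLOCKSIZE - 1 elements.

```cpp
#include <atomic>

// Hypothetical list node standing in for PaddedObjectMonitor blocks.
struct Block {
  Block* next = nullptr;
};

std::atomic<Block*> g_head{nullptr};   // stands in for g_block_list
std::atomic<int>    g_population{0};   // stands in for g_om_population

void prepend_block(Block* new_blk) {
  while (true) {
    // Step 1: load-acquire the current head of the list.
    Block* cur = g_head.load(std::memory_order_acquire);
    // Step 2: point the new block at the existing list; the release
    // ordering on the CAS below makes this store visible to readers
    // as soon as the new head is published.
    new_blk->next = cur;
    // Step 3: publish 'new_blk' iff the head is still 'cur'.
    if (g_head.compare_exchange_weak(cur, new_blk,
                                     std::memory_order_release,
                                     std::memory_order_relaxed)) {
      g_population.fetch_add(1, std::memory_order_relaxed);
      break;
    }
    // CAS failed: another thread won the race; retry from step 1.
  }
}
```

As in the HotSpot snippet, the loop never loses a block: a competing thread can only win the compare-exchange, in which case this thread observes the new head on the next iteration and links in front of it.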

Housekeeping Parts of the Algorithm

...