The current implementation of the Parallel Full GC is too rigid for Lilliput 2's Compact Identity Hash-Code. Specifically, it does not allow objects to be resized/expanded when they move, which Compact Identity Hash-Code requires. The reason is the compressor-style algorithm that is used in the Parallel Full GC. This algorithm precomputes forwarding addresses of objects in bulk (e.g. in 64-word blocks) and then only needs a few additional computations to get the true forwarding address of each object. That approach assumes that objects keep their size when they move, and therefore cannot accommodate the expansion of moving objects that Compact Identity Hash-Code needs. Here I want to outline a parallel mark-compact algorithm that does not have this restriction.
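To illustrate why this is so rigid, here is a minimal, self-contained sketch of a compressor-style forwarding computation on a word-indexed toy heap. This is not the actual HotSpot code; all names are made up, and only the 64-word block size is taken from the description above. The point is that a forwarding address is a pure function of precomputed per-block data, which only works if every object keeps its size when it moves.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Word-indexed toy heap: live[w] is true for every word occupied by a live object.
struct CompressorSketch {
  static const size_t block_words = 64;   // forwarding data is summarized per 64-word block
  std::vector<bool>   live;               // filled in by the marking phase
  std::vector<size_t> block_dest;         // destination of each block's first live word

  // Bulk precomputation: one prefix sum of live words over the whole heap.
  void precompute_block_destinations() {
    size_t dest = 0;
    block_dest.assign((live.size() + block_words - 1) / block_words, 0);
    for (size_t b = 0; b < block_dest.size(); b++) {
      block_dest[b] = dest;
      size_t end = std::min(live.size(), (b + 1) * block_words);
      for (size_t w = b * block_words; w < end; w++) {
        dest += live[w] ? 1 : 0;           // every live word keeps its size: no expansion possible
      }
    }
  }

  // Forwarding address of a live object starting at word 'obj': its block's
  // destination plus the live words preceding it within that block. Because
  // this is derived entirely from precomputed data, it cannot express objects
  // that grow when they move.
  size_t forwardee(size_t obj) const {
    size_t dest = block_dest[obj / block_words];
    for (size_t w = (obj / block_words) * block_words; w < obj; w++) {
      dest += live[w] ? 1 : 0;
    }
    return dest;
  }
};
```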
The basic idea is to use an algorithm that is similar to the one used by G1 and Shenandoah. We assume that the first phase of the full GC identifies all reachable objects by marking them in a marking bitmap, as is already implemented by the current implementation. Unlike G1 and Shenandoah, the Parallel GC does not divide the heap into regions. This is a problem, because the division into regions is the key feature that facilitates parallelism in G1 and Shenandoah. However, the current full GC implementation already provides machinery to partition the heap spaces into regions and to deal with objects that overlap region boundaries.
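For illustration, here is a simplified stand-in for the basic primitive this relies on: visiting every live object whose first word lies in a given heap range (e.g. one region). This is not HotSpot's actual MarkBitMap API; the names and the size side table are made up (real code would read the size from the object itself).

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct MarkBitmapSketch {
  // One bit per heap word; set for the first word of every live object.
  std::vector<bool> marked;
  // Stand-in side table: object size (in words) at each object start.
  std::vector<size_t> size_in_words;

  // Visit every live object whose first word lies in [begin, end).
  void iterate_live(size_t begin, size_t end,
                    const std::function<void(size_t obj, size_t size)>& visit) const {
    for (size_t w = begin; w < end; ) {
      if (marked[w]) {
        visit(w, size_in_words[w]);
        w += size_in_words[w];   // skip the rest of the live object
      } else {
        w++;                     // dead or unused word
      }
    }
  }
};
```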
The forwarding phase is the second phase of the full GC and starts after marking has completed. It scans the heap and computes the forwarding addresses of all reachable objects.
The work of calculating the forwarding addresses is carried out by multiple GC workers, so we need a way to divide the work between the GC threads. This is achieved as follows:
The heap is divided into N equal-sized regions, and we start out with a list of these regions; the list covers the whole heap.
...
lies in the fact that the algorithm parallelizes by dividing up the heap into equal-sized regions (with objects overlapping region boundaries) and precomputes destination boundaries for each region by inspecting the size of each live object in that region. However, we cannot determine the size of an object until we know whether it will move at all. This is further complicated by the fact that we cannot even assume that the non-moving objects form a dense prefix: the expansion of moved objects can lead to a situation where a later object would end up not moving (for example, when the preceding objects collectively expand by exactly the amount of garbage in front of that object, its destination coincides with its current address).
The basic idea is to not make assumptions about object sizes and instead determine the destination addresses more dynamically. For this we can adapt the algorithm that is used by G1 and Shenandoah. The difficulty is that in G1 and Shenandoah, regions are more rigid in that they do not allow objects to cross region boundaries. That property makes parallelization much easier, because worker threads can fully own a region without potential interference from other worker threads.
Therefore we need to make regions of more flexible sizes. In the (single-threaded) summary phase that follows marking and precedes compaction, we set up our list of regions by starting out with equal-sized regions and then adjusting each region's bottom upwards to the first word that is not part of an object overlapping from the preceding region, and its end upwards to the first word that is not part of an overlapping object (which will also be the bottom of the subsequent region).
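Below is a minimal sketch of this boundary adjustment on a word-indexed heap. The helper first_non_overlapping_word is an assumption for illustration; in the real code it would be derived from the mark bitmap and the existing machinery for objects overlapping region boundaries.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct RegionSketch {
  size_t bottom;   // first word owned by this region (adjusted upwards)
  size_t end;      // one past the last word owned by this region (= bottom of the next region)
};

// Build the region list: start from equal-sized regions of 'region_words' words
// and move every boundary up past any object that overlaps it, so that no live
// object crosses a region boundary anymore.
// 'first_non_overlapping_word(addr)' (assumed helper): if a live object overlaps
// 'addr' from below, return the first word past that object; otherwise 'addr'.
std::vector<RegionSketch> build_regions(size_t heap_bottom, size_t heap_end,
                                        size_t region_words,
                                        size_t (*first_non_overlapping_word)(size_t)) {
  std::vector<RegionSketch> regions;
  size_t bottom  = heap_bottom;                 // the first region starts at the heap bottom
  size_t nominal = heap_bottom + region_words;  // un-adjusted boundary of the current region
  while (bottom < heap_end) {
    size_t end = heap_end;
    if (nominal < heap_end) {
      end = std::min(heap_end, first_non_overlapping_word(nominal));
    }
    if (end > bottom) {            // a very large object can swallow whole nominal regions
      regions.push_back({bottom, end});
      bottom = end;
    }
    nominal += region_words;
  }
  return regions;
}
```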
With those more flexible regions set up, we can adapt G1/Shenandoah's algorithm for the forwarding and compaction phases almost 1:1. Forwarding works like this:
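In the G1/Shenandoah-style scheme, each worker claims a disjoint, ascending set of regions and slides the live objects of those regions towards the bottom of that set, maintaining a single compaction point that advances through the claimed regions. Below is a rough sketch of such a per-worker forwarding pass on a simplified word-indexed heap. All names are hypothetical, and the one-word growth of "hashed" objects on move is an assumption standing in for the Compact Identity Hash-Code expansion.

```cpp
#include <cstddef>
#include <vector>

struct RegionSketch { size_t bottom, end; };   // same shape as in the summary-phase sketch

struct ForwardingSketch {
  // Simplified heap metadata: obj_size[w] > 0 iff a live object starts at word w
  // (stand-in for the mark bitmap plus object headers).
  std::vector<size_t> obj_size;
  // Assumption for this sketch: a "hashed" object grows by one word when it moves.
  std::vector<bool>   hashed;
  // Output of the forwarding phase: the new address of every live object
  // (must be sized like obj_size).
  std::vector<size_t> forwardee;

  size_t forwarded_size(size_t obj, bool moves) const {
    return obj_size[obj] + ((moves && hashed[obj]) ? 1 : 0);
  }

  // Forward all live objects of the (ascending, disjoint, non-empty) region set
  // claimed by one worker. Objects slide towards the bottom of the claimed set;
  // an object whose destination equals its current address does not move and
  // therefore does not expand.
  void forward_regions(const std::vector<RegionSketch>& claimed) {
    size_t dest_idx = 0;                    // index of the current destination region
    size_t dest     = claimed[0].bottom;    // current compaction point
    for (const RegionSketch& r : claimed) {
      for (size_t obj = r.bottom; obj < r.end; ) {
        if (obj_size[obj] == 0) { obj++; continue; }   // dead or unused word
        bool   moves = (dest != obj);
        size_t size  = forwarded_size(obj, moves);
        // If the object does not fit into the current destination region,
        // advance to the next one. The compaction point can never pass the
        // object being forwarded, so this stays within the claimed set.
        while (dest + size > claimed[dest_idx].end) {
          dest_idx++;
          dest  = claimed[dest_idx].bottom;
          moves = (dest != obj);
          size  = forwarded_size(obj, moves);
        }
        forwardee[obj] = dest;              // record the new address
        dest += size;
        obj  += obj_size[obj];              // step to the word after this object
      }
    }
  }
};
```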
Similarly, we can also adapt G1/Shenandoah's compaction phase almost 1:1.
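Continuing the sketch above (and reusing its ForwardingSketch and RegionSketch types), a per-worker compaction pass could look as follows: each worker walks the same regions it forwarded and copies every live object to its recorded destination; any expansion of moved, hashed objects would be materialized at this point. Since destinations never lie above their sources, a simple ascending word-by-word copy is safe within one worker's region set.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compaction pass for one worker, building on the ForwardingSketch defined in
// the previous sketch: copy every live object of the claimed regions to its forwardee.
void compact_regions(const ForwardingSketch& f, const std::vector<RegionSketch>& claimed,
                     std::vector<uintptr_t>& heap_words) {
  for (const RegionSketch& r : claimed) {
    for (size_t obj = r.bottom; obj < r.end; ) {
      size_t size = f.obj_size[obj];
      if (size == 0) { obj++; continue; }              // dead or unused word
      size_t dest = f.forwardee[obj];
      if (dest != obj) {
        // dest <= obj, and objects are visited in ascending order, so copying
        // from low to high never overwrites a yet-to-be-copied source word.
        for (size_t w = 0; w < size; w++) {
          heap_words[dest + w] = heap_words[obj + w];
        }
        // Hypothetical step: the moved object would also get its extra identity
        // hash word appended here, if it is in the "hashed" state (assumption, see above).
      }
      obj += size;
    }
  }
}
```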