Instead of always allocating storage for an object's identity hash-code ('i-hash') up front, we propose to allocate that storage only when needed, that is, when an application calls System.identityHashCode(). This allows us to reduce the size of object headers to 32 bits (4 bytes).
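For readers less familiar with the API involved, here is a minimal Java sketch of where identity hashes are actually requested. The class names `IdentityHashDemo` and `Key` are made up for illustration. It only exercises standard library behavior (`System.identityHashCode`, `Object.hashCode`, `IdentityHashMap`) and says nothing about how the VM stores the hash internally.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityHashDemo {
    /** Hypothetical key type whose hashCode() comes from a field, not from object identity. */
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    /** Returns true if repeated identity-hash requests for the same object agree. */
    static boolean identityHashIsStable(Object o) {
        int first = System.identityHashCode(o);
        for (int i = 0; i < 1000; i++) {
            if (System.identityHashCode(o) != first) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Object plain = new Object();
        // Object.hashCode() is not overridden here, so it returns the identity hash.
        System.out.println(plain.hashCode() == System.identityHashCode(plain));

        Key a = new Key(7);
        Key b = new Key(7);
        // Key.hashCode() is computed from a field; no identity hash is requested.
        System.out.println(a.hashCode() == b.hashCode());

        // IdentityHashMap keys on reference identity, so equal-but-distinct keys stay separate.
        Map<Key, String> m = new IdentityHashMap<>();
        m.put(a, "a");
        m.put(b, "b");
        System.out.println(m.size());
    }
}
```

Objects like `a` and `b` above, whose identity hash is never requested until they are placed in the `IdentityHashMap`, are the kind of object for which up-front i-hash storage would otherwise go unused.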
The idea is to do the following:
We still need 2 bits in the object header to indicate which state the object is in. The meaning of those 2 bits is the following:
Question: Isn't there a problem when objects can grow? Can we end up with a larger heap occupancy after GC than we had before? Why is this OK, and how do we deal with it?
First of all, let's narrow the problem. We don't have to solve the legit OOM that happens because the heap is too small to accommodate all allocations. And the i-hash is also an allocation (even with legacy i-hash - we just currently allocate it up-front). Also, existing code would continue to work with the same configuration, because we are saving (on average) 4 bytes per object, and require (at most) 4 bytes per object for the i-hash. It’s still a net win.
What we do need to solve is malfunctioning/misbehaving GC because of unexpected heap/space/region growth.
Also, consider that this is (IMO) mostly a problem for STW GCs. Concurrent GCs *already* have the problem that they can OOM during GC because of concurrent allocations. It doesn’t really matter whether that allocation comes from concurrently running Java threads or from an expanding i-hash. Concurrent GCs somehow have to deal with it already, and as stated above, we don’t have to solve legit OOMs. But again, GCs must not misbehave.
Let’s look at the various GC algorithms.
Full-GC/mark-compact (Serial, G1, Shenandoah) has an interesting property which means overall heap usage never expands. Objects are always ‘sliding’ towards the bottom (for G1 and Shenandoah it’s more complex, but conceptually the same). An object either does not move at all (e.g. it is already at the bottom), or moves towards the bottom. When an i-hashed object moves towards the bottom, it may have to be expanded, which makes it use one more word. This can lead to the situation that a subsequent object does *not* have to move at all. If that subsequent object is i-hashed, it would *not* have to be expanded. It can never happen that an object is moved towards the top. Therefore we can conclude that mark-compact GCs never produce a heap that is larger than before GC, and therefore the GC operation cannot fail. (Yes, it is possible that it does not free up any space, but that would be a legit OOM and the user should configure more heap to begin with.)
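The sliding argument can be checked with a toy model. The following Java sketch is an illustration only, not HotSpot code: the class `SlidingCompactionSketch`, the `Obj` type, and the assumption that addresses and sizes are counted in whole words are all made up for this example. An i-hashed object that moves grows by one word; an object that stays in place does not.

```java
import java.util.List;

final class SlidingCompactionSketch {
    /** Hypothetical heap object; addr and size are in words. */
    static final class Obj {
        long addr;
        long size;
        final boolean live;
        boolean hashedNotExpanded; // i-hash requested, but no storage appended yet

        Obj(long addr, long size, boolean live, boolean hashedNotExpanded) {
            this.addr = addr;
            this.size = size;
            this.live = live;
            this.hashedNotExpanded = hashedNotExpanded;
        }
    }

    /**
     * Slides live objects towards address 0 and returns the new heap top.
     * The list must be sorted by address and objects must not overlap.
     */
    static long compact(List<Obj> heap) {
        long compactTop = 0;
        for (Obj o : heap) {
            if (!o.live) continue;
            // Invariant from the text: an object never has to move towards the top.
            if (compactTop > o.addr) {
                throw new IllegalStateException("object would move up");
            }
            if (compactTop < o.addr) {
                // The object moves down; if it is i-hashed, it grows by one word.
                o.addr = compactTop;
                if (o.hashedNotExpanded) {
                    o.size += 1;
                    o.hashedNotExpanded = false;
                }
            }
            // If compactTop == o.addr, the object stays put and is not expanded.
            compactTop = o.addr + o.size;
        }
        return compactTop;
    }

    public static void main(String[] args) {
        Obj a = new Obj(0, 2, true, false);
        Obj dead = new Obj(2, 1, false, false);
        Obj c = new Obj(3, 3, true, true);  // moves down by one word and expands
        Obj d = new Obj(6, 2, true, true);  // ends up not moving, so no expansion
        long newTop = compact(List.of(a, dead, c, d));
        System.out.println("new top = " + newTop + ", old top = 8");
    }
}
```

In this model, a moved object slides down by at least one word and grows by at most one word, so its new end never exceeds its old end. That is why the invariant check in `compact` never fires.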
The story is different for 2-space compaction (Serial, Parallel young GCs): It is quite possible that scavenging from-space ends up requiring more memory in to-space, which would be a problem because it would make the GC malfunction. The proposal to address this problem is to promote all objects that need to grow straight into old-gen. This seems like a reasonable thing to do - keys of a hash-table are more likely to be longer-lived anyway. Once they are in old-gen, the mark-compact reasoning holds.
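A rough sketch of that promotion policy, again as a toy model rather than actual collector code: `ScavengePromotionSketch`, `Result`, and the word-based sizes are hypothetical names and simplifications introduced here. Survivors that need to grow are sent to old-gen (with their extra word), so to-space only ever receives objects at their original size.

```java
final class ScavengePromotionSketch {
    /** Hypothetical result of one scavenge: words copied to to-space and to old-gen. */
    static final class Result {
        final long toSpaceWords;
        final long promotedWords;

        Result(long toSpaceWords, long promotedWords) {
            this.toSpaceWords = toSpaceWords;
            this.promotedWords = promotedWords;
        }
    }

    /**
     * sizes[i] is the size in words of the i-th live object in from-space;
     * needsGrow[i] is true if copying that object requires appending an i-hash word.
     */
    static Result scavenge(long[] sizes, boolean[] needsGrow) {
        long toSpace = 0;
        long promoted = 0;
        for (int i = 0; i < sizes.length; i++) {
            if (needsGrow[i]) {
                // Growing objects bypass to-space and go straight to old-gen.
                promoted += sizes[i] + 1;
            } else {
                // Non-growing survivors keep their size, so to-space cannot overflow
                // relative to the live data in from-space.
                toSpace += sizes[i];
            }
        }
        return new Result(toSpace, promoted);
    }

    public static void main(String[] args) {
        Result r = scavenge(new long[] {2, 3, 4}, new boolean[] {false, true, false});
        System.out.println("to-space words: " + r.toSpaceWords
                + ", promoted words: " + r.promotedWords);
    }
}
```

The model deliberately ignores tenuring thresholds and the case where old-gen itself is full; it only shows why to-space usage stays bounded by the live size of from-space under this policy.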
Region-based collectors (G1, Shenandoah, ZGC) don’t really have the problem: if they run out of memory during GC because of expanding objects, it must be a legit OOM. They would not misbehave otherwise. G1 and Shenandoah even do a last-ditch mark-compact, for which the above reasoning holds.