Multi-mapped memory can be used to add metadata bits to a pointer without changing which object it points to. ZGC uses it to support its colored pointers transparently. First, let's look at how metadata bits can be added to pointers without multi-mapped memory or extra hardware support.
Metadata bits without multi-mapped memory
In an ordinary pointer, all the bits are used to describe the address of an object, so two pointers with different bit patterns point to different objects. Take the following two pointers as an example: 0x13210 and 0x23210. They point to two different objects.
                  Hex     : Binary
    Address bits: 0x13210 : 0001 0011 0010 0001 0000
    Address bits: 0x23210 : 0010 0011 0010 0001 0000
When adding metadata bits to a pointer, we don't want those bits to change which object the pointer points to. So, the bits of the pointer are split into two parts: one for the address of the pointed-to object, and one for the metadata. We can choose where to make the split, but for the following examples, let's use the 16 least significant bits for the object address and the remaining bits for the metadata. This gives us the following pointer layout:
0xM..MAAAA : mmmm ... mmmm aaaa aaaa aaaa aaaa (M/m = metadata, A/a = address)
Using the values from the two pointers in the example above, we get the following after the split:
    Pointer value: 0x13210 : 0001 0011 0010 0001 0000
    Metadata bits: 0x1     : 0001
    Address bits:  0x3210  :      0011 0010 0001 0000

    Pointer value: 0x23210 : 0010 0011 0010 0001 0000
    Metadata bits: 0x2     : 0010
    Address bits:  0x3210  :      0011 0010 0001 0000
Here, the two pointers 0x13210 and 0x23210 refer to the same object, located at address 0x3210. The pointers contain the metadata 0x1 and 0x2, respectively.
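The split can be expressed with a shift and a mask. The following C sketch assumes the 16-bit address layout used in this example and holds pointer values in a `uint64_t`; the helper names `make_pointer`, `metadata_bits` and `address_bits` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Layout from the example: the 16 least significant bits hold the address,
// all higher bits hold the metadata.
#define ADDRESS_BITS 16
#define ADDRESS_MASK ((UINT64_C(1) << ADDRESS_BITS) - 1)

// Combines a metadata value and an address into one pointer value.
static uint64_t make_pointer(uint64_t metadata, uint64_t address) {
  assert((address & ~ADDRESS_MASK) == 0); // the address must fit in 16 bits
  return (metadata << ADDRESS_BITS) | address;
}

// Returns the metadata stored above the address bits.
static uint64_t metadata_bits(uint64_t pointer) {
  return pointer >> ADDRESS_BITS;
}

// Returns the address stored in the 16 least significant bits.
static uint64_t address_bits(uint64_t pointer) {
  return pointer & ADDRESS_MASK;
}
```

For example, `make_pointer(0x1, 0x3210)` and `make_pointer(0x2, 0x3210)` produce 0x13210 and 0x23210, and `address_bits` returns 0x3210 for both.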
We can't directly access the contents of the object at 0x3210 through the pointers 0x13210 or 0x23210. First we need to remove the metadata bits to get the correct address, and only then can we access the pointed-to memory (dereference the pointer).
    // Pointer with 16 address bits
    ptr_with_metadata = 0x13210;

    // Remove the metadata bits to get the address 0x3210
    AddressBitsMask = ((1 << 16) - 1);
    address = ptr_with_metadata & AddressBitsMask;

    // Dereference and use the object at the given address
    use(*address);
Removing the metadata bits adds CPU instructions to the generated code for every such dereference, which slows down the application.
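To make that cost concrete, here is a runnable C sketch of the masking approach. Addresses like 0x3210 are not valid memory in a real user process, so the sketch assumes a hypothetical 64 KB byte array, `simulated_memory`, standing in for memory, where an address is simply an index into the array. Every access through `load_u32` or `store_u32` (also hypothetical names) must apply the mask first; that AND is the kind of extra work described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ADDRESS_BITS 16
#define ADDRESS_MASK ((UINT64_C(1) << ADDRESS_BITS) - 1)

// Hypothetical stand-in for memory: 64 KB, indexed by the 16 address bits.
static unsigned char simulated_memory[UINT64_C(1) << ADDRESS_BITS];

// Strips the metadata bits; this must happen before every dereference.
static uint64_t strip_metadata(uint64_t ptr_with_metadata) {
  return ptr_with_metadata & ADDRESS_MASK;
}

// Stores a 32-bit value at the address encoded in a pointer with metadata.
static void store_u32(uint64_t ptr_with_metadata, uint32_t value) {
  uint64_t address = strip_metadata(ptr_with_metadata);
  assert(address + sizeof value <= sizeof simulated_memory);
  memcpy(&simulated_memory[address], &value, sizeof value);
}

// Loads a 32-bit value from the address encoded in a pointer with metadata.
static uint32_t load_u32(uint64_t ptr_with_metadata) {
  uint64_t address = strip_metadata(ptr_with_metadata);
  uint32_t value;
  assert(address + sizeof value <= sizeof simulated_memory);
  memcpy(&value, &simulated_memory[address], sizeof value);
  return value;
}
```

Storing 42 through 0x13210 and then loading through 0x23210 returns 42, because both pointers strip down to the same address, 0x3210.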
Now, let's see how multi-mapped memory can remove the need to strip the metadata bits when dereferencing pointers.
Metadata bits using multi-mapped memory
Let's take our two example pointers and separate the address bits from the metadata bits, this time keeping the metadata bits in their original bit positions:
    Pointer value: 0x13210 : 0001 0011 0010 0001 0000
    Address bits:  0x3210  :      0011 0010 0001 0000
    Metadata bits: 0x10000 : 0001 0000 0000 0000 0000

    Pointer value: 0x23210 : 0010 0011 0010 0001 0000
    Address bits:  0x3210  :      0011 0010 0001 0000
    Metadata bits: 0x20000 : 0010 0000 0000 0000 0000
Then map the same 64 KB of allocated memory (the range covered by the 16 address bits) at each address given by the metadata bits:
    +-----------+ 0x10000 -----+
    | Mapping 1 |               \
    |           |                +---> +------------------------+
    |           |               /      |                        |
    +-----------+ 0x20000 -----+       | 64 KB allocated memory |
    | Mapping 2 |                      |                        |
    |           |                      +------------------------+
    |           |
    +-----------+ 0x30000
Then the address bits would act as an offset into the allocated memory. Both 0x13210 and 0x23210 would refer to the object located 0x3210 bytes into the allocated memory area.
    +-----------+ 0x10000
    |           |
    |     X     | 0x13210 -----+       +----------------------+
    |           |               \      |                      |
    +-----------+ 0x20000        +---> |  X @ offset 0x3210   |
    |           |               /      |                      |
    |     X     | 0x23210 -----+       +----------------------+
    |           |
    +-----------+ 0x30000
We now have the sought-after property: whatever metadata bits a pointer contains, it leads to the same object.
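On Linux, one way to set this up is to create a shared-memory file with `memfd_create` (Linux 3.17+, glibc 2.27+) and `mmap` it several times with `MAP_SHARED`. The C sketch below is an illustration under those assumptions, not ZGC's implementation. So that it does not depend on specific low addresses being free, it lets the kernel pick a reservation of three 64 KB slots and then maps the same 64 KB at offsets 0x10000 and 0x20000 of that reservation. The pointers become `base + 0x13210` and `base + 0x23210`, and both can be dereferenced directly without masking. The names `map_heap_twice` and `colored` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define ADDRESS_BITS 16
#define HEAP_SIZE ((size_t)1 << ADDRESS_BITS) // 64 KB of backing memory
#define RESERVED_SIZE (3 * HEAP_SIZE)         // room for metadata 0, 1 and 2

// Maps the same 64 KB of memory at base + 0x10000 (metadata 1) and
// base + 0x20000 (metadata 2). Returns base, or NULL on failure.
static char *map_heap_twice(void) {
  // The memfd acts as the "handle" to the memory backing both mappings.
  int fd = memfd_create("multi-mapped-heap", 0);
  if (fd < 0) return NULL;
  if (ftruncate(fd, (off_t)HEAP_SIZE) != 0) { close(fd); return NULL; }

  // Reserve address space only; the kernel chooses where.
  char *base = mmap(NULL, RESERVED_SIZE, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) { close(fd); return NULL; }

  // Replace parts of the reservation with views of the same memory.
  for (uintptr_t metadata = 1; metadata <= 2; metadata++) {
    void *view = mmap(base + (metadata << ADDRESS_BITS), HEAP_SIZE,
                      PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    if (view == MAP_FAILED) {
      munmap(base, RESERVED_SIZE);
      close(fd);
      return NULL;
    }
  }
  close(fd); // the mappings keep the memory alive
  return base;
}

// Builds a directly dereferenceable pointer from metadata and address bits.
static void *colored(char *base, uintptr_t metadata, uintptr_t address) {
  assert(metadata >= 1 && metadata <= 2 && address < HEAP_SIZE);
  return base + ((metadata << ADDRESS_BITS) | address);
}
```

Writing through `colored(base, 1, 0x3210)` and reading through `colored(base, 2, 0x3210)` observes the same value: the two pointers differ, but both views are backed by the same memfd pages.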
Considerations
Restricts maximum addressable memory
The examples above use two possible metadata values, but more can be used. We need one mapping per metadata value that the code uses. However, there is a limit to how many metadata bits we can use with multi-mapped memory as described. The limit exists because the virtual address range usable by user processes is smaller than 64 bits. On Linux x64 with four-level page tables, a user process can "only" access 128 TB of memory: the 47 least significant bits are available, and the remaining bits must be 0. This means that an application using multi-mapped memory for metadata bits must fit both the metadata bits and the object offset bits within those 47 bits. Every additional metadata bit takes one bit away from the offset, which in effect halves the maximum amount of memory that can be addressed.
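The trade-off can be computed directly. This small C sketch assumes the 47 usable address bits described above for Linux x64 with four-level page tables; `max_addressable_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

// Assumption from the text: Linux x64 with four-level page tables gives
// user processes 47 usable address bits (128 TB).
#define USABLE_ADDRESS_BITS 47

// Returns how many bytes the offset bits can address when
// `metadata_bit_count` of the usable bits are reserved for metadata.
// The 2^metadata_bit_count mappings, one per metadata value, together
// fill the 47-bit range.
static uint64_t max_addressable_bytes(unsigned metadata_bit_count) {
  assert(metadata_bit_count < USABLE_ADDRESS_BITS);
  return UINT64_C(1) << (USABLE_ADDRESS_BITS - metadata_bit_count);
}
```

With 4 metadata bits, for instance, 43 offset bits remain, so at most 8 TB can be addressed.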
Tools double-count the memory
Tools like ps and top report the Resident Set Size (RSS) of a process. From man ps:
rss RSS resident set size, the non-swapped physical memory that a task has used (in kilobytes) ...
Even though the description says physical memory, the reported value counts multi-mapped memory once for every virtual memory area that maps it, so the same physical memory is counted multiple times.
Requires support from OS and/or hardware
Normally, when you allocate a large chunk of memory, you request anonymous memory through APIs like mmap or VirtualAlloc. You either specify the virtual address range you'd like, or you ask for a size and let the OS choose where the memory is placed. You get no "handle" to the physical memory that is going to "back" the returned virtual memory range.
When memory is multi-mapped, we want more control over both the physical memory and the virtual memory. We first allocate physical memory, and then map multiple virtual address ranges to that physical memory. This requires different APIs than those used for anonymous memory. Some of these APIs have not always supported all features, for example large pages. Other APIs are fairly new and are not available on older OS versions.
Some CPUs provide hardware support for masking out metadata bits; see the AArch64 Tagged Address ABI and Intel Linear Address Masking (LAM). One potential drawback of these features is that they are less flexible in the number of metadata bits that can be used.
TLB Pressure
Accessing the same memory through multiple virtual addresses can put more pressure on the TLB, since each virtual page needs its own TLB entry. Note that ZGC restricts which metadata values a pointer may have before it is dereferenced, so its use of multi-mapped memory doesn't run into this problem.