The Z Garbage Collector, also known as ZGC, is a scalable low-latency garbage collector designed to meet the following goals:

  • Pause times do not exceed 10ms
  • Pause times do not increase with the heap or live-set size
  • Handle heaps ranging from a few hundred megabytes to multi-terabytes in size

At a glance, ZGC is:

  • Concurrent
  • Region-based
  • Compacting
  • NUMA-aware
  • Using colored pointers
  • Using load barriers

At its core, ZGC is a concurrent garbage collector, meaning all the heavy lifting is done while Java threads continue to execute. This greatly limits the impact garbage collection will have on your application's response time.

This OpenJDK project is sponsored by the HotSpot Group.

Download: Linux/x64

Source Code: http://hg.openjdk.java.net/jdk/jdk

Talks:

  • Jfokus 2018 - Slides | Video
  • FOSDEM 2018 - Slides

Mailing List: Subscribe | Archive

Project: JEP | Members

Note: ZGC is under active development, which means that information and advice given here might change in the future.


Supported Platforms

ZGC is currently only available on Linux/x64. Support for other platforms might be added in the future, if there is enough demand.

Download Early Access Build

Early Access builds are available for Linux/x64. These builds are updated on a regular basis.

Download and Build from Source

The source code can be found in the JDK repository. Clone it, run configure (making sure to supply the configure option --with-jvm-features=zgc), and run make.

Code Block
$ hg clone http://hg.openjdk.java.net/jdk/jdk
$ cd jdk
$ sh configure --with-jvm-features=zgc
$ make images

This will build a complete JDK for you, with support for ZGC enabled. On Linux, the root directory of the new JDK will be found here:

Code Block
./build/linux-x86_64-normal-server-release/images/jdk

And the Java launcher will be found in its usual place, here:

Code Block
./build/linux-x86_64-normal-server-release/images/jdk/bin/java
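The build directory name (linux-x86_64-normal-server-release above) depends on the platform and configure options, so a script that wants the new launcher may prefer to locate it by pattern. A minimal sketch, assuming the build/<conf>/images/jdk/bin/java layout shown above; the function name find_built_java is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first executable java launcher found under
# <repo>/build/*/images/jdk/bin/java (layout assumed from the paths above).
find_built_java() {
  local repo=${1:-.} candidate
  for candidate in "$repo"/build/*/images/jdk/bin/java; do
    # If the glob matches nothing, $candidate is the literal pattern and -x fails
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no built java launcher found under $repo/build" >&2
  return 1
}
```

For example, run `java=$(find_built_java .) && "$java" -version` from the repository root.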

Quick Start

If you're trying out ZGC for the first time, start by using the following GC options:

Code Block
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx<size> -Xlog:gc

For more detailed logging and basic tuning to improve throughput and latency, use the following options:

Code Block
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx<size> -Xms<size> -XX:+UseLargePages -XX:ConcGCThreads=<threads> -Xlog:gc*

See below for more information on these and additional options.
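The two option sets above differ only in whether the tuning flags are added, so a small wrapper can assemble either one. This is a sketch; the function name zgc_opts and its argument order are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ZGC quick-start options.
#   $1 = heap size (e.g. 16G), used for -Xmx (and -Xms when tuning)
#   $2 = optional ConcGCThreads value; when given, the tuning options from
#        the second example (-Xms, large pages, ConcGCThreads, gc*) are added
zgc_opts() {
  local heap=$1 conc=${2:-}
  local opts="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx$heap"
  if [ -n "$conc" ]; then
    opts="$opts -Xms$heap -XX:+UseLargePages -XX:ConcGCThreads=$conc -Xlog:gc*"
  else
    opts="$opts -Xlog:gc"
  fi
  printf '%s\n' "$opts"
}
```

Used as `java $(zgc_opts 16G 4) ...`, the output is deliberately word-split into separate options; note that the unquoted expansion is also subject to globbing, which only matters if a file name in the current directory happens to match -Xlog:gc*.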

JVM Options

Enabling ZGC

Use the -XX:+UnlockExperimentalVMOptions -XX:+UseZGC options to enable ZGC.
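A quick way to check whether a given java binary supports ZGC is to start it with these options plus -version; a JVM built without ZGC should refuse to start and exit with a non-zero status. A sketch, with the function name zgc_supported hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given java launcher (default: java on
# PATH) starts with ZGC enabled; output is discarded, only the exit status counts.
zgc_supported() {
  local java=${1:-java}
  "$java" -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version >/dev/null 2>&1
}
```

For example: `zgc_supported ./build/linux-x86_64-normal-server-release/images/jdk/bin/java && echo "ZGC available"`.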

Setting Heap Size

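The most important tuning option for ZGC is setting the max heap size (-Xmx<size>). Since ZGC is a concurrent collector, the max heap size must be selected such that 1) the heap can accommodate the live-set of your application, and 2) there is enough headroom in the heap to service allocations while the GC is running. How much headroom is needed depends very much on the allocation rate and the live-set size of the application.

In general, ZGC works best when there is enough headroom to keep up with the allocation rate without having to run back-to-back GCs. With too little headroom, the GC will simply not be able to keep up, and Java threads will be stalled to allow the GC to catch up. Stalling of Java threads should be avoided at all costs, primarily by increasing the heap size, or as a secondary option by increasing the number of concurrent GC threads (see below). At the same time, wasting memory is undesirable, so it's all about finding a balance between memory usage and how often the GC needs to run.

For optimal performance, setting the initial heap size (-Xms) equal to the maximum heap size (-Xmx) is generally recommended.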
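As a back-of-the-envelope illustration of the headroom argument (an assumption for illustration, not a formula used by ZGC): while a concurrent cycle runs, the application keeps allocating, so the heap must hold roughly the live-set plus the allocation rate multiplied by the cycle duration. The inputs are values you would have to measure yourself, for example from the GC log, and the default safety factor of 150% is arbitrary:

```shell
#!/usr/bin/env bash
# Hypothetical estimate (not ZGC's own heuristic); all sizes in MB:
#   heap ~= (live_set + alloc_rate_per_s * cycle_seconds) * factor_pct / 100
estimate_heap_mb() {
  local live_mb=$1 alloc_mb_s=$2 cycle_s=$3 factor_pct=${4:-150}
  # Integer arithmetic only; the result is rounded down
  echo $(( (live_mb + alloc_mb_s * cycle_s) * factor_pct / 100 ))
}
```

With a 4096 MB live-set, 500 MB/s of allocation and 2-second cycles, `estimate_heap_mb 4096 500 2` gives 7644 (MB), a starting point for -Xmx to be refined by measurement.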

Setting Parallel/Concurrent GC Threads

ZGC uses both -XX:ParallelGCThreads=<threads> and -XX:ConcGCThreads=<threads> to determine how many worker threads to use during different GC phases. If they are not set, ZGC will try to select an appropriate number. However, the optimal number of threads heavily depends on the characteristics of the workload you're running, which means that you almost always want to specify these explicitly to get optimal throughput and latency. We hope to remove this recommendation at some point in the future, when ZGC's heuristics for this become good enough, but for now it's recommended that you try different settings and pick the best one.

ParallelGCThreads sets the level of parallelism used during pauses and hence directly affects the pause times. Generally speaking, the more threads the better, as long as you don't over-provision the machine (i.e. use more threads than cores/HW-threads) and the application's root set is not so small that it can easily be handled by just a few threads.

The following GC phases are affected by ParallelGCThreads:

  • Pause Mark Start - Number of threads used for marking roots.
  • Pause Mark End - Number of threads used for weak root processing (StringTable, JNI Weak Handles, etc.).
  • Pause Relocate Start - Number of threads used for relocating roots.


Setting Concurrent GC Threads

The second tuning option one might want to look at is setting the number of concurrent GC threads (-XX:ConcGCThreads=<threads>). ZGC has heuristics to automatically select this number, but depending on the characteristics of the application it might need to be adjusted. This option essentially dictates how much CPU time the GC should be given. Give it too much, and the GC will steal too much CPU time from the application. Give it too little, and the application might allocate garbage faster than the GC can collect it.

ConcGCThreads sets the level of parallelism used during concurrent phases. The number of threads to use during these phases is a balance between allowing the GC to make progress and not stealing too much CPU time from the application. Generally speaking, if there are unused CPUs/cores in the system, always allow concurrent threads to use them. If the application is already using all CPUs/cores, then the machine is essentially already over-provisioned, and you have to accept a throughput reduction, either by letting concurrent GC threads steal/compete for CPU time, or by actively reducing the application's CPU footprint.

NOTE! In general, if low latency (i.e. low application response time) is important for your application, then never over-provision your system. Ideally, your system should never have more than 70% CPU utilization.
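To compare a system against the 70% guideline, overall CPU utilization can be computed from two samples of the aggregate "cpu" line in /proc/stat (fields: user, nice, system, idle, iowait, irq, softirq, steal, ...). A sketch, assuming idle plus iowait counts as "not busy" and excluding the guest fields, which the kernel already includes in user time; the function name cpu_util_pct is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: integer CPU utilization (%) between two samples of
# the "cpu ..." line from /proc/stat, given as $1 (earlier) and $2 (later).
cpu_util_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    total = 0
    for (i = 2; i <= 9; i++) total += y[i] - x[i]  # user .. steal
    idle = (y[5] - x[5]) + (y[6] - x[6])           # idle + iowait
    if (total <= 0) { print 0; exit }
    printf "%d\n", (total - idle) * 100 / total
  }'
}
```

For example: `a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat); cpu_util_pct "$a" "$b"`. A single short sample says little; sampling repeatedly under representative load gives a better picture.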

The following GC phases are affected by ConcGCThreads:

  • Concurrent Mark - Number of threads used for concurrent marking.
  • Concurrent Reference Processing - Number of threads used for concurrent reference processing (i.e. handling Soft/Weak/Final/PhantomReference objects).
  • Concurrent Relocate - Number of threads used for concurrent relocation.

Example:

When running SPECjbb2015 on a two-socket Intel Xeon E5-2690 machine with a total of 2 x 8 = 16 cores (with hyper-threading, 2 x 16 = 32 HW-threads), using a 128G heap, the following options typically result in optimal throughput and latency:

Code Block
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xms128G -Xmx128G -XX:+UseLargePages -XX:ParallelGCThreads=20 -XX:ConcGCThreads=4

Enabling Large Pages

Configuring ZGC to use large pages will generally yield better performance (in terms of throughput, latency and startup time) and comes with no real disadvantage, except that it's slightly more complicated to set up. The setup process typically requires root privileges, which is why it's not enabled by default.

Large pages are also known as "huge pages" on Linux/x86 and have a size of 2MB.

Let's assume you want a 16G Java heap. That means you need 16G / 2M = 8192 huge pages.

First assign at least 16G (8192 pages) of memory to the pool of huge pages. The "at least" part is important, since enabling the use of large pages in the JVM means that not only will the GC try to use these for the Java heap, but other parts of the JVM will also try to use them for various internal data structures (code heap, marking bitmaps, etc.). In this example we will therefore reserve 9216 pages (18G), allowing 2G of non-Java heap allocations to use large pages.
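The page count above is (heap + extra) / page size. A small helper can repeat that arithmetic for other heap sizes; this is a sketch, the function name hugepages_needed is hypothetical, and the 2G of extra headroom is simply the example's choice rather than a fixed requirement:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of huge pages for a Java heap plus extra
# headroom for non-heap JVM data (code heap, marking bitmaps, ...).
#   $1 = heap in GB, $2 = extra in GB, $3 = huge page size in MB (2 on Linux/x86)
hugepages_needed() {
  local heap_gb=$1 extra_gb=${2:-0} page_mb=${3:-2}
  local total_mb=$(( (heap_gb + extra_gb) * 1024 ))
  # Round up so that a partial page is still reserved
  echo $(( (total_mb + page_mb - 1) / page_mb ))
}
```

`hugepages_needed 16 0` gives 8192 and `hugepages_needed 16 2` gives 9216, matching the numbers above.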

Configure the system's huge page pool to have the required number of pages (requires root privileges):

Code Block
$ echo 9216 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Note that the above command is not guaranteed to succeed if the kernel cannot find enough free memory to satisfy the request. Also note that it might take some time for the kernel to process the request. Before proceeding, check the number of huge pages assigned to the pool to make sure the request was successful and has completed.

Code Block
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
9216
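The write-then-check steps above can be combined into a single function that requests the pool size and polls until the kernel reports it, or gives up after a number of attempts. A sketch; the function name set_hugepages and the one-second polling interval are assumptions, and the sysfs path is taken as a parameter (defaulting to the one above) so it can be tried against an ordinary file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: request $1 huge pages and wait until the pool has them.
#   $2 = nr_hugepages file (default: the 2MB pool), $3 = max polls (default 10)
set_hugepages() {
  local want=$1
  local file=${2:-/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages}
  local tries=${3:-10} got=0
  echo "$want" > "$file" || return 1   # requires root for the real sysfs file
  while [ "$tries" -gt 0 ]; do
    got=$(cat "$file")
    [ "$got" -eq "$want" ] && return 0
    tries=$((tries - 1))
    sleep 1
  done
  echo "only $got of $want huge pages were allocated" >&2
  return 1
}
```

For example, as root: `set_hugepages 9216 || echo "not enough huge pages, aborting"`.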

NOTE! If you're using a Linux kernel >= 4.14, then the next step (where you mount a hugetlbfs filesystem) can be skipped. However, if you're using an older kernel then ZGC needs to access large pages through a hugetlbfs filesystem.

Mount a hugetlbfs filesystem (requires root privileges) and make it accessible to the user running the JVM (in this example we're assuming this user has 123 as its uid).

Code Block
$ mkdir /hugepages
$ mount -t hugetlbfs -o uid=123 nodev /hugepages
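Whether the mount step is needed can be decided from the kernel release. A sketch that compares the version part of `uname -r` with 4.14; it assumes GNU sort's -V (version sort) option, and the function name needs_hugetlbfs_mount is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the kernel is older than 4.14,
# i.e. ZGC needs a hugetlbfs mount. $1 = kernel release (default: uname -r).
needs_hugetlbfs_mount() {
  local release=${1:-$(uname -r)}
  local ver=${release%%-*}   # strip suffixes such as "-8-amd64"
  # Version sort lists the smaller version first; if that is 4.14, ver >= 4.14
  if [ "$(printf '%s\n' 4.14 "$ver" | sort -V | head -n 1)" = "4.14" ]; then
    return 1
  fi
  return 0
}
```

For example: `needs_hugetlbfs_mount && echo "mount hugetlbfs on /hugepages"`.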

Now start the JVM using the -XX:+UseLargePages option.

Code Block
$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xms16G -Xmx16G -XX:+UseLargePages ...

If more than one accessible hugetlbfs filesystem is available, then (and only then) you also have to use -XX:ZPath to specify the path to the filesystem you want to use. For example, if multiple accessible hugetlbfs filesystems are mounted, but the filesystem you specifically want to use is mounted on /hugepages, then use the following options.

Code Block
$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xms16G -Xmx16G -XX:+UseLargePages -XX:ZPath=/hugepages ...

NOTE! The configuration of the huge page pool and the mounting of the hugetlbfs filesystem are not persistent across reboots, unless adequate measures are taken.
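One common way to make both settings persistent (an assumption about your distribution, not a ZGC requirement) is a sysctl drop-in for the vm.nr_hugepages kernel parameter plus an /etc/fstab entry for the hugetlbfs mount; the hugepages=N kernel boot parameter is another option. A sketch, where the function name and the file name 90-zgc-hugepages.conf are hypothetical and a root directory can be passed to try it outside the real /etc:

```shell
#!/usr/bin/env bash
# Hypothetical helper: persist the huge page pool size and hugetlbfs mount.
#   $1 = number of huge pages, $2 = uid of the JVM user,
#   $3 = root directory to write under (default: "" meaning the real /etc)
persist_hugepages_config() {
  local pages=$1 uid=$2 root=${3:-}
  mkdir -p "$root/etc/sysctl.d" || return 1
  # vm.nr_hugepages sizes the default (2MB on x86) huge page pool at boot
  printf 'vm.nr_hugepages = %s\n' "$pages" \
    > "$root/etc/sysctl.d/90-zgc-hugepages.conf"
  # Add the hugetlbfs mount only once (only needed for kernels < 4.14)
  if ! grep -qs ' /hugepages hugetlbfs ' "$root/etc/fstab"; then
    printf 'nodev /hugepages hugetlbfs uid=%s 0 0\n' "$uid" >> "$root/etc/fstab"
  fi
}
```

Run as root with `persist_hugepages_config 9216 123`. The /hugepages mount point directory created earlier must still exist at boot.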

Enabling Transparent Huge Pages

An alternative to using explicit large pages (as described above) is to use transparent huge pages. Use of transparent huge pages is usually not recommended for latency-sensitive applications, because it tends to cause unwanted latency spikes. However, it might be worth experimenting with to see if/how your workload is affected by it. But be aware, your mileage may vary.

Note that using ZGC with transparent huge pages enabled requires Linux kernel >= 4.7.

Use the following options to enable transparent huge pages in the VM:

Code Block
-XX:+UseLargePages -XX:+UseTransparentHugePages

These options tell the JVM to issue madvise(..., MADV_HUGEPAGE) calls for memory it maps, which is useful when using transparent huge pages in madvise mode.

To enable transparent huge pages you also need to configure the kernel, by enabling the madvise mode.

Code Block
$ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled

and

Code Block
$ echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled
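Both sysfs files list every supported mode and mark the active one with brackets, for example "always [madvise] never". A small sketch to read back the active mode after configuring it; the function name thp_mode is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active (bracketed) mode from a THP sysfs file.
#   $1 = file (default: /sys/kernel/mm/transparent_hugepage/enabled)
thp_mode() {
  local file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
  sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}
```

After the two commands above, `thp_mode` should print madvise and `thp_mode /sys/kernel/mm/transparent_hugepage/shmem_enabled` should print advise.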

See the kernel documentation for more information.

Enabling NUMA Support

ZGC has basic NUMA support, which means it will try its best to direct Java heap allocations to NUMA-local memory. This feature is enabled by default. However, it will automatically be disabled if the JVM detects that it's bound to a subset of the CPUs in the system. In general, you don't need to worry about this setting, but if you want to explicitly override the JVM's decision you can do so by using the -XX:+UseNUMA or -XX:-UseNUMA options.

When running on a NUMA machine (e.g. a multi-socket x86 machine), having NUMA support enabled will often give a noticeable performance boost.
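To see whether a machine is NUMA at all, Linux exposes one nodeN directory per NUMA node under /sys/devices/system/node. A sketch that counts them, assuming that sysfs layout; the function name numa_node_count is hypothetical, and numactl --hardware (if installed) gives a more detailed view:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NUMA nodes; more than 1 means a NUMA machine.
#   $1 = sysfs node directory (default: /sys/devices/system/node)
numa_node_count() {
  local base=${1:-/sys/devices/system/node}
  local n=0 d
  for d in "$base"/node[0-9]*; do
    # An unmatched glob stays literal and fails the -d test
    [ -d "$d" ] && n=$((n + 1))
  done
  echo "$n"
}
```

On a typical two-socket x86 server this would be expected to print 2.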

Enabling GC Logging

GC logging is enabled using the following command-line option:

Code Block
-Xlog:<tag set>[,<tag set>...]:<log file>

For general information/help on this option:

Code Block
-Xlog:help

To enable basic logging (one line of output per GC):

Code Block
-Xlog:gc:gc.log

To enable GC logging that is useful for tuning/performance analysis:

Code Block
-Xlog:gc*:gc.log

Where gc* means log all tag combinations that contain the gc tag, and :gc.log means write the log to a file named gc.log.
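For longer runs, the same option can also take decorators and log rotation settings, using the unified logging form -Xlog:<selection>:file=<name>:<decorators>:<output-options>. A sketch that builds such an option; the function name gc_log_opt and its defaults (5 files of 10M each, time/uptime/level/tags decorators) are illustrative choices, not recommendations from this page:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a rotating gc* logging option.
#   $1 = log file (default gc.log), $2 = file count, $3 = max size per file
gc_log_opt() {
  local file=${1:-gc.log} count=${2:-5} size=${3:-10M}
  printf '%s\n' "-Xlog:gc*:file=$file:time,uptime,level,tags:filecount=$count,filesize=$size"
}
```

Quote the result on the command line, as in `java "$(gc_log_opt)" ...`, so the shell does not try to glob-expand the `*` in gc*.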