Shenandoah is a low-pause-time garbage collector that reduces GC pause times by performing more garbage collection work concurrently with the running Java program. Shenandoah does the bulk of GC work concurrently, including concurrent compaction, which means its pause times are no longer directly proportional to the size of the heap. Garbage collecting a 200 GB heap or a 2 GB heap should show similar low pause behavior.
Shenandoah does not have any special needs from the OS, but the build might require fiddling with code.
OS | Comment |
---|---|
Linux | Primary target; continuously tested. |
Windows | Secondary target; continuously tested. |
macOS | Additional target; tested by community. |
Solaris | Additional target; tested by community. |
Others | The porting should be trivial, please try and contact Shenandoah devs with your success and failure reports. |
Shenandoah does need to implement some hardware-specific bits. Porting requires some assembly-level coding.
Architecture | Comment |
---|---|
x86_64 | Primary target; continuously tested. |
x86_32 | Secondary target; continuously tested. |
AArch64 | Primary target; continuously tested. |
PPC64 | Secondary target; community tested. |
RISCV64 | Secondary target; community tested, part of riscv-port. |
ARM32 | In (slow) development; help welcome. |
S390X | Not supported; contributions welcome. |
SPARC | Not supported. No hardware to test on. |
Others | Please contact Shenandoah devs for guidance if you are willing to port Shenandoah to another platform. |
Shenandoah is in upstream OpenJDK 12+ (JEP 189), and was later contributed to upstream OpenJDK 11u. A downstream backport to OpenJDK 8u is available as well. Shenandoah follows the "express" development model, where features and bugfixes are continuously backported to previous supported JDK releases. Critical bugfixes are backported first, and are released as soon as possible. Non-critical bugfixes and features may appear in backports a bit later. Improvements in shared GC and runtime code might not be easily backportable, although some of them are backported as part of the OpenJDK Updates project. Major GC improvements might ship even later.
JDK | Role | Comment |
---|---|---|
JDK 8 | Stable LTS | Available as non-mainline 8u backport. Check with your vendor for availability. See known vendors list below. |
JDK 9 | | Discontinued, migrate to 11/17 as soon as possible. |
JDK 10 | | Discontinued, migrate to 11/17 as soon as possible. |
JDK 11 | Stable LTS | In mainline OpenJDK 11u since 11.0.9. Requires opt-in during build time, check with your vendor for availability. See known vendors list below. |
JDK 12 | | Discontinued, migrate to 17 as soon as possible. |
JDK 13 | | Discontinued, migrate to 17. |
JDK 14 | | Discontinued, migrate to 17. |
JDK 15 | | Discontinued, migrate to 17. |
JDK 16 | | Discontinued, migrate to 17. |
JDK 17 | Stable LTS | In mainline OpenJDK builds. |
JDK 18 | Dev/Test | In mainline OpenJDK builds. |
This means you don't have to select the very latest JDK release to have most of the fixes and conveniences, but later releases might be more up-to-date.
If you want to understand the gory details of how the changes flow between the development repos and builds, look at this diagram.
Shenandoah availability differs by vendor and JDK release. OpenJDK 12+ builds normally include Shenandoah by default. OpenJDK 11 requires the opt-in during build time.
Known vendor status is:
Fedora 24+ OpenJDK 8+ builds include Shenandoah
Nightly Builds
There are (nightly/weekly) development builds available at these locations:
Linux/x86_64 nightly builds are also available as Docker images, e.g.:
```
# Update the image to the most recent one:
$ docker pull shipilev/openjdk
$ docker pull shipilev/openjdk:17
$ docker pull shipilev/openjdk:11

# Run the latest version:
$ docker run --rm -it shipilev/openjdk java -XX:+UseShenandoahGC -Xlog:gc -version
[0.007s][info][gc] Using Shenandoah
...

# Run the JDK 17 version:
$ docker run --rm -it shipilev/openjdk:17 java -XX:+UseShenandoahGC -Xlog:gc -version
[0.007s][info][gc] Using Shenandoah
...

# Run the JDK 11 version:
$ docker run --rm -it shipilev/openjdk:11 java -XX:+UseShenandoahGC -Xlog:gc -version
[0.008s][info][gc] Using Shenandoah
...
```
There are several ways to report bugs. Here is the checklist:
Shenandoah is a regionalized collector: it maintains the heap as a collection of regions.
The regular Shenandoah GC cycle looks like this:
```
GC(3) Pause Init Mark 0.771ms
GC(3) Concurrent marking 76480M->77212M(102400M) 633.213ms
GC(3) Pause Final Mark 1.821ms
GC(3) Concurrent cleanup 77224M->66592M(102400M) 3.112ms
GC(3) Concurrent evacuation 66592M->75640M(102400M) 405.312ms
GC(3) Pause Init Update Refs 0.084ms
GC(3) Concurrent update references 75700M->76424M(102400M) 354.341ms
GC(3) Pause Final Update Refs 0.409ms
GC(3) Concurrent cleanup 76244M->56620M(102400M) 12.242ms
```
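To see how little of the cycle is spent in pauses, the sample log above can be summarized with a small script. This is only a sketch: the regular expression assumes the line layout shown in the sample (phase name, optional heap transition, duration in milliseconds), and the names `LINE_RE` and `summarize_cycle` are made up for this illustration. Real logs may carry extra decorations or differ between JDK versions.

```python
import re

# Sample lines copied from the GC cycle log above.
SAMPLE_LOG = """\
GC(3) Pause Init Mark 0.771ms
GC(3) Concurrent marking 76480M->77212M(102400M) 633.213ms
GC(3) Pause Final Mark 1.821ms
GC(3) Concurrent cleanup 77224M->66592M(102400M) 3.112ms
GC(3) Concurrent evacuation 66592M->75640M(102400M) 405.312ms
GC(3) Pause Init Update Refs 0.084ms
GC(3) Concurrent update references 75700M->76424M(102400M) 354.341ms
GC(3) Pause Final Update Refs 0.409ms
GC(3) Concurrent cleanup 76244M->56620M(102400M) 12.242ms
"""

# Assumed layout: "GC(<id>) <phase name> [<before>M-><after>M(<capacity>M)] <duration>ms"
LINE_RE = re.compile(
    r"GC\((?P<id>\d+)\)\s+(?P<phase>.+?)\s+"
    r"(?:\d+M->\d+M\(\d+M\)\s+)?(?P<ms>\d+(?:\.\d+)?)ms\s*$"
)


def summarize_cycle(log_text):
    """Return (total_pause_ms, total_concurrent_ms) for the given log text."""
    pause_ms = 0.0
    concurrent_ms = 0.0
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a phase line in the assumed layout
        duration = float(match.group("ms"))
        if match.group("phase").startswith("Pause"):
            pause_ms += duration
        else:
            concurrent_ms += duration
    return pause_ms, concurrent_ms


pause_ms, concurrent_ms = summarize_cycle(SAMPLE_LOG)
print(f"pauses: {pause_ms:.3f} ms, concurrent: {concurrent_ms:.3f} ms")
```

For the sample cycle, the four pauses add up to roughly 3 ms, while the concurrent phases take about 1.4 seconds.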
The phases above do roughly this:
Heap sizes: Shenandoah performance, like the performance of almost all other GCs, depends on heap size. It should perform better when there is enough heap space to accommodate allocations while concurrent phases are running (see "Failure Modes" section below). The time for concurrent phases correlates with the live data set size (LDS), that is, the space taken by live data. Therefore, a reasonable heap size depends on the LDS and the allocation pressure in the workload: for a given allocation rate, larger LDS-es require proportionally larger heap sizes; for a given LDS, larger allocation rates require larger heap sizes. For some workloads with minuscule live data sets and moderate allocation pressure, 1...2 GB heaps perform well. We routinely test on 4...128 GB heaps on various workloads with up to 80% LDS size. Don't be shy to try different heap sizes to see what fits your workload.
Pauses: Shenandoah's pause behavior is largely dominated by root set operations: scanning and updating the roots. The root set includes local variables, references embedded in generated code, interned Strings, references from classloaders (e.g. static final references), JNI references, and JVMTI references. A larger root set generally means longer pauses with Shenandoah, unless the JDK version in use can do parts of that work concurrently and Shenandoah is able to use that capability. Second-order effects are: a) weak reference processing (which happens in the Final Mark pause), but only for those references that need processing; and b) class unloading and other JDK cleanups (which also happen in the Final Mark pause). These second-order effects can be mitigated by configuring additional options that control processing frequency (including disabling it altogether) and/or by modifying the application to play a bit nicer.
Throughput: Since Shenandoah is a concurrent GC, it employs barriers to maintain invariants during the collection cycle. Those barriers might induce a measurable throughput loss. See the diagnostic section below for ways to dissect what is happening there. Some users report that the throughput loss due to barriers is paid off by naturally offloading concurrent GC work to spare and otherwise idle cores; in other words, in some cases it trades higher application+JVM utilization for higher application throughput.
In most cases, the pause times are within 0..10ms and throughput losses are within 0..15%. The actual performance numbers depend heavily on the actual application, load profile, etc. With applications that do not have a lot of roots, weak references and/or class churn, the pauses can be in the sub-millisecond range. With applications that do not mutate the heap as much, or are well optimized by current compilers, the barrier overhead can be near zero. The rest of the section describes approaches to test and diagnose performance behaviors with Shenandoah. If you suspect something is off in your concrete use case, consider letting developers know about it. Chances are, it is a manageable issue or an outright bug.
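The heap size guidance earlier in this section (larger live data sets and larger allocation rates both call for larger heaps) can be turned into a back-of-envelope estimate. The sketch below assumes a deliberately simple model that is not taken from Shenandoah itself: the heap must hold the live data plus whatever the application allocates during one concurrent cycle, padded by a safety factor. The function name, the safety factor, and the example numbers are all hypothetical.

```python
def min_heap_estimate_mb(live_data_mb, alloc_rate_mb_per_s, cycle_seconds,
                         safety_factor=1.5):
    """Rough lower bound on heap size under a simplified model.

    Assumption (not Shenandoah's actual heuristics): free space must absorb
    everything allocated while one concurrent cycle runs, with some padding.
    """
    if min(live_data_mb, alloc_rate_mb_per_s, cycle_seconds) < 0:
        raise ValueError("inputs must be non-negative")
    headroom_mb = alloc_rate_mb_per_s * cycle_seconds * safety_factor
    return live_data_mb + headroom_mb


# Hypothetical workload: 4 GB live data, 500 MB/s allocation rate, 2 s cycles.
print(min_heap_estimate_mb(4096, 500, 2.0))
# Same live data with twice the allocation rate needs more headroom.
print(min_heap_estimate_mb(4096, 1000, 2.0))
```

Treat the result only as a starting point for experiments; measuring with real heap sizes remains the reliable way to find what fits a workload.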
Basic configuration and command line options:
It is almost always a good idea to run with logging enabled. This summary table conveys important information about GC performance, and we would almost inevitably ask for one in a performance bug report. Heuristics logs are useful to figure out GC outliers.
Other recommended JVM options are:
Modes define the major way Shenandoah runs. This defines what barriers, if any, Shenandoah is using, and defines the major performance characteristics. Mode can be selected with -XX:ShenandoahGCMode=<name>. Available modes are:
Once a mode is selected, heuristics decide when Shenandoah starts a GC cycle and which regions it selects for evacuation. Heuristics can be selected with -XX:ShenandoahGCHeuristics=<name>. Some heuristics accept configuration parameters, which might help to tailor the GC operation to your use case better. Available heuristics include:
-XX:ShenandoahInitFreeThreshold=#: Initial threshold at which to trigger "learning" collections
-XX:ShenandoahMinFreeThreshold=#: free space threshold at which heuristics triggers the GC unconditionally
-XX:ShenandoahAllocSpikeFactor=#: How much heap to reserve for absorbing allocation spikes
-XX:ShenandoahGarbageThreshold=#: Sets the percentage of garbage a region needs to contain before it can be marked for collection
-XX:ShenandoahMinFreeThreshold=#: Set the percentage of free heap at which a GC cycle is started
-XX:ShenandoahGarbageThreshold=#: Sets the percentage of garbage a region needs to contain before it can be marked for collection
compact (previously erroneously known as continuous). This heuristic runs GC cycles continuously, starting the next cycle as soon as the previous cycle finishes, as long as allocations happen. This heuristic would normally incur throughput overheads, but should provide the most prompt space reclamation. Useful tuning knobs are:
-XX:ConcGCThreads=#: Trim down the number of concurrent GC threads to make more room for application to run
-XX:ShenandoahAllocationThreshold=#: Set percentage of memory allocated since last GC cycle before starting another one
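The threshold options above are all percentages, which is easier to picture with a toy model. The sketch below is not Shenandoah's implementation: it only mimics the described meaning of a free-heap threshold (start a cycle when free space drops below it) and a garbage threshold (pick regions whose garbage share exceeds it). The `Region` class, both function names, the choice to measure garbage relative to the region size, and the numeric thresholds are assumptions made for illustration, not defaults.

```python
from dataclasses import dataclass


@dataclass
class Region:
    used_mb: float     # space occupied by objects, live or dead
    garbage_mb: float  # part of used_mb that is no longer reachable


def should_start_cycle(free_mb, heap_mb, min_free_threshold_pct):
    """Trigger when the free share of the heap drops below the threshold."""
    return 100.0 * free_mb / heap_mb < min_free_threshold_pct


def select_regions(regions, region_size_mb, garbage_threshold_pct):
    """Pick regions whose garbage, as a share of region size, exceeds the threshold."""
    return [
        r for r in regions
        if 100.0 * r.garbage_mb / region_size_mb > garbage_threshold_pct
    ]


# Hypothetical heap: 1000 MB total, 80 MB free, 32 MB regions.
regions = [Region(32, 30), Region(32, 8), Region(16, 16)]
print(should_start_cycle(free_mb=80, heap_mb=1000, min_free_threshold_pct=10))
print(select_regions(regions, region_size_mb=32, garbage_threshold_pct=60))
```

In this model, lowering the garbage threshold selects more regions per cycle, and raising the free threshold starts cycles earlier.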
A concurrent GC like Shenandoah implicitly relies on collecting faster than the application allocates. If allocation pressure is high, and there is not enough space to absorb the allocations while GC is running, an Allocation Failure would eventually happen. Shenandoah has a graceful degradation ladder that helps it survive cases like these. The ladder consists of:
In addition to the usual GC log, which would print individual Degenerated GC and Full GC events, -Xlog:gc+stats would show something like this at the end of the run:
```
Under allocation pressure, concurrent cycles may cancel, and either continue cycle
under stop-the-world pause or result in stop-the-world Full GC. Increase heap size,
tune GC heuristics, set more aggressive pacing delay, or lower allocation rate
to avoid Degenerated and Full GC cycles.

 4912 successful concurrent GCs
      0 invoked explicitly

    3 Degenerated GCs
      3 caused by allocation failure
        3 happened at Update Refs
      0 upgraded to Full GC

    0 Full GCs
      0 invoked explicitly
      0 caused by allocation failure
      0 upgraded from Degenerated GC

ALLOCATION PACING:

Max pacing delay is set for 10 ms.

Higher delay would prevent application outpacing the GC, but it will hide the GC latencies
from the STW pause times. Pacing affects the individual threads, and so it would also be
invisible to the usual profiling tools, but would add up to end-to-end application latency.
Raise max pacing delay with care.

Actual pacing delays histogram:

      From -         To        Count
      1 ms -       2 ms:          87
      2 ms -       4 ms:         142
      4 ms -       8 ms:         297
      8 ms -      16 ms:        1733
     16 ms -      32 ms:          21
     32 ms -      64 ms:           1
```
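When comparing runs, it can help to pull the top-level cycle counts out of this summary automatically. The following sketch assumes the line wording shown in the sample above (`successful concurrent GCs`, `Degenerated GCs`, `Full GCs`); the helper names are hypothetical, and the exact wording may differ between JDK versions.

```python
import re

# Excerpt copied from the statistics output above.
SAMPLE_STATS = """\
 4912 successful concurrent GCs
      0 invoked explicitly

    3 Degenerated GCs
      3 caused by allocation failure
        3 happened at Update Refs
      0 upgraded to Full GC

    0 Full GCs
      0 invoked explicitly
      0 caused by allocation failure
      0 upgraded from Degenerated GC
"""

# Only the three top-level counters; nested detail lines do not match.
COUNT_RE = re.compile(
    r"(\d+)\s+(successful concurrent GCs|Degenerated GCs|Full GCs)\s*$"
)


def cycle_counts(stats_text):
    """Map each top-level counter name to its count."""
    counts = {}
    for line in stats_text.splitlines():
        match = COUNT_RE.search(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def degraded_share(counts):
    """Fraction of all cycles that were Degenerated or Full GCs."""
    total = sum(counts.values())
    degraded = counts.get("Degenerated GCs", 0) + counts.get("Full GCs", 0)
    return degraded / total if total else 0.0


counts = cycle_counts(SAMPLE_STATS)
print(counts)
print(f"degraded share: {degraded_share(counts):.4%}")
```

A share that grows between runs suggests the collector is falling behind the allocation rate more often.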
From this, there are a few things to try if the application runs into any of these degradation steps:
Approaches to performance analysis:
Many throughput differences can be explained by GC barrier overhead. When running with -XX:ShenandoahGCHeuristics=passive, and with that heuristic only, barriers are not required for correctness, so the heuristic disables them. It is then possible to selectively re-enable the barriers and see which ones affect throughput. The barriers that the "passive" heuristic disables are listed in the GC output, like this:
```
$ java -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=passive -Xlog:gc
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahSATBBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahKeepAliveBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahWriteBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahReadBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahStoreValReadBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahCASBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahAcmpBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahCloneBarrier by default
[0.003s][info][gc] Using Shenandoah
```
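One way to organize the "re-enable barriers one at a time" experiment is to derive the command lines from the output above. The sketch below only builds argument lists and does not run anything. The barrier names are taken from the sample log, and flipping `-XX:-Name` to `-XX:+Name` follows the text above; the function names and the `-jar app.jar` arguments are placeholders. Whether a given JDK build accepts these flags as-is, or needs extra options, is not covered here.

```python
import re

# Log lines copied from the passive heuristics output above.
SAMPLE_OUTPUT = """\
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahSATBBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahKeepAliveBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahWriteBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahReadBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahStoreValReadBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahCASBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahAcmpBarrier by default
[0.002s][info][gc] Passive heuristics implies -XX:-ShenandoahCloneBarrier by default
[0.003s][info][gc] Using Shenandoah
"""

IMPLIES_RE = re.compile(r"implies -XX:-(\w+) by default")


def disabled_barriers(gc_output):
    """Names of the barriers the log reports as disabled, in log order."""
    return IMPLIES_RE.findall(gc_output)


def experiment_commands(gc_output, app_args):
    """One command line per barrier, each re-enabling exactly that barrier."""
    base = ["java", "-XX:+UseShenandoahGC", "-XX:ShenandoahGCHeuristics=passive"]
    return [
        base + [f"-XX:+{name}"] + list(app_args)
        for name in disabled_barriers(gc_output)
    ]


# "-jar app.jar" stands in for the real application arguments.
for command in experiment_commands(SAMPLE_OUTPUT, ["-jar", "app.jar"]):
    print(" ".join(command))
```

Running the workload once per generated command, plus once with no barrier re-enabled as a baseline, gives a per-barrier throughput comparison.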
- Build with --with-native-debug-symbols=internal; this will get you the mapping to C++ code
- Profile with perf record java ... (plain profile) or perf record -g java ... (call tree profile)
- View the results with perf report
- Pressing "a" on the method usually gives a more detailed disassembly for it

It is important to understand that GC pauses might not be the only significant contributor to response times in regular applications. A long GC pause very likely means a response time problem, but the absence of long GC pauses does not always mean decent response times. Queueing delays, network latencies, latencies of other services, OS scheduler jitter, etc. could all contribute. Running Shenandoah with response time measurement is recommended to get the full picture of what is going on in the system, which can then be correlated with GC pause time statistics.
For example, this is a sample report with jHiccup on one of the workloads:
This section describes the ways one can diagnose and/or debug Shenandoah.
These are the steps you can do to narrow the problem area:
General debugging techniques apply to Shenandoah:
This would guarantee you run the latest and greatest version. Some features and bugfixes may not be available in older JDK versions. Older JDK versions are supposed to be more stable.
Adding --enable-debug to ./configure would produce the "fastdebug" build that has more diagnostics.
You might find that downloading the workspaces takes too long, especially for jdk10+ workspaces. In that case, you can download the workspace tarball from here: https://builds.shipilev.net/workspaces/
```
# JDK master:
$ hg clone http://hg.openjdk.java.net/jdk/jdk shenandoah

# JDK 11u:
$ hg clone http://hg.openjdk.java.net/jdk-updates/jdk11u shenandoah

# JDK 8u:
$ hg clone http://hg.openjdk.java.net/shenandoah/jdk8 shenandoah

$ cd shenandoah/

# Configure and build, JDK 11+:
$ sh ./configure
$ make images

# Configure and build, JDK 8:
$ sh ./get_source.sh
$ sh ./configure
$ make images

# Run! JDK 11+:
$ build/linux-x86_64-normal-server-release/images/jdk/bin/java -XX:+UseShenandoahGC -Xlog:gc
[...][info][gc] Using Shenandoah

# Run! JDK 8:
$ build/linux-x86_64-normal-server-release/images/j2sdk-image/bin/java -XX:+UseShenandoahGC -version
openjdk version "1.8.0-internal"
OpenJDK Runtime Environment (build 1.8.0-internal-shade_2016_12_19_15_52-b00)
OpenJDK 64-Bit Server VM (build 25.71-b00, mixed mode)
```
Note: OpenJDK is normally compiled with all warnings treated as errors. Newer compilers may emit warnings that the codebase has not yet caught up with. You can pass --disable-warnings-as-errors to ./configure in those cases.
When building from source, running the tests is always optional but advisable. This is especially important on platforms beyond what Shenandoah currently targets, and/or when building with too new or too old toolchains. You will need jtreg to run the tests, and it makes sense to run the tests against the fastdebug build first:
```
# Download and unpack jtreg from https://ci.adoptopenjdk.net/view/Dependencies/job/jtreg/

# Hook up jtreg to the build:
$ sh ./configure --with-jtreg=<jtreg folder> --with-debug-level=fastdebug
$ sh ./configure --with-jtreg=<jtreg folder> --with-debug-level=release

# Run the tests:
$ CONF=linux-x86_64-normal-server-fastdebug make images run-test TEST="tier3_gc_shenandoah"
$ CONF=linux-x86_64-normal-server-release make images run-test TEST="tier3_gc_shenandoah"
```