Performance techniques used in the Hotspot JVM

What code shapes does the JVM optimize best? Here is a list.

Knowing these optimizations may help language implementors generate bytecodes that run faster. Basic information about bytecodes is in Chapter 7 of the JVM Spec.

...

  • The server compiler likes a loop with an int counter (int i = 0), a constant stride (i++), and a loop-invariant limit (i <= n).
  • Loops over arrays work especially well when the compiler can relate the counter limit to the length of the array(s).
  • For long loops over arrays, the majority of iterations are free of individual range checks.
  • Loops are typically peeled by one iteration, to "shake out" tests which are loop invariant but execute only on a non-zero trip count. Null checks are the key example.
  • If a loop contains a call, it is best if that call is inlined, so that the loop can be optimized as a whole.
  • A loop can have multiple exits. Any deoptimization point counts as a loop exit.
  • If your loop has a rare exceptional condition, consider exiting to another (slower) loop when it happens.
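
The loop shapes above can be sketched in Java source as follows. This is only an illustration: the class and method names are hypothetical, and whether a given JVM build actually removes the range checks or peels the loop depends on the compiler and its flags. The first method is the plain counted loop (int counter, stride of one, limit tied to the array length); the second leaves the fast loop on a rare condition and finishes in a slower general loop.

```java
// Hypothetical example class; the names are illustrative only.
final class LoopShapes {
    // Counted loop: int counter, constant stride, and a loop-invariant
    // limit (a.length) that is directly related to the array being indexed.
    static long sum(int[] a) {
        long total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    // The fast loop assumes the common case (no negative elements). On the
    // rare case it exits, and a second, slower loop finishes the work.
    static long sumClamped(int[] a) {
        long total = 0;
        int i = 0;
        for (; i < a.length; i++) {
            int v = a[i];
            if (v < 0) {
                break;              // rare condition: leave the fast loop
            }
            total += v;
        }
        for (; i < a.length; i++) { // slow loop: treats negatives as zero
            total += Math.max(a[i], 0);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, -4, 1, 5};
        System.out.println(sum(data));        // prints 6
        System.out.println(sumClamped(data)); // prints 10
    }
}
```

Keeping the rare case out of the first loop body is the point of the second method: the hot loop stays small and regular, and the general logic lives only in the loop that is seldom reached.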

Profiling

Profiling is performed at the bytecode level in the interpreter and tier one compiler. The compiler leans heavily on profile data to motivate optimistic optimizations.

  • Every null check site has a record of whether a null was ever seen.
  • Similar points can be made about other low-level checks.
  • Every call site with a receiver has a record of which types were encountered (up to 2-3 types).
  • There is also a type profile for every checkcast, instanceof, and aastore. (Helps with generics.)
  • Every call site and branch point has a record of execution counts.
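
As a rough illustration of the receiver type profile, the sketch below has a single interface call site inside a loop. The types Shape, Circle, and Square are hypothetical. If that call site only ever sees one or two receiver classes at run time, its profile stays within the recorded limit mentioned above; feeding many different implementations through the same site would overflow it. The code does not measure or prove anything about the JIT; it only shows where the profiled site is.

```java
// Hypothetical types used only to illustrate a profiled call site.
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class ProfileDemo {
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (int i = 0; i < shapes.length; i++) {
            // One call site with a receiver: the types seen here are what
            // the profile records (Circle and Square in this sketch).
            total += shapes[i].area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Square(3) };
        System.out.println(totalArea(shapes)); // prints 13.0
    }
}
```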

...

  • Use a disassembler if available to inspect the generated code.
  • Switches are profiled but the profile information is poorly used. For now, consider building an initial decision tree if you know one or two cases are really common.
  • Exception throwing compiles to a goto if the thrower and catcher inline together. For such uses, rely on preallocated or cloned exceptions, or override fillInStackTrace, the part of exception creation that is an expensive, reflective native call.
  • Do not use jsr/ret. Just clone your finally code if you have to.
  • If you are compiling a non-Java language, consider using standard mangling conventions.
  • If you are generating almost the same class many times in a row, with small variations, factor out the repeating parts into a superclass or static helper class.
  • For small variations in the remaining part, consider using a single named class as a template and loading it multiple times as an anonymous class with constant pool edits. Anonymous classes load and unload faster than named ones.
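
Two of the points above, the cheap exception and the switch with a hand-built decision tree, might look like this in Java source. This is a minimal sketch under stated assumptions: the class ControlFlowTricks, the Signal exception, and the opcode values are all hypothetical, and which cases are "really common" is simply assumed here rather than measured.

```java
// Hypothetical example class; names and opcode values are illustrative only.
final class ControlFlowTricks {
    // A preallocated exception used purely for control flow. Overriding
    // fillInStackTrace skips the stack walk when the instance is created.
    static final class Signal extends RuntimeException {
        static final Signal INSTANCE = new Signal();

        private Signal() {
            super("signal");
        }

        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // no stack trace is recorded
        }
    }

    // Thrower: small enough that it is a plausible inlining candidate.
    static int requireNonNegative(int v) {
        if (v < 0) {
            throw Signal.INSTANCE;
        }
        return v;
    }

    // Catcher: the throw above acts as a non-local exit from the loop.
    static int sumUntilNegative(int[] a) {
        int total = 0;
        try {
            for (int i = 0; i < a.length; i++) {
                total += requireNonNegative(a[i]);
            }
        } catch (Signal s) {
            // stop summing at the first negative element
        }
        return total;
    }

    // Initial decision tree: the cases assumed to be hot (0 and 1) are
    // tested explicitly before falling into the general switch.
    static String describe(int opcode) {
        if (opcode == 0) return "nop";
        if (opcode == 1) return "load";
        switch (opcode) {
            case 2:  return "store";
            case 3:  return "add";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(sumUntilNegative(new int[]{1, 2, -1, 5})); // prints 3
        System.out.println(describe(0));                              // prints nop
        System.out.println(describe(3));                              // prints add
    }
}
```

A shared exception instance carries no useful stack trace and no per-throw state, so it suits only internal control flow that is caught nearby, not error reporting.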

Presentations and Papers