Tail call support in Davinci

Tail call optimization guarantees that a series of tail calls executes in bounded stack space. No StackOverflow exception happens.

After applying the tail call patches Hotspot supports tail call optimization for all method invocations that are marked as tail call. The optimization is performed by the VM if the invocation is marked as a 'tail call' and the necessary conditions for tail calls are met. That is the VM supports hard tail calls. If the VM can not guarantee that the innvocation will be optimized it throws an exception. Note that the VM will not recognize normal calls that could be tail call optimized and perform the optimization on them.

To enable tail call optimization the VM is started with the flag -XX:+TailCalls.

java -XX:+TailCalls TailCallExample

Tail call optimization is implemented by replacing the caller's frame with the callee's frame if the java security mechanism allows it. If removing the caller's frame is not possible, the VM proceeds in two possible ways depending on whether the 'TailCallStackCompressions' flag is set or not.

TailCallException is thrown. If the VM detects that the security information between caller and tail called callee differs it throws a TailCallException. The security information differs if the class holding the caller method has a different protection domain than the class holding the callee. This behaviour is enabled by default e.g the flag -XX:+TailCallsStackCompression is not set.

Tail call is defeated. The stack frame of the caller is left on the stack. If the recursion depth is deep enought the stack becomes full. At this point, just before throwing a StackOverflow exception the VM tries to 'compress the stack' by removing superfluous stack frames. To enable this behaviour the flag -XX:+TailCallsStackCompression has to be set.

Marking a call as a 'tail call'

To mark a method call as tail call the invocation bytecode instruction has to be prefixed with a 'wide' (196) bytecode.

public static int tailcaller(int);
  Code:
   0:	iload_0
   1:	ifne	8
   4:	iload_0
   5:	iconst_1
   6:	iadd
   7:	ireturn
   8:	iload_0
   9:	iconst_1
   10:	isub
   11:	wide
   12:	invokestatic	#2; //Method tailcaller:(I)I
   15:	ireturn

}

After applying the langtools-tailcall-goto.patch (to the langtools directory) we can mark calls as tail calls using the 'goto' prefix before a method invocation in java. Above bytecode example is generated by javac from following java code.

public class Simple {
  public static int tailcaller(int x) {
    if (x==0) return x+1;

    return goto tailcaller(x-1);
  }
}

Generating code

There are two bytecode verifiers in the jdk. The old type-inference verifier and the new type-checking verifier. The type-checking verifier is used if the class file format major version is >= 50. To use this verifier, a valid stackmaptable attribute has to be emitted. Tail calls are currently only supported if the new verifier is used. Hence, to use tail calls the generated class file must have a version number >= 50.0 and must contain a valid stackmaptable attribute.

The default behaviour of the jvm is to fall back to the old verifier if the new verifier fails (e.g because the stackmaptable attribute is invalid). The jvm has an option to disable this behaviour.

java -XX:-FailOverToOldVerifier -XX:+TailCalls GeneratedClass

Implementation

Some implementation details about the prototype can be found in Arnold Schwaighofer's thesis:

http://www.ssw.uni-linz.ac.at/Research/Papers/Schwaighofer09Master/

There currently exist two branches

tailc-eager.patch: Compiled code (by server, client compiler) moves the callee's arguments to their actual position at the call site as part of the argument lowering.
tailc-lazy.patch: Compiled code moves the callee's arguments to their actual position at the callee's method entry. Hence arguments are moved twice. First to the outgoing argument of the caller area at the call site. Later the tail call method entry code moves the arguments to the caller's incoming argument area.

Stack walking and frame elision

A successful tail call removes the frame of the caller C1 and replaces it with the callee frame C2. This process can repeat an arbitrary number of times, with each successive frame C(k) performing recursive calls before it tail-calls the next frame C(k+1). Unless the thread loops infinitely, the sequence of tail-called frames ends eventually when one callee C(Max) returns to its caller P0 instead of performing a tail call. The frame P0 returned to may be called the parent, and is second-youngest frame on the stack at the point of every tail call in the sequence.

If some C(k) is interrupted and the stack is inspected in the debugger, the earlier frames C(1)..C(k-1) will not be visible unless the system has taken special care to record their historical presence for the benefit of the debugger. It might be reasonable to define debugging parameters which request the JVM to record (at extra cost) the presence of some of those historical frames.

Thread stacks can also be viewed with the API call Throwable.getStackTrace. The same considerations apply as for debugger display of frame, except that there must be a default setting in production mode (with no debugging) which is generally agreed upon. Recording no historic frames at all is almost certainly the correct default setting. If historic frames are recorded at all in production modes, then they would probably require special presentation in the stack trace. It is unlikely that historic frames will be retained or displayed in any initial version of tail call support.

A very specialized version of stack walking is performed for a limited number of system methods like Class.forName, which are sensitive to their immediate caller, and behave as if that caller (as a Class object) were passed as an extra invisible parameter. In recent versions of the JDK, these methods are clearly marked using the internal annotation @CallerSensitive, and can be reliably processed as special cases. An attempt to link to one of these methods from a tail call site should fail with a linkage error.

The API call AccessController.getContext also gives a view derived from the control stack. This view cannot omit the effects of historical frames, if the presence of those frames would affect access control checks. The information content of an access control context is a set of protection domains, each one derived from the class loader of a method in a pending frame. (There are other details, such as user-supplied protection domains and stack boundaries, but they are irrelevant to this discussion.) The set of protection domains is unordered and indifferent to repetitions. (That is, it is really a set.) The most important property of this set is that, if a stack frame is contributing to the current computation, the protection domain from that stack frame must be present in the set. Crucially, this must be true even of the historical frames C(1)..C(k-1).

Suppose the protection domains for the various frames C(1)..C(k) are PD(1)..PD(k). Each new call to C(k+1) must be inspected to see if it contributes a new protection domain PD(k+1) to the existing set {PD(1), ... PD(k)}. One simple way to do this is to detect when PD(k) and PD(k+1) differ; if they do not differ, then normal stack walking will detect the protection domain even after C(k) has been removed from the stack. Other optimizations are possible, with the normal application of "fast path" and "slow path" techniques. The collected set must be maintained somewhere. A likely place would be an extra hidden frame between the parent frame P0 and the first tail-called child C(2).

Further discussion of the problem of access control may be found in a paper by Clements and Felleisen, linked here: http://mail.openjdk.java.net/pipermail/mlvm-dev/2013-July/005417.html

Child pages

TailCalls

Tail call support in Davinci

Marking a call as a 'tail call'

Generating code

Implementation

Stack walking and frame elision