Direct method handles
A direct method handle (DMH) represents a particular named method, field, or constructor, with no transformations.
Direct method handles are obtained in one of two ways: from the constant pool (via a CONSTANT_MethodHandle CP entry), or from a corresponding factory method on MethodHandles.Lookup. Every kind of DMH can be obtained either way.
As defined by the JVMS, a CONSTANT_MethodHandle CP entry can refer to a CONSTANT_Methodref as if it were to be invoked via an instruction of type invokestatic, invokevirtual, invokeinterface, or invokespecial.
The invokespecial mode cannot refer to a constructor, because this would be an unsafe use of that constructor. However, a special kind of CONSTANT_MethodHandle entry can refer to a new instruction followed by a dup and an invokespecial of the constructor.
A CONSTANT_MethodHandle CP entry can also refer to a CONSTANT_Fieldref as if it were to be invoked via an instruction of type getfield, getstatic, putfield, or putstatic.
Thus there are nine different "reference kinds" that a CONSTANT_MethodHandle constant may have, and these are the nine different kinds of direct method handle. Each kind of method handle encapsulates an access-checked, resolved symbolic reference to a method or a field. The capabilities and semantics of the method handle are identical to those of the corresponding JVM instructions, as applied to the underlying symbolic reference.
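The nine reference kinds are also visible through the public API as integer constants on MethodHandleInfo. The following is a minimal sketch that lists them by name; the class name ReferenceKinds and its helper method are hypothetical, introduced only for illustration.

```java
import java.lang.invoke.MethodHandleInfo;

// Hypothetical helper class, not part of the JDK.
class ReferenceKinds {
    /** Returns the descriptive names of reference kinds 1 through 9, in numeric order. */
    static String[] names() {
        String[] result = new String[9];
        for (int kind = MethodHandleInfo.REF_getField;
                 kind <= MethodHandleInfo.REF_invokeInterface; kind++) {
            result[kind - 1] = MethodHandleInfo.referenceKindToString(kind);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = names();
        for (int i = 0; i < names.length; i++) {
            System.out.println((i + 1) + " = " + names[i]);
        }
    }
}
```

Kind 8 (newInvokeSpecial) is the constructor case described above: it stands for the new, dup, invokespecial sequence rather than for a single instruction.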
The class MethodHandles.Lookup provides a reflective API for constructing the same kinds of direct method handles on the fly, based on programmatically supplied symbolic reference data. Each lookup operation for a symbolic reference requires four parameters:
- the class containing the member, as a Class mirror
- the name of the member, as a String
- the type signature of the member, as a MethodType (or a Class if a field)
- the scope from which the lookup is performed, as a Lookup object
The last argument provides the needed starting point for the lookup, and is used to verify that the caller has permission to access the desired member.
For example, the public method String.valueOf(int) may be accessed from any lookup object using String.class as the containing class, "valueOf" as a name, and MethodType.methodType(String.class, int.class) as a type signature. (The first component of the method type is the return type, and subsequent types are arguments, omitting the "self" type.)
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;
import static java.lang.invoke.MethodHandles.*;
import static java.lang.invoke.MethodType.*;

MethodType MT_valueOf = methodType(String.class, int.class);
MethodHandle MH_valueOf = lookup().findStatic(String.class, "valueOf", MT_valueOf);
// CP equiv = MethodHandle:(ref_kind:6, Methodref:(Class:String,
//   NameandType:(UTF8:"valueOf", UTF8:"(I)Ljava/lang/String;")))

// behavioral equivalence to a CONSTANT_Methodref:
String testval_from_CP = /*invokestatic*/ String.valueOf(42);
String testval_from_MH = (String) MH_valueOf.invokeExact(42);
assert(testval_from_CP.equals(testval_from_MH));
The private field String.hash may be accessed from a lookup object with private capability on String using String.class as the containing class, "hash" as a name, and int.class as a field type. This lookup will fail with a linkage error unless the lookup object is truly derived (in a trustable manner) from the String class.
The last time anyone looked, the String class didn't have code to create any such lookup object. More realistically, if a class wants to access one of its own private fields via a direct method handle, it can create a lookup object on itself (this is always allowed) and then introspect into itself to obtain a method handle on that field, either for getting or setting. It would be up to that class to take care not to let this method handle escape to untrusted code.
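The self-lookup pattern just described can be sketched as follows. The class Counter, its field count, and the helper methods are hypothetical names chosen for illustration. The handles are kept in private static final fields so that they do not escape the class.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Hypothetical example class, not part of the JDK.
class Counter {
    private int count = 7;

    // A lookup created by Counter itself has private access to Counter's members.
    private static final MethodHandle GET_COUNT;
    private static final MethodHandle SET_COUNT;
    static {
        try {
            MethodHandles.Lookup self = MethodHandles.lookup();
            GET_COUNT = self.findGetter(Counter.class, "count", int.class);
            SET_COUNT = self.findSetter(Counter.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Writes the field through the setter handle, then reads it back through the getter. */
    static int roundTrip(Counter c, int newValue) {
        try {
            SET_COUNT.invokeExact(c, newValue);      // handle type: (Counter,int)void
            return (int) GET_COUNT.invokeExact(c);   // handle type: (Counter)int
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    /** Returns true if a lookup without private access is refused the same field. */
    static boolean publicLookupIsRefused() {
        try {
            MethodHandles.publicLookup().findGetter(Counter.class, "count", int.class);
            return false;
        } catch (IllegalAccessException e) {
            return true;
        } catch (NoSuchFieldException e) {
            throw new AssertionError(e);
        }
    }
}
```

The access check happens once, when the handle is created. Anyone who later obtains the handle can use it without a further check, which is why the class must guard it.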
Structure of a direct method handle
Every direct method handle is an instance of the non-public class DirectMethodHandle or a subclass. Static fields are represented by DirectMethodHandle.StaticAccessor, other fields are represented by DirectMethodHandle.Accessor, constructor references are represented by DirectMethodHandle.Constructor, references via invokespecial are represented by DirectMethodHandle.Special, and references to other methods are represented by DirectMethodHandle itself.
A DMH is an immutable object, in that all of its fields are final.
The private field DirectMethodHandle.member contains a constant MemberName, which is a low-level, unchecked symbolic reference. When a DMH is created it is given a MemberName, which must already have been resolved.
This means that the JVM has checked the reference and filled in any information needed to execute against it one or more times. The information is very much like a constant pool cache entry, and consists of a JVM-level offset and/or a Klass pointer and/or a Method pointer.
A DMH for a field contains a MemberName which has been resolved to an offset, and that offset is used with a method like Unsafe.getByte or Unsafe.putInt to get or set the field value. If the field is static, the DMH contains a staticBase parameter which will be handed to the unsafe data access; otherwise, the DMH uses the leading parameter (the nominal receiver object this) as the base address.
A DMH for a constructor contains both an instanceClass field (which is handed to Unsafe.allocateInstance) and an initMethod field (which is the subject of an unchecked invokespecial operation). The code for a constructor creates a blank instance, marshals the incoming arguments, calls the init method (a JVM-level constructor method), and returns the filled-in instance.
A DMH for a regular method simply pushes the required arguments, and performs a suitable invocation (static, virtual, interface, or special) of the method referred to by the member field.
In common with all method handles, a DMH also has constant type and form fields, specifying the MethodType and the LambdaForm behavior.
Lambda form for a method call
The behavioral differences alluded to above are all controlled by the lambda form of the DMH. For every one of the nine distinct reference kinds, and for each possible method or field type, there is a lambda form of the required type and behavior.
These lambda forms are created on the fly and cached. The cache is keyed by lambda form kind and method or field type. They could also be precomputed, at least in many cases. As in other parts of the system, to reduce combinatorial explosion, the cache is keyed by basic type, in which all reference types are erased to Object.
(There are a few other stray bits used to key the cache, such as whether a field is volatile or whether a static reference is to a class that has not yet been initialized. These stray bits could be represented by conditional code in the DMH behavior, except that it seems more efficient to customize.)
The resulting lambda form code would be very type-unsafe, except that all reference values of uncertain type are explicitly check-casted. Such casts are relatively rare, since the DMH "knows" that it can only be invoked on values consistent with its advertised method-type. One place where a cast is needed is
Here is an example (taken from an actual application) of the lambda form of a DMH which performs an invokestatic call on two references and a double:
LambdaForm(a0:L,a1:L,a2:L,a3:D)=>{
    t4:L=DirectMethodHandle.internalMemberName(a0:L);
    t5:I=MethodHandle.linkToStatic(a1:L,a2:L,a3:D,t4:L);
    t5:I}
The leading argument a0 is the DMH itself. Its member-name field is extracted into t4 and then the DMH variable a0 goes dead. The remaining arguments feed the three parameters of the static method.
The static method is invoked via a low-level, unchecked, privileged primitive called linkToStatic. The three arguments are stacked first, in the positions expected by the static method, and then the member-name t4 is appended to the argument list.
The linkToStatic method is a native function which pops off the trailing MemberName argument, consults the VM-level linkage information inside (which is of type Method* in this case), and jumps to the indicated target method.
When the target method returns, the return value (which must be an int) is captured in t5 and then returned to the invoker of the DMH.
Two strange things happen here. First, the linkToStatic method performs a tail call to its target method. While the target method executes, there may be a visible stack frame for the DMH invocation, but there will never be a stack frame for the linkToStatic method. Second, before doing the tail call, linkToStatic pops its last argument, so that the target method will not have to deal with it.
There is one other strange thing about this whole scenario: the target method is never named. It is simply jumped to, indirectly, using the Method* pointer stuffed by the JVM into the MemberName when it was resolved. Because of this, the lambda form above works equally well with (and can be shared by) any static method that takes two references and a double, and returns an int.
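The sharing described here can be approximated from the public API with MethodType.erase(), which replaces every reference type with Object. This is only a sketch: the internal basic type used as the cache key is not exposed, and (as an assumption worth stating) it also treats the subword integral types as int, which erase() does not do. The class name BasicTypes and its helper are hypothetical.

```java
import java.lang.invoke.MethodType;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
class BasicTypes {
    /** Builds a method type and erases every reference type in it to Object. */
    static MethodType erased(Class<?> rtype, Class<?>... ptypes) {
        return MethodType.methodType(rtype, ptypes).erase();
    }

    public static void main(String[] args) {
        // Two unrelated static-method signatures: two references and a double, returning int.
        MethodType a = erased(int.class, String.class, List.class, double.class);
        MethodType b = erased(int.class, Thread.class, Object.class, double.class);
        System.out.println(a);
        System.out.println(a.equals(b));
    }
}
```

Both signatures collapse to the same erased type, which matches the shape (L,L,D)I of the lambda form shown above once the leading DMH argument is set aside.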
Edge cases for method calls
As an edge case, if the static method is in a class that is not yet initialized, a specialized lambda form is used which tweaks the class before jumping into the method, so as to be sure the class is initialized first. This case is handled by a call to DirectMethodHandle.ensureInitialized from a variation of internalMemberName which makes the extra checks:
LambdaForm(a0:L,a1:L,a2:L)=>{
    t3:L=DirectMethodHandle.internalMemberNameEnsureInit(a0:L);
    t4:I=MethodHandle.linkToStatic(a1:L,a2:L,t3:L);
    t4:I}
The code-patching hack performed by the compilers is emulated here by patching the DMH's lambda form field after the class is safely initialized, to a version that no longer calls internalMemberNameEnsureInit.
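The user-visible side of this edge case can be observed with a small experiment. The sketch below records whether a class's static initializer has run, once after the lookup and once after the first invocation. All names (LazyInitDemo, Target, observe) are hypothetical. The flag lives outside Target so that reading it does not itself trigger initialization.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class, not part of the JDK.
class LazyInitDemo {
    // Set by Target's static initializer; kept in the outer class on purpose.
    static boolean targetInitialized = false;

    static class Target {
        static { targetInitialized = true; }
        static int answer() { return 42; }
    }

    /** Returns { initialized after lookup, initialized after first invocation }. */
    static boolean[] observe() {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findStatic(Target.class, "answer", MethodType.methodType(int.class));
            boolean afterLookup = targetInitialized;
            int ignored = (int) mh.invokeExact();   // first call goes through the init check
            boolean afterInvoke = targetInitialized;
            return new boolean[] { afterLookup, afterInvoke };
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

The second flag is necessarily true, since a static method cannot run before its class is initialized. Given the mechanism described above, one would expect the first flag to be false on the first call, because creating the handle resolves the member without initializing the class.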
If the DMH were for an invokespecial instruction, or a devirtualized invokevirtual, the lambda form would be precisely identical to the example given above, except that the final call would be to linkToSpecial instead of linkToStatic. The only difference between the two "linker" routines is that linkToSpecial performs an extra null check on the leading parameter (which must of course be a reference).
Similarly, if the DMH were for an invokevirtual or invokeinterface instruction, the "linker" routine would be linkToVirtual or linkToInterface, respectively. The structure of the LF would be the same.
The linkToVirtual method is slightly more complicated than linkToSpecial. Instead of a Method*, it pulls a vtable index out of the MemberName argument. It then null-checks the first argument (a reference, again), pulls out its klass, and indexes into the vtable to find a Method*. After this, the sequence of events is identical to linkToSpecial or linkToStatic.
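The two observable effects of this sequence, the null check on the receiver and the selection of the method by the receiver's class, can be seen from Java code. The following is a minimal sketch; the class VirtualDispatchDemo and its methods are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class, not part of the JDK.
class VirtualDispatchDemo {
    // A virtual handle on Object.toString; its type is (Object)String.
    static final MethodHandle TO_STRING;
    static {
        try {
            TO_STRING = MethodHandles.lookup()
                    .findVirtual(Object.class, "toString", MethodType.methodType(String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Invokes toString virtually; the receiver's class selects the implementation. */
    static String describe(Object receiver) {
        try {
            return (String) TO_STRING.invokeExact(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    /** Returns true if a null receiver is rejected with a NullPointerException. */
    static boolean nullReceiverRejected() {
        try {
            describe(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

The same handle produces Integer's or String's toString depending on the receiver, because the target method is found through the receiver at call time and not fixed when the handle is created.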
The linkToInterface method performs a similar lookup, but starts by pulling both a Klass* and an itable index out of the MemberName. Both linkToVirtual and linkToInterface are closely similar to the vtable and itable stubs defined in vtableStubs_$arch.cpp.
One might ask why the laborious distinction between invocation modes is made in the first place. The answer is simple: They use different low-level code sequences, which must be distinguished one way or another. There would be no benefit to putting a dynamic mode-switch into every DMH invocation.
Constructor calls
Here is an actual example of a lambda form for a constructor DMH:
LambdaForm(a0:L,a1:L,a2:L,a3:I)=>{
    t4:L=DirectMethodHandle.allocateInstance(a0:L);
    t5:L=DirectMethodHandle.constructorMethod(a0:L);
    t6:V=MethodHandle.linkToSpecial(t4:L,a1:L,a2:L,a3:I,t5:L);
    t4:L}
Again, after the DMH argument a0, there are two reference arguments and an int. The code is simple. First a blank instance is allocated, using a helper method DirectMethodHandle.allocateInstance which calls Unsafe.allocateInstance on the instanceClass field value of the DMH. The blank instance is saved in t4.
Next, the member-name of the constructor method is dug out of the DMH and put in t5. (As noted above, this is kept handy in the initMethod field of Constructor.) At this point the DMH variable a0 goes dead.
Next, the constructor is called, using linkToSpecial, just as if an invokespecial instruction were executed against the named constructor. (But as noted above, it is not actually named, just indirectly loaded as a Method* from the member-name in t5.)
There is no result value from the invocation, and we formally bind the result of t6, of "type" void. The value t4 now contains a fully constructed instance of the desired type, and it is returned.
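From the caller's side, the allocate, initialize, and return steps are hidden behind a single handle. The sketch below looks up a constructor and shows how the handle's type differs from the type used for the lookup. The class ConstructorDemo and its methods are hypothetical names, and StringBuilder is used only as a convenient public class.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class, not part of the JDK.
class ConstructorDemo {
    /** Looks up the StringBuilder(String) constructor as a method handle. */
    static MethodHandle stringBuilderConstructor() {
        try {
            // The lookup type has a void return, like the JVM-level constructor method.
            return MethodHandles.lookup().findConstructor(
                    StringBuilder.class, MethodType.methodType(void.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    /** Creates and initializes a StringBuilder in one call, then returns its contents. */
    static String build(String initial) {
        try {
            // The handle itself returns the new instance: its type is (String)StringBuilder.
            StringBuilder sb = (StringBuilder) stringBuilderConstructor().invokeExact(initial);
            return sb.toString();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

The change of return type from void to the instance class corresponds to the lambda form returning t4 (the allocated object) and discarding t6 (the void result of the constructor call).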
It may be instructive to consider the type invariants required in order to make this constructor DMH safe to use:
- The selected constructor must have been accessible to the original creator of the DMH.
- The method type of the DMH must take two references plus a 32-bit value (int, short, char, byte, or boolean).
- The method type of the init-method in the DMH must correspond to those basic types also.
- If the init-method takes narrow, non-basic arguments (say, String or char) those must match the type of the DMH.
- The instance-class of the DMH must exactly match the class of the init-method.
- The instance-class of the DMH must also match the return type of the DMH.
All of these invariants are enforced by the privileged code which constructs the DMH.
Special vs. virtual calls
Note that if a DMH is created for a virtual invocation (using Lookup.findVirtual) but the selected method can be devirtualized, the DMH produced has internal behavior identical to one produced by Lookup.findSpecial (if such access were allowed). This is the same "opt-virtual" or "vfinal" optimization found in the JVM when working with virtual calls.
In order to distinguish between DMHs produced via the two Lookup calls, the DirectMethodHandle.Special class is used to represent the results of all calls to Lookup.findSpecial (or the corresponding invokespecial CP constants). Thus, a devirtualized method can be represented equivalently by either of the two classes, and they are behaviorally identical. The class difference allows reflection (the MethodHandleInfo API) to disambiguate in such a case between the two possible Lookup calls.
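This disambiguation can be observed with Lookup.revealDirect. The sketch below looks up the same final method both ways, invokes both handles, and reports the reference kind that each one reveals. The class SpecialVsVirtual and its members are hypothetical names chosen for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class, not part of the JDK.
class SpecialVsVirtual {
    // A final method: a virtual call to it can always be devirtualized.
    final String greet() { return "hello"; }

    /** Returns { result via findVirtual, result via findSpecial, revealed kind of each }. */
    static String[] compare() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType mt = MethodType.methodType(String.class);
            MethodHandle virt = lookup.findVirtual(SpecialVsVirtual.class, "greet", mt);
            MethodHandle spec = lookup.findSpecial(
                    SpecialVsVirtual.class, "greet", mt, SpecialVsVirtual.class);

            SpecialVsVirtual receiver = new SpecialVsVirtual();
            String viaVirtual = (String) virt.invokeExact(receiver);
            String viaSpecial = (String) spec.invokeExact(receiver);

            String virtKind = MethodHandleInfo.referenceKindToString(
                    lookup.revealDirect(virt).getReferenceKind());
            String specKind = MethodHandleInfo.referenceKindToString(
                    lookup.revealDirect(spec).getReferenceKind());
            return new String[] { viaVirtual, viaSpecial, virtKind, specKind };
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

Both handles return the same value, yet the revealed kinds remain invokeVirtual and invokeSpecial, so the original Lookup call can be recovered.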
Field references
Here is an example of a lambda form for a DMH which performs a non-volatile getfield of a long-valued field.
LambdaForm(a0:L,a1:L)=>{
    t2:J=DirectMethodHandle.fieldOffset(a0:L);
    t3:L=DirectMethodHandle.checkBase(a1:L);
    t4:J=Unsafe.getLong((sun.misc.Unsafe@6a9a4f8e),t3:L,t2:J);
    t4:J}
As before, a0 is the DMH, which is of type Accessor in this case. The only other argument, a1, is the receiver object, whose field is being loaded.
The unsafe field offset is extracted from the DMH. Perhaps surprisingly, this is the last thing the DMH contributes to the operation. This corresponds to the operation of CP caches for getfield, which also contribute just a simple offset.
Next, the receiver object is checked for nullness, calling the subroutine checkBase. Finally, the desired value is extracted using Unsafe.getLong, and returned.
Note that the capability value for the Unsafe API is embedded into the lambda form as a constant operand to Unsafe.getLong. This is analogous to privileged code which performs unsafe operations against a private static final reference to Unsafe.
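A handle with this shape can be produced from ordinary code with Lookup.findGetter on a long field. The sketch below reads such a field and shows the effect of the receiver null check. The class LongFieldDemo, its field stamp, and its methods are hypothetical names, and the sketch does not touch Unsafe directly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Hypothetical example class, not part of the JDK.
class LongFieldDemo {
    long stamp = 123456789L;

    // Getter handle of type (LongFieldDemo)long, i.e. basic type (L)J.
    static final MethodHandle GET_STAMP;
    static {
        try {
            GET_STAMP = MethodHandles.lookup()
                    .findGetter(LongFieldDemo.class, "stamp", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Reads the field through the handle. */
    static long read(LongFieldDemo holder) {
        try {
            return (long) GET_STAMP.invokeExact(holder);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    /** Returns true if reading through a null receiver throws NullPointerException. */
    static boolean nullReceiverRejected() {
        try {
            read(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

The null check matters more here than for an ordinary getfield, since the underlying unsafe access would otherwise treat a null base plus an offset as a raw address.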
Direct method handle optimization
Because direct method handles are immutable, the compilers can constant-fold the fields of a DMH whenever that DMH is itself a constant.
The offsets or member-name references inside the DMH are processed by the compiler into code which is similar to that produced by the equivalent hard-coded instructions operating on symbolic references.
For example, in C2, linkToStatic and the other "linkers" are special-cased in CallGenerator::for_method_handle_inline. Given a known invocation mode and a constant member-name, the method call can be inlined as readily as a normal (symbolically specified) method call.
Unsafe getters and setters are special-cased in LibraryCallKit::inline_unsafe_access. When the type of the base reference is a compile-time constant, the actual field can be recovered from the numerical offset handed to the getter or setter.
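In application code, the usual way to make a DMH "itself a constant" is to hold it in a static final field. The sketch below contrasts that with a handle received as an argument. The class ConstantHandleDemo and its methods are hypothetical names. Whether either call is actually inlined depends on the JIT compiler and on inlining of the surrounding code, so this illustrates the idiom and not a guarantee.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example class, not part of the JDK.
class ConstantHandleDemo {
    // A static final handle is a compile-time constant from the JIT's point of view.
    private static final MethodHandle MAX;
    static {
        try {
            MAX = MethodHandles.lookup().findStatic(
                    Math.class, "max", MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static MethodHandle maxHandle() { return MAX; }

    /** Call site where the handle is a constant. */
    static int maxViaConstant(int a, int b) {
        try {
            return (int) MAX.invokeExact(a, b);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    /** Call site where the handle is just a value; it is not a constant within this method. */
    static int maxViaArgument(MethodHandle mh, int a, int b) {
        try {
            return (int) mh.invokeExact(a, b);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

Both methods compute the same result. The difference lies only in what the compiler can know about the handle at the call site.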
Non-constant invocation
For non-constant DMH invocations, the compiler must emit a call to the linker method. However, this is not terribly slow, since linkToStatic and its fellows have tightly hand-coded fast paths for compiled calls.
Unlike in the interpreter calling sequence, the trailing MemberName argument cannot simply be popped from a stack. But there is another trick available. Compiled calling sequences are assumed to allow trailing arguments to be popped without data motion. This allows the trailing member-name to be consulted and then ignored during the tail-call of the real target method.
In other words, where the interpreted version of linkToVirtual or its fellows would have to pop the trailing argument to pick up the MemberName, the compiled version simply has to pick it up from the Nth compiled argument position, and then treat that argument position as irrelevant in the subsequent tail-call.
(The trailing argument ignorability invariant for compiled calls is verified in SharedRuntime::check_member_name_argument_is_last_argument. See further discussion in Linker methods for direct method handles.)
For any given basic-type, the JVM spins (as needed) a compiled version of linkToStatic (etc.) that loads the trailing MemberName from the appropriate register or stack location.
For interpreted calls, there is only one version of each of the four "linker" methods. This works because there is only one way to pop the trailing argument off the stack, in the interpreter.