JASM Syntax
This chapter describes JASM syntax and how to encode class files using it. Jasm is a Java assembler that accepts text in the JASM format and produces a .class file for use with a Java Virtual Machine. Jasm is primarily used to produce specialized tests for JVM implementations.
This chapter describes JASM syntax in the following sections:
General Syntax
JASM syntax comes in two variations: short form and verbose form. Short form uses Java-style names to refer to items in the constant pool. Verbose form uses explicit constant-pool indexes to refer to those items. By default, JDIS produces jasm files in the short form. Using the -g option for JDIS (i.e. jdis -g file.class) produces JASM source in the verbose form.
The source text file can be free form (newlines are considered blanks) and may contain Java-style commenting. The first line of a JASM file represents the name of the resulting file in the destination directory. This name does not affect the content of the resulting file. This line has two forms:
file FILENAME
or
class CLASSNAME
In the latter case, the extension .class is added to form FILENAME. Jasm's -d option lets you specify the destination directory. A list of structured data items follows the class name. The length (in bytes) of each item is determined by its representation.
Description formats
TERM1|TERM2 | TERM1 or TERM2 (not both) |
[TERM] | TERM is optional |
TERM... | TERM repeated 1 or more times |
[TERM...] | TERM repeated 0 or more times |
"sequence of" | all of the following terms are mandatory, in the order given. |
"set of" | any of the following terms, or none of them, may appear in any order. However, repetitions are not allowed. |
"list of" | any of the following terms, or none of them, may appear in any sequence. If more than one term appears, the terms are separated by commas (',') |
Lexical Structure
The source text file can be free form (newlines, tabs, and blank spaces are equivalent). Additionally, the source may contain standard Java and C++ comments.
STRING, NUMBER, and IDENT are treated the same as in the Java Language Specification. One difference is that LETTERs also include '/', '<', '>', '(', and ')'.
Not all access bits make sense for all declarations: for example, the "super" and "interface" access flags are applied to classes only.
If an access bit is used improperly, the assembler prints a warning, but places the bit in the access set.
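The "warn, but keep the bit" behavior can be illustrated with a minimal Java sketch. It is not jasm's implementation: the class, enum, and method names are hypothetical, and only the numeric flag values (taken from the JVM specification) are real.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: an access keyword that does not fit the declaration
// produces a warning, but its bit still ends up in the access set.
class AccessFlagsSketch {
    enum Kind { CLASS, FIELD, METHOD }

    // Bit values are from the JVM specification; classOnly marks the
    // "super" and "interface" keywords mentioned in the text.
    enum Flag {
        PUBLIC(0x0001, false),
        STATIC(0x0008, false),
        SUPER(0x0020, true),
        INTERFACE(0x0200, true);

        final int bit;
        final boolean classOnly;

        Flag(int bit, boolean classOnly) {
            this.bit = bit;
            this.classOnly = classOnly;
        }
    }

    // Returns the combined access mask; never drops a flag.
    static int apply(Kind kind, Set<Flag> flags, List<String> warnings) {
        int mask = 0;
        for (Flag f : flags) {
            if (f.classOnly && kind != Kind.CLASS) {
                warnings.add("warning: '" + f.name().toLowerCase()
                        + "' is not meaningful on a " + kind);
            }
            mask |= f.bit; // the bit is placed in the set regardless
        }
        return mask;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        int mask = apply(Kind.FIELD, EnumSet.of(Flag.PUBLIC, Flag.INTERFACE), warnings);
        System.out.printf("mask=0x%04x warnings=%s%n", mask, warnings);
    }
}
```

The sketch models keywords rather than raw bits because some bit values are shared between declaration kinds in the class file format.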
Note that the deprecated and synthetic keywords are not translated to access flags in the Java sense. For these, jasm generates the corresponding Deprecated or Synthetic attribute instead of an access bit. The synthetic access flag is used to mark compiler-generated members not seen in the source (for example, a field reference to an anonymous outer class).
Local names represent labels, trap-labels and local variables. Their scope is constrained by method parenthesis.
Each CONSTANT_INDEX represents a reference into the constant pool at the specified location.
General Class Structure
The extends CONSTANT_CELL(class) clause sets the "super" element of the class file. The implements INTERFACES clause sets the table of interfaces. Since the assembler does not distinguish interfaces from ordinary classes (the only difference is one access bit), the table of interfaces of an interface class must be declared with the implements keyword, not with extends as in the Java language.
Note: The last two rules allow TOP_LEVEL_COMPONENT to appear in any order and number. For example, you can split the constant pool table into several parts, mixing constants and method declarations.
General Source File Structure
A package declaration can appear only once in a source file.
The Constant Pool and Constant Elements
A CONSTANT_CELL refers to an element in the constant pool. It may refer to the element either by its index or by its value:
Tags differentiate constant entries in a constant pool:
Generic rule for TAGGED_CONSTANT_VALUE is:
A TAG may be omitted when the context allows only one kind of tag. For example, the argument of an anewarray instruction should be a CONSTANT_CELL that represents a class, so instead of
anewarray class java/lang/Object
one may write:
anewarray java/lang/Object
It is possible to write another tag, e.g.:
anewarray String java/lang/Object
However, the resulting program will be incorrect.
Another example of an implicit tag (that is, a context that implies the tag) is the header of a class declaration. You may write:
aClass {
}
which is equivalent to:
class aClass {
}
Below, the tag implied by context will be included in the rules, e.g.:
CONSTANT_VALUE(int).
The exact notation of CONSTANT_VALUE depends on the (explicit or implicit) TAG.
Note
When the JASM parser encounters an InvokeDynamic constant, it creates an entry in the BootstrapMethods attribute (the BootstrapMethods attribute is produced if it has not already been created). The entry contains a reference to the MethodHandle item in the constant pool, and, optionally, a sequence of references to additional static arguments (ldc-type constants) to the bootstrap method.
INVOKESUBTAGs for MethodHandle and (const) InvokeDynamic are defined as follows:
Static arguments for an InvokeDynamic constant are defined as follows:
INTEGER, LONG, FLOAT, and DOUBLE correspond to IntegerLiteral and FloatingPointLiteral as described in The Java Language Specification. If a double-word constant (LONG or DOUBLE) is represented with a single-word value (INTEGER or FLOAT, respectively), the single-word value is simply promoted to double-word, as described in The Java Language Specification. If a floating-point constant (FLOAT or DOUBLE) is represented with an integral value (INTEGER or LONG, respectively), the result depends on whether the integral number is preceded by the keyword "bits". If "bits" is not used, the result is the floating-point number closest in value to the decimal number. If the keyword "bits" is used, the floating-point constant takes the bits of the integral value without conversion.
Thus,
float 2;
means the same as
float 2.0f;
and the same as
float bits 0x40000000;
while
float bits 2;
actually means the same as
float bits 0x00000002;
and the same as
float 2.8026e-45f
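The equivalences above can be checked in plain Java. This is only an illustration of the two interpretations (value conversion versus bit reinterpretation) using the standard Float API; the class and method names are hypothetical and it is not how jasm itself parses constants.

```java
// Hypothetical sketch of the two ways an integral literal can become a float.
class FloatBitsSketch {
    // "float N": the float closest in value to the decimal number N.
    static float fromValue(int n) {
        return (float) n;
    }

    // "float bits N": the float whose raw IEEE 754 bit pattern is N.
    static float fromBits(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // float 2  ==  float 2.0f  ==  float bits 0x40000000
        System.out.println(fromValue(2) == 2.0f);
        System.out.println(fromBits(0x40000000) == 2.0f);
        // float bits 2  ==  float bits 0x00000002  ==  float 2.8026e-45f
        System.out.println(fromBits(2) == 2.8026e-45f);
        System.out.println(Float.floatToIntBits(2.8026e-45f) == 0x00000002);
    }
}
```

Each comparison in main evaluates to true, which matches the chain of equivalences listed above.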
External names are the names of classes, methods, fields, or types that remain in the resulting .class file. They may be represented either by IDENT or by STRING (which is useful when a name contains non-letter characters).
In this second example, the first CONSTANT_NAME denotes the name of a field and the second denotes its type.
In this third example, CONSTANT_NAME denotes the class of a field. If CONSTANT_NAME is omitted, the current class is assumed.
Constant Declarations
Constant declarations are demonstrated in the examples below:
Field Variables
Example:
Access bits (public and static) are applied to both field1 and field2. The EXTERNAL_NAME denotes the name of the field, CONSTANT_NAME denotes its type, and TAGGED_CONSTANT_VALUE denotes its initial value.
Method Declarations
The EXTERNAL_NAME denotes the name of the method, and CONSTANT_NAME denotes its type.
The meaning of the THROWS clause is the same as in the Java Language Specification: it forms the Exceptions attribute of a method. Jasm itself does not use this attribute in any way.
The NUMBER denotes the maximum operand stack size of the method.
The NUMBER denotes the number of local variables of the method. If omitted, it is calculated by the assembler according to the signature of the method and its local variable declarations.
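To make the signature-based part of that calculation concrete, here is a minimal Java sketch that counts the local variable slots occupied by a method's arguments. It relies on standard JVM rules (long and double take two slots, everything else takes one, and instance methods reserve slot 0 for "this"). The class and method names are hypothetical, and the sketch covers neither locals declared in the method body nor jasm's actual algorithm.

```java
// Hypothetical sketch: count argument slots from a JVM method descriptor.
class ArgSlotsSketch {
    static int argumentSlots(String descriptor, boolean isStatic) {
        int slots = isStatic ? 0 : 1;             // slot 0 holds "this"
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            char c = descriptor.charAt(i);
            boolean array = false;
            while (c == '[') {                    // any array is a single reference
                array = true;
                c = descriptor.charAt(++i);
            }
            if (c == 'L') {
                i = descriptor.indexOf(';', i);   // skip the class name
            }
            // long (J) and double (D) occupy two slots unless they are array elements
            slots += (!array && (c == 'J' || c == 'D')) ? 2 : 1;
            i++;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(argumentSlots("(IJLjava/lang/String;[D)V", true));
        System.out.println(argumentSlots("(IJLjava/lang/String;[D)V", false));
    }
}
```

For the descriptor in main, the static case counts int (1) + long (2) + String (1) + double[] (1) = 5 slots, and the instance case adds one more for "this".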
Instructions
VM Instructions
Jasm allows a NUMBER (which is ignored) at the beginning of each line. This keeps jasm consistent with the jdis disassembler: jdis puts line numbers in disassembled code, and that code can be reassembled with jasm without any additional modifications.
SWITCHTABLE example: Java_text
will be coded in assembler as follows:
OPCODE is any mnemocode from the instruction set. If mnemocode needs an ARGUMENT, it cannot be omitted. Moreover, the kind (and number) of the argument(s) must match the kind (and number) required by the mnemocode:
InvokeDynamic Instructions
InvokeDynamic instructions are instructions that allow dynamic binding of methods to a call site. These instructions in JASM form are rather complex, and the JASM assembler does some of the necessary work to create a BootstrapMethods attribute for entries of binding methods.
This JASM code has an invokedynamic instruction of the form:
where the INVOKEDYNAMIC constant is represented as specified
The JASM assembler creates the appropriate constant entries and entries into the BootstrapMethods attribute in a resulting class file.
You can also create InvokeDynamic constants and BootstrapMethods explicitly:
In this example, const #1 = InvokeDynamic 0:#11; is the InvokeDynamic constant that refers to the BootstrapMethod at index '0' in the BootstrapMethods attribute (BootstrapMethod #19 #8 #3;), which refers to the MethodHandle at const #19 plus 2 other static arguments (at const #8 and const #3).
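The relationship between an InvokeDynamic constant and the BootstrapMethods attribute can be modeled with a small Java sketch. The class and method names are hypothetical, and the sketch only mirrors the indexes quoted in the example above; it is not the assembler's data model. Reading #11 as the constant's NameAndType reference follows the class file format, not this chapter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a BootstrapMethods table whose entries are referenced
// by index from InvokeDynamic constants.
class BootstrapMethodsSketch {
    static final class BootstrapMethod {
        final int methodHandleIndex;   // constant-pool index of the MethodHandle
        final int[] staticArgIndexes;  // constant-pool indexes of static arguments

        BootstrapMethod(int methodHandleIndex, int... staticArgIndexes) {
            this.methodHandleIndex = methodHandleIndex;
            this.staticArgIndexes = staticArgIndexes.clone();
        }
    }

    private final List<BootstrapMethod> entries = new ArrayList<>();

    // Appends an entry and returns its index in the attribute.
    int add(int methodHandleIndex, int... staticArgIndexes) {
        entries.add(new BootstrapMethod(methodHandleIndex, staticArgIndexes));
        return entries.size() - 1;
    }

    BootstrapMethod get(int index) {
        return entries.get(index);
    }

    // Renders the constant in the "InvokeDynamic <bsm>:#<nameAndType>" notation.
    static String invokeDynamic(int bootstrapIndex, int nameAndTypeIndex) {
        return "InvokeDynamic " + bootstrapIndex + ":#" + nameAndTypeIndex;
    }

    public static void main(String[] args) {
        BootstrapMethodsSketch table = new BootstrapMethodsSketch();
        int bsm = table.add(19, 8, 3);              // BootstrapMethod #19 #8 #3;
        System.out.println(invokeDynamic(bsm, 11)); // InvokeDynamic 0:#11
    }
}
```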
Pseudo Instructions
Pseudo instructions are 'assembler directives', not instructions in the VM sense. They come in two forms: Code-Generating Pseudo-Instructions and Attribute-Generating Pseudo-Instructions.
Code-Generating Pseudo-Instructions
The bytecode directive instructs the assembler to put a collection of raw bytes into the code attribute of a method:
The directive inserts bytes in place of an instruction. It may take any number of numeric arguments, each of which is converted into a byte and inserted into the method's code.
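As a rough illustration of "each argument becomes one byte", the following Java sketch appends numeric arguments to a code buffer. The class and method names are hypothetical. Keeping only the low 8 bits of an out-of-range value is an assumption of this sketch; the text does not say how jasm treats such values.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a raw-byte directive: every numeric argument
// contributes exactly one byte to the method's code.
class BytecodeDirectiveSketch {
    static byte[] emit(int... values) {
        ByteArrayOutputStream code = new ByteArrayOutputStream();
        for (int v : values) {
            // write(int) stores only the low-order 8 bits (assumed behavior here)
            code.write(v);
        }
        return code.toByteArray();
    }

    public static void main(String[] args) {
        byte[] code = emit(0x2a, 0xb1); // two raw opcode bytes
        System.out.println(code.length);
    }
}
```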
Attribute-Generating Pseudo-Instructions
The remaining pseudo-instructions do not produce any bytecode and are used to form tables: the local variable table, the exception table, Stack Maps, and Stack Map Frames. Line Number Tables cannot be specified; they are constructed by the assembler itself.
Local Variable Table Attribute Generation
Example:
will be coded in assembler as follows:
Exception Table Attribute Generation
To generate the exception table, three pseudo-instructions are used.
TRAP_IDENT represents the name or number of an exception table entry. CONSTANT_CELL in the "catch" pseudo-instruction denotes the catch type. Each exception table entry contains 4 values: start-pc, end-pc, catch-pc, and catch-type. In jasm, each entry is denoted by a (local) identifier, referred to here as TRAP_IDENT.
To set start-pc, place "try TRAP_IDENT" before the instruction with the desired program counter. Similarly, use "endtry TRAP_IDENT" for end-pc and "catch TRAP_IDENT, catch-type" for catch-pc and catch-type (which is usually a constant pool reference). The try, endtry, and catch pseudo-instructions may be placed in any order. The order of entries in the exception table is significant (see the JVM specification). However, the only way to control this order is to place catch clauses in the appropriate textual order: the assembler adds an entry to the exception table each time it encounters a catch clause.
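The ordering rule can be sketched in Java as follows. This is a simplified model with hypothetical names, not jasm's code: try and endtry only record program counters under a trap identifier, while catch both records its values and appends the entry to the table, so table order follows the textual order of catch clauses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of exception table assembly keyed by trap identifier.
class ExceptionTableSketch {
    static final class Entry {
        final String trap;
        int startPc = -1;
        int endPc = -1;
        int catchPc = -1;
        String catchType;

        Entry(String trap) {
            this.trap = trap;
        }
    }

    private final Map<String, Entry> byTrap = new HashMap<>();
    private final List<Entry> table = new ArrayList<>();

    private Entry entry(String trap) {
        return byTrap.computeIfAbsent(trap, Entry::new);
    }

    void tryAt(String trap, int pc) {
        entry(trap).startPc = pc;
    }

    void endtryAt(String trap, int pc) {
        entry(trap).endPc = pc;
    }

    // Only catch appends to the table, so catch order defines entry order.
    void catchAt(String trap, int pc, String catchType) {
        Entry e = entry(trap);
        e.catchPc = pc;
        e.catchType = catchType;
        table.add(e);
    }

    List<Entry> table() {
        return table;
    }

    public static void main(String[] args) {
        ExceptionTableSketch t = new ExceptionTableSketch();
        t.tryAt("t0", 0);
        t.catchAt("t1", 30, "java/lang/Exception"); // appears first in the table
        t.endtryAt("t0", 10);
        t.catchAt("t0", 12, "java/lang/Error");
        t.tryAt("t1", 0);                           // may come after its catch
        t.endtryAt("t1", 20);
        for (Entry e : t.table()) {
            System.out.println(e.trap + " " + e.startPc + " " + e.endPc
                    + " " + e.catchPc + " " + e.catchType);
        }
    }
}
```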
Example:
will be coded in assembler as follows:
StackMap Table Attribute Generation
Stack Maps are denoted by the pseudo-op opcode stack_map, and they can be identified by three basic items:
All stack_map directives are collected by the assembler, and are used to create a StackMap Table attribute.
Example 1 (MapType):
Example 2 (Object):
Example 3 (NewObject):
StackFrameType Table Attribute Generation
StackFrameType directives are assembler directives similar to StackMap directives. They can appear anywhere in the code, and the assembler collects them to produce a StackFrameType attribute.
Example 1 (full stack frame type):
Example 2 (append, chop2, and same stack frame types):
LocalsMap Table
Locals Maps are typically associated with a stack_frame_type, and are accumulated per stack frame. They typically follow a stack_frame_type directive.
Example (a locals map specifying 2 ints):
Inner-Class Declarations
Example:
Annotation Declarations
Member Annotations
Member annotations are a subset of the basic annotations support provided in JDK 5.0 (1.5). These are annotations that ornament Packages, Classes, and Members either visibly (accessible at runtime) or invisibly (not accessible at runtime). In JASM, visible annotations are denoted by the token @+, while invisible annotations are denoted by the token @-.
Synopsis
The '@+' token identifies a Runtime Visible Annotation, whereas the '@-' token identifies a Runtime Invisible Annotation.
Note
The types Boolean, Byte, Char, and Short are normalized into Integers within the constant pool.
Annotation values of these types may be identified with a keyword in front of an integer value.
e.g. boolean true (or: boolean 1)
byte 20
char 97
short 2130
Other primitive types are parsed according to normal prefix and suffix conventions (e.g. Double = xxx.xd, Float = xxx.xf, Long = xxxL).
Strings are identified and delimited by '"' (quotation marks).
The keywords 'class' and 'enum' identify those annotation types explicitly. Values within classes and enums may be either identifiers (strings) or Constant Pool IDs.
Annotations specified as the value of an Annotation field are identified by the JASM annotation keywords '@+' and '@-'.
Arrays are delimited by '{' and '}' marks, with individual elements delimited by ',' (comma).
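The normalization of small primitive types into Integer constants can be illustrated with a short Java sketch. The class and method names are hypothetical, and the accepted spellings (for example, treating "true" as 1) are taken from the examples in the note; this is not jasm's parser.

```java
// Hypothetical sketch: a typed annotation value such as "char 97" is
// stored as a plain int constant.
class AnnotationValueSketch {
    static int normalize(String keyword, String text) {
        switch (keyword) {
            case "boolean":
                if (text.equals("true")) return 1;
                if (text.equals("false")) return 0;
                return Integer.parseInt(text);      // "boolean 1" form
            case "byte":
                return Byte.parseByte(text);
            case "char":
                return (char) Integer.parseInt(text);
            case "short":
                return Short.parseShort(text);
            default:
                throw new IllegalArgumentException("not a normalized type: " + keyword);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("boolean", "true"));
        System.out.println(normalize("byte", "20"));
        System.out.println(normalize("char", "97"));
        System.out.println(normalize("short", "2130"));
    }
}
```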
Examples
Example 3 (Field Annotation, All subtypes)
Note:
JASM does not enforce the annotation value declarations like a compiler would. It only checks to see that an annotation structure is well-formed.
Type Annotations
Type annotations are a subset of the annotations support provided in JDK 8.0 (1.8). These are annotations that ornament Packages, Classes, and Members either visibly (accessible at runtime) or invisibly (not accessible at runtime). In JASM, visible type annotations are denoted by the token @T+, while invisible type annotations are denoted by the token @T-.
Synopsis
Parameter Names and Parameter Annotations
Parameter annotations are another subset of the basic annotations support provided in JDK 5.0 (1.5). These are annotations that ornament Parameters to methods either visibly (accessible at runtime) or invisibly (not accessible at runtime). In JASM, visible parameter annotations are denoted by the token @+, while invisible parameter annotations are denoted by the token @-.
Parameter names come from an attribute introduced in JDK 8.0 (1.8). These are fixed parameter names that are used to ornament parameters on methods. In Jasm, parameter names are identified by the token # followed by { } braces.
Synopsis
Examples
Java Code
JASM Code
Note: The first two parameters are named ('P0'- 'P3'). Since this is a compiler-controlled option, there is no way to specify parameter naming in Java source.
Default Annotations
Default annotations are another subset of the basic annotations support provided in JDK 5.0 (1.5). These are annotations that ornament Annotations either visibly (accessible at runtime) or invisibly (not accessible at runtime). Default annotations specify a default value for a given annotation field.
Synopsis
Examples
Java Code
JASM Code
PicoJava Instructions
These instructions take 2 bytes: a prefix (254 for the non-privileged variant and 255 for the privileged variant) and the opcode itself. They can be coded in assembler in 2 ways: as a single mnemocode identical to the description, or using the "priv" and "nonpriv" instructions followed by an integer representing the opcode.
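The two-byte layout can be sketched in Java as follows. The class and method names are hypothetical, and the opcode value in main is arbitrary; only the prefix values 254 and 255 come from the text.

```java
// Hypothetical sketch of the two-byte encoding: prefix byte, then opcode byte.
class PicoJavaEncodingSketch {
    static final int NONPRIV_PREFIX = 254;
    static final int PRIV_PREFIX = 255;

    static byte[] encode(boolean privileged, int opcode) {
        if (opcode < 0 || opcode > 255) {
            throw new IllegalArgumentException("opcode must fit in one byte: " + opcode);
        }
        int prefix = privileged ? PRIV_PREFIX : NONPRIV_PREFIX;
        return new byte[] { (byte) prefix, (byte) opcode };
    }

    public static void main(String[] args) {
        byte[] insn = encode(true, 5); // arbitrary opcode value for illustration
        System.out.println((insn[0] & 0xff) + " " + (insn[1] & 0xff));
    }
}
```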