Date: Fri, 29 Mar 2024 00:11:08 +0000 (UTC) Message-ID: <1403379841.1245.1711671068448@34fc92c9345b> Subject: Exported From Confluence MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_Part_1244_788239769.1711671068448" ------=_Part_1244_788239769.1711671068448 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Content-Location: file:///C:/exported.html
C2's high level IR is normally called Ideal and it's really kind= of a mid level IR. There aren't that many high level nodes in it and most = of the nodes correspond to single machine instructions on most machines. Th= e highest level are nodes like Allocate/AllocateArray and Lock/Unlock which= embody fast and slow path idioms. They get expanded at the very end of com= pilation into a larger set of nodes that represent the operations at a lowe= r level.
All instructions in ideal are subclasses of Node. Nodes have a collectio= n of Node inputs and as much as possible everything is done by using node i= nputs instead of having special fields on the subclasses of Node. Nodes mai= ntain def-use edges so that you can always visit all the users of any node = relatively cheaply. This is one of the reasons for the uniform treatment of= inputs, since it makes maintaining that information straightforward.
There's a lot of sharing of nodes in the ideal graph because of the unif= orm treatment of nodes and the use global value numbering. For instance the= re's only one copy of every constant in the graph which is good for code bu= t it makes understanding the graph somewhat difficult because globally the = graph can get pretty tangled.
It has the normal set of binary operators you'd expect and these are spe= cialized into int, long, float, double and pointer using a capital letter a= t the end to identify they type, so AddI is integer add and CmpP is the poi= nter compare.
Loads distinguish between the smaller signed and unsigned integral types= like char and byte, so you have instructions like LoadC to load a characte= r and LoadS to load a short. Stores are only differentiated by the size of = the store and not the sign. Memory operations all have a special input call= ed the Memory input which identifies the slice of memory that the load or s= tore is operating on. All memory operations are connected to each other in = a thing called the memory graph. There's an incoming memory state at the en= try point of the compile and loads and stores are connected up to that. The= memory state is divided into a bunch of slices which correspond to aliases= classes. Operations within the same slice are ordered relative to each oth= er but slices aren't ordered relative to other slices except when the slice= s merge into some form of memory barrier, like a call. The slices have to m= erge up at any operation where the compiler can't see the effect of a node,= like a call to some other Java code, or where explicitly ordering is neede= d, like volatile memory operations. These operations produce a new memory s= tate that all following loads and stores consume. The memory graph captures= all the memory dependence in the ideal and insures that load and stores ar= e issued in the correct order.
There are no explicit basic blocks in ideal at all. It's representation = is what's known as a program dependence graph in the compiler literature. A= node can only be evaluated once all of it's inputs have been evaluated so = any constraints on order of execution have to be completely captured by the= inputs to the node. There are special nodes that are said to produce contr= ol and every node has a special input, input 0, which can be set to point a= t one of these control nodes. Nodes that produce control are obvious ones l= ike If and Call but there are also Region nodes which represent the merge o= f multiple control flow paths plus Halt and Return that terminate control f= low. PhiNodes have a Region as their control input and inputs for each cont= rol flow path merging at the region. Many nodes don't require a control nod= e at all because they can safely be executed once their data inputs have be= en evaluated. All the binary operators except division don't require a cont= rol input. Division is special in Java since it might produce a divide by z= ero exception.
By setting input 0 to point at a particular control flow node you're con= straining that node to be evaluated only after that control flow node is ev= aluated. Note that control dependence expresses only the earliest point tha= t a node can be evaluated. It doesn't say anything about the latest point i= n the program that node can be evaluated. Data and memory dependence tend t= o constrain the latest evaluation point. The node can actually be evaluated= anywhere between the control point and the latest point allowed by the nod= es uses.
There are also special nodes called projections (ProjNode) which are use= d to deal with nodes which produce multiple values. If nodes represent thei= r control flow split by having two control projections, IfTrue and IfFalse,= with nodes being dependent on the appropriate control projection instead o= f being dependent on the If.
Call nodes are among the most complex nodes in the graph. In addition to= having a control input they are 4 other special inputs, along with their a= rguments and any values which are needed for deoptimization. The only other= interesting special input is the incoming memory state. This is similar to= the memory input to a load or store but it generally gathers all the live = memory slices into one MergeMem object to represent the fact all memory ope= ration preceeding the call should complete before the call is executed. Cal= ls produce multiple values using ProjNodes. In particular they produce a co= ntrol projection for nodes which are control dependent on the call, a memor= y projection to represent the state of memory after the call is complete an= d a projection for the return value of the call if there is one.
For calls which might be safepoints, there are also extra inputs which c=
apture all the values which are live as far as deopt is concerned. A separa=
te data structure called the JVMState maps from Java's notion of frames, lo=
cals and expression stack to inputs to the node.
There is also a special node call top which is use to fill inputs that we =
don't care about or which are dead.