This page is under construction...

For Sumatra targets, we wanted to experiment with an infrastructure for supporting deoptimization, that is, transferring execution from the compiled code running on the GPU back to the equivalent bytecodes running in the interpreter on the CPU.  The following describes some experiments with deoptimization using the HSAIL backend to Graal, which are now available on the Graal trunk.

...

  • the deopt ID or "pc" offset where the deopt occurred
  • the number of 32-bit s registers saved
  • the number of 64-bit d registers saved
  • the number of stack-slot variables saved (work in progress)
  • space for saving the s and d registers and the stack-slot variables
  • a bitmap, saved in the current hsail frame, of the d registers that contain oops.  This will be replaced later with a per-kernel data structure that maps each deopt ID to an oopMap for that deopt ID.  (A sketch of this frame layout follows the list.)
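As a rough illustration only, the saved frame can be pictured as the following Java-style sketch; the real layout lives in the HotSpot runtime, and every name here is hypothetical:

    // Hypothetical sketch of one saved hsail deopt frame; the real layout
    // lives in the HotSpot runtime, and these names are illustrative only.
    final class HsailDeoptFrame {
        int deoptId;            // deopt ID / "pc" offset where the deopt occurred
        int numSRegs;           // number of 32-bit s registers saved
        int numDRegs;           // number of 64-bit d registers saved
        int numStackSlots;      // number of stack-slot variables saved (work in progress)
        int[] savedSRegs;       // saved 32-bit s register values
        long[] savedDRegs;      // saved 64-bit d register values (raw bits)
        long[] savedStackSlots; // saved stack-slot variable values
        long dRegOopBitmap;     // bit i set => savedDRegs[i] holds an oop (interim scheme)
    }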

To avoid code bloat, we currently emit a single deopt exit point per kernel, and in that deopt exit code we save the union of the registers that are actually live at any of the infopoints.
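A minimal sketch of that union computation, assuming a hypothetical Infopoint type with a liveDRegs() accessor (neither is the actual Graal API):

    import java.util.BitSet;
    import java.util.List;

    // Hypothetical sketch: the single per-kernel deopt exit saves the union of
    // the registers live at any infopoint, so it covers every possible deopt site.
    final class LiveRegisterUnion {
        interface Infopoint {
            BitSet liveDRegs(); // assumed accessor: d registers live at this infopoint
        }

        static BitSet unionOfLiveDRegs(List<Infopoint> infopoints) {
            BitSet union = new BitSet();
            for (Infopoint ip : infopoints) {
                union.or(ip.liveDRegs());
            }
            return union;
        }
    }

Saving the union is conservative: the shared exit may save registers that are dead at a given deopt site, trading a little extra save work for a single exit point per kernel.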

...

  • for the workitems that finished normally, there is nothing to do

  • if there are any deopted workitems, we want to run each deopting workitem through the interpreter, starting from the bytecode index of its deoptimization point.  Note that it is possible for different workitems to have different deoptimization "PCs".  We currently re-dispatch the deopting workitems sequentially, although other policies are clearly possible (see the re-dispatch sketch after this list).
    • Note: getting to the interpreter from the saved hsail state goes first through some special compiled host trampoline code that Gilles Duboscq designed.  The trampoline host code takes the deopt ID and a pointer to the saved hsail frame as input, and then immediately deoptimizes just as any compiled host code would.

  • for each never-ran workitem, we can simply run it from the beginning of the kernel method, making sure we pass the arguments and the appropriate workitem id for each one.  Again, we currently do this sequentially, although other policies are possible.
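Putting the two cases together, the sequential re-dispatch policy might look roughly like the sketch below, reusing the hypothetical HsailDeoptFrame above; the Trampoline and KernelMethod types are assumptions, not the actual Sumatra API:

    // Hypothetical sketch of the sequential re-dispatch policy; all types and
    // method names here are illustrative, not the actual Sumatra/Graal API.
    final class DeoptRedispatch {
        interface Trampoline { void deoptimize(int deoptId, HsailDeoptFrame frame); }
        interface KernelMethod { void invoke(Object[] args, int workitemId); }

        static void handleAfterDispatch(Trampoline trampoline, KernelMethod kernel,
                                        Object[] args, HsailDeoptFrame[] deoptFrames,
                                        int[] neverRanIds) {
            // Workitems that completed normally need nothing further.

            // Deopted workitems: re-enter each one through the trampoline, which
            // takes the deopt ID and the saved hsail frame and then deoptimizes
            // to the interpreter just as compiled host code would.
            for (HsailDeoptFrame frame : deoptFrames) {
                trampoline.deoptimize(frame.deoptId, frame);
            }

            // Never-ran workitems: run the kernel method from the beginning,
            // passing the original arguments and the right workitem id.
            for (int id : neverRanIds) {
                kernel.invoke(args, id);
            }
        }
    }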

GC Considerations

Currently, the normal kernel dispatch runs in "thread in VM" mode and thus does not need to worry about moving oops.  However, each time a deopting workitem is run through the interpreter, or a never-ran workitem is run from the beginning, we are back in Java mode, which can cause GCs.  So for each saved hsail frame we need to know which parts of the saved state contain oops, and make sure those locations are updated in the face of a GC.  The current strategy of copying oops into an Object array supplied by the Java side will be replaced later with an oops_do-style strategy.
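As a rough illustration of the interim scheme, the following sketch walks the hypothetical d-register oop bitmap from the frame layout above to find the oop-bearing slots; the actual oops_do machinery is VM-side C++ and is not expressible in plain Java, so this only shows which saved slots would be copied out for the GC to see:

    import java.util.function.IntConsumer;

    // Hypothetical sketch of consulting the per-frame d-register oop bitmap;
    // the visitor would, e.g., copy savedDRegs[slot] into the Object array
    // supplied by the Java side so the GC can update it.
    final class OopBitmapWalker {
        static void forEachOopSlot(HsailDeoptFrame frame, IntConsumer visit) {
            long bits = frame.dRegOopBitmap;
            while (bits != 0) {
                int slot = Long.numberOfTrailingZeros(bits); // next oop-bearing d register
                visit.accept(slot);
                bits &= bits - 1;                            // clear the lowest set bit
            }
        }
    }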