...
If the fastpath allocation from the workitem's shared TLAB fails (top overflows past end as described above), then by default we deoptimize to the interpreter using the usual HSAIL deoptimization logic (todo: link here to hsail deoptimization experiments). While deoptimizing to the interpreter gets correct results, for performance we would prefer to stay on the GPU rather than deoptimize. There is an additional Graal option, HsailAllocBytesPerWorkitem, which can be used for performance experiments. This option is a hint specifying how many bytes each workitem expects to allocate. Before invoking the kernel, the JVM code looks at the free space in the donor threads' TLABs and, if the existing free space is not large enough (taking into account the number of workitems and the number of donor threads), "closes" each such TLAB and attempts to allocate a new one. Behavior is functionally correct regardless of this option; there may just be more deopts. We intend to explore other ways to reduce the probability of deopts.
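The following is a minimal sketch of that pre-launch check, assuming hypothetical names (DonorThread, Tlab, retireAndAllocateNewTlab) that are illustrative only and not the actual Graal/HotSpot API; it shows how the hint, the workitem count, and the donor thread count combine to decide whether a donor TLAB should be closed and replaced before the kernel runs.

```java
import java.util.List;

final class DonorTlabPreparation {
    // Hypothetical sketch: before launching the kernel, make sure each donor
    // thread's TLAB has enough free space for its share of the expected
    // workitem allocations.
    static void prepareDonorTlabs(List<DonorThread> donors,
                                  int numWorkitems,
                                  long hsailAllocBytesPerWorkitem) {
        // Each donor thread serves roughly numWorkitems / donors.size() workitems.
        long bytesNeededPerDonor =
                hsailAllocBytesPerWorkitem * numWorkitems / donors.size();

        for (DonorThread donor : donors) {
            Tlab tlab = donor.tlab();
            long freeBytes = tlab.end() - tlab.top();
            if (freeBytes < bytesNeededPerDonor) {
                // Not enough room: "close" the current TLAB and try to
                // allocate a new one sized for the expected kernel allocations.
                donor.retireAndAllocateNewTlab(bytesNeededPerDonor);
            }
        }
    }
}
```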
Fastpath Allocation Failure, Eden Allocation
There is an additional Graal option, HsailUseEdenAllocate, which, if set to true, specifies that instead of deopting we first attempt to allocate from eden. There is a single eden from which all threads (and, for us, all workitems) allocate; in fact, TLABs themselves are allocated from eden. Since we may be competing with real Java threads for eden allocation, we use the HSAIL platform atomic instruction atomic_cas. While eden allocation was functionally correct, we saw a performance degradation compared to simply deoptimizing, and so it is turned off by default. We may explore eden allocation further in the future, and we would also like to explore the strategy of allocating a whole new TLAB from the GPU when a TLAB overflows.
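To make the atomic_cas usage concrete, here is a minimal sketch of CAS-based bump-pointer allocation from a shared eden, modeled in Java with an AtomicLong standing in for eden's top pointer; the real kernel code emits the HSAIL atomic_cas instruction against eden's top field, and the names EdenAllocator, edenTop, and edenEnd are illustrative assumptions, not the actual implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

final class EdenAllocator {
    private final AtomicLong edenTop;   // current allocation pointer into eden
    private final long edenEnd;         // end of the eden space

    EdenAllocator(long start, long end) {
        this.edenTop = new AtomicLong(start);
        this.edenEnd = end;
    }

    /** Returns the address of the newly allocated object, or 0 if eden is exhausted. */
    long allocate(long sizeInBytes) {
        while (true) {
            long oldTop = edenTop.get();
            long newTop = oldTop + sizeInBytes;
            if (newTop > edenEnd) {
                return 0;   // eden full: the caller must fall back (e.g. deoptimize)
            }
            // The CAS ensures only one competing thread or workitem claims this range.
            if (edenTop.compareAndSet(oldTop, newTop)) {
                return oldTop;
            }
            // Lost the race to another allocator; retry with the updated top.
        }
    }
}
```

Because every thread and every workitem bumps the same eden top pointer, contention on this single CAS is one plausible source of the performance degradation observed compared with per-workitem TLAB allocation.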
...