...
Does HSA allow access to all of memory? Yes. The compiled Java kernels access the Java heap, and the runtime support code in the kernel accesses various C heap/mmap data structures used by the JVM.
Does Sumatra support streams made from an ArrayList? Yes, at this time we support object collections that are backed by real arrays, such as Vector and ArrayList.
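As an illustration of the kind of code this answer refers to, here is a minimal sketch of a parallel stream over an ArrayList of objects. The `Body` class, its fields, and the method names are hypothetical and chosen only for this example. Whether a particular lambda is actually offloaded depends on the Sumatra build and its settings; on a stock JDK the same code simply runs on the CPU.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayListStreamSketch {
    // Hypothetical element type, used only for this illustration.
    static final class Body {
        float x;
        float v;
        Body(float x, float v) { this.x = x; this.v = v; }
    }

    // Builds an ArrayList, i.e. a collection backed by a real array.
    static List<Body> makeBodies(int n) {
        List<Body> bodies = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            bodies.add(new Body(i, 2.0f));
        }
        return bodies;
    }

    // Each element is updated independently of the others, which is the
    // data-parallel shape that an offloading runtime looks for.
    static void step(List<Body> bodies, float dt) {
        bodies.parallelStream().forEach(b -> b.x += b.v * dt);
    }

    public static void main(String[] args) {
        List<Body> bodies = makeBodies(4);
        step(bodies, 0.5f);
        System.out.println(bodies.get(3).x);
    }
}
```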
Does Sumatra support using off-heap arrays? Not at this time, but it should be possible to do so with HSA.
Does Sumatra suspend GC during kernel execution? Yes, because the kernels run in the HotSpot "thread in vm" mode, where GCs will not occur except where the threads explicitly check for safepoints. Some loops in kernels may contain safepoint checks. If a safepoint occurs, and the kernel sees it before completing, it may deoptimize back to the CPU so as not to stop progress of the CPU threads.
...
How do you control whether it gets offloaded or not (say if you want to test parallel CPU before parallel GPU)? In the offloadable JDK we have a flag to turn on offload: -Dcom.amd.sumatra.offload.immediate=true. Offload is off by default in the Sumatra JDK.
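To compare a CPU-parallel run with an offloaded run, the same program can be launched with and without the flag. The sketch below is one way to make such a comparison easier to read: it reports whether the property was set on the command line and times a simple parallel lambda. The class and method names are hypothetical, and reading the property from application code is only a convenience for this example, not a description of how the Sumatra JDK consumes it.

```java
import java.util.stream.IntStream;

final class OffloadToggleSketch {
    // Property name taken from the answer above.
    static final String OFFLOAD_PROPERTY = "com.amd.sumatra.offload.immediate";

    // True only if the JVM was started with -Dcom.amd.sumatra.offload.immediate=true.
    static boolean offloadRequested() {
        return Boolean.getBoolean(OFFLOAD_PROPERTY);
    }

    // A simple data-parallel workload: each index writes its own array slot.
    static int[] squares(int n) {
        int[] out = new int[n];
        IntStream.range(0, n).parallel().forEach(i -> out[i] = i * i);
        return out;
    }

    public static void main(String[] args) {
        System.out.println("offload requested: " + offloadRequested());
        long start = System.nanoTime();
        int[] result = squares(1 << 20);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("result[1000] = " + result[1000] + ", elapsed " + micros + " us");
    }
}
```

Running `java OffloadToggleSketch` and then `java -Dcom.amd.sumatra.offload.immediate=true OffloadToggleSketch` gives the two configurations to compare.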
What gets cached after the first time the kernel is compiled and used? HSAIL, HSA finalized kernel? The HSA finalized kernel gets stored in a cache so later executions can immediately reuse it.
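The compile-once, reuse-later behaviour described here can be pictured with a small cache keyed by kernel identity. This is only a conceptual sketch: the string key, the string standing in for a finalized kernel, and all class and method names are assumptions made for illustration and do not reflect the actual Sumatra data structures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class KernelCacheSketch {
    // Maps a hypothetical kernel key to a stand-in for the finalized kernel.
    private final Map<String, String> finalized = new ConcurrentHashMap<>();
    // Counts how many times the expensive compile-and-finalize step ran.
    private final AtomicInteger compileCount = new AtomicInteger();

    // Returns the cached kernel, compiling it only on the first request for a key.
    String lookupOrCompile(String kernelKey) {
        return finalized.computeIfAbsent(kernelKey, k -> {
            compileCount.incrementAndGet();
            return "finalized:" + k; // placeholder for the real finalized kernel
        });
    }

    int compileCount() {
        return compileCount.get();
    }

    public static void main(String[] args) {
        KernelCacheSketch cache = new KernelCacheSketch();
        cache.lookupOrCompile("lambda#1");
        cache.lookupOrCompile("lambda#1"); // served from the cache
        System.out.println("compilations: " + cache.compileCount());
    }
}
```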
Does this have any safepoint implications for the Java threads, does the shared data get pinned? Some loops in kernels may contain safepoint checks. If a safepoint occurs, and the kernel notices it before completing, it may deoptimize back to the CPU so as not to stop progress of the CPU threads. When a kernel deoptimization occurs, the remaining work is completed on a CPU Java thread.
...
After a deopt, why couldn't you restart the never-rans on the GPU? Generally this seems like a good idea. We may implement this at a later time.
Can Sumatra handle off-heap access, for example using the Unsafe API? This seems possible but we have not investigated it.
...
What would be the performance impact when a deoptimization occurs? At this time, when we are focusing on correctness, the deoptimization is handled single-threaded on the CPU, on the same thread that called the lambda. This might be slower than the offloaded kernel. Later we might use a thread pool to deal with the cleanup, or relaunch the kernel with a sub-range of the original work.
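The current fallback can be modelled on the CPU alone. In the sketch below, a simulated "kernel" processes a range of work items and gives up at a chosen index, standing in for the point where a safepoint is noticed; the calling thread then finishes the rest single-threaded. All names are hypothetical, and the model is simplified: real GPU work items do not run in index order, so the unfinished items need not form a contiguous tail as they do here.

```java
import java.util.Arrays;

final class DeoptFallbackSketch {
    // Simulated kernel: doubles items in [from, to) but stops at stopAt.
    // Returns the index of the first item that was NOT processed.
    static int runSimulatedKernel(int[] data, int from, int to, int stopAt) {
        int i = from;
        for (; i < to; i++) {
            if (i == stopAt) {
                break; // stand-in for "safepoint noticed, deoptimize"
            }
            data[i] *= 2;
        }
        return i;
    }

    // Fallback: the thread that invoked the lambda completes the remaining items.
    static void finishOnCallingThread(int[] data, int from, int to) {
        for (int i = from; i < to; i++) {
            data[i] *= 2;
        }
    }

    // Runs the simulated kernel, then cleans up any unprocessed tail on the caller.
    static int[] doubleAll(int[] input, int stopAt) {
        int[] data = input.clone();
        int next = runSimulatedKernel(data, 0, data.length, stopAt);
        if (next < data.length) {
            finishOnCallingThread(data, next, data.length);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] out = doubleAll(new int[] {1, 2, 3, 4, 5}, 2);
        System.out.println(Arrays.toString(out));
    }
}
```

Under the same simplification, the sub-range relaunch mentioned above would amount to calling `runSimulatedKernel` again with `from` set to the returned index instead of calling `finishOnCallingThread`.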
...