This FAQ is based on the Q&A from our JavaOne 2014 sessions.
What GPU vendors support HSA? The member companies of the HSA Foundation are listed at http://www.hsafoundation.com/
Does HSA allow access to all of memory? Yes. The compiled Java kernels access the Java heap, and the runtime support code in the kernel accesses various C heap/mmap data structures used by the JVM.
Does Sumatra support streams made from an ArrayList? Yes. At this time we support object collections that are backed by real arrays, such as Vector and ArrayList.
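For illustration, a minimal sketch of such a stream (the class and field names are made up for the example) looks like the following; the lambda body is what would be compiled into the kernel:

    import java.util.ArrayList;

    // A simple mutable element type; the names are illustrative only.
    class Particle {
        float x, y, vx, vy;
    }

    public class ArrayListStreamExample {
        public static void main(String[] args) {
            // ArrayList is backed by a real Object[] array, which is what makes
            // its parallel stream a candidate for offload.
            ArrayList<Particle> particles = new ArrayList<>();
            for (int i = 0; i < 4096; i++) {
                particles.add(new Particle());
            }

            // The lambda body is the per-workitem kernel: each element of the
            // backing array is handled by one workitem.
            particles.parallelStream().forEach(p -> {
                p.x += p.vx;
                p.y += p.vy;
            });
        }
    }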
Does Sumatra support using off-heap arrays? Not at this time, but it should be possible with HSA.
Does Sumatra suspend GC during kernel execution? Yes, because the kernels run in HotSpot's "thread in vm" mode, where GCs will not occur except where threads explicitly check for safepoints. Some loops in kernels may contain safepoint checks. If a safepoint occurs and the kernel sees it before completing, it may deoptimize back to the CPU so as not to stop the progress of the CPU threads.
If you build the Sumatra/Graal JDK and also build the simulator, can you run it on any computer? Yes, the simulator is a normal shared library with no GPU dependency at all. The simulator is Linux-only; we have used it with CentOS 6.5, Ubuntu 13.10, and Ubuntu 14.04.
Does the simulator require installation of CUDA or OpenCL? Can it coexist with them? We use the simulator on a system with an Nvidia card and driver installed, so we could test both HSAIL and the Oracle PTX support. We have not tested installing an Nvidia card in an HSA system at the same time. The AMD OpenCL software stack is not compatible with HSA at this time.
Does the APU share page tables with the CPU? Yes, the kernels see memory the same way CPU threads do, and the CPU and GPU are cache coherent.
If you do have data dependencies between workitems, can you use synchronization to control it? We do not expose these features to Java code at this time, but HSAIL does contain barriers, etc., to allow this kind of behavior.
How do you control whether it gets offloaded or not (say, if you want to test parallel CPU before parallel GPU)? In the offloadable JDK we have a flag to turn on offload: -Dcom.amd.sumatra.offload.immediate=true. Offload is off by default in the Sumatra JDK.
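For example, assuming a Sumatra-enabled JDK is on the path, a simple parallel-stream program can be run with and without the flag to compare parallel CPU execution against GPU offload (the class name is made up for the example):

    import java.util.stream.IntStream;

    public class OffloadToggleExample {
        // Run with offload enabled:
        //   java -Dcom.amd.sumatra.offload.immediate=true OffloadToggleExample
        // Omit the flag (or set it to false) to run the same pipeline as a
        // parallel CPU stream for comparison.
        public static void main(String[] args) {
            int[] squares = new int[1 << 20];
            IntStream.range(0, squares.length)
                     .parallel()
                     .forEach(i -> squares[i] = i * i);
            System.out.println(squares[42]);
        }
    }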
What gets cached after the first time the kernel is compiled and used: the HSAIL, or the HSA finalized kernel? The HSA finalized kernel is stored in a cache so that later executions can reuse it immediately.
Does this have any safepoint implications for the Java threads, and does the shared data get pinned? Some loops in kernels may contain safepoint checks. If a safepoint occurs and the kernel notices it before completing, it may deoptimize back to the CPU so as not to stop the progress of the CPU threads. When a kernel deoptimization occurs, the remaining work is completed on a CPU Java thread.
If you have a stream that has more elements than there are stream cores, does it still work, or do you have to break the range up into pieces? It is a normal feature of HSA to support a bigger "grid size" than there are stream cores. The HSA runtime works through the input stream source array, launching waves of workitems in turn until the whole grid of work is completed.
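Conceptually, the scheduling works something like the CPU-side sketch below; the real scheduling happens inside the HSA runtime and hardware, and the sizes shown are invented for the example:

    // Rough sketch of wave scheduling only; not the actual HSA runtime code.
    public class WaveSchedulingSketch {
        public static void main(String[] args) {
            int gridSize = 1_000_000;       // one workitem per stream element
            int workitemsPerWave = 64 * 40; // e.g. wavefront size * number of compute units

            for (int base = 0; base < gridSize; base += workitemsPerWave) {
                int end = Math.min(base + workitemsPerWave, gridSize);
                for (int workitem = base; workitem < end; workitem++) {
                    // the kernel body for this workitem runs here
                }
            }
        }
    }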
After a deopt, why couldn't you restart the never-ran workitems on the GPU? Generally this seems like a good idea. We may implement this at a later time.
Can Sumatra handle off-heap access, for example using the Unsafe API? This seems possible, but we have not investigated it.
Does this mean the first exception aborts the kernel? Yes. When an exception happens, one workitem encounters the exception situation "first." Other workitems might also encounter the exception situation, while workitems that do not encounter it might run to completion. Workitems that have not yet started will not run if they notice an exception has already occurred. The state of all the workitems is returned to the CPU, where the exception is handled and the rest of the kernel's work is completed before returning to the caller of the offloaded lambda. After the exception, the program is in exactly the same state as if the exception had happened in regular CPU-only execution.
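From the application's point of view the visible behavior is ordinary Java semantics; a sketch of the kind of case this describes (the names are invented for the example) is:

    import java.util.stream.IntStream;

    public class KernelExceptionExample {
        public static void main(String[] args) {
            int[] data = new int[1024];
            try {
                IntStream.range(0, data.length).parallel().forEach(i -> {
                    // Workitem 500 triggers an out-of-bounds store; offloaded or
                    // not, the exception surfaces to the caller of the terminal
                    // operation just as in CPU-only execution.
                    int index = (i == 500) ? data.length : i;
                    data[index] = i * 2;
                });
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("caught: " + e);
            }
        }
    }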
What would be the impact on performance when a deoptimization occurs? At this time, while we are focusing on correctness, the deoptimization is handled single-threaded on the CPU, on the same thread that called the lambda. This might be slower than the offloaded kernel. Later we might use a thread pool to handle the cleanup, or relaunch the kernel with a sub-range of the original work.
Does HSA fully support 64-bit floating point types? Does HSA obey the IEEE spec? Yes, HSAIL supports both float and double types. The HSA specifications are here: http://www.hsafoundation.com/standards/
Division by zero with floats, is that handled correctly (NaN)? Yes.
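For reference, Java's IEEE 754 rules give NaN only for 0.0/0.0, while a nonzero dividend divided by zero yields a signed infinity; per the answer above, the offloaded kernels produce the same results as this CPU-side illustration:

    public class FloatDivisionExample {
        public static void main(String[] args) {
            System.out.println(0.0f / 0.0f);   // NaN
            System.out.println(1.0f / 0.0f);   // Infinity
            System.out.println(-1.0f / 0.0f);  // -Infinity
            System.out.println(Float.isNaN(0.0f / 0.0f)); // true
        }
    }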