The Sumatra JDK now produces a build that can offload certain JDK 8 Stream API parallel streams terminating in forEach() to HSA APUs/GPUs, or to the HSAIL Simulator described in a nearby wiki page. Producing the offload-enabled JDK is a two-step process. First, build the Sumatra JDK, which is built the same way as JDK 8. Then use the Sumatra JDK you built as the JAVA_HOME when building a Graal server JVM, to make a Graal JDK that includes our HSAIL support.
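As an illustration of the kind of code this targets, the sketch below shows a JDK 8 parallel stream that terminates in forEach(). The class and method names (SquareExample, squares) are hypothetical and are not taken from the Sumatra sources. Whether a particular pipeline is actually offloaded depends on the build and the flags it is run with; on a stock JDK the same code simply runs on the CPU.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical example class; not part of Sumatra or Graal.
class SquareExample {

    // Writes in[i] * in[i] into a new array, one index per forEach invocation.
    static int[] squares(int[] in) {
        int[] out = new int[in.length];
        // Each index is handled independently, so the lambda body has no
        // cross-iteration dependencies and is safe to run in parallel.
        IntStream.range(0, in.length)
                 .parallel()
                 .forEach(i -> out[i] = in[i] * in[i]);
        return out;
    }

    public static void main(String[] args) {
        // prints [1, 4, 9, 16]
        System.out.println(Arrays.toString(squares(new int[] {1, 2, 3, 4})));
    }
}
```

The lambda passed to forEach() is the part that a backend would have to turn into a kernel, which is why the example keeps it free of shared mutable state other than the per-index array write.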
To build the whole system, clone the Sumatra repository http://hg.openjdk.java.net/sumatra/sumatra-dev/, then build it following the normal JDK 8 build instructions as shown at http://hg.openjdk.java.net/jdk8/jdk8/raw-file/tip/README-builds.html.
Next, clone the Graal repository, which contains the HSAIL backend used to produce the offload kernels from the lambda used in the Stream API forEach call. See the Graal wiki here.
Use the Sumatra JDK image you built as the JAVA_HOME for building Graal. Note that we have been using the server build of Graal: in this configuration, methods that run on the CPU are compiled by the -server compiler, and Graal is used only for the HSAIL compilation. For example:
$ export JAVA_HOME=/path/to/sumatra-dev/build/linux-x86_64-normal-server-release/images/j2sdk-image/
$ ./mx.sh --vmbuild product --vm server build
This builds a Graal-enabled JDK that you can use to run HSAIL kernels either within the Graal mx system or standalone. To see a simple example of an HSAIL kernel running in mx, run mx unittest as shown:
$ ./mx.sh --vm server unittest -XX:+TraceGPUInteraction -XX:+GPUOffload -G:Log=CodeGen hsail.test.IntAddTest
...
[HSAIL] library is libokra_x86_64.so
[HSAIL] using _OKRA_SIM_LIB_PATH_=/tmp/okraresource.dir_2488167353114811077/libokra_x86_64.so
[GPU] registered initialization of Okra (total initialized: 2)
[CUDA] Ptx::get_execute_kernel_from_vm_address
JUnit version 4.8
.[thread:1] scope:
[thread:1] scope: GraalCompiler
[thread:1] scope: GraalCompiler.CodeGen
Nothing to do here
Nothing to do here
Nothing to do here
version 0:95: $full : $large;
// static method HotSpotMethod<IntAddTest.run(int[], int[], int[], int)>
kernel &run (
align 8 kernarg_u64 %_arg0,
align 8 kernarg_u64 %_arg1,
align 8 kernarg_u64 %_arg2
) {
ld_kernarg_u64 $d0, [%_arg0];
ld_kernarg_u64 $d1, [%_arg1];
ld_kernarg_u64 $d2, [%_arg2];
workitemabsid_u32 $s0, 0;
@L0:
cmp_eq_b1_u64 $c0, $d0, 0; // null test
cbr $c0, @L1;
@L2:
ld_global_s32 $s1, [$d0 + 12];
cmp_ge_b1_u32 $c0, $s0, $s1;
cbr $c0, @L12;
@L3:
cmp_eq_b1_u64 $c0, $d2, 0; // null test
cbr $c0, @L4;
@L5:
ld_global_s32 $s1, [$d2 + 12];
cmp_ge_b1_u32 $c0, $s0, $s1;
cbr $c0, @L11;
@L6:
cmp_eq_b1_u64 $c0, $d1, 0; // null test
cbr $c0, @L7;
@L8:
ld_global_s32 $s1, [$d1 + 12];
cmp_ge_b1_u32 $c0, $s0, $s1;
cbr $c0, @L10;
@L9:
cvt_s64_s32 $d3, $s0;
mul_s64 $d3, $d3, 4;
add_u64 $d1, $d1, $d3;
ld_global_s32 $s1, [$d1 + 16];
cvt_s64_s32 $d1, $s0;
mul_s64 $d1, $d1, 4;
add_u64 $d2, $d2, $d1;
ld_global_s32 $s2, [$d2 + 16];
add_s32 $s2, $s2, $s1;
cvt_s64_s32 $d1, $s0;
mul_s64 $d1, $d1, 4;
add_u64 $d0, $d0, $d1;
st_global_s32 $s2, [$d0 + 16];
ret;
@L1:
mov_b32 $s0, -7691;
@L13:
ret;
@L4:
mov_b32 $s0, -6411;
brn @L13;
@L10:
mov_b32 $s0, -5403;
brn @L13;
@L7:
mov_b32 $s0, -4875;
brn @L13;
@L12:
mov_b32 $s0, -8219;
brn @L13;
@L11:
mov_b32 $s0, -6939;
brn @L13;
};
[HSAIL] heap=0x00007f95b8019cc0
[HSAIL] base=0x05a00000, capacity=210763776
External method:com.oracle.graal.compiler.hsail.test.IntAddTest.run([I[I[II)V
installCode0: ExternalCompilationResult
[HSAIL] sig:([I[I[II)V args length=3, _parameter_count=4
[HSAIL] static method
[HSAIL] HSAILKernelArguments::do_array, _index=0, 0x82b21970, is a [I
[HSAIL] HSAILKernelArguments::do_array, _index=1, 0x82b477f0, is a [I
[HSAIL] HSAILKernelArguments::do_array, _index=2, 0x82b479e0, is a [I
[HSAIL] HSAILKernelArguments::not pushing trailing int
Time: 0.208
OK (1 test)
Note that you must pass the extra option -XX:+GPUOffload to enable offloading, and -XX:+TraceGPUInteraction to see extra messages about GPU initialization, etc.
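Reading the HSAIL listing above, the kernel null-checks and bounds-checks three int arrays, loads one element from each of the second and third arguments at the work-item index, adds them, and stores the sum into the first. A Java method with that behaviour might look like the sketch below. This is a reconstruction from the listing, not the actual IntAddTest source: the class name, the parameter names, and the addAll driver are assumptions, and the loop only emulates on the CPU what a kernel launch does with one work-item per index.

```java
// Hypothetical reconstruction; only the run(int[], int[], int[], int) shape
// comes from the log output, everything else is assumed for illustration.
class IntAddSketch {

    // Three arrays plus a trailing int that plays the role of the work-item id
    // (the log reports "not pushing trailing int" for this last parameter).
    static void run(int[] out, int[] ina, int[] inb, int gid) {
        out[gid] = ina[gid] + inb[gid];
    }

    // CPU-only stand-in for a kernel launch: call run once per index.
    static int[] addAll(int[] ina, int[] inb) {
        int[] out = new int[ina.length];
        for (int gid = 0; gid < out.length; gid++) {
            run(out, ina, inb, gid);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sum = addAll(new int[] {1, 2, 3}, new int[] {10, 20, 30});
        System.out.println(java.util.Arrays.toString(sum)); // prints [11, 22, 33]
    }
}
```

The null tests and array-length comparisons visible in the listing correspond to the implicit checks Java performs on each array access in run.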