...
Parallel processing units execute similar work-units, with a difference: Several work-units may be executed simultaneously, sharing a single instruction and multiple parallel inputs and outputs.
A program that executes one ALU instruction across a parallel set of data inputs and outputs is called SIMD, single instruction multiple data. If the data is structured as parallel sets of registers or stack temps, and the instructions are capable of performing control flow, then the model may be called SIMT, where the last word is "thread".
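As a rough illustration of the SIMD idea, the sketch below applies one operation (an integer add) to every lane of two input arrays. The class and method names (SimdSketch, addLanes) are hypothetical, and the loop only simulates sequentially what parallel hardware would do for all lanes at once.

```java
import java.util.Arrays;

public class SimdSketch {
    // One "instruction" (an add) applied to every lane of the inputs.
    // The lanes are independent of each other, so parallel hardware could
    // process them simultaneously; this loop merely simulates that in order.
    static int[] addLanes(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("lane counts differ");
        }
        int[] out = new int[a.length];
        for (int lane = 0; lane < a.length; lane++) {
            out[lane] = a[lane] + b[lane];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sum = addLanes(new int[] {1, 2, 3, 4}, new int[] {10, 20, 30, 40});
        System.out.println(Arrays.toString(sum));
    }
}
```

In a SIMT setting, each lane would additionally carry its own registers or stack temps and could take part in control flow, rather than being a bare array element.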
...
What's in a Java Thread?
At any given moment, a thread consists of:
- a bytecode T.bc being executed (part of a basic block within some "current method")
- local variables T.L[], expression stack T.S[], and monitors T.M[] (appropriate to the current method)
- a stack frame of all of the above T.F = < bc,L,S,M >
- a control stack of pending executions of either bytecode methods or native methods: T.C = {F(j)}
- thread-local values (accessed by java.lang.ThreadLocal), T.TL[]
- a permanently associated object of type java.lang.Thread, T.Thread
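The components listed above can be pictured as a plain data structure. The following is only a sketch for illustration, not how any JVM actually lays out a thread: the class names, field types, and the invoke/ret helpers are assumptions, and the current frame T.F is kept separate from the pending frames in T.C as one possible reading of the list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThreadStateSketch {
    /** One frame, T.F = < bc, L, S, M >. */
    static final class Frame {
        int bc;                                           // index of the bytecode being executed
        final Object[] locals;                            // T.L[]
        final Deque<Object> stack = new ArrayDeque<>();   // T.S[]
        final List<Object> monitors = new ArrayList<>();  // T.M[]

        Frame(int bc, int localCount) {
            this.bc = bc;
            this.locals = new Object[localCount];
        }
    }

    /** The whole thread T. */
    static final class ThreadState {
        Frame frame;                                       // T.F, the current frame
        final Deque<Frame> control = new ArrayDeque<>();   // T.C, pending frames
        final Map<ThreadLocal<?>, Object> threadLocals = new HashMap<>(); // T.TL[]
        final Thread thread;                               // T.Thread

        ThreadState(Thread thread, Frame entry) {
            this.thread = thread;
            this.frame = entry;
        }

        // Calling a method suspends the current frame on the control stack.
        void invoke(Frame callee) {
            control.push(frame);
            frame = callee;
        }

        // Returning resumes the most recently suspended frame.
        void ret() {
            frame = control.pop();
        }
    }
}
```

Under this reading, a SIMT implementation would need one such state per lane, with the lanes agreeing on the current bytecode whenever they execute together.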
...
If all threads start off at a() but they compute differing values of the boolean, some will execute b() and others c(). To begin to control divergence, no thread should execute d() until all threads have left the if-then-else statement (the block labeled L).
(Note the consequence that throwing an exception or returning from b() or c() counts as exiting the block, even though the thread will not rejoin the others at d().)
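The reconvergence rule can be simulated with a per-lane condition mask. The sketch below is hypothetical: the class DivergenceSketch, the trace strings, and the choice to run the b() lanes before the c() lanes are all assumptions made for illustration. It takes the boolean each lane computed at a() and records the order in which lanes would run b(), c(), and d().

```java
import java.util.ArrayList;
import java.util.List;

public class DivergenceSketch {
    // cond[lane] stands for the boolean that lane computed at a().
    // Returns a trace of which lane ran which call, in order.
    static List<String> run(boolean[] cond) {
        List<String> trace = new ArrayList<>();

        // Pass 1: only lanes whose condition is true are active and run b().
        for (int lane = 0; lane < cond.length; lane++) {
            if (cond[lane]) {
                trace.add("b@" + lane);
            }
        }

        // Pass 2: the complementary lanes are active and run c().
        for (int lane = 0; lane < cond.length; lane++) {
            if (!cond[lane]) {
                trace.add("c@" + lane);
            }
        }

        // Reconvergence: every lane has now left the block, so all run d().
        for (int lane = 0; lane < cond.length; lane++) {
            trace.add("d@" + lane);
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(new boolean[] {true, false, true}));
    }
}
```

A lane that threw an exception or returned inside b() or c() would simply be dropped from the active set at that point; this sketch omits that case and assumes every lane reaches d().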
...
Other Considerations
The SIMT model seems to allow some kinds of very serialized Java code to operate efficiently on GPUs.
...