Presentation CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Java applications, by Gary Frost and Vignesh Ravi at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
1. HSA ENABLEMENT OF APARAPI: EASING THE DEVELOPER PATH TO APU/GPU ACCELERATED JAVA APPLICATIONS
VIGNESH RAVI, SOFTWARE DEVELOPER, HSA TEAM, AMD
GARY FROST, SOFTWARE FELLOW, AMD
2. HSA ENABLEMENT OF APARAPI: AGENDA
! Java GPU enablement via Aparapi
‒ Why Java?
‒ Aparapi
‒ What is it and how is it used?
! Introduction to HSA
! How HSA simplifies Java GPU programming with Aparapi
‒ Simpler programming model using lambda expressions
‒ Removal of previous constraints thanks to SVM (Shared Virtual Memory)
! The nuts and bolts of our current HSA enablement
‒ HSAIL generation
‒ Dispatch via HSA Runtime APIs
! Summary
! Q&A
HSA ENABLEMENT OF APARAPI | NOVEMBER 2013
3. WHY JAVA?
! Java by the numbers
‒ 9 Million Developers
‒ 1 Billion Java downloads per year
‒ 97% of Enterprise desktops run Java
‒ 100% of Blu-ray players ship with Java
http://oracle.com.edgesuite.net/timeline/java/
! Java 7 language & libraries already include concurrency features
‒ primitives (threads, locks, monitors, atomic ops)
‒ libraries (fork/join, thread pools, executors, futures)
! Upcoming Java 8 includes stream processing enhancements
‒ support for ‘lambda’ expressions
‒ Lambda centric concurrent stream processing libs/apis (java.util.stream.*)
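The Java 8 stream and lambda style mentioned above can be sketched with the standard java.util.stream API alone. The class name StreamSquares is made up for illustration; nothing here touches a GPU, it only shows the programming style on a multi-core CPU.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class StreamSquares {
    /** Squares each element; every index is an independent unit of work. */
    static int[] square(int[] in) {
        int[] square = new int[in.length];
        IntStream.range(0, in.length)
                 .parallel()                               // spread indices over the common fork/join pool
                 .forEach(i -> square[i] = in[i] * in[i]); // each index writes only its own slot
        return square;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(square(new int[] {1, 2, 3, 4})));
    }
}
```

Each index writes a distinct array slot, so the parallel forEach needs no locking.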
4. INITIAL APARAPI PROJECT OVERVIEW (2011)
! Open Source framework
! Allows Java developers access to GPU compute
! Aparapi Java API for expressing data parallel workloads
‒ Override the Aparapi Kernel base class's run() method
Kernel kernel = new Kernel(){
  @Override public void run(){
    int i=getGlobalId();
    square[i]=in[i]*in[i];
  }
};
kernel.execute(size);
! Aparapi runtime capable of converting bytecode to OpenCL™
‒ Execution on OpenCL™ 1.1+ capable devices (GPUs and APUs)
‒ Or execute via a thread pool if OpenCL™ is unavailable.
[Diagram: Java Application -> Aparapi Kernel base class (run() overridden) -> Aparapi converts bytecode to OpenCL™ -> OpenCL™ compiler & Runtime -> GPU ISA on GPU; JVM -> CPU ISA on CPU]
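The thread-pool fallback can be pictured with plain java.util.concurrent. This is a minimal sketch, not Aparapi's actual implementation: ThreadPoolFallback and execute are hypothetical names, and the strided split of global ids across threads is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

final class ThreadPoolFallback {
    /** Runs body(i) once for every global id in [0, size) on a fixed thread pool. */
    static void execute(int size, IntConsumer body) {
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int first = t;
                // Each task strides through the id range so ids are spread evenly.
                pending.add(pool.submit(() -> {
                    for (int i = first; i < size; i += threads) {
                        body.accept(i);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for kernel", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("kernel body failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Same computation as the slide's kernel, expressed against the sketch above. */
    static int[] square(int[] in) {
        int[] square = new int[in.length];
        execute(in.length, i -> square[i] = in[i] * in[i]);
        return square;
    }
}
```

Waiting on every Future before returning makes the results visible to the caller.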
5. MEET HSA AND HSAIL
! Heterogeneous System Architecture standardizes CPU/GPU functionality
‒ Be ISA-agnostic for both CPUs and accelerators
‒ Support high-level programming languages
‒ Provide the ability to access pageable system memory from the GPU
‒ Maintain cache coherency for system memory between CPU and GPU
! Specifications and simulator from HSA Foundation
‒ HSAIL portable ISA is “finalized” to particular hardware ISA at runtime
‒ Runtime specification for job launch and control
‒ HSAIL™ simulator for development and testing before hardware availability
6. APARAPI HSA ENABLEMENT (2013-2014)
! Open Source project sponsored
! Enhanced to support HSA and Java 8 lambda expression
Device.hsa().forEach(size,
  i -> square[i]=in[i]*in[i]
);
! Allow developers to efficiently represent data parallel algorithms using new Java 8 Lambda expressions
! API’s have same look & feel as proposed Java 8 stream API features
! No modifications to the JVM.
‒ We provide external JNI/Java libraries.
[Diagram: Java Application -> Aparapi Lambda based API -> Aparapi converts bytecode to HSAIL -> HSA Finalizer & Runtime -> GPU ISA on GPU; JVM -> CPU ISA on CPU]
7. HSA AND LAMBDA ENABLED APARAPI EXECUTION EXAMPLE
Device.hsa().forEach(size,
  i -> square[i]=in[i]*in[i]
);
[Flowchart]
‒ Does the platform support HSA? If N: execute the kernel using the Java thread pool.
‒ If Y: is this the first execution of this lambda instance?
‒ First execution (Y): can the bytecode be converted to HSAIL? If Y: convert bytecode to HSAIL, then execute the HSAIL kernel on the GPU/APU. If N: execute the kernel using the Java thread pool.
‒ Later execution (N): do we have HSAIL for this lambda? If Y: execute the HSAIL kernel on the GPU/APU. If N: execute the kernel using the Java thread pool.
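The decision flow above can be sketched as a small planner. All names here (DispatchPlanner, Translator, Target) are hypothetical and not part of Aparapi; the translator is a stand-in that returns null when bytecode cannot be converted.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class DispatchPlanner {
    enum Target { HSA_DEVICE, JAVA_THREAD_POOL }

    /** Hypothetical bytecode-to-HSAIL translator; returns null if conversion fails. */
    interface Translator {
        String toHsail(Object lambda);
    }

    private final boolean platformSupportsHsa;
    private final Translator translator;
    // One entry per lambda instance already seen; a null value records a failed conversion.
    private final Map<Object, String> hsailByLambda = new IdentityHashMap<>();

    DispatchPlanner(boolean platformSupportsHsa, Translator translator) {
        this.platformSupportsHsa = platformSupportsHsa;
        this.translator = translator;
    }

    Target choose(Object lambda) {
        if (!platformSupportsHsa) {
            return Target.JAVA_THREAD_POOL;
        }
        if (!hsailByLambda.containsKey(lambda)) {
            // First execution of this lambda instance: attempt conversion exactly once.
            hsailByLambda.put(lambda, translator.toHsail(lambda));
        }
        // Later executions only ask whether HSAIL exists for this lambda.
        return hsailByLambda.get(lambda) != null ? Target.HSA_DEVICE : Target.JAVA_THREAD_POOL;
    }
}
```

Caching the failed conversion as well as the successful one means a lambda that cannot be translated is not retried on every call.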
8. SUMATRA PROJECT: NATIVE SUPPORT FOR GPU OFFLOAD ADDED TO JAVA
! AMD/Oracle sponsored Open Source (OpenJDK) project
! Targeted at OpenJDK Java 9 (2015)
! Allow developers to efficiently represent data parallel algorithms in Java using Stream API + Lambda expressions
! Sumatra is not pushing new ‘programming model’
! Instead we ‘repurpose’ Stream API + Lambda to enable both CPU or GPU computing
! A Sumatra enabled Java Virtual Machine™ will dispatch ‘selected’ constructs to HSA enabled devices at runtime.
! Developers already refactoring JDK to use stream & lambda API’s
‒ So anyone using existing JDK should see GPU acceleration without any code changes.
! Links:
‒ http://openjdk.java.net/projects/sumatra
‒ https://wikis.oracle.com/display/HotSpotInternals/Sumatra
‒ http://mail.openjdk.java.net/pipermail/sumatra-dev
[Diagram: Java Application -> Java JDK Stream + Lambda API -> Java GRAAL JIT backend -> HSAIL -> HSA Finalizer & Runtime -> GPU ISA on GPU; JVM -> CPU ISA on CPU]
9. HSA ENABLEMENT OF JAVA
Java 7: OpenCL enabled Aparapi
• AMD initiated Open Source project
• APIs for data parallel algorithms
  GPU accelerate Java applications
  No need to learn OpenCL
• Active community captured mindshare
  ~20 contributors
  >7000 downloads
  ~150 visits per day
[Diagram: Java Application -> APARAPI API -> OpenCL™ -> OpenCL™ Compiler and Runtime -> GPU ISA on GPU; JVM -> CPU ISA on CPU]
Java 8: HSA enabled Aparapi
• Java 8 brings Stream + Lambda API.
  More natural way of expressing data parallel algorithms
  Initially targeted at multi-core.
• APARAPI will:
  Support Java 8 Lambdas
  Dispatch code to HSA enabled devices at runtime via HSAIL
[Diagram: Java Application -> APARAPI + Lambda API -> HSAIL™ -> HSA Finalizer & Runtime -> GPU ISA on GPU; JVM -> CPU ISA on CPU]
Java 9: HSA enabled Java (Sumatra)
• Adds native GPU compute support to Java Virtual Machine (JVM)
• Developer uses JDK provided Lambda + Stream API
• JVM uses GRAAL compiler to generate HSAIL
• JVM decides at runtime to execute on either CPU or GPU depending on workload characteristics.
[Diagram: Java Application -> Java JDK Stream + Lambda API -> Java GRAAL JIT backend -> HSAIL™ -> HSA™ Finalizer & Runtime -> GPU ISA on GPU; JVM -> CPU ISA on CPU]
We plan to provide HSA Enabled Aparapi (Java 8) as a bridge technology between OpenCL based Aparapi (Java 7) and HSA Enabled Sumatra (Java 9)
10. A CASE STUDY CENTERED ON NBODY
! A Java developer implementing a sequential version of NBody would probably…
‒ Create a class to represent each body
class Body{
  float x,y,z,m,vx,vy,vz;
  // Include method to update position and display
  void updateAndShow(Screen screen, Body[] bodies){
    for (Body other:bodies){
      // accumulate forces between other and this
    }
    // update vx,vy,vz,x,y and z from accumulated data
    screen.paint(x,y,z);
  }
}
! Loop through each Body (in array of bodies[]) to update and display
for (Body b: bodies)
  b.updateAndShow(screen, bodies);
11. WITHOUT HSA WE CAN’T (EFFICIENTLY) USE OBJECTS
! In Java, allocated Objects are scattered on the heap.
‒ There is no way to allocate an array of objects in contiguous memory (as with C++)
‒ We force the developer to resort to using parallel arrays of primitives (which are contiguous)
float x[], y[], z[], m[], vx[], vy[], vz[];
‒ And to infer that x[n], y[n] and z[n] hold the state for bodies[n].
Kernel kernel = new Kernel(){
  public void run(){
    int i = getGlobalId(0);
    for (int j=0; j<bodies.length; j++){
      // accum forces between (x,y,z)[j] and (x,y,z)[i]
    }
    // update vx[i],vy[i],vz[i],x[i],y[i] and z[i]
  }
};
‒ Then the kernel can be used to execute the code on the GPU
kernel.execute(bodies.length);
12. HSA ENABLED APARAPI (AND SUMATRA) ALLOWS USE OF OBJECTS
! So we code our Body class exactly as we would if executing in Java.
class Body{
  float x,y,z,m,vx,vy,vz;
  // Include method to update position and display
  void updateAndShow(Screen screen, Body[] bodies){
    for (Body other:bodies){
      // accumulate forces between other and this
    }
    // update vx,vy,vz,x,y and z from accumulated data
    screen.paint(x,y,z);
  }
}
! Then use new Aparapi lambda enabled API to coordinate dispatch to the GPU
Device.hsa().forEach(bodies, b -> {
  b.updateAndShow(screen, bodies);
});
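To make the object-based form concrete, here is a sketch that fills in the elided force accumulation and uses a standard parallel stream as a CPU stand-in for Device.hsa().forEach. Several things are assumptions rather than content from the slides: the gravitational constant is taken as 1, a small softening term avoids division by zero, the Screen painting is omitted, and the update is split into two passes so that no body reads a position while another thread is writing it.

```java
import java.util.Arrays;

final class NBodyObjects {
    private static final float SOFTENING = 1e-3f; // assumed; keeps the self-interaction finite

    static final class Body {
        float x, y, z, m, vx, vy, vz;

        Body(float x, float y, float z, float m) {
            this.x = x; this.y = y; this.z = z; this.m = m;
        }

        /** Pass 1: reads positions of all bodies, writes only this body's velocity. */
        void accumulate(Body[] bodies, float dt) {
            float ax = 0f, ay = 0f, az = 0f;
            for (Body other : bodies) {
                float dx = other.x - x, dy = other.y - y, dz = other.z - z;
                float distSq = dx * dx + dy * dy + dz * dz + SOFTENING;
                float scale = other.m / (distSq * (float) Math.sqrt(distSq));
                ax += dx * scale;
                ay += dy * scale;
                az += dz * scale;
            }
            vx += ax * dt;
            vy += ay * dt;
            vz += az * dt;
        }

        /** Pass 2: writes only this body's position. */
        void move(float dt) {
            x += vx * dt;
            y += vy * dt;
            z += vz * dt;
        }
    }

    static void step(Body[] bodies, float dt) {
        Arrays.stream(bodies).parallel().forEach(b -> b.accumulate(bodies, dt));
        Arrays.stream(bodies).parallel().forEach(b -> b.move(dt));
    }
}
```

The lambda body works directly on Body objects on the heap, which is the property the slide attributes to HSA's shared virtual memory; on a CPU-only stream it is simply ordinary Java.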
13. OVERVIEW OF HSA ENABLED APARAPI
! Development time: MyLambda.java -> javac (compiler) -> MyLambda.class
! HSA enabled Aparapi, at runtime:
‒ Step 0: Generate HSAIL from Bytecode
‒ Step 1: Generate host HSA Runtime calls
‒ Step 1.1: Initialize HSA runtime, device, queue …
‒ Step 1.2: Finalize HSAIL to generate GPU ISA
‒ Step 1.3: Bind Java args to HSA args
‒ Step 1.4: Dispatch the kernel
‒ Step 1.5: Wait for completion
‒ Repeat steps 1.3 - 1.5 for next iteration of same kernel
‒ Repeat steps 0 - 1 for each new kernel
[Diagram: at runtime MyLambda.class is the input to Aparapi, which generates HSAIL and the HSA RT calls (Initialize, Finalize, Bind Args, Dispatch); the application runs in the JVM, which contains CPU ISA for the CPU, while the dispatched GPU ISA runs on the GPU]
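The repetition rules in the steps above (steps 0 to 1.2 once per kernel, steps 1.3 to 1.5 on every launch) can be sketched as a tiny state machine. HsaPipelineSketch is a hypothetical name and the steps are only logged; no HSA runtime is called.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HsaPipelineSketch {
    /** Ordered record of the steps that ran, for inspection. */
    final List<String> log = new ArrayList<>();
    // Kernels that already went through steps 0 to 1.2, mapped to a placeholder for their ISA.
    private final Map<String, String> finalized = new HashMap<>();

    void run(String kernelName, Object... args) {
        if (!finalized.containsKey(kernelName)) {
            // New kernel: one-time preparation.
            log.add("0 generate HSAIL for " + kernelName);
            log.add("1.1 initialize runtime, device, queue");
            log.add("1.2 finalize HSAIL to GPU ISA");
            finalized.put(kernelName, "isa:" + kernelName);
        }
        // Every launch, including repeats of the same kernel.
        log.add("1.3 bind " + args.length + " args");
        log.add("1.4 dispatch " + kernelName);
        log.add("1.5 wait for completion");
    }
}
```

Calling run("square", in, out) twice logs six steps for the first call and three for the second.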
14. HIGH LEVEL HSA FEATURES
! Features currently being defined in the HSA Working Groups**
‒ Unified addressing across all processors
‒ Operation into pageable system memory
‒ Full memory coherency
‒ Platform atomics
‒ User mode dispatch
‒ Enables fast dispatch with no driver involvement
‒ Architected queuing language
‒ Flexible compute dispatch, easier GPU self-enqueue
‒ High level language support for GPU compute processors
‒ Preemption and context switching
** All features subject to change, pending completion and ratification of specifications in the HSA Working Groups
© Copyright 2012 HSA Foundation. All Rights Reserved.
15. HSA INTERMEDIATE LANGUAGE (HSAIL)**
! HSAIL is a virtual ISA for parallel programs
‒ Finalized to vendor-specific ISA by a JIT compiler or “Finalizer”
‒ ISA independent by design for CPU & GPU
! Explicitly parallel
‒ Designed for data parallel programming
! Support for exceptions, virtual functions, and other high level language features
! Lower level than OpenCL™ SPIR
‒ Fits naturally in the OpenCL™ compilation stack
! Suitable to support additional high level languages and programming models:
‒ Java, C++, OpenMP, etc
** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
16. HSAIL OVERVIEW**
INSTRUCTION SET
! Similar to assembly language for a RISC CPU
‒ Load-store architecture
ld_global_u64 $d0, [$d6 + 120]; // $d0= load($d6+120)
add_u64 $d1, $d2, 24; // $d1= $d2+24
! 136 opcodes (Java™ bytecode has 200)
‒ Floating point (single, double, half (f16))
‒ Integer (32-bit, 64-bit)
‒ Some packed operations
‒ Branches
‒ Function calls
‒ Platform Atomic Operations: and, or, xor, exch, add, sub, inc, dec, max, min, cas
‒ Synchronize host CPU and HSA Component
! Text and Binary formats (“BRIG”)
REGISTERS
! Four classes of registers
‒ C: 1-bit, Control Registers
‒ S: 32-bit, Single-precision FP or Int
‒ D: 64-bit, Double-precision FP or Long Int
‒ Q: 128-bit, Packed data.
! Fixed number of registers:
‒ 8 C
‒ S, D, Q share a single pool of resources
S + 2*D + 4*Q <= 128
Up to 128 S or 64 D or 32 Q (or a blend)
! Register allocation done in high-level compiler
‒ Finalizer doesn’t have to perform expensive register allocation
** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
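The shared register pool rule, S + 2*D + 4*Q <= 128, is easy to check mechanically. The sketch below only encodes that inequality from the slide; the class and method names are made up.

```java
final class HsailRegisterBudget {
    static final int POOL_SLOTS = 128; // pool size in 32-bit (S-sized) slots

    /** Slots consumed: an S register takes 1, a D takes 2, a Q takes 4. */
    static int slotsUsed(int s, int d, int q) {
        return s + 2 * d + 4 * q;
    }

    /** True when the requested blend of S, D and Q registers fits in the pool. */
    static boolean fits(int s, int d, int q) {
        if (s < 0 || d < 0 || q < 0) {
            throw new IllegalArgumentException("register counts must be non-negative");
        }
        return slotsUsed(s, d, q) <= POOL_SLOTS;
    }
}
```

For example, a blend of 64 S, 16 D and 8 Q registers uses 64 + 32 + 32 = 128 slots and just fits; adding one more Q register does not.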
17. SEGMENTS AND MEMORY**
! 7 segments of memory
‒ global, readonly, group, spill, private, arg, kernarg
‒ Memory instructions can (optionally) specify a segment
! Global Segment
‒ Visible to all HSA agents (including host CPU)
ld_global_u64 $d0, [$d6]
! Group Segment
‒ Provides high-performance memory shared in the work-group.
‒ Group memory can be read and written by any work-item in the work-group
‒ HSAIL provides sync operations to control visibility of group memory
ld_group_u64 $d0,[$d6+24]
! Spill, Private, Arg Segments
‒ Represent different regions of a per-work-item stack
‒ Typically generated by compiler, not specified by programmer
st_spill_f32 $s1,[$d6+4]
! Kernarg Segment
‒ Programmer writes kernarg segment to pass arguments to a kernel
ld_kernarg_u64 $d6, [%_arg0]
! Read-Only Segment
‒ Remains constant during execution of kernel
! Flat Addressing
‒ Each segment mapped into virtual address space
‒ Flat addresses can map to segments based on virtual address
‒ Instructions with no explicit segment use flat addressing
‒ Very useful for high-level language support (ie classes, libraries)
‒ Aligns well with OpenCL 2.0 “generic” addressing feature
ld_u64 $d0,[$d6+24] ; flat
** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
19. APARAPI JNI CALL -> HSA RUNTIME API
Device Discovery & Queue Creation APIs**
! Discover HSA Device
HsaStatus HsaGetDevices(unsigned int *count, const HsaDevice **device_list);
‒ Both count and device_list are out params
‒ User can iterate over HSA devices in the list
! User-Mode Queue Creation
HsaStatus HsaCreateUserModeQueue(const HsaDevice *device, void *buffer, size_t buffer_size, HsaQueuePriority queue_priority, HsaQueueFraction queue_fraction, HsaQueue **queue);
‒ User can provide pre-allocated buffer
‒ If not, API will allocate a buffer
‒ queue is the user-mode queue
** All APIs subject to change, pending completion and ratification of specifications in the HSA Working Groups
20. APARAPI JNI -> HSA RUNTIME API
Finalize HSAIL to GPU ISA**
! Translating HSAIL text to Binary (BRIG)
‒ BRIG is a binary container for several sections
‒ Code
‒ String
‒ Directive
‒ …
‒ libHsail is an assembler/disassembler
‒ This is a standalone compiler library
‒ Not part of Runtime
Status Assemble(const char* hsail_text, HsaBrig *brig);
! Finalize Brig to IHV specific GPU ISA
‒ Input: Brig
‒ Output: HsaKernelCode which contains ISA
HsaStatus HsaFinalizeBrig(const HsaDevice *device, HsaBrig *brig, const char *kernel_name, const char *options, HsaKernelCode **kernel);
** All APIs subject to change, pending completion and ratification of specifications in the HSA Working Groups
21. APARAPI JNI -> POPULATION OF AQL DISPATCH PACKET
! AQL Dispatch Packet**
‒ Header enables:
‒ Different packet types
‒ Specify if this packet should wait for all previous to complete
‒ Control visibility of data and memory fences before and after dispatch
‒ Body enables:
‒ Specify the problem fan out using launch config related fields
‒ How much workgroup memory?
‒ Location of IHV specific GPU ISA
‒ Location of where kernelargs can be found
‒ A signal mechanism to wait on kernel completion
typedef struct HsaAqlDispatchPacket {
  // Header Fields
  uint32_t format : 8;
  uint32_t barrier : 1;
  uint32_t acquire_fence_scope : 2;
  uint32_t release_fence_scope : 2;
  uint32_t invalidate_instruction_cache : 1;
  uint32_t invalidate_roi_image_cache : 1;
  uint32_t dimensions : 2;
  uint32_t reserved : 15;
  // Launch Config
  uint16_t workgroup_size[3];
  uint16_t reserved2;
  uint32_t grid_size[3];
  // Kernel Info
  uint32_t private_segment_size_bytes;
  uint32_t group_segment_size_bytes;
  uint64_t kernel_object_address;
  uint64_t kernel_arg_address;
  uint64_t reserved3;
  // Kernel Synchronization
  uint64_t completion_signal;
} HsaAqlDispatchPacket;
! Only the Kernel info and signal fields are opaque, so populating them requires runtime APIs
‒ Other fields are open, so simple assignments
** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
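The header bit-fields add up to one 32-bit word (8 + 1 + 2 + 2 + 1 + 1 + 2 + 15). The sketch below packs them into a Java int, using the field widths from the struct. The bit positions are an assumption: C leaves bit-field layout to the compiler, and this sketch places the first declared field in the lowest bits. AqlHeader is a hypothetical helper, not part of any HSA API.

```java
final class AqlHeader {
    /** Packs the header fields into one 32-bit word; the reserved bits 17-31 stay zero. */
    static int pack(int format, boolean barrier, int acquireFenceScope, int releaseFenceScope,
                    boolean invalidateInstructionCache, boolean invalidateRoiImageCache,
                    int dimensions) {
        check(format, 8, "format");
        check(acquireFenceScope, 2, "acquire_fence_scope");
        check(releaseFenceScope, 2, "release_fence_scope");
        check(dimensions, 2, "dimensions");
        int word = format;                                   // bits 0-7
        word |= (barrier ? 1 : 0) << 8;                      // bit 8
        word |= acquireFenceScope << 9;                      // bits 9-10
        word |= releaseFenceScope << 11;                     // bits 11-12
        word |= (invalidateInstructionCache ? 1 : 0) << 13;  // bit 13
        word |= (invalidateRoiImageCache ? 1 : 0) << 14;     // bit 14
        word |= dimensions << 15;                            // bits 15-16
        return word;
    }

    static int format(int word) {
        return word & 0xFF;
    }

    static int dimensions(int word) {
        return (word >>> 15) & 0x3;
    }

    private static void check(int value, int bits, String name) {
        if (value < 0 || value >= (1 << bits)) {
            throw new IllegalArgumentException(name + " does not fit in " + bits + " bits");
        }
    }
}
```

A JNI layer written in C would normally assign the struct fields directly; the packing is shown only to make the field widths concrete.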
22. POPULATING KERNEL INFO AND SIGNAL USING HSA RT API**
HsaStatus HsaFinalizeBrig(const HsaDevice *device, HsaBrig *brig, const char *kernel_name, const char *options, HsaKernelCode **kernel);
typedef struct HsaKernelCode {
  …
  uint32_t workitem_private_segment_byte_size;
  uint32_t workgroup_group_segment_byte_size;
  uint64_t kernarg_segment_byte_size;
  …
} HsaKernelCode;
typedef struct HsaAqlDispatchPacket {
  …
  uint32_t private_segment_size_bytes;
  uint32_t group_segment_size_bytes;
  uint64_t kernel_object_address;
  uint64_t kernel_arg_address;
  …
  uint64_t completion_signal;
} HsaAqlDispatchPacket;
HsaStatus HsaCreateSignal(HsaSignal *signal);
‒ Pack Java Args into a vector in JNI
‒ Register vector data address
HsaStatus HsaRegisterSystemMemory(void *address, size_t size);
** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
23. DISPATCH AND WAIT ON KERNEL COMPLETION
! Dispatch
‒ Submit AQL Packet into the HsaQueue
‒ Thread safe API
HsaStatus HsaSubmitAql(HsaQueue *queue, HsaAqlDispatchPacket *aql_packet);
! Wait on Kernel Completion
bool is_done = false;
while (!is_done) {
  status = HsaQuerySignal(signal, &is_done);
  assert(status == kHsaStatusSuccess);
}
! After completion, disposing HSA resources
‒ Release queue
‒ Release signal
‒ Release Kernel object
‒ Deregister kernel args related memory
HsaStatus HsaDestroyUserModeQueue(HsaQueue *queue);
HsaStatus HsaDestroySignal(HsaSignal signal);
HsaStatus HsaFreeKernelCode(HsaKernelCode *kernel);
HsaStatus HsaDeregisterSystemMemory(void *address);
** Subject to change, pending completion and ratification of specifications in the HSA Working Groups
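The dispatch-then-poll pattern above has a direct Java analogue. In this sketch a plain thread stands in for the GPU and an AtomicBoolean stands in for the HSA signal; CompletionSignal and dispatchAndWait are hypothetical names and no HSA call is made.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class CompletionSignal {
    private final AtomicBoolean done = new AtomicBoolean(false);

    /** Called by the worker after its last write. */
    void complete() {
        done.set(true);
    }

    /** Non-blocking query, analogous to polling a completion signal. */
    boolean query() {
        return done.get();
    }

    /** Spins until the signal is set, yielding the CPU between polls. */
    void await() {
        while (!query()) {
            Thread.yield();
        }
    }

    /** "Dispatches" the squaring kernel to a worker thread and waits on the signal. */
    static int[] dispatchAndWait(int[] in) {
        int[] out = new int[in.length];
        CompletionSignal signal = new CompletionSignal();
        Thread device = new Thread(() -> {
            for (int i = 0; i < in.length; i++) {
                out[i] = in[i] * in[i];
            }
            signal.complete(); // set only after all results are written
        });
        device.start();
        signal.await();
        return out;
    }
}
```

Because the flag is set after the last array write and read through an atomic, the waiting thread sees the completed results once await() returns.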
24. DEMO
25. SUMMARY
! Aparapi is already an established framework for simplifying execution of Java on GPU devices
! HSA enabled Aparapi further simplifies GPU acceleration of Java applications
‒ Aligns with Java 8 features to support ‘lambda’ expressions for compactness
‒ Enables ‘large unified’ system memory for GPU acceleration
‒ Eases programming by enabling direct access to Java objects on heap
‒ Enables fast offload of Java kernels through User-mode queue and AQL
! HSA enabled Aparapi lends itself to more interesting future possibilities
‒ Simplified communication and workload balancing across both CPU and GPU
‒ Exploit new computation patterns and recursions through kernel self-enqueue