SlideShare uma empresa Scribd logo
1 de 32
HSAIL:
PORTABLE COMPILER IR FOR HSA
HOT CHIPS TUTORIAL - AUGUST 2013
BEN SANDER
AMD SENIOR FELLOW
STATE OF GPU COMPUTING
Today’s Challenges
 Separate address spaces
 Copies
 Can’t share pointers
 New language required for compute kernel
 EX: OpenCL™ runtime API
 Compute kernel compiled separately than
host code
Emerging Solution
 HSA Hardware
 Single address space
 Coherent
 Virtual
 Fast access from all components
 Can share pointers
 Bring GPU computing to
existing, popular, programming models
 Single-source, fully supported by
compiler
 HSAIL compiler IR (Cross-platform!)
• GPUs are fast and power efficient : high compute density per-mm and per-watt
• But: Can be hard to program
PCIe
WHAT IS HSAIL?
 HSAIL is the intermediate language for parallel compute in HSA
 Generated by a high level compiler (LLVM, gcc, Java VM, etc)
 Low-level IR, close to machine ISA level
 Compiled down to target ISA by an IHV “Finalizer”
 Finalizer may execute at run time, install time, or build time
 Example: OpenCL™ Compilation Stack using HSAIL
© Copyright 2012 HSA Foundation. All Rights Reserved. 3
OpenCL™ Kernel
High-Level Compiler Flow (Developer)
Finalizer Flow (Runtime)
EDG or CLANG
SPIR
LLVM
HSAIL HSAIL
Finalizer
Hardware ISA
KEY HSAIL FEATURES
 Parallel
 Shared virtual memory
 Portable across vendors in HSA Foundation
 Stable across multiple product generations
 Consistent numerical results (IEEE-754 with defined min accuracy)
 Fast, robust, simple finalization step (no monthly updates)
 Good performance (little need to write in ISA)
 Supports all of OpenCL™ and C++ AMP™
 Support Java, C++, and other languages as well
© Copyright 2012 HSA Foundation. All Rights Reserved. 4
AGENDA
 Introduction
 HSA Parallel Execution Model
 HSAIL Instruction Set
 Example – in Java!
 Key Takeaways
HSA PARALLEL EXECUTION
MODEL
© Copyright 2012 HSA Foundation. All Rights Reserved. 6
HSA PARALLEL EXECUTION MODEL
© Copyright 2012 HSA Foundation. All Rights Reserved. 7
Basic Idea:
Programmer supplies a
“kernel” that is run on
each work-item. Kernel is
written as a single thread
of execution.
Each work-item has a
unique coordinate.
Programmer specifies grid
dimensions (for scope of
problem).
Programmer optionally
specifies work-group
dimensions (for optimized
communication).
CONVOLUTION / SOBEL EDGE FILTER
Gx = [ -1 0 +1 ]
[ -2 0 +2 ]
[ -1 0 +1 ]
Gy = [ -1 -2 -1 ]
[ 0 0 0 ]
[ +1 +2 +1 ]
G = sqrt(Gx
2 + Gy
2)
CONVOLUTION / SOBEL EDGE FILTER
Gx = [ -1 0 +1 ]
[ -2 0 +2 ]
[ -1 0 +1 ]
Gy = [ -1 -2 -1 ]
[ 0 0 0 ]
[ +1 +2 +1 ]
G = sqrt(Gx
2 + Gy
2)
2D grid
workitem
kernel
CONVOLUTION / SOBEL EDGE FILTER
Gx = [ -1 0 +1 ]
[ -2 0 +2 ]
[ -1 0 +1 ]
Gy = [ -1 -2 -1 ]
[ 0 0 0 ]
[ +1 +2 +1 ]
G = sqrt(Gx
2 + Gy
2)
2D work-group
2D grid
workitem
kernel
HSAIL INSTRUCTION SET
© Copyright 2012 HSA Foundation. All Rights Reserved. 11
HSAIL INSTRUCTION SET - OVERVIEW
 Similar to assembly language for a RISC CPU
 Load-store architecture
 ld_global_u64 $d0, [$d6 + 120] ; $d0= load($d6+120)
 add_u64 $d1, $d2, 24 ; $d1= $d2+24
 136 opcodes (Java™ bytecode has 200)
 Floating point (single, double, half (f16))
 Integer (32-bit, 64-bit)
 Some packed operations
 Branches
 Function calls
 Platform Atomic Operations: and, or, xor, exch, add, sub, inc, dec, max, min, cas
 Synchronize host CPU and HSA Component!
 Text and Binary formats (“BRIG”)
SEGMENTS AND MEMORY (1/2)
 7 segments of memory
 global, readonly, group, spill, private, arg, kernarg,
 Memory instructions can (optionally) specify a segment
 Global Segment
 Visible to all HSA agents (including host CPU)
 Group Segment
 Provides high-performance memory shared in the work-group.
 Group memory can be read and written by any work-item in the work-group
 HSAIL provides sync operations to control visibility of group memory
 Useful for expert programmers
 Spill, Private, Arg Segments
 Represent different regions of a per-work-item stack
 Typically generated by compiler, not specified by programmer
 Compiler can use these to convey intent – ie spills
© Copyright 2012 HSA Foundation. All Rights Reserved. 13
ld_global_u64 $d0, [$d6]
ld_group_u64 $d0,[$d6+24]
st_spill_f32 $s1,[$d6+4]
SEGMENTS AND MEMORY (2/2)
 Kernarg Segment
 Programmer writes kernarg segment to pass arguments to a kernel
 Read-Only Segment
 Remains constant during execution of kernel
 Flat Addressing
 Each segment mapped into virtual address space
 Flat addresses can map to segments based on virtual address
 Instructions with no explicit segment use flat addressing
 Very useful for high-level language support (ie classes, libraries)
 Aligns well with OpenCL 2.0 “generic” addressing feature
© Copyright 2012 HSA Foundation. All Rights Reserved. 14
ld_kernarg_u64 $d6, [%_arg0]
ld_u64 $d0,[$d6+24] ; flat
REGISTERS
 Four classes of registers
 C: 1-bit, Control Registers
 S: 32-bit, Single-precision FP or Int
 D: 64-bit, Double-precision FP or Long Int
 Q: 128-bit, Packed data.
 Fixed number of registers:
 8 C
 S, D, Q share a single pool of resources
 S + 2*D + 4*Q <= 128
 Up to 128 S or 64 D or 32 Q (or a blend)
 Register allocation done in high-level compiler
 Finalizer doesn’t have to perform expensive register allocation
© Copyright 2012 HSA Foundation. All Rights Reserved. 15
SIMT EXECUTION MODEL
 HSAIL Presents a “SIMT” execution model to the programmer
 “Single Instruction, Multiple Thread”
 Programmer writes program for a single thread of execution
 Each work-item appears to have its own program counter
 Branch instructions look natural
 Hardware Implementation
 Most hardware uses SIMD (Single-Instruction Multiple Data) vectors for efficiency
 Actually one program counter for the entire SIMD instruction
 Branches implemented with predication
 SIMT Advantages
 Easier to program (branch code in particular)
 Natural path for mainstream programming models
 Scales across a wide variety of hardware (programmer doesn’t see vector width)
 Cross-lane operations available for those who want peak performance
© Copyright 2012 HSA Foundation. All Rights Reserved. 16
WAVEFRONTS
 Hardware SIMD vector, composed of 1, 2, 4, 8, 16, 32, or 64 “lanes”
 Lanes in wavefront can be “active” or “inactive”
 Inactive lanes consume hardware resources but don’t do useful work
 Tradeoffs
 “Wavefront-aware” programming can be useful for peak performance
 But results in less portable code (since wavefront width is encoded in algorithm)
© Copyright 2012 HSA Foundation. All Rights Reserved. 17
if (cond) {
operationA; // cond=True lanes active here
} else {
operationB; // cond=False lanes active here
}
CROSS-LANE OPERATIONS
 Example HSAIL operation: “countlane”
 Dest set to the number of work-items in current wavefront that have non-zero source
 Example HSAIL operation: “countuplane”
 Dest set to count of earlier work-items that are active for this instruction
 Useful for compaction algorithms
© Copyright 2012 HSA Foundation. All Rights Reserved. 18
countuplane_u32 $s0
countlane_u32 $s0, $s6
HSAIL MODES
 Working group strived to limit optional modes and features in HSAIL
 Minimize differences between HSA target machines
 Better for compiler vendors and application developers
 Two modes survived
 Machine Models
 Small: 32-bit pointers, 32-bit data
 Large: 64-bit pointers, 32-bit or 64-bit data
 Vendors can support one or both models
 “Base” and “Full” Profiles
 Two sets of requirements for FP accuracy, rounding, exception reporting
© Copyright 2012 HSA Foundation. All Rights Reserved. 19
HSA PROFILES
© Copyright 2012 HSA Foundation. All Rights Reserved. 20
Feature Base Full
Addressing Modes Small, Large Small, Large
Load/store conversion of all floating point
types (f16, f32, f64) Yes Yes
All 32-bit HSAIL operations according to the
declared profile Yes Yes
F16 support (IEEE 754 or better) Yes Yes
F64 support No Yes
Precision for add/sub/mul 1/2 ULP 1/2 ULP
Precision for div 2.5 ULP 1/2 ULP
Precision for sqrt 1 ULP 1/2 ULP
HSAIL Rounding: Near Yes Yes
HSAIL Rounding: Up No Yes
HSAIL Rounding: Down No Yes
HSAIL Rounding: Zero No Yes
Subnormal floating-point Flush-to-zero Supported
Propagate NaN Payloads No Yes
FMA No Yes
Arithmetic Exception reporting DETECT
DETECT or
BREAK
Debug trap Yes Yes
EXAMPLE
© Copyright 2012 HSA Foundation. All Rights Reserved. 21
AN EXAMPLE (IN JAVA 8™)
© Copyright 2012 HSA Foundation. All Rights Reserved. 22
class Player {
private Team team;
private int scores;
private float pctOfTeamScores;
public Team getTeam() {return team;}
public int getScores() {return scores;}
public void setPctOfTeamScores(int pct) { pctOfTeamScores = pct; }
};
// “Team” class not shown
// Assume “allPlayers’ is an initialized array of Players..
Stream<Player> s = Arrays.stream(allPlayers).parallel();
s.forEach(p -> {
int teamScores = p.getTeam().getScores();
float pctOfTeamScores = (float)p.getScores()/(float) teamScores;
p.setPctOfTeamScores(pctOfTeamScores);
});
HSAIL CODE EXAMPLE
© Copyright 2012 HSA Foundation. All Rights Reserved. 23
01: version 0:95: $full : $large;
02: // static method HotSpotMethod<Main.lambda$2(Player)>
03: kernel &run (
04: kernarg_u64 %_arg0 // Kernel signature for lambda method
05: ) {
06: ld_kernarg_u64 $d6, [%_arg0]; // Move arg to an HSAIL register
07: workitemabsid_u32 $s2, 0; // Read the work-item global “X” coord
08:
09: cvt_u64_s32 $d2, $s2; // Convert X gid to long
10: mul_u64 $d2, $d2, 8; // Adjust index for sizeof ref
11: add_u64 $d2, $d2, 24; // Adjust for actual elements start
12: add_u64 $d2, $d2, $d6; // Add to array ref ptr
13: ld_global_u64 $d6, [$d2]; // Load from array element into reg
14: @L0:
15: ld_global_u64 $d0, [$d6 + 120]; // p.getTeam()
16: mov_b64 $d3, $d0;
17: ld_global_s32 $s3, [$d6 + 40]; // p.getScores ()
18: cvt_f32_s32 $s16, $s3;
19: ld_global_s32 $s0, [$d0 + 24]; // Team getScores()
20: cvt_f32_s32 $s17, $s0;
21: div_f32 $s16, $s16, $s17; // p.getScores()/teamScores
22: st_global_f32 $s16, [$d6 + 100]; // p.setPctOfTeamScores()
23: ret;
24: };
WHAT IS HSAIL?
 HSAIL is the intermediate language for parallel compute in HSA
 Generated by a high level compiler (LLVM, gcc, Java VM, etc)
 Low-level IR, close to machine ISA level
 Compiled down to target ISA by an IHV “Finalizer”
 Finalizer may execute at run time, install time, or build time
 Example: OpenCL™ Compilation Stack using HSAIL
© Copyright 2012 HSA Foundation. All Rights Reserved. 24
OpenCL™ Kernel
High-Level Compiler Flow (Developer)
Finalizer Flow (Runtime)
EDG or CLANG
SPIR
LLVM
HSAIL HSAIL
Finalizer
Hardware ISA
HSAIL AND SPIR
© Copyright 2012 HSA Foundation. All Rights Reserved. 25
Feature HSAIL SPIR
Intended Users
Compiler developers who want to
control their own code generation.
Compiler developers who want a fast
path to acceleration across a wide
variety of devices.
IR Level
Low-level, just above the machine
instruction set High-level, just below LLVM-IR
Back-end code generation Thin, fast, robust.
Flexible. Can include many
optimizations and compiler
transformation including register
allocation.
Where are compiler
optimizations performed?
Most done in high-level compiler,
before HSAIL generation.
Most done in back-end code generator,
between SPIR and device machine
instruction set
Registers Fixed-size register pool Infinite
SSA Form No Yes
Binary format Yes Yes
Code generator for LLVM Yes Yes
Back-end device targets
Modern GPU architectures
supported by members of the HSA
Foundation
Any OpenCL device including GPUs,
CPUs, FPGAs
Memory Model
Relaxed consistency with
acquire/release, barriers, and fine-
grained barriers
Flexible. Can support the OpenCL 1.2
Memory Model
TAKEAWAYS
 HSAIL Key Points
 Thin, robust, fast finalizer
 Portable (multiple HW vendors and parallel architectures)
 Complements OpenCL™
 Supports shared virtual memory and platform atomics
 HSA brings GPU computing to mainstream programming models
 Shared and coherent memory bridges “faraway accelerator” gap
 HSAIL provides the common IL for high-level languages to benefit from
parallel computing
 Java Example
 Unmodified Java8 accelerated on the GPU!
 Can use pointer-containing data structures
© Copyright 2012 HSA Foundation. All Rights Reserved. 26
TOOLS ARE AVAILABLE NOW
 HSA Programmer’s Reference Manual: HSAIL Virtual ISA and
Programming Model, Compiler Writer’s Guide, and Object Format (BRIG)
 http://hsafoundation.com/standards/
 https://hsafoundation.box.com/s/m6mrsjv8b7r50kqeyyal
 Tools now at GitHUB – HSA Foundation
 libHSAAssembler and Disassembler
 https://github.com/HSAFoundation/HSAIL-Tools
 HSAIL Instruction Set Simulator
 https://github.com/HSAFoundation/HSAIL-Instruction-Set-Simulator
 Soon: LLVM Compilation stack which outputs HSAIL and BRIG
 Java compiler for HSAIL (preliminary)
 http://openjdk.java.net/projects/sumatra/)
 http://openjdk.java.net/projects/graal/
© Copyright 2012 HSA Foundation. All Rights Reserved. 27
BACKUP
© Copyright 2012 HSA Foundation. All Rights Reserved. 28
OPPORTUNITIES WITH LLVM BASED
COMPILATION
LLVM
CLANG
C99 C++ 11 C++AMP Objective C OpenCL OpenMP KL OSL
Render
script
UPC Rust
Halide Julia Mono Fortran Haskell
AN EXAMPLE (IN OPENCL™)
© Copyright 2012 HSA Foundation. All Rights Reserved. 30
//Vector add
// A[0:N-1] = B[0:N-1] + C[0:N-1]
__kernel void vec_add (
__global const float *a,
__global const float *b,
__global float *c,
const unsigned int n)
{
// Get our global thread ID
int id = get_global_id(0);
// Make sure we do not go out of bounds
if (id < n)
c[id] = a[id] + b[id];
}
HSAIL VECTOR ADD
© Copyright 2012 HSA Foundation. All Rights Reserved. 31
version 1:0:$full:$small;
function &get_global_id(arg_u32 %ret_val)
(arg_u32 %arg_val0);
function &abort() ();
kernel &__OpenCL_vec_add_kernel(
kernarg_u32 %arg_val0,
kernarg_u32 %arg_val1,
kernarg_u32 %arg_val2,
kernarg_u32 %arg_val3)
{
@__OpenCL_vec_add_kernel_entry:
// BB#0: // %entry
ld_kernarg_u32 $s0, [%arg_val3];
workitemabsid_u32 $s1, 0;
cmp_lt_b1_u32 $c0, $s1, $s0;
ld_kernarg_u32 $s0, [%arg_val2];
ld_kernarg_u32 $s2, [%arg_val1];
ld_kernarg_u32 $s3, [%arg_val0];
cbr $c0, @BB0_2;
brn @BB0_1;
@BB0_1: // %if.end
ret;
@BB0_2: // %if.then
shl_u32 $s1, $s1, 2;
add_u32 $s2, $s2, $s1;
ld_global_f32 $s2, [$s2];
add_u32 $s3, $s3, $s1;
ld_global_f32 $s3, [$s3];
add_f32 $s2, $s3, $s2;
add_u32 $s0, $s0, $s1;
st_global_f32 $s2, [$s0];
brn @BB0_1;
};
SEGMENTS
© Copyright 2012 HSA Foundation. All Rights Reserved. 32

Mais conteúdo relacionado

Mais procurados

ISCA Final Presentation - Intro
ISCA Final Presentation - IntroISCA Final Presentation - Intro
ISCA Final Presentation - IntroHSA Foundation
 
ISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsHSA Foundation
 
HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Foundation
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelHSA Foundation
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...AMD Developer Central
 
ISCA final presentation - Runtime
ISCA final presentation - RuntimeISCA final presentation - Runtime
ISCA final presentation - RuntimeHSA Foundation
 
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...HSA Foundation
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”HSA Foundation
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...AMD Developer Central
 
HSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterHSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterAMD Developer Central
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerAMD Developer Central
 
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...AMD Developer Central
 
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningPL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningAMD Developer Central
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahAMD Developer Central
 
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...AMD Developer Central
 
WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang...
WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang...WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang...
WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang...AMD Developer Central
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...AMD Developer Central
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoAMD Developer Central
 

Mais procurados (20)

Hsa10 whitepaper
Hsa10 whitepaperHsa10 whitepaper
Hsa10 whitepaper
 
ISCA Final Presentation - Intro
ISCA Final Presentation - IntroISCA Final Presentation - Intro
ISCA Final Presentation - Intro
 
ISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsISCA Final Presentation - Applications
ISCA Final Presentation - Applications
 
HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing Model
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
 
ISCA final presentation - Runtime
ISCA final presentation - RuntimeISCA final presentation - Runtime
ISCA final presentation - Runtime
 
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
 
HSA Overview
HSA Overview HSA Overview
HSA Overview
 
HSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterHSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben Gaster
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
 
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
 
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningPL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
PL-4042, Wholly Graal: Accelerating GPU offload for Java/Sumatra using the Op...
 
WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang...
WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang...WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang...
WT-4071, GPU accelerated 3D graphics for Java, by Kevin Rushforth, Chien Yang...
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
 
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey PavlenkoMM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
MM-4097, OpenCV-CL, by Harris Gasparakis, Vadim Pisarevsky and Andrey Pavlenko
 

Semelhante a HSA HSAIL Introduction Hot Chips 2013

HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderHSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderAMD Developer Central
 
Software Variability Management
Software Variability ManagementSoftware Variability Management
Software Variability ManagementXavierDevroey
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasAMD Developer Central
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013HSA Foundation
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...Edge AI and Vision Alliance
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019Jim Dowling
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8AbdullahMunir32
 
Track A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMTrack A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMchiportal
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 
Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingRuymán Reyes
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)DataWorks Summit
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 

Semelhante a HSA HSAIL Introduction Hot Chips 2013 (20)

HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben SanderHSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
HSA-4131, HSAIL Programmers Manual: Uncovered, by Ben Sander
 
Balancing Power & Performance Webinar
Balancing Power & Performance WebinarBalancing Power & Performance Webinar
Balancing Power & Performance Webinar
 
HSA Introduction
HSA IntroductionHSA Introduction
HSA Introduction
 
Software Variability Management
Software Variability ManagementSoftware Variability Management
Software Variability Management
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu Das
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013
 
Haskell Accelerate
Haskell  AccelerateHaskell  Accelerate
Haskell Accelerate
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
 
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
HopsML Meetup talk on Hopsworks + ROCm/AMD June 2019
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8
 
Track A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMTrack A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBM
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous Computing
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
C programming session9 -
C programming  session9 -C programming  session9 -
C programming session9 -
 

Mais de HSA Foundation

KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUHSA Foundation
 
Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 ProvisionalHSA Foundation
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)HSA Foundation
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - CompilationsHSA Foundation
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed HSA Foundation
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareHSA Foundation
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Foundation
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...HSA Foundation
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012HSA Foundation
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.HSA Foundation
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAHSA Foundation
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is InvaluableHSA Foundation
 

Mais de HSA Foundation (13)

KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
 
Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 Provisional
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - Compilations
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshare
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSA
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is Invaluable
 

Último

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Último (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

HSA HSAIL Introduction Hot Chips 2013

  • 1. HSAIL: PORTABLE COMPILER IR FOR HSA HOT CHIPS TUTORIAL - AUGUST 2013 BEN SANDER AMD SENIOR FELLOW
  • 2. STATE OF GPU COMPUTING Today’s Challenges  Separate address spaces  Copies  Can’t share pointers  New language required for compute kernel  EX: OpenCL™ runtime API  Compute kernel compiled separately than host code Emerging Solution  HSA Hardware  Single address space  Coherent  Virtual  Fast access from all components  Can share pointers  Bring GPU computing to existing, popular, programming models  Single-source, fully supported by compiler  HSAIL compiler IR (Cross-platform!) • GPUs are fast and power efficient : high compute density per-mm and per-watt • But: Can be hard to program PCIe
  • 3. WHAT IS HSAIL?  HSAIL is the intermediate language for parallel compute in HSA  Generated by a high level compiler (LLVM, gcc, Java VM, etc)  Low-level IR, close to machine ISA level  Compiled down to target ISA by an IHV “Finalizer”  Finalizer may execute at run time, install time, or build time  Example: OpenCL™ Compilation Stack using HSAIL © Copyright 2012 HSA Foundation. All Rights Reserved. 3 OpenCL™ Kernel High-Level Compiler Flow (Developer) Finalizer Flow (Runtime) EDG or CLANG SPIR LLVM HSAIL HSAIL Finalizer Hardware ISA
  • 4. KEY HSAIL FEATURES  Parallel  Shared virtual memory  Portable across vendors in HSA Foundation  Stable across multiple product generations  Consistent numerical results (IEEE-754 with defined min accuracy)  Fast, robust, simple finalization step (no monthly updates)  Good performance (little need to write in ISA)  Supports all of OpenCL™ and C++ AMP™  Support Java, C++, and other languages as well © Copyright 2012 HSA Foundation. All Rights Reserved. 4
  • 5. AGENDA  Introduction  HSA Parallel Execution Model  HSAIL Instruction Set  Example – in Java!  Key Takeaways
  • 6. HSA PARALLEL EXECUTION MODEL © Copyright 2012 HSA Foundation. All Rights Reserved. 6
  • 7. HSA PARALLEL EXECUTION MODEL © Copyright 2012 HSA Foundation. All Rights Reserved. 7 Basic Idea: Programmer supplies a “kernel” that is run on each work-item. Kernel is written as a single thread of execution. Each work-item has a unique coordinate. Programmer specifies grid dimensions (for scope of problem). Programmer optionally specifies work-group dimensions (for optimized communication).
  • 8. CONVOLUTION / SOBEL EDGE FILTER Gx = [ -1 0 +1 ] [ -2 0 +2 ] [ -1 0 +1 ] Gy = [ -1 -2 -1 ] [ 0 0 0 ] [ +1 +2 +1 ] G = sqrt(Gx 2 + Gy 2)
  • 9. CONVOLUTION / SOBEL EDGE FILTER Gx = [ -1 0 +1 ] [ -2 0 +2 ] [ -1 0 +1 ] Gy = [ -1 -2 -1 ] [ 0 0 0 ] [ +1 +2 +1 ] G = sqrt(Gx 2 + Gy 2) 2D grid workitem kernel
  • 10. CONVOLUTION / SOBEL EDGE FILTER Gx = [ -1 0 +1 ] [ -2 0 +2 ] [ -1 0 +1 ] Gy = [ -1 -2 -1 ] [ 0 0 0 ] [ +1 +2 +1 ] G = sqrt(Gx 2 + Gy 2) 2D work-group 2D grid workitem kernel
  • 11. HSAIL INSTRUCTION SET © Copyright 2012 HSA Foundation. All Rights Reserved. 11
  • 12. HSAIL INSTRUCTION SET - OVERVIEW  Similar to assembly language for a RISC CPU  Load-store architecture  ld_global_u64 $d0, [$d6 + 120] ; $d0= load($d6+120)  add_u64 $d1, $d2, 24 ; $d1= $d2+24  136 opcodes (Java™ bytecode has 200)  Floating point (single, double, half (f16))  Integer (32-bit, 64-bit)  Some packed operations  Branches  Function calls  Platform Atomic Operations: and, or, xor, exch, add, sub, inc, dec, max, min, cas  Synchronize host CPU and HSA Component!  Text and Binary formats (“BRIG”)
  • 13. SEGMENTS AND MEMORY (1/2)  7 segments of memory  global, readonly, group, spill, private, arg, kernarg,  Memory instructions can (optionally) specify a segment  Global Segment  Visible to all HSA agents (including host CPU)  Group Segment  Provides high-performance memory shared in the work-group.  Group memory can be read and written by any work-item in the work-group  HSAIL provides sync operations to control visibility of group memory  Useful for expert programmers  Spill, Private, Arg Segments  Represent different regions of a per-work-item stack  Typically generated by compiler, not specified by programmer  Compiler can use these to convey intent – ie spills © Copyright 2012 HSA Foundation. All Rights Reserved. 13 ld_global_u64 $d0, [$d6] ld_group_u64 $d0,[$d6+24] st_spill_f32 $s1,[$d6+4]
  • 14. SEGMENTS AND MEMORY (2/2)  Kernarg Segment  Programmer writes kernarg segment to pass arguments to a kernel  Read-Only Segment  Remains constant during execution of kernel  Flat Addressing  Each segment mapped into virtual address space  Flat addresses can map to segments based on virtual address  Instructions with no explicit segment use flat addressing  Very useful for high-level language support (ie classes, libraries)  Aligns well with OpenCL 2.0 “generic” addressing feature © Copyright 2012 HSA Foundation. All Rights Reserved. 14 ld_kernarg_u64 $d6, [%_arg0] ld_u64 $d0,[$d6+24] ; flat
  • 15. REGISTERS  Four classes of registers  C: 1-bit, Control Registers  S: 32-bit, Single-precision FP or Int  D: 64-bit, Double-precision FP or Long Int  Q: 128-bit, Packed data.  Fixed number of registers:  8 C  S, D, Q share a single pool of resources  S + 2*D + 4*Q <= 128  Up to 128 S or 64 D or 32 Q (or a blend)  Register allocation done in high-level compiler  Finalizer doesn’t have to perform expensive register allocation © Copyright 2012 HSA Foundation. All Rights Reserved. 15
  • 16. SIMT EXECUTION MODEL  HSAIL Presents a “SIMT” execution model to the programmer  “Single Instruction, Multiple Thread”  Programmer writes program for a single thread of execution  Each work-item appears to have its own program counter  Branch instructions look natural  Hardware Implementation  Most hardware uses SIMD (Single-Instruction Multiple Data) vectors for efficiency  Actually one program counter for the entire SIMD instruction  Branches implemented with predication  SIMT Advantages  Easier to program (branch code in particular)  Natural path for mainstream programming models  Scales across a wide variety of hardware (programmer doesn’t see vector width)  Cross-lane operations available for those who want peak performance © Copyright 2012 HSA Foundation. All Rights Reserved. 16
  • 17. WAVEFRONTS  Hardware SIMD vector, composed of 1, 2, 4, 8, 16, 32, or 64 “lanes”  Lanes in wavefront can be “active” or “inactive”  Inactive lanes consume hardware resources but don’t do useful work  Tradeoffs  “Wavefront-aware” programming can be useful for peak performance  But results in less portable code (since wavefront width is encoded in algorithm) © Copyright 2012 HSA Foundation. All Rights Reserved. 17 if (cond) { operationA; // cond=True lanes active here } else { operationB; // cond=False lanes active here }
  • 18. CROSS-LANE OPERATIONS  Example HSAIL operation: “countlane”  Dest set to the number of work-items in current wavefront that have non-zero source  Example HSAIL operation: “countuplane”  Dest set to count of earlier work-items that are active for this instruction  Useful for compaction algorithms © Copyright 2012 HSA Foundation. All Rights Reserved. 18 countuplane_u32 $s0 countlane_u32 $s0, $s6
  • 19. HSAIL MODES  Working group strived to limit optional modes and features in HSAIL  Minimize differences between HSA target machines  Better for compiler vendors and application developers  Two modes survived  Machine Models  Small: 32-bit pointers, 32-bit data  Large: 64-bit pointers, 32-bit or 64-bit data  Vendors can support one or both models  “Base” and “Full” Profiles  Two sets of requirements for FP accuracy, rounding, exception reporting © Copyright 2012 HSA Foundation. All Rights Reserved. 19
  • 20. HSA PROFILES © Copyright 2012 HSA Foundation. All Rights Reserved. 20 Feature Base Full Addressing Modes Small, Large Small, Large Load/store conversion of all floating point types (f16, f32, f64) Yes Yes All 32-bit HSAIL operations according to the declared profile Yes Yes F16 support (IEEE 754 or better) Yes Yes F64 support No Yes Precision for add/sub/mul 1/2 ULP 1/2 ULP Precision for div 2.5 ULP 1/2 ULP Precision for sqrt 1 ULP 1/2 ULP HSAIL Rounding: Near Yes Yes HSAIL Rounding: Up No Yes HSAIL Rounding: Down No Yes HSAIL Rounding: Zero No Yes Subnormal floating-point Flush-to-zero Supported Propagate NaN Payloads No Yes FMA No Yes Arithmetic Exception reporting DETECT DETECT or BREAK Debug trap Yes Yes
  • 21. EXAMPLE © Copyright 2012 HSA Foundation. All Rights Reserved. 21
  • 22. AN EXAMPLE (IN JAVA 8™) © Copyright 2012 HSA Foundation. All Rights Reserved. 22 class Player { private Team team; private int scores; private float pctOfTeamScores; public Team getTeam() {return team;} public int getScores() {return scores;} public void setPctOfTeamScores(int pct) { pctOfTeamScores = pct; } }; // “Team” class not shown // Assume “allPlayers’ is an initialized array of Players.. Stream<Player> s = Arrays.stream(allPlayers).parallel(); s.forEach(p -> { int teamScores = p.getTeam().getScores(); float pctOfTeamScores = (float)p.getScores()/(float) teamScores; p.setPctOfTeamScores(pctOfTeamScores); });
  • 23. HSAIL CODE EXAMPLE © Copyright 2012 HSA Foundation. All Rights Reserved. 23 01: version 0:95: $full : $large; 02: // static method HotSpotMethod<Main.lambda$2(Player)> 03: kernel &run ( 04: kernarg_u64 %_arg0 // Kernel signature for lambda method 05: ) { 06: ld_kernarg_u64 $d6, [%_arg0]; // Move arg to an HSAIL register 07: workitemabsid_u32 $s2, 0; // Read the work-item global “X” coord 08: 09: cvt_u64_s32 $d2, $s2; // Convert X gid to long 10: mul_u64 $d2, $d2, 8; // Adjust index for sizeof ref 11: add_u64 $d2, $d2, 24; // Adjust for actual elements start 12: add_u64 $d2, $d2, $d6; // Add to array ref ptr 13: ld_global_u64 $d6, [$d2]; // Load from array element into reg 14: @L0: 15: ld_global_u64 $d0, [$d6 + 120]; // p.getTeam() 16: mov_b64 $d3, $d0; 17: ld_global_s32 $s3, [$d6 + 40]; // p.getScores () 18: cvt_f32_s32 $s16, $s3; 19: ld_global_s32 $s0, [$d0 + 24]; // Team getScores() 20: cvt_f32_s32 $s17, $s0; 21: div_f32 $s16, $s16, $s17; // p.getScores()/teamScores 22: st_global_f32 $s16, [$d6 + 100]; // p.setPctOfTeamScores() 23: ret; 24: };
  • 24. WHAT IS HSAIL?  HSAIL is the intermediate language for parallel compute in HSA  Generated by a high level compiler (LLVM, gcc, Java VM, etc)  Low-level IR, close to machine ISA level  Compiled down to target ISA by an IHV “Finalizer”  Finalizer may execute at run time, install time, or build time  Example: OpenCL™ Compilation Stack using HSAIL © Copyright 2012 HSA Foundation. All Rights Reserved. 24 OpenCL™ Kernel High-Level Compiler Flow (Developer) Finalizer Flow (Runtime) EDG or CLANG SPIR LLVM HSAIL HSAIL Finalizer Hardware ISA
  • 25. HSAIL AND SPIR © Copyright 2012 HSA Foundation. All Rights Reserved. 25 Feature HSAIL SPIR Intended Users Compiler developers who want to control their own code generation. Compiler developers who want a fast path to acceleration across a wide variety of devices. IR Level Low-level, just above the machine instruction set High-level, just below LLVM-IR Back-end code generation Thin, fast, robust. Flexible. Can include many optimizations and compiler transformation including register allocation. Where are compiler optimizations performed? Most done in high-level compiler, before HSAIL generation. Most done in back-end code generator, between SPIR and device machine instruction set Registers Fixed-size register pool Infinite SSA Form No Yes Binary format Yes Yes Code generator for LLVM Yes Yes Back-end device targets Modern GPU architectures supported by members of the HSA Foundation Any OpenCL device including GPUs, CPUs, FPGAs Memory Model Relaxed consistency with acquire/release, barriers, and fine- grained barriers Flexible. Can support the OpenCL 1.2 Memory Model
  • 26. TAKEAWAYS  HSAIL Key Points  Thin, robust, fast finalizer  Portable (multiple HW vendors and parallel architectures)  Complements OpenCL™  Supports shared virtual memory and platform atomics  HSA brings GPU computing to mainstream programming models  Shared and coherent memory bridges “faraway accelerator” gap  HSAIL provides the common IL for high-level languages to benefit from parallel computing  Java Example  Unmodified Java8 accelerated on the GPU!  Can use pointer-containing data structures © Copyright 2012 HSA Foundation. All Rights Reserved. 26
  • 27. TOOLS ARE AVAILABLE NOW  HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, Compiler Writer’s Guide, and Object Format (BRIG)  http://hsafoundation.com/standards/  https://hsafoundation.box.com/s/m6mrsjv8b7r50kqeyyal  Tools now at GitHUB – HSA Foundation  libHSAAssembler and Disassembler  https://github.com/HSAFoundation/HSAIL-Tools  HSAIL Instruction Set Simulator  https://github.com/HSAFoundation/HSAIL-Instruction-Set-Simulator  Soon: LLVM Compilation stack which outputs HSAIL and BRIG  Java compiler for HSAIL (preliminary)  http://openjdk.java.net/projects/sumatra/)  http://openjdk.java.net/projects/graal/ © Copyright 2012 HSA Foundation. All Rights Reserved. 27
  • 28. BACKUP © Copyright 2012 HSA Foundation. All Rights Reserved. 28
  • 29. OPPORTUNITIES WITH LLVM BASED COMPILATION LLVM CLANG C99 C++ 11 C++AMP Objective C OpenCL OpenMP KL OSL Render script UPC Rust Halide Julia Mono Fortran Haskell
  • 30. AN EXAMPLE (IN OPENCL™) © Copyright 2012 HSA Foundation. All Rights Reserved. 30 //Vector add // A[0:N-1] = B[0:N-1] + C[0:N-1] __kernel void vec_add ( __global const float *a, __global const float *b, __global float *c, const unsigned int n) { // Get our global thread ID int id = get_global_id(0); // Make sure we do not go out of bounds if (id < n) c[id] = a[id] + b[id]; }
  • 31. HSAIL VECTOR ADD © Copyright 2012 HSA Foundation. All Rights Reserved. 31 version 1:0:$full:$small; function &get_global_id(arg_u32 %ret_val) (arg_u32 %arg_val0); function &abort() (); kernel &__OpenCL_vec_add_kernel( kernarg_u32 %arg_val0, kernarg_u32 %arg_val1, kernarg_u32 %arg_val2, kernarg_u32 %arg_val3) { @__OpenCL_vec_add_kernel_entry: // BB#0: // %entry ld_kernarg_u32 $s0, [%arg_val3]; workitemabsid_u32 $s1, 0; cmp_lt_b1_u32 $c0, $s1, $s0; ld_kernarg_u32 $s0, [%arg_val2]; ld_kernarg_u32 $s2, [%arg_val1]; ld_kernarg_u32 $s3, [%arg_val0]; cbr $c0, @BB0_2; brn @BB0_1; @BB0_1: // %if.end ret; @BB0_2: // %if.then shl_u32 $s1, $s1, 2; add_u32 $s2, $s2, $s1; ld_global_f32 $s2, [$s2]; add_u32 $s3, $s3, $s1; ld_global_f32 $s3, [$s3]; add_f32 $s2, $s3, $s2; add_u32 $s0, $s0, $s1; st_global_f32 $s2, [$s0]; brn @BB0_1; };
  • 32. SEGMENTS © Copyright 2012 HSA Foundation. All Rights Reserved. 32

Notas do Editor

  1. Proofpoints : Green500, top end is now heterogeneous computing solutions.Components = CPU Cores, 3rd Party IP, GPU cores.
  2. SPIR = Standard Portable Intermediate Representation. Standard compiler intermediate language. More detail later. At this point, note that SPIR is a high-level IR and HSAIL is a low-level IR.
  3. Familiar to folks used to GPUs.Grid contains work-groups; work-group contains workitems.Programmer specifies the global dims and the work-group ; Work-group can often be determined automatically but also can be tuned for peak performance.Wavefront – hardware-specific concept. Similar to the “vector width” of a machine. For example SSE=128, AVX=256.
  4. We want to run a Sobel edge detect filter on Peaches the dog. The equation represents the kernel that is run on each pixel in the image. Note the equation examines surrounding pixels to determine the rate of change, ie an edge.
  5. To use the HSA Parallel Execution Model, we map the image to a 2D grid. Each pixel is mapped to a work-item, and the grid specifies all the pixels in the image. The same kernel is run for each work-item in the grid, but each work-item gets it’s unique (x,y) coordinate. By using the coordinate, work-items can read the pixel values of the surrounding pixels.Key observation is that the programmer writes the code for single pixel in what looks single thread, and the execution model exposes a huge amount of parallelism.
  6. Work-groups provide an additional optional optimization opportunity. HSA supports special “group” memory that is shared by all work-items in the work-group, and is typically very high-performance. In this case, we have read-sharing between neighboring pixels, so we have picked a relatively large square grid so that each work-item can pull from neighboring pixels. This
  7. Destination first. Uses actual registers.Atomics are platform level.
  8. “Spill” = register spill.
  9. Some tension between programmability and HW implementation.
  10. Explain how SIMD hardware uses masks to run both halves of the if-then-else statements.
  11. Virtual ISA is SIMT (one thread), but can use cross-lane operations.Non-portable code.Other cross-lane operations: countlane, countuplane, masklane, sendlane, receivelane
  12. HSA supports IEEE-defined precisions.
  13. Key points : This is in Java.Standard Java8. Uses Java Stream class, and a lambda function.Player class has a pointer (object reference) to a parent Team.Stream supports parallel exectuion, including both multi-core and GPU accelerators. This is goal of HSA – make GPU programming as easy as multi-core CPU programming, and easier than multi-core + vector ISA programming.Note all functions have been inlined.
  14. This is the code for the Java ForEach loop – not translating the entire code into HSAIL.Line4 – kernarg signature.Line5 – returns the coordinate of this work-item, so we know which player to operate on.Line10 – multiple, destination first.Line19 – dereference of team variable.
  15. SPIR = Standard Portable Intermediate Representation. Standard compiler intermediate language. More detail later. At this point, note that SPIR is a high-level IR and HSAIL is a low-level IR.
  16. “Faraway accelerator” -&gt; Separate address spaces, no pointers, perf/power cost of copiesShared virtual mem and platform atomics – HSA systems looks more like CPUs with fast vector units than discrete accelerators.
  17. Flat addressing – don’t have to use the segments.