AMD AND MICROSOFT DEVELOPER DAY, JUNE 2014, STOCKHOLM
STEPHAN HODES
DEVELOPER TECHNOLOGY ENGINEER, AMD
GCN PERFORMANCE „FTW“
| GCN PERFORMANCE „FTW“ | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM2
AGENDA
GCN architecture explained
Top 10: GCN Performance Advice
Questions
AMD GRAPHICS CORE NEXT
What is GCN?
‒ Non-VLIW architecture
‒ Less dependent on manual vectorization of shaders
‒ Susceptible to register pressure
‒ Architecture used in:
‒ AMD discrete GPUs since 2012 (HD7700 and better)
‒ Kabini and Kaveri APUs
‒ Future AMD hardware
‒ New consoles
GCN Hardware is required for Mantle
‒ DirectX 12 API support
PRODUCT SPECIFICATIONS
AMD RADEON™ R9 290 SERIES
                            R9 290X                 R9 290
Compute Units               44                      40
Engine Clock                Up to 1 GHz             Up to 950 MHz
Compute Performance         5.6 TFLOPS              4.9 TFLOPS
Memory Configuration        4GB GDDR5 / 512-bit     4GB GDDR5 / 512-bit
Memory Speed                5.0 Gbps                5.0 Gbps
AMD TrueAudio Technology    Yes                     Yes
API Support                 DirectX® 11.2,          DirectX® 11.2,
                            OpenGL 4.3, Mantle      OpenGL 4.3, Mantle
GCN COMPUTE UNIT – SPECIFICS
• Non-VLIW instruction set architecture
• 4 [16-lane] Vector ALUs (SIMD)
‒ One wavefront is 64 threads
‒ 1 SP (Single-Precision) op: 4 clocks
‒ 1 DP (Double-Precision) ADD: 8 clocks
‒ 1 DP MUL/FMA & Transcendental: 16 clocks
‒ 64KB Vector GPRs per SIMD
• 1 fully programmable scalar ALU
‒ Shared by all threads of a wavefront
‒ Used for flow control, pointer arithmetic, etc.
‒ 8KB Scalar GPRs, scalar data cache, etc.
[Figure: GCN compute unit block diagram. Blocks: Scheduler; Branch & Message Unit; Scalar Unit; Scalar Registers (SGPRs, 8KB); Vector Units (4x SIMD-16); Vector Registers (VGPRs, 4x 64KB); Local Data Share (LDS, 64KB); Texture Filter Units (4); Texture Fetch Load / Store Units (16); L1 Cache (16KB)]
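As a rough cross-check of the numbers above (a sketch, not part of the original slides): a 64-thread wavefront on a 16-lane SIMD takes 4 clocks per single-precision op, and peak compute follows from the CU count and engine clock. The function names are ours, and the code assumes peak rate means one FMA per lane per clock, counted as 2 FLOPs.

```python
LANES_PER_SIMD = 16
SIMDS_PER_CU = 4
WAVEFRONT_SIZE = 64
FLOPS_PER_FMA = 2  # assumption: one FMA per lane per clock, counted as 2 FLOPs


def clocks_per_wavefront_op(wavefront_size=WAVEFRONT_SIZE, lanes=LANES_PER_SIMD):
    """Clocks a SIMD needs to push one wavefront through a full-rate op."""
    return wavefront_size // lanes


def peak_sp_tflops(compute_units, clock_ghz):
    """Theoretical single-precision peak in TFLOPS for a given CU count and clock."""
    lanes = compute_units * SIMDS_PER_CU * LANES_PER_SIMD
    return lanes * FLOPS_PER_FMA * clock_ghz / 1000.0


print(clocks_per_wavefront_op())   # clocks per SP op for one wavefront
print(peak_sp_tflops(44, 1.0))     # R9 290X: 44 CUs at up to 1 GHz
print(peak_sp_tflops(40, 0.95))    # R9 290: 40 CUs at up to 950 MHz
```

Under these assumptions the results land close to the 5.6 and 4.9 TFLOPS quoted in the product specifications.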
GCN COMPUTE UNIT – SPECIFICS
• Distributed programmable scheduler (up to 2560 threads)
‒ Each compute unit can execute instructions from multiple kernels
‒ Separate decode/issue for:
‒ 1 Vector Arithmetic Logic Unit (ALU)
‒ 1 Scalar ALU or Scalar Memory Read or 1 Branch/Message
‒ 1 Vector memory access (Read/Write/Atomic)
‒ 1 Local Data Share (LDS) operation
‒ 1 Export or Global Data Share (GDS) operation
‒ Plus 1 Special/Internal [no functional unit] (s_nop, s_sleep, s_waitcnt, s_barrier, s_setprio)
GCN COMPUTE UNIT – SPECIFICS
• 64KB Local Data Share (LDS)
‒ 32 banks, with conflict resolution
‒ Bandwidth amplification
• 16KB read/write L1 vector data cache
• Texture Units (utilize L1)
‒ 16 Load/Store units
‒ 4 Filter units
• 1 Branch & Message Unit
‒ Executes branch instructions (as dispatched by Scalar Unit)
GCN COMPUTE UNIT – LATENCY HIDING
• Up to 10 Wavefronts/SIMD
‒ Used to hide latency
‒ Round Robin scheduling
‒ Independent kernels
‒ Often limited by GPR or LDS usage
[Figure: execution timeline, time in clocks, for Batch 1 to Batch 4; each batch alternates between "Runnable" and "Stall" phases until "Done!"]
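The effect of keeping several wavefronts resident can be illustrated with a toy model (a sketch under simplifying assumptions, not a description of the real scheduler): every wavefront repeats a short ALU burst followed by a fixed memory stall, and the SIMD moves round robin to the next runnable wavefront whenever the current one stalls. The burst length, stall length, and function name are made up for illustration.

```python
def simd_utilization(num_waves, run_clocks=4, stall_clocks=36, bursts=10):
    """Fraction of clocks in which the toy SIMD issues work.

    Each wavefront executes `bursts` ALU bursts of `run_clocks` clocks,
    and every burst is followed by a stall of `stall_clocks` clocks.
    """
    remaining = [bursts] * num_waves     # bursts left per wavefront
    run_left = [run_clocks] * num_waves  # clocks left in the current burst
    ready_at = [0] * num_waves           # first clock at which a wave may run
    clock = busy = cur = 0
    while any(remaining):
        # Round robin: start at the current wave, move on if it is stalled.
        for step in range(num_waves):
            w = (cur + step) % num_waves
            if remaining[w] and ready_at[w] <= clock:
                busy += 1
                run_left[w] -= 1
                cur = w
                if run_left[w] == 0:     # burst finished, wave now stalls
                    remaining[w] -= 1
                    run_left[w] = run_clocks
                    ready_at[w] = clock + 1 + stall_clocks
                break
        clock += 1
    return busy / clock


for waves in (1, 2, 5, 10):
    print(waves, round(simd_utilization(waves), 2))
```

In this model utilization grows with the number of resident wavefronts until the stalls are fully covered, which is why GPR or LDS limits on wavefront count matter.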
GCN COMPUTE UNIT – REGISTER PRESSURE
• Vector GPRs
‒ 64KB / 64 threads / 4 bytes / 10 wavefronts = 25.6 VGPRs/thread => max. 24 VGPRs per thread to keep 10 wavefronts per SIMD
• Scalar GPRs
‒ 8KB / 4 SIMDs / 4 bytes / 10 wavefronts = 51.2 SGPRs/wavefront => max. 48 SGPRs per wavefront to keep 10 wavefronts per SIMD
• LDS
‒ 32KB/threadgroup and threadgroup size 64 => 2 wavefronts/CU max.
‒ 32KB/threadgroup and threadgroup size 256 => 8 wavefronts/CU max.
‒ 16KB/threadgroup and threadgroup size 256 => 16 wavefronts/CU max.
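The limits above can be reproduced with a few lines of arithmetic. This is a sketch: the function names are ours, and the allocation granularities (4 VGPRs, 8 SGPRs) are assumptions chosen so that the results round the same way as on the slide.

```python
VGPR_BYTES_PER_SIMD = 64 * 1024
SGPR_BYTES_PER_CU = 8 * 1024
LDS_BYTES_PER_CU = 64 * 1024
SIMDS_PER_CU = 4
WAVEFRONT_SIZE = 64
REGISTER_BYTES = 4


def max_vgprs_per_thread(waves_per_simd, granularity=4):
    """Largest VGPR count per thread that still fits `waves_per_simd` waves."""
    raw = VGPR_BYTES_PER_SIMD // (WAVEFRONT_SIZE * REGISTER_BYTES * waves_per_simd)
    return raw // granularity * granularity  # assumed allocation granularity


def max_sgprs_per_wave(waves_per_simd, granularity=8):
    """Largest SGPR count per wavefront that still fits `waves_per_simd` waves."""
    raw = SGPR_BYTES_PER_CU // (SIMDS_PER_CU * REGISTER_BYTES * waves_per_simd)
    return raw // granularity * granularity  # assumed allocation granularity


def lds_limited_waves_per_cu(lds_bytes_per_group, group_size):
    """Wavefronts per CU when only LDS usage limits occupancy."""
    groups = LDS_BYTES_PER_CU // lds_bytes_per_group
    waves_per_group = -(-group_size // WAVEFRONT_SIZE)  # ceiling division
    return groups * waves_per_group


print(max_vgprs_per_thread(10))                   # VGPR budget for 10 waves/SIMD
print(max_sgprs_per_wave(10))                     # SGPR budget for 10 waves/SIMD
print(lds_limited_waves_per_cu(32 * 1024, 64))
print(lds_limited_waves_per_cu(32 * 1024, 256))
print(lds_limited_waves_per_cu(16 * 1024, 256))
```

Calling `max_vgprs_per_thread` with a smaller wavefront count shows how quickly the per-thread budget grows when occupancy is traded away.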
GCN SHADER OPTIMIZATION STRATEGIES
• Try reducing GPR count if you are slightly over a waves-per-SIMD threshold
‒ Deep nesting
‒ Local array declarations
‒ Long-lived temporary variables
• Reducing GPRs is not always optimal
‒ The shader compiler might use GPRs to reduce latency
‒ A high number of threads/CU can thrash your caches
// Listing 1: more GPRs (results in v6 to v9); all four loads are issued before the first wait
image_load v6, v[35:38], s[4:11]
v_mov_b32 v3, v35
image_load v7, v[3:6], s[4:11]
v_mov_b32 v38, v36
image_load v8, v[37:40], s[4:11]
v_mov_b32 v3, v37
image_load v9, v[3:6], s[4:11]
s_waitcnt vmcnt(2)
v_min_f32 v6, v6, v7
s_waitcnt vmcnt(1)
v_min_f32 v6, v6, v8
s_waitcnt vmcnt(0)
v_min_f32 v40, v6, v9
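// Listing 2: fewer GPRs (results reuse v6 and v7); each later load is followed by a full s_waitcnt vmcnt(0)
image_load v6, v[35:38], s[4:11]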
v_mov_b32 v3, v35
image_load v7, v[3:6], s[4:11]
v_mov_b32 v38, v36
v_mov_b32 v3, v37
s_waitcnt vmcnt(0)
v_min_f32 v6, v6, v7
image_load v7, v[37:40], s[4:11]
s_waitcnt vmcnt(0)
v_min_f32 v6, v6, v7
image_load v7, v[3:6], s[4:11]
s_waitcnt vmcnt(0)
v_min_f32 v6, v6, v7
Always profile your changes!
http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-tools-sdks/codexl/
http://developer.amd.com/community/blog/2014/05/16/codexl-game-developers-analyze-hlsl-gcn
Top 10 Performance Advice
TOP 10 PERFORMANCE ADVICE
1. Use the power of DirectCompute
‒ Thread group size should be a multiple of 64
‒ 256 is often a good choice.
‒ Don't underestimate the benefits of LDS
‒ Use asynchronous compute
‒ Don't switch between Compute/Rasterization too frequently
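Why the thread group size should be a multiple of 64 can be seen from simple lane counting (a sketch; the helper names are ours): a group is split into 64-thread wavefronts, and any partially filled wavefront still occupies all 64 lanes.

```python
WAVEFRONT_SIZE = 64


def wavefronts_per_group(group_size):
    """Number of wavefronts needed to hold one thread group."""
    return -(-group_size // WAVEFRONT_SIZE)  # ceiling division


def lane_utilization(group_size):
    """Fraction of allocated SIMD lanes that carry a real thread."""
    return group_size / (wavefronts_per_group(group_size) * WAVEFRONT_SIZE)


for size in (32, 64, 100, 256):
    print(size, wavefronts_per_group(size), lane_utilization(size))
```

A group of 100 threads, for example, needs two wavefronts and leaves 28 of the 128 lanes idle, while 64 or 256 threads fill every lane.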
TOP 10 PERFORMANCE ADVICE
2. Don't over-tessellate
‒ Small triangles result in poor quad occupancy
‒ Use [maxtessfactor(X)] in Hull Shader declaration
‒ Recommended value is 15 or less
‒ Implement culling in Hull Shader
‒ Use Adaptive Tessellation
‒ Distance Adaptive
‒ Screen Space Adaptive
‒ Orientation Adaptive
Especially when rendering shadow maps!
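Distance-adaptive tessellation can be sketched as a clamped interpolation of the tessellation factor. This is an illustration only: the linear falloff, the `near`/`far` parameters, and the function name are assumptions, with the maximum set to the recommended value of 15. In a real renderer this math would live in the Hull Shader.

```python
def distance_adaptive_tess_factor(distance, near, far,
                                  max_factor=15.0, min_factor=1.0):
    """Tessellation factor that falls linearly from max_factor at `near`
    to min_factor at `far` (hypothetical falloff, clamped at both ends)."""
    t = (distance - near) / (far - near)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return max_factor + (min_factor - max_factor) * t


for d in (5.0, 10.0, 60.0, 110.0, 500.0):
    print(d, distance_adaptive_tess_factor(d, near=10.0, far=110.0))
```

Patches beyond `far` fall back to a factor of 1, which avoids producing small triangles where they contribute little.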
TOP 10 PERFORMANCE ADVICE
3. Keep your pipeline short
‒ Avoid large expansion in the Geometry Shader
‒ Often a Vertex Shader-only solution can replace Geometry Shader usage
‒ Bokeh expansion
‒ Point sprites
‒ Disable tessellation pipeline if unused
4. Pack shader stage output
‒ Limit Vertex and Domain Shader output size to 4 float4/int4 attributes for best performance.
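struct PS_INPUT
{
    float3 vPosition;
    float3 vNormal;
    float2 vTexcoord1;
    float2 vTexcoord2;
    float2 vTexcoord3;
}; // Suboptimal: five partially filled attributes
struct PS_INPUT
{
    float4 vPositionTexcoord1U; // xyz = position, w = texcoord1.u
    float4 vNormalTexcoord1V;   // xyz = normal,   w = texcoord1.v
    float4 vTexcoords23;        // xy = texcoord2, zw = texcoord3
}; // Good: three fully packed float4 attributes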
TOP 10 PERFORMANCE ADVICE
5. Update your data using Map/Unmap
‒ Avoid MAP_WRITE_DISCARD
‒ Prefer MAP_WRITE_NO_OVERWRITE
‒ Avoid UpdateSubresource
‒ Prefer Map and/or CopyResource instead
‒ UpdateSubresource is OK for small (<=4KB) updates
‒ CopyResource introduces GPU stalls
‒ Don't use the updated resource immediately
‒ Using data without copying it to local memory first can sometimes improve performance
TOP 10 PERFORMANCE ADVICE
6. Use flow control with care
‒ Flow control has little overhead
‒ Skipping data fetches usually is good
‒ Avoid non-coherent codepaths within a wavefront
‒ Watch out for GPR pressure caused by loops and deeply nested branches
v_cmp_gt_f32 r0,r1          // a > b, establish VCC
s_mov_b64 s0,exec           // Save current exec mask
s_and_b64 exec,vcc,exec     // Do "if"
s_cbranch_vccz label0       // Branch if all lanes fail
v_sub_f32 r2,r0,r1          // result = a - b
v_mul_f32 r2,r2,r0          // result = result * a
label0:
s_andn2_b64 exec,s0,exec    // Do "else" (s0 & !exec)
s_cbranch_execz label1      // Branch if all lanes fail
v_sub_f32 r2,r1,r0          // result = b - a
v_mul_f32 r2,r2,r1          // result = result * b
label1:
s_mov_b64 exec,s0           // Restore exec mask
// Branching code example
float fn0(float a, float b)
{
    if (a > b)
        return ((a - b) * a);
    else
        return ((b - a) * b);
}
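The exec-mask handling in the ISA listing can be mimicked in a few lines to show why non-coherent branches cost more (a simplified sketch, not a cycle-accurate model; the function name is ours). Each side of the branch is issued once if at least one lane needs it, so a wavefront whose lanes disagree pays for both sides.

```python
def fn0_wavefront(a_values, b_values):
    """Evaluate fn0 for all lanes of one wavefront using exec-style masks.

    Returns the per-lane results and how many branch bodies were issued.
    """
    lanes = range(len(a_values))
    if_mask = [a_values[i] > b_values[i] for i in lanes]  # VCC
    result = [None] * len(a_values)
    bodies_issued = 0

    if any(if_mask):                      # skipped when no lane passes
        bodies_issued += 1
        for i in lanes:
            if if_mask[i]:
                result[i] = (a_values[i] - b_values[i]) * a_values[i]

    else_mask = [not m for m in if_mask]  # saved exec & !exec
    if any(else_mask):                    # skipped when no lane is left
        bodies_issued += 1
        for i in lanes:
            if else_mask[i]:
                result[i] = (b_values[i] - a_values[i]) * b_values[i]

    return result, bodies_issued


print(fn0_wavefront([3, 4, 5, 6], [1, 1, 1, 1]))  # coherent: one body issued
print(fn0_wavefront([3, 1, 5, 1], [1, 3, 1, 3]))  # divergent: both bodies issued
```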
TOP 10 PERFORMANCE ADVICE
7. Pack your G-Buffer using RGBA16_UINT
‒ Fetches from RGBA16 are full rate (without filtering)
‒ Bilinear fetches to RGBA16 are half rate
‒ Exports to RGBA16_INT are full rate (without blending)
Caution: Blended exports to RGBA16_INT are ¼ speed
8. Depth buffer: don’t render after read
‒ Binding a depth buffer as a texture will decompress it, which makes subsequent Z ops more expensive.
‒ Critical for shadow map atlas rendering!
‒ Consider exporting depth to G-Buffer
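For item 7, packing into 16-bit integer channels means quantizing values manually, since an integer format performs no normalization. The sketch below shows one common scheme (unsigned normalized quantization of values in [0, 1]); the choice of scheme and all function names are assumptions, and the slides do not prescribe what to store in each channel.

```python
def pack_unorm16(x):
    """Quantize a float in [0, 1] to a 16-bit unsigned integer (round to nearest)."""
    x = min(max(x, 0.0), 1.0)
    return int(x * 65535.0 + 0.5)


def unpack_unorm16(v):
    """Inverse of pack_unorm16."""
    return v / 65535.0


def pack_gbuffer_texel(values):
    """Pack four [0, 1] floats into the four channels of one RGBA16 texel."""
    if len(values) != 4:
        raise ValueError("an RGBA16 texel has exactly four channels")
    return tuple(pack_unorm16(v) for v in values)


texel = pack_gbuffer_texel((0.0, 0.25, 0.5, 1.0))
print(texel)
print([unpack_unorm16(c) for c in texel])
```

The round trip loses at most half a quantization step (about 0.0000076) per channel.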
TOP 10 PERFORMANCE ADVICE
9. Batch, Batch, Batch!
‒ Add support for geometry instancing
‒ Pool & batch your updates
‒ Less important with Mantle/DirectX12
‒ Reduces draw call overhead
‒ Allows better scheduling
10. (DX11) Prefer engine threading over Deferred Contexts
‒ Deferred contexts are a software feature
‒ … or move to Mantle/DirectX12 :)
TOP 10 PERFORMANCE ADVICE
Bonus Advice
• Avoid LDS bank conflicts
‒ Accessing LDS from different threads with addresses that are 32 DWORDs apart will cause bank conflicts
‒ Unless it's the same address
• Don't use gather with offsets
‒ This will result in 4 image_gather4 instructions
// Without offsets: compiles to a single gather instruction
float4 PsExample( PsInput Input ) : SV_Target
{
    return tex.GatherCmpRed(
        g_SamplePointCmp,
        Input.vTex,
        Input.depth );
}
image_gather4_c_lz v0, v[2:5], s[4:11], s[12:15]
s_waitcnt vmcnt(0)
// With offsets: compiles to four gather instructions
float4 PsExample( PsInput Input ) : SV_Target
{
    return tex.GatherCmpRed(
        g_SamplePointCmp,
        Input.vTex,
        Input.depth,
        int2(0,0),
        int2(1,0),
        int2(0,1),
        int2(1,1) );
}
image_gather4_c_lz v4, v[12:15], s[4:11], s[12:15]
v_mov_b32 v11, 1
image_gather4_c_lz_o v5, v[11:14], s[4:11], s[12:15]
v_mov_b32 v11, 0x00000100
image_gather4_c_lz_o v7, v[11:14], s[4:11], s[12:15]
v_mov_b32 v11, 0x00000101
image_gather4_c_lz_o v0, v[11:14], s[4:11], s[12:15]
s_waitcnt vmcnt(0)
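The LDS bank conflict rule can be checked with a small model (a sketch: it assumes 32 banks that are each one DWORD wide, looks at a group of 32 threads, and treats accesses to the same address as conflict-free; the function name is ours).

```python
NUM_BANKS = 32  # the LDS has 32 banks; one DWORD per bank is assumed here


def max_bank_conflict(dword_addresses):
    """Worst-case number of distinct addresses that hit the same bank.

    1 means conflict-free; N means the access is serialized N ways.
    """
    per_bank = {}
    for addr in dword_addresses:
        per_bank.setdefault(addr % NUM_BANKS, set()).add(addr)
    return max(len(addrs) for addrs in per_bank.values())


threads = range(32)
print(max_bank_conflict([t * 1 for t in threads]))   # consecutive DWORDs
print(max_bank_conflict([t * 32 for t in threads]))  # 32 DWORDs apart
print(max_bank_conflict([t * 33 for t in threads]))  # stride padded by one DWORD
print(max_bank_conflict([7 for _ in threads]))       # every thread reads the same address
```

In this model a stride of 32 DWORDs sends every thread to the same bank, while padding the stride to 33 spreads the accesses over all banks again.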
Questions?
Stephan.Hodes@amd.com
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

More Related Content

What's hot

Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...AMD Developer Central
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosPT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosAMD Developer Central
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauAMD Developer Central
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...AMD Developer Central
 
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...AMD Developer Central
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...AMD Developer Central
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorAMD Developer Central
 
Siggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsSiggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsTristan Lorach
 
Shader model 5 0 and compute shader
Shader model 5 0 and compute shaderShader model 5 0 and compute shader
Shader model 5 0 and compute shaderzaywalker
 
GS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauGS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauAMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary DemosMM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary DemosAMD Developer Central
 
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...AMD Developer Central
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...AMD Developer Central
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbr Skip
 

What's hot (20)

Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosPT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill BilodeauGS-4108, Direct Compute in Gaming, by Bill Bilodeau
GS-4108, Direct Compute in Gaming, by Bill Bilodeau
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
 
Siggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsSiggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentials
 
Shader model 5 0 and compute shader
Shader model 5 0 and compute shaderShader model 5 0 and compute shader
Shader model 5 0 and compute shader
 
GS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-BilodeauGS-4147, TressFX 2.0, by Bill-Bilodeau
GS-4147, TressFX 2.0, by Bill-Bilodeau
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
 
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary DemosMM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
 
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tb
 

Viewers also liked

GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahAMD Developer Central
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...Edge AI and Vision Alliance
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD
 

Viewers also liked (12)

GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016Optimizing the Graphics Pipeline with Compute, GDC 2016
Optimizing the Graphics Pipeline with Compute, GDC 2016
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
 

Similar to Gcn performance ftw by stephan hodes

Anatomy of ROCgdb presentation at gcc cauldron 2022
Anatomy of ROCgdb presentation at gcc cauldron 2022Anatomy of ROCgdb presentation at gcc cauldron 2022
Anatomy of ROCgdb presentation at gcc cauldron 2022ssuser866937
 
Do Theoretical Flo Ps Matter For Real Application’S Performance Kaust 2012
Do Theoretical Flo Ps Matter For Real Application’S Performance Kaust 2012Do Theoretical Flo Ps Matter For Real Application’S Performance Kaust 2012
Do Theoretical Flo Ps Matter For Real Application’S Performance Kaust 2012Joshua Mora
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio Owen Wu
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeOfer Rosenberg
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance AMD
 
new_age_graphics_android_x86
new_age_graphics_android_x86new_age_graphics_android_x86
new_age_graphics_android_x86Droidcon Berlin
 
FPGA on the Cloud
FPGA on the Cloud FPGA on the Cloud
FPGA on the Cloud jtsagata
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028ssuser5b12d1
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAPiyush Mittal
 
[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile Studio[TGDF 2019] Mali GPU Architecture and Mobile Studio
[TGDF 2019] Mali GPU Architecture and Mobile StudioOwen Wu
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Experiences with Power 9 at A*STAR CRC
Experiences with Power 9 at A*STAR CRCExperiences with Power 9 at A*STAR CRC
Experiences with Power 9 at A*STAR CRCGanesan Narayanasamy
 
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...AMD Developer Central
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14AMD Developer Central
 

GCN Performance FTW by Stephan Hodes

  • 1. GCN PERFORMANCE „FTW“
    STEPHAN HODES, DEVELOPER TECHNOLOGY ENGINEER, AMD
    AMD AND MICROSOFT DEVELOPER DAY, JUNE 2014, STOCKHOLM
  • 2. AGENDA
    ‒ GCN architecture explained
    ‒ Top 10: GCN Performance Advice
    ‒ Questions
    | GCN PERFORMANCE „FTW“ | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM
  • 3. AMD GRAPHICS CORE NEXT
    What is GCN?
    ‒ Non-VLIW architecture
      ‒ Less dependent on manual vectorization of shaders
      ‒ Susceptible to register pressure
    ‒ Architecture used in:
      ‒ AMD discrete GPUs since 2012 (HD7700 and better)
      ‒ Kabini and Kaveri APUs
      ‒ Future AMD hardware
      ‒ New consoles
    GCN hardware is required for Mantle
    ‒ DirectX 12 API support
  • 4. PRODUCT SPECIFICATIONS: AMD RADEON™ R9 290 SERIES
                                R9 290X               R9 290
    Compute Units               44                    40
    Engine Clock                Up to 1 GHz           Up to 950 MHz
    Compute Performance         5.6 TFLOPS            4.9 TFLOPS
    Memory Configuration        4GB GDDR5 / 512-bit   4GB GDDR5 / 512-bit
    Memory Speed                5.0 Gbps              5.0 Gbps
    AMD TrueAudio Technology    Yes                   Yes
    API Support                 DirectX® 11.2         DirectX® 11.2
                                OpenGL 4.3            OpenGL 4.3
                                Mantle                Mantle
  • 5. GCN COMPUTE UNIT – SPECIFICS
    ‒ Non-VLIW instruction set architecture
    ‒ 4 [16-lane] Vector ALU (SIMD)
      ‒ One wavefront is 64 threads
      ‒ 1 SP (Single-Precision) op: 4 clocks
      ‒ 1 DP (Double-Precision) ADD: 8 clocks
      ‒ 1 DP MUL/FMA & Transcendental: 16 clocks
      ‒ 64KB Vector GPRs
    ‒ 1 fully programmable scalar ALU
      ‒ Shared by all threads of a wavefront
      ‒ Used for flow control, pointer arithmetic, etc.
      ‒ 8KB Scalar GPRs, scalar data cache, etc.
    [Compute unit block diagram: Scheduler; Branch & Message Unit; Scalar Unit; Vector Units (4x SIMD-16); Vector Registers (VGPRs, 4x 64KB); Scalar Registers (SGPRs, 8KB); Texture Filter Units (4); Texture Fetch Load / Store Units (16); Local Data Share (LDS, 64KB); L1 Cache (16KB)]
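The figures on slides 4 and 5 can be cross-checked with a little arithmetic. The sketch below is not from the deck; it assumes that a fused multiply-add counts as 2 FLOPs per lane per clock, which is the usual convention behind quoted peak numbers, and the function names are made up for illustration.

```python
# Constants taken from slide 5.
WAVEFRONT_SIZE = 64   # threads per wavefront
SIMDS_PER_CU = 4      # vector ALUs per compute unit
LANES_PER_SIMD = 16   # lanes per vector ALU


def clocks_per_wavefront_op():
    """Clocks a 16-lane SIMD needs to issue one SP op for all 64 threads."""
    return WAVEFRONT_SIZE // LANES_PER_SIMD


def peak_sp_tflops(compute_units, clock_ghz, flops_per_lane_per_clock=2):
    """Peak single-precision TFLOPS.

    Assumption (not stated on the slides): one FMA per lane per clock,
    counted as 2 FLOPs.
    """
    lanes = compute_units * SIMDS_PER_CU * LANES_PER_SIMD
    return lanes * flops_per_lane_per_clock * clock_ghz / 1000.0


if __name__ == "__main__":
    print(clocks_per_wavefront_op())            # clocks per SP wavefront op
    print(round(peak_sp_tflops(44, 1.0), 1))    # R9 290X
    print(round(peak_sp_tflops(40, 0.95), 1))   # R9 290
```

Under these assumptions the results line up with the "4 clocks" on slide 5 and with the 5.6 and 4.9 TFLOPS entries in the slide 4 table.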
  • 6. GCN COMPUTE UNIT – SPECIFICS
    ‒ Distributed programmable scheduler (up to 2560 threads)
      ‒ Each compute unit can execute instructions from multiple kernels
      ‒ Separate decode/issue for:
        ‒ 1 Vector Arithmetic Logic Unit (ALU)
        ‒ 1 Scalar ALU or Scalar Memory Read or 1 Branch/Message
        ‒ 1 Vector memory access (Read/Write/Atomic)
        ‒ 1 Local Data Share operation (LDS)
        ‒ 1 Export or Global Data Share operation (GDS)
        Plus 1 Special/Internal [no functional unit] (s_nop, s_sleep, s_waitcnt, s_barrier, s_setprio)
    [Compute unit block diagram, as on slide 5]
  • 7. GCN COMPUTE UNIT – SPECIFICS
    ‒ 64KB Local Data Share (LDS)
      ‒ 32 banks, with conflict resolution
      ‒ Bandwidth amplification
    ‒ 16KB read/write L1 vector data cache
    ‒ Texture Units (utilize L1)
      ‒ 16 Load/Store units
      ‒ 4 Filter units
    ‒ 1 Branch & Message Unit
      ‒ Executes branch instructions (as dispatched by Scalar Unit)
    [Compute unit block diagram, as on slide 5]
  • 8. GCN COMPUTE UNIT – LATENCY HIDING
    ‒ Up to 10 Wavefronts/SIMD
      ‒ Used to hide latency
      ‒ Round Robin scheduling
      ‒ Independent kernels
      ‒ Often limited by GPR or LDS usage
    [Figure: timeline (clocks) of Batch 1 to Batch 4, each alternating between Stall and Runnable until Done]
  • 9. GCN COMPUTE UNIT – REGISTER PRESSURE
    ‒ Vector GPRs
      ‒ 64KB / 64 threads / 4 Byte / 10 wavefronts = 25.6 VGPR/thread => Max 24 VGPR per thread
    ‒ Scalar GPRs
      ‒ 8KB / 4 SIMD / 4 Byte / 10 wavefronts = 51.2 SGPR/wavefront => Max 48 SGPR per wavefront
    ‒ LDS
      ‒ 32KB/threadgroup and threadgroup size 64 => 2 wavefronts/CU max.
      ‒ 32KB/threadgroup and threadgroup size 256 => 8 wavefronts/CU max.
      ‒ 16KB/threadgroup and threadgroup size 256 => 16 wavefronts/CU max.
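The arithmetic on slide 9 can be turned into a small occupancy estimator. This is a rough sketch rather than a description of how the hardware or driver allocates resources. The allocation granules (4 VGPRs, 8 SGPRs) are assumptions chosen so that the results match the "Max 24" and "Max 48" figures on the slide, and all function names are hypothetical.

```python
# Resource sizes taken from slides 5 to 9.
MAX_WAVES_PER_SIMD = 10
SIMDS_PER_CU = 4
WAVEFRONT_SIZE = 64
VGPR_BYTES_PER_SIMD = 64 * 1024
SGPR_BYTES_PER_CU = 8 * 1024
LDS_BYTES_PER_CU = 64 * 1024
REGISTER_BYTES = 4


def _round_up(value, granule):
    """Round value up to the next multiple of granule."""
    return -(-value // granule) * granule


def waves_per_simd_from_vgprs(vgprs_per_thread, granule=4):
    """Wavefronts per SIMD allowed by VGPR usage (granule is an assumption)."""
    budget = VGPR_BYTES_PER_SIMD // WAVEFRONT_SIZE // REGISTER_BYTES  # 256
    allocated = _round_up(max(vgprs_per_thread, 1), granule)
    return min(MAX_WAVES_PER_SIMD, budget // allocated)


def waves_per_simd_from_sgprs(sgprs_per_wave, granule=8):
    """Wavefronts per SIMD allowed by SGPR usage (granule is an assumption)."""
    budget = SGPR_BYTES_PER_CU // SIMDS_PER_CU // REGISTER_BYTES  # 512
    allocated = _round_up(max(sgprs_per_wave, 1), granule)
    return min(MAX_WAVES_PER_SIMD, budget // allocated)


def waves_per_cu_from_lds(lds_bytes_per_group, threads_per_group):
    """Wavefronts per CU allowed by LDS usage of each thread group."""
    max_waves = MAX_WAVES_PER_SIMD * SIMDS_PER_CU
    if lds_bytes_per_group == 0:
        return max_waves
    groups = LDS_BYTES_PER_CU // lds_bytes_per_group
    waves_per_group = -(-threads_per_group // WAVEFRONT_SIZE)
    return min(max_waves, groups * waves_per_group)


if __name__ == "__main__":
    for vgprs in (24, 25, 32, 64, 128):
        print(vgprs, "VGPRs ->", waves_per_simd_from_vgprs(vgprs), "waves/SIMD")
    print(waves_per_cu_from_lds(32 * 1024, 64))   # first LDS case on the slide
    print(waves_per_cu_from_lds(32 * 1024, 256))  # second LDS case
    print(waves_per_cu_from_lds(16 * 1024, 256))  # third LDS case
```

The full-occupancy ceiling of 40 wavefronts per CU is also where the "up to 2560 threads" on slide 6 comes from: 4 SIMDs x 10 wavefronts x 64 threads.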
  • 10. GCN SHADER OPTIMIZATION STRATEGIES
    ‒ Try reducing GPR count if you are slightly over a waves-per-SIMD threshold
      ‒ Deep nesting
      ‒ Local array declarations
      ‒ Long-lived temporary variables
    ‒ Reducing GPRs is not always optimal
      ‒ The shader compiler might use GPRs to reduce latency
      ‒ A high number of threads/CU can thrash your caches

        image_load v6, v[35:38], s[4:11]
        v_mov_b32 v3, v35
        image_load v7, v[3:6], s[4:11]
        v_mov_b32 v38, v36
        image_load v8, v[37:40], s[4:11]
        v_mov_b32 v3, v37
        image_load v9, v[3:6], s[4:11]
        s_waitcnt vmcnt(2)
        v_min_f32 v6, v6, v7
        s_waitcnt vmcnt(1)
        v_min_f32 v6, v6, v8
        s_waitcnt vmcnt(0)
        v_min_f32 v40, v6, v9

        image_load v6, v[35:38], s[4:11]
        v_mov_b32 v3, v35
        image_load v7, v[3:6], s[4:11]
        v_mov_b32 v38, v36
        v_mov_b32 v3, v37
        s_waitcnt vmcnt(0)
        v_min_f32 v6, v6, v7
        image_load v7, v[37:40], s[4:11]
        s_waitcnt vmcnt(0)
        v_min_f32 v6, v6, v7
        image_load v7, v[3:6], s[4:11]
        s_waitcnt vmcnt(0)
        v_min_f32 v6, v6, v7

    Always profile your changes!
    http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-tools-sdks/codexl/
    http://developer.amd.com/community/blog/2014/05/16/codexl-game-developers-analyze-hlsl-gcn
  • 11. Top 10 Performance Advice
  • 12. TOP 10 PERFORMANCE ADVICE
    1. Use the power of DirectCompute
    ‒ Thread group size should be a multiple of 64
    ‒ 256 is often a good choice.
    ‒ Don't underestimate the benefits of LDS
    ‒ Use asynchronous compute
    ‒ Don't switch between Compute/Rasterization too frequently
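One way to see why the thread group size should be a multiple of 64: a group is split into whole 64-thread wavefronts, so any remainder leaves lanes idle in the last wavefront. The helper names below are made up for illustration, and the model ignores everything except lane occupancy.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront (slide 5)


def wavefronts_for_group(threads_per_group):
    """Number of wavefronts needed to hold one thread group."""
    return -(-threads_per_group // WAVEFRONT_SIZE)


def lane_utilization(threads_per_group):
    """Fraction of allocated lanes that carry a real thread."""
    waves = wavefronts_for_group(threads_per_group)
    return threads_per_group / (waves * WAVEFRONT_SIZE)


if __name__ == "__main__":
    for size in (32, 64, 96, 256):
        print(size, wavefronts_for_group(size), lane_utilization(size))
```

In this simplified model a group of 96 threads occupies two wavefronts but fills only three quarters of their lanes, while 64 and 256 fill every lane.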
  • 13. TOP 10 PERFORMANCE ADVICE
    2. Don't over-tessellate
    ‒ Small triangles result in poor quad occupancy
    ‒ Use [maxtessfactor(X)] in Hull Shader declaration
    ‒ Recommended value is 15 or less
    ‒ Implement culling in Hull Shader
    ‒ Use Adaptive Tessellation
    ‒ Distance Adaptive
    ‒ Screen Space Adaptive
    ‒ Orientation Adaptive
    ! Especially when rendering shadow maps!
  • 14. TOP 10 PERFORMANCE ADVICE
    3. Keep your pipeline short
    ‒ Avoid large expansion in the Geometry Shader
    ‒ Often a Vertex Shader-only solution can replace Geometry Shader usage
    ‒ Bokeh expansion
    ‒ Point sprites
    ‒ Disable tessellation pipeline if unused
    4. Pack shader stage output
    ‒ Limit Vertex and Domain Shader output size to 4 float4/int4 attributes for best performance.

        // Suboptimal
        struct PS_INPUT
        {
            float3 vPosition;
            float3 vNormal;
            float2 vTexcoord1;
            float2 vTexcoord2;
            float2 vTexcoord3;
        };

        // Good
        struct PS_INPUT
        {
            float4 vPositionTexcoord1U;
            float4 vNormalTexcoord1V;
            float4 vTexcoords23;
        };
  • 15. TOP 10 PERFORMANCE ADVICE
    5. Update your data using Map/Unmap
    ‒ Avoid MAP_WRITE_DISCARD
    ‒ Prefer MAP_WRITE_NO_OVERWRITE
    ‒ Avoid UpdateSubresource
    ‒ Prefer Map and/or CopyResource instead
    ‒ UpdateSubresource is OK for small (<=4KB) updates
    ‒ CopyResource introduces GPU stalls
    ‒ Don't use the updated resource immediately
    ‒ Using data without first copying it to local memory can sometimes improve performance
  • 16. TOP 10 PERFORMANCE ADVICE
    6. Use flow control with care
    ‒ Flow control has little overhead
    ‒ Skipping data fetches usually is good
    ‒ Avoid non-coherent codepaths within a wavefront
    ‒ Watch out for GPR pressure caused by loops and deeply nested branches

        // Branching code example
        float fn0(float a, float b)
        {
            if (a > b)
                return ((a - b) * a);
            else
                return ((b - a) * b);
        }

        v_cmp_gt_f32 r0,r1          //a > b, establish VCC
        s_mov_b64 s0,exec           //Save current exec mask
        s_and_b64 exec,vcc,exec     //Do "if"
        s_cbranch_vccz label0       //Branch if all lanes fail
        v_sub_f32 r2,r0,r1          //result = a - b
        v_mul_f32 r2,r2,r0          //result = result * a
        label0:
        s_andn2_b64 exec,s0,exec    //Do "else" (s0 & !exec)
        s_cbranch_execz label1      //Branch if all lanes fail
        v_sub_f32 r2,r1,r0          //result = b - a
        v_mul_f32 r2,r2,r1          //result = result * b
        label1:
        s_mov_b64 exec,s0           //Restore exec mask
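The exec-mask sequence on slide 16 can be mimicked in a few lines to show why non-coherent code paths cost more. This is a simplified model written for illustration, not an emulator: `run_wavefront` is a hypothetical name, lanes are plain list entries, and the cost is reduced to "which sides of the branch were issued".

```python
def run_wavefront(a, b):
    """Model fn0 from slide 16 for one wavefront.

    a and b hold one value per lane. Returns the per-lane results and the
    list of branch sides that had to be issued for the whole wavefront.
    """
    lanes = range(len(a))
    exec_mask = [True] * len(a)
    result = [0.0] * len(a)
    executed = []

    vcc = [a[i] > b[i] for i in lanes]                    # v_cmp_gt_f32
    saved = list(exec_mask)                               # s_mov_b64 s0, exec
    exec_mask = [vcc[i] and exec_mask[i] for i in lanes]  # s_and_b64 exec, vcc, exec
    if any(vcc):                                          # s_cbranch_vccz label0
        executed.append("if")
        for i in lanes:
            if exec_mask[i]:
                result[i] = (a[i] - b[i]) * a[i]

    # s_andn2_b64 exec, s0, exec: lanes that were active but failed the test
    exec_mask = [saved[i] and not exec_mask[i] for i in lanes]
    if any(exec_mask):                                    # s_cbranch_execz label1
        executed.append("else")
        for i in lanes:
            if exec_mask[i]:
                result[i] = (b[i] - a[i]) * b[i]

    exec_mask = saved                                     # s_mov_b64 exec, s0
    return result, executed


if __name__ == "__main__":
    print(run_wavefront([2.0, 2.0], [1.0, 1.0]))  # coherent: only "if" issued
    print(run_wavefront([2.0, 1.0], [1.0, 2.0]))  # divergent: both sides issued
```

When every lane agrees, one side is skipped entirely by the scalar branch. As soon as a single lane disagrees, the wavefront issues both sides with part of its lanes masked off.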
  • 17. TOP 10 PERFORMANCE ADVICE
    7. Pack your G-Buffer using RGBA16_UINT
    ‒ Fetches from RGBA16 are full rate (without filtering)
    ‒ Bilinear fetches to RGBA16 are half rate
    ‒ Exports to RGBA16_INT are full rate (without blending)
    Caution: Blended exports to RGBA16_INT are ¼ speed
    8. Depth buffer: don't render after read
    ‒ Binding a depth buffer as a texture will decompress it, which makes subsequent Z ops more expensive.
    ‒ Critical for shadow map atlas rendering!
    ‒ Consider exporting depth to G-Buffer
  • 18. TOP 10 PERFORMANCE ADVICE
    9. Batch, Batch, Batch!
    ‒ Add support for geometry instancing
    ‒ Pool & batch your updates
    ‒ Less important with Mantle/DirectX12
    ‒ Reduces draw call overhead
    ‒ Allows better scheduling
    10. (DX11) Prefer engine threading over Deferred Contexts
    ‒ Deferred contexts are a software feature
    ‒ … or move to Mantle/DirectX12
  • 19. TOP 10 PERFORMANCE ADVICE: Bonus Advice
    ‒ Avoid LDS bank conflicts
      ‒ Accessing LDS from different threads with addresses that are 32 DWORDs apart will cause bank conflicts
      ‒ Unless it's the same address
    ‒ Don't use gather with offsets
      ‒ This will result in 4 image_gather4 instructions

        float4 PsExample( PsInput Input ) : SV_Target
        {
            return tex.GatherCmpRed( g_SamplePointCmp, Input.vTex, Input.depth );
        }

        image_gather4_c_lz v0, v[2:5], s[4:11], s[12:15]
        s_waitcnt vmcnt(0)

        float4 PsExample( PsInput Input ) : SV_Target
        {
            return tex.GatherCmpRed( g_SamplePointCmp, Input.vTex, Input.depth,
                                     int2(0,0), int2(1,0), int2(0,1), int2(1,1) );
        }

        image_gather4_c_lz v4, v[12:15], s[4:11], s[12:15]
        v_mov_b32 v11, 1
        image_gather4_c_lz_o v5, v[11:14], s[4:11], s[12:15]
        v_mov_b32 v11, 0x00000100
        image_gather4_c_lz_o v7, v[11:14], s[4:11], s[12:15]
        v_mov_b32 v11, 0x00000101
        image_gather4_c_lz_o v0, v[11:14], s[4:11], s[12:15]
        s_waitcnt vmcnt(0)
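The LDS bank-conflict rule on slide 19 can be checked on paper with a simple model. The sketch below assumes that the bank index is the DWORD address modulo 32, which is consistent with the "32 DWORDs apart" wording but is a simplification, and `worst_bank_conflict` is a hypothetical helper name.

```python
from collections import defaultdict

LDS_BANKS = 32  # from slide 7


def worst_bank_conflict(dword_addresses):
    """Largest number of distinct addresses that map to the same bank.

    A value of 1 means conflict-free in this model. Identical addresses are
    counted once, matching the "unless it's the same address" exception.
    """
    per_bank = defaultdict(set)
    for addr in dword_addresses:
        per_bank[addr % LDS_BANKS].add(addr)
    return max((len(addrs) for addrs in per_bank.values()), default=0)


if __name__ == "__main__":
    threads = range(32)
    print(worst_bank_conflict([t for t in threads]))       # consecutive DWORDs
    print(worst_bank_conflict([t * 32 for t in threads]))  # stride of 32 DWORDs
    print(worst_bank_conflict([t * 33 for t in threads]))  # stride padded to 33
    print(worst_bank_conflict([7 for _ in threads]))       # same address
```

In this model a stride of 32 DWORDs sends every thread to the same bank, while padding the stride to 33 spreads the accesses over all 32 banks again. Padding is a common workaround, but it is not something the slides prescribe.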
  • 20. Questions?
    Stephan.Hodes@amd.com
  • 21. DISCLAIMER & ATTRIBUTION
    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
    The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
    AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
    AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
    ATTRIBUTION
    © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.