SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
Jefferson Amstutz, Dmitry Babokin, Pete Brubaker
Contributions by Jon Kennedy, Jeff Rous, Arina Neshlyaeva
Advanced SIMD Programming with the
Intel® ISPC Compiler
https://ispc.github.io/
Epic Chaos Demo - Image courtesy of Epic Games® | Intel® OSPRay
No license (express or implied, by estoppel or otherwise) to any intellectual
property rights is granted by this document.
Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. No computer system can be absolutely secure. Check with your system
manufacturer or retailer or learn more at www.intel.com
Intel disclaims all express and implied warranties, including without
limitation, the implied warranties of merchantability, fitness for a particular
purpose, and non-infringement, as well as any warranty arising from
course of performance, course of dealing, or usage in trade.
Optimization Notice: Intel's compilers may or may not optimize to the same degree
for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction
sets and other optimizations. Intel does not guarantee the availability, functionality,
or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with
Intel microprocessors. Certain optimizations not specific to Intel microarchitecture
are reserved for Intel microprocessors. Please refer to the applicable product User
and Reference Guides for more information regarding the specific instruction sets
covered by this notice.
You may not use or facilitate the use of this document in connection with
any infringement or other legal analysis concerning Intel products
described herein. You agree to grant Intel a non-exclusive, royalty-free
license to any patent claim thereafter drafted which includes subject
matter disclosed herein.
Intel, Core and the Intel logo are trademarks of Intel Corporation in the U.S. and/or
other countries.
*Other names and brands may be claimed as the property of others
© Intel Corporation.
Legal Notices And Disclaimers
ISPC : A Brief Recap
Intel® OSPRay : Disney’s Moana Island Scene: over 15 billion instanced primitives rendered interactively
• Exploiting Parallelism is essential for obtaining peak
performance on modern computing hardware
• Task Parallelism : Multithreading - Utilize all the cores
• SIMD Parallelism : SIMD Programming - Utilize all the vector
units
• Learning intrinsics is time consuming, and not always accessible
to every programmer.
• Make it easier to get all the FLOPs without being a ninja
programmer
• Reduce the development cost by working with a high level
language
Why ISPC?
ISPC : A Brief Recap
• The Intel SPMD Program Compiler
• SPMD == Single Program, Multiple Data programming model
• It’s a compiler and a language for writing vector (SIMD) code.
• Open-source, LLVM-based language and compiler for many SIMD architectures.
• Generates high performance vector code targeting many vector ISAs.
• Cross platform support (Windows/Linux/MacOS/PS4/Xbox/ARM AARCH64)
• The language is C based
• Simple to use and easy to integrate with existing codebase.
• ISPC is not an “autovectorizing” compiler!
• Vectors are built into the type system, not discovered
• The programmer explicitly specifies vector or scalar variables
What is ISPC?
ISPC : A Brief Recap
• C based, so it’s easy to read and
understand
• Code looks sequential, but executes
in parallel
• Easily mixes scalar and vector
computation
• Explicit vectorization using two new
keywords, uniform and varying
• Vector iteration via foreach keyword
https://ispc.godbolt.org/z/sOpQ8Z
What does the language look like?
It is basically shader programming for the CPU!
• The ISPC compiler produces everything required for very simple
integration into application code.
• C/C++ header file
• Contains the API/function call for each kernel you have written
• Contains any data structures defined in your ISPC kernel and
required by the application code.
• Object files to link against
• No bulky runtime or verbose API
ISPC : A Brief Recap
Easy integration
• Programmers no longer need to know the ISA to write good vector code.
• More accessible to programmers who aren’t familiar with SIMD intrinsics.
• More programmers are able to fully utilize the CPU in different areas of application
development.
• Reduced development cost
• It’s easier to develop and maintain. Simple integration. It looks like scalar code.
• Increased optimization reach
• Supporting a new ISA is as easy as changing a command line option and recompiling.
• Increased performance over scalar code
• SSE : ~3-4x; AVX2 : ~5-6x
• YMMV ☺
ISPC : A Brief Recap
Why is this good?
Advanced ISPC
Vector Loops
Epic Chaos Demo - Image courtesy of Epic Games®
Vector Loops
• foreach is a convenience mechanism:
• It is a simd_for loop that iterates in steps of the SIMD width
• Unmasked main body for when all SIMD lanes are enabled
• Masked tail body for when some SIMD lanes are disabled
• foreach can be N-dimensional, where each dimension's index is a varying
• For loop
• A for loop with a varying index will use
masking in the loop body
• Safe, but with a slight cost
• A for loop with a uniform index will have no
masking
• The user will need to add a tail body
https://ispc.godbolt.org/z/r1eflk
foreach(…) vs for(…)
Vector Loops
foreach example
https://ispc.godbolt.org/z/00eIcH
Unmasked Main Body
Masked Tail Body
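The split into an unmasked main body and a masked tail body can be sketched in plain C. This is only a scalar model of the idea, not ISPC output: `LANES` stands in for the SIMD width that ISPC derives from `--target`, and `scale_foreach_model` is a hypothetical name chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical SIMD width; ISPC derives the real one from --target. */
#define LANES 4

/* Scalar model of `foreach (i = 0 ... n) out[i] = in[i] * k;`.
   Returns how many lanes were masked off in the tail. */
size_t scale_foreach_model(const float *in, float *out, size_t n, float k)
{
    size_t i = 0;

    /* Unmasked main body: every lane of each LANES-wide chunk is active. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane)
            out[i + lane] = in[i + lane] * k;
    }

    /* Masked tail body: one partial chunk, lanes at or past n are disabled. */
    size_t masked_off = 0;
    if (i < n) {
        for (size_t lane = 0; lane < LANES; ++lane) {
            int active = (i + lane) < n;   /* per-lane execution mask bit */
            if (active)
                out[i + lane] = in[i + lane] * k;
            else
                ++masked_off;
        }
    }
    return masked_off;
}
```

With six elements and four lanes, the main body handles one full chunk and the tail runs once with two lanes disabled.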
Vector Loops
• Serializes over each active SIMD lane
• Many Uses :
• Atomic operations
• Custom reductions
• Calls to uniform functions
• …
https://ispc.godbolt.org/z/i18Lux
Unreal Engine 4.23, Chaos Physics ISPC Source
foreach_active
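As a rough scalar model of what foreach_active does, the C sketch below walks the execution mask and runs the body once per active lane, here to build a custom reduction. `LANES`, the bitmask representation and `sum_active_lanes` are assumptions made for this illustration and are not part of ISPC.

```c
#include <stdint.h>

/* Hypothetical SIMD width used for this model. */
#define LANES 4

/* Scalar model of foreach_active: visit each active lane one at a time.
   Bit `lane` of `mask` is set when that lane is active. */
float sum_active_lanes(const float value[LANES], uint32_t mask)
{
    float sum = 0.0f;
    for (int lane = 0; lane < LANES; ++lane) {
        if (mask & (1u << lane))   /* skip lanes that are switched off */
            sum += value[lane];    /* body runs serially, once per lane */
    }
    return sum;
}
```

Because the body sees one lane at a time, it can safely perform work that has no vector form, such as an atomic update or a call to a uniform function.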
Vector Loops
• Loop over each unique value in a varying only once
• Execution mask enabled for all SIMD lanes with the same value
https://ispc.godbolt.org/z/r49y7i
foreach_unique
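The two bullets above can be modelled in scalar C as follows. This is a sketch under stated assumptions: all lanes are active on entry, `LANES` is a made-up width, and `foreach_unique_model` with its output arrays is a hypothetical helper, not an ISPC API.

```c
#include <stdint.h>

/* Hypothetical SIMD width used for this model. */
#define LANES 4

/* Scalar model of foreach_unique, assuming all lanes are active on entry.
   For each distinct value across the lanes, records the value once together
   with the mask of lanes holding it. Returns the number of unique values. */
int foreach_unique_model(const int value[LANES],
                         int unique_out[LANES], uint32_t mask_out[LANES])
{
    uint32_t done = 0;   /* lanes already covered by an earlier iteration */
    int count = 0;

    for (int lane = 0; lane < LANES; ++lane) {
        if (done & (1u << lane))
            continue;
        /* Build the execution mask: every lane with the same value. */
        uint32_t mask = 0;
        for (int other = lane; other < LANES; ++other) {
            if (value[other] == value[lane])
                mask |= 1u << other;
        }
        unique_out[count] = value[lane];
        mask_out[count] = mask;
        done |= mask;
        ++count;
    }
    return count;
}
```

For lane values {7, 3, 7, 3} the loop body would run twice: once for 7 with lanes 0 and 2 enabled, and once for 3 with lanes 1 and 3 enabled.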
Vector Loops
Naïve ports to uniform code paths can miss opportunities
Axis of parallelization
Try looking for a new axis of parallelization
https://ispc.godbolt.org/z/GF7myA
Scalar
Vector
Vector Loops
• ISPC supports multiple axes of
parallelization within a kernel
• HLSL/GLSL/CL only support 1
• User controlled
• Provides optimization opportunities
https://github.com/ispc/ispc/blob/master/examples/sgemm/SGEMM_kernels.ispc
Multiple axes of parallelisation
Advanced ISPC
Structures and Pointers
Intel® OSPRay : Gramophone rendered in Pixar’s usdview
Structures and Pointers
struct vec3f {
float x, y, z;
};
struct Ray {
vec3f origin;
vec3f direction;
float tnear;
float tfar;
};
Uniform Ray
uniform Ray r;
Varying Ray
varying Ray r;
Uniform vs. Varying structures
Structures and Pointers
struct vec3f {
float x, y, z;
};
struct PerspRay {
uniform vec3f origin;
vec3f direction;
float tnear;
float tfar;
};
Uniform PerspRay
uniform PerspRay r;
Varying PerspRay
varying PerspRay r;
Uniform vs. Varying structures
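One way to picture the difference between the two structs is to write out, in plain C, roughly what the varying versions hold. This is an illustrative model only: `LANES` is a hypothetical SIMD width, the `_v` type names are invented here, and the layout ISPC actually generates may differ (for example in alignment).

```c
#include <stddef.h>

/* Hypothetical SIMD width used for this model. */
#define LANES 4

typedef struct { float x, y, z; } vec3f;                         /* uniform vec3f */
typedef struct { float x[LANES], y[LANES], z[LANES]; } vec3f_v;  /* varying vec3f */

/* Model of `varying Ray`: every member widens to one value per lane. */
typedef struct {
    vec3f_v origin;
    vec3f_v direction;
    float   tnear[LANES];
    float   tfar[LANES];
} Ray_v;

/* Model of `varying PerspRay`: origin was declared uniform inside the
   struct, so it stays a single scalar vec3f shared by all lanes. */
typedef struct {
    vec3f   origin;
    vec3f_v direction;
    float   tnear[LANES];
    float   tfar[LANES];
} PerspRay_v;
```

In this model the shared origin saves 3 * (LANES - 1) floats per ray packet, which is the point of marking a member uniform when all lanes share it, as with rays leaving a single perspective camera.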
• Pointers are complex
• The variability is specified like ‘const’ in C/C++
uniform float * varying vPtr;
• Variability: 2 parts
• The pointer itself
• Single pointer? Different pointer per SIMD lane?
• Default: varying
• The item pointed-to
• Scalar value? Vector value?
• Default: uniform
• Be explicit and specify the variability so it’s correct and clear to the reader
Structures and Pointers
ISPC pointers
Structures and Pointers
Declaration                         Pointer                     Data
uniform float * uniform uPtr2u;     one pointer              -> one float
varying float * uniform uPtr2v;     one pointer              -> one vector of floats
uniform float * varying vPtr2u;     one pointer per SIMD lane -> one float per pointer
varying float * varying vPtr2v;     one pointer per SIMD lane -> one vector of floats per pointer
ISPC pointers
Advanced ISPC
Memory Access
Epic Chaos Demo - Image courtesy of Epic Games®
Memory Access
struct vec3f
{
float x;
float y;
float z;
};
Memory Layout:
uniform vec3f uPos: x y z (an array of uniforms: x y z x y z …)
varying vec3f vPos: x x x x y y y y …
Uniform vs. Varying data layout
Memory Access
Complex data layout
struct Ray {
vec3f origin;
vec3f direction;
float tnear;
float tfar;
};
uniform Ray uRay: origin { x y z }, direction { x y z }, tnear, tfar, one scalar per field
varying Ray vRay: origin { x y z }, direction { x y z }, tnear, tfar, one value per SIMD lane for each field
Memory Access
• ISPC will automatically transpose your array of structures (AoS) data to structures of
arrays (SoA) and back
• Useful for block copying uniform structs into varyings
• It will just work!
• But there may be faster alternatives?
Data transposition
https://ispc.godbolt.org/z/4_p44L
Memory Access
• Vector reads/writes to non-contiguous
memory
• AVX2 onwards supports an optimised
gather instruction
• AVX512 supports an optimised scatter
instruction
• ISPC will use these if available
• ISPC will emit performance warnings when it
finds gather/scatters
#pragma ignore warning(perf)
• Gather performance has improved over
successive generations
• But there can be faster alternatives,
especially if there is cacheline locality
• aos_to_soa() helpers
• Good for packed float3/float4 data types
• shuffle()
• Load a vector register from memory and
swizzle the data
• You will need to experiment on your dataset.
• The fastest form of gather is no gather –
read contiguous memory where possible!
Scatter/Gather
Memory Access
• It's best to use SoA or AoSoA layouts with
ISPC
• Re-arranging data is not always easy
• Transposing the input data can be
faster than using gather/scatter
instructions.
• When to transpose?
• If the algorithm is cheap, it's best to
convert the data into a temporary
buffer, do the work then convert back.
• Otherwise transpose live data on the
way in/out of the kernel.
AOS to SOA
Transpose
Array of Structures
(AoS)
Structure of Arrays
(SoA)
Hybrid Array of Structures of Arrays
(AoSoA)
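To make the layouts concrete, here is a minimal C sketch of transposing AoS data into AoSoA blocks and back. It is a scalar illustration of the data movement only, assuming a hypothetical block width `LANES`; the type and function names are invented for this example and are not the ISPC stdlib helpers.

```c
#include <stddef.h>

/* Hypothetical block width, matching an assumed SIMD width. */
#define LANES 4

typedef struct { float x, y, z; } vec3f;                           /* one AoS element */
typedef struct { float x[LANES], y[LANES], z[LANES]; } vec3f_blk;  /* one AoSoA block */

/* AoS -> AoSoA. `blocks` must hold (count + LANES - 1) / LANES entries;
   unused lanes of the last block are zero-filled. */
void aos_to_aosoa(const vec3f *aos, size_t count, vec3f_blk *blocks)
{
    size_t nblocks = (count + LANES - 1) / LANES;
    for (size_t b = 0; b < nblocks; ++b) {
        for (size_t lane = 0; lane < LANES; ++lane) {
            size_t i = b * LANES + lane;
            int valid = i < count;
            blocks[b].x[lane] = valid ? aos[i].x : 0.0f;
            blocks[b].y[lane] = valid ? aos[i].y : 0.0f;
            blocks[b].z[lane] = valid ? aos[i].z : 0.0f;
        }
    }
}

/* AoSoA -> AoS, the inverse transpose for the first `count` elements. */
void aosoa_to_aos(const vec3f_blk *blocks, size_t count, vec3f *aos)
{
    for (size_t i = 0; i < count; ++i) {
        const vec3f_blk *blk = &blocks[i / LANES];
        size_t lane = i % LANES;
        aos[i].x = blk->x[lane];
        aos[i].y = blk->y[lane];
        aos[i].z = blk->z[lane];
    }
}
```

After the transpose, each `x`, `y` or `z` array in a block is contiguous, so a kernel can read it with a plain vector load instead of a gather.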
Memory Access
• There are stdlib functions,
aos_to_soa3/4.
• They assume arrays of
vec3/vec4 input data.
• What about strided data?
• You can write your own
transpose functions using
the stdlib.
• Use loads, shuffles, inserts, etc.
AOS to SOA
Vector Load Vector Load Vector Load
Vector Store Vector Store Vector Store
Shuffle
Shuffle
Memory Access
AOS to SOA example
https://ispc.godbolt.org/z/NwLihI
Unreal Engine 4.23, Chaos Physics ISPC Source
DRAM
Memory Access
• Allows writes to memory to occur bypassing the cache
• Avoids cacheline reads and cache pollution
• Useful when bandwidth limited
• Not always faster than normal stores
• Never read the memory straight after the write
• It won’t be in cache and will be slow…
• Write full cachelines to avoid partial writes
• Used for techniques such as :
• Texture writes
• Geometry transformations
• Compression
• …
• Experiment with your dataset.
• What about streaming loads?
• Unless the memory was specifically allocated with the
write combining flag, they won’t do anything
Streaming stores
[Figure: a normal write passing through the cache hierarchy versus a streaming store passing through the write combine buffer]
Memory Access
Streaming stores example
https://ispc.godbolt.org/z/bKOJ1m
Memory Access
• Loads and stores can be aligned or unaligned
(default)
• There are specific instructions for each type
• Historically this had a performance impact
• Unaligned loads/stores may straddle cachelines
• Newer Intel architectures have reduced/removed
this impact
• Alignment needs to be the register width
• SSE: 16 bytes, AVX2: 32 bytes, AVX512: 64 bytes
• Simple to enable in ISPC
• --opt=force-aligned-memory
• Try it – YMMV!
Aligned memory
[Figure: an unaligned load straddling two cachelines, and an aligned load contained within one cacheline]
Advanced ISPC
Control Flow
Intel® OSPRay : Richtmyer–Meshkov volume shown with shadows and ambient occlusion
Control Flow
Divergent control flow
Control flow divergence can be costly
Consider this: a divergent branch causes both expensive operations to be executed
Now consider this: a uniform branch causes a single expensive operation to be executed
[Figure: execution mask for each case]
https://ispc.godbolt.org/z/XM0MEw
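The cost difference can be illustrated with a small C model that counts how many sides of an `if/else` a masked SIMD machine has to execute. It assumes all lanes are active on entry and an 8-wide machine; `LANES`, `ALL_ON` and `branch_bodies_executed` are names made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical SIMD width used for this model. */
#define LANES 8
#define ALL_ON ((1u << LANES) - 1u)

/* For `if (cond) A(); else B();` with all lanes active on entry, returns how
   many of the two bodies must be executed. Bit `lane` of `cond_mask` is set
   when that lane's condition is true. */
int branch_bodies_executed(uint32_t cond_mask)
{
    cond_mask &= ALL_ON;
    int run_then = cond_mask != 0;        /* at least one lane takes A() */
    int run_else = cond_mask != ALL_ON;   /* at least one lane takes B() */
    return run_then + run_else;
}
```

When every lane agrees, only one body runs; as soon as a single lane disagrees, both bodies are executed under complementary masks.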
Control Flow
Unmasked Functions
• Avoids masked operations
• Useful if you want to use a different execution
mask
Unmasked Blocks
• An optimisation
• Avoids masked operations
• Useful when you know there are no side
effects
Unmasked
https://ispc.godbolt.org/z/i18Lux
Advanced ISPC
Interfacing Tricks
Epic Chaos Demo - Image courtesy of Epic Games®
Interfacing Tricks
• Input data is generally an array of
uniforms
• These can be copied directly to varyings
by using a varying index
• Such as programIndex
• They can be cast to a varying pointer and
dereferenced
• Applications can pass in ‘fake’ varyings
which still generates SIMD code
Mapping input data to ispc varyings
https://ispc.godbolt.org/z/-hbfO1
Interfacing Tricks
• Just like normal C/C++ code, there are times when you need to call external code
• ISPC supports this for any external function using ‘C’ linkage
Calling back to C
https://ispc.godbolt.org/z/P5XcuT
Advanced ISPC
Choosing the Right Target
Epic Chaos Demo - Image courtesy of Epic Games®
Choosing the Right Target
• ISPC offers limited decoupling of the SIMD width
from the ISA
• “Double Pumped”
• Vector instructions executed twice to
emulate double width registers
• Can be effective at hiding latency
• sse4-i32x8, avx2-i32x16, etc
• “Half Pumped”
• Vector instructions executed with
narrower SIMD width registers
• Use a richer ISA for performance
gains
• avx512skx-i32x8
• Avoids platform specific AVX512
power scaling
• As simple as changing the command line
• --target=...
• Experiment to find the best targets for your
workload
Asymmetrical SIMD register width and target SIMD ISA
https://ispc.godbolt.org/z/4EhA2A
Choosing the Right Target
ISPC supports compiling to multiple targets
at once
• Currently, only 1 target per ISA
• Auto dispatch will choose the highest
supported compiled target that a platform
supports, at runtime
• Manual dispatch will be coming in a future
release…
Compile for all of the main targets
• SSE4, AVX2, AVX512
• This will allow the best performing ISA to run
on your system
• Unreal Engine and OSPRay compile for all of
the main targets by default.
Auto dispatch : multi-target compilation
--target=sse4-i32x4,avx2-i32x8,avx512skx-i32x16
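The selection rule behind auto dispatch (pick the highest compiled target that the platform supports) can be sketched in C as below. This is not ISPC's generated dispatch code: the `isa_level` enum, its ordering and `pick_target` are hypothetical, and a real dispatcher would query the CPU rather than take `cpu_max` as a parameter.

```c
#include <stddef.h>

/* Hypothetical ISA levels, ordered from least to most capable. */
typedef enum { ISA_SSE4 = 0, ISA_AVX2 = 1, ISA_AVX512SKX = 2 } isa_level;

/* Pick the most capable compiled target that the CPU can run.
   `compiled` lists the targets built into the binary, `cpu_max` is the
   highest ISA level the CPU supports. Returns -1 if nothing is runnable. */
int pick_target(const isa_level *compiled, size_t n, isa_level cpu_max)
{
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (compiled[i] <= cpu_max && (int)compiled[i] > best)
            best = (int)compiled[i];
    }
    return best;
}
```

Compiling for all of the main targets means this choice never falls back further than necessary, which is why the slide recommends SSE4, AVX2 and AVX512 together.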
Advanced ISPC
ISPC StdLib
Intel® OSPRay : OSPRay’s path tracer supports physically-based materials and a common principled material
ISPC STDLIB
Use ISPC stdlib
ISPC provides a rich stdlib of operations:
• Logical operators
• Bit ops
• Math
• Clamping and Saturated Arithmetic
• Transcendental Operations
• RNG (Not the fastest!)
• Mask/Cross-lane Operations
• Reductions
• And that’s not all!
https://github.com/ispc/ispc/blob/master/stdlib.ispc
Advanced ISPC
Floating Point Determinism
Epic Chaos Demo - Image courtesy of Epic Games®
To increase floating point precision/determinism :
• Don’t use `--opt=fast-math`
• Do use `--opt=disable-fma`
• But, there will be a performance penalty
Floating Point Determinism
A Quick note!
Advanced ISPC
Debugging and Optimizing ISPC Kernels
Epic Chaos Demo - Image courtesy of Epic Games®
• Compile ISPC kernels with -g
• Visual Studio, gdb, lldb etc.
work as expected
• View registers, uniform and
varying data
• Visual Studio Code ISPC
Plugin available
• Syntax highlights, Auto-
complete stdlib, Real-time
validation
Debugging ISPC Kernels
Debugging
• The best way to check for performance deltas when optimising code is to
benchmark it
• Sometimes the code of interest is too small, so you need a microbenchmark
• A small ISPC kernel run many times, ideally on real data
• Caution as the results may not be representative of the final gains
• ISPC git repo will soon contain a microbenchmark `ispc-bench`
• Based on google benchmark
• Simple to use and augment
• ISPC Dev team are looking for contributions to help improve ISPC
Optimising ISPC kernels
Benchmarking
Optimising ISPC kernels
ISPC is supported by the Compiler Explorer
• Simply copy and paste your kernels into a browser
• Try different command line arguments
• Look for optimization opportunities in the ASM code
• Experiment with all of the example code from this presentation
• Now supports using ispc (trunk)
Godbolt Compiler Explorer
http://ispc.godbolt.org/
Optimising ISPC kernels
• LLVM-MCA provides static code uOp/cycle counts
• Doesn’t accurately report the cost of memory ops, but still useful
Godbolt Compiler Explorer : llvm-mca
https://ispc.godbolt.org/z/etmC_T
Optimising ISPC kernels
• Profile your ispc kernels looking for hotspots
• Compile the kernels with -g for debugging symbols
• ISPC heavily inlines, so use ‘noinline’ to target hotspot functions
VTune
https://software.intel.com/en-us/vtune
Advanced ISPC
ISPC Roadmap
Intel® OSPRay : Disney’s Moana Island Scene: over 15 billion instanced primitives rendered interactively
ISPC Roadmap
ISPC v1.12
• ARM support
• Cross compilation support
(iOS/Android/Switch/Xbox/PS4)
• Noinline keyword
• Performance improvements
ISPC v1.next
• Performance improvements
• Future hardware support
• Manual dispatch
ISPC roadmap
File an issue on github – let us know what you need!
Submit a patch – show us what you need!
Advanced ISPC
ISPC Resources
Intel® OSPRay : OSPRay’s path tracer supports physically-based materials and a common principled material
ISPC Resources
ISPC Home Page
• https://ispc.github.io/ispc.html
ISPC Origins
• https://pharr.org/matt/blog/2018/04/18/ispc-origins.html
ISPC on Intel® Developer Zone
• https://software.intel.com/en-us/search/site/language/en?query=ispc
Visual Studio Code ISPC Plugin
• https://marketplace.visualstudio.com/items?itemName=intel-corporation.ispc
ISPC Compiler Explorer
• https://ispc.godbolt.org/
Intel® Intrinsics Guide
• https://software.intel.com/sites/landingpage/IntrinsicsGuide/
Agner Fog Instruction Tables
• https://www.agner.org/optimize/instruction_tables.pdf
uOps Latency, Throughput and Port Usage Information
• http://uops.info/
ISPC Github
• https://github.com/ispc/ispc/
Intel® OSPRay
• https://www.ospray.org/
Unreal Engine
• https://www.unrealengine.com/en-US/
ISPC Texture Compressor
• https://github.com/GameTechDev/ISPCTextureCompressor
ISPC DX12 nBodies Sample
• https://github.com/GameTechDev/ISPC-DirectX-Graphics-Samples
SPIRV to ISPC Project
• https://github.com/GameTechDev/SPIRV-Cross
ISPC in Unreal Engine Blog Post
• https://software.intel.com/en-us/articles/unreal-engines-new-chaos-physics-system-screams-with-in-depth-intel-cpu-optimizations
ISPC on the web
SIGGRAPH 2019 | LOS ANGLES | 28 JULY - 1 AUGUST
56
• Subtitle Copy Goes Here

Mais conteúdo relacionado

Mais procurados

Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたMITSUNARI Shigeo
 
Sw技術者に送るfpga入門
Sw技術者に送るfpga入門Sw技術者に送るfpga入門
Sw技術者に送るfpga入門直久 住川
 
MIPI DevCon 2016: Implementing MIPI C-PHY
MIPI DevCon 2016: Implementing MIPI C-PHYMIPI DevCon 2016: Implementing MIPI C-PHY
MIPI DevCon 2016: Implementing MIPI C-PHYMIPI Alliance
 
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019 Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019 Unity Technologies
 
Vivado hls勉強会3(axi4 lite slave)
Vivado hls勉強会3(axi4 lite slave)Vivado hls勉強会3(axi4 lite slave)
Vivado hls勉強会3(axi4 lite slave)marsee101
 
組み込み関数(intrinsic)によるSIMD入門
組み込み関数(intrinsic)によるSIMD入門組み込み関数(intrinsic)によるSIMD入門
組み込み関数(intrinsic)によるSIMD入門Norishige Fukushima
 
Zynq + Vivado HLS入門
Zynq + Vivado HLS入門Zynq + Vivado HLS入門
Zynq + Vivado HLS入門narusugimoto
 
Windows internals
Windows internalsWindows internals
Windows internalsPiyush Jain
 
MIPI DevCon 2016: MIPI C-PHY - Introduction From Basic Theory to Practical Im...
MIPI DevCon 2016: MIPI C-PHY - Introduction From Basic Theory to Practical Im...MIPI DevCon 2016: MIPI C-PHY - Introduction From Basic Theory to Practical Im...
MIPI DevCon 2016: MIPI C-PHY - Introduction From Basic Theory to Practical Im...MIPI Alliance
 
Vivado hls勉強会2(レジスタの挿入とpipelineディレクティブ)
Vivado hls勉強会2(レジスタの挿入とpipelineディレクティブ)Vivado hls勉強会2(レジスタの挿入とpipelineディレクティブ)
Vivado hls勉強会2(レジスタの挿入とpipelineディレクティブ)marsee101
 
ACRiウェビナー:小野様ご講演資料
ACRiウェビナー:小野様ご講演資料ACRiウェビナー:小野様ご講演資料
ACRiウェビナー:小野様ご講演資料直久 住川
 
Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出marsee101
 
マルチコアのプログラミング技法 -- OpenCLとWebCL
マルチコアのプログラミング技法 -- OpenCLとWebCLマルチコアのプログラミング技法 -- OpenCLとWebCL
マルチコアのプログラミング技法 -- OpenCLとWebCLmaruyama097
 

Mais procurados (20)

Intro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみたIntro to SVE 富岳のA64FXを触ってみた
Intro to SVE 富岳のA64FXを触ってみた
 
From IA-32 to avx-512
From IA-32 to avx-512From IA-32 to avx-512
From IA-32 to avx-512
 
llvm入門
llvm入門llvm入門
llvm入門
 
Sw技術者に送るfpga入門
Sw技術者に送るfpga入門Sw技術者に送るfpga入門
Sw技術者に送るfpga入門
 
MIPI DevCon 2016: Implementing MIPI C-PHY
MIPI DevCon 2016: Implementing MIPI C-PHYMIPI DevCon 2016: Implementing MIPI C-PHY
MIPI DevCon 2016: Implementing MIPI C-PHY
 
4th Semester M Tech: Computer Science and Engineering (Jun-2016) Question Papers
4th Semester M Tech: Computer Science and Engineering (Jun-2016) Question Papers4th Semester M Tech: Computer Science and Engineering (Jun-2016) Question Papers
4th Semester M Tech: Computer Science and Engineering (Jun-2016) Question Papers
 
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019 Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
Intrinsics: Low-level engine development with Burst - Unite Copenhagen 2019
 
Verilog tutorial
Verilog tutorialVerilog tutorial
Verilog tutorial
 
Vivado hls勉強会3(axi4 lite slave)
Vivado hls勉強会3(axi4 lite slave)Vivado hls勉強会3(axi4 lite slave)
Vivado hls勉強会3(axi4 lite slave)
 
QEMU in Cross building
QEMU in Cross buildingQEMU in Cross building
QEMU in Cross building
 
組み込み関数(intrinsic)によるSIMD入門
組み込み関数(intrinsic)によるSIMD入門組み込み関数(intrinsic)によるSIMD入門
組み込み関数(intrinsic)によるSIMD入門
 
Zynq + Vivado HLS入門
Zynq + Vivado HLS入門Zynq + Vivado HLS入門
Zynq + Vivado HLS入門
 
Windows internals
Windows internalsWindows internals
Windows internals
 
from Source to Binary: How GNU Toolchain Works
from Source to Binary: How GNU Toolchain Worksfrom Source to Binary: How GNU Toolchain Works
from Source to Binary: How GNU Toolchain Works
 
MIPI DevCon 2016: MIPI C-PHY - Introduction From Basic Theory to Practical Im...
MIPI DevCon 2016: MIPI C-PHY - Introduction From Basic Theory to Practical Im...MIPI DevCon 2016: MIPI C-PHY - Introduction From Basic Theory to Practical Im...
MIPI DevCon 2016: MIPI C-PHY - Introduction From Basic Theory to Practical Im...
 
Vivado hls勉強会2(レジスタの挿入とpipelineディレクティブ)
Vivado hls勉強会2(レジスタの挿入とpipelineディレクティブ)Vivado hls勉強会2(レジスタの挿入とpipelineディレクティブ)
Vivado hls勉強会2(レジスタの挿入とpipelineディレクティブ)
 
ACRiウェビナー:小野様ご講演資料
ACRiウェビナー:小野様ご講演資料ACRiウェビナー:小野様ご講演資料
ACRiウェビナー:小野様ご講演資料
 
Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出Pynqでカメラ画像をリアルタイムfastx コーナー検出
Pynqでカメラ画像をリアルタイムfastx コーナー検出
 
マルチコアのプログラミング技法 -- OpenCLとWebCL
マルチコアのプログラミング技法 -- OpenCLとWebCLマルチコアのプログラミング技法 -- OpenCLとWebCL
マルチコアのプログラミング技法 -- OpenCLとWebCL
 
Qemu
QemuQemu
Qemu
 

Semelhante a Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Implicit SPMD Program Compiler | SIGGRAPH 2019 Technical Sessions

Eclipse Plugin for ESP-IDF - EclipseCon Europe 2019
Eclipse Plugin for ESP-IDF -  EclipseCon Europe 2019Eclipse Plugin for ESP-IDF -  EclipseCon Europe 2019
Eclipse Plugin for ESP-IDF - EclipseCon Europe 2019Kondal Kolipaka
 
AI Bridging Cloud Infrastructure (ABCI) and its communication performance
AI Bridging Cloud Infrastructure (ABCI) and its communication performanceAI Bridging Cloud Infrastructure (ABCI) and its communication performance
AI Bridging Cloud Infrastructure (ABCI) and its communication performanceinside-BigData.com
 
A new era of opensource hardware Pakistan's story MERL.pdf
A new era of opensource hardware Pakistan's story MERL.pdfA new era of opensource hardware Pakistan's story MERL.pdf
A new era of opensource hardware Pakistan's story MERL.pdfAli Ahmed, Ph.D.
 
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyAMD Developer Central
 
HSA-4024, OpenJDK Sumatra Project: Bringing the GPU to Java, by Eric Caspole
HSA-4024, OpenJDK Sumatra Project: Bringing the GPU to Java, by Eric CaspoleHSA-4024, OpenJDK Sumatra Project: Bringing the GPU to Java, by Eric Caspole
HSA-4024, OpenJDK Sumatra Project: Bringing the GPU to Java, by Eric CaspoleAMD Developer Central
 
Automatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmapAutomatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmapManolis Vavalis
 
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...
Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the ...Intel® Software
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Intel® Software
 
Fpga implementation of encryption and decryption algorithm based on aes
Fpga implementation of encryption and decryption algorithm based on aesFpga implementation of encryption and decryption algorithm based on aes
Fpga implementation of encryption and decryption algorithm based on aeseSAT Publishing House
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)Julien SIMON
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Cesar Maciel
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2Linaro
 
EclipseOMRBuildingBlocks4Polyglot_TURBO18
EclipseOMRBuildingBlocks4Polyglot_TURBO18EclipseOMRBuildingBlocks4Polyglot_TURBO18
EclipseOMRBuildingBlocks4Polyglot_TURBO18Xiaoli Liang
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Linxu conj2016 96boards
Linxu conj2016 96boardsLinxu conj2016 96boards
Linxu conj2016 96boardsLF Events
 
OSPF EIGRP & RIP comparision.pdf
OSPF EIGRP & RIP comparision.pdfOSPF EIGRP & RIP comparision.pdf
OSPF EIGRP & RIP comparision.pdfKOLOYOYO
 

Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Implicit SPMD Program Compiler | SIGGRAPH 2019 Technical Sessions

  • 1. SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
  • 2. Advanced SIMD Programming with the Intel® ISPC Compiler
      • Jefferson Amstutz, Dmitry Babokin, Pete Brubaker
      • Contributions by Jon Kennedy, Jeff Rous, Arina Neshlyaeva
      • https://ispc.github.io/
      • Images: Epic Chaos Demo (courtesy of Epic Games®); Intel® OSPRay
  • 3. Legal Notices and Disclaimers
      • No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
      • Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com.
      • Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
      • Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
      • You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
      • Intel, Core and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
      • *Other names and brands may be claimed as the property of others. © Intel Corporation.
  • 4. ISPC: A Brief Recap
      • Image: Intel® OSPRay, Disney’s Moana Island Scene: over 15 billion instanced primitives rendered interactively
  • 5. ISPC: A Brief Recap | Why ISPC?
      • Exploiting parallelism is essential for obtaining peak performance on modern computing hardware
          • Task parallelism: multithreading, utilize all the cores
          • SIMD parallelism: SIMD programming, utilize all the vector units
      • Learning intrinsics is time consuming, and not always accessible to every programmer.
          • Make it easier to get all the FLOPs without being a ninja programmer
          • Reduce the development cost by working with a high level language
  • 6. ISPC: A Brief Recap | What is ISPC?
      • The Intel SPMD Program Compiler
          • SPMD == Single Program, Multiple Data programming model
          • It’s a compiler and a language for writing vector (SIMD) code.
          • Open-source, LLVM-based language and compiler for many SIMD architectures.
          • Generates high performance vector code targeting many vector ISAs.
          • Cross platform support (Windows/Linux/MacOS/PS4/Xbox/ARM AARCH64)
      • The language is C based
          • Simple to use and easy to integrate with an existing codebase.
      • ISPC is not an “autovectorizing” compiler!
          • Vectors are built into the type system, not discovered
          • The programmer explicitly specifies vector or scalar variables
  • 7. ISPC: A Brief Recap | What does the language look like?
      • C based, so it’s easy to read and understand
      • Code looks sequential, but executes in parallel
      • Easily mixes scalar and vector computation
      • Explicit vectorization using two new keywords, uniform and varying
      • Vector iteration via the foreach keyword
      • It is basically shader programming for the CPU!
      • https://ispc.godbolt.org/z/sOpQ8Z
  • 8. ISPC: A Brief Recap | Easy integration
      • The ISPC compiler produces everything required for very simple integration into application code.
      • C/C++ header file
          • Contains the API/function call for each kernel you have written
          • Contains any data structures defined in your ISPC kernel and required by the application code.
      • Object files to link against
      • No bulky runtime or verbose API
  • 9. ISPC: A Brief Recap | Why is this good?
      • Programmers no longer need to know the ISA to write good vector code.
          • More accessible to programmers who aren’t familiar with SIMD intrinsics.
          • More programmers are able to fully utilize the CPU in different areas of application development.
      • Reduced development cost
          • It’s easier to develop and maintain. Simple integration. It looks like scalar code.
      • Increased optimization reach
          • Supporting a new ISA is as easy as changing a command line option and recompiling.
      • Increased performance over scalar code
          • SSE: ~3-4x; AVX2: ~5-6x
          • YMMV ☺
  • 10. Advanced ISPC | Vector Loops
      • Image: Epic Chaos Demo, courtesy of Epic Games®
  • 11. Vector Loops | foreach(…) vs for(…)
      • foreach is a convenience mechanism:
          • It is a simd_for loop that iterates in steps of the SIMD width
          • Unmasked main body for when all SIMD lanes are enabled
          • Masked tail body for when some SIMD lanes are disabled
          • foreach can be N-dimensional, where the index of each dimension is a varying
      • for loop:
          • A for loop with a varying index will use masking in the loop body
              • Safe, but with a slight cost
          • A for loop with a uniform index will have no masking
              • The user will need to add a tail body
      • https://ispc.godbolt.org/z/r1eflk
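The split between an unmasked main body and a masked tail body can be sketched in plain scalar C. This is only an emulation under stated assumptions: `GANG_WIDTH` and `scale_add` are hypothetical names, the inner lane loops stand in for single vector instructions, and it is not the code the ISPC compiler actually generates.

```c
#include <assert.h>
#include <stddef.h>

#define GANG_WIDTH 4  /* assumed SIMD width; ISPC derives it from the target */

/* Scalar emulation of a foreach-style loop: out[i] = a[i] * s + b[i]. */
void scale_add(const float *a, const float *b, float s, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;

    /* Unmasked main body: every lane of the gang is active. */
    for (; i + GANG_WIDTH <= n; i += GANG_WIDTH) {
        for (size_t lane = 0; lane < GANG_WIDTH; ++lane)
            out[i + lane] = a[i + lane] * s + b[i + lane];
    }

    /* Masked tail body: only lanes that still fall inside the array run. */
    if (i < n) {
        for (size_t lane = 0; lane < GANG_WIDTH; ++lane) {
            int active = (i + lane) < n;  /* per-lane execution mask */
            if (active)
                out[i + lane] = a[i + lane] * s + b[i + lane];
        }
    }
}
```

With `n = 7` and a width of 4, the main body handles elements 0 to 3 and the tail handles elements 4 to 6 with the last lane masked off. A for loop with a uniform index corresponds to writing only the first loop and supplying the tail by hand.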
  • 12. Vector Loops | foreach example
      • https://ispc.godbolt.org/z/00eIcH
      • [Code listing annotated with: Unmasked Main Body, Masked Tail Body]
  • 13. Vector Loops | foreach_active
      • Serializes over each active SIMD lane
      • Many uses:
          • Atomic operations
          • Custom reductions
          • Calls to uniform functions
          • …
      • https://ispc.godbolt.org/z/i18Lux
      • Code source: Unreal Engine 4.23, Chaos Physics ISPC Source
  • 14. Vector Loops | foreach_unique
      • Loop over each unique value in a varying only once
      • Execution mask enabled for all SIMD lanes with the same value
      • https://ispc.godbolt.org/z/r49y7i
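The grouping that foreach_unique performs can be illustrated with a scalar C emulation. This is a sketch, not ISPC's implementation: `unique_groups` and `GANG_WIDTH` are hypothetical names, lane masks are modeled as bit masks, and the order in which unique values are visited here is simply a property of this sketch.

```c
#include <assert.h>

#define GANG_WIDTH 4  /* assumed gang size; ISPC derives it from the target */

/*
 * Hypothetical helper: groups the active lanes of one gang by value.
 * For each unique value it records the value and a bit mask of the lanes
 * holding it, i.e. the execution mask one loop iteration would run under.
 * Returns the number of unique values (loop iterations).
 */
int unique_groups(const int values[GANG_WIDTH], unsigned active_mask,
                  int out_values[GANG_WIDTH], unsigned out_masks[GANG_WIDTH])
{
    assert(values && out_values && out_masks);
    unsigned remaining = active_mask & ((1u << GANG_WIDTH) - 1u);
    int count = 0;

    for (int lane = 0; lane < GANG_WIDTH; ++lane) {
        if (!(remaining & (1u << lane)))
            continue;  /* lane is inactive or already covered */

        unsigned same = 0;
        for (int other = lane; other < GANG_WIDTH; ++other) {
            if ((remaining & (1u << other)) && values[other] == values[lane])
                same |= 1u << other;
        }
        out_values[count] = values[lane];
        out_masks[count] = same;
        remaining &= ~same;  /* these lanes are done */
        ++count;
    }
    return count;
}
```

For lane values 3, 5, 3, 5 with all lanes active, the helper reports two iterations: value 3 under mask 0x5 and value 5 under mask 0xA.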
  • 15. Vector Loops | Axis of parallelization
      • Naïve ports to uniform code paths can miss opportunities
      • Try looking for a new axis of parallelization
      • https://ispc.godbolt.org/z/GF7myA
      • [Code comparison with panels: Scalar, Vector]
  • 16. Vector Loops | Multiple axes of parallelization
      • ISPC supports multiple axes of parallelization within a kernel
          • HLSL/GLSL/CL only support one
      • User controlled
      • Provides optimization opportunities
      • https://github.com/ispc/ispc/blob/master/examples/sgemm/SGEMM_kernels.ispc
  • 17. Advanced ISPC | Structures and Pointers
      • Image: Intel® OSPRay, Gramophone rendered in Pixar’s usdview
  • 18. Structures and Pointers | Uniform vs. Varying structures
      struct vec3f { float x, y, z; };
      struct Ray {
          vec3f origin;
          vec3f direction;
          float tnear;
          float tfar;
      };
      • Uniform Ray: uniform Ray r;
      • Varying Ray: varying Ray r;
  • 19. Structures and Pointers | Uniform vs. Varying structures
      struct vec3f { float x, y, z; };
      struct PerspRay {
          uniform vec3f origin;
          vec3f direction;
          float tnear;
          float tfar;
      };
      • Uniform PerspRay: uniform PerspRay r;
      • Varying PerspRay: varying PerspRay r;
  • 20. Structures and Pointers | ISPC pointers
      • Pointers are complex
          • The variability is specified like ‘const’ in C/C++
          • uniform float * varying vPtr;
      • Variability: 2 parts
          • The pointer itself
              • Single pointer? Different pointer per SIMD lane?
              • Default: varying
          • The item pointed-to
              • Scalar value? Vector value?
              • Default: uniform
      • Be explicit and specify the variability so it’s correct and clear to the reader
  • 21. Structures and Pointers | ISPC pointers
      • uniform float * uniform uPtr2u; (one pointer to one float)
      • varying float * uniform uPtr2v; (one pointer to a vector of floats)
      • uniform float * varying vPtr2u; (one pointer per SIMD lane, each to one float)
      • varying float * varying vPtr2v; (one pointer per SIMD lane, each to a vector of floats)
      • [Diagram: Pointer and Data columns for the four declarations]
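The four pointer variabilities can be modeled in plain C to make the pointer and data counts concrete. This is a sketch under stated assumptions: `GANG_WIDTH`, `varying_float`, the typedef names and `deref_varying` are all hypothetical, and a fixed-size array stands in for a SIMD register.

```c
#include <assert.h>

#define GANG_WIDTH 4  /* assumed gang size */

/* Stand-in for a varying float: one value per SIMD lane. */
typedef struct { float lane[GANG_WIDTH]; } varying_float;

/* uniform float * uniform : one pointer, one float. */
typedef float *uptr_to_uniform;
/* varying float * uniform : one pointer, one gang-wide value. */
typedef varying_float *uptr_to_varying;
/* uniform float * varying : one pointer per lane, each to one float. */
typedef struct { float *lane[GANG_WIDTH]; } vptr_to_uniform;
/* varying float * varying : one pointer per lane, each to a gang-wide value. */
typedef struct { varying_float *lane[GANG_WIDTH]; } vptr_to_varying;

/* Reading through a varying pointer touches a different address per lane,
 * which is why it behaves like a gather. */
varying_float deref_varying(vptr_to_uniform p)
{
    varying_float r;
    for (int i = 0; i < GANG_WIDTH; ++i) {
        assert(p.lane[i] != 0);
        r.lane[i] = *p.lane[i];
    }
    return r;
}
```

In this model only the varying-pointer forms need per-lane addressing; both uniform-pointer forms read from a single address.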
  • 22. Advanced ISPC | Memory Access
      • Image: Epic Chaos Demo, courtesy of Epic Games®
  • 23. Memory Access | Uniform vs. Varying data layout
      struct vec3f {
          float x;
          float y;
          float z;
      };
      • Memory layout: x y z x y z x y z …
      • [Diagram: contents of uniform vec3f uPos { } and varying vec3f vPos { }, including the per-lane arrangement x x x x y y y y …]
  • 24. Memory Access | Complex data layout
      struct Ray {
          vec3f origin;
          vec3f direction;
          float tnear;
          float tfar;
      };
      • uniform Ray uRay { origin { } direction { } tnear tfar }
      • varying Ray uRay { origin { } direction { } tnear tfar }
      • [Diagram: x y z groups filling origin and direction for the uniform and the varying Ray]
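The size difference between a uniform and a varying Ray can be sketched in plain C by widening every float member to one value per lane. This is an illustrative model only: `GANG_WIDTH`, the `_u`/`_v` type names and `ray_set_lane` are hypothetical, and it assumes a 4-wide gang.

```c
#include <assert.h>
#include <stddef.h>

#define GANG_WIDTH 4  /* assumed gang size */

/* Uniform layout: one scalar per member. */
typedef struct { float x, y, z; } vec3f_u;
typedef struct { vec3f_u origin, direction; float tnear, tfar; } Ray_u;

/* Varying layout: every float member holds one value per lane. */
typedef struct { float x[GANG_WIDTH], y[GANG_WIDTH], z[GANG_WIDTH]; } vec3f_v;
typedef struct {
    vec3f_v origin, direction;
    float tnear[GANG_WIDTH], tfar[GANG_WIDTH];
} Ray_v;

/* Copies one uniform ray into a single lane of a varying ray. */
void ray_set_lane(Ray_v *dst, int lane, const Ray_u *src)
{
    assert(dst && src && lane >= 0 && lane < GANG_WIDTH);
    dst->origin.x[lane] = src->origin.x;
    dst->origin.y[lane] = src->origin.y;
    dst->origin.z[lane] = src->origin.z;
    dst->direction.x[lane] = src->direction.x;
    dst->direction.y[lane] = src->direction.y;
    dst->direction.z[lane] = src->direction.z;
    dst->tnear[lane] = src->tnear;
    dst->tfar[lane] = src->tfar;
}
```

In this model the varying Ray occupies `GANG_WIDTH` times the storage of the uniform Ray (32 floats instead of 8 for a 4-wide gang), and consecutive lanes of the same member sit next to each other in memory.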
  • 25. Memory Access | Data transposition
      • ISPC will automatically transpose your array of structures (AoS) data to structures of arrays (SoA) and back
      • Useful for block copying uniform structs into varyings
      • It will just work!
      • But there may be faster alternatives?
      • https://ispc.godbolt.org/z/4_p44L
  • 26. Memory Access | Scatter/Gather
      • Vector reads/writes to non-contiguous memory
          • AVX2 onwards supports an optimised gather instruction
          • AVX512 supports an optimised scatter instruction
          • ISPC will use these if available
      • ISPC will emit performance warnings when it finds gather/scatters
          • #pragma ignore warning(perf)
      • Gather performance has improved over successive generations
          • But there can be faster alternatives, especially if there is cacheline locality
      • Aos_to_Soa() helpers
          • Good for packed float3/float4 data types
      • Shuffle()
          • Load a vector register from memory and swizzle the data
      • You will need to experiment on your dataset.
      • The fastest form of gather is no gather: read contiguous memory where possible!
  • 27. Memory Access: AoS to SoA
      • It's best to use SoA or AoSoA layouts with ISPC.
      • Re-arranging data is not always easy.
      • Transposing the input data can be faster than using gather/scatter instructions.
      • When to transpose?
          • If the algorithm is cheap, it's best to convert the data into a temporary buffer, do the work, then convert back.
          • Otherwise, transpose live data on the way in/out of the kernel.
      [Diagram: transpose between Array of Structures (AoS), Structure of Arrays (SoA) and hybrid Array of Structures of Arrays (AoSoA)]
  • 28. Memory Access: AoS to SoA
      • There are stdlib functions, aos_to_soa3/4.
          • They assume arrays of vec3/vec4 input data.
      • What about strided data?
          • You can write your own transpose functions using the stdlib.
          • Use loads, shuffles, inserts, etc.
      [Diagram: three vector loads, shuffles, three vector stores]
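
The data movement behind a 3-component transpose can be sketched in scalar C. The function names below are hypothetical and do not match the signatures of the stdlib aos_to_soa3/soa_to_aos3 routines; a real implementation would use vector loads and shuffles as the slide describes.

```c
#include <assert.h>
#include <stddef.h>

/* Split `count` packed xyz triples (AoS) into separate x, y and z arrays (SoA). */
void aos_to_soa3_scalar(const float *aos, float *x, float *y, float *z, size_t count) {
    assert(aos != NULL && x != NULL && y != NULL && z != NULL);
    for (size_t i = 0; i < count; ++i) {
        x[i] = aos[3 * i + 0];
        y[i] = aos[3 * i + 1];
        z[i] = aos[3 * i + 2];
    }
}

/* Inverse: interleave separate x, y and z arrays back into packed triples. */
void soa_to_aos3_scalar(const float *x, const float *y, const float *z, float *aos, size_t count) {
    assert(aos != NULL && x != NULL && y != NULL && z != NULL);
    for (size_t i = 0; i < count; ++i) {
        aos[3 * i + 0] = x[i];
        aos[3 * i + 1] = y[i];
        aos[3 * i + 2] = z[i];
    }
}
```

For strided input, the same idea applies with `3 * i` replaced by the element stride of the source structure.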
  • 29. Memory Access: AoS to SoA example https://ispc.godbolt.org/z/NwLihI (Unreal Engine 4.23, Chaos Physics ISPC source)
  • 30. Memory Access: Streaming stores
      • Streaming stores allow writes to memory to bypass the cache.
      • They avoid cacheline reads and cache pollution.
      • Useful when bandwidth limited.
      • Not always faster than normal stores.
      • Never read the memory straight after the write.
          • It won’t be in cache and will be slow…
      • Write full cachelines to avoid partial writes.
      • Used for techniques such as:
          • Texture writes
          • Geometry transformations
          • Compression
          • …
      • Experiment with your dataset.
      • What about streaming loads?
          • Unless the memory was specifically allocated with the write combining flag, they won’t do anything.
      [Diagram: a normal write goes through the cache hierarchy to DRAM; a streaming store goes through the write combine buffer to DRAM]
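
On the C side, the same idea can be sketched with the SSE intrinsic `_mm_stream_ps`. This is a minimal illustration, not the code behind the deck's Compiler Explorer link: `fill_streaming` is a hypothetical helper, and it falls back to ordinary stores when SSE is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Fill dst[0..n) with value, using non-temporal (streaming) stores where possible. */
void fill_streaming(float *dst, size_t n, float value) {
    assert(dst != NULL || n == 0);
    size_t i = 0;
#if defined(__SSE__)
    /* _mm_stream_ps needs a 16-byte aligned address: use scalar stores up to that point. */
    while (i < n && ((uintptr_t)(dst + i) % 16) != 0)
        dst[i++] = value;
    __m128 v = _mm_set1_ps(value);
    /* Consecutive 16-byte streaming stores let the write combine buffer fill whole cachelines. */
    for (; i + 4 <= n; i += 4)
        _mm_stream_ps(dst + i, v);
    _mm_sfence(); /* order the streaming stores before any later stores */
#endif
    /* Remaining tail, or the whole buffer when SSE is not available. */
    for (; i < n; ++i)
        dst[i] = value;
}
```

Whether this beats normal stores depends on buffer size and on how soon the data is read back, so it is worth measuring on real data.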
  • 31. Memory Access: Streaming stores example https://ispc.godbolt.org/z/bKOJ1m
  • 32. Memory Access: Aligned memory
      • Loads and stores can be aligned or unaligned (default).
      • There are specific instructions for each type.
      • Historically this had a performance impact.
          • Unaligned loads/stores may straddle cachelines.
          • Newer Intel architectures have reduced/removed this impact.
      • Alignment needs to be the register width.
          • SSE: 16 bytes, AVX2: 32 bytes, AVX512: 64 bytes
      • Simple to enable in ISPC:
          • --opt=force-aligned-memory
      • Try it: YMMV!
      [Diagram: an unaligned load straddling two cachelines vs. an aligned load within one cacheline]
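
Before enabling forced alignment, the application has to guarantee that every buffer passed to the kernels meets the register-width alignment. A small C sketch of such a check follows; `register_alignment` and `is_aligned` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Required alignment in bytes for a given vector register width in bits:
   128 (SSE) -> 16, 256 (AVX2) -> 32, 512 (AVX512) -> 64. */
size_t register_alignment(int register_bits) {
    assert(register_bits > 0 && register_bits % 8 == 0);
    return (size_t)register_bits / 8;
}

/* Returns 1 when ptr is a multiple of `alignment` bytes. */
int is_aligned(const void *ptr, size_t alignment) {
    assert(alignment != 0);
    return ((uintptr_t)ptr % alignment) == 0;
}
```

A debug-build assertion such as `is_aligned(buffer, register_alignment(256))` at the call site is one way to catch misaligned inputs early.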
  • 33. Advanced ISPC: Control Flow (image: Intel® OSPRay, Richtmyer–Meshkov volume shown with shadows and ambient occlusion)
  • 34. Control Flow: Divergent control flow
      • Control flow divergence can be costly.
      • Consider this: a divergent branch causes both expensive operations to be executed.
      • Now consider this: a uniform branch causes a single expensive operation to be executed.
      [Diagram: execution masks for the divergent and the uniform branch]
      https://ispc.godbolt.org/z/XM0MEw
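
The cost difference can be made concrete with a simplified model of masked execution. The sketch assumes that a branch body is skipped when no lane in the gang needs it; `branch_bodies_executed` is a hypothetical helper, not an ISPC facility.

```c
#include <assert.h>
#include <stddef.h>

/* Given the per-lane condition of an if/else, count how many of the two
   branch bodies the whole gang has to execute (0, 1 or 2). */
int branch_bodies_executed(const int *cond, size_t width) {
    assert(cond != NULL || width == 0);
    int any_true = 0;  /* at least one lane takes the 'if' side */
    int any_false = 0; /* at least one lane takes the 'else' side */
    for (size_t lane = 0; lane < width; ++lane) {
        if (cond[lane])
            any_true = 1;
        else
            any_false = 1;
    }
    return any_true + any_false;
}
```

A mask of 1 1 1 1 costs one body, while 1 0 1 1 costs both, even though only a single lane diverged.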
  • 35. Control Flow: Unmasked
      • Unmasked functions
          • Avoid masked operations
          • Useful if you want to use a different execution mask
      • Unmasked blocks
          • An optimisation
          • Avoid masked operations
          • Useful when you know there are no side effects
      https://ispc.godbolt.org/z/i18Lux
  • 36. Advanced ISPC: Interfacing Tricks (image: Epic Chaos Demo, courtesy of Epic Games®)
  • 37. Interfacing Tricks: Mapping input data to ISPC varyings
      • Input data is generally an array of uniforms.
      • These can be copied directly to varyings by using a varying index, such as programIndex.
      • They can be cast to a varying pointer and dereferenced.
      • Applications can pass in ‘fake’ varyings, which still generate SIMD code.
      https://ispc.godbolt.org/z/-hbfO1
  • 38. Interfacing Tricks: Calling back to C
      • Just like normal C/C++ code, there are times when you need to call external code.
      • ISPC supports this for any external function using ‘C’ linkage.
      https://ispc.godbolt.org/z/P5XcuT
  • 39. Advanced ISPC: Choosing the Right Target (image: Epic Chaos Demo, courtesy of Epic Games®)
  • 40. Choosing the Right Target: Asymmetrical SIMD register width and target SIMD ISA
      • ISPC supports a limited decoupling of SIMD width from ISA.
      • “Double pumped”
          • Vector instructions are executed twice to emulate double-width registers.
          • Can be effective at hiding latency.
          • sse4-i32x8, avx2-i32x16, etc.
      • “Half pumped”
          • Vector instructions are executed with narrower SIMD width registers.
          • Use a richer ISA for performance gains.
          • avx512skx-i32x8
          • Avoids platform-specific AVX512 power scaling.
      • As simple as changing the command line:
          • --target=...
      • Experiment to find the best targets for your workload.
      https://ispc.godbolt.org/z/4EhA2A
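
A "double pumped" target can be pictured as an 8-wide operation issued as two 4-wide operations. The scalar C below only models that idea; the widths and function names are hypothetical, and a real target such as sse4-i32x8 would use vector registers rather than loops.

```c
#include <assert.h>
#include <stddef.h>

enum { NATIVE_WIDTH = 4, GANG_WIDTH = 8 }; /* assumed: 4-wide registers, 8-wide gang */

/* Stand-in for one native vector add over NATIVE_WIDTH lanes. */
static void add_native(const float *a, const float *b, float *out) {
    for (int lane = 0; lane < NATIVE_WIDTH; ++lane)
        out[lane] = a[lane] + b[lane];
}

/* One GANG_WIDTH-wide add issued as two native-width adds. */
void add_double_pumped(const float *a, const float *b, float *out) {
    assert(a != NULL && b != NULL && out != NULL);
    add_native(a, b, out);                                              /* lanes 0..3 */
    add_native(a + NATIVE_WIDTH, b + NATIVE_WIDTH, out + NATIVE_WIDTH); /* lanes 4..7 */
}
```

The two halves are independent, which is one reason this arrangement can help hide instruction latency.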
  • 41. Choosing the Right Target: Auto dispatch (multi-target compilation)
      • ISPC supports compiling to multiple targets at once.
          • Currently, only 1 target per ISA.
          • At runtime, auto dispatch chooses the highest compiled target that the platform supports.
          • Manual dispatch will be coming in a future release…
      • Compile for all of the main targets.
          • SSE4, AVX2, AVX512
          • This will allow the best performing ISA to run on your system.
          • Unreal Engine and OSPRay compile for all of the main targets by default.
      --target=sse4-i32x4,avx2-i32x8,avx512skx-i32x16
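
The selection rule described above (pick the highest compiled target the CPU supports) can be sketched as follows. This is a model of the rule only, not the dispatcher ISPC generates; the enum and the bitmask encoding are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical target ranking, lowest to highest. */
enum target { TARGET_NONE = 0, TARGET_SSE4 = 1, TARGET_AVX2 = 2, TARGET_AVX512SKX = 3 };

/* Both masks use bit (1u << target) to mark a target as present.
   Returns the highest target that was compiled in AND is supported by the CPU. */
enum target choose_target(unsigned compiled, unsigned cpu_supports) {
    /* Bit 0 corresponds to TARGET_NONE and must never be set. */
    assert((compiled & 1u) == 0 && (cpu_supports & 1u) == 0);
    unsigned usable = compiled & cpu_supports;
    for (int t = TARGET_AVX512SKX; t >= TARGET_SSE4; --t)
        if (usable & (1u << t))
            return (enum target)t;
    return TARGET_NONE;
}
```

With all three targets compiled and a CPU that supports SSE4 and AVX2 only, the rule selects AVX2.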
  • 42. Advanced ISPC: ISPC StdLib (image: Intel® OSPRay, whose path tracer supports physically-based materials and a common principled material)
  • 43. ISPC StdLib: Use the ISPC stdlib
      ISPC provides a rich stdlib of operations:
      • Logical operators
      • Bit ops
      • Math
      • Clamping and saturated arithmetic
      • Transcendental operations
      • RNG (not the fastest!)
      • Mask/cross-lane operations
      • Reductions
      • And that’s not all!
      https://github.com/ispc/ispc/blob/master/stdlib.ispc
  • 44. Advanced ISPC: Floating Point Determinism (image: Epic Chaos Demo, courtesy of Epic Games®)
  • 45. Floating Point Determinism: A quick note!
      To increase floating point precision/determinism:
      • Don’t use `--opt=fast-math`
      • Do use `--opt=disable-fma`
      • But there will be a performance penalty
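
Why FMA affects reproducibility: a fused multiply-add rounds once, while a separate multiply and add rounds twice, so the same source can give different bits on different targets. The C sketch below demonstrates this with hypothetical helper names. The fused path is emulated in double precision (the product of two floats is exact in a double), which is an assumption of this sketch and not a real FMA instruction.

```c
#include <assert.h>

/* a * b + c with the product rounded to float before the add (no FMA). */
float mul_add_separate(float a, float b, float c) {
    volatile float product = a * b; /* volatile forces the intermediate rounding */
    return product + c;
}

/* Model of a fused multiply-add: the product is kept exact and the result
   is rounded to float only at the end. */
float mul_add_fused_model(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}

/* Same inputs, different answers: a*a = 1 + 2^-11 + 2^-24 rounds to 1 + 2^-11 in float. */
void check_rounding_difference(void) {
    float a = 1.0f + 0x1p-12f;
    float c = -(1.0f + 0x1p-11f);
    assert(mul_add_separate(a, a, c) == 0.0f);
    assert(mul_add_fused_model(a, a, c) == 0x1p-24f);
}
```

Neither answer is wrong in isolation; the issue for determinism is that builds with and without FMA disagree.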
  • 46. Advanced ISPC: Debugging and Optimizing ISPC Kernels (image: Epic Chaos Demo, courtesy of Epic Games®)
  • 47. Debugging ISPC Kernels: Debugging
      • Compile ISPC kernels with -g.
      • Visual Studio, gdb, lldb, etc. work as expected.
          • View registers, uniform and varying data.
      • Visual Studio Code ISPC plugin available.
          • Syntax highlighting, stdlib auto-complete, real-time validation.
  • 48. Optimising ISPC kernels: Benchmarking
      • The best way to check for performance deltas when optimising code is to benchmark it.
      • Sometimes the code of interest is too small to measure in place, so you need a microbenchmark.
          • A small ISPC kernel run many times, ideally on real data.
          • Use caution, as the results may not be representative of the final gains.
      • The ISPC git repo will soon contain a microbenchmark, `ispc-bench`.
          • Based on Google Benchmark.
          • Simple to use and augment.
          • The ISPC dev team are looking for contributions to help improve ISPC.
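
The shape of a microbenchmark, a small kernel run many times with the average time reported, can be sketched in a few lines of C. This is a hand-rolled loop for illustration only; it is not `ispc-bench` and does not use Google Benchmark. `add_one_kernel` is a hypothetical stand-in for an exported ISPC kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*kernel_fn)(float *data, size_t count);

/* Stand-in kernel: in practice this would be the exported ISPC function. */
void add_one_kernel(float *data, size_t count) {
    for (size_t i = 0; i < count; ++i)
        data[i] += 1.0f;
}

/* Runs the kernel `iterations` times and returns the average CPU time per call in seconds. */
double time_kernel(kernel_fn kernel, float *data, size_t count, int iterations) {
    assert(kernel != NULL && iterations > 0);
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i)
        kernel(data, count);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

`clock()` has coarse resolution, so the iteration count needs to be large enough for the total run to be measurable, and results on a tiny working set may flatter the kernel because everything stays in cache.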
  • 49. Optimising ISPC kernels: Godbolt Compiler Explorer
      ISPC is supported by the Compiler Explorer.
      • Simply copy and paste your kernels into a browser.
      • Try different command line arguments.
      • Look for optimization opportunities in the ASM code.
      • Experiment with all of the example code from this presentation.
      • Now supports using ispc (trunk).
      http://ispc.godbolt.org/
  • 50. Optimising ISPC kernels: Godbolt Compiler Explorer, llvm-mca
      • LLVM-MCA provides static code uOp/cycle counts.
      • It doesn’t accurately report the cost of memory ops, but is still useful.
      https://ispc.godbolt.org/z/etmC_T
  • 51. Optimising ISPC kernels: VTune
      • Profile your ISPC kernels, looking for hotspots.
      • Compile the kernels with -g for debugging symbols.
      • ISPC inlines heavily, so use ‘noinline’ to target hotspot functions.
      https://software.intel.com/en-us/vtune
  • 52. Advanced ISPC: ISPC Roadmap (image: Intel® OSPRay, Disney’s Moana Island Scene: over 15 billion instanced primitives rendered interactively)
  • 53. ISPC Roadmap
      • ISPC v1.12
          • ARM support
          • Cross compilation support (iOS/Android/Switch/Xbox/PS4)
          • Noinline keyword
          • Performance improvements
      • ISPC v1.next
          • Performance improvements
          • Future hardware support
          • Manual dispatch
      • File an issue on GitHub: let us know what you need!
      • Submit a patch: show us what you need!
  • 54. Advanced ISPC: ISPC Resources (image: Intel® OSPRay, whose path tracer supports physically-based materials and a common principled material)
  • 55. ISPC Resources: ISPC on the web
      • ISPC Home Page: https://ispc.github.io/ispc.html
      • ISPC Origins: https://pharr.org/matt/blog/2018/04/18/ispc-origins.html
      • ISPC on Intel® Developer Zone: https://software.intel.com/en-us/search/site/language/en?query=ispc
      • Visual Studio Code ISPC Plugin: https://marketplace.visualstudio.com/items?itemName=intel-corporation.ispc
      • ISPC Compiler Explorer: https://ispc.godbolt.org/
      • Intel® Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
      • Agner Fog Instruction Tables: https://www.agner.org/optimize/instruction_tables.pdf
      • uOps Latency, Throughput and Port Usage Information: http://uops.info/
      • ISPC Github: https://github.com/ispc/ispc/
      • Intel® OSPRay: https://www.ospray.org/
      • Unreal Engine: https://www.unrealengine.com/en-US/
      • ISPC Texture Compressor: https://github.com/GameTechDev/ISPCTextureCompressor
      • ISPC DX12 nBodies Sample: https://github.com/GameTechDev/ISPC-DirectX-Graphics-Samples
      • SPIRV to ISPC Project: https://github.com/GameTechDev/SPIRV-Cross
      • ISPC in Unreal Engine Blog Post: https://software.intel.com/en-us/articles/unreal-engines-new-chaos-physics-system-screams-with-in-depth-intel-cpu-optimizations