Presenter: Stephen Friedman (Pixar Animation Studios)
Authors: Alex Wells (Intel),
Max Liani & Stephen Friedman (Pixar Animation Studios) ,
Larry Gritz (Sony Pictures Imageworks)
Contributors: Steena Monteiro & Louis Feng (Intel)
August 16, 2018
Legal Disclaimers and Optimization Notices
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from
course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest
forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer
system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific
computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in
fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or
configuration may affect your actual performance.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2,
SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the
applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates
may make these results inapplicable to your device or system.
Intel, Xeon and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
© Intel Corporation.
Agenda
Taking Advantage of Modern SIMD in a Domain Specific Language
• Open Shading Language (OSL) Overview
• How Modern SIMD Can Improve OSL
• Utilizing SIMD as a Shading/Renderer/Language Author
• Reaping the Benefits of SIMD
• Moving Forward
Shading in Physically Based Rendering
Shading Networks
Develop reusable shading nodes
Connect nodes to define complex materials
Production shading networks can grow very large, to 100s or 1000s of nodes.
C++ Shader Limitations
Lack of context at compile time
  • Input parameters unknown
  • Geometry being shaded unknown
  • Mode of shading unknown
  • Surrounding shading network unknown
  • Branchy testing required
Lack of portability
Requires “Performance Ninjas”
Image Credit: Ninja Working AT Desk from Vector.me (by Hector Gomez)
Open Shading Language
Developed by Sony Pictures Imageworks*
C-like DSL for programmable shading
API to connect shaders into networks
Open source
  • http://github.com/imageworks/OpenShadingLanguage
Sci-Tech Award* in 2017
Logo owned by Academy of Motion Picture Arts and Sciences for Infobox
Poster images (c) Sony Pictures*, Paramount*, Warner Brothers*, Disney*, Fox*, Universal*
Example OSL Shader (marble)
shader marble (color Cin = .5,
               float freq = 1.0,
               output color Cout = 0)
{
    float sum = 0;
    float freqVal = freq;
    point Pshad = transform ("object", P);
    for (int i = 0; i < 6; i++)
    {
        sum = sum + 1/freqVal * abs(.5 - noise( 4 * freqVal * Pshad)) ;
        freqVal = 2 * freqVal;
    }
    Cout = Cin * sum;
}
Shader Globals (input set by renderer): P
Library Calls: transform, abs, noise
Offline: Shader written in OSL -> oslc (offline compiler) -> Intermediate OSO (instructions + operands)
Renderer (Pixar’s RenderMan*, Autodesk Arnold*, Blender*): Scene Management, Ray Tracing/Path Tracing, Light Integration
OSL Runtime (with OSL Runtime Library):
  1. Build Shading Network
  2. Render Time Optimization with LLVM* JIT (Just In Time Compilation)
  3. Execute Shading Network (per point) as optimized x86
  4. Query Outputs
Renderer <-> OSL Runtime: callbacks
Complexity
Image (c) 21st Century Fox
173 billion shader invocations
> 16 hours to execute (1t)
OSL Render Time Optimization
280M ops -> 2.68M
  • 99.0% reduction
161M symbols -> 1.9M
  • 98.8% reduction
150,612 empty shader instances
  • 63% optimized away
Image (c) 21st Century Fox
99% reduction of operations
Can outperform precompiled C++ shaders
(mostly because of Render Time optimization)
OSL Scalability Limitations
Single sample execution
  • limited opportunities to leverage SIMD
  • high execution cost
Block vectorization using Intel® Streaming SIMD Extensions (Intel® SSE)
  • only 4-wide
  • very limited support (noise functions, texturing)
No benefits from modern 8/16-wide CPUs
Image (c) Pixar Animation Studios
Creating a SIMD Scalable OSL
Use Single Program Multiple Data (SPMD) techniques
  • No changes to the OSL language specs
      – Leverage OSL render time optimization
  • Create “Batched” interface to OSL
      – Retain “single point” interface
  • Create “wide” backend to generate code
      – Directly emit vector operations
      – Leverage LLVM* vector data types <16 x float>
  • Create “wide” library
      – Leverage OpenMP Explicit Vectorization
Image © Disney/Pixar
INTERFACE to Renderer
Renderer <-> Shading System
Single point interface (ShaderGlobals):
  • Submit Single Point: execute(ShaderGlobals, …)
  • Query Results: symbol_address(…)
New “Batched” Interface (ShaderGlobalsBatch):
  • Submit Batch of Points: execute_batch(ShaderGlobalsBatch, …)
  • Query Batch of Results: symbol_batch_accessor(…)
ShaderGlobalsBatch
  • Uniform: context *’s, Raytype, …
  • Queue of Varying: Surface Position, Incident Ray, Surface Normal, …
Representing Varying Data
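Define “Wide” wrappers for existing data types: Vec3, Matrix44, float, etc.
// Primary template (assumed here; the slide shows only the Vec3 specialization).
template <typename DataT, int WidthT>
struct Wide;

template <int WidthT>
struct Wide<Vec3, WidthT> {
    float x[WidthT];
    float y[WidthT];
    float z[WidthT];
};
Fixed size SOA (Structure of Arrays), friendly for SIMD hardware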
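To make the SOA layout concrete, here is a minimal sketch of a `Wide<Vec3, WidthT>` wrapper with lane access. It is an illustration only: `Vec3` is a stand-in struct, and the `get`, `set`, and `scale` helpers are hypothetical names that do not come from the talk or from OSL.

```cpp
#include <cassert>

// Hypothetical stand-in for the renderer's 3-component vector type.
struct Vec3 { float x, y, z; };

// Primary template is only declared; each data type supplies its own layout.
template <typename DataT, int WidthT> struct Wide;

// Structure-of-arrays layout: each component is contiguous across lanes.
template <int WidthT>
struct Wide<Vec3, WidthT> {
    float x[WidthT];
    float y[WidthT];
    float z[WidthT];

    // Gather one lane into an ordinary (AOS) Vec3. Hypothetical helper.
    Vec3 get(int lane) const {
        assert(lane >= 0 && lane < WidthT);
        return Vec3{x[lane], y[lane], z[lane]};
    }
    // Scatter an ordinary Vec3 into one lane. Hypothetical helper.
    void set(int lane, const Vec3& v) {
        assert(lane >= 0 && lane < WidthT);
        x[lane] = v.x; y[lane] = v.y; z[lane] = v.z;
    }
};

// Scale every lane; the loop body only touches unit-stride arrays,
// which is the access pattern SIMD loads and stores prefer.
template <int WidthT>
void scale(Wide<Vec3, WidthT>& w, float s) {
    for (int i = 0; i < WidthT; ++i) {
        w.x[i] *= s;
        w.y[i] *= s;
        w.z[i] *= s;
    }
}
```

With `WidthT = 16`, each component array holds 16 floats, which matches one `<16 x float>` vector.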
Image © Disney/Pixar
Accessing Varying Data
Inspired by techniques from Intel’s SIMD Data Layout Templates
void my_callback(ConstWideAccessor<float> wScale,
                 ConstWideAccessor<Matrix44> wM,
                 WideAccessor<Vec3> wVS,
                 MaskedAccessor<Vec3> wVT) {
    for (int i = 0; i < wVS.width; ++i) {
        Vec3 V = wVS[i];          // extract data from a lane of the SOA
        float F = wScale[i];
        Matrix44 M = wM[i];       // array subscript returns a proxy object to that lane
        wVS[i] = V*F;
        wVT[i] = transform(M,V);  // skips assignment if lane masked off
    }
}
Accessors provide a transparent AOS view of SOA data.
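The proxy-object idea can be sketched as follows. This is not the OSL implementation: `WideFloat`, `MaskedFloatAccessor`, and `LaneProxy` are hypothetical names, the batch width is fixed at 16, and only `float` is handled to keep the example short.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 16;  // hypothetical batch width

// Hypothetical SOA block of floats, one value per lane.
struct WideFloat { float data[kWidth]; };

class MaskedFloatAccessor {
public:
    MaskedFloatAccessor(WideFloat& w, std::uint16_t mask) : w_(w), mask_(mask) {}

    // Proxy returned by operator[]; it remembers which lane it refers to.
    class LaneProxy {
    public:
        LaneProxy(WideFloat& w, std::uint16_t mask, int lane)
            : w_(w), mask_(mask), lane_(lane) {}
        // Read: extract the value stored in this lane.
        operator float() const { return w_.data[lane_]; }
        // Write: skipped when the lane's mask bit is 0.
        LaneProxy& operator=(float v) {
            if (mask_ & (1u << lane_)) w_.data[lane_] = v;
            return *this;
        }
    private:
        WideFloat& w_;
        std::uint16_t mask_;
        int lane_;
    };

    LaneProxy operator[](int lane) const {
        assert(lane >= 0 && lane < kWidth);
        return LaneProxy(w_, mask_, lane);
    }

private:
    WideFloat& w_;
    std::uint16_t mask_;
};
```

Callback code then reads like scalar code (`out[i] = value;`) while the proxy handles both the SOA indexing and the mask test.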
Uniform Texture binning
Uniform texture name (no overhead):
    texture("MyTex", u, v);
  BatchedRendererServices call:
    texture("MyTex",…, );
Varying texture name (full flexibility):
    if (layer == 1)
        file = "r.tex";
    else if (layer == 2)
        file = "g.tex";
    else if (layer == 3)
        file = "b.tex";
    texture(file, u, v);
  layer = 3 3 1 2 1 1 2 1 2 2 2 2 3 3 3 1
  JIT’d binning: one masked BatchedRendererServices call per unique file:
    texture("b.tex",…, );
    texture("r.tex",…, );
    texture("g.tex",…, );
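A rough sketch of the binning step, assuming a 16-lane batch and a 16-bit lane mask: lanes are grouped by texture name, and each unique name gets one mask so that a uniform-name texture call can be issued once per bin. The function name `bin_by_name` and its signature are hypothetical; the talk describes this logic as JIT-generated code rather than a library routine.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr int kWidth = 16;  // hypothetical batch width

// Group active lanes by texture name. Returns one (name, lane mask) pair per
// unique name, in the order the names are first seen.
std::vector<std::pair<std::string, std::uint16_t>>
bin_by_name(const std::string (&names)[kWidth], std::uint16_t active) {
    std::vector<std::pair<std::string, std::uint16_t>> bins;
    std::uint16_t remaining = active;
    for (int lead = 0; lead < kWidth; ++lead) {
        if (!(remaining & (1u << lead))) continue;  // inactive or already binned
        std::uint16_t mask = 0;
        for (int l = lead; l < kWidth; ++l) {
            if ((remaining & (1u << l)) && names[l] == names[lead])
                mask |= static_cast<std::uint16_t>(1u << l);
        }
        remaining &= static_cast<std::uint16_t>(~mask);
        bins.emplace_back(names[lead], mask);
    }
    return bins;
}
```

For the `layer` values on the slide, this yields three bins in the order `b.tex`, `r.tex`, `g.tex`, and the three masks together cover all 16 lanes.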
Agenda
Taking Advantage of Modern SIMD in a Domain Specific Language
• …
• Utilizing SIMD as a Language Author
• Uniform Computation Optimization
• Control Flow Management
• Mapping OSL to Assembly
• OSL Library SIMD Implementations
• …
Uniform vs Varying Variables
In some other high-performance languages, the programmer is left to identify with keywords which
variables and parameters are uniform or varying:
• Pixar’s RenderMan* Shading Language (RSL)
• https://renderman.pixar.com/resources/RenderMan_20/shadingLanguage.html
• Intel SPMD Program Compiler
• https://ispc.github.io/
• OpenMP*
• https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
Furthermore, multiple versions of functions may need to be created to handle different
combinations of uniform vs. varying.
varying float patctx = 0; /* initialize the context */
varying float f = gridpattern("concentric", patctx);
for (uniform int j = 0; j < height; j++)
#pragma omp declare simd uniform(a) linear(b: 1)
somefunc(float a, float * b, float c);
Hide Uniform vs Varying from User
• Goals:
• No changes to shader source/.oso files.
• Allow shader authors to leverage SIMD hardware without added complexity.
• Leverage common operations across batches of shading points.
• Implications:
• Can't add new keywords (uniform, varying, forall)
• Must leverage uniform computations when possible for:
• data layout
• control flow
• code generation
• New interfaces to allow renderers to leverage uniform computations
Hiding Uniform vs Varying: Is It Possible?
• We can leverage domain specific restrictions
• No external library functions
• User functions are part of the shader source/.oso
• Well defined contracts of varying/uniform in OSL library and RendererServices
YES WE CAN!
Identifying Uniform vs Varying
• Variables are uniform until proven varying
• Varying is proven by tracing dependence from known-varying shader globals
    point Pshad = transform ("object", P);
    for (int i = 0; i < 6; i++)
    {
        sum = sum + 1/freqVal * abs(.5 - noise( 4 * freqVal * Pshad)) ;
        freqVal = 2 * freqVal;
    }
    Cout = Cin * sum;
}
P is a varying Shader Global
Uniform because no dependency on varying Shader Global
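The "uniform until proven varying" rule can be sketched as a forward data-dependence propagation. This is a simplification under stated assumptions: `Op` is a hypothetical reduction of an instruction to one result and its arguments, and control dependence (writes made under a varying condition) is left out.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplification of one instruction: a result symbol and the
// symbols it reads.
struct Op {
    std::string result;
    std::vector<std::string> args;
};

// Every symbol starts out uniform. Starting from the known-varying shader
// globals, mark a result varying whenever any argument is varying, and
// repeat until nothing changes (loops can feed values backwards).
std::set<std::string> find_varying(const std::vector<Op>& ops,
                                   std::set<std::string> varying) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Op& op : ops) {
            if (varying.count(op.result)) continue;
            for (const std::string& a : op.args) {
                if (varying.count(a)) {
                    varying.insert(op.result);
                    changed = true;
                    break;
                }
            }
        }
    }
    return varying;
}
```

Applied to the marble shader with `P` as the only varying seed, `Pshad`, `sum`, and `Cout` become varying, while `freqVal` stays uniform because nothing it depends on traces back to `P`.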
Handling Divergence
• System to track logical masks.
• Which stores need to be masked?
Image © Disney/Pixar
Select (per lane):
  • Variable value before branch: <16 x float>
  • Mask: <16 x i1> (1 = execute branch, 0 = skip branch)
  • Variable value modified in branch: <16 x float>
  • Result: <16 x float>
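The select step can be written out as a scalar loop over lanes. This is only a sketch of the semantics, assuming a 16-lane batch with the mask packed into 16 bits; `select_lanes` is a hypothetical name, and the real backend emits a vector select rather than a loop.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 16;  // hypothetical batch width

// Per-lane select: where the mask bit is 1 the branch executed, so take the
// value modified in the branch; where it is 0, keep the value from before.
void select_lanes(const float (&before)[kWidth],
                  const float (&modified)[kWidth],
                  std::uint16_t mask,
                  float (&result)[kWidth]) {
    for (int lane = 0; lane < kWidth; ++lane) {
        const bool executed = ((mask >> lane) & 1u) != 0;
        result[lane] = executed ? modified[lane] : before[lane];
    }
}
```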
Keep a Stack of Masks
surface test_conditional_masking(output color ResultRGB = 0) {
    if (P[0] > 0.5) {
        if (P[1] > 0.5) {
            float powB = pow(P[2], 5.3);
            float g;
            float inv_g;
            if (powB > 0.75) {
                inv_g = 1.0/P[1];
                inv_g = inv_g*inv_g;
                g = smoothstep(P[0],P[2],inv_g);
            } else {
                float in_red = P[0];
                float inv_r = 1.0/in_red;
                g = noise("perlin",inv_r);
            }
            ResultRGB[1] = g;
        }
    }
}
Logical Mask Stack <16 x i1> (true or false, 1 bit per data lane), top of stack as execution proceeds:
  1. empty
  2. (P[0] > 0.5)
  3. (P[1] > 0.5) && (P[0] > 0.5)
  4. (powB > 0.75) && (P[1] > 0.5) && (P[0] > 0.5)
  5. (P[1] > 0.5) && (P[0] > 0.5)
  6. !(powB > 0.75) && (P[1] > 0.5) && (P[0] > 0.5)
  7. (P[1] > 0.5) && (P[0] > 0.5)
  8. (P[0] > 0.5)
  9. empty
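A minimal sketch of such a mask stack, assuming 16 lanes packed into a `std::uint16_t`. The class `MaskStack` and its method names are hypothetical; the sketch only shows how each nested condition is ANDed with the enclosing mask and how the `else` branch uses the negated condition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint16_t kAllLanes = 0xFFFF;  // empty stack: every lane active

// Hypothetical tracker for nested conditionals over a 16-lane batch.
class MaskStack {
public:
    // Mask that stores must currently honor.
    std::uint16_t current() const {
        return frames_.empty() ? kAllLanes : frames_.back();
    }
    // Entering `if (cond)`: AND the condition with the enclosing mask.
    void push_if(std::uint16_t cond) {
        frames_.push_back(static_cast<std::uint16_t>(current() & cond));
    }
    // Switching to `else`: drop the `if` mask, then push the enclosing lanes
    // for which the condition was false.
    void flip_to_else(std::uint16_t cond) {
        assert(!frames_.empty());
        frames_.pop_back();
        frames_.push_back(static_cast<std::uint16_t>(current() & ~cond));
    }
    // Leaving the conditional restores the enclosing mask.
    void pop() {
        assert(!frames_.empty());
        frames_.pop_back();
    }
private:
    std::vector<std::uint16_t> frames_;
};
```

Pushing the three conditions of `test_conditional_masking` in order, flipping the innermost one to its `else`, and then popping three times walks through the same sequence of stack tops as the slide.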
OSL Shader -> Assembly
Line Level Debugging (even inlined functions)
The Starting Line: marble single sample execute
95% of the time spent in the OSL library
“_2” is the JIT (marble shader)
Noise: Before & After
Before: scalar computation with scalar data types
inline void operator() (float &result, const Vec3 &p) const
{
    HashScalar h;
    perlin(result, h, p.x, p.y, p.z);
    result = 0.5f * (result + 1.0f);
}
Before: block vectorization with intrinsics
After: explicit outer loop vectorization (Intel® C++ Compiler, Clang 5+)
template<int WidthT> void operator() (MaskedAccessor<float, WidthT> wresult,
                                      ConstWideAccessor<Vec3, WidthT> wp) const {
    #pragma forceinline recursive
    {
        #pragma omp simd simdlen(WidthT)
        for(int l=0; l< WidthT; ++l) {
            Vec3 p = wp[l];
            float perlinResult;
            HashScalar h;
            perlin_scalar(perlinResult, h, p.x, p.y, p.z);
            float scaledResult = 0.5f * (perlinResult + 1.0f);
            wresult[l] = scaledResult;
        }
    }
}
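The outer-loop pattern can be tried in isolation with a stand-in kernel. In this sketch, `fake_noise` is a placeholder for the scalar Perlin routine (it is not a noise function), and `wide_kernel` is a hypothetical name. The `#pragma omp simd` line takes effect only when the compiler is given an OpenMP SIMD option (for example `-fopenmp-simd` with Clang or GCC); otherwise it is ignored and the loop still computes the same values.

```cpp
#include <cassert>

// Placeholder for the scalar library routine; NOT real Perlin noise.
inline float fake_noise(float x, float y, float z) {
    return x * y - z;
}

// Outer-loop vectorization: the scalar routine is written once, and the
// loop over lanes is what the compiler is asked to vectorize.
template <int WidthT>
void wide_kernel(const float (&x)[WidthT], const float (&y)[WidthT],
                 const float (&z)[WidthT], float (&result)[WidthT]) {
    #pragma omp simd simdlen(WidthT)
    for (int l = 0; l < WidthT; ++l) {
        const float n = fake_noise(x[l], y[l], z[l]);
        result[l] = 0.5f * (n + 1.0f);  // same remap as the slide's noise code
    }
}
```

The design point is that the per-lane body stays ordinary scalar C++, so the same source can serve the single-point and the batched paths.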
The Finish Line: marble batch execute
Wide version of noise: 4x speedup
JIT of marble.osl: 13.2x speedup
MICRO-BENCHMARK of OSL library
[Chart: speedup per OSL library function (log scale, 0.125 to 16), one series per batch size]
Functions: null, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, atan2, sincos, log, log2, log10, logb, exp, exp2, expm1, pow, erf, erfc, radians, degrees, sqrt, inversesqrt, hypot, abs, fabs, sign, floor, ceil, round, trunc, mod, min, max, clamp, mix, isnan, isfinite, select, dot, cross, length, distance, normalize, reflect, fresnel, rotate, transform, transform_matrix, matrix_object_camera, determinant, transpose, linearstep, smooth_linearstep, noise_perlin, noise_cell, noise_simplex, noise_gabor, pnoise_perlin, pnoise_cell, pnoise_gabor, spline_bezier, spline_bspline, spline_catmull-rom, spline_hermite, spline_linear, spline_constant
Legend: Batch Size 1 [0.56x], Batch Size 2 [1x], Batch Size 4 [2.12x], Batch Size 8 [4x], Batch Size 12 [5.7x], Batch Size 16 [7.6x]
OSL’s testshade running on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2)
Intel® AVX-512 performance vs BATCH UTILIZATION
[Chart: threads.osl, batched speedup vs. single point (y-axis, 0 to 6) against batch size (x-axis, 1 to 16)]
5.7x Speedup
OSL’s testshade running on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 1)
Intel® AVX-512 performance vs BATCH UTILIZATION
[Chart: TheDonutShader.osl, batched speedup vs. single point (y-axis, 0 to 3) against batch size (x-axis, 1 to 16), with a break-even line]
3.13x Speedup
OSL’s testshade running on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 1)
Intel® AVX-512 performance vs BATCH UTILIZATION
[Chart: speedup (y-axis, 0 to 21) against points in batch (x-axis, 1 to 16) for the shaders concrete, leopard, oak, marble, and diamond plate]
Speedups called out on the chart: 21x, 4.3x, 4.1x, 3.7x, 5.6x
OSL’s testshade running on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 1)
BATCHED Intel® AVX-512 vs Intel® AVX2
[Chart: speedup (y-axis, 1 to 1.7) per shader: concrete, leopard, oak, marble, diamond plate, thread, donut]
Bar labels, in extraction order: 1.5x, 1.52x, 1.47x, 1.27x, 1.7x, 1.38x, 1.2x
OSL’s testshade running Intel AVX512 on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 1)
OSL’s testshade running Intel AVX2 on 36 threads of Intel® Xeon® E5-2697v4 @2.3Ghz (config 3)
Statue2
• High quality settings
• Expensive “gabor” noise
Single Point vs. Batched
2.82x Shading Speedup
87% Batch Utilization
2x Overall Speedup
Image © Disney/Pixar
Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2)
Bonnie
Single Point vs. Batched
1.33x Shading Speedup
62% Batch Utilization
• Real production character with 55 shader networks
• 85663 shader operations on 67680 symbols (post-optimization)
Amdahl’s Law
Image © Disney/Pixar
Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2)
Batch Utilization / Ray Bounces (bonnie)
[Chart: batch utilization % (0 to 100) and shading speedup (1 to 1.7) for 9, 5, 3, 2, and 1 ray bounces]
Shading speedups called out on the chart: 1.33x, 1.40x
Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2)
Bucket sizes / RAY BOUNCES (Bonnie)
[Chart: % of batches submitted (0 to 80) by batch size (1 point to 16 points) for 1, 2, 3, 5, and 9 ray bounces]
Percentages called out on the chart: 7.3%, 13.9%, 18.9%, 22.3%, 25.4% and 76.6%, 67.1%, 60.9%, 56.5%, 52.6%
Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2)
Image © Disney/Pixar
Fillmore
Single Point vs. Batched
1.58x Shading Speedup
• Real production character with 56 shader networks
• 153397 shader operations on 123055 symbols (post-optimization)
1.21x Overall Speedup
Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2)
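The gap between shading speedup and overall speedup is what Amdahl's Law predicts when only part of the render is accelerated. The sketch below is a back-of-the-envelope model, not a measurement from the talk: it assumes shading is the only part that gets faster and that everything else is unchanged. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's Law: overall speedup when a fraction f of the original runtime
// is sped up by a factor s and the rest is unchanged.
double overall_speedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

// Inverse of the formula above: the fraction of original runtime that must
// have been accelerated to turn a speedup of s into the given overall speedup.
double implied_fraction(double s, double overall) {
    assert(s > 1.0 && overall >= 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}
```

Under that assumption, Fillmore's 1.58x shading speedup and 1.21x overall speedup would correspond to shading taking roughly 47% of the original render time, and Statue2's 2.82x and 2x would correspond to roughly 77%. These fractions are derived from the model, not reported in the slides.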
Interactive Rendering in Maya*
• Statue2 “fast” quality
• Perlin Noise
• Shader small portion of overall render time
• Batched Intel® AVX-512 shading
• 10% faster render time
• +15% frame rate
Pixar’s RenderMan* 22.dev running on 2 of Intel® Xeon® Gold 6148 @2.4Ghz (config 2)
Video & images © Disney/Pixar
Next Steps
• Remaining features
• Closures
• Further optimize generated code
• Explore SIMD Texture System
• Intel contributes SIMD OSL to Open Source
• Pixar’s RenderMan* 22
• Upcoming release with SIMD OSL
• Investigate batch utilization
Image © Disney/Pixar
Conclusion and Call to Action
• Does your renderer already use OSL?
• Is your renderer capable of batching requests per Shader Group?
• Unleash the power of Intel® AVX-512 inside the Open Shading Language
• https://github.com/imageworks/OpenShadingLanguage/tree/IntelBatchedOSL
• Try it out in an upcoming release of
Configurations
| | Config 1 | Config 2 | Config 3 |
| Model name | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz | Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz |
| Core(s) per socket | 20 | 20 | 18 |
| Socket(s) | 2 | 2 | 2 |
| Memory | 192GB, DDR4-2666 MHz (12 x 16GB) | 192GB, DDR4-2666 MHz (12 x 16GB) | 128GB, DDR4-2400 MHz (8 x 16GB) |
| CPU Power Policy | Performance | powersave | Performance |
| Hyperthreading | Enabled | Enabled | Enabled |
| Turbo Boost Tech | Enabled | Enabled | Enabled |
| L1d cache | 32K | 32K | 32K |
| L1i cache | 32K | 32K | 32K |
| L2 cache | 1024K | 1024K | 256K |
| L3 cache | 28160K | 28160K | 46080K |
| Operating System | RHEL 7.4 | CentOS Linux release 7.3.1611 (Core) | Red Hat Enterprise Linux Server release 7.2 (Maipo) |
| BIOS Version | SE5C620.86B.00.01.0009.101920170742 | SE5C620.86B.01.00.0412.020920172159 | GRRFSDP1.86B0271.R00.1510301446 |
• All non-interactive tests run on a single socket of these configurations
• Expected environment in render farms
OSL Shaders
• Concrete - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/concrete.osl
  • Modifications:
    < float grain=noise("gabor",p,8,"bandwidth",4,"anisotropic",2,"direction",vector(SandDensity,0,0));
    ---
    > float grain=noise("gabor",p,8);
• Leopard - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/leopard.osl
• Diamond plate - https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/diamondplateshader.osl
• Thread - https://github.com/ADN-DevTech/3dsMax-OSL-Shaders/blob/master/OSL/ADN-Experimental/Threads.osl
• Donut - https://github.com/ADN-DevTech/3dsMax-OSL-Shaders/blob/master/OSL/ADN-Experimental/TheDonutShader.osl
• Oak - https://renderman.pixar.com/forum/download.php
  • Pixar’s RenderMan* examples ./scenes/pattern/osl/shaders/oak.osl
• Marble - https://renderman.pixar.com/forum/download.php
  • Pixar’s RenderMan* examples ./scenes/pattern/osl/shaders/marble.osl
Minimizing Masked Instructions
surface test_conditional_masking(output color ResultRGB = 0) {
    if (P[0] > 0.5) {
        if (P[1] > 0.5) {
            float powB = pow(P[2], 5.3);
            float g;
            float inv_g;
            if (powB > 0.75) {
                inv_g = 1.0/P[1];
                inv_g = inv_g*inv_g;
                g = smoothstep(P[0],P[2],inv_g);
            } else {
                float in_red = P[0];
                float inv_r = 1.0/in_red;
                g = noise("perlin",inv_r);
            }
            ResultRGB[1] = g;
        }
    }
}
Annotations:
  • Implicit read of output ResultRGB
  • Read with conditional mask that is not a subset of the last write
  • Read with logical mask that is not a subset of a write
  • Track assignments
  • Require masking
  • Assignments track logical mask

Mais conteúdo relacionado

Mais procurados

More explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upMore explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upIntel® Software
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAIntel® Software
 
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning AccelerationclCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning AccelerationIntel® Software
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...Intel® Software
 
In The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIn The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIntel® Software
 
The Architecture of 11th Generation Intel® Processor Graphics
The Architecture of 11th Generation Intel® Processor GraphicsThe Architecture of 11th Generation Intel® Processor Graphics
The Architecture of 11th Generation Intel® Processor GraphicsIntel® Software
 
Intel® Graphics Performance Analyzers
Intel® Graphics Performance AnalyzersIntel® Graphics Performance Analyzers
Intel® Graphics Performance AnalyzersIntel® Software
 
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Software
 
Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*Forts and Fights Scaling Performance on Unreal Engine*
Forts and Fights Scaling Performance on Unreal Engine*Intel® Software
 
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-ResolutionUltra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution
Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-ResolutionIntel® Software
 
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...
Simple Single Instruction Multiple Data (SIMD) with the Intel® Implicit SPMD ...Intel® Software
 
Scalability for All: Unreal Engine* 4 with Intel
Scalability for All: Unreal Engine* 4 with Intel Scalability for All: Unreal Engine* 4 with Intel
Scalability for All: Unreal Engine* 4 with Intel Intel® Software
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Intel® Software
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Software Brasil
 
Optimizing Direct X On Multi Core Architectures
Optimizing Direct X On Multi Core ArchitecturesOptimizing Direct X On Multi Core Architectures
Optimizing Direct X On Multi Core Architecturespsteinb
 
Intel HPC Update
Intel HPC UpdateIntel HPC Update
Intel HPC UpdateIBM Danmark
 
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...Intel® Software
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Intel® Software
 
Make your unity game faster, faster
Make your unity game faster, fasterMake your unity game faster, faster
Make your unity game faster, fasterIntel® Software
 
Create a Scalable and Destructible World in HITMAN 2*
Create a Scalable and Destructible World in HITMAN 2*Create a Scalable and Destructible World in HITMAN 2*
Create a Scalable and Destructible World in HITMAN 2*Intel® Software
 

Unleashing Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Inside the Open Shading Language | SIGGRAPH 2018 Tech Session

  • 1. Presenter: Stephen Friedman (Pixar Animation Studios) Authors: Alex Wells (Intel), Max Liani & Stephen Friedman (Pixar Animation Studios) , Larry Gritz (Sony Pictures Imageworks) Contributors: Steena Monteiro & Louis Feng (Intel) August 16, 2018
  • 2. Legal Disclaimers and Optimization Notices No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. 
Any differences in your system hardware, software or configuration may affect your actual performance. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor- dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system. Intel, Xeon and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others © Intel Corporation. 2
  • 3. Agenda: Taking Advantage of Modern SIMD in a Domain Specific Language
      • Open Shading Language (OSL) Overview
      • How Modern SIMD Can Improve OSL
      • Utilizing SIMD as a Shading/Renderer/Language Author
      • Reaping the Benefits of SIMD
      • Moving Forward
  • 5. Shading in Physically Based Rendering
  • 6. Shading Networks
      • Develop reusable shading nodes.
      • Connect nodes to define complex materials.
      • Production shading networks can grow very large, to hundreds or thousands of nodes.
  • 7. C++ Shader Limitations
      • Lack of context at compile time:
          – Input parameters unknown
          – Geometry being shaded unknown
          – Mode of shading unknown
          – Surrounding shading network unknown
          – Branchy testing required
      • Lack of portability
      • Requires “Performance Ninjas”
      Image credit: Ninja Working AT Desk from Vector.me (by Hector Gomez)
  • 8. Open Shading Language
      • Developed by Sony Pictures Imageworks*
      • C-like DSL for programmable shading
      • API to connect shaders into networks
      • Open source: http://github.com/imageworks/OpenShadingLanguage
      • Sci-Tech Award* in 2017
      Logo owned by Academy of Motion Picture Arts and Sciences for Infobox. *Other names and brands may be claimed as the property of others.
  • 9. Poster images (c) Sony Pictures*, Paramount*, Warner Brothers*, Disney*, Fox*, Universal*
  • 10. Example OSL Shader (marble)
        shader marble (color Cin = .5,
                       float freq = 1.0,
                       output color Cout = 0)
        {
            float sum = 0;
            float freqVal = freq;
            point Pshad = transform ("object", P);
            for (int i = 0; i < 6; i++) {
                sum = sum + 1/freqVal * abs(.5 - noise(4 * freqVal * Pshad));
                freqVal = 2 * freqVal;
            }
            Cout = Cin * sum;
        }
      Callouts: P is a Shader Global (input set by the renderer); transform, abs, and noise are library calls.
  • 11. [Diagram: OSL compile and execution flow]
      • Shader written in OSL -> oslc (offline compiler) -> intermediate OSO (instructions + operands).
      • Renderer (Pixar’s RenderMan*, Autodesk Arnold*, Blender*): scene management, ray tracing/path tracing, light integration.
      • OSL Runtime: build shading network; render time optimization with LLVM* JIT (Just In Time compilation) to optimized x86; execute shading network (per point); query outputs.
      • OSL Runtime Library; callbacks.
      *Other names and brands may be claimed as the property of others.
  • 12. Complexity: 173 billion shader invocations, more than 16 hours to execute (1t). Image (c) 21st Century Fox.
  • 13. OSL Render Time Optimization
      • 280M ops -> 2.68M (99.0% reduction)
      • 161M symbols -> 1.9M (98.8% reduction)
      • 150,612 empty shader instances: 63% optimized away
      • 99% reduction of operations
      • Can outperform precompiled C++ shaders (mostly because of render time optimization)
      Image (c) 21st Century Fox
  • 15. OSL Scalability Limitations
      • Single sample execution:
          – limited opportunities to leverage SIMD
          – high execution cost
      • Block vectorization using Intel® Streaming SIMD Extensions (Intel® SSE):
          – only 4-wide
          – very limited support (noise functions, texturing)
      • No benefits from modern 8/16-wide CPUs
      Image (c) Pixar Animation Studios
  • 16. Creating a SIMD Scalable OSL: use Single Program Multiple Data (SPMD) techniques
      • No changes to the OSL language specs
          – Leverage OSL render time optimization
      • Create “Batched” interface to OSL
          – Retain “single point” interface
      • Create “wide” backend to generate code
          – Directly emit vector operations
          – Leverage LLVM* vector data types such as <16 x float>
      • Create “wide” library
          – Leverage OpenMP Explicit Vectorization
      Image © Disney/Pixar. *Other names and brands may be claimed as the property of others.
  • 20. Interface to Renderer
      • Single point interface (Renderer to Shading System):
          – execute(ShaderGlobals, …): submit a single point
          – symbol_address(…): query results
      • New “Batched” interface:
          – execute_batch(ShaderGlobalsBatch, …): submit a batch of points
          – symbol_batch_accessor(…): query a batch of results
      • ShaderGlobalsBatch holds:
          – Uniform: context *’s, Raytype, …
          – Queue of Varying: Surface Position, Incident Ray, Surface Normal, …
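The split between uniform fields and a queue of varying fields can be sketched in C++ roughly as follows. This is a minimal illustration only: `ShaderGlobalsBatchSketch`, its members, and `push` are hypothetical names, not the actual OSL data layout or API.

```cpp
#include <cstdint>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for a batched shader-globals block (not OSL's real layout).
template <int WidthT>
struct ShaderGlobalsBatchSketch {
    static_assert(WidthT > 0 && WidthT <= 32, "the mask below holds at most 32 lanes");

    // Uniform part: one value shared by every point in the batch.
    int raytype = 0;

    // Varying part, stored as structure-of-arrays: one slot per lane.
    float Px[WidthT] = {}, Py[WidthT] = {}, Pz[WidthT] = {};  // surface position
    float Nx[WidthT] = {}, Ny[WidthT] = {}, Nz[WidthT] = {};  // surface normal

    int size = 0;  // number of points queued so far

    bool full() const { return size == WidthT; }

    // Queue one shading point; returns false when the batch is already full.
    bool push(const Vec3& P, const Vec3& N) {
        if (full()) return false;
        Px[size] = P.x; Py[size] = P.y; Pz[size] = P.z;
        Nx[size] = N.x; Ny[size] = N.y; Nz[size] = N.z;
        ++size;
        return true;
    }

    // Bit i is set when lane i holds a queued point.
    std::uint32_t active_mask() const {
        return size == 32 ? 0xFFFFFFFFu : ((std::uint32_t{1} << size) - 1u);
    }
};
```

A renderer would fill such a block point by point and submit it once it is full (or once no more points for that shader group are available), passing the active mask along so that empty lanes are ignored.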
  • 21. Representing Varying Data
      • Fixed size SOA (Structure of Arrays), friendly for SIMD hardware.
      • Define “Wide” wrappers for existing data types: Vec3, Matrix44, float, etc.
        template <int WidthT>
        struct Wide<Vec3, WidthT> {
            float x[WidthT];
            float y[WidthT];
            float z[WidthT];
        };
      Image © Disney/Pixar
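To make the SOA layout concrete, here is a small self-contained sketch of the `Wide<Vec3, WidthT>` idea. The `Vec3` type and the `get`/`set` helpers are simplified assumptions for illustration and are not taken from the OSL sources.

```cpp
// Plain scalar vector type standing in for OSL's Vec3 (simplified assumption).
struct Vec3 {
    float x, y, z;
};

// Generic declaration; only the Vec3 specialization is sketched here.
template <typename DataT, int WidthT>
struct Wide;

// Structure-of-arrays block: each component is stored contiguously across
// lanes, so one vector load can fetch the same component for all lanes.
template <int WidthT>
struct Wide<Vec3, WidthT> {
    static constexpr int width = WidthT;
    float x[WidthT];
    float y[WidthT];
    float z[WidthT];

    // Gather one lane back into the familiar array-of-structures form.
    Vec3 get(int lane) const { return Vec3{x[lane], y[lane], z[lane]}; }

    // Scatter a scalar Vec3 into one lane of the block.
    void set(int lane, const Vec3& v) {
        x[lane] = v.x;
        y[lane] = v.y;
        z[lane] = v.z;
    }
};
```

With `WidthT = 16`, each of `x`, `y`, and `z` is 64 bytes, which is the size of one AVX-512 register of floats.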
  • 22. Accessing Varying Data (inspired by techniques from Intel’s SIMD Data Layout Templates)
      • Accessors give a transparent AOS view of the SOA data.
      • The array subscript returns a proxy object to that lane.
      • Reading from the proxy extracts data from a lane of the SOA.
      • Writing through a MaskedAccessor skips the assignment if the lane is masked off.
        my_callback(ConstWideAccessor<float> wScale,
                    ConstWideAccessor<Matrix44> wM,
                    WideAccessor<Vec3> wVS,
                    MaskedAccessor<Vec3> wVT)
        {
            for (int i = 0; i < wVS.width; ++i) {
                Vec3 V = wVS[i];
                float F = wScale[i];
                Matrix44 M = wM[i];
                wVS[i] = V*F;
                wVT[i] = transform(M,V);
            }
        }
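One way such a masked accessor could work is sketched below. This is an assumption-laden illustration, not the OSL implementation: `WideVec3`, `MaskedVec3Accessor`, and `LaneProxy` are hypothetical names, and the mask is modeled as a plain integer with one bit per lane.

```cpp
#include <cstdint>

struct Vec3 { float x, y, z; };

// SOA storage for WidthT Vec3 values (hypothetical simplified type).
template <int WidthT>
struct WideVec3 {
    float x[WidthT], y[WidthT], z[WidthT];
};

// One bit per lane; bit i set means lane i is active.
using Mask = std::uint32_t;

template <int WidthT>
class MaskedVec3Accessor {
public:
    static constexpr int width = WidthT;

    MaskedVec3Accessor(WideVec3<WidthT>& data, Mask mask) : data_(data), mask_(mask) {}

    // Proxy returned by operator[]: converts to Vec3 on read, and on write
    // stores only when the lane is enabled in the mask.
    class LaneProxy {
    public:
        LaneProxy(WideVec3<WidthT>& data, Mask mask, int lane)
            : data_(data), mask_(mask), lane_(lane) {}

        operator Vec3() const {
            return Vec3{data_.x[lane_], data_.y[lane_], data_.z[lane_]};
        }

        LaneProxy& operator=(const Vec3& v) {
            if (mask_ & (Mask{1} << lane_)) {  // masked-off lanes are left untouched
                data_.x[lane_] = v.x;
                data_.y[lane_] = v.y;
                data_.z[lane_] = v.z;
            }
            return *this;
        }

    private:
        WideVec3<WidthT>& data_;
        Mask mask_;
        int lane_;
    };

    LaneProxy operator[](int lane) { return LaneProxy(data_, mask_, lane); }

private:
    WideVec3<WidthT>& data_;
    Mask mask_;
};
```

The callback body can then be written against plain `Vec3` values (`acc[i] = some_vec3;`) while the storage stays in SOA form and the mask is applied without any explicit test in user code.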
  • 23. Uniform Texture Binning
      • Uniform texture name: no overhead.
            texture("MyTex", u, v);
        becomes a single BatchedRendererServices call: texture(“MyTex”,…, );
      • Varying texture name: JIT’d binning, full flexibility.
            if (layer == 1) file = "r.tex";
            else if (layer == 2) file = "g.tex";
            else if (layer == 3) file = "b.tex";
            texture(file, u, v);
        With layer = 3 3 1 2 1 1 2 1 2 2 2 2 3 3 3 1 across the lanes, the lanes are binned by file and one masked BatchedRendererServices call is made per unique file: texture(“b.tex”,…, ); texture(“r.tex”,…, ); texture(“g.tex”,…, );
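The binning step can be sketched as a loop that repeatedly picks the first unprocessed lane and collects every other active lane with the same value into one mask. The function below is a plain C++ illustration under that assumption; in the talk this logic is emitted by the JIT, and `bin_by_value` and its callback signature are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Mask = std::uint32_t;  // one bit per lane, so at most 32 lanes in this sketch

// Calls uniform_call(file, lanes) once per distinct file name among the active
// lanes, where `lanes` has a bit set for each lane using that file.
// Returns the number of calls made.
inline int bin_by_value(const std::vector<std::string>& file_per_lane,
                        Mask active,
                        const std::function<void(const std::string&, Mask)>& uniform_call)
{
    int calls = 0;
    Mask remaining = active;
    const int width = static_cast<int>(file_per_lane.size());
    for (int lead = 0; lead < width && remaining != 0; ++lead) {
        if (!(remaining & (Mask{1} << lead))) continue;  // lane already handled or inactive
        Mask bin = 0;
        for (int lane = lead; lane < width; ++lane) {
            const Mask bit = Mask{1} << lane;
            if ((remaining & bit) && file_per_lane[lane] == file_per_lane[lead]) bin |= bit;
        }
        uniform_call(file_per_lane[lead], bin);  // the texture name is uniform within this call
        remaining &= ~bin;
        ++calls;
    }
    return calls;
}
```

For the 16 `layer` values shown on the slide, this produces three calls (b.tex, r.tex, g.tex, in order of first appearance), each with a mask covering only the lanes that requested that file.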
  • 25. Agenda: Taking Advantage of Modern SIMD in a Domain Specific Language
      • …
      • Utilizing SIMD as a Language Author
          – Uniform Computation Optimization
          – Control Flow Management
          – Mapping OSL to Assembly
          – OSL Library SIMD Implementations
      • …
  • 26. Uniform vs Varying Variables
      In some other high performance shading languages, it is left as an exercise for the programmer to identify, with keywords, which variables and parameters are uniform or varying:
      • Pixar’s RenderMan* Shading Language (RSL): https://renderman.pixar.com/resources/RenderMan_20/shadingLanguage.html
      • Intel SPMD Program Compiler: https://ispc.github.io/
      • OpenMP*: https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
      Examples of such annotations:
            varying float patctx = 0; /* initialize the context */
            varying float f = gridpattern("concentric", patctx);

            for (uniform int j = 0; j < height; j++)

            #pragma omp declare simd uniform(a) linear(b:1)
            float somefunc(float a, float * b, float c);
      Furthermore, multiple versions of functions may need to be created to handle different combinations of uniform vs. varying.
      *Other names and brands may be claimed as the property of others.
  • 27. Hide Uniform vs Varying from the User
      • Goals:
          – No changes to shader source/.oso files.
          – Allow shader authors to leverage SIMD hardware without added complexity.
          – Leverage common operations across batches of shading points.
      • Implications:
          – Can't add new keywords (uniform, varying, forall)
          – Must leverage uniform computations when possible for data layout, control flow, and code generation
          – New interfaces to allow renderers to leverage uniform computations
  • 28. Hiding Uniform vs Varying: Is It Possible? YES WE CAN!
      • We can leverage domain specific restrictions:
          – No external library functions
          – User functions are part of the shader source/.oso
          – Well defined contracts of varying/uniform in OSL library and RendererServices
  • 29. Identifying Uniform vs Varying
      • Variables are uniform until proven varying.
      • Varying is proven by tracing dependence from known-varying shader globals.
            point Pshad = transform ("object", P);
            for (int i = 0; i < 6; i++) {
                sum = sum + 1/freqVal * abs(.5 - noise(4 * freqVal * Pshad));
                freqVal = 2 * freqVal;
            }
            Cout = Cin * sum;
        }
      Callouts: P is a varying Shader Global. Two other variables are marked uniform because they have no dependency on a varying Shader Global.
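The "uniform until proven varying" rule amounts to a forward propagation over the shader's operations. The sketch below shows only that data-dependence part, iterated to a fixed point; `Op` and `find_varying` are hypothetical names, and the promotion caused by varying control flow that the notes mention (break, continue, return inside non-uniform branches) is deliberately left out.

```cpp
#include <set>
#include <string>
#include <vector>

// One operation: `result` is computed from `args` (hypothetical simplified IR).
struct Op {
    std::string result;
    std::vector<std::string> args;
};

// Every symbol starts out uniform. A symbol becomes varying once any operation
// writing it reads a varying symbol. Repeating until nothing changes also
// catches loop-carried dependencies such as `sum = sum + ...`.
inline std::set<std::string> find_varying(const std::vector<Op>& ops,
                                          const std::set<std::string>& varying_globals)
{
    std::set<std::string> varying = varying_globals;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Op& op : ops) {
            if (varying.count(op.result)) continue;  // already proven varying
            for (const std::string& arg : op.args) {
                if (varying.count(arg)) {
                    varying.insert(op.result);
                    changed = true;
                    break;
                }
            }
        }
    }
    return varying;
}
```

Applied to the marble fragment above with `P` as the only varying global, `Pshad`, `sum`, and `Cout` come out varying while `freqVal` stays uniform, so `freqVal` can be kept in a scalar register and updated once per batch.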
  • 31. Handling Divergence
      • System to track logical masks.
      • Which stores need to be masked?
      • [Diagram] The variable value before the branch <16 x float> and the variable value modified in the branch <16 x float> are combined by a Select on the mask <16 x i1> (1 = execute branch, 0 = skip branch), producing the result <16 x float>.
      Image © Disney/Pixar
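The blend in the diagram can be written as a per-lane select. The following is a scalar C++ model of that operation, assuming a 16-lane batch and a 16-bit mask; the real system emits an LLVM select on `<16 x i1>`, and `blend_select` is a hypothetical name.

```cpp
#include <array>
#include <cstdint>

constexpr int kWidth = 16;
using WideFloat = std::array<float, kWidth>;
using Mask = std::uint16_t;  // bit i = 1 means lane i executed the branch

// Per lane: keep the value computed inside the branch where the mask is set,
// otherwise keep the value the variable had before the branch.
inline WideFloat blend_select(Mask mask, const WideFloat& modified, const WideFloat& before)
{
    WideFloat result{};
    for (int lane = 0; lane < kWidth; ++lane) {
        result[lane] = ((mask >> lane) & 1u) ? modified[lane] : before[lane];
    }
    return result;
}
```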
  • 32. Keep a Stack of Masks
        surface test_conditional_masking(output color ResultRGB = 0)
        {
            if (x > 0.5) {
                if (y > 0.5) {
                    float powB = pow(z, 5.3);
                    float g;
                    float inv_g;
                    if (powB > 0.75) {
                        inv_g = 1.0/y;
                        inv_g = inv_g*inv_g;
                        g = smoothstep(x,z,inv_g);
                    } else {
                        float in_red = x;
                        float inv_r = 1.0/in_red;
                        g = noise("perlin",inv_r);
                    }
                    ResultRGB[1] = g;
                }
            }
        }
      Here x, y, and z are shorthand for P[0], P[1], and P[2]. The logical mask stack holds <16 x i1> values (true or false, 1 bit per data lane). Top of the stack as the shader is traversed:
      • empty
      • (x > 0.5)
      • (y > 0.5) && (x > 0.5)
      • (powB > 0.75) && (y > 0.5) && (x > 0.5)
      • (y > 0.5) && (x > 0.5)
      • !(powB > 0.75) && (y > 0.5) && (x > 0.5)
      • (y > 0.5) && (x > 0.5)
      • (x > 0.5)
      • empty
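A mask stack with these push and pop semantics might look like the sketch below. It is an illustration under stated assumptions: `MaskStack` and its methods are hypothetical names, masks are modeled as 16-bit integers, and the else branch is handled with a `negated` flag as the notes describe, although here the flag is resolved by explicitly complementing the condition rather than by swapping blend operands.

```cpp
#include <cstdint>
#include <vector>

using Mask = std::uint16_t;  // one bit per lane for a 16-wide batch

class MaskStack {
public:
    // Lanes currently executing; all lanes are active when the stack is empty.
    Mask current() const {
        return stack_.empty() ? Mask(0xFFFF) : stack_.back().effective();
    }

    // Entering `if (cond)`: the new mask is combined with the enclosing one.
    void push(Mask condition) {
        Entry e;
        e.enclosing = current();
        e.condition = condition;
        e.negated = false;
        stack_.push_back(e);
    }

    // Entering the matching `else`: flip the flag instead of storing a new mask.
    void enter_else() { stack_.back().negated = true; }

    // Leaving the conditional.
    void pop() { stack_.pop_back(); }

private:
    struct Entry {
        Mask enclosing;
        Mask condition;
        bool negated;
        Mask effective() const {
            return static_cast<Mask>(enclosing & (negated ? ~condition : condition));
        }
    };
    std::vector<Entry> stack_;
};
```

Walking the shader above with this class reproduces the sequence of stack tops listed on the slide: each nested `if` narrows the active lanes, the `else` selects the complementary lanes within the enclosing mask, and each closing brace restores the previous mask.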
  • 34. OSL Shader -> Assembly: line level debugging (even inlined functions)
  • 35. The Starting Line: marble single sample execute. 95% of the time is spent in the OSL library; “_2” is the JIT (marble shader).
  • 37. Noise: Before & After
      • Before: scalar computation with scalar data types; block vectorization with intrinsics.
            inline void operator() (float &result, const Vec3 &p) const {
                HashScalar h;
                perlin(result, h, p.x, p.y, p.z);
                result = 0.5f * (result + 1.0f);
            }
      • After: explicit outer loop vectorization (Intel® C++ Compiler) (Clang 5+).
            template<int WidthT>
            void operator() (MaskedAccessor<float, WidthT> wresult,
                             ConstWideAccessor<Vec3, WidthT> wp) const {
                #pragma forceinline recursive
                {
                    #pragma omp simd simdlen(WidthT)
                    for(int l=0; l< WidthT; ++l) {
                        Vec3 p = wp[l];
                        float perlinResult;
                        HashScalar h;
                        perlin_scalar(perlinResult, h, p.x, p.y, p.z);
                        float scaledResult = 0.5f * (perlinResult + 1.0f);
                        wresult[l] = scaledResult;
                    }
                }
            }
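The same outer loop vectorization pattern can be tried on a much smaller kernel. In the sketch below, `fade_scalar` (the Perlin fade polynomial) is only a stand-in for `perlin_scalar`, and `WideFloat` and `fade_wide` are hypothetical names; the OpenMP pragma takes effect only when the compiler is invoked with OpenMP SIMD support enabled, and is otherwise ignored.

```cpp
// SOA block of WidthT floats (hypothetical simplified type).
template <int WidthT>
struct WideFloat {
    float v[WidthT];
};

// Scalar kernel, written with no knowledge of the wide data layout.
inline float fade_scalar(float t)
{
    return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
}

// Wide wrapper: the outer loop over lanes is explicitly declared SIMD, so the
// compiler may vectorize across lanes once the scalar kernel is inlined.
template <int WidthT>
void fade_wide(WideFloat<WidthT>& result, const WideFloat<WidthT>& in)
{
    #pragma omp simd simdlen(WidthT)
    for (int lane = 0; lane < WidthT; ++lane) {
        const float t = in.v[lane];       // export the input for this lane
        result.v[lane] = fade_scalar(t);  // import the result for this lane
    }
}
```

The scalar function stays readable and testable on its own, and the wide version adds only the loop and the pragma, which is the property the slide relies on for the noise implementations.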
  • 38. The Finish Line: marble batch execute. Wide version of noise: 4x speedup. JIT of marble.osl: 13.2x speedup.
  • 40. Micro-benchmark of OSL library [chart: speedup per library function on a log scale from 0.125x to 16x, one series per batch size]
      • Series: Batch Size 1 [0.56x], Batch Size 2 [1x], Batch Size 4 [2.12x], Batch Size 8 [4x], Batch Size 12 [5.7x], Batch Size 16 [7.6x]
      • Functions: null, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, atan2, sincos, log, log2, log10, logb, exp, exp2, expm1, pow, erf, erfc, radians, degrees, sqrt, inversesqrt, hypot, abs, fabs, sign, floor, ceil, round, trunc, mod, min, max, clamp, mix, isnan, isfinite, select, dot, cross, length, distance, normalize, reflect, fresnel, rotate, transform, transform_matrix, matrix_object_camera, determinant, transpose, linearstep, smooth_linearstep, noise_perlin, noise_cell, noise_simplex, noise_gabor, pnoise_perlin, pnoise_cell, pnoise_gabor, spline_bezier, spline_bspline, spline_catmull-rom, spline_hermite, spline_linear, spline_constant
      OSL’s testshade running on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2)
  • 41. Intel® AVX-512 performance vs batch utilization: threads.osl (batched speedup vs. single point) [chart: speedup (0 to 6) vs. batch size (1 to 16)]. Label: 5.7x speedup. OSL’s testshade running on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 1)
  • 42. Intel® AVX-512 performance vs batch utilization: TheDonutShader.osl (batched speedup vs. single point) [chart: speedup (0 to 3) vs. batch size (1 to 16)]. Labels: 3.13x speedup; break even. OSL’s testshade running on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 1)
  • 43. Intel® AVX-512 performance vs batch utilization [chart: speedup (0 to 21) vs. points in batch (1 to 16)]. Shaders plotted: concrete, leopard, oak, marble, diamond plate. Speedup labels: 21x, 4.3x, 4.1x, 3.7x, 5.6x. OSL’s testshade running on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 1)
  • 44. Batched Intel® AVX-512 vs Intel® AVX2 [chart: speedup (1 to 1.7) per shader]. Shaders plotted: concrete, leopard, oak, marble, diamond plate, thread, donut. Speedup labels: 1.5x, 1.52x, 1.47x, 1.27x, 1.7x, 1.38x, 1.2x. OSL’s testshade running Intel AVX-512 on 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 1); OSL’s testshade running Intel AVX2 on 36 threads of Intel® Xeon® E5-2697v4 @2.3Ghz (config 3)
  • 45. Statue2
      • High quality settings
      • Expensive “gabor” noise
      • Single point vs. batched: 2.82x shading speedup, 87% batch utilization, 2x overall speedup
      Image © Disney/Pixar. Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2). *Other names and brands may be claimed as the property of others.
  • 46. Bonnie
      • Real production character with 55 shader networks
      • 85663 shader operations on 67680 symbols (post-optimization)
      • Single point vs. batched: 1.33x shading speedup, 62% batch utilization
      • Amdahl’s Law
      Image © Disney/Pixar. Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2). *Other names and brands may be claimed as the property of others.
  • 47. Batch Utilization / Ray Bounces (Bonnie) [chart: shading speedup (1 to 1.7) and batch utilization % (0 to 100) for 9, 5, 3, 2, and 1 bounces]. Speedup labels: 1.33x, 1.40x. Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2). *Other names and brands may be claimed as the property of others.
  • 48. Bucket Sizes / Ray Bounces (Bonnie) [chart: % of batches submitted (0 to 80) per bucket size (1 point to 16 points), for 1, 2, 3, 5, and 9 bounces]. Data labels: 7.3%, 13.9%, 18.9%, 22.3%, 25.4% and 76.6%, 67.1%, 60.9%, 56.5%, 52.6%. Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2). *Other names and brands may be claimed as the property of others.
  • 49. Fillmore
      • Real production character with 56 shader networks
      • 153397 shader operations on 123055 symbols (post-optimization)
      • Single point vs. batched: 1.58x shading speedup, 1.21x overall speedup
      Image © Disney/Pixar. Pixar’s RenderMan* 22.dev running on all 40 threads of Intel® Xeon® Gold 6148 @2.4Ghz (config 2). *Other names and brands may be claimed as the property of others.
  • 50. Interactive Rendering in Maya*
      • Statue2 “fast” quality
      • Perlin noise
      • Shader is a small portion of overall render time
      • Batched Intel® AVX-512 shading
      • 10% faster render time + 15% frame rate
      Pixar’s RenderMan* 22.dev running on 2 of Intel® Xeon® Gold 6148 @2.4Ghz (config 2). *Other names and brands may be claimed as the property of others. Video & images © Disney/Pixar
  • 52. Next Steps
      • Remaining features
      • Closures
      • Further optimize generated code
      • Explore SIMD Texture System
      • Intel contributes SIMD OSL to Open Source
      • Pixar’s RenderMan* 22
      • Upcoming release with SIMD OSL
      • Investigate batch utilization
      Image © Disney/Pixar. *Other names and brands may be claimed as the property of others.
  • 53. Conclusion and Call to Action
      • Does your renderer already use OSL?
      • Is your renderer capable of batching requests per Shader Group?
      • Unleash the power of Intel® AVX-512 inside the Open Shading Language
      • https://github.com/imageworks/OpenShadingLanguage/tree/IntelBatchedOSL
      • Try it out in an upcoming release of
      Image © Disney/Pixar
  • 54.
  • 55.
  • 56. Configurations
        Field | Config 1 | Config 2 | Config 3
        Model name | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz | Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
        Core(s) per socket | 20 | 20 | 18
        Socket(s) | 2 | 2 | 2
        Memory | 192GB, DDR4-2666 MHz (12 x 16GB) | 192GB, DDR4-2666 MHz (12 x 16GB) | 128GB, DDR4-2400 MHz (8 x 16GB)
        CPU Power Policy | Performance | powersave | Performance
        Hyperthreading | Enabled | Enabled | Enabled
        Turbo Boost Tech | Enabled | Enabled | Enabled
        L1d cache | 32K | 32K | 32K
        L1i cache | 32K | 32K | 32K
        L2 cache | 1024K | 1024K | 256K
        L3 cache | 28160K | 28160K | 46080K
        Operating System | RHEL 7.4 | CentOS Linux release 7.3.1611 (Core) | Red Hat Enterprise Linux Server release 7.2 (Maipo)
        Bios Version | SE5C620.86B.00.01.0009.101920170742 | SE5C620.86B.01.00.0412.020920172159 | GRRFSDP1.86B0271.R00.1510301446
      • All non-interactive tests run on a single socket of these configurations
      • Expected environment in render farms
  • 57. OSL Shaders
      • Concrete: https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/concrete.osl
          – Modifications:
                < float grain=noise("gabor",p,8,"bandwidth",4,"anisotropic",2,"direction",vector(SandDensity,0,0));
                ---
                > float grain=noise("gabor",p,8);
      • Leopard: https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/leopard.osl
      • Diamond plate: https://github.com/varkenvarken/osl-shaders/blob/master/Shaders/diamondplateshader.osl
      • Thread: https://github.com/ADN-DevTech/3dsMax-OSL-Shaders/blob/master/OSL/ADN-Experimental/Threads.osl
      • Donut: https://github.com/ADN-DevTech/3dsMax-OSL-Shaders/blob/master/OSL/ADN-Experimental/TheDonutShader.osl
      • Oak: https://renderman.pixar.com/forum/download.php (Pixar’s RenderMan* examples, ./scenes/pattern/osl/shaders/oak.osl)
      • Marble: https://renderman.pixar.com/forum/download.php (Pixar’s RenderMan* examples, ./scenes/pattern/osl/shaders/marble.osl)
      *Other names and brands may be claimed as the property of others.
  • 58.
  • 59. Minimizing Masked Instructions
        surface test_conditional_masking(output color ResultRGB = 0)
        {
            if (P[0] > 0.5) {
                if (P[1] > 0.5) {
                    float powB = pow(P[2], 5.3);
                    float g;
                    float inv_g;
                    if (powB > 0.75) {
                        inv_g = 1.0/P[1];
                        inv_g = inv_g*inv_g;
                        g = smoothstep(P[0],P[2],inv_g);
                    } else {
                        float in_red = P[0];
                        float inv_r = 1.0/in_red;
                        g = noise("perlin",inv_r);
                    }
                    ResultRGB[1] = g;
                }
            }
        }
      Callouts:
      • Track assignments: assignments track the logical mask.
      • Require masking: a read with a logical mask that is not a subset of the mask of the last write.
      • Implicit read of output ResultRGB.

Editor's Notes

  1. Shading does not render the final color; it only determines the diffuse color, surface reflection, or any other property of a material. A renderer does ray tracing and light integration, but it needs these properties from a material given the position of the ray/object intersection in space, the incoming ray orientation, the surface normal, and other input values. We refer to this process as shading.
  2. Historically, each node was defined in a static programming language (like C++), and execution flowed through the graph to produce the requested outputs. Source code, node instantiation, and node connections are described using the OSL ShadingSystem API.
  3. Note the different parameters that the “2d Texture Placement” node has to handle. The underlying code has to be able to handle any combination of those settings, and the result is often not optimized well.
  4. Designed for physically based rendering: patterns compute radiance closures (BxDFs), not view-dependent final colors, and there is no ray tracing, sampling, integration, or light loops (these are in the renderer). Efficient execution: JIT to machine code, extensive runtime optimization, and shading networks with lazy evaluation. It enables artists at all levels of technical proficiency to create physically plausible materials for efficient production rendering.
  5. Wide industry adoption
  6. Input and output parameters. Shader Globals are how the renderer passes in position, surface normal, ray direction, etc. for the sample it wants evaluated by the shader network. NOTE: single precision floating point, 32 bits.
  7. Not frames per second, but hours or days per frame.
  8. We rely on LLVM to lower vector operations to the capabilities of the underlying architecture, which for AVX-512 is pretty simple. In the worst case, a loop is generated or multiple instructions are issued to satisfy the logical vector operations.
  9. Split globals into different structs based on uniformity. ShaderGlobalsBatch contains both the uniform and the varying Shader Globals. It provides a queue-like interface to set the values of the next varying instance and push it into the queue. The renderer needs to use the new interfaces and support wide callbacks.
  10. For VaryingShaderGlobals, as well as callbacks through the renderer, we want a SIMD friendly data layout. This layout isn’t always convenient to code against, and we would rather just program against the original Vec3 or other existing data types.
  11. For VaryingShaderGlobals as well as callbacks through the renderer. Accessors look just like an array of the data type, but under the hood a Wide SOA version backs them. NOTE: assignment to a MaskedAccessor transparently applies the mask, skipping assignment to masked-off data lanes.
  12. All texture parameters in a batch could be varying. The texture subsystem doesn’t want to deal with every option being different per data lane.
  13. This is one more complexity for artists to deal with; if they do not get it right, the result is sub-optimal performance.
  14. So we know exactly which functions depend on ShaderGlobals that may be varying and which don’t. This is true for RendererServices as well. Because we don’t optimize until after a shader network is built, we can actually follow variables through connected parameters all the way back to their origin (another thing you can’t do in a traditional programming language). The upshot is that we can follow the dependency chains to automatically identify all variables whose values are in some way dependent upon a varying ShaderGlobal. This includes implicit temporaries that exist in chains of operations. NOTE: analysis of loops is more complex, because a break, continue, return, or exit that happens in a non-uniform conditional branch can cause a loop control to be promoted from uniform to varying.
  15. We can blend together results from branches with a bit mask. LLVM has a “Select” operation to choose between the original value and the value modified in the branch. We built a system to track logical masks based on varying conditional operations, and added analysis to identify which stores need to be masked.
  16. During generation of LLVM IR, keep a stack of non-uniform conditional results (masks are <16 x i1>). When a mask is pushed onto the stack, it is first combined with the mask already on the top of the stack. Handle the “else” of a conditional by tracking a “negated” flag in the stack rather than negating the mask: when blending, we can just reverse the order of the blend operands instead of issuing extra instructions to negate the mask. This gets much more complicated for loop control, early return, break, continue, and exit operations.
  17. We added support to hook up the profiling from the OSL runtime to LLVM with Intel JIT profiling enabled, so we could actually see the dynamically generated code from inside VTune. This requires a special build of LLVM (-DLLVM_USE_INTEL_JITEVENTS=1). We modified OSL to enable debug info in the LLVM JIT module and to emit LLVM debug info and locations as OSL operations are generated. Now we can profile and map OSL to assembly with VTune, and we can run GDB with a full line table and call stack through inlined OSL function calls, viewing the OSL shader and assembly.
  18. Non batched execution of marble.osl. Just wanted to highlight that fully SIMD version of the LLVM IR alone would not help too much. We need everything possible in SIMD including the built-in library calls
  19. The scalar version of this Perlin noise computation had actually been optimized to perform block vectorization within the algorithm using SSE2 intrinsics. To perform outer loop vectorization we needed to remove these intrinsics and revert to the original C++ version of the algorithm. To avoid hurting the original version’s performance, we made a new helper function, “perlin_scalar”. The wide version is very similar, except its data types are our WideAccessors, which use an array subscript to import/export the data type out of the underlying SOA data layout. We explicitly declare our outer loop to be SIMD using an OpenMP 4 #pragma and specify the width. This tells the compiler that “I the programmer have declared that each iteration of this loop can operate in parallel and unordered”. Now the compiler can emit SIMD code and know it is legal, because “we said it was”, no better logic than that! Inside the loop we export the data for the current lane, perform the scalar computation, then import the results for the lane. Also note that the actual scalar computation is not aware of our data layout or our outer loop. Once it’s all inlined, the compiler can produce strikingly good code for multiple target ISAs (SSE2, AVX, AVX2, AVX512, etc.). Inspect the optimization reports from the compiler to check on success and quality of code generation. If the compiler ran into issues when vectorizing, it will tell you there (example: could vectorize but would be inefficient). To ensure proper inlining on the Intel® C++ Compiler, we can use “#pragma forceinline recursive”.
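The outer-loop pattern described above can be sketched as follows. This is a simplified stand-in rather than the OSL source: `WideVec3`, `WideFloat`, `scalar_kernel`, and `wide_kernel` are hypothetical names, the kernel is a trivial expression instead of Perlin noise, and plain `get` calls replace the subscript-based WideAccessors.

```cpp
#include <cassert>

constexpr int kWidth = 16;  // SIMD batch width (assumed)

struct Vec3 { float x, y, z; };

// SOA storage: one array per component, kWidth lanes each.
struct WideVec3 {
    float x[kWidth], y[kWidth], z[kWidth];
    // Export one lane out of the SOA layout as an ordinary Vec3.
    Vec3 get(int lane) const { return Vec3{x[lane], y[lane], z[lane]}; }
};

struct WideFloat { float v[kWidth]; };

// Stand-in for the scalar algorithm (NOT real Perlin noise). It knows
// nothing about the data layout or the outer loop.
inline float scalar_kernel(const Vec3& p) {
    return p.x * p.y + p.z;
}

void wide_kernel(const WideVec3& p, WideFloat& result) {
    // The programmer asserts that iterations are independent, so the
    // compiler may vectorize across lanes. Without OpenMP SIMD support
    // enabled the pragma is ignored and the loop simply runs scalar.
    #pragma omp simd simdlen(16)
    for (int lane = 0; lane < kWidth; ++lane) {
        const Vec3 in = p.get(lane);           // export the lane's data
        const float out = scalar_kernel(in);   // plain scalar computation
        result.v[lane] = out;                  // import the lane's result
    }
}
```

With GCC or Clang the pragma takes effect when building with `-fopenmp-simd`; whether the loop actually vectorizes well is what the compiler's optimization report should confirm.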
  20. Note the other “wide” functions may also be vectorized, and they enjoy the reduced overhead of needing up to 16 times fewer calls than the non-batched interface makes.
  21. Intent is to show high performance potential but exemplify how that scales down with batch utilization. As batch utilization is partially in the renderer’s hands, renderers should work to improve their batches to reach top performance.
  22. The actual shader could be taking more or fewer paths because we skip code blocks when lanes are completely masked off. As we increase active SIMD lanes, the chance of skipping a branch of code goes down. So skipping branches is more effective with small batch sizes, and we are likely executing more code blocks of a shader as the active number of SIMD lanes increases. This can cause a shader’s performance to scale non-linearly with the number of active SIMD lanes.
  23. Sometimes a shader could be slower at low batch utilization, but there is usually a point at which it becomes more profitable. We might be able to better optimize the OSL library APIs in use to take a different code path when batch utilization is low, which could improve performance at low batch utilization.
  24. Concrete uses expensive Gabor noise and enjoys a hyper speedup from the improved implementation.
  25. Concrete uses expensive Gabor noise and enjoys a hyper speedup from the improved implementation. 100% batch utilization of 16 points.
  26. When I run to convergence, the frame takes 2:20" with scalar OSL and 1:02" with AVX-512
  27. Discuss batch utilization from the Renderer’s usage vs. batch utilization within shaders due to control flow divergence.
  28. Discuss the ray bounces hitting disparate shading networks with possibly smaller and smaller numbers of rays. We can see that as the number of bounces is decreased, the batch utilization increases and so does the shading system’s speedup.
  29. Discuss batch utilization from the Renderer’s usage vs. batch utilization within shaders due to control flow divergence.
  30. Quality of code generation for IA; reducing JIT compile time (pain point).
  31. If your renderer doesn’t use OSL, you might want to consider adding support. Not all renderers are capable of generating batches of material requests, so your renderer might need rework to operate on batches. OSL with scalable SIMD execution can be your shading system, providing good ROI for updating the rest of your renderer.
  32. We historically track the place in the logical mask stack at which each assignment happens for each operation. When a symbol is read, we can compare the current logical mask and determine if it is a subset of the mask of any operations that wrote to the symbol. If it is not a subset, then that assignment operation will need to be masked. For assignment operations that have been earlier identified as “requires masking”, we just use the mask on the top of the stack to select the correct value. “select” is the LLVM IR we use to blend values together based on a mask. IMPORTANT NOTE: We don’t need to execute a masked version of smoothstep or noise; we just need to mask the assignment of their results.
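The read-versus-write comparison can be sketched by representing a position in the mask stack as the list of enclosing varying conditionals. This is a hypothetical simplification, not OSL's bookkeeping: `MaskPath`, `SymbolTracker`, and the integer conditional ids are invented for the example, and the “then” and “else” sides of a conditional are assumed to carry different ids.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Ids of the enclosing varying conditionals, outermost first (hypothetical).
using MaskPath = std::vector<int>;

// The lanes active at a read are a subset of the lanes active at a write
// when the write's context is a prefix of the read's context.
bool is_subset(const MaskPath& read_ctx, const MaskPath& write_ctx) {
    if (write_ctx.size() > read_ctx.size()) return false;
    return std::equal(write_ctx.begin(), write_ctx.end(), read_ctx.begin());
}

struct Write {
    int op_id;
    MaskPath ctx;
    bool needs_mask;
};

class SymbolTracker {
public:
    void record_write(const std::string& sym, int op_id, const MaskPath& ctx) {
        writes_[sym].push_back(Write{op_id, ctx, false});
    }

    // A read from a context that is not a subset of a write's context means
    // lanes inactive during that write are observable, so the write must blend.
    void record_read(const std::string& sym, const MaskPath& ctx) {
        for (Write& w : writes_[sym])
            if (!is_subset(ctx, w.ctx)) w.needs_mask = true;
    }

    bool needs_mask(const std::string& sym, int op_id) const {
        auto it = writes_.find(sym);
        if (it == writes_.end()) return false;
        for (const Write& w : it->second)
            if (w.op_id == op_id) return w.needs_mask;
        return false;
    }

private:
    std::map<std::string, std::vector<Write>> writes_;
};
```

A symbol written and read only inside the same conditional never needs a masked store in this model; the store is flagged only once a read outside that conditional is seen.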