Antoine Cohade & Emil Persson
16/03/2016
More Explosions, More Chaos,
and Definitely More Blowing Stuff Up :
Optimizations and New DirectX Features in
Intel Software – Developer Relations Division Copyright © 2016, Intel Corporation. All rights reserved. * Other names and brands may be
claimed as the property of others.
Legal
Copyright © 2016 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,
BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY
WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS
OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL
APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES,
AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL
APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names
in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific
computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products. For more information go to
http://www.Intel.com/performance
Iris™ graphics is available on select systems. Consult your system manufacturer.
Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries.
Agenda
- Introduction
- Tools
- Optimizations
- DirectX features
- Future Work
- Conclusion
Intel® HD Graphics: Market Share
- 72.8% of the total GPU market (Jon Peddie Research, Q3 2015)
- 18.49% on Steam (Steam HW Survey, Jan 2016)
- 23.9% on Unity (Unity HW Stats, Q4 2015)
Millions of gamers with Intel® HD Graphics equipped PCs
Introduction: Avalanche Studios
Avalanche Studios
- Founded in 2003
- Offices in Stockholm and New York
Games
- Just Cause 1, 2 and 3
- Mad Max
- theHunter, theHunter: Primal
- Renegade Ops
- Rumble City
Introduction: Just Cause 3
• Open world action-adventure game
• Developed by Avalanche Studios
• Published by Square Enix
• Released Dec 1, 2015
• Huge open world
• 1000 km² or 400 square miles
• Advanced graphics technology
About this project
Small targeted development effort
- Collaboration with Intel®
- Focused on Intel® GPU performance optimizations
- DirectX features pioneered by Intel®
- Additional resources (from R&D, Engine etc.)
- Separate from JC3 mainline development
GPA: Just Cause 3 Analysis
- Run with Intel® GPA
- Live analysis in the HUD / System Analyzer: CPU limited or GPU limited?
- GPU limited: capture a frame, then offline analysis in Frame Analyzer
- CPU limited: capture a trace, then offline analysis in Platform Analyzer
GPA : Just Cause 3 – Platform Analyzer view
Frames
GPU queue
Other metrics
CPU Threads
GPA: Just Cause 3 – Frame Analyzer view
Custom view chart
Render target overview
Render target preview
RT & draw call (erg) selection & timings
Detailed metrics
Performance Optimizations: Low-Level ALU
Deferred Lighting Shader
- 6-8ms on Iris™ Pro 5200
- Very long shader, lots of math
- Lots of history
Low-Level ALU optimizations
- Tweaking the math to generate fewer instructions
- Low-Level Thinking in High-Level Shading Languages [Persson13]
- Low-Level Shader Optimization for Next-Gen and DX11 [Persson14]
- No changes to output
Performance Optimizations: Low-Level ALU
Remove a division, use MAD-form:
Before:
float k = 2.0f / sqrt(PI * (spec_power + 2.0f));        // add + mul + sqrt + rcp + mul
After:
float k = rsqrt((0.25f*PI) * spec_power + (0.5f*PI));   // mad + rsqrt
Separate scalars and vectors:
Before:
return spec_color * spec_intensity * spec_mask;         // 6×mul (3+3)
After:
return spec_color * (spec_intensity * spec_mask);       // 4×mul (1+3)
Precompute:
Before:
float3 Color = PointLights[index + 1].rgb;
float HDRScale = PointLights[index + 1].w;
Color *= HDRScale;                                      // 3×mul
After:
float3 Color = PointLights[index + 1].rgb;              // -
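The first rewrite rests on the identity 2/sqrt(PI·(s+2)) = 1/sqrt(0.25·PI·s + 0.5·PI). A minimal numerical check, sketched in Python rather than HLSL so it can run standalone; the function names are illustrative and not taken from the shader:

```python
import math

def k_original(spec_power):
    # Original form: add + mul + sqrt + rcp + mul in the shader.
    return 2.0 / math.sqrt(math.pi * (spec_power + 2.0))

def k_mad_form(spec_power):
    # (0.25*PI) and (0.5*PI) fold to constants, leaving one mad and one rsqrt.
    return 1.0 / math.sqrt((0.25 * math.pi) * spec_power + (0.5 * math.pi))

def max_relative_difference(powers):
    # Largest relative gap between the two forms over the given specular powers.
    return max(abs(k_original(p) - k_mad_form(p)) / k_original(p) for p in powers)
```

Any difference between the two forms should be at the level of floating-point rounding, which matches the deck's "no changes to output" goal.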
Performance Optimizations: Low-Level ALU
Optimizing inputs
- 4x float4 for spotlights, partly packed
  - LightDir stored in 2 floats + sign
  - HDRScale and NearCap, 16 bits each in a float
- Unpacking the packed
  - Falloff scale and bias was 2 floats
    - Compute falloff bias from scale (saved one float, added one ALU op)
  - LightDir a full float3 (saves ~10 cycles of unpacking)
  - HDRScale gone, NearCap gets entire float (saved 6 ALU ops of unpacking)
- Rearranged in access order
  - Fewer fetches if branch not taken
Performance Optimizations: Low-Level ALU
Reuse intermediate results:
Before:
dist = dot(l_vec, l_vec) * InvRadSqr;   // mul + 2×mad + mul
if (dist < 1.0f)
{
    l_vec = normalize(l_vec);           // rsqrt + 3×mul
    dist *= rsqrt(dist);                // rsqrt + mul
    ...
After:
dist = dot(l_vec, l_vec);               // mul + 2×mad
if (dist < RadiusSqr)
{
    float rd = rsqrt(dist);
    l_vec *= rd; // normalize()         // rsqrt + 3×mul (with the line above)
    dist *= rd;                         // mul
    ...
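A small Python sketch of the two variants, written only to check what the rewrite preserves. One observation, read off the slide code rather than stated by the authors: the range test and the normalized vector are the same in both, but the rewritten `dist` is the plain distance, while the original was the distance divided by the radius. The elided code presumably accounts for that factor. The function names and the `radius` parameter are illustrative.

```python
import math

def before(l_vec, radius):
    inv_rad_sqr = 1.0 / (radius * radius)
    dist = sum(c * c for c in l_vec) * inv_rad_sqr
    if not dist < 1.0:
        return None                          # light does not reach this point
    length = math.sqrt(sum(c * c for c in l_vec))
    n = tuple(c / length for c in l_vec)     # normalize(): its own rsqrt
    dist *= 1.0 / math.sqrt(dist)            # second rsqrt; dist = |l_vec| / radius
    return n, dist

def after(l_vec, radius):
    radius_sqr = radius * radius
    dist = sum(c * c for c in l_vec)
    if not dist < radius_sqr:
        return None
    rd = 1.0 / math.sqrt(dist)               # single rsqrt, reused twice
    n = tuple(c * rd for c in l_vec)
    dist *= rd                               # dist = |l_vec|
    return n, dist
```

A zero-length `l_vec` divides by zero in both variants here; the sketch does not try to model how the GPU handles that case.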
Performance Optimizations: Low-Level ALU
Operation modifiers:
Before (3×mul + 6×mad + 3×mul):
float3 tex_proj;
tex_proj = mat[0].xyz * light_vec.x;
tex_proj += mat[1].xyz * light_vec.y;
tex_proj += mat[2].xyz * light_vec.z;
tex_proj *= float3(-1, -1, -1);
After (3×mul + 6×mad):
float3 tex_proj;
tex_proj = mat[0].xyz * -light_vec.x;
tex_proj += mat[1].xyz * -light_vec.y;
tex_proj += mat[2].xyz * -light_vec.z;
Performance Optimizations: Low-Level ALU
Loop counters:
Before (2×iadd / loop):
// Point lights
for (uint pl = 0; pl < pl_count; pl++) {
    uint index = LightIndices[light_index++];
    ...
}
// Spot lights
for (uint sl = 0; sl < sl_count; sl++) {
    uint index = LightIndices[light_index++];
    ...
}
After (1×iadd / loop + 2×iadd):
// Point lights
uint end = light_index + pl_count;
for (; light_index < end; ++light_index) {
    uint index = LightIndices[light_index];
    ...
}
// Spot lights
end += sl_count;
for (; light_index < end; ++light_index) {
    uint index = LightIndices[light_index];
    ...
}
Performance Optimizations: Low-Level ALU
Share computations:
Before:
if (shadow > 0) {
    float2 rot = ExpensivePseudoRandom();
    shadow *= SampleShadow(..., rot);
    ...
}
float2 rot = ExpensivePseudoRandom();
for (spotlights) {
    if (spot > 0) {
        if (shadow_caster) {
            shadow = SampleShadow(..., rot);
            ...
        }
    }
}
After:
float2 rot = ExpensivePseudoRandom();
if (shadow > 0) {
    shadow *= SampleShadow(..., rot);
    ...
}
for (spotlights) {
    if (spot > 0) {
        if (shadow_caster) {
            shadow = SampleShadow(..., rot);
            ...
        }
    }
}
Performance Optimizations: Low-Level ALU
Pull computations out of the loop:
Before, (2×mul + 4×mad)×16 (96 ops):
for (int i = 0; i < 16; i++) {
    float2 offset;
    offset.x = kernel[i].x * rot.x +
               kernel[i].y * rot.y;
    offset.y = kernel[i].y * rot.x -
               kernel[i].x * rot.y;
    float2 tap = coord.xy + offset * scale;
    ...
}
After, (4×mad)×16 + 2×mul (66 ops):
rot *= scale;
for (int i = 0; i < 16; i++) {
    float2 tap;
    tap.x = coord.x + kernel[i].x * rot.x
                    + kernel[i].y * rot.y;
    tap.y = coord.y + kernel[i].y * rot.x
                    - kernel[i].x * rot.y;
    ...
}
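The hoisting works because the offset is linear in `rot`, so scaling `rot` once outside the loop is the same as scaling every offset inside it. A minimal Python check of that claim and of the op counts, assuming `scale` is a scalar (with a per-component `float2` scale the two forms would only agree when both components are equal); the function names are illustrative:

```python
TAPS = 16
OPS_BEFORE = (2 + 4) * TAPS   # 2 mul + 4 mad per tap
OPS_AFTER = 4 * TAPS + 2      # 4 mad per tap, plus 2 mul for rot *= scale

def taps_before(coord, kernel, rot, scale):
    taps = []
    for kx, ky in kernel:
        off_x = kx * rot[0] + ky * rot[1]
        off_y = ky * rot[0] - kx * rot[1]
        taps.append((coord[0] + off_x * scale, coord[1] + off_y * scale))
    return taps

def taps_after(coord, kernel, rot, scale):
    rx, ry = rot[0] * scale, rot[1] * scale   # hoisted out of the loop
    taps = []
    for kx, ky in kernel:
        taps.append((coord[0] + kx * rx + ky * ry,
                     coord[1] + ky * rx - kx * ry))
    return taps
```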
Performance Optimizations: Low-Level ALU
Game had configurable attenuation curve
- Not really used. Only one curve existed.
- Using ALU saved 0.2ms on Iris™ Pro 5200.
- Small script to brute-force match a set of functions and parameters
- Picked the best match
- ~1% error
ALU instead of lookup table:
Before:
atten = Falloff.SampleLevel(Samp, dist, 0.0f);                      // sample_l
After:
atten = saturate((1.0f - dist) / (dist * dist * 12.21f + 1.0f));    // 2×mul + mad + add + rcp
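The fitting script itself is not shown in the slides. The following is only a sketch of what such a brute-force search could look like, restricted to one function family and one parameter `k`. The reference curve here is a stand-in generated from the final formula, because the game's falloff texture is not available, so the run only demonstrates that the search recovers a known parameter.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

def candidate(dist, k):
    # Same shape as the ALU replacement on the slide, with k as the free parameter.
    return saturate((1.0 - dist) / (dist * dist * k + 1.0))

def fit_k(samples, k_values):
    """samples: list of (dist, attenuation). Returns (best_k, max_abs_error)."""
    best = None
    for k in k_values:
        err = max(abs(candidate(d, k) - a) for d, a in samples)
        if best is None or err < best[1]:
            best = (k, err)
    return best

# Stand-in reference curve (assumption): 64 texels sampled from the final formula.
reference = [(i / 63.0, candidate(i / 63.0, 12.21)) for i in range(64)]
k_grid = [i * 0.01 for i in range(1, 3001)]   # hypothetical search range 0.01 .. 30.00
best_k, best_err = fit_k(reference, k_grid)
```

With real texture data the minimum error would not be zero; the deck reports about 1% for its best match.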
Performance Optimizations: Low-Level ALU
Deferred Lighting Shader
- 4-5.5ms on Iris™ Pro 5200
- About 2ms saved (depending on scene)
- Potential for saving more
Performance Optimizations: GPU gaps
Learning: regularly check CPU/GPU concurrency to avoid surprises
Performance Optimizations: Instancing
Solution
- Instancing support added to common materials
- Drastic reduction in number of draw calls
- Reduced constant buffer updates
- Removed lots of unused constants
- Removed debug constants, tweak variables etc.
Performance Optimizations: Manual Instancing
Tree Impostors
- Many instances, tiny mesh. (4 vertices, 6 indices)
- Standard instancing implementation
- DrawIndexedInstanced(6, num_instances, 0, 0, 0);
- Poor wavefront occupancy
Manual Instancing optimization
- Draw as regular indexed mesh
- DrawIndexed(6 * num_instances, 0, 0);
- Immutable index buffer of MAX_INSTANCES * 6
- Manually fetch data from texture buffer
Performance Optimizations: Manual Instancing
Before:
v2p main(a2v In, uint VertexID: SV_VertexID) {
    ...
}
After:
struct InputData {
    float Elevation;
    float2 Data;
};
StructuredBuffer<InputData> Insts;
v2p main(uint VertexID: SV_VertexID) {
    uint InstanceID = VertexID >> 2;    // 4 vertices per impostor quad
    VertexID = VertexID & 0x3;          // corner within the quad
    // Manually fetch vertex data
    a2v In;
    In.Elevation = Insts[InstanceID].Elevation;
    uint2 prt = asuint(Insts[InstanceID].Data);
    In.Data.x = int(prt.x & 0xFFFF) * scale;   // four 16-bit values packed in two 32-bit words
    In.Data.y = int(prt.x >> 16) * scale;
    In.Data.z = int(prt.y & 0xFFFF) * scale;
    In.Data.w = int(prt.y >> 16) * scale;
    ...
}
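The shader can recover the instance with a shift and a mask because the immutable index buffer numbers the vertices of instance i as 4·i to 4·i+3. A Python sketch of how such a buffer could be filled on the CPU side and how the IDs decode; the corner order `(0, 1, 2, 2, 1, 3)` and the value of `MAX_INSTANCES` are assumptions, since the slides do not give them:

```python
MAX_INSTANCES = 1024  # hypothetical cap; the real value is not in the slides

def build_quad_index_buffer(max_instances):
    # Six indices per quad: two triangles over four corners (winding assumed).
    pattern = (0, 1, 2, 2, 1, 3)
    indices = []
    for inst in range(max_instances):
        base = inst * 4
        indices.extend(base + corner for corner in pattern)
    return indices

def decode_vertex_id(vertex_id):
    # Mirrors the shader: InstanceID = VertexID >> 2, corner = VertexID & 0x3.
    return vertex_id >> 2, vertex_id & 0x3

def unpack_16bit_pair(word):
    # Mirrors the shader's unpacking of two 16-bit values from one 32-bit word.
    return word & 0xFFFF, word >> 16
```

The buffer is built once for the maximum instance count, so drawing `6 * num_instances` indices only reads a prefix of it.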
Performance Optimizations: Manual Instancing
Manual Instancing
- 2.4ms before, 0.7ms after, on Iris™ Pro 5200
Performance Optimizations: Vegetation stalls
Performance Optimizations: Stencil stalls
Learning: If vegetation rendering seems abnormally long, try disabling stencil writes.
If the rendering speeds up significantly, you are impacted.
Performance Optimizations: Forest Layer
Forest Layer
- Lowest LOD tree representation
- Provides forest silhouette in distance
- Alpha texture filling in detail
Dense grid mesh
- 129x129 per patch
- 5ms in some scenes
- Stencil writes enabled, but disabling didn’t help
Performance Optimizations: Forest Layer
Optimization
- Mostly Vertex Bound
- Mesh optimizations
- Added 65x65 and 33x33 LODs
- Large reduction in total vertices shaded
- Small visual difference. High settings mostly use highest LODs.
- Packed vertex format (2 floats → 2 shorts)
- 16bit index buffer
Performance Optimizations: Forest Layer
Optimization
- Shader optimizations
- Added a simpler “no-fade” vertex shader, used by most patches
- Pre-computations
- Prebaked scaling into the world matrix
- Folded constants
- Handful of low-level optimizations
- Simplified math
Performance Optimizations: Forest Layer
Results
- Good performance gain
- Down from 5.0ms to 2.5ms, Iris™ Pro 5200
- Revisited disabling stencil writes
- Down to 0.5ms (!!)
- Revisited triangle strips
- Down to 0.4ms
- More than an order of magnitude faster in the end!
Performance Optimizations: Low/normal settings
Optimize lower graphical settings
- Shadow size culling
- Made dependent on shadow buffer size
- Disabled cloud shadows for low shadow settings
- Velocity buffer rendering
- Disabled when both motion blur and temporal AA are disabled
- Disabled planar reflection pass when screen-space reflection is enabled
Performance Optimizations: Buffer Clears
G-Buffers
- Cleared to (0.5f, 0.5f, 1.0f, 1.0f) for historical reasons
- Now clears to (0, 0, 0, 0), or skipped entirely
- Still clearing for SLI / CrossFire
- Screen space reflections
- Only clear when enabled and needed
Performance Optimizations: Shadows
Shadow Cascades
- 4 sun shadow cascades
- Scattered update pattern
- 2 cascades / frame, cycled over 8 frames
- Saves many milliseconds
- Problematic for camera flipping (shadows pop in over a few frames)
- Center outer cascade on camera
- Keeps shadow behind player (in theory)
- Have to disable frustum culling for outer cascade
- Many milliseconds lost
- Lost shadow range
Performance Optimizations: Shadows
3,747.0
Performance Optimizations: Shadows
Solution
- Revert to previous outer cascade
- Restores lost milliseconds
- Restores lost shadow range
- Reset refresh cycle on camera flip
- Outer cascade always gets updated first frame after flip
- Added resolution dependent size culling
Performance Optimizations: Shadows
718.0
Performance Optimizations: float3x4
Convert float4[] to float3x4[]:
Before:
float4 MatrixPalette[2 /*(SIZE*3)*/];
index *= 3;
float3x4 mat = float3x4(
    MatrixPalette[index    ],
    MatrixPalette[index + 1],
    MatrixPalette[index + 2]);
float3 s_pos = mul(mat, pos);
...
Generated code:
imad r2.xy, v1.xx, l(3, 3), l(1, 2)
dp4 r0.x, cb0[r0.x + 9], r1
dp4 r0.y, cb0[r2.x + 9], r1
dp4 r0.z, cb0[r2.y + 9], r1
...
After:
float3x4 MatrixPalette[2 /*SIZE*/];
float3x4 mat = MatrixPalette[index];
float3 s_pos = mul(mat, pos);
...
Generated code:
imul null, r0.x, v1.x, l(3)
dp4 r2.x, cb0[r0.x + 9], r1
dp4 r2.y, cb0[r0.x + 10], r1
dp4 r2.z, cb0[r0.x + 11], r1
...
Performance Optimizations: Terrain optimization
Solution
- Terrain system continuously developed
- New system was in the build, but disabled by default
- Saved around 1-2 milliseconds depending on scene
- Unstable on some drivers
- Detect old drivers and fall back to previous system
Performance Optimizations: Misc
Misc optimizations
- CPU vs. GPU performance very different on Intel vs. the consoles
- Moved some work back to CPU
- Shorter shader, more computations for CPU
- Better culling
- When all waterboxes are culled, we could save a render pass
Performance Optimizations: Final Results
Performance benefits: Rendering time* (ms)
          Car scene   City scene   Sky scene
Before    51          59           59
After     27          32           28
Delta     24          27           31
Performance benefits: Real performance* (ms) – impact of power
                               Car scene   City scene   Sky scene
Average frame time (static)    27          32           28
Average frame time (dynamic)   30          35           30
*Measured on a 5th gen Core™ i7 with Iris™ Pro Graphics 6200 @ 1366x768, Medium settings
Conservative Rasterization
Light assignment using CR [Örtegren16]
- Shell pass
- Lights as low-res meshes
- CR to touch all affected clusters
- Allows arbitrary convex light shapes
- “Perfect clustering”
- MIN blending resolves depth range
- Fill pass
- Writes results to cluster light lists
Conservative Rasterization
Light list
- Max 256 lights per type
- JC3: Tightly packed list of light indexes [ 2, 7, 12, 38 … ]
- [Örtegren16]: Linked list (2, next)→ (7, next)→ …
- New approach: Bitfield 001000010000100000 …
- Performance
- Bitfield: Faster under heavy load (0.5ms), slower under light load (-0.2ms)
- Light assignment (LA): 0.1 - 0.3ms cost. Shading: 0 - 3ms saved. (6th gen Core w/ HD Graphics 520)
- Shell pass independent of depth slice count
- Can scale to higher slice count
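To make the bitfield representation concrete, here is a small Python sketch that converts between the packed index list and a 256-bit field. The layout (eight 32-bit words, bit i of the field meaning light i, lowest bit first) is an assumption for illustration; the slides only show the bit pattern.

```python
MAX_LIGHTS = 256   # "Max 256 lights per type"
WORD_BITS = 32     # assumed word size

def lights_to_bitfield(light_indices):
    words = [0] * (MAX_LIGHTS // WORD_BITS)
    for i in light_indices:
        words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words

def bitfield_to_lights(words):
    # Walk the set bits from lowest to highest, as a shader loop could do
    # with a find-first-set instruction.
    lights = []
    for w, word in enumerate(words):
        while word:
            low = word & -word                      # isolate the lowest set bit
            lights.append(w * WORD_BITS + low.bit_length() - 1)
            word ^= low                             # clear it
    return lights
```

With this layout every cluster costs a fixed 32 bytes per light type, whatever the number of lights, which is one plausible reason for the heavy-load versus light-load trade-off reported above.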
UAV & Raster Ordered Views 101
• The DX API specifies “in order” processing rules
• UAVs enable arbitrary R/W memory ops from a pixel shader…
… but with no ordering of data input…
- Timeline without ROV (e.g. programmable blending): the r/m/w of the fragment from the 1st triangle and the r/m/w of the fragment from the 2nd triangle can overlap. This is a data race, and the order is not deterministic.
• ROV is a DX12 feature which guarantees primitive order for R/M/W operations, in order to:
• Avoid data races
• Ensure deterministic ordering
- Timeline with ROV: the fragment from the 2nd triangle waits until the r/m/w of the fragment from the 1st triangle has finished. The data is safe!
Order-Independent Transparency
• Correct compositing, rendering foliage & fences with zero aliasing!
• Raster Ordered Views enable a new approach
- Single geometry pass and fixed memory requirements
- Stable and predictable performance
- Scalable: easily trade off image quality for performance/memory
[Figure labels: "Correct render order", "Different correct render order"]
Order-Independent Transparency
• Store the visibility function as a sorted, fixed-size array of nodes in a UAV surface
• Sort N layers, blend the furthest fragments
• Use more layers to trade off image quality for perf/memory
Sample code: https://software.intel.com/en-us/articles/oit-approximation-with-pixel-synchronization-update-2014
[Figure labels: "New fragment insertion", "Blending of furthest fragments"]
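The insert-then-merge idea can be illustrated with a deliberately simplified, single-pixel Python sketch. It is not the Intel sample: colors are scalar and premultiplied, and the overflow rule (composite the two furthest nodes with the "over" operator) is an assumption chosen to match the wording "blend furthest fragments". On the GPU, the per-pixel read-modify-write in `insert_fragment` is what the ROV ordering guarantee protects.

```python
def insert_fragment(nodes, frag, max_nodes):
    """nodes: list of (depth, premultiplied_color, alpha), kept sorted near to far."""
    pos = 0
    while pos < len(nodes) and nodes[pos][0] <= frag[0]:
        pos += 1
    nodes.insert(pos, frag)
    if len(nodes) > max_nodes:
        # Overflow: blend the two furthest fragments into a single node.
        far = nodes.pop()
        near = nodes.pop()
        nodes.append((near[0],
                      near[1] + (1.0 - near[2]) * far[1],
                      near[2] + (1.0 - near[2]) * far[2]))

def resolve(nodes, background):
    # Front-to-back compositing of the stored nodes over the background.
    color, transmittance = 0.0, 1.0
    for _, c, a in nodes:
        color += transmittance * c
        transmittance *= 1.0 - a
    return color + transmittance * background

def composite(fragments, max_nodes, background):
    nodes = []
    for frag in fragments:
        insert_fragment(nodes, frag, max_nodes)
    return resolve(nodes, background)
```

When the node count is at least the number of fragments, the result does not depend on submission order. With fewer nodes it becomes an approximation whenever a later fragment falls between two that were already merged.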
Future Work: Frame time variance
[Chart: JC3 frame time variation over 2 minutes of gameplay. X axis: frame number (1 to 3979). Y axis: frame time (ms), 0 to 50. Legend: "Frame time - 10 frames moving average", "PW = 33.3ms"]
Future Work: Dynamic resolution rendering
• Idea: for the most intense scenes, lower the rendering resolution
• Based on an Intel sample:
https://software.intel.com/en-us/articles/dynamic-resolution-rendering-sample
if (frametime > max_allowed_frametime && render_target_size != min_RT_size)
render_target_size--;
if (frametime < min_allowed_frametime && render_target_size != max_RT_size)
render_target_size++;
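The pseudocode above can be exercised as a small controller. In this Python sketch the render target size is treated as an index into a list of discrete resolution steps, and the thresholds in the usage note are hypothetical values, not taken from the game:

```python
def update_render_target_size(frametime, size, min_size, max_size,
                              max_allowed_frametime, min_allowed_frametime):
    # Too slow: step the resolution down, unless already at the minimum.
    if frametime > max_allowed_frametime and size != min_size:
        size -= 1
    # Comfortably fast: step the resolution up, unless already at the maximum.
    if frametime < min_allowed_frametime and size != max_size:
        size += 1
    return size
```

For example, with thresholds of 33.3 ms and 28 ms, a 40 ms frame lowers the size by one step, a 20 ms frame raises it, and a 30 ms frame leaves it alone. Keeping a gap between the two thresholds gives the controller a dead band, so it does not flip the resolution every frame.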
Future Work: G-buffer blending
• To apply tire skid marks, bullet holes or explosions!
• Same principle as AOIT
– Render your G-Buffer
– Take a normal map of a decal
– Blend it with the G-Buffer
– Result will be a correctly mapped bullet hole
• Prototyped in JC3
• Requires alpha blendable decals
Conclusion
• Even the most demanding titles, such as JC3, can run on Iris graphics
• Feature-wise, integrated graphics are now on par with discrete
• Focused optimizations can bring terrific improvements ...
• … you have tools to help you …
• … and it is definitely worth it
References
[Persson13] Low-Level Thinking in High-Level Shading Languages, GDC 2013
presentation. http://humus.name/index.php?page=Articles&ID=6
[Persson14] Low-Level Shader Optimization for Next-Gen and DX11, GDC 2014
presentation. http://humus.name/index.php?page=Articles&ID=9
[Örtegren16] Clustered Shading: Assigning Lights Using Conservative
Rasterization in DirectX 12. GPU Pro 7.
Progressive Lightmapper: An Introduction to Lightmapping in UnityUnity Technologies
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 

Mais procurados (20)

Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
Frostbite on Mobile
Frostbite on MobileFrostbite on Mobile
Frostbite on Mobile
 
Light prepass
Light prepassLight prepass
Light prepass
 
Rendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb RaiderRendering Techniques in Rise of the Tomb Raider
Rendering Techniques in Rise of the Tomb Raider
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Z Buffer Optimizations
Z Buffer OptimizationsZ Buffer Optimizations
Z Buffer Optimizations
 
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
Terrain Rendering in Frostbite using Procedural Shader Splatting (Siggraph 2007)
 
Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)Crysis Next-Gen Effects (GDC 2008)
Crysis Next-Gen Effects (GDC 2008)
 
Rendering Tech of Space Marine
Rendering Tech of Space MarineRendering Tech of Space Marine
Rendering Tech of Space Marine
 
Efficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® GraphicsEfficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® Graphics
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
Progressive Lightmapper: An Introduction to Lightmapping in Unity
Progressive Lightmapper: An Introduction to Lightmapping in UnityProgressive Lightmapper: An Introduction to Lightmapping in Unity
Progressive Lightmapper: An Introduction to Lightmapping in Unity
 
Stochastic Screen-Space Reflections
Stochastic Screen-Space ReflectionsStochastic Screen-Space Reflections
Stochastic Screen-Space Reflections
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3
 

Destaque

In The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIn The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIntel® Software
 
Make your unity game faster, faster
Make your unity game faster, fasterMake your unity game faster, faster
Make your unity game faster, fasterIntel® Software
 
Bringing the Real World Into the Game World
Bringing the Real World Into the Game WorldBringing the Real World Into the Game World
Bringing the Real World Into the Game WorldIntel® Software
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAIntel® Software
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAIntel® Software
 
High-Dynamic Range (HDR) Demystified
High-Dynamic Range (HDR) DemystifiedHigh-Dynamic Range (HDR) Demystified
High-Dynamic Range (HDR) DemystifiedIntel® Software
 
Optimization Deep Dive: Unreal Engine 4 on Intel
Optimization Deep Dive: Unreal Engine 4 on IntelOptimization Deep Dive: Unreal Engine 4 on Intel
Optimization Deep Dive: Unreal Engine 4 on IntelIntel® Software
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteElectronic Arts / DICE
 

Destaque (8)

In The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIn The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for Intel
 
Make your unity game faster, faster
Make your unity game faster, fasterMake your unity game faster, faster
Make your unity game faster, faster
 
Bringing the Real World Into the Game World
Bringing the Real World Into the Game WorldBringing the Real World Into the Game World
Bringing the Real World Into the Game World
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPA
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPA
 
High-Dynamic Range (HDR) Demystified
High-Dynamic Range (HDR) DemystifiedHigh-Dynamic Range (HDR) Demystified
High-Dynamic Range (HDR) Demystified
 
Optimization Deep Dive: Unreal Engine 4 on Intel
Optimization Deep Dive: Unreal Engine 4 on IntelOptimization Deep Dive: Unreal Engine 4 on Intel
Optimization Deep Dive: Unreal Engine 4 on Intel
 
High Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in FrostbiteHigh Dynamic Range color grading and display in Frostbite
High Dynamic Range color grading and display in Frostbite
 

Semelhante a More explosions, more chaos, and definitely more blowing stuff up

Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRaySoftware-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRayIntel® Software
 
How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC Gael Hofemeier
 
Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018AWS User Group Bengaluru
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Software Brasil
 
Embree Ray Tracing Kernels
Embree Ray Tracing KernelsEmbree Ray Tracing Kernels
Embree Ray Tracing KernelsIntel® Software
 
QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and futureboxu42
 
How Funcom Increased Play Time in Lego Minifigures by 40%
How Funcom Increased Play Time in Lego Minifigures by 40%How Funcom Increased Play Time in Lego Minifigures by 40%
How Funcom Increased Play Time in Lego Minifigures by 40%Gael Hofemeier
 
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...tdc-globalcode
 
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...Igor José F. Freitas
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?Michelle Holley
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel Software Brasil
 
Introduction ciot workshop premeetup
Introduction ciot workshop premeetupIntroduction ciot workshop premeetup
Introduction ciot workshop premeetupBeMyApp
 
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura IntelTDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Inteltdc-globalcode
 
Intel® Graphics Performance Analyzers
Intel® Graphics Performance AnalyzersIntel® Graphics Performance Analyzers
Intel® Graphics Performance AnalyzersIntel® Software
 
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYehMAKERPRO.cc
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Intel® Software
 
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Intel® Software
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkMichelle Holley
 
Intel XDK - Philly JS
Intel XDK - Philly JSIntel XDK - Philly JS
Intel XDK - Philly JSIan Maffett
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...tdc-globalcode
 

Semelhante a More explosions, more chaos, and definitely more blowing stuff up (20)

Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRaySoftware-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
 
How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC
 
Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance Computing
 
Embree Ray Tracing Kernels
Embree Ray Tracing KernelsEmbree Ray Tracing Kernels
Embree Ray Tracing Kernels
 
QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and future
 
How Funcom Increased Play Time in Lego Minifigures by 40%
How Funcom Increased Play Time in Lego Minifigures by 40%How Funcom Increased Play Time in Lego Minifigures by 40%
How Funcom Increased Play Time in Lego Minifigures by 40%
 
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
TDC2017 | São Paulo - Trilha Machine Learning How we figured out we had a SRE...
 
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
Tendências da junção entre Big Data Analytics, Machine Learning e Supercomput...
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
 
Introduction ciot workshop premeetup
Introduction ciot workshop premeetupIntroduction ciot workshop premeetup
Introduction ciot workshop premeetup
 
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura IntelTDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
 
Intel® Graphics Performance Analyzers
Intel® Graphics Performance AnalyzersIntel® Graphics Performance Analyzers
Intel® Graphics Performance Analyzers
 
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
 
Intel XDK - Philly JS
Intel XDK - Philly JSIntel XDK - Philly JS
Intel XDK - Philly JS
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
 

Mais de Intel® Software

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology Intel® Software
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaIntel® Software
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciIntel® Software
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.Intel® Software
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Intel® Software
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Intel® Software
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Intel® Software
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchIntel® Software
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel® Software
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019Intel® Software
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019Intel® Software
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Intel® Software
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Intel® Software
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Intel® Software
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...Intel® Software
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesIntel® Software
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision SlidesIntel® Software
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Intel® Software
 

Mais de Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 

Último

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
More explosions, more chaos, and definitely more blowing stuff up

  • 1. Antoine Cohade & Emil Persson 16/03/2016 More Explosions, More Chaos, and Definitely More Blowing Stuff Up : Optimizations and New DirectX Features in
  • 2. Intel Software – Developer Relations Division Copyright © 2016, Intel Corporation. All rights reserved. * Other names and brands may be claimed as the property of others. Legal Copyright © 2016 Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. 
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps. Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.Intel.com/performance Iris™ graphics is available on select systems. Consult your system manufacturer. Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries.
  • 3. Agenda - Introduction - Tools - Optimizations - DirectX features - Future Work - Conclusion
  • 4. Intel® HD Graphics: Market Share - 72.8% of the total GPU market (Jon Peddie Research, Q3 2015) - 18.49% on Steam (Steam HW Survey, Jan 2016) - 23.9% on Unity (Unity HW Stats, Q4 2015) - Millions of gamers with Intel® HD Graphics equipped PCs
  • 5. Introduction: Avalanche Studios - Founded in 2003 - Offices in Stockholm and New York - Games: Just Cause 1, 2 and 3; Mad Max; theHunter, theHunter: Primal; Renegade Ops; Rumble City
  • 6. Introduction: Just Cause 3 • Open world action-adventure game • Developed by Avalanche Studios • Published by Square Enix • Released Dec 1, 2015 • Huge open world: 1000 km² or 400 square miles • Advanced graphics technology
  • 7. About this project: Small targeted development effort - Collaboration with Intel® - Focused on Intel® GPU performance optimizations - DirectX features pioneered by Intel® - Additional resources (from R&D, Engine etc.) - Separate from JC3 mainline development
  • 8. Agenda - Introduction - Tools - Optimizations - DirectX features - Future Work - Conclusion
  • 9. GPA: Just Cause 3 Analysis [Flowchart: Run with Intel® GPA. Live analysis with the HUD / System Analyzer: CPU limited or GPU limited? Offline analysis: capture a trace for Platform Analyzer, or capture a frame for Frame Analyzer.]
  • 10. GPA: Just Cause 3 – Platform Analyzer view [Screenshot labels: Frames; GPU queue; Other metrics; CPU threads]
  • 11. GPA: Just Cause 3 – Frame Analyzer view [Screenshot labels: Custom view chart; Render target overview; Render target preview; RT & draw calls (ergs) selection & timings; Detailed metrics]
  • 12. Agenda - Introduction - Tools - Optimizations - DirectX features - Conclusion
  • 13. Performance Optimizations: Low-Level ALU. Deferred Lighting Shader - 6-8ms on Iris™ Pro 5200 - Very long shader, lots of math - Lots of history. Low-Level ALU optimizations - Tweaking the math to generate fewer instructions - References: Low-Level Thinking in High-Level Shading Languages [Persson13]; Low-Level Shader Optimization for Next-Gen and DX11 [Persson14] - No changes to output
  • 14. Performance Optimizations: Low-Level ALU
      Remove a division, use MAD-form:
        Before: float k = 2.0f / sqrt(PI * (spec_power + 2.0f));        // add + mul + sqrt + rcp + mul
        After:  float k = rsqrt((0.25f*PI) * spec_power + (0.5f*PI));   // mad + rsqrt
      Separate scalars and vectors:
        Before: return spec_color * spec_intensity * spec_mask;         // 6×mul (3+3)
        After:  return spec_color * (spec_intensity * spec_mask);       // 4×mul (1+3)
      Precompute:
        Before: float3 Color = PointLights[index + 1].rgb;
                float HDRScale = PointLights[index + 1].w;
                Color *= HDRScale;                                      // 3×mul
        After:  float3 Color = PointLights[index + 1].rgb;              // no ALU ops
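The first two rewrites on this slide are algebraic identities, which is easy to confirm numerically. The following is a minimal Python sketch, not shader code: the function names are illustrative, and `1.0 / math.sqrt(x)` stands in for the HLSL `rsqrt` intrinsic.

```python
import math

def k_original(spec_power):
    # 2 / sqrt(PI * (p + 2)): add + mul + sqrt + rcp + mul in the shader
    return 2.0 / math.sqrt(math.pi * (spec_power + 2.0))

def k_optimized(spec_power):
    # rsqrt(0.25*PI*p + 0.5*PI): the constants fold at compile time,
    # leaving one mad and one rsqrt
    return 1.0 / math.sqrt((0.25 * math.pi) * spec_power + (0.5 * math.pi))

def scale_ungrouped(color, intensity, mask):
    # (color * intensity) * mask: two vector-by-scalar multiplies, 3 + 3 muls
    return tuple((c * intensity) * mask for c in color)

def scale_grouped(color, intensity, mask):
    # color * (intensity * mask): one scalar mul, then 3 muls
    s = intensity * mask
    return tuple(c * s for c in color)
```

The identity behind the first pair is 2/sqrt(x) = 1/sqrt(x/4), so dividing the argument by 4 absorbs the leading factor of 2. Results can differ in the last floating-point bit, which is why a comparison should use a tolerance.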
  • 15. Performance Optimizations: Low-Level ALU. Optimizing inputs. Before: - 4x float4 for spotlights, partly packed - LightDir stored in 2 floats + sign - HDRScale and NearCap, 16 bits each in a float. Unpacking the packed: - Falloff scale and bias were 2 floats; now compute falloff bias from scale (saved one float, added one ALU op) - LightDir is a full float3 (saves ~10 cycles of unpacking) - HDRScale gone, NearCap gets the entire float (saved 6 ALU ops of unpacking) - Rearranged in access order: fewer fetches if branch not taken
  • 16. Performance Optimizations: Low-Level ALU
      Reuse intermediate results:
        Before:
          dist = dot(l_vec, l_vec) * InvRadSqr;   // mul + 2×mad + mul
          if (dist < 1.0f) {
              l_vec = normalize(l_vec);           // rsqrt + 3×mul
              dist *= rsqrt(dist);                // rsqrt + mul
              ...
        After:
          dist = dot(l_vec, l_vec);               // mul + 2×mad
          if (dist < RadiusSqr) {
              float rd = rsqrt(dist);
              l_vec *= rd; // normalize()         // rsqrt + 3×mul
              dist *= rd;                         // mul
              ...
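A small Python sketch of the same idea, with illustrative function names, may help to see what is being reused: one reciprocal square root serves both the normalization and the distance. Note that, as transcribed, the rewritten `dist` is in world units rather than in units of the radius; the slide elides the code that follows, so how the remainder of the shader accounts for that is not shown here.

```python
import math

def light_before(l_vec, radius):
    inv_rad_sqr = 1.0 / (radius * radius)
    dist = sum(c * c for c in l_vec) * inv_rad_sqr
    if dist >= 1.0:
        return None                       # outside the light radius
    length = math.sqrt(sum(c * c for c in l_vec))
    n = tuple(c / length for c in l_vec)  # normalize(): its own rsqrt
    dist *= 1.0 / math.sqrt(dist)         # second rsqrt: distance / radius
    return n, dist

def light_after(l_vec, radius):
    dist = sum(c * c for c in l_vec)
    if dist >= radius * radius:
        return None                       # same test, against RadiusSqr
    rd = 1.0 / math.sqrt(dist)            # single rsqrt, reused twice
    n = tuple(c * rd for c in l_vec)      # normalize()
    dist *= rd                            # distance in world units
    return n, dist
```

Both versions assume a non-zero light vector, as the shader code does.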
  • 17. Performance Optimizations: Low-Level ALU
      Operation modifiers:
        Before (3×mul + 6×mad + 3×mul):
          float3 tex_proj;
          tex_proj  = mat[0].xyz * light_vec.x;
          tex_proj += mat[1].xyz * light_vec.y;
          tex_proj += mat[2].xyz * light_vec.z;
          tex_proj *= float3(-1, -1, -1);
        After (3×mul + 6×mad):
          float3 tex_proj;
          tex_proj  = mat[0].xyz * -light_vec.x;
          tex_proj += mat[1].xyz * -light_vec.y;
          tex_proj += mat[2].xyz * -light_vec.z;
  • 18. Performance Optimizations: Low-Level ALU
      Loop counters:
        Before (2×iadd / loop):
          // Point lights
          for (uint pl = 0; pl < pl_count; pl++) {
              uint index = LightIndices[light_index++];
              ...
          }
          // Spot lights
          for (uint sl = 0; sl < sl_count; sl++) {
              uint index = LightIndices[light_index++];
              ...
          }
        After (1×iadd / loop + 2×iadd):
          // Point lights
          uint end = light_index + pl_count;
          for (; light_index < end; ++light_index) {
              uint index = LightIndices[light_index];
              ...
          }
          // Spot lights
          end += sl_count;
          for (; light_index < end; ++light_index) {
              uint index = LightIndices[light_index];
              ...
          }
  • 19. Performance Optimizations: Low-Level ALU
      Share computations:
        Before:
          if (shadow > 0) {
              float2 rot = ExpensivePseudoRandom();
              shadow *= SampleShadow(..., rot);
              ...
          }
          float2 rot = ExpensivePseudoRandom();
          for (spotlights) {
              if (spot > 0) {
                  if (shadow_caster) {
                      shadow = SampleShadow(..., rot);
                      ...
                  }
              }
          }
        After:
          float2 rot = ExpensivePseudoRandom();
          if (shadow > 0) {
              shadow *= SampleShadow(..., rot);
              ...
          }
          for (spotlights) {
              if (spot > 0) {
                  if (shadow_caster) {
                      shadow = SampleShadow(..., rot);
                      ...
                  }
              }
          }
  • 20. Performance Optimizations: Low-Level ALU
      Pull computations out of the loop:
        Before, (2×mul + 4×mad)×16 (96 ops):
          for (int i = 0; i < 16; i++) {
              float2 offset;
              offset.x = kernel[i].x * rot.x + kernel[i].y * rot.y;
              offset.y = kernel[i].y * rot.x - kernel[i].x * rot.y;
              float2 tap = coord.xy + offset * scale;
              ...
          }
        After, (4×mad)×16 + 2×mul (66 ops):
          rot *= scale;
          for (int i = 0; i < 16; i++) {
              float2 tap;
              tap.x = coord.x + kernel[i].x * rot.x + kernel[i].y * rot.y;
              tap.y = coord.y + kernel[i].y * rot.x - kernel[i].x * rot.y;
              ...
          }
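The hoisting works because the offset is linear in `rot`, so scaling `rot` once before the loop is the same as scaling every offset inside it. A minimal Python sketch that checks this and reproduces the op counts quoted on the slide (the function names and the sample kernel are illustrative):

```python
def taps_before(kernel, rot, coord, scale):
    taps = []
    for kx, ky in kernel:
        ox = kx * rot[0] + ky * rot[1]
        oy = ky * rot[0] - kx * rot[1]
        # scale applied per tap, inside the loop
        taps.append((coord[0] + ox * scale, coord[1] + oy * scale))
    return taps

def taps_after(kernel, rot, coord, scale):
    # scale folded into the rotation once, outside the loop (2 muls)
    rx, ry = rot[0] * scale, rot[1] * scale
    return [(coord[0] + kx * rx + ky * ry,
             coord[1] + ky * rx - kx * ry) for kx, ky in kernel]

TAPS = 16
OPS_BEFORE = (2 + 4) * TAPS   # (2×mul + 4×mad) per tap
OPS_AFTER = 4 * TAPS + 2      # 4×mad per tap, plus 2×mul up front
```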
  • 21. Performance Optimizations: Low-Level ALU
      Game had configurable attenuation curve - Not really used. Only one curve existed. - Using ALU saved 0.2ms on Iris™ Pro 5200. - Small script to brute-force match a set of functions and parameters - Picked the best match - ~1% error
      ALU instead of lookup table:
        Before: atten = Falloff.SampleLevel(Samp, dist, 0.0f);                          // sample_l
        After:  atten = saturate((1.0f - dist) / (dist * dist * 12.21f + 1.0f));        // 2×mul + mad + add + rcp
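The slides do not show the fitting script, so the following is only a sketch of what a brute-force match could look like. Everything here is an assumption: the two candidate families, the parameter grid, and above all the reference table, which is synthesized from the final formula as a stand-in because the game's real lookup texture is not available (so the fit recovers the constant exactly, unlike the ~1% error reported for the real data).

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

# Hypothetical candidate families, each with a single parameter k.
FAMILIES = {
    "rational": lambda d, k: saturate((1.0 - d) / (d * d * k + 1.0)),
    "power": lambda d, k: saturate(1.0 - d) ** k,
}

def fit(table):
    """Return (max_abs_error, family_name, k) of the best candidate."""
    n = len(table)
    best = None
    for name, fn in FAMILIES.items():
        for step in range(1, 2001):          # k = 0.01 .. 20.00
            k = step * 0.01
            err = max(abs(fn(i / (n - 1), k) - table[i]) for i in range(n))
            if best is None or err < best[0]:
                best = (err, name, k)
    return best

# Stand-in for the sampled falloff texture (assumption, see above).
reference_table = [FAMILIES["rational"](i / 63, 12.21) for i in range(64)]
best_err, best_name, best_k = fit(reference_table)
```

Minimizing the maximum absolute error, rather than the mean, keeps the worst-case visual difference bounded, which seems closest to the "~1% error" figure the slide reports.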
  • 22. Performance Optimizations: Low-Level ALU. Deferred Lighting Shader - 4-5.5ms on Iris™ Pro 5200 - About 2ms saved (depending on scene) - Potential for saving more
  • 23. Performance Optimizations: GPU gaps. Learning: Regularly check CPU/GPU concurrency to avoid surprises
  • 24. Performance Optimizations: Instancing. Solution - Instancing support added to common materials - Drastic reduction in number of draw calls - Reduced constant buffer updates - Removed lots of unused constants - Removed debug constants, tweak variables etc.
  • 25. Performance Optimizations: Instancing
  • 26. Performance Optimizations: Manual Instancing. Tree Impostors - Many instances, tiny mesh (4 vertices, 6 indices) - Standard instancing implementation: DrawIndexedInstanced(6, num_instances, 0, 0, 0); - Poor wavefront occupancy. Manual Instancing optimization - Draw as regular indexed mesh: DrawIndexed(6 * num_instances, 0, 0); - Immutable index buffer of MAX_INSTANCES * 6 - Manually fetch data from texture buffer
  • 27. Performance Optimizations: Manual Instancing
        Before:
          v2p main(a2v In, uint VertexID: SV_VertexID)
          {
              ...
          }
        After:
          struct InputData
          {
              float Elevation;
              float2 Data;
          };
          StructuredBuffer<InputData> Insts;

          v2p main(uint VertexID: SV_VertexID)
          {
              uint InstanceID = VertexID >> 2;
              VertexID = VertexID & 0x3;
              // Manually fetch vertex data
              a2v In;
              In.Elevation = Insts[InstanceID].Elevation;
              uint2 prt = asuint(Insts[InstanceID].Data);
              In.Data.x = int(prt.x & 0xFFFF) * scale;
              In.Data.y = int(prt.x >> 16) * scale;
              In.Data.z = int(prt.y & 0xFFFF) * scale;
              In.Data.w = int(prt.y >> 16) * scale;
              ...
          }
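To make the indexing concrete, here is a small Python sketch of the CPU side and of the decode step in the shader above. The quad winding `(0, 1, 2, 2, 1, 3)` and the helper names are assumptions for illustration; the slides only state that the index buffer holds `MAX_INSTANCES * 6` indices and that each instance has 4 vertices.

```python
QUAD = (0, 1, 2, 2, 1, 3)  # hypothetical winding: two triangles per quad

def build_index_buffer(max_instances):
    # Instance i owns vertex IDs 4*i .. 4*i+3, so one DrawIndexed call
    # with 6 * num_instances indices covers num_instances quads.
    return [inst * 4 + corner for inst in range(max_instances) for corner in QUAD]

def decode_vertex_id(vertex_id):
    # Mirrors the shader: InstanceID = VertexID >> 2; VertexID &= 0x3
    return vertex_id >> 2, vertex_id & 0x3

def unpack_data(prt_x, prt_y, scale):
    # Four 16-bit values packed into two 32-bit words, low half first
    return ((prt_x & 0xFFFF) * scale, (prt_x >> 16) * scale,
            (prt_y & 0xFFFF) * scale, (prt_y >> 16) * scale)
```

Because the buffer contents depend only on the instance count, it can be built once for the maximum and reused, which is consistent with the slide describing it as immutable.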
  • 28. Performance Optimizations: Manual Instancing - 2.4ms before, 0.7ms after, on Iris™ Pro 5200
  • 29. Performance Optimizations: Vegetation stalls
  • 30. Performance Optimizations: Stencil stalls. Learning: If vegetation rendering seems abnormally long, try disabling stencil writes. If the rendering speeds up significantly, you are impacted.
  • 31. Performance Optimizations: Stencil stalls
  • 32. Performance Optimizations: Forest Layer. Forest Layer - Lowest LOD tree representation - Provides forest silhouette in distance - Alpha texture filling in detail. Dense grid mesh - 129x129 per patch - 5ms in some scenes - Stencil writes enabled, but disabling didn’t help
  • 33. Performance Optimizations: Forest Layer. Optimization - Mostly vertex bound - Mesh optimizations - Added 65x65 and 33x33 LODs - Large reduction in total vertices shaded - Small visual difference. High settings mostly use highest LODs. - Packed vertex format (2 floats → 2 shorts) - 16-bit index buffer
  • 34. Performance Optimizations: Forest Layer. Optimization - Shader optimizations - Added a simpler “no-fade” vertex shader, used by most patches - Pre-computations - Prebaked scaling into the world matrix - Folded constants - Handful of low-level optimizations - Simplified math
  • 35. Performance Optimizations: Forest Layer. Results - Good performance gain - Down from 5.0ms to 2.5ms, Iris™ Pro 5200 - Revisited disabling stencil writes: down to 0.5ms (!!) - Revisited triangle strips: down to 0.4ms - More than an order of magnitude faster in the end!
  • 36. Performance Optimizations: Low/normal settings. Optimize lower graphical settings - Shadow size culling: made dependent on shadow buffer size - Disabled cloud shadows for low shadow settings - Velocity buffer rendering: disabled when motion blur and temporal AA are disabled - Disabled planar reflection pass when screen-space reflection is enabled
  • 37. Performance Optimizations: Buffer Clears. G-Buffers - Cleared to (0.5f, 0.5f, 1.0f, 1.0f) for historical reasons - Now clears to (0, 0, 0, 0), or skipped entirely - Still clearing for SLI / CrossFire. Screen space reflections - Only clear when enabled and needed
  • 38. Performance Optimizations: Shadows. Shadow Cascades - 4 sun shadow cascades - Scattered update pattern: 2 cascades / frame, cycled over 8 frames - Saves many milliseconds - Problematic for camera flipping (shadows pop in over a few frames). Center outer cascade on camera - Keeps shadow behind player (in theory) - Have to disable frustum culling for outer cascade - Many milliseconds lost - Lost shadow range
  • 39. Performance Optimizations: Shadows 3,747.0
  • 40. Performance Optimizations: Shadows. Solution - Revert to previous outer cascade - Restores lost milliseconds - Restores lost shadow range - Reset refresh cycle on camera flip: outer cascade always gets updated first frame after flip - Added resolution dependent size culling
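The refresh-cycle reset can be pictured with a small scheduler sketch in Python. The slides give only the shape of the scheme (4 cascades, 2 updated per frame, an 8-frame cycle, and the outer cascade refreshed on the first frame after a camera flip); the concrete `CYCLE` table and the class below are hypothetical and chosen purely for illustration.

```python
# Hypothetical 8-frame pattern, 2 cascades per frame; cascade 3 is the
# outer one. The game's actual pattern is not given in the slides.
CYCLE = [(0, 3), (0, 1), (0, 2), (0, 1), (0, 2), (0, 1), (0, 2), (0, 1)]

class CascadeScheduler:
    def __init__(self):
        self.frame = 0

    def next_update(self, camera_flipped=False):
        """Return the pair of cascades to re-render this frame."""
        if camera_flipped:
            # Restart the cycle so the outer cascade is refreshed at once
            # instead of popping in several frames later.
            self.frame = 0
        cascades = CYCLE[self.frame % len(CYCLE)]
        self.frame += 1
        return cascades
```

Placing the outer cascade in slot 0 of the cycle is what makes a plain reset sufficient: no special-case update path is needed for the flip.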
  • 41. Performance Optimizations: Shadows 718.0
  • 42. Performance Optimizations: float3x4
      Convert float4[] to float3x4[]:
        Before:
          float4 MatrixPalette[2 /*(SIZE*3)*/];
          index *= 3;
          float3x4 mat = float3x4(
              MatrixPalette[index    ],
              MatrixPalette[index + 1],
              MatrixPalette[index + 2]);
          float3 s_pos = mul(mat, pos);
          ...
        After:
          float3x4 MatrixPalette[2 /*SIZE*/];
          float3x4 mat = MatrixPalette[index];
          float3 s_pos = mul(mat, pos);
          ...
      Compiled output, before:
          imad r2.xy, v1.xx, l(3, 3), l(1, 2)
          dp4 r0.x, cb0[r0.x + 9], r1
          dp4 r0.y, cb0[r2.x + 9], r1
          dp4 r0.z, cb0[r2.y + 9], r1
          ...
      Compiled output, after:
          imul null, r0.x, v1.x, l(3)
          dp4 r2.x, cb0[r0.x + 9], r1
          dp4 r2.y, cb0[r0.x + 10], r1
          dp4 r2.z, cb0[r0.x + 11], r1
          ...
  • 43. Performance Optimizations: Terrain optimization
  • 44. Performance Optimizations: Terrain optimization. Solution - Terrain system continuously developed - New system was in the build, but disabled by default - Saved around 1-2 milliseconds depending on scene - Unstable on some drivers - Detect old drivers and fall back to previous system
  • 45. Performance Optimizations: Terrain optimization
  • 46. Performance Optimizations: Misc. Misc optimizations - CPU vs. GPU performance very different on Intel vs. the consoles - Moved some work back to CPU: shorter shader, more computations for CPU - Better culling: when all waterboxes are culled, we could save a render pass
  • 47. Performance Optimizations: Final Results
      Performance benefits: Rendering time* (ms)
                  Car scene   City scene   Sky scene
        Before    51          59           59
        After     27          32           28
        Delta     24 ms       27 ms        31 ms
      Performance benefits: Real performance* (ms) – impact of power
                                        Car scene   City scene   Sky scene
        Average frame time (static)     27          32           28
        Average frame time (dynamic)    30          35           30
      *Measured on a 5th gen Core™ i7 with Iris™ Pro Graphics 6200 @ 1366x768, Medium settings
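The delta row and the static-versus-dynamic gap follow directly from the figures on the slide. A short Python check, using only those numbers (the dictionary names are illustrative):

```python
before_ms = {"Car": 51, "City": 59, "Sky": 59}
after_ms = {"Car": 27, "City": 32, "Sky": 28}      # also the static frame times
dynamic_ms = {"Car": 30, "City": 35, "Sky": 30}

# Rendering time saved per scene, in milliseconds
delta_ms = {s: before_ms[s] - after_ms[s] for s in before_ms}

# Extra frame time in the dynamic measurement relative to the static one
dynamic_overhead_ms = {s: dynamic_ms[s] - after_ms[s] for s in after_ms}

# Equivalent frame rates, for readers who think in frames per second
fps_before = {s: 1000.0 / before_ms[s] for s in before_ms}
fps_after = {s: 1000.0 / after_ms[s] for s in after_ms}
```

In frame-rate terms the car scene goes from roughly 20 to roughly 37 frames per second in the static measurement.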
  • 48. Agenda - Introduction - Tools - Optimizations - DirectX features - Future Work - Conclusion
  • 49. Conservative Rasterization. Light assignment using CR [Örtegren16] - Shell pass - Lights as low-res meshes - CR to touch all affected clusters - Allows arbitrary convex light shapes - “Perfect clustering” - MIN blending resolves depth range - Fill pass - Writes results to cluster light lists
  • 50. Conservative Rasterization. Light list - Max 256 lights per type - JC3: Tightly packed list of light indexes [ 2, 7, 12, 38 … ] - [Örtegren16]: Linked list (2, next) → (7, next) → … - New approach: Bitfield 001000010000100000 … Performance - Bitfield: Faster under heavy load (0.5ms), slower under light load (-0.2ms) - Light assignment: 0.1 - 0.3ms cost. Shading: 0 - 3ms saved. (6th gen Core™ w/ HD Graphics 520) - Shell pass independent of depth slice count - Can scale to higher slice count
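The three light-list encodings carry the same information; the bitfield simply sets bit i when light i touches the cluster. A minimal Python sketch of packing and iterating such a bitfield follows. It uses one arbitrary-precision integer for brevity; a shader would more likely split the 256 bits into 32-bit words and use a find-first-set intrinsic, which is an assumption on my part and not something the slides state.

```python
MAX_LIGHTS = 256  # "Max 256 lights per type"

def pack_lights(indices):
    bits = 0
    for i in indices:
        assert 0 <= i < MAX_LIGHTS
        bits |= 1 << i
    return bits

def unpack_lights(bits):
    # Visit set bits from lowest to highest, clearing each as we go
    while bits:
        lowest = bits & -bits
        yield lowest.bit_length() - 1
        bits ^= lowest

def as_slide_string(bits, count):
    # Bit 0 first, matching the left-to-right layout on the slide
    return "".join("1" if (bits >> i) & 1 else "0" for i in range(count))
```

Packing the slide's example list `[2, 7, 12, 38]` and printing the first 18 bits reproduces the pattern shown on the slide.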
  • 52. UAVs & Rasterizer Ordered Views 101
      • The DX API specifies “in order” processing rules
      • UAVs enable arbitrary R/W memory ops from a pixel shader…
        … but no ordering of data input…
      [Diagram: timeline in which "shade fragment from 1st triangle" and "shade fragment from 2nd triangle" each perform a read/modify/write (r/m/w) on the same data, resulting in a data race, e.g. programmable blending]
  • 53. UAVs & Rasterizer Ordered Views 101
      • The DX API specifies “in order” processing rules
      • UAVs enable arbitrary R/W memory ops from a pixel shader…
        … but no ordering of data input…
      [Diagram: timeline in which "shade fragment from 1st triangle" and "shade fragment from 2nd triangle" each perform a read/modify/write (r/m/w) on the same data; the order is not deterministic, e.g. programmable blending]
  • 54. UAVs & Rasterizer Ordered Views 101
      • The DX API specifies “in order” processing rules
      • UAVs enable arbitrary R/W memory ops from a pixel shader…
        … but no ordering of data input…
      • ROV is a DX12 feature that guarantees primitive order for R/M/W operations and:
        • Avoids data races
        • Ensures deterministic ordering
      [Diagram: timeline in which "shade fragment from 2nd triangle" waits until the r/m/w of "shade fragment from 1st triangle" has completed; the data is then safe]
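The ordering problem described on slides 52 to 54 can be shown with a tiny CPU-side model of programmable blending on a single pixel. This is only an illustration of why primitive order matters, not GPU code: the function names and the `(primitive_id, color, alpha)` tuples are assumptions, and the model covers the non-deterministic ordering case rather than the lost-update data race. In HLSL, ROV resources are declared with types such as `RasterizerOrderedTexture2D`.

```python
def blend_over(dst, src_color, src_alpha):
    """Read/modify/write step of a standard 'over' blend on one channel."""
    return src_color * src_alpha + dst * (1.0 - src_alpha)


def shade_pixel_unordered(fragments, background=0.0):
    """Plain UAV behaviour: fragments are blended in whatever order they arrive."""
    value = background
    for _primitive_id, color, alpha in fragments:
        value = blend_over(value, color, alpha)
    return value


def shade_pixel_ordered(fragments, background=0.0):
    """ROV-like behaviour: each r/m/w is applied in primitive submission order."""
    in_primitive_order = sorted(fragments, key=lambda fragment: fragment[0])
    return shade_pixel_unordered(in_primitive_order, background)
```

With two half-transparent fragments, one white from primitive 0 and one black from primitive 1, the unordered result depends on the arrival order, while the ordered variant returns the same value either way.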
  • 55. Order-Independent Transparency
      • Correct compositing, rendering foliage & fences with zero aliasing!
      • Rasterizer Ordered Views enable a new approach
        - Single geometry pass and fixed memory requirements
        - Stable and predictable performance
        - Scalable: easily trade off image quality for performance/memory
      [Images: "Correct render order" and "Different correct render order"]
  • 56. Order-Independent Transparency
      • Store the visibility function as a sorted fixed-size array of nodes in a UAV surface
      • Sort N layers, blend furthest fragments
      • Use more layers for higher image quality at a higher perf/memory cost
      Sample code: https://software.intel.com/en-us/articles/oit-approximation-with-pixel-synchronization-update-2014
      [Diagrams: "New fragment insertion" and "Blending of furthest fragments"]
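The "sort N layers, blend furthest fragments" idea on slide 56 can be sketched as a fixed-size, depth-sorted list per pixel. This is a simplified single-channel illustration and not the code of the linked Intel sample: the node layout `(depth, premultiplied_color, alpha)`, the function names and the default layer count are assumptions.

```python
def insert_fragment(layers, depth, color, alpha, max_layers=4):
    """Insert a fragment into a depth-sorted list, keeping at most max_layers nodes.

    layers holds (depth, premultiplied_color, alpha) tuples, nearest first.
    """
    position = 0
    while position < len(layers) and layers[position][0] <= depth:
        position += 1
    layers.insert(position, (depth, color * alpha, alpha))

    if len(layers) > max_layers:
        # Overflow: blend the two furthest nodes into one with the 'over' operator.
        _far_depth, far_color, far_alpha = layers.pop()
        near_depth, near_color, near_alpha = layers.pop()
        layers.append((
            near_depth,
            near_color + (1.0 - near_alpha) * far_color,
            near_alpha + (1.0 - near_alpha) * far_alpha,
        ))


def resolve(layers, background=0.0):
    """Composite the stored nodes front to back over the background."""
    color, transmittance = 0.0, 1.0
    for _depth, premultiplied_color, alpha in layers:
        color += transmittance * premultiplied_color
        transmittance *= 1.0 - alpha
    return color + transmittance * background
```

Memory per pixel stays bounded by `max_layers`. The result is exact as long as no later fragment falls between two nodes that were already merged, and approximate otherwise, which is where the layer count trades quality against cost.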
  • 57. Agenda - Introduction - Tools - Optimizations - DirectX features - Future Work - Conclusion
  • 58. Future Work: Frame time variance
      [Chart: JC3 frame time variation over 2 minutes of gameplay. X axis: frame number (1 to 3979). Y axis: frame time (ms), 0 to 50. Series: frame time, 10-frame moving average. Reference line: PW = 33.3ms]
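The 10-frame moving average plotted on slide 58 can be reproduced from a captured list of frame times with a few lines of code. This is a generic sketch, not the tooling used for the talk: the function names are assumptions, and the 33.3 ms default simply mirrors the reference line on the chart (1000 ms / 30 frames is about 33.3 ms).

```python
def moving_average(frame_times_ms, window=10):
    """Return the trailing moving average for every full window of frames."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, frame_time in enumerate(frame_times_ms):
        running_sum += frame_time
        if i >= window:
            # Drop the frame that just left the window.
            running_sum -= frame_times_ms[i - window]
        if i >= window - 1:
            averages.append(running_sum / window)
    return averages


def frames_over_budget(values_ms, budget_ms=33.3):
    """Count how many values exceed the frame time budget."""
    return sum(1 for value in values_ms if value > budget_ms)
```

Counting the smoothed values above the budget gives a rough measure of how often a run drops below the target frame rate for more than a few frames in a row.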
  • 59. Future Work: Dynamic resolution rendering
      • Idea: for the most intense scenes, lower the rendering resolution
      • Based on an Intel sample: https://software.intel.com/en-us/articles/dynamic-resolution-rendering-sample
      if (frametime > max_allowed_frametime && render_target_size != min_RT_size)
          render_target_size--;
      if (frametime < min_allowed_frametime && render_target_size != max_RT_size)
          render_target_size++;
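The pseudocode on slide 59 can be turned into a small runnable controller. The sketch below keeps the same two conditions but expresses the render target size as an integer percentage of native resolution; the thresholds, step size and limits are placeholder assumptions, not values from JC3 or the Intel sample.

```python
def update_render_scale(scale_percent, frame_time_ms,
                        max_frame_ms=33.3, min_frame_ms=28.0,
                        step=5, min_scale=50, max_scale=100):
    """Return the render scale (percent of native resolution) for the next frame."""
    if frame_time_ms > max_frame_ms and scale_percent > min_scale:
        # Frame too slow: shrink the render target, but not below the minimum.
        return max(min_scale, scale_percent - step)
    if frame_time_ms < min_frame_ms and scale_percent < max_scale:
        # Frame comfortably fast: grow the render target again.
        return min(max_scale, scale_percent + step)
    # Inside the dead band between the two thresholds: keep the current size.
    return scale_percent
```

Keeping a gap between `min_frame_ms` and `max_frame_ms` is a design choice that avoids flipping the resolution up and down on consecutive frames.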
  • 60. Future Work: G-buffer blending
      • To apply tire skid marks, bullet holes or explosions!
      • Same principle as AOIT
        - Render your G-buffer
        - Take a normal map of a decal
        - Blend it with the G-buffer
        - Result will be a correctly mapped bullet hole
      • Prototyped in JC3
      • Requires alpha-blendable decals
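As a rough illustration of the "blend it with the G-buffer" step on slide 60, the sketch below alpha-blends a decal normal into a G-buffer normal for one pixel and renormalizes the result. The slides do not give the actual blend used in the JC3 prototype, so the linear blend, the tuple representation and the function name are all assumptions.

```python
import math


def blend_decal_normal(gbuffer_normal, decal_normal, decal_alpha):
    """Blend a decal normal into the G-buffer normal and renormalize.

    Both normals are (x, y, z) tuples in the same space; decal_alpha is in [0, 1].
    """
    blended = tuple(
        g * (1.0 - decal_alpha) + d * decal_alpha
        for g, d in zip(gbuffer_normal, decal_normal)
    )
    length = math.sqrt(sum(component * component for component in blended))
    if length == 0.0:
        # Degenerate case (e.g. opposite normals at alpha 0.5): keep the original.
        return tuple(gbuffer_normal)
    return tuple(component / length for component in blended)
```

An alpha of 0 leaves the surface normal untouched and an alpha of 1 replaces it with the decal normal, which is why the decals need a usable alpha channel.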
  • 61. Agenda - Introduction - Tools - Optimizations - DirectX features - Future Work - Conclusion
  • 62. Conclusion
      • Even the most demanding titles, such as JC3, can run on Iris graphics
      • Feature-wise, integrated graphics are now on par with discrete
      • Focused optimizations can bring terrific improvements ...
      • … you have tools to help you …
      • … and it is definitely worth it
  • 63. References
      [Persson13] Low-Level Thinking in High-Level Shading Languages, GDC 2013 presentation. http://humus.name/index.php?page=Articles&ID=6
      [Persson14] Low-Level Shader Optimization for Next-Gen and DX11, GDC 2014 presentation. http://humus.name/index.php?page=Articles&ID=9
      [Örtegren16] Clustered Shading: Assigning Lights Using Conservative Rasterization in DirectX 12. GPU Pro 7.