SlideShare uma empresa Scribd logo
1 de 53
Baixar para ler offline
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Leigh Davies
March 05, 2015
OIT to Volumetric Shadow
Mapping, 101 Uses for
Raster Ordered Views using
DirectX 12
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Introduction
 Raster Ordered Views
 Applications + R&D Topics
 Performance Tips & Tricks
 Summary
 Q&A
Agenda
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Copyright © 2015 Intel Corporation. All rights reserved.
 *Other names and brands may be claimed as the property of others.
 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,
BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH
PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS
OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR
INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
 A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S
PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS,
OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY
CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS
NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
 Intel may make changes to specifications and product descriptions at any time, without notice.
 All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
 Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized
errata are available on request.
 Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not
authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
 Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
 Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and
MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult
other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
information go to
http://www.Intel.com/performance
 Iris™ graphics is available on select systems. Consult your system manufacturer.
 Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries.
Legal
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Why Raster Ordered Views?
 The DX API specifies “in order” processing rules for the raster
pipeline
 If two triangles sent to the GPU touch the same XY screen
location, the GPU hardware guarantees that triangle “A” will blend
its color result before “B” blends it.
 Hardware in the ROP is responsible for enforcing this ordering
requirement.
 Pipeline back-end* still not programmable, color, z & stencil
operations from a fixed menu..
 DX11 added Unordered Access Views, leverage power of shaders
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
DX Pipeline
ThreadDispatch
…..
EU
EU
EU
EU
…..
EU
EU
EU
EU
…..
EU
EU
EU
EU
Vertex Shader
Hull Shader
Domain Shader
Geometry Shader
Pixel Shader
Input Assembler
Rasterizer
Output Merger
Triangle Data
Render Targets
UAV Targets
EUEU
EU EU
EUEU
EU
EU
EUEU
EU
EU
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 UAV’s enable arbitrary R/W memory ops from a pixel
shader but no ordering of data input…
UAV Limitations
shade fragment from 1st triangle r/m/w
e.g. programmable
blending
shade fragment from 2nd triangle r/m/w
Timeline
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 UAV’s enable arbitrary R/W memory ops from a pixel
shader but no ordering of data input…
 Fragments mapping to same pixel can cause data races
UAV Limitations
shade fragment from 1st triangle r/m/w
e.g. programmable
blending
shade fragment from 2nd triangle r/m/w
Timeline
data race
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 UAV’s enable arbitrary R/W memory ops from a pixel
shader but no ordering of data input…
 Fragments mapping to same pixel can cause data races
UAV Limitations
Timeline
shade fragment from 1st triangle r/m/w
shade fragment from 2nd triangle r/m/w
data is
Safe
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 UAV’s enable arbitrary R/W memory ops from a pixel
shader but no ordering of data input…
 Fragments mapping to same pixel can cause data races
 Fragments can be shaded out-of-order, can’t support
order-dependent algorithms
UAV Limitations
Timeline
shade fragment from 1st triangle r/m/w
shade fragment from 2nd triangle r/m/w
data is
Safe
order is not
deterministic
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 The more parallel the GPU the greater race conditions
Out-of-Order Artefacts
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 We need the hardware to detect dependencies among
fragments writing to the same X,Y screen coordinate
and..
• Avoid data races
• Guarantee primitive order for R/M/W operations
Programmable Back-End
shade fragment from 1st triangle r/m/w
shade fragment from 2nd triangle r/m/wWaitshade fragment from 2nd triangle r/m/w
data is
Safe
Timeline
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 That’s easy what happens if fragment 2 runs first?
Programmable Back-End
shade fragment from 1st triangle r/m/w
shade fragment from 2nd triangle r/m/w
Timeline
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 That’s easy what happens if fragment 2 runs first?
 We just get a longer wait
 This isn't “just” a critical section
Programmable Back-End
shade fragment from 1st triangle r/m/w
r/m/w
Well defined order
Waitshade fragment from 2nd triangle r/m/w
data is
Safe
Timeline
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Little to no performance impact in most cases
– Fragment 1 completes before Fragment 2 hits barrier
– Fragments don’t touch same X,Y screen Coordinate
Programmable Back-End
shade fragment from 1st triangle r/m/w
shade fragment from 2nd triangle r/m/w
data is
Safe
Timeline
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Haven’t we seen this before?
Johan Andersson, DICE (2013)
“We have a pretty long list of all the stuff we typically go through and then talk to
them with, but one very concrete thing we’d like to see, and actually Intel has
already done this on their hardware, they call it PixelSync, which is their method of
synchronizing the graphics pipeline in a very efficient way on a per-pixel basis. You
can do a lot of cool techniques with it such as order-independent transparency for hair
rendering or for foliage rendering. And they can do programmable blending where
you want to have full control over the blending instead of using the fixed-function
units in the GPU. There’s a lot of cool components that can be enabled by such a
programmability primitive there and I would like to see AMD and NVidia implement
something similar to that as well. It's also very power-efficient and efficient overall on
Intel's hardware, so I guess the challenge for NVidia and AMD would be if they were
able to do that efficiently because they have a lot of the different architectures there.
So that's one thing.”
Source: http://www.tomshardware.com/reviews/johan-andersson-battlefield-4-interview,3688-2.html
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Almost perfect match for Intel Pixel Synchronization
extension introduced on 4th Gen Core processors in
2013.
 Lots of existing samples and even games using it!
Raster Ordered Views == Pixel Sync
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 HLSL only construct, modifies access behavior to UAV’s
 New (HLSL) objects, only available to the pixel shader:
– RasterizerOrderedBuffer
– RasterizerOrderedByteAddressBuffer
– RasterizerOrderedStructuredBuffer
– RasterizerOrderedTexture1D
– RasterizerOrderedTexture1DArray
– RasterizerOrderedTexture2D
– RasterizerOrderedTexture2DArray
– RasterizerOrderedTexture3D
 Used in the same manner as other UAV objects
Rasterizer ordered views API
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
ROV’s vs Pixel Sync
RasterOrderedTexture2D<t> gRGBEBuffer;
void PS_RGBE_Blend (...)
{
// Code that doesn’t reference
UAV’s
// Access UAV
uint rgbe = gRGBEBuffer[xy];
// Manipulate UAV
}
#include "IntelExtensions.hlsl“
RWTexture2D <t> gRGBEBuffer : register( u1 );
void PS_RGBE_Blend (...)
{
IntelExt_Init();
// Code that doesn’t reference
UAV’s
IntelExt_BeginPixelOrderingOnUAV(1);
// Access UAV
uint rgbe = gRGBEBuffer[xy];
// Manipulate UAV
}
Pixel Sync Raster Ordered View
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Writing to a UAV disables both Early-Z and Hi-Z
 Declaring [earlydepthstencil] in front of a pixel shader
guarantees the depth test happens early even if the
shader writes to a UAV
 [earlydepthstencil] means Depth is updated even if
you discard in the pixel shader, unless completely
disabled
UAV Refresher
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Applications
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 New blending operators, non-linear color spaces, exotic
encodings, etc.
• e.g. RGBE, LogLuv, etc.
Programmable Blending Applications
HDR in to custom R10G10B10A2 in GRID AutosportIntel RGBE blending sample
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Blending on a RGBE color buffer
RasterOrderedTexture2D<t> gRGBEBuffer;
void PS_RGBE_Blend (...)
{
float3 rgb = ...
float alpha = ...
uint rgbe = gRGBEBuffer[xy];
float3 dstRGB = RGBE_to_RGB(rgbe);
dstRGB = alpha * rgb + (1 – alpha) * dstRGB;
gRGBEBuffer[xy] = RGB_to_RGBE(dstRGB);
}
Compute fragment
color & alpha
Read RGBE buffer &
convert to RGB
Conversion to RGBE &
buffer write
Ordered UAV structure
Wait on access
Alpha-blending in
RGB space
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 To apply a bullet hole or an axe mark…
 simply
– Render your G-Buffer
– Take a normal map of a decal
– Blend it with the G-Buffer
– Result will be a correctly mapped bullet
hole
Blending for deferred shaders
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Generalization of the Z-Buffer*
 Render N-layers of the image in a single pass
 Under DX11 requires Per pixel linked lists and append buffers
– Unbounded memory requirement
– Memory bandwidth heavy
 Countless applications:
– Depth-peeling
– Constructive solid geometry
– Depth-of-field & motion blur
– Volume rendering
– <insert your idea here >
K-Buffer
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
K-Buffer: Single-Pass Depth Peeling
RasterizerOrderedBuffer gBuffer;
void PSMain(...)
{
Fragment frag = {...};
Fragment fragArray[N] = gBuffer[xy];
for (int i = 0; i < N; i++) {
if (frag.Z < fragArray[i].Z) {
Fragment temp = frag;
frag = fragArray[i];
fragArray[i] = temp;
}
}
gBuffer[xy] = fragArray;
}
Compute fragment
color, z, etc.. Enable pixel
synchronization
Read N fragments
from K-buffer
Write N fragments
to K-buffer
Bubble sort (1 pass)
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Correct compositing, rendering
foliage & fences with zero aliasing
etc..
 DX11-style order-independent
transparency has significant
drawbacks
– Requires unbounded memory (per-pixel
lists)
– Not so great performance due to global
atomics, fragments sorting, etc.
Order-Independent Transparency
Correct
render
order
Different
correct
render
order
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Raster Ordered View enable
a new approach
– Single geometry pass and fixed
memory requirements
– Stable and predictable
performance
– Scalable: easily trade-off image
quality for performance/memory
Order-Independent Transparency
GRID Autosport 2014
OIT on 15watt 4th Gen Core™
Processor
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Step 1: Improve alpha-blending
– Use depth to decide whether to composite incoming
fragment over or under
– Much better than vanilla alpha-blending but in some cases
not quite correct
 Step 3: Use more layers to trade-off image quality for
perf/memory
– Sort N Layers, blend furthest fragments
Step 3:Or use an Adaptive Transparency Function
Multi-Layer Alpha Blending
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Adaptive Visibility Function
 Store Visibility Function as a sorted fixed-size array of nodes, in UAV
surface
 Each red node corresponds to a pair of values for depth and
transmittance:
 To compress visibility we remove the node that generates the smallest
area variation
0.7
1
smallest area variation
),( td
struct TransparencyData
{
float depth[ MAX_LAYERS ];
float4 colour[ MAX_LAYERS ];
};
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Adaptive Visibility Function
 Store Visibility Function as a sorted fixed-size array of nodes, in UAV
surface
 Each red node corresponds to a pair of values for depth and
transmittance:
 To compress visibility we remove the node that generates the smallest
area variation
 Used for early Order Independent Transparency samples
struct TransparencyData
{
float depth[ MAX_LAYERS ];
float4 colour[ MAX_LAYERS ];
};
0.7
1
),( td
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Combining OIT with sorted Transparency
 Only add transparent pixels into the
OIT structure if there is already OIT
data in the buffer
– Store mask of pixels in R32 target to
flag pixels that contain OIT data
– Use the mask to identify pixels where
traditional transparency needs to be
merged with OIT
– All other pixels are rendered with
normal alpha blending, because there
is no overlap
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Hair Rendering
Alpha Blending Adaptive Transparency
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Like Deep Shadow Maps but designed
for real-time rendering
 Encode per-pixel visibility function from
light point-of-view
 Lossy compression of the visibility data
 Raster Ordered Views enables first fixed
memory implementation of AVSM
Adaptive Volumetric Shadow Maps
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
AVSM Quality Comparison
 AVSM with 8 to 12 nodes closely
matches ground truth results
 4 Nodes suitable for many
situations, used in GRID 2.
 Fourier Opacity Maps suffer from
artefacts generated by high-frequency
light blockers like hair and sub-optimal
depth bounds
 Hair model courtesy of Cem Yuksel.
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Accelerating AVSM
 Overdraw of particles means even small
increases in per-pixel costs are noticeable
0
5000
10000
15000
GPU ms Pixel Domain
Pixel Shading
Tessellated Per Vertex
 Shadow map generation can use simplified particles
– Less particles, larger area, higher opacity
 Per Pixel lighting of particles took >=10ms…
– Tessellated vertex lighting actually looked better!
– Tessellation is 2-3x faster than per-pixel
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Volume-Aware Blending
 Billboard sprites
representing
clouds volumes
 Composited
using Alpha
blending
 Popping
Artefacts as
billboard sprites
change order
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Volume-Aware Blending
 Billboard clouds
using volume
aware blending
 No popping
 Smooth
blending of
intersecting
volumes
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Blending volumetric particles
– If particles do not overlap, blending is
trivial
– How can we correctly blend overlapping
particles?
Volume-Aware Blending
BackIsecFront
𝐶 𝐹𝑖𝑛𝑎𝑙 =
𝑇𝐹𝑖𝑛𝑎𝑙 = ∙ 𝑇𝐵𝑎𝑐𝑘∙ 𝑇𝐼𝑠𝑒𝑐𝑇𝐹𝑟𝑜𝑛𝑡
𝐶 𝐹𝑟𝑜𝑛𝑡 +𝐶𝐼𝑠𝑒𝑐 ∙ 𝑇𝐹𝑟𝑜𝑛𝑡 +𝐶 𝐵𝑎𝑐𝑘 ∙ 𝑇𝐹𝑟𝑜𝑛𝑡∙ 𝑇𝐼𝑠𝑒𝑐
1 − 𝑇𝐹𝑖𝑛𝑎𝑙
– Final color and transparency:
– Division by 1 − 𝑇𝐹𝑖𝑛𝑎𝑙 because we do not
want alpha pre-multiplied color
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Blending volumetric particles -
Implementation
– Color, density and min/max extent of the
current particle are stored in the UAV
buffer
– Each new particle is tested against the
currently stored
 If new particle is in front of the current,
the current is blended into the back
buffer and replaced with the new one
 If new particle overlaps with the current,
they are blended and stored
 Particles need to be sorted
Volume-Aware Blending
Alpha Blended
Volume Blended
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
R&D Topics
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Use Raster Ordered Views to improve or replace multi-
sampling anti-aliasing
– Higher image quality vs. lower memory requirements vs. better
performance
 Z³ anti-aliasing* (1999)
– Originally developed as HW based high-quality anti-aliasing
algorithm
 Store N fragment per pixel (z, ∂z/∂x, ∂z/∂y, color, coverage)
 Merge fragments (lossy)
Advanced Anti-Aliasing
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Advanced Anti-Aliasing
 Analytic methods
– Render scene using
conservative rasterization
– Build per-pixel spatial
subdivision structure using
primitive edges
– Compute fragment weights
from fraction of pixel area
covered and resolve
– Think targeted OIT
conservative rasterization renders any pixels
that are even partially covered are rasterized
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Build complex per-voxel data structures
on the GPU at voxelization time
– e.g. direction-dependent representations
(anisotropic voxels, etc.)
 Voxelization via 2D rasterization
projects triangles to XY, YZ or XZ plane
Voxelization
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Use Raster Ordered Views to build 3D data structures at
voxelization time, think Depth Peeling
– Problem: fragment dependencies cannot be tracked over multiple
2D planes in a single render call.
 Easy fix: voxelize onto one 2D plane at time
– 3 draw calls per mesh, one per 2D plane (i.e. reject triangles that
map to other planes)
 Number of generated voxels doesn’t change & more flexible
than using global atomics
 Can be further optimized by Conservative Rasterization
Voxelization
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Don’t clear large buffers. Clear a small buffer and use it
as a clear mask.
Performance Tips & Tricks
bool clear = gClearMask[xy];
if (clear) {
gClearMask[xy] = false;
myLargeStruct = ...
} else {
myLargeStruct = gLargeDataStruct[xy];
...
}
gLargeDataStruct[xy] = myStruct;
Clear this!
Not this!
Read clear mask
If pixel is not in clear
state load large struct
and update it
Mark pixel as “used”
and
initialize large struct
Write large struct data
back to memory
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Balance data size vs Packing
Small(er) data
structures can
improve performance
Use more instructions
to pack/unpack data
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Address 1D structured
buffers as tiled to better
data exploit locality
 e.g. 1x2 or 2x2 (2D
textures), 2x2x2 (voxels),
etc..
 Saved 1ms in Grid 2 by
Codemasters in OIT.
Tile Data
uint AOITAddrGen(uint2 addr2D, uint surfaceWidth)
{
#ifdef AOIT_TILED_ADDRESSING
surfaceWidth = surfaceWidth >> 1U;
uint2 tileAddr2D = addr2D >> 1U;
uint tileAddr1D =
(tileAddr2D[0] + surfaceWidth * tileAddr2D[1]) << 2U;
uint2 pixelAddr2D = addr2D & 0x1U;
uint pixelAddr1D = (pixelAddr2D[1] << 1U) + pixelAddr2D[0];
return tileAddr1D | pixelAddr1D;
#else
return addr2D[0] + surfaceWidth * addr2D[1];
#endif
}
Linear
address
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Prefer inserting the
synchronized UAV read in the
second half of the shader
 Increase likelihood of
concurrently shading
fragments that map to the
same pixel
Synch Late
RasterOrderedTexture2D<t> gRGBEBuffer;
void PS_RGBE_Blend (...)
{
uint rgbe = gRGBEBuffer[xy];
float3 rgb = ... Complex stuff….
float alpha = ... Potentially can run
concurrent with other shaders
uint rgbe = gRGBEBuffer[xy];
}
r/m/wr/m/w
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Ok not 100% possible but clever organisation of the
triangles in can limit sync points
– Randomize instancing data to avoid consecutive overlapping
models.
 Small render targets aren’t always your friend, can
increase hits to the same pixel.
Avoid Synch Altogether!
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 Raster Ordered Views are a new tool that injects new
life in the 3D pipeline
 Very simple to use, high performance
 Build complex 3D data structures in bounded memory
 Solve many long standing problems
– OIT
– Depth Peeling
– Volume Rendering and Blending
Summary
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Questions?
Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
 https://software.intel.com/sites/default/files/salvi_avsm_egs
r2010.pdf
 http://advances.realtimerendering.com/s2013/2013-07-23-
SIGGRAPH-PixelSync.pdf
 https://software.intel.com/en-us/gamedev/code-samples
 http://www.gdcvault.com/play/1020221/Rendering-in-
Codemasters-GRID2-and
 https://software.intel.com/en-us/blogs/2014/03/31/cloud-
rendering-sample
References
C o p y r i g h t © 2 0 1 5 , I n t e l C o r p o r a t i o n . A l l r i g h t s r e s e r v e d . *O t h e r n a me s a n d b r a n d s ma y b e c l a i me d a s t h e p r o p e r t y o f o t h e r s .

Mais conteúdo relacionado

Mais procurados

Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringElectronic Arts / DICE
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Tiago Sousa
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsJohan Andersson
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingElectronic Arts / DICE
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Tiago Sousa
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3Electronic Arts / DICE
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3guest11b095
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overheadCass Everitt
 
Lighting Shading by John Hable
Lighting Shading by John HableLighting Shading by John Hable
Lighting Shading by John HableNaughty Dog
 
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsPrecomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsSeongdae Kim
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Tiago Sousa
 
The Technology of Uncharted: Drake’s Fortune
The Technology of Uncharted: Drake’s FortuneThe Technology of Uncharted: Drake’s Fortune
The Technology of Uncharted: Drake’s FortuneNaughty Dog
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3stevemcauley
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect AndromedaElectronic Arts / DICE
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The SurgePhilip Hammer
 

Mais procurados (20)

Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time RaytracingSIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
SPU-Based Deferred Shading in BATTLEFIELD 3 for Playstation 3
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 
Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3Shiny PC Graphics in Battlefield 3
Shiny PC Graphics in Battlefield 3
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overhead
 
Lighting Shading by John Hable
Lighting Shading by John HableLighting Shading by John Hable
Lighting Shading by John Hable
 
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsPrecomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
 
Hair in Tomb Raider
Hair in Tomb RaiderHair in Tomb Raider
Hair in Tomb Raider
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
The Technology of Uncharted: Drake’s Fortune
The Technology of Uncharted: Drake’s FortuneThe Technology of Uncharted: Drake’s Fortune
The Technology of Uncharted: Drake’s Fortune
 
Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3Calibrating Lighting and Materials in Far Cry 3
Calibrating Lighting and Materials in Far Cry 3
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
4K Checkerboard in Battlefield 1 and Mass Effect Andromeda
 
Dissecting the Rendering of The Surge
Dissecting the Rendering of The SurgeDissecting the Rendering of The Surge
Dissecting the Rendering of The Surge
 

Semelhante a OIT to Volumetric Shadow Mapping, 101 Uses for Raster-Ordered Views using DirectX 12

Embree Ray Tracing Kernels
Embree Ray Tracing KernelsEmbree Ray Tracing Kernels
Embree Ray Tracing KernelsIntel® Software
 
How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC Gael Hofemeier
 
Play faster and longer: How Square Enix maximized Android* performance and ba...
Play faster and longer: How Square Enix maximized Android* performance and ba...Play faster and longer: How Square Enix maximized Android* performance and ba...
Play faster and longer: How Square Enix maximized Android* performance and ba...Gael Hofemeier
 
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRaySoftware-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRayIntel® Software
 
How Funcom Increased Play Time in Lego Minifigures by 40%
How Funcom Increased Play Time in Lego Minifigures by 40%How Funcom Increased Play Time in Lego Minifigures by 40%
How Funcom Increased Play Time in Lego Minifigures by 40%Gael Hofemeier
 
Efficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® GraphicsEfficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® GraphicsGael Hofemeier
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?Michelle Holley
 
In The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIn The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIntel® Software
 
Make your unity game faster, faster
Make your unity game faster, fasterMake your unity game faster, faster
Make your unity game faster, fasterIntel® Software
 
More explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upMore explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upIntel® Software
 
Build HTML5 VR Apps using Intel® XDK
Build HTML5 VR Apps using Intel® XDKBuild HTML5 VR Apps using Intel® XDK
Build HTML5 VR Apps using Intel® XDKIntel® Software
 
Gary Brown (Movidius, Intel): Deep Learning in AR: the 3 Year Horizon
Gary Brown (Movidius, Intel): Deep Learning in AR: the 3 Year HorizonGary Brown (Movidius, Intel): Deep Learning in AR: the 3 Year Horizon
Gary Brown (Movidius, Intel): Deep Learning in AR: the 3 Year HorizonAugmentedWorldExpo
 
Intel XDK - Philly JS
Intel XDK - Philly JSIntel XDK - Philly JS
Intel XDK - Philly JSIan Maffett
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAIntel® Software
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel Software Brasil
 
Introduction ciot workshop premeetup
Introduction ciot workshop premeetupIntroduction ciot workshop premeetup
Introduction ciot workshop premeetupBeMyApp
 
A Path to NFV/SDN - Intel. Michael Brennan, INTEL
A Path to NFV/SDN - Intel. Michael Brennan, INTELA Path to NFV/SDN - Intel. Michael Brennan, INTEL
A Path to NFV/SDN - Intel. Michael Brennan, INTELWalton Institute
 
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel ArchitectureDPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel ArchitectureJim St. Leger
 
Intel® Graphics Performance Analyzers
Intel® Graphics Performance AnalyzersIntel® Graphics Performance Analyzers
Intel® Graphics Performance AnalyzersIntel® Software
 

Semelhante a OIT to Volumetric Shadow Mapping, 101 Uses for Raster-Ordered Views using DirectX 12 (20)

Embree Ray Tracing Kernels
Embree Ray Tracing KernelsEmbree Ray Tracing Kernels
Embree Ray Tracing Kernels
 
How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC
 
Play faster and longer: How Square Enix maximized Android* performance and ba...
Play faster and longer: How Square Enix maximized Android* performance and ba...Play faster and longer: How Square Enix maximized Android* performance and ba...
Play faster and longer: How Square Enix maximized Android* performance and ba...
 
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRaySoftware-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
Software-defined Visualization, High-Fidelity Visualization: OpenSWR and OSPRay
 
How Funcom Increased Play Time in Lego Minifigures by 40%
How Funcom Increased Play Time in Lego Minifigures by 40%How Funcom Increased Play Time in Lego Minifigures by 40%
How Funcom Increased Play Time in Lego Minifigures by 40%
 
Efficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® GraphicsEfficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® Graphics
 
What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?
 
In The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIn The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for Intel
 
Make your unity game faster, faster
Make your unity game faster, fasterMake your unity game faster, faster
Make your unity game faster, faster
 
More explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff upMore explosions, more chaos, and definitely more blowing stuff up
More explosions, more chaos, and definitely more blowing stuff up
 
Build HTML5 VR Apps using Intel® XDK
Build HTML5 VR Apps using Intel® XDKBuild HTML5 VR Apps using Intel® XDK
Build HTML5 VR Apps using Intel® XDK
 
Gary Brown (Movidius, Intel): Deep Learning in AR: the 3 Year Horizon
Gary Brown (Movidius, Intel): Deep Learning in AR: the 3 Year HorizonGary Brown (Movidius, Intel): Deep Learning in AR: the 3 Year Horizon
Gary Brown (Movidius, Intel): Deep Learning in AR: the 3 Year Horizon
 
Intel XDK - Philly JS
Intel XDK - Philly JSIntel XDK - Philly JS
Intel XDK - Philly JS
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPA
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
 
DreamWorks Animation
DreamWorks AnimationDreamWorks Animation
DreamWorks Animation
 
Introduction ciot workshop premeetup
Introduction ciot workshop premeetupIntroduction ciot workshop premeetup
Introduction ciot workshop premeetup
 
A Path to NFV/SDN - Intel. Michael Brennan, INTEL
A Path to NFV/SDN - Intel. Michael Brennan, INTELA Path to NFV/SDN - Intel. Michael Brennan, INTEL
A Path to NFV/SDN - Intel. Michael Brennan, INTEL
 
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel ArchitectureDPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
 
Intel® Graphics Performance Analyzers
Intel® Graphics Performance AnalyzersIntel® Graphics Performance Analyzers
Intel® Graphics Performance Analyzers
 

Último

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

OIT to Volumetric Shadow Mapping, 101 Uses for Raster-Ordered Views using DirectX 12

  • 1. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Leigh Davies March 05, 2015 OIT to Volumetric Shadow Mapping, 101 Uses for Raster Ordered Views using DirectX 12
  • 2. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Introduction  Raster Ordered Views  Applications + R&D Topics  Performance Tips & Tricks  Summary  Q&A Agenda
  • 3. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Copyright © 2015 Intel Corporation. All rights reserved.  *Other names and brands may be claimed as the property of others.  INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.  A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.  Intel may make changes to specifications and product descriptions at any time, without notice.  All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.  Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.  Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.  Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.  Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.Intel.com/performance  Iris™ graphics is available on select systems. Consult your system manufacturer.  Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries. Legal
  • 4. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Why Raster Ordered Views?  The DX API specifies “in order” processing rules for the raster pipeline  If two triangles sent to the GPU touch the same XY screen location, the GPU hardware guarantees that triangle “A” will blend its color result before “B” blends it.  Hardware in the ROP is responsible for enforcing this ordering requirement.  Pipeline back-end* still not programmable, color, z & stencil operations from a fixed menu..  DX11 added Unordered Access Views, leverage power of shaders
  • 5. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. DX Pipeline ThreadDispatch ….. EU EU EU EU ….. EU EU EU EU ….. EU EU EU EU Vertex Shader Hull Shader Domain Shader Geometry Shader Pixel Shader Input Assembler Rasterizer Output Merger Triangle Data Render Targets UAV Targets EUEU EU EU EUEU EU EU EUEU EU EU
  • 6. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  UAV’s enable arbitrary R/W memory ops from a pixel shader but no ordering of data input… UAV Limitations shade fragment from 1st triangle r/m/w e.g. programmable blending shade fragment from 2nd triangle r/m/w Timeline
  • 7. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  UAV’s enable arbitrary R/W memory ops from a pixel shader but no ordering of data input…  Fragments mapping to same pixel can cause data races UAV Limitations shade fragment from 1st triangle r/m/w e.g. programmable blending shade fragment from 2nd triangle r/m/w Timeline data race
  • 8. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  UAV’s enable arbitrary R/W memory ops from a pixel shader but no ordering of data input…  Fragments mapping to same pixel can cause data races UAV Limitations Timeline shade fragment from 1st triangle r/m/w shade fragment from 2nd triangle r/m/w data is Safe
  • 9. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  UAV’s enable arbitrary R/W memory ops from a pixel shader but no ordering of data input…  Fragments mapping to same pixel can cause data races  Fragments can be shaded out-of-order, can’t support order-dependent algorithms UAV Limitations Timeline shade fragment from 1st triangle r/m/w shade fragment from 2nd triangle r/m/w data is Safe order is not deterministic
  • 10. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  The more parallel the GPU the greater race conditions Out-of-Order Artefacts
  • 11. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  We need the hardware to detect dependencies among fragments writing to the same X,Y screen coordinate and.. • Avoid data races • Guarantee primitive order for R/M/W operations Programmable Back-End shade fragment from 1st triangle r/m/w shade fragment from 2nd triangle r/m/wWaitshade fragment from 2nd triangle r/m/w data is Safe Timeline
  • 12. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  That’s easy what happens if fragment 2 runs first? Programmable Back-End shade fragment from 1st triangle r/m/w shade fragment from 2nd triangle r/m/w Timeline
  • 13. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  That’s easy what happens if fragment 2 runs first?  We just get a longer wait  This isn't “just” a critical section Programmable Back-End shade fragment from 1st triangle r/m/w r/m/w Well defined order Waitshade fragment from 2nd triangle r/m/w data is Safe Timeline
  • 14. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Little to no performance impact in most cases – Fragment 1 completes before Fragment 2 hits barrier – Fragments don’t touch same X,Y screen Coordinate Programmable Back-End shade fragment from 1st triangle r/m/w shade fragment from 2nd triangle r/m/w data is Safe Timeline
  • 15. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Haven’t we seen this before? Johan Andersson, DICE (2013) “We have a pretty long list of all the stuff we typically go through and then talk to them with, but one very concrete thing we’d like to see, and actually Intel has already done this on their hardware, they call it PixelSync, which is their method of synchronizing the graphics pipeline in a very efficient way on a per-pixel basis. You can do a lot of cool techniques with it such as order-independent transparency for hair rendering or for foliage rendering. And they can do programmable blending where you want to have full control over the blending instead of using the fixed-function units in the GPU. There’s a lot of cool components that can be enabled by such a programmability primitive there and I would like to see AMD and NVidia implement something similar to that as well. It's also very power-efficient and efficient overall on Intel's hardware, so I guess the challenge for NVidia and AMD would be if they were able to do that efficiently because they have a lot of the different architectures there. So that's one thing.” Source: http://www.tomshardware.com/reviews/johan-andersson-battlefield-4-interview,3688-2.html
  • 16. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Almost perfect match for Intel Pixel Synchronization extension introduced on 4th Gen Core processors in 2013.  Lots of existing samples and even games using it! Raster Ordered Views == Pixel Sync
  • 17. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  HLSL only construct, modifies access behavior to UAV’s  New (HLSL) objects, only available to the pixel shader: – RasterizerOrderedBuffer – RasterizerOrderedByteAddressBuffer – RasterizerOrderedStructuredBuffer – RasterizerOrderedTexture1D – RasterizerOrderedTexture1DArray – RasterizerOrderedTexture2D – RasterizerOrderedTexture2DArray – RasterizerOrderedTexture3D  Used in the same manner as other UAV objects Rasterizer ordered views API
  • 18. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. ROV’s vs Pixel Sync RasterOrderedTexture2D<t> gRGBEBuffer; void PS_RGBE_Blend (...) { // Code that doesn’t reference UAV’s // Access UAV uint rgbe = gRGBEBuffer[xy]; // Manipulate UAV } #include "IntelExtensions.hlsl“ RWTexture2D <t> gRGBEBuffer : register( u1 ); void PS_RGBE_Blend (...) { IntelExt_Init(); // Code that doesn’t reference UAV’s IntelExt_BeginPixelOrderingOnUAV(1); // Access UAV uint rgbe = gRGBEBuffer[xy]; // Manipulate UAV } Pixel Sync Raster Ordered View
  • 19. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Writing to a UAV disables both Early-Z and Hi-Z  Declaring [earlydepthstencil] in front of a pixel shader guarantees the depth test happens early even if the shader writes to a UAV  [earlydepthstencil] means Depth is updated even if you discard in the pixel shader, unless completely disabled UAV Refresher
  • 20. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Applications
  • 21. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  New blending operators, non-linear color spaces, exotic encodings, etc. • e.g. RGBE, LogLuv, etc. Programmable Blending Applications HDR in to custom R10G10B10A2 in GRID AutosportIntel RGBE blending sample
  • 22. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Blending on a RGBE color buffer RasterOrderedTexture2D<t> gRGBEBuffer; void PS_RGBE_Blend (...) { float3 rgb = ... float alpha = ... uint rgbe = gRGBEBuffer[xy]; float3 dstRGB = RGBE_to_RGB(rgbe); dstRGB = alpha * rgb + (1 – alpha) * dstRGB; gRGBEBuffer[xy] = RGB_to_RGBE(dstRGB); } Compute fragment color & alpha Read RGBE buffer & convert to RGB Conversion to RGBE & buffer write Ordered UAV structure Wait on access Alpha-blending in RGB space
  • 23. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  To apply a bullet hole or an axe mark…  simply – Render your G-Buffer – Take a normal map of a decal – Blend it with the G-Buffer – Result will be a correctly mapped bullet hole Blending for deferred shaders
  • 24. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Generalization of the Z-Buffer*  Render N-layers of the image in a single pass  Under DX11 requires Per pixel linked lists and append buffers – Unbounded memory requirement – Memory bandwidth heavy  Countless applications: – Depth-peeling – Constructive solid geometry – Depth-of-field & motion blur – Volume rendering – <insert your idea here > K-Buffer
  • 25. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. K-Buffer: Single-Pass Depth Peeling RasterizerOrderedBuffer gBuffer; void PSMain(...) { Fragment frag = {...}; Fragment fragArray[N] = gBuffer[xy]; for (int i = 0; i < N; i++) { if (frag.Z < fragArray[i].Z) { Fragment temp = frag; frag = fragArray[i]; fragArray[i] = temp; } } gBuffer[xy] = fragArray; } Compute fragment color, z, etc.. Enable pixel synchronization Read N fragments from K-buffer Write N fragments to K-buffer Bubble sort (1 pass)
  • 26. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Correct compositing, rendering foliage & fences with zero aliasing etc..  DX11-style order-independent transparency has significant drawbacks – Requires unbounded memory (per-pixel lists) – Not so great performance due to global atomics, fragments sorting, etc. Order-Independent Transparency Correct render order Different correct render order
  • 27. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Raster Ordered View enable a new approach – Single geometry pass and fixed memory requirements – Stable and predictable performance – Scalable: easily trade-off image quality for performance/memory Order-Independent Transparency GRID Autosport 2014 OIT on 15watt 4th Gen Core™ Processor
  • 28. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Step 1: Improve alpha-blending – Use depth to decide whether to composite incoming fragment over or under – Much better than vanilla alpha-blending but in some cases not quite correct  Step 3: Use more layers to trade-off image quality for perf/memory – Sort N Layers, blend furthest fragments Step 3:Or use an Adaptive Transparency Function Multi-Layer Alpha Blending
  • 29. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Adaptive Visibility Function  Store Visibility Function as a sorted fixed-size array of nodes, in UAV surface  Each red node corresponds to a pair of values for depth and transmittance:  To compress visibility we remove the node that generates the smallest area variation 0.7 1 smallest area variation ),( td struct TransparencyData { float depth[ MAX_LAYERS ]; float4 colour[ MAX_LAYERS ]; };
  • 30. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Adaptive Visibility Function  Store Visibility Function as a sorted fixed-size array of nodes, in UAV surface  Each red node corresponds to a pair of values for depth and transmittance:  To compress visibility we remove the node that generates the smallest area variation  Used for early Order Independent Transparency samples struct TransparencyData { float depth[ MAX_LAYERS ]; float4 colour[ MAX_LAYERS ]; }; 0.7 1 ),( td
  • 31. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Combining OIT with sorted Transparency  Only add transparent pixels into the OIT structure if there is already OIT data in the buffer – Store mask of pixels in R32 target to flag pixels that contain OIT data – Use the mask to identify pixels where traditional transparency needs to be merged with OIT – All other pixels are rendered with normal alpha blending, because there is no overlap
  • 32. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Hair Rendering Alpha Blending Adaptive Transparency
  • 33. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Like Deep Shadow Maps but designed for real-time rendering  Encode per-pixel visibility function from light point-of-view  Lossy compression of the visibility data  Raster Ordered Views enables first fixed memory implementation of AVSM Adaptive Volumetric Shadow Maps
  • 34. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. AVSM Quality Comparison  AVSM with 8 to 12 nodes closely matches ground truth results  4 Nodes suitable for many situations, used in GRID 2.  Fourier Opacity Maps suffer from artefacts generated by high-frequency light blockers like hair and sub-optimal depth bounds  Hair model courtesy of Cem Yuksel.
  • 35. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Accelerating AVSM  Overdraw of particles means even small increases in per-pixel costs are noticeable 0 5000 10000 15000 GPU ms Pixel Domain Pixel Shading Tessellated Per Vertex  Shadow map generation can use simplified particles – Less particles, larger area, higher opacity  Per Pixel lighting of particles took >=10ms… – Tessellated vertex lighting actually looked better! – Tessellation is 2-3x faster than per-pixel
  • 36. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Volume-Aware Blending  Billboard sprites representing clouds volumes  Composited using Alpha blending  Popping Artefacts as billboard sprites change order
  • 37. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Volume-Aware Blending  Billboard clouds using volume aware blending  No popping  Smooth blending of intersecting volumes
  • 38. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Blending volumetric particles – If particles do not overlap, blending is trivial – How can we correctly blend overlapping particles? Volume-Aware Blending BackIsecFront 𝐶 𝐹𝑖𝑛𝑎𝑙 = 𝑇𝐹𝑖𝑛𝑎𝑙 = ∙ 𝑇𝐵𝑎𝑐𝑘∙ 𝑇𝐼𝑠𝑒𝑐𝑇𝐹𝑟𝑜𝑛𝑡 𝐶 𝐹𝑟𝑜𝑛𝑡 +𝐶𝐼𝑠𝑒𝑐 ∙ 𝑇𝐹𝑟𝑜𝑛𝑡 +𝐶 𝐵𝑎𝑐𝑘 ∙ 𝑇𝐹𝑟𝑜𝑛𝑡∙ 𝑇𝐼𝑠𝑒𝑐 1 − 𝑇𝐹𝑖𝑛𝑎𝑙 – Final color and transparency: – Division by 1 − 𝑇𝐹𝑖𝑛𝑎𝑙 because we do not want alpha pre-multiplied color
  • 39. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Blending volumetric particles - Implementation – Color, density and min/max extent of the current particle are stored in the UAV buffer – Each new particle is tested against the currently stored  If new particle is in front of the current, the current is blended into the back buffer and replaced with the new one  If new particle overlaps with the current, they are blended and stored  Particles need to be sorted Volume-Aware Blending Alpha Blended Volume Blended
  • 40. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. R&D Topics
  • 41. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Use Raster Ordered Views to improve or replace multi- sampling anti-aliasing – Higher image quality vs. lower memory requirements vs. better performance  Z³ anti-aliasing* (1999) – Originally developed as HW based high-quality anti-aliasing algorithm  Store N fragment per pixel (z, ∂z/∂x, ∂z/∂y, color, coverage)  Merge fragments (lossy) Advanced Anti-Aliasing
  • 42. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Advanced Anti-Aliasing  Analytic methods – Render scene using conservative rasterization – Build per-pixel spatial subdivision structure using primitive edges – Compute fragment weights from fraction of pixel area covered and resolve – Think targeted OIT conservative rasterization renders any pixels that are even partially covered are rasterized
  • 43. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Build complex per-voxel data structures on the GPU at voxelization time – e.g. direction-dependent representations (anisotropic voxels, etc.)  Voxelization via 2D rasterization projects triangles to XY, YZ or XZ plane Voxelization
  • 44. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Use Raster Ordered Views to build 3D data structures at voxelization time, think Depth Peeling – Problem: fragment dependencies cannot be tracked over multiple 2D planes in a single render call.  Easy fix: voxelize onto one 2D plane at time – 3 draw calls per mesh, one per 2D plane (i.e. reject triangles that map to other planes)  Number of generated voxels doesn’t change & more flexible than using global atomics  Can be further optimized by Conservative Rasterization Voxelization
  • 45. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Don’t clear large buffers. Clear a small buffer and use it as a clear mask. Performance Tips & Tricks bool clear = gClearMask[xy]; if (clear) { gClearMask[xy] = false; myLargeStruct = ... } else { myLargeStruct = gLargeDataStruct[xy]; ... } gLargeDataStruct[xy] = myStruct; Clear this! Not this! Read clear mask If pixel is not in clear state load large struct and update it Mark pixel as “used” and initialize large struct Write large struct data back to memory
  • 46. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Balance data size vs Packing Small(er) data structures can improve performance Use more instructions to pack/unpack data
  • 47. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Address 1D structured buffers as tiled to better data exploit locality  e.g. 1x2 or 2x2 (2D textures), 2x2x2 (voxels), etc..  Saved 1ms in Grid 2 by Codemasters in OIT. Tile Data uint AOITAddrGen(uint2 addr2D, uint surfaceWidth) { #ifdef AOIT_TILED_ADDRESSING surfaceWidth = surfaceWidth >> 1U; uint2 tileAddr2D = addr2D >> 1U; uint tileAddr1D = (tileAddr2D[0] + surfaceWidth * tileAddr2D[1]) << 2U; uint2 pixelAddr2D = addr2D & 0x1U; uint pixelAddr1D = (pixelAddr2D[1] << 1U) + pixelAddr2D[0]; return tileAddr1D | pixelAddr1D; #else return addr2D[0] + surfaceWidth * addr2D[1]; #endif } Linear address
  • 48. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Prefer inserting the synchronized UAV read in the second half of the shader  Increase likelihood of concurrently shading fragments that map to the same pixel Synch Late RasterOrderedTexture2D<t> gRGBEBuffer; void PS_RGBE_Blend (...) { uint rgbe = gRGBEBuffer[xy]; float3 rgb = ... Complex stuff…. float alpha = ... Potentially can run concurrent with other shaders uint rgbe = gRGBEBuffer[xy]; } r/m/wr/m/w
  • 49. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Ok not 100% possible but clever organisation of the triangles in can limit sync points – Randomize instancing data to avoid consecutive overlapping models.  Small render targets aren’t always your friend, can increase hits to the same pixel. Avoid Synch Altogether!
  • 50. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  Raster Ordered Views are a new tool that injects new life in the 3D pipeline  Very simple to use, high performance  Build complex 3D data structures in bounded memory  Solve many long standing problems – OIT – Depth Peeling – Volume Rendering and Blending Summary
  • 51. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Questions?
  • 52. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.  https://software.intel.com/sites/default/files/salvi_avsm_egs r2010.pdf  http://advances.realtimerendering.com/s2013/2013-07-23- SIGGRAPH-PixelSync.pdf  https://software.intel.com/en-us/gamedev/code-samples  http://www.gdcvault.com/play/1020221/Rendering-in- Codemasters-GRID2-and  https://software.intel.com/en-us/blogs/2014/03/31/cloud- rendering-sample References
  • 53. C o p y r i g h t © 2 0 1 5 , I n t e l C o r p o r a t i o n . A l l r i g h t s r e s e r v e d . *O t h e r n a me s a n d b r a n d s ma y b e c l a i me d a s t h e p r o p e r t y o f o t h e r s .