Uncharted 2: HDR Lighting John Hable Naughty Dog
Agenda
Gamma/Linear-Space Lighting
Filmic Tonemapping
SSAO
Rendering Architecture/How it all fits together
Skeletons in the Closet
Ucap Project
I.e. Tiger Woods's Face
Also used to work for EA Los Angeles
LMNO (Spielberg Project)
Not PQRS (Boom Blox)
Now at Naughty Dog
Uncharted 2
Gamma
Never heard about it in college
George Borshukov taught it to me back at EA
Matrix Sequels
Issue is getting more traction now
Eugene d’Eon’s chapter in GPU Gems 3
Suppose we show alternating light/dark lines.
If we squint, we get half the linear intensity.
Real image of a computer screen shot from my camera.
That’s the box on the top. [Figure label: 128]
Demo
WTF? [Figure labels: 128, 187]
Color Ramp [Figure labels: 0, 127, 255, 187]
Gamma Lines [Figure labels: 255, 187, 127, 0]
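The 128-versus-187 surprise can be checked with a little arithmetic. This is a minimal sketch, assuming a display that follows either a pure 2.2 power law or the piecewise sRGB curve; the function names are illustrative and not from the talk.

```python
def encode_gamma(linear, gamma=2.2):
    """Linear intensity in [0, 1] -> 8-bit framebuffer code (pure power law)."""
    return round(255 * linear ** (1 / gamma))


def encode_srgb(linear):
    """Linear intensity in [0, 1] -> 8-bit code using the piecewise sRGB curve."""
    if linear <= 0.0031308:
        value = 12.92 * linear
    else:
        value = 1.055 * linear ** (1 / 2.4) - 0.055
    return round(255 * value)


# Alternating full-black and full-white lines average to half the linear light.
half_light = (0.0 + 1.0) / 2

# Both estimates land in the high 180s, close to the slide's 187 and far from 128.
print(encode_gamma(half_light), encode_srgb(half_light))
```

The two curves disagree by a code value or two, which is why the sketch reports both rather than claiming an exact match with the slide.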
What is Gamma
Our monitors are the red one
How bad is it?
So if you output the left image to the framebuffer, it will actually look like the one on the right. [Figure label: 2.2]
Let’s build a Camera… [Diagram labels: Actual Light, Hard Drive, Monitor Output; curves 1.0 and 2.2]
Let’s build a Camera… [Diagram labels: Actual Light, Hard Drive, Monitor Output; curves .45 and 2.2]
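A small sketch of why the second camera works, assuming the diagram means the camera stores light raised to 0.45 and the monitor raises the stored value to 2.2. The function names are hypothetical.

```python
CAMERA_GAMMA = 0.45   # encoding exponent applied before writing to the hard drive
MONITOR_GAMMA = 2.2   # exponent the display applies to whatever it is given


def camera_encode(actual_light):
    """Value written to the hard drive for a given linear light level."""
    return actual_light ** CAMERA_GAMMA


def monitor_output(stored_value):
    """Light the monitor emits for a stored value."""
    return stored_value ** MONITOR_GAMMA


def round_trip(actual_light):
    """Actual light -> hard drive -> monitor output."""
    return monitor_output(camera_encode(actual_light))


# The exponents multiply to 0.99, so the output is nearly the input.
print(round_trip(0.5))
# The stored value itself is much brighter than the light it represents.
print(camera_encode(0.5))
```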
How to fix this?
This is stored on your hard drive.
You never see this image, but it’s on your hard drive. [Figure label: 2.2]
What is Gamma
How bad is it?
What if displays were linear?
On a scale from 0-255
0 would equal 0
1 would equal our current value of 20
2 would equal our current value of 28
3 would equal our current value of 33
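The 20, 28, 33 figures can be reproduced by asking which current (gamma 2.2) code value emits the same light as each step of a hypothetical linear display. This is a sketch that assumes a pure 2.2 power law and truncation to an integer, which is what the slide's numbers appear to use.

```python
def current_code_for_linear_step(step, gamma=2.2):
    """Current 8-bit code whose light output matches `step` on a linear display."""
    linear = step / 255
    return int(255 * linear ** (1 / gamma))  # truncate rather than round


print([current_code_for_linear_step(s) for s in (0, 1, 2, 3)])  # [0, 20, 28, 33]
```

The first step of a linear display would jump straight past twenty of today's darkest code values, which is the banding problem the slide is pointing at.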
Darkest Values:
Note: we’re talking about “correct”, not “good”
3D Graphics
color = pow( tex2D( Sampler, Uv ), 2.2 )
Do your lighting calculations
Account for gamma on color output
finalcolor = pow( color, 1/2.2 )
3D Graphics
3D Graphics
Spec = CalSpec();
Diff = pow( tex2D( Sampler, UV ), 2.2 );
Color = Diff * max( 0, dot( N, L ) ) + Spec;
return pow( Color, 1/2.2 );
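The same read, light, write sequence as the shader above, sketched for a single color channel. The clamp to 1.0, the default specular of zero, and the name `shade_gamma_space` for the uncorrected path are assumptions added for illustration.

```python
def to_linear(value, gamma=2.2):
    """Gamma-space texel -> linear light (the pow(x, 2.2) on read)."""
    return value ** gamma


def to_gamma(value, gamma=2.2):
    """Linear light -> framebuffer value (the pow(x, 1/2.2) on write)."""
    return value ** (1 / gamma)


def shade_linear(albedo_texel, n_dot_l, spec=0.0):
    """Lighting done in linear space, then encoded for display."""
    diff = to_linear(albedo_texel)
    color = diff * max(0.0, n_dot_l) + spec
    return to_gamma(min(color, 1.0))


def shade_gamma_space(albedo_texel, n_dot_l, spec=0.0):
    """The uncorrected path: lighting math applied directly to gamma values."""
    return min(albedo_texel * max(0.0, n_dot_l) + spec, 1.0)


# A white surface at N.L = 0.5: the corrected result is noticeably brighter.
print(shade_linear(1.0, 0.5), shade_gamma_space(1.0, 0.5))
```

The gap between the two results at mid falloff is what shows up on screen as a harsher terminator in the uncorrected render.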
3D Graphics
Hardware does sampling for free
For Texture read:
D3DSAMP_SRGBTEXTURE
For RenderTarget write:
D3DRS_SRGBWRITEENABLE
3D Graphics
Notice the harsh line.
FAQs
Gamma-Space
Use sRGB hardware or pow(2.2) on read
128 ~= 0.2
Linear-Space
Don’t use sRGB or pow(2.2) on read
128 = 0.5
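The two readings of a stored 128 can be compared directly. This is a sketch assuming a pure 2.2 power law stands in for the sRGB hardware path; `sample_texel` and its flag are hypothetical names.

```python
def sample_texel(code, srgb_read):
    """Return the value a shader sees for an 8-bit texel.

    srgb_read=True  models a gamma-space texture (sRGB hardware or pow 2.2 on read).
    srgb_read=False models a linear-space texture read as-is.
    """
    value = code / 255
    return value ** 2.2 if srgb_read else value


print(sample_texel(128, srgb_read=True))   # roughly 0.2
print(sample_texel(128, srgb_read=False))  # roughly 0.5
```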
Definitely Gamma
Normal Map
Definitely Linear
Uncharted 2 had them as Linear
Artists have trouble tweaking them to look right
Probably should be in Gamma
Technically, it’s a mathematical value like a normal map.
But artists tweak them a lot and bake extra lighting into them.
Uncharted 2 had them as Linear
Probably should be Gamma
sRGB: PC and PS3 gamma
Xenon PWL: Piecewise linear gamma
Way too bright at the low end.
HDR the Bungie Way, Chris Tchou
Post Processing in The Orange Box, Alex Vlachos
Output curve is extra contrasty
Henry LaBounta was going to talk about it.
Try hooking up your Xenon to a waveform monitor and display a test pattern.  Prepare to be mortified.
Both factors counteract each other
Part 2: Filmic Tonemapping
Filmic Tonemapping
Full Auto settings.
Outside blown out.
Cello case looks empty.
My eye somehow adjusts...
Note: 1 F-stop is a factor of two
So 4 F-stops is a 2^4 = 16x difference in intensity
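The stop arithmetic in both directions, as a short sketch; the helper names are illustrative.

```python
import math


def stops_to_ratio(stops):
    """Intensity ratio covered by a number of F-stops (each stop doubles)."""
    return 2.0 ** stops


def ratio_to_stops(ratio):
    """Number of F-stops separating two intensities with the given ratio."""
    return math.log2(ratio)


print(stops_to_ratio(4))    # 16.0
print(ratio_to_stops(16))   # 4.0
```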
Good News: HDR Image
Bad News: Now what?
Re-tweaking exposure won't help
Photomatix
This one is local
The one we’ll use is global
Simulate the Iris
Dynamic Range in different shots
Tunnel vs. Daylight
Simulate the Retina
Different Range within a shot
Inside looking outside
Within a Shot – HDR Tonemapping
Different Shots – Choosing the correct exposure?
Video Games
Within a Shot – HDR Tonemapping
Different Shots – HDR Tonemapping?
Within a Shot – HDR Tonemapping
Different Shots
Automatic Exposure Adjustment
Dynamic Iris
Creative Director, Microsoft Game Studios
Used to be Art Director on LMNO at EA
Simplest version is:
F(x) = x/(x+1)
Yes, I’m oversimplifying for time.
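The simplest Reinhard form above, evaluated at a few inputs to show the soft roll-off toward 1.0. This is a sketch of only the F(x) = x/(x+1) version quoted on the slide, not of the fuller operator.

```python
def reinhard(x):
    """Simplest Reinhard tonemap: maps [0, inf) into [0, 1)."""
    return x / (x + 1.0)


for x in (0.0, 0.5, 1.0, 4.0, 16.0):
    print(x, round(reinhard(x), 3))
```

Each doubling of the input buys less and less output, which is the soft highlight behaviour; the cost is that dark and mid values are compressed too.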
The crisper, more saturated blacks of improper gamma.
The nice soft highlights of Reinhard
And get the input colors to actually match like pure linear.
CG Supervisor at Digital Domain
Used to be a CG Supervisor on LMNO at EA
Film [Figure labels: Shoulder, Toe]
Film
Film vs. Digital purists.
Who would’ve thought: Kodak knows how to make good film.
Filmic Tonemapping [Figure panels: Reinhard, Linear, Filmic]
Filmic Tonemapping
Filmic > Linear > Reinhard
In terms of “Soft Highlights” vs. “Clamped Highlights”
Filmic = Reinhard > Linear
What happens if we go down?
Notice the crisper blacks in the filmic version.
Filmic Tonemapping [Figure labels: 254, 248, 224, 240, 252]
More Magic
Three texture lookups
Jim Hejl to the rescue
Principal Member of Technical Staff, GPU Group, Office of the CTO, AMD Research
Replaces the entire Lin/Log and Texture LUT
Includes the pow(x,1/2.2)
x = max(0, LinearColor-0.004);
GammaColor = (x*(6.2*x+0.5))/(x*(6.2*x+1.7)+0.06);
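The two-line approximation above, transcribed for a single channel so its shape can be inspected. The function name is illustrative; the constants are the ones on the slide.

```python
def hejl_filmic(linear_color):
    """Filmic curve approximation; the output already includes the 1/2.2 gamma."""
    x = max(0.0, linear_color - 0.004)
    return (x * (6.2 * x + 0.5)) / (x * (6.2 * x + 1.7) + 0.06)


# Sample the curve from deep shadow to a bright highlight.
for value in (0.0, 0.01, 0.1, 1.0, 10.0):
    print(value, round(hejl_filmic(value), 4))
```

Because the gamma is baked in, the result goes straight to the framebuffer without a further pow(x, 1/2.2) or an sRGB write.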
Filmic Tonemapping
Toe arguably too strong
Most monitors are too contrasty, strong toe makes it worse
May want less shoulder
The more range, the more shoulder
If your scene lacks range, you want to tone down the shoulder a bit
Filmic Tonemapping
Shoulder Strength = 0.22
Linear Strength = 0.30
Linear Angle = 0.10
Toe Strength = 0.20
Toe Numerator = 0.01
Toe Denominator = 0.30
Linear White Point Value = 11.2
These numbers DO NOT have the pow(x,1/2.2) baked in
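The parameter list above needs a curve to plug into, and the formula itself did not survive in this text. The sketch below assumes the rational form commonly published for this operator, with A through F and W mapped to the seven parameters in the order listed; treat that mapping and the form as assumptions rather than as the slide's content.

```python
A = 0.22   # Shoulder Strength
B = 0.30   # Linear Strength
C = 0.10   # Linear Angle
D = 0.20   # Toe Strength
E = 0.01   # Toe Numerator
F = 0.30   # Toe Denominator
W = 11.2   # Linear White Point Value


def curve(x):
    """Assumed rational filmic curve; passes through zero at x = 0."""
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F


def filmic(linear_color):
    """Tonemapped linear value, normalized so the white point maps to 1.0."""
    return curve(linear_color) / curve(W)


def to_display(linear_color, gamma=2.2):
    """Apply the gamma separately, since these numbers do not bake it in."""
    return max(0.0, filmic(linear_color)) ** (1 / gamma)


for value in (0.0, 0.18, 1.0, 4.0, W):
    print(value, round(filmic(value), 4), round(to_display(value), 4))
```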
Conclusions
IMO: The most important post effect
Changes your life completely
Once you have it, you can't live without it
If you implement it, you have to redo all your lighting
You can't just switch it on if your lighting is tweaked for no dynamic range.
Part 3: SSAO
SSAO Time
Time for stage 3.
Let’s talk about why you need AO.
But if you are already in shadow, you need something else...
It's the AO that grounds the car in those shots.
So what if we took the AO out?
Trick taught to me by Habib (the guy with the awesome condo earlier)
Sky is Hemisphere Light (ish)
Sky is Blue (ish)
Approximate Directional Shadow with Shadow Mapping
Approximate Hemisphere Shadow with AO
How to use AO?
Sunlight
Q) Why not have it affect sunlight?
A) It does the exact opposite of what you want it to do.
SSAO Example
SSAO Darkens Sunlight – BAD
Reduces perceived contrast
Makes lighting look flatter
My Opinion: SSAO should NOT affect Direct Light
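One way to express the rule is to let the shadow map gate the sun term and let AO gate only the sky term. This is a single-channel sketch with hypothetical parameter names; the `ao_on_direct` flag exists only to show the behaviour the slide argues against.

```python
def shade(albedo, sun, sun_shadow, sky_ambient, ao, ao_on_direct=False):
    """Combine direct and ambient light for one channel.

    sun_shadow: 0..1 visibility from the shadow map (directional shadow).
    ao:         0..1 ambient occlusion term (hemisphere shadow).
    """
    direct = sun * sun_shadow
    if ao_on_direct:
        direct *= ao  # the discouraged variant: AO darkens sunlight too
    ambient = sky_ambient * ao
    return albedo * (direct + ambient)


# A sunlit point in a crease (ao = 0.5): recommended vs discouraged.
print(shade(1.0, 1.0, 1.0, 0.2, 0.5))
print(shade(1.0, 1.0, 1.0, 0.2, 0.5, ao_on_direct=True))
```

With AO kept out of the direct term, sunlit areas stay bright and only the fill light is occluded, so the contrast between sun and shade is preserved.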
No normals
No special filtering (i.e. Bilateral)
Processed in 64x64 blocks
Adjacent Depth tiles included as well
Dilate Horizontal
Dilate Vertical
Apply Low Pass Filter
SSAO Passes 1) Base SSAO
SSAO Passes 2) Dilate Horizontal
SSAO Passes 3) Dilate Vertical
SSAO Passes 4) Low Pass Filter (i.e. blur the $&%@ out of it)
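The four passes can be sketched on a small grid of AO values. Several things here are assumptions: the text does not say whether the dilate grows bright or dark regions (this sketch takes the larger value by default and exposes `op` so `min` can be swapped in), the window radius is arbitrary, and a box blur stands in for the unspecified low pass filter.

```python
def dilate_rows(img, radius=1, op=max):
    """Horizontal dilate: each pixel takes op() over its row neighbourhood."""
    width = len(img[0])
    return [
        [op(row[max(0, x - radius):min(width, x + radius + 1)]) for x in range(width)]
        for row in img
    ]


def transpose(img):
    return [list(col) for col in zip(*img)]


def dilate_cols(img, radius=1, op=max):
    """Vertical dilate, implemented as a horizontal dilate on the transpose."""
    return transpose(dilate_rows(transpose(img), radius, op))


def box_blur(img, radius=1):
    """Low pass filter: average over a (2 * radius + 1) square, clipped at edges."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def ssao_post(base_ao):
    """Passes 2 to 4 applied to the output of pass 1 (the base SSAO)."""
    return box_blur(dilate_cols(dilate_rows(base_ao)))
```

Splitting the dilate into a horizontal and a vertical pass keeps each pass one-dimensional, which is the usual reason for separable filters.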
SSAO Passes
Uses half-res depth buffer
Calculate AO based on nearby depth samples
Our solution is much hackier.
Half Volume = Fully Unoccluded
Artifacts
Crossing Pattern
Benefits outweigh drawbacks
Have option to turn it off per surface
Most snow fades at distance
Some objects apply sqrt() to brighten it
Some objects just have it off
Deepens our Blacks
Artifacts are not too bad
Could use better filtering for halos
Runs on SPUs, and doesn’t need normal buffer
Architecture
Here is the base overview.
Skip some things
Calculate lighting in Tiles
Frame 1
This was news to me when I started here.
Say you have 5 loads to do
1 washer, 1 dryer
Load 1 in Washer
Load 1 in Dryer
Load 2 in Washer
Load 2 in Dryer
Load 3 in Washer
Load 3 in Dryer
Load 4 in Washer
Load 4 in Dryer
Load 5 in Washer
Load 5 in Dryer
Load 1 in Washer
Load 2 in Washer, Load 1 in Dryer
Load 3 in Washer, Load 2 in Dryer
Load 4 in Washer, Load 3 in Dryer
Load 5 in Washer, Load 4 in Dryer
Load 5 in Dryer
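The laundry comparison can be counted out. This sketch assumes each machine takes one time step per load; the function names are illustrative.

```python
def sequential_steps(loads, stages=2):
    """Wash then dry each load before starting the next one."""
    return loads * stages


def pipelined_steps(loads, stages=2):
    """Start washing the next load while the previous one dries."""
    return loads + stages - 1


def pipelined_schedule(loads):
    """List what the washer and dryer are doing at each time step."""
    steps = []
    for t in range(loads + 1):
        slots = []
        if t < loads:
            slots.append(f"Load {t + 1} in Washer")
        if t >= 1:
            slots.append(f"Load {t} in Dryer")
        steps.append(", ".join(slots))
    return steps


print(sequential_steps(5), pipelined_steps(5))  # 10 6
for line in pipelined_schedule(5):
    print(line)
```

The first and last steps leave one machine idle; those are the prologue and epilogue a software-pipelined loop also has.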
Called “Software Pipelining”
It’s how we optimize SPU code by hand
Pal Engstad (our Lead Graphics Programmer) will be putting a doc online explaining the full process.
Should be online: look for the post on the blog
Direct links:
www.naughtydog.com/docs/gdc2010/intro-spu-optimizations-part-1.pdf
www.naughtydog.com/docs/gdc2010/intro-spu-optimizations-part-2.pdf
SPU Assembly 1/2
EVEN | ODD
loop:
{nop} | {o6} lqx currAo, pvColor, colorOffset
{nop} | {o4} shufb prevAo0, currAo, currAo, s_AAAA
{nop} | {o4} shufb ao_n30, prevAo0, currAo, s_BCDa
{nop} | {o4} shufb ao_n20, prevAo0, currAo, s_CDab
{nop} | {o4} shufb ao_n10, prevAo0, currAo, s_Dabc
{e2} selb ao_n4, ao_n4, prevAo0, endLineMask | {lnop}
{e2} selb ao_n3, ao_n3, ao_n30, endLineMask | {lnop}
{e2} selb ao_n2, ao_n2, ao_n20, endLineMask | {lnop}
{e2} selb ao_n1, ao_n1, ao_n10, endLineMask | {lnop}
{nop} | {o6} lqx nextAo, pvNextColor, colorOffset
{nop} | {o4} shufb ao_p1, currAo, nextAo, s_BCDa
{nop} | {o4} shufb ao_p2, currAo, nextAo, s_CDab
{nop} | {o4} shufb ao_p3, currAo, nextAo, s_Dabc
{e6} fa ao_n4, ao_n4, nextAo | {lnop}
{e6} fa ao_n3, ao_n3, ao_p3 | {lnop}
{e6} fa ao_n2, ao_n2, ao_p2 | {lnop}
{e6} fa ao_n1, ao_n1, ao_p1 | {lnop}
SPU Assembly 2/2
EVEN | ODD
{e6} fm blurAo, ao_n4, w4 | {lnop}
{e6} fma blurAo, ao_n3, w3, blurAo | {lnop}
{e6} fma blurAo, ao_n2, w2, blurAo | {lnop}
{e6} fma blurAo, ao_n1, w1, blurAo | {lnop}
{e6} fma blurAo, currAo, w0, blurAo | {lnop}
{nop} | {o6} stqx blurAo, pvColor, colorOffset
{nop} | {o4} shlqbii ao_n4, currAo, 0
{nop} | {o4} shlqbii ao_n3, ao_p1, 0
{nop} | {o4} shlqbii ao_n2, ao_p2, 0
{nop} | {o4} shlqbii ao_n1, ao_p3, 0
{e2} ceq endLineMask, colorOffset, endLineOffset | {lnop}
{e2} ceq endBlockMask, colorOffset, endOffset | {lnop}
{e2} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {lnop}
{e2} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {lnop}
{e2} a endLineOffset, endLineOffset, endLineOffsetIncr | {lnop}
{e2} a colorOffset, colorOffset, colorOffsetIncr | {lnop}
{nop} | branch: {o?} brz endBlockMask, loop
SPU Assembly
{e6} fa ao_n3, ao_n3, ao_p3 | {o6} lqx nextAo, pvNextColor, colorOffset
SPUs issue one even and one odd instruction per cycle
One line = One cycle
e6 = even, 6 cycle latency
o6 = odd, 6 cycle latency
Iteration 1
loop:
nop | lnop
nop | {o6:0} lqx currAo, pvColor, colorOffset
nop | {o6:0} lqx nextAo, pvNextColor, colorOffset
nop | {o4:0} shlqbii endLineMask_, endLineMask, 0
nop | lnop
nop | lnop
nop | lnop
nop | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
{e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
{e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
{e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
nop | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
nop | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
{e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | lnop
{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | lnop
{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | lnop
{e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
Problems
{o6} lqx currAo, pvColor, colorOffset
{o4} shufb prevAo0, currAo, currAo, s_AAAA
Next instruction needs currAo
currAo won’t be ready yet
Stall for 5 cycles
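The five-cycle figure falls out of a toy in-order issue model. This sketch assumes a result with latency N issued on cycle 0 becomes usable on cycle N, and it models only one instruction stream; it is not a description of the real SPU pipeline beyond the latencies quoted in the slides.

```python
def issue_cycles(program):
    """Return the cycle on which each instruction issues.

    program: list of (dest, sources, latency) tuples, executed in order.
    An instruction waits until all of its source registers are ready.
    """
    ready_at = {}      # register name -> first cycle its value is usable
    next_cycle = 0     # earliest cycle the next instruction could issue
    issued = []
    for dest, sources, latency in program:
        start = max([next_cycle] + [ready_at.get(src, 0) for src in sources])
        issued.append(start)
        ready_at[dest] = start + latency
        next_cycle = start + 1
    return issued


program = [
    ("currAo", ["pvColor", "colorOffset"], 6),   # {o6} lqx currAo, pvColor, colorOffset
    ("prevAo0", ["currAo"], 4),                  # {o4} shufb prevAo0, currAo, currAo, s_AAAA
]

cycles = issue_cycles(program)
stall = cycles[1] - (cycles[0] + 1)
print(cycles, stall)  # [0, 6] 5
```

Software pipelining fills those five cycles with independent work taken from other loop iterations instead of leaving them idle.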
Iteration 2
loop:
nop | lnop
{e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
{e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
nop | {o4:0} shlqbii endLineMask_, endLineMask, 0
{e6:1} fa ao_n1_, ao_n1, ao_p1 | lnop
{e6:1} fa ao_n2_, ao_n2, ao_p2 | lnop
nop | lnop
{e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
{e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
{e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
{e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
nop | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
{e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
{e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | lnop
{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | lnop
{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | lnop
{e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
Iteration 3
loop:
nop | lnop
{e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
{e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
{e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
{e6:1} fa ao_n1_, ao_n1, ao_p1 | lnop
{e6:1} fa ao_n2_, ao_n2, ao_p2 | lnop
nop | lnop
{e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
{e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
{e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
{e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
{e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
{e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
{e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | lnop
{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | lnop
{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | lnop
{e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
Iteration 4
loop:
nop | lnop
{e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
{e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
{e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
{e6:1} fa ao_n1_, ao_n1, ao_p1 | lnop
{e6:1} fa ao_n2_, ao_n2, ao_p2 | lnop
{e6:3} fma blurAo___, currAo___, w0, blurAo__ | lnop
{e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
{e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
{e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
{e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
{e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
{e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
{e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {o6:3} stqx blurAo___, pvColor, colorOffset_
{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | lnop
{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | lnop
{e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
Copy Operations
loop:
{e2:x} ai ao_n1__, ao_n1_, 0 | {o4:x} shlqbii currAo_, currAo, 0
{e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
{e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
{e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
{e6:1} fa ao_n1_, ao_n1, ao_p1 | {o4:x} shlqbii ao_p1_, ao_p1, 0
{e6:1} fa ao_n2_, ao_n2, ao_p2 | {o4:x} shlqbii ao_p2_, ao_p2, 0
{e6:3} fma blurAo___, currAo___, w0, blurAo__ | {o4:x} shlqbii ao_p3_, ao_p3, 0
{e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
{e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
{e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
{e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
{e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
{e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
{e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {o6:3} stqx blurAo___, pvColor, colorOffset_
{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | {o4:x} shlqbii currAo___, currAo__, 0
{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | {o4:x} shlqbii currAo__, currAo_, 0
{e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
Optimized
EVEN | ODD
loop:
{e2:x} ai ao_n1__, ao_n1_, 0 | {o4:x} shlqbii currAo_, currAo, 0
{e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
{e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
{e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
{e6:1} fa ao_n1_, ao_n1, ao_p1 | {o4:x} shlqbii ao_p1_, ao_p1, 0
{e6:1} fa ao_n2_, ao_n2, ao_p2 | {o4:x} shlqbii ao_p2_, ao_p2, 0
{e6:3} fma blurAo___, currAo___, w0, blurAo__ | {o4:x} shlqbii ao_p3_, ao_p3, 0
{e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
{e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
{e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
{e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
{e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
{e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
{e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
{e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {o6:3} stqx blurAo___, pvColor, colorOffset_
{e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
{e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | {o4:x} shlqbii currAo___, currAo__, 0
{e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | {o4:x} shlqbii currAo__, currAo_, 0
{e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
{e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
Solution
Sometimes more or less
SSAO went from 2ms on all 6 SPUs to 1ms on all 6 SPUs
With practice, can do about 1-2 loops per day.
SSAO has 6 loops, and took a little over a week.
Same process for:
PostFX
Fullscreen Lighting
Ambient Cubemaps
Many more
Performance
Architecture Conclusions “So, you are the scheduler that has been nipping at my heels...”
Final Final Thoughts
Filmic Tonemapping changes your life.
SSAO is cool, and keep it out of your sunlight.
SPUs are awesome.
They don’t optimize themselves.
For a tight for-loop, you can do about 2x better than the compiler.
We’re also looking for
Senior Lighting Artist.
Graphics Programmer
Hable John Uncharted2 Hdr Lighting

  • 1. Uncharted 2: HDR Lighting John Hable Naughty Dog
  • 2. Agenda Gamma/Linear-Space Lighting Filmic Tonemapping SSAO Rendering Architecture/How it all fits together.
  • 6. Also used to work for EA Los Angeles
  • 8. Not PQRS (Boom Blox)
  • 12. Never heard about it in college
  • 13. George Borshukov taught it to me back at EA
  • 15. Issue is getting more traction now
  • 17. Suppose we show alternating light/dark lines.
  • 20. That’s the box on the top. 128
  • 28. So if you output the left image to the framebuffer, it will actually look like the one on the right. 2.2
  • 32. This is stored on your hard drive.
  • 33. You never see this image, but it’s on your hard drive. 2.2
  • 37. On a scale from 0-255
  • 39. 1 would equal our current value of 20
  • 40. 2 would equal our current value of 28
  • 41. 3 would equal our current value of 33
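The 20 / 28 / 33 figures above can be checked with a little arithmetic. This is a minimal sketch, assuming the display follows a pure 2.2 power curve (real sRGB differs slightly) and that results are truncated to integers. The function name is made up for illustration.

```python
def gamma_code_for_linear_step(step, gamma=2.2, max_code=255):
    """Gamma-encoded code value that matches `step` on a hypothetical
    linear 0..max_code display (assumes a pure power-law monitor)."""
    linear = step / max_code              # linear intensity in 0..1
    encoded = linear ** (1.0 / gamma)     # undo the monitor's power curve
    return int(encoded * max_code)        # truncate to an integer code


if __name__ == "__main__":
    for step in range(4):
        print(step, "->", gamma_code_for_linear_step(step))
```

Under those assumptions the first four linear steps land on codes 0, 20, 28 and 33, so a linear display would skip most of the dark codes we have today.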
  • 45. color = pow( tex2D( Sampler, Uv ), 2.2 )
  • 46. Do your lighting calculations
  • 47. Account for gamma on color output
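The three steps above (decode on read, light in linear space, re-encode on output) can be sketched on the CPU. This is only an illustration, assuming a pure 2.2 power curve instead of the exact sRGB function. The helper names are hypothetical and not from the Uncharted 2 shaders.

```python
GAMMA = 2.2  # assumed display gamma; sRGB hardware uses a slightly different curve


def to_linear(gamma_value):
    """Step 1: what pow(tex2D(...), 2.2) does on texture read."""
    return gamma_value ** GAMMA


def to_gamma(linear_value):
    """Step 3: account for gamma on color output."""
    return linear_value ** (1.0 / GAMMA)


def shade_linear(texel, light):
    """Steps 1-3 together: decode, light, clamp, encode."""
    lit = to_linear(texel) * light        # step 2: lighting math in linear space
    return to_gamma(min(lit, 1.0))


def shade_gamma_space(texel, light):
    """The incorrect shortcut: lighting applied directly to gamma values."""
    return min(texel * light, 1.0)


if __name__ == "__main__":
    # Half-intensity light on a mid-gray texel gives different answers.
    print(shade_linear(0.5, 0.5), shade_gamma_space(0.5, 0.5))
```

With these assumptions, halving the light on a 0.5 texel gives roughly 0.36 in linear space versus 0.25 with the gamma-space shortcut, which is the darkening the earlier slides complain about.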
  • 61. Use sRGB hardware or pow(2.2) on read
  • 64. Don’t use sRGB or pow(2.2) on read
  • 69. Uncharted 2 had them as Linear
  • 70. Artists have trouble tweaking them to look right
  • 72. Technically, it’s a mathematical value like a normal map.
  • 73. But artists tweak them a lot and bake extra lighting into them.
  • 74. Uncharted 2 had them as Linear
  • 76. sRGB: PC and PS3 gamma
  • 78. Way too bright at the low end.
  • 79. HDR the Bungie Way, Chris Tchou
  • 80. Post Processing in The Orange Box, Alex Vlachos
  • 81. Output curve is extra contrasty
  • 82. Henry LaBounta was going to talk about it.
  • 83. Try hooking up your Xenon to a waveform monitor and display a test pattern. Prepare to be mortified.
  • 85. Part 2: Filmic Tonemapping
  • 91. Note: 1 F-Stop is a power of two
  • 97. This one is local
  • 100. Dynamic Range in different shots
  • 105. Within a Shot – HDR Tonemapping
  • 106. Different Shots – Choosing the correct exposure?
  • 108. Within a Shot – HDR Tonemapping
  • 110. Within a Shot – HDR Tonemapping
  • 119. The crisper, more saturated blacks of improper gamma.
  • 120. The nice soft highlights of Reinhard
  • 122. CG Supervisor at Digital Domain
  • 126. Film vs. Digital purists.
  • 137. Filmic > Linear > Reinhard
  • 138. In terms of “Soft Highlights” vs. “Clamped Highlights”
  • 140. What happens if we go down?
  • 146. Replaces the entire Lin/Log and Texture LUT
  • 147. Includes the pow(x,1/2.2)
        x = max(0, LinearColor-0.004);
        GammaColor = (x*(6.2*x+0.5))/(x*(6.2*x+1.7)+0.06);
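The two-line curve above translates directly to a scalar function. This is a minimal sketch for experimenting offline, applied to a single channel. The function name is an assumption, and the output is treated as already gamma-encoded, as the slide says.

```python
def filmic_with_gamma(linear_color):
    """Rational filmic curve from the slide, one channel at a time.
    The result already includes the pow(x, 1/2.2), so do not
    gamma-encode it again."""
    x = max(0.0, linear_color - 0.004)
    return (x * (6.2 * x + 0.5)) / (x * (6.2 * x + 1.7) + 0.06)


if __name__ == "__main__":
    for value in (0.0, 0.05, 0.18, 1.0, 4.0, 16.0):
        print(value, round(filmic_with_gamma(value), 4))
```

Printing a few samples shows the shape: blacks are crushed slightly by the 0.004 offset, and highlights roll off toward 1.0 without ever reaching it.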
  • 150. Most monitors are too contrasty, strong toe makes it worse
  • 151. May want less shoulder
  • 152. The more range, the more shoulder
  • 154. Filmic Tonemapping
        Shoulder Strength = 0.22
        Linear Strength = 0.30
        Linear Angle = 0.10
        Toe Strength = 0.20
        Toe Numerator = 0.01
        Toe Denominator = 0.30
        Linear White Point Value = 11.2
        These numbers DO NOT have the pow(x,1/2.2) baked in
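The slide with the curve itself did not survive in this transcript, so the sketch below assumes the rational form John Hable published for this operator, with A to F mapped to the six named parameters in the order listed and W as the linear white point. Treat the formula and the names as assumptions, not as text recovered from the slide.

```python
A = 0.22   # Shoulder Strength
B = 0.30   # Linear Strength
C = 0.10   # Linear Angle
D = 0.20   # Toe Strength
E = 0.01   # Toe Numerator
F = 0.30   # Toe Denominator
W = 11.2   # Linear White Point Value


def curve(x):
    """Assumed filmic curve: a ratio of two quadratics, shifted so curve(0) is 0."""
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F


def tonemap(linear_color):
    """Normalize so the white point W maps to 1.0. Output is still linear."""
    return curve(linear_color) / curve(W)


def to_display(linear_color, gamma=2.2):
    """These numbers do not bake in the gamma, so apply it afterwards.
    The max() guards against tiny negative values from rounding."""
    return max(0.0, tonemap(linear_color)) ** (1.0 / gamma)


if __name__ == "__main__":
    for value in (0.0, 0.18, 1.0, 4.0, W):
        print(value, round(tonemap(value), 4), round(to_display(value), 4))
```

Dividing by `curve(W)` is what lets an artist change the white point without rescaling everything else: whatever W is, it lands on 1.0.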
  • 156. IMO: The most important post effect
  • 157. Changes your life completely
  • 158. Once you have it, you can't live without it
  • 159. If you implement it, you have to redo all your lighting
  • 165. But if you are already in shadow, you need something else...
  • 166. It's the AO that grounds the car in those shots.
  • 167. So what if we took the AO out?
  • 173. How to use AO?
  • 174. How to use AO?
  • 175. How to use AO?
  • 176. How to use AO?
  • 177. Sunlight Q) Why not have it affect sunlight? A) It does the exact opposite of what you want it to do.
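One way to read the answer above: AO only scales the ambient term, and direct sunlight is attenuated by its shadow term alone. This is a minimal sketch, assuming a simple additive lighting model. The function and parameter names are hypothetical, not taken from the Uncharted 2 shaders.

```python
def shade_pixel(albedo, sunlight, shadow, ambient, ao):
    """Combine direct and ambient light for one channel.
    shadow: 0 = fully shadowed, 1 = fully lit by the sun.
    ao:     0 = fully occluded, 1 = unoccluded."""
    direct = sunlight * shadow       # AO deliberately left out of the sun term
    indirect = ambient * ao          # AO darkens ambient light only
    return albedo * (direct + indirect)


if __name__ == "__main__":
    # A sunlit pixel is unchanged by AO when there is no ambient light.
    print(shade_pixel(1.0, 1.0, 1.0, 0.0, 0.2))
    # A shadowed pixel still gets contact darkening from AO.
    print(shade_pixel(1.0, 1.0, 0.0, 0.5, 0.5))
```

Under this model AO still grounds objects that are already in shadow, where the shadow map has nothing left to contribute.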
  • 184. No special filtering (i.e., bilateral)
  • 190. SSAO Passes 1) Base SSAO
  • 191. SSAO Passes 2) Dilate Horizontal
  • 192. SSAO Passes 3) Dilate Vertical
  • 193. SSAO Passes 4) Low Pass Filter (i.e., blur the $&%@ out of it)
  • 205. Have option to turn it off per surface
  • 206. Most snow fades at distance
  • 207. Some objects apply sqrt() to brighten it
  • 210. Artifacts are not too bad
  • 211. Could use better filtering for halos
  • 214. Here is the base overview.
  • 219. Say you have 5 loads to do
  • 221. Load 1 in Dryer
  • 222. Load 2 in Washer
  • 223. Load 2 in Dryer
  • 224. Load 3 in Washer
  • 225. Load 3 in Dryer
  • 226. Load 4 in Washer
  • 227. Load 4 in Dryer
  • 228. Load 5 in Washer
  • 230. Load 2 in Washer, Load 1 in Dryer
  • 231. Load 3 in Washer, Load 2 in Dryer
  • 232. Load 4 in Washer, Load 3 in Dryer
  • 233. Load 5 in Washer, Load 4 in Dryer
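The laundry analogy can be counted out explicitly. This is a small sketch, assuming one washer, one dryer, and that each takes one time step per load. The function names are made up for illustration.

```python
def sequential_schedule(loads):
    """One load at a time: wash it, dry it, then start the next."""
    steps = []
    for n in range(1, loads + 1):
        steps.append(f"Load {n} in Washer")
        steps.append(f"Load {n} in Dryer")
    return steps


def pipelined_schedule(loads):
    """Overlap the machines: while load n washes, load n-1 dries."""
    steps = []
    for t in range(1, loads + 2):
        busy = []
        if t <= loads:
            busy.append(f"Load {t} in Washer")
        if t >= 2:
            busy.append(f"Load {t - 1} in Dryer")
        steps.append(", ".join(busy))
    return steps


if __name__ == "__main__":
    print(len(sequential_schedule(5)), "steps sequential")
    print(len(pipelined_schedule(5)), "steps pipelined")
    for step in pipelined_schedule(5):
        print(step)
```

For 5 loads that is 10 steps done one at a time against 6 steps overlapped, and the gap widens as the number of loads grows.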
  • 236. It’s how we optimize SPU code by hand
  • 237. Pal Engstad (our Lead Graphics Programmer) will be putting a doc online explaining the full process.
  • 238. Should be online: look for the post on the blog
  • 242. SPU Assembly 1/2
        EVEN | ODD
        loop:
        {nop} | {o6} lqx currAo, pvColor, colorOffset
        {nop} | {o4} shufb prevAo0, currAo, currAo, s_AAAA
        {nop} | {o4} shufb ao_n30, prevAo0, currAo, s_BCDa
        {nop} | {o4} shufb ao_n20, prevAo0, currAo, s_CDab
        {nop} | {o4} shufb ao_n10, prevAo0, currAo, s_Dabc
        {e2} selb ao_n4, ao_n4, prevAo0, endLineMask | {lnop}
        {e2} selb ao_n3, ao_n3, ao_n30, endLineMask | {lnop}
        {e2} selb ao_n2, ao_n2, ao_n20, endLineMask | {lnop}
        {e2} selb ao_n1, ao_n1, ao_n10, endLineMask | {lnop}
        {nop} | {o6} lqx nextAo, pvNextColor, colorOffset
        {nop} | {o4} shufb ao_p1, currAo, nextAo, s_BCDa
        {nop} | {o4} shufb ao_p2, currAo, nextAo, s_CDab
        {nop} | {o4} shufb ao_p3, currAo, nextAo, s_Dabc
        {e6} fa ao_n4, ao_n4, nextAo | {lnop}
        {e6} fa ao_n3, ao_n3, ao_p3 | {lnop}
        {e6} fa ao_n2, ao_n2, ao_p2 | {lnop}
        {e6} fa ao_n1, ao_n1, ao_p1 | {lnop}
  • 243. SPU Assembly 2/2
        EVEN | ODD
        {e6} fm blurAo, ao_n4, w4 | {lnop}
        {e6} fma blurAo, ao_n3, w3, blurAo | {lnop}
        {e6} fma blurAo, ao_n2, w2, blurAo | {lnop}
        {e6} fma blurAo, ao_n1, w1, blurAo | {lnop}
        {e6} fma blurAo, currAo, w0, blurAo | {lnop}
        {nop} | {o6} stqx blurAo, pvColor, colorOffset
        {nop} | {o4} shlqbii ao_n4, currAo, 0
        {nop} | {o4} shlqbii ao_n3, ao_p1, 0
        {nop} | {o4} shlqbii ao_n2, ao_p2, 0
        {nop} | {o4} shlqbii ao_n1, ao_p3, 0
        {e2} ceq endLineMask, colorOffset, endLineOffset | {lnop}
        {e2} ceq endBlockMask, colorOffset, endOffset | {lnop}
        {e2} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {lnop}
        {e2} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {lnop}
        {e2} a endLineOffset, endLineOffset, endLineOffsetIncr | {lnop}
        {e2} a colorOffset, colorOffset, colorOffsetIncr | {lnop}
        {nop} | branch: {o?} brz endBlockMask, loop
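Reading the fa / fm / fma chain above, the loop appears to compute a symmetric 9-tap blur: each pair of neighbours at distance k is added first, then scaled by a shared weight w_k. The scalar sketch below shows that arithmetic only. The weight values are hypothetical (the slides do not give them), and clamping at both ends of the line is an assumption.

```python
def blur_line(ao, weights):
    """Symmetric 9-tap blur of one line of AO values.
    weights = [w0, w1, w2, w3, w4]; w_k is shared by the taps at -k and +k."""
    n = len(ao)
    out = []
    for i in range(n):
        acc = weights[0] * ao[i]                 # centre tap (currAo * w0)
        for k in range(1, 5):
            left = ao[max(i - k, 0)]             # clamp at line start (assumed)
            right = ao[min(i + k, n - 1)]        # clamp at line end (assumed)
            acc += weights[k] * (left + right)   # add the pair, then one multiply
        out.append(acc)
    return out


if __name__ == "__main__":
    # Hypothetical weights chosen so that w0 + 2*(w1+w2+w3+w4) == 1.
    example_weights = [0.2, 0.15, 0.12, 0.08, 0.05]
    line = [0.0] * 4 + [1.0] + [0.0] * 4
    print([round(v, 3) for v in blur_line(line, example_weights)])
```

Adding the mirrored taps before multiplying halves the number of multiplies, which matches the four fa instructions feeding the fm/fma chain in the listing.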
  • 245. SPUs issue one even and odd instruction per cycle
  • 246. One line = One cycle
  • 247. e6 = even, 6 cycle latency
  • 249. Iteration 1
        EVEN | ODD
        loop:
        nop | lnop
        nop | {o6:0} lqx currAo, pvColor, colorOffset
        nop | {o6:0} lqx nextAo, pvNextColor, colorOffset
        nop | {o4:0} shlqbii endLineMask_, endLineMask, 0
        nop | lnop
        nop | lnop
        nop | lnop
        nop | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
        {e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
        {e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
        {e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
        {e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
        nop | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
        nop | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
        {e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | lnop
        {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
        {e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | lnop
        {e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | lnop
        {e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
        {e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
  • 253. currAo won’t be ready yet
  • 256. Iteration 2
    loop:
      nop | lnop
      {e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
      {e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
      nop | {o4:0} shlqbii endLineMask_, endLineMask, 0
      {e6:1} fa ao_n1_, ao_n1, ao_p1 | lnop
      {e6:1} fa ao_n2_, ao_n2, ao_p2 | lnop
      nop | lnop
      {e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
      {e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
      {e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
      {e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
      {e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
      nop | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
      {e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
      {e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | lnop
      {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
      {e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | lnop
      {e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | lnop
      {e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
      {e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
  • 257. Iteration 3
    loop:
      nop | lnop
      {e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
      {e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
      {e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
      {e6:1} fa ao_n1_, ao_n1, ao_p1 | lnop
      {e6:1} fa ao_n2_, ao_n2, ao_p2 | lnop
      nop | lnop
      {e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
      {e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
      {e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
      {e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
      {e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
      {e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
      {e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
      {e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | lnop
      {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
      {e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | lnop
      {e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | lnop
      {e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
      {e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
  • 258. Iteration 4
    loop:
      nop | lnop
      {e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
      {e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
      {e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
      {e6:1} fa ao_n1_, ao_n1, ao_p1 | lnop
      {e6:1} fa ao_n2_, ao_n2, ao_p2 | lnop
      {e6:3} fma blurAo___, currAo___, w0, blurAo__ | lnop
      {e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
      {e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
      {e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
      {e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
      {e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
      {e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
      {e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
      {e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {o6:3} stqx blurAo___, pvColor, colorOffset_
      {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
      {e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | lnop
      {e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | lnop
      {e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
      {e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
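Read instruction by instruction, the arithmetic in this listing amounts to a symmetric 9-tap blur: the fa instructions add mirrored neighbours (ao_n4 + nextAo, ao_n3 + ao_p3, and so on), then an fm/fma chain applies the weights w4 down to w0 before stqx writes the result back. Below is a minimal scalar sketch of that arithmetic in C. The function names and the edge-clamping rule are assumptions made for illustration; the slides work on 4-wide vectors and handle line ends with selb masks instead.

```c
#include <stddef.h>

/* Clamp an index into [0, n-1]. Edge clamping is an assumption of this sketch,
   not something taken from the slides. */
static size_t clamp_index(long i, size_t n) {
    if (i < 0) return 0;
    if ((size_t)i >= n) return n - 1;
    return (size_t)i;
}

/* Symmetric 9-tap blur of one row. w[0] is the centre weight and w[k] is
   shared by the two taps k pixels to the left and right of the centre. */
void blur_row_9tap(const float *src, float *dst, size_t n, const float w[5]) {
    for (size_t x = 0; x < n; ++x) {
        /* Outermost pair first, like "fm blurAo, ao_n4, w4". */
        float acc = (src[clamp_index((long)x - 4, n)] +
                     src[clamp_index((long)x + 4, n)]) * w[4];
        /* Fold in the remaining pairs, like the fma chain with w3, w2, w1. */
        for (int k = 3; k >= 1; --k) {
            float pair = src[clamp_index((long)x - k, n)] +
                         src[clamp_index((long)x + k, n)];
            acc = pair * w[k] + acc;
        }
        /* Centre tap last, like "fma blurAo___, currAo___, w0, blurAo__". */
        dst[x] = src[x] * w[0] + acc;
    }
}
```

Adding each mirrored pair before multiplying means five multiplies per pixel instead of nine, which is why the listing has four fa instructions ahead of the fm/fma chain.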
  • 259. Copy Operations
    loop:
      {e2:x} ai ao_n1__, ao_n1_, 0 | {o4:x} shlqbii currAo_, currAo, 0
      {e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
      {e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
      {e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
      {e6:1} fa ao_n1_, ao_n1, ao_p1 | {o4:x} shlqbii ao_p1_, ao_p1, 0
      {e6:1} fa ao_n2_, ao_n2, ao_p2 | {o4:x} shlqbii ao_p2_, ao_p2, 0
      {e6:3} fma blurAo___, currAo___, w0, blurAo__ | {o4:x} shlqbii ao_p3_, ao_p3, 0
      {e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
      {e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
      {e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
      {e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
      {e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
      {e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
      {e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
      {e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {o6:3} stqx blurAo___, pvColor, colorOffset_
      {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
      {e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | {o4:x} shlqbii currAo___, currAo__, 0
      {e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | {o4:x} shlqbii currAo__, currAo_, 0
      {e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
      {e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
  • 260. Optimized
    loop:
      {e2:x} ai ao_n1__, ao_n1_, 0 | {o4:x} shlqbii currAo_, currAo, 0
      {e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
      {e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
      {e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
      {e6:1} fa ao_n1_, ao_n1, ao_p1 | {o4:x} shlqbii ao_p1_, ao_p1, 0
      {e6:1} fa ao_n2_, ao_n2, ao_p2 | {o4:x} shlqbii ao_p2_, ao_p2, 0
      {e6:3} fma blurAo___, currAo___, w0, blurAo__ | {o4:x} shlqbii ao_p3_, ao_p3, 0
      {e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
      {e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
      {e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
      {e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
      {e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
      {e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
      {e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
      {e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {o6:3} stqx blurAo___, pvColor, colorOffset_
      {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
      {e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | {o4:x} shlqbii currAo___, currAo__, 0
      {e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | {o4:x} shlqbii currAo__, currAo_, 0
      {e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
      {e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
  • 261. Optimized
    EVEN | ODD
    loop:
      {e2:x} ai ao_n1__, ao_n1_, 0 | {o4:x} shlqbii currAo_, currAo, 0
      {e6:1} fa ao_n4, ao_n4, nextAo | {o6:0} lqx currAo, pvColor, colorOffset
      {e6:1} fa ao_n3, ao_n3, ao_p3 | {o6:0} lqx nextAo, pvNextColor, colorOffset
      {e6:2} fma blurAo_, ao_n2_, w2, blurAo | {o4:0} shlqbii endLineMask_, endLineMask, 0
      {e6:1} fa ao_n1_, ao_n1, ao_p1 | {o4:x} shlqbii ao_p1_, ao_p1, 0
      {e6:1} fa ao_n2_, ao_n2, ao_p2 | {o4:x} shlqbii ao_p2_, ao_p2, 0
      {e6:3} fma blurAo___, currAo___, w0, blurAo__ | {o4:x} shlqbii ao_p3_, ao_p3, 0
      {e6:1} fm blurAo, ao_n4, w4 | {o4:0} shufb prevAo0, currAo, currAo, s_AAAA
      {e2:0} ceq endLineMask, colorOffset, endLineOffset | {o4:0} shufb ao_p1, currAo, nextAo, s_BCDa
      {e2:0} ceq endBlockMask, colorOffset_, endOffset | {o4:0} shufb ao_p2, currAo, nextAo, s_CDab
      {e2:0} selb endLineOffsetIncr, zero, colorLineBytes, endLineMask | {o4:0} shufb ao_p3, currAo, nextAo, s_Dabc
      {e2:0} selb ao_n4, currAo_, prevAo0, endLineMask_ | {o4:0} shufb ao_n30, prevAo0, currAo, s_BCDa
      {e6:2} fma blurAo__, ao_n1__, w1, blurAo_ | {o4:0} shufb ao_n20, prevAo0, currAo, s_CDab
      {e6:1} fma blurAo, ao_n3, w3, blurAo | {o4:0} shufb ao_n10, prevAo0, currAo, s_Dabc
      {e2:0} selb colorOffsetIncr, defOffsetIncr, endColorOffsetIncr, endLineMask | {o6:3} stqx blurAo___, pvColor, colorOffset_
      {e2:0} selb ao_n3, ao_p1_, ao_n30, endLineMask_ | {o4:0} shufb colorOffset_, colorOffset_, colorOffset, s_BCa0
      {e2:0} selb ao_n2, ao_p2_, ao_n20, endLineMask_ | {o4:x} shlqbii currAo___, currAo__, 0
      {e2:0} selb ao_n1, ao_p3_, ao_n10, endLineMask_ | {o4:x} shlqbii currAo__, currAo_, 0
      {e2:0} a colorOffset, colorOffset, colorOffsetIncr | lnop
      {e2:0} a endLineOffset, endLineOffset, endLineOffsetIncr | branch: {o?:0} brz endBlockMask, loop
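The {..:N} suffix on each instruction reads as the pipeline stage it belongs to, so one pass through the optimized loop appears to work on several pixel batches at once, with the shlqbii/ai copies carrying values forward to the next pass. That reading is an interpretation of the listing, not something the slides state. Here is a minimal C sketch of the general technique (software pipelining), using hypothetical stage functions in place of the real load, blur and store work:

```c
#include <stddef.h>

/* Hypothetical stages standing in for the load and the arithmetic. */
static float stage_load(const float *src, size_t i) { return src[i]; }
static float stage_math(float v) { return v * 0.5f + 1.0f; }

/* Straightforward loop: every iteration waits on its own load and math. */
void process_simple(const float *src, float *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = stage_math(stage_load(src, i));
}

/* Software-pipelined loop: one pass stores element pass-2, computes element
   pass-1 and loads element pass, so no stage consumes a value produced
   earlier in the same pass. */
void process_pipelined(const float *src, float *dst, size_t n) {
    float loaded = 0.0f;   /* value loaded during the previous pass   */
    float computed = 0.0f; /* value computed during the previous pass */
    for (size_t pass = 0; pass < n + 2; ++pass) {
        if (pass >= 2)
            dst[pass - 2] = computed;          /* stage 2: store */
        if (pass >= 1 && pass <= n)
            computed = stage_math(loaded);     /* stage 1: math  */
        if (pass < n)
            loaded = stage_load(src, pass);    /* stage 0: load  */
    }
}
```

The two carried variables play the role of the copy operations in the listing: each stage reads what the previous pass left behind, so the loop needs two extra passes to fill and drain the pipeline but never stalls on a result it has just requested.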
  • 266. SSAO went from 2ms on all 6 SPUs to 1ms on all 6 SPUs
  • 267. With practice, can do about 1-2 loops per day.
  • 268. SSAO has 6 loops, and took a little over a week.
  • 270. PostFX
  • 275. Architecture Conclusions “So, you are the scheduler that has been nipping at my heels...”
  • 278. SSAO is cool, and keep it out of your sunlight.
  • 280. They don’t optimize themselves.