2. Disclaimer. The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity.
3. Optimization Mindset. You can't just "make your game faster". There is no magic bullet. It is very specific work, not the same as scripting a character.
6. Optimization Mindset. know + think = shoot in the dark (you just write code hoping for the best). know + measure = shoot in the dark (you are missing the "understand" part). think + measure = shoot in the dark (you solve an abstract problem, not the real one).
7. Optimization Mindset: know + think. Hardware is more complex than you think: highly parallel, deeply pipelined. Even when you write asm, you are already at a high level.
8. Optimization Mindset: know + measure. Knowledge is static. Knowledge comes from the past. Knowledge is general.
9. Optimization Mindset: know + measure. Qsort vs. bubble sort: sure, qsort is faster, but you are missing the point. Maybe radix sort? Maybe there is no need to sort? Maybe insertion sort? A parallel sorting network?
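To make the "maybe radix?" option concrete, here is a minimal sketch of an LSD radix sort. It assumes the sort keys are plain 32-bit unsigned integers, which the slide does not state; the function name `radixSortU32` is made up for illustration. Whether it actually beats a comparison sort on your data and hardware is something to measure, not assume.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on 32-bit unsigned keys (hypothetical helper).
// Four passes, one byte per pass, each pass is a stable counting sort.
void radixSortU32(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // offsets[b + 1] first counts the keys whose current byte equals b.
        std::array<std::size_t, 257> offsets{};
        for (std::uint32_t k : keys) {
            ++offsets[((k >> shift) & 0xFFu) + 1];
        }
        // Prefix sum: offsets[b] becomes the first output slot for byte b.
        for (std::size_t i = 1; i < offsets.size(); ++i) {
            offsets[i] += offsets[i - 1];
        }
        // Stable scatter into the scratch buffer, then swap buffers.
        for (std::uint32_t k : keys) {
            scratch[offsets[(k >> shift) & 0xFFu]++] = k;
        }
        keys.swap(scratch);
    }
    // Debug-only sanity check on the result.
    assert(std::is_sorted(keys.begin(), keys.end()));
}
```

The access pattern is linear reads plus a 256-way scatter, with no data-dependent branches in the inner loops, which is one reason it can suit in-order cores.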
10. Optimization Mindset: think + measure. You end up solving an abstract problem. Example: GPU. Optimizing for a RIVA TNT and optimizing for a GTX are different things.
14. Know your hardware: GPU. It is a pipeline, meaning slow step = slow everything: you are as slow as your bottleneck. Know your pipeline. I won't go into the full pipeline spec (see the Resources section), just the common/biggest problems.
15. Know your hardware: GPU. Geometry. Pre/post-T&L cache: should you use indexed geometry or not; cache hit rate; strips vs. tri lists. Memory throughput: vertex size. Fetch cost (memory): pack attributes or not.
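As an illustration of the "vertex size" and "pack attributes or not" questions, the sketch below compares an all-float vertex with a packed one. The layouts, field choices and helper names (`FatVertex`, `PackedVertex`, `packSnorm8`, ...) are assumptions for this example, not a format taken from the talk, and whether a given GPU fetches the packed formats efficiently has to be checked against that vendor's documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Everything stored as 32-bit floats (hypothetical layout).
struct FatVertex {
    float position[3];
    float normal[3];
    float uv[2];
    float color[4];
};

// Same attributes, lower precision where it is usually tolerable.
struct PackedVertex {
    float         position[3];
    std::int8_t   normal[4];  // signed normalized, 4th component is padding
    std::uint16_t uv[2];      // unsigned normalized, assumes UVs in [0, 1]
    std::uint8_t  color[4];   // unsigned normalized
};

static_assert(sizeof(FatVertex) == 48, "unexpected padding in FatVertex");
static_assert(sizeof(PackedVertex) == 24, "unexpected padding in PackedVertex");

inline float clampTo(float v, float lo, float hi) {
    assert(!std::isnan(v));
    return std::min(hi, std::max(lo, v));
}

// Map [-1, 1] to [-127, 127].
inline std::int8_t packSnorm8(float v) {
    return static_cast<std::int8_t>(std::lround(clampTo(v, -1.0f, 1.0f) * 127.0f));
}

// Map [0, 1] to [0, 255].
inline std::uint8_t packUnorm8(float v) {
    return static_cast<std::uint8_t>(std::lround(clampTo(v, 0.0f, 1.0f) * 255.0f));
}

// Map [0, 1] to [0, 65535].
inline std::uint16_t packUnorm16(float v) {
    return static_cast<std::uint16_t>(std::lround(clampTo(v, 0.0f, 1.0f) * 65535.0f));
}

inline float unpackSnorm8(std::int8_t v) {
    return static_cast<float>(v) / 127.0f;
}
```

Here the packed vertex is half the size of the float one, so the same vertex buffer costs half the memory traffic, at the price of reduced precision.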
19. Know your hardware: CPU. Mobile = in-order RISC: for stupid code it is far worse than CISC. 2 main issues: memory speed and computation speed.
20. Know your hardware: CPU. Memory: this is the single most important factor. Memory access is far slower than computation. Latency vs. throughput. Caches: fast memory, your best friend (L1/L2/whatever). LHS (load-hit-store).
21. Know your hardware: CPU. Computations. SIMD: better memory usage, better arithmetic usage (4 values instead of 1).
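A minimal sketch of the "4 values instead of 1" idea, assuming a simple `y[i] += a * x[i]` loop as the workload (the function name `scaleAdd` is made up for this example). On ARM builds with NEON available it processes four floats per iteration using standard `arm_neon.h` intrinsics; elsewhere it falls back to the scalar loop so the code still compiles and runs.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// y[i] += a * x[i] for i in [0, n). Hypothetical example workload.
void scaleAdd(float a, const float* x, float* y, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    std::size_t i = 0;
#if defined(__ARM_NEON)
    const float32x4_t va = vdupq_n_f32(a);   // broadcast a into all 4 lanes
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);   // load 4 floats from x
        float32x4_t vy = vld1q_f32(y + i);   // load 4 floats from y
        vy = vmlaq_f32(vy, va, vx);          // vy + va * vx, per lane
        vst1q_f32(y + i, vy);                // store 4 results
    }
#endif
    // Scalar tail (and the whole loop when NEON is not available).
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

The wide loads and stores are where the "better memory usage" comes from: one instruction moves 16 contiguous bytes instead of 4.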
22. Know your target hardware. Those were general rules, but you are running on that particular piece of sh... hardware.
23. Know your target hardware: PowerVR. TBDR (tile-based deferred rendering): perfect hidden surface removal; Alpha-Test/discard; shader precision; unified shaders. Tegra / ATI-AMD / Adreno: more common.
24. Know your target hardware: ARM. VFP = FPU on steroids (not real SIMD): scalar instructions run at the same speed as vectorized ones. NEON = SIMD: more registers; awesome load/store instructions; not as cool as AltiVec, but cool enough for mobiles.
25. Know your target hardware: ARM. Conditional execution of most instructions. Shifts and rotates fold into the "data processing" instructions (e.g. loading a structure from an array by index). Thumb + float = disaster: you switch back and forth between Thumb mode and regular 32-bit mode.
26. Know your hardware: Resources. RTR (Real-Time Rendering). Lots of whitepapers: PowerVR (ImgTec); Tegra (NVIDIA); Adreno (Qualcomm); AMD/ATI - basically the same as X360, but with much smaller tiles. ARM dev center.
27. Think. Think about your data. Think about your algorithms. Think about your constraints. Think about your hardware.
28. Think. Basics. CPU vs. GPU: e.g. draw calls are a pure CPU cost. CPU: memory vs. arithmetic (memory is slower). GPU: vertex program vs. fragment shader; memory vs. arithmetic.
29. Think. Memory. Fragmentation. Data organization: AoS vs. SoA; hot/cold split. Data structures: linear vs. random access; array vs. list; map vs. hashtable. Allocators.
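The AoS vs. SoA and hot/cold split points can be sketched with a particle system. The particle fields and the names `ParticleAoS`, `ParticlesSoA` and `integrate` are assumptions for illustration. The idea shown is that the per-frame loop only touches tightly packed hot arrays, while rarely used cold data lives elsewhere and stays out of the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// AoS: every particle drags its cold fields through the cache
// whenever the hot fields are touched.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
    std::string debugName;  // cold: rarely read
    int spawnFrame;         // cold: rarely read
};

// SoA with a hot/cold split: one contiguous array per field.
struct ParticlesSoA {
    // Hot data, read and written every frame.
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    // Cold data, kept out of the per-frame loop.
    std::vector<std::string> debugName;
    std::vector<int> spawnFrame;

    std::size_t size() const { return x.size(); }

    void add(float px, float py, float pz, float pvx, float pvy, float pvz,
             const std::string& name, int frame) {
        x.push_back(px);   y.push_back(py);   z.push_back(pz);
        vx.push_back(pvx); vy.push_back(pvy); vz.push_back(pvz);
        debugName.push_back(name);
        spawnFrame.push_back(frame);
    }
};

// Per-frame update: linear walks over six float arrays, nothing else.
void integrate(ParticlesSoA& p, float dt) {
    const std::size_t n = p.size();
    assert(p.y.size() == n && p.z.size() == n);
    assert(p.vx.size() == n && p.vy.size() == n && p.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}
```

This layout also lines up with SIMD: four consecutive `x` values sit next to each other in memory and can be loaded in one go.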
30. Think. Constraints. GPU: will you see the difference? Really? On a mobile screen? On that one small thingy in the corner? CPU: will you need that? E.g. physics in a casual game? Memory: will you need that? Will you need more than XXX actors?
31. Measure. You didn't optimize anything if you didn't measure the difference. You can't optimize if you don't know what needs to be optimized, or if you can't measure what takes time.
32. Measure. Tools. There are lots of tools: Instruments (iOS); PerfHUD (Tegra); Adreno Profiler (Qualcomm); probably some more. Poor-man's profiler: timers.
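A poor-man's profiler built from timers can be as small as the sketch below. It assumes `std::chrono::steady_clock` is good enough as the time source on the target; the names `TimerStats`, `ScopedTimer` and `averageMs` are made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Accumulated timing for one named section of code (hypothetical type).
struct TimerStats {
    double totalMs = 0.0;
    std::uint64_t calls = 0;
};

// Measures the lifetime of the object and adds it to the given stats.
class ScopedTimer {
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedTimer(TimerStats& stats)
        : stats_(stats), start_(Clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double, std::milli> elapsed =
            Clock::now() - start_;
        stats_.totalMs += elapsed.count();
        ++stats_.calls;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimerStats& stats_;
    Clock::time_point start_;
};

// Average time per call in milliseconds.
inline double averageMs(const TimerStats& s) {
    assert(s.calls > 0);
    return s.totalMs / static_cast<double>(s.calls);
}
```

Usage is one line at the top of the scope you care about, e.g. `ScopedTimer t(skinningStats);`. Collect the stats over many frames before and after a change, since a single sample is mostly noise.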
33. Unity use case: random bits. Mobile shaders: specialized versions of the usual built-ins. Skinning: full NEON/VFP implementation, usually takes 10-15% of the C code's time, and we are not done optimizing it ;-) Rej's baking of material to texture and, coming soon, BRDF baking to texture.
34. Unity use case: random bits. Remote Profiler: runs on target hw; data is transferred over Wi-Fi; collected in the Editor and shown as pretty graphs ;-) Sort alpha-test *after* opaque. Check *lots* of extensions. LODs: almost done. Vertex cache optimization: after LODs ;-)
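The "sort alpha-test *after* opaque" rule can be sketched as a stable sort of the draw list by pass. This is only an illustration of the ordering, not Unity's actual render queue; `Pass`, `DrawItem` and `sortForSubmission` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Lower value = submitted earlier.
enum class Pass { Opaque = 0, AlphaTest = 1 };

struct DrawItem {
    int  id;    // stand-in for mesh/material/etc.
    Pass pass;
};

inline bool passBefore(const DrawItem& a, const DrawItem& b) {
    return static_cast<int>(a.pass) < static_cast<int>(b.pass);
}

// Moves all opaque items in front of all alpha-tested ones.
// stable_sort keeps the existing order inside each group, so any
// earlier sorting (e.g. by material) survives within a pass.
void sortForSubmission(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(), passBefore);
    assert(std::is_sorted(items.begin(), items.end(), passBefore));
}
```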
35. Closing Words. Know hardware. Know data. Think data. Think constraints. Measure always. You better know earlier. You should always be optimizing.