In this talk, you'll learn about the tools and techniques that Unity's Consulting and Development team uses to identify and fix performance issues. The team travels the world visiting customers and conducting Project Reviews: in-depth engagements to locate and resolve performance bottlenecks. This session is designed to help you apply their knowledge to your own Unity projects, so you'll see examples of real-life performance problems and their solutions, and receive up-to-date best-practice advice.
Speaker: Ignacio Liverotti – Unity
Watch the session on YouTube: https://youtu.be/GuODu4-cXXQ
3. About me and what I do here at Unity
— Joined Unity as a Software Engineer in 2015
— Became a Developer Relations Engineer in 2018
— I visit our Enterprise customers and help them resolve technical issues affecting their projects
4. Project Reviews
— Multi-day engagement:
— We travel to our customers’ offices
— Review their projects
— Identify problems
— Some of them are resolved onsite
— We investigate and recommend solutions for the rest
5. Project Reviews
— Types of problems:
– Runtime performance
– Build/patch size
– Load times
– Workflow issues
– Build times
6. Today’s plan
— Introduction to optimization and profiling in Unity
— CPU optimization
— GPU optimization
— Memory footprint optimization
— (Optimization) rules to live by
8. What is optimization?
— Modifying a project so that an aspect of it is more efficient or uses fewer resources (CPU, GPU, etc.)
9. Why do we want to optimize?
— To pass the certification requirements imposed by the various distribution platforms
— To reduce battery consumption
— To deploy our project to a wider range of target devices
— To streamline our production process
11. What else does it involve?
— Reviewing assets (texture size, format, polygon count, audio sample rate, etc.)
— Reviewing the Project Settings
— Reviewing the assets’ settings
— Simplifying our solutions
12. The first step in our (optimization) journey: profiling!
— Using tools to gather actual data on how resources are being used
— This data will drive our optimization efforts
— We don’t want to optimize based on “guesses”
— We want the tools to let us know where the problems are
13. A note about optimization ‘tips’ and ‘advice’
— Don’t apply optimization advice blindly
– Certain pieces of advice are always applicable
– But good advice applied in the wrong situation can make things worse
– A technique that worked in a certain project and platform might not work for you
15. What is the goal of CPU optimization?
— To reduce the stress on the CPU
– Because the CPU is the actual performance bottleneck
– Or to free the CPU so that we can do more
16. How do we achieve that?
— By using more efficient combinations of algorithms and data structures
— By aiming for nearly zero per-frame allocations
– The GC algorithm can be quite CPU intensive!
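One common way to get close to zero per-frame allocations is to reuse a buffer instead of creating a new collection on every call. The sketch below is plain C# and not code from the talk; the class name NearbyUnitQuery, the method FindInRange, and the one-dimensional positions are hypothetical and exist only to illustrate the pattern.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example: a query that runs every frame and reuses its result list.
public sealed class NearbyUnitQuery
{
    // Allocated once; Clear() keeps the backing array, so steady-state calls
    // do not allocate unless the list has to grow beyond its capacity.
    private readonly List<int> _results = new List<int>(64);

    // Returns the indices of all positions within 'radius' of 'center'.
    // The returned list is owned by this object and is overwritten on the next call.
    public List<int> FindInRange(int[] positions, int center, int radius)
    {
        _results.Clear();
        for (int i = 0; i < positions.Length; i++)
        {
            if (Math.Abs(positions[i] - center) <= radius)
            {
                _results.Add(i);
            }
        }
        return _results;
    }
}
```

The trade-off is that callers must not hold on to the returned list across calls, since its contents change every time the query runs.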
17. Tools of the trade: CPU
— Unity Profiler
— Unity Profile Analyzer
– Don’t miss the next talk!
— Xcode Instruments
— Intel VTune Amplifier
— Consoles have their own proprietary tools
20. Example 1: Per-frame memory allocations
— Scenario: Management game for mobile platforms that ‘feels’ slow during gameplay
— We take a capture using the Unity Profiler and uncheck all items, except for ‘GarbageCollector’:
22. Example 1: Per-frame memory allocations
— Our hypothesis: Our code is allocating memory on a per-frame basis
— Let’s select a random frame in the Unity Profiler:
27. Example 1: Per-frame memory allocations
— From https://docs.unity3d.com/Manual/PlatformDependentCompilation.html:
    using System.Diagnostics;

    public static class Logging
    {
        // Calls to Log are compiled only when ENABLE_LOG is defined.
        [Conditional("ENABLE_LOG")]
        public static void Log(object message)
        {
            UnityEngine.Debug.Log(message);
        }
    }
28. Example 1: Per-frame memory allocations
— If we want to see the log messages, we need to add ENABLE_LOG to the list of defined symbols:
29. Example 1: Per-frame memory allocations
— Let’s remove ENABLE_LOG and reprofile:
30. Example 1: Per-frame memory allocations
— Zero per-frame allocations
— No GC spikes!
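A likely reason the allocations disappear is that the C# compiler omits calls to a [Conditional] method, including the evaluation of their arguments, when the symbol is not defined. The sketch below is plain C# and not code from the talk; ConditionalDemo, BuildMessage and SimulateFrames are hypothetical names, and it assumes ENABLE_LOG is not defined when the code is compiled.

```csharp
using System;
using System.Diagnostics;

// Hypothetical demo: shows that argument expressions of a [Conditional] call
// are not evaluated when the symbol (ENABLE_LOG here) is undefined.
public static class ConditionalDemo
{
    public static int MessagesBuilt;

    [Conditional("ENABLE_LOG")]
    public static void Log(object message)
    {
        Console.WriteLine(message);
    }

    // Stands in for per-frame string building, which allocates on the managed heap.
    private static string BuildMessage(int frame)
    {
        MessagesBuilt++;
        return "Frame " + frame;
    }

    // Returns how many messages were actually built over 'frameCount' frames.
    public static int SimulateFrames(int frameCount)
    {
        MessagesBuilt = 0;
        for (int frame = 0; frame < frameCount; frame++)
        {
            // Without ENABLE_LOG the whole call is removed, so BuildMessage never runs.
            Log(BuildMessage(frame));
        }
        return MessagesBuilt;
    }
}
```

With ENABLE_LOG undefined, SimulateFrames reports that no messages were built, so the string concatenation never happens.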
31. Takeaway: use the Unity Profiler to understand where your managed allocations are coming from and fix them
32. Example 2: GC spikes in a fast-paced game
— Scenario: Mobile racing game where the frame rate needs to be steady
— Common approach: let the GC do its work
– The problem it causes: When the GC kicks in, program execution actually stops
– Also, the larger the managed heap, the longer it takes for the GC algorithm to complete
33. Example 2: GC spikes in a fast-paced game
— GC capture:
34. Example 2: GC spikes in a fast-paced game
— What we recommend:
– Unload all resources when transitioning from the menu to the ‘racing’ scene
– Allocate a pool of objects
– Optimize the frame time as much as possible
– Enable the incremental garbage collector
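The 'pool of objects' recommendation can be sketched in plain C# as follows. This is a minimal illustration, not the customer's code; SimplePool, Rent and Return are hypothetical names, and a real Unity pool would typically also deactivate and reset GameObjects when they are returned.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal pool: objects are created up front (for example while
// the scene loads) and reused during gameplay instead of being reallocated.
public sealed class SimplePool<T> where T : class
{
    private readonly Stack<T> _free;
    private readonly Func<T> _create;

    public SimplePool(Func<T> create, int initialSize)
    {
        _create = create;
        _free = new Stack<T>(initialSize);
        for (int i = 0; i < initialSize; i++)
        {
            _free.Push(create());
        }
    }

    // Number of objects currently available without allocating.
    public int FreeCount => _free.Count;

    // Hands out a pooled object; only allocates if the pool has run dry.
    public T Rent()
    {
        return _free.Count > 0 ? _free.Pop() : _create();
    }

    // Gives an object back so that a later Rent() can reuse it.
    public void Return(T item)
    {
        _free.Push(item);
    }
}
```

Sizing the pool for the worst case during loading keeps Rent() from allocating in the middle of a race.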
35. Example 2: GC spikes in a fast-paced game
— The incremental GC was introduced in the early 2019 development cycle
— Instead of causing a single, long interruption, it splits the work across multiple slices
36. Example 2: GC spikes in a fast-paced game
— Incremental GC capture:
37. Example 2: GC spikes in a fast-paced game
— Enable it via the Player Settings menu:
38. Takeaway: use the incremental GC and remember to optimize the frame time as much as possible so that we can give it room to do its job
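Besides the Player Settings menu, the same setting can probably be toggled from an editor script, which may be handy for build automation. The sketch below assumes Unity 2019.1 or newer and a file placed in an Editor folder; the class name and menu path are hypothetical.

```csharp
using UnityEditor;
using UnityEngine;

// Hypothetical editor helper; assumes this file lives in an 'Editor' folder.
public static class IncrementalGcMenu
{
    [MenuItem("Tools/Enable Incremental GC")]
    private static void EnableIncrementalGc()
    {
        // Same switch as the 'Use incremental GC' checkbox in Player Settings.
        PlayerSettings.gcIncremental = true;
        Debug.Log("Incremental GC enabled: " + PlayerSettings.gcIncremental);
    }
}
```

At runtime, UnityEngine.Scripting.GarbageCollector.isIncremental can be read to confirm that the player was actually built with the incremental collector.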
40. What are the goals of GPU optimization?
— Reduce the stress on the GPU so that we can render our scene at the target frame rate
— Free up the GPU for performing other tasks (including offloading work from the CPU via compute shaders)
41. How do we achieve that?
— Minimizing the number of unnecessary rendering operations
— Reducing the amount of data sent to the GPU
— Minimizing the number of state changes (‘set pass’ calls)
— Optimizing our most expensive shaders
42. Tools of the trade: GPU
— Unity Frame Debugger
— RenderDoc
— NVIDIA Nsight
— Xcode Frame Capture
— Intel GPA
— Consoles have their own proprietary tools
44. Example 3: Strategy game for mobile
— Scenario: A customer working on a strategy game for iOS/Android was experiencing frame rate issues
— We profiled it using Xcode Frame Capture and saw a warning message saying that we were sending too much geometry to the GPU
47. Example 3: Strategy game for mobile
— Both draw calls have the same geometry as input:
48. Example 3: Strategy game for mobile
— But the output of one of them takes up significantly more screen real estate in the final frame than the other one!
54. Takeaway: by understanding our requirements and the tools and techniques at our disposal, we’ve achieved a nearly 40% reduction without observable differences in the final output
55. Example 4: Sprite rendering
— Scenario: A customer working on a top-down tile-based strategy game for PC experienced very low frame rates when deploying to mobile and WebGL
56. Example 4: Sprite rendering
— We profiled the game using the Unity Profiler and saw high frame times
— Most of that time was spent on rendering
— The CPU was too busy creating and sending rendering commands to the GPU
57. Example 4: Sprite rendering
— The Frame Debugger revealed one draw call per tile (several hundred draw calls in the real project!)
62. Example 4: Sprite rendering
    using UnityEngine;

    public class Tile : MonoBehaviour
    {
        private Material _spriteRendererMaterial;

        void PreprocessMethod()
        {
            var spriteRenderer = GetComponent<SpriteRenderer>();
            _spriteRendererMaterial = spriteRenderer.material;
        }
    }
— The spriteRenderer.material statement creates a copy of the first material from this SpriteRenderer, assigns it to the SpriteRenderer and returns it.
63. Example 4: Sprite rendering
    using UnityEngine;

    public class Tile : MonoBehaviour
    {
        private Material _spriteRendererMaterial;

        void PreprocessMethod()
        {
            var spriteRenderer = GetComponent<SpriteRenderer>();
            // sharedMaterial returns the shared material without creating a per-renderer copy.
            _spriteRendererMaterial = spriteRenderer.sharedMaterial;
        }
    }
64. Example 4: Sprite rendering
— All tiles are now drawn in the same batch
— The game was successfully deployed to mobile and WebGL
— And the performance of the standalone version improved as well!
65. Takeaway: identifying the problem and understanding the underlying issue allowed us to trim several hundred draw calls per frame
67. What are the goals of memory footprint reduction?
— Fit on devices that don’t have a large amount of memory
— Improve loading times
— Be able to add more content
— Avoid hard crashes due to out-of-memory situations
— Improve overall performance by shuffling around less data at runtime
68. Tools of the trade: memory footprint
— Unity Memory Profiler
— Xcode Instruments Allocations
— Xcode Instruments VM Tracker
— Consoles have their own proprietary tools
71. Example 5: Built-in shaders duplication
— A customer project has a large number of materials that use Unity’s Standard shader:
72. Example 5: Built-in shaders duplication
— Each Material is stored in its own AssetBundle:
73. Example 5: Built-in shaders duplication
— A memory snapshot of the project reveals that there are multiple instances of the Standard shader in memory:
74. Example 5: Built-in shaders duplication
— This happens because the Standard shader is one of Unity’s built-in shaders
— As such, it cannot be explicitly included in an AssetBundle
— And it will be implicitly included in every AssetBundle that has a material with a reference to it
75. Example 5: Built-in shaders duplication
— What we recommend instead:
– Download a copy of the built-in shaders
– Make a copy of the Standard Shader, rename it (e.g., ‘Unite 2019 Standard’) and add it to its own AssetBundle
– Fix the materials so that they use the new renamed shader
– Rebuild the AssetBundles
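The 'fix the materials' step could be automated with an editor script along these lines. This is a sketch under stated assumptions rather than the procedure used in the review: it assumes the copied shader declares the name 'Unite 2019 Standard', that the file sits in an Editor folder, and that every material under Assets that uses 'Standard' should be switched. The class name and menu path are hypothetical.

```csharp
using UnityEditor;
using UnityEngine;

// Hypothetical editor helper; assumes this file lives in an 'Editor' folder.
public static class StandardShaderReplacer
{
    private const string BuiltInName = "Standard";
    private const string CustomName = "Unite 2019 Standard"; // name of the renamed copy

    [MenuItem("Tools/Replace Standard Shader On Materials")]
    private static void ReplaceShader()
    {
        Shader custom = Shader.Find(CustomName);
        if (custom == null)
        {
            Debug.LogError("Shader not found: " + CustomName);
            return;
        }

        // Only look at materials inside the project's Assets folder.
        string[] guids = AssetDatabase.FindAssets("t:Material", new[] { "Assets" });
        foreach (string guid in guids)
        {
            string path = AssetDatabase.GUIDToAssetPath(guid);
            var material = AssetDatabase.LoadAssetAtPath<Material>(path);
            if (material == null || material.shader == null) continue;
            if (material.shader.name != BuiltInName) continue;

            // Point the material at the renamed copy and mark it for saving.
            material.shader = custom;
            EditorUtility.SetDirty(material);
        }

        AssetDatabase.SaveAssets();
    }
}
```

It is worth running something like this under version control so the changed materials can be reviewed before the AssetBundles are rebuilt.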
76. Example 5: Built-in shaders duplication
— After rebuilding and taking a new memory snapshot:
– We now have a single instance of our custom ‘Unite 2019 Standard’ shader
77. Takeaway: by using the right tools and understanding the internals of the engine, we were able to eliminate duplicates in memory
78. Example 6: Unable to run the game on mobile
— Scenario: A customer is porting a desktop game to mobile platforms and it keeps crashing on low-end devices due to its high memory footprint
79. Example 6: Unable to run the game on mobile
— A memory snapshot of the project reveals this:
81. Example 6: Unable to run the game on mobile
— Let’s look at the settings for these assets:
82. Example 6: Unable to run the game on mobile
— ‘Decompress on load’ option description from the Unity manual:
– Audio files will be decompressed as soon as they are loaded
– This option should be used for smaller compressed sounds to avoid the performance overhead of decompressing on the fly
– Be aware that decompressing Vorbis-encoded sounds on load will use about ten times more memory than keeping them compressed, so don’t use this option for large files
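To get a feel for why 'Decompress on load' is costly for long clips, the size of uncompressed PCM audio can be estimated from its duration, sample rate, channel count and sample size. The helper below is a plain C# sketch; AudioMemoryEstimate and DecompressedPcmBytes are hypothetical names, and the 16-bit, 44.1 kHz, stereo, three-minute figures are assumptions chosen for illustration, not numbers from the customer project.

```csharp
// Hypothetical helper: estimates the size of uncompressed PCM audio in bytes.
public static class AudioMemoryEstimate
{
    public static long DecompressedPcmBytes(
        double seconds, int sampleRate, int channels, int bytesPerSample)
    {
        // samples per channel * channels * bytes per sample
        long samplesPerChannel = (long)(seconds * sampleRate);
        return samplesPerChannel * channels * bytesPerSample;
    }
}

// Example: a 3-minute stereo track at 44.1 kHz with 16-bit samples:
// AudioMemoryEstimate.DecompressedPcmBytes(180, 44100, 2, 2) returns 31752000 (about 30 MiB)
```

A handful of music tracks of that length, all fully decompressed in memory, could plausibly exhaust the budget of a low-end mobile device.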
83. Do we have other options?
We do! There’s a ‘Streaming’ option
84. Example 6: Unable to run the game on mobile
— Description from the Unity manual:
– Decode audio on the fly
– Uses a minimal amount of memory to buffer compressed data
– The data is incrementally read from the disk and decoded on the fly
85. Example 6: Unable to run the game on mobile
— Let’s change the load type to ‘Streaming’:
86. Example 6: Unable to run the game on mobile
— And take another memory snapshot:
88. Takeaway: understanding our requirements and using the correct settings allowed us to reduce the audio memory footprint by ~99%
89. Example 6: Unable to run the game on mobile
— These problems can be avoided!
— Let’s not catch them via the Memory Profiler
— Instead, let’s use an AssetPostprocessor and create rules:
– Background music assets should be set to streaming
– SFX should be set to decompress on load
– Etc.
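The two rules above might be expressed as an AssetPostprocessor roughly like this. It is a minimal sketch, not the tooling used in the review: the folder paths Assets/Audio/Music/ and Assets/Audio/SFX/ and the class name are hypothetical conventions, and the script is assumed to live in an Editor folder.

```csharp
using System;
using UnityEditor;
using UnityEngine;

// Hypothetical import rules; the folder layout below is an assumed project convention.
public class AudioImportRules : AssetPostprocessor
{
    private const string MusicFolder = "Assets/Audio/Music/";
    private const string SfxFolder = "Assets/Audio/SFX/";

    // Called by Unity before an audio asset is imported.
    void OnPreprocessAudio()
    {
        var importer = (AudioImporter)assetImporter;
        AudioImporterSampleSettings settings = importer.defaultSampleSettings;

        if (assetPath.StartsWith(MusicFolder, StringComparison.Ordinal))
        {
            // Background music: stream from disk to keep the memory footprint small.
            settings.loadType = AudioClipLoadType.Streaming;
        }
        else if (assetPath.StartsWith(SfxFolder, StringComparison.Ordinal))
        {
            // Short sound effects: decompress once to avoid decoding cost at play time.
            settings.loadType = AudioClipLoadType.DecompressOnLoad;
        }
        else
        {
            return; // No rule for this asset; leave its settings untouched.
        }

        importer.defaultSampleSettings = settings;
    }
}
```

Because the rule runs on every import, a clip dropped into the wrong folder gets the wrong setting, so the folder convention needs to be agreed on by the whole team.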
92. (Optimization) rules to live by
— Don’t assume where the bottlenecks are, always profile first
— Profile on the target device
— Profile early, profile often
— Don’t apply several fixes simultaneously, tackle one problem at a time
— Apply the ‘optimization triad’:
– Optimize your assets
– Update fewer things
– Draw less stuff