…inside the Integrated Native Development Experience package… or INDE
For those of you that have not used GPA, GPA stands for Graphics Performance Analyzers. GPA is a suite of individual tools that you can run on your favorite development environments: Windows, OS X, and Ubuntu. These tools allow you to analyze your Windows, Ubuntu, and Android applications and isolate and fix any performance issues.
As developers, we’ve all had the experience that once you have your primary functionality complete and start testing it on a number of platforms, you discover performance issues. We all have performance goals, but these goals are even more difficult to hit on the wide variety of hardware out there today. Low fps, lag, and stuttering are all serious issues. And we as developers want to provide the best experience when people play our apps and games – no matter what platform they use. GPA can help you identify your hot spots and reach your performance goals, no matter which devices you are targeting.
So, how do our tools work? Because we run on such a wide variety of platforms, our tools have a host/target architecture. The reason for this is twofold. First, everyone has their favorite development environment. Second, small handheld devices do not lend themselves to interactive debugging tools. When developing for low-power/low-compute devices, the monitoring tools themselves can get in the way of accurate measurements and affect performance. So in order to get the best results, we run a small collector on the target system but keep our powerful analysis and investigation tools on your favorite development environment. Even if you are a purely desktop developer, this mechanism gives you the ability to remotely monitor and collect data from other desktop systems as well.
Here is a more exhaustive list of platforms we run on. As you can see, it is quite a few! We support a wide variety of host operating systems for our powerful analysis and evaluation tools and also support a wide variety of target hardware as well.
As you can see, you can collect data from all your favorite DirectX versions on Windows, collect OpenGL data on Ubuntu, and collect performance data from Android targets using Windows, Mac, or Ubuntu hosts.
So what’s inside GPA? Here we see the 4 major analysis applications:
1. First there is System Analyzer, which allows you to connect to an application and not only monitor realtime metrics, but also override state, trigger captures of individual frames, and make traces.
2. Graphics Monitor is our lightweight monitoring and launching tool. This component allows you to launch applications, monitor them via a simple HUD, collect frames and traces, set triggered events, and manage the configuration of apps you wish to collect data from.
3. Platform Analyzer is our GPU/CPU workload visualization tool. It allows you to see GPU and CPU workloads at the same time, which lets you spot difficult CPU/GPU interaction issues or discover if you are CPU or GPU bound.
4. Finally, there is Frame Analyzer, which is our powerful frame analysis tool. You can capture individual frames from your game, inspect all aspects of them, and find performance issues at the individual draw call level.
Ok, don’t let this slide daunt you too much. There is a wide variety of ways to find your performance bottlenecks, but this is one of the most methodical ways to isolate what you are looking for if you really don’t know where to start. At a high level – our first objective is to find out if we are CPU or GPU limited. After we discover that, we can start making the right kinds of performance optimizations. After all, you don’t want to spend valuable time optimizing your GPU code if the performance issue is on the CPU, or vice versa.
First, we do online or realtime analysis. Let’s say you have a performance issue in your game. For desktop applications, you can start your analysis either on the system you are using or remotely. For Android, you use your development system to connect to your Android application. You then run your game under analysis using either System Analyzer or Graphics Monitor to capture frames and activity traces.
So, let’s talk a little about frame and trace captures. A frame capture collects all the graphics API calls for an individual frame. A trace capture collects both CPU and GPU activity and gives you an overall picture of what the system is doing. We can capture any number of these at any part of our application or game. If one particular level is causing you issues, you can go to that level and only capture frames and traces for that part of your game.
After we have captured our frames and traces, we then use our analysis tools to figure out where our problem lies. We first analyze our traces with Platform Analyzer. This quickly tells us if we are spending the majority of our time on the CPU or GPU. If you find you are CPU bound, you can discover which actions are causing your CPU bottlenecks in Platform Analyzer or dig in even further by using Intel’s VTune analyzer.
For cases when you are GPU bound, you can use Frame Analyzer to discover what is taking up your GPU time. You can inspect your frame one API call at a time and run experiments to figure out which graphics calls are causing performance issues.
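To make the CPU-versus-GPU decision concrete, here is a minimal Python sketch of the kind of comparison you would make by eye in a trace. It is not GPA code and does not use any GPA API: the `FrameSample` type, the `classify_bottleneck` function, and the 10% margin are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FrameSample:
    """Hypothetical per-frame timings, in milliseconds, read off a trace."""
    cpu_busy_ms: float
    gpu_busy_ms: float


def classify_bottleneck(samples, margin=0.10):
    """Return 'cpu', 'gpu', or 'balanced' for a list of FrameSample.

    A side is reported as the limiter only if its total busy time exceeds
    the other side's by more than `margin` (an assumed 10% by default).
    """
    cpu_total = sum(s.cpu_busy_ms for s in samples)
    gpu_total = sum(s.gpu_busy_ms for s in samples)
    if cpu_total > gpu_total * (1 + margin):
        return "cpu"
    if gpu_total > cpu_total * (1 + margin):
        return "gpu"
    return "balanced"
```

A "cpu" result would send you toward Platform Analyzer or VTune, and a "gpu" result toward Frame Analyzer. Real traces carry far more detail than two numbers per frame, so treat this as a way of thinking about the split rather than a measurement method.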
This all sounds great, but how easy is GPA to use? It turns out, very easy! Analyzing your desktop, laptop, or Android applications requires NO code changes to your application at all. No recompiles, no filling your code with instrumentation macros – nothing. For analyzing Android applications, you don’t even have to root your Android device. You also don’t even have to have your Android device directly connected – you can do it over the network with ADB!
So now that we’ve seen the overall workflow, let’s look at each tool individually and how it can help you find performance bottlenecks.
As we mentioned before, you can use the Monitor application to launch your app. Simply point Monitor at your application, launch it, and examine the performance. Monitor has its own set of custom configuration features that allow you to handle even complex app starting mechanisms such as launchers like Steam. Just launch your app and collect frame captures or traces, turn overrides on or off, and watch the performance in the graphs.
Another powerful feature of Monitor is the ability to set triggered captures. Let’s say you are getting random stutters in your game. On games that are running at 30, 60, or more fps, it is almost impossible to trigger the capture of a problematic frame by hand. Instead, you can tell Monitor to capture the frame automatically if the FPS drops below a certain threshold. You can set a wide variety of system triggers based on CPU and GPU data. This can be an invaluable tool for capturing hard-to-reproduce issues.
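The logic behind an FPS trigger is simple enough to sketch in a few lines of Python. This only illustrates the idea and is not how Monitor is implemented or configured: the function name `frames_to_capture`, its parameters, and the 30 fps default are all hypothetical.

```python
def frames_to_capture(frame_times_ms, fps_threshold=30.0):
    """Return indices of frames whose instantaneous FPS is below the threshold.

    frame_times_ms holds one frame duration per entry, in milliseconds.
    """
    triggered = []
    for index, frame_ms in enumerate(frame_times_ms):
        if frame_ms <= 0:
            continue  # skip invalid samples rather than divide by zero
        fps = 1000.0 / frame_ms
        if fps < fps_threshold:
            triggered.append(index)
    return triggered
```

For example, `frames_to_capture([16.7, 16.6, 50.0, 16.7, 40.0])` flags the third and fifth frames, which are exactly the stutters you would never catch by hand.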
As we mentioned earlier, System Analyzer is one of the first applications you’ll likely use. It allows you to connect to your application and gather real-time information. You can view highly detailed CPU and GPU activity, power metrics, and a host of other platform data – all in realtime. The realtime graphs help you quickly see not only if you are CPU or GPU bound, but if there are any activity spikes. If you have stuttering or lag issues with certain actions in your application, these graphs can help you find if it was CPU, GPU, or some other issue.
The realtime power analysis is particularly important for mobile platforms. Our biggest power-hogging activities are sometimes not what we expect. Using these tools with developers, we found that some of the biggest power hogs on mobile devices were the menu screens. Many games run with no FPS limit. Because the title screen and menu screens are very easy to render, they often ran at hundreds of frames per second. Unfortunately, all this flipping is very power intensive. Because the menu screens only appear at certain times that aren’t usually investigated, a general analysis never caught these spikes. But they were key battery-chewing parts of the application, since main menu and pause screens were often displayed when a game was left idle.
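One common remedy for runaway menu screens is a simple frame cap. The Python sketch below shows the idea. It is an assumption on our part, not something GPA does for you, and the names `sleep_needed` and `run_menu_loop` and the 30 fps default are hypothetical.

```python
import time


def sleep_needed(frame_elapsed_s, target_fps):
    """Seconds to wait so a frame lasts at least 1/target_fps seconds."""
    frame_budget_s = 1.0 / target_fps
    return max(0.0, frame_budget_s - frame_elapsed_s)


def run_menu_loop(render_menu, frames, target_fps=30,
                  clock=time.perf_counter, sleep=time.sleep):
    """Render `frames` menu frames, never faster than target_fps.

    clock and sleep are injectable so the loop can be tested without waiting.
    """
    for _ in range(frames):
        start = clock()
        render_menu()
        # Idle for the rest of the frame budget instead of rendering again.
        sleep(sleep_needed(clock() - start, target_fps))
```

A menu that renders in 2 ms then spends roughly 31 ms of every frame idle instead of flipping hundreds of times per second. On a real device you would normally tie this to vsync or the platform's frame pacing rather than a sleep call.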
Here’s an example of some of the realtime metrics you can look at, but there are many more. Our CPU and GPU metrics are very high precision and based on internal hardware counters. Analyzing your application on an Intel CPU or GPU gives you access to these high quality metrics.
Power metrics are great for Android development – but are also very important for laptops and ultrabooks. As our applications and games are increasingly run on a wide range of mobile devices, power is an important concern for almost any developer.
Not only can you monitor what your application or game is doing in realtime, but there are also some really powerful tools in System Analyzer. You not only get realtime system-wide information – but on Intel platforms – the tool also allows you to limit and override your CPU frequency. This lets you test your application on a range of processors without having to buy a whole test bed of different speed machines.
Another powerful feature is the ability to run realtime experiments. While you are running your game or app, you can turn any of these state overrides on or off and quickly see what effect they have on frame rate, power, or CPU/GPU load. This allows you to very quickly see if you are draw call bound, texture sampling bound, or hitting one of many other common problems.
Last but not least, you can actually pause and single-frame step through your game in realtime. This can be invaluable for catching visual artifacts and isolating individual frame issues.
SA is great, but what if we want to analyze a full-screen app? You can certainly monitor your application remotely, but you also get a subset of SA controls in your own game. The Monitor app allows you to launch your full-screen game and it automatically places an overlay in your game to perform many of the same operations as SA. You can see the HUD overlay in blue in the upper left of this app. This is great if you need to test full-screen modes, as it gives you access to almost all the same controls and overrides as System Analyzer without having to run a separate app or monitor remotely from a separate machine.
It should be noted that because of the tiny form factor, we don’t support HUD overlays on Android targets.
Here are some of the same kinds of realtime metrics that our HUD overlay can show you. There are not quite as many as in System Analyzer, but they still give you a very good idea of the overall performance characteristics of your app without having to run a separate analysis tool.
Platform Analyzer is our next tool in the suite. It is used for monitoring both CPU and GPU activity at the same time. This tool helps you figure out how efficiently you’re using all your computing resources. It is often the case early in development that we have unbalanced workloads. Sometimes our CPU is completely busy, but our GPU is sitting idle – or vice versa. This tool quickly shows you that information.
Even more importantly, you can easily correlate CPU and GPU activity to see if there are any badly performing interactions. Again, sometimes we issue CPU or GPU commands that can stall our computing resources. This tool helps you figure out if those stalls are causing performance issues.
Platform Analyzer gives you an amazing amount of information, too much to go over now, but here’s a little overview of what you will see in an average trace capture. Here we see our GPU and thread usages over the entire timeline of the capture. You’ll note that the GPU and CPU times are aligned so you can see how they are interacting. These flows show you exactly what was happening on both your CPU and GPU at the time you captured your trace.
In the case of this frame, we see there is a GPU bubble that shows an area of poor GPU utilization. Based on the color of the CPU and GPU frames, we can easily see that the CPU calculations for the frame are not complete prior to the start of the GPU computations, and this is likely starving the GPU pipeline. Using the threading execution view, we notice that only 1 core of the CPU is being used most of the time. Time to see if we can add some multi-threading.
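If you think of the GPU row of the timeline as a list of busy intervals, a bubble is just a gap between them. Here is a small Python sketch of that idea. It assumes you have the intervals as plain (start, end) pairs in milliseconds, which is our own made-up representation and not a Platform Analyzer export format; `find_gpu_bubbles` and its 1 ms default are hypothetical.

```python
def find_gpu_bubbles(gpu_intervals, min_gap_ms=1.0):
    """Return (start, end) idle gaps of at least min_gap_ms between GPU work.

    gpu_intervals is a list of (start_ms, end_ms) tuples; the list may be
    unsorted and the intervals may overlap.
    """
    bubbles = []
    busy_until = None
    for start, end in sorted(gpu_intervals):
        if busy_until is not None and start - busy_until >= min_gap_ms:
            bubbles.append((busy_until, start))
        # Track the latest point at which the GPU is known to be busy.
        busy_until = end if busy_until is None else max(busy_until, end)
    return bubbles
```

The visual view in the tool is still the faster way to spot these, but the sketch shows what you are actually looking for: stretches where the GPU has nothing queued because the CPU has not finished preparing the frame.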
And now on to the finale – and my favorite tool in the suite. Frame Analyzer. FA runs on both DirectX frames and OpenGL|ES frames.
If you discover you are GPU bound this is where you can do a deep dive into any frame you capture. Not only can this tool tell you where your performance bottlenecks are, but it is also an amazing diagnostic and debugging tool. You can inspect any texture, render target, geometry, shader, buffer or graphics API call in your frame and run experiments to see if changing any of them helps or hurts your performance.
This is what FA looks like in DX. DX and OpenGL have a slightly different workflow, but the elements will be very similar and the functionality almost identical. As you can see, there is a lot in the tool. Let’s break this up and do a little deeper dive into the various things you can do with FA.
At the top you’ll see one of the most important features – the bar chart that shows each erg and its performance. We use this word ‘erg’ to describe anything the app issues to the graphics API that causes the GPU to do work. As developers, we know that some DirectX or OpenGL calls don’t actually result in GPU work. For example, just setting a blend state often does not trigger actual GPU work, but a draw or present call does. The bar graph shows you all the calls that actually result in GPU work, and how long they took.
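To illustrate the erg idea, here is a toy Python sketch that filters a recorded call list down to the calls that make the GPU do work and then ranks them by cost. The call names are simplified stand-ins, not real DirectX or OpenGL signatures, and the `GPU_WORK_CALLS` set and both function names are assumptions for the example.

```python
# Hypothetical set of call names that make the GPU do work.
# A real graphics API has many more, and the exact set depends on the API.
GPU_WORK_CALLS = {"Draw", "DrawIndexed", "Dispatch", "Clear", "Present"}


def extract_ergs(api_calls):
    """Keep only (name, gpu_ms) entries whose call name triggers GPU work."""
    return [(name, gpu_ms) for name, gpu_ms in api_calls
            if name in GPU_WORK_CALLS]


def slowest_ergs(api_calls, count=3):
    """Return the `count` most expensive ergs, slowest first."""
    ergs = extract_ergs(api_calls)
    return sorted(ergs, key=lambda erg: erg[1], reverse=True)[:count]
```

State-setting calls such as a blend state change drop out of the list, and what is left corresponds to the bars in the chart, with the tallest bars being the first place to look.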
If you select an individual erg, or a range of ergs, you see the selected draw calls light up in magenta in your render target view below. This all happens in realtime. If you select new ergs, then the render target immediately highlights the new calls. You can inspect each erg and discover exactly which command is being issued.
Not only that, but if you look at the right, you’ll see the Frame Overview and Details tabs, which list the metrics collected for the selected draw calls. You’ll get all kinds of information on how the CPU and GPU were performing when issuing those commands. Some of the metrics are GPU duration, the number of milliseconds each individual pixel or vertex shader took, how many pixels or vertices the shader processed, shader idle and stall times, and a whole host of other system metrics. With these two elements, you can track down a great number of your performance issues.
Textures are some of the largest resources we use in graphics applications and games. Sometimes our performance bottlenecks occur because we are using very large textures, accidentally leave the wrong textures bound, or use texture formats that are poorly optimized for the target hardware. You can select your ergs and inspect all the textures that were bound for those draw calls.
Another very useful feature is the ability to analyze the geometry of your rendered objects. Again, sometimes we have performance issues because our models are not optimized or we are rendering objects we shouldn’t be. You can also check mesh parameters such as topology and formats, and visualize the meshes in realtime to see if there are issues.
A tremendously powerful feature of the tool is the ability to inspect, edit, and even replace your shaders. Obviously you can’t change the bindings for shader inputs, but you can edit your shader and immediately see the results. If shader source is available, you can edit in HLSL. If your shader source is not available, you can even modify the shader assembly. You can also edit the shader and replace it with another shader file. You can also inspect the constants and uniforms you are sending to your shader. You get to see the results visually and in the metrics in realtime - all without needing to recompile or re-run your application.
If you discover issues, it would be nice to test them right away. FA, just like SA, allows you to run experiments and test for common problems. (point at them) You can turn on/off individual ergs, replace textures with very simple ones to test sampling bottlenecks, and others. Not only that – but the frame is re-run and re-sampled to give you the difference in performance immediately.
The version of FA we have been showing you is for DirectX. But we also have FA for OpenGL ES. The layout is slightly different, but you’ll see that almost all the same features are available. Just like the new Visual Studio, you can toggle between light and dark color schemes.
Again, we have the ability to perform experiments on our OpenGL ES Android frames just like on the desktop. This allows you to test out a variety of common problems without having to recompile and re-run your app on your target device. A great time-saving technique. There is no need to rebuild your game, transport it to your Android device, and re-collect data from it. Just push the experiment button and FA for OpenGL|ES will do all the work of re-running with the new settings automatically.
You can also examine all your textures – just like on the desktop. Validating your texture formats and sizes can be especially important on mobile OpenGL|ES devices since they can be a common bottleneck. It is often the case that certain texture formats do not perform well on some mobile hardware platforms. If you notice that all your slow draw calls are using the same texture format, you might try changing that format to see if it helps your performance issue. Again, you can inspect all the parameters of your texture for performance or logic errors.
Once again, one of the most powerful and time-saving features is shader editing. The OpenGL version of FA supports this as well. It allows you to view and make realtime edits to your vertex and pixel shaders. After each edit, the frame is re-run and the new metrics and render target output are updated. This allows you to make changes in your shaders and see the results updated without having to recompile and re-run your application on your device.
So that is a quick overview of the GPA toolkit and its tools. Obviously there are many more great features, so we’ll both be available between sessions for questions. You can also go to our website and download the tools today.