Presenter notes: https://www.dropbox.com/s/lq6oxuw1s3bhoun/Advanced%20Linux%20Game%20Programming%20%E2%80%93%20Presenter%20Notes.pdf
Ever since the advent of SteamOS, interest in game development for Linux has seen an increase. This lecture aims to address some more advanced issues encountered by programmers on this platform, beyond the very basic Linux setup, and drawing from over a year and two and a half games of experience in the subject. The areas discussed will be:
• Executable build improvements
• Crash handling and reporting
• Memory debugging
• OpenGL instrumentation and debugging
• Various caveats, tips and tricks.
2. Nordic Games GmbH
• Started in 2011 as a sister company to Nordic Games Publishing (We Sing)
• Base IP acquired from JoWooD and DreamCatcher (SpellForce, The Guild,
Aquanox, Painkiller)
• Initially focusing on smaller, niche games
• Acquired THQ IPs in 2013 (Darksiders, Titan Quest, Red Faction, MX vs. ATV)
• Now shifting towards being a production company with internal devs
• Since fall 2013: internal studio in Munich, Germany (Grimlore Games)
2
3. Leszek Godlewski
Programmer, Nordic Games
• Ports
● Painkiller Hell & Damnation (The Farm 51)
● Deadfall Adventures (The Farm 51)
● Darksiders (Nordic Games)
• Formerly generalist programmer on PKHD & DA at TF51
3
4. Objective of this talk
Your game engine on Linux, before porting:
4
Missing!
5. Objective of this talk (cont.)
Your first “working” Linux port:
5
Oops. Bat-Signal!
6. Objective of this talk (cont.)
Where I want to try helping you get to:
6
11. Intended takeaway
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
11
12. Intended takeaway Agenda
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
12
13. Build systems
What I had initially with UE3:
● Copy/paste of the Mac OS X toolchain
● It worked, but...
● Slow
● Huge binaries because of debug symbols
● Problematic linking of circular dependencies
13
14. Build systems (cont.)
● 32-bit binaries required for
feature/hardware parity with Windows
● Original solution: a chroot jail with an
entire 32-bit Ubuntu system just for
building
14
15. Cross-compiling for 32/64-bit
● gcc -m32/-m64 is not enough!
● Only sets target code generation
● Not headers & libraries (CRT, OpenMP, libgcc etc.)
● Fixed by installing gcc-multilib
● Dependency package for non-default architectures (i.e.
i386 on an amd64 system and vice versa)
15
16. Clang (ad nauseam)
● Clang is faster
● gcc: 3m47s
● Clang: 3m05s
● More benchmarks at Phoronix [LARABEL13]
● Clang has different diagnostics than gcc
16
17. Clang (cont.)
● Preprocessor macro compatibility
● Declares __GNUC__ etc.
● Command line compatibility
● Easily switch back & forth between Clang & gcc
17
18. Clang – caveats
● C++ object files may be incompatible with
gcc & fail to link (need full rebuilds)
● Clang is not as mature as gcc
● Occasionally has generated faulty code for me
(YMMV)
18
19. Clang – caveats (cont.)
● Slight inconsistencies in C++ standard
strictness
● Templates
● Anonymous structs/unions
● May need to add this-> in some places
● May need to name some anonymous types
19
20. So: Clang or gcc?
Both:
● Clang – quick iterations during
development
● gcc – final shipping binaries
20
21. Linking – GNU ld
● Default linker on Linux
● Ancient
● Single-threaded
● Requires specification of libraries in the order
of reverse dependency...
● We are not doomed to use it!
21
22. Linking – GNU gold
● Multi-threaded linker for ELF binaries
● ld: 18s
● gold: 5s
● Developed at Google, now officially part of
GNU binutils
22
23. Linking – GNU gold (cont.)
● Drop-in replacement for ld
● May need an additional parameter or toolchain
setup
● clang++ -B/usr/lib/gold-ld ...
● g++ -fuse-ld=gold ...
● Still needs libs in the order of reverse
dependency...
23
24. Linking – reverse dependency
● Major headache/game-breaker with
circular dependencies
● ”Proper” fix: re-specify the same libraries
over and over again
● gcc app.o -lA -lB -lA
24
28. Linking – reverse dep. (cont.)
28
app A B A
Just the missing
symbols
29. Linking – library groups
● Declare library groups instead
● Wrap library list with --start-group, --end-
group
● Shorthand: -(, -)
● g++ foo.obj -Wl,-( -lA -lB -Wl,-)
● Results in exhaustive search for symbols
29
30. Linking – library groups (cont.)
● Actually used for non-library objects (TUs)
● Caveat: the exhaustive search!
● Manual warns of possible performance hit
● Not observed here, but keep that in mind!
30
31. Running the binary in debugger
inequation@spearhead:~/projects/largebinary$ gdb -–
silent largebinary
Reading symbols from /home/inequation/projects/larg
ebinary/largebinary...
[zzz... several minutes later...]
done.
(gdb)
31
32. Caching the gdb-index
● Large codebases generate heavy debug
symbols (hundreds of MBs)
● GDB does symbol indexing at every
single startup �
● Massive waste of time!
32
33. Caching the gdb-index (cont.)
● Solution: fold indexing into the build
process
● Old linkers: as described in [GNU01]
● New linkers (i.e. gold): --gdb-index
● May need to forward from compiler driver:
-Wl,--gdb-index
33
34. Agenda
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
34
35. Signal handlers
● Unix signals are async notifications
● Sources can be:
● the process itself
● another process
● user
● kernel
35
36. Signal handlers (cont.)
● A lot like interrupts
● Jump to handler upon first non-atomic op
36
Normal program
flow
Normal flow
resumes
Signal
handler
execution
SIGNAL
37. Signal handlers (cont.)
● System installs a default handler
● Usually terminates and/or dumps core
● Core ≈ minidump in Windows parlance, but entire
mapped address range is dumped (truncated to
RLIMIT_CORE bytes)
● See signal(7) for default actions
37
38. Signal handlers (cont.)
● Can (should!) specify custom handlers
● Get/set handlers via sigaction(2)
● void handler(int, siginfo_t *, void *);
● Needs SA_SIGINFO flag in sigaction() call
● Extensively covered in [BENYOSSEF08]
38
39. Interesting siginfo_t fields
● si_code – reason for sending the signal
● Examples: signal source, FP over/underflow,
memory permissions, unmapped address
● si_addr – memory location (if relevant)
● SIGILL, SIGFPE, SIGSEGV, SIGBUS and
SIGTRAP
39
41. Signal handling caveats
● Prone to race conditions
● Signals may be nested
41
Normal
flow
Signal
handler
SIGNAL
Signal
handler
SIGNAL SIGNAL
42. Signal handling caveats (cont.)
● Prone to race conditions
● Can't share locks with the main program
42
Lock
mutex
Lock mutex
Deadlock ☹
Signal
handler
SIGNAL
Normal
flow
43. Signal handling caveats (cont.)
● Prone to race conditions
● Can't call async-unsafe/non-reentrant
functions
● See signal(7) for a list of safe ones
● Notable functions not on the list:
● printf() and friends (formatted output)
● malloc() and free()
43
44. Signal handling caveats (cont.)
● Not safe to allocate or free heap memory
44
STOMPED?
STOMPED?
STOMPED?
STOMPED?
STOMPED?
STOMPED?
STOMPED?
STOMPED?
STOMPED?
STOMPED?
STOMPED?
Source: [LEA01]
45. Signal handling caveats (cont.)
● Custom handlers do not dump core
● At handler installation time:
● Raise RLIMIT_CORE to desired core size
● Inside handler, after custom logging:
● Restore default handler using signal(2) or
sigaction(2)
● raise(signum);
45
46. Safe stack walking
● glibc provides backtrace(3) and friends
● Symbols are read from the dynamic
symbol table
● Pass -rdynamic at compile-time to populate
46
47. Safe stack walking (cont.)
● backtrace_symbols() internally calls
malloc()
● Not safe... ☹
● Still, can get away with it most of the time
(YMMV)
47
48. Example “proper” solution
● Fork a watchdog process in main()
● Communicate over a FIFO pipe
● In signal handler:
● Collect & send information down the pipe
● backtrace_symbols_fd() down the pipe
● Demo code: is.gd/GDCE14Linux
48
49. Agenda
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
49
50. Is this even related to porting?
● Yes! Portability bugs easily overlooked
● Hardcoded struct sizes/offsets
● OpenGL buffers
● Incorrect binary packing/unpacking
● “How did we/they manage to ship that?!”
50
51. What is Valgrind?
● Framework for dynamic, runtime analysis
● Dynamic recompilation
● machine code IR tool machine code→ → →
● Performance typically at 25-20% of unmodified
code
● Worse if heavily threaded – execution is serialized
51
52. What is Valgrind? (cont.)
● Many tools in it:
● Memory error detectors (Memcheck,
SGcheck)
● Cache profilers (Cachegrind, Callgrind)
● Thread error detectors (Helgrind, DRD)
● Heap profilers (Massif, DHAT)
52
53. Memcheck basics
● Basic usage extremely simple
● …as long as you use the vanilla libc malloc()
● valgrind ./app
● Will probably report a ton of errors on the
first run!
● Again: “How did they manage to ship that?!”
53
54. Memcheck basics (cont.)
● Many false positives, esp. in 3rd
parties
● Xlib, NVIDIA driver
● Can suppress them via suppress files
● Call Valgrind with --gen-suppressions=yes to
generate suppression definitions
● Be careful with that! Can let OpenGL bugs slip!
54
55. Contrived example
#include <stdlib.h>
int main(int argc, char *argv[]) {
int foo, *ptr1 = &foo;
int *ptr2 = malloc(sizeof(int));
if (*ptr1)
ptr2[1] = 0xabad1dea;
else
ptr2[1] = 0x15bad700;
ptr2[0] = ptr2[2];
return *ptr1;
}
55
Demo code:
is.gd/GDCE14Linux
56. Valgrind output for such
==8925== Conditional jump or move depends on
uninitialised value(s)
==8925== Invalid write of size 4
==8925== Invalid read of size 4
==8925== Syscall param exit_group(status)
contains uninitialised byte(s)
==8925== LEAK SUMMARY:
==8925== definitely lost: 4 bytes in 1 blocks
56
57. What about custom allocators?
● Custom memory pool & allocation algo
● Valgrind only “sees” mmap()/munmap() of
multiples of entire memory pages
● All access within those pages – now valid!
● How to track errors?
57
58. Client requests
● Allow annotation of custom allocators
● ~20 C macros defined in valgrind.h
● Common and per-tool requests exist
● Can be cut out with -DNVALGRIND
● Detailed description in [VALGRIND01]
58
59. Example: Instrumenting dlmalloc
● 2.8.4 instrumentation from [CRYSTAL01]
● Demo code: is.gd/GDCE14Linux
● Compile the sample with -DDLMALLOC
● Similar results to libc malloc()
59
60. Other uses of client requests
● Pointer validation
● Is address mapped? Is it defined?
● Mid-session leak checks
● Level transitions
60
61. Other uses of client req. (cont.)
● Poisoning memory regions
● Ensuring signal handlers don't touch the heap
● Ensuring geometry buffers aren't read on CPU
61Source: [LEA01]
62. Debugging inside Valgrind
● A gdbserver for “remote” debugging
● SIGTRAP (breakpoint) on every error
● Unlimited memory watchpoints!
● Data breakpoints in Visual Studio parlance
● Cf. 4 single-word hardware debug registers on
x86
62
64. Agenda
● Build system improvements
● Signal handlers
● Memory debugging with Valgrind
● OpenGL debugging techniques
64
65. Ye Olde Way
● Call glGetError() after each OpenGL call
● Get 1 of 8 (sic!) error codes
● Look up the call in the manual
● See what this particular error means in
this particular context…
65
66. Ye Olde Way (cont.)
● …Then check what was actually the case
● 6 possible reasons for GL_INVALID_VALUE in
glTexImage*() alone! See [OPENGL01]
● Usually: attach a debugger, replay the
scenario…
● This sucks!
66
67. Ye Olde Way (cont.)
● …Then check what was actually the case
● 6 possible reasons for GL_INVALID_VALUE in
glTexImage*() alone! See [OPENGL01]
● Usually: attach a debugger, replay the
scenario…
● This sucks! used to suck ☺
67
68. Debug callback
● Never call glGetError() again!
● Much more detailed information
● Incl. performance tips from the driver
● Good to check what different drivers say
● May not work without a debug OpenGL
context (GLX_CONTEXT_DEBUG_BIT_ARB)
68
71. Debug callback (cont.)
● Verbosity can be controlled (filtering)
● glDebugMessageControl[ARB]()
● [OPENGL02][OPENGL03]
● Turn to 11 (GL_DONT_CARE) for valuable
perf information!
● Memory type for buffers, unused mip levels…
71
72. API call tracing
● Record a trace of the run of the application
● Replay and review the trace
● Look up OpenGL state at a particular call
● Inspect state variables, resources and objects:
textures, shaders, buffers...
● apitrace or VOGL
72
79. Annotation caveats
● Multi-threaded grouping may break
hierarchy
● glDebugMessageInsert() calls the debug
callback, polluting error streams
● Workaround: drop if source ==
GL_DEBUG_SOURCE_APPLICATION
79
80. Example 1: PIX events emulation
#define D3DPERF_BeginEvent(colour, name)
if (GLEW_KHR_debug && threadOwnsDevice())
glPushDebugGroup(GL_DEBUG_SOURCE_APPLICATION,
(GLuint)colour, -1, name)
#define D3DPERF_EndEvent()
if (GLEW_KHR_debug && threadOwnsDevice())
glPopDebugGroup()
80
81. Example 2: Game tech demo
● University assignment
from 2009 ☺
● Annotated OpenGL 1.4
● Demo code:
is.gd/GDCE14Linux
81
82. Takeaway
● gcc-multilib is the prerequisite for 32/64-
bit cross-compilation
● Switching back and forth between Clang
and gcc is easy and useful
● Link times can be greatly improved by
using gold
82
83. Takeaway (cont.)
● Caching the gdb-index improves
debugging experience
● Crash handling is easy to do, tricky to get
right
83
84. Takeaway (cont.)
● Valgrind is an enormous aid in memory
debugging
● Even when employing custom allocators
● OpenGL debugging experience can be
vastly improved using some extensions
84
86. Thank you!
Further Nordic Games information:
K www.nordicgames.at
Development information:
K www.grimloregames.com
86
87. References
● LARABEL13 – Larabel, M. “Clang 3.4 Performance Very Strong Against GCC 4.9” [link]
● GNU01 – “Index Files Speed Up GDB” [link]
● GNU02 – “Options for Debugging Your Program or GCC” [link]
● BENYOSSEF08 – Ben-Yossef, G. “Crash N' Burn: Writing Linux application fault handlers” [link
]
● LEA01 – Lea, D. “A Memory Allocator” [link]
● VALGRIND01 – “The Client Request mechanism” [link]
● CRYSTAL01 – “Crystal Space 3D SDK” [link]
● OPENGL01 – “glTexImage2D” [link]
● OPENGL02 – “ARB_debug_output” [link]
● OPENGL03 – “KHR_debug” [link]
● XDG01 – “XDG Base Directory Specification” [link]
● <page>(<section>), e.g. sigaction(2) – “Linux Programmer's Manual”; to view, type man
<section> <page> into a terminal or a web search engine
87
88. Special thanks
Fabian Giesen
Katarzyna Griksa
Damian Kobiałka
Jetro Lauha
Eric Lengyel
Krzysztof Narkowicz
Reinhard Pollice
Bartłomiej Wroński
Kacper Ząber
88
89. Bonus slides!
● OpenGL resource leak checking
● Intel i965 driver vs stack
● Locating user data according to
FreeDesktop.org guidelines
● Thread priorities in Linux
● Additional/new debug features
89
FREE!!!FREE!!!
90. OpenGL resource leak checking
Courtesy of Eric Lengyel & Fabian Giesen
static void check_for_leaks()
{
GLuint max_id = 10000; // better idea would be to keep track of assigned names.
GLuint id;
// if brute force doesn't work, you're not applying it hard enough
for ( id = 1 ; id <= max_id ; id++ )
{
#define CHECK( type ) if ( glIs##type( id ) ) fprintf( stderr, "GLX: leaked " #type " handle 0x%xn", (unsigned int) id )
CHECK( Texture );
CHECK( Buffer );
CHECK( Framebuffer );
CHECK( Renderbuffer );
CHECK( VertexArray );
CHECK( Shader );
CHECK( Program );
CHECK( ProgramPipeline );
#undef CHECK
}
}
90
91. Intel i965 vs stack
● Been chasing a segfault on a call instruction down
_mesa_Clear() (glClear())
● Region of code copy/pasted from D3D renderer
● Address mapped, so not an invalid jump...
● Only 16 function frames – surely this can't be a stack
overflow?
91
92. Intel i965 vs stack (cont.)
● Oh no, wait:
● Check ESP against /proc/[pid]/maps
● Yup, encroaching on unmapped address space
● Moral: cut your render some stack slack
(160+ kB), or Mesa will blow it up with
locals (e.g. in clear shader generation)
92
93. Locating user data
● There is a spec for that – see [XDG01]
● Savegames, screenshots, options etc.:
● $XDG_CONFIG_HOME or ~/.config/<app>
● Caches of all kinds:
● $XDG_CACHE_HOME or ~/.cache/<app>
● Per-user persistent data (e.g. DLC):
● $XDG_DATA_HOME or ~/.local/share/<app>
93
94. Locating user data (cont.)
● <app> subdirectory currently unregulated
● De-facto standard: simplified or “Unix name”
● Lowercase, “safe” ASCII characters, e.g. blender
● When asked, XDG people suggest rev-DNS
● com.company.appname
94
95. Thread priorities in Linux
● Priority elevation requires root permissions �
● No user will ever grant you root (scary!)
● Reason: DoS protection in servers (probably)
● Priority can be tweaked with nice()
● Think “how nice the process is to others”
● Being nice to everyone will starve your process
● Niceness can be negative (but only with root)
95
96. Thread priorities in Linux (cont.)
● Why not setpriority(2)?
● Also sets scheduling algorithm here be dragons→
● Priority values have different meaning per scheduler
● Still needs root
● What about capabilities(7)?
● This might actually work if your users trust you
● Demo code: is.gd/GDCE14Linux
96
97. Thread priorities in Linux (cont.)
● Don't all threads in a process share
niceness?
● They should, according to POSIX, but they
don't!
● One of the few cases where Linux is non-
compliant
97
98. Additional/new debug features
● Additional debug info: -g3
● Including #defines (macros)
● Better debugger performance [GNU02]:
● -fdebug-types-section: improved layout
● -gpubnames: new format for index
98
Notas do Editor
Hi everyone, my name is Leszek Godlewski and I&apos;m here today to talk about some of the more advanced aspects of Linux game programming.
Or, porting your game engine to Linux, to be more precise.
These notes will only supplement the slides, they are not a transcript.
I&apos;m a programmer with Nordic Games. A bit unusual role with the company since it&apos;s mostly a publisher, but I&apos;m the resident Linux guy and generalist programmer where necessary.
I&apos;ve got three ports under my belt: Painkiller Hell & Damnation and Deadfall Adventures for The Farm 51, and Darksiders for Nordic.
I was also a generalist programmer at TF51 on the original teams for those two games.
Who can guess what these are, apart from the obvious ones? :)
Answers, clockwise from top left: GDB, Valgrind, apitrace (doh), Nsight (doh!), GPU PerfStudio, VOGL (doh...).
The focus of this talk lies in these 4 areas. I&apos;d like to share my findings about optimizing native code builds, signal (crash) handling, jump on to memory debugging stuff using the Valgrind tool, and finish off with OpenGL debugging techniques. There are some bonus slides if there is still time, too.
Actually, let&apos;s adopt this as the agenda. :) The upcoming element will be in bold.
This is based on the two UE3 titles, but also applied to Darksiders to some extent (there was no Mac toolchain, had to make one from scratch). It displayed the same problems.
As much as I would love 32-bit to die, I had to support it to achieve full parity with Windows.
A chroot is a bit like a VM – a separate, sandboxed view of the entire system environment (i.e. file system), but running on the same kernel.
Host system is Debian amd64.
Fortunately, I could get rid of that chroot jail thanks to gcc-multilib.
“Did you mean x?” and colours for the win.
C object files are unaffected.
Clang seems to have slightly different C++ symbol name generation rules (pre-mangling).
Officially in binutils → ships as an optional part of default toolchain on all modern distros.
And not only circular – sometimes it&apos;s tricky to even get regular dependencies in the right order.
Circular actually happened to me with the PhysX 2.8.4 loader. That was even worse because even though it seemed to have linked, the dynamic linker at runtime still failed to resolve everything...
Here&apos;s how this reverse dependency thing works: you first declare the dependent object, then its dependency. Starting the construction of a building from the roof, if you will.
First, an “app” object file is linked. It has unsatisfied imports for symbols from library A.
Library A&apos;s exports satisfy the imports of app, but they import symbols from B.
Library B&apos;s exports match library A&apos;s imports, but B now has unsatisfied imports of A&apos;s symbols.
By re-specifying A, the linker satisfies the A imports in library B.
This is your typical GDB session start at the beginning of porting.
Old linkers need some shell script magic – run GDB in batch mode etc.
Here&apos;s what I had been using before discovering --gdb-index:
@echo Generating gdb-index in $(OUTPUT_PATH)/gdb-index/$(notdir $@).gdb-index
@mkdir -p $(OUTPUT_PATH)/gdb-index
@$(GDB) -batch -ex &quot;save gdb-index $(OUTPUT_PATH)/gdb-index&quot; $@
@echo Merging gdb-index into $@
@$(OBJCOPY) --add-section .gdb_index=$(OUTPUT_PATH)/gdb-index/$(notdir $@).gdb-index --set-section-flags .gdb_index=readonly $@ $@
A program can possibly resume normal execution after a signal.
Signal handler can do different things by default: terminate, dump core or stop (suspend).
I&apos;ll talk about RLIMIT_CORE a bit later.
It&apos;s actually preferable to perform a clean exit (instead of aborting) in response to SIGTERM and SIGQUIT.
These fields are a good reason for using the extended information handlers (with siginfo_t), as opposed to the traditional signal(2) calls with void sighandler(int) handlers.
si_code is an enumeration, not a bit field. Includes reasons general (sent by user, kernel, async I/O operation etc.) and specific to the signal (e.g. whether SIGILL was caused by invalid opcode, operand or addressing mode etc.).
Not all signals are interesting.
SIGTRAP is a programmatical breakpoint (you can also raise(SIGTRAP) from code to have the debugger stop). It&apos;s up to you to decide how to handle that.
SIGCHLD is only of note if you plan on spawning sub-processes. Similarly for SIGPIPE, if you&apos;re interested in notifications for the other end breaking a FIFO pipe.
In reality, this is slightly more complex than that because of masking and deferring of signals. See sigaction(2) for details. Worth noting that by default, further signalling of the same signal type will be deferred until the return of the handler, unless you specify SA_NODEFER or explicitly unmask that signal.
Ben-Yossef recommends using a spinlock, but warns of deadlocking if contending against a thread with higher priority. In that case it&apos;s better to just try locking and sleep with pselect(), but that&apos;s a overkill in most cases.
What if your code is interrupted right after grabbing the lock?
Unless your mutex is re-entrant and the lock happened on the signalled thread, of course.
Speaking of re-entrant…
Allocator&apos;s data structures may have been damaged in the fault.
When using sigaction(2) to install your handler, you will need a stash of struct sigaction in static memory and populate it when installing your handler.
You might not be able to bump RLIMIT_CORE due to permissions. You may also want to limit it, as core dumps are often huge because they include a dump of all the mapped memory tranges. However, smaller core size → truncation → not all structures/ranges can be dumped → dump may actually be useless (stack not in the dumped memory range). Important to get a stack trace in the handler anyway.
The raise(3) call may not work unless you unmask the signal. This is demonstrated in the code example.
Also worth noting that apps are not really in control of core dump filename (/proc/sys/kernel/core_pattern or sysctl).
The backtrace() function is easy to use, just pass it a static array of void pointers and its size.
Personally, I haven&apos;t had any problems with just calling backtrace_symbols() and calling it a day, but I imagine it may be a bad idea in memory-constrained situations or when nested signalling happens.
Pipe communication protocol up to the developer.
One such case was not setting the right OpenGL pixel packing/unpacking rules when uploading textures, resulting in reading beyond the buffer.
Memcheck doesn&apos;t watch the stack or static variables, only the heap. SGcheck – the reverse.
Valgrind will also watch out for allocations made by the driver.
This very bad code makes several mistakes:
Branch on uninitialized variable
Overwrite memory past malloced area
Read past malloced area
Make a syscall with uninitialized memory as a parameter (return code)
Leak memory allocated to ptr2.
Actual demo code in repo also has an example for use-after-free.
Memcheck easily identifies all the bugs. Actual output includes call stacks, both for when the errors occur and for the allocation/freeing of the relevant memory chunk.
Running with --leak-check=full also pinpoints the source of the leak.
Valgrind has no way of knowing about individual allocations and frees from the pool. Thus it won&apos;t protect control structures etc.
Worth noting that mmap() calls for mapping large address ranges will cause Valgrind to issue a warning (not sure about the threshold value, but a 512MB heap does trigger it).
Client requests have negligible performance impact and are no-ops if Valgrind is not present.
This instrumentation could be improved by using the MEMPOOL family of client requests instead, but it&apos;s still quite verbose and useful.
Pointer validation could be used in garbage collectors, for instance.
I&apos;m sure you&apos;ll find a ton of other uses for making sure some data stays untouched.
I will deliberately not speak of IHV tools because they are complex enough to warrant separate talks. I&apos;ll leave that to the IHV&apos;s dev relations teams. :)
No longer does, because we no longer need to do that.
The extension spec says that it&apos;s only suggested that driver implementers make it available only in debug contexts. It might just work in a regular one with your driver, but better stay on the safe side.
State of Mesa at the time of writing this, on my ThinkPad T500 with Debian amd64 testing.
You don&apos;t get just the message, but also some metadata and a user pointer.
IDs are unregulated, so drivers may use them as they see fit. They will definitely vary from vendor to vendor.
You can also filter the messages programatically with ifs.
Intel is pretty verbose by default; NVIDIA needs setting all channels to GL_DONT_CARE to spew out more performance hints.
It&apos;s a flat list. Good luck finding a faulty call in a multi-pass renderer in it.
Does not a tree view improve things?
In multi-threaded apps, look at the thread ID from which you&apos;re trying to push or pop a debug group.
Callback filtering can be achieved via glDebugMessageControl[ARB]() or programmatically. Just remember to always set source to GL_DEBUG_SOURCE_APPLICATION in your glDebugMessageInsert[ARB]() calls.
There&apos;s a little trick here – since there are no set rules about message IDs, I&apos;m casting the D3DCOLOR (32-bit pixel) to GLuint and using that as an ID.
This was extremely cool to do, because I could instantly take advantage of the existing instrumentation on the whole engine.
The reason this demo was written for such an old version of OpenGL is that it needed to run on an old mobile Intel GPU (i915 I think).
Still works nicely with KHR_debug, though.
Shout out to people who helped me (in various ways) to prepare this talk!
This originally came together as a half-joke on Twitter between those two guys.
Mind you that this would need adjustments for ES (probably going over the entire 32-bit range minus 0), as names are not guaranteed to be consecutive there, whereas desktop GL always returns the first free name.
“If brute force doesn&apos;t work, you&apos;re not applying it hard enough.”
These performance improvement features require fairly recent version of GCC and GDB (probably 4.9 and 7.0, respectively).