2013 was the year Linux finally got the attention of game developers; it was also the year my first two Linux/SteamOS ports were released. This talk covers the lessons learned from one year of porting work, from a programmer's point of view: DOs and DON'Ts, and issues both expected and unexpected.
One Year of Porting - Post-mortem of two Linux/SteamOS launches
1. One Year of Porting
Post-mortem of two Linux/SteamOS launches
Leszek Godlewski
2. Who is this guy?
Leszek Godlewski
Programmer, Nordic Games (early 2014 – now)
– Unannounced project
Freelance Programmer (Sep 2013 – early 2014)
– Linux port of Painkiller Hell & Damnation
– Linux port of Deadfall Adventures
Generalist Programmer, The Farm 51 (Mar 2010 – Aug 2013)
– Painkiller Hell & Damnation, Deadfall Adventures
3. Focus
● Not sales figures
● Not business viability
● Not game-specific bugs
● Not the Steam Controller – oops!
● Platform-specific problems
● Mistakes made & mitigation attempts
4. Agenda
● The ports
● Laying down the foundations
– Build system
– Compilers
– Linking
– Boilerplate
● Release and feedback
– User issues
– Crash handling
– GLSL shader linking
8. Facts
● Unreal Engine 3
● All major Linux distros
– SteamOS, Debian, Ubuntu, Fedora, Arch, Gentoo
● All official drivers
– NVIDIA, AMD, Intel (i965)
● Some open-source drivers
– Gallium r600, Gallium radeonsi
9. Facts
● Most UE3 middleware has Linux versions
– In our case: PhysX, FaceFX, Scaleform Gfx, lzopro, Bink...
● Introduced open-source middleware
– SDL 2.x, GLEW, Steam Runtime
● UE3's build system – Unreal Build Tool
– Handles everything make does
– Written in C#, fixed up to run in Mono on Linux
● UE3's content packaging (cooking) system
– Linux target based on Mac OSX
10. Facts
● QA department unfamiliar with Linux
– Basic training was required
● Installing & running software (including from the command line), file permissions, driver installation, gathering system information...
– Mostly reported false positives in the beginning
● Spare time project over ~13 months
– After leaving The Farm 51's employment, contracted directly by TF51 to continue the work as an outsourcer
– Occasional support from individual members of TF51 staff
● Kudos to Piotr Bąk and Wojciech Knopf!
11. Overlap
● Noticed how a lot of work was based on OSX code?
● Happens all the time
– POSIX
– OpenGL/OpenAL
(Venn diagram: Mac OS X, Linux, Mobile)
14. Starting point
● Epic's OpenGL 2.1 and OpenAL back-ends
– OpenGL mode somewhat functional in Windows developer builds
● Epic's Mac OSX port
– Limited test builds for Mac OSX had been made before
– Mac OSX binary builds supported via remote compilation
– Existing Mac OSX target for game content packaging (cooking)
● Both of the above – somewhat... unfinished
● On Windows, the games shipped 32-bit binaries only
15. Building the build tool – C# & Mono
● Patched the Unreal Build Tool to build & run on Mono on Linux
– Mono can handle most .NET commandline apps all right
● Added support for Linux toolchains (duh)
● Fixed hardcoding of backslashes in paths
– Path.Combine() instead
● Fixed regexes on large strings (C++ sources) blowing up the stack
– Break up the string into smaller parts
16. Cross-compiling for 32/64-bit
● Yes, I agree, 32-bit should die, but one may not be allowed to kill it
● gcc -m32/-m64 is not enough!
– Sets target code generation
– But not headers & libraries (CRT, OpenMP, libgcc etc.)
● Fixed (on Debian & friends) by installing gcc-multilib
– Dependency package for non-default architectures (i.e. i386 on an amd64 system and vice versa)
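A quick way to tell whether the multilib pieces are really installed is to build a tiny translation unit that pulls in standard headers and libraries for each target. The following is a minimal sketch; the function names are made up for illustration and are not part of UE3 or of either port.

```cpp
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Touches libstdc++ containers so that target-specific headers and libraries
// are needed, not just target code generation.
std::size_t toolchain_smoke_test() {
    std::vector<std::string> parts{"multi", "lib"};
    std::string joined;
    for (const std::string& part : parts) joined += part;
    return joined.size();
}

// Width of a pointer on the target architecture, in bits:
// 32 when built with -m32, 64 when built with -m64.
constexpr std::size_t pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}
```

Building this with both -m32 and -m64 should succeed on a correctly set up machine. A failure at the include or link stage, rather than in your own code, points at missing multilib packages (on Debian & friends the C++ counterpart of gcc-multilib is g++-multilib).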
17. Clang
● Clang is faster
– gcc: 3m47s
– Clang: 3m05s
● Clang has different diagnostics than gcc
● Clang has C++ preprocessor macro compatibility with gcc
– Declares __GNUC__ etc.
● Clang has commandline compatibility with gcc
– Can easily switch back & forth between gcc and Clang
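Because Clang declares __GNUC__ too, code that needs to tell the two compilers apart has to test for __clang__ first. A minimal sketch (compiler_name is a hypothetical helper, not something from the ports):

```cpp
// Returns a short name for the compiler building this translation unit.
// __clang__ is tested first because Clang also defines __GNUC__.
inline const char* compiler_name() {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}
```

The same ordering applies to any #if chain that works around a compiler-specific bug or enables a compiler-specific pragma.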
18. Clang - caveats
● Object files may be incompatible with gcc & fail to link (need full rebuilds)
● gcc is more mature than Clang
– Clang has generated faulty code for me (YMMV)
● Slight inconsistencies in C++ standard strictness
– Templates
– Anonymous structs/unions
– May need to add this-> in some places
– May need to name some anonymous types
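The this-> case typically comes from templates that call members of a dependent base class. Compilers without two-phase name lookup (Visual C++ historically) accept the unqualified call, while gcc and Clang reject it. A minimal sketch with made-up class names:

```cpp
// Base class template; inside Derived<T> its members are dependent names.
template <typename T>
struct Base {
    T value;
    explicit Base(T v) : value(v) {}
    T get() const { return value; }
};

template <typename T>
struct Derived : Base<T> {
    explicit Derived(T v) : Base<T>(v) {}
    T doubled() const {
        // A bare get() is not looked up in the dependent base class under
        // two-phase lookup; this-> defers the lookup to instantiation time.
        return this->get() + this->get();
    }
};
```

Writing Base<T>::get() instead of this->get() works as well, at the cost of disabling virtual dispatch for that call.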
19. So – Clang or gcc?
Both:
● Clang – quick iterations during development
● gcc – final shipping binaries
20. Linking – GNU ld
● Default linker on Linux
● Ancient
● Single-threaded
● Requires specification of libraries in the order of reverse dependency...
● We are not doomed to use it!
21. Linking – GNU gold
● Multi-threaded linker for ELF binaries
– ld: 18s
– gold: 5s
● Developed at Google, now officially part of GNU binutils
● Drop-in replacement for ld
– May need an additional parameter or toolchain setup
● clang++ -B/usr/lib/gold-ld ...
● g++ -fuse-ld=gold ...
● Still needs libs in the order of reverse dependency...
22. Linking – library groups
● Major headache/game-breaker with circular dependencies
– ”Proper” fix: re-specify the same libraries over and over again
● Declare library groups instead
– Wrap library list with --start-group ... --end-group
● Shorthand: -(, -)
● g++ foo.obj '-Wl,-(' -lA -lB '-Wl,-)' (parentheses must be quoted or escaped in a shell)
● Caveat: results in exhaustive symbol search within the group
– Manual warns of possible performance hit
– Not observed here, but keep that in mind!
23. Caching the gdb-index
● Large codebase generates heavy debug symbols (hundreds of megabytes)
● gdb generates the index for quick symbol lookup...
● ...at every single gdb startup
– Takes several minutes for said codebases
– Massive waste of time!
● Solution: cache the index, fold it into the build process!
– Full description in the gdb manual
– gdb -batch -ex "save gdb-index $(OUTPUT_PATH)/gdb-index" $(BINARY)
– objcopy --add-section .gdb_index=$(OUTPUT_PATH)/gdb-index/$(BINARY).gdb-index --set-section-flags .gdb_index=readonly $(BINARY) $(BINARY)
24. Raw X11 or SDL?
● Initially tried rolling my own boilerplate
– Basic X11 mouse, window and key press events are easy
– Unicode text input is not
– Useful windowing is not
– Correct GLX is not
– Linux joystick API is not
– Above all, X11 seems to be on its way out
● Wayland & Mir will have emulation layers, but that's bound to have overhead
● You really want to use SDL 2 instead, trust me
– Shameless plug: see my talk from WGK 2013 for benefits of using SDL 2 ☺
27. What we shipped initially with the beta
● 32-bit binaries (64-bit added later on)
● Launch script (~20 lines)
– Architecture detection
● Initially a stub for 64-bit with fallback to 32-bit
– Steam Runtime injection (if not already present)
● That's about it ☺
● Explicit dependency on the Steam Runtime
– Allows shifting some responsibility to Valve
– And, admittedly, to users who insist on using their own dependencies
28. User issues
● Missing/incompatible libraries
– Resulting from disabling the Steam Runtime
● Mostly Gentoo users: the maintainer of the steam package had chosen to disable it by default
– Usually fixed by force-starting Steam with STEAM_RUNTIME=1
● $ STEAM_RUNTIME=1 steam
● ”Missing” 32-bit NVIDIA OpenGL libraries on 64-bit systems
– Apparently, they might end up unreachable by the dynamic linker
– Fixed by adding /usr/lib32 to LD_LIBRARY_PATH in the launch script
– Also, prompt user to make sure they did install them
● It's an option: "install compatibility 32-bit libraries"
29. User issues
● No support for DXT texture compression despite capable hardware (GL_EXT_texture_compression_s3tc)
– Concerns the open-source drivers
– For legal reasons (S3/VIA patents), some distros don't ship it or install it automatically
● E.g. Fedora
– If the extension is not advertised by the driver, suggest that the user install libtxc_dxtn
● Often a distro package, so no hassle
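Checking whether the driver advertises the extension takes some care, because a plain substring search can match a longer extension name by accident. Below is a minimal sketch that assumes the space-separated list returned by glGetString(GL_EXTENSIONS) in OpenGL 2.1; has_gl_extension is a hypothetical helper, not UE3 code.

```cpp
#include <cstddef>
#include <cstring>

// Returns true if `name` appears as a whole, space-delimited token in
// `extensions` (the format of glGetString(GL_EXTENSIONS) in OpenGL 2.1).
bool has_gl_extension(const char* extensions, const char* name) {
    if (!extensions || !name || !*name) return false;
    const std::size_t len = std::strlen(name);
    for (const char* p = std::strstr(extensions, name); p;
         p = std::strstr(p + 1, name)) {
        const bool starts_token = (p == extensions) || (p[-1] == ' ');
        const bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) return true;
    }
    return false;
}
```

If the check for GL_EXT_texture_compression_s3tc fails on an open-source driver, the game can show a message suggesting libtxc_dxtn instead of silently rendering without compressed textures.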
30. More user issues...
● Graphical glitches...
● Broken V-sync...
● Broken NVIDIA Optimus with open-source multiplexer...
● Looong & unresponsive loading times...
● A whole lot of crashes...
● Most of the above was my fault – not going to bore you with all of this!
31. Crash handling
● Unix signals
– Asynchronous IPC notification mechanism in POSIX-compliant systems
● Sources can be the process itself, other processes or the kernel
– Default handler terminates process & dumps core for most signals
– Can (must?!) specify custom handlers
● Get/set handlers via the sigaction(2) system call
– Handler prototype (sa_sigaction field, enabled by the SA_SIGINFO flag): void handler(int signal, siginfo_t *siginfo, void *context);
● More information
– G. Ben-Yossef, Crash N' Burn: Writing Linux application fault handlers
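Installing a three-argument handler through sigaction(2) can be sketched as follows. The names install_handler, on_signal and g_last_signal are made up for this example; a real crash handler would do its logging where the flag is set.

```cpp
#include <signal.h>

// Written by the handler. volatile sig_atomic_t is the one object type that
// is guaranteed to be safe to write from a signal handler.
volatile sig_atomic_t g_last_signal = 0;

// Three-argument handler; only used because SA_SIGINFO is set below.
void on_signal(int signum, siginfo_t* /*info*/, void* /*context*/) {
    g_last_signal = signum;
}

// Installs on_signal for `signum` and returns true on success.
// `previous`, if given, receives the previously installed action.
bool install_handler(int signum, struct sigaction* previous = nullptr) {
    struct sigaction action = {};
    action.sa_sigaction = on_signal;  // sa_sigaction, not sa_handler
    action.sa_flags = SA_SIGINFO;     // selects the three-argument form
    sigemptyset(&action.sa_mask);     // block no additional signals
    return sigaction(signum, &action, previous) == 0;
}
```

For crash handling, the usual candidates for installation are SIGSEGV, SIGBUS, SIGILL, SIGFPE and SIGABRT.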
32. Interesting siginfo_t fields
● si_errno – errno value
– Possibly more detailed error code
● si_code – reason for sending the signal
– Both general and per signal type
– Examples: issued by user, issued by kernel, illegal addressing mode, FP over/underflow, invalid memory permissions, unmapped address etc.
● si_addr – memory location at which fault happened
– If applicable: SIGILL, SIGFPE, SIGSEGV, SIGBUS and SIGTRAP
33. Signal handler caveats
● Not safe to allocate or free heap memory!
– Fault may have corrupted the allocator's data structures
● Prone to race conditions
– Can't share locks with the main program!
● If signalled after locking, you'll deadlock
– Can't call async-signal-unsafe functions!
● See manual for signal(7) for a list of safe ones
● Custom handlers do not dump core (a.k.a. minidump)
– Mitigated by restoring default handler after custom logging and re-signalling self
● signal(signum, SIG_DFL); raise(signum);
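Putting these caveats together, a handler can log with write(2) and a heap-less number formatter, then restore the default action and re-raise. This is a minimal sketch under those assumptions; format_unsigned, safe_write and crash_handler are hypothetical names, and the handler is meant to be installed with SA_SIGINFO.

```cpp
#include <cstddef>
#include <signal.h>
#include <unistd.h>

// Formats `value` as decimal into `buf` without heap or stdio use.
// Returns the number of characters written, excluding the terminating NUL.
std::size_t format_unsigned(unsigned value, char* buf, std::size_t size) {
    char tmp[16];
    std::size_t n = 0;
    do {
        tmp[n++] = char('0' + value % 10);
        value /= 10;
    } while (value != 0 && n < sizeof(tmp));
    std::size_t out = 0;
    while (n != 0 && out + 1 < size) buf[out++] = tmp[--n];
    if (size != 0) buf[out] = '\0';
    return out;
}

// write(2) is async-signal-safe; the result is deliberately ignored because
// there is nothing useful to do about a failed write while crashing.
void safe_write(const char* text, std::size_t length) {
    const ssize_t ignored = write(STDERR_FILENO, text, length);
    (void)ignored;
}

// Logs the signal number, then hands over to the default handler so that
// the process still terminates and dumps core.
void crash_handler(int signum, siginfo_t* /*info*/, void* /*context*/) {
    static const char prefix[] = "Caught fatal signal ";
    char number[16];
    const std::size_t length =
        format_unsigned(unsigned(signum), number, sizeof(number));
    safe_write(prefix, sizeof(prefix) - 1);
    safe_write(number, length);
    safe_write("\n", 1);
    signal(signum, SIG_DFL);  // restore the default disposition
    raise(signum);            // re-signal self
}
```

Both signal() and raise() are on the async-signal-safe list, which is what makes the last two lines legal inside a handler.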
34. Safe stack walking
● glibc provides backtrace() and friends
● Symbols are read from the dynamic symbol table
– Must pass -rdynamic to gcc/Clang to populate
● Calling backtrace_symbols() allocates heap memory
– Not safe... ☹
– Still, can get away with it most of the time
– Proper solution involves a separate watchdog process & pipes (heap-less backtrace_symbols_fd() call instead)
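The heap-less variant is short enough to show in full. This is a minimal sketch assuming glibc's execinfo.h; dump_backtrace is a made-up name, and in the watchdog setup the descriptor would be the write end of a pipe rather than stderr.

```cpp
#include <execinfo.h>
#include <unistd.h>

// Writes the current call stack to `fd` and returns the number of frames
// captured. backtrace_symbols_fd() writes straight to the descriptor
// instead of returning malloc()ed strings.
int dump_backtrace(int fd) {
    void* frames[64];
    const int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, fd);
    return count;
}
```

One caveat from the glibc documentation: backtrace() may load libgcc dynamically on first use, which can itself allocate, so it is worth calling it once at startup, well before any crash.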
35. Long load times? Unresponsiveness?
● Profiling quickly places blame on shader linking
– OpenGL shader model operates on program objects, created by linking shader pipeline combinations
● Introduces lots of redundancy (see glGetProgramiv() & glGetShaderiv())
● Drivers often defer actual compilation until "link time"
● Increased memory consumption
– UE3 OpenGL renderer blocks the render thread for linking
● Render thread blocked → Frozen loading screen!
– Both games have thousands of shaders
● An awful lot of vertex/fragment shader combinations (programs)
● Moreover – makes async level streaming blocking!
– Bad stuttering during gameplay
● Situation better on subsequent loads on NVIDIA due to in-driver cache
36. Shader linking
● Short-term fix: background shader linking
– Worker thread with a separate OpenGL context, sharing data with the main one
– Queue all shader link jobs, execute on the worker only
– If on a loading screen, keep spinning it while waiting for the shaders
– Defer ”async streaming done” notifications till shader link queue is empty
● Pros:
– Quick & easy to implement
– Fixes gameplay stuttering
● Cons:
– Only fixes unresponsiveness, not the long load times ☹
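The worker-and-queue part of this fix can be sketched without any OpenGL calls. LinkQueue and its methods are hypothetical names, not UE3 code; in a real renderer the worker would make its shared context current once at startup, and each job would call glLinkProgram().

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Runs queued jobs on a single worker thread. Jobs are plain callables here
// so that the sketch stays self-contained.
class LinkQueue {
public:
    LinkQueue() : worker_([this] { run(); }) {}
    ~LinkQueue() {
        { std::lock_guard<std::mutex> lock(mutex_); quit_ = true; }
        wake_.notify_all();
        worker_.join();  // remaining jobs are finished before joining
    }
    void enqueue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
            ++pending_;
        }
        wake_.notify_all();
    }
    // True once every queued job has finished; check this before sending
    // "async streaming done" notifications.
    bool idle() {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_ == 0;
    }
    // Blocks until the queue has drained. A loading screen would poll idle()
    // every frame instead, so that it can keep animating.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        wake_.wait(lock, [this] { return pending_ == 0; });
    }
private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return quit_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // quit requested, nothing left to do
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop_front();
            lock.unlock();  // run the job without holding the lock
            job();
            lock.lock();
            --pending_;
            wake_.notify_all();
        }
    }
    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> jobs_;
    int pending_ = 0;
    bool quit_ = false;
    std::thread worker_;  // declared last: starts after the other members exist
};
```

Counting pending jobs separately from the queue length matters: a job that has been popped but is still linking must keep idle() false, otherwise streaming could be reported as done too early.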
37. Shader linking
● Disaster on the official AMD Catalyst driver!
– Total system hang (PC needs hard reset)
– Apparently, exposed a race condition in AMD driver
– AMD has yet to ship the fix...
● Fallback to old, blocking code path if Catalyst detected
38. Shader linking
● Possible improvement (suggested by Epic): ARB_separate_shader_objects
– Replaces programs (and linking) with much lighter pipeline objects
● Removes a lot of redundancy
– Makes use of separate vertex/fragment shaders (D3D-like)
– Would play well with UE3's RHI, modelled mostly after D3D
● Not implemented ☹ – requires shader syntax upgrade and a refactor of UE3's OpenGL renderer
– Explicit locations for attributes and varyings required for SSO
– Need to bump GLSL from 1.20 (OpenGL 2.1) to at least 1.40 (OpenGL 3.1)
39. Shader linking
● Proper fix: deferred shader access
– Modern drivers queue shader compiles and links internally and process them in a multithreaded manner
● Official NVIDIA & AMD Catalyst
● Open-source Mesa drivers in SteamOS (patches pushed upstream recently by Valve)
– Kick all the jobs (i.e. create shader objects) at level load
– Do not access the objects (query, draw) until they are needed
– Not even the compile/link status! This creates a sync point!
● Not implemented ☹ – requires a considerable refactor of UE3's OpenGL renderer
41. Takeaway 1/2
● Porting .NET-based tools to Linux is viable
● Many 32/64-bit cross-compile issues are solved with gcc-multilib
● Switching back and forth between Clang and gcc is easy and useful
● Link times can be greatly improved by using gold
● Caching the gdb-index improves debugging experience
● Using SDL 2 is way better than rolling your own boilerplate
42. Takeaway 2/2
● Using the Steam Runtime is good for you
● Crash handling in Linux is easy to do, tricky to get right
● OpenGL shader model is significantly different from D3D's
● GLSL linking is slow, so defer access if possible
● Multiple concurrent OpenGL contexts can still bite you
● Test on different GPU drivers to avoid unpleasant surprises!
43. lgodlewski@nordicgames.at
@TheIneQuation
www.inequation.org
Questions?
44. Further Nordic Games information:
www.nordicgames.at
Development information:
www.grimloregames.com
Thank you!