For AAA games there is now a consumer expectation that the developer has a post-release strategy, and that strategy goes beyond DLC content. Users expect to receive bug fixes, balancing updates, game mode variations and constant tuning of the game experience. So how can you architect your game technology to facilitate all of this? Stewart explains the unique patching system developed for Crysis 3 Multiplayer, which allowed the team to hot-patch almost any asset or data used by the game. He also details the telemetry, server and testing infrastructure required to support it, along with some interesting lessons learned.
3. CONTENTS
1. The reasoning
Why, What, How
2. Data Patching
Asset systems, Patch paks, Multiplayer flow, Handling failure & messaging
3. Telemetry
Collection, Storage, Syncing, Analysing, Matchmaking telemetry case study
4. Release-Debug
Other production mechanisms for gathering data
5. Summary
Lessons learned and future developments
6. Questions?
Over to you...
11. THE CRYSIS 3 TEST SCHEDULE
T200 = EA Worldwide Tech 200
T200 (X360): 27th Sept
T200 (PC): Oct 4th
T200 (PS3): 11th Oct
Closed Alpha: Nov 2nd
T200 (X360): 8th Nov
T200 (PS3): 22nd Nov
T200 (PC): 29th Nov
Open Beta: Jan 29th
Because despite alphas, betas and numerous large scale tests, things will still slip through the net. The players are your most thorough QA.
12. AS A WAY TO DEPLOY ASSET FIXES RAPIDLY
... For certification failures
... On discovering copyrighted content
... When players are abusing an exploit
13.
14. BECAUSE CERTIFICATION COSTS TIME & MONEY
[Timeline, weekly from 03-Dec 2012 to 11-Mar 2013: Open-beta cert, Open-beta live; Final cert, RTM, Release; Day 10 cert, Day 10 live]
40% of commits during cert & RTM were assets & data
22. CRYENGINE ASSET FILE SYSTEM - OVERVIEW
objects/level_specific/airport/architecture/terminal/main.cgf
Files referenced using paths
A virtual file system
Files can be loose or part of asset package (.pak) files
Files can be stored in memory, media or HDD
Platform agnostic API
23. CRYENGINE ASSET FILE SYSTEM - PAK FILES
A collection of files
These are essentially zip archives of a folder hierarchy
Antitamper mechanisms
Paks are digitally signed and encrypted in mastered builds
Stack based searching
Paks searched in order of most recently opened
24. CRYENGINE ASSET FILE SYSTEM - PAK FILES
gEnv->pCryPak->OpenPak("objects1.pak");
gEnv->pCryPak->OpenPak("objects2.pak");
gEnv->pCryPak->OpenPak("objects3.pak");
Search order: objects3.pak, objects2.pak, objects1.pak
gEnv->pCryPak->FOpen("objects/level_specific/airport/architecture/terminal/main.cgf","rbx");
25. CRYENGINE ASSET FILE SYSTEM - PAK FILES
Contents generally organised by type
Objects, animation, scripts, music, sounds, etc
Some created for streaming
.dds0, .chr, .cgf, .cga
Some created for specific loading
Level loading, MPModeSwitch.pak
26. PATCH PAKS
A simple way to override ANY EXISTING ASSET?
... Create a patch.pak
containing updated versions of specific assets
... Mount this new pak file
Mount it last or mark with a special 'priority' flag
... New assets will be prioritised
Any subsequent file requests will be serviced by these patched files first
... Patching at the asset system level
So individual game subsystems are oblivious
... Only suitable for Title Updates and DLC
As we need to hardcode the loading of this pak file in a new executable
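The override behaviour described on this slide can be sketched as follows. This is a minimal illustration and not the CryENGINE implementation: `Pak`, `PakStack`, `Mount` and `Find` are hypothetical names, and the `priority` field is only an assumed shape for the "special priority flag" the slide mentions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mounted pak: archive name plus path -> contents.
struct Pak {
    std::string name;
    std::map<std::string, std::string> files;
    bool priority;  // assumed form of the "special priority flag" for patch paks
};

// Hypothetical virtual file system; not the real ICryPak interface.
class PakStack {
public:
    void Mount(Pak pak) {
        assert(!pak.name.empty());
        paks_.push_back(std::move(pak));
    }

    // Returns the pak that would service a request for `path`, or nullptr.
    // Priority paks are searched first; within each group the most recently
    // mounted pak wins (stack based searching).
    // The pointer is only valid until the next call to Mount.
    const Pak* Find(const std::string& path) const {
        const bool passes[] = {true, false};
        for (bool wantPriority : passes) {
            for (auto it = paks_.rbegin(); it != paks_.rend(); ++it) {
                if (it->priority == wantPriority && it->files.count(path) != 0) {
                    return &*it;
                }
            }
        }
        return nullptr;
    }

private:
    std::vector<Pak> paks_;
};
```

Because the lookup happens inside the file system, code that asks for `objects/.../main.cgf` needs no changes: it simply receives the patched copy whenever a patch pak contains that path.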
27. ON DEMAND PATCHING
DOWNLOADING & APPLYING PATCH PAKS TRANSPARENTLY
"Why do we need to support a variable number of patch paks?"
... Differing lifetimes
Double XP Weekend vs Level setup fixes
... Separate hot/cold assets
Weapon balancing vs player stats fixes
... Risk reduction
Smaller files mean less chance of failure
28. ON DEMAND PATCHING
CRYSIS 3 IMPLEMENTATION DETAILS
Multiplayer Only
Patch paks un-mounted on returning to single player
Process hidden within the transition to MP
We already show a loading screen and re-initialise most game systems anyway
Cache size of 2 MB (X360 only)
Self imposed limitations to reduce risk
Regularly check for new updates
So that players can be informed if they need to re-enter MP
29. ON DEMAND PATCHING
DOWNLOAD PAKS INTO MEMORY OVER HTTP
It all starts with a file called Permissions.xml...
31. MULTIPLAYER FLOW
User selects Multiplayer -> TCR Reqs -> Login Online Services -> Download Permissions.xml -> Check Cache -> Download Patch1.pak -> Download Patch2.pak -> Mount paks -> Init Game systems
Points of failure
32. MULTIPLAYER FLOW
User selects Multiplayer -> TCR Reqs -> Login Online Services
TCR Requirements
Online play checks
Cannot proceed unless allowed online
Need extra storage to cache paks
Require an extra 2 MB in save game
How do we handle these?
Hook into existing handling
33. MULTIPLAYER FLOW
TCR Reqs -> Download Permissions.xml -> Check Cache -> Download Patch1.pak -> Download Patch2.pak
What can go wrong?
Failing to download
General networking failures
Bespoke networking configurations
How do we handle these?
Abort!
No patches
No telemetry
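The download steps of the multiplayer flow, together with one reading of the "Abort! No patches" handling, can be sketched as below. Everything here is an assumption made for illustration: the real Permissions.xml format is not shown in the slides (this sketch treats it as one pak name per line), the cache is keyed by name only, and `Download`, `PatchSession` and `EnterMultiplayer` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the HTTP layer: fills `out` and returns false on any failure.
using Download = std::function<bool(const std::string& name, std::string& out)>;

struct PatchSession {
    bool ok;                           // true only if every required pak is available
    std::vector<std::string> mounted;  // paks that would be handed to the asset system
};

// Assumption: the manifest lists one pak name per line.
PatchSession EnterMultiplayer(const Download& download,
                              std::map<std::string, std::string>& cache) {
    assert(download && "a downloader is required");
    PatchSession session{false, {}};

    std::string manifest;
    if (!download("Permissions.xml", manifest)) {
        return session;  // abort: nothing is mounted
    }

    std::vector<std::string> staged;
    std::istringstream lines(manifest);
    std::string pakName;
    while (std::getline(lines, pakName)) {
        if (pakName.empty()) continue;
        if (cache.find(pakName) == cache.end()) {  // check cache before downloading
            std::string data;
            if (!download(pakName, data)) {
                return session;  // abort: a partial patch set is never mounted
            }
            cache[pakName] = data;
        }
        staged.push_back(pakName);
    }

    session.ok = true;
    session.mounted = staged;  // mount only once every pak has arrived
    return session;
}
```

Staging the paks and mounting them in one step at the end means a failed download leaves the client on unpatched data rather than on a mixture of patched and unpatched assets.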
38. 1. ISOLATE PLAYERS
This is basically using the same checks used to isolate people running old builds (Retail & Development)
Version code used as a matchmaking filter & during context establishment.
[Diagram: Client A, Client C and Server 1 share version 0xA5BC; Client B, Client D and Server 2 share version 0x3370; P1 and P2 labels appear beside the clients]
39. 1. ISOLATE PLAYERS
XOR in the MD5s of each patch pak to create a unique version code
[Diagram: Client A, Client C and Server 1 share version 0xA5BC; Client B, Client D and Server 2 share version 0x3370; P1 and P2 labels appear beside the clients]
Exe (0xA4BC) XOR P1 (0x0100) XOR P2 (0x96CC) = Matchmaking (0x3370)
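The version code arithmetic on this slide can be checked with a short sketch. The 16-bit values mirror the slide; real MD5 digests are 128 bits and would need folding down first, which is not shown. Which value belongs to the executable and which to each pak is an assumption, although XOR is order independent so the result is the same either way. `MatchmakingVersion` and `CanMatch` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Folds the executable's code and every patch pak digest into one version code.
std::uint16_t MatchmakingVersion(std::uint16_t exeCode,
                                 const std::vector<std::uint16_t>& pakDigests) {
    std::uint16_t version = exeCode;
    for (std::uint16_t digest : pakDigests) {
        version = static_cast<std::uint16_t>(version ^ digest);
    }
    return version;
}

// Used as the matchmaking filter: only identical version codes may meet.
bool CanMatch(std::uint16_t clientVersion, std::uint16_t serverVersion) {
    return clientVersion == serverVersion;
}
```

With the slide's numbers, 0xA4BC XOR 0x0100 XOR 0x96CC gives 0x3370, while a client missing the 0x96CC pak ends up on 0xA5BC and is therefore kept away from 0x3370 servers.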
41. DATA PATCHING FUTURE DEVELOPMENTS
ASSET DELTAS
Full file must be deployed for small modification
Some of our XML files can be up to 500 KB in size
Text based assets
XML & Lua files can easily have a delta injected after asset loading
Regularly check for new updates
42. DATA PATCHING FUTURE DEVELOPMENTS
ASSET DELTAS
Patch XML Nodes
Add, remove or modify at a node level
More complicated but huge savings
Extra tools & build steps required but XML patches reduced in size to 1-2% of original
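Node-level patching of a loaded XML tree might look something like the sketch below. The slides do not describe the actual patch format or tooling, so every type and function here (`XmlNode`, `NodePatch`, `ApplyPatch`) is hypothetical, and nodes are matched by the first child with a given tag purely to keep the example short.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory XML node, as it might exist after asset loading.
struct XmlNode {
    std::string tag;
    std::map<std::string, std::string> attrs;
    std::vector<XmlNode> children;
};

enum class PatchOp { Add, Remove, Modify };

// One delta: where to apply it and what to change.
struct NodePatch {
    PatchOp op;
    std::vector<std::string> parentPath;  // child tags leading from the root to the parent
    XmlNode node;  // Add: node to append; Remove: tag to delete; Modify: tag + attrs to overwrite
};

XmlNode* FindChild(XmlNode& parent, const std::string& tag) {
    for (XmlNode& child : parent.children) {
        if (child.tag == tag) return &child;
    }
    return nullptr;
}

// Returns false if the target of the patch cannot be found.
bool ApplyPatch(XmlNode& root, const NodePatch& patch) {
    XmlNode* parent = &root;
    for (const std::string& tag : patch.parentPath) {
        parent = FindChild(*parent, tag);
        if (parent == nullptr) return false;
    }
    if (patch.op == PatchOp::Add) {
        parent->children.push_back(patch.node);
        return true;
    }
    if (patch.op == PatchOp::Remove) {
        std::vector<XmlNode>& kids = parent->children;
        auto it = std::find_if(kids.begin(), kids.end(),
                               [&](const XmlNode& n) { return n.tag == patch.node.tag; });
        if (it == kids.end()) return false;
        kids.erase(it);
        return true;
    }
    // Modify: overwrite or add only the attributes named in the patch.
    XmlNode* target = FindChild(*parent, patch.node.tag);
    if (target == nullptr) return false;
    for (const auto& kv : patch.node.attrs) {
        target->attrs[kv.first] = kv.second;
    }
    return true;
}
```

A patch that changes one attribute only has to carry a path, a tag and a value, which is where the size saving over redeploying the whole file comes from.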
43. DATA PATCHING FUTURE DEVELOPMENTS
RE-DIRECT HTTP REQUESTS
Current permissions.xml end-point fixed
This makes testing new patches difficult
Need a way to redirect the request externally
Using build-version, SKU-ID, Tags etc
Added bonus
Could use this to patch net-tests, fix dev builds etc
44. DATA PATCHING FUTURE DEVELOPMENTS
DIFFERENTIATE GAME CHANGING PAKS
Some patches are not gameplay critical
For example cosmetic asset changes or players' personal stats configurations
Exclude these from any filtering
Basically, do not XOR this pak's MD5 into the matchmaking version
Exe (0xA4BC) XOR P1 (0x0100) = Matchmaking (0xA5BC), with P2 (0x96CC) excluded
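Excluding non-critical paks from the version code could be as simple as a per-pak flag. This is a sketch of a proposed future development, not shipped code: `PatchPak`, `gameplayCritical` and `FilteredMatchmakingVersion` are hypothetical names, and the assignment of the slide's values to Exe, P1 and P2 is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct PatchPak {
    std::string name;
    std::uint16_t digest;   // 16-bit stand-in for the pak's MD5
    bool gameplayCritical;  // false for cosmetic or personal stats changes
};

// Only gameplay critical paks are XORed into the matchmaking version.
std::uint16_t FilteredMatchmakingVersion(std::uint16_t exeCode,
                                         const std::vector<PatchPak>& paks) {
    std::uint16_t version = exeCode;
    for (const PatchPak& pak : paks) {
        if (pak.gameplayCritical) {
            version = static_cast<std::uint16_t>(version ^ pak.digest);
        }
    }
    return version;
}
```

A client that has not yet received the non-critical pak still computes 0xA5BC and can keep playing with everyone else.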
46. TELEMETRY COLLECTION - CLIENT OVERVIEW
Collection and uploading via HTTP
Simple API to push data from files or memory
Compressed and streamed
Data zipped up and streamed asynchronously
No Guarantees
Fire & Forget. Upload may fail for numerous reasons
47. TELEMETRY COLLECTION - SERVER OVERVIEW
Storage of files received only
Organised by date, platform and type
No complex processing on the server
No requirements for immediate results
Anonymous data
Any usernames & accounts salted and hashed
48. TELEMETRY COLLECTION - SYNCING DATA
Server data kept for fixed time period
Data deleted after seven days
Downloaded to Crytek servers
Rsync-ed daily to internal servers
Analysed locally
Ultimately discarded
49. TELEMETRY COLLECTION - PROCESSING
Turning raw telemetry into useful data
Achieved with a mixture of Python & Excel
Manually triggered and collated
Considered the weakest link in the chain
Processing is slow and intensive
Optimising has never been a high priority
50. So...
"HOW DO YOU HANDLE HUNDREDS OF THOUSANDS OF CLIENTS UPLOADING SIMULTANEOUSLY?"
51. SAMPLE PLAYERS
Sample deterministically at the client end
bool shouldUpload = (Hash( username ) % denominator) < numerator;
User: coolbeenz
Hash: 0x12345678
Hash % 1000 < 100 ? 0x2E8: NO
52. SAMPLE PLAYERS
Sample deterministically at the client end
Select a large denominator and do not change this
This sets the amount the sampling ratio can be incremented by
Choose a numerator to give you the desired sampling ratio
E.g. 100/1000 = 10%
Vary the numerator to meet changing sampling demands
The individual users being sampled remains consistent
[Diagram: range 0 to 1000 for user coolbeenz; below 100 = Upload, 100 and above = Do not Upload]
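The sampling rule can be sketched as follows. The slides do not say which hash function was used; FNV-1a is swapped in here only because it gives the same value on every platform and in every session, which `std::hash` does not guarantee. `StableHash` and `ShouldUpload` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 32-bit FNV-1a, chosen for this sketch because its output is stable everywhere.
std::uint32_t StableHash(const std::string& text) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 16777619u;  // FNV prime
    }
    return hash;
}

// True if this user falls inside the sampled fraction numerator/denominator.
bool ShouldUpload(std::uint32_t userHash, std::uint32_t numerator,
                  std::uint32_t denominator) {
    assert(denominator != 0);
    return (userHash % denominator) < numerator;
}

// Convenience wrapper taking the username directly.
bool ShouldUpload(const std::string& username, std::uint32_t numerator,
                  std::uint32_t denominator) {
    return ShouldUpload(StableHash(username), numerator, denominator);
}
```

Because each user's remainder never changes, raising the numerator only adds users to the sample: anyone who uploaded at 100/1000 still uploads at 200/1000.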
54. CRYSIS 3 MATCHMAKING TELEMETRY
QUANTIFYING THE BLACKBOX
Matchmaking one of the top 5 complaints
Based on MyCrysis Forum feedback
Find a session fast but find a good session
This essentially boils down to ping times
For consoles & PC
PC also has a quick match option as well as a server browser
Tricky to balance and impossible to predict
Requires constant re-evaluation even with adaptive algorithms
User experience feedback not good enough
You know people are not happy but why exactly?
55. CRYSIS 3 MATCHMAKING TELEMETRY
SO WHERE DO WE START?
Create a system which is data driven
If we are going to collect telemetry we need to be able to action a response
Server side
Used Blaze servers. Rule based, highly configurable, including relaxation criteria
Client side
The rules and times used can be configured and therefore data patched
56. CRYSIS 3 MATCHMAKING TELEMETRY
DECIDE WHAT QUESTIONS NEED ANSWERING
Q. How many times does a player matchmake?
Q. What kind of ping times do players get during that session?
Q. How long does it take a player to get into a session successfully?
Q. What is the most popular method of joining a session?
Q. What is the average matchmaking time?
57. CRYSIS 3 MATCHMAKING TELEMETRY
IMPLEMENT AN APPROPRIATE TELEMETRY SOLUTION
Need a solution that does not result in GBs of data
But still want to be flexible enough to answer a range of questions
Collect a series of timestamped events in XML
Also collect meta data for each event
Timestamps based on a zero base time
But still store a server timestamp for collating multiple clients' data
<AttemptConnection Method="MatchMake" Timestamp="0.000" />
Other Method values: "GameBrowser", "Join Session in progress", "Friend Invite", "Join Squad"
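A recorder for zero-based timestamped events might look like the sketch below. Only the `AttemptConnection` element and its `Method` and `Timestamp` attributes come from the slide; the class name, the `SessionJoined` event used in the usage example, and the single-attribute limit are assumptions made to keep it short. The server timestamp mentioned on the slide is left out, and attribute values are not XML-escaped.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Records events relative to the moment the log was started, so every
// client's first event sits at Timestamp="0.000".
class MatchmakingLog {
public:
    explicit MatchmakingLog(double startSeconds) : base_(startSeconds) {}

    // Appends one event with a single meta data attribute.
    void Record(const std::string& eventName, const std::string& attrName,
                const std::string& attrValue, double nowSeconds) {
        char stamp[32];
        std::snprintf(stamp, sizeof(stamp), "%.3f", nowSeconds - base_);
        lines_.push_back("<" + eventName + " " + attrName + "=\"" + attrValue +
                         "\" Timestamp=\"" + stamp + "\" />");
    }

    const std::vector<std::string>& Lines() const { return lines_; }

private:
    double base_;
    std::vector<std::string> lines_;
};
```

For example, `log.Record("AttemptConnection", "Method", "MatchMake", now)` at the start of a search and a hypothetical `log.Record("SessionJoined", "Result", "Success", later)` on success would be enough to derive how long the player waited and which join method they used.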
58. CRYSIS 3 MATCHMAKING TELEMETRY
IMPLEMENT AN APPROPRIATE TELEMETRY SOLUTION
Collect time stamped events with meta data
Q. How many times does a player matchmake?
Q. How long does it take a player to get into a session successfully?
Q. What is the most popular method of joining a session?
Q. What is the average matchmaking time?
59. RESULTS
Matchmaking Telemetry
The results were very insightful
Resulted in several iterative improvements
Initially 65% of players took less than 5 seconds to find a match
Eventually this was increased to 82%
1 in 15 matchmaking requests fail
Still not perfect but there are many external factors at play
The most surprising result was that there were still 2 major bugs in the client side code
One of these was fixed with a data patch. Win!
61. RESULTS
Matchmaking Telemetry
How do players join a session?
[Chart, Console: Quick Match, Join Squad - Already In Game, Join Squad - Lobby, Private Game, Join Friends Game]
[Chart, PC: Server Browser, Quick Match, Join Squad - Already In Game, Join Squad - Lobby, Join Friends Game]
62. FUTURE DEVELOPMENTS
Matchmaking Telemetry
Automate the analysis of the telemetry
Manual process meant delays in turning around changes
Utilise A/B testing
The results change over time so results can be skewed by a different player pool
User actions telemetry
We did not collect all user action events. For example when the user backed out
66. SUMMARY
Start Early
Think ahead, the technology involved is complex and cannot be bolted on
Collecting telemetry is easy
Turning that into useful information is difficult
Have the ability to scale collection
Be able to balance server load and fail safe
Make it easy to test
Don't underestimate the amount of testing required in development
Automate as much as you can
Any manual elements of the system become its weakest point
Get buy-in from management
It is difficult to justify continued support when the returns are not directly financial