Accelerate Game Development and Enhance Game Experience with Intel® Optane™ Technology
1. ACCELERATE GAME DEVELOPMENT AND ENHANCE GAME
EXPERIENCE WITH INTEL® OPTANE™ TECHNOLOGY
SEAN TRACY, CLOUD IMPERIUM GAMES
ALEJANDRO HOYOS, INTEL CORPORATION
NATHAN FREIDIG, INTEL CORPORATION
2. 2018
AGENDA
• Overview of Game Developing Findings From Using Intel® Optane™ Technology
• What is Intel® Optane™ Technology
• How it translated to game development
• Intel® Optane™ Technology
• Configurations for Client
• Usages
• Benefits
• Game Developing Findings From Using Intel® Optane™ Technology
• End User Gaming Optimization
• Optimization for Game Development
• Accelerating data build process with Intel® Optane™ Technology
• Questions & Answers
3. 2018
GAME DEVELOPING FINDINGS FROM USING INTEL® OPTANE™ TECHNOLOGY
How it translated to game development & user experience
• Parallelism/multithreading improved the performance of tasks internal to the game engine, directly benefiting players and developers.
• Multiple concurrent file reads during streaming make for faster loading and streaming.
• Data streams more efficiently during gameplay, resulting in a smoother experience.
• Faster export, copy and loading improve developer quality of life.
What is Intel® Optane™ Technology?
INTEL® OPTANE™ SSD 900P
3D XPOINT™ MEMORY MEDIA
• Bit Addressable
• Ultra-Low Latency
• Performance Consistency
• High Endurance
• Revolutionary Material
5. 2018
INTEL® OPTANE™ TECHNOLOGY FOR CLIENT CONFIGURATIONS
[Diagram: two client configurations, each built around an Intel® Core™ processor with DDR DRAM]
• INTEL® OPTANE™ MEMORY: Intel® Optane™ Memory (PCIe* x2) paired with a hard drive or SSD (SATA). One storage volume seen by the OS.
• INTEL® OPTANE™ STORAGE: Intel® Optane™ SSD 900P (PCIe* x4), Intel® Optane™ SSD 800P (PCIe* x2), Intel® 3D NAND SSDs for Client (PCIe/SATA). Diagram labels: CPU, PCH, x2 PCIe, boot OS drive (800P).
*Other names and brands may be claimed as the property of others
6. 2018
USAGES
• Workloads with large files and data sets
• Workloads that constantly page
• Processor is idle waiting for the drive
• Project is split across many files
• Applications performing simultaneous read/write
• High write workloads that require higher disk endurance
Usage areas: ENGINEERING WORKLOADS, GAMING/MEDIA CREATION, MEDICAL/RESEARCH, NEW POSSIBILITIES
7. 2018
BENEFITS
Queue depth is the number of outstanding input/output requests waiting to be serviced by the drive.
82% of common workstation workloads happen between queue depths 1 and 5! (1)
[Chart: queue depth breakdown on most common workloads. X-axis: Queue Depth. Y-axis: Disk Operations (IOs).]
1. Queue depth breakdown done by AnandTech for common applications, https://www.anandtech.com/show/9090/intel-ssd-750-pcie-ssd-review-nvme-for-the-client/3 by Kristian Vättö on April 2, 2015
*Other names and brands may be claimed as the property of others
8. 2018
BENEFITS
[Charts: 4kB RANDOM READ (1) and 4kB RANDOM WRITE (1). X-axis: Queue Depth. Y-axis: Throughput (MB/s). Series: Intel® Optane™ SSD 900P (480GB) vs. Top Tier PCIe Gen3 x4 based NAND SSD (512GB). Callout: 6X.]
1. Data and simulations collected and run by AnandTech, "The Intel Optane SSD 900p 480GB Review: Diving Deeper Into 3D XPoint" by Billy Tallis on December 15, 2017. System configuration: Intel Xeon E3-1240 v5, ASRock Fatal1ty E3V5 Performance Gaming/OC, 32GB RAM G.SKILL Ripjaws DDR4-2400, Graphics AMD Radeon HD5450, Win10 v1703.
*Other names and brands may be claimed as the property of others
9. 2018
BENEFITS
The vast majority of workloads are random in nature
[Chart: 4KB MIXED RANDOM, QUEUE DEPTH 4 (1, 2). Y-axis: Throughput (MB/s). Series: Intel® Optane™ SSD 900P (480GB) vs. Top Tier PCIe Gen3 x4 based NAND SSD (512GB). Callout: 6.5X.]
Drive endurance is critical
[Chart: TOTAL WRITE ENDURANCE – TERABYTES WRITTEN (TBW) (2, 3). Intel® Optane™ SSD 900P vs. Top Tier PCIe Gen3 based NAND. Callout: 22x vs. 1x.]
1. Data and simulations collected and run by AnandTech, "The Intel Optane SSD 900p 480GB Review: Diving Deeper Into 3D XPoint" by Billy Tallis on December 15, 2017. System configuration: Intel Xeon E3-1240 v5, ASRock Fatal1ty E3V5 Performance Gaming/OC, 32GB RAM G.SKILL Ripjaws DDR4-2400, Graphics AMD Radeon HD5450, Win10 v1703.
2. Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.
3. Based upon the spec sheet of Samsung’s 960 Pro 512GB NAND SSD with an endurance of 400TB written vs. the spec sheet of Intel® Optane™ SSD 900P 480GB with an endurance of 8760TB written.
*Other names and brands may be claimed as the property of others
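The 22x endurance callout on this slide can be checked from the two spec-sheet ratings in note 3. This is a minimal sketch, assuming the Intel® Optane™ SSD 900P 480GB rating is 8,760 terabytes written (8.76 PBW); the ratio only reaches 22x when both ratings are in terabytes. The variable names are illustrative.

```python
# Endurance ratings from the spec sheets cited in note 3, in terabytes written (TBW).
NAND_TBW = 400      # Samsung 960 Pro 512GB
OPTANE_TBW = 8760   # Intel Optane SSD 900P 480GB (assumed TB, i.e. 8.76 PBW)

endurance_ratio = OPTANE_TBW / NAND_TBW
print(f"Endurance ratio: {endurance_ratio:.1f}x")  # prints "Endurance ratio: 21.9x"
```

Rounded to the nearest whole number, this gives the 22x shown on the chart.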
10. 2018
BENEFITS
• Faster rendering time (2)
• Faster loading times (1)
• Run larger workloads
• Quicker system response (1)
[Cartoon: "Sorry, swordplay time has been reduced!" By xkcd - https://xkcd.com/303/]
1. Systems Configuration: Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA® GeForce® GTX Titan X, Intel Optane SSD 140 GB VS. Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA GeForce GTX Titan X, Samsung 850 Pro* 256GB. Test: launch and load of project files. Software: Houdini* version 16, Adobe After Effects*, Autodesk Maya*. Test: Performed by Digital Dimensions and Intel, opening and closing different applications, loading several application workloads at the same time, switching applications
2. System Configuration: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz, 3504 MHz, 6 Core(s), 12 Logical Processor(s), (RAM) 64.0 GB, NVIDIA® GeForce® 1080, Intel Optane 900P 280GB, Samsung 850 PRO SSD 1TB, WDC 3TB Mechanical 7200 RPM, WDC 6TB Mechanical 5200 RPM
3. Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.
*Other names and brands may be claimed as the property of others
11. 2018
Faster particle rendering (1, 5)
Fluid Dynamics:
• 7 secs animation
• 168 frames
• 1.1 Billion water particles
Rendering a 7-second scene of a maelstrom went from 17.4 hours to 6.3 hours by switching to an Intel® Optane™ SSD (4, 5)
Increase processor utilization (2, 5)
• Remove the storage bottleneck with Intel® Optane™ SSDs to improve multi-core processor utilization (2, 5)
• Greatly improves system responsiveness (3, 5)
• Dramatically decreases application and project load times (3, 5)
1. Systems Configuration: Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS* motherboard, Corsair Dominator* 8GB DDR4 3200MHz, NVIDIA GeForce* GTX Titan X, Intel® Optane™ SSD 140 GB VS. Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard,
Corsair Dominator 8GB DDR4 3200MHz, NVIDIA® GeForce® GTX Titan X, Intel 750 SSD 1.2 TB. Software: Houdini* 16 Workload: Proprietary fireball rendering. Test: Performed by Digital Dimensions and Intel, Rendering of Fireball by Houdini
2. Systems Configuration: Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA GeForce GTX Titan X, Intel® Optane™ SSD 140 GB VS. Intel® Core™ i7-5820K 3.3 GHz (6 cores, 15 MB L3 cache), Asus x99-WS motherboard,
Corsair Dominator 8GB DDR4 3200MHz, NVIDIA® GeForce® GTX 1080, Intel Optane SSD 140 GB Software: Houdini 16 Workload: Proprietary fireball simulation. Test: Performed by Digital Dimensions and Intel, Rendering of Fireball by Houdini
3. Systems Configuration: Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA® GeForce® GTX Titan X, Intel Optane SSD 140 GB VS. Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard,
Corsair Dominator 8GB DDR4 3200MHz, NVIDIA GeForce GTX Titan X, Samsung 850 Pro* 256GB. Test: launch and load of project files. Software: Houdini* version 16, Adobe After Effects*, Autodesk Maya*. Test: Performed by Digital Dimensions and Intel, opening closing different applications, loading several
applications workloads at the same time, switching applications
4. Systems Configuration: Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA® GeForce® GTX Titan X, Intel Optane SSD 140 GB VS. Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard,
Corsair Dominator 8GB DDR4 3200MHz, NVIDIA GeForce GTX Titan X, Samsung 850 Pro* 256GB. Test: launch and load of project files. Software: Houdini 5.
5. Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system. Tests document performance of
components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.
*Other names and brands may be claimed as the property of others
13. 2018
END USER OPTIMIZATIONS
PERFORMANCE IS THE KEY FOR PLAYER EXPERIENCE.
• Using parallelism/multithreading, we increase the performance of tasks internal to the game engine.
• Multithreading builds on the fact that we always have more than one “object” to process:
  • More than one object to cull
  • More than one particle emitter to be updated
  • Multiple files to stream in
These concepts resulted in two key systems:
• Batch Updater
• Background Workers
14. 2018
END USER OPTIMIZATIONS
BATCH UPDATER
• System composed of batch workers.
• Used to update batches of the same object distributed over all CPU cores.
• Especially useful in combination with our entity component system.
• Allows relatively easy parallelization of the majority of our game code.
• Mostly used for frame-dependent work: the computation we need to do to show a frame.
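The batch updater idea on this slide (split many instances of the same object type into batches and spread them over the CPU cores) could be sketched as follows. This is only an illustration of the pattern, not the engine's implementation: `update_position`, `split_batches` and `batch_update` are hypothetical names, the entity layout is invented, and a real engine would use native worker threads rather than a Python thread pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def update_position(entity, dt):
    """Hypothetical per-component update: advance position by velocity."""
    return {"pos": entity["pos"] + entity["vel"] * dt, "vel": entity["vel"]}


def split_batches(items, batch_count):
    """Split items into at most batch_count contiguous, near-equal batches."""
    size, extra = divmod(len(items), batch_count)
    batches, start = [], 0
    for i in range(batch_count):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            batches.append(items[start:end])
        start = end
    return batches


def batch_update(entities, dt, workers=None):
    """Update all entities of one component type, one batch per worker."""
    workers = workers or os.cpu_count() or 1
    batches = split_batches(entities, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        updated = pool.map(
            lambda batch: [update_position(e, dt) for e in batch], batches
        )
        # pool.map preserves batch order, so the result matches the input order.
        return [e for batch in updated for e in batch]
```

Because every batch holds the same kind of object and the update touches only its own entity, the batches need no locking, which is what makes this style of parallelization relatively easy to apply.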
15. 2018
END USER OPTIMIZATIONS
BACKGROUND WORKER SYSTEM
• This system is designed around single job objects.
• The idea is to run the background worker system whenever the batch workers run out of work, to fill the remaining gaps.
• Background Workers are tightly coupled with the low-level I/O system, specifically to leverage the benefits of Intel® Optane™ SSDs!
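The "fill the remaining gaps" rule described on this slide amounts to a priority between two job queues: frame work first, background jobs only when no frame work is left. A minimal single-threaded sketch of that scheduling rule, with the hypothetical function `run_worker` and an invented per-call job budget standing in for the time left in a frame:

```python
from collections import deque


def run_worker(frame_jobs, background_jobs, budget):
    """Run up to `budget` jobs, always preferring frame work over background work.

    Returns the results of the executed jobs and the background jobs left over.
    """
    frame_jobs, background_jobs = deque(frame_jobs), deque(background_jobs)
    executed = []
    for _ in range(budget):
        if frame_jobs:
            job = frame_jobs.popleft()       # frame-dependent work goes first
        elif background_jobs:
            job = background_jobs.popleft()  # gaps are filled with background jobs
        else:
            break                            # nothing left to do
        executed.append(job())
    return executed, list(background_jobs)
```

For example, with two frame jobs, two background jobs and a budget of three, both frame jobs run first and one background job fills the remaining slot.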
16. 2018
END USER OPTIMIZATIONS
COUPLING BACKGROUND WORKERS TO THE I/O SYSTEM FOR INTEL® 3D NAND SSD + INTEL® OPTANE™ SSD
• Reads multiple files in parallel as blocks using async I/O (SSD and Intel® Optane™ SSD), or uses a serial access pattern, file by file, on HDD.
• When a file block finishes, a free background worker wakes up.
• The worker decrypts and decompresses said block (even if the whole file is not loaded into memory yet).
• Due to the number of parallel files, we can work on multiple file blocks in parallel.
• We have up to 4 16KB blocks in flight.
• In the end, 4 blocks are in transfer in parallel, and up to N (number of cores) blocks are processed in parallel.
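The coupling described on this slide (a small number of block reads in flight, each completed block handed straight to a free worker) could be sketched as below. This is an illustration under stated assumptions, not the engine's code: a thread pool stands in for true async I/O, a CRC stands in for the decrypt/decompress step, and `read_block`, `process_block` and `stream_files` are hypothetical names. Only the 16 KB block size and the limit of 4 blocks in flight come from the slide.

```python
import os
import zlib
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

BLOCK_SIZE = 16 * 1024  # 16 KB blocks, as on the slide
MAX_IN_FLIGHT = 4       # up to 4 blocks in flight, as on the slide


def read_block(path, offset):
    """Read one block; stands in for an async I/O request."""
    with open(path, "rb") as f:
        f.seek(offset)
        return path, offset, f.read(BLOCK_SIZE)


def process_block(path, offset, data):
    """Stand-in for decrypt/decompress: checksum the block."""
    return path, offset, zlib.crc32(data)


def stream_files(paths, workers=None):
    """Stream all blocks of all files; returns {(path, offset): crc}."""
    workers = workers or os.cpu_count() or 1
    requests = iter(
        (p, off)
        for p in paths
        for off in range(0, os.path.getsize(p), BLOCK_SIZE)
    )
    processing = []
    with ThreadPoolExecutor(MAX_IN_FLIGHT) as io_pool, \
            ThreadPoolExecutor(workers) as cpu_pool:
        in_flight = set()
        while True:
            # Top up the I/O queue so that up to 4 block reads are outstanding.
            while len(in_flight) < MAX_IN_FLIGHT:
                request = next(requests, None)
                if request is None:
                    break
                in_flight.add(io_pool.submit(read_block, *request))
            if not in_flight:
                break
            # Blocks may complete out of submit order; process whichever is done.
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for finished in done:
                processing.append(
                    cpu_pool.submit(process_block, *finished.result())
                )
        return {(p, off): crc for p, off, crc in (f.result() for f in processing)}
```

Processing starts as soon as any single block arrives, so a file never has to be fully in memory before work on it begins.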
17. 2018
END USER OPTIMIZATIONS
WHAT DOES THIS ALL MEAN FOR THE GAMER?
• Improved game loading times.
• Multiple concurrent file reads during streaming.
• Far more efficient data streaming during gameplay.
• With Intel® Optane™ SSDs, Star Citizen* gameplay will remain smooth.
1. System Configuration: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz, 3504 Mhz, 6 Core(s), 12 Logical Processor(s), (RAM) 64.0 GB, NVIDIA® GeForce® 1080, Intel Optane 900P 280GB, Samsung 850 PRO SSD 1TB, WDC 3TB Mechanical 7200 RPM, WDC 6TB Mechanical 5200 RPM
19. 2018
OPTIMIZATIONS FOR GAME DEVELOPMENT
ON TOP OF IMPROVED LOAD TIMES.
• “Real world” data copies are faster than ever.
  • Version control “Parallel Sync” options are now available in most source control applications.
  • Can also be used with internal tools to speed up build copies or data synchronization.
• Large source file export speed benefits.
  • An uncompressed Collada export can be upwards of 7 gigabytes for a hero facial asset.
• Frame-by-frame captures.
  • Faster writes to disk allow near-real-time play whilst outputting frames to disk for use within demo videos and reviews.
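An internal tool that speeds up build copies, as mentioned on this slide, typically issues several file copies at once so the drive sees more than one request at a time. A minimal sketch, assuming a plain directory-to-directory copy; `parallel_copy` and its `max_workers` default are hypothetical, not part of any tool named in the talk.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_dir, dst_dir, max_workers=8):
    """Copy every file under src_dir to dst_dir using several threads."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    files = [p for p in src_dir.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies file contents and metadata
        return dst

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(copy_one, files))
```

How many workers pay off depends on the drive: storage that stays fast with several requests outstanding benefits from the added concurrency, whereas a hard drive may slow down from the extra seeking.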
1. System Configuration: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz, 3504 Mhz, 6 Core(s), 12 Logical Processor(s), (RAM) 64.0 GB, NVIDIA® GeForce® 1080, Intel Optane 900P 280GB, Samsung 850 PRO SSD 1TB, WDC 3TB Mechanical 7200 RPM, WDC 6TB Mechanical 5200 RPM
20. 2018
ACCELERATING DATA BUILD PROCESS WITH INTEL® OPTANE™ TECHNOLOGY
DATA TRANSFER.
• 1.34TB of data to sync per branch.
  • Done multiple times depending on content needs.
  • Done for every major release.
  • Done for any data format changes.
  • Blocks development and other builds.
• Current configuration.
  • 24x SSDs in SATA RAID 6.
  • Data sync alone takes 6 hours, 13 minutes.
• Optane configuration.
  • 4x Intel® Optane™ SSDs in VROC RAID.
  • Data sync alone takes 3 hours, 30 minutes.
• About 1.8x faster (roughly 44% less sync time)! (1)
1. System Configuration: Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz 16 Cores, RAM 64GB, NVIDIA® GeForce® 780, four Intel® Optane™ SSDs implemented in RAID configuration using VROC vs. Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz 16 Cores, RAM 64GB, NVIDIA® GeForce® 780, twenty-four Samsung EVO 850 1TB configured in RAID 6. Test was performed by Cloud Imperium Games.
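The speedup on this slide can be worked out directly from the two sync times. A small sketch, assuming 1.34TB means decimal terabytes; the effective rates it prints are end-to-end averages for the whole sync, not raw drive throughput, and the names are illustrative.

```python
from datetime import timedelta

DATA_BYTES = 1.34e12  # 1.34 TB per branch, assuming decimal terabytes

sata_raid6 = timedelta(hours=6, minutes=13)   # 24x SSDs in SATA RAID 6
optane_vroc = timedelta(hours=3, minutes=30)  # 4x Intel Optane SSDs in VROC RAID

speedup = sata_raid6 / optane_vroc            # ratio of the two durations
reduction = 1 - optane_vroc / sata_raid6      # fraction of sync time saved


def effective_mb_per_s(duration):
    """Average end-to-end sync rate in MB/s (decimal megabytes)."""
    return DATA_BYTES / duration.total_seconds() / 1e6


print(f"speedup: {speedup:.2f}x, time saved: {reduction:.0%}")
print(f"SATA RAID 6: {effective_mb_per_s(sata_raid6):.0f} MB/s")
print(f"Optane VROC: {effective_mb_per_s(optane_vroc):.0f} MB/s")
```

The ratio comes out at about 1.78x, which is the basis for describing the Optane configuration as close to twice as fast.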
21. 2018
SUMMARY
SEEING BENEFITS EARLY ON WITH MINIMAL EFFORT.
• End User Gaming Optimization.
• Optimization for Game Development.
• Accelerating data build process.
• Opens new doors for creating novel experiences for our players.
• Allows us to improve long-standing systems and tools to take advantage of Intel® Optane™ Technology.
22. APPENDIX
1. Queue depth breakdown done by AnandTech for common applications, https://www.anandtech.com/show/9090/intel-ssd-750-pcie-ssd-review-nvme-for-the-client/3 by Kristian Vättö on April 2, 2015
2. Data and simulations collected and run by AnandTech, "The Intel Optane SSD 900p 480GB Review: Diving Deeper Into 3D XPoint" by Billy Tallis on December 15, 2017. System configuration: Intel Xeon E3-1240 v5, ASRock Fatal1ty E3V5 Performance Gaming/OC, 32GB RAM G.SKILL Ripjaws DDR4-2400, Graphics AMD Radeon HD5450, Win10 v1703.
3. Based upon the spec sheet of Samsung’s 960 Pro 512GB NAND SSD with an endurance of 400TB written vs. the spec sheet of Intel® Optane™ SSD 900P 480GB with an endurance of 8760TB written.
4. Systems Configuration: Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS* motherboard, Corsair Dominator* 8GB DDR4 3200MHz, NVIDIA GeForce* GTX Titan X, Intel® Optane™ SSD 140 GB VS. Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA® GeForce® GTX Titan X, Intel 750 SSD 1.2 TB. Software: Houdini* 16. Workload: Proprietary fireball rendering. Test: Performed by Digital Dimensions and Intel, Rendering of Fireball by Houdini
5. Systems Configuration: Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA GeForce GTX Titan X, Intel® Optane™ SSD 140 GB VS. Intel® Core™ i7-5820K 3.3 GHz (6 cores, 15 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA® GeForce® GTX 1080, Intel Optane SSD 140 GB. Software: Houdini 16. Workload: Proprietary fireball simulation. Test: Performed by Digital Dimensions and Intel, Rendering of Fireball by Houdini
6. Systems Configuration: Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA® GeForce® GTX Titan X, Intel Optane SSD 140 GB VS. Intel® Core™ i7-5960X 3.3 GHz (8 cores, 20 MB L3 cache), Asus x99-WS motherboard, Corsair Dominator 8GB DDR4 3200MHz, NVIDIA GeForce GTX Titan X, Samsung 850 Pro* 256GB. Test: launch and load of project files. Software: Houdini* version 16, Adobe After Effects*, Autodesk Maya*. Test: Performed by Digital Dimensions and Intel, opening and closing different applications, loading several application workloads at the same time, switching applications
7. System Configuration: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz, 3504 MHz, 6 Core(s), 12 Logical Processor(s), (RAM) 64.0 GB, NVIDIA® GeForce® 1080, Intel Optane 900P 280GB, Samsung 850 PRO SSD 1TB, WDC 3TB Mechanical 7200 RPM, WDC 6TB Mechanical 5200 RPM
8. Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.
Editor's Notes
https://xkcd.com/303/
There we read multiple files in parallel (SSD and Optane) or file by file (HDD), as blocks, using async I/O.
Whenever a file block is finished, a free background worker wakes up and immediately starts decrypting/decompressing said block (even if the whole file is not loaded into memory yet).
And due to the number of parallel files and background workers, we can work on multiple file blocks in parallel.
We have up to 4 16 KB blocks in flight. Their "done" mechanism is hooked up with the background workers.
Whenever a block is done, a background worker wakes up to submit the next block, followed by potential block processing (slightly complex, as the blocks are not done in submit order). In the end, 4 blocks are in transfer in parallel, and up to N (number of cores) are processed in parallel.
NOTE:
Game level load times can in theory benefit greatly from faster read speeds, but in practice decompressing the assets after loading them into RAM quickly becomes the bottleneck. Most of the other situations where the performance advantage of the Optane SSD will really help are better described as a different kind of problem:
Multiple concurrent file reads during streaming
Gives you massive amounts of throughput potential
Data streaming during gameplay more efficiently
Removes most stalls and micro stutters
Improved game loading times
Load times over a standard SSD improved?
On high specification rigs, with maximum graphical settings, very High resolution textures and meshes
Large files fully stream quickly
Vision for what this will bring in the future: better gaming experiences for users
Smoother higher performance experience even when dealing with large file types
With Intel® Optane™ SSDs, Star Citizen* gameplay will remain smooth
How Optane benefits game development:
Deals with large data sets well
Parallel Sync options now available and should be used if possible.
Large source file export speed benefits
Uncompressed Collada Export can be 8 Gigs for a Star Citizen Facial Asset.
Frame capturing faster in fixed time step
Writing frames directly to disk doesn’t slow down the game
Improved compile times.
Star Citizen's build system synchronizes upwards of 1.34TB of data per branch, for every major release and every major asset update to a given asset host.
In our current setup, which is 24x SSDs over SATA in a RAID 6 configuration, this synchronization requires 6h13m of pure data transfer time.
Having run very recent tests, as recently as last week, we found that Intel Optane technology can massively improve this.
In this test, instead of using 24x drives, we set up 4x Optane drives with VROC RAID.
The same data synchronization took 3h30m12s instead, a reduction of roughly 44% in time. This is a big saving.
Data sync takes so long that it is a massive problem: build engineers have to do it on the weekend. A faster sync makes it possible to kick it off on demand, and the build engineers can get more sleep.
Production, QA and everyone else always has to factor in 6h (and more, because this is only the sync of data) for a new branch to be created, so the faster this is, the better everyone's life is.
Current Config
24x SSD in SATA RAID 6 - 6h13m
New Config
4x Optane SSD in VROC RAID - 3h30m12s
Seeing benefits early on with minimal effort
End User Gaming Optimization
Optimization for Game Development
Accelerating data build process with Intel® Optane™ Technology
Opens new doors for creating new experiences for our players
Allows us to improve long standing systems and tools to maximize the benefits of faster storage