Jeff Rous – Senior Developer Relations Engineer, Intel
Peter Knepley – Technical Lead, Epic Games
Bob Tellez – Technical Lead, Epic Games
Agenda
Introductions
Super sweet Fortnite* video
Scalability in Unreal Engine* 4
Profiling tools
4.19 Goodness
Wrap up and What’s Next
Embedded promo video
Target Hardware - Research
1. Gather data on GPUs and CPUs from as many sources as you can.
2. Create tables that map each card/chip name to its benchmark score.
3. Determine target benchmark scores that cover the percentage of the population you want to support.
4. Make histograms of the population by benchmark score.
5. Distribute the population into buckets of roughly equal size.
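The bucketing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 90% support figure, and the survey scores are all made up for the example and do not come from the talk.

```python
from typing import List


def drop_unsupported(scores: List[float], supported_fraction: float) -> List[float]:
    """Keep the strongest `supported_fraction` of machines.

    Machines below the cut are treated as under minimum spec (hypothetical policy).
    """
    ordered = sorted(scores)
    cut = len(ordered) - round(len(ordered) * supported_fraction)
    return ordered[cut:]


def equal_size_buckets(scores: List[float], bucket_count: int) -> List[List[float]]:
    """Split scores into buckets that hold roughly equal numbers of machines."""
    if bucket_count <= 0:
        raise ValueError("bucket_count must be positive")
    ordered = sorted(scores)
    base, extra = divmod(len(ordered), bucket_count)
    buckets, start = [], 0
    for i in range(bucket_count):
        # Spread any remainder over the first few buckets.
        size = base + (1 if i < extra else 0)
        buckets.append(ordered[start:start + size])
        start += size
    return buckets


# Made-up benchmark scores, one per surveyed machine.
survey_scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
supported = drop_unsupported(survey_scores, 0.9)
buckets = equal_size_buckets(supported, 3)
```

Bucketing by population rather than by score range keeps each quality tier relevant to a similar number of players.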
Target Hardware - Decisions
• For each bucket, find a popular CPU and
GPU that is near the weaker side of the
bucket’s range.
• Do the research YOURSELF. Existing
data will likely be out of date since
desktop hardware changes frequently.
• Every platform and bucket you support is
another configuration to maintain and
test.
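One way to pick a test target near the weaker side of a bucket is sketched below. The `weak_fraction` parameter, the hardware names, and the scores are hypothetical, and the "most common part in the weakest quarter" rule is just one reasonable reading of the advice.

```python
from collections import Counter


def pick_target(bucket, weak_fraction=0.25):
    """Return the most popular part among the weakest machines in a bucket.

    bucket: list of (hardware_name, benchmark_score), one entry per surveyed machine.
    weak_fraction: share of the bucket treated as its "weaker side" (an assumption).
    """
    ordered = sorted(bucket, key=lambda entry: entry[1])
    weak_count = max(1, int(len(ordered) * weak_fraction))
    weak_side = ordered[:weak_count]
    names = Counter(name for name, _ in weak_side)
    return names.most_common(1)[0][0]


# Made-up survey of one bucket: 12 machines, four GPU models.
bucket = (
    [("GPU A", 100)] * 1
    + [("GPU B", 110)] * 3
    + [("GPU C", 150)] * 4
    + [("GPU D", 190)] * 4
)
target = pick_target(bucket)
```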
Shadows/Lighting
• Static lighting was not possible due to the
building and destruction features.
• Dynamic lighting is expensive for low end
machines, but can be awesome on high
end.
• We use simple forward shading for Save
the World* mode, but not Battle Royale*.
• High end machines look much better with
DistanceFieldAO enabled, so we optimized
it so it can be enabled on consoles as well.
Render Resolution
• Resolution dramatically affects GPU
performance.
• During development we used discrete
resolutions to make comparing
performance easier.
• This was very effective, but ultimately the
end-user experience is better with a slider.
• Render resolution does not affect UI.
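A rough way to see why resolution matters so much: a screen percentage scales both axes, so the pixel count changes with the square of the scale. The sketch below assumes GPU pixel work is roughly proportional to the number of rendered pixels, which is only an approximation.

```python
def rendered_pixels(width, height, screen_percentage):
    """Pixels actually rendered when each axis is scaled by screen_percentage."""
    scale = screen_percentage / 100.0
    return round(width * scale) * round(height * scale)


def pixel_cost_ratio(screen_percentage):
    """Fraction of full-resolution pixel work, assuming cost scales with pixel count."""
    return (screen_percentage / 100.0) ** 2


full = rendered_pixels(1920, 1080, 100)
half = rendered_pixels(1920, 1080, 50)
```

At 50% the scene renders a quarter of the pixels, which is why a slider gives players a wide performance range without touching the UI.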
Animation (URO)/Significance Manager
• Update Rate Optimization (URO) reduces
the tick frequency of animations.
• In the engine, URO is purely distance
based. In Fortnite*, there is a budget for
characters.
• The significance manager scores players
and enemies to make sure more important
characters animate at higher rates.
• It is also used to score other things
including particle systems and levels.
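The budget idea can be illustrated with a small sketch. It is not Fortnite's or the engine's implementation: the scoring formula, the tier sizes, and the update intervals are all assumptions chosen for the example.

```python
def significance(distance, is_player):
    """Hypothetical score: closer is more significant, players count double."""
    base = 1.0 / (1.0 + distance)
    return base * (2.0 if is_player else 1.0)


def assign_update_intervals(characters, budget_per_tier=(2, 3), intervals=(1, 2, 4)):
    """Give the most significant characters the highest animation update rates.

    characters: list of (name, significance); higher means more important.
    budget_per_tier: how many characters each tier may hold; the last tier is unbounded.
    intervals: frames between animation updates for each tier.
    """
    ranked = sorted(characters, key=lambda c: c[1], reverse=True)
    result = {}
    index = 0
    for tier, interval in enumerate(intervals):
        # The last tier takes everyone who is left.
        slots = budget_per_tier[tier] if tier < len(budget_per_tier) else len(ranked)
        for name, _ in ranked[index:index + slots]:
            result[name] = interval
        index += slots
    return result


scores = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.7),
          ("e", 0.3), ("f", 0.2), ("g", 0.05)]
rates = assign_update_intervals(scores)
```

Unlike a purely distance-based rule, a budget caps the total animation cost regardless of how many characters crowd the player.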
Material Quality
• Adding material quality nodes to materials
greatly improves GPU time on low end
machines.
• Artists must maintain them.
• Adding a material quality node triples the
number of shaders the material generates.
• Try to reduce dependent texture fetching
using this node.
HLOD/Distance Culling
• Save the World* uses distance culling a lot
due to the densely populated levels.
• We use a shader to make objects
animate into view. This allowed us to be
more aggressive on low end machines.
• Set a range of object sizes and cull
distances.
• Battle Royale* required long distance
visibility so we used HLODs to represent
far away geometry.
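A range of object sizes and cull distances can be thought of as a small lookup table. The sketch below assumes each object takes the cull distance of the entry whose size is closest to its own, and that a distance of 0 means "never cull"; the sizes and distances are invented for the example.

```python
def cull_distance_for(object_size, size_to_distance):
    """Pick the cull distance of the table entry whose size is nearest the object's.

    size_to_distance: list of (size, cull_distance) pairs; 0 means never cull.
    """
    nearest = min(size_to_distance, key=lambda pair: abs(pair[0] - object_size))
    return nearest[1]


def is_culled(object_size, distance_to_camera, size_to_distance):
    """True when the object is farther from the camera than its cull distance."""
    limit = cull_distance_for(object_size, size_to_distance)
    return limit > 0 and distance_to_camera > limit


# Hypothetical table: small props vanish early, large structures never do.
table = [(50, 1000), (200, 5000), (1000, 0)]
```

A low end configuration could simply swap in a table with shorter distances.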
CPU Profiling - Stat Unit
• Answers the question “Am I CPU bound, or GPU bound?”
• Displays the overall frame time, the CPU time taken by the game thread and the render thread, and the GPU time.
• Unreal Tournament* target on PC is 8 ms.
• Fortnite* target on consoles is 16 ms.
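Reading the Stat Unit numbers comes down to two small calculations: converting a frame rate into a millisecond budget, and seeing which of the three timings is largest. The sketch below assumes you have copied the game thread, render thread, and GPU times by hand; the sample values are made up.

```python
def frame_budget_ms(target_fps):
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / target_fps


def bottleneck(game_ms, draw_ms, gpu_ms):
    """Name the slowest of the three timings shown by Stat Unit."""
    timings = {"game thread": game_ms, "render thread": draw_ms, "GPU": gpu_ms}
    return max(timings, key=timings.get)


budget_60 = frame_budget_ms(60)    # about 16.7 ms
budget_125 = frame_budget_ms(125)  # 8 ms
slowest = bottleneck(game_ms=5.0, draw_ms=7.0, gpu_ms=12.0)
```

An 8 ms target therefore corresponds to roughly 125 frames per second, and a 16 ms target to roughly 60.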
CPU Profiling - Stat StartFile / Stat StopFile
• Creates a stats file using UE4’s stat system, with timings for both native code and Blueprints.
• Useful for finding general performance issues as well as one-time hitches.
• Opened with the Stats Viewer in the UE4 Editor’s Session Frontend.
• Makes it easy to find ticking objects that should not be ticking.
CPU Profiling - Stat DumpHitches
• Helps find spikes in CPU time that are harder to find with targeted tools.
• Call stacks are printed to the game log.
• The cost of running it is typically minor, so it can be left on during internal playtests.
• Used on Fortnite* to find synchronous loads.
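The underlying idea of hitch detection is simple: flag any frame whose time exceeds a threshold. The sketch below is not the engine's implementation; the 75 ms threshold and the frame times are hypothetical.

```python
def find_hitches(frame_times_ms, threshold_ms):
    """Return (frame_index, frame_time) for every frame slower than the threshold."""
    return [(i, t) for i, t in enumerate(frame_times_ms) if t > threshold_ms]


# Made-up capture: mostly 16-17 ms frames with two spikes.
capture = [16, 17, 95, 16, 140]
hitches = find_hitches(capture, threshold_ms=75)
```

Because the check is this cheap, leaving it on during playtests costs little.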
CPU Profiling - Windows Performance Recorder and Analyzer
• Used to find issues like unexpected disk I/O during gameplay.
• Unreal Tournament* was calling LoadLibrary every frame and not finding the file.
• Issues like that can account for a large amount of the frame time on lower end systems.
Intel® VTune™ Amplifier
• Intel® VTune™ Amplifier enables deep profiling and problem identification.
• Hotspots, locks, syncs, multithreading, even GPU data!
• With 4.19, new support for event-based CPU sampling using the itt_notify framework.
• VTune™ is now free!
CPU Profiling - Stat RHI
• Displays triangle counts.
• Unreal Tournament* has a triangle count budget of around 5 million for low end.
• DM-Underland* had a landscape mesh that was 7 million triangles on its own.
CPU Profiling - LOD Colorization View Mode
• Identify meshes with no LODs.
• Identify LODs with wrong transition points.
• On Unreal Tournament*, it helped find rocks in the distance that had no LODs.
GPU Profiling - ProfileGPU
• Our built-in tool for displaying the GPU time breakdown.
• r.ProfileGPU.ShowUI can be used to suppress the popup window.
• Use different values of r.SetRes and r.ScreenPercentage to determine if you are vertex or pixel bound.
• UT switched to Simple Forward Shading for low end.
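The resolution test works because pixel work shrinks with resolution while vertex work does not. A minimal sketch of that reasoning follows, assuming you have measured GPU time at 100% and 50% screen percentage; the 15% tolerance and the sample timings are hypothetical.

```python
def likely_bound(gpu_ms_full, gpu_ms_half, tolerance=0.15):
    """Guess the GPU bottleneck from timings at 100% and 50% screen percentage.

    At 50% the GPU renders one quarter of the pixels. If GPU time barely moves,
    the cost is probably not per-pixel, so vertex work is the likely suspect.
    """
    ratio = gpu_ms_half / gpu_ms_full
    if ratio > 1.0 - tolerance:
        return "vertex"
    return "pixel"


flat = likely_bound(gpu_ms_full=20.0, gpu_ms_half=19.0)
scaled = likely_bound(gpu_ms_full=20.0, gpu_ms_half=8.0)
```

Real scenes are usually a mix of both, so treat the answer as a hint about where to look first.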
Intel® Graphics Performance Analyzers
Use ToggleDrawEvents and
r.ShowMaterialDrawEvents commands
Frame debugging / live mode
Experiments!
GPU Profiling - Shader Complexity View Mode
• Helps track materials that may be over budget by visualizing their cost.
• Green is good.
• White is bad.
• DM-Underland* has coral foliage that is white hot.
• We lowered its draw distance and simplified its shader.
Memory Profiling - Common Tools
• memreport -full generates a log file with a breakdown of memory usage.
• ListTextures generates a log file or a CSV.
• Keep a spreadsheet of textures for each release to watch for usage changes.
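Tracking texture usage between releases can be automated once the listings are in CSV form. The sketch below assumes a simplified CSV with hypothetical `name` and `size_kb` columns; the real export will have different headers, so adjust the column names to match it. The texture names and sizes are invented.

```python
import csv
import io


def load_sizes(csv_text, name_col="name", size_col="size_kb"):
    """Parse a texture listing into {texture_name: size}. Column names are assumptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[name_col]: float(row[size_col]) for row in reader}


def usage_changes(old, new):
    """Return {texture_name: size_delta} for textures that grew, shrank, appeared, or vanished."""
    changes = {}
    for name in old.keys() | new.keys():
        delta = new.get(name, 0.0) - old.get(name, 0.0)
        if delta != 0:
            changes[name] = delta
    return changes


previous = load_sizes("name,size_kb\nT_Rock,1024\nT_Grass,512\n")
current = load_sizes("name,size_kb\nT_Rock,4096\nT_Grass,512\nT_Coral,2048\n")
changes = usage_changes(previous, current)
```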
Memory Profiling - Primitive Stats Viewer
Look out for
• Overly large assets
• Content that does not belong
The viewer shows
• Count: the number of times an asset appears in the map
• Triangle count per asset
Unreal Tournament* modified the panel to show LOD count for static meshes.
Memory Profiling - Texture Stats Viewer
Look out for
• Wrong group
• Wrong LODBias
• Uncompressed textures
• Non-mipping textures
• Bad dimensions
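Some of these checks are mechanical enough to script. The sketch below treats "bad dimensions" as non-power-of-two sizes, which is an assumption on our part, and the function and its parameters are hypothetical rather than part of any engine API.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... using the single-set-bit trick."""
    return n > 0 and (n & (n - 1)) == 0


def texture_warnings(width, height, compressed, mip_count):
    """Collect warnings for one texture based on a few of the checks listed above."""
    warnings = []
    if not (is_power_of_two(width) and is_power_of_two(height)):
        warnings.append("non-power-of-two dimensions")
    if not compressed:
        warnings.append("uncompressed")
    if mip_count <= 1:
        warnings.append("no mips")
    return warnings


clean = texture_warnings(1024, 1024, compressed=True, mip_count=11)
suspect = texture_warnings(1000, 512, compressed=False, mip_count=1)
```

Wrong group and wrong LODBias depend on project conventions, so they still need a human look.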
Unreal Engine* 4.19 Goodness
• Worker threads scale with the CPU. No more idle cores!
• Cloth throughput improved ~30%.
• Intel® VTune™ Amplifier support gives deep insight into what the engine is doing at all times and enables profiling of the task scheduler, which was previously opaque.
• 4.19 is available now! Upgrade to take advantage of these improvements!
Call To Action
Scalability is a question of quality. Make your game look as good as possible on
as many machines as you can!
Check out the docs and videos for all of our profiling tools!
Check out the Unreal* demos in the Intel and Epic booths!
Links
Intel Developer Zone (software.intel.com/gamedev/partners/unreal)
Unreal Profiling Tools (docs.unrealengine.com/en-us/Engine/Performance)
Fortnite (www.epicgames.com/fortnite/en-US/home)
Unreal Tournament (www.epicgames.com/unrealtournament)
Unreal 4.19 Optimizations (software.intel.com/en-us/articles/intel-software-engineers-assist-with-unreal-engine-419-optimizations)
Unreal Engine 4 Optimization Guide (software.intel.com/en-us/articles/unreal-engine-4-optimization-tutorial-part-1)
Legal Notices and Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as
any warranty arising from course of performance, course of dealing, or usage in trade.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a
non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are
available on request.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system
configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult
other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products. For more complete information visit www.intel.com/benchmarks.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These
optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to
Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your
system hardware, software or configuration may affect your actual performance.
Intel, Core and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
© Intel Corporation.
Forts and Fights Scaling Performance on Unreal Engine*

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Press Button, Drink Coffee : An Overview of UE4 build pipeline and maintenance
Press Button, Drink Coffee : An Overview of UE4 build pipeline and maintenancePress Button, Drink Coffee : An Overview of UE4 build pipeline and maintenance
Press Button, Drink Coffee : An Overview of UE4 build pipeline and maintenance
 
UE4における大規模背景制作事例 描画特殊表現編
UE4における大規模背景制作事例 描画特殊表現編UE4における大規模背景制作事例 描画特殊表現編
UE4における大規模背景制作事例 描画特殊表現編
 
UE4アセットリダクション手法紹介
UE4アセットリダクション手法紹介UE4アセットリダクション手法紹介
UE4アセットリダクション手法紹介
 
UE4 LODs for Optimization -Beginner-
UE4 LODs for Optimization -Beginner-UE4 LODs for Optimization -Beginner-
UE4 LODs for Optimization -Beginner-
 
UE4を用いたTPS制作事例 EDF:IR レベル構成について
UE4を用いたTPS制作事例 EDF:IR レベル構成についてUE4を用いたTPS制作事例 EDF:IR レベル構成について
UE4を用いたTPS制作事例 EDF:IR レベル構成について
 
UE4でTranslucencyやUnlitに影を落としたい!
UE4でTranslucencyやUnlitに影を落としたい!UE4でTranslucencyやUnlitに影を落としたい!
UE4でTranslucencyやUnlitに影を落としたい!
 
マジシャンズデッド ポストモーテム ~マテリアル編~ (株式会社Byking: 鈴木孝司様、成相真治様) #UE4DD
マジシャンズデッド ポストモーテム ~マテリアル編~ (株式会社Byking: 鈴木孝司様、成相真治様) #UE4DDマジシャンズデッド ポストモーテム ~マテリアル編~ (株式会社Byking: 鈴木孝司様、成相真治様) #UE4DD
マジシャンズデッド ポストモーテム ~マテリアル編~ (株式会社Byking: 鈴木孝司様、成相真治様) #UE4DD
 
実行速度の最適化のあれこれ プラス おまけ
実行速度の最適化のあれこれ プラス おまけ  実行速度の最適化のあれこれ プラス おまけ
実行速度の最適化のあれこれ プラス おまけ
 
Unreal Studioのご紹介
Unreal Studioのご紹介Unreal Studioのご紹介
Unreal Studioのご紹介
 
Unreal Engineを使用した商用タイトルで のノンフォトリアルレンダリング(NPR)事例
Unreal Engineを使用した商用タイトルで のノンフォトリアルレンダリング(NPR)事例Unreal Engineを使用した商用タイトルで のノンフォトリアルレンダリング(NPR)事例
Unreal Engineを使用した商用タイトルで のノンフォトリアルレンダリング(NPR)事例
 
大規模タイトルにおけるエフェクトマテリアル運用 (SQEX大阪: 林武尊様) #UE4DD
大規模タイトルにおけるエフェクトマテリアル運用 (SQEX大阪: 林武尊様) #UE4DD大規模タイトルにおけるエフェクトマテリアル運用 (SQEX大阪: 林武尊様) #UE4DD
大規模タイトルにおけるエフェクトマテリアル運用 (SQEX大阪: 林武尊様) #UE4DD
 
UE4プログラマー勉強会 in 大阪 -エンジンの内部挙動について
UE4プログラマー勉強会 in 大阪 -エンジンの内部挙動についてUE4プログラマー勉強会 in 大阪 -エンジンの内部挙動について
UE4プログラマー勉強会 in 大阪 -エンジンの内部挙動について
 
UE4における大規模背景制作事例 最適化ワークフロー編
UE4における大規模背景制作事例 最適化ワークフロー編UE4における大規模背景制作事例 最適化ワークフロー編
UE4における大規模背景制作事例 最適化ワークフロー編
 
UE4の色について v1.1
 UE4の色について v1.1 UE4の色について v1.1
UE4の色について v1.1
 
UE4のためのより良いゲーム設計を理解しよう!
UE4のためのより良いゲーム設計を理解しよう!UE4のためのより良いゲーム設計を理解しよう!
UE4のためのより良いゲーム設計を理解しよう!
 
Fortniteを支える技術
Fortniteを支える技術Fortniteを支える技術
Fortniteを支える技術
 
UE4を用いたTPS制作事例 EDF:IR 地球を衛る兵士の作り方
UE4を用いたTPS制作事例 EDF:IR 地球を衛る兵士の作り方UE4を用いたTPS制作事例 EDF:IR 地球を衛る兵士の作り方
UE4を用いたTPS制作事例 EDF:IR 地球を衛る兵士の作り方
 
UE4のレイトレで出来ること/出来ないこと
UE4のレイトレで出来ること/出来ないことUE4のレイトレで出来ること/出来ないこと
UE4のレイトレで出来ること/出来ないこと
 
UE4のライティング解体新書~効果的なNPRのためにライティングの仕組みを理解しよう~
UE4のライティング解体新書~効果的なNPRのためにライティングの仕組みを理解しよう~UE4のライティング解体新書~効果的なNPRのためにライティングの仕組みを理解しよう~
UE4のライティング解体新書~効果的なNPRのためにライティングの仕組みを理解しよう~
 
60fpsアクションを実現する秘訣を伝授 基礎編
60fpsアクションを実現する秘訣を伝授 基礎編60fpsアクションを実現する秘訣を伝授 基礎編
60fpsアクションを実現する秘訣を伝授 基礎編
 

Semelhante a Forts and Fights Scaling Performance on Unreal Engine*

Как выбрать оптимальную серверную архитектуру для создания высокоэффективных ЦОД
Как выбрать оптимальную серверную архитектуру для создания высокоэффективных ЦОДКак выбрать оптимальную серверную архитектуру для создания высокоэффективных ЦОД
Как выбрать оптимальную серверную архитектуру для создания высокоэффективных ЦОД
Nick Turunov
 

Semelhante a Forts and Fights Scaling Performance on Unreal Engine* (20)

Accelerate Your Game Development on Android*
Accelerate Your Game Development on Android*Accelerate Your Game Development on Android*
Accelerate Your Game Development on Android*
 
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
 
Getting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsGetting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® Graphics
 
Optimization Deep Dive: Unreal Engine 4 on Intel
Optimization Deep Dive: Unreal Engine 4 on IntelOptimization Deep Dive: Unreal Engine 4 on Intel
Optimization Deep Dive: Unreal Engine 4 on Intel
 
Intel Core X-seires processors
Intel Core X-seires processorsIntel Core X-seires processors
Intel Core X-seires processors
 
Intel Technologies for High Performance Computing
Intel Technologies for High Performance ComputingIntel Technologies for High Performance Computing
Intel Technologies for High Performance Computing
 
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
Open Source Interactive CPU Preview Rendering with Pixar's Universal Scene De...
 
E5 Intel Xeon Processor E5 Family Making the Business Case
E5 Intel Xeon Processor E5 Family Making the Business Case E5 Intel Xeon Processor E5 Family Making the Business Case
E5 Intel Xeon Processor E5 Family Making the Business Case
 
Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developers
 
Python* Scalability in Production Environments
Python* Scalability in Production EnvironmentsPython* Scalability in Production Environments
Python* Scalability in Production Environments
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
4th gen intelcoreprocessor family
4th gen intelcoreprocessor family4th gen intelcoreprocessor family
4th gen intelcoreprocessor family
 
Как выбрать оптимальную серверную архитектуру для создания высокоэффективных ЦОД
Как выбрать оптимальную серверную архитектуру для создания высокоэффективных ЦОДКак выбрать оптимальную серверную архитектуру для создания высокоэффективных ЦОД
Как выбрать оптимальную серверную архитектуру для создания высокоэффективных ЦОД
 
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...
 
Software Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT PlatformsSoftware Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT Platforms
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 
The Architecture of Intel Processor Graphics: Gen 11
The Architecture of Intel Processor Graphics: Gen 11The Architecture of Intel Processor Graphics: Gen 11
The Architecture of Intel Processor Graphics: Gen 11
 
The Architecture of 11th Generation Intel® Processor Graphics
The Architecture of 11th Generation Intel® Processor GraphicsThe Architecture of 11th Generation Intel® Processor Graphics
The Architecture of 11th Generation Intel® Processor Graphics
 
Optimizing Direct X On Multi Core Architectures
Optimizing Direct X On Multi Core ArchitecturesOptimizing Direct X On Multi Core Architectures
Optimizing Direct X On Multi Core Architectures
 
Debug, Analyze and Optimize Games with Intel Tools
Debug, Analyze and Optimize Games with Intel Tools Debug, Analyze and Optimize Games with Intel Tools
Debug, Analyze and Optimize Games with Intel Tools
 

Mais de Intel® Software

Mais de Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 

Forts and Fights Scaling Performance on Unreal Engine*

  • 1. Jeff Rous – Senior Developer Relations Engineer, Intel Peter Knepley – Technical Lead, Epic Games Bob Tellez – Technical Lead, Epic Games
  • 2. Agenda Introductions Super sweet Fortnite* video Scalability in Unreal Engine* 4 Profiling tools 4.19 Goodness Wrap up and What’s Next 2
  • 4. 4
  • 5. 5 Target Hardware - Research 1. Gather data from as many sources as you can for GPUs and CPUs. 2. Create tables of benchmark scores to card/chip name. 3. Determine target benchmark scores that include a supported % of the population. 4. Make histograms of population by benchmark. 5. Distribute into buckets of roughly equal size.
  • 6. Target Hardware - Decisions • For each bucket, find a popular CPU and GPU that is near the weaker side of the bucket’s range. • Do the research YOURSELF. Existing data will likely be out of date since desktop hardware changes frequently. • Every platform and bucket you support is another configuration to maintain and test. 6
  • 7. 7 Shadows/Lighting • Static lighting was not possible due to the building and destruction features. • Dynamic lighting is expensive for low end machines, but can be awesome on high end. • We use simple forward shading for Save the World* mode, but not Battle Royale*. • High end machines look much better with DistanceFieldAO enabled, so we optimized it so it can be enabled on consoles as well.
  • 8. 8 Render Resolution • Resolution dramatically affects GPU performance. • During development we used discrete resolutions to make comparing performance easier. • This was very effective, but ultimately the end-user experience is better with a slider. • Render resolution does not affect UI.
  • 9. 9 Animation (URO)/Significance Manager • Update Rate Optimization (URO) reduces the tick frequency of animations. • In the engine, URO is purely distance based. In Fortnite*, there is a budget for characters. • The significance manager scores players and enemies to make sure more important characters animate at higher rates. • It is also used to score other things including particle systems and levels.
  • 10. 10 Material Quality • Adding material quality nodes to materials greatly improves GPU time on low end machines. • Artists must maintain them. • Adding a material quality node will triple the number of shaders it generates. • Try to reduce dependent texture fetching using this node.
  • 11. 11 HLOD/Distance Culling • Save the World* uses distance culling a lot due to the densely populated levels. • We use a shader to cause object to animate into view. This allowed us to be more aggressive on low end machines. • Set a range of object sizes and cull distances. • Battle Royale* required long distance visibility so we used HLODs to represent far away geometry.
  • 12. 12
  • 13. 13 “Am I CPU bound, or GPU bound?” It displays the overall frame time, CPU time taken by the game thread, the render thread and the GPU time Unreal Tournament* target on PC is 8 ms Fortnite* target on consoles is 16 ms CPU Profiling Stat Unit
  • 14. 14 Creates a stats file using UE4’s stat system with both native code timings and blueprints Can be useful to find general performance issues as well as one time hitches Opened with the Stats Viewer in UE4 Editor’s Session Frontend Easy to find ticking Objects that should not be ticking CPU Profiling Stat StartFile / Stat StopFile
  • 15. 15 Helps find spikes in CPU time that are harder to find with targeted tools Callstacks are printed to the game log The cost of running it is typically minor so it can be left on during internal playtests Used on Fortnite* to find synchronous loads CPU Profiling Stat Dumphitches
  • 16. 16 Used to find issues like unexpected disk I/O during gameplay Unreal Tournament* was calling LoadLibrary every frame and not finding the file Issues like that can account for large amount of the frame time on lower end systems CPU Profiling Windows Performance Recorder and Analyzer
  • 17. 17 Intel® VTune™ Amplifier Intel® Vtune™ Amplifier enables deep profiling and problem identification. Hotspots, locks, syncs, multithreading, even GPU data! With 4.19, new support for event based CPU sampling using itt_notify framework. Vtune™ is now free!
  • 18. 18 Triangle count display Unreal Tournament* has a triangle count budget around 5 million for low end DM-Underland* had a landscape mesh that was 7 million alone CPU Profiling Stat RHI
  • 19. 19 CPU Profiling LOD Colorization View Mode Identify meshes with no LODs Identify LODs with wrong transition points On Unreal Tournament*, helped find rocks in the distance that had no LODs
  • 20. 20 Our built-in tool for displaying the GPU time breakdown r.ProfileGPU.ShowUI can be used to suppress the popup window Use different values of r.SetRes and r.ScreenPercentage to determine if you are vertex or pixel bound UT switched to Simple Forward Shading for low end GPU Profiling ProfileGPU
  • 21. 21 Intel® Graphics Performance Analyzers Use ToggleDrawEvents and r.ShowMaterialDrawEvents commands Frame debugging / live mode Experiments!
  • 22. 22 Help track materials that may be overbudget by visualizing their cost Green is good White is bad DM-Underland* has coral foliage that is white hot Lowered draw distance and simplified shader GPU Profiling Shader Complexity View Mode
  • 23. 23 Memreport -full Generates a log file with a breakdown of memory usage Listtextures Generates a log file or a csv Keep a spreadsheet of textures each release to watch for usage changes Memory Profiling Common Tools
  • 24. 24 Look out for • Overly large assets • Content that does not belong Count lists time in map Triangle count per asset Unreal Tournament* modified the panel to show LOD count for static meshes Memory Profiling Primitive Stats Viewer
  • 25. 25 Look out for • Wrong group • Wrong LODBias • Uncompressed textures • Non-mipping textures • Bad dimensions Memory Profiling Texture Stats Viewer
  • 26. 26 Unreal Engine* 4.19 Goodness Worker threads scale with CPU. No more idle cores! Cloth throughput improved ~30%. Intel® VTune™ Amplifier Support – Gives deep insight into what the engine is doing at all times. Enables profiling of task scheduler that was previously opaque. 4.19 is available now! Upgrade to take advantage of these improvements!
  • 27. Call To Action Scalability is a question of quality. Make your game look as good as possible on as many machines as you can! Check out the docs and videos for all of our profiling tools! Check out the Unreal* demos in the Intel and Epic booths! 27
  • 28. 28 Links Intel Developer Zone (software.intel.com/gamedev/partners/unreal) Unreal Profiling Tools (docs.unrealengine.com/en-us/Engine/Performance) Fortnite (www.epicgames.com/fortnite/en-US/home) Unreal Tournament (www.epicgames.com/unrealtournament) Unreal 4.19 Optimizations (software.intel.com/en-us/articles/intel-software- engineers-assist-with-unreal-engine-419-optimizations) Unreal Engine 4 Optimization Guide (software.intel.com/en-us/articles/unreal- engine-4-optimization-tutorial-part-1)
  • 29. Legal Notices and Disclaimers No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com]. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks. 
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Intel, Core and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others © Intel Corporation.

Editor's Notes

  1. Hi everyone, thanks for coming. This is Forts and Fights: Scaling performance on Unreal Engine. Today we’re going to dig in on how Intel and Epic worked together to optimize Fortnite and Unreal Tournament. This talk is a culmination of 5 years of Intel / Epic collaboration with lots of optimization trips in between and the learnings that came from them. We’ll start off with a short video and then get into scalability, profiling tools, and a bit about the work that we’ve done on the 4.19 release. [Introductions] That brings me to the introductions. I’m Jeff Rous, a senior developer relations engineer at Intel. What my team does is work closely with folks like Epic here to optimize their games for the CPU and GPU. I’ve had the pleasure of working on Paragon, Unreal Tournament and Fortnite over the years. We also do quite a bit of engine optimization work which I’ll talk about towards the end. I’ve been with Intel for 14 years now. Hi, my name is Peter Knepley, I'm Technical Lead on Fortnite Battle Royale and previous to that I was Technical Lead on Unreal Tournament. I've worked at Epic Games for 8 years as a gameplay programmer and I've spent a lot of time profiling our games. I've shipped Gears 3, Gears Judgment, Unreal Tournament, Paragon, and Fortnite with Epic. Hey everyone! I'm Bob Tellez, and I've been working on Fortnite as a Technical Lead for about four and a half years. Previously, I worked as an engine programmer on Unreal Engine for about two years. I've spent a lot of time profiling, tweaking, and optimizing various aspects of Fortnite and I'd love to tell you all about it! In this presentation I'm going to be sharing how we chose the target hardware for Fortnite, and what features we configured in order to both run well and look great on these machines.
  2. [Target Hardware - Research] The first thing you need to know when optimizing your game is what machines will be running it. It may be tempting to just make some gut decisions about what platforms, GPUs and CPUs are popular, but I encourage you to do some research to make the best possible decisions to make your game look as good as it can for as many people as possible. To do this you should first try to find as many sources of data as you can about the hardware that your *potential* gamers are using. The Steam Hardware & Software Survey is a great public source of information but you may have private information as well. For example, when evaluating Fortnite, we gathered data about those who have opened the Epic Games Launcher to play other Epic games. We also got some data from Tencent about users in different regions of the world. What you are looking for in this data is a list of unique video cards and CPUs that are actually used by a non-trivial number of people. Do the best you can. I have found that there are cards and chips that misreport their names or have inconsistent names, but they should not be very common and I just trimmed the dirty data. You should then create a table of benchmark scores for each of these GPUs and CPUs. You can find these benchmarks on the website of your choice. I used videocardbenchmark.net. Sort this table by benchmark score so you can now see how low you need to go to hit a target percentage of the population. Your target percentage might depend on the size of your project/company/budget. It can be challenging to make a modern game run on very old hardware, so make sure you are up to the task! To make something really EPIC like Fortnite, you'll probably want to support at least 90% of potential users. Trim all hardware that is below the benchmark scores for your target percentage. 
Now you'll need to visualize the data to divide it up appropriately into a few discrete buckets, so you'll want to make a histogram like the one shown on this slide. Play around with the histogram bin size until you have a good feeling for the population distribution. At this point you have enough information to determine your target spec machines! Try to divide the population MOSTLY equally, but feel free to move the division lines around a little to put a popular set of cards with similar strength near the bottom of a bucket. The number of buckets to choose very much depends on how much work you want to put into supporting settings configurations. Having more buckets allows more machines to look awesome, but it can be quite costly for many people on your team to support tons of configurations. For Fortnite, we decided to have four: Low, Medium, High, and EPIC.
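The trimming and bucketing steps described in this note can be sketched in a few lines. This is a minimal illustration, not the process Epic actually used: the function names, the `(benchmark_score, user_count)` input shape, and the rule for placing an entry into a bucket are all assumptions made for the example.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (benchmark_score, user_count), a hypothetical input shape


def trim_to_coverage(population: List[Entry], coverage_percent: int) -> List[Entry]:
    """Keep the strongest hardware until coverage_percent of users is included."""
    total = sum(count for _, count in population)
    kept, covered = [], 0
    # Walk from the strongest hardware down so the weakest tail is what gets trimmed.
    for score, count in sorted(population, reverse=True):
        if covered * 100 >= coverage_percent * total:
            break
        kept.append((score, count))
        covered += count
    return sorted(kept)


def split_into_buckets(population: List[Entry], bucket_count: int) -> List[List[int]]:
    """Split score-sorted hardware into buckets of roughly equal user count."""
    total = sum(count for _, count in population)
    buckets: List[List[int]] = [[] for _ in range(bucket_count)]
    seen = 0
    for score, count in sorted(population):
        # Place each entry in the bucket where its first user falls.
        index = min(seen * bucket_count // total, bucket_count - 1)
        buckets[index].append(score)
        seen += count
    return buckets
```

In practice you would still move the division lines by hand, as the note suggests, so that a popular group of similar cards sits near the bottom of a bucket.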
  3. [Target Hardware - Decisions] At this point in the process, you should now have a good idea of what people are using. It's time to choose some hardware that represents each bucket. You'll likely need to purchase this hardware, and sometimes it's hard to get a hold of older chips and cards, which is why the hardware you choose to represent each bucket will need to have been somewhat popular when it was released. While working on Fortnite, I found that old hardware tends to break so there is a good chance you'll need to buy a replacement. If your choice was popular enough, you have a good shot at being able to find an exact replacement. Otherwise, you will need to change your choice which will affect your ability to compare performance between builds. You may have noticed on my previous slide that I did not show any numbers in the histogram. This is because I would like to encourage you to do this research yourself! You may find some canned research available online, but desktop hardware changes very frequently so I suggest you go through this process with as fresh data as you can, and re-evaluate from time to time. Depending on the scope of your project you may only need to do this once, or if you are a live game like Fortnite you may need to do this every 6 months or so. Remember that changing your target hardware for any reason is pretty disruptive since you will need to change many settings, so I would recommend sticking with your decisions for a good while before re-evaluating. Speaking of maintainability, keep in mind the overall number of combinations of settings that you will need to support. This presentation, so far, has largely been about desktop hardware. If you plan on supporting consoles or mobile platforms, know that each of them will also have one or more buckets. All buckets will need to be tested so adding platforms greatly increases the amount of work that needs to be done. 
Luckily for Fortnite, the settings for "High" desktop were somewhat close to PS4 and XB1, so those consoles use these settings with some tweaks. Xbox One X and PS4 Pro use "Epic" desktop settings with tweaks. Keeping the settings similar made it easier to keep track of the quality of consoles even while testing on desktop and vice versa.
  4. [Shadows/Lighting] Once you know your target hardware, it's time to start optimizing and configuring settings. There are a great many settings to tweak, but in this presentation I will only be talking about a few of the settings that have the highest impact on framerate. Let's start with shadows and lighting. One of the best things you can do is use static lighting in your game where possible. Unfortunately, in Fortnite we could not do this for pretty much anything due to the player's ability to create or destroy nearly all of the environment combined with the continuous day/night cycle. We chose to disable static lighting entirely and go fully dynamic! Dynamic lighting can be somewhat expensive for weak GPUs, but it looks and works great in our game so we looked into some options. Initially, we turned on simple forward shading for low end machines, which was a rendering mode that allowed us to have very simple lighting but skip many of the expensive parts of our deferred renderer. This is very effective for framerate and to get things working, but the visual quality was a little lacking. In Battle Royale we optimized other parts of our GPU usage and turned this feature back off, but in Save the World it is still on. In the future we intend to have it off in both modes once we find some more GPU time in Save the World. One of the biggest quality improvements to Fortnite was when we started using distance field ambient occlusion. This was initially very expensive and only enabled when using Epic settings, but we optimized it and added a lower resolution mode to allow it to be used in High settings and consoles as well. Depending on your game, you will probably want this setting enabled on the lowest bucket you can to make it look awesome. Try to keep actors that move a lot from affecting distance fields by setting a flag on them that disables their contribution, and let mostly static parts of the environment be affected.
  5. [Render Resolution] Render resolution dramatically affects GPU performance in Fortnite. We use a great many postprocess render targets, and some of them have somewhat pricey pixel shaders. In the early stages of the game's development, we just had a slider to control your render resolution as a percentage of your monitor's resolution. This was very bad when comparing performance between machines because your monitor's supported resolution would greatly affect your framerate and people rarely report their monitor resolution when listing hardware specs. To combat this, we changed over to having discrete buttons in the setting screen to set your resolution to 480p, 700p, 1080p, and 1440p, regardless of your monitor size. We quickly learned that the "Epic" 1440p size was not enough for folks who had very fast GPUs so, trading some practical measurement for user experience, we changed "Epic" to just be the full resolution of your monitor. While this made comparing Epic machines hard again, it was an acceptable compromise for a long time. Eventually we brought back a slider where you would choose the size instead of percentage because once our game was made available to far more people, the end-user convenience outweighed the practical benefit. On consoles we recently started using dynamic scaling so the game can look as good as it can given the other things happening in the scene. One last thing to keep in mind about render resolution is that it does not affect the UI for obvious reasons, so while this is still a powerful setting, if you happen to have a UI heavy game it may do less for you than you think.
  6. [Animation (URO)/Significance Manager] So far I have been talking about scaling based on GPU performance, but let's not neglect the CPU! One of the large CPU costs in Fortnite is updating animations. The Unreal Engine provides a way to reduce the frequency of animation updates called Update Rate Optimization, or URO. By default, URO is purely based on distance which works pretty well in general cases. In Fortnite, however, we changed this behavior to use a budget for characters that is based on a score we call significance which is calculated by a significance manager. The budget only allows a certain number of characters to update at a full rate while others have reduced rates, which lets cases that involve many characters on the screen still maintain your target framerate even if they are all close to you. This budget scales to be stricter on weaker hardware. In Fortnite, a character's score is based not only on distance to the camera, but also on screen space size. This gracefully handles cases like using a sniper scope to zoom in on a player. You can use the significance manager to score more things than just characters. The significance manager works nicely with particle effects for cases where many are used at once. It is also used in Fortnite to handle level streaming.
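The budget idea in this note can be illustrated with a small sketch. It is not the engine's significance manager API: the `Character` type, the scoring formula, and the fixed reduced rate are hypothetical choices that only show how a score combining distance and screen size can feed a fixed budget of full-rate updates.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Character:
    name: str
    distance: float         # distance to the camera, arbitrary units
    screen_fraction: float  # fraction of the screen covered, 0..1


def significance(c: Character) -> float:
    # Hypothetical score: closer and larger on screen means more significant.
    return c.screen_fraction / (1.0 + c.distance)


def assign_update_rates(characters: List[Character],
                        full_rate_budget: int,
                        reduced_every_n_frames: int = 4) -> Dict[str, int]:
    """Return {name: frames between animation updates} under a fixed budget."""
    ranked = sorted(characters, key=significance, reverse=True)
    rates = {}
    for rank, c in enumerate(ranked):
        # Only the top-scoring characters animate every frame.
        rates[c.name] = 1 if rank < full_rate_budget else reduced_every_n_frames
    return rates
```

A distant player seen through a scope covers a large screen fraction, so with this kind of score it can outrank a nearer character that is barely visible. A stricter budget on weaker hardware is just a smaller `full_rate_budget`.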
  7. [Material Quality] If you have a lot of complex materials, you should consider adding material quality nodes to them. Making simpler versions of materials can greatly improve GPU time on low end machines. You can use the "Shader Complexity" viewmode to try to find your worst offenders and focus on simplifying them, and Pete will be talking about that viewmode later in the presentation. There are a couple of downsides to using quality nodes. Those who work on the materials will have to maintain them, and this can be hard if you have very many. Also, every time a quality node shows up in a material it triples the number of shaders the material makes. This normally is not a big deal but keep it in mind and don't go crazy with them. When you use quality nodes, you'll generally want to focus on reducing the number of instructions that are generated, but also on cutting generally expensive operations like dependent texture fetching, which we found several cases of in the Fortnite terrain materials.
  8. [HLOD/Distance Culling] The last big performance settings I'm going to be talking about are Hierarchical Level of Detail (HLOD) and distance culling. In Save the World we had somewhat densely populated levels with many small actors that are all independently destroyed and this put a strain on occlusion culling. We combated this by bringing in distance culling quite a bit on weaker machines, but as expected it made it so the user could see when objects disappeared at a distance. Since Fortnite is a fun and "bouncy" game, instead of trying to hide the culling we added some shader logic to cause actors to animate in and out when culling happened. This is much less jarring and allowed us to be very aggressive. To make sure that the general shape of the level far away from you remained mostly intact, we made the cull distance of all actors in our levels based on the size of the object so that trees and buildings would be the last to cull. In Battle Royale, combat and skydiving are done over large distances and we could not be very aggressive with cull distance at all. Instead we use HLOD to represent large portions of the map, which involves a process that automatically generates a mesh representing the actors in a level. These levels are completely unloaded from memory and streamed in as the camera comes close to them. Once streamed in, the HLOD mesh is removed, revealing the loaded level.
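Size-based cull distances amount to a lookup from object size to a distance, scaled down on weaker buckets. The sketch below assumes a made-up set of tiers and treats a distance of `0.0` as "never cull"; neither the numbers nor the function name come from the talk.

```python
import bisect

# Hypothetical tiers: (maximum object size, cull distance), both in engine units.
SIZE_TIERS = [(100.0, 2000.0), (500.0, 6000.0), (2000.0, 15000.0)]
NEVER_CULL = 0.0  # assumed convention for "no distance culling"


def cull_distance(object_size: float, tiers=SIZE_TIERS, scale: float = 1.0) -> float:
    """Pick a cull distance from the first tier whose maximum size fits the object."""
    max_sizes = [max_size for max_size, _ in tiers]
    index = bisect.bisect_left(max_sizes, object_size)
    if index == len(tiers):
        # Larger than every tier (trees, buildings): keep the level's silhouette.
        return NEVER_CULL
    return tiers[index][1] * scale
```

Passing a `scale` below 1.0 models the more aggressive culling used on low end machines while leaving the largest objects untouched.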
  9. [Overview of Profiling] This is a screenshot of the DM-Underland map that shipped with the latest Unreal Tournament game. The goal was to get this map running at 120 fps on discrete GPU computers and at least 30 fps on laptops with HD-4000. Any given view could be several million polys. I'm here today to talk about how Intel and Epic came together to optimize this map. Hopefully you can apply any or all of these techniques on your UE4 game to make sure that it not only uses your desired amount of CPU and GPU, but also fits into memory on your platform of choice.
  10. [CPU Profiling] [stat unit] I'm going to start with CPU profiling. The first step in our profiling journey is stat unit. It displays the overall frame time which is then broken down by CPU time spent in the game and render threads and the GPU time. This helps answer the age old question "Am I CPU bound, or GPU bound?" We need to know the answer to that question to know where to start optimizing. The threads are fairly parallel so we only have to worry about the one with the longest time currently shown in stat unit. After every big change, I come back to stat unit to make sure that the frame time is going down and to know where to optimize next. On Unreal Tournament, which was a PC only title, the high end target was 8 ms of frame time to hit 120 frames per second and for HD-4000 class hardware the target was more lenient at 33 ms of frame time or 30 frames per second. On Fortnite, we're targeting 16 ms of frame time on consoles to hit 60 frames per second. For DM-Underland, the game thread time was over the render thread time, so I’m going to start there.
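The frame time targets quoted in this note are simply 1000 ms divided by the target frame rate, and the "which one am I bound by" question reduces to picking the longest of the three timings. A minimal sketch, with hypothetical function names and sample timings:

```python
def frame_budget_ms(target_fps: float) -> float:
    """Frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / target_fps


def bottleneck(game_ms: float, render_ms: float, gpu_ms: float) -> str:
    """Name the longest of the three timings that stat unit reports."""
    times = {"game thread": game_ms, "render thread": render_ms, "GPU": gpu_ms}
    return max(times, key=times.get)


# 120 fps, 60 fps and 30 fps give roughly 8.3 ms, 16.7 ms and 33.3 ms.
budgets = {fps: round(frame_budget_ms(fps), 1) for fps in (120, 60, 30)}
```

The talk rounds these budgets down to 8, 16 and 33 ms.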
  11. [Stat StartFile / Stat StopFile] To measure CPU usage, I typically start with our stats file captures. The stat startfile command will tell UE4’s stat system to start grabbing timings for both native code and blueprints. This can be useful to inspect general performance issues as well as one time hitches. The capture files can be really large when measuring over long periods of time so it's not always the right tool to find random hitches though. Once we have our desired capture, the stat stopfile command will write out the stats capture. Then using the Stats Viewer in UE4 Editor’s Session Frontend, we open the capture. You will see that we get measurements of the game thread, render thread and other worker threads. UFUNCTIONs are automatically marked up in the trace, but it's possible to add manual tracing to non UFUNCTIONs. We use it extensively to find any blueprints that are ticking that should not be ticking. In Underland, some environmental items made by designers were set to tick and they showed up in the game thread of the stats capture. We set those blueprints to never tick and then rerunning the profile showed a decrease in our CPU time. This system has been quite valuable on Battle Royale to optimize our dedicated server performance. One of the easy things to see in this profiler is when components belonging to pawns are updated. We had a lot of cosmetic only components like trail particles that were updating their positions on dedicated servers even though they were never rendered. At 100 players for Battle Royale, we need all that time back. We have code that detaches them at runtime on dedicated servers and now they are no longer showing up on the profile for dedicated servers.
  12. [stat dumphitches] When we're looking for CPU hitches without an exact repro, stat dumphitches is my go-to tool. The cost of running it is typically minor so we will leave it on during internal playtests if we're looking for a hitch. When a frame goes long in a playtest the callstacks are printed to the game log so a programmer can look afterwards without disturbing the rest of the playtest. I recommend launching the game with -noverifygc to cut down on garbage collection (or GC) times showing up in your dump hitch logs. GC verification won't be on during shipping so seeing it in the hitch log is not very helpful. Stat dumphitches has been very valuable on Fortnite when trying to find synchronous loads of assets that should've been preloaded or async loaded. I also used it extensively on Gears of War and Unreal Tournament. In our DM-Underland investigation, it showed hitching on the low end laptop in a tick function of a plugin that I had written.
  13. [Windows Performance Recorder and Analyzer] To further investigate the hitching, we used Microsoft Windows Performance Recorder and Windows Performance Analyzer. WPR and WPA have been helpful for finding many types of issues in our games when it comes to interacting with the rest of the computer. In the case of Unreal Tournament, we used it to find the source of some unwanted disk I/O that was really killing frame rate on a low end laptop. We had a plugin for lighting up keyboards that wanted to call LoadLibrary for a third party dll. For many machines, especially a laptop without an external keyboard, the dll doesn’t exist. I wrote some code that would retry every frame to load that dll and that caused a lot of frame rate drops on that laptop. On my high end dev machine, I never even noticed the performance hit. We used Windows Performance Recorder to find out that I was trying and failing to load that specific dll every frame. Changing that code to only try once removed the hitching and it no longer showed up in stat dumphitches or WPR.
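The fix described here, trying the load once and remembering the failure, is a small pattern worth spelling out. The sketch below is a language-neutral illustration in Python, not the plugin's actual code; `OptionalLibrary` and its `loader` callable are hypothetical stand-ins for the code that wraps the platform's library loading call.

```python
class OptionalLibrary:
    """Attempt to load an optional dependency once and remember the outcome."""

    def __init__(self, loader):
        self._loader = loader    # callable that returns a handle or raises OSError
        self._attempted = False
        self._handle = None
        self.load_attempts = 0   # exposed so the behaviour is easy to verify

    def get(self):
        # Safe to call every frame: the expensive load happens at most once.
        if not self._attempted:
            self._attempted = True
            self.load_attempts += 1
            try:
                self._handle = self._loader()
            except OSError:
                self._handle = None  # missing library: feature stays disabled
        return self._handle
```

Caching the negative result is the important part: the per-frame cost only appeared on machines where the library was missing, which is why it went unnoticed on the development machine.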
  14. [Intel VTune] VTune is Intel's CPU profiling tool. It's a good next step after Unreal’s internal profiling tools have identified the problem functions. It helps to determine thread bottlenecks, sync points and the way work is given to TaskGraph threads on the CPU. For 4.19, Intel worked closely with Epic engineers to implement support for ITT markers in Unreal Engine 4. This added much needed contextual data to the VTune graphical visualizations and was extremely beneficial in profiling the engine’s thread scheduler for some of our other 4.19 work which I’ll talk about later. We also used VTune on Fortnite to find that we’re somewhat render thread bound. We’re addressing this in the game and I’ll talk more about it later. Oh by the way, VTune is now free if you hadn’t heard!
  15. [Render Thread] [Stat RHI] Once the game thread performance was under control, I checked stat unit again and now the render thread time was over the game thread time. Stat RHI is my first stop when profiling the render thread. It has a lot of good info, but most importantly for me it has the number of triangles drawn. This can help narrow down which portions of the maps are over the triangle budget. On Unreal Tournament, I was very particular about the poly count because on our target HD 4000, we had to keep the polygon budget around 5 million to get a good framerate. When we started profiling DM-Underland to run on HD-4000 machines, we noticed that the polycount with no characters was over 7 million and sometimes up to 10 million. Our first hunch was that the landscape might have too many polys. We found that the tessellation of the landscape piece was set very high and it was using 5 million triangles on its own. We got the level designer to change the section size from 255x255 to 63x63. This dropped the landscape to well below 1 million triangles. The level designer had to repaint a few bits of the landscape to make up for the resolution change.
  16. [LOD Colorization view mode] Even after all those savings on landscape, we still had too many polys. The next tool that I used was the level of detail (or LOD) colorization view. When that viewmode is enabled, instead of being textured and lit, meshes at LOD 0 will be gray, LOD 1 will be green, LOD 2 red, LOD 3 blue. If you look at the screenshot on the left, the background above the play area is quite gray. The rocks there are also used around the map 47 times. Their top LOD is 9272 triangles so we're at nearly half a million triangles in just that rock mesh. Luckily, the editor has automatic LOD generation built in so I was able to create LODs all the way down to 184 polys without any artist help. If you look at the screenshot on the right, now the rocks are red showing that they have LODs and are currently rendering LOD 3. Using the same technique looking around the map, I was able to identify a bunch of other rock meshes that needed LOD creation and I was able to get the poly count within the five million poly budget.
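The rock numbers in this note are easy to check. The sketch below multiplies the instance count by the per-instance triangle count at the top LOD and at the lowest generated LOD; the assumption that every instance renders at the same LOD is a simplification made for the example.

```python
def total_triangles(instance_count: int, triangles_per_instance: int) -> int:
    """Triangles drawn if every instance renders at the same LOD."""
    return instance_count * triangles_per_instance


# 47 rock instances with no LODs always render the 9272-triangle mesh.
lod0_total = total_triangles(47, 9272)        # 435784, nearly half a million
# Best case after LOD generation: every instance at the 184-triangle LOD.
lowest_lod_total = total_triangles(47, 184)   # 8648
```

In a real view the instances would be spread across several LODs, so the actual count lands somewhere between these two figures.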
  17. [GPU Profiling] [ProfileGPU] Once we were good on the CPU side, we moved on to the GPU. I like to use ProfileGPU which is our built-in tool for displaying the GPU time breakdown. r.ProfileGPU.ShowUI can be used to suppress the popup window and only print to the log, but typically I use the GUI version. I used different resolutions and screen percentages in the same scene with multiple runs of ProfileGPU to determine if we're able to hit frame rate targets with low pixel counts. The frame rate did increase a bit with resolution decrease, so we were pixel bound. But even with low resolution and low screen percentage, we were still having trouble making the desired framerate with the deferred renderer. We ended up using our Simple Forward Shading when a player chooses low settings. The GPU profiles were much more favorable after switching renderers, but it comes at the cost of visual fidelity. It was fine for Unreal Tournament, but for Fortnite Battle Royale, we decided that it was going to provide too much advantage and changed the look of the game too much. The perf gains alone were not worth it so we still use the deferred renderer on FN:BR for low settings.
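One way to read the resolution experiment is to compare how much GPU time dropped against how many pixels were removed. The sketch below is a rough heuristic rather than anything built into ProfileGPU: the function names, the 0.5 threshold, and the sample timings are assumptions chosen for illustration.

```python
def pixel_scaling_ratio(gpu_ms_full: float, gpu_ms_reduced: float,
                        pixels_full: int, pixels_reduced: int) -> float:
    """Fraction of the pixel reduction that shows up as GPU time reduction."""
    time_drop = (gpu_ms_full - gpu_ms_reduced) / gpu_ms_full
    pixel_drop = (pixels_full - pixels_reduced) / pixels_full
    return time_drop / pixel_drop


def likely_bound(ratio: float, threshold: float = 0.5) -> str:
    # Hypothetical cutoff: GPU time that tracks pixel count suggests pixel work.
    return "pixel bound" if ratio >= threshold else "vertex or geometry bound"


# Same scene captured at 1920x1080 and 1280x720 (sample timings, not measured).
ratio = pixel_scaling_ratio(20.0, 11.0, 1920 * 1080, 1280 * 720)
```

If GPU time barely moves when the pixel count drops by more than half, the cost is more likely in geometry, which points back to triangle counts and LODs.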
  18. Intel GPA is a tool that helps developers identify where their apps are slow on Intel graphics. It contains both a live mode and a frame debugger. These help narrow down whether you’re bottlenecked in shadows, geometry, post processing etc. ToggleDrawEvents is a console command in UE4 that turns on annotations to help identify where in the scene you are. r.ShowMaterialDrawEvents will mark up each draw call with the material name so you can tie it back to your blueprints. Both of these are super helpful for identifying expensive parts of the scene like landscape in both Fortnite and Unreal Tournament, which we’ll talk about a bit later. Here at Intel, GPA is our bread and butter tool to profile games on Intel Graphics and identify where targeted optimizations can be made. It also works with other hardware, although you won’t get the same depth of hardware data that you will on Intel. We used it to profile both Unreal Tournament and Fortnite and it was instrumental in identifying things like the landscape tessellation issue on the Underland map Pete mentioned before.
  19. [Shader Complexity View Mode] Even with simple forward shading, some areas of the map still had framerate issues due to overly complex materials. To find those complex materials, I used the shader complexity viewmode. This view mode shows good materials in green and bad materials in white. On the DM-Underland map in UT, we used it to identify a couple of hot spots. The underwater area in general was expensive because of the overdraw on the transparent water, but there were some areas that were showing up white hot. It turns out that there's some very expensive coral foliage at the bottom of the lake. You can barely see it on high end machines from the normal play area and almost never see it on low end. It's also an area that doesn't really see that much game play so toning it down wouldn't affect the overall scene. We ended up lowering the draw distance on that foliage and simplifying the shader on low detail to get back some frame time and get back into budget.
  20. [Memory Profiling Tools] Once we were done with CPU and GPU optimizations, we moved on to memory optimizations. Some platforms like consoles have hard memory caps: they will crash when you use too much memory. Others like low end PC have soft memory caps: they fall back to virtual memory once you hit the physical RAM limit. Hitting the soft memory cap can cause compressed memory or virtual memory paging which will kill your performance. I use the Memreport -full console command to get a list of everything in memory. It generates a text file that contains information about all the static meshes, skeletal meshes, sounds and textures that are loaded. Listtextures is included in the memreport, but I typically will have QA run it on its own regularly once I’ve already done a pass on other memory. -csv can be used to make it easier to import into a spreadsheet which makes keeping track of memory trends in textures easy. I’ll do passes through the spreadsheet to look for textures in the wrong group or using too much memory due to forced sizes. We keep spreadsheets of texture usage for every release so that any major swing in texture memory can be investigated without a ton of effort.
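Comparing texture lists between releases can be automated once the data is in CSV form. The sketch below assumes a simplified CSV with `Name` and `SizeKB` columns; those column names are hypothetical and the real Listtextures output will need its own mapping.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_sizes(csv_text: str, name_col: str = "Name",
               size_col: str = "SizeKB") -> Dict[str, int]:
    """Read {texture name: size in KB} from CSV text with assumed column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[name_col]: int(row[size_col]) for row in reader}


def texture_growth(previous: Dict[str, int], current: Dict[str, int],
                   min_growth_kb: int = 1024) -> List[Tuple[str, int]]:
    """List (name, growth in KB) for new or grown textures, largest first."""
    growth = []
    for name, size in current.items():
        # A texture missing from the previous release counts as all growth.
        delta = size - previous.get(name, 0)
        if delta >= min_growth_kb:
            growth.append((name, delta))
    return sorted(growth, key=lambda item: item[1], reverse=True)
```

Running this on each release's export gives a short list of textures to investigate instead of a full spreadsheet to read by eye.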
  21. [Primitive Stats] The editor's primitive statistics panel is another tool that I use when trying to optimize memory. I can sort by size in memory and see if anything is an outlier. It's also a good place to look for assets that don't belong in the current scene but are getting loaded anyway. On Fortnite Battle Royale, we use it to watch for any assets from Save the World that might be getting loaded accidentally. The primitive stats also list triangle count, which makes the panel useful for trying to reduce the number of triangles in the scene. In the case of the Landscape in DM-Underland, which used to be over 5 million triangles, you can see that it's now only 127 thousand triangles. The count stat can help you decide if you have too many unique hero meshes and could save some memory by reusing existing meshes in place of some of them. On Unreal Tournament, I modified the panel to show LOD count to help deal with the issues we talked about before with rocks that had no LODs.
  22. [Texture Stats] The statistics panel also has a mode that shows textures. It's similar to the view that memreport provides but can be refreshed in real time. In UE4, the texture pool saves us from having to worry about being pushed over memory limits, but it's in our best interest to make sure the texture pool is being used optimally. The tighter we can make our texture pool, the more space we have for other things. I like to use the texture statistics panel to verify that textures are all in the correct group, are properly mipping, have power-of-2 dimensions and have the right LODBias. On Unreal Tournament and Gears of War, we would routinely have cinematic sized textures showing up during gameplay, so we needed to keep a close watch on this list. Another common mixup is normal maps ending up in World or Character instead of WorldNormalMap or CharacterNormalMap.
  [Summary] Using the techniques I have described, we achieved our goal of 30 frames per second on an HD-4000 in DM-Underland. We've also applied them to get to 60 frames per second on Fortnite console builds. Our next optimization target is getting Fortnite Battle Royale dedicated servers up to a steady 20 Hz and then hopefully 30 Hz. We have a lot of optimizations coming in Unreal Engine 4.20, but many of these optimizations are already in 4.19.
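Two of the checks mentioned in the Texture Stats notes, power-of-2 dimensions and normal maps sitting in the wrong group, are mechanical enough to automate. This is a rough sketch rather than anything from the engine: the function names, the record layout and the sample textures are all hypothetical, and whether a texture is a normal map is assumed to be known already (for example from the asset's compression settings).

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives and everything else."""
    return n > 0 and (n & (n - 1)) == 0


# Group names taken from the notes above: where a normal map should live
# if it is found in the plain World or Character group.
NORMAL_GROUP_FOR = {
    "World": "WorldNormalMap",
    "Character": "CharacterNormalMap",
}


def audit_texture(width, height, group, is_normal_map):
    """Return a list of human-readable issues for one texture (empty if clean)."""
    issues = []
    if not (is_power_of_two(width) and is_power_of_two(height)):
        issues.append("non-power-of-2 dimensions")
    if is_normal_map and group in NORMAL_GROUP_FOR:
        issues.append(f"normal map in {group}, expected {NORMAL_GROUP_FOR[group]}")
    return issues


if __name__ == "__main__":
    # Made-up records: (name, width, height, group, is_normal_map)
    textures = [
        ("T_Rock_D", 2048, 2048, "World", False),
        ("T_Rock_N", 1024, 1024, "World", True),
        ("T_Banner", 1000, 512, "World", False),
    ]
    for name, width, height, group, is_normal in textures:
        for issue in audit_texture(width, height, group, is_normal):
            print(f"{name}: {issue}")
```

Mipping and LODBias checks are left out because they depend on project-specific expectations for each group.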
  23. Going back to Bob's section about having performance buckets, we often have high end CPUs where a lot of the time is idle. For 4.19, we added some great stuff for developers to take advantage of. Prior to 4.19, Unreal Engine 4 did not create enough worker threads to fully utilize a CPU beyond 6 cores. This has been fixed: the Task Graph system now detects the number of cores on a CPU and scales the number of available worker threads accordingly. This lets developers take full advantage of high core count CPUs, creating more visual realism through systems such as cloth physics, environment destruction, CPU based particles and advanced 3D audio. The cloth system allows for dynamic simulation of meshes that respond to the player, wind or other environmental factors. Typical cloth workloads include player capes or flags. Cloth is simulated every frame, even if the player is not looking at it, because the simulation results determine if it shows up in the player's view. Cloth performance improved by about 30% in 4.19. VTune is an important tool for finding thread bottlenecks and sync points and for judging the effectiveness of a thread scheduler. We worked closely with Epic engineers to implement support for ITT markers in Unreal Engine 4, which adds much needed contextual data to the VTune graphical visualizations. The picture on the right was taken in a test level with an absurd amount of cloth rendering, run on an i7-6950X Extreme Edition CPU with 10 cores and 20 threads. Now you can make use of all of that CPU power in your games too! Looking forward, Fortnite* is focusing on consistent framerate on console and dedicated server. The RHI thread in DX11 is being enabled for extra headroom. Take advantage of all system resources in your games. The CPU is often overlooked but can add some great eye candy if extra cycles are available, especially with 4.19.
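The idea of sizing a worker pool from the detected core count, rather than from a fixed number, can be illustrated outside the engine. This is a sketch of the general technique only, not the Task Graph implementation: the `reserved` and `cap` parameters and their values are hypothetical, and `simulate_cloth_piece` is a stand-in for real per-object work. Python threads are used here just to show the pool sizing, not to demonstrate parallel speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(logical_cores, reserved=2, cap=None):
    """Pick a worker thread count from the number of logical cores.

    reserved: cores left for dedicated threads (hypothetical value).
    cap: optional fixed upper limit, modelling a pool that stops scaling.
    """
    count = max(1, logical_cores - reserved)
    if cap is not None:
        count = min(count, cap)
    return count


def simulate_cloth_piece(piece_id):
    """Stand-in for one independent unit of simulation work."""
    return piece_id * piece_id


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() may return None
    workers = worker_count(cores)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(simulate_cloth_piece, range(8)))
    print(f"{cores} logical cores -> {workers} workers, results: {results}")
```

On a 20 thread CPU, `worker_count(20)` yields 18 workers with these illustrative defaults, while a fixed cap such as `worker_count(20, cap=4)` would leave most of the machine idle, which is the situation the 4.19 change is described as addressing.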