How shit works:
the CPU
Tomer Gabel
BuildStuff 2016 Lithuania
Image: Telecarlos (CC BY-SA 3.0)
Full Disclosure
Bullshit ahead!
• I’m not an expert
• Explanations may be:
– Simplified
– Inaccurate
– Wrong :-)
• We’ll barely scratch the surface
Image: Public Domain
Are you ready for…
A CONUNDRUM?
Image: Louis Reed (CC BY-SA 4.0)
Setting the Stage
// Generate a bunch of bytes
byte[] data = new byte[32768];
new Random().nextBytes(data);
Arrays.sort(data); // the variable: run with and without this line
// Sum non-negative elements
long sum = 0;
for (int i = 0; i < data.length; i++)
    if (data[i] >= 0)
        sum += data[i];
1. Which is faster: with or without the sort?
2. By how much?
3. And crucially… why?!
Surprise, Terror and Ruthless Efficiency
# Run complete. Total time: 00:00:32
Benchmark      Mode  Cnt    Score    Error  Units
Baseline.sum   avgt    6  115.666 ±  3.137  us/op
Presorted.sum  avgt    6   13.741 ±  0.524  us/op
* Ignoring setup cost
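To poke at this yourself, here is a minimal sketch of the two variants side by side. The class name, the fixed seed and the crude System.nanoTime timing are assumptions made for illustration; the numbers above came from a proper benchmark harness, which this does not replace. What it does show is that both variants compute the same sum, so any difference is purely in how fast the loop runs.

```java
import java.util.Arrays;
import java.util.Random;

class SumConundrum {
    // Sums the non-negative elements, exactly as the loop on the slide does.
    static long sumNonNegative(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++)
            if (data[i] >= 0)
                sum += data[i];
        return sum;
    }

    public static void main(String[] args) {
        byte[] baseline = new byte[32768];
        new Random(42).nextBytes(baseline);   // fixed seed: an assumption, for repeatability
        byte[] presorted = baseline.clone();
        Arrays.sort(presorted);               // setup cost, kept out of the timed region

        // Crude one-shot timing; expect noise from JIT warm-up.
        for (byte[] data : new byte[][] { baseline, presorted }) {
            long start = System.nanoTime();
            long sum = sumNonNegative(data);
            long elapsed = System.nanoTime() - start;
            System.out.println("sum=" + sum + " ns=" + elapsed);
        }
    }
}
```

Run it a few times in a loop before trusting any single measurement.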
CPUS ARE
COMPLEX
BEASTS.
Image: Pauli Rautakorpi (CC BY 3.0)
It Is Known
• Your high-level code…
long sum = 0;
for (i = 0; i < length; i++)
    if (data[i] >= 0)
        sum += data[i];
• Gets compiled down to…
movsx eax,BYTE PTR [rax+rdx*1+0x10]
cmp eax,0x0
movabs rdx,0x11f3a9f60
movabs rcx,0x128
jl 0x000000010679e077
movabs rcx,0x138
mov r8,QWORD PTR [rdx+rcx*1]
lea r8,[r8+0x1]
mov QWORD PTR [rdx+rcx*1],r8
jl 0x000000010679e092
movsxd rax,eax
add rax,rbx
mov rbx,rax
inc edi
It Is Less Known
• What happens then?
• The instruction goes through phases…
Instruction Stream → Fetch → Decode → Execute → Memory Access → Write-back
CPU Architecture 101
Image: Appaloosa (CC BY-SA 3.0)
• What does a CPU do?
– Reads the program
– Figures it out
– Executes it
– Talks to memory
– Performs I/O
• Immense complexity!
Execution Units
• Arithmetic-Logic Unit (ALU)
– Boolean algebra
– Arithmetic
– Memory accesses
– Flow control
• Floating Point Unit (FPU)
• Memory Management Unit (MMU)
– Memory mapping
– Paging
– Access control
Images: ALU by Dirk Oppelt (CC BY-SA 3.0), FPU by Konstantin Lanzet (CC BY-SA 3.0), MMU from unknown source
DESIGN
CONSIDERATIONS
Image: William M. Plate Jr. (Public Domain)
Pipelining
Sequential Execution
Latency = 5 cycles
Throughput = 0.2 ops / cycle
Pipelined Execution
Latency = 5 cycles
Throughput = 1 op / cycle
[Diagram: instructions I0, I1 and I2 each pass through Fetch, Decode, Execute, Memory Access and Write-back; sequentially they run one after another, pipelined they overlap, each starting one stage behind the previous]
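The latency and throughput figures fall out of simple counting. The sketch below assumes an idealized 5-stage pipeline with one cycle per stage and no stalls; the class and method names are made up for illustration.

```java
class PipelineMath {
    // Sequential: each instruction passes through every stage before the next one starts.
    static long sequentialCycles(long instructions, int stages) {
        return instructions * stages;
    }

    // Ideal pipeline: the first instruction needs `stages` cycles,
    // then one more instruction completes on every following cycle.
    static long pipelinedCycles(long instructions, int stages) {
        return instructions == 0 ? 0 : stages + (instructions - 1);
    }

    // Completed instructions per cycle.
    static double throughput(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    public static void main(String[] args) {
        int stages = 5;
        long n = 1_000_000;
        // Sequential: 0.2 ops / cycle regardless of n.
        System.out.println(throughput(n, sequentialCycles(n, stages)));
        // Pipelined: approaches 1 op / cycle as n grows.
        System.out.println(throughput(n, pipelinedCycles(n, stages)));
    }
}
```

Note that latency per instruction stays at 5 cycles in both cases; only the overlap changes.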
Pipelining
• A pipeline can stall
• This happens with:
– Branches
if (i < 0) i++; else i--;
[Diagram: Memory Load, Test and Conditional Jump move through the F, D, E, M, W stages one cycle apart; the instructions after the jump are unknown (?) until it resolves]
– Dependent Instructions
i++;
x = i + 1;
[Diagram: incrementing a memory address (load from memory, add 1, store in memory); the dependent instruction stalls in the pipeline until the result is available]
• A.K.A. pipeline bubbling
PRACTICAL
RAMIFICATIONS
Image: Hangsna (CC BY-SA 3.0)
1. Memory is Slow
• RAM access takes ~60ns
• Random access on a 4GHz, 64-bit CPU:
– ~250 cycles / memory access
– ~130MB / second bandwidth
• Surely we can do better!
Image: Noah Wieder (Public Domain)
Source: 7-cpu.com
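The back-of-the-envelope numbers can be reproduced as follows. This is a sketch under stated assumptions: every access pays the full ~60ns latency, and each access yields one 64-bit (8-byte) word. With exactly 60ns the results are 240 cycles and about 133MB/s, which the slide rounds to 250 and 130.

```java
class MemoryLatencyMath {
    // Cycles the CPU spends waiting for one access: ns * (cycles per ns).
    static double cyclesPerAccess(double latencyNs, double clockGHz) {
        return latencyNs * clockGHz;
    }

    // Bytes per second when every access pays the full latency.
    static double bytesPerSecond(double latencyNs, int bytesPerAccess) {
        return bytesPerAccess / (latencyNs * 1e-9);
    }

    public static void main(String[] args) {
        double latencyNs = 60.0;   // from the slide (approximate)
        double clockGHz = 4.0;     // from the slide
        int wordBytes = 8;         // assumption: one 64-bit word per access

        System.out.println(cyclesPerAccess(latencyNs, clockGHz) + " cycles / access");
        System.out.println(bytesPerSecond(latencyNs, wordBytes) / 1e6 + " MB / second");
    }
}
```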
Enter: CPU Cache
Level        Size         Latency
L1           32KB + 32KB  1ns
L2           256KB        3ns
L3           4MB          11ns
Main Memory               62ns
Intel i7-6700 “Skylake” at 4 GHz
Image: Ferry24.Milan (CC BY-SA 3.0)
Source: 7-cpu.com
Enter: CPU Cache
• The unit of work is called a cache line
– 64 bytes on x86
– LRU eviction policy
• Why is sequential access fast?
– Cache prefetching
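One way to see why access pattern matters: count how many cache lines a run of accesses touches. The sketch below assumes 64-byte lines and addresses starting at a line boundary; it models nothing about prefetching or eviction, and the names are hypothetical.

```java
class CacheLineCount {
    static final int CACHE_LINE_BYTES = 64;

    // Number of distinct cache lines touched by `accesses` reads at
    // byte offsets 0, stride, 2 * stride, ... (offsets only ever increase,
    // so counting line changes equals counting distinct lines).
    static long linesTouched(int accesses, int strideBytes) {
        long lines = 0;
        long previousLine = -1;
        for (int i = 0; i < accesses; i++) {
            long address = (long) i * strideBytes;
            long line = address / CACHE_LINE_BYTES;
            if (line != previousLine) {
                lines++;
                previousLine = line;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same number of accesses, very different number of lines fetched.
        System.out.println("stride 1:  " + linesTouched(4096, 1));
        System.out.println("stride 64: " + linesTouched(4096, 64));
    }
}
```

With a stride of one byte, 64 consecutive accesses share a single line; with a stride of 64 bytes, every access needs a new one.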
In Real Life
• Let’s rotate (strictly: transpose) an image!
for (y = 0; y < height; y++)
    for (x = 0; x < width; x++) {
        int from = y * width + x;
        int to = x * height + y;
        target[to] = source[from];
    }
Image: EgoAltere (CC0 Public Domain)
In Real Life
• This is not efficient
• Reads are sequential
• Writes aren’t, though
• Different strides
– Worst case wins :-(
[Diagram: a grid with columns and rows labelled 0 to 9; indices 0, 1, 2, 3, … 9 fill the first row, and 10, 20, 30, … 90 run down the first column]
Cache-Friendly Algorithms
• Use blocking or tiling
// Assumes width and height are multiples of the block dimensions
for (y = 0; y < height; y += blockHeight)
    for (x = 0; x < width; x += blockWidth)
        for (by = 0; by < blockHeight; by++)
            for (bx = 0; bx < blockWidth; bx++) {
                int from = (y + by) * width + (x + bx);
                int to = (x + bx) * height + (y + by);
                target[to] = source[from];
            }
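The tiled loop can be sketched as a self-contained class, with the naive version alongside as a correctness check. Two things here are assumptions rather than part of the slide: square blocks of a single size, and clamping each block with Math.min so that dimensions that are not a multiple of the block size still work.

```java
import java.util.Arrays;

class TiledTranspose {
    // Straightforward version: sequential reads, strided writes.
    static int[] transposeNaive(int[] source, int width, int height) {
        int[] target = new int[source.length];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                target[x * height + y] = source[y * width + x];
        return target;
    }

    // Tiled version: works through the image one block at a time,
    // so both reads and writes stay within a small region.
    static int[] transposeTiled(int[] source, int width, int height, int block) {
        int[] target = new int[source.length];
        for (int y = 0; y < height; y += block) {
            int yEnd = Math.min(y + block, height);     // clamp the last row of blocks
            for (int x = 0; x < width; x += block) {
                int xEnd = Math.min(x + block, width);  // clamp the last column of blocks
                for (int by = y; by < yEnd; by++)
                    for (int bx = x; bx < xEnd; bx++)
                        target[bx * height + by] = source[by * width + bx];
            }
        }
        return target;
    }

    public static void main(String[] args) {
        int width = 1000, height = 700;                 // deliberately not multiples of 16
        int[] source = new int[width * height];
        for (int i = 0; i < source.length; i++) source[i] = i;

        int[] naive = transposeNaive(source, width, height);
        int[] tiled = transposeTiled(source, width, height, 16);
        System.out.println(Arrays.equals(naive, tiled)); // true
    }
}
```

The best block size depends on the cache and the element size, so it is worth measuring a few candidates rather than guessing.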
Cache-Friendly Algorithms
• The results?
Benchmark                            Mode  Cnt   Score   Error  Units
CachingShowcase.transposeNaive       avgt   10  43.851 ± 6.000  ms/op
CachingShowcase.transposeTiled8x8    avgt   10  20.641 ± 1.646  ms/op
CachingShowcase.transposeTiled16x16  avgt   10  18.515 ± 1.833  ms/op
CachingShowcase.transposeTiled48x48  avgt   10  21.941 ± 1.954  ms/op
x2.37 speedup!
2. Those Pesky Branches
• Do I go left or right?
• Need input!
• … but can’t wait for it
• Maybe...
– Take a guess?
– Based on historic trends?
• Sounds speculative
Image: Michael Dolan (CC BY 2.0)
Those Pesky Branches
• Enter: Branch Prediction
• Concurrently:
– Speculate branch
– Evaluate condition
• It’s now a tradeoff
– Commit is fast
– Rollback is slow
Image: Alejandro C. (CC BY-NC 2.0)
Back to Our Conundrum
// Generate a bunch of bytes
byte[] data = new byte[32768];
new Random().nextBytes(data);
Arrays.sort(data); // the variable: run with and without this line
// Sum non-negative elements
long sum = 0;
for (int i = 0; i < data.length; i++)
    if (data[i] >= 0)
        sum += data[i];
• Can you guess?
– 3…
– 2…
– 1…
• Here it is!
Catharsis
Original data array:
54 10 -4 -2 15 41 -37 13 0 -9 14 25 -61 40
After sorting:
-61 -37 -9 -4 -2 0 10 13 14 15 25 40 41 54
Before 0: data[i] >= 0 is always false!
From 0 onward: data[i] >= 0 is always true!
• A predictable branch means the speculation almost never has to roll back
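To get a feel for how much sorting helps a predictor, here is a toy simulation. It uses a single 2-bit saturating counter, a textbook scheme chosen for illustration; real CPUs use far more elaborate predictors, so treat the counts as a rough intuition, not a model of any actual chip. The class name, initial counter state and seed are assumptions.

```java
import java.util.Arrays;
import java.util.Random;

class BranchPredictorSketch {
    // Counts mispredictions of the branch `data[i] >= 0` using one
    // 2-bit saturating counter: states 0..1 predict "not taken",
    // states 2..3 predict "taken".
    static int mispredictions(byte[] data) {
        int counter = 1;   // arbitrary starting state (assumption)
        int misses = 0;
        for (byte value : data) {
            boolean taken = value >= 0;
            boolean predictedTaken = counter >= 2;
            if (taken != predictedTaken) misses++;
            // Nudge the counter toward the actual outcome, saturating at 0 and 3.
            counter = taken ? Math.min(3, counter + 1) : Math.max(0, counter - 1);
        }
        return misses;
    }

    public static void main(String[] args) {
        byte[] unsorted = new byte[32768];
        new Random(42).nextBytes(unsorted);   // fixed seed: an assumption, for repeatability
        byte[] sorted = unsorted.clone();
        Arrays.sort(sorted);

        System.out.println("unsorted mispredictions: " + mispredictions(unsorted));
        System.out.println("sorted mispredictions:   " + mispredictions(sorted));
    }
}
```

On sorted data the only misses happen around the single switch from negative to non-negative values; on random data the counter is wrong a large fraction of the time.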
QUESTIONS?
Thank you for listening
tomer@tomergabel.com
@tomerg
http://engineering.wix.com
Sources and Examples:
https://goo.gl/f7NfGT
This work is licensed under a Creative
Commons Attribution-ShareAlike 4.0
International License.
Further Reading
• Jason Robert Carey Patterson – Modern Microprocessors, a 90-Minute Guide
• Igor Ostrovsky – Gallery of Processor Cache Effects
• Piyush Kumar – Cache Oblivious Algorithms

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 

How shit works: the CPU

  • 1. How shit works: the CPU Tomer Gabel BuildStuff 2016 Lithuania Image: Telecarlos (CC BY-SA 3.0)
  • 2. Full Disclosure Bullshit ahead! • I’m not an expert • Explanations may be: – Simplified – Inaccurate – Wrong :-) • We’ll barely scratch the surface Image: Public Domain
  • 3. A CONUNDRUM? Are you ready for… Image: Louis Reed (CC BY-SA 4.0)
  • 4. Setting the Stage
      // Generate a bunch of bytes
      byte[] data = new byte[32768];
      new Random().nextBytes(data);
      Arrays.sort(data);
      // Sum positive elements
      long sum = 0;
      for (int i = 0; i < data.length; i++)
          if (data[i] >= 0)
              sum += data[i];
      1. Which is faster? 2. By how much? 3. And crucially… why?!
  • 5. Surprise, Terror and Ruthless Efficiency
      # Run complete. Total time: 00:00:32
      Benchmark      Mode  Cnt    Score    Error  Units
      Baseline.sum   avgt    6  115.666  ± 3.137  us/op
      Presorted.sum  avgt    6   13.741  ± 0.524  us/op
      * Ignoring setup cost
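A rough way to try the comparison yourself is sketched below. It is a crude `System.nanoTime()` harness rather than a proper benchmark framework, and the class name, the fixed seed and the repetition count are all assumptions made for illustration. Depending on the JVM and CPU, the JIT may compile the loop into branch-free code, in which case the gap can shrink or vanish, so treat any numbers it prints as indicative only.

```java
import java.util.Arrays;
import java.util.Random;

public class BranchSumSketch {
    static long sink; // receives results so the JIT cannot discard the work

    // Sums the non-negative elements, exactly like the loop on the slide.
    static long sumNonNegative(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++)
            if (data[i] >= 0)
                sum += data[i];
        return sum;
    }

    // Hypothetical helper: average wall-clock nanoseconds per call over `reps` calls.
    static double nanosPerCall(byte[] data, int reps) {
        long local = 0;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++)
            local += sumNonNegative(data);
        long elapsed = System.nanoTime() - start;
        sink += local;
        return (double) elapsed / reps;
    }

    public static void main(String[] args) {
        byte[] baseline = new byte[32768];
        new Random(1234).nextBytes(baseline); // fixed seed, chosen arbitrarily
        byte[] presorted = baseline.clone();
        Arrays.sort(presorted);

        // Warm-up rounds so both arrays are measured on compiled code.
        nanosPerCall(baseline, 2000);
        nanosPerCall(presorted, 2000);

        System.out.printf("baseline:  %.1f ns/op%n", nanosPerCall(baseline, 2000));
        System.out.printf("presorted: %.1f ns/op%n", nanosPerCall(presorted, 2000));
    }
}
```

Both arrays hold the same bytes, so the two sums are identical; only the order in which the branch sees them differs.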
  • 6. CPUS ARE COMPLEX BEASTS. Image: Pauli Rautakorpi (CC BY 3.0)
  • 7. It Is Known • Your high-level code…
      long sum = 0;
      for (i = 0; i < length; i++)
          if (data[i] >= 0)
              sum += data[i];
    • Gets compiled down to…
      movsx eax,BYTE PTR [rax+rdx*1+0x10]
      cmp eax,0x0
      movabs rdx,0x11f3a9f60
      movabs rcx,0x128
      jl 0x000000010679e077
      movabs rcx,0x138
      mov r8,QWORD PTR [rdx+rcx*1]
      lea r8,[r8+0x1]
      mov QWORD PTR [rdx+rcx*1],r8
      jl 0x000000010679e092
      movsxd rax,eax
      add rax,rbx
      mov rbx,rax
      inc edi
  • 8. It Is Less Known • What happens then? • The instruction goes through phases… Instruction Stream → Fetch → Decode → Execute → Memory Access → Write-back
  • 9. CPU Architecture 101 Image: Appaloosa (CC BY-SA 3.0)
  • 10. CPU Architecture 101 • What does a CPU do? – Reads the program
  • 11. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out
  • 12. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out – Executes it
  • 13. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out – Executes it – Talks to memory
  • 14. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out – Executes it – Talks to memory – Performs I/O
  • 15. CPU Architecture 101 • What does a CPU do? – Reads the program – Figures it out – Executes it – Talks to memory – Performs I/O • Immense complexity!
  • 16. Execution Units • Arithmetic-Logic Unit (ALU) – Boolean algebra – Arithmetic – Memory accesses – Flow control • Floating Point Unit (FPU) • Memory Management Unit (MMU) – Memory mapping – Paging – Access control Images: ALU by Dirk Oppelt (CC BY-SA 3.0), FPU by Konstantin Lanzet (CC BY-SA 3.0), MMU from unknown source
  • 17. DESIGN CONSIDERATIONS Image: William M. Plate Jr. (Public Domain)
  • 18. Pipelining • Sequential Execution: I0, I1 and I2 each pass through Fetch → Decode → Execute → Memory Access → Write-back, one instruction at a time • Latency = 5 cycles • Throughput = 0.2 ops / cycle
  • 19. Pipelining • Sequential Execution: I0, I1 and I2 each pass through Fetch → Decode → Execute → Memory Access → Write-back, one instruction at a time • Latency = 5 cycles • Throughput = 0.2 ops / cycle • Pipelined Execution: I0, I1 and I2 overlap, each starting one stage behind the previous one • Latency = 5 cycles • Throughput = 1 op / cycle
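The latency and throughput figures follow from simple counting. The sketch below assumes an idealised 5-stage pipeline with no stalls; the class and method names are made up for illustration. Sequentially, n instructions take n × 5 cycles. Pipelined, the first one takes 5 cycles and each later one completes one cycle after its predecessor, so the throughput approaches 1 op / cycle as n grows.

```java
public class PipelineMath {
    // Cycles needed when each instruction finishes all stages before the next starts.
    static long sequentialCycles(long n, int stages) {
        return n * stages;
    }

    // Cycles needed on an ideal pipeline (no stalls): fill the pipeline once,
    // then one instruction completes per cycle.
    static long pipelinedCycles(long n, int stages) {
        return n == 0 ? 0 : stages + (n - 1);
    }

    public static void main(String[] args) {
        int stages = 5; // Fetch, Decode, Execute, Memory Access, Write-back
        for (long n : new long[]{1, 3, 1000}) {
            long seq = sequentialCycles(n, stages);
            long pipe = pipelinedCycles(n, stages);
            System.out.printf("n=%d: sequential %d cycles (%.3f ops/cycle), pipelined %d cycles (%.3f ops/cycle)%n",
                    n, seq, (double) n / seq, pipe, (double) n / pipe);
        }
    }
}
```

Note that latency per instruction stays at 5 cycles in both cases; pipelining only improves how many instructions finish per cycle.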
  • 20. Pipelining • A pipeline can stall • This happens with: – Branches • Example: if (i < 0) i++ else i--; • Memory Load: F D E M W • Test: F D E M • Conditional Jump: F D E • Next instruction: ????
  • 21. Pipelining • A pipeline can stall • This happens with: – Branches – Dependent Instructions • A.K.A. pipeline bubbling • Example: i++; x = i + 1; [Diagram: incrementing a memory address means load from memory, add +1, store in memory; the dependent instruction stalls]
  • 23. 1. Memory is Slow • RAM access is ~60ns • Random access on a 4GHz, 64-bit CPU: – 250 cycles / memory access – 130MB / second bandwidth • Surely we can do better! Image: Noah Wieder (Public Domain) Source: 7-cpu.com
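The two figures on this slide can be reproduced with back-of-the-envelope arithmetic. The sketch assumes each random access waits out the full ~60ns latency and returns a single 8-byte (64-bit) word; those assumptions, and the names used, are mine rather than the slide's. That gives 240 cycles and roughly 133 MB/s, which the slide rounds to 250 cycles and 130MB / second.

```java
public class MemoryLatencyMath {
    // Clock cycles spent waiting for one access: latency (ns) x frequency (cycles per ns).
    static double cyclesPerAccess(double latencyNs, double clockGhz) {
        return latencyNs * clockGhz;
    }

    // Bytes per second if every access pays the full latency and yields `bytesPerAccess`.
    static double bytesPerSecond(double latencyNs, int bytesPerAccess) {
        return bytesPerAccess / (latencyNs * 1e-9);
    }

    public static void main(String[] args) {
        double latencyNs = 60.0; // from the slide
        double clockGhz = 4.0;   // from the slide
        int wordBytes = 8;       // assumption: one 64-bit word per access

        System.out.printf("cycles per access: %.0f%n", cyclesPerAccess(latencyNs, clockGhz));
        System.out.printf("random-access bandwidth: %.1f MB/s%n",
                bytesPerSecond(latencyNs, wordBytes) / 1e6);
    }
}
```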
  • 24. Enter: CPU Cache
      Intel i7-6700 “Skylake” at 4 GHz
      Level        Size         Latency
      L1           32KB + 32KB  1ns
      L2           256KB        3ns
      L3           4MB          11ns
      Main Memory               62ns
      Image: Ferry24.Milan (CC BY-SA 3.0) Source: 7-cpu.com
  • 25. Enter: CPU Cache • The unit of work is called a cache line – 64 bytes on x86 – LRU eviction policy • Why is sequential access fast? – Cache prefetching
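To see cache lines and prefetching at work, one option is to walk the same array in two orders. The sketch below is illustrative only: the class name, the 2048 × 2048 size (16 MB of `int`s, larger than the 4MB L3 in the table) and the crude timing are assumptions. With 4-byte `int`s a 64-byte cache line holds 16 elements, so the row-major walk uses every element of each line it loads, while the column-major walk touches a different line on every step.

```java
public class StrideSketch {
    // Walks memory in address order: consecutive elements share cache lines.
    static long sumRowMajor(int[] data, int width, int height) {
        long sum = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                sum += data[y * width + x];
        return sum;
    }

    // Walks memory with a stride of `width` elements: nearly every access hits a new line.
    static long sumColumnMajor(int[] data, int width, int height) {
        long sum = 0;
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                sum += data[y * width + x];
        return sum;
    }

    public static void main(String[] args) {
        int width = 2048, height = 2048;
        int[] data = new int[width * height];
        for (int i = 0; i < data.length; i++) data[i] = i & 0xFF;

        // A few rounds; the first ones double as warm-up.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumRowMajor(data, width, height);
            long t1 = System.nanoTime();
            long b = sumColumnMajor(data, width, height);
            long t2 = System.nanoTime();
            System.out.printf("row-major %d us, column-major %d us (sums %d / %d)%n",
                    (t1 - t0) / 1000, (t2 - t1) / 1000, a, b);
        }
    }
}
```

Both methods compute the same sum; only the access pattern differs.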
  • 26. In Real Life • Let’s rotate an image!
      for (y = 0; y < height; y++)
          for (x = 0; x < width; x++) {
              int from = y * width + x;
              int to = x * height + y;
              target[to] = source[from];
          }
      Image: EgoAltere (CC0 Public Domain)
  • 27. In Real Life • This is not efficient • Reads are sequential [Diagram: source indices 0 1 2 3 … 9]
  • 28. In Real Life • This is not efficient • Reads are sequential [Diagram: source indices 0 1 2 3 … 9]
  • 29. In Real Life • This is not efficient • Reads are sequential • Writes aren’t, though • Different strides – Worst case wins :-( [Diagram: source indices 0 1 2 3 … 9 are written to target indices 0 10 20 30 … 90]
  • 30. Cache-Friendly Algorithms • Use blocking or tiling
      for (y = 0; y < height; y += blockHeight)
          for (x = 0; x < width; x += blockWidth)
              for (by = 0; by < blockHeight; by++)
                  for (bx = 0; bx < blockWidth; bx++) {
                      int from = (y + by) * width + (x + bx);
                      int to = (x + bx) * height + (y + by);
                      target[to] = source[from];
                  }
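The loop on the slide implicitly assumes that width and height are exact multiples of the block size; otherwise the inner loops run past the edge. A self-contained variant that clamps each tile to the image bounds might look like the sketch below. The class and method names and the single square `block` parameter are assumptions for illustration, not the code behind the benchmarks.

```java
public class TiledTranspose {
    // Straightforward transpose: sequential reads, strided writes.
    static void transposeNaive(int[] source, int[] target, int width, int height) {
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                target[x * height + y] = source[y * width + x];
    }

    // Tiled transpose: processes block x block tiles so that both the rows being read
    // and the rows being written stay in cache while a tile is handled.
    static void transposeTiled(int[] source, int[] target, int width, int height, int block) {
        for (int y = 0; y < height; y += block) {
            int yEnd = Math.min(y + block, height); // clamp the last, partial tile
            for (int x = 0; x < width; x += block) {
                int xEnd = Math.min(x + block, width);
                for (int by = y; by < yEnd; by++)
                    for (int bx = x; bx < xEnd; bx++)
                        target[bx * height + by] = source[by * width + bx];
            }
        }
    }

    public static void main(String[] args) {
        int width = 5, height = 3;
        int[] source = new int[width * height];
        for (int i = 0; i < source.length; i++) source[i] = i;

        int[] naive = new int[source.length];
        int[] tiled = new int[source.length];
        transposeNaive(source, naive, width, height);
        transposeTiled(source, tiled, width, height, 2);
        System.out.println("results match: " + java.util.Arrays.equals(naive, tiled));
    }
}
```

Which block size works best depends on the element size and the cache sizes of the machine at hand, so it is worth measuring a few candidates rather than picking one up front.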
  • 31. Cache-Friendly Algorithms • The results?
      Benchmark                            Mode  Cnt   Score    Error  Units
      CachingShowcase.transposeNaive       avgt   10  43.851  ± 6.000  ms/op
      CachingShowcase.transposeTiled8x8    avgt   10  20.641  ± 1.646  ms/op
      CachingShowcase.transposeTiled16x16  avgt   10  18.515  ± 1.833  ms/op
      CachingShowcase.transposeTiled48x48  avgt   10  21.941  ± 1.954  ms/op
      x2.37 speedup!
  • 32. 2. Those Pesky Branches • Do I go left or right? • Need input! • … but can’t wait for it • Maybe... – Take a guess? – Based on historic trends? • Sounds speculative Image: Michael Dolan (CC BY 2.0)
  • 33. Those Pesky Branches • Enter: Branch Prediction • Concurrently: – Speculate branch – Evaluate condition • It’s now a tradeoff – Commit is fast – Rollback is slow Image: Alejandro C. (CC BY-NC 2.0)
  • 34. Back to Our Conundrum • Can you guess? – 3… – 2... – 1... • Here it is!
      // Generate a bunch of bytes
      byte[] data = new byte[32768];
      new Random().nextBytes(data);
      Arrays.sort(data);
      // Sum positive elements
      long sum = 0;
      for (int i = 0; i < data.length; i++)
          if (data[i] >= 0)
              sum += data[i];
  • 35. Catharsis • Original data array: 54 10 -4 -2 15 41 -37 13 0 -9 14 25 -61 40
  • 36. Catharsis • After sorting: -61 -37 -9 -4 -2 0 10 13 14 15 25 40 41 54 • Below 0: data[i] >= 0 Always false! • From 0 onward: data[i] >= 0 Always true!
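The effect can be illustrated with a toy predictor. The sketch below models a single 2-bit saturating counter, a textbook scheme and not what a real CPU uses (actual predictors are far more elaborate), and counts how often it guesses the outcome of `data[i] >= 0` wrong. The class name and the initial counter state are assumptions.

```java
import java.util.Arrays;

public class ToyBranchPredictor {
    // Counts mispredictions of the branch `b >= 0` with one 2-bit saturating counter.
    // States 0 and 1 predict "false", states 2 and 3 predict "true".
    static int mispredictions(byte[] data) {
        int state = 0; // assumption: start at "strongly false"
        int misses = 0;
        for (byte b : data) {
            boolean actual = b >= 0;
            boolean predicted = state >= 2;
            if (predicted != actual) misses++;
            // Move one step toward the observed outcome, saturating at 0 and 3.
            state = actual ? Math.min(3, state + 1) : Math.max(0, state - 1);
        }
        return misses;
    }

    public static void main(String[] args) {
        // The example array from the slides.
        byte[] original = {54, 10, -4, -2, 15, 41, -37, 13, 0, -9, 14, 25, -61, 40};
        byte[] sorted = original.clone();
        Arrays.sort(sorted);

        System.out.println("original order: " + mispredictions(original) + " mispredictions");
        System.out.println("sorted order:   " + mispredictions(sorted) + " mispredictions");
    }
}
```

On sorted input the counter only gets it wrong around the single point where the values cross zero; on shuffled input it keeps getting pulled back and forth, and each wrong guess corresponds to a slow rollback on real hardware.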
  • 37. QUESTIONS? Thank you for listening tomer@tomergabel.com @tomerg http://engineering.wix.com Sources and Examples: https://goo.gl/f7NfGT This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
  • 38. Further Reading • Jason Robert Carey Patterson – Modern Microprocessors, a 90-Minute Guide • Igor Ostrovsky – Gallery of Processor Cache Effects • Piyush Kumar – Cache Oblivious Algorithms