SlideShare uma empresa Scribd logo
1 de 8
Baixar para ler offline
© 2018 NETRONOME SYSTEMS, INC. 1
Verifier optimization work
Jakub Kicinski <kuba@kernel.org>
LFSMM
BPF Microconference
San Juan, 2 May 2019
© 2019 NETRONOME SYSTEMS, INC. 2CONFIDENTIAL
Recent optimizations from Alexei
● rare explored state removal
most explored states never prune any later walks - remove states after:
miss_cnt > 3 + hit_cnt * 3
● read marking backpropagation pruning
read marks are propagated to source states, once state with read mark
already set is reached, propagation can stop
● big verifier lock removal
already covered
© 2019 NETRONOME SYSTEMS, INC. 3CONFIDENTIAL
Cycles spent*
* sum over Cilium test programs
Function cycles % do_check % insn prog % insn walk
Total (do_check) 2613 100.00%
copy_verifier_state 558 21.35%
regsafe 368 14.08%
free_verifier_state 167 6.39%
check_cond_jmp_op 252 9.64% 10.13% 10.15%
check_alu_op 100 3.83% 59.13% 57.02%
check_mem_access 89 3.41% 23.53% 26.28%
check_helper_call 80 3.06% 5.65% 4.62%
mark_reg_read 229 8.76%
mark_reg_unknown 71 2.72%
mark_reg_known 15 0.57%
© 2019 NETRONOME SYSTEMS, INC. 4CONFIDENTIAL
Cycles spent*
* sum over Cilium test programs
Function cycles % do_check % insn prog % insn walk
Total (do_check) 2613 100.00%
copy_verifier_state 558 21.35%
regsafe 368 14.08%
free_verifier_state 167 6.39%
check_cond_jmp_op 252 9.64% 10.13% 10.15%
check_alu_op 100 3.83% 59.13% 57.02%
check_mem_access 89 3.41% 23.53% 26.28%
check_helper_call 80 3.06% 5.65% 4.62%
mark_reg_read 229 8.76%
mark_reg_unknown 71 2.72%
mark_reg_known 15 0.57%
Trivial micro optimization - avoid the use of zalloc+memcpy
19.41%
© 2019 NETRONOME SYSTEMS, INC. 5CONFIDENTIAL
Pruning point analysis
n prunes sum(points)
0 5137
1 615
2 242
3 167
4 51
5 39
6 45
7 19
8 24
9 17
10 11
© 2019 NETRONOME SYSTEMS, INC. 6CONFIDENTIAL
Pruning point elimination
● pruning points are too dense - every 3.8 instruction in Cilium progs
● 80% of conditional branch pruning points with 0 hits
● replacing the pruning heuristic with marking every 10th instruction gives
4-20% do_check speedup for Cilium progs
● 33% more instructions walked
● no good heuristic apparent, yet
● pruning on fall through insn, rather than jmp - 4%
● in-place branch pruning
Branch 9279 27.55%
Shallow 4641 13.78%
Pruning 24397 72.45%
Total 33676
© 2019 NETRONOME SYSTEMS, INC. 7CONFIDENTIAL
Other ideas
● tail elimination:
r0 = const
exit
covered by the shallow branch optimization
● pure function detection/pruning (callsite independent)
real-life benefit unclear due to small number of no-inline samples
● “fudge” builtin:
var = __builtin_constant_relaxed(5, 0xff)
hints the verifier should loosen the info about the constant
© 2019 NETRONOME SYSTEMS, INC. 8CONFIDENTIAL
1M instruction challenges
● jump offset (16 bit)
● instruction patching is quadratic
● pruning state grows as O(stack frames x prog len)
● execution time estimation?

Mais conteúdo relacionado

Semelhante a LFSMM Verifier Optimizations and 1 M Instructions

Semelhante a LFSMM Verifier Optimizations and 1 M Instructions (20)

Quality Improvement and Automation of a Flywheel Engraving Machine
Quality Improvement and Automation of  a Flywheel Engraving MachineQuality Improvement and Automation of  a Flywheel Engraving Machine
Quality Improvement and Automation of a Flywheel Engraving Machine
 
IRJET- Video Based Traffic Sign Detection by Scale Based Frame Fusion Technique
IRJET- Video Based Traffic Sign Detection by Scale Based Frame Fusion TechniqueIRJET- Video Based Traffic Sign Detection by Scale Based Frame Fusion Technique
IRJET- Video Based Traffic Sign Detection by Scale Based Frame Fusion Technique
 
IRJET- FPGA Implementation of an Improved Watchdog Timer for Safety Critical ...
IRJET- FPGA Implementation of an Improved Watchdog Timer for Safety Critical ...IRJET- FPGA Implementation of an Improved Watchdog Timer for Safety Critical ...
IRJET- FPGA Implementation of an Improved Watchdog Timer for Safety Critical ...
 
Practical Guidelines for Solving Difficult Mixed Integer Programs
Practical Guidelines for Solving Difficult  Mixed Integer ProgramsPractical Guidelines for Solving Difficult  Mixed Integer Programs
Practical Guidelines for Solving Difficult Mixed Integer Programs
 
IRJET- Design and Implementation of High Speed FPGA Configuration using SBI
IRJET- Design and Implementation of High Speed FPGA Configuration using SBIIRJET- Design and Implementation of High Speed FPGA Configuration using SBI
IRJET- Design and Implementation of High Speed FPGA Configuration using SBI
 
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization StudioRecent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
Recent MIP Performance Improvements in IBM ILOG CPLEX Optimization Studio
 
IRJET- Traffic Sign and Drowsiness Detection using Open-CV
IRJET- Traffic Sign and Drowsiness Detection using Open-CVIRJET- Traffic Sign and Drowsiness Detection using Open-CV
IRJET- Traffic Sign and Drowsiness Detection using Open-CV
 
Mathworks CAE simulation suite – case in point from automotive and aerospace.
Mathworks CAE simulation suite – case in point from automotive and aerospace.Mathworks CAE simulation suite – case in point from automotive and aerospace.
Mathworks CAE simulation suite – case in point from automotive and aerospace.
 
itSMF Presentation March 2009
itSMF Presentation March 2009itSMF Presentation March 2009
itSMF Presentation March 2009
 
Performance Optimization of HPC Applications: From Hardware to Source Code
Performance Optimization of HPC Applications: From Hardware to Source CodePerformance Optimization of HPC Applications: From Hardware to Source Code
Performance Optimization of HPC Applications: From Hardware to Source Code
 
IBM Streams V4.1 and Incremental Checkpointing
IBM Streams V4.1 and Incremental CheckpointingIBM Streams V4.1 and Incremental Checkpointing
IBM Streams V4.1 and Incremental Checkpointing
 
Meter anomaly detection
Meter anomaly detectionMeter anomaly detection
Meter anomaly detection
 
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
 
Qualifying a high performance memory subsysten for Functional Safety
Qualifying a high performance memory subsysten for Functional SafetyQualifying a high performance memory subsysten for Functional Safety
Qualifying a high performance memory subsysten for Functional Safety
 
markomanolis_phd_defense
markomanolis_phd_defensemarkomanolis_phd_defense
markomanolis_phd_defense
 
MasterClass_Quectel_LPWAN NBIOT LTE M Power
MasterClass_Quectel_LPWAN NBIOT LTE M PowerMasterClass_Quectel_LPWAN NBIOT LTE M Power
MasterClass_Quectel_LPWAN NBIOT LTE M Power
 
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringOSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
 
Verify High Sigma Whitepaper
Verify High Sigma WhitepaperVerify High Sigma Whitepaper
Verify High Sigma Whitepaper
 
EEP301: Ca06 sample
EEP301: Ca06 sampleEEP301: Ca06 sample
EEP301: Ca06 sample
 
Multi-Direction Pedestrian Wind Comfort Analysis
Multi-Direction Pedestrian Wind Comfort AnalysisMulti-Direction Pedestrian Wind Comfort Analysis
Multi-Direction Pedestrian Wind Comfort Analysis
 

Mais de Netronome

Efficient JIT to 32-bit Arches
Efficient JIT to 32-bit ArchesEfficient JIT to 32-bit Arches
Efficient JIT to 32-bit Arches
Netronome
 

Mais de Netronome (20)

Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
 
LFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DSLFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DS
 
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureUsing Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
 
Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports
 
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware OffloadsQuality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project Launch
 
Flexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesFlexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific Architectures
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPF
 
Massively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryMassively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional Memory
 
Offloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TCOffloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TC
 
eBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current TechniqueseBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current Techniques
 
Efficient JIT to 32-bit Arches
Efficient JIT to 32-bit ArchesEfficient JIT to 32-bit Arches
Efficient JIT to 32-bit Arches
 
eBPF & Switch Abstractions
eBPF & Switch AbstractionseBPF & Switch Abstractions
eBPF & Switch Abstractions
 
eBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureeBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging Infrastructure
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep Dive
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT Compiler
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
 
P4 Introduction
P4 Introduction P4 Introduction
P4 Introduction
 
Host Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsHost Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment Models
 
The Power of SmartNICs
The Power of SmartNICsThe Power of SmartNICs
The Power of SmartNICs
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

LFSMM Verifier Optimizations and 1 M Instructions

  • 1. © 2018 NETRONOME SYSTEMS, INC. 1 Verifier optimization work Jakub Kicinski <kuba@kernel.org> LFSMM BPF Microconference San Juan, 2 May 2019
  • 2. © 2019 NETRONOME SYSTEMS, INC. 2CONFIDENTIAL Recent optimizations from Alexei ● rare explored state removal most explored states never prune any later walks - remove states after: miss_cnt > 3 + hit_cnt * 3 ● read marking backpropagation pruning read marks are propagated to source states, once state with read mark already set is reached, propagation can stop ● big verifier lock removal already covered
  • 3. © 2019 NETRONOME SYSTEMS, INC. 3CONFIDENTIAL Cycles spent* * sum over Cilium test programs Function cycles % do_check % insn prog % insn walk Total (do_check) 2613 100.00% copy_verifier_state 558 21.35% regsafe 368 14.08% free_verifier_state 167 6.39% check_cond_jmp_op 252 9.64% 10.13% 10.15% check_alu_op 100 3.83% 59.13% 57.02% check_mem_access 89 3.41% 23.53% 26.28% check_helper_call 80 3.06% 5.65% 4.62% mark_reg_read 229 8.76% mark_reg_unknown 71 2.72% mark_reg_known 15 0.57%
  • 4. © 2019 NETRONOME SYSTEMS, INC. 4CONFIDENTIAL Cycles spent* * sum over Cilium test programs Function cycles % do_check % insn prog % insn walk Total (do_check) 2613 100.00% copy_verifier_state 558 21.35% regsafe 368 14.08% free_verifier_state 167 6.39% check_cond_jmp_op 252 9.64% 10.13% 10.15% check_alu_op 100 3.83% 59.13% 57.02% check_mem_access 89 3.41% 23.53% 26.28% check_helper_call 80 3.06% 5.65% 4.62% mark_reg_read 229 8.76% mark_reg_unknown 71 2.72% mark_reg_known 15 0.57% Trivial micro optimization - avoid the use of zalloc+memcpy 19.41%
  • 5. © 2019 NETRONOME SYSTEMS, INC. 5CONFIDENTIAL Pruning point analysis n prunes sum(points) 0 5137 1 615 2 242 3 167 4 51 5 39 6 45 7 19 8 24 9 17 10 11
  • 6. © 2019 NETRONOME SYSTEMS, INC. 6CONFIDENTIAL Pruning point elimination ● pruning points are too dense - every 3.8 instruction in Cilium progs ● 80% of conditional branch pruning points with 0 hits ● replacing the pruning heuristic with marking every 10th instruction gives 4-20% do_check speedup for Cilium progs ● 33% more instructions walked ● no good heuristic apparent, yet ● pruning on fall through insn, rather than jmp - 4% ● in-place branch pruning Branch 9279 27.55% Shallow 4641 13.78% Pruning 24397 72.45% Total 33676
  • 7. © 2019 NETRONOME SYSTEMS, INC. 7CONFIDENTIAL Other ideas ● tail elimination: r0 = const exit covered by the shallow branch optimization ● pure function detection/pruning (callsite independent) real-life benefit unclear due to small number of no-inline samples ● “fudge” builtin: var = __builtin_constant_relaxed(5, 0xff) hints the verifier should loosen the info about the constant
  • 8. © 2019 NETRONOME SYSTEMS, INC. 8CONFIDENTIAL 1M instruction challenges ● jump offset (16 bit) ● instruction patching is quadratic ● pruning state grows as O(stack frames x prog len) ● execution time estimation?