LSFMM Verifier Optimizations and 1M Instructions
- 1. © 2018 NETRONOME SYSTEMS, INC. 1
Verifier optimization work
Jakub Kicinski <kuba@kernel.org>
LSFMM
BPF Microconference
San Juan, 2 May 2019
- 2. Recent optimizations from Alexei
● removal of rarely hit explored states
  most explored states never prune any later walk, so a state is removed once:
    miss_cnt > 3 + hit_cnt * 3
● pruning of read-mark backpropagation
  read marks are propagated to source states; once a state with the read mark
  already set is reached, propagation can stop
● big verifier lock removal
  already covered
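The first two bullets can be illustrated with a toy model. This is only a sketch: the `ExploredState` class and its field names are hypothetical stand-ins, not the kernel's data structures, and the only thing taken from the slide is the eviction threshold and the stop-at-first-marked-state rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExploredState:
    """Toy stand-in for an explored verifier state (all fields hypothetical)."""
    hit_cnt: int = 0      # times this state pruned a later walk
    miss_cnt: int = 0     # times it was compared and did not prune
    read_mark: bool = False
    parent: Optional["ExploredState"] = None


def should_evict(state: ExploredState) -> bool:
    # Threshold quoted on the slide: drop states that keep missing.
    return state.miss_cnt > 3 + state.hit_cnt * 3


def propagate_read(state: ExploredState) -> int:
    """Mark the parent chain as read; return how many states were touched.

    The walk stops at the first state that already carries the mark,
    because everything above it was marked by an earlier propagation.
    """
    touched = 0
    cur = state.parent
    while cur is not None and not cur.read_mark:
        cur.read_mark = True
        touched += 1
        cur = cur.parent
    return touched
```

In this model a state with no hits is evicted on its fourth miss, and every hit buys three more misses. A second propagation through an already marked chain touches nothing, which is where the saving comes from.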
- 3. Cycles spent*
* sum over Cilium test programs
Function               cycles   % do_check   % insn prog   % insn walk
Total (do_check)         2613      100.00%
copy_verifier_state       558       21.35%
regsafe                   368       14.08%
free_verifier_state       167        6.39%
check_cond_jmp_op         252        9.64%        10.13%        10.15%
check_alu_op              100        3.83%        59.13%        57.02%
check_mem_access           89        3.41%        23.53%        26.28%
check_helper_call          80        3.06%         5.65%         4.62%
mark_reg_read             229        8.76%
mark_reg_unknown           71        2.72%
mark_reg_known             15        0.57%
Trivial micro-optimization: avoid the use of zalloc+memcpy
19.41%
- 5. Pruning point analysis
n prunes   sum(points)
       0          5137
       1           615
       2           242
       3           167
       4            51
       5            39
       6            45
       7            19
       8            24
       9            17
      10            11
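As a quick worked check of the histogram, the sketch below computes what share of the listed pruning points never pruned anything. It uses only the rows shown on the slide; whether the original data had rows beyond n = 10 is not known, so the result holds for the listed rows only.

```python
# (n prunes, number of pruning points) rows as listed on the slide
histogram = [
    (0, 5137), (1, 615), (2, 242), (3, 167), (4, 51), (5, 39),
    (6, 45), (7, 19), (8, 24), (9, 17), (10, 11),
]


def share_with_prunes(rows, n):
    """Fraction of the listed pruning points that pruned exactly n times."""
    total = sum(count for _, count in rows)
    matching = sum(count for prunes, count in rows if prunes == n)
    return matching / total


total_points = sum(count for _, count in histogram)  # 6367 listed points
never_pruned = share_with_prunes(histogram, 0)       # roughly 0.807
```

Roughly four out of five listed pruning points never produce a single prune, so they cost state copies and comparisons without saving any walking.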
- 6. Pruning point elimination
● pruning points are too dense: one every 3.8 instructions in Cilium programs
● 80% of conditional-branch pruning points have 0 hits
● replacing the pruning heuristic with marking every 10th instruction gives
  a 4-20% do_check speedup for Cilium programs
● 33% more instructions walked
● no good heuristic apparent yet
● pruning on the fall-through insn rather than the jmp: 4%
● in-place branch pruning
  Branch      9279   27.55%
  Shallow     4641   13.78%
  Pruning    24397   72.45%
  Total      33676
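The "every 10th instruction" experiment can be pictured with a small sketch. The function names are hypothetical, and marking instruction 0 as the first point is an assumption made here for simplicity; the slide does not say where the stride starts.

```python
from typing import List


def mark_every_nth(prog_len: int, stride: int = 10) -> List[bool]:
    """Per-instruction flags; True marks a pruning point (insn 0 is assumed marked)."""
    return [idx % stride == 0 for idx in range(prog_len)]


def insns_per_point(marks: List[bool]) -> float:
    """Average number of instructions per pruning point."""
    points = sum(marks)
    return len(marks) / points if points else float("inf")


marks = mark_every_nth(100)          # 10 pruning points in 100 instructions
density = insns_per_point(marks)     # 10.0, versus 3.8 measured with the jump-based heuristic
```

Fewer pruning points mean fewer state copies and comparisons per instruction, at the price of walking more instructions before an equivalent state is recognized, which matches the trade-off reported in the bullets above.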
- 7. Other ideas
● tail elimination:
    r0 = const
    exit
  covered by the shallow branch optimization
● pure function detection/pruning (callsite independent)
  real-life benefit unclear due to the small number of no-inline samples
● “fudge” builtin:
    var = __builtin_constant_relaxed(5, 0xff)
  hints that the verifier should loosen the info it keeps about the constant
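One way to read the “fudge” builtin is in terms of the verifier's tristate numbers, where each bit is either known or unknown. The sketch below is an interpretation, not the proposed semantics: `__builtin_constant_relaxed` is only an idea on the slide, and treating its second argument as a mask of bits to forget is an assumption made here. `Tnum`, `const` and `relax` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tnum:
    """Tristate number: bits set in mask are unknown, the rest come from value."""
    value: int
    mask: int


def const(v: int) -> Tnum:
    """A fully known constant."""
    return Tnum(value=v, mask=0)


def relax(t: Tnum, mask: int) -> Tnum:
    """Forget the bits selected by mask (assumed meaning of the second argument)."""
    return Tnum(value=t.value & ~mask, mask=t.mask | mask)


a = relax(const(5), 0xFF)
b = relax(const(7), 0xFF)
same_after_relax = (a == b)   # both are Tnum(value=0, mask=0xff)
```

Under this reading, two paths that differ only in such a constant would reach identical register states, so the second path could be pruned instead of being walked again.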
- 8. 1M instruction challenges
● jump offset is only 16 bits
● instruction patching is quadratic
● pruning state grows as O(stack frames × prog len)
● execution time estimation?
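The first two challenges can be made concrete with a little arithmetic. This is a toy model under stated assumptions: the jump offset is a signed 16-bit field counted relative to the instruction after the jump, and the patching cost only counts instructions shifted when every patch moves the whole tail of the program. The function names are hypothetical.

```python
S16_MIN, S16_MAX = -(1 << 15), (1 << 15) - 1


def jump_off(src: int, dst: int) -> int:
    """Offset to encode for a jump from insn src to insn dst (relative to src + 1)."""
    return dst - (src + 1)


def fits_s16(off: int) -> bool:
    """True if the offset fits the signed 16-bit field."""
    return S16_MIN <= off <= S16_MAX


def naive_patch_moves(prog_len: int, patches: int) -> int:
    """Instructions shifted if each evenly spread one-insn expansion moves the whole tail."""
    moved = 0
    length = prog_len
    for i in range(patches):
        pos = (i * prog_len) // patches  # patch position in the original numbering
        cur_pos = pos + i                # each earlier patch added one instruction
        moved += length - cur_pos - 1    # tail that has to shift by one slot
        length += 1
    return moved
```

A jump from instruction 0 to instruction 999,999 needs an offset of 999,998, far outside the 16-bit range. In the patching model, expanding every instruction of an n-instruction program shifts n(n-1)/2 instructions, so doubling the program length roughly quadruples the work.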