Talk about bcc/eBPF for SCALE15x (2017) by Brendan Gregg. "BPF (Berkeley Packet Filter) has been enhanced in the Linux 4.x series and now powers a large collection of performance analysis and observability tools ready for you to use, included in the bcc (BPF Complier Collection) open source project. BPF nowadays can do system tracing, software defined networks, and kernel fast path: much more than just filtering packets! This talk will focus on the bcc/BPF tools for performance analysis, which make use of other built in Linux capabilities: dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). There are now bcc tools for measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived, built in to Linux."
26. Pre-BPF: Linux Perf Analysis in 60s
1. uptime
2. dmesg -T | tail
3. vmstat 1
4. mpstat -P ALL 1
5. pidstat 1
6. iostat -xz 1
7. free -m
8. sar -n DEV 1
9. sar -n TCP,ETCP 1
10. top
hDp://techblog.neclix.com/2015/11/linux-performance-analysis-in-60s.html
27. bcc InstallaVon
• hDps://github.com/iovisor/bcc/blob/master/INSTALL.md
• eg, Ubuntu Xenial:
– Also available as an Ubuntu snap
– Ubuntu 16.04 is good, 16.10 beDer: more tools work
• Installs many tools
– In /usr/share/bcc/tools, and …/tools/old for older kernels
# echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" |
sudo tee /etc/apt/sources.list.d/iovisor.list
# sudo apt-get update
# sudo apt-get install bcc-tools
32. 3. ext4slower
• Trace slow FS I/O, to beDer idenVfy I/O issues and outliers:
# ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:49:17 bash 3616 R 128 0 7.75 cksum
06:49:17 cksum 3616 R 39552 0 1.34 [
06:49:17 cksum 3616 R 96 0 5.36 2to3-2.7
06:49:17 cksum 3616 R 96 0 14.94 2to3-3.4
06:49:17 cksum 3616 R 10320 0 6.82 411toppm
06:49:17 cksum 3616 R 65536 0 4.01 a2p
06:49:17 cksum 3616 R 55400 0 8.77 ab
06:49:17 cksum 3616 R 36792 0 16.34 aclocal-1.14
[…]
More reliable and complete indicator than measuring disk I/O latency
Also: btrfsslower, xfsslower, zfslower
33. 4. biolatency
• IdenVfy mulVmodal latency and outliers with a histogram:
# biolatency -mT 1
Tracing block device I/O... Hit Ctrl-C to end.
06:20:16
msecs : count distribution
0 -> 1 : 36 |**************************************|
2 -> 3 : 1 |* |
4 -> 7 : 3 |*** |
8 -> 15 : 17 |***************** |
16 -> 31 : 33 |********************************** |
32 -> 63 : 7 |******* |
64 -> 127 : 6 |****** |
[…]
Average latency (iostat/sar) may not be represenVVve with mulVple modes or outliers
The "count" column is
summarized in-kernel
34. 5. biosnoop
• Dump disk I/O events for detailed analysis. tcpdump for disks:
# biosnoop
TIME(s) COMM PID DISK T SECTOR BYTES LAT(ms)
0.000004001 supervise 1950 xvda1 W 13092560 4096 0.74
0.000178002 supervise 1950 xvda1 W 13092432 4096 0.61
0.001469001 supervise 1956 xvda1 W 13092440 4096 1.24
0.001588002 supervise 1956 xvda1 W 13115128 4096 1.09
1.022346001 supervise 1950 xvda1 W 13115272 4096 0.98
1.022568002 supervise 1950 xvda1 W 13188496 4096 0.93
1.023534000 supervise 1956 xvda1 W 13188520 4096 0.79
1.023585003 supervise 1956 xvda1 W 13189512 4096 0.60
2.003920000 xfsaild/md0 456 xvdc W 62901512 8192 0.23
[…]
Can import this into a spreadsheet and do a scaDer plot of Vme vs latency, e.t.c.
44. trace
# trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
TIME PID COMM FUNC -
05:18:23 4490 dd sys_read read 1048576 bytes
05:18:23 4490 dd sys_read read 1048576 bytes
05:18:23 4490 dd sys_read read 1048576 bytes
^C
by Sasha Goldshtein
# trace -h
[...]
trace –K blk_account_io_start
Trace this kernel function, and print info with a kernel stack trace
trace 'do_sys_open "%s", arg2'
Trace the open syscall and print the filename being opened
trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
Trace the read syscall and print a message for reads >20000 bytes
trace r::do_sys_return
Trace the return from the open syscall
trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
Trace the open() call from libc only if the flags (arg2) argument is 42
trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
Trace the block_rq_complete kernel tracepoint and print # of tx sectors
[...]
trace -h
lists example
one-liners
trace custom events
64. ply One-Liners
# Trace file opens:
ply -c 'kprobe:do_sys_open { printf("opened: %sn", mem(arg(1), "128s")); }'
# Counting vfs functions by process name:
ply -c 'kprobe:vfs_* { @[comm(), func()].count(); }'
# Counting off-CPU stacks:
ply -c 'kprobe:schedule { @[stack()].count() }'
# Syscall read return size as a histogram:
ply -c 'kretprobe:SyS_read { @ret.quantize(retval()); }'
# Syscall read latency as a histogram:
ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]);
@start[tid()] = nil; }'
[...]
also see ply/oneliners.md
65. ply
• A new BPF-based dynamic tracer for Linux
– Created by Tobias Waldekranz
– hDps://github.com/iovisor/ply hDps://wkz.github.io/ply/
• High-level language
– Simple one-liners
– Short scripts
• In development
– kprobes and tracepoints only, uprobes/perf_events not yet
– Successful so far as a proof of concept
– Not producVon tested yet (bcc is)
67. Challenges
• MarkeVng
• DocumentaVon
• Training
• Community
Without these, we may have another irace: a built in "secret" of Linux. Not good for adopVon!
hDps://www.iovisor.org project helps, but tracing (observability) is only one part.