Do you ever wonder what the kernel is doing while your code is running? This talk will explore some methodologies and techniques (eBPF, ftrace, etc.) to look under the hood of the Linux kernel and understand what it’s actually doing behind the scenes.
5. How to keep up with changes
● https://lwn.net/Kernel/
● https://kernelnewbies.org/LinuxChanges
● http://vger.kernel.org/vger-lists.html#linux-kernel
● kernel source: Documentation/
7. strace
● strace(1): system call tracer for Linux
● It uses the ptrace() system call, which pauses the target process at each syscall so that the tracer can read its state
● It does this twice per syscall: once when the syscall begins and once when it ends!
8. strace overhead
### Regular execution ###
$ dd if=/dev/zero of=/dev/null bs=1 count=500k
512000+0 records in
512000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 0.501455 s, 1.0 MB/s
### Strace execution (tracing a syscall that is never called) ###
$ strace -e trace=accept dd if=/dev/zero of=/dev/null bs=1 count=500k
512000+0 records in
512000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 44.0216 s, 11.6 kB/s
+++ exited with 0 +++
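The two runs above can be turned into a rough per-syscall cost. The sketch below is only back-of-the-envelope arithmetic on the numbers printed by dd; it assumes (this is not stated on the slide) that dd with bs=1 issues one read() and one write() per byte copied, and it ignores the handful of syscalls made at startup.

```python
# Figures taken from the two dd runs shown on the slide.
BYTES_COPIED = 512_000      # bs=1 count=500k
PLAIN_SECONDS = 0.501455    # regular execution
STRACE_SECONDS = 44.0216    # execution under strace

# Assumption: one read() plus one write() per byte copied.
SYSCALLS = 2 * BYTES_COPIED

# How many times slower the traced run is.
slowdown = STRACE_SECONDS / PLAIN_SECONDS

# Extra wall-clock time attributable to tracing, spread over every syscall.
extra_per_syscall_us = (STRACE_SECONDS - PLAIN_SECONDS) / SYSCALLS * 1e6

# Throughput recomputed from bytes and seconds, to cross-check dd's own report.
plain_throughput = BYTES_COPIED / PLAIN_SECONDS
strace_throughput = BYTES_COPIED / STRACE_SECONDS

print(f"slowdown: {slowdown:.1f}x")
print(f"extra cost per syscall: {extra_per_syscall_us:.1f} us")
print(f"throughput: {plain_throughput / 1e6:.2f} MB/s vs {strace_throughput / 1e3:.1f} kB/s")
```

Under that assumption the traced run is roughly 88 times slower, which works out to about 42 µs of added cost per syscall, even though the traced syscall (accept) is never called.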
11. eBPF features
● Highly efficient VM that lives in the kernel
● Inject safe sandboxed bytecode into the kernel
● Attach code to kernel functions / events
● In-kernel JIT compiler
  – Dynamically translates eBPF bytecode into native opcodes
● eBPF makes the kernel programmable without having to cross kernel/user-space boundaries
● Access in-kernel data structures directly, with an in-kernel verifier checking programs before they run so they cannot crash or hang the kernel
12. eBPF history
● Initially it was BPF: Berkeley Packet Filter
● It has its roots in BSD in the very early 1990s
● Originally designed as a mechanism for fast filtering of network packets
● 3.15: Linux introduced eBPF: extended Berkeley Packet Filter
● More efficient / more generic than the original BPF
● 3.18: eBPF VM exposed to user-space
● 4.9: eBPF programs can be attached to perf_events
● 4.10: eBPF programs can be attached to cgroups
● 4.15: eBPF LSM hooks
13. eBPF as a VM
● Example assembly of a simple eBPF filter
● Load 16-bit quantity from offset 12 in the packet to the accumulator (Ethernet type)
● Compare the value to see if the packet is an IP packet
● If the packet is IP, return TRUE (packet is accepted)
● Otherwise return 0 (packet is rejected)
● Only 4 VM instructions to filter IP packets!
ldh [12]
jeq #ETHERTYPE_IP, l1, l2
l1: ret #TRUE
l2: ret #0
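To make the four instructions concrete, here is a minimal sketch of a toy interpreter that executes them against a raw Ethernet frame. It is not the kernel's VM: the tuple encoding of the program, the run_filter and ethernet_frame helpers, and the choice of 1 for TRUE are all assumptions made for illustration. The only values taken from the real world are the EtherType field position (offset 12) and the IPv4 EtherType 0x0800.

```python
import struct

ETHERTYPE_IP = 0x0800  # EtherType value for IPv4
TRUE = 1               # assumption: any non-zero return value means "accept"

# The four instructions from the slide, in a hypothetical tuple encoding.
# Jump targets are indexes into this list (l1 = 2, l2 = 3).
PROGRAM = [
    ("ldh", 12),                  # A = 16-bit big-endian value at packet offset 12
    ("jeq", ETHERTYPE_IP, 2, 3),  # if A == ETHERTYPE_IP goto l1 else goto l2
    ("ret", TRUE),                # l1: accept the packet
    ("ret", 0),                   # l2: reject the packet
]


def run_filter(program, packet):
    """Run the toy filter program on packet bytes and return its verdict."""
    acc = 0  # the accumulator register
    pc = 0   # index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        if op == "ldh":
            offset = args[0]
            if offset + 2 > len(packet):
                return 0  # load past the end of the packet: reject
            (acc,) = struct.unpack_from("!H", packet, offset)
            pc += 1
        elif op == "jeq":
            value, if_true, if_false = args
            pc = if_true if acc == value else if_false
        elif op == "ret":
            return args[0]
        else:
            raise ValueError(f"unknown instruction: {op}")
    raise ValueError("program ended without a ret instruction")


def ethernet_frame(ethertype, payload=b""):
    """Build a minimal Ethernet frame with zeroed MAC addresses."""
    dst = bytes(6)
    src = bytes(6)
    return dst + src + struct.pack("!H", ethertype) + payload


ipv4_frame = ethernet_frame(0x0800)  # IPv4
arp_frame = ethernet_frame(0x0806)   # ARP

print(run_filter(PROGRAM, ipv4_frame))  # 1 (accepted)
print(run_filter(PROGRAM, arp_frame))   # 0 (rejected)
```

The accumulator-and-jump style used here mirrors the assembly on the slide; a real filter would be loaded into the kernel rather than interpreted in user space.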
20. Example #3: ping
● Identify where ICMP packets (ECHO_REQUEST / ECHO_REPLY) are received and processed by the kernel
21. Example #4: task wait / wakeup
● Determine the stack trace of a sleeping process and the stack trace of the process that wakes up a sleeping process
22. Conclusion
● Real-time tracing as a method to study the kernel
● Understanding what the kernel is doing can help to improve your application / service in terms of performance, reliability and security
23. References
● Brendan Gregg blog: http://brendangregg.com/blog/
● BCC tools: https://github.com/iovisor/bcc
● gobpf (BPF bindings for Go): https://github.com/iovisor/gobpf
● The BSD Packet Filter: A New Architecture for User-level Packet Capture, S. McCanne and V. Jacobson: http://www.tcpdump.org/papers/bpf-usenix93.pdf