eBPF in the view of a storage developer

eBPF is one of the key technologies nowadays. There are several existing projects in the networking and observability fields, but not many in the storage space. This presentation tells my research story and tries to outline some of the possibilities of the technology.

  1. eBPF from the view of a storage developer
      Richárd Kovács
  2. Boring slide
      © StorageOS, Inc.
      • At work:
        − Kubernetes Integration Engineer @StorageOS
        − Operator, Scheduler, Automation
      • In general:
        − Many years of DevOps, cloud and containerization
        − OSS devotee
        − Known as @mhmxs
  3. StorageOS is cloud-native, software-defined storage for running containerized applications in production, in the cloud, on-premises, and in hybrid/multi-cloud environments.
  4. Agenda
      ● Basics including architecture, performance, and weaknesses
      ● Deep dive
      ● Developer experience
      ● Portability and debugging
      ● Introduce kubectl gadget plugin
  5. Agenda: Basics including architecture, performance, and weaknesses
  6. Basics
      ● What the heck is the extended Berkeley Packet Filter (eBPF)?
        − A Linux kernel feature since 3.18 (the bpf() syscall; kprobe attachment since 4.1) 🙀
        − Its predecessor, classic BPF, started out as a packet filter (as used by tcpdump)
        − It uses kernel events to do various things
        − cat /proc/kallsyms | wc -l
          ● 185449 kernel symbols (and counting)
        − eBPF can interact with userspace
        − Programs are compiled to a special eBPF bytecode
        − A new attack vector
      ● In short:
        − A small program, mostly written in C, compiled to bytecode that can hook in almost anywhere in the kernel
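The symbol counts on these slides come from /proc/kallsyms, where each line has the form `<address> <type> <name> [<module>]`. A minimal Python sketch of the same counting, assuming that line format; `count_symbols` is a hypothetical helper, not part of any eBPF tool:

```python
from typing import Iterable, Optional


def count_symbols(lines: Iterable[str], sym_type: Optional[str] = None, prefix: str = "") -> int:
    """Count /proc/kallsyms entries, optionally filtered by symbol type and name prefix.

    Each line looks like "<address> <type> <name> [<module>]"; the module
    column is only present for symbols that live in loadable modules.
    """
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _address, kind, name = fields[0], fields[1], fields[2]
        if sym_type is not None and kind != sym_type:
            continue  # the type letter is case-sensitive: 't' local text, 'T' global text
        if name.startswith(prefix):
            count += 1
    return count
```

Opening /proc/kallsyms and calling `count_symbols(f)` mirrors `wc -l`, while `count_symbols(f, "t", "vfs")` mirrors `grep "t vfs" | wc -l`. Because grep is case-sensitive, that pipeline counts only lowercase `t` (local, static) text symbols; global functions of type `T` are not included.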
  7. How does it work?
      Source: https://www.brendangregg.com/ebpf.html
  8. Some projects based on eBPF
      ● WeaveScope: tracing TCP connections
      ● seccomp-bpf: limiting syscalls
      ● Calico: eBPF network dataplane
      ● Inspektor Gadget: kubectl plugin to work with eBPF
      ● Cilium: networking, observability and security
  9. Storage related options
      Source: https://www.brendangregg.com/ebpf.html
  10. Storage related options
      ● Tracing at the VFS layer
        − At this level an eBPF program can catch file-related events:
          ● CRUD of files or directories
          ● File system caches
          ● Mount points
        − cat /proc/kallsyms | grep "t vfs" | wc -l
          ● 44
      ● Examples:
        − vfsstat.py: count VFS calls
        − vfsreadlat.c: VFS read latency distribution
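Latency tools such as vfsreadlat follow a common pattern: on function entry the eBPF program stores a timestamp in a hash map keyed by thread, and on return it looks the timestamp up, computes the delta and deletes the entry. The sketch below models that bookkeeping in plain Python over a hypothetical list of `(kind, thread_id, timestamp_ns)` events; in a real tool the dictionary would be a BPF hash map and the events would come from a kprobe/kretprobe pair.

```python
from typing import Dict, Iterable, List, Tuple

# (kind, thread_id, timestamp_ns); kind is "entry" or "return"
Event = Tuple[str, int, int]


def pair_latencies(events: Iterable[Event]) -> List[int]:
    """Return one latency (in ns) per matched entry/return pair, in return order."""
    start: Dict[int, int] = {}  # stands in for the BPF hash map keyed by thread id
    latencies: List[int] = []
    for kind, tid, ts_ns in events:
        if kind == "entry":
            start[tid] = ts_ns  # remember when this thread entered the function
        elif kind == "return":
            t0 = start.pop(tid, None)  # lookup and delete in one step
            if t0 is None:
                continue  # the entry was missed, e.g. tracing started mid-call
            latencies.append(ts_ns - t0)
    return latencies
```

Keying by thread rather than by process matters because several threads of one process can be inside the same function at once.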
  11. Storage related options
      ● Tracing at the file system layer
        − File-system-specific events:
          ● Ext4, NFS, BTRFS, …
          ● CRUD operations
          ● Low-level operations
          ● Performance-related events
        − cat /proc/kallsyms | grep "t ext4" | wc -l
          ● 397
      ● Examples:
        − nfsslower.py: trace slow NFS operations
        − btrfsdist.py: summarize BTRFS operation latency distribution
  12. Storage related options
      ● Tracing at the block device and device driver layers gives insight into:
        − Low-level operations, close to the hardware
        − Physical disk devices
        − Virtual block devices
        − Block device reads and writes
      ● Examples:
        − bitehist.py: block I/O size histogram
        − disksnoop.py: trace block device I/O latency
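bitehist.py reports block I/O sizes as a power-of-two histogram, which keeps the kernel-side state down to a handful of counters. A minimal sketch of that bucketing in Python, assuming bucket k holds values in [2**(k-1), 2**k - 1] (close to, but not guaranteed identical with, BCC's own log2 helpers); the function names are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def log2_bucket(value: int) -> int:
    """Bucket index: 0 for 0, otherwise floor(log2(value)) + 1."""
    if value < 0:
        raise ValueError("sizes must be non-negative")
    return value.bit_length()


def bucket_range(k: int) -> Tuple[int, int]:
    """Inclusive range of values that land in bucket k."""
    if k == 0:
        return (0, 0)
    return (1 << (k - 1), (1 << k) - 1)


def log2_histogram(sizes: Iterable[int]) -> Dict[int, int]:
    """Count how many sizes fall into each power-of-two bucket."""
    return dict(Counter(log2_bucket(size) for size in sizes))
```

For example, `log2_histogram([4096, 4096, 512])` gives `{13: 2, 10: 1}`, and `bucket_range(13)` is `(4096, 8191)`.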
  13. Weaknesses
      ● Supported architectures are limited (arm and amd64 are included)
      ● Not supported everywhere
        − The kernel must be built with CONFIG_BPF_SYSCALL
        − Containers need privileged mode
        − In the cloud it can be tricky; it is not widely supported
      ● Portability is tricky
      ● Map sizes are limited
      ● Hard to debug
      ● The test matrix can become huge in a heterogeneous infrastructure
  14. Performance impact
      ● Small, pre-built bytecode
      ● JIT compiled
        − Depends on CONFIG_BPF_JIT
      ● The kernel patches the instructions of the observed function
        − It runs natively
        − No extra layer
        − Overhead is small and hard to measure exactly
  15. Agenda: Deep dive
  16. Hook points
      ● Kprobe: kernel dynamic tracing
        ■ e.g. the end of a kernel file write
      ● Uprobe: user-level dynamic tracing
        ■ e.g. the return value of bash's readline()
      ● Tracepoint: kernel static tracing
        ■ e.g. tracing the sys_enter syscalls of a program
      ● Perf events
        ■ Timed sampling and Performance Monitoring Counters (PMCs)
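Each hook type corresponds to a probe specifier in BPFTrace. The sketch below collects one illustrative program per hook point on this slide and builds the `bpftrace -e` command line for it. The programs follow BPFTrace's documented probe syntax, but the `HOOK_EXAMPLES` table and the `bpftrace_argv` helper are hypothetical, and actually running the command needs root and a bpftrace installation.

```python
from typing import Dict, List

# One illustrative BPFTrace program per hook type (hypothetical table).
HOOK_EXAMPLES: Dict[str, str] = {
    # kprobe family: fires when vfs_write() returns, i.e. at the end of a file write
    "kprobe": "kretprobe:vfs_write /retval > 0/ { @bytes[comm] = sum(retval); }",
    # uprobe family: the return value of bash's readline(), i.e. each entered command
    "uprobe": r'uretprobe:/bin/bash:readline { printf("%s\n", str(retval)); }',
    # static tracepoints: count syscall entries made by processes named "bash"
    "tracepoint": 'tracepoint:syscalls:sys_enter_* /comm == "bash"/ { @[probe] = count(); }',
    # perf events, timed sampling: kernel stacks sampled at 99 Hz
    "perf_sampling": "profile:hz:99 { @[kstack] = count(); }",
    # perf events, PMC: one sample every 1,000,000 cache misses
    "perf_pmc": "hardware:cache-misses:1000000 { @[comm] = count(); }",
}


def bpftrace_argv(hook: str) -> List[str]:
    """Build an argv list for subprocess.run(); raises KeyError for unknown hooks."""
    return ["bpftrace", "-e", HOOK_EXAMPLES[hook]]
```

Passing `bpftrace_argv("uprobe")` to `subprocess.run(..., check=True)` as root would print every line typed into bash; using an argv list instead of a shell string avoids a second layer of quoting around the program.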
  17. Interacting with userspace
      Source: https://www.brendangregg.com/ebpf.html
  18. Interacting with userspace
      ● Without interacting with a userspace program, eBPF has only limited use cases
      ● eBPF uses shared maps to bridge the gap between the kernel and userspace
      ● Maps are read asynchronously
      ● There are several types of maps for different use cases
  19. Interacting with userspace
      ● BPF_MAP_TYPE_UNSPEC = 0,
      ● BPF_MAP_TYPE_HASH = 1,
      ● BPF_MAP_TYPE_ARRAY = 2,
      ● BPF_MAP_TYPE_PROG_ARRAY = 3,
      ● BPF_MAP_TYPE_PERF_EVENT_ARRAY = 4,
      ● BPF_MAP_TYPE_PERCPU_HASH = 5,
      ● BPF_MAP_TYPE_PERCPU_ARRAY = 6,
      ● BPF_MAP_TYPE_STACK_TRACE = 7,
      ● BPF_MAP_TYPE_CGROUP_ARRAY = 8,
      ● BPF_MAP_TYPE_LRU_HASH = 9,
      ● BPF_MAP_TYPE_LRU_PERCPU_HASH = 10,
      ● BPF_MAP_TYPE_LPM_TRIE = 11,
      ● BPF_MAP_TYPE_ARRAY_OF_MAPS = 12,
      ● BPF_MAP_TYPE_HASH_OF_MAPS = 13,
      ● BPF_MAP_TYPE_DEVMAP = 14,
      ● BPF_MAP_TYPE_SOCKMAP = 15,
      ● BPF_MAP_TYPE_CPUMAP = 16,
      ● BPF_MAP_TYPE_XSKMAP = 17,
      ● BPF_MAP_TYPE_SOCKHASH = 18,
      ● BPF_MAP_TYPE_CGROUP_STORAGE = 19,
      ● BPF_MAP_TYPE_REUSEPORT_SOCKARRAY = 20,
      ● BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = 21,
      ● BPF_MAP_TYPE_QUEUE = 22,
      ● BPF_MAP_TYPE_STACK = 23,
      ● BPF_MAP_TYPE_SK_STORAGE = 24,
      ● BPF_MAP_TYPE_DEVMAP_HASH = 25,
      ● BPF_MAP_TYPE_STRUCT_OPS = 26,
      ● BPF_MAP_TYPE_RINGBUF = 27,
      ● BPF_MAP_TYPE_INODE_STORAGE = 28,
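Because maps are read asynchronously, a common design is for the eBPF side to only update counters and for userspace to poll and aggregate them. With a per-CPU map such as BPF_MAP_TYPE_PERCPU_HASH, each CPU updates its own copy of a value without locking, and a userspace lookup returns one value per CPU, which the poller sums. The Python sketch below only models those semantics; the `PerCpuHash` class and `snapshot_totals` are hypothetical and do not talk to the kernel.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class PerCpuHash:
    """In-memory model of a per-CPU hash map: every key holds one counter per CPU."""

    def __init__(self, num_cpus: int) -> None:
        self.num_cpus = num_cpus
        self._values: DefaultDict[str, List[int]] = defaultdict(lambda: [0] * num_cpus)

    def increment(self, key: str, cpu: int, delta: int = 1) -> None:
        """Kernel side: code running on `cpu` touches only its own slot, so no lock is needed."""
        self._values[key][cpu] += delta

    def lookup(self, key: str) -> List[int]:
        """Userspace side: one value per CPU for a key (all zeros if it was never written)."""
        return list(self._values.get(key, [0] * self.num_cpus))

    def keys(self) -> List[str]:
        return list(self._values)


def snapshot_totals(m: PerCpuHash) -> Dict[str, int]:
    """Userspace poller: sum the per-CPU values of every key."""
    return {key: sum(m.lookup(key)) for key in m.keys()}
```

The trade-off is memory (one slot per CPU per key) in exchange for lock-free updates on the hot path.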
  20. Agenda: Developer experience
  21. Frontends
      ● BCC
        − BCC is a toolkit for creating efficient kernel tracing and manipulation programs
        − Contains lots of examples
        − Kernel instrumentation is written in C
        − Python and Lua frontends
      ● Dynamically generated C source inside Python source looks really ugly
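The "ugly" part usually looks like this: the C program lives in a Python string, and the tool patches it with plain text substitution before handing it to BCC. A minimal sketch of that pattern, similar to how several BCC tools splice in filters; the program text, the `FILTER` placeholder and the `build_program` helper are hypothetical, and BCC itself is not called here:

```python
from typing import Optional

# C source embedded in Python; FILTER is a plain-text placeholder.
BPF_TEXT = """
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u32, u64);

int trace_vfs_read(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    FILTER
    counts.increment(pid);
    return 0;
}
"""


def build_program(pid: Optional[int] = None) -> str:
    """Return the C source with FILTER replaced by a PID check, or removed."""
    if pid is None:
        return BPF_TEXT.replace("FILTER", "")
    # int() guards against splicing arbitrary text into the C source
    return BPF_TEXT.replace("FILTER", f"if (pid != {int(pid)}) {{ return 0; }}")
```

With BCC installed, the result would be passed to `BPF(text=build_program(1234))` and attached with `attach_kprobe(event="vfs_read", fn_name="trace_vfs_read")`. Any typo inside the string only surfaces when BCC compiles it at run time, which is a large part of why this style is hard to work with.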
  22. Frontends
      ● BPFTrace
        − High-level, fixed-scope tracing language
        − Solves portability: scripts are compiled on the target machine at run time
        − The language is inspired by awk and C, and by predecessor tracers such as DTrace
        − Many of the BCC examples have been rewritten in BPFTrace
        − Supports one-liners
          ● bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
        − A kubectl plugin exists: kubectl-trace
        − Easy to learn:
          ● Trace all EXT4 reads in a given mount point: https://github.com/mhmxs/bpftrace/pull/1/files
  23. Frontends
  24. Frontends
      ● Gobpf
        − Provides Go bindings for the BCC framework
        − Low-level utilities to load and use eBPF programs
        − The same as BCC:
          ● Kernel instrumentation is written in C
          ● Go takes the place of Python
  25. Frontends
      ● Cilium/ebpf
        − Pure Go library that provides utilities for loading, compiling, and debugging eBPF programs
        − Contains lots of examples
        − Useful helper functions
        − Kernel instrumentation can be written in eBPF assembly
          ● Generated with Go code
        − Kernel instrumentation can be written in C
          ● Go bindings are generated for it
  26. Agenda: Portability and debugging
  27. Portability
      ● By default an eBPF program has to match the running kernel
        − Function signatures can change
        − Data structures can change
      ● Options to increase portability:
        − Use BPFTrace if possible, because it just works
        − Deal with matching kernel versions
  28. Portability
      ● Helpers to deal with it
      ● Use Cilium/ebpf because of its handy helpers
      ● bpftool can dump the kernel's type information (BTF) as a C header
        − bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
      ● High-level BPF CO-RE (Compile Once, Run Everywhere) mechanics
        − CO-RE provides a set of macros that generate relocatable memory accessors on the fly:
          ● Read memory
          ● Check whether a field exists
          ● And so on
  29. Portability
      ● Kernel memory is not directly readable
        − The bpf_core_read() function reads the memory
      ● Kernel struct layouts are not stable: field order and offsets can differ between kernel builds
      ● High-level BPF CO-RE mechanics
        − BPF_CORE_READ(file, f_path.dentry, d_name.name); // pointer to the file's name
        − With plain bpf_core_read(), each step of the chain (f_path.dentry, d_name.name) has to be read into a separate variable
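The idea behind CO-RE can be modelled without a kernel: the compiled program records which field it wants (for example `d_name` of a dentry) instead of a hard-coded byte offset, and the loader resolves that name against the running kernel's type information at load time. The Python sketch below uses two made-up struct layouts to show why a baked-in offset breaks while a name-based relocation keeps working; the layouts, offsets and function names are all hypothetical, not real kernel data.

```python
from typing import Dict, Optional

# Hypothetical field offsets (in bytes) of one struct on two different kernel builds.
Layout = Dict[str, int]
BUILD_KERNEL: Layout = {"d_flags": 0, "d_parent": 24, "d_name": 32}
TARGET_KERNEL: Layout = {"d_flags": 0, "d_seq": 4, "d_parent": 24, "d_name": 40}


def fixed_offset(field: str) -> int:
    """Without CO-RE: the offset is baked in at compile time from the build kernel."""
    return BUILD_KERNEL[field]


def relocate(field: str, target: Layout) -> Optional[int]:
    """With CO-RE: the loader looks the field up by name in the target kernel's layout.

    Returns None when the field does not exist on the target, which is the
    situation a "field exists" check lets the program handle.
    """
    return target.get(field)


def field_exists(field: str, target: Layout) -> bool:
    return relocate(field, target) is not None
```

In this model the build kernel says `d_name` is at offset 32, but the target kernel has it at 40: a program with the fixed offset would silently read the wrong bytes, which matches the "no error, it just does nothing" symptom, whereas the relocated access lands on the right field.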
  30. Debugging
      ● Hard to debug
      ● Often there is no error; the program just does nothing
      ● BPF calls are also traceable
        − Needs a recompiled kernel
        − Needs the JIT compiler disabled
      ● rbpf is an eBPF virtual machine in Rust
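To make the idea of running eBPF bytecode in a userspace VM concrete, here is a toy interpreter for a handful of 64-bit ALU instructions. The opcodes and the 8-byte little-endian instruction layout (opcode, dst/src register nibbles, 16-bit offset, 32-bit immediate) follow the eBPF instruction encoding, but this is a hypothetical Python sketch, not rbpf's API: it does not model memory access, jumps, helper calls or the verifier.

```python
import struct
from typing import List

# A few real eBPF opcodes (instruction class | operation | source); a tiny subset only.
MOV64_IMM = 0xB7  # dst = imm
MOV64_REG = 0xBF  # dst = src
ADD64_IMM = 0x07  # dst += imm
ADD64_REG = 0x0F  # dst += src
MUL64_IMM = 0x27  # dst *= imm
EXIT = 0x95       # return r0

MASK64 = (1 << 64) - 1


def encode(opcode: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Pack one 8-byte instruction: dst in the low nibble, src in the high nibble."""
    return struct.pack("<BBhi", opcode, (src << 4) | dst, off, imm)


def run(program: bytes) -> int:
    """Interpret the subset above and return r0 when EXIT is reached."""
    regs: List[int] = [0] * 11  # r0..r10, kept as unsigned 64-bit values
    pc = 0
    while pc * 8 < len(program):
        opcode, reg_byte, _off, imm = struct.unpack_from("<BBhi", program, pc * 8)
        dst, src = reg_byte & 0x0F, reg_byte >> 4
        pc += 1
        if opcode == MOV64_IMM:
            regs[dst] = imm & MASK64  # imm is sign-extended to 64 bits
        elif opcode == MOV64_REG:
            regs[dst] = regs[src]
        elif opcode == ADD64_IMM:
            regs[dst] = (regs[dst] + imm) & MASK64
        elif opcode == ADD64_REG:
            regs[dst] = (regs[dst] + regs[src]) & MASK64
        elif opcode == MUL64_IMM:
            regs[dst] = (regs[dst] * imm) & MASK64
        elif opcode == EXIT:
            return regs[0]
        else:
            raise ValueError(f"unsupported opcode {opcode:#x} at instruction {pc - 1}")
    raise ValueError("program ended without EXIT")
```

Because every register is an ordinary value in userspace, a VM like this can be stepped, logged or asserted on freely, which is exactly what is hard to do with a program running in the kernel.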
  31. Agenda: Introduce kubectl gadget plugin
  32. SUMM()
      ● I LOVE eBPF
      ● Lots of opportunities, from an AI-driven storage miner detector to real-time file monitoring
      ● With a bit of kernel knowledge it is easy to react to almost any kind of event
      ● Several frontends, helpers and other libraries
      ● A bunch of existing projects with real-world experience
      ● Kubernetes integration depends on the distribution/platform
      ● C is mandatory at the end of the day
      ● Really hard to debug
  33. Thank You
      www.storageos.com
  34. Extra reading
      ● eBPF for SRE with Reliably: https://dev.to/reliably/ebpf-for-sre-with-reliably-18dc
      ● Tracing Go function arguments in prod: https://blog.px.dev/ebpf-function-tracing/post/
      ● Tracing SSL/TLS connections: https://blog.px.dev/ebpf-openssl-tracing
