SlideShare uma empresa Scribd logo
1 de 36
Baixar para ler offline
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
XDP: a new fast and
programmable network layer
XDP - eXpress Data Path
Jesper Dangaard Brouer
Principal Engineer, Red Hat Inc.
Kernel Recipes
Paris, France, Sep 2018
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Learn that:
● Linux got a new fast and programmable network layer
○ And touch upon the basic building blocks
● How this XDP technology got motivated upstream
● Why eBPF is such a perfect match for XDP
● Demystify: Why this XDP technology is so fast!
● How XDP achieves "hidden" RX bulking
● Why XDP-redirect is a novel approach
What will you learn?
What will you actually get out of listening to this talk?
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
XDP is about providing eBPF programmable superpowers to end-users
● But in this talk you will not learn how to code eBPF
Motivated and dedicated to: Making eBPF programming more easily consumable
● But NOT in this talk…
● … go watch many of my other talks! :-)
○ Like the tutorial: XDP for the Rest of Us
Anti slide: What you will NOT learn
We cannot cover everything today...
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 20184
XDP basically: New layer in the kernel network stack
● Before allocating the SKB
○ Driver level hook at DMA level
● Means: Competing at the same “layer” as DPDK / netmap
● Super fast, due to
○ Take action/decision earlier (e.g. skip some network layers)
○ No-memory allocations
● Not kernel bypass, data-plane is kept inside kernel
○ via BPF: makes early network stack run-time programmable
○ Cooperates with kernel
Single slide intro: What is XDP?
Compressed slide about XDP … cover the details later
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
What! - New layer in kernel networking?!
How did you motivate this upstream!?!
5
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Kernel bypass solutions: Like DPDK, netmap, PF_RING
● Show they are 10x faster than Linux kernel netstack
● This is true: FOR THEIR SPECIFIC USE-CASES
This lead to WRONG conclusions and quotes:
● Wrong: "Kernel is inherently too slow to handle networking"
Got motivated by showing this is WRONG
● Acknowledge competition, XDP was late to the game (2016 initial idea)
○ Netmap 2011 (SIGCOMM), DPDK 2013 (goes open-source?), PF_RING 2010 (earliest ref found)
Motivation through competition
Competition was helpful to force us to find an upstream kernel alternative
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
It is all about the use-case
● Kernel bypass benchmarks show L2/L3 use-cases
● Linux netstack assumes socket delivery use-case
Linux network stack was build with socket-delivery use-case in focus
● Naming of container for packet data structure says is all:
○ SKB → sk_buff → (originates from) socket buffer
○ Thus, netstack takes this cost upfront (at driver level alloc SKB)
Unfair comparison
Comparing apples and bananas
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
XDP basically change the picture: Operate at same level
● Introduce layer before allocating the SKB
● Hook at driver level (running eBPF)
XDP goals
● Close the performance gap to kernel-bypass solutions
○ Not a goal to be faster
● Provide in-kernel alternative, that is more flexible
○ Don’t steal the entire NIC
● Work in concert with existing network stack
XDP design goals
Operate at same L2/L3 level, allows for apple to apple comparison
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
Show me some benchmarks!
9
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Comparing dropping packet:
DPDK vs. XDP vs Linux
● XDP follow DPDK line,
offset is due to driver indirect calls,
calculated to 16 nanosec offset
● Linux scales perfectly,
conntrack overhead is significant
● HW/driver does PCIe compression to
avoid PCIe limitations
Benchmark: Drop packet
Driver: mlx5,
HW: 100Gbit/s ConnectX-5 Ex.
CPU: E5-1650v4
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Comparing L2 forward packets
● XDP-same-NIC uses XDP_TX, which
bounce packet inside NIC driver
● XDP-different-NIC use
XDP_REDIRECT, which is more
advanced (further optimization possible)
● Shows XDP can be faster than
DPDK, in certain cases/modes
Benchmark: L2 forwarding
Driver: mlx5,
HW: 100Gbit/s ConnectX-5 Ex.
CPU: E5-1650v4
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
Sounds cool!
Tell me more about: Basic design and building blocks
12
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
BPF program return an action or verdict
● XDP_DROP, XDP_PASS, XDP_TX, XDP_ABORTED, XDP_REDIRECT
How to cooperate with network stack
● Pop/push or modify headers: Change RX-handler kernel use
○ e.g. handle protocol unknown to running kernel
● Can propagate 32Bytes meta-data from XDP stage to network stack
○ TC (clsbpf) hook can use meta-data, e.g. set SKB mark
XDP actions and cooperation
What are the basic building blocks I can use?
13
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Data-plane: inside kernel, split into:
● Kernel-core: Fabric in charge of moving packets quickly
● In-kernel BPF program:
○ Policy logic decide action
○ Read/write access to packet
Control-plane: Userspace
● Userspace load BPF program
● Can control program via changing BPF maps
● Everything goes through bpf system call
Design: XDP: data-plane and control-plane
Overall design
14
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
BPF is sandboxed code running inside kernel (XDP only loaded by root)
● A given kernel BPF hook just define:
○ possible actions and limit helpers (that can lookup or change kernel state)
Users get programmable policies (within these limits)
● Userspace "control-plane" API tied to userspace app (not kernel API)
○ likely via modifying a BPF-map
● No longer need a kernel ABI
○ like sysctl/procfs/ioctls etc.
Why kernel developers should love BPF
How BPF avoids creating a new kernel ABI for every new user-invented policy decision
15
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
Why XDP_REDIRECT is so interesting?!
16
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
XDP got action code XDP_REDIRECT (that drivers must implement)
● In basic form: Redirecting RAW frames out another net_device/ifindex
● Egress driver: implement ndo_xdp_xmit (and ndo_xdp_flush)
Performance low without using a map for redirect (single CPU core numbers, ixgbe):
● Using helper: bpf_redirect = 7.5 Mpps
● Using helper: bpf_redirect_map = 13.0 Mpps
What is going on?
● Using redirect maps is a HUGE performance boost, why!?
XDP action XDP_REDIRECT
First lets cover the basics, net_device redirect (... later, more flexibility in maps)
Avail since kernel v4.14, but drivers slower to adapt
17
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201818
Basic design: Simplify changes needed in drivers
● Name “redirect” is more generic, than “forwarding”
First trick: Hide RX bulking from driver code via MAP
● Driver still processes packets one at a time - calling xdp_do_redirect
● End of driver NAPI poll routine “flush” (max 64 packets) - call xdp_do_flush_map
● Thus, bulking via e.g. delaying expensive NIC tailptr/doorbell
Second trick: invent new types of redirects easy
● Without changing any driver code!
● Hopefully last XDP action code(?)
Novel: redirect using BPF maps
Why is it so brilliant to use BPF maps for redirecting?
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Gist: Driver XDP RX-processing (NAPI) loop
Demonstrate NAPI-poll call BPF prog per packet and end with MAP-flush
Basic code needed in Driver for supporting XDP:
1 while (desc_in_rx_ring && budget_left--) {
2 action = bpf_prog_run_xdp(xdp_prog, xdp_buff);
3 /* helper bpf_redirect_map have set map (and index)via this_cpu_ptr */
4 switch (action) {
5 case XDP_PASS: break;
6 case XDP_TX: res = driver_local_xmit_xdp_ring(adapter, xdp_buff); break;
7 case XDP_REDIRECT: res = xdp_do_redirect(netdev, xdp_buff, xdp_prog); break;
8 default: bpf_warn_invalid_xdp_action(action); /* fallthrough */
9 case XDP_ABORTED: trace_xdp_exception(netdev, xdp_prog, action); /* fallthrough */
10 case XDP_DROP: res = DRV_XDP_CONSUMED; break;
11 } /* left out acting on res */
12 } /* End of NAPI-poll */
13 xdp_do_flush_map(); /* Bulk chosen by map, can store xdf_frame for flushing */
14 driver_local_XDP_TX_flush();
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201820
The “devmap”: BPF_MAP_TYPE_DEVMAP
● Contains net_devices, userspace adds them via ifindex to map-index
The “cpumap”: BPF_MAP_TYPE_CPUMAP
● Allow redirecting RAW xdp frames to remote CPU
○ SKB is created on remote CPU, and normal network stack invoked
● The map-index is the CPU number (the value is queue size)
AF_XDP - “xskmap”: BPF_MAP_TYPE_XSKMAP
● Allow redirecting RAW xdp frames into userspace
○ via new Address Family socket type: AF_XDP
○ Very close to ‘netmap’ design, keeping drivers in kernel-space
○ Ask Björn Töpel who implemented it… in audience
Redirect map types
What kind of redirects are people inventing?!
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
Spectre V2 killed XDP performance
21
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201822
Hey, you killed my XDP performance! (Retpoline tricks for indirect calls)
● Still processing 6 Mpps per CPU core
● But could do approx 13 Mpps before!
Initial through it was net_device->ndo_xdp_xmit call
● Implemented redirect bulking, but only helped a little
Real pitfall: DMA API use indirect function call pointers
● Christoph Hellwig PoC patch show perf return to approx 10 Mpps
Thus, solutions in the pipeline...
Performance issue: Spectre (variant 2)
CONFIG_RETPOLINE and newer GCC compiler
- for stopping Spectre (variant 2) CPU side-channel attacks
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
Questions asked by NDIV (HAproxy) in:
“Challenges migrating from NDIV to XDP”
NetDev Conf 0x12 (slides and video)
23
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
NDIV callback rx_done() per packet batch, at end of RX loop
● XDP do have this, but part of BPF map abstraction
● For XDP, correspond to: xdp_do_flush_map
In theory, NDIV could be implemented with BPF as
● Move match/filter part into BPF prog, that return action XDP_REDIRECT
● New redirect BPF map type for NDIV
○ (Ugly) could do mem alloc for NDIV “out-packet(s)”
○ (Ugly) could invoke ndiv->handle_rx(xdp_frame)
● This map type, can receive event RX-done via xdp_do_flush_map
NDIV need callback rx_done()
NDIV have callback rx_done() per packet batch, at end of RX loop
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
There is no TX hook in XDP, why?
● For packets arriving from netstack
○ makes no-sense performance-wise (too late, SKB already alloc)
● BPF hook for TX is available
○ Use TC (clsbpf) egress BPF-prog hook (takes SKB like your handler)
(Disclaimer only idea:) Maybe do XDP bpf_prog invoke if ndo_xdp_xmit() fails
● Use-case: react to XDP-redirect TX overflow
○ E.g. allow new XDP action, like another redirect target (load-balance)
XDP hook at TX?
Could not find the XDP hook at TX
NDIV have ndiv->handle_tx(ndiv, skb)
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
End slide
… Questions?
plus.google.com/+JesperDangaardBrouer
linkedin.com/in/brouer
youtube.com/channel/UCSypIUCgtI42z63soRMONng
facebook.com/brouer
twitter.com/JesperBrouer
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201827
XDP + BPF combined effort of many people
Thanks to all contributors
● Alexei Starovoitov
● Daniel Borkmann
● Brenden Blanco
● Tom Herbert
● John Fastabend
● Martin KaFai Lau
● Jakub Kicinski
● Jason Wang
● Andy Gospodarek
● Thomas Graf
● Michael Chan (bnxt_en)
● Saeed Mahameed (mlx5)
● Tariq Toukan (mlx4)
● Björn Töpel (i40e + AF_XDP)
● Magnus Karlsson (AF_XDP)
● Yuval Mintz (qede)
● Sunil Goutham (thunderx)
● Jason Wang (VM)
● Michael S. Tsirkin (ptr_ring)
● Edward Cree
● Toke Høiland-Jørgensen
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
Extra slides if time permits...
28
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
What is this CPUMAP redirect?
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201830
Basic cpumap properties
● Enables redirection of XDP frames to remote CPUs
● Moved SKB allocation outside driver (could help simplify drivers)
Scalability and isolation mechanism
● Allows isolating/decouple driver XDP layer from network stack
○ Don't delay XDP by deep call into network stack
● Enables DDoS protection on end-hosts (that run services)
○ XDP fast-enough to avoid packet drops happen in HW NICs
Another use-case: Fix NIC-HW RSS/RX-hash broken/uneven CPU distribution
● Proto unknown to HW: e.g. VXLAN and double-tagged VLANs
● Suricata use this feature
XDP_REDIRECT + cpumap
What is cpumap redirect?
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201831
Cpumap architecture: Every slot in array-map: dest-CPU
● MPSC (Multi Producer Single Consumer) model: per dest-CPU
○ Multiple RX-queue CPUs can enqueue to single dest-CPU
● Fast per CPU enqueue store (for now) 8 packets
○ Amortized enqueue cost to shared ptr_ring queue via bulk-enq
● Lockless dequeue, via pinning kthread CPU and disallow ptr_ring resize
Important properties from main shared queue ptr_ring (cyclic array based)
● Enqueue+dequeue don't share cache-line for synchronization
○ Synchronization happens based on elements
○ In queue almost full case, avoid cache-line bouncing
○ In queue almost empty case, reduce cache-line bouncing via bulk-enq
Cpumap redirect: CPU scaling
Tricky part getting cross CPU delivery fast-enough
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201832
XDP_REDIRECT
enqueue into
cpumap
Kthread dequeue
Start normal
netstack
App can run on
another CPU via
socket queue
CPU scheduling via cpumap
Queuing and scheduling in cpumap
CPU#2
kthread
CPU#1
NAPI RX
CPU#3
Userspace
sched
Hint: Same CPU sched possible
● But adjust /proc/sys/kernel/sched_wakeup_granularity_ns
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018
Next slides
Recent changes to XDP core
33
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201834
Long standing request: separate BPF programs per RX queue
● This is not likely to happen… because
Solution instead: provide info per RX queue (xdp_rxq_info)
● Info: ingress net_device (Exposed as: ctx->ingress_ifindex)
● Info: ingress RX-queue number (Exposed as: ctx->rx_queue_index)
Thus, NIC level XDP/bpf program can instead filter on rx_queue_index
Recent change: Information per RX-queue
Recent change in: kernel v4.16
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201835
XDP_REDIRECT needs to queue XDP frames e.g. for bulking
● Queuing open-coded for both cpumap and tun-driver
● Generalize/standardize into struct xdp_frame
● Store info in top of XDP frame headroom (reserved)
○ Avoids allocating memory
Recent change: queuing via xdp_frame
Very recent changes: only accepted in net-next (to appear in v4.18)
XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201836
API for how redirected frames are freed or "returned"
● XDP frames are returned to originating RX driver
● Furthermore: this happens per RX-queue level (extended xdp_rxq_info)
This allows driver to implement different memory models per RX-queue
● E.g. needed for AF_XDP zero-copy mode
Recent change: Memory return API
Very recent changes: only accepted in net-next (to appear in v4.18)

Mais conteúdo relacionado

Mais procurados

netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
Kernel TLV
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
hugo lu
 
Dalvik Vm & Jit
Dalvik Vm & JitDalvik Vm & Jit
Dalvik Vm & Jit
Ankit Somani
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
Kernel TLV
 

Mais procurados (20)

Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
 
Linux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use CasesLinux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use Cases
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
 
BPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLabBPF / XDP 8월 세미나 KossLab
BPF / XDP 8월 세미나 KossLab
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 
DPDK KNI interface
DPDK KNI interfaceDPDK KNI interface
DPDK KNI interface
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDP
 
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモDNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 
ディープニューラルネットワーク向け拡張可能な高位合成コンパイラの開発
ディープニューラルネットワーク向け拡張可能な高位合成コンパイラの開発ディープニューラルネットワーク向け拡張可能な高位合成コンパイラの開発
ディープニューラルネットワーク向け拡張可能な高位合成コンパイラの開発
 
Debug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpointsDebug dpdk process bottleneck & painpoints
Debug dpdk process bottleneck & painpoints
 
Kernel Recipes 2019 - XDP closer integration with network stack
Kernel Recipes 2019 -  XDP closer integration with network stackKernel Recipes 2019 -  XDP closer integration with network stack
Kernel Recipes 2019 - XDP closer integration with network stack
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
 
Dalvik Vm & Jit
Dalvik Vm & JitDalvik Vm & Jit
Dalvik Vm & Jit
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPF
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 

Semelhante a Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper Dangaard Brouer

Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Igalia
 

Semelhante a Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper Dangaard Brouer (20)

Transparent eBPF Offload: Playing Nice with the Linux Kernel
Transparent eBPF Offload: Playing Nice with the Linux KernelTransparent eBPF Offload: Playing Nice with the Linux Kernel
Transparent eBPF Offload: Playing Nice with the Linux Kernel
 
Snabb, a toolkit for building user-space network functions (ES.NOG 20)
Snabb, a toolkit for building user-space network functions (ES.NOG 20)Snabb, a toolkit for building user-space network functions (ES.NOG 20)
Snabb, a toolkit for building user-space network functions (ES.NOG 20)
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
NetFlow Data processing using Hadoop and Vertica
NetFlow Data processing using Hadoop and VerticaNetFlow Data processing using Hadoop and Vertica
NetFlow Data processing using Hadoop and Vertica
 
OpenHPI - Parallel Programming Concepts - Week 4
OpenHPI - Parallel Programming Concepts - Week 4OpenHPI - Parallel Programming Concepts - Week 4
OpenHPI - Parallel Programming Concepts - Week 4
 
01 high bandwidth acquisitioncomputing compressionall in a box
01 high bandwidth acquisitioncomputing compressionall in a box01 high bandwidth acquisitioncomputing compressionall in a box
01 high bandwidth acquisitioncomputing compressionall in a box
 
BUD17-300: Journey of a packet
BUD17-300: Journey of a packetBUD17-300: Journey of a packet
BUD17-300: Journey of a packet
 
Common Design of Deep Learning Frameworks
Common Design of Deep Learning FrameworksCommon Design of Deep Learning Frameworks
Common Design of Deep Learning Frameworks
 
RAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature EngineeringRAPIDS: GPU-Accelerated ETL and Feature Engineering
RAPIDS: GPU-Accelerated ETL and Feature Engineering
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509Deep Learning on ARM Platforms - SFO17-509
Deep Learning on ARM Platforms - SFO17-509
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
 
Rlite software-architecture (1)
Rlite software-architecture (1)Rlite software-architecture (1)
Rlite software-architecture (1)
 
What's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyWhat's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon Valley
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
 

Mais de Anne Nicolas

Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Anne Nicolas
 

Mais de Anne Nicolas (20)

Kernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream firstKernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream first
 
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIKernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
 
Kernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are moneyKernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are money
 
Kernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureKernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and future
 
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
 
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
 
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and BareboxEmbedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
 
Embedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less specialEmbedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less special
 
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre SiliconEmbedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
 
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) pictureEmbedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
 
Embedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops wayEmbedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops way
 
Embedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmakerEmbedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmaker
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integration
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
 
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimediaEmbedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
 
Kernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDPKernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDP
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
 

Último

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Último (20)

%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 

Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper Dangaard Brouer

  • 1. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 XDP: a new fast and programmable network layer XDP - eXpress Data Path Jesper Dangaard Brouer Principal Engineer, Red Hat Inc. Kernel Recipes Paris, France, Sep 2018
  • 2. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Learn that: ● Linux got a new fast and programmable network layer ○ And touch upon the basic building blocks ● How this XDP technology got motivated upstream ● Why eBPF is such a perfect match for XDP ● Demystify: Why this XDP technology is so fast! ● How XDP achieves "hidden" RX bulking ● Why XDP-redirect is a novel approach What will you learn? What will you actually get out of listening to this talk?
  • 3. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 XDP is about providing eBPF programmable superpowers to end-users ● But in this talk you will not learn how to code eBPF Motivated and dedicated to: Making eBPF programming more easily consumable ● But NOT in this talk… ● … go watch many of my other talks! :-) ○ Like the tutorial: XDP for the Rest of Us Anti slide: What you will NOT learn We cannot cover everything today...
  • 4. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 20184 XDP basically: New layer in the kernel network stack ● Before allocating the SKB ○ Driver level hook at DMA level ● Means: Competing at the same “layer” as DPDK / netmap ● Super fast, due to ○ Take action/decision earlier (e.g. skip some network layers) ○ No-memory allocations ● Not kernel bypass, data-plane is kept inside kernel ○ via BPF: makes early network stack run-time programmable ○ Cooperates with kernel Single slide intro: What is XDP? Compressed slide about XDP … cover the details later
  • 5. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides What! - New layer in kernel networking?! How did you motivate this upstream!?! 5
  • 6. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Kernel bypass solutions: Like DPDK, netmap, PF_RING ● Show they are 10x faster than Linux kernel netstack ● This is true: FOR THEIR SPECIFIC USE-CASES This lead to WRONG conclusions and quotes: ● Wrong: "Kernel is inherently too slow to handle networking" Got motivated by showing this is WRONG ● Acknowledge competition, XDP was late to the game (2016 initial idea) ○ Netmap 2011 (SIGCOMM), DPDK 2013 (goes open-source?), PF_RING 2010 (earliest ref found) Motivation through competition Competition was helpful to force us to find an upstream kernel alternative
  • 7. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 It is all about the use-case ● Kernel bypass benchmarks show L2/L3 use-cases ● Linux netstack assumes socket delivery use-case Linux network stack was build with socket-delivery use-case in focus ● Naming of container for packet data structure says is all: ○ SKB → sk_buff → (originates from) socket buffer ○ Thus, netstack takes this cost upfront (at driver level alloc SKB) Unfair comparison Comparing apples and bananas
  • 8. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 XDP basically change the picture: Operate at same level ● Introduce layer before allocating the SKB ● Hook at driver level (running eBPF) XDP goals ● Close the performance gap to kernel-bypass solutions ○ Not a goal to be faster ● Provide in-kernel alternative, that is more flexible ○ Don’t steal the entire NIC ● Work in concert with existing network stack XDP design goals Operate at same L2/L3 level, allows for apple to apple comparison
  • 9. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides Show me some benchmarks! 9
  • 10. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Comparing dropping packet: DPDK vs. XDP vs Linux ● XDP follow DPDK line, offset is due to driver indirect calls, calculated to 16 nanosec offset ● Linux scales perfectly, conntrack overhead is significant ● HW/driver does PCIe compression to avoid PCIe limitations Benchmark: Drop packet Driver: mlx5, HW: 100Gbit/s ConnectX-5 Ex. CPU: E5-1650v4
  • 11. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Comparing L2 forward packets ● XDP-same-NIC uses XDP_TX, which bounce packet inside NIC driver ● XDP-different-NIC use XDP_REDIRECT, which is more advanced (further optimization possible) ● Shows XDP can be faster than DPDK, in certain cases/modes Benchmark: L2 forwarding Driver: mlx5, HW: 100Gbit/s ConnectX-5 Ex. CPU: E5-1650v4
  • 12. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides Sounds cool! Tell me more about: Basic design and building blocks 12
  • 13. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 BPF program return an action or verdict ● XDP_DROP, XDP_PASS, XDP_TX, XDP_ABORTED, XDP_REDIRECT How to cooperate with network stack ● Pop/push or modify headers: Change RX-handler kernel use ○ e.g. handle protocol unknown to running kernel ● Can propagate 32Bytes meta-data from XDP stage to network stack ○ TC (clsbpf) hook can use meta-data, e.g. set SKB mark XDP actions and cooperation What are the basic building blocks I can use? 13
  • 14. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Data-plane: inside kernel, split into: ● Kernel-core: Fabric in charge of moving packets quickly ● In-kernel BPF program: ○ Policy logic decide action ○ Read/write access to packet Control-plane: Userspace ● Userspace load BPF program ● Can control program via changing BPF maps ● Everything goes through bpf system call Design: XDP: data-plane and control-plane Overall design 14
  • 15. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 BPF is sandboxed code running inside kernel (XDP only loaded by root) ● A given kernel BPF hook just define: ○ possible actions and limit helpers (that can lookup or change kernel state) Users get programmable policies (within these limits) ● Userspace "control-plane" API tied to userspace app (not kernel API) ○ likely via modifying a BPF-map ● No longer need a kernel ABI ○ like sysctl/procfs/ioctls etc. Why kernel developers should love BPF How BPF avoids creating a new kernel ABI for every new user-invented policy decision 15
  • 16. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides Why XDP_REDIRECT is so interesting?! 16
  • 17. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 XDP got action code XDP_REDIRECT (that drivers must implement) ● In basic form: Redirecting RAW frames out another net_device/ifindex ● Egress driver: implement ndo_xdp_xmit (and ndo_xdp_flush) Performance low without using a map for redirect (single CPU core numbers, ixgbe): ● Using helper: bpf_redirect = 7.5 Mpps ● Using helper: bpf_redirect_map = 13.0 Mpps What is going on? ● Using redirect maps is a HUGE performance boost, why!? XDP action XDP_REDIRECT First lets cover the basics, net_device redirect (... later, more flexibility in maps) Avail since kernel v4.14, but drivers slower to adapt 17
  • 18. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201818 Basic design: Simplify changes needed in drivers ● Name “redirect” is more generic, than “forwarding” First trick: Hide RX bulking from driver code via MAP ● Driver still processes packets one at a time - calling xdp_do_redirect ● End of driver NAPI poll routine “flush” (max 64 packets) - call xdp_do_flush_map ● Thus, bulking via e.g. delaying expensive NIC tailptr/doorbell Second trick: invent new types of redirects easy ● Without changing any driver code! ● Hopefully last XDP action code(?) Novel: redirect using BPF maps Why is it so brilliant to use BPF maps for redirecting?
  • 19. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Gist: Driver XDP RX-processing (NAPI) loop Demonstrate NAPI-poll call BPF prog per packet and end with MAP-flush Basic code needed in Driver for supporting XDP: 1 while (desc_in_rx_ring && budget_left--) { 2 action = bpf_prog_run_xdp(xdp_prog, xdp_buff); 3 /* helper bpf_redirect_map have set map (and index)via this_cpu_ptr */ 4 switch (action) { 5 case XDP_PASS: break; 6 case XDP_TX: res = driver_local_xmit_xdp_ring(adapter, xdp_buff); break; 7 case XDP_REDIRECT: res = xdp_do_redirect(netdev, xdp_buff, xdp_prog); break; 8 default: bpf_warn_invalid_xdp_action(action); /* fallthrough */ 9 case XDP_ABORTED: trace_xdp_exception(netdev, xdp_prog, action); /* fallthrough */ 10 case XDP_DROP: res = DRV_XDP_CONSUMED; break; 11 } /* left out acting on res */ 12 } /* End of NAPI-poll */ 13 xdp_do_flush_map(); /* Bulk chosen by map, can store xdf_frame for flushing */ 14 driver_local_XDP_TX_flush();
  • 20. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201820 The “devmap”: BPF_MAP_TYPE_DEVMAP ● Contains net_devices, userspace adds them via ifindex to map-index The “cpumap”: BPF_MAP_TYPE_CPUMAP ● Allow redirecting RAW xdp frames to remote CPU ○ SKB is created on remote CPU, and normal network stack invoked ● The map-index is the CPU number (the value is queue size) AF_XDP - “xskmap”: BPF_MAP_TYPE_XSKMAP ● Allow redirecting RAW xdp frames into userspace ○ via new Address Family socket type: AF_XDP ○ Very close to ‘netmap’ design, keeping drivers in kernel-space ○ Ask Björn Töpel who implemented it… in audience Redirect map types What kind of redirects are people inventing?!
  • 21. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides Spectre V2 killed XDP performance 21
  • 22. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201822 Hey, you killed my XDP performance! (Retpoline tricks for indirect calls) ● Still processing 6 Mpps per CPU core ● But could do approx 13 Mpps before! Initial through it was net_device->ndo_xdp_xmit call ● Implemented redirect bulking, but only helped a little Real pitfall: DMA API use indirect function call pointers ● Christoph Hellwig PoC patch show perf return to approx 10 Mpps Thus, solutions in the pipeline... Performance issue: Spectre (variant 2) CONFIG_RETPOLINE and newer GCC compiler - for stopping Spectre (variant 2) CPU side-channel attacks
  • 23. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides Questions asked by NDIV (HAproxy) in: “Challenges migrating from NDIV to XDP” NetDev Conf 0x12 (slides and video) 23
  • 24. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 NDIV callback rx_done() per packet batch, at end of RX loop ● XDP do have this, but part of BPF map abstraction ● For XDP, correspond to: xdp_do_flush_map In theory, NDIV could be implemented with BPF as ● Move match/filter part into BPF prog, that return action XDP_REDIRECT ● New redirect BPF map type for NDIV ○ (Ugly) could do mem alloc for NDIV “out-packet(s)” ○ (Ugly) could invoke ndiv->handle_rx(xdp_frame) ● This map type, can receive event RX-done via xdp_do_flush_map NDIV need callback rx_done() NDIV have callback rx_done() per packet batch, at end of RX loop
  • 25. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 There is no TX hook in XDP, why? ● For packets arriving from netstack ○ makes no-sense performance-wise (too late, SKB already alloc) ● BPF hook for TX is available ○ Use TC (clsbpf) egress BPF-prog hook (takes SKB like your handler) (Disclaimer only idea:) Maybe do XDP bpf_prog invoke if ndo_xdp_xmit() fails ● Use-case: react to XDP-redirect TX overflow ○ E.g. allow new XDP action, like another redirect target (load-balance) XDP hook at TX? Could not find the XDP hook at TX NDIV have ndiv->handle_tx(ndiv, skb)
  • 26. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 End slide … Questions? plus.google.com/+JesperDangaardBrouer linkedin.com/in/brouer youtube.com/channel/UCSypIUCgtI42z63soRMONng facebook.com/brouer twitter.com/JesperBrouer
  • 27. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201827 XDP + BPF combined effort of many people Thanks to all contributors ● Alexei Starovoitov ● Daniel Borkmann ● Brenden Blanco ● Tom Herbert ● John Fastabend ● Martin KaFai Lau ● Jakub Kicinski ● Jason Wang ● Andy Gospodarek ● Thomas Graf ● Michael Chan (bnxt_en) ● Saeed Mahameed (mlx5) ● Tariq Toukan (mlx4) ● Björn Töpel (i40e + AF_XDP) ● Magnus Karlsson (AF_XDP) ● Yuval Mintz (qede) ● Sunil Goutham (thunderx) ● Jason Wang (VM) ● Michael S. Tsirkin (ptr_ring) ● Edward Cree ● Toke Høiland-Jørgensen
  • 28. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides Extra slides if time permits... 28
  • 29. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides What is this CPUMAP redirect?
  • 30. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201830 Basic cpumap properties ● Enables redirection of XDP frames to remote CPUs ● Moved SKB allocation outside driver (could help simplify drivers) Scalability and isolation mechanism ● Allows isolating/decouple driver XDP layer from network stack ○ Don't delay XDP by deep call into network stack ● Enables DDoS protection on end-hosts (that run services) ○ XDP fast-enough to avoid packet drops happen in HW NICs Another use-case: Fix NIC-HW RSS/RX-hash broken/uneven CPU distribution ● Proto unknown to HW: e.g. VXLAN and double-tagged VLANs ● Suricata use this feature XDP_REDIRECT + cpumap What is cpumap redirect?
  • 31. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201831 Cpumap architecture: Every slot in array-map: dest-CPU ● MPSC (Multi Producer Single Consumer) model: per dest-CPU ○ Multiple RX-queue CPUs can enqueue to single dest-CPU ● Fast per CPU enqueue store (for now) 8 packets ○ Amortized enqueue cost to shared ptr_ring queue via bulk-enq ● Lockless dequeue, via pinning kthread CPU and disallow ptr_ring resize Important properties from main shared queue ptr_ring (cyclic array based) ● Enqueue+dequeue don't share cache-line for synchronization ○ Synchronization happens based on elements ○ In queue almost full case, avoid cache-line bouncing ○ In queue almost empty case, reduce cache-line bouncing via bulk-enq Cpumap redirect: CPU scaling Tricky part getting cross CPU delivery fast-enough
  • 32. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201832 XDP_REDIRECT enqueue into cpumap Kthread dequeue Start normal netstack App can run on another CPU via socket queue CPU scheduling via cpumap Queuing and scheduling in cpumap CPU#2 kthread CPU#1 NAPI RX CPU#3 Userspace sched Hint: Same CPU sched possible ● But adjust /proc/sys/kernel/sched_wakeup_granularity_ns
  • 33. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 2018 Next slides Recent changes to XDP core 33
  • 34. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201834 Long standing request: separate BPF programs per RX queue ● This is not likely to happen… because Solution instead: provide info per RX queue (xdp_rxq_info) ● Info: ingress net_device (Exposed as: ctx->ingress_ifindex) ● Info: ingress RX-queue number (Exposed as: ctx->rx_queue_index) Thus, NIC level XDP/bpf program can instead filter on rx_queue_index Recent change: Information per RX-queue Recent change in: kernel v4.16
  • 35. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201835 XDP_REDIRECT needs to queue XDP frames e.g. for bulking ● Queuing open-coded for both cpumap and tun-driver ● Generalize/standardize into struct xdp_frame ● Store info in top of XDP frame headroom (reserved) ○ Avoids allocating memory Recent change: queuing via xdp_frame Very recent changes: only accepted in net-next (to appear in v4.18)
  • 36. XDP: new fast and programmable network layer - Kernel Recipes, Paris, Sep 201836 API for how redirected frames are freed or "returned" ● XDP frames are returned to originating RX driver ● Furthermore: this happens per RX-queue level (extended xdp_rxq_info) This allows driver to implement different memory models per RX-queue ● E.g. needed for AF_XDP zero-copy mode Recent change: Memory return API Very recent changes: only accepted in net-next (to appear in v4.18)