share, discuss, & ask
DPDK: What it is and is not
How to port an application?
Where can we run it?
ISSUES: What are they?
Why did they occur?
TOOLS: General
Debug guide
Custom
a) Platform:
Differences between a general-purpose processor and a network processor (NPU):
NPU has smaller caches and a lower clock speed (around 1 GHz)
Specialized ISA (Instruction Set Architecture) for single-clock parsing (e.g. IP, MPLS, VLAN)
Specialized schedulers keeping OS and worker threads separate
b) Processing Architecture:
• HW: dedicated peripheral interrupt processing, reduced TLB misses
• SW: schedulers, locks
• Timers: scheduler tick, remove SW watchdog on DP cores
• Inter Processor Communication: LRU drain, rcu_barrier, paging, per-CPU drain
c) Locking overheads by CPU or Memory:
• IPIs between a large number of cores are expensive
• Locking of memory or regions is expensive
• vmstat_update runs every second to update virtual memory statistics
d) Pipeline Latency: hiding for both HW & SW
• Get packets in bulk for processing
• Prefetch to cache
• Allow bulk lookup for multiple packets in a burst
e) Bus Interface: improve PCIe (NIC) to CPU cache
• Design the cache to accommodate more frames per core: larger L2 or L3.
• Use HW-assisted caching to manage incoming packets.
f) Meta Buffer:
• It is impossible to hold and access millions of packets per second in CPU cache or memory, because of latency overhead and size limits.
• Pre-parse stage: prepare metadata holding the essential headers, which can be stored in cache.
• Prefetch bytes in an interleaved fashion to hide latency through pipelining.
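The bulk lookup and prefetch points in d) and f) can be sketched roughly as below. This is a minimal illustration, not DPDK code: `struct pkt_meta`, `bulk_lookup`, `PREFETCH_AHEAD` and the direct-indexed table are hypothetical names, and `__builtin_prefetch` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_AHEAD 4   /* hypothetical look-ahead distance, tune per platform */

/* Hypothetical per-packet metadata, small enough to stay in cache. */
struct pkt_meta {
    uint32_t flow_key;   /* key produced by the pre-parse stage */
    uint32_t result;     /* filled in by the lookup */
};

/* Look up a whole burst against a direct-indexed table. While packet i is
 * being resolved, the table slot needed by packet i + PREFETCH_AHEAD is
 * prefetched, so its memory latency overlaps with useful work. */
void bulk_lookup(struct pkt_meta *meta, size_t n,
                 const uint32_t *table, size_t table_size)
{
    assert(table_size > 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&table[meta[i + PREFETCH_AHEAD].flow_key % table_size]);
        meta[i].result = table[meta[i].flow_key % table_size];
    }
}
```

A caller would fill `flow_key` for each packet of a received burst, call `bulk_lookup` once, and then act on `result` per packet. Whether the prefetch helps depends on table size and burst length, so it is worth measuring.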
[Figure: Measuring cross-socket bandwidth. Two CPU sockets (CPU cores, LLC, UPI controller, memory controller, RAM), each with a NIC delivering packets and RX descriptors (RXd).]
Part 1: DPDK
What we need to know!
Problem?
[Diagram: kernel network path. User App and user buffer, network stack, SKB, generic driver (_rcv_ISR(), _hard_start_xmit()), RX and TX DMA buffers. SKB frame fields: head, data, tail, end, len; regions: head room, user data, tail room, SKB shared info.]
[Diagram: XDP path. XDP driver (_rcv_ISR(), _hard_start_xmit()), RX and TX DMA buffers; BPF actions XDP_DROP, XDP_TX, XDP_PASS and XDP_REDIRECT; SKB, network stack and User App; map for XDP socket and XDP user buffer.]
Application – Packet Life
• Read from NIC
• Check content
• Ensure integrity
• Do lookup / hash
• Identify processing
• Map to queue
• Action per queue/schedule
• Update stats counters
• Send burst to NIC
[Figure: CPU, NIC and Programmable NIC on a slow-to-fast scale]
What is not DPDK?
HW support: Huge Page Size, Data Direct I/O, SIMD
Conserve power or cycles as required
Allow multi-process data sharing without SYSCALL IPC (SHM, sockets, FIFO)
Either burst or low-latency polls
Adapt to small, big or hybrid cases
Prototype and Deploy quickly
HW offload with SW fallback
Runs in User Space (Bypasses Kernel Path)
Library of Functions
What is DPDK?
Where can we run DPDK?
[Diagram: deployment options, each with an Application on DPDK + Ext: host user space on a NIC; Docker on the host; a VM guest over a vNIC; Docker inside a VM guest; NIC user space; and combinations of these on one host.]
Part 2: Porting Apps to DPDK
Other Applications
[Diagram: Network I/O (multiple 10 Gbit/s interfaces) enters through the RX NIC and RSS hash into parallel Capture, Decode, Stream, Detect, Output pipelines in user space, with Control, Configuration and Stats alongside; traffic is clear text or encrypted. DPDK PMD stages: parse for metadata, match for rule set, buffer and zero copy.]
MEM-COPY
ZERO-COPY
https://www.youtube.com/watch?v=rsr_eIDCm8M
[Chart: DPDK vs AF-Workers for 64-byte and 1500-byte packets, axis 0 to 1200; units not given]
              Byte 64               Byte 1500
          DPDK   AF-Workers     DPDK   AF-Workers
P2 TX      382      213         1000      472
P2 RX     1000      475         1000      825
P1 TX      382      251         1000      416
P1 RX     1000      499         1000      826
[Chart: igb_uio, xdp_memcpy, xdp_zc and xdp_zc (no offload); series: rx drop, tx drop, rx-tx (l2fwd); axis 0 to 16; units not given. Extracted values: 14.9, 8.5, 10.2, 10.8 | 14.9, 7.9, 9.8, 10.5 | 14.8, 6.9, 8.9, 9.7]
[Chart: CONNECTION/SEC against KEY SIZE (1024, 2048, 4096) for Linked List, Array and Hash Array; axis 0 to 35000. Extracted values: 1600, 950, 625, 8000, 2400, 2970, 30000, 12000, 7500]
Issues
Feedback: works partially, with worse throughput
Overview
[Chart: 70% / 20% / 10% split]
Interaction with teams for debug, live terminal, reproducing steps
Let's think and identify where the issue is: in Application, DPDK, OVS, Platform or Kernel
Bottleneck Analysis
• Mismatch in packet rates (received < desired)?
• Do RX lcore threads get enough cycles?
• Packet drops at receive or transmit?
• Packet or object processing rate in the pipeline?
• User function performance is not as expected?
• Execution cycles for dynamic service functions are not frequent?
• Is the packet in an unexpected format?
Why are there various drops?
Stress & Regress: pkt-gen, trex
Generic tools: lstopo, dmidecode, libunwind
DPDK apps: proc-info, pdump
Isolate (Debug guide): NUMA, huge page, pinning
Characterize & Quantify: perf top, perf stat, vtune
Custom tools: malloc scanner, memzone monitor, thread stack tracer
Part 3: Tips for quick debug
lstopo --pid 2 --fontsize 15 --gridsize 12 --no-collapse
Somewhat Helpful!
Hardware related items
• NIC details, configuration and firmware version via Linux
• PCIe capability and current configuration
• PCIe advertised speed and configuration
• SFP and SFP+ details
  lshw -c network -businfo
  lshw -c network | egrep 'firmware|pci@'
  ethtool -m | -k | -P | -S <interface>
• CPU flags and features
  lscpu
  cat /proc/cpuinfo
• HW performance counters (use perf, and VTune on IA)
Helpful!
Linux Signals
Signal     Value     Action  Comment
SIGHUP     1         Term    Hangup detected on controlling terminal or death of controlling process
SIGINT     2         Term    Interrupt from keyboard
SIGQUIT    3         Core    Quit from keyboard
SIGILL     4         Core    Illegal Instruction
SIGABRT    6         Core    Abort signal from abort(3)
SIGKILL    9         Term    Kill signal
SIGSEGV    11        Core    Invalid memory reference
SIGTERM    15        Term    Termination signal
SIGSTOP    17,19,23  Stop    Stop process
SIGTSTP    18,20,24  Stop    Stop typed at terminal
SIGBUS     10,7,10   Core    Bus error (bad memory access)
SIGFPE     8         Core    Floating point exception
SIGPIPE    13        Term    Broken pipe: write to pipe with no readers
SIGALRM    14        Term    Timer signal from alarm(2)
SIGUSR1    30,10,16  Term    User-defined signal 1
SIGUSR2    31,12,17  Term    User-defined signal 2
SIGCHLD    20,17,18  Ign     Child stopped or terminated
SIGCONT    19,18,25  Cont    Continue if stopped
SIGTTIN    21,21,26  Stop    Terminal input for background process
SIGTTOU    22,22,27  Stop    Terminal output for background process
Where three values are given, they apply to Alpha/SPARC, to x86/ARM and most other architectures, and to MIPS, in that order.
The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.
Next, the signals not in the POSIX.1-1990 standard but described in SUSv2 and POSIX.1-2001.
Signal     Value     Action  Comment
SIGPOLL              Term    Pollable event (Sys V). Synonym for SIGIO
SIGPROF    27,27,29  Term    Profiling timer expired
SIGSYS     12,31,12  Core    Bad argument to routine (SVr4)
SIGTRAP    5         Core    Trace/breakpoint trap
SIGURG     16,23,21  Ign     Urgent condition on socket (4.2BSD)
SIGVTALRM  26,26,28  Term    Virtual alarm clock (4.2BSD)
SIGXCPU    24,24,30  Core    CPU time limit exceeded (4.2BSD)
SIGXFSZ    25,25,31  Core    File size limit exceeded (4.2BSD)
Not Sure!
STRACE
strace -e trace=open,read <executable>
strace -t -e open <executable>
strace -r -e open <executable>
strace -c <executable>
strace -i <executable>
strace -T -e read <executable>
strace -e trace=<network|signal|memory> <executable>
strace is a userspace utility for Linux used for diagnostics, debugging and instruction; it monitors the system calls and signals of a process. The operation of strace is made possible by the kernel feature known as ptrace.
Other features:
• Specifying a list of paths to be traced (-P /etc/ld.so.cache, for example).
• Modifying the return and error codes of specified syscalls, and injecting signals upon their execution (since strace 4.15, -e inject= option).
• Extracting information about file descriptors, including sockets (-y option).
Not Helpful!
objdump
File header: -f
File format: -p
Section header: -h
All headers: -x
Executable sections: -d
Assembler sections: -D
Full contents: -s
Debug: -g
Symbol table: -t
Dynamic Symbol table: -T
Dynamic Relocation: -R
Function content via name: -s -j.rodata, -D --prefix-addresses
readelf --relocs
Somewhat Helpful!
nm <executable>
t|T – The symbol is in the .text code section
b|B – The symbol is in the uninitialized data section (.bss)
D|d – The symbol is in the initialized .data section
nm -A ./*.o – print the file name before each symbol
nm -u – undefined symbols only
nm -n – symbols sorted by address
nm -S – symbols with size
nm -D – dynamic symbols
A : Global absolute symbol.
a : Local absolute symbol.
B : Global bss symbol.
b : Local bss symbol.
D : Global data symbol.
d : Local data symbol.
f : Source file name symbol.
L : Global thread-local symbol (TLS).
l : Static thread-local symbol (TLS).
T : Global text symbol.
t : Local text symbol.
U : Undefined symbol.
Somewhat Helpful!
CPU utilization
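#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/sysinfo.h>

/* Print the average CPU usage (%) of the process whose PID is argv[1]. */
int main(int argc, char **argv)
{
	/* Fields 14, 15, 16, 17 and 22 of /proc/<pid>/stat */
	const char *stat_param[5] = {"utime", "stime", "cutime", "cstime", "starttime"};
	char *stat_result[5] = {0};
	char buf[1000];
	struct sysinfo info = {0};
	unsigned long int cpu_usage = 0;
	int res = 0;

	if (argc < 2 || sysinfo(&info) != 0)
		return 1;
	fprintf(stdout, "Process to fetch stat: %s\n", argv[1]);
	snprintf(buf, sizeof(buf), "cat /proc/%s/stat | awk '{print $14 \",\" $15 \",\" $16 \",\" $17 \",\" $22}'", argv[1]);
	FILE *fp = popen(buf, "r");
	if (fp == NULL)
		return 1;
	char *parse = fgets(buf, sizeof(buf), fp);
	pclose(fp);
	/* Split the comma-separated awk output into the five fields */
	char *p = parse ? strtok(parse, ",") : NULL;
	while (p && res < 5) {
		stat_result[res++] = p;
		p = strtok(NULL, ",");
	}
	if (res < 5)
		return 1;
	for (int i = 0; i < 5; i++)
		fprintf(stdout, "%s = %ld\n", stat_param[i], atol(stat_result[i]));
	fprintf(stdout, " --- Calculation --- \n");
	unsigned long int hertz = sysconf(_SC_CLK_TCK);
	/* CPU time used by the process and its waited-for children, in clock ticks */
	unsigned long int total_time = atol(stat_result[0]) + atol(stat_result[1]) + atol(stat_result[2]) + atol(stat_result[3]);
	/* Seconds elapsed since the process started */
	unsigned long int sec = info.uptime - (atol(stat_result[4]) / hertz);
	if (sec > 0)
		cpu_usage = (100 * total_time) / (sec * hertz);
	fprintf(stdout, "cpu_usage (%lu) for process (%s)\n", cpu_usage, argv[1]);
	return 0;
}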
Not Helpful!
GDB
Call library functions, or functions from within the debugged program, using the command call
Start GDB with gdbtui or gdb -tui. Switch views using 'layout src|asm|regs'
shell allows you to execute commands in the shell
print, examine and display
info file - Entry point
set disassembly-flavor intel
set print pretty
set print addr off
set print array on
set print array off
display next 5 instructions - x/5i $pc
disassemble <function name>
.gdbinit
file exe
break *0x400710
set disassembly-flavor intel
layout asm
layout regs
run argument1 argument2
We can use set to do the magic for us, here patching five instruction bytes with NOPs (0x90). Let's first inspect the instruction bytes:
(gdb) x/10b $pc
(gdb) set write
(gdb) set {unsigned int}$pc = 0x90909090
(gdb) set {unsigned char}($pc+4) = 0x90
(gdb) set write off
(gdb) x/10i $pc
x/6i $pc
=> 0x40911f: nop
0x409120: nop
0x409121: nop
0x409122: nop
0x409123: nop
0x409124: push rbp
set {unsigned int}0x40911f = 0x90909090
set {unsigned char}0x409123 = 0x90
set $pc+=5
jump *$pc+5
Somewhat Helpful!
DPDK packet processing using Direct Data I/O
1. Core writes RXd preparing for receiving packet
2. NIC reads RXd to get buffer address
3. NIC writes packet
4. NIC writes RXd
5. Core reads RXd (polling)
6. Core reads packet and performs some action
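The six steps above can be mimicked with a toy descriptor ring in plain C. This is only a sketch of the hand-off protocol between core and NIC: `struct rxd`, `struct rx_ring`, `ring_init`, `nic_receive` and `core_poll` are hypothetical names, the "NIC" is an ordinary function, and no real driver or DPDK descriptor layout is implied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4      /* hypothetical ring depth */
#define BUF_SIZE  64     /* hypothetical buffer size */

/* Toy RX descriptor: buffer address, written length and a "done" flag. */
struct rxd {
    uint8_t *buf;
    uint16_t len;
    volatile uint8_t done;
};

struct rx_ring {
    struct rxd desc[RING_SIZE];
    uint8_t bufs[RING_SIZE][BUF_SIZE];
    unsigned nic_idx;    /* next descriptor the "NIC" fills */
    unsigned core_idx;   /* next descriptor the core polls */
};

/* Step 1: the core writes every RXd with a buffer address. */
void ring_init(struct rx_ring *r)
{
    memset(r, 0, sizeof(*r));
    for (unsigned i = 0; i < RING_SIZE; i++)
        r->desc[i].buf = r->bufs[i];
}

/* Steps 2-4: the "NIC" reads the RXd, writes the packet, then writes the
 * RXd back. Returns 0 on success, -1 if the core has not freed the slot. */
int nic_receive(struct rx_ring *r, const void *pkt, uint16_t len)
{
    struct rxd *d = &r->desc[r->nic_idx];

    assert(len <= BUF_SIZE);
    if (d->done)
        return -1;
    memcpy(d->buf, pkt, len);
    d->len = len;
    d->done = 1;
    r->nic_idx = (r->nic_idx + 1) % RING_SIZE;
    return 0;
}

/* Steps 5-6: the core polls the RXd and, if it is done, reads the packet.
 * Returns the packet length, or 0 when nothing has arrived. */
uint16_t core_poll(struct rx_ring *r, void *out)
{
    struct rxd *d = &r->desc[r->core_idx];
    uint16_t len;

    if (!d->done)
        return 0;
    len = d->len;
    memcpy(out, d->buf, len);
    d->done = 0;         /* hand the descriptor back to the "NIC" */
    r->core_idx = (r->core_idx + 1) % RING_SIZE;
    return len;
}
```

A poll-mode loop would call `core_poll` repeatedly without sleeping. With Direct Data I/O the descriptor and packet writes land in the LLC, so a poll of this kind can hit cache instead of RAM.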
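[Diagram: CPU socket with CPU cores, LLC, memory controller and RAM; the packet and RXd are shown in the LLC, and arrows numbered 1 to 6 between core, LLC and NIC correspond to the steps above.]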
Easy!
Resource Director
Cache Monitoring Tech (CMT)
• Per-thread L3 Occupancy Monitoring
Cache Allocation Tech (CAT)
• Per-thread L3 Occupancy Control
• New Code/Data Prioritization (CDP) extension
Memory BW Monitoring (MBM)
• Per-thread Memory Bandwidth Monitoring
Memory Bandwidth Allocation
• Per-thread Bandwidth Control
• New on Purley
[Diagram: 2x2 grid of Cache and Memory against Monitoring and Allocation, showing HP and LP threads, cores, credits and the IMC.]
Packet generator
Multi thread PDUMP
DPDK-0 DPDK-1 DPDK-1
librte_pdump
Primary Application
DPDK-PDUMP
pkts-1.pcap pkts-1.pcap
Easy!
PROCINFO
Easy!
Stack, register, variable trace for all threads
When to use: an unexpected signal or crash occurs.
What to do: dump stack and register information for all threads in an environment where GDB is not present or is not run.
Where it works:
• Binaries are stripped.
• Binary and application have no debug symbols.
• Rare cases and combinations in which faults occur.
• Errors or faults that are difficult to reproduce.
• There is no access to GDB or remote GDB, ptrace or pstack-dump.
• Inspect the stack of each thread.
• Inspect and dump global and debug variables.
• DPDK, when the secondary causes the primary to segfault, or when running GDB on the primary causes the secondary to segfault.
Q & A:
• Does this work for all shared libraries? Yes
• Does this work for mixed static and shared libraries? Yes
• Does this work for all stripped libraries? Yes
• Can we register SIGUSR1 to dump intermediate state? Yes
How to make it work:
Build:
• LIB: libunwind-dev
• CFLAGS: -DDUMPSTACK_EXTRAREG -DDUMPSTACK_EXTRASTACK -DDUMPSTACK -L/usr/lib/x86_64-linux-gnu/ -lunwind
• LDFLAGS: -L/usr/lib/x86_64-linux-gnu/ -lunwind
Application code change: add a signal handler that calls the custom signal handler.
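The "add a signal handler" step could look roughly like the sketch below. Note the substitution: it uses glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>` instead of libunwind, it dumps only the thread that receives the signal, and it does not print registers, so it is narrower than the tool described on the slide. `dump_stack_handler`, `install_dump_handler` and `dump_count` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

volatile sig_atomic_t dump_count;   /* number of dumps taken so far */

/* Signal handler: write the current thread's call stack to stderr. */
static void dump_stack_handler(int signo)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    (void)signo;
    /* backtrace_symbols_fd() writes straight to the descriptor and,
     * unlike backtrace_symbols(), does not call malloc(). */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    dump_count++;
}

/* Register the dump handler for one signal, e.g. SIGUSR1 or SIGSEGV.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
int install_dump_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = dump_stack_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Calling `install_dump_handler(SIGUSR1)` early in `main()` and then sending `kill -USR1 <pid>` prints a trace without stopping the process. On stripped binaries the output is mostly raw addresses, which can still be matched against `nm` or `objdump` output from an unstripped build.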
Somewhat Easy!
trace
stack
----------------- THREAD NAME BEGIN -----------------
/proc/41253/task/41248/comm
/proc/41253/task/41249/comm
/proc/41253/task/41250/comm
/proc/41253/task/41251/comm
/proc/41253/task/41252/comm
/proc/41253/task/41253/comm
l2fwd
eal-intr-thread
lcore-slave-3
lcore-slave-4
lcore-slave-5
pdump-thread
----------------- THREAD NAME DONE -----------------
DPDK Version 0x11080010
Config: msater 2 lcore count 4 process 0
rte_sys_gettid 41253
Memzone Monitor
[Diagram: the PRIMARY PROCESS and the SECONDARY PROCESS both map the same huge pages (MMAP), which hold the lookup table, direct table and counters.]
When to use: the memory layout is shared across multiple processes, which can lead to:
• Unintended changes within the same process
• Unintended changes from other processes
• Application logic or function pointers modifying unintended areas
What to do: dump stack and register information for all threads in an environment where GDB is not present or is not run.
Where it works:
• Control and data plane are in the same or different processes
• Tables are close together.
• Table entries are malloc'ed dynamically.
• Isolate the table or counter where the change is occurring
• Can monitor multiple tables.
• Program errors
• Keys or values are read without const.
• Values are modified using pointer arithmetic.
Tables with Lookup, Lookup + Result, Lookup + Result + Counters, Counters or Index to Counters, Reference to Lookup, and Lookup + Result and Lookup + Result + Counter
Q & A:
• Does this work for all shared libraries? Yes
• Does this work for mixed static and shared libraries? Yes
• Does this work for all stripped libraries? Yes
• Can we register SIGUSR1 to dump intermediate state? Yes
How it works: runs as a secondary application that periodically monitors selected tables or memory regions and reports back the offset where a change has occurred.
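The core of such a monitor is a snapshot-and-compare pass over the watched region. The sketch below shows only that step on ordinary memory; `take_snapshot` and `first_changed_offset` are hypothetical names, and looking up the shared table (for example by memzone name from a secondary process) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the watched region so later states can be compared against it. */
void take_snapshot(void *snapshot, const void *region, size_t len)
{
    assert(snapshot != NULL && region != NULL);
    memcpy(snapshot, region, len);
}

/* Return the offset of the first byte that differs from the snapshot,
 * or -1 if the region is unchanged. */
long first_changed_offset(const void *snapshot, const void *region, size_t len)
{
    const unsigned char *old = snapshot;
    /* volatile: another process may be writing to the live region */
    const volatile unsigned char *cur = region;

    for (size_t i = 0; i < len; i++)
        if (old[i] != cur[i])
            return (long)i;
    return -1;
}
```

A monitor loop would take a snapshot, sleep for the polling interval, call `first_changed_offset`, report any offset other than -1, and then refresh the snapshot. Regions that change legitimately, such as counters, need a separate policy (for example, checking only that they never decrease).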
Build:
• LIB: libunwind-dev
• CFLAGS: -DDUMPSTACK_EXTRAREG -DDUMPSTACK_EXTRASTACK -DDUMPSTACK -L/usr/lib/x86_64-linux-gnu/ -lunwind
• LDFLAGS: -L/usr/lib/x86_64-linux-gnu/ -lunwind
Application code change: add a signal handler that calls the custom signal handler.
Somewhat Easy!
MALLOC-FREE Scanner
When to use: a quick and dirty valgrind-like report tool
What to do:
• For every successful malloc, calloc or zalloc, create a container to hold the name, pointer and size.
• For every free of an allocated entry, remove the container.
How it works:
• Create a 'struct rte_fbarray' with 'rte_memzone_reserve'
• In the primary process, call 'rte_fbarray_init'
• In the secondary, call 'rte_fbarray_attach'
• In the primary process, for each alloc, retrieve a container with 'rte_fbarray_find_next_free'.
• For each successful alloc, mark it with 'rte_fbarray_set_used'
• For each free, call 'rte_fbarray_set_free'
• In the secondary, fetch the details back with 'rte_fbarray_find_next_used' or 'rte_fbarray_find_next_n_used'
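The container bookkeeping can be sketched without DPDK as follows. This is an assumption-laden stand-in: a fixed static array replaces `rte_fbarray`, plain `malloc`/`free` replace `rte_malloc`/`rte_free`, everything lives in one process, and `tracked_malloc`, `tracked_free`, `live_bytes`, `MAX_TRACKED` and `NAME_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 128   /* hypothetical capacity of the container array */
#define NAME_LEN    32

/* One container per live allocation: name, pointer and size. */
struct alloc_entry {
    char name[NAME_LEN];
    void *ptr;
    size_t size;
    int used;
};

struct alloc_entry tracked[MAX_TRACKED];

/* Allocate memory and record it in the first free container.
 * If every container is in use, the allocation simply goes untracked. */
void *tracked_malloc(const char *name, size_t size)
{
    void *p = malloc(size);

    assert(name != NULL);
    if (p == NULL)
        return NULL;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!tracked[i].used) {
            snprintf(tracked[i].name, NAME_LEN, "%s", name);
            tracked[i].ptr = p;
            tracked[i].size = size;
            tracked[i].used = 1;
            break;
        }
    }
    return p;
}

/* Release the container that holds p, then free the memory. */
void tracked_free(void *p)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].used && tracked[i].ptr == p) {
            tracked[i].used = 0;
            break;
        }
    }
    free(p);
}

/* What a reporting process would compute: bytes still allocated. */
size_t live_bytes(void)
{
    size_t total = 0;

    for (int i = 0; i < MAX_TRACKED; i++)
        if (tracked[i].used)
            total += tracked[i].size;
    return total;
}
```

Containers still marked used at shutdown point to allocations that were never freed, and the stored name says which call site made them. The linear scan is kept for clarity; the fbarray helpers listed above serve the same purpose over shared memory.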
Where it works:
• rte_malloc, rte_calloc and rte_zalloc do not map the allocated region's name to its address.
• This makes it difficult to track usage of dynamically allocated instances.
Easy!
[Diagram: memzone container holding segments Seg-0 to Seg-n, with entries Alloc-1, Alloc-2 and Alloc-3; the secondary attaches with rte_fbarray_attach.]
Dynamic DEBUG with eBPF (user-space)
[Diagram: Lookup Table and Counters]
API:
I. Application specific
II. DPDK eBPF functions for Debug API
When to use: for dynamic debug
What to do: load eBPF into existing applications
How it works: same as user-space eBPF
Where:
1. Applications in the field
2. Recompile not possible
3. Compiler macros not possible
# llvm-objdump -S t3.o
t3.o: file format ELF64-BPF
Disassembly of section .text:
entry:
0: bf 12 00 00 00 00 00 00 r2 = r1
1: 69 21 10 00 00 00 00 00 r1 = *(u16 *)(r2 + 16)
2: 79 23 00 00 00 00 00 00 r3 = *(u64 *)(r2 + 0)
3: 0f 13 00 00 00 00 00 00 r3 += r1
4: 69 31 0c 00 00 00 00 00 r1 = *(u16 *)(r3 + 12)
5: 15 01 01 00 08 06 00 00 if r1 == 1544 goto +1 <LBB0_2>
6: 55 01 05 00 08 00 00 00 if r1 != 8 goto +5 <LBB0_3>
LBB0_2:
7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
9: 79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0)
10: b7 03 00 00 40 00 00 00 r3 = 64
11: 85 10 00 00 ff ff ff ff call -1
LBB0_3:
12: b7 00 00 00 01 00 00 00 r0 = 1
13: 95 00 00 00 00 00 00 00 exit
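Reading the disassembly, the program appears to compute the start of the packet data from two fields of its argument (offsets 0 and 16), load the 16-bit value at byte 12 of the packet, and take the `call` path only when that value is 1544 (0x0608) or 8 (0x0008). On a little-endian host these are the on-wire EtherTypes 0x0806 (ARP) and 0x0800 (IPv4). The plain C sketch below reproduces just that filter condition; `is_arp_or_ipv4` and the macro names are hypothetical, and unlike the BPF program (which always returns 1) it returns the match result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_OFFSET 12      /* after destination and source MAC */
#define ETH_TYPE_IPV4    0x0800
#define ETH_TYPE_ARP     0x0806

/* Return 1 if the Ethernet frame carries ARP or IPv4, else 0.
 * The EtherType is assembled byte by byte, so the check does not
 * depend on host endianness. */
int is_arp_or_ipv4(const uint8_t *frame, size_t len)
{
    uint16_t type;

    assert(frame != NULL);
    if (len < ETHERTYPE_OFFSET + 2)
        return 0;
    type = (uint16_t)((frame[ETHERTYPE_OFFSET] << 8) | frame[ETHERTYPE_OFFSET + 1]);
    return type == ETH_TYPE_ARP || type == ETH_TYPE_IPV4;
}
```

A filter like this, compiled to BPF and loaded into a running application, is what lets a field system dump only the packets of interest without a rebuild.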

 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 

Último (20)

Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

DPDK applications

  • 1. Share, discuss, and ask. DPDK: What is it and what is it not? How to port an application? Where can we run it? ISSUES: What are they? Why did they occur? TOOLS: General, Debug guide, Custom.
  • 2.
      a) Platform: differences between a general-purpose processor and a network processor (NPU)
          • NPU has smaller caches and a lower clock speed (around 1 GHz)
          • Specialized ISA (Instruction Set Architecture) for single-clock parsing (e.g. IP, MPLS, VLAN)
          • Specialized schedulers keeping OS and worker threads separate
      b) Processing Architecture:
          • HW: dedicate peripheral interrupt processing, reduce TLB misses
          • SW: schedulers, locks
          • Timers: scheduler tick, remove SW watchdog on DP cores
          • Inter-Processor Communication: LRU drain, rcu_barrier, paging, per-CPU drain
      c) Locking overheads by CPU or memory:
          • IPIs between a large number of cores are expensive
          • Locking of memory or regions is expensive
          • vmstat_update runs every second to update virtual memory statistics
      d) Pipeline Latency: hiding for both HW and SW
          • Get packets in bulk for processing
          • Prefetch to cache
          • Allow bulk lookup for multiple packets in a burst
      e) Bus Interface: improve the PCIe (NIC) to CPU cache path
          • Design the cache to accommodate more frames per core (larger L2 or L3)
          • Use HW-assisted caching to manage incoming packets
      f) Meta Buffer:
          • It is impossible to hold and access millions of packets per second in CPU cache or memory: latency overhead and size constraints
          • Pre-parse stage: prepare metadata holding the essential headers, which can be stored in cache
          • Prefetch bytes in an interleaved fashion to hide latency in the pipeline
  • 3. Measuring cross-socket bandwidth. [Figure labels: two CPU sockets, each with CPU cores, LLC, UPI controller, and a NIC delivering packets and RX descriptors (RXd); memory controller and RAM.]
  • 5. Part 1: DPDK What we need to know!
  • 7. Problem? [Figure: Linux kernel network path. Labels: User App, User Buffer; Network Stack (SKB); generic driver with _rcv_ISR() and _hard_start_xmit(); RX DMA buffer, TX DMA buffer. SKB frame fields: head, data, tail, end, len. Buffer layout: head room, user data, tail room, SKB shared info.]
  • 8. Problem? [Figure: XDP path. Labels: User App; Network Stack (SKB); XDP driver with _rcv_ISR() and _hard_start_xmit(); RX DMA buffer, TX DMA buffer; BPF actions XDP_DROP, XDP_TX, XDP_PASS, XDP_REDIRECT; map for XDP socket; XDP User Buffer.]
  • 9. Application: packet life
      • Read from NIC
      • Check content
      • Ensure integrity
      • Do lookup / hash
      • Identify processing
      • Map to queue
      • Action per queue/schedule
      • Update stats counters
      • Send burst to NIC
      [Figure labels: CPU, NIC, Programmable NIC; slow to fast.]
  • 10. What is not DPDK? HW support: huge page size, Data Direct I/O, SIMD. Conserve power or cycles as required. Allow multi-process data sharing without syscall-based IPC (SHM, sockets, FIFO). Either burst or low-latency polls. Adapt to small, big or hybrid cases. Prototype and deploy quickly. HW offload with SW fallback. Runs in user space (bypasses the kernel path). Library of functions. What is DPDK?
  • 11. Where can we run DPDK? [Figure labels: host user-space application with DPDK + Ext on a NIC; Docker + application on a NIC; application on a vNIC with DPDK + Ext inside a VM guest; Docker + application on a vNIC inside a VM guest; and combinations of these on one host.]
  • 12. Part 2: Porting Apps to DPDK
  • 13. Other Applications. [Figure labels: Network I/O (multiple 10 Gbit/s interfaces); Control, Configuration and Stats; User Space; Clear Text, Encrypted; RX NIC; two pipelines of Capture, Decode, Stream, Detect, Output; DPDK: RSS hash, parse for metadata, match for rule set, buffer and zero copy.]
  • 16. Benchmark charts (units are not given on the slide)
      Chart 1, DPDK vs AF-Workers by packet size:
                  64 B DPDK   64 B AF-Workers   1500 B DPDK   1500 B AF-Workers
        P1 RX       1000           499              1000            826
        P1 TX        382           251              1000            416
        P2 RX       1000           475              1000            825
        P2 TX        382           213              1000            472
      Chart 2, categories igb_uio, xdp_memcpy, xdp_zc, xdp_zc (no offload); series rx drop, tx drop, rx-tx (l2fwd); values as extracted: 14.9, 8.5, 10.2, 10.8; 14.9, 7.9, 9.8, 10.5; 14.8, 6.9, 8.9, 9.7.
      Chart 3, connections/sec by key size (1024, 2048, 4096); series Linked List, Array, Hash Array; values as extracted: 1600, 950, 625; 8000, 2400, 2970; 30000, 12000, 7500.
  • 17. Issues. Feedback: works partially, with worse throughput.
  • 18. Overview. [Chart: 70%, 20%, 10%.] Interaction with teams for debug, live terminal, reproducing steps. Let's think and identify where the issue is: Application, DPDK, OVS, Platform, or Kernel.
  • 19. Bottleneck Analysis
      • Is there a mismatch in packet rates (received < desired)?
      • Do the RX lcore threads get enough cycles?
      • Are packets dropped at receive or transmit?
      • What is the packet or object processing rate in the pipeline?
      • Is the performance of user functions not as expected?
      • Are execution cycles for dynamic service functions not frequent enough?
      • Is the packet in an unexpected format?
  • 20. Why are there various drops? Stress & Regress: pkt-gen, trex. Generic tools: lstopo, dmidecode, libunwind. DPDK apps: proc-info, pdump. Isolate: debug guide (NUMA, huge pages, pinning). Characterize & Quantify: perf top, perf stat, vtune. Custom tools: malloc scanner, memzone monitor, thread stack tracer.
  • 21. Part 3: Tips for quick debug
  • 22. lstopo --pid 2 --fontsize 15 --gridsize 12 --no-collapse Somewhat Helpful!
  • 23. Hardware-related items
      • NIC details, configuration and firmware version via Linux
      • PCIe capability and current configuration
      • PCIe advertised speed and configuration
      • SFP and SFP+ details:
          lshw -c network -businfo
          lshw -c network | egrep 'firmware|pci@'
          ethtool -m | -k | -P | -S
      • CPU flags and features:
          lscpu
          cat /proc/cpuinfo
      • HW performance counters (use perf, and vtune on IA)
      Helpful!
  • 24. Linux Signals
      Signal     Value      Action  Comment
      SIGHUP     1          Term    Hangup detected on controlling terminal or death of controlling process
      SIGINT     2          Term    Interrupt from keyboard
      SIGQUIT    3          Core    Quit from keyboard
      SIGILL     4          Core    Illegal instruction
      SIGABRT    6          Core    Abort signal from abort(3)
      SIGKILL    9          Term    Kill signal
      SIGSEGV    11         Core    Invalid memory reference
      SIGTERM    15         Term    Termination signal
      SIGSTOP    17,19,23   Stop    Stop process
      SIGTSTP    18,20,24   Stop    Stop typed at terminal
      SIGBUS     10,7,10    Core    Bus error (bad memory access)
      SIGFPE     8          Core    Floating point exception
      SIGPIPE    13         Term    Broken pipe: write to pipe with no readers
      SIGALRM    14         Term    Timer signal from alarm(2)
      SIGUSR1    30,10,16   Term    User-defined signal 1
      SIGUSR2    31,12,17   Term    User-defined signal 2
      SIGCHLD    20,17,18   Ign     Child stopped or terminated
      SIGCONT    19,18,25   Cont    Continue if stopped
      SIGTTIN    21,21,26   Stop    Terminal input for background process
      SIGTTOU    22,22,27   Stop    Terminal output for background process
      The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored. Where three values are listed, they follow the signal(7) convention: the first is for Alpha and SPARC, the middle one for x86, ARM and most other architectures, and the last for MIPS.
      Next, the signals not in the POSIX.1-1990 standard but described in SUSv2 and POSIX.1-2001:
      Signal     Value      Action  Comment
      SIGPOLL               Term    Pollable event (Sys V). Synonym for SIGIO
      SIGPROF    27,27,29   Term    Profiling timer expired
      SIGSYS     12,31,12   Core    Bad argument to routine (SVr4)
      SIGTRAP    5          Core    Trace/breakpoint trap
      SIGURG     16,23,21   Ign     Urgent condition on socket (4.2BSD)
      SIGVTALRM  26,26,28   Term    Virtual alarm clock (4.2BSD)
      SIGXCPU    24,24,30   Core    CPU time limit exceeded (4.2BSD)
      SIGXFSZ    25,25,31   Core    File size limit exceeded (4.2BSD)
      Not Sure!
  • 25. STRACE
      strace -e trace=open,read <executable>
      strace -t -e open <executable>
      strace -r -e open <executable>
      strace -c <executable>
      strace -i <executable>
      strace -T -e read <executable>
      strace -e trace=network|signal|memory <executable>
      strace is a user-space utility for Linux that helps to diagnose and debug programs, and serves as an instructional tool, by monitoring system calls and signals. Its operation is made possible by the kernel feature known as ptrace. Other features:
      • Specifying a list of paths to be traced (-P /etc/ld.so.cache, for example).
      • Modifying the return and error codes of the specified syscalls, and injecting signals upon their execution (since strace 4.15, -e inject= option).
      • Extracting information about file descriptors (including sockets, -y option).
      Not Helpful!
  • 26. objdump
      File header: -f
      File format: -p
      Section header: -h
      All headers: -x
      Executable sections: -d
      Assembler sections: -D
      Full contents: -s
      Debug: -g
      Symbol table: -t
      Dynamic symbol table: -T
      Dynamic relocation: -R
      Function content via name: -s -j .rodata, -D --prefix-addresses
      readelf --relocs
      Somewhat Helpful!
  • 27. nm <executable>
      t|T: the symbol is in the .text (code) section
      b|B: the symbol is in the uninitialized data (.bss) section
      D|d: the symbol is in the initialized .data section
      nm -A ./*.o   print the file name before each symbol
      nm -u         undefined symbols
      nm -n         symbols sorted numerically by address
      nm -S         symbols with size
      nm -D         dynamic symbols
      A: Global absolute symbol. a: Local absolute symbol. B: Global bss symbol. b: Local bss symbol. D: Global data symbol. d: Local data symbol. f: Source file name symbol. L: Global thread-local symbol (TLS). l: Static thread-local symbol (TLS). T: Global text symbol. t: Local text symbol. U: Undefined symbol.
      Somewhat Helpful!
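  • 28. CPU utilization
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/sysinfo.h>

      /* Usage: <program> <pid>. Prints the average CPU usage (%) of the process since it started. */
      int main(int argc, char **argv)
      {
          char buf[1000];
          char *stat_result[5] = {0}; /* utime, stime, cutime, cstime, starttime */
          struct sysinfo info = {0};
          unsigned long cpu_usage = 0;
          int res = 0;

          if (argc < 2)
              return 1;
          fprintf(stdout, "Process to fetch stat: %s\n", argv[1]);
          if (sysinfo(&info) != 0)
              return 1;
          fprintf(stdout, "sysinfo \n");
          /* Fields 14-17 and 22 of /proc/<pid>/stat, all in clock ticks. */
          snprintf(buf, sizeof(buf), "cat /proc/%s/stat | awk '{print $14 \",\" $15 \",\" $16 \",\" $17 \",\" $22}'", argv[1]);
          FILE *fp = popen(buf, "r");
          if (fp == NULL)
              return 1;
          char *parse = fgets(buf, sizeof(buf), fp);
          pclose(fp);
          for (char *p = parse ? strtok(parse, ",") : NULL; p && res < 5; p = strtok(NULL, ","))
              stat_result[res++] = p;
          if (res < 5)
              return 1;
          fprintf(stdout, " --- Calculation --- \n");
          unsigned long hertz = sysconf(_SC_CLK_TCK);
          unsigned long total_time = atol(stat_result[0]) + atol(stat_result[1]) + atol(stat_result[2]) + atol(stat_result[3]);
          /* Seconds the process has existed: system uptime minus its start time. */
          unsigned long sec = info.uptime - (atol(stat_result[4]) / hertz);
          if (sec > 0)
              cpu_usage = (100 * total_time) / (sec * hertz);
          fprintf(stdout, "cpu_usage (%lu) for process (%s)\n", cpu_usage, argv[1]);
          return 0;
      }
      Not Helpful!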
  • 29. GDB
      • Call library functions, or functions from within the debugged program, using the command call.
      • Start GDB in TUI mode with gdbtui or gdb -tui. Switch views using 'layout src|asm|regs'.
      • shell allows you to execute commands in the shell.
      • print, examine and display show program state; info file shows the entry point.
      • Settings: set disassembly-flavor intel, set print pretty, set print addr off, set print array on, set print array off.
      • Display the next 5 instructions: x/5i $pc
      • disassemble <function name>
      • .gdbinit file:
          file exe
          break *0x400710
          set disassembly-flavor intel
          layout asm
          layout regs
          run argument1 argument2
      • Patching instructions: we can use set to do the magic for us. Let's first inspect the instruction bytes:
          (gdb) x/10b $pc
          (gdb) set write
          (gdb) set {unsigned int}$pc = 0x90909090
          (gdb) set {unsigned char}($pc+4) = 0x90
          (gdb) set write off
          (gdb) x/6i $pc
          => 0x40911f: nop
             0x409120: nop
             0x409121: nop
             0x409122: nop
             0x409123: nop
             0x409124: push rbp
        The same patch with absolute addresses:
          set {unsigned int}0x40911f = 0x90909090
          set {unsigned char}0x409123 = 0x90
        To skip the instruction instead of patching it:
          set $pc+=5
          jump *$pc+5
      Somewhat Helpful!
  • 30. DPDK packet processing using Data Direct I/O
      1. Core writes RXd, preparing to receive a packet
      2. NIC reads RXd to get the buffer address
      3. NIC writes the packet
      4. NIC writes RXd
      5. Core reads RXd (polling)
      6. Core reads the packet and performs some action
      [Figure labels: CPU socket with CPU cores, LLC holding packet and RXd, memory controller, RAM, NIC; arrows numbered 1 to 6.]
      Easy!
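The six steps on slide 30 can be mimicked with a toy single-descriptor model. This is only a sketch in plain C under loud assumptions: `struct rx_desc`, `core_post_buffer`, `nic_receive` and `core_poll` are hypothetical names, not DPDK or driver APIs, and a real NIC uses a ring of descriptors and DMA rather than function calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical RX descriptor (RXd): buffer address, written length, "done" flag. */
struct rx_desc {
    uint8_t *buf;   /* step 1: filled in by the core */
    uint16_t len;   /* step 4: filled in by the NIC  */
    uint8_t done;   /* step 4: set by the NIC        */
};

/* Step 1: the core prepares the descriptor with an empty buffer. */
void core_post_buffer(struct rx_desc *d, uint8_t *buf)
{
    d->buf = buf;
    d->len = 0;
    d->done = 0;
}

/* Steps 2-4: the simulated NIC reads the descriptor, writes the packet,
 * then writes the descriptor back. */
void nic_receive(struct rx_desc *d, const uint8_t *pkt, uint16_t len)
{
    memcpy(d->buf, pkt, len); /* step 2 (read buffer address) and step 3 */
    d->len = len;             /* step 4 */
    d->done = 1;
}

/* Steps 5-6: the core polls the descriptor. Returns the packet length and
 * sets *pkt, or returns 0 if no packet has arrived yet. */
uint16_t core_poll(struct rx_desc *d, uint8_t **pkt)
{
    if (!d->done)
        return 0;
    *pkt = d->buf;
    d->done = 0; /* descriptor consumed; the core would re-post it next */
    return d->len;
}
```

Polling an empty descriptor returns 0, which is the busy-poll case a DPDK RX loop spends most of its time in when traffic is low.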
  • 31. Resource Director
      • Cache Monitoring Tech (CMT): per-thread L3 occupancy monitoring
      • Memory Bandwidth Allocation: per-thread bandwidth control (new on Purley)
      • Memory BW Monitoring (MBM): per-thread memory bandwidth monitoring
      • Cache Allocation Tech (CAT): per-thread L3 occupancy control; new Code/Data Prioritization (CDP) extension
      [Figure: grid of Monitoring and Allocation against Cache and Memory.]
      Somewhat Easy!
  • 32. Packet generator, multi-thread PDUMP. [Figure labels: DPDK-0, DPDK-1, DPDK-1; Primary Application with librte_pdump; DPDK-PDUMP; pkts-1.pcap, pkts-1.pcap.] Easy!
  • 34. Stack, register, variable trace for all threads
      When to use: an unexpected signal or crash occurs.
      What to do: dump the stack and register information of all threads in an environment where GDB is not present or cannot be run.
      Where it works:
      • Binaries are stripped.
      • Binary and application have no debug symbols.
      • Rare cases and combinations in which faults occur.
      • Errors or faults that are difficult to reproduce.
      • There is no access to GDB or remote GDB, ptrace or pstack-dump.
      • Inspect the stack of each thread.
      • Inspect and dump global and debug variables.
      • DPDK, when the secondary causes the primary to segfault, or running GDB on the primary causes the secondary to segfault.
      Q & A:
      • Does this work for all shared libraries? Yes
      • Does this work for mixed static and shared libraries? Yes
      • Does this work for all stripped libraries? Yes
      • Can we register SIGUSR1 to dump an intermediate trace? Yes
      How to make it work:
      Build:
      • LIB: libunwind-dev
      • CFLAGS: -DDUMPSTACK_EXTRAREG -DDUMPSTACK_EXTRASTACK -DDUMPSTACK -L/usr/lib/x86_64-linux-gnu/ -lunwind
      • LDFLAGS: -L/usr/lib/x86_64-linux-gnu/ -lunwind
      Application code change: add a signal handler that calls the custom signal handler.
      Somewhat Easy!
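A minimal sketch of the "add a signal handler" step from slide 34, assuming a Linux/glibc target. To stay short it swaps libunwind for glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`, so it prints only the stack of the thread that receives the signal and no registers. `install_dump_handler`, `dump_stack_handler` and `dump_count` are hypothetical names, not taken from the author's tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Number of dumps taken so far (illustration only). */
volatile sig_atomic_t dump_count;

/* Signal handler: write the current call stack to stderr without stdio. */
void dump_stack_handler(int signo)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    (void)signo;
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    dump_count++;
}

/* Register the handler for one signal, e.g. SIGUSR1. Returns 0 on success. */
int install_dump_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = dump_stack_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

With `install_dump_handler(SIGUSR1)` called at start-up, `kill -USR1 <pid>` from another terminal would request an intermediate dump. In a stripped binary the output is mostly raw addresses.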
  • 35. trace stack
      ----------------- THREAD NAME BEGIN -----------------
      /proc/41253/task/41248/comm
      /proc/41253/task/41249/comm
      /proc/41253/task/41250/comm
      /proc/41253/task/41251/comm
      /proc/41253/task/41252/comm
      /proc/41253/task/41253/comm
      l2fwd
      eal-intr-thread
      lcore-slave-3
      lcore-slave-4
      lcore-slave-5
      pdump-thread
      ----------------- THREAD NAME DONE -----------------
      DPDK Version 0x11080010
      Config: msater 2 lcore count 4 process 0
      rte_sys_gettid 41253
  • 37. Memzone Monitor
      When to use: the memory layout is shared across multiple processes, which can lead to:
      • unintended changes within the same process
      • unintended changes from another process
      • application logic or function pointers modifying unintended areas
      What to do: dump all threads stack and register information in an environment where GDB is not present or not run.
      Where it works:
      • Control and data plane are in the same or different processes.
      • Tables are close by.
      • Table entries are malloced dynamically.
      • Isolate the table or counter where the change is occurring.
      • Can monitor multiple tables.
      • Program error.
      • Keys or values are read without const.
      • Values are modified using pointer arithmetic.
      Tables with: Lookup; Lookup + Result; Lookup + Result + Counters; Counters or Index to Counters; Reference to Lookup, Lookup + Result, and Lookup + Result + Counter.
      Q & A:
      • Does this work for all shared libraries? Yes
      • Does this work for mixed static and shared libraries? Yes
      • Does this work for all stripped libraries? Yes
      • Can we register SIGUSR1 to dump an intermediate trace? Yes
      How it works: runs as a secondary application that periodically monitors the selected tables or memory regions and reports the offset where a change has occurred.
      Build:
      • LIB: libunwind-dev
      • CFLAGS: -DDUMPSTACK_EXTRAREG -DDUMPSTACK_EXTRASTACK -DDUMPSTACK -L/usr/lib/x86_64-linux-gnu/ -lunwind
      • LDFLAGS: -L/usr/lib/x86_64-linux-gnu/ -lunwind
      Application code change: add a signal handler that calls the custom signal handler.
      Somewhat Easy!
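The "how it works" idea from slide 37 (keep a copy of the region, compare periodically, report the offset that changed) can be sketched in plain C. This is not the author's tool: `struct region_watch`, `watch_init` and `watch_poll` are hypothetical names, and the region here is an ordinary buffer rather than a DPDK memzone mapped by a secondary process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical watch record: the live region and a copy of its last known content. */
struct region_watch {
    const uint8_t *live; /* monitored table or memory region        */
    uint8_t *snapshot;   /* caller-provided buffer of the same size */
    size_t len;
};

/* Take the initial snapshot of the region. */
void watch_init(struct region_watch *w, const void *live, void *snapshot, size_t len)
{
    w->live = live;
    w->snapshot = snapshot;
    w->len = len;
    memcpy(w->snapshot, w->live, len);
}

/* Compare the live region with the snapshot. Returns the offset of the
 * first changed byte and refreshes the snapshot, or -1 if nothing has
 * changed since the previous call. */
long watch_poll(struct region_watch *w)
{
    for (size_t i = 0; i < w->len; i++) {
        if (w->live[i] != w->snapshot[i]) {
            memcpy(w->snapshot, w->live, w->len);
            return (long)i;
        }
    }
    return -1;
}
```

A monitor process would call `watch_poll` from a timer loop, once per watched table. When another process writes to the region, the live pointer would likely need to be `volatile`-qualified or read with explicit barriers, which this sketch leaves out.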
  • 40. MALLOC-FREE Scanner
      When to use: as a quick and dirty valgrind-like reporting tool.
      What to do:
      • For every successful malloc, calloc or zalloc, create a container to hold the name, pointer and size.
      • For every free of an allocated entry, remove the container.
      How it works:
      • Create a 'struct rte_fbarray' with 'rte_memzone_reserve'.
      • In the primary process, call 'rte_fbarray_init'.
      • In the secondary process, call 'rte_fbarray_attach'.
      • In the primary process, for each alloc, retrieve a container with 'rte_fbarray_find_next_free'.
      • For each successful alloc, mark it with 'rte_fbarray_set_used'.
      • For each free, call 'rte_fbarray_set_free'.
      • In the secondary process, fetch the details back with 'rte_fbarray_find_next_used' or 'rte_fbarray_find_next_n_used'.
      Where it works:
      • rte_malloc, rte_calloc and rte_zalloc do not map the name of an allocated region to its address.
      • This makes it difficult to track the usage of dynamically allocated instances.
      [Figure labels: Seg-0, Seg-1, Seg-2, Seg-n; memzone container with Alloc-1, Alloc-2, Alloc-3; rte_fbarray_attach.]
      Easy!
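The container bookkeeping from slide 40 can be illustrated without DPDK. The sketch below is a single-process stand-in, not the author's scanner: a static array plays the role of the `rte_fbarray` held in a memzone, the "first free slot" search stands in for `rte_fbarray_find_next_free`, and the `used` flag for `rte_fbarray_set_used` / `rte_fbarray_set_free`. All function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 1024
#define NAME_LEN 32

/* One container per live allocation: name, pointer and size. */
struct alloc_entry {
    char name[NAME_LEN];
    void *ptr;
    size_t size;
    int used;
};

struct alloc_entry alloc_table[MAX_TRACKED];

/* malloc wrapper: on success, record the allocation in the first free slot.
 * If the table is full, the allocation simply goes untracked. */
void *tracked_malloc(const char *name, size_t size)
{
    void *p = malloc(size);

    if (p == NULL)
        return NULL;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!alloc_table[i].used) {
            snprintf(alloc_table[i].name, NAME_LEN, "%s", name);
            alloc_table[i].ptr = p;
            alloc_table[i].size = size;
            alloc_table[i].used = 1;
            break;
        }
    }
    return p;
}

/* free wrapper: release the container that holds this pointer, then free it. */
void tracked_free(void *p)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (alloc_table[i].used && alloc_table[i].ptr == p) {
            alloc_table[i].used = 0;
            break;
        }
    }
    free(p);
}

/* Report: print every allocation that is still live; return how many there are. */
int report_live_allocs(void)
{
    int live = 0;

    for (int i = 0; i < MAX_TRACKED; i++) {
        if (alloc_table[i].used) {
            printf("%s: %p (%zu bytes)\n", alloc_table[i].name,
                   alloc_table[i].ptr, alloc_table[i].size);
            live++;
        }
    }
    return live;
}
```

Anything still listed by `report_live_allocs` at shutdown is a candidate leak. In the design on the slide, the report side runs in the secondary process, so the primary's data path only pays for the slot update.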
  • 41. Dynamic DEBUG with eBPF (user space). API: I. Application specific; II. DPDK eBPF functions for debug API. When to use: for dynamic debug. What to do: load eBPF into existing applications. How it works: same as user-space eBPF. Where: 1. applications in the field; 2. recompiling is not possible; 3. compiler macros are not possible. [Figure labels: Lookup Table, Counters.]
  • 42. # llvm-objdump -S t3.o
      t3.o: file format ELF64-BPF
      Disassembly of section .text:
      entry:
         0: bf 12 00 00 00 00 00 00   r2 = r1
         1: 69 21 10 00 00 00 00 00   r1 = *(u16 *)(r2 + 16)
         2: 79 23 00 00 00 00 00 00   r3 = *(u64 *)(r2 + 0)
         3: 0f 13 00 00 00 00 00 00   r3 += r1
         4: 69 31 0c 00 00 00 00 00   r1 = *(u16 *)(r3 + 12)
         5: 15 01 01 00 08 06 00 00   if r1 == 1544 goto +1 <LBB0_2>
         6: 55 01 05 00 08 00 00 00   if r1 != 8 goto +5 <LBB0_3>
      LBB0_2:
         7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00   r1 = 0 ll
         9: 79 11 00 00 00 00 00 00   r1 = *(u64 *)(r1 + 0)
        10: b7 03 00 00 40 00 00 00   r3 = 64
        11: 85 10 00 00 ff ff ff ff   call -1
      LBB0_3:
        12: b7 00 00 00 01 00 00 00   r0 = 1
        13: 95 00 00 00 00 00 00 00   exit
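One way to read the listing on slide 42: instruction 4 loads a 16-bit value at offset 12 of the packet data, which is where the EtherType of an Ethernet frame sits, and instructions 5 and 6 compare it with 1544 (0x0608) and 8 (0x0008). Those are the little-endian views of the big-endian EtherTypes 0x0806 (ARP) and 0x0800 (IPv4). The call at instruction 11 is an unresolved external, so what it does is not visible in the object file. A plain C sketch of just the match condition follows; `frame_is_arp_or_ipv4` is a hypothetical name, and unlike the eBPF program, which always returns 1, it returns the match result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_OFFSET 12 /* after the 6-byte destination and 6-byte source MAC */

/* Returns 1 if the frame's EtherType is ARP (0x0806) or IPv4 (0x0800), else 0. */
int frame_is_arp_or_ipv4(const uint8_t *frame, size_t len)
{
    if (len < ETHERTYPE_OFFSET + 2)
        return 0;
    /* The EtherType is stored in network byte order (big-endian). */
    uint16_t ether_type = (uint16_t)((frame[ETHERTYPE_OFFSET] << 8) |
                                     frame[ETHERTYPE_OFFSET + 1]);
    return ether_type == 0x0806 || ether_type == 0x0800;
}
```

Assembling the value byte by byte keeps the check independent of host endianness, whereas the compiled eBPF compares against byte-swapped constants instead.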

Editor's Notes

  1. Packet classifier: stateful or stateless. Flow pinning, load balancing. Kernel interface for IP for PMD ports. Node-aware resource allocation.
  2. https://github.com/vipinpv85/DPDK_SURICATA-4_1_1 https://github.com/vipinpv85/DPDK-Suricata_3.0
  3. Mem-copy: Pros: XDP buffers are released to the pool immediately after the copy. Cons: limited vector instructions (a large copy is split into multiple smaller copies; HW is limited to 2 loads and 1 store on vector); with 512-bit SIMD a single copy moves only 64 B (512 bits). Zero-copy: Pros: the buffer is already in DPDK buffer format, with no copy or external buffer. Cons: all buffers need to be page aligned; applications need to be adapted; the buffer is held by the application until the packet is dropped or TX completes.
  4. single or multiple primary processes. single primary and single secondary. single primary and multiple secondaries.
  5. https://patches.dpdk.org/cover/50379 https://patches.dpdk.org/cover/50380 https://patches.dpdk.org/cover/50381
  6. https://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
  7. If LD_PRELOAD is set to the path of a shared object, that file is loaded before any other library (including the C runtime, libc.so). To run with a special library (for example a custom malloc): 'LD_PRELOAD=/path/to/my/malloc.so /bin/ls'
  8. Process to fetch stat: 6871
      sysinfo
      uptime: 40936
      loads: 1min (127424) 5min (77472) 15min (42688)
      RAM: free (49962647552) shared (15323136) buffer (440262656)
      swap: total (0) free (0)
      procs: 919
      uptime: 40935.02 3598954.44
      utime: 12301
      stime: 269
      cutime: 0
      cstime: 0
      starttime: 4089122
      --- Calculation ---
      Hertz: 100
      total time (12570)
      sec (45)
      cpu_usage (279)
  9. https://github.com/vipinpv85/DPDK-THREADTRACE-WITHOUTGDB
  10. https://github.com/vipinpv85/DPDK-MEMZONEMONITOR
  11. https://github.com/vipinpv85/DPDK-MALLOCFREE-SCANNER