RSS++: load and state-aware receive side scaling
Tom Barbette, Georgios P. Katsikas, Gerald Q. Maguire Jr., and Dejan Kostić
[Title-slide diagram: packets arriving on a 100G NIC, dispatched across many CPU cores]
Networking today
[Two charts: Ethernet standard speed (Gbps) vs. year and CPU core counts vs. year, both rising sharply from 1980 to 2020. Data from Karl Rupp / Creative Commons Attribution 4.0 International Public License]

How to dispatch dozens of millions of packets per second to many cores?
Hello SoTA!

Sharding, Key-Value Stores:
Minos [NSDI’19], Herd [SIGCOMM’14], MICA [NSDI’14], Chronos [SoCC’12], CPHASH [PPoPP’12]

Sharding, Packet Processing / NFV:
Metron [NSDI’18], NetBricks [OSDI’16], SNF [PeerJ’16], FastClick [ANCS’15], Megapipe [OSDI’12]

Sharding, Network Stacks:
ClickNF [ATC’18], StackMap [ATC’16], mTCP [NSDI’14], F-Stack [Tencent Cloud ’13], Affinity-Accept [EuroSys’12]

How to dispatch dozens of millions of packets per second to many cores?
A sharded testbed

[Diagram: an Ubuntu 18.04 server with a 100G NIC; RSS dispatches packets to Core 1 through Core 18, each running a pinned iPerf 2 server instance, while an iPerf 2 client (iPerf 2 -c) generates 100 TCP flows]
Sharding’s problem: high imbalance
→ Underutilization and high tail latency
RSS++: Rebalance groups of flows from time to time
‱ Much better load spreading
‱ Much lower latency
→ Latency reduced by 30%
→ Tail latency reduced by 5×
RSS++: Rebalance groups of flows from time to time
‱ Much better load spreading
‱ Much lower latency
‱ Opportunity to release 6 cores for other applications
→ 1/3 of resources freed
Receive Side Scaling (RSS)

[Diagram: the NIC hashes each packet’s 5-tuple; the hash indexes an indirection table whose entries (1, 2, 1, 2, 1, 
) name the core, here Core 1 or Core 2, that receives the packet]
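To make the lookup concrete, here is a minimal sketch of RSS dispatching, assuming a 2-core setup and a 128-entry indirection table; Python's built-in hash() stands in for the NIC's hardware (Toeplitz) hash.

```python
NUM_CORES = 2
TABLE_SIZE = 128  # NIC indirection tables typically have 128-512 entries

# Default table: buckets assigned to cores round-robin (1, 2, 1, 2, ...)
indirection_table = [(i % NUM_CORES) + 1 for i in range(TABLE_SIZE)]

def dispatch(five_tuple):
    """Return the core that receives this packet."""
    # All packets of a flow share a 5-tuple, hence a bucket, hence a core.
    bucket = hash(five_tuple) % TABLE_SIZE
    return indirection_table[bucket]

print(dispatch(("10.0.0.1", 1234, "10.0.0.2", 80, "TCP")))  # -> 1 or 2
```

Because the bucket is derived from the flow's 5-tuple, all of a flow's packets land on the same core; this is what makes sharding possible in the first place.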
Receive Side Scaling (RSS)

[Same diagram: the hash indexes the indirection table to pick Core 1 or Core 2]

Hashing (≠ uniform spreading) on mice and elephants
→ High load imbalance

[Trade-off axes: flow-awareness vs. load balancing; RSS is flow-aware but balances load poorly]
An opposite approach: packet-based load-balancing

[Diagram: packets sprayed individually across Core 1 and Core 2]

[Trade-off axes: fine-grained load balancing, but no flow-awareness]
RSS++’s challenge

[Trade-off diagram: flow-awareness vs. fine-grained load balancing, with RSS and RSS++ plotted]

RSS++ strikes the right balance between perfect load spreading and sharding.
RSS++

[Diagram: hash → indirection table; one bucket’s entry is rewritten from Core 1 to Core 2]

Rebalance some RSS buckets from time to time.
RSS++

Rebalance some RSS buckets from time to time.
RSS++ strikes the right balance between perfect load spreading and sharding, by migrating RSS indirection buckets based upon the output of an optimization algorithm, to even out the load.
RSS++

RSS++ strikes the right balance between perfect load spreading and sharding, by migrating RSS indirection buckets based upon the output of an optimization algorithm, to even out the load.
It handles stateful use-cases with a new per-bucket flow-table algorithm that migrates the state along with the buckets.
RSS++ overview

[Diagram: hash → indirection table (2, 2, 1, 2, 1, 
) → Core 1 / Core 2]
RSS++ overview

[Architecture diagram, annotated as follows]
‱ Per-bucket counter tables (e.g. 2421, 2622, 1231, 
, 502, 
, 3112 packets) feed RSS++, together with the per-core CPU load (e.g. 90% and 40%).
‱ Packet counting: on Linux, an XDP [CoNEXT’18] BPF program; with DPDK, an in-app function call.
‱ CPU load: the kernel’s CPU load on Linux; the ratio of useful cycles to application cycles with DPDK.
‱ A balancing timer fires at 10 Hz ~ 1 Hz; a greedy iterative approach recomputes the assignment from per-bucket fractional loads (e.g. 12%, 27%, 8%, 46%, 36%).
‱ The new indirection table is written back through the ethtool API (Linux) or DPDK APIs.

In 85% of the cases, a single run is enough to be within a 0.5% squared-imbalance margin, in 25 ”s.
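One subtlety behind the DPDK load metric above: a busy-polling DPDK core always looks 100% busy to the kernel, so RSS++ uses the fraction of cycles spent on useful work instead. A minimal sketch, with illustrative counter names (not the actual RSS++ API):

```python
def dpdk_cpu_load(useful_cycles, application_cycles):
    """Load of a busy-polling DPDK thread over one balancing interval:
    the kernel would report ~100% for such a thread, so the
    useful-cycles / application-cycles ratio from the slide is used."""
    return useful_cycles / application_cycles

print(dpdk_cpu_load(4_000_000, 10_000_000))  # -> 0.4, i.e. a 40%-loaded core
```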
Stateful use-cases: state migration

‱ RSS++ migrates some RSS buckets
→ Packets from migrated flows need to find their state

[Diagram: Core 1 with flow table #1, Core 2 with flow table #2; a migrated flow’s packets arrive at Core 2 but its state lives in flow table #1: ???]
Stateful use-cases: state migration

‱ RSS++ migrates some RSS buckets
→ Packets from migrated flows need to find their state
‱ Possible approach: a shared flow table
Stateful use-cases: state migration

‱ RSS++ migrates some RSS buckets
→ Packets from migrated flows need to find their state
‱ Possible approach: a shared flow table
‱ RSS++ (DPDK implementation only): one hash table per bucket

[Diagram: the indirection table points each bucket to its own hash table (#1, #2, #3, 
) through a flow-pointer table; when bucket #2 migrates from Core 2 to Core 3, its packets are (nearly never) queued until the previous core has finished handling all packets of bucket #2]
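A sketch of the idea, assuming one Python dict per bucket stands in for the DPDK per-bucket hash tables behind the flow-pointer table:

```python
TABLE_SIZE = 128
# One flow table per RSS bucket; this list plays the flow-pointer table's role.
flow_tables = [dict() for _ in range(TABLE_SIZE)]

def flow_state(five_tuple):
    """All of a flow's packets hash to one bucket, so its state always
    lives in that bucket's table, whichever core currently owns the bucket."""
    bucket = hash(five_tuple) % TABLE_SIZE
    return flow_tables[bucket].setdefault(five_tuple, {})

# Migrating bucket b is then a pointer hand-off: the new core simply starts
# using flow_tables[b]. In the (nearly never taken) slow path, the new core
# queues bucket b's packets until the previous core has drained its in-flight
# packets of b, which prevents reordering without locking the fast path.
```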
Evaluation
Evaluation: load imbalance

Load imbalance = (N_most loaded − N_least loaded) / N_least loaded

15 Gbps trace (~80K active flows/s) replayed towards the DUT
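For clarity, the metric in one line (the per-core loads can be packet counts or CPU loads):

```python
def load_imbalance(core_loads):
    """(N_most_loaded - N_least_loaded) / N_least_loaded, as defined above."""
    return (max(core_loads) - min(core_loads)) / min(core_loads)

print(load_imbalance([900, 500, 300]))  # -> 2.0: the busiest core does 3x the idlest's work
```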
Evaluation: load imbalance of packet-based methods

Packet-based methods have very good balance!

15 Gbps trace (~80K active flows/s) replayed towards the DUT
[Trade-off axes: flow-awareness vs. fine-grained load balancing]
Evaluation: load imbalance of RSS

15 Gbps trace (~80K active flows/s) replayed towards the DUT
[Trade-off axes: flow-awareness vs. fine-grained load balancing]
Evaluation: load imbalance of stateful methods

Without migration, other approaches cannot really do anything good!

15 Gbps trace (~80K active flows/s) replayed towards the DUT
[Trade-off axes: flow-awareness vs. fine-grained load balancing]
Evaluation: load imbalance of RSS++

×12 better (avg. ~×5)

15 Gbps trace (~80K active flows/s) replayed towards the DUT
[Trade-off axes: flow-awareness vs. fine-grained load balancing]
Service chain at 100G: FW+NAT

15 Gbps trace (~80K active flows/s) accelerated up to 100 Gbps; 39K-rule firewall

RSS is not able to fully utilize new cores
Service chain at 100G: FW+NAT

15 Gbps trace (~80K active flows/s) accelerated up to 100 Gbps; 39K-rule firewall

RSS is not able to fully utilize new cores
RSS++ shows linear improvement with the number of cores
Service chain at 100G: FW+NAT

15 Gbps trace (~80K active flows/s) accelerated up to 100 Gbps; 39K-rule firewall

RSS is not able to fully utilize new cores
RSS++ shows linear improvement with the number of cores
Sharing state between cores leads to poor performance
Conclusion

State-aware, NIC-assisted scheduling to solve a problem that will only get worse:
– No dispatching cores
– Sharded approach (no OS scheduling)
A new state migration technique:
– Minimal state “transfer”
– No lock in the datapath
→ Up to 14× lower 95th-percentile latency, no drops, and 25-37% fewer cores

Linux (via kernel API + small patch) and DPDK implementations, fully available, with all experiment scripts
Thanks!

github.com/rsspp/

In the paper:
– How the solver works
– More evaluations
> Particularly tail-latency studies
> Comparison with Metron’s traffic-class dispatching
– More state of the art
– Future work
– Discussion of use in other contexts:
> KVS load-balancing
> Dispatching using multiple cores in a pipeline
> NUMA
– Trace analysis

This work is supported by SSF and ERC.
Backup slides
SOTA: solutions for RSS’s imbalance

‱ Sprayer [HotNets’18] / RPCValet [SOSP’19]
– Forget about flows; do per-packet dispatching
→ Stateful use cases are dead
→ Even stateless ones are sometimes inefficient
‱ Metron [NSDI’18]
– Compute traffic classes, and split/merge classes among cores
→ Misses load-awareness; traffic classes may not hash uniformly
‱ Affinity-Accept [EuroSys’12]
– Redirect connections in software to other cores, and re-program some RSS entries when they contain mostly redirected connections
→ Load imbalance at best as good as “SW Stateful Load” → we need migration
→ Software dispatching to some extent
SOTA: intra-server LB

Dispatcher cores (Shinjuku*, Shenango): still need RSS++ to dispatch to the many dispatching cores needed for 100G → inefficient
Shuffling layer (ZygOS*, Affinity-Accept, Linux): why pay for cache misses when the NIC can do it? They do not support migration → high imbalance
*But we miss the mixing of multiple applications on a single core
Our contributions

‱ We solve the packet dispatching problem by migrating the RSS indirection buckets between shards, based upon the output of an optimization algorithm
– Without the need for dispatching cores
‱ Dynamically scale the number of cores
→ Avoids the typical 25% over-provisioning
→ Order-of-magnitude lower tail latency
‱ Compensate for occasional state migration with a new stateful per-bucket flow-table algorithm:
– Prevents packet reordering during migration
– 20% more efficient than a shared flow table
→ Stateful, near-perfect intra-server load-balancing, even at the speed of 100 Gbps links
Backup slides

RSS++ Algorithm
RSS++ algorithm

[Diagram: CPU 1 (90% load) and CPU 2 (40% load), each with a per-bucket counting table (3112, 2421, 502 packets on CPU 1; 1231, 2622 on CPU 2)]
RSS++ algorithm

[Same diagram, now computing per-bucket fractional loads]
Bucket #1 load: 1231 / (1231 + 2622) = 31% of CPU 2’s packets
Bucket #1 fractional load: 31% × 40% = 12%
All fractional loads: 12%, 27%, 8%, 46%, 36%
Average CPU load: 65% → CPU 1 (90%) is 25% above, CPU 2 (40%) is 25% below
The RSS++ problem solver proposes a new indirection table (2, 2, 1, 2, 1, 
) leaving CPU 1 at 82% and CPU 2 at 48% (+17% / −17% from the average).
RSS++ algorithm

[Same computation; an alternative solver outcome moves a larger (36%) bucket instead, leaving CPU 1 at 54% and CPU 2 at 76% (−11% / +11% from the average)]

In 85% of the cases, a single run is enough to be within a 0.5% squared-imbalance margin, in 25 ”s.
Solver

If you like math, go to the paper.
We use a greedy, non-optimal approach because:
– We don’t care about the optimal solution
– The state of the art showed resolution times that are too slow for multi-way number partitioning
Greedy approach

1. Sort buckets by descending fractional load
2. Sort underloaded cores by ascending load
3. Dispatch the most loaded buckets to underloaded cores, allowing over-moves by a threshold
4. Restart up to 10 times using different thresholds to find an inflection point (a sketch follows below)

In 85% of the cases, a single run is enough to be within a 0.5% squared-imbalance margin, in 25 ”s.
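A simplified, single-threshold sketch of steps 1-3 (step 4, retrying with different thresholds and keeping the best outcome, is omitted for brevity); the input layout flattens the fractional loads computed earlier into (bucket, core, load) triples:

```python
def greedy_rebalance(buckets, cpu_load, threshold=0.02):
    """buckets: (bucket_id, current_core, fractional_load) triples.
    Returns {bucket_id: destination_core}; sketch of steps 1-3 only."""
    avg = sum(cpu_load.values()) / len(cpu_load)
    # step 2: spare room on each underloaded core, ascending load first
    room = {core: avg - load
            for core, load in sorted(cpu_load.items(), key=lambda cl: cl[1])
            if load < avg}
    # step 1: buckets living on overloaded cores, heaviest first
    movable = sorted((b for b in buckets if cpu_load[b[1]] > avg),
                     key=lambda b: b[2], reverse=True)
    moves = {}
    for bucket_id, _src, frac in movable:  # step 3
        target = max(room, key=room.get, default=None)
        if target is None or frac > room[target] + threshold:
            continue  # would overshoot the target core by more than allowed
        moves[bucket_id] = target
        room[target] -= frac
    return moves

# The earlier example: CPU 1 at 90%, CPU 2 at 40%, average 65%.
print(greedy_rebalance(
    [("b0", 1, 0.46), ("b1", 1, 0.36), ("b2", 1, 0.08),
     ("b3", 2, 0.13), ("b4", 2, 0.27)],
    {1: 0.90, 2: 0.40}))
# -> {'b2': 2}: moving the ~8% bucket yields 82% / 48%,
#    the first outcome shown on the algorithm slides
```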
Stateful use-cases: state migration

‱ RSS++ migrates some RSS buckets
→ Packets from migrated flows need to find their state
‱ Possible approach: a shared, as-efficient-as-possible hash table

[Diagram: indirection table (2, 2, 1, 2, 1, 
) with CPU 1 and CPU 2 contending on a single shared hash table: BANG]
RSS++: Rebalance some RSS buckets from time to time

30% lower average latency
4-5× lower standard deviation and tail latency
Backup slides

RSS++ Implementation
LibNICScheduler
Backup slides

Evaluation

[Several graph-only slides]
Evaluation: load imbalance of Metron

[Graph: load imbalance with RSS, RSS++, RR and Sprayer, plus stateful methods]
[Graph-only slide] CPU frequency fixed at 1 GHz, doing some fixed artificial per-packet workload
Evaluation: state migration

Forwarding 1500-byte UDP packets from 1024 concurrent flows of 1000 packets each, classified in either a unique thread-safe Cuckoo hash table or a per-bucket hash table
Evaluation: firewall only

Trace accelerated up to 100 Gbps

‱ RSS cannot always fully utilize more cores, due to load imbalance
‱ Even for a stateless case, a packet-based approach is harmful to the cache
Evaluation: 39K-rule firewall at 100G

Trace accelerated up to 100 Gbps

‱ Even for a stateless case, a packet-based approach is harmful to the cache
‱ We need hardware dispatching
Stateful evaluation at 100G: FW+NAT+DPI

Hyperscan [Wang 2019]
NFV Evaluation

Backup slides

RSS Video
Why does RSS++ work?

[Diagram: hash → indirection table (2, 2, 1, 2, 1, 
) → CPU 1 / CPU 2]
Why does RSS++ work?

[Animation: the same indirection table (1, 2, 1, 2, 1, 
) shown across three steps]
2020-01-02 67
5
4
3
2
1
8
7
6
3
2
1
6
5
4
1
8
7
4
3
2
7
6
5
2
1
8


Numberofpackets
Bucket index
Watch RSS live!

The Internet is not random 
‱ Buckets have up to ×1000 imbalance
‱ There is stickiness over time
→ The solution at t0 is mostly valid at t1
Backup slides

Discussion

Why sharding in Linux?
‱ Unpublished result of a 3-second sampling
‱ Still much to do to take real advantage of sharding
Multiple applications

‱ To keep all the advantages of sharding, one should slightly modify our implementation to use a set of RSS queues per application, and exchange cores through a common pool of available cores
‱ Another idea would be to combine slow applications on one core, reducing the cost of polling
Multiple NICs

‱ One would need to determine how much of the actual load is due to which input
Background noise

‱ A small amount of background noise will push the measured load higher, and buckets will therefore get evicted
‱ A high background noise would require modifying the algorithm to subtract it from the capacity of a CPU: note that a CPU may be at 60% load out of a 70% usable share, or else its “bucket fractional load” will be disproportionate to the load of the other cores.
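A hypothetical sketch of that adjustment, using the slide's 60%-out-of-70% example (the function and its signature are illustrative, not part of RSS++):

```python
def effective_load(measured_load, background_load):
    """Load relative to the capacity left after background noise, so that
    bucket fractional loads stay comparable across cores."""
    usable_capacity = 1.0 - background_load
    return measured_load / usable_capacity

print(effective_load(0.60, 0.30))  # 60% measured out of a 70% share -> ~0.86
```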
Oscillation

‱ We don’t care 
Speaker notes

1. (Do not say the names if the chair introduces me; otherwise, after “joint work”, do not mention myself.)
2. Hundred-gigabit NICs are becoming a commodity in datacenters. Those NICs have to dispatch dozens of millions of packets to many-core CPUs. CLICK And both of those numbers, Ethernet speeds and core counts, are increasing dramatically. So the question I’ll address in this talk (“how to dispatch 
”), which is already a problem today, will be even more of a problem tomorrow.
3. If we look at the recent SOTA in high-speed software networking, a lot of recent work on key-value stores CLICK and on packet processing and network function virtualization advocates the use of sharding, as do all recent network stacks, which are sharded. CLICK
4. So what is this sharding about? To answer that, I’ll show you our sharded testbed. We have a computer with 18 cores and a hundred-gigabit NIC. We configure the NIC so it dispatches packets to 18 queues, one per core. On each core we run an instance of the application, in our case iPerf 2. The application is pinned to the core, and that’s the idea of sharding: the computer is divided into independent shards; one can almost consider each core a different server. The advantage is that we avoid any shared data structure and any contention between CPU cores. If there were no problem with sharding, we would not have a paper today. CLICK So, to showcase the problem, we run an iPerf client that requests 100 TCP flows. CLICK One important point: the NIC dispatches packets to the cores using RSS, basically hashing packets so that packets of the same flow go to the same core.
5.-6. Sprayer [Hugo Sadok 2018], HotNets.
7.-10. We see again, now that the load is higher, that RSS is still not able to fully utilize new cores, even with 6 more cores than RSS++.
11. 20% more efficient; an order of magnitude lower latency with a high number of cores.
12. With this I thank you for listening, and I will be happy to take any questions you may have.
13. 25: no joke.
14. Do this in an animation.
15. One library for NIC-driven scheduling, with multiple scheduling strategies, one of them being RSS++. Two integrations: Linux, reading packets using an XDP BPF program and writing the indirection table using the ethtool API; and DPDK, counting packets through function calls and programming the NIC with DPDK’s API.
16. 20% more efficient; an order of magnitude better latency.
17.-18. “Controlling Parallelism in a Multicore Software Router”. TODO: limit at 100G.
19. TODO: split into multiple graphs. TODO: numbers.
20. If we look at the number of packets received by each bucket, mapped as per a default indirection table, we can see that the number of packets received by each core is very disproportionate. Moreover, the load of each bucket is not completely random: some buckets tend to be highly loaded, or to stay loaded for some time. So what we propose in RSS++ is to migrate a few of those overloaded buckets, from time to time, to even the load between all CPUs.