This document discusses Mellanox's Efficient Virtual Network (EVN) solution for service providers. It begins with an overview of Mellanox's end-to-end interconnect solutions and portfolio. It then discusses how the cloud-native NFV architecture requires an efficient virtual network. The EVN is introduced as the foundation for efficient telco cloud infrastructure. The document provides details on how SR-IOV and DPDK can be used together with Mellanox NICs to achieve near line-rate performance with minimal CPU overhead. It also discusses how overlay networks can be accelerated using the overlay offload engines in the NICs. Benchmark results show the EVN approach achieving higher performance and lower CPU utilization compared to alternative solutions.
Another aspect of overlay networking is the switch VXLAN tunnel endpoint (VTEP).
Our approach is that the VTEP should be implemented on the NIC whenever possible, as that allows better scaling and performance.
But in some cases, such as connecting bare-metal servers or bridging VXLAN to VLAN networks, that is not possible.
In those cases the VTEP needs to be implemented on the switch rather than on the NIC.
We will start supporting switch VTEP together with our Cumulus release in 2Q16. MLNX-OS will follow later this year.
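As a rough illustration, here is a minimal sketch, using pyroute2, of the kind of VXLAN-to-VLAN stitching that a switch-based VTEP performs in hardware. The interface names, VNI, and address are hypothetical, and on a real switch this would normally be expressed through the switch configuration rather than scripted like this:

```python
# Hypothetical example: terminate VNI 10100 and bridge it to a VLAN-facing port.
from pyroute2 import IPRoute

ipr = IPRoute()

# VXLAN device terminating VNI 10100 on local VTEP address 10.0.0.1 (assumed values)
ipr.link("add", ifname="vxlan10100", kind="vxlan",
         vxlan_id=10100, vxlan_local="10.0.0.1", vxlan_port=4789)

# Bridge that stitches the VXLAN segment to a VLAN-facing front-panel port (swp1)
ipr.link("add", ifname="br100", kind="bridge")
br = ipr.link_lookup(ifname="br100")[0]
for name in ("vxlan10100", "swp1"):
    idx = ipr.link_lookup(ifname=name)[0]
    ipr.link("set", index=idx, master=br, state="up")
```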
This is how packets are forwarded by OVS in a paravirtualized environment.
Both the vswitchd slow path and the OVS kernel module fast path run on the CPU.
OVS performs flow-based forwarding: the first packet of a new flow hits the OVS kernel module, results in a match miss, and is punted to the user-space vswitch daemon. The vswitchd resolves the flow entry, possibly with help from an SDN controller, and programs that flow entry into the fast path, the OVS kernel module. Subsequent packets in the same flow hit the flow entry programmed into the OVS kernel module, and packet forwarding is then executed in the kernel fast path.
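To make that concrete, here is a minimal, purely conceptual sketch in Python (not the real OVS code or APIs): the kernel datapath keeps an exact-match flow table, a miss is punted to the vswitch daemon, and the daemon resolves the flow and programs the fast path.

```python
# Conceptual sketch of OVS flow-based forwarding (slow path + fast path).
fast_path = {}   # kernel datapath flow table: flow key -> actions

def vswitchd_upcall(flow_key):
    actions = ["output:vnet0"]     # resolved, possibly with help from an SDN controller
    fast_path[flow_key] = actions  # program the flow entry into the kernel fast path
    return actions

def kernel_receive(packet):
    key = (packet["src"], packet["dst"], packet["proto"])
    actions = fast_path.get(key)
    if actions is None:            # first packet of a new flow: match miss
        actions = vswitchd_upcall(key)
    print("forwarding", packet, "via", actions)

pkt = {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "tcp"}
kernel_receive(pkt)   # slow path
kernel_receive(pkt)   # fast path
```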
With OVS Offload, we add the fast and efficient eSwitch in the NIC into the picture. This is how networking on routers and switches has evolved over the last 20 to 25 years. As I said, no router or switch from any reputable networking vendor today does packet forwarding with the CPU anymore. Instead, packet processing and forwarding are offloaded to a hardware fast path, normally implemented in ASICs or network processors.
A new flow results in a "miss" action in the eSwitch and is directed to the OVS kernel module
A miss in the kernel punts the packet to OVS-vswitchd in user space
OVS-vswitchd resolves the flow entry and, based on a policy decision to offload, propagates it to the corresponding eSwitch tables for offload-enabled flows
Subsequent frames of offload-enabled flows are processed and forwarded by the eSwitch
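The same conceptual sketch can be extended to the offloaded path; again, this is an illustration under assumed names, not the actual OVS, driver, or eSwitch interfaces. The eSwitch flow table sits in front of the kernel datapath, and the daemon decides per flow whether to program the eSwitch.

```python
# Conceptual sketch of OVS Offload: eSwitch fast path with kernel/vswitchd fallback.
eswitch_table = {}   # hardware flow table in the NIC eSwitch
kernel_table = {}    # kernel datapath flow table (software fallback)

def offload_policy(flow_key):
    return True      # e.g. offload everything except flows kept in software on purpose

def vswitchd_resolve(flow_key):
    actions = ["output:vf3"]                # placeholder decision
    kernel_table[flow_key] = actions        # program the kernel fast path
    if offload_policy(flow_key):
        eswitch_table[flow_key] = actions   # program the eSwitch (offload-enabled flow)
    return actions

def eswitch_receive(flow_key):
    if flow_key in eswitch_table:
        return ("hardware", eswitch_table[flow_key])   # forwarded by the eSwitch
    # "miss" in the eSwitch: packet is directed to the OVS kernel module
    actions = kernel_table.get(flow_key)
    if actions is None:
        actions = vswitchd_resolve(flow_key)           # kernel miss: punt to vswitchd
    return ("software", actions)

flow = ("10.0.0.1", "10.0.0.2", "tcp")
print(eswitch_receive(flow))   # first packet of the flow: resolved in software
print(eswitch_receive(flow))   # subsequent packets: forwarded by the eSwitch
```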
SR-IOV and OVS used to be like oil and water: they did not mix.
The architecture design takes into consideration both VMs directly attached to Virtual Functions (VFs) and paravirtualized (PV) VMs
VF representors are netdevs that model the eSwitch ports
The VF representor supports the following operations:
Flow configuration
Flow statistics read
Packet send/receive (from the host CPU to the VF)
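As a rough conceptual model only (not a real driver interface), a VF representor can be thought of as an object that stands in for one eSwitch port and exposes exactly those three operations:

```python
# Hypothetical model of a VF representor; names and fields are illustrative only.
class VFRepresentor:
    def __init__(self, vf_index):
        self.vf_index = vf_index
        self.flows = {}    # flow entries offloaded to this eSwitch port
        self.stats = {}    # per-flow counters maintained by the NIC

    def configure_flow(self, key, actions):
        """Flow configuration: program an offloaded flow for this eSwitch port."""
        self.flows[key] = actions
        self.stats[key] = {"packets": 0, "bytes": 0}

    def read_flow_stats(self, key):
        """Flow statistics read: query the counters for an offloaded flow."""
        return self.stats[key]

    def send(self, packet):
        """Send a packet from the host CPU toward the VF (slow-path traffic)."""
        ...

    def receive(self):
        """Receive a packet punted from the VF to the host CPU on an eSwitch miss."""
        ...

rep = VFRepresentor(vf_index=3)
rep.configure_flow(("10.0.0.1", "10.0.0.2", "tcp"), ["output:uplink"])
print(rep.read_flow_stats(("10.0.0.1", "10.0.0.2", "tcp")))
```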
There are three immediate benefits here, as you can expect: much higher performance, significantly lower CPU overhead, and everything remains software defined.
High performance refers to high throughput, not only bit throughput but packet throughput, i.e. how fast the system can process packets, which is really important for virtualized network functions, especially real-time multimedia applications. High performance is also reflected in low and deterministic latency.
Offload is only getting more important as server I/O speed goes up. At 10G I/O, you are looking at a theoretical max of about 15 million packets per second, and if you throw in a few CPU cores, you might be able to achieve a decent packet rate with software forwarding. We've heard numbers like 9 million packets per second with 4 CPU cores. But at 25G, the theoretical max packet rate is 37.5 million pps; at 40G, 60 million pps; and at 100G, 150 million pps. Software will not be able to catch up even if you are willing to dedicate all your CPU cores to processing packets instead of actually running applications to do service processing.
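For reference, the arithmetic behind those figures can be reproduced in a few lines of Python, assuming minimum-size 64-byte Ethernet frames plus the preamble, start delimiter, and inter-frame gap that each frame occupies on the wire:

```python
# Theoretical max packet rate for minimum-size frames at common link speeds.
WIRE_BYTES_PER_FRAME = 64 + 7 + 1 + 12   # 64B frame + preamble + SFD + IFG = 84B

for gbps in (10, 25, 40, 100):
    pps = gbps * 1e9 / (WIRE_BYTES_PER_FRAME * 8)
    print(f"{gbps:>3} GbE: {pps / 1e6:5.1f} Mpps")

# Prints roughly 14.9, 37.2, 59.5, and 148.8 Mpps, i.e. about the
# 15M / 37.5M / 60M / 150M pps figures quoted above.
```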
OVS offload lowers your CPU overhead and frees the CPU from packet processing so it can run applications, resulting in higher infrastructure efficiency.
The hardware offloads are transparent to the user, and the Open vSwitch interfaces remain untouched, so users don't need to change anything in their environment. The interaction between the SDN controller and OVS remains the same, and you get the best of both worlds: SDN at full speed.
Last but not least, we are contributing all changes to OVS and OpenStack upstream, so you don't need to run yet another proprietary version of OVS to take advantage of these offload capabilities.
Now that you understand how the hyperscalers build their cloud network infrastructure, you might start thinking: OK, that is great, but I don't have the manpower and the large number of software developers to follow this model. The good news is that Mellanox and our ecosystem partners make things easy for you. We have a solution called Open Composable Networks that provides a set of high-performance, highly programmable networking components, including switches, server adapters, optical modules and cables, and network processors, which support open APIs such as SAI and switchdev for Linux. On top of these standard interfaces, you have a slew of network operating system and software application choices. As a matter of fact, at this year's OCP Summit last month, we did a live demo of 5 different network operating systems running over our flagship Spectrum switches. We also provide the middleware that makes it easy to compose your ideal cloud network infrastructure, and simple to monitor, manage, and scale.
Illustrated with an OLDP (On-Line Data Processing) workload using a modified mysqlslap load-testing tool.
Memcached's implementation uses sockets; we show the advantages one can get by using RDMA in this environment.
KQ/s stands for Kilo Queries per Second
The above numbers were generated on InfiniBand QDR.
The IOM module takes in packets from the service provider gateway router (north-south traffic) and distributes them to one of the datapath modules, such as the WSM (east-west traffic).
The WSM processes the packets and sends them back to the IOM, which in turn sends the traffic back to the SP router.
For the IOM to take in X Gbps of traffic from the SP router, the server I/O must be able to handle 2X Gbps, because every packet crosses the IOM server's NIC twice: once arriving from the SP router and once going to or coming back from the WSM.
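In other words (the figure below is hypothetical; only the 2X relationship matters):

```python
# Illustrative only: the server I/O needed by the IOM for a given SP-router ingress.
# X Gbps arrives from the SP router and the same X Gbps is forwarded on to the WSM
# (and likewise on the return path), so the NIC must carry 2X Gbps.
def required_server_io_gbps(ingress_from_sp_router_gbps):
    return 2 * ingress_from_sp_router_gbps

print(required_server_io_gbps(40))   # e.g. 40 Gbps from the router needs 80 Gbps of server I/O
```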