This document discusses Mellanox's Efficient Virtual Network (EVN) solution for service providers. It begins with an overview of Mellanox's end-to-end interconnect solutions and portfolio. It then discusses how the cloud-native NFV architecture requires an efficient virtual network. The EVN is introduced as the foundation for efficient telco cloud infrastructure. The document provides details on how SR-IOV and DPDK can be used together with Mellanox NICs to achieve near line-rate performance with minimal CPU overhead. It also discusses how overlay networks can be accelerated using the overlay offload engines in the NICs. Benchmark results show the EVN approach achieving higher performance and lower CPU utilization compared to alternative solutions.
Another aspect of overlay networking is the switch VXLAN tunnel endpoint (VTEP).
Our approach is that the VTEP should be implemented on the NIC whenever possible, as that allows better scaling and performance.
But in some cases, such as connecting bare-metal servers or bridging VXLAN to VLAN networks, that is not possible.
In those cases the VTEP needs to be implemented on the switch rather than on the NIC.
We will start supporting switch VTEP together with our Cumulus release in 2Q16. MLNX-OS will follow later this year.
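As a rough illustration, here is a minimal sketch, using pyroute2, of the kind of VXLAN-to-VLAN stitching that a switch-based VTEP performs in hardware. The interface names, VNI, and address are hypothetical, and on a real switch this would normally be expressed through the switch configuration rather than scripted like this:

```python
# Hypothetical example: terminate VNI 10100 and bridge it to a VLAN-facing port.
from pyroute2 import IPRoute

ipr = IPRoute()

# VXLAN device terminating VNI 10100 on local VTEP address 10.0.0.1 (assumed values)
ipr.link("add", ifname="vxlan10100", kind="vxlan",
         vxlan_id=10100, vxlan_local="10.0.0.1", vxlan_port=4789)

# Bridge that stitches the VXLAN segment to a VLAN-facing front-panel port (swp1)
ipr.link("add", ifname="br100", kind="bridge")
br = ipr.link_lookup(ifname="br100")[0]
for name in ("vxlan10100", "swp1"):
    idx = ipr.link_lookup(ifname=name)[0]
    ipr.link("set", index=idx, master=br, state="up")
```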
This is how packets are forwarded by OVS in a paravirtualized environment.
Both the vswitchd slow path and the OVS kernel module fast path run on the CPU.
OVS performs flow-based forwarding: the first packet of a new flow hits the OVS kernel module, results in a match miss, and is punted to the user-space vswitch daemon. The vswitchd resolves the flow entry, possibly with help from an SDN controller, and programs that flow entry into the fast path, the OVS kernel module. Subsequent packets in the same flow hit the flow entry programmed into the OVS kernel module, and packet forwarding is then executed in the kernel fast path.
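To make that concrete, here is a minimal, purely conceptual sketch in Python (not the real OVS code or APIs): the kernel datapath keeps an exact-match flow table, a miss is punted to the vswitch daemon, and the daemon resolves the flow and programs the fast path.

```python
# Conceptual sketch of OVS flow-based forwarding (slow path + fast path).
fast_path = {}   # kernel datapath flow table: flow key -> actions

def vswitchd_upcall(flow_key):
    actions = ["output:vnet0"]     # resolved, possibly with help from an SDN controller
    fast_path[flow_key] = actions  # program the flow entry into the kernel fast path
    return actions

def kernel_receive(packet):
    key = (packet["src"], packet["dst"], packet["proto"])
    actions = fast_path.get(key)
    if actions is None:            # first packet of a new flow: match miss
        actions = vswitchd_upcall(key)
    print("forwarding", packet, "via", actions)

pkt = {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": "tcp"}
kernel_receive(pkt)   # slow path
kernel_receive(pkt)   # fast path
```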
With OVS Offload, we add the fast and efficient eSwitch in the NIC into the picture. This is how networking on routers and switches has evolved over the last 20 to 25 years. As I said, no router or switch from any reputable networking vendor today does packet forwarding with the CPU anymore. Instead, packet processing and forwarding are offloaded to a hardware fast path, normally implemented in ASICs or network processors.
A new flow results in a "miss" action in the eSwitch and is directed to the OVS kernel module
A miss in the kernel punts the packet to OVS-vswitchd in user space
OVS-vswitchd resolves the flow entry and, based on a policy decision to offload, propagates it to the corresponding eSwitch tables for offload-enabled flows
Subsequent frames of offload-enabled flows are processed and forwarded by the eSwitch
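The same conceptual sketch can be extended to the offloaded path; again, this is an illustration under assumed names, not the actual OVS, driver, or eSwitch interfaces. The eSwitch flow table sits in front of the kernel datapath, and the daemon decides per flow whether to program the eSwitch.

```python
# Conceptual sketch of OVS Offload: eSwitch fast path with kernel/vswitchd fallback.
eswitch_table = {}   # hardware flow table in the NIC eSwitch
kernel_table = {}    # kernel datapath flow table (software fallback)

def offload_policy(flow_key):
    return True      # e.g. offload everything except flows kept in software on purpose

def vswitchd_resolve(flow_key):
    actions = ["output:vf3"]                # placeholder decision
    kernel_table[flow_key] = actions        # program the kernel fast path
    if offload_policy(flow_key):
        eswitch_table[flow_key] = actions   # program the eSwitch (offload-enabled flow)
    return actions

def eswitch_receive(flow_key):
    if flow_key in eswitch_table:
        return ("hardware", eswitch_table[flow_key])   # forwarded by the eSwitch
    # "miss" in the eSwitch: packet is directed to the OVS kernel module
    actions = kernel_table.get(flow_key)
    if actions is None:
        actions = vswitchd_resolve(flow_key)           # kernel miss: punt to vswitchd
    return ("software", actions)

flow = ("10.0.0.1", "10.0.0.2", "tcp")
print(eswitch_receive(flow))   # first packet of the flow: resolved in software
print(eswitch_receive(flow))   # subsequent packets: forwarded by the eSwitch
```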
SR-IOV and OVS used to be like oil and water: they did not mix.
The architecture design takes into consideration both VMs directly attached to Virtual Functions (VFs) and paravirtualized (PV) VMs
VF representors are netdevs that model the eSwitch ports
The VF representor supports the following operations:
Flow configuration
Flow statistics read
Packet send/receive (from the host CPU to the VF)
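As a rough conceptual model only (not a real driver interface), a VF representor can be thought of as an object that stands in for one eSwitch port and exposes exactly those three operations:

```python
# Hypothetical model of a VF representor; names and fields are illustrative only.
class VFRepresentor:
    def __init__(self, vf_index):
        self.vf_index = vf_index
        self.flows = {}    # flow entries offloaded to this eSwitch port
        self.stats = {}    # per-flow counters maintained by the NIC

    def configure_flow(self, key, actions):
        """Flow configuration: program an offloaded flow for this eSwitch port."""
        self.flows[key] = actions
        self.stats[key] = {"packets": 0, "bytes": 0}

    def read_flow_stats(self, key):
        """Flow statistics read: query the counters for an offloaded flow."""
        return self.stats[key]

    def send(self, packet):
        """Send a packet from the host CPU toward the VF (slow-path traffic)."""
        ...

    def receive(self):
        """Receive a packet punted from the VF to the host CPU on an eSwitch miss."""
        ...

rep = VFRepresentor(vf_index=3)
rep.configure_flow(("10.0.0.1", "10.0.0.2", "tcp"), ["output:uplink"])
print(rep.read_flow_stats(("10.0.0.1", "10.0.0.2", "tcp")))
```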
There are three immediate benefits here, as you can expect: much higher performance, significantly lower CPU overhead, and everything remains software defined.
High performance refers to high throughput, not only bit throughput but packet throughput, i.e. how fast the system can process packets, which is really important for virtualized network functions, especially real-time multimedia applications. High performance is also reflected in low and deterministic latency.
Offload is only getting more important as server I/O speed goes up. At 10G I/O, you are looking at a theoretical max of about 15 million packets per second, and if you throw in a few CPU cores, you might be able to achieve a decent packet rate with software forwarding. We've heard numbers like 9 million packets per second with 4 CPU cores. But at 25G, the theoretical max packet rate is 37.5 million pps; at 40G, 60 million pps; and at 100G, 150 million pps. Software will not be able to catch up even if you are willing to dedicate all your CPU cores to processing packets instead of actually running applications to do service processing.
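For reference, the arithmetic behind those figures can be reproduced in a few lines of Python, assuming minimum-size 64-byte Ethernet frames plus the preamble, start delimiter, and inter-frame gap that each frame occupies on the wire:

```python
# Theoretical max packet rate for minimum-size frames at common link speeds.
WIRE_BYTES_PER_FRAME = 64 + 7 + 1 + 12   # 64B frame + preamble + SFD + IFG = 84B

for gbps in (10, 25, 40, 100):
    pps = gbps * 1e9 / (WIRE_BYTES_PER_FRAME * 8)
    print(f"{gbps:>3} GbE: {pps / 1e6:5.1f} Mpps")

# Prints roughly 14.9, 37.2, 59.5, and 148.8 Mpps, i.e. about the
# 15M / 37.5M / 60M / 150M pps figures quoted above.
```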
OVS offload lowers your CPU overhead and frees the CPU from packet processing so it can run applications, resulting in higher infrastructure efficiency.
The hardware offloads are transparent to the user, and the Open vSwitch interfaces remain untouched, so users don't need to change anything in their environment. The interaction between the SDN controller and OVS remains the same, and you get the best of both worlds: SDN at full speed.
Last but not least, we are contributing all changes to OVS and OpenStack upstream, so you don't need to run yet another proprietary version of OVS to take advantage of these offload capabilities.
Now that you understand how the hyperscalers build their cloud network infrastructure, you might start thinking: OK, that is great, but I don't have the manpower and the large number of software developers to follow this model. The good news is that Mellanox and our ecosystem partners make things easy for you. We have a solution called Open Composable Networks that provides a set of high-performance, highly programmable networking components, including switches, server adapters, optical modules and cables, and network processors, which support open APIs such as SAI and switchdev for Linux. On top of these standard interfaces, you have a slew of network operating system and software application choices. As a matter of fact, at this year's OCP Summit last month, we did a live demo of 5 different network operating systems running over our flagship Spectrum switches. We also provide the middleware that makes it easy to compose your ideal cloud network infrastructure, and simple to monitor, manage, and scale.
Illustrated with an OLDP (On-Line Data Processing) workload using a modified mysqlslap load-testing tool.
Memcached's implementation uses sockets; we show the advantages one can get by using RDMA in this environment.
KQ/s stands for Kilo Queries per Second
The above numbers were generated on InfiniBand QDR.
The IOM module takes in packets from the service provider gateway router (north-south traffic) and distributes them to one of the datapath modules, such as the WSM (east-west traffic).
The WSM processes the packets and sends them back to the IOM, which in turn sends the traffic back to the SP router.
For the IOM to take in X Gbps of traffic from the SP router, the server I/O must be able to handle 2X Gbps, because every packet crosses the IOM server's NIC twice: once arriving from the SP router and once going to or coming back from the WSM.
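In other words (the figure below is hypothetical; only the 2X relationship matters):

```python
# Illustrative only: the server I/O needed by the IOM for a given SP-router ingress.
# X Gbps arrives from the SP router and the same X Gbps is forwarded on to the WSM
# (and likewise on the return path), so the NIC must carry 2X Gbps.
def required_server_io_gbps(ingress_from_sp_router_gbps):
    return 2 * ingress_from_sp_router_gbps

print(required_server_io_gbps(40))   # e.g. 40 Gbps from the router needs 80 Gbps of server I/O
```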