DPDK Summit - 08 Sept 2014 - VMware and Intel - Using DPDK In A Virtual World

Bhavesh Davda (VMware) and Rashmin Patel (Intel) present "Using DPDK in a Virtual World." They cover the challenges of and approaches to virtualizing the data plane, new VMware ESXi key features, and performance results.


  1. DPDK in a Virtual World (DPDK Summit 2014). Bhavesh Davda (Sr. Staff Engineer, CTO Office, VMware) and Rashmin Patel (DPDK Virtualization Engineer, Intel).
  2. Agenda: Data Plane Virtualization Trends; DPDK Virtualization Support; VMware ESXi: Key Features and Performance.
  3. Agenda (repeated as a section divider).
  4. Data Plane Virtualization Trends [three architecture diagrams, summarized]:
     • Virtualization for Directed I/O: packets are routed to the virtual machine using DirectPath I/O (Intel VT-d). Limited flexibility but native performance; suited to standalone appliance integration such as firewall, WAN acceleration and traffic shaping.
     • Hybrid switching: combines vSwitch support with direct assignment of an SR-IOV Virtual Function; suited to service chaining, unified threat management, and intrusion detection/prevention.
     • Optimized virtual switching: combines flexibility with performance, supporting live migration alongside data plane performance; increases flexibility through high-performance soft switching for both communications and compute workloads.
  5. Agenda (repeated as a section divider).
  6. DPDK Virtualization Support (a port-setup sketch in C appears after the slide list):
     • Guest-side DPDK PMDs: Niantic 82599 Virtual Function PMD; Fortville Virtual Function PMD; E1000 emulated-device PMDs (Intel 82540, 82545EM and 82574/E1000E Ethernet controllers); virtio para-virtual device PMD (Qumranet device); IVSHMEM shared-memory interface; virtqueue/grant-table interface in a Xen DomU; VMXNET3 para-virtual device PMD.
     • Hypervisor/host side: Niantic 82599 Physical Function driver; Fortville Physical Function driver; userspace vhost backend support; IVSHMEM backend support.
     • [Diagram: Fedora, Ubuntu and CentOS VMs running DPDK over a software switch and VT-d on an Intel Architecture hypervisor.]
  7. VMware ESXi Virtual Interfaces:
     • vSS/vDS/NSX: a software switch.
     • Emulated devices: E1000 (Intel 82545EM), E1000E (Intel 82574), VLance (AMD PCnet32).
     • Para-virtual devices: VMXNET, VMXNET2, VMXNET3.
     • Direct assignment (vSwitch bypass), requiring Intel VT-d / IOMMU: DirectPath I/O (full PCI function) and SR-IOV (PCI VF assignment). Incompatible with virtualization features such as vMotion, HA, FT, snapshots, and network virtualization overlays (VXLAN/STT/Geneve).
     • [Diagram: VMs running DPDK with EM/IGB/IXGBE PMDs over VT-d or 1G/10G/40G SR-IOV VFs, and with EM, E1000E and VMXNET3 PMDs over the vSwitch/vDS/NSX virtual interface backend.]
  8. VMware ESXi Emulated Device (E1000):
     • [Diagram: guest E1000 driver with Tx/Rx descriptor rings and packet buffer memory, connected to the vSwitch through the E1000 emulation backend; device emulation runs across VMEXIT/VMENTRY transitions.]
     • Every emulated PCI MMIO access from a VM results in a VM exit.
  9. VMware ESXi Paravirtual Device (VMXNET3):
     • [Diagram: guest VMXNET3 driver with transmit/receive command and completion rings in memory shared with the VMXNET3 backend and the vSwitch.]
     • VM exits are reduced by moving the queue control registers into shared memory and passing hints from ESXi to the guest OS.
  10. ESXi 5.5: VMXNET3 vs. E1000:
     • Optimized Rx/Tx queue handling in VMXNET3, controlled through a shared memory region, reduces VM exits compared with E1000's inefficient MMIO emulation.
     • VMXNET3's multiqueue infrastructure with RSS capability improves performance with multiple cores in a VM.
     • [Chart: EPT VMEXIT and total VMEXIT counts for DPDK over VMXNET3 vs. DPDK over E1000, measured on an Intel Xeon E5-4610 v2 (16M cache, 2.30 GHz).]
     • The average cost of a VM exit/VM entry sequence includes roughly 600 cycles for the VMCALL instruction; an EPT violation costs roughly 1,000 cycles on average.
     • Result: a 97% reduction in VM exits for the DPDK-based packet-forwarding benchmark.
  11. Hypervisor Backend Impact: the reduction in VM exits does not translate into a large difference in packet throughput; the hypervisor backend and the native networking stack need optimization. Traffic flow: Traffic Gen. -> vSwitch -> VMXNET3 (or E1000) -> VM (DPDK) -> VMXNET3 (or E1000) -> vSwitch -> Traffic Gen. (A sketch of this kind of polling forwarding loop appears after the slide list.)
  12. Device Model Research. Traffic flow: Traffic Gen. -> 10G (VT-d) -> VM1 (DPDK) -> VMXNET3 (or E1000) -> vSwitch -> VMXNET3 (or E1000) -> VM2 (DPDK) -> 10G (VT-d) -> Traffic Gen. Understanding this path is important for designing device-model changes in future ESXi releases.
  13. Agenda (repeated as a section divider).
  14. Key Properties of Virtual Machines: Partitioning. Run multiple operating systems on one physical machine; divide system resources between virtual machines.
  15. Key Properties of Virtual Machines (continued): Isolation. Fault and security isolation at the hardware level; advanced resource controls preserve performance.
  16. Key Properties of Virtual Machines (continued): Encapsulation. The entire state of the virtual machine can be saved to files; move and copy virtual machines as easily as moving and copying files.
  17. Key Properties of Virtual Machines (continued): Hardware Independence. Provision or migrate any virtual machine to any similar or different physical server.
  18. ESXi Networking Architecture Overview. [Diagram only.]
  19. ESXi Networking Datapath Overview (transmit path from a VM to the wire):
     • Message copy from the application to the guest OS kernel.
     • The guest OS network stack and vNIC driver queue the packet for the vNIC.
     • VM exit to the VMM/hypervisor.
     • The vNIC implementation emulates DMA from the VM and sends the packet to the vSwitch.
     • The vSwitch queues the packet for the pNIC.
     • The pNIC DMAs the packet and transmits it on the wire.
     • [Diagram: VMs, virtual switch, NIC, management agents and background tasks inside the ESXi hypervisor, connected to a physical network switch.]
  20. Transmit Processing for a VM:
     • One transmit thread per VM executes all parts of the stack; the transmit thread can also execute the receive path for the destination VM.
     • The transmit thread is activated by two mechanisms: immediate, forcible activation by the VM (low delay, expensive, via VMKcall) or opportunistic activation by other threads or when the VM halts (potentially higher delay, cheap).
     • [Diagram: per-VM device emulation converting ring entries to packets, the network virtualization layer (switching, encapsulation, teaming), and the pNIC driver converting packets back to ring entries.]
  21. Receive Processing for a VM:
     • One thread per device; on NetQueue-enabled devices, one thread per NetQueue.
     • Each NetQueue processes traffic for one or more MAC addresses (vNICs); the NetQueue-to-vNIC mapping is determined by unicast throughput and first-come, first-served order.
     • vNICs can share queues because of low throughput, too many vNICs, or a queue-type mismatch (LRO queue vs. non-LRO vNIC).
     • [Diagram: dedicated and shared NetQueues feeding per-VM device emulation through the network virtualization layer.]
  22. Improve receive throughput to a single VM:
     • A single receive thread can become the bottleneck at high packet rates (above roughly 1 million packets per second or 15 Gbps).
     • Use the VMXNET3 virtual device and enable RSS inside the guest.
     • Enable RSS in the physical NIC (only available on some pNICs) by adding ethernetX.pnicFeatures = "4" to the .vmx file (see the RSS configuration sketch and the .vmx snippet after the slide list).
     • Side effect: increased CPU cycles per byte.
  23. Improve transmit throughput with multiple vNICs:
     • Some applications use multiple vNICs for very high throughput; the common transmit thread shared by all vNICs can become the bottleneck.
     • Set ethernetX.ctxPerDev = 1 in the .vmx file to give each vNIC its own transmit thread (see the .vmx snippet after the slide list).
     • Side effect: increased CPU cost per byte.
  24. The Latency Sensitivity Feature in vSphere 5.5 (minimize virtualization overhead for near bare-metal performance):
     • New virtual machine property "Latency Sensitivity": High gives the lowest latency, Medium gives low latency.
     • Physical CPUs are exclusively assigned to the virtual CPUs of "Latency Sensitivity = High" VMs and are not used for scheduling other VMs or ESXi tasks.
     • The virtual machine monitor (VMM) idles when the guest OS is idle, which lowers the latency of waking the idle guest compared with idling in the ESXi vmkernel.
     • vNIC interrupt coalescing is disabled.
     • For DirectPath I/O, the interrupt delivery path is optimized for lowest latency.
     • The ESXi vmkernel is made more preemptible, reducing jitter due to long-running kernel code.
  25. ESXi 5.5 Network Latencies and Jitter. [Chart: response time in microseconds at the median, 90th, 99th, 99.9th and 99.99th percentiles for VM-Default, VM-SRIOV, VM-Latency-SRIOV and Native.] Setup: a single 2-vCPU VM running RHEL 6.2 x64, measuring 'ping -i 0.001' round-trip time on an Intel Xeon E5-2640 @ 2.50 GHz with an Intel 82599EB pNIC.
  26. ESXi 5.5 ultra-low latency with InfiniBand DirectPath I/O. [Chart: polling RDMA-write latency in microseconds vs. message size in bytes, Native vs. ESXi 5.5.] Setup: HP ProLiant DL380 Gen8, two 8-core Intel Xeon E5-2667 v2 @ 3.0 GHz, 2 x 64 GB memory, Mellanox ConnectX-3 FDR InfiniBand/RoCE adapter, Mellanox OFED (OpenFabrics Enterprise Distribution) 2.2.
  27. ESXi 5.5 packet rates with Intel DPDK. [Chart: L3 forwarding throughput in million packets per second (Mpps) vs. packet size from 64 to 1512 bytes for Linux Native, DPDK Native, DPDK Virtualized SR-IOV and DPDK Virtualized VMXNET3; small packets are typical of telco core networks, large packets of enterprise data-center networks.]
     • An Intel VT-d IOTLB limitation affects virtualized throughput; the gap should close in a future vSphere release with Intel VT-d SuperPage support (Intel E5-2600 v2 and newer). This is a focus area for a future vSphere release.
     • Setup: L3 forwarding on VMware ESXi 5.5, Intel Server Board S2600CP, IVT @ 2.3 GHz, Intel 10GbE Ethernet controller. (Intel VT-d: Intel Virtualization Technology for Directed I/O; Intel DPDK: Intel Data Plane Development Kit.)
  28. NSX - Network Virtualization Platform. [Diagram: a controller cluster managing NSX vSwitches on vSphere hosts and Open vSwitch on KVM and Xen Server hosts, hardware VTEP partners over VLAN, a northbound NSX API to a cloud management platform, and an NSX Edge built on DPDK bridging virtual L2/L3 networks to the physical network.]
  29. Summary:
     • Virtual appliances performing network functions need high performance on commodity x86-based server platforms.
     • The virtualized interfaces supported in DPDK offer multiple options and flexibility to data plane application developers.
     • The ESXi hypervisor supports multiple interfaces, including para-virtual and emulated interfaces, while offering best-in-class virtualization features, as well as directly assigned interfaces via DirectPath I/O and SR-IOV.
  30. vSphere ESXi performance-related references:
     • Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs: http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf
     • Intel Data Plane Development Kit (Intel DPDK) with VMware vSphere: http://www.vmware.com/files/pdf/techpaper/intel-dpdk-vsphere-solution-brief.pdf
     • Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5: http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf
     • The CPU Scheduler in VMware vSphere 5.1: https://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
     • RDMA Performance in Virtual Machines using QDR InfiniBand on VMware vSphere 5: https://labs.vmware.com/academic/publications/ib-researchnote-apr2012
  31. Backup.
  32. VMware vSphere sample features: protect against failures, eliminate planned downtime, Distributed Resource Scheduler.
  33. Getting started with the vSphere Hypervisor: minimum hardware requirements, creating virtual machines, and vSphere deployment architecture. [Details were presented graphically.]
  34. Improve multicast throughput to multiple VMs:
     • For multicast receive traffic, a single thread has to duplicate packets for all receiving VMs.
     • Set ethernetX.emuRxMode = 1 (see the .vmx snippet below): the receive thread queues packets with the device emulation layer, and a per-VM thread picks them up and carries out receive processing.
     • Side effects: increased receive throughput to a single VM, increased CPU cycles per byte.
     • [Diagram: one receive thread (T0) vs. per-VM threads (T1, T2) picking up packets from device emulation.]
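
The slides reference DPDK applications inside the guest but include no code. As a companion to slides 6 and 7, here is a minimal sketch, written in C against the current DPDK ethdev API (which differs in detail from the DPDK 1.x API that was current in 2014), of how a guest application might bring up a DPDK port such as a VMXNET3 vNIC handled by the vmxnet3 PMD. The port number, queue counts and descriptor counts are illustrative assumptions, not values from the deck.

```c
/*
 * Minimal sketch (not from the slides): bring up DPDK port 0, e.g. a
 * VMXNET3 vNIC handled by the vmxnet3 PMD, with one RX and one TX queue.
 * Uses the current ethdev API; error handling is abbreviated.
 */
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define NB_MBUFS 8192
#define RX_DESC  512
#define TX_DESC  512

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

    if (rte_eth_dev_count_avail() == 0)
        rte_exit(EXIT_FAILURE, "no DPDK ports found\n");

    /* Pool of packet buffers shared by the RX and TX queues. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create("mbuf_pool",
            NB_MBUFS, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    uint16_t port = 0;
    struct rte_eth_conf conf = {0};   /* default single-queue configuration */

    if (rte_eth_dev_configure(port, 1, 1, &conf) < 0 ||
        rte_eth_rx_queue_setup(port, 0, RX_DESC,
                rte_eth_dev_socket_id(port), NULL, pool) < 0 ||
        rte_eth_tx_queue_setup(port, 0, TX_DESC,
                rte_eth_dev_socket_id(port), NULL) < 0 ||
        rte_eth_dev_start(port) < 0)
        rte_exit(EXIT_FAILURE, "port setup failed\n");

    rte_eth_promiscuous_enable(port);

    /* ... poll-mode RX/TX loop, see the forwarding sketch below ... */
    return 0;
}
```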
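
Slides 8 through 12 compare VM-exit counts for a DPDK packet-forwarding benchmark running inside the VM. The deck does not show the benchmark code; the following is a sketch of the kind of poll-mode forwarding loop involved: packets are burst-received and burst-transmitted with no interrupts or sleeping, so every descriptor-ring doorbell on the emulated E1000 path is an MMIO access and therefore a VM exit, which is what the comparison on slide 10 counts.

```c
/*
 * Sketch of a poll-mode forwarding loop (illustrative, not the deck's
 * benchmark code): receive a burst on rx_port, transmit it on tx_port,
 * and free whatever the TX queue could not accept.
 */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void forward_loop(uint16_t rx_port, uint16_t tx_port)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        const uint16_t nb_rx = rte_eth_rx_burst(rx_port, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;                       /* keep polling */

        const uint16_t nb_tx = rte_eth_tx_burst(tx_port, 0, bufs, nb_rx);

        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);      /* drop unsent packets */
    }
}
```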
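
Slide 22 recommends enabling RSS inside the guest when using VMXNET3. On the DPDK side, the guest application requests multiple RX queues together with an RSS hash configuration; a sketch using the current ethdev API follows. The queue counts and hash-function set are illustrative assumptions, and the hash types a given PMD accepts vary by release.

```c
/*
 * Sketch: request nb_rxq receive queues with RSS over IP/TCP/UDP headers,
 * as a DPDK application inside the VM might do when the VMXNET3 vNIC has
 * multiple queues (slide 22).  Illustrative values; check what the PMD
 * actually supports via rte_eth_dev_info_get().
 */
#include <rte_ethdev.h>

static int configure_rss(uint16_t port, uint16_t nb_rxq, uint16_t nb_txq)
{
    struct rte_eth_conf conf = {
        .rxmode = { .mq_mode = RTE_ETH_MQ_RX_RSS },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key = NULL,   /* let the PMD use its default key */
                .rss_hf  = RTE_ETH_RSS_IP | RTE_ETH_RSS_TCP | RTE_ETH_RSS_UDP,
            },
        },
    };

    /* Each RX queue still needs rte_eth_rx_queue_setup() afterwards. */
    return rte_eth_dev_configure(port, nb_rxq, nb_txq, &conf);
}
```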
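
For convenience, the per-vNIC tuning knobs called out on slides 22, 23 and 34 are plain key/value entries in the virtual machine's .vmx configuration file. A sketch follows, assuming the first virtual NIC (ethernet0); adjust the index to match the adapter being tuned, note the CPU-cost side effects each slide mentions, and verify the exact keys and value quoting against the vSphere documentation for the release in use.

```
ethernet0.pnicFeatures = "4"
ethernet0.ctxPerDev = "1"
ethernet0.emuRxMode = "1"
```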
