LA-UR 10-05188
Implementation & Comparison of RDMA Over Ethernet
Students: Lee Gaiser, Brian Kraus, and James Wernicke
Mentors: Andree Jacobson, Susan Coulter, Jharrod LaFon, and Ben McClelland
Summary: Background, Objective, Testing Environment, Methodology, Results, Conclusion, Further Work, Challenges, Lessons Learned, Acknowledgments, References & Links, Questions
Background: Remote Direct Memory Access (RDMA)
RDMA provides high-throughput, low-latency networking:
- Reduce consumption of CPU cycles
- Reduce communication latency
Images courtesy of http://www.hpcwire.com/features/17888274.html
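(Not from the original slides.) OFED ships low-level verbs utilities from the perftest package that exercise RDMA operations directly; they are a convenient sanity check of an RDMA link before any MPI benchmarking. A minimal sketch, assuming perftest is installed and the two hosts are named node1 and node2 (hypothetical names):

    # On node2: start the server side of the RDMA-write latency test.
    ib_write_lat
    # On node1: connect to node2 and measure RDMA-write latency.
    ib_write_lat node2
    # The same client/server pattern works for bandwidth with ib_write_bw.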
Background: InfiniBand
InfiniBand is a switched-fabric communication link designed for HPC:
- High throughput
- Low latency
- Quality of service
- Failover
- Scalability
- Reliable transport
How do we interface this high-performance link with existing Ethernet infrastructure?
Background: RDMA over Converged Ethernet (RoCE)
- Provides InfiniBand-like performance and efficiency on ubiquitous Ethernet infrastructure.
- Uses the same transport and network layers as the IB stack and swaps the link layer for Ethernet.
- Implements IB verbs over Ethernet.
- Not quite IB strength, but it's getting close.
- As of OFED 1.5.1, code written for OFED RDMA works with RoCE unmodified.
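Because RoCE drives the Ethernet NIC through the same verbs interface, the adapter should appear as an ordinary RDMA device once OFED is installed. A quick check (the device name mlx4_0 is just a typical example for Mellanox ConnectX hardware, not taken from the slides):

    # List verbs-capable devices; the Mellanox 10GbE NIC should appear here.
    ibv_devices
    # Show port details; RoCE-aware libibverbs builds also report the link
    # layer (InfiniBand vs. Ethernet) for each port.
    ibv_devinfo -d mlx4_0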
Objective
We would like to answer the following questions:
- What kind of performance can we get out of RoCE on our cluster?
- Can we implement RoCE in software (Soft RoCE), and how does it compare with hardware RoCE?
Testing Environment
Hardware:
- HP ProLiant DL160 G6 servers
- Mellanox MNPH29B-XTC 10GbE adapters
- 50/125 OFNR cabling
Operating System:
- CentOS 5.3 with a 2.6.32.16 kernel
Software/Drivers:
- OpenFabrics Enterprise Distribution (OFED) 1.5.2-rc2 (RoCE) & 1.5.1-rxe (Soft RoCE)
- OSU Micro-Benchmarks (OMB) 3.1.1
- OpenMPI 1.4.2
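For reference, the kernel, OFED release, and adapter can be confirmed with standard tools (a rough sketch; ofed_info is installed by the OFED packages):

    uname -r                   # expect 2.6.32.16 on the test nodes
    ofed_info | head -n 1      # should report the installed OFED release
    lspci | grep -i mellanox   # confirm the 10GbE adapter is visible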
Methodology
- Set up a pair of nodes for each technology: IB, RoCE, Soft RoCE, and no RDMA.
- Install, configure, and run minimal services on the test nodes to maximize machine performance.
- Directly connect the nodes to maximize network performance.
- Acquire latency benchmarks: OSU MPI Latency Test.
- Acquire bandwidth benchmarks: OSU MPI Uni-Directional Bandwidth Test and OSU MPI Bi-Directional Bandwidth Test.
- Script it all to perform many repetitions (example invocations below).
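The slides do not list the exact commands; the sketch below shows how these OSU benchmarks are typically launched with the OFED-supplied OpenMPI, assuming two directly connected hosts named node1 and node2 (hypothetical names) and the OMB binaries in the working directory:

    # Latency and bandwidth over the verbs device (IB or RoCE) via the openib BTL.
    mpirun -np 2 -host node1,node2 --mca btl openib,self ./osu_latency
    mpirun -np 2 -host node1,node2 --mca btl openib,self ./osu_bw
    mpirun -np 2 -host node1,node2 --mca btl openib,self ./osu_bibw
    # Baseline without RDMA: force the TCP BTL over the same 10GbE link.
    mpirun -np 2 -host node1,node2 --mca btl tcp,self ./osu_latency
    # Repeat each test many times and log the output for later averaging.
    for i in $(seq 1 30); do
        mpirun -np 2 -host node1,node2 --mca btl openib,self ./osu_latency >> latency.log
    done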
Results: Latency
Results: Uni-directional Bandwidth
Results: Bi-directional Bandwidth
Results: Analysis
RoCE vs. 10GbE without RDMA:
- Up to 5.7x speedup in latency
- Up to 3.7x increase in bandwidth
IB QDR vs. RoCE:
- IB is less than 1 µs faster than RoCE at a 128-byte message size.
- IB peak bandwidth is 2-2.5x greater than RoCE's.
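As a rough sanity check (not on the original slide), the measured gap is in line with the raw link rates: 10GbE carries about 10 Gbit/s of payload (≈1.25 GB/s), while IB QDR's 40 Gbit/s signalling with 8b/10b encoding carries about 32 Gbit/s (≈4 GB/s), a theoretical ratio of roughly 3.2x against the observed 2-2.5x.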
Further Work & Questions
- How does RoCE perform on collective operations?
- Can we further optimize the RoCE configuration to yield better performance?
- Can we stabilize the Soft RoCE configuration?
- How much does Soft RoCE affect the compute nodes' ability to perform?
- How does RoCE compare with iWARP?
Challenges
- Finding an OS that works with OFED & RDMA: Fedora 13 was too new, Ubuntu 10 wasn't supported, and CentOS 5.5 was missing some drivers.
- Had to compile a new kernel with IB/RoCE support.
- Built OpenMPI 1.4.2 from source, but it wasn't configured for RDMA; we used the OpenMPI 1.4.1 supplied with OFED instead (see the configure sketch below).
- The machines communicating via Soft RoCE frequently lock up during the OSU bandwidth tests.
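For reference, building OpenMPI against OFED's verbs libraries usually amounts to pointing configure at the OFED installation; a minimal sketch, assuming libibverbs is installed under /usr and an install prefix of /opt/openmpi-1.4.2 (both assumptions):

    ./configure --prefix=/opt/openmpi-1.4.2 --with-openib=/usr
    make && make install
    # Confirm the openib (verbs) BTL component was actually built.
    /opt/openmpi-1.4.2/bin/ompi_info | grep openib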

Lessons Learned
- Installing and configuring HPC clusters
- Building, installing, and fixing the Linux kernel, modules, and drivers
- Working with IB, 10GbE, and RDMA technologies
- Using tools such as OMB 3.1.1 and netperf for benchmarking performance (see the netperf sketch below)
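The netperf runs mentioned above follow the usual client/server pattern; a brief sketch, assuming the netperf package is installed on both hosts (node1 and node2 are hypothetical names):

    # On node2: start the netperf server daemon.
    netserver
    # On node1: measure TCP stream throughput to node2 for 30 seconds.
    netperf -H node2 -t TCP_STREAM -l 30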
Acknowledgments
Andree Jacobson, Susan Coulter, Jharrod LaFon, Ben McClelland, Sam Gutierrez, Bob Pearson

References & Links
- Subramoni, H. et al. RDMA over Ethernet – A Preliminary Study. OSU. http://nowlab.cse.ohio-state.edu/publications/conf-presentations/2009/subramoni-hpidc09.pdf
- Feldman, M. RoCE: An Ethernet-InfiniBand Love Story. HPCWire.com. April 22, 2010. http://www.hpcwire.com/blogs/RoCE-An-Ethernet-InfiniBand-Love-Story-91866499.html
- Woodruff, R. Access to InfiniBand from Linux. Intel. October 29, 2009. http://software.intel.com/en-us/articles/access-to-infiniband-from-linux/
- OFED 1.5.2-rc2: http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/OFED-1.5.2-rc2.tgz
- OFED 1.5.1-rxe: http://www.systemfabricworks.com/pub/OFED-1.5.1-rxe.tgz
- OMB 3.1.1: http://mvapich.cse.ohio-state.edu/benchmarks/OMB-3.1.1.tgz

Editor's Notes

  1. Introduce yourself, the institute, your teammates, and what you’ve been working on for the past two months.
  2. Rephrase these bullet points with a little more elaboration.
  3. Emphasize how RDMA eliminates unnecessary communication.
  4. Explain that we are using IB QDR and what that means.
  5. Emphasize that the biggest advantage of RoCE is latency, not necessarily bandwidth. Talk about 40Gb & 100Gb Ethernet on the horizon.
  6. OSU benchmarks were more appropriate than netperf
  7. The highlight here is that the latency of IB & RoCE differs by only 1.7 µs at 128-byte messages, and it stays very close up through 4K messages. Also notice that the RoCE and no-RDMA latencies converge at larger message sizes.
  8. Note that RoCE & IB are very close up to a 1K message size. IB QDR peaks at about 3 GB/s, RoCE at about 1.2 GB/s.
  9. Note that the bi-directional bandwidth trends are similar to the uni-directional ones. Explain that Soft RoCE could not complete this test. IB QDR peaks at about 5.5 GB/s and RoCE at about 2.3 GB/s.