How well does InfiniBand virtualized with SR-IOV really perform? SDSC carried out initial application benchmarking studies and compared the results against the best available commercial alternative, Amazon EC2, to determine whether SR-IOV is a viable technology for closing the performance gap of virtualized HPC. The results were promising, and this technology will be used in Comet, SDSC's two-petaflop supercomputer being deployed in 2015.
SR-IOV: The Key Enabling Technology for Fully Virtualized HPC Clusters
Glenn K. Lockwood
Christopher Irving
Philip M. Papadopoulos
Mahidhar Tatineni
Rick Wagner
SAN DIEGO SUPERCOMPUTER CENTER
2. Single Root I/O Virtualization in HPC
• Problem: complex workflows demand increasing flexibility from HPC platforms
• Virtualization = flexibility
• Virtualization = I/O performance loss (e.g., excessive DMA interrupts)
• Solution: SR-IOV and Mellanox ConnectX-3 InfiniBand HCAs
• One physical function (PF) → multiple virtual functions (VFs), each with its own DMA streams, memory space, and interrupts
• Allows DMA to bypass the hypervisor and go directly to the VMs
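On a Linux host, the PF-to-VF split is visible in sysfs: each enabled VF appears as a `virtfnN` symlink inside the physical function's PCI device directory (under `/sys/bus/pci/devices/`), pointing at the VF's own PCI device. The Python sketch below lists those links. It is a minimal illustration, not part of the SDSC setup: the function name and the PCI addresses are hypothetical, and the demo builds a fake sysfs tree in a temporary directory so it runs on any machine.

```python
import os
import re
import tempfile


def list_virtual_functions(pf_dir):
    """Return {vf_index: vf_pci_address} for an SR-IOV physical function.

    pf_dir is the PF's sysfs directory; each enabled VF shows up there as a
    symlink named virtfn<N> that points at the VF's own PCI device directory.
    """
    vfs = {}
    for name in os.listdir(pf_dir):
        match = re.fullmatch(r"virtfn(\d+)", name)
        if match:
            target = os.readlink(os.path.join(pf_dir, name))
            # The link target ends in the VF's PCI address, e.g. ../0000:03:00.1
            vfs[int(match.group(1))] = os.path.basename(target)
    return dict(sorted(vfs.items()))


if __name__ == "__main__":
    # Fake sysfs layout with hypothetical PCI addresses, for demonstration only.
    root = tempfile.mkdtemp()
    pf = os.path.join(root, "0000:03:00.0")
    os.makedirs(pf)
    for i, addr in enumerate(["0000:03:00.1", "0000:03:00.2", "0000:03:00.3"]):
        os.makedirs(os.path.join(root, addr))
        os.symlink(os.path.join("..", addr), os.path.join(pf, f"virtfn{i}"))
    print(list_virtual_functions(pf))
```

Sorting by the integer index keeps `virtfn10` after `virtfn2`, which a plain string sort would not.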
3. High-Performance Virtualization on Comet
• Mellanox FDR InfiniBand HCAs with SR-IOV
• Rocks and OpenStack Nova to manage VMs
• Flexibility to support complex science gateways and web-based workflow engines
• Custom compute appliances and virtual clusters developed with FutureGrid and their existing expertise
• Backed by virtualized Lustre running over virtualized InfiniBand
4. Hardware/Software Configuration of Test Cluster
Platform
• Native, SR-IOV: Rocks 6.1 (EL6); virtualization via KVM
• Amazon EC2: Amazon Linux 2013.03 (EL6); cc2.8xlarge instances
CPUs
• Native, SR-IOV: 2x Xeon E5-2660 (2.2 GHz), 16 cores per node
• Amazon EC2: 2x Xeon E5-2670 (2.6 GHz), 16 cores per node
RAM
• Native, SR-IOV: 64 GB DDR3 DRAM
• Amazon EC2: 60.5 GB DDR3 DRAM
Interconnect
• Native, SR-IOV: QDR 4X InfiniBand; Mellanox ConnectX-3 (MT27500); Intel VT-d and SR-IOV enabled in firmware, kernel, and drivers; mlx4_core 1.1; Mellanox OFED 2.0; HCA firmware 2.11.1192
• Amazon EC2: 10 GbE; common placement group
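Two of the host-side prerequisites above can be checked from plain text. VT-d is typically switched on with the `intel_iommu=on` kernel boot parameter, and the mlx4 driver creates VFs according to its `num_vfs` module option, usually set in a modprobe configuration line such as `options mlx4_core num_vfs=8 probe_vf=0`. Neither parameter is named on the slide, the helper names and example values are illustrative assumptions, and the firmware-side SR-IOV setting is not covered. The parser also assumes the simple single-integer form of `num_vfs`.

```python
import shlex


def vtd_enabled(cmdline):
    """True if a kernel command line (e.g. the text of /proc/cmdline) enables the Intel IOMMU."""
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        # intel_iommu may carry several comma-separated flags, e.g. "on,igfx_off"
        if key == "intel_iommu" and "on" in value.split(","):
            return True
    return False


def mlx4_num_vfs(modprobe_line):
    """Return num_vfs from an 'options mlx4_core ...' line, or 0 if it is not set.

    Assumes the single-integer form (num_vfs=8); some driver versions accept
    other forms that this sketch does not parse.
    """
    tokens = shlex.split(modprobe_line)
    if tokens[:2] != ["options", "mlx4_core"]:
        return 0
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        if key == "num_vfs":
            return int(value)
    return 0


# Hypothetical inputs; on a real host these would come from /proc/cmdline
# and a file under /etc/modprobe.d/.
print(vtd_enabled("ro root=/dev/sda1 intel_iommu=on"))          # True
print(mlx4_num_vfs("options mlx4_core num_vfs=8 probe_vf=0"))   # 8
```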
5. 50x less latency than Amazon EC2
• SR-IOV
  • < 30% overhead for message sizes M < 128 bytes
  • < 10% overhead for eager send/recv
  • Overhead → 0% in the bandwidth-limited regime
• Amazon EC2
  • > 5000% worse latency
  • Time-dependent (noisy)
OSU Microbenchmarks (3.9, osu_latency)
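The overhead percentages and the "50x" headline are two ways of stating one ratio: a value X% worse than a baseline is (1 + X/100) times the baseline, so "> 5000% worse" means more than 51 times the baseline latency. The sketch below shows that arithmetic with hypothetical latencies, not the measured osu_latency values; the function names are illustrative.

```python
def overhead_pct(measured, baseline):
    """Percent by which a measured latency exceeds a baseline latency."""
    return (measured - baseline) / baseline * 100.0


def times_worse(measured, baseline):
    """The same comparison as a multiplier; equals 1 + overhead_pct / 100."""
    return measured / baseline


# Hypothetical small-message latencies in microseconds, not the measured values.
native_us, sriov_us, ec2_us = 1.0, 1.25, 60.0
print(f"SR-IOV vs native: {overhead_pct(sriov_us, native_us):.0f}% overhead")  # 25% overhead
print(f"EC2 vs SR-IOV: {overhead_pct(ec2_us, sriov_us):.0f}% worse, "
      f"{times_worse(ec2_us, sriov_us):.0f}x")                                # 4700% worse, 48x
```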
6. 10x more bandwidth than Amazon EC2
• SR-IOV
  • < 2% bandwidth loss over the entire range
  • > 95% of peak bandwidth
• Amazon EC2
  • < 35% of peak bandwidth
  • 900% to 2500% worse bandwidth than virtualized InfiniBand
OSU Microbenchmarks (3.9, osu_bw)
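"Percent of peak" depends on the link's theoretical data rate. For QDR 4X InfiniBand that is 4 lanes at 10 Gb/s signaling with 8b/10b encoding, i.e. 32 Gb/s or 4000 MB/s of data; for 10 GbE it is 10 Gb/s, or 1250 MB/s. The slide does not say which peak its percentages refer to, so the sketch below, with hypothetical measured values and illustrative function names, only shows how such a fraction would be computed.

```python
def ib_qdr_4x_peak_mb_s():
    """Theoretical QDR 4X InfiniBand data rate in MB/s (1 MB = 10**6 bytes)."""
    lanes = 4
    signaling_gbit_per_lane = 10.0   # QDR per-lane signaling rate
    encoding = 8 / 10                # 8b/10b line code: 8 data bits per 10 bits sent
    data_gbit = lanes * signaling_gbit_per_lane * encoding  # 32 Gb/s
    return data_gbit * 1000 / 8      # bits -> bytes: 4000 MB/s


def ethernet_10g_peak_mb_s():
    """10 GbE delivers 10 Gb/s of data (its 64b/66b coding runs at a higher line rate)."""
    return 10.0 * 1000 / 8           # 1250 MB/s


def fraction_of_peak(measured_mb_s, peak_mb_s):
    """Measured bandwidth as a fraction of a theoretical peak."""
    return measured_mb_s / peak_mb_s


# Hypothetical large-message bandwidths in MB/s, not the measured osu_bw values.
print(fraction_of_peak(3800.0, ib_qdr_4x_peak_mb_s()))
print(fraction_of_peak(400.0, ethernet_10g_peak_mb_s()))
```

Real transfers also pay for packet headers and flow control, so even native InfiniBand stays somewhat below this theoretical figure.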
7. Weather Modeling – 15% Overhead
• 96-core (6-node) calculation
• Nearest-neighbor communication
• Scalable algorithms
• SR-IOV incurs a modest (15%) performance hit
• ...but is still 20% faster*** than Amazon
WRF 3.4.1 – 3-hour forecast
*** 20% faster despite the SR-IOV cluster having 20% slower CPUs
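One plausible reason the WRF run tolerates SR-IOV's small-message overhead is its nearest-neighbor pattern: in a 2D domain decomposition each rank exchanges halos with at most four neighbors, so the total message count grows only linearly with the number of ranks. The sketch below assumes one MPI rank per core (96 ranks) on a simple non-periodic 2D block grid; WRF's actual patch layout and halo widths may differ, and the function names are hypothetical.

```python
import math


def grid_dims(nprocs):
    """Split nprocs into a px x py process grid that is as close to square as possible."""
    px = math.isqrt(nprocs)
    while nprocs % px:
        px -= 1
    return px, nprocs // px


def neighbors(rank, px, py):
    """North/south/east/west neighbor ranks on a non-periodic px x py grid (row-major ranks)."""
    x, y = rank % px, rank // px
    result = []
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < px and 0 <= ny < py:
            result.append(ny * px + nx)
    return result


px, py = grid_dims(96)                                  # (8, 12)
per_rank = [len(neighbors(r, px, py)) for r in range(px * py)]
print(px, py, max(per_rank), sum(per_rank))             # 8 12 4 344
```

One halo exchange on this grid is 344 point-to-point messages in total, roughly 3.6 per rank.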
8. Quantum ESPRESSO: 5x Faster than EC2
• 48-core (3-node) calculation
• CG matrix inversion (irregular communication)
• 3D FFT matrix transposes (all-to-all communication)
• 28% slower with SR-IOV than native
• SR-IOV still > 500% faster*** than EC2
Quantum ESPRESSO 5.0.2 – DEISA AUSURF112 benchmark
*** Despite the SR-IOV cluster having 20% slower CPUs
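Unlike the nearest-neighbor exchanges in the WRF run, the 3D FFT transposes need all-to-all communication. In a naive pairwise exchange each of P ranks sends a block to every one of the other P - 1 ranks, so one transpose on 48 ranks involves 48 × 47 = 2256 messages, each carrying only 1/P² of the transposed data. Production `MPI_Alltoall` implementations may use other algorithms (for example Bruck's algorithm for small messages) that change the message count, so this is an illustrative count, not a model of Quantum ESPRESSO's actual communication; the function names and grid size are hypothetical.

```python
def alltoall_messages(nprocs):
    """Point-to-point messages in one naive all-to-all: every rank sends to every other rank."""
    return nprocs * (nprocs - 1)


def alltoall_block_bytes(total_bytes, nprocs):
    """Bytes per message when total_bytes, spread evenly over nprocs ranks, is fully transposed.

    Each rank holds total_bytes / nprocs and sends 1/nprocs of that to each peer.
    """
    return total_bytes / (nprocs * nprocs)


# Hypothetical FFT grid: 128^3 complex double-precision values (16 bytes each).
grid_bytes = 128 ** 3 * 16
print(alltoall_messages(48))                    # 2256
print(alltoall_block_bytes(grid_bytes, 48))     # bytes per message
```

Because the message count grows as P² while each message shrinks as 1/P², adding ranks pushes more of the traffic into the small-message regime where virtualization overhead is largest.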
9. Conclusions
• SR-IOV: a huge step forward in high-performance virtualization
• Shows substantially lower latency than Amazon EC2 and nearly zero bandwidth overhead
• Application benchmarks confirm this: significant improvement over EC2
• SR-IOV lowers the performance barrier to virtualizing the interconnect and makes fully virtualized HPC clusters viable
• Comet will deliver virtualized HPC to new/non-traditional communities that need flexibility without a major loss of performance