Could the “C” in HPC stand for Cloud?


This paper examines aspects of computing important in HPC (compute and network bandwidth, compute and network latency, memory size and bandwidth, I/O, and so on) and how they are affected by various virtualization technologies. For more information on IBM Systems, visit http://ibm.co/RKEeMO.



Thought Leadership White Paper
IBM Systems and Technology Group, November 2012

By Christopher N. Porter, IBM Corporation (porterc@us.ibm.com)
Introduction

Most IaaS (infrastructure as a service) vendors, such as Rackspace, Amazon and Savvis, use various virtualization technologies to manage the underlying hardware on which they build their offerings. Unfortunately, the virtualization technologies used vary from vendor to vendor and are sometimes kept secret. Therefore, the question of virtual machines versus physical machines for high performance computing (HPC) applications is germane to any discussion of HPC in the cloud.

This paper examines aspects of computing important in HPC (compute and network bandwidth, compute and network latency, memory size and bandwidth, I/O, and so on) and how they are affected by various virtualization technologies. The benchmark results presented will illuminate areas where cloud computing, as a virtualized infrastructure, is sufficient for some workloads and inappropriate for others. In addition, the paper provides a quantitative assessment of the performance differences between a sample of applications running on various hypervisors, so that data-based decisions can be made for datacenter and technology adoption planning.

A business case for HPC clouds

HPC architects have been slow to adopt virtualization technologies for two reasons:

1. The common assumption that virtualization impacts application performance so severely that any gains in flexibility are far outweighed by the loss of application throughput.
2. Utilization on traditional HPC infrastructure is very high (between 80 and 95 percent). Therefore, the typical driving business cases for virtualization (for example, utilization of hardware, server consolidation or license utilization) simply did not hold enough merit to justify the added complexity and expense of running workloads on virtualized resources.

In many cases, however, HPC architects would be willing to lose some small percentage of application performance to achieve the flexibility and resilience that virtual machine based computing would allow. There are several reasons architects may make this compromise, including:

• Security: Some HPC environments require data and host isolation between groups of users or even between the users themselves. In these situations, VMs and VLANs can be used in concert to isolate users from each other and to ensure data is accessible only to the users who should have access to it.

• Application stack control: In a mixed application environment where multiple applications share the same physical hardware, it can be difficult to satisfy the configuration requirements of each application, including OS versions, updates and libraries. Using virtualization makes that task easier, since the whole stack can be deployed as part of the application.

• High value asset maximization: In a heterogeneous HPC system the newest machines are often in highest demand. To manage this demand, some organizations use a reservation system to minimize conflicts between users. When using VMs for computing, however, the migration facility available within
most hypervisors allows opportunistic workloads to use high value assets even after a reservation window opens for a different user. If the reserving user submits workload against a reservation, then the opportunistic workload can be migrated to other assets to continue processing without losing any CPU cycles.

• Utilization improvement: If the losses in application performance are very small (single digit percentages), then adoption of virtualization technology may enable incremental steps forward in overall utilization in some cases. In these cases, virtualization may offer an increase in overall throughput for the HPC environment.

• Large execution time jobs: Several HPC applications offer no checkpoint-restart capability. VM technology, however, can capture and checkpoint the entire state of the virtual machine, allowing these applications to be checkpointed. If jobs run long enough to approach the mean time between failures (MTBF) of the solution as a whole, then the checkpoint facility available within virtual machines may be very attractive. Additionally, if server maintenance is a common or predictable occurrence, then checkpoint migration or suspension of a long-running job within a VM could prevent loss of compute time.

• Increases in job reliability: Virtual machines, if used on a 1:1 basis with batch jobs (meaning each job runs within a VM container), provide a barrier between their own environment, the host environment and any other virtual machine environments running on the hypervisor. As such, “rogue” jobs that try to access more memory or CPU cores than expected can be isolated from well-behaved jobs that use only the resources allocated to them. Without virtual machine containment, jobs sharing a physical host often cause problems in the form of slowdowns, swapping or even OS crashes.

Management tools

Achieving HPC in a cloud environment requires a few well-chosen tools, including a hypervisor platform, a workload manager and an infrastructure management toolkit. The management toolkit provides policy definition, enforcement, provisioning management, resource reservation and reporting. The hypervisor platform provides the foundation for the virtual portion of cloud resources, and the workload manager provides the task management.

The cloud computing management tools of IBM® Platform Computing™—IBM® Platform™ Cluster Manager – Advanced Edition and IBM® Platform™ Dynamic Cluster—turn static clusters, grids and datacenters into dynamic shared computing environments. The products can be used to create private internal clouds or hybrid private clouds, which use external public clouds for peak demand. This is commonly referred to as “cloud bursting” or “peak shaving.”

Platform Cluster Manager – Advanced Edition creates a cloud computing infrastructure to efficiently manage application workloads applied to multiple virtual and physical platforms. It does this by uniting diverse hypervisor and physical environments into a single, dynamically shared infrastructure. Although this document describes the properties of virtual machines, Platform Cluster Manager – Advanced Edition is not in any way limited to managing virtual machines. It unlocks the full computing potential lying dormant in existing heterogeneous virtual and physical resources according to workload-intelligent and resource-aware policies.
Platform Cluster Manager – Advanced Edition optimizes infrastructure resources dynamically, based on perceived demand and critical resource availability, using an API or a web interface. This allows users to enjoy the following business benefits:

• By eliminating silos, resource utilization can be improved
• Batch job wait times are reduced because of additional resource availability or flexibility
• Users perceive a larger resource pool
• Administrator workload is reduced through multiple layers of automation
• Power consumption and server proliferation are reduced

Subsystem benchmarks

Hardware environment and settings

KVM and OVM testing
Physical hardware: (2) HP ProLiant BL465c G5 with dual-socket, quad-core AMD 2382 CPUs (with AMD-V) and 16 GB RAM
OS installed: RHEL 5.5 x86_64
Hypervisor(s): KVM in RHEL 5.5, OVM 2.2, RHEL 5.5 Xen (para-virtualized)
Number of VMs per physical node: Unless otherwise noted, benchmarks were run on a 4 GB memory VM.
Interconnects: The interconnect between VMs or hypervisors was never used to run the benchmarks. The hypervisor hosts were connected to a 1000baseT network.

Citrix Xen testing
Physical hardware: (2) HP ProLiant BL2x220c in a c3000 chassis with dual-socket, quad-core 2.83 GHz Intel® CPUs and 8 GB RAM
OS installed: CentOS Linux 5.3 x86_64
Storage: Local disk
Hypervisor: Citrix Xen 5.5
VM configuration: (Qty 1) 8 GB VM with 8 cores; (Qty 2) 4 GB VMs with 4 cores; (Qty 4) 2 GB VMs with 2 cores; (Qty 8) 1 GB VMs with 1 core

NetPIPE

NetPIPE is an acronym that stands for Network Protocol Independent Performance Evaluator.¹ It is a useful tool for measuring two important characteristics of networks: latency and bandwidth. HPC application performance is becoming increasingly dependent on the interconnect between compute servers. Because of this trend, not only does parallel application performance need to be examined, but also the performance of the network alone, from both the latency and the bandwidth standpoints.

The terms used for each data series in this section are defined as follows:

• no_bkpln: Refers to communications happening over a 1000baseT Ethernet network
• same_bkpln: Refers to communications traversing a backplane within a blade enclosure
• diff_hyp: Refers to virtual machine to virtual machine communication occurring between two separate physical hypervisors
• pm2pm: Physical machine to physical machine
• vm2pm: Virtual machine to physical machine
• vm2vm: Virtual machine to virtual machine

Figure 1: Network bandwidth between machines
Figure 2: Network latency between machines

Figures 1 and 2 illustrate that the closer the two communicating entities are, the higher the bandwidth and the lower the latency between them. Additionally, they show that when there is a hypervisor layer between the entities, the communication is slowed only slightly, and latencies stay in the expected range for 1000baseT communication (60 - 80 µsec). When two VMs on separate hypervisors communicate—even when the backplane is within the blade chassis—the latency is more than double. The story gets even worse (by about 50 percent) when the two VMs do not share a backplane and communicate over TCP/IP.

This benchmark illustrates that not all HPC workloads are suitable for a virtualized environment. When applications run in parallel and are latency sensitive (as many MPI based applications are), virtualized resources should generally be avoided. If there is no choice but to use virtualized resources, then the scheduler must have the ability to choose resources that are adjacent to each other on the network, or the performance is likely to be unacceptable. This conclusion also applies to transactional applications, where latency can be the largest part of the ‘submit to receive’ cycle time.
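To make the NetPIPE measurements concrete, the following minimal Python sketch performs a TCP ping-pong between two hosts and reports an estimated one-way latency and streaming bandwidth for a few message sizes. It is an illustration rather than the NetPIPE tool itself; the port number, repetition count and message sizes are arbitrary assumptions.

```python
# Minimal TCP ping-pong sketch, loosely in the spirit of NetPIPE (not the actual tool).
# Run "python pingpong.py server" on one host, then
# "python pingpong.py client <server_host>" on the other. Port and sizes are examples.
import socket
import sys
import time

PORT = 5201                          # arbitrary example port
REPS = 1000                          # round trips per message size
SIZES = [1, 1024, 65536, 1 << 20]    # message sizes in bytes

def recv_exact(sock, n):
    """Receive exactly n bytes from the socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf.extend(chunk)
    return bytes(buf)

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            for size in SIZES:
                for _ in range(REPS):
                    conn.sendall(recv_exact(conn, size))   # echo each message back

def client(host):
    with socket.create_connection((host, PORT)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in SIZES:
            payload = b"x" * size
            start = time.perf_counter()
            for _ in range(REPS):
                sock.sendall(payload)
                recv_exact(sock, size)
            elapsed = time.perf_counter() - start
            half_rtt_us = elapsed / REPS / 2 * 1e6            # one-way latency estimate
            mbytes_per_s = size * REPS * 2 / elapsed / 1e6    # data moved in both directions
            print(f"{size:>8} B  latency ~{half_rtt_us:8.1f} us  bandwidth ~{mbytes_per_s:8.1f} MB/s")

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])
```

NetPIPE itself sweeps message sizes far more finely and can also run over MPI rather than raw TCP, but the quantities it reports are the same half round-trip latency and streaming bandwidth plotted in Figures 1 and 2.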
IOzone

IOzone is a file system benchmarking tool that generates and measures a variety of file operations.² In this benchmark, IOzone was only run for write, rewrite, read and reread, to mimic the most common operations an I/O subsystem performs.

Figure 3: IOzone 32 GB file (Local disk)
Figure 4: IOzone 32 GB file (Local disk)

This steady-state I/O test clearly demonstrates that the KVM hypervisor is severely lacking when it comes to I/O to disk, in both reads and writes. Even in the OVM case, I/O performance degrades by nearly 40 percent in the best-case scenario. Write performance for Citrix Xen is also limited. However, read performance exceeds that of the physical machine by over 7 percent. This can only be attributed to a read-ahead function in Xen that worked better than the native Linux read-ahead algorithm.

Regardless, this benchmark, more than the others, provides a warning to early HPC cloud adopters about the performance risks of virtual technologies. HPC users running I/O bound applications (Nastran, Gaussian, certain types of ABAQUS jobs, and so on) should steer clear of virtualization until these issues are resolved.
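For readers who want a feel for what these numbers represent, the sketch below is a small Python illustration of the same sequential write/rewrite/read/reread pattern (it is not IOzone). The file path, file size and record size are arbitrary examples; as with the 32 GB files used above, the test file must be much larger than available RAM, or the page cache will be measured rather than the disk and hypervisor I/O path.

```python
# Sequential write/rewrite/read/reread throughput sketch, loosely modeled on the
# IOzone operations used above (an illustration, not the IOzone benchmark itself).
import os
import time

PATH = "/tmp/io_test.bin"   # example path; point this at the file system under test
FILE_SIZE = 1 << 30         # 1 GiB for illustration; the runs above used 32 GB files
RECORD = b"\0" * (1 << 20)  # 1 MiB record size

def report(label, seconds):
    print(f"{label:8s} {FILE_SIZE / seconds / 1e6:8.1f} MB/s")

def write_pass(label):
    start = time.perf_counter()
    with open(PATH, "wb") as f:
        for _ in range(FILE_SIZE // len(RECORD)):
            f.write(RECORD)
        f.flush()
        os.fsync(f.fileno())            # make sure the data actually reaches the disk
    report(label, time.perf_counter() - start)

def read_pass(label):
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        while f.read(len(RECORD)):
            pass
    report(label, time.perf_counter() - start)

if __name__ == "__main__":
    write_pass("write")     # initial write
    write_pass("rewrite")   # rewrite over the existing file
    read_pass("read")       # first read
    read_pass("reread")     # second read, typically served partly from the page cache
    os.remove(PATH)
```

The reread pass is usually the fastest because the operating system can serve much of it from cache, which is why a benchmark of this kind only says something about the hypervisor's I/O path when the working set exceeds memory.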
Application benchmarks

Software compilation

Compiler used: gcc-4.1.2
Compilation target: Linux kernel 2.6.34 (with the ‘defconfig’ option). All transient files were put in a run-specific subdirectory using the ‘O=’ option of make. The source tree is thus kept read-only and all writes go to the run-specific subdirectory.

Figure 5: Compilation of kernel 2.6.34

Figure 5 shows the difference in compilation performance for a physical machine running a compile on an NFS volume compared to Citrix Xen doing the same thing on the same NFS volume. Citrix Xen is roughly 11 percent slower than the physical machine performing the task. Also included is the difference between compiling to a local disk target and compiling to the NFS target on the physical machine. The results illustrate how significantly NFS performance can affect a job’s elapsed time. This is of crucial importance because most virtualized private cloud implementations use NFS as the file system rather than local drives, in order to facilitate migration.
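A comparison of this kind amounts to nothing more than timing the same make target against different output directories. The Python sketch below shows one way to do that using the O= option described above; the source tree location, output directories, kernel version and job count are illustrative assumptions, not the exact commands used for Figure 5.

```python
# Time the same kernel build against two output directories (local disk vs. NFS),
# using a clean, read-only source tree and the O= option described above.
# Paths, kernel version and job count are examples only.
import os
import subprocess
import time

KERNEL_SRC = "/usr/src/linux-2.6.34"       # clean, read-only source tree (example path)
OUTPUT_DIRS = {
    "local disk": "/scratch/kbuild",       # directory on a local drive (example)
    "NFS": "/nfs/home/user/kbuild",        # directory on an NFS mount (example)
}
JOBS = "8"

def timed_build(outdir):
    os.makedirs(outdir, exist_ok=True)
    # Generate the default configuration in the output directory.
    subprocess.run(["make", "-C", KERNEL_SRC, f"O={outdir}", "defconfig"], check=True)
    start = time.perf_counter()
    # Build the kernel; all transient output lands in outdir, the source stays read-only.
    subprocess.run(["make", "-C", KERNEL_SRC, f"O={outdir}", "-j", JOBS], check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    for label, outdir in OUTPUT_DIRS.items():
        print(f"{label}: {timed_build(outdir):.1f} s")
```

In practice each measurement would be repeated, and caches dropped between runs, to obtain stable elapsed times.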
SIMULIA® Abaqus

SIMULIA® Abaqus³ is the manufacturing industry standard for implicit and explicit non-linear finite element solutions. SIMULIA publishes a benchmark suite that hardware vendors use to distinguish their products.⁴ The “e2” and “s6” models were used for these benchmarks. The ABAQUS explicit distributed parallel runs were performed using HP MPI (2.03.01), and scratch files were written to local scratch disk.

This comparison, unlike the others presented in this paper, was done in two different ways:

1. The data series called “Citrix” is for a single 8 GB RAM VM with 8 cores, where the MPI ranks communicated within a single VM.
2. The data series called “Citrix – Different VMs” represents multiple separate VMs defined on the hypervisor host intercommunicating.

Figure 6: Parallel ABAQUS explicit (e2.inp)
Figure 7: Parallel ABAQUS standard (s6.inp)

As expected, the additional layers of virtualized networking slowed the communication (as also shown in the NetPIPE results) and reduced scalability when the job had higher rank counts. For communications within a single VM, however, the performance of the virtual machine was almost identical to that of the physical machine.

ABAQUS has a different algorithm, called “ABAQUS Standard,” for solving implicit Finite Element Analysis (FEA) problems. This method does not run distributed parallel, but it can be run SMP parallel, which was done for the “s6” benchmark. Typically ABAQUS Standard does considerably more I/O to scratch disk than its explicit counterpart; however, this depends on the amount of memory available in the execution environment. It is clear again that when an application is only CPU or memory constrained, a virtual machine has almost no detectable performance impact.

ANSYS® FLUENT

ANSYS® FLUENT⁵ belongs to a large class of HPC applications referred to as computational fluid dynamics (CFD) codes. The “aircraft_2m” FLUENT model was selected based on size and run for 25 iterations. The “sedan_4m” model was chosen as a suitably sized model for running in parallel. One hundred iterations were performed using this model.

Figure 8: Serial FLUENT 12.1
Figure 9: Distributed parallel FLUENT 12.1 (sedan_4m - 100 iterations)

Though CFD codes such as FLUENT are rarely run serially, because of memory or solution time requirements, the comparison in Figure 8 shows that the solution times for a physical machine and a virtual machine differ by only 1.9 percent, with the virtual machine being the slower of the two. The “aircraft_2m” model was simply too small to scale well in parallel and provided strangely varying results, so the “sedan_4m” model was used.⁶

The results for the parallel case (Figure 9) illustrate that at two CPUs the virtual machine outperforms the physical machine. This is most likely caused by the native Linux scheduler moving processes around on the physical host; if the application had been bound to particular cores, this effect would disappear. In the four- and eight-CPU runs the difference between physical and virtual machines is negligible, which supports the theory that the Linux CPU scheduler is affecting the two-CPU job.
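Core binding of the kind suggested above is straightforward on Linux. The sketch below is a generic Python illustration (not FLUENT-specific) that pins worker processes to fixed cores through the kernel’s CPU affinity interface so the scheduler cannot migrate them; the dummy compute loop and the choice of four cores are arbitrary assumptions.

```python
# Pin worker processes to fixed CPU cores so the Linux scheduler cannot migrate them.
# Generic illustration of core binding (Linux-only); the workload is a dummy loop.
import multiprocessing
import os

def busy_worker(core_id):
    os.sched_setaffinity(0, {core_id})    # restrict this process to a single core
    x = 0
    for i in range(10_000_000):           # stand-in for real compute work
        x += i * i
    print(f"worker pid {os.getpid()} ran on cores {os.sched_getaffinity(0)}")

if __name__ == "__main__":
    cores = sorted(os.sched_getaffinity(0))[:4]   # first four cores available to us
    procs = [multiprocessing.Process(target=busy_worker, args=(c,)) for c in cores]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

In practice this binding is usually requested through the MPI launcher or the workload manager rather than in application code, but the effect on the scheduler is the same.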
LS-DYNA®

LS-DYNA®⁷ is a transient dynamic finite element analysis program capable of solving complex, real-world time domain problems on serial, SMP parallel and distributed parallel computational engines. The “refined_neon_30ms” model was chosen for the benchmarks reviewed in this section. HP MPI 2.03.01, now owned by IBM Platform Computing, was the message passing library used.

Figure 10: LS-DYNA - MPP971 - Refined Neon

The MPP-DYNA application responds well when run in a low latency environment. This benchmark supports the notion that distributed parallel LS-DYNA jobs are still very sensitive to network latency, even when communication stays on the virtual backplane of a VM. A serial run shows a virtual machine is 1 percent slower. Introduce message passing, however, and at eight CPUs the virtual machine is nearly 40 percent slower than the physical machine. The expectation is that if the same job were run across multiple VMs, as was done for the ABAQUS explicit parallel jobs, the effect would be even greater, with physical machines significantly outperforming virtual machines.

Conclusion

As with most legends, there is some truth to the notion that VMs are inappropriate for HPC applications. The benchmark results demonstrate that latency sensitive and I/O bound applications would perform at levels unacceptable to HPC users. However, the results also show that CPU and memory bound applications, and parallel applications that are not latency sensitive, perform well in a virtual environment. HPC architects who dismiss virtualization technology entirely may therefore be missing an enormous opportunity to inject flexibility, and even a performance edge, into their HPC designs.

The power of Platform Cluster Manager – Advanced Edition and IBM® Platform™ LSF® is their ability to work in concert to manage both of these types of workload simultaneously in a single environment. These tools allow their users to maximize resource utilization and flexibility through provisioning and control at the physical and virtual levels. Only IBM Platform Computing technology allows for environment optimization at the job-by-job level, and only Platform Cluster Manager – Advanced Edition continues to optimize that environment after jobs have been scheduled and new jobs have been submitted. Such an environment could realize orders-of-magnitude increases in efficiency and throughput while reducing the overhead of IT maintenance.

Significant results

• The KVM hypervisor significantly outperforms the OVM hypervisor on AMD servers, especially when several VMs run simultaneously.
• Citrix Xen I/O reads and rereads are very fast on Intel servers.
• OVM outperforms KVM by a significant margin for I/O intensive applications running on AMD servers.
• I/O intensive and latency sensitive parallel applications are not a good fit for virtual environments today.
• Memory and CPU bound applications are at performance parity between physical and virtual machines.
Notes

1. http://www.scl.ameslab.gov/netpipe/
2. http://www.iozone.org/
3. ABAQUS is a trademark of Simulia and Dassault Systemes (http://www.simulia.com)
4. See http://www.simulia.com/support/v67/v67_performance.html for a description of the benchmark models and their availability
5. Fluent is a trademark of ANSYS, Inc. (http://www.fluent.com)
6. The largest model provided by ANSYS, “truck_14m”, was not an option for this benchmark, as the model was too large to fit into memory.
7. LS-DYNA is a trademark of LSTC (http://www.lstc.com/)
For more information

To learn more about IBM Platform Computing, please contact your IBM marketing representative or IBM Business Partner, or visit the following website: ibm.com/platformcomputing

© Copyright IBM Corporation 2012

IBM Corporation
Systems and Technology Group
Route 100
Somers, NY 10589

Produced in the United States of America
November 2012

IBM, the IBM logo, ibm.com, Platform Computing, Platform Cluster Manager, Platform Dynamic Cluster and Platform LSF are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Actual available storage capacity may be reported for both uncompressed and compressed data and will vary and may be less than stated.

Please Recycle

DCW03038-USEN-00
