Self-Service Supercomputing

  1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. London Summit July 2016 HPC Clusters as code in the [almost]* Infinite cloud Brendan Bouffler AWS Global Scientific Computing @boofla 2016-07-07 Wil Mayers Alces Flight Ltd (UK) @alcesflight
  2. Scientific Computing Science is one of the greatest areas of computation and can benefit from a democratization in cost and global accessibility that the cloud brings. It’s also where we think Amazon can make a huge, really disruptive, impact on the world by participating - which is, at the most basic level, what we are about as a company.
  3. Disrupting science, wherever it’s happening.
  4. Existing 1. Oregon 2. California 3. Virginia 4. Dublin 5. Frankfurt 6. Singapore 7. Sydney 8. Seoul 9. Tokyo 10. Sao Paulo 11. Beijing 12. US GovCloud 1. Ohio 2. India 3. UK 4. Canada 5. China+1 AWS Region, Availability Zone. Regions are sovereign: your data never leaves.
  5. Public Data Sets workloads to the data data to the workloads
  6. Meeeeelions of uncorrelated workloads [chart: cores over time] Collective action: when everyone comes together in the cloud to share the resource, and only pays for what they use, the efficiency is huge.
  7. Spot Market [chart: cores over time] Our ultimate space filler. Spot Instances allow you to name your own price for spare AWS EC2 computing capacity. Great for workloads that aren’t time sensitive, and especially popular in research (hint: it’s really cheap).
  8. Spot Market Behavior. Spot Bid Advisor: The Spot Bid Advisor analyzes Spot price history to help you determine a bid price that suits your needs. You should weigh your application’s tolerance for interruption and your cost saving goals when selecting a Spot instance and bid price. The lower your frequency of being outbid, the longer your Spot instances are likely to run without interruption. Bid Price & Savings: Your bid price affects your ranking when it comes to acquiring resources in the Spot market, and is the maximum price you will pay. But frequently you’ll pay a lot less.
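The bidding rule on this slide can be sketched in a few lines of Python. This is a toy model, not the EC2 API: the hourly price series is invented, and `simulate_spot_run` is a hypothetical helper. It assumes an instance keeps running while the market price is at or below the bid, is charged the market price (not the bid) for each hour, and is interrupted the first time the market price rises above the bid.

```python
from typing import List, Tuple


def simulate_spot_run(prices: List[float], bid: float) -> Tuple[int, float]:
    """Return (hours_run, total_cost) for one instance under a toy Spot model."""
    hours_run = 0
    total_cost = 0.0
    for price in prices:
        if price > bid:
            # Outbid: the instance is interrupted and stops accruing cost.
            break
        hours_run += 1
        total_cost += price  # pay the market price, never more than the bid
    return hours_run, total_cost


# Illustrative hourly market prices in USD (invented for this sketch).
hourly_prices = [0.40, 0.42, 0.45, 0.60, 0.41]

low_bid = simulate_spot_run(hourly_prices, bid=0.50)
high_bid = simulate_spot_run(hourly_prices, bid=1.00)

print(f"bid $0.50: ran {low_bid[0]} h, paid ${low_bid[1]:.2f}")
print(f"bid $1.00: ran {high_bid[0]} h, paid ${high_bid[1]:.2f}")
```

In this made-up series the higher bid survives the price spike and runs for all five hours, yet the total paid stays well under five hours at the bid price, which is the "frequently you'll pay a lot less" point.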
  9. Agility is…Paying Only for IT You Use Peak: 58K cores Valley: 12K cores
  10. Breakthrough discoveries in the Cloud. The CHILES project astronomers have detected radio emissions from hydrogen in a galaxy more than 5 billion light years away, almost doubling the previous record. This has important implications for our understanding of how galaxies have evolved over time. The team at ICRAR in Western Australia estimates that the amount of compute capacity required to shift and crunch this data would otherwise have made this work infeasible. By using AWS, they were able to quickly and cheaply build their new pipelines, and then scale them as massive amounts of data arrived from their instruments.
  11. Science is about experimentation
  12. AWS Building blocks TECHNICAL & BUSINESS SUPPORT Account Management Support Professional Services Solutions Architects Training & Certification Security & Pricing Reports Partner Ecosystem AWS MARKETPLACE Backup Big Data & HPC Business Apps Databases Development Industry Solutions Security MANAGEMENT TOOLS Queuing Notifications Search Orchestration Email ENTERPRISE APPS Virtual Desktops Storage Gateway Sharing & Collaboration Email & Calendaring Directories HYBRID CLOUD MANAGEMENT Backups Deployment Direct Connect Identity Federation Integrated Management SECURITY & MANAGEMENT Virtual Private Networks Identity & Access Encryption Keys Configuration Monitoring Dedicated INFRASTRUCTURE SERVICES Regions Availability Zones Compute Storage Objects, Blocks, Files Databases SQL, NoSQL, Caching CDN Networking PLATFORM SERVICES App Mobile & Web Front-end Functions Identity Data Store Real-time Development Containers Source Code Build Tools Deployment DevOps Mobile Sync Identity Push Notifications Mobile Analytics Mobile Backend Analytics Data Warehousing Hadoop Streaming Data Pipelines Machine Learning
  13. EC2: There are a couple of dozen EC2 compute instance types alone, each of which is optimized for different things. One size does not fit all.
  14. C4: Intel Xeon E5-2666 v3, custom built for AWS. Intel Haswell, 16 FLOPS/tick, 2.9 GHz, turbo to 3.5 GHz. Feature: Specification. Processor Number: E5-2666 v3; Intel® Smart Cache: 25 MiB; Instruction Set: 64-bit; Instruction Set Extensions: AVX 2.0; Lithography: 22 nm; Processor Base Frequency: 2.9 GHz; Max All Core Turbo Frequency: 3.2 GHz; Max Turbo Frequency: 3.5 GHz (available on c4.2xlarge); Intel® Turbo Boost Technology: 2.0; Intel® vPro Technology: Yes; Intel® Hyper-Threading Technology: Yes; Intel® Virtualization Technology (VT-x): Yes; Intel® Virtualization Technology for Directed I/O (VT-d): Yes; Intel® VT-x with Extended Page Tables (EPT): Yes; Intel® 64: Yes
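The "16 FLOPS/tick" figure turns into a theoretical per-core peak when multiplied by the clock frequency. A minimal sketch of that arithmetic follows, using only the frequencies listed on the slide. The function name is illustrative, and this is a theoretical ceiling rather than a measured result.

```python
def peak_gflops_per_core(flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak: operations per cycle times cycles per second (GHz gives GFLOPS)."""
    return flops_per_cycle * clock_ghz


FLOPS_PER_CYCLE = 16  # "16 FLOPS/tick" from the slide

base = peak_gflops_per_core(FLOPS_PER_CYCLE, 2.9)            # processor base frequency
all_core_turbo = peak_gflops_per_core(FLOPS_PER_CYCLE, 3.2)  # max all-core turbo frequency
max_turbo = peak_gflops_per_core(FLOPS_PER_CYCLE, 3.5)       # max turbo frequency

print(f"base: {base:.1f} GFLOPS per core")
print(f"all-core turbo: {all_core_turbo:.1f} GFLOPS per core")
print(f"max turbo: {max_turbo:.1f} GFLOPS per core")
```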
  15. cfnCluster - provision an HPC cluster in minutes #cfncluster cfncluster is a sample code framework that deploys and maintains clusters on AWS. It is reasonably agnostic to what the cluster is for and can easily be extended to support different frameworks. The CLI is stateless; everything is done using CloudFormation or resources within AWS. 10 minutes – (Boof’s HOWTO slides)
  16. Introducing Alces Flight - self-scaling HPC clusters instantly ready to compute, billed by the hour and using the AWS Spot market by default to achieve supercomputing for ~1c per core per hour. 750+ popular scientific applications, available immediately from AWS Marketplace. Self-service HPC … 2016
  17. Requirements for Launching your HPC cluster • An Amazon Web Services (AWS) account • An SSH key-pair in your AWS region • An SSH client • Optionally – a VNC client • A workload to process
  18. Wil Mayers, Alces
  19. Searching AWS Marketplace
  20. Selecting Alces Flight from Marketplace
  21. Launching a new cluster
  22. CloudFormation cluster launch
  23. Access IP address
  24. Logging in to your Flight Cluster
  25. Cluster Architecture • Virtual Private Cloud (VPC) • One login node • EBS volume for data/apps • Compute node scaling group • 2 to 1,152 cores • Deployed in placement group • Static or auto-scaling • On-demand or Spot instances
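One way to picture a compute node scaling group with fixed limits is the small sketch below. It is not Alces Flight's actual scaling logic. `nodes_to_run` is a hypothetical helper, the 36-cores-per-node figure assumes c4.8xlarge-sized nodes (36 vCPUs each), and the one-node minimum is an arbitrary choice for illustration. Under that assumption the slide's 1,152-core ceiling corresponds to 32 nodes.

```python
import math


def nodes_to_run(queued_cores: int, cores_per_node: int,
                 min_nodes: int, max_nodes: int) -> int:
    """Nodes needed for the queued work, clamped to the scaling group's limits."""
    needed = math.ceil(queued_cores / cores_per_node)
    return max(min_nodes, min(needed, max_nodes))


CORES_PER_NODE = 36                 # assumption: c4.8xlarge-sized compute nodes
MIN_NODES = 1                       # assumption: illustrative lower limit
MAX_NODES = 1152 // CORES_PER_NODE  # 32 nodes reaches the 1,152-core ceiling

idle = nodes_to_run(0, CORES_PER_NODE, MIN_NODES, MAX_NODES)
moderate = nodes_to_run(360, CORES_PER_NODE, MIN_NODES, MAX_NODES)
overloaded = nodes_to_run(5000, CORES_PER_NODE, MIN_NODES, MAX_NODES)

print(idle, moderate, overloaded)
```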
  26. Linux cluster facilities • CentOS Linux cluster • Full root access to all nodes • Genders utility • PDSH utility • YUM install any software
  27. Graphical Desktop sessions • Create a session • Share connection details • Join to the session via VNC • Other collaborators can join
  28. Using Graphical Applications
  29. Installing Scientific Applications • Simple command-line tool to install applications
  30. Installing by Scientific Discipline • Choose a depot of applications to install
  31. Alces Gridware Application library • Over 850 application, library and MPI versions • Pre-optimized and stored in S3 • Option to compile and optimize on-demand • Includes modules environment management • Gridware project keeps pace with latest versions • Support for commercial and licensed applications
  32. Using Storage Services • Cluster includes large storage volume for data and apps • Tools to manage data held in object storage • Store your data in AWS S3 quickly and easily
  33. Cluster job scheduler • Choice of HPC cluster job schedulers • Automate job processing on your HPC cluster • Queue jobs for processing when nodes are available • Auto-scaling compute nodes within user-defined limits • Automatically rerun any jobs stopped when spot price exceeded
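The "automatically rerun" behaviour can be illustrated with a toy queue. This sketch does not reflect any particular scheduler's implementation: `run_with_requeue` is a hypothetical helper, and the interruption rule (named jobs lose their node on the first attempt only) is invented so the example is deterministic.

```python
from collections import deque
from typing import Deque, Dict, List, Set


def run_with_requeue(jobs: List[str], interrupted_first_try: Set[str]) -> Dict[str, int]:
    """Process a FIFO queue, requeueing any job whose node is reclaimed.

    Returns the number of attempts each job needed to finish.
    """
    queue: Deque[str] = deque(jobs)
    attempts: Dict[str, int] = {job: 0 for job in jobs}
    while queue:
        job = queue.popleft()
        attempts[job] += 1
        # Toy rule: listed jobs are stopped on their first attempt only,
        # standing in for a Spot instance being reclaimed mid-run.
        if job in interrupted_first_try and attempts[job] == 1:
            queue.append(job)  # put the stopped job back on the queue
    return attempts


attempts = run_with_requeue(["job-1", "job-2", "job-3"],
                            interrupted_first_try={"job-2"})
print(attempts)
```

Here `job-2` is stopped once, goes to the back of the queue, and completes on its second attempt while the other jobs finish first time.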
  34. Workload to process #1 Landsat cloud coverage survey
  35. Landsat Satellite mapping data • Continuous record of Earth’s surface • Data from the 1970s to present day • Public data set available to everyone • Stored on object storage, including AWS S3
  36. Workload • Survey of cloud cover around Northern Tropic • Task-array job running 360 degrees around the Earth • Measures average cloud cover in each image • Generates a deck of sample images • Uploads deck to S3 object storage • Uses 360 compute cores
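The task-array idea, one task per degree of longitude, can be sketched as follows. The slides do not show the actual job script, so everything here is an assumption: the mapping from task ID to longitude, the helper names, and the cloud cover percentages are made up purely to show the shape of the computation.

```python
from statistics import mean
from typing import Dict, List


def longitude_for_task(task_id: int) -> int:
    """Map a 1-based array task ID (1..360) to a longitude in degrees (-180..179)."""
    if not 1 <= task_id <= 360:
        raise ValueError("task_id must be between 1 and 360")
    return task_id - 181


def average_cloud_cover(cover_percentages: List[float]) -> float:
    """Average cloud cover (percent) over the images handled by one task."""
    return mean(cover_percentages)


# Invented cloud cover percentages for the images in two longitude slices.
sample_scenes: Dict[int, List[float]] = {
    longitude_for_task(1): [10.0, 30.0],
    longitude_for_task(360): [50.0, 70.0, 90.0],
}

summary = {lon: average_cloud_cover(values) for lon, values in sample_scenes.items()}
print(summary)
```

Because each task depends only on its own ID, all 360 tasks can run at the same time on separate cores, which is what makes the workload a natural fit for an array job.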
  37. Workflow 1. Launch your cluster 2. Enable object storage 3. Install application 4. Fetch job-script 5. Submit job
  38. Approximate costs • 360 jobs each taking ~5 mins • Total CPU time = 30 core hours • Cost of 36 core hours in AWS spot market* = $0.44 • Cost of one T2 login node for 1 hour* = $0.12 • Cost of 100GB EBS volume for apps* = <$0.01 • Alces Flight software cost = $0.00 • Total cost per daily run = $0.60 / 45p • Cost for one year of research = $219 / £168 * based on C4.8xlarge spot rate in EU-West region; T2.large on-demand instance; EBS st1 volume; excludes S3 storage costs and sales tax where applicable
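The arithmetic behind these figures can be checked with a short script. All dollar amounts are copied from the slide; the variable names are illustrative. The slide uses 30 core hours but prices 36; one plausible reading, stated here as an assumption, is that a whole c4.8xlarge (36 vCPUs) is billed for one hour.

```python
JOBS = 360
MINUTES_PER_JOB = 5

# CPU time actually consumed by the array job.
core_hours_used = JOBS * MINUTES_PER_JOB / 60

# Line items in USD, as quoted on the slide.
line_items = {
    "36 core hours on the Spot market": 0.44,
    "T2 login node for 1 hour": 0.12,
    "100GB EBS volume": 0.01,  # slide says "<$0.01"; used here as an upper bound
    "Alces Flight software": 0.00,
}
subtotal = sum(line_items.values())

DAILY_TOTAL = 0.60  # the slide's rounded total per daily run
yearly_total = DAILY_TOTAL * 365

print(f"core hours used: {core_hours_used:.0f}")
print(f"sum of line items: ${subtotal:.2f}")
print(f"one year of daily runs: ${yearly_total:.0f}")
```

The line items sum to slightly less than the quoted $0.60, so the slide's daily total appears to be rounded up before being multiplied out to a year.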
  39. Workload to process #2 Computational Fluid Dynamics with OpenFOAM
  40. OpenFOAM CFD • Computational Fluid Dynamics workload • Simulates liquid and air-flow for engineering projects • Open-source software available to all • Commercial support available from CFD Direct Ltd. • Run as a parallel job across multiple compute nodes
  41. Workload • Generate a mesh representing the problem • Decomposition of the problem into sections • Processing of the sections • Visualization of the solution
  42. Workflow 1. Launch your cluster 2. Enable object storage 3. Install application 4. Fetch job-script 5. Submit job 6. Start desktop 7. Visualize
  43. Visualization with ParaView
  44. Approximate costs (full solve) • 1 job using 128 cores taking 4 hours • Total CPU time = 1024 core hours • Cost of 1024 core hours in AWS spot market* = $7.04 • Cost of one T2 login node for 4 hours* = $0.45 • Cost of 100GB EBS volume for apps* = $0.02 • Alces Flight software cost = $0.00 • Total cost per simulation = $7.51 / £5.75 * based on C4.8xlarge spot rate in EU-West region; T2.large on-demand instance; EBS st1 volume; excludes sales tax where applicable
  45. Filesystems in the marketplace, too. There are cluster filesystem options, too, for when you need extreme I/O scaling. BeeGFS is a scalable parallel cluster filesystem developed by the Fraunhofer Institute with a strong focus on performance and designed for easy installation and management. Intel Lustre® Cloud Edition is a scalable, parallel file system purpose-built for HPC, with a long history in the field supporting a range of workloads. There’s more to come: the AWS Marketplace is growing all the time and new offerings are added frequently. Watch this space.
  46. How to start? 1. AWS Account 3. A problem to solve