
Accelerate ML workloads using EC2 accelerated computing - CMP202 - Santa Clara AWS Summit

Machine learning (ML) facilitates quick exploration into a multitude of scenarios to generate the best solution to complex issues in image, video, speech recognition, autonomous vehicle systems, and weather prediction. For data scientists, researchers, and developers who want to speed up development of their ML applications, Amazon EC2 P3 instances are the most powerful, cost-effective, and versatile GPU compute instances available in the cloud, while Amazon EC2 G4 instances are cost-effective for deploying ML models to production. In this session, we discuss P3 and G4 instances and how to use them for various use cases to meet your ML needs.


1. Accelerate ML workloads using EC2 accelerated computing (CMP202). Chetan Kapoor, Principal Product Manager, Amazon EC2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
2. Amazon EC2 instance types: General purpose (M5, T3), Compute optimized (C5, C4), Storage optimized (H1, I3, D2), Memory optimized (X1e, R5), Accelerated computing (F1, P3, G3).
3. Choice of processors and architectures*: the right compute for each application and workload. Over 100 EC2 instances featuring Intel Xeon processors; the AWS Graviton processor based on the 64-bit Arm architecture; AMD EPYC processors; plus additional Amazon EC2 instances featuring NVIDIA GPUs and FPGAs. (*Not all processors and architectures are available globally.)
4. Hardware acceleration for computationally demanding applications. Machine learning: image recognition, natural language processing, speech recognition. High performance computing: computational fluid dynamics, genomics, weather simulation, EDA. Graphics intensive: graphics workstations, video transcoding, game streaming.
5. C5: Compute-optimized instances. Custom 3.0 GHz Intel Xeon Scalable processors (Skylake); up to 72 vCPUs and 144 GiB of memory (2:1 memory:vCPU ratio); 25 Gbps network bandwidth; support for Intel AVX-512, great for ML inference; C5d variant with local NVMe-based SSD storage. Up to 50%* savings over C4 and a 25% price/performance improvement over C4. Customer quotes: "We saw significant performance improvement on Amazon EC2 C5, with up to a 140% performance improvement in open standard CPU benchmarks over C4." "We are eager to migrate onto the AVX-512 enabled c5.18xlarge instance size… We expect to decrease the processing time of some of our key workloads by more than 30%."
6. C5n: the fastest networking in the cloud. 100 Gbps network bandwidth on the largest instance sizes and 25 Gbps peak bandwidth on smaller sizes; 33% larger memory footprint than C5 instances; featuring Intel Xeon Scalable processors. Benefits: faster analytics and big data workloads, lower costs for network-bound workloads, and all of the elasticity, security, and scalability of AWS.
7. z1d: high frequency for specialized workloads. High-frequency instances with custom Intel Xeon Scalable processors running at a sustained 4 GHz all-core turbo; 8:1 GiB-to-vCPU ratio; up to 25 Gbps network bandwidth and up to 1.8 TB of local NVMe storage. Six sizes, from z1d.large up to z1d.12xlarge (48 vCPUs, 384 GiB). Target workloads: electronic design automation, relational databases, gaming.
8. CPUs vs. GPUs vs. FPGAs vs. ASICs for compute. CPU: tens to hundreds of processing cores, pre-defined instruction set and datapath widths, optimized for general-purpose computing. GPU: thousands of processing cores, pre-defined instruction set and datapath widths, highly effective at parallel execution. FPGA: millions of programmable digital logic cells, no predefined instruction set or datapath widths, hardware-timed execution. ASIC: optimized, custom design for a particular use or function, with a predefined software experience exposed through an API. (Slide diagram contrasts the DRAM/control/ALU/cache layout of each architecture.)
9. EC2 accelerated computing instances. P3, GPU compute instance: up to 8 NVIDIA V100 GPUs in a single instance, with NVLink for peer-to-peer GPU communication; supports a wide variety of use cases including deep learning, HPC simulations, financial computing, and batch rendering. G3, GPU graphics instance: up to 4 NVIDIA M60 GPUs, with GRID Virtual Workstation features and licenses; designed for workloads such as 3D rendering, 3D visualizations, graphics-intensive remote workstations, video encoding, and virtual reality applications. F1, FPGA instance: up to 8 Xilinx Virtex UltraScale+ VU9P FPGAs in a single instance, programmable via VHDL, Verilog, or OpenCL, with a growing marketplace of pre-built application accelerations; designed for hardware-accelerated applications including financial computing, genomics, accelerated search, and image processing. AWS Inferentia, ML inference chip: a high-performance machine learning inference chip, custom designed by AWS, for lower cost-per-inference across the full range of ML applications.
10. Amazon EC2 P3 instances for compute acceleration.
11. Amazon EC2 P3 instances (launched October 2017): one of the fastest, most powerful GPU instances in the cloud. Up to eight NVIDIA Tesla V100 GPUs; 1 PetaFLOP of computational performance (eight V100s at 125 TFLOPS of mixed-precision throughput each), up to 14x better than P2; 300 GB/s GPU-to-GPU communication over NVLink, 9x better than P2; 16 GB of GPU memory per GPU with 900 GB/s peak GPU memory bandwidth.
12. Use cases for P3 instances. Machine learning/AI: natural language processing, image and video recognition, autonomous vehicle systems, recommendation systems. High performance computing: computational fluid dynamics, financial and data analytics, weather simulation, computational chemistry.
13. The machine learning process: business problem, ML problem framing, data collection, data integration, data preparation & cleaning, data visualization & analysis, feature engineering, model training & parameter tuning, model evaluation, then "Are business goals met?" If yes, model deployment followed by monitoring & debugging and predictions, with re-training as needed; if no, loop back through data augmentation and feature augmentation.
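The loop on this slide translates directly into code. Below is a minimal sketch using scikit-learn on synthetic data; the dataset, model choice, and 90% accuracy target are illustrative assumptions, not from the deck:

```python
# A minimal sketch of the slide's train/evaluate/iterate loop using scikit-learn
# on synthetic data; real projects swap in their own data and "business goal".
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20)   # stand-in for collected data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=100)            # model training
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))    # model evaluation
if accuracy >= 0.90:                                        # "Are business goals met?"
    print(f"deploy: accuracy={accuracy:.2%}")
else:
    print("iterate: augment data/features and retrain")
```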
14. Training machine learning models: AlexNet, 2012. A large, deep convolutional neural network with 5 convolutional layers, 60 million parameters, and 650,000 neurons. Created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton; won the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge). Trained on two NVIDIA GTX 580 GPUs, and took nearly a week to train! Source: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
15. AWS P3 vs. P2 instance GPU performance comparison. P2 instances use the K80 accelerator (Kepler architecture); P3 instances use the V100 accelerator (Volta architecture). (Charts compare K80, P100, and V100 in TFLOPS: the V100 delivers roughly 1.7x the K80's FP32 performance, 2.6x its FP64 performance, and 14x its maximum FP32 performance when running mixed-precision/FP16.)
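The 14x mixed-precision figure is what frameworks tap into through the V100's Tensor Cores. A minimal sketch, assuming PyTorch 1.6+ and its automatic mixed precision API, with a toy model and random data standing in for a real workload:

```python
# Sketch: mixed-precision training with PyTorch autocast/GradScaler, which lets
# the V100's Tensor Cores handle the FP16 matrix math. Toy model, random data.
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(1024, 10).to(device)                  # toy model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()                    # avoids FP16 gradient underflow

for _ in range(100):
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                     # run ops in FP16 where safe
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```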
16. P3 instance details.
    p3.2xlarge: 1 GPU, no GPU peer-to-peer, 8 vCPUs, 61 GB memory, up to 10 Gbps network, 1.7 Gbps Amazon EBS bandwidth; $3.06/hr On-Demand, $1.99/hr effective 1-yr RI (35% discount), $1.23/hr effective 3-yr RI (60% discount).
    p3.8xlarge: 4 GPUs, NVLink peer-to-peer, 32 vCPUs, 244 GB memory, 10 Gbps network, 7 Gbps EBS bandwidth; $12.24/hr On-Demand, $7.96/hr effective 1-yr RI (35% discount), $4.93/hr effective 3-yr RI (60% discount).
    p3.16xlarge: 8 GPUs, NVLink peer-to-peer, 64 vCPUs, 488 GB memory, 25 Gbps network, 14 Gbps EBS bandwidth; $24.48/hr On-Demand, $15.91/hr effective 1-yr RI (35% discount), $9.87/hr effective 3-yr RI (60% discount).
    Regional availability: P3 instances are generally available in US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), Asia Pacific (Seoul), Asia Pacific (Tokyo), AWS GovCloud (US), and China (Beijing). Framework support: P3 instances and their V100 GPUs are supported across all major frameworks (such as TensorFlow, MXNet, PyTorch, Caffe2, and CNTK).
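As a quick sanity check on the discount percentages quoted in the table, the effective hourly RI rates can be compared against On-Demand (prices copied from the slide):

```python
# Quick arithmetic check of the effective RI discounts in the table above.
on_demand = {"p3.2xlarge": 3.06, "p3.8xlarge": 12.24, "p3.16xlarge": 24.48}
ri_1yr    = {"p3.2xlarge": 1.99, "p3.8xlarge": 7.96,  "p3.16xlarge": 15.91}
ri_3yr    = {"p3.2xlarge": 1.23, "p3.8xlarge": 4.93,  "p3.16xlarge": 9.87}

for size, od in on_demand.items():
    d1 = 1 - ri_1yr[size] / od
    d3 = 1 - ri_3yr[size] / od
    print(f"{size}: 1-yr RI saves {d1:.0%}, 3-yr RI saves {d3:.0%}")
# Each size comes out to roughly 35% (1-yr) and 60% (3-yr), matching the slide.
```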
17. P3 instance details (continued; same size table as the previous slide). Key point: P3 instances provide GPU-to-GPU data transfer over NVLink, whereas P2 instances provided GPU-to-GPU data transfer over PCI Express.
18. New larger P3 size: p3dn.24xlarge, optimized for distributed ML training. One of the most powerful GPU instances available in the cloud: 100 Gbps of networking throughput for large-scale distributed training and fast data access; 96 vCPUs on AWS-custom Skylake CPUs with AVX-512 support for pre-processing of training data; 768 GB of system memory; and NVIDIA's latest Tesla V100 GPUs with 32 GB of memory each, for large models and higher batch sizes.
    p3.2xlarge: 1x V100 (16 GB/GPU), no peer-to-peer, 8 vCPUs (Broadwell), 61 GB memory, up to 10 Gbps network, 1.7 Gbps EBS bandwidth, no local instance storage.
    p3.8xlarge: 4x V100 (16 GB/GPU), NVLink, 32 vCPUs (Broadwell), 244 GB memory, 10 Gbps network, 7 Gbps EBS bandwidth, no local instance storage.
    p3.16xlarge: 8x V100 (16 GB/GPU), NVLink, 64 vCPUs (Broadwell), 488 GB memory, 25 Gbps network, 14 Gbps EBS bandwidth, no local instance storage.
    p3dn.24xlarge: 8x V100 (32 GB/GPU), NVLink, 96 vCPUs (Skylake), 768 GB memory, 100 Gbps network, 14 Gbps EBS bandwidth, 2 TB NVMe local storage.
19. Scaling performance using distributed training. (Chart: ResNet-50 on ImageNet, training throughput in images/second versus number of GPUs from 1 to 64.) Using a single P3 instance with Volta GPUs, customers can cut the training time of their machine learning models from days to a few hours. Using distributed training across multiple P3 instances with high-performance networking and storage, customers can further cut time-to-train from hours to minutes. Example: we trained ResNet-50 to 76% Top-1 validation accuracy in 14 minutes on a cluster of p3.16xlarge instances. https://aws.amazon.com/blogs/machine-learning/scalable-multi-node-deep-learning-training-using-gpus-in-the-aws-cloud/
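The linked blog post describes the multi-node setup in detail. As one illustration of the general data-parallel pattern, here is a minimal sketch using Horovod's PyTorch binding with a toy model and random batches (an illustration of the technique, not the post's exact configuration):

```python
# Minimal sketch of data-parallel training with Horovod across P3 GPUs/instances.
# Launch with one process per GPU, e.g.: horovodrun -np 8 python train.py
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())      # pin each process to its own GPU

model = nn.Linear(1024, 10).cuda()           # toy model
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1 * hvd.size())   # linear LR scaling rule

# Average gradients across all workers each step, and make every worker
# start from identical weights.
optimizer = hvd.DistributedOptimizer(optimizer,
                                     named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

criterion = nn.CrossEntropyLoss()
for _ in range(100):
    x = torch.randn(64, 1024, device="cuda")
    y = torch.randint(0, 10, (64,), device="cuda")
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
```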
20. The broadest global availability. P3 instances are available in: US East (N. Virginia), US East (Ohio), US West (Oregon), Canada (Central), Europe (Ireland), Europe (Frankfurt), Europe (London), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Singapore), China (Beijing), China (Ningxia), and AWS GovCloud (US). p3dn.24xlarge instances are available in: US East (N. Virginia) and US West (Oregon).
21. AWS storage options. Amazon S3: secure, durable, highly scalable object storage with fast access and low cost; use as the primary durable and scalable storage for data, in a readily accessible get/put format. Amazon S3 Glacier: secure, durable, long-term, highly cost-effective object storage; use for long-term, lower-cost archival of infrequently accessed data. EC2+EBS: a single-AZ shared file system built from Amazon EC2 and Amazon EBS with third-party or open-source software (e.g., ZFS, Intel Lustre); use for high-IOPS, temporary working storage optimized for high I/O performance. Amazon EFS: a highly available, multi-AZ, fully managed network-attached elastic file system in a traditional NFS format (NFSv4); use for read-often, temporary working storage.
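For the common case of staging training data in S3's get/put format, a minimal boto3 sketch (the bucket and key names are hypothetical):

```python
# Sketch: staging a training dataset in S3 with boto3.
# Bucket and key names below are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file("train.tfrecord", "my-training-data", "imagenet/train.tfrecord")
s3.download_file("my-training-data", "imagenet/train.tfrecord", "/tmp/train.tfrecord")
```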
22. Amazon FSx for Lustre: a high-performance file system optimized for fast processing of workloads such as machine learning, HPC, video processing, financial modeling, and electronic design automation. Launch and run a file system that provides sub-millisecond access to your data, with read/write throughput of up to hundreds of gigabytes per second and millions of IOPS. Learn more at aws.amazon.com/fsx/lustre
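A hedged boto3 sketch of creating a Lustre file system linked to an S3 bucket (the subnet ID and bucket name are placeholders; capacity is in GiB and subject to the service's minimum-size rules):

```python
# Sketch: creating an FSx for Lustre file system that lazy-loads from S3.
# Subnet ID and bucket name are hypothetical placeholders.
import boto3

fsx = boto3.client("fsx")
response = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=3600,                          # GiB
    SubnetIds=["subnet-0123456789abcdef0"],
    LustreConfiguration={
        "ImportPath": "s3://my-training-data",     # link objects from this bucket
    },
)
print(response["FileSystem"]["FileSystemId"])
```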
23. AWS Deep Learning AMI. Get started quickly with easy-to-launch tutorials; hassle-free setup and configuration; pay only for what you use, with no additional charge for the AMI itself; accelerate your model training and deployment; support for popular deep learning frameworks.
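A minimal boto3 sketch of launching a P3 instance from a Deep Learning AMI; the AMI ID and key-pair name are placeholders, so look up the current Deep Learning AMI ID for your region before running:

```python
# Sketch: launching a p3.2xlarge from a Deep Learning AMI with boto3.
# ImageId and KeyName are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # Deep Learning AMI ID (placeholder)
    InstanceType="p3.2xlarge",
    KeyName="my-key-pair",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```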
24.-29. Amazon SageMaker: Build, train, and deploy ML models at scale. (A sequence of six diagram slides walking through the SageMaker workflow; legible fragments mention steps 1-2-3, RL Coach, AWS IoT Greengrass, and Amazon EC2 C5.)
30. Amazon EC2 G3 instances for graphics acceleration.
31. AWS G3 GPU instances. Up to four NVIDIA M60 GPUs. Includes GRID Virtual Workstation features and licenses, supporting up to four monitors at 4096x2160 (4K) resolution. Includes NVIDIA GRID Virtual Application capabilities for application-virtualization software such as Citrix XenApp Essentials and VMware Horizon, supporting up to 25 concurrent users per GPU. Hardware encoding supports up to 10 H.265 (HEVC) 1080p30 streams and up to 18 H.264 1080p30 streams per GPU. Designed for workloads such as 3D rendering, 3D visualizations, graphics-intensive remote workstations, video encoding, and virtual reality applications.
    g3s.xlarge: 1 GPU, 4 vCPUs, 30.5 GiB memory; $0.75/hr Linux, $0.93/hr Windows (IAD).
    g3.4xlarge: 1 GPU, 16 vCPUs, 122 GiB memory; $1.14/hr Linux, $1.88/hr Windows.
    g3.8xlarge: 2 GPUs, 32 vCPUs, 244 GiB memory; $2.28/hr Linux, $3.75/hr Windows.
    g3.16xlarge: 4 GPUs, 64 vCPUs, 488 GiB memory; $4.56/hr Linux, $7.50/hr Windows.
32. Four modes of using G3 instances (shown on a g3.4xlarge: 16 vCPUs, 1x M60 GPU, 122 GB memory, up to 10 Gbps network): (1) EC2 instance with NVIDIA drivers and libraries, for graphics rendering, simulations, and video encoding; (2) EC2 instance with NVIDIA GRID Virtual Workstation, for a professional workstation (single user); (3) EC2 instance with NVIDIA GRID Virtual Application, for virtual apps (25 concurrent users); (4) EC2 instance with NVIDIA GRID for gaming, for gaming services.
33. G3 use cases. Media & entertainment: content creation, post-production, video playout/broadcast, encoding/transcoding. Automotive: car configurators. Energy exploration & production: seismic analysis, analytics. Also: cloud GPU rendering and visualization (such as high-end car configurators and AR/VR), desktop and application virtualization, productivity and consumer apps, design and engineering, and cloud gaming.
34. AWS G4 GPU instances. Designed for machine learning inference, video transcoding, remote graphics workstations, and other demanding graphics applications. Up to 8 NVIDIA T4 Tensor Core GPUs, each with 2,560 CUDA cores and 320 Turing Tensor Cores, including support for ray-tracing technology. Available in multiple sizes with AWS-custom Intel CPUs (4-96 vCPUs). Available soon.
35. Amazon EC2 F1 instances for custom hardware acceleration.
36. Parallel processing in FPGAs. An FPGA is effective at processing data of many types in parallel, for example creating a complex pipeline of parallel, multistage operations on a video stream, or performing massive numbers of dependent or independent calculations for a complex financial model. An FPGA does not have an instruction set; data can be any bit width (a 9-bit integer? No problem!); and complex control logic (such as a state machine) is easy to implement in an FPGA. Each FPGA in F1 has more than 2 million of these programmable logic cells.
37. How FPGA acceleration works. The application runs on the CPU, which hands compute-intensive, deeply pipelined, hardware-accelerated operations to the FPGA; the CPU handles the rest. (The slide illustrates this with Verilog fragments such as: module filter1 (clock, rst, strm_in, strm_out); integer i, j; // index for loops; for (i = 0; i < NUMUNITS; i = i + 1); always @(posedge clock) tmp_kernel[j] = k[i*OFFSETX];)
38. F1 FPGA instance types on AWS. Up to 8 Xilinx UltraScale+ 16 nm VU9P FPGA devices in a single instance. The f1.16xlarge size provides 8 FPGAs, each with over 2 million customer-accessible programmable logic cells and over 5,000 programmable DSP blocks; each of the 8 FPGAs has 4 DDR4 interfaces, each accessing a 16 GiB, 72-bit-wide, ECC-protected memory.
    f1.2xlarge: 1 FPGA, 64 GB FPGA memory, 8 vCPUs, 122 GB instance memory, 1x470 GB NVMe instance storage, up to 10 Gbps network.
    f1.4xlarge: 2 FPGAs, 128 GB FPGA memory, 16 vCPUs, 244 GB instance memory, 1x940 GB NVMe instance storage, up to 10 Gbps network.
    f1.16xlarge: 8 FPGAs, 512 GB FPGA memory, 64 vCPUs, 976 GB instance memory, 4x940 GB NVMe instance storage, 25 Gbps network.
39. Three methods to use F1 instances. (1) Hardware engineers/developers who are comfortable programming FPGAs: use the F1 Hardware Development Kit (HDK) to develop and deploy custom FPGA accelerations in Verilog or VHDL. (2) Software engineers/developers who are not proficient in FPGA design: use OpenCL to create custom accelerations. (3) Software engineers/developers who are not proficient in FPGA design: use pre-built, ready-to-use accelerations available in AWS Marketplace.
40. FPGA acceleration development. Launch an EC2 F1 instance from an Amazon Machine Image (AMI) and load an Amazon FPGA Image (AFI); the CPU application talks to the FPGA over PCIe, and the FPGA has DDR controllers with DDR4-attached memory. An F1 instance can have any number of AFIs, and an AFI can be loaded into the FPGA in seconds.
41. Developing custom accelerations: the FPGA Developer AMI. Use Xilinx Vivado and a hardware description language (Verilog or VHDL for RTL) with the HDK to describe and simulate your FPGA logic: Xilinx Vivado for custom logic development, and virtual JTAG for interactive debugging.
42. OpenCL generally available for F1. A familiar development experience for accelerating C/C++ applications; 50+ F1 code examples available, spanning multiple domains: security, image processing, and accelerated algorithms; already supported on the FPGA Developer AMI, with no need to upgrade or install anything.
43. AWS Marketplace: discover, procure, deploy, and manage software in the cloud.
44. Delivering FPGA partner solutions: Amazon EC2 FPGA deployment via AWS Marketplace. Customers launch an Amazon Machine Image (AMI) whose CPU application uses a partner's Amazon FPGA Image (AFI). The AFI is secured, encrypted, and dynamically loaded into the FPGA; it can't be copied or downloaded.
45. AWS Inferentia: a high-performance machine learning inference chip, custom designed by AWS. Making predictions using a trained machine learning model, a process called inference, can drive as much as 90% of the compute costs of an application. AWS Inferentia is designed to deliver high inference performance at low cost.
46. Summary. Pick the right compute platform for accelerating your application: you have a choice of compute-optimized CPU platforms, GPU-accelerated platforms, or FPGA-accelerated platforms, and we aspire to provide you with the broadest and deepest set of products and services to support your workload. Recap: C5 compute-optimized instances with custom 3.0 GHz Intel Xeon Scalable processors (Skylake) and support for Intel AVX-512, great for ML inference; C5n with the fastest networking in the cloud, up to 100 Gbps; z1d high-frequency instances with custom Intel Xeon Scalable processors running at a sustained 4 GHz all-core turbo.
47. EC2 accelerated computing instances (recap of slide 9): P3 GPU compute instances, G3 GPU graphics instances, F1 FPGA instances, and the AWS Inferentia ML inference chip.
48. Thank you! Chetan Kapoor, Principal Product Manager, Amazon EC2.
49. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
