Lower Costs on Amazon EMR: Auto Scaling, Spot Pricing, & Expert Strategies (ANT385) - AWS re:Invent 2018

Amazon EMR is a powerful service that lets you process and analyze big data at any scale. In this chalk talk, we share proven strategies to maximize utilization while minimizing costs for long-running clusters. We discuss how to get the most leverage from features like Auto Scaling and Spot pricing, and how decoupling compute and storage in your architecture affects TCO. Finally, appropriately sizing instances, clusters, and jobs will help you save.

  1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Lower Costs on Amazon EMR: Auto Scaling, Spot Pricing, & Expert Strategies (ANT385). Bruno Faria, Senior EMR Solutions Architect, AWS Solutions Architecture.
  2. Amazon EMR pricing: With Amazon EMR, you pay a per-second rate for every second you use. The price depends on the instance type, the number of EC2 instances you deploy, and the Region in which you launch your cluster.
  3. Reserved, Spot, and On-Demand Instances. Spot Instances: Amazon EC2 Spot Instances offer spare compute capacity at a discount compared to On-Demand Instances. Reserved Instances: Amazon EC2 Reserved Instances let you make a payment for instances you want to reserve, at a significant discount compared to On-Demand pricing. On-Demand Instances: Amazon EC2 On-Demand Instances are instances that you launch and pay for by the second.
  4. Understanding the node types in Amazon EMR. Master node: the node that manages the cluster. The master node tracks the status of tasks and monitors the health of the cluster. Core node: a node that runs tasks and stores data in the Hadoop Distributed File System (HDFS) on your cluster. Task node: a node that only runs tasks and does not store data in HDFS. Task nodes are optional.
  5. Lower costs with Spot and Reserved Instances. Spot for task nodes: up to 80% off EC2 On-Demand pricing; exceed SLA at lower cost. On-Demand for core nodes: standard Amazon EC2 pricing for On-Demand capacity; meet SLA at predictable cost.
  6. Performance and hardware. Considerations: transient or long running; instance types; cluster size; application settings; file formats and S3 tuning. Cluster diagram: master node c5.2xlarge; slave group (core) c5.2xlarge; slave group (task) m5.2xlarge (EC2 Spot).
  7. Advanced Spot Provisioning with Instance Fleets. Cluster diagram: master node; core instance fleet; task instance fleet. Provision from a list of instance types with Spot and On-Demand. Launch in the optimal Availability Zone based on capacity and price. Spot block support.
  8. Transient or long running workloads. Diagram panels: Transient; Long running.
  9. Lower costs with Auto Scaling
  10. Use Amazon S3 as your persistent data store. Decouple storage and compute. Scale compute and storage up or down independently, as each requires. Can run transient Amazon EMR clusters with Amazon EC2 Spot Instances. Designed for 99.999999999% durability. No need to pay for data replication.
  11. Amazon S3 tips. Partition your data to reduce the amount of data scanned. Optimize file sizes to reduce the number of S3 requests. Compress datasets to minimize bandwidth from S3 to EC2. Use a columnar file format like Parquet when selecting only a subset of columns.
  12. Thank you! Bruno Faria
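
The pricing slides above (per-second billing, On-Demand for master and core nodes, Spot for task nodes) can be illustrated with a small cost sketch. This is only a rough model under stated assumptions: the `NodeGroup` class, the `cluster_cost` helper, the 0.40 USD hourly rate, and the node counts are all hypothetical placeholders, not figures from the talk. The 80% Spot discount is the "up to" ceiling quoted on the slide, so real savings will usually be smaller. Only the EC2 portion of the bill is modeled; the separate Amazon EMR charge is left out.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    """One group of identical nodes in a cluster (hypothetical model)."""
    name: str
    count: int
    hourly_rate: float           # assumed EC2 On-Demand price, USD per instance-hour
    spot_discount: float = 0.0   # fraction off On-Demand; 0.0 means On-Demand

    def cost(self, seconds: int) -> float:
        # Per-second billing: convert the (discounted) hourly rate to seconds.
        per_second = self.hourly_rate * (1.0 - self.spot_discount) / 3600
        return self.count * per_second * seconds


def cluster_cost(groups, seconds: int) -> float:
    """Total EC2 cost of running every group for the same duration."""
    return sum(group.cost(seconds) for group in groups)


RATE = 0.40       # placeholder price, not a real quote
RUNTIME = 1800    # a 30-minute transient job, in seconds

all_on_demand = [
    NodeGroup("master", 1, RATE),
    NodeGroup("core", 2, RATE),
    NodeGroup("task", 4, RATE),
]

spot_task_nodes = [
    NodeGroup("master", 1, RATE),
    NodeGroup("core", 2, RATE),
    NodeGroup("task", 4, RATE, spot_discount=0.80),  # best case from the slide
]

baseline = cluster_cost(all_on_demand, RUNTIME)
mixed = cluster_cost(spot_task_nodes, RUNTIME)
savings = 1.0 - mixed / baseline

print(f"All On-Demand:   ${baseline:.2f}")
print(f"Spot task nodes: ${mixed:.2f}")
print(f"Savings:         {savings:.0%}")
```

Keeping the master and core groups On-Demand mirrors the slide's split: the nodes that manage the cluster and hold HDFS data stay at a predictable price, while the optional task nodes absorb the discount.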
