Datadog - Using Metrics to Measure and Understand Your AWS Environment Performance

Gaining a better understanding of performance metrics is the best way to get a quick read of infrastructure health. In this session, Matt Williams, DevOps Evangelist for Datadog, will show how to create a baseline, then collect, aggregate, and use metrics from AWS and other systems to improve application performance. Various customers’ experiences with this process will be highlighted throughout the session. Join this session to learn how to simplify your use of metrics and apply that knowledge to increase performance. Sponsored by Datadog.

Published in: Technology

  1. USING METRICS TO MEASURE AND UNDERSTAND YOUR AWS ENVIRONMENT PERFORMANCE. AWS Global Summit 2015, 9 April 2015. Matt Williams, Evangelist, Datadog
  2. BEFORE I GET STARTED... Getting started is easy.
  3. BEFORE I GET STARTED... Getting started is easy. Getting good is not easy.
  4. BEFORE I GET STARTED... Getting started is easy. Getting good is not easy, and it will not be quick.
  5. WHY MEASURE? See where you came from.
  6. WHY MEASURE? See where you came from. See how you are doing.
  7. WHY MEASURE? See where you came from. See how you are doing. See how you can improve.
  8. WHY IS THIS COMPLICATED? Elasticity makes this complicated. Elasticity is the new normal.
  9. SCENARIOS THAT REQUIRE ELASTICITY: Amateur Sports, Shopping Events, Concerts, Marketing Campaigns. What is your scenario?
  10. HOW TO SOLVE THE COMPLEXITY: Tags are one solution, and they are what Datadog uses.
  11. HOW TAGS MAKE THIS EASIER
  12. TAGS ALLOW FOR AD-HOC AGGREGATION: Monitor all Docker containers running image web … in region us-west-2 across all availability zones … and make sure resident set size < 1GB on c3.xl
  13. TAGS ALLOW FOR AD-HOC AGGREGATION: Monitor all Docker containers running image web … in region us-west-2 across all availability zones … and make sure [ resident set size < 1GB ] on c3.xl
  14. TAGS ALLOW FOR AD-HOC AGGREGATION: Monitor all Docker containers running image web … in region us-west-2 across all availability zones … and make sure [ RSS > 1.5x AVG ] on c3.xl
  15. WHAT ABOUT CONTEXT?
  16. WHAT METRICS TO LOOK AT? There isn’t a single source of expertise. This is the hard part.
  17. THERE IS A LOT OF GUIDANCE
  18. YOU NEED TO FIND / HIRE / TRAIN AN EXPERT: Your context changes the meaning of metrics.
  19. 3 CATEGORIES OF METRICS: Utilization (percent/time), Saturation (wait queue length), Errors (error count)
  20. WHAT IS GOOD VS BAD? First learn the patterns.
  21. SPIKY
  22. STEADY
  23. COUNTER
  24. BURSTY
  25. BINARY
  26. CLASSIC SAWTOOTH
  27. CYCLIC
  28. STAIRY
  29. ARE THE ANOMALIES THE FOCUS…
  30. ARE THE ANOMALIES THE FOCUS, OR SHOULD THEY BE IGNORED?
  31. FIGURE OUT YOUR CYCLES
  32. A MONTH OF CLOUDTRAIL
  33. GET THE SCALES RIGHT
  34. COMBINE TIME SCALES TO FIND PATTERNS
  35. DIFFERENT INTERVALS
  36. COMBINE SCALES TO FIND PATTERNS
  37. SPECIFIC METRICS TO CONSIDER WITH AWS: It's impossible to say what the best metrics are in all cases. It depends on YOUR workload.
  38. AWS EC2: system.cpu.stolen, system.cpu.idle, system.load.norm.5 *, system.mem.pct_usable *, system.disk.pct_usable *, aws.ec2.network_in/out, aws.ec2.disk_read/write_ops
  39. AWS EC2 (https://www.datadoghq.com/2013/08/understanding-aws-stolen-cpu-and-how-it-affects-your-apps/)
  40. AWS EBS PROVISIONED IOPS: aws.ebs.volume_queue_length (should never go over 1/100) (https://www.datadoghq.com/2013/07/aws-ebs-provisioned-iops-getting-optimal-performance/)
  41. AWS ELASTIC LOAD BALANCER: healthy_host_count, latency, HTTPCode_ELB_5XX, surge_queue_length, spill_over_count, backend_connection_errors (https://www.datadoghq.com/2013/11/key-aws-elb-monitoring-metrics/)
  42. SPECIFIC SQS METRICS: number_of_messages_sent (NOM_sent), NOM_received, NOM_deleted (throughput); sent_message_size (cost); approximate_NOM_visible (backlog); approximate_NOM_not_visible (worked on) (https://www.datadoghq.com/2014/08/monitor-amazon-sqs-message-traffic-datadog/)
  43. HOW TO GET STARTED
  44. QUESTIONS
  45. THANK YOU. Matt Williams, Evangelist, Datadog. matt.williams@datadoghq.com, http://datadoghq.com, @technovangelist (http://twitter.com/technovangelist)
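
The ad-hoc aggregation described in the "TAGS ALLOW FOR AD-HOC AGGREGATION" slides can be illustrated with a small sketch. This is not Datadog's API or implementation: the sample data, the `container`, `select`, `over_limit`, and `rss_outliers` helpers, and the tag names are all hypothetical, chosen only to show how tags let one filter hosts first and apply a check second.

```python
from statistics import mean


def container(name, rss_mb, image, az, instance_type="c3.xl"):
    """Build one hypothetical sample entry; the region is derived from the AZ name."""
    tags = {"image": image, "region": az[:-1], "az": az, "instance_type": instance_type}
    return {"name": name, "rss_mb": rss_mb, "tags": tags}


# Made-up sample data (not from the talk): resident set size in MB per container.
containers = [
    container("web-1", 400, "web", "us-west-2a"),
    container("web-2", 420, "web", "us-west-2b"),
    container("web-3", 1300, "web", "us-west-2c"),
    container("web-4", 500, "web", "us-east-1a"),
    container("db-1", 3000, "db", "us-west-2a"),
]


def select(series, **wanted):
    """Keep the entries whose tags match every key/value pair in `wanted`."""
    return [s for s in series if all(s["tags"].get(k) == v for k, v in wanted.items())]


def over_limit(series, limit_mb):
    """Fixed threshold: names of entries whose RSS exceeds `limit_mb`."""
    return [s["name"] for s in series if s["rss_mb"] > limit_mb]


def rss_outliers(series, factor=1.5):
    """Relative threshold: names of entries whose RSS exceeds `factor` x the group average."""
    if not series:
        return []
    avg = mean(s["rss_mb"] for s in series)
    return [s["name"] for s in series if s["rss_mb"] > factor * avg]


# "image web, region us-west-2, all availability zones, on c3.xl"
group = select(containers, image="web", region="us-west-2", instance_type="c3.xl")
print(over_limit(group, 1024))   # fixed check: RSS < 1GB
print(rss_outliers(group))       # relative check: RSS > 1.5x AVG
```

Because the availability zone is simply left out of the filter, the same query covers every zone in the region without listing hosts by name, which is what makes this style of aggregation robust to elastic scaling.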
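
The "GET THE SCALES RIGHT" and "COMBINE TIME SCALES TO FIND PATTERNS" slides carry only titles in this transcript, so the following is one possible reading of them rather than the speaker's method: the same series is averaged at two different intervals, and the views are compared. The `rollup` helper and the sample points are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, interval_s):
    """Average (timestamp_seconds, value) points into buckets of `interval_s` seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % interval_s].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up series: one point every 15 minutes for two hours,
# with a spike in the last quarter of each hour.
points = [
    (0, 10), (900, 10), (1800, 10), (2700, 90),
    (3600, 10), (4500, 10), (5400, 10), (6300, 90),
]

hourly = rollup(points, 3600)    # coarse view: both hours look identical and flat
quarterly = rollup(points, 900)  # fine view: the recurring spike is visible

print(hourly)
print(quarterly)
```

In this sample the hourly averages hide the spike entirely, while the 15-minute view shows it recurring at the same offset in each hour. Looking at both scales together is what reveals the cycle.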
