Top Performance Problems in Distributed Architectures

Slides from https://devexperience.ro/speakers/andreas-grabner

  1. Top Performance Challenges in Distributed Architectures
  2. Example #1: Building Monitoring for AWS. The AWS CloudWatch API costs $0.01 per 1,000 calls: Single Fetch = 907 calls, 41 sec, 97 threads ($$$$); Bulk Fetch = 104 calls, 21 sec, 92 threads ($$).
  3. Lesson Learned: When moving to a more distributed architecture … https://www.dynatrace.com/news/blog/monitoring-aws-fargate-with-dynatrace-testing-it-in-the-field/
  4. … you also grow your dependencies … https://www.dynatrace.com/news/blog/enterprise-cloud-ecs-microservices-and-dynatrace-at-neiman-marcus/
  5. … and the potential impact of a failure grows! 1 bad update, 4 impacted services, because of all the dependencies. https://www.dynatrace.com/news/blog/enterprise-cloud-ecs-microservices-and-dynatrace-at-neiman-marcus/
  6. Common anti-patterns we have seen: 1. N+1 Call, 2. N+1 Query, 3. Payload Flood, 4. Granularity, 5. Tight Coupling, 6. Inefficient Service Flow, 7. Timeouts, Retries & Backoff, 8. Dependencies
  7. N+1 Call Pattern
  8. N+1 Call Pattern, monolithic code. "Works" well within a single process:

         public double getQuote(String type) {
             double quote = 0;
             for (Product product : products) {
                 quote += product.getValue();
             }
             return quote;
         }
  9. N+1 Call Pattern across service boundaries: 1 call to the Quote Service fans out into 44 calls to the Product Service (14 + 17 + 13 across instances).
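
A minimal Java sketch of the fix, assuming a hypothetical ProductClient for the extracted Product Service (the interface and method names are illustrative, not from the slides): replace the per-product remote call with a single bulk fetch.

    import java.util.List;

    // Hypothetical remote client for the extracted Product Service.
    interface ProductClient {
        Product getProduct(String id);                // N+1 variant: one network call per id
        List<Product> getProducts(List<String> ids);  // bulk variant: one call for all ids
    }

    record Product(String id, double value) {}

    class QuoteService {
        private final ProductClient products;

        QuoteService(ProductClient products) { this.products = products; }

        // Anti-pattern: the monolithic loop, left unchanged, now issues N remote calls.
        double getQuoteNPlusOne(List<String> productIds) {
            double quote = 0;
            for (String id : productIds) {
                quote += products.getProduct(id).value();
            }
            return quote;
        }

        // Fix: one bulk call replaces N calls (44 collapse into 1 in the slide's example).
        double getQuoteBulk(List<String> productIds) {
            return products.getProducts(productIds).stream()
                           .mapToDouble(Product::value)
                           .sum();
        }
    }

The loop that was harmless inside one process turns into 44 network round trips once the Product Service is extracted; a bulk endpoint moves the iteration back to one side of the network boundary.
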
  10. https://aws.amazon.com/solutions/case-studies/landbay/
  11. [Service flow diagram: one incoming request fans out across services X, Y, and Z into many repeated downstream calls; subtotal: 243 calls.]
  12. N+1 Query Pattern: 1 call to the Quote Service results in 87 calls from the Product Service to the Product DB.
  13. Cascading N+1 Query Pattern: one transaction triggers ~26k database calls (809 + 3,956 + 4,347 + 4,773 + 3,789 + 3,915 + 4,999 across services).
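
A hedged JDBC sketch of the query-side fix, assuming a hypothetical product table with id and value columns: one IN-list query (or a JOIN) replaces the per-row SELECT.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;
    import java.util.StringJoiner;

    class ProductRepository {
        private final Connection conn;

        ProductRepository(Connection conn) { this.conn = conn; }

        // Anti-pattern: one database round trip per product id.
        double sumValuesNPlusOne(List<String> ids) throws SQLException {
            double sum = 0;
            for (String id : ids) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT value FROM product WHERE id = ?")) {
                    ps.setString(1, id);
                    try (ResultSet rs = ps.executeQuery()) {
                        if (rs.next()) sum += rs.getDouble(1);
                    }
                }
            }
            return sum;
        }

        // Fix: a single query with an IN list; the DB does the aggregation too.
        double sumValuesInOneQuery(List<String> ids) throws SQLException {
            StringJoiner in = new StringJoiner(",", "(", ")");
            ids.forEach(id -> in.add("?"));
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT SUM(value) FROM product WHERE id IN " + in)) {
                for (int i = 0; i < ids.size(); i++) ps.setString(i + 1, ids.get(i));
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getDouble(1) : 0.0;
                }
            }
        }
    }
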
  14. Payload Flood
  15. Payload Flood: doc creation, sequential across services.
  16. Payload Flood in numbers: the full document (18MB, 20MB, 21MB) is sent between services.
  17. Refactor: only send relevant data to specialized services (69MB vs. 31.6MB).
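
A small Java sketch of the refactoring idea, with hypothetical type and field names: each specialized service receives a slim request object with only the data it needs, instead of the full multi-megabyte document. (Whether a signer can work from a content hash depends on the signing scheme; it is assumed here purely for illustration.)

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Hypothetical full document; content is the multi-MB payload.
    record Document(String id, byte[] content, String signerName, String shippingAddress) {}

    // Slim request types carrying only what each downstream service needs.
    record SigningRequest(String docId, byte[] contentHash, String signerName) {}
    record ShipmentRequest(String docId, String shippingAddress) {}

    class PayloadTrimmer {
        // Send the signer a SHA-256 hash instead of the ~20MB body.
        static SigningRequest forSigner(Document doc) throws NoSuchAlgorithmException {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(doc.content());
            return new SigningRequest(doc.id(), hash, doc.signerName());
        }

        // Shipment needs an address, not the document body.
        static ShipmentRequest forShipment(Document doc) {
            return new ShipmentRequest(doc.id(), doc.shippingAddress());
        }
    }
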
  18. Granularity
  19. Granularity: encryption carved out into a separate service. [Service flow diagram across Documents, Doc Processor, Doc Transformer, Doc Signer, Doc Encryption, and Doc Shipment; per-service call counts range from 1 up to 316.]
  20. Tight Coupling
  21. When “Breaking the Monolith”, be aware: a 1:1 call ratio between two components means they are tightly coupled. Shall we really distribute/extract them? https://www.dynatrace.com/news/blog/breaking-up-the-monolith-while-migrating-to-the-cloud-in-6-steps/
  22. Inefficient Service Flow: drawing parallels to Web Performance Optimization (WPO)
  23. SFPO (Service Flow & Performance Optimization) has to teach us how to optimize (micro)service dependencies through service flows.
  24. Especially useful to identify: inefficient 3rd-party services, recursive call chains, N+1 query patterns, loading too much data, no data caching, … All of this sounds very familiar from WPO.
  25. Classical cascading effect of recursive service calls!
  26. Timeouts, Retries & Backoff (credits go to Adrian Hornsby, @adhorn)
  27. Bad timeout & retry settings: the app's client-side timeout is 10s while the backend's is left at its default (e.g. 60s). Each client-side retry issues another INSERT while the earlier ones still hold their connections, so the pool runs dry: "ERROR: Failed to get connection from pool". From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=31
  28. Backoff between retries: client-side timeout = 10s, backend timeout = 10s minus time elapsed; the client waits 2s, 4s, 8s, then 16s before successive retries (exponential backoff). From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=33
  29. Simple exponential backoff is not enough: add jitter. [Diagram comparing retry timing with no jitter vs. with jitter.] From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=34
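
A minimal Java sketch of exponential backoff with "full jitter" (sleep a uniform random time up to the exponential cap), one common way to implement what these three slides describe; the helper and its parameters are our own illustration.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ThreadLocalRandom;

    class RetryWithBackoff {
        // Retries op with exponentially growing, jittered pauses:
        // sleep a random time in [0, min(cap, base * 2^attempt)].
        static <T> T call(Callable<T> op, int maxAttempts,
                          long baseMillis, long capMillis) throws Exception {
            for (int attempt = 0; ; attempt++) {
                try {
                    return op.call();
                } catch (Exception e) {
                    if (attempt + 1 >= maxAttempts) throw e; // retry budget exhausted
                    long backoff = Math.min(capMillis, baseMillis << attempt);
                    // Full jitter: randomizing the pause spreads retries from many
                    // clients over time instead of synchronizing them into waves.
                    Thread.sleep(ThreadLocalRandom.current().nextLong(backoff + 1));
                }
            }
        }
    }

Keep the whole retry budget below the caller's own timeout; otherwise the retries recreate the pool-exhaustion scenario from slide 27.
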
  30. Dependencies
  31. Look beyond the “tip of the iceberg”: understanding dependencies is critical!
  32. Who is depending on me? What is the risk of change?
  33. Recap: common anti-patterns and the metrics to look at:
      1. N+1 Call: # of identical service invocations per request
      2. N+1 Query: # of identical SQL invocations per request
      3. Payload Flood: transfer size!
      4. Granularity: # of service invocations across the end-to-end transaction
      5. Tight Coupling: ratio between service invocations
      6. Inefficient Service Flow: # of involved services, # of calls to each service
      7. Timeouts, Retries, Backoff: pool utilization, …
      8. Dependencies: # of incoming & outgoing dependencies
  34. Automated deployment validation: https://github.com/keptn/pitometer, https://github.com/keptn, www.keptn.sh
  35. Automate architectural checks into CI/CD/CO! Identify and optimize architectural patterns from build to build (Build #17 vs. Build #21): recursive calls, N+1 call pattern, chatty interfaces, no caching layer, …
  36. Automate performance checks into CI/CD/CO! How do performance and resource consumption per service endpoint compare between builds (Build #17 vs. Build #21)?
  37. From Google: “Everything as Code”, e.g. enforce architectural rules.
  38. From Dynatrace: “Performance Signature as Code”, evaluated through Jenkins: a “performance signature” is computed for every build (e.g. the Nov 16 and Nov 17 builds), comparing multiple metrics to the previous timeframe with simple regression detection per metric. https://www.neotys.com/performance-advisory-council/thomas_steinmaurer
  39. Pitometer (part of @keptnProject): metrics-based grading of a deployment. A Pitometer specfile defines, per objective, the metric source & query, the grading details, and the metric score:
      • Allocated Bytes (from Prometheus): > 2GB = 0 points, < 2GB = 20 points. Value 3GB, score 0.
      • Conversion Rate (from Dynatrace): < 2% = 0 points, < 5% = 10 points, > 5% = 20 points. Value 3.9%, score 10.
      Total score: 10.
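
Pitometer itself is a Node.js project, but the grading idea on this slide is easy to sketch. The following Java sketch mirrors the slide's two objectives and is our own illustration of the concept, not Pitometer's actual API:

    import java.util.List;

    // One scoring rule: award points when the metric value passes the limit.
    record Threshold(double limit, boolean upperBound, int points) {}

    // One objective: a metric value graded against ordered thresholds.
    record Objective(String metric, double value, List<Threshold> thresholds) {
        int score() {
            for (Threshold t : thresholds) {
                boolean match = t.upperBound() ? value < t.limit() : value > t.limit();
                if (match) return t.points();  // first matching rule wins
            }
            return 0;
        }
    }

    class Grading {
        public static void main(String[] args) {
            // Allocated bytes (e.g. from Prometheus): 3GB is not < 2GB, so 0 points.
            Objective memory = new Objective("allocatedGB", 3.0,
                    List.of(new Threshold(2.0, true, 20)));
            // Conversion rate (e.g. from Dynatrace): 3.9% is > 2% but not > 5%, so 10 points.
            Objective conversion = new Objective("conversionRate", 3.9,
                    List.of(new Threshold(5.0, false, 20),
                            new Threshold(2.0, false, 10)));
            System.out.println("Total score: " + (memory.score() + conversion.score())); // 10
        }
    }
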
  40. Pitometer: run standalone, https://github.com/keptn/pitometer. [Flow: init, register metric sources and graders, run, collect the result.]
  41. Pitometer in keptn's Autonomous Cloud Control Plane: in the dev stage, 1: push, 2: deploy (shadow), 3: test (performance), 4: evaluate (scalability KPIs), 5: promote; then in the prod stage, 6: deploy (blue/green), 7: evaluate (business KPIs), 8: operate (NoOps).
  42. Resources
      • Keptn & Pitometer: www.keptn.sh, github.com/keptn, github.com/keptn/pitometer
      • Performance, Resiliency & Availability content:
        • Adrian Hornsby (AWS): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478
        • Acacio Cruz (Google): https://www.spreaker.com/user/pureperformance/066-load-shedding-sre-at-google-with-aca
        • Thomas Steinmaurer (Dynatrace): https://www.neotys.com/performance-advisory-council/thomas_steinmaurer
  43. Top Performance Challenges in Distributed Architectures
