
Straggler Free Data Processing in Cloud Dataflow


Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2p9iu6d.

Eugene Kirpichov describes the theory and practice behind Cloud Dataflow's approach to straggler elimination, as well as the associated non-obvious challenges, benefits, and implications of the technique. Filmed at qconlondon.com.

Eugene Kirpichov is a Senior Software Engineer on the Cloud Dataflow team at Google, working primarily on the autoscaling and work rebalancing secret sauce as well as the Apache Beam programming model. He is also very interested in functional programming languages, data visualization (especially visualization of behavior of parallel/distributed systems), and machine learning.



  1. Eugene Kirpichov, Senior Software Engineer. No Shard Left Behind: Straggler-free data processing in Cloud Dataflow
  2. InfoQ.com: News & Community Site. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/cloud-dataflow-processing • Over 1,000,000 software developers, architects and CTOs read the site worldwide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture, and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week
  3. Presented at QCon London (www.qconlondon.com). Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation. Strategy: a practitioner-driven conference designed for YOU, the influencers of change and innovation in your teams; speakers and topics driving the evolution and innovation; connecting and catalyzing the influencers and innovators. Highlights: attended by more than 12,000 delegates since 2007; held in 9 cities worldwide.
  4. Google Cloud Platform. (Gantt chart: workers vs. time.)
  5. (Figure only.)
  6. Plan: 01 Intro (setting the stage); 02 Stragglers (where they come from and how people fight them); 03 Dynamic rebalancing (1: how it works, 2: why it is hard); 04 Autoscaling (why dynamic rebalancing really matters); 05 If you remember two things (philosophy of everything above).
  7. 01 Intro: Setting the stage
  8. Google's data processing timeline, 2002-2016: GFS, MapReduce, Big Table, Dremel, Pregel, FlumeJava, Colossus, Spanner, MillWheel, Dataflow, Apache Beam.
  9. WordCount: Pipeline p = Pipeline.create(options); p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*")) .apply(FlatMapElements.via(word -> Arrays.asList(word.split("[^a-zA-Z']+")))) .apply(Filter.byPredicate(word -> !word.isEmpty())) .apply(Count.perElement()) .apply(MapElements.via(count -> count.getKey() + ": " + count.getValue())) .apply(TextIO.Write.to("gs://.../...")); p.run();
  10. MapReduce = ParDo + GroupByKey + ParDo. ParDo applies a DoFn: A -> [B]; GroupByKey (GBK) turns (K, V) pairs into (K, [V]).
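The decomposition on this slide can be sketched in plain Java collections. This is a minimal illustration only; the class and method names below are made up for the sketch and are not Beam's API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch: MapReduce = ParDo (map) + GroupByKey + ParDo (reduce).
class MapReduceSketch {
    // ParDo: apply a DoFn A -> [B] to every element.
    static <A, B> List<B> parDo(List<A> input, Function<A, List<B>> doFn) {
        List<B> out = new ArrayList<>();
        for (A a : input) out.addAll(doFn.apply(a));
        return out;
    }

    // GroupByKey: (K, V) pairs -> (K, [V]).
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> grouped = new HashMap<>();
        for (Map.Entry<K, V> e : pairs)
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        return grouped;
    }

    // WordCount expressed as ParDo + GroupByKey + ParDo, mirroring slide 9.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> ones = parDo(lines, line -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String w : line.split("[^a-zA-Z']+"))
                if (!w.isEmpty()) out.add(Map.entry(w, 1));
            return out;
        });
        Map<String, Integer> counts = new HashMap<>();
        groupByKey(ones).forEach((k, vs) -> counts.put(k, vs.size()));
        return counts;
    }
}
```

The same split/filter/count structure as the slide's pipeline, minus distribution: in Dataflow each `parDo` call would run sharded across workers.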
  11. Running a ParDo: shard 1, shard 2, ..., shard N, each shard processed by a DoFn.
  12. Gantt charts: workers vs. time, one bar per shard.
  13. Large WordCount: read files, GroupByKey, write files. 400 workers, 20 minutes.
  14. 02 Stragglers: Where they come from, and how people fight them
  15. Stragglers. (Gantt chart: workers vs. time; a few shards run much longer than the rest.)
  16. Amdahl's law: it gets worse at scale. Higher scale ⇒ more bottlenecked by serial parts. (Plot: speedup vs. #workers for varying serial fraction.)
  17. Where do stragglers come from? Uneven partitioning: e.g. process a dictionary in parallel by first letter; 1/6 of words start with 't' ⇒ less than 6x speedup. Uneven complexity: e.g. join Foos with Bars, in parallel by Foos; some Foos have far more Bars than others. Uneven resources: bad machines, bad network, resource contention. Noise: spuriously slow external RPCs, bugs.
  18. What would you do? Uneven partitioning: oversplit (weak). Uneven complexity: hand-tune, use data statistics (predictive ⇒ unreliable). Uneven resources, noise: backups, restarts.
  19. These kinda work. But not really. Manual tuning is a Sisyphean task: time-consuming, uninformed, and obsoleted by data drift ⇒ almost always tuned wrong. Statistics are often missing or wrong, don't exist for intermediate data, and size != complexity. Backups/restarts only address slow workers.
  20. Upfront heuristics don't work: they will predict wrong, and the higher the scale, the more likely.
  21. High scale triggers worst-case behavior. Corollary: if you're bottlenecked by worst-case behavior, you won't scale.
  22. 03.1 Dynamic rebalancing: How it works
  23. Detect and fight stragglers. (Gantt chart: workers vs. time.)
  24. What is a straggler, really? A shard slower than perfectly parallel: t_end > sum(t_end) / N. (Gantt chart: workers vs. time.)
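The slide's criterion, a shard is a straggler when its end time exceeds the average end time, is a one-liner to check. A hedged sketch; the names are illustrative, not Dataflow's internals:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "slower than perfectly parallel" test:
// shard i is a straggler iff t_end[i] > sum(t_end) / N.
class Stragglers {
    static List<Integer> findStragglers(double[] endTimes) {
        double sum = 0;
        for (double t : endTimes) sum += t;
        double avg = sum / endTimes.length;
        List<Integer> stragglers = new ArrayList<>();
        for (int i = 0; i < endTimes.length; i++)
            if (endTimes[i] > avg) stragglers.add(i); // finishes after the average
        return stragglers;
    }
}
```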
  25. Split stragglers and return the residuals into the pool of work. Example: foo.txt [100, 200), read up to position 130, splits at 170; the worker keeps running [100, 170), and the residual [170, 200) is scheduled (cheap, atomic). (Gantt chart: workers vs. time, with "now" and average completion time marked.)
  26. Rinse, repeat ("liquid sharding").
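A tiny sketch of one liquid-sharding step, splitting a running shard's unread remainder and returning the residual to the pool. The split policy here (halving the remaining work) and all names are assumptions for illustration; the real system chooses the split point from progress estimates:

```java
// Hypothetical illustration of "split stragglers, return residuals".
class Range {
    final long start, end; // half-open [start, end)
    Range(long start, long end) { this.start = start; this.end = end; }
}

class Splitter {
    // Split the shard given the reader is currently at position `pos`:
    // the primary keeps running; the residual goes back to the work pool.
    static Range[] splitResidual(Range shard, long pos) {
        long splitPoint = pos + (shard.end - pos) / 2; // halve remaining work
        Range primary  = new Range(shard.start, splitPoint); // keeps running
        Range residual = new Range(splitPoint, shard.end);   // rescheduled
        return new Range[] { primary, residual };
    }
}
```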
  27. (Benchmark: 1 ParDo, skewed, 24 workers: 50%. ParDo/GBK/ParDo, uniform, 400 workers: 25%.)
  28. Get out of trouble > avoid trouble. Adaptive > Predictive.
  29. 03.2 Dynamic rebalancing: Why is it hard?
  30. And that's it? What's so hard? Semantics: what can be split (not just files), data consistency, APIs, wait-free splitting, perfect granularity. Quality: non-uniform density, stuckness, "dark matter", making predictions. Testing: consistency, debugging, measuring quality, being sure it works.
  31. What is splitting? foo.txt [100, 200), split at 170 ⇒ foo.txt [100, 170) + foo.txt [170, 200).
  32. What is splitting: Associativity. [A, B) + [B, C) = [A, C).
  33. What is splitting: Rounding up. [A, B) = records starting in [A, B). Random access ⇒ can split without scanning the data!
  34. What is splitting: Rounding up, by example. apple beet fig grape kiwi lime pear rose squash vanilla, split into [a, h), [h, s), [s, $).
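The rounding-up rule above can be shown in a few lines: a record belongs to the subrange its starting position falls in, so [a, h) + [h, s) + [s, $) covers every record exactly once, consistent with the associativity slide. A sketch with hypothetical names:

```java
// Illustrative "rounding up": a key is owned by the subrange whose
// half-open interval contains the key's starting position (first letter).
class RangeAssignment {
    // splitPoints[i] is the inclusive start of subrange i;
    // subrange i ends where subrange i+1 begins (the last one is unbounded).
    static int subrangeOf(String key, char[] splitPoints) {
        int owner = 0;
        for (int i = 0; i < splitPoints.length; i++)
            if (key.charAt(0) >= splitPoints[i]) owner = i;
        return owner;
    }
}
```

With split points {a, h, s} as on the slide, "grape" lands in [a, h), "kiwi" in [h, s), and "squash" in [s, $), and no key is counted twice.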
  35. What is splitting: Blocks. [A, B) = records in blocks starting in [A, B).
  36. What is splitting: Readers. foo.txt [100, 200), split at 170 ⇒ foo.txt [100, 170) + foo.txt [170, 200), via a "Reader". Re-reading consistency: continuing until EOF must equal re-reading the shard.
  37. Dynamic splitting: readers. X = last record read must be exact and increasing; splitting is ok only in the not-yet-read part (e.g. you can't split an arbitrary SQL query).
  38. Summary: [A, B) = blocks of records starting in [A, B); [A, B) + [B, C) = [A, C); random access ⇒ no scanning needed to split; reading is repeatable, ordered by position, with exact positions.
  39. Concurrency when splitting: read, process, read, process... "Should I split?" While we wait for an answer, 1000s of workers idle, and per-element processing in O(hours) is common!
  40. Concurrency when splitting: split wait-free (but race-free), while processing/reading. See code: RangeTracker.
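A compact sketch in the spirit of the RangeTracker the slide points to: the reader claims records while a splitter concurrently tries to move the end of the range, and neither ever waits for a record to finish processing, only for a brief position update. This is a simplification with assumed names; the real Apache Beam/Dataflow RangeTracker interface is richer:

```java
// Simplified range tracker: split and claim race safely against each other.
class RangeTracker {
    private final long start;
    private long stop;              // current (possibly already split) end
    private long lastClaimed = -1;  // position of the last record claimed

    RangeTracker(long start, long stop) { this.start = start; this.stop = stop; }

    // Reader thread: claim the record at `pos`; false means "past the
    // (possibly moved) end of the range, stop reading".
    synchronized boolean tryClaimRecordAt(long pos) {
        if (pos >= stop) return false;
        lastClaimed = pos;
        return true;
    }

    // Splitter thread: shrink the range to [start, splitPos). Fails if the
    // reader already passed splitPos, or if it is outside the range. The
    // caller returns the residual [splitPos, oldStop) to the work pool.
    synchronized boolean trySplitAtPosition(long splitPos) {
        if (splitPos <= lastClaimed || splitPos >= stop) return false;
        stop = splitPos;
        return true;
    }

    synchronized long getStop() { return stop; }
}
```

Crucially, the lock is held only across the position check, never across `Process`, so a split request during an hours-long element still returns immediately.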
  41. Perfectly granular splitting. "Few records, heavy processing" is common ⇒ perfect parallelism required.
  42. Separation: ParDo { record -> sleep(∞) } parallelized perfectly (requires wait-free + perfectly granular splitting).
  43. Separation is a qualitative improvement. /path/to/foo*.txt ⇒ ParDo: expand glob ⇒ foo5.txt, foo42.txt, foo8.txt, foo100.txt, foo91.txt, foo26.txt, foo87.txt, foo56.txt ⇒ ParDo: read records. Perfectly parallel over files, perfectly parallel over records, infinite scalability (no "shard per file"). See also: Splittable DoFn, http://s.apache.org/splittable-do-fn
  44. "Practical" solutions improve performance. "No compromise" solutions reduce the dimension of the problem space.
  45. (Figure only.)
  46. Making predictions: easy, right? Position 130 in [100, 200) ⇒ (130 - 100) / (200 - 100) = 0.3, i.e. ~30% complete; split at 70%: 100 + 0.7 * (200 - 100) = 170. Lexicographic: at key "kiwi", k in [a, z) ⇒ ~50% complete; split at 70% ⇒ ~t.
  47. (Figures: progress vs. time curves of several shapes.) Easy; usually too good to be true.
  48. Accurate predictions are the wrong goal, and infeasible. Even when estimates are wildly off, the system should still work. Optimize for emergent behavior (separation). Better goal: detect stuckness.
  49. Dark matter: heavy work that you don't know exists until you hit it. Goal: discover and distribute dark matter as quickly as possible.
  50. 04 Autoscaling: Why dynamic rebalancing really matters
  51. How much work will there be? A lot of work ⇒ a lot of workers. But you can't predict data size, complexity, etc. What should you do? Adaptive > Predictive: keep re-estimating total work; scale up/down. (Image credit: Wikipedia)
  52. Start off with 3 workers; things are looking okay (~10m of work). Re-estimation ⇒ orders of magnitude more work (~3 days): need 100 workers! But 100 workers are useless without 100 pieces of work: 92 workers idle.
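The re-estimation idea on these slides can be sketched as arithmetic: keep projecting total work from progress so far, then size the pool from the projection. The names and the target-time sizing policy below are assumptions for illustration, not Dataflow's actual autoscaler:

```java
// Hypothetical adaptive work re-estimation, as on the autoscaling slides.
class AutoscaleEstimator {
    // Projected total work, extrapolated from work done so far and the
    // current estimate of the fraction complete.
    static double projectedTotalWork(double workDone, double fractionComplete) {
        return workDone / fractionComplete;
    }

    // Workers needed to finish the projected remaining work within
    // targetTime, assuming one unit of work per worker per unit time.
    static long desiredWorkers(double workDone, double fractionComplete, double targetTime) {
        double remaining = projectedTotalWork(workDone, fractionComplete) - workDone;
        return (long) Math.ceil(remaining / targetTime);
    }
}
```

The point of the slide still applies: computing that you need 100 workers is useless unless dynamic rebalancing can also split the work into 100 pieces.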
  53. Autoscaling + dynamic rebalancing: now scaling up is no big deal! Add workers; the work distributes itself. A job smoothly scales 3 → 1000 workers. (Chart: waves of splitting; upscaling & VM startup.)
  54. 05 If you remember two things: Philosophy of everything above
  55. If you remember two things: Adaptive > Predictive (fighting stragglers > preventing stragglers; emergent behavior > local precision), and "no compromise" solutions matter (reducing dimension > incremental improvement; "corner cases" are clues that you're still compromising). (Concept map: wait-free + perfectly granular ⇒ separation; heavy records, reading-as-ParDo, rebalancing, autoscaling, reusability.)
  56. Thank you. Q&A
  57. References: Apache Beam; "No shard left behind: Dynamic work rebalancing in Cloud Dataflow"; "Comparing Cloud Dataflow Autoscaling to Spark and Hadoop"; Splittable DoFn; documentation on Dataflow/Beam source APIs.
  58. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/cloud-dataflow-processing
