Spark your legacy - Distributing an 8-year Monolith

1,339 views

Spark users understand the potential of Spark for heavy-weight distributed processing. But how does one migrate an 8-year-old, single-server, MySQL-based legacy system to such a shiny new framework? How do you accurately preserve the behavior of a system that consumes gigabytes of data every day, hides numerous undocumented implicit gotchas, and changes constantly, while shifting to brand new development paradigms? In this talk I'll present Kenshoo's attempt at this challenge, where we migrated a legacy aggregation system to Spark. Our solutions include heavy use of metrics and Graphite for analyzing production data; a "local-mode" client enabling reuse of legacy test suites; data validation using side-by-side execution; and maximum reuse of code through refactoring and composition. Some of these solutions use Spark-specific characteristics and features.

Published in: Software

  1. Spark Your Legacy: Distributing an 8-year Monolith. Tzach Zohar, Kenshoo, May 2015. © 2015 Kenshoo, Ltd. Proprietary Information
  2. Who? Tzach Zohar, Architect @ Kenshoo, tzach.zohar@kenshoo.com, http://il.linkedin.com/in/tzachzohar
  3. Where?
       ● Online advertising technology
       ● 9-year-old startup
       ● ~500 employees
       ● Data-intensive (aren’t we all?)
  4. Agenda
       ● Project Background
       ● Why not to Greenfield
       ● Refactoring Challenges
       ● Solutions
  5. Project Background
  6. Domain: Data Aggregation
       ● Of: advertising metrics
       ● On: versatile, batched, occasionally re-stated input
       ● By: many different keys
       ● When: now + ~0.5 hour
       ● While: filtering and normalising per business rules
       ● For: eternity (data lives forever)
  7. Domain: Data Aggregation [diagram labels: Sources (Slow, Fast, Custom, Re-stated); Normalize; Aggregate (By X, By Y, By X + Y, ...); Observations]
  8. Domain: Data Aggregation [diagram labels: Sources (Slow, Fast, Custom, Re-stated); Normalize; Aggregate (By X, By Y, By X + Y, ...); Observations; Aggregate]
  9. Requirement: Better, Faster
       ● Higher throughput: business is growing
       ● More keys: and ad-hoc aggregations
       ● Linear scalability: anything else is not cost-effective
       ● Easy to enhance: by any decent developer
  10. Chosen Design: Spark [diagram labels: sources; Normalize; Driver; HDFS + Spark Cluster]
  11. Chosen Design: Spark [diagram labels: sources; Normalize; Driver; HDFS + Spark Cluster; Landing Zone]
  12. Chosen Design: Spark [diagram labels: sources; Normalize; Driver; HDFS + Spark Cluster; Landing Zone; Spark Jobs (By X, By Y, By X+Y, ...)]
  13. Great, but how do we get there? A: Legacy System. B: New Shiny System. Refactoring? “Greenfield” project? ???
  14. Why Not to “Greenfield”
  15. Challenge: Moving Target [timeline labels: Q1, Q2, Q3; Legacy]
  16. Challenge: Moving Target [timeline labels: Q1, Q2, Q3; Legacy; New System]
  17. Challenge: Moving Target [timeline labels: Q1, Q2, Q3; Legacy; Legacy’; New System]
  18. Challenge: Zero Diff Tolerance
       ● Different clients have different data, different customizations, different scales
       ● Our data is often validated against external sources
  19. Challenge: Code Is Our Only Spec. But it isn’t necessarily a friendly one...
  20. Challenge: Code Is Our Only Spec. What exactly should the new system do?
  21. Challenge: Test Reuse? Tests assume a single-server setup...
  22. Challenge: Test Reuse? Some are coupled with the current implementation...
  23. Refactoring Challenges
  24. Challenge: Legacy Code. Some of it is still untested
  25. Challenge: Tight Coupling. Implementation is tightly coupled with many other components [architecture diagram labels: Kenshoo Server; Search Engines; SE API Facade; Web User Interface; Proxy Servers; Client's Website; Client Users; Client Systems / DWH; Entity Mgmt / DAO; Normalizers; Optimization Algorithms; DataProviders / ScoreSQL Builder; Client Configuration; SEM Entity Data; Performance Data; Campaign Generation Tools (RTC, KW Tool); Report Generation; Bulk Editing and Advanced Features; Conf. DAO; Kenshoo Editor; FTP Sites; Tracking Processor; Aggregator; callout: HELP ME!]
  26. Challenge: Paradigm Shift. How do you gradually refactor a single-node Java application into a distributed Spark application?
  27. Solutions
  28. Solution #1: Shared Code [diagram labels: Legacy System; New System]
  29. Solution #1: Shared Code [diagram labels: Legacy System; New System; Core Business Rules]
       1. Refactor legacy code to create a stand-alone jar
  30. Solution #1: Shared Code [diagram labels: Legacy System; New System; Core Business Rules (in both)]
       1. Refactor legacy code to create a stand-alone jar
       2. Build the new system around this core code
  31. Solution #1: Shared Code. Business rules refactored into Java static methods, to avoid serialization issues in Spark
  32. Solution #2: Empiric Reverse Engineering
  33. Solution #2: Empiric Reverse Engineering
  34. Solution #2: Empiric Reverse Engineering
  35. Solution #3: Local Mode Testing [diagram labels: Legacy System; New Aggregation System; Spark]
  36. Solution #3: Local Mode Testing [diagram labels: Legacy System; New Aggregation System; Spark]
       1. Embed Spark in Aggregation System
  37. Solution #3: Local Mode Testing [diagram labels: Legacy System; New Aggregation System; Spark]
       1. Embed Spark in Aggregation System
       2. Embed Aggregation System in Legacy
  38. Solution #4: Side-by-Side. Both at the component level and at the system level
  39. Solution #4: Side-by-Side
  40. Solution #4: Side-by-Side
  41. Solution #4: Side-by-Side
  42. Questions?
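
Slide 31 says the business rules were refactored into Java static methods to avoid serialization issues in Spark. The deck does not show the code, so the following is only a minimal sketch of the likely mechanism, using the JDK alone and no Spark. By default Spark ships functions to executors via Java serialization, and a lambda that calls an instance method captures its enclosing object. All names here (`StaticRulesSketch`, `SerFn`, `LegacyNormalizer`, `BusinessRules`) are hypothetical and not taken from Kenshoo's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class StaticRulesSketch {

    /** Serializable function type; stands in for the function interfaces a cluster framework serializes. */
    interface SerFn<A, B> extends Serializable {
        B apply(A a);
    }

    /** Hypothetical legacy-style component: NOT Serializable, rule lives in an instance method. */
    static class LegacyNormalizer {
        private final double rate = 0.5;

        double normalize(double cost) {
            return cost * rate;
        }

        SerFn<Double, Double> asFunction() {
            // Calling an instance method makes the lambda capture `this`.
            return cost -> normalize(cost);
        }
    }

    /** The same rule as a static method: inputs are passed explicitly, no object is captured. */
    static final class BusinessRules {
        private BusinessRules() {}

        static double normalize(double cost, double rate) {
            return cost * rate;
        }
    }

    /** Builds a function around the static rule; only the primitive `rate` is captured. */
    static SerFn<Double, Double> staticRule(double rate) {
        return cost -> BusinessRules.normalize(cost, rate);
    }

    /** Attempts Java serialization, as a framework would before shipping a function to a worker. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SerFn<Double, Double> instanceBound = new LegacyNormalizer().asFunction();
        SerFn<Double, Double> staticBased = staticRule(0.5);
        System.out.println("instance-bound lambda serializable: " + canSerialize(instanceBound));
        System.out.println("static-rule lambda serializable: " + canSerialize(staticBased));
    }
}
```

Running `main` should report `false` for the instance-bound lambda and `true` for the static one. Making every legacy class `Serializable` would also work, but static methods keep the shared jar free of framework-driven changes, which fits the "stand-alone jar" step on slides 29 and 30.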

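Slides 38 to 41 describe side-by-side execution, and slide 18 sets a zero-diff tolerance. The slides themselves are images, so the sketch below is an assumption about what a component-level comparison could look like: feed the same input to the legacy and the new aggregation, then list every key whose totals differ. `SideBySideSketch`, `Observation`, `legacyAggregate`, `newAggregate`, and `diff` are hypothetical names, and the two aggregations are toy stand-ins written in plain Java rather than Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SideBySideSketch {

    /** Hypothetical input record: a click count observed for one aggregation key. */
    static final class Observation {
        private final String key;
        private final long clicks;

        Observation(String key, long clicks) {
            this.key = key;
            this.clicks = clicks;
        }

        String key() { return key; }
        long clicks() { return clicks; }
    }

    /** Stand-in for the legacy implementation: an explicit accumulation loop. */
    static Map<String, Long> legacyAggregate(List<Observation> input) {
        Map<String, Long> totals = new TreeMap<>();
        for (Observation o : input) {
            totals.merge(o.key(), o.clicks(), Long::sum);
        }
        return totals;
    }

    /** Stand-in for the new implementation: group by key, then sum. */
    static Map<String, Long> newAggregate(List<Observation> input) {
        return input.stream().collect(Collectors.groupingBy(
                Observation::key, TreeMap::new, Collectors.summingLong(Observation::clicks)));
    }

    /** Describes every key whose value differs, including keys present on one side only. */
    static <K, V> List<String> diff(Map<K, V> legacy, Map<K, V> candidate) {
        Set<K> keys = new LinkedHashSet<>(legacy.keySet());
        keys.addAll(candidate.keySet());
        List<String> mismatches = new ArrayList<>();
        for (K key : keys) {
            V expected = legacy.get(key);
            V actual = candidate.get(key);
            if (!Objects.equals(expected, actual)) {
                mismatches.add(key + ": legacy=" + expected + ", new=" + actual);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<Observation> input = Arrays.asList(
                new Observation("campaign-1", 3),
                new Observation("campaign-2", 5),
                new Observation("campaign-1", 4));
        Map<String, Long> legacy = legacyAggregate(input);
        Map<String, Long> candidate = newAggregate(input);
        System.out.println("mismatches: " + diff(legacy, candidate));

        // Simulate a regression in the new system to show what a reported diff looks like.
        candidate.put("campaign-2", 6L);
        System.out.println("mismatches: " + diff(legacy, candidate));
    }
}
```

The sketch uses integer counts so that exact equality is meaningful. Monetary or floating-point metrics would need an explicit comparison rule, which the deck does not specify.
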