
Functional architectural patterns


The functional paradigm is not only applicable to programming. There is even more reason for using functional patterns at an architectural level. MapReduce is the most famous example of such a pattern. In this talk, we will go through a few other architectural patterns, and their corresponding stateful anti-patterns.

Published in: Software

  1. Functional architectural patterns. Lars Albertsson
  2. Who’s talking? Swedish Institute of Computer Science (test tools); Sun Microsystems (very large machines); Google (Hangouts, productivity); Recorded Future (NLP startup); Cinnober Financial Technology (trading systems); Spotify (data processing & modelling); Schibsted (data processing & modelling)
  3. Why functional? Verbs: ... has made ... expanding ... flourishes ... merged ... has been unable to escape lingering ... built ... are ... placed ... say ... are ... to explode ... are considering ... to reopen ... to recall ...
  4. Or object-oriented? Nouns, pronouns: ... bankruptcy ... government bailout ... automaker Chrysler ... comeback ... sales ... Jeep sport utility vehicles. ... Chrysler ... part ... Fiat Chrysler Automobiles, it ... concerns ... the safety ... Jeeps ... Jeeps ... gas tanks ... regulators ... safety advocates ... rear-end crash. ... regulators ... an investigation ... those Jeeps ... Fiat Chrysler’s agreement ... models.
  5. Functional benefits? My version. Matches a few problems: data processing. Matches a few computer properties: consistency through immutability; deterministic, so replay for resilience.
  6. Local vs distributed properties. Local: hardware provides strong consistency; faults -> death. Distributed: eventual consistency; faults must be survived.
  7. Architectural functional patterns. Personal anti-pattern experiences. Strive to look for: immutability; reexecution.
  8. MapReduce. Discovered pattern, not invention. Well known, enough said. Succeeded by Spark RDD paradigm.
  9. Data flows. Dataset artifacts, typically files with date parameter. [Diagram. Raw: Users, Page views, Sales. Derived: Sales reports, Views with demographics, Sales with demographics, Conversion analytics.]
  10. Anti-pattern: isolated batch jobs. Get data (more on that later). Cron an ETL batch job (function). Output solidifies. Mostly. Steps in isolation, often different teams. What to do on ETL code changes? [Diagram: Sales with demographics, Views with demographics.]
  11. Pattern: data pipeline. End-to-end sequences/DAG of jobs. Not only exist, but treated end-to-end. Input is raw, original data. Separate raw data from generated. [Diagram: Users, Page views, Sales with demographics, Views with demographics, Conversion analytics.]
  12. Lambda architecture, part 1. Save all collected data without preprocessing, but timestamp on generation, register, arrival. Rerun everything downstream on code change. Human fault tolerance. In conflict with privacy management?
  13. Pipeline workflow orchestration. Ideally: good old make + cluster + IDE + xUnit; test end-to-end; rebuild on upstream changes (but not all). State of practice: Luigi, Pinball, Azkaban. They don’t take you all the way :-(
  14. Lambda architecture, part 2. Parallel batch and real-time pipelines. Batch more accurate, overrides. Real-time for window of recent data.
  15. Obtaining data. Log things. Conceptually stable, but collection is challenging at scale. Have legacy code and master data in databases? Let us have a look.
  16. Anti-pattern: direct dump. Database dimensioned for online traffic. Hadoop = herd of elephants. Load spike: height = #mapper nodes, area = #users. [Diagram label: API.]
  17. Direct dumps in the trenches. Company successful, #users increasing. More Sqoop mappers, higher DB load. Daily dump jobs went to 25h. Devops firewalled off Hadoop to recover.
  18. Anti-pattern: dump through API. SOA/microservice culture. DB protected by throttling. API not used to elephants. Query area is still large. Herd of elephants through gate: 1-2 weeks. [Diagram label: API.]
  19. Anti-pattern: slave dump. Protect live service by mirroring to a dump slave. No online service risk, good! Why anti-pattern?
  20. All dumps are non-deterministic. HDFS down? Dump later. State is gone, dump not accurate. Slave replication down? Dump not accurate.
  21. Anti-pattern: deterministic mirror. Replay commit log until full day/hour. Discovered through archaeology :-) Not scalable, point of failure. Hourly dump took 45 minutes, increasing...
  22. (Anti-)pattern: better dumping. Netflix Aegisthus: snapshot Cassandra (fast, atomic, reliable); transfer SSTables to HDFS; replicate compaction in MapReduce. Other DBs? Depends on atomic snapshot.
  23. All dumps are anti-patterns? Typical use: join activity events with user info. Event time != dump time. Aggregation discards information. Which users enabled X, tried, and disabled?
  24. Pattern: event source. All facts are events: immutable, timestamped. Event stream is source of truth. No explicit “current state”. The functional data architecture?
  25. Event source incarnated: unified log. Pour events into pub/sub bus, with long history. Kafka is the de facto standard. Tap from bus to HDFS/S3 in time buckets (Camus/Secor). Stream processing pipelines to destination topics. Replay on code changes.
  26. Unified log, practical considerations. Long history necessary. Must have time to fix stream process bugs. Use 3+ months and use stream as temp DB. Unified log also useful for meta and control. Tweak Kafka for low latency.
  27. Event source + views. View = snapshot of aggregated state @ time. For ETL, choice of hourly/daily aggregates or exact views. [Diagram: Logs, View, View.]
  28. Event source + database. Business logic may demand “current state”. Event stream is truth, keep DB in sync.
  29. Event source, synced database. A. Service interface generates events and DB transactions. B. Generate stream from DB commit log (Postgres, MySQL -> Kafka). C. Build DB with stream processing. [Diagram labels: API, API, API.]
  30. Deployment & orchestration. System = many machines. Desired system state = code + config. Actual state = Orchestrator(current, desired).
  31. Anti-pattern: stateful orchestration. Orchestrator = Puppet|Chef|Ansible { current.changeSomeProperties(desired); return current } // current.otherProperties unchanged
  32. Stateful orchestration in the trenches. Desired = { case roleA: install(x,y); case roleB: install(z) }. Current = x installed on roleB. Old x. Zombie woke up when B load decreased. Puppet+apt = no simple way to remove undesired state.
  33. Pattern: artifacts from source. Orchestrator = Docker|Packer { delete current; return Image(desired) }. No state leak from existing state. Sort of.
  34. Deterministic, predictable? Image building leaky on purpose, e.g. “apt-get update && apt-get install” imports external state. Ephemeral databases preserve state. Ability to rebuild from unified log is valuable.
  35. Questions? More? Jay Kreps, Confluent: unified log. Martin Kleppmann: unified log, Bottled Water. Nathan Marz: Lambda. Sander Mak @ Jfokus: event sourcing. Datomic.
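
A minimal sketch of the event source and view slides (24 and 27), not the speaker's implementation: state is never stored, only derived by folding an immutable, timestamped event log up to a chosen time. The `Event` fields, `view_at`, `enabled_then_disabled`, and the sample `LOG` are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)  # frozen: events are immutable facts
class Event:
    timestamp: int   # event time (hypothetical unit, e.g. epoch seconds)
    user: str
    feature: str
    enabled: bool


def view_at(events: Iterable[Event], at_time: int) -> Dict[str, Dict[str, bool]]:
    """View = snapshot of aggregated state at a time.

    Folds every event with timestamp <= at_time, in event-time order,
    into a per-user feature map. No "current state" is kept anywhere.
    """
    state: Dict[str, Dict[str, bool]] = {}
    for e in sorted(events, key=lambda ev: ev.timestamp):
        if e.timestamp > at_time:
            break
        state.setdefault(e.user, {})[e.feature] = e.enabled
    return state


def enabled_then_disabled(events: Iterable[Event], feature: str) -> Set[str]:
    """Users who enabled `feature` and disabled it later.

    This needs the full history; a dump of the latest state cannot answer it.
    """
    seen_enabled: Set[str] = set()
    result: Set[str] = set()
    for e in sorted(events, key=lambda ev: ev.timestamp):
        if e.feature != feature:
            continue
        if e.enabled:
            seen_enabled.add(e.user)
        elif e.user in seen_enabled:
            result.add(e.user)
    return result


# Hypothetical sample log: the source of truth.
LOG: List[Event] = [
    Event(1, "alice", "X", True),
    Event(2, "bob", "X", True),
    Event(3, "alice", "X", False),
]

if __name__ == "__main__":
    print(view_at(LOG, 2))
    print(view_at(LOG, 3))
    print(enabled_then_disabled(LOG, "X"))
```

Because `view_at` is a pure function of the log and the time, rerunning it after a code change reproduces (or corrects) every view. The slide 23 question, which users enabled X and later disabled it, also stays answerable, which a single dump of current state would not allow.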