
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka for Stateful Streaming Applications"

Learn how the combination of Apache Kafka and Apache Flink is making stateful stream processing even more expressive and flexible to support applications in streaming that were previously not considered streamable.

The new world of applications and fast data architectures has broken up the database: Raw data persistence comes in the form of event logs, and the state of the world is computed by a stream processor. Apache Kafka provides a strong solution for the event log, while Apache Flink forms a powerful foundation for the computation over the event streams.

In this talk we discuss how Flink’s abstraction and management of application state have evolved over time and how Flink’s snapshot persistence model and Kafka’s log work together to form a base to build ‘versioned applications’. We will also show how end-to-end exactly-once processing works through a smart integration of Kafka’s transactions and Flink’s checkpointing mechanism.


  1. Title slide: 2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka for Stateful Streaming Applications"

  2. Original creators of Apache Flink®. dA Platform: Stream Processing for the Enterprise.
  3. What is Apache Flink? Stateful computations over data streams:
      Batch Processing: process static and historic data
      Data Stream Processing: real-time results from data streams
      Event-Driven Applications: data-driven actions and services
  4. Apache Flink in a Nutshell: stateful computations over streams, both real-time and historic; fast, scalable, fault-tolerant, in-memory, event-time-aware, large state, exactly-once. (Diagram: queries, applications, and devices produce streams; databases and file/object storage hold historic data.)
  5. Everything Streams: Apache Flink handles everything as streams internally. Continuous streaming applications use "unbounded streams"; batch processing and finite applications use "bounded streams".
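The bounded/unbounded distinction can be sketched outside Flink. The following plain-Java toy (not Flink API; all names are made up) runs the same running aggregation over a finite list and over a prefix of a conceptually endless stream:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative sketch (plain Java, not Flink API): one aggregation, two stream kinds.
public class BoundedUnbounded {

    // Running sum: emits one partial result per input element.
    static List<Long> runningSum(Stream<Long> input) {
        List<Long> out = new ArrayList<>();
        long sum = 0;
        for (long v : (Iterable<Long>) input::iterator) {
            sum += v;
            out.add(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Bounded stream: a finite dataset, the "batch" case.
        List<Long> batch = runningSum(Stream.of(1L, 2L, 3L, 4L));

        // Unbounded stream: Stream.iterate never ends; we can only ever
        // observe a prefix of it, the "streaming" case.
        List<Long> prefix = runningSum(Stream.iterate(1L, n -> n + 1).limit(4));

        System.out.println(batch);   // same partial results either way
        System.out.println(prefix);
    }
}
```

The point of the sketch: the aggregation logic is identical; only the source's boundedness differs, which is how Flink unifies batch and streaming.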
  6. Layered abstractions, to navigate from simple to complex use cases:
      Stream SQL / Tables (dynamic tables): high-level analytics API
      DataStream API (streams, windows): stream and batch data processing
      Process Function (events, state, time): stateful event-driven applications

     DataStream API example:
       val stats = stream
         .keyBy("sensor")
         .timeWindow(Time.seconds(5))
         .sum((a, b) -> a.add(b))

     Process Function example:
       def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
         // work with event and state
         (event, state.value) match { … }
         out.collect(…)   // emit events
         state.update(…)  // modify state
         // schedule a timer callback
         ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
       }
  7. DataStream API: source, transformation, windowed transformation, sink.

       val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer011(…))
       val events: DataStream[Event] = lines.map((line) => parse(line))
       val stats: DataStream[Statistic] = events
         .keyBy("sensor")
         .timeWindow(Time.seconds(5))
         .sum(new MyAggregationFunction())
       stats.addSink(new RollingSink(path))

     Each stage maps onto the streaming dataflow: source, transform, window (with state reads/writes), sink.
  8. Low Level: Process Function
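The slide carries only the title; as a hedged illustration of what a low-level process function works with (the incoming event, per-key state, and timers), here is a plain-Java toy simulation. Names like `Event` and `processElement` mirror the earlier Scala snippet and are not real Flink classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a process function's three ingredients; not Flink API.
public class ProcessFunctionSketch {
    record Event(String key, long value, long timestamp) {}

    static final Map<String, Long> state = new HashMap<>(); // keyed state
    static final List<Long> timers = new ArrayList<>();     // registered callbacks

    // Mirrors the slide: read state, update state, set a timer, emit a result.
    static long processElement(Event e) {
        long updated = state.getOrDefault(e.key(), 0L) + e.value(); // read + modify state
        state.put(e.key(), updated);
        timers.add(e.timestamp() + 500);                            // schedule a timer callback
        return updated;                                             // emit
    }

    public static void main(String[] args) {
        System.out.println(processElement(new Event("sensor-1", 2, 1000))); // 2
        System.out.println(processElement(new Event("sensor-1", 3, 1200))); // 5
    }
}
```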
  9. High Level: SQL (ANSI)

       SELECT campaign,
              TUMBLE_START(clickTime, INTERVAL '1' HOUR),
              COUNT(ip) AS clickCnt
       FROM adClicks
       WHERE clickTime > '2017-01-01'
       GROUP BY campaign, TUMBLE(clickTime, INTERVAL '1' HOUR)

     A query spans the stream from its start through "now" and keeps running into the future.
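As a hedged sketch of what the TUMBLE grouping above computes, the same hourly count can be done over an in-memory list in plain Java (hypothetical data, not Flink SQL):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of TUMBLE(clickTime, INTERVAL '1' HOUR) + COUNT over an in-memory table.
public class TumbleSketch {
    static final long HOUR = 3600_000L;

    record Click(String campaign, long clickTime, String ip) {}

    // Key = campaign plus the start of the hour-long window the click falls into.
    static Map<String, Long> clickCounts(List<Click> clicks) {
        return clicks.stream().collect(Collectors.groupingBy(
            c -> c.campaign() + "@" + (c.clickTime() / HOUR) * HOUR,
            Collectors.counting()));
    }

    public static void main(String[] args) {
        var counts = clickCounts(List.of(
            new Click("a", 100, "1.1.1.1"),
            new Click("a", 200, "1.1.1.2"),
            new Click("a", HOUR + 1, "1.1.1.1"),
            new Click("b", 300, "2.2.2.2")));
        // Two clicks for campaign "a" in the first hour, one in the second,
        // one for campaign "b" in the first hour.
        System.out.println(counts);
    }
}
```

In the streaming case Flink emits a row per (campaign, window) as each window closes, rather than computing the whole map at once.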
  10. How Large (or Small) Can Flink Get?
  11. Blink is Alibaba's Flink-based system.
  12. Keystone Routing Pipeline at Netflix (as presented at Flink Forward San Francisco, 2018).
  13. Small Flink:
       Can run in a single process
       Some users run it on IoT gateways
       Also runs with zero dependencies in an IDE
  14. Checkpoints Instead of Transactions
  15. Event Sourcing + Memory Image: the event log persists events (temporarily); a process applies each event or command to local variables and structures in main memory, and periodically snapshots that memory.
  16. Event Sourcing + Memory Image, recovery: restore the latest snapshot and replay the events persisted in the log since that snapshot.
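The restore-and-replay recovery above can be simulated in a few lines of plain Java (the event type and log contents are made up for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation of "restore snapshot, then replay events since the snapshot".
public class MemoryImageSketch {
    record Deposit(String account, int amount) {}

    // Deterministic state update applied for every logged event.
    static void apply(Map<String, Integer> state, Deposit e) {
        state.merge(e.account(), e.amount(), Integer::sum);
    }

    static Map<String, Integer> replay(List<Deposit> events, Map<String, Integer> start) {
        Map<String, Integer> state = new HashMap<>(start);
        for (Deposit e : events) apply(state, e);
        return state;
    }

    public static void main(String[] args) {
        List<Deposit> log = List.of(new Deposit("x", 1), new Deposit("y", 2),
                                    new Deposit("x", 3), new Deposit("y", 4));

        // Periodic snapshot: the memory image as of log offset 2.
        int snapshotOffset = 2;
        Map<String, Integer> snapshot = replay(log.subList(0, snapshotOffset), Map.of());

        // Crash. Recovery: restore the snapshot, replay events logged since it.
        Map<String, Integer> recovered =
            replay(log.subList(snapshotOffset, log.size()), snapshot);

        // Identical to processing the whole log from scratch.
        System.out.println(recovered.equals(replay(log, Map.of()))); // true
    }
}
```

Because the update is deterministic, snapshot-plus-replay reproduces exactly the pre-crash state, which is the property Flink's checkpoints rely on.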
  17. Consistent Distributed Snapshots
  18. Checkpoints for Recovery: re-load state, reset positions in the input streams, roll back the computation, and re-process.
  19. Why Checkpoints?
       No barriers/boundaries, so low latency
       No intermediate stream/state replication necessary
        • High throughput
        • Shuffles are very cheap: no load on the brokers
       Handles very large state well (TBs)
       Supports fast batch processing
       Supports flexible types of state and timers
  20. Incremental Snapshots
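A minimal sketch of the incremental-snapshot idea, assuming a simple key-value state: write only the entries changed since the last snapshot, and restore by applying deltas over the base. (Real Flink does this at the level of RocksDB files; deletions are ignored here for brevity.)

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy incremental snapshotting over a key-value state; not the RocksDB mechanism.
public class IncrementalSnapshotSketch {
    final Map<String, Integer> state = new HashMap<>();
    final Set<String> dirty = new HashSet<>();   // keys touched since the last snapshot

    void put(String key, int value) {
        state.put(key, value);
        dirty.add(key);
    }

    // Snapshot = only the changed entries; clears the dirty set.
    Map<String, Integer> snapshotDelta() {
        Map<String, Integer> delta = new HashMap<>();
        for (String k : dirty) delta.put(k, state.get(k));
        dirty.clear();
        return delta;
    }

    public static void main(String[] args) {
        var job = new IncrementalSnapshotSketch();
        job.put("a", 1); job.put("b", 2);
        Map<String, Integer> base = job.snapshotDelta();     // first snapshot is full

        job.put("b", 20); job.put("c", 3);
        Map<String, Integer> delta = job.snapshotDelta();    // only b and c are written

        Map<String, Integer> restored = new HashMap<>(base);
        restored.putAll(delta);                              // restore = base + delta
        System.out.println(restored.equals(job.state));      // true
    }
}
```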
  21. Localized State Recovery (Flink 1.5). Piggybacks on internal multi-version data structures:
       • LSM tree (RocksDB)
       • MV hashtable (Fs/Mem state backend)
      Benchmark setup: 500 MB state per node, checkpoints to S3, soft failure (Flink fails, the machine survives).
  22. Checkpoints for Program Evolution: restore a snapshot into a different program, for bugfixes, upgrades, A/B testing, etc.
  23. State Archiving Through Savepoints
  24. Replay from Savepoints to Drill Down: restart a "debug job" (a modified version of the original job) from a savepoint taken before the incident of interest, with a filter keeping only the events of interest and an extra sink for trace output.
  25. Pause/Resume-Style Execution: for bursty event streams (e.g. events arriving only at end of day).
  26. Pause/Resume-Style Execution, continued: between bursts, the job's state lives in the checkpoint/savepoint store.
  27. Flink and Kafka Integration
  28. Flink Kafka Reader:
       Supports versions 0.8 through 0.11/1.0
       Exactly-once semantics
        • Flink checkpoints manage the offsets
        • Can optionally participate in consumer-group offset committing
       Topic and partition discovery
       Multiple topics at the same time
       Per-partition watermarking
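Per-partition watermarking can be sketched as: track the maximum event time seen per partition and emit the minimum across partitions, so one lagging partition holds the watermark back rather than causing its events to be dropped as late. A plain-Java toy version (not the Flink connector API):

```java
import java.util.Map;

// Toy per-partition watermark: the source-level watermark is the minimum
// of the per-partition maximum event times.
public class PartitionWatermarks {
    static long watermark(Map<Integer, Long> maxTimestampPerPartition) {
        return maxTimestampPerPartition.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Partition 2 lags behind, so the watermark stays at 900.
        System.out.println(watermark(Map.of(0, 1000L, 1, 1700L, 2, 900L))); // 900
    }
}
```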
  29. Flink Kafka Writer:
       Supports versions 0.8 through 0.11/1.0
       Exactly-once via Kafka transactions (0.11+); details in a later section
       Supports partitioners, timestamps, and the usual options
  30. Transaction Coordination:
       Similar to a distributed 2-phase/3-phase commit
       Coordinated by asynchronous checkpoints, so non-blocking, with no voting delays
       Basic algorithm:
        • Between checkpoints: produce into a transaction or a write-ahead log
        • On operator snapshot: flush the local transaction (vote-to-commit)
        • On checkpoint complete: commit the transactions (commit)
        • On recovery: check and commit any pending transactions
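The four steps of the basic algorithm can be walked through with a toy transaction object; `Txn` here is a stand-in for a Kafka transaction handle, not real client API:

```java
// Toy walk-through of checkpoint-coordinated two-phase commit; not Kafka client code.
public class TwoPhaseSketch {
    enum Phase { OPEN, PRE_COMMITTED, COMMITTED }

    static class Txn {
        Phase phase = Phase.OPEN;
        void flush()  { phase = Phase.PRE_COMMITTED; } // vote-to-commit, at operator snapshot
        void commit() { phase = Phase.COMMITTED; }     // once the checkpoint completes
    }

    // On recovery, a handle restored in the pre-committed phase is committed
    // rather than re-produced; that is what makes the sink exactly-once.
    static Phase recover(Txn restored) {
        if (restored.phase == Phase.PRE_COMMITTED) restored.commit();
        return restored.phase;
    }

    public static void main(String[] args) {
        Txn txn = new Txn();
        // 1. Between checkpoints: records are produced into the open transaction.
        // 2. On operator snapshot: flush (pre-commit); the handle goes into the snapshot.
        txn.flush();
        // 3. Crash before "checkpoint complete" ... 4. commit on recovery.
        System.out.println(recover(txn)); // COMMITTED
    }
}
```

This mirrors the failure cases on the next slides: a transaction that never reached pre-commit is simply aborted and its input replayed, while a pre-committed one is committed on recovery.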
  31. Exactly-once via Transactions: each checkpoint interval (chk-1, chk-2, …) produces into its own transaction (TXN-1, TXN-2, …); a transaction is committed globally only once its checkpoint completes.
  32. Failure case: the transaction fails after the local snapshot but before the global commit.
  33. Failure case: the transaction fails before commit…
  34. …and is committed on recovery, using the transaction handle recovered from the checkpoint.
  35. Some Interesting Flink and Kafka Use Cases
  36. Flink and Kafka: applications, sensors, and APIs exchanging event streams through Kafka.
  37. Data Stream Parsing & Routing: Steven Wu (Netflix), "Scaling Flink in Cloud".
  38. Machine Learning Pipelines: Dave Torok & Sameer Wadkar (Comcast), "Embedding Flink Throughout an Operationalized Streaming ML Lifecycle".
  39. Machine Learning Pipelines: Xingzhong Xu (Uber), "Scaling Uber's Realtime Optimization with Apache Flink".
  40. Real-Time and Historic Data, the anatomy of a data stream: recent events in Kafka; historic data in HDFS, S3, GCS, SAN, NAS, NFS, ECS, Swift, Ceph, …
  41. Combining S3 and Kafka/Kinesis Data: Gregory Fee (Lyft), "Bootstrapping State In Apache Flink".
  42. Event Sourcing / CQRS Applications: Aris Koliopoulos, "Drivetribe's Kappa Architecture with Apache Flink".
  43. Sophisticated Time Semantics: Erik de Nooij (ING), "StreamING models, how ING adds models at runtime to catch fraudsters"; low-latency event-time joins and aggregations.
  44. Large and Complex State
  45. Outlook
  46. Going for More Languages
  47. Flink 1.5:
       Big changes to the process model
        • Better support for framework and library modes
        • Dynamic resource acquisition
        • All communication with clients is REST
       Special network protocol to speed up checkpoint alignments
       Lower shuffle latency at the same throughput
       Faster recovery of large state
       Managed broadcast state
       Interactive SQL client (beta)
       … much more …
  48. Thank you! Questions?