O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics

719 visualizações

Publicada em

Apache Flink® is an open-source stream processing framework for distributed and accurate data streaming applications. An increasing number of IoT use cases will (and some already do) require robust processing frameworks that can handle an ever-increasing amount of data and provide insights in real time. Apache Flink is one of the contenders for the top spot among such frameworks and in this presentation Aljoscha Krettek will highlight some of the properties that make Flink so well suited for IoT use cases: We will first learn what stream processing frameworks in general provide before diving into stateful stream processing and event-time based stream-processing. We will see why these two features are important for IoT scenarios and also why they, together with Flink’s robust handling of failures, enable accurate and robust analytics on real-time streaming data.

Publicada em: Dados e análise
  • Seja o primeiro a comentar

Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics

  1. 1. 1 Aljoscha Krettek @aljoscha ApacheCon North America May, 2017 Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics
  2. 2. What I’d Like to Talk About 2 § IoT and event-time stream processing § Stateful stream processing § Streaming architecture and Flink
  3. 3. 3 Original creators of Apache Flink® Providers of the dA Platform, a supported Flink distribution
  4. 4. IoT and Event-time Stream Processing 4
  5. 5. Example Event Sources 5
  6. 6. A Simple Definition 6 IoT use cases from the system’s perspective: A large number of (distributed) things continuously generating a large amount of data.
  7. 7. IoT: Some Insights 7 § Data is continuously produced → Stream Processing § Events have a timestamp → Event-time based processing § Data/Events can arrive with huge delays/out-of-order § Most analyses happen on time windows
  8. 8. What Is Event-Time Processing 8 1977 1980 1983 1999 2002 2005 2015 Processing Time Episode IV Episode V Episode VI Episode I Episode II Episode III Episode VII Event Time
  9. 9. What Is Event-Time Processing 9 1312735961112 1234567891011121314 Processing Time Event timestamp Message Queue
  10. 10. What’s The Problem? 10 13 12 735961112 1234567891011121314 Processing Time Processing-Time Windows 137356 12 137 356Event-Time Windows 12 1112 Mismatch between event time and processing time.
  11. 11. Sources of Time Mismatch § Big Mismatch • Network disconnects • Slow network § Small Mismatch • The nature of distributed systems • Differing system clock time 11
  12. 12. Small Event-Time Mismatch 12 Robust Stream Processing with Apache Flink®: A Simple Walkthrough http://data-artisans.com/robust-stream-processing-flink-walkthrough/
  13. 13. 13
  14. 14. 14
  15. 15. 15
  16. 16. Recap: Event-Time § IoT use cases need event-time processing § Even small mismatch of event time/processing time will lead to wrong results 16
  17. 17. (Stateful) Stream Processing 17
  18. 18. Stream Processing 18 Computation Computations on never-ending “streams” of data records (“events”)
  19. 19. Distributed Stream Processing 19 Computation Computation spread across many machines Computation Computation
  20. 20. Stateful Stream Processing 20 Computation State State is usually partitioned by some key in the data
  21. 21. Stateful Stream Processing II 21 § Result depends on history of stream § A stateful stream processor should gives the tools to manage state • Recover, roll back, version upgrade, etc.
  22. 22. 22 app state app state app state event log Query service
  23. 23. Recap: Stateful Streams § Continuous processing of data that is continuously generated § I.e., pretty much all “big” data § It’s all about state and time § Flink does all of that 23
  24. 24. Operational Issues 24
  25. 25. Operational Questions § What happens in case of failures? § What if I need to update my code/Flink? § Can I re-process my data? § How can I execute my programs? 25
  26. 26. Failure Handling § JobManager High-Availability using ZooKeeper § Periodic checkpoints of state to persistent storage (HDFS, S3, …) § In case of failure: rollback to previous consistent state 26
  27. 27. Savepoints § A persistent snapshot of all state § When starting an application, state can be initialized from a savepoint § In-between savepoint and restore we can update Flink version or user code 27
  28. 28. Closing 28
  29. 29. TL;DR § Stateful stream processing is nice 😎 § IoT use cases require proper time management § Apache Flink is a stateful stream processor with plenty of nifty features 29
  30. 30. 3 Thank you! @aljoscha @ApacheFlink @dataArtisans
  31. 31. Backup Slides 31
  32. 32. Event-Time Processing 32
  33. 33. What Is Event-Time Processing 33 1977 1980 1983 1999 2002 2005 2015 Processing Time Episode IV Episode V Episode VI Episode I Episode II Episode III Episode VII Event Time
  34. 34. What Is Event-Time Processing 34 1312735961112 1234567891011121314 Processing Time Event timestamp Message Queue
  35. 35. What is Event-Time Streaming § Events have timestamps § Processing depends on timestamps § An event-time stream processor should give you the tools to reason about time • Handle streams that are out of order 35 Your code state t3 t1 t2t4 t1-t2 t3-t4
  36. 36. Recap: Event-Time § IoT use cases need event-time processing § Even small mismatch of event time/processing time will lead to wrong results 36
  37. 37. History of Flink 37
  38. 38. A brief History of Flink 38 January ‘10 December ‘14 v0.5 v0.6 v0.7 March ‘16 Flink Project Incubation Top Level Project v0.8 v0.10 Release 1.0 Project Stratosphere (Flink precursor) v0.9 April ‘14
  39. 39. A brief History of Flink 39 January ‘10 December ‘14 v0.5 v0.6 v0.7 March ‘16 Flink Project Incubation Top Level Project v0.8 v0.10 Release 1.0 Project Stratosphere (Flink precursor) v0.9 April ‘14 The academia gap: Reading/writing papers, teaching, worrying about thesis Realizing this might be interesting to people beyond academia (even more so, actually)

×