Apache Flink 1.7 and Beyond

The streaming space is evolving at an ever-increasing pace. This trend is also reflected in Apache Flink, whose latest major release again included many new features. For streaming practitioners, it is essential to learn about Flink's newest capabilities because they often enable completely new use cases and applications.

In this talk, I want to give a brief overview of Apache Flink and its latest feature additions, including the integration of CEP with streaming SQL, proper support for state evolution, temporal joins, and more. Furthermore, I want to put them in perspective with respect to Flink's future direction by giving some insights into ongoing development threads in the community. My aim is to give attendees a better picture of Flink's current and future capabilities.

Published in: Technology

  1. Apache Flink® 1.7 and Beyond. Company: data Artisans. Position: Engineering Lead. Speaker: Till Rohrmann (@stsffap)
  2. Original creators of Apache Flink®. dA Platform: Stream Processing for the Enterprise
  3. What is Apache Flink? Stateful Computations Over Data Streams • Batch Processing: process static and historic data • Data Stream Processing: realtime results from data streams • Event-driven Applications: data-driven actions and services
  4. Flink 1.7: What happened so far?
  5. Flink 1.7.0 in Numbers: • Contributors: 112 • Resolved issues: 430 • Commits: 970 • Changes LOC: +103824/-63124
  6. Flink Applications Need to Evolve: • E.g. changing requirements, new algorithms, better serializers, bug fixes, etc. • Expensive to restart application from scratch (maintain state)
  7. State Schema Evolution: • Support for changing state schema • Adding/Removing fields • Changing type of fields • Currently fully supported when using Avro types. See also: “Upgrading Stateful Flink Streaming Applications: State of the Union” by Tzu-Li Tai, Today @ 5:20 pm, Room 2
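  8. Converting Currencies (figure: times 7:12pm, 9:37am, 8:45am; amounts € 1, $ 1.13, CN¥ 7.8)
  9. Temporal Tables and Joins (figure: rate table with columns Currency, Rate, Time, holding CN¥ 7.8 at time 3, CN¥ 7.89 at time 5, CN¥ 7.75 at time 9; timelines marked 13, 11, 7 and 15, 14, 12, 7, 4)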
  10. SQL for Pattern Analysis: SELECT * from ?
  11. MATCH_RECOGNIZE:
      SELECT *
      FROM TaxiRides
      MATCH_RECOGNIZE (
        PARTITION BY driverId
        ORDER BY rideTime
        MEASURES S.rideId AS sRideId
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (S M{2,} E)
        DEFINE
          S AS S.isStart = true,
          M AS M.rideId <> S.rideId,
          E AS E.isStart = false AND E.rideId = S.rideId
      )
  12. More SQL Improvements: • ElasticSearch 6 Table Sink • Support for views in SQL Client • More built-in functions: TO_BASE64, LOG2, REPLACE, COSH, … See also: “Flink Streaming SQL 2018” by Piotr Nowojski, Today @ 4:00 pm, Room 2
  13. Other Notable Features: • Scala 2.12 Support • Exactly-once S3 StreamingFileSink • Kafka 2.0 connector • Versioned REST API • Removal of legacy mode
  14. Flink 1.8+: What is happening next?
  15. Capability Spectrum (figure: spectrum from offline to real time, labeled Batch, Event-driven applications, Streaming analytics, Strict SLA applications, Flink)
  16. Flink as a Library: • Deploying Flink applications should be as easy as starting a process • Bundle application code and Flink into a single image • Process connects to other application processes and figures out its role • Taking the cluster out of the equation (figure: processes P1, P2, P3, P4 and a new process)
  17. Reactive vs. Active: • Active mode • Flink is aware of the underlying cluster framework • Flink allocates resources • E.g. existing YARN and Mesos integration • Reactive mode • Flink is oblivious to its runtime environment • External system allocates and releases resources • Flink scales with respect to available resources • Relevant for environments: Kubernetes, Docker, as a library
  18. Dynamic Scaling: • Latency • Throughput • Resource utilization • Connector signals
  19. Batch-Streaming Unification: • No fundamental difference between batch and stream processing • Batch allows optimizations because data is bounded and “complete” • Batch and streaming are still treated separately from task level upwards • Working toward a single runtime for batch and streaming workloads
  20. Flink Scheduler: • Lazy scheduling (batch case) • Deploy tasks starting from the sources • Whenever data is produced start consumers • Scheduling of idling tasks → resource under-utilization (figure: sources feeding two joins, each with a build side and a probe side)
  21. Batch Scheduler: • More efficient scheduling by taking dependencies into account • E.g. probe side is only scheduled after build side has been processed (figure: sources feeding two joins, each with a build side and a probe side, with scheduling steps (1), (2), (2), (3))
  22. Extendable Scheduler: • Make Flink’s scheduler extendable & pluggable • Scheduler considers dependencies and reacts to signals from ExecutionGraph • Specialized scheduler for different use cases (figure: Scheduler with Streaming Scheduler, Batch Scheduler, Speculative Scheduler)
  23. Flink’s Shuffle Service: • Tasks own produced result partitions • Containers cannot be freed until result is consumed • One implementation for streaming and batch loads (figure labels: Result partition, Container)
  24. External & Persistent Shuffle Service: • Result partitions are written to an external shuffle service • Containers can be freed early • Different implementations based on use case (figure label: External shuffle service, e.g. Yarn, DFS)
  25. End-to-end SQL Only Pipelines: • Support for external catalogs (Confluent Schema Registry, Hive Meta Store) • Data definition language (DDL) (figure labels: Hive Meta Store, Table Source, Table Sink, Input schema information, Output schema information, SQL Query)
  26. TL;DL: • Flink 1.7.0 added many new features around SQL, connectors and state evolution • A lot of new features in the pipeline • Join the community! • Subscribe to mailing lists • Participate in Flink development • Become active
  27. Thank you
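
The temporal join from the "Temporal Tables and Joins" slide can be illustrated outside of Flink. The following is a minimal Python sketch of the idea only, not Flink's implementation or API: each order is paired with the exchange rate that was valid at the order's time. The rate history (CN¥ 7.8 at time 3, 7.89 at time 5, 7.75 at time 9) is taken from the slide; the function names `rate_as_of` and `temporal_join`, the order tuples, and the choice to drop orders that have no known rate are assumptions made for illustration.

```python
from bisect import bisect_right

# Rate history for CN¥ from the "Temporal Tables and Joins" slide:
# (time at which the rate became valid, rate), sorted by time.
RATE_HISTORY = {
    "CN¥": [(3, 7.8), (5, 7.89), (9, 7.75)],
}


def rate_as_of(currency, time):
    """Return the rate valid at `time`, or None if no rate is known yet."""
    history = RATE_HISTORY.get(currency, [])
    start_times = [t for t, _ in history]
    # Count the rate versions whose start time is <= time.
    idx = bisect_right(start_times, time)
    return history[idx - 1][1] if idx > 0 else None


def temporal_join(orders):
    """Pair each (time, currency, amount) order with the rate valid at its time."""
    joined = []
    for time, currency, amount in orders:
        rate = rate_as_of(currency, time)
        # Assumed inner-join behaviour: orders without a known rate are dropped.
        if rate is not None:
            joined.append((time, currency, amount, rate))
    return joined


# Hypothetical orders; times and amounts are made up for illustration.
orders = [(2, "CN¥", 10), (4, "CN¥", 10), (7, "CN¥", 10), (9, "CN¥", 10)]
result = temporal_join(orders)
```

In this sketch the order at time 2 finds no rate and is dropped, while the orders at times 4, 7 and 9 pick up the rates 7.8, 7.89 and 7.75. The point is that a later rate update never changes the result for an earlier order, which is what distinguishes a temporal join from a join against the latest table contents.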