Apache Flink and More @ MesosCon Asia 2017

Apache Mesos allows operators to run distributed applications across an entire datacenter and is attracting ever-increasing interest. Mesos enables wider use of distributed applications, and in turn, a growing ecosystem of well-integrated applications drives wider adoption of Mesos. One of the latest additions to the Mesos family is Apache Flink.

Flink is one of the most popular open-source systems for real-time, high-scale data processing, and it lets users run low-latency streaming analytics workloads on Mesos.

In this talk, we explain the challenges we solved while integrating Flink with Mesos, including how Flink's distributed architecture can be modeled as a Mesos framework and how Flink was integrated with Fenzo. Next, we describe how Flink was packaged to run easily on DC/OS.

  1. Apache Flink® and More: Till Rohrmann (trohrmann@apache.org, @stsffap) and Jörg Schad (joerg@mesosphere.io, @joerg_schad)
  2. MapReduce is crunching data
  3. We need to turn faster!
  4. SMACK Stack: EVENTS (ubiquitous data streams from connected devices: sensors, devices, clients) → INGEST with Apache Kafka (ingest millions of events per second) → STORE in Apache Cassandra (distributed and highly scalable database) → ANALYZE with Apache Spark (process data in real time and in batch) → ACT with Akka (visualize data and build data-driven applications), all running on Mesos/DC/OS
  5. Evolution of Data Analytics: from batch through micro-batch to event processing, with latency shrinking from days and hours to minutes, seconds, and microseconds. Analytics moves from reporting what has happened (descriptive analytics) to solving problems (predictive and prescriptive analytics). Example use cases: billing and chargeback, product recommendations, real-time advertising, real-time pricing and routing, and predictive user interfaces.
  7. data Artisans: original creators of Apache Flink® and providers of the dA Platform, a supported Flink distribution
  8. Apache Flink in a Nutshell: stateful, event-driven, event-time-aware processing that covers event-driven applications (event sourcing, CQRS), stream processing and analytics (data streams, windows, …), and batch processing (data sets)
  9. Apache Flink Stack: libraries on top of the DataStream API (stream processing) and the DataSet API (batch processing), both running on a distributed streaming dataflow runtime. Streaming and batch are first-class citizens.
  10. Programming Model: sources feed transformations made of stateful computations, which write their results to sinks (diagram: two sources, four computations each holding its own state, two sinks)
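  11. API & Execution
      // Source: read raw records from Kafka
      DataStream<String> lines = env.addSource(new FlinkKafkaConsumer010(…));
      // Transformation: parse each line into an Event
      DataStream<Event> events = lines.map(line -> parse(line));
      // Transformation: group by "id" and aggregate each 5-second window
      DataStream<Statistic> stats = events
          .keyBy("id")
          .timeWindow(Time.seconds(5))
          .apply(new MyAggregationFunction());
      // Sink: write the results to bucketed files under path
      stats.addSink(new BucketingSink(path));
      Diagram: the program becomes a streaming dataflow: Source → map() (Transformation) → keyBy()/window()/apply() (Transformation) → Sink.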
  12. Distributed Runtime
  13. Levels of Abstraction, from high-level to low-level:
      ▪ Stream SQL: high-level language
      ▪ Table API (dynamic tables): declarative DSL
      ▪ DataStream API (streams, windows): stream processing & analytics
      ▪ Process Function (events, state, time): low-level, stateful stream processing
  14. What Is Flink Good For?
  15. Detecting fraud in real time: credit card transactions are checked against evolving fraud models built by data scientists, producing notifications and alerts. As fraudsters get better, the models must be updated without downtime in a live 24/7 service.
  16. AthenaX
      ▪ SQL to define metrics
      ▪ Thresholds and actions to trigger
      ▪ Blends analytics and actions
      Diagram: streams from Hadoop, Kafka, etc., together with SQL, thresholds, and actions, produce analytics, alerts, and derived streams.
  17. ▪ Route events to Kafka, ES, Hive
      ▪ Complex rules for interaction sessions
      ▪ Mix of stateless, small-state, and large-state jobs
      ▪ Stream Processing as a Service
        • Launching, monitoring, scaling, updating
        • DSL to define jobs
  18. ▪ Blink, based on Flink
      ▪ A core system in Alibaba Search
        • Machine learning, search, recommendations
        • A/B testing of search algorithms
        • Online feature updates to boost conversion rate
      ▪ Alibaba is a major contributor to Flink
      ▪ Contributing many changes back to open source
  19. A complete social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation)
  20. Apache Flink & Apache Mesos
  21. Why Apache Mesos?
      ▪ Mesos offers the full functionality needed to implement fault-tolerant and elastic distributed applications
      ▪ 30% of survey respondents were running Flink on Mesos (prior to proper Mesos support, September 2016)
  22. Flink's Mesos Integration
      ▪ Kudos to Eron Wright (EronWright) for this work
      Diagram: the Apache Flink framework consists of a Mesos App Master, which hosts the Flink Mesos ResourceManager and the JobManager, plus TaskManagers running as Mesos tasks. The ResourceManager allocates resources from the Mesos Master and launches Mesos tasks; the TaskManagers register, and the JobManager executes the job on them.
  23. Resource Manager Components
      ▪ Connection Monitor: monitors the connection to Mesos
      ▪ Launch Coordinator: resource offer processing and task scheduling; gathers offers and matches them to tasks using Fenzo
      ▪ Task Monitor: monitors Mesos tasks, triggers reconciliation, and makes sure tasks are properly killed
      ▪ Reconciliation Coordinator: reconciles the ResourceManager's and the Mesos Master's views of the tasks
  24. Component Interplay. Diagram: within the ResourceManager, the Connection Monitor, Launch Coordinator, Task Monitor, and Reconciliation Coordinator interact with the Mesos Master and the Mesos tasks. Labeled interactions: resource offers, launch tasks, monitor tasks, status messages, trigger reconciliation, reconcile tasks, start TaskManagers, recover tasks, kill task.
  25. Fenzo
      ▪ Developed by Netflix
      ▪ Generic task scheduler for Mesos frameworks
      ▪ Matches tasks to resource offers
        • Pluggable fitness evaluator
      Diagram: Mesos sends periodic resource offers to the Launch Coordinator, which tells Fenzo the offered resources and pending tasks; Fenzo returns resource-task matchings, and the Launch Coordinator sends the tasks to launch back to Mesos.
  26. Datacenter
  27. Naive Approach: the typical datacenter runs MySQL, microservices, Cassandra, Flink, and Kafka on siloed, over-provisioned servers with low utilization (industry average: 12-15% utilization)
  28. © 2017 Mesosphere, Inc. All Rights Reserved.
  29. Apache Mesos: instead of the typical datacenter's siloed, over-provisioned servers with low utilization (industry average: 12-15% utilization), Mesos runs MySQL, microservices, Cassandra, Flink, and Kafka with automated schedulers, multiplexing workloads onto the same machines
  30. Why Mesos?
      ▪ Two-level scheduling
      ▪ Fault-tolerant, battle-tested
      ▪ Scalable to 10,000+ nodes
      ▪ Created by a Mesosphere founder at UC Berkeley; used in production by 100+ web-scale companies [1]
      [1] http://mesos.apache.org/documentation/latest/powered-by-mesos/
  31. DC/OS: the Datacenter Operating System (DC/OS) is built on a distributed systems kernel (Mesos) and runs on any infrastructure (physical, virtual, cloud). It hosts big data + analytics engines, microservices (in containers), and modern app components, covering streaming, batch, machine learning, analytics, functions & logic, search, time series, and SQL/NoSQL databases.
  32. DEMO
  33. Conclusion
  34. Conclusion
      ▪ Apache Flink runs on Mesos using Fenzo
      ▪ DC/OS offers an easy-to-use Flink package
      ▪ Contributions welcome!
      DC/OS Office Hour: June 29th
  35. Thank you! @stsffap @joerg_schad @ApacheFlink @dataArtisans @dcos
  37. We are hiring! data-artisans.com/careers
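The windowed aggregation on the API & Execution slide (slide 11) can be illustrated without a Flink cluster. The following plain-Java sketch mimics what `keyBy("id")` followed by a 5-second tumbling window and a per-window sum computes. It assumes each event carries its own millisecond timestamp, and all names (`TumblingWindowSketch`, `Event`, `sumPerKeyAndWindow`) are hypothetical; it neither uses nor reproduces Flink's API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Flink dependency) of keyBy("id") + 5-second tumbling
// window + per-window sum. All names here are hypothetical.
public class TumblingWindowSketch {

    // Hypothetical event: a key, a timestamp in milliseconds, and a value to sum.
    static final class Event {
        final String id;
        final long timestampMs;
        final int value;

        Event(String id, long timestampMs, int value) {
            this.id = id;
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    // Start of the tumbling window containing the timestamp, aligned to multiples
    // of the window size (floorMod keeps this correct for negative timestamps).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Groups events by key, assigns each event to exactly one window, and sums
    // the values per (key, window start).
    static Map<String, Map<Long, Integer>> sumPerKeyAndWindow(List<Event> events, long windowSizeMs) {
        Map<String, Map<Long, Integer>> result = new TreeMap<>();
        for (Event e : events) {
            long start = windowStart(e.timestampMs, windowSizeMs);
            result.computeIfAbsent(e.id, k -> new TreeMap<>())
                  .merge(start, e.value, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("a", 1_000, 1),
                new Event("a", 4_999, 2),
                new Event("a", 5_000, 10),
                new Event("b", 2_000, 7));
        // prints {a={0=3, 5000=10}, b={0=7}}
        System.out.println(sumPerKeyAndWindow(events, 5_000));
    }
}
```

Each event lands in exactly one window, so windows for the same key never overlap. In Flink itself, whether such windows follow event time or processing time depends on how the job's time characteristic is configured.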
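Slide 25 describes Fenzo as matching tasks to resource offers through a pluggable fitness evaluator. As a rough illustration of that idea only, here is a greedy matcher in plain Java. It is not Fenzo's API: `Offer`, `TaskRequest`, `FitnessEvaluator`, `BEST_FIT_CPU`, and the best-fit-on-CPU scoring rule are all hypothetical assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of matching tasks to resource offers with a pluggable
// fitness evaluator. This is NOT Fenzo's API; every name here is illustrative.
public class OfferMatchingSketch {

    // A resource offer from one agent; free resources shrink as tasks are placed.
    static final class Offer {
        final String host;
        double freeCpus;
        double freeMemMb;

        Offer(String host, double cpus, double memMb) {
            this.host = host;
            this.freeCpus = cpus;
            this.freeMemMb = memMb;
        }
    }

    // A pending task (for example, one TaskManager) and the resources it needs.
    static final class TaskRequest {
        final String id;
        final double cpus;
        final double memMb;

        TaskRequest(String id, double cpus, double memMb) {
            this.id = id;
            this.cpus = cpus;
            this.memMb = memMb;
        }
    }

    // Pluggable scoring: a higher value means a better placement.
    interface FitnessEvaluator {
        double fitness(TaskRequest task, Offer offer);
    }

    // Assumed best-fit rule: prefer the offer that leaves the fewest CPUs unused.
    static final FitnessEvaluator BEST_FIT_CPU = (task, offer) -> -(offer.freeCpus - task.cpus);

    static boolean fits(TaskRequest task, Offer offer) {
        return task.cpus <= offer.freeCpus && task.memMb <= offer.freeMemMb;
    }

    // Greedily assigns each task to the fitting offer with the highest fitness.
    // Returns task id -> host; tasks that fit nowhere are left out.
    static Map<String, String> match(List<TaskRequest> tasks, List<Offer> offers,
                                     FitnessEvaluator evaluator) {
        Map<String, String> assignment = new LinkedHashMap<>();
        for (TaskRequest task : tasks) {
            Offer best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Offer offer : offers) {
                if (!fits(task, offer)) {
                    continue;
                }
                double score = evaluator.fitness(task, offer);
                if (score > bestScore) {
                    best = offer;
                    bestScore = score;
                }
            }
            if (best != null) {
                // Consume resources so later tasks only see what is left of the offer.
                best.freeCpus -= task.cpus;
                best.freeMemMb -= task.memMb;
                assignment.put(task.id, best.host);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Offer> offers = List.of(
                new Offer("agent-1", 4, 8192),
                new Offer("agent-2", 2, 4096));
        List<TaskRequest> tasks = List.of(
                new TaskRequest("taskmanager-1", 2, 2048),
                new TaskRequest("taskmanager-2", 2, 2048),
                new TaskRequest("taskmanager-3", 4, 2048));
        // prints {taskmanager-1=agent-2, taskmanager-2=agent-1}
        System.out.println(match(tasks, offers, BEST_FIT_CPU));
    }
}
```

Swapping in a different `FitnessEvaluator` (for example, one that spreads tasks across hosts) changes placement without touching the matching loop, which is the point of keeping the evaluator pluggable. In this sketch, a task that fits no offer (here `taskmanager-3`) simply stays unassigned until a later round of offers.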
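Slides 23 and 24 describe the Task Monitor and Reconciliation Coordinator keeping the ResourceManager's view of its tasks consistent with the Mesos Master's view. The sketch below is a deliberately simplified, hypothetical model of that comparison, not Flink's implementation. It assumes the ResourceManager knows which tasks it wants running and that the Mesos Master reports which tasks are actually running; `ReconciliationSketch`, `Action`, and `reconcile` are illustrative names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified, hypothetical model of reconciling the ResourceManager's view of
// its tasks with the tasks the Mesos Master reports as running.
// Not Flink's actual implementation; all names are illustrative.
public class ReconciliationSketch {

    enum Action { KEEP, RELAUNCH, KILL }

    // wanted: tasks the ResourceManager expects to be running.
    // runningOnMesos: tasks the Mesos Master reports as running.
    static Map<String, Action> reconcile(Set<String> wanted, Set<String> runningOnMesos) {
        Map<String, Action> actions = new TreeMap<>();
        for (String id : wanted) {
            // Expected but not reported as running (e.g. lost): start a replacement.
            actions.put(id, runningOnMesos.contains(id) ? Action.KEEP : Action.RELAUNCH);
        }
        for (String id : runningOnMesos) {
            // Running on Mesos but not wanted (released or unknown): make sure it is killed.
            if (!wanted.contains(id)) {
                actions.put(id, Action.KILL);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Set<String> wanted = Set.of("tm-1", "tm-2");
        Set<String> runningOnMesos = Set.of("tm-1", "tm-3");
        // prints {tm-1=KEEP, tm-2=RELAUNCH, tm-3=KILL}
        System.out.println(reconcile(wanted, runningOnMesos));
    }
}
```

The sketch ignores timing concerns, such as status updates that arrive while reconciliation is still in progress, which a real framework has to handle.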