DON'T CROSS THE STREAMS!
STREAMING AND APACHE FLINK
Senior Data Consultant
Dublin
JOHN GORMAN
amberhand
WHAT WE WILL COVER
It's all about pain!
Streaming and Related Terminology
Stream Processing Engines
Apache Flink
It started with a pain...a software pain
Things were big, slow & shaky....and getting worse!
The calm before the storm
Batch Processing (High Latency, inability to reason about
time)
Coupled systems prevented fast delivery of single change
requirements
Processing large distributed data
Messaging incorporated business logic (Service Bus)
Customers demanded immediate insight/action
Event Ordering/Timing, Consistency, Data Lineage
Lack of Fault Tolerant Systems
Someone noticed the need to change some time back...
Oh! The other Michael Hammer...
Ref: Michael Hammer - Harvard Business Review 1990
“We cannot achieve breakthroughs in
performance by cutting fat or automating
existing processes. Rather, we must
challenge old assumptions and shed the
old rules that made the business
underperform in the first place.”
Ref: Michael Hammer - Harvard Business Review 1990
“These rules of work design are based on
assumptions about technology, people,
and organisational goals that no longer
hold”
So...Software Legends set out to fix it...
THE PERFECT STORM
Elements of the "Perfect Storm"
Elements of the "Perfect Storm" contd.
Can something save us?
Streams!
flowing from a producer to a consumer
Any event that happens internal or external to your
company is fair game for inclusion in a stream!
WHAT ARE STREAMS?
Unbounded Events Producer Consumer
Streaming obliterates old working habits; it does not automate
them
When did you last drop a DVD back to your video store?
Convenience of streaming films won out
Anyone using Dublin Bus still carry a timetable?
Realtime with Context is needed...
SOME OTHER COMMON STREAM EXAMPLES
Log files
User website clicks
Finance stocks
Social media streams
Ideal Stream Characteristics
Low Latency (Time required to produce some result)
High Throughput (Number of results produced in time)
Persisted for reuse
Fault Tolerant
Scalable Event Production (i.e. Partitioning)
Scalable Event Consumption (i.e. Consumer Groups)
Consumer manages state (offsets)
Handle Back Pressure
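Several of these characteristics (persistence for reuse, partitioning, consumer groups and consumer-managed offsets) can be illustrated with a toy in-memory log. This is only a minimal sketch in plain Python: the names `PartitionedLog` and `Consumer` are hypothetical and are not the API of Kafka, Kinesis or MapR Streams.

```python
from zlib import crc32


class PartitionedLog:
    """Toy append-only log split into partitions (hypothetical, in-memory only)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # The same key always maps to the same partition, so per-key order is kept.
        p = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def read(self, partition, offset, max_events=10):
        # Reading does not remove anything: the log is persisted for reuse.
        return self.partitions[partition][offset:offset + max_events]


class Consumer:
    """Tracks its own offset per assigned partition; the log keeps no consumer state."""

    def __init__(self, log, assigned_partitions):
        self.log = log
        self.offsets = {p: 0 for p in assigned_partitions}

    def poll(self):
        events = []
        for p, offset in self.offsets.items():
            batch = self.log.read(p, offset)
            self.offsets[p] = offset + len(batch)
            events.extend(batch)
        return events

    def seek(self, partition, offset):
        # Rewinding the offset replays earlier events.
        self.offsets[partition] = offset


log = PartitionedLog(num_partitions=2)
placements = [(f"user-{i % 3}", log.append(f"user-{i % 3}", i)) for i in range(6)]

# Two consumers in one "group" split the partitions between them.
c0, c1 = Consumer(log, [0]), Consumer(log, [1])
first_pass = c0.poll() + c1.poll()
second_pass = c0.poll() + c1.poll()   # nothing new has been produced
c0.seek(0, 0)
replayed = c0.poll()                  # partition 0 is read again from the start
```

Because the consumer owns the offset, adding a second consumer group or replaying history needs no change on the producer side.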
Benefits of streams
Ability to augment and enrich data streams
Duality of Streams and Tables (Only Streams Work)
Replay from a defined offset
Stream outputs can become stream inputs (unix pipes!)
Data first - Processing Later (Fast feature creation)
Stream your monitoring (Logs, Ops Metrics, Business KPI
etc.)
Benefits of streams contd.
Location in Time Testing (Bugs In Code)
Replication for Scale
Cross/Join prior unrelated sources (i.e. Time, Context -
Analytics)
Point of Record Stream (produce suitable Materialized
Views)
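The stream/table duality, replay from an offset and materialized views can be shown in a few lines. The sketch below is plain Python under stated assumptions: the changelog is a list of `(key, value)` events, a value of `None` means "delete", and `materialize` is a hypothetical helper name.

```python
def materialize(changelog, from_offset=0, to_offset=None):
    """Fold a slice of a changelog into a table (a dict keyed by event key).

    Assumption of this sketch: a value of None marks a deletion.
    """
    table = {}
    for key, value in changelog[from_offset:to_offset]:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


changelog = [("alice", 10), ("bob", 5), ("alice", 12), ("bob", None)]

current_view = materialize(changelog)               # table as of "now"
view_after_two = materialize(changelog, 0, 2)       # table as it was after 2 events
view_from_offset = materialize(changelog, 2)        # replay from a chosen offset
```

Rebuilding the table as of an earlier offset is what makes "location in time" testing possible: a bug can be reproduced against the exact state the code saw.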
MOST POPULAR STREAMING TOOLS
Apache Kafka
Amazon Kinesis - Based on Kafka Ideas
MapR Streams - Uses Kafka API (adds resilience features)
Can these Streams handle the load?
Apache Kafka Data Handling at LinkedIn
LinkedIn Engineering Blog March 20, 2015
We have the stream! Now what?
Enter the Stream Processing Engine
What is a Stream Processing Engine?
8 Requirements of a Real-Time Stream Processing Engine
(Michael Stonebraker)
1. Keep the data moving
2. Query using SQL on Stream
3. Handle Stream Imperfections (Delayed, Missing, Out-Of-Order Data)
4. Generate Predictable Outcomes
5. Integrate Stored and Streaming Data
6. Guarantee Data Safety and Availability
7. Partition and Scale Applications Automatically
8. Process and Respond Instantaneously
OK - Engines on... What can we do with it?
Stream Processing Engine - Use Cases
Lineage, Auditing, History (Immutable)
Internet of Things (Sensor data)
Realtime Monitoring (Failure Prevention)
Autonomous Cars
Fraud/Anomaly Detection
Health devices (Fitbit, cardiac pacemakers etc.)
For System of record (Infinite persistence)
Digital Marketing
Network monitoring
Realtime pricing / analytics
Stream Processing Engine - Use Cases Contd...
Intelligence and Surveillance
Risk management (Realtime Asset Coverage)
E-commerce (Realtime customer retention)
Fraud detection (Card, Insurance)
Smart order routing
Transaction cost analysis
Pricing and analytics
Market data management
Algorithmic trading
Data warehouse augmentation
Streaming does not mandate BigData
Streaming does not mandate RealTime processing
...but many application types may mandate either or both
Ok great - Let's dig into an engine...
APACHE FLINK
Apache Flink Components
Apache Flink Architecture
Source: DataArtisans (BerlinBuzzwords 2016)
Job Manager UI - (For Job Submission & Monitoring)
Job Manager UI - (Plan and Scheduling)
WAIT! Let's clear a few things up...
Pipelining & Backpressure
Time Semantics (Event, Ingestion, Processing etc.)
Windows (count, rolling, session, custom)
Watermarks, Triggers (Inserted into stream)
Checkpoints (Async Recovery - Choice of state store
backend)
"Exactly Once" semantics (no need to question if fail on
send, process, return?)
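To make event time, windows and watermarks concrete, here is a minimal sketch of event-time tumbling windows in plain Python. It is not the Flink API. The function name, the integer timestamps and the simple "largest event time seen minus a fixed allowance" watermark are all assumptions of this illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, max_out_of_orderness):
    """Count events per key in event-time tumbling windows.

    events: iterable of (event_time, key) in ARRIVAL order (may be out of order).
    Returns (results, late): results is a list of (window_start, key, count)
    in firing order; late holds events whose window had already fired.
    """
    open_windows = defaultdict(lambda: defaultdict(int))  # window_start -> key -> count
    results, late = [], []
    watermark = float("-inf")

    def fire_ready():
        # A window fires once the watermark has passed its end.
        ready = sorted(w for w in open_windows if w + window_size <= watermark)
        for start in ready:
            for key, count in sorted(open_windows.pop(start).items()):
                results.append((start, key, count))

    for event_time, key in events:
        start = event_time - (event_time % window_size)
        if start + window_size <= watermark:
            late.append((event_time, key))
            continue
        open_windows[start][key] += 1
        # The watermark trails the largest event time seen by a fixed allowance.
        watermark = max(watermark, event_time - max_out_of_orderness)
        fire_ready()

    # End of a bounded input: advance the watermark to flush what is left.
    watermark = float("inf")
    fire_ready()
    return results, late


arrivals = [(1, "a"), (4, "a"), (12, "a"), (3, "a"), (25, "a"), (2, "a")]
results, late = tumbling_window_counts(arrivals, window_size=10, max_out_of_orderness=5)
```

The event at time 3 arrives after the event at time 12 but is still counted in the first window, because the watermark has not yet passed that window's end. The event at time 2 arrives after the window has fired and is reported as late.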
Apache Flink - Features out of the box!
Support for Event Time and Out-of-Order Events
Exactly-once Semantics for Stateful Computations
Highly flexible Streaming Windows & CEP
Continuous Streaming Model with Backpressure (Buffers)
Fault-tolerance via Lightweight Distributed Snapshots
One Runtime for Streaming and Batch Processing
Memory Management & Custom Serialization
Iterations and Delta Iterations
Program Optimizer
SQL (Batch and Streams) due soon in 1.1
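The link between snapshots and exactly-once state can be sketched as follows. This is a deliberately simplified, single-process illustration in plain Python. It is not Flink's distributed snapshot mechanism, and the class `CountingJob` is hypothetical. The key idea it shows is that the source position and the operator state are saved together, so that recovery rewinds both.

```python
import copy


class CountingJob:
    """Counts keys from a replayable source and checkpoints offset + state together."""

    def __init__(self, source):
        self.source = source          # replayable input, e.g. a persisted log
        self.offset = 0               # next position to read
        self.counts = {}              # operator state
        self.checkpoint = (0, {})     # last consistent (offset, state) pair

    def step(self):
        key = self.source[self.offset]
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def take_checkpoint(self):
        # Offset and state are captured as one unit.
        self.checkpoint = (self.offset, copy.deepcopy(self.counts))

    def recover(self):
        # After a failure, rewind BOTH the read position and the state.
        self.offset, saved = self.checkpoint
        self.counts = copy.deepcopy(saved)

    def run_to_end(self):
        while self.offset < len(self.source):
            self.step()
        return self.counts


source = ["a", "b", "a", "c", "a"]
job = CountingJob(source)
job.step()
job.step()
job.take_checkpoint()
job.step()
job.step()          # progress made after the checkpoint...
job.recover()       # ...is discarded by a simulated failure
final_counts = job.run_to_end()
```

Events after the checkpoint are processed twice, yet each one is reflected in the final state exactly once, because the state they had touched was rolled back with the offset.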
But I'm only here for the Machine Learning and Graph
Processing!!...
Machine Learning in Flink with FlinkML
* Apache Samoa Project - Streaming Machine Learning that works on top of Flink
** Apache Mahout - Batch based Machine Learning that works on top of Flink
Graph Processing in Flink?
"Gelly" is Apache Flink's Graph Analysis API
Iterative Graph processing abstractions on top of Flink
1. Vertex-Centric Iterations (like Pregel, Giraph)
2. Scatter-Gather Iterations
3. Gather-Sum-Apply (like PowerGraph)
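As an illustration of the vertex-centric model, the sketch below runs single source shortest paths in supersteps: each vertex reads its incoming messages, updates its value if it improved, and sends messages along its out-edges. It is plain Python rather than the Gelly API. The function name and the adjacency-dict input format are assumptions, and it assumes non-negative edge weights.

```python
def vertex_centric_sssp(edges, source):
    """Pregel-style single source shortest paths.

    edges: dict mapping vertex -> list of (neighbour, weight), weights >= 0.
    Returns a dict of shortest distances (inf for unreachable vertices).
    """
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(target for target, _ in targets)

    dist = {v: float("inf") for v in vertices}
    inbox = {source: [0.0]}

    while inbox:  # one loop iteration corresponds to one superstep
        outbox = {}
        for vertex, messages in inbox.items():
            candidate = min(messages)
            if candidate < dist[vertex]:
                dist[vertex] = candidate
                # Only vertices whose value changed send messages onward.
                for target, weight in edges.get(vertex, []):
                    outbox.setdefault(target, []).append(candidate + weight)
        inbox = outbox
    return dist


graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2)],
    "c": [("d", 1)],
    "e": [],
}
distances = vertex_centric_sssp(graph, "a")
```

The computation stops when a superstep produces no messages, which is the usual termination condition in this model.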
GELLY SUPPORTS
1. Graph Properties (numberOfVertices etc...)
2. Transformations (map, difference, join...)
3. Mutations (Add/Remove vertices/edges...)
4. Batch and Streams - Java, Scala
* External "Gradoop" Project adds further features on top of Flink
Graph Processing with Gelly - Algorithms
PageRank
Single Source Shortest Path
Label Propagation
Weakly Connected Components
Community Detection
Planned Algorithms
Triangle Count
HITS
Affinity Propagation
Graph Summarization
Planned Algorithms - Attribution: Vasia Kalavri
Ecosystem Integration
Data Source/Sinks via Connectors (Kafka, JDBC, S3, etc)
Storm and Cascading & MapReduce support
Machine Learning - Apache Samoa (Streaming ML),
Apache Mahout (Batch)
Graph - Gradoop
Python API, Scala REPL, Apache Zeppelin Support
DataFlow Model - Apache Beam (API Abstraction + Flink
"Runner")
Apache Beam - Data Flow Model Support in Flink
Supported Distributions / Deployment Options
HortonWorks - Ambari Service (Confirmed full support on
the way)
Cloudera - Not Supported to my knowledge (Discussion
forums ref BigTop)
MapR - Not part of their MapR converged data platform
Amazon EMR (Yarn - Single Instance, Session)
Google Compute Engine (Yarn Support & Hosted
Competitor -> Cloud Dataflow)
Via Apache Myriad on Mesos (Native support coming in
1.2)
Some DataStream API Code (Setup)
* Code courtesy of DataArtisans on GitHub
Some DataStream Code (Destination Sink & Running)
Sometimes, crossing the streams is the solution you need...
Crossing the streams with DataStream API
Crossing the streams with CEP Library
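As a language-neutral illustration of what "crossing the streams" means, here is a minimal sketch of joining two keyed streams when their events fall into the same tumbling window. It is plain Python, not the Flink DataStream or CEP API. The function name, the tuple layout and the taxi sample data are made up for the example.

```python
def window_join(left, right, window_size):
    """Join two streams on (tumbling window, key).

    left, right: lists of (event_time, key, value) with integer event times.
    Returns a sorted list of (window_start, key, left_value, right_value).
    """
    def bucket(events):
        buckets = {}
        for event_time, key, value in events:
            window_start = event_time - (event_time % window_size)
            buckets.setdefault((window_start, key), []).append(value)
        return buckets

    left_buckets, right_buckets = bucket(left), bucket(right)
    joined = []
    # Only (window, key) pairs present on both sides produce output.
    for window_key in sorted(left_buckets.keys() & right_buckets.keys()):
        window_start, key = window_key
        for left_value in left_buckets[window_key]:
            for right_value in right_buckets[window_key]:
                joined.append((window_start, key, left_value, right_value))
    return joined


rides = [(1, "taxi-1", "start"), (12, "taxi-2", "start")]
fares = [(3, "taxi-1", 9.5), (25, "taxi-2", 4.0)]
matched = window_join(rides, fares, window_size=10)
```

Here `taxi-1` matches because both events fall in the window starting at 0, while `taxi-2` does not because its ride and fare land in different windows.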
Proposed Flink 1.1 SQL API
* Code courtesy of DataArtisans on GitHub
Flink Furthering Yahoo Benchmarks
Apache Flink Adoption
What's Next For Flink?
Queryable State (Database inversion! Kafka log, RocksDB)
Release of 1.1+
Dynamic Scaling, Resource Elasticity (i.e. for catchup)
Production Hardening (1,000 node cluster Alibaba)
Stream SQL (Apache Calcite)
CEP Enhancements (large-sized async state snapshotting)
Mesos Support
More Connectors
API enhancements (joins, slowly changing inputs)
Security (data encryption, Kerberos with Kafka)
Email: john.gorman@amberhand.ie
LinkedIn: johnpgorman
THANK YOU
ACKNOWLEDGEMENTS
Bank Of Ireland - Event and Venue
Hadoop User Group Ireland - Community Building
Data Artisans - Images, Code and Community Support
Anne Ebeling - Dublin Artwork
RESOURCES
APACHE FLINK TRAINING
APACHE FLINK TAXI STREAM EXAMPLE
BACK PRESSURE IN FLINK
CEP MONITORING: CEP SAMPLE
RUNNING FLINK ON YARN
STREAMING 101 BY TYLER AKIDAU
STREAMING 102 BY TYLER AKIDAU
MAPR FREE EBOOK ON STREAMING ARCHITECTURE

What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Último (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Don't Cross The Streams - Data Streaming And Apache Flink

  • 1. DON'T CROSS THE STREAMS! STREAMING AND APACHE FLINK
  • 3. WHAT WE WILL COVER: It's all about pain! Streaming and Related Terminology; Stream Processing Engines; Apache Flink
  • 4.
  • 5. It started with a pain...a software pain
  • 6. Things were big, slow & shaky....and getting worse!
  • 7. The calm before the storm: Batch Processing (High Latency, inability to reason about time); Coupled systems prevented fast delivery of single change requirements; Processing large distributed data; Messaging incorporated business logic (Service Bus); Customers demanded immediate insight/action; Event Ordering/Timing, Consistency, Data Lineage; Lack of Fault Tolerant Systems
  • 8. Someone noticed the need to change some time back...
  • 9.
  • 10. Oh! The other Michael Hammer...
  • 11. Ref: Michael Hammer - Harvard Business Review 1990 “We cannot achieve breakthroughs in performance by cutting fat or automating existing processes. Rather, we must challenge old assumptions and shed the old rules that made the business underperform in the first place.”
  • 12. Ref: Michael Hammer - Harvard Business Review 1990 “These rules of work design are based on assumptions about technology, people, and organisational goals that no longer hold”
  • 13. So...Software Legends set out to fix it...
  • 15. Elements of the "Perfect Storm"
  • 16. Elements of the "Perfect Storm" contd.
  • 19. WHAT ARE STREAMS? Unbounded Events flowing from a Producer to a Consumer. Any event that happens internal or external to your company is fair game for inclusion in a stream!
  • 20. Streaming obliterates old working habits rather than automating them
  • 21. When did you last drop a DVD back to your video store? Convenience of streaming films won out
  • 22. Anyone using Dublin Bus still carry a timetable? Realtime with Context is needed...
  • 23. SOME OTHER COMMON STREAM EXAMPLES: Log files; User website clicks; Finance stocks; Social media streams
  • 24. Ideal Stream Characteristics: Low Latency (Time required to produce some result); High Throughput (Number of results produced per unit of time); Persisted for reuse; Fault Tolerant; Scalable Event Production (i.e. Partitioning); Scalable Event Consumption (i.e. Consumer Groups); Consumer manages state (offsets); Handle Back Pressure
  • 25. Benefits of streams: Ability to augment and enrich data streams; Duality of Streams and Tables (Only Streams Work); Replay from a defined offset; Stream outputs can become stream inputs (Unix pipes!); Data first - Processing Later (Fast feature creation); Stream your monitoring (Logs, Ops Metrics, Business KPIs etc.)
  • 26. Benefits of streams contd.: Location in Time Testing (Bugs In Code); Replication for Scale; Cross/Join previously unrelated sources (i.e. Time, Context - Analytics); Point of Record Stream (produce suitable Materialized Views)
  • 27. MOST POPULAR STREAMING TOOLS: Apache Kafka; Amazon Kinesis - Based on Kafka Ideas; MapR Streams - Uses Kafka API (adds resilience features)
  • 28. Can these Streams handle the load?
  • 29. Apache Kafka Data Handling at LinkedIn (Source: LinkedIn Engineering Blog, March 20, 2015)
  • 30. We have the stream! Now what?
  • 31. Enter the Stream Processing Engine
  • 32. What is a Stream Processing Engine?
  • 33. 8 Requirements of a Real-Time Stream Processing Engine (Michael Stonebraker): 1. Keep the data moving; 2. Query using SQL on Stream; 3. Handle Stream Imperfections (Delayed, Missing, Out-of-Order Data); 4. Generate Predictable Outcomes; 5. Integrate Stored and Streaming Data; 6. Guarantee Data Safety and Availability; 7. Partition and Scale Applications Automatically; 8. Process and Respond Instantaneously
  • 34. OK - Engines on... What can we do with it?
  • 35. Stream Processing Engine - Use Cases: Lineage, Auditing, History (Immutable); Internet of Things (Sensor data); Realtime Monitoring (Failure Prevention); Autonomous Cars; Fraud/Anomaly Detection; Health devices (Fitbit, cardiac pacemakers etc.); For System of record (Infinite persistence); Digital Marketing; Network monitoring; Realtime pricing / analytics
  • 36. Stream Processing Engine - Use Cases Contd.: Intelligence and Surveillance; Risk management (Realtime Asset Coverage); E-commerce (Realtime customer retention); Fraud detection (Card, Insurance); Smart order routing; Transaction cost analysis; Pricing and analytics; Market data management; Algorithmic trading; Data warehouse augmentation
  • 37. Streaming does not mandate Big Data. Streaming does not mandate real-time processing. ...but many application types may mandate either or both
  • 38. Ok great - Let's dig into an engine...
  • 41. Apache Flink Architecture Source: DataArtisans (BerlinBuzzwords 2016)
  • 42. Job Manager UI - (For Job Submission & Monitoring)
  • 43. Job Manager UI - (Plan and Scheduling)
  • 44. WAIT! Let's clear a few things up... Pipelining & Backpressure; Time Semantics (Event, Ingestion, Processing etc.); Windows (count, rolling, session, custom); Watermarks, Triggers (Inserted into stream); Checkpoints (Async Recovery - Choice of state store backend); "Exactly Once" semantics (no need to question if fail on send, process, return?)
  • 45. Apache Flink - Features out of the box! Support for Event Time and Out-of-Order Events; Exactly-once Semantics for Stateful Computations; Highly flexible Streaming Windows & CEP; Continuous Streaming Model with Backpressure (Buffers); Fault-tolerance via Lightweight Distributed Snapshots; One Runtime for Streaming and Batch Processing; Memory Management & Custom Serialization; Iterations and Delta Iterations; Program Optimizer; SQL (Batch and Streams) due soon in 1.1
  • 46. But I'm only here for the Machine Learning and Graph Processing!!...
  • 47. Machine Learning in Flink with FlinkML * Apache Samoa Project - Streaming Machine Learning that works on top of Flink ** Apache Mahout - Batch based Machine Learning that works on top of Flink
  • 49. "Gelly" is Apache Flink's Graph Analysis API: Iterative Graph processing abstractions on top of Flink. 1. Vertex-Centric Iterations (like Pregel, Giraph); 2. Scatter-Gather Iterations; 3. Gather-Sum-Apply (like PowerGraph)
  • 50. GELLY SUPPORTS: 1. Graph Properties (numberOfVertices etc.); 2. Transformations (map, difference, join...); 3. Mutations (Add/Remove vertices/edges...); 4. Batch and Streams - Java, Scala. * External "Gradoop" Project adds further features on top of Flink
  • 51. Graph Processing with Gelly - Algorithms: PageRank; Single Source Shortest Path; Label Propagation; Weakly Connected Components; Community Detection
  • 52. Planned Algorithms: Triangle Count; HITS; Affinity Propagation; Graph Summarization (Planned Algorithms - Attribution: Vasia Kalavri)
  • 53. Ecosystem Integration: Data Source/Sinks via Connectors (Kafka, JDBC, S3, etc.); Storm and Cascading & MapReduce support; Machine Learning - Apache Samoa (Streaming ML), Apache Mahout (Batch); Graph - Gradoop; Python API, Scala REPL, Apache Zeppelin Support; DataFlow Model - Apache Beam (API Abstraction + Flink "Runner")
  • 54. Apache Beam - Data Flow Model Support in Flink
  • 55. Supported Distributions / Deployment Options: HortonWorks - Ambari Service (Confirmed full support on the way); Cloudera - Not Supported to my knowledge (Discussion forums ref BigTop); MapR - Not part of their MapR converged data platform; Amazon EMR (YARN - Single Instance, Session); Google Compute Engine (YARN Support & Hosted Competitor -> Cloud Dataflow); Via Apache Myriad on Mesos (Native support coming in 1.2)
  • 56. Some DataStream API Code (Setup) * Code courtesy of DataArtisans on GitHub
  • 57. Some DataStream Code (Destination Sink & Running)
  • 58. Sometimes, crossing the streams is the solution you need...
  • 59. Crossing the streams with DataStream API
  • 60. Crossing the streams with CEP Library
  • 61. Proposed Flink 1.1 SQL API * Code courtesy of DataArtisans on GitHub
  • 64. What's Next For Flink? Queryable State (Database inversion! Kafka log, RocksDB); Release of 1.1+; Dynamic Scaling, Resource Elasticity (i.e. for catchup); Production Hardening (1,000 node cluster Alibaba); Stream SQL (Apache Calcite); CEP Enhancements (large sized async state snapshotting); Mesos Support; More Connectors; API enhancements (joins, slowly changing inputs); Security (data encryption, Kerberos with Kafka)
  • 65. THANK YOU. Email: john.gorman@amberhand.ie; LinkedIn: johnpgorman. ACKNOWLEDGEMENTS: Bank Of Ireland - Event and Venue; Hadoop User Group Ireland - Community Building; Data Artisans - Images, Code and Community Support; Anne Ebeling - Dublin Artwork
  • 66. RESOURCES APACHE FLINK APACHE FLINK IN FLINK CEP MONITORING RUNNING FLINK ON BY TYLER AKIDAU BY TYLER AKIDAU MAPR FREE EBOOK ON TRAINING TAXI STREAM EXAMPLE BACK PRESSURE CEP SAMPLE YARN STREAMING 101 STREAMING 102 STREAMING ARCHITECTURE
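
Slide 44 names event time, windows and watermarks without showing how they interact. The following is a minimal plain-Python sketch of that interaction and is not Flink's API. The function name tumbling_event_time_counts and the parameters window_size and max_out_of_orderness are illustrative assumptions, and the watermark rule used here (largest event time seen so far minus a fixed allowance) is only one simple heuristic. A window is emitted once the watermark reaches its end; an event that arrives after its window has been emitted is reported as late instead of being counted.

```python
from collections import defaultdict


def tumbling_event_time_counts(events, window_size, max_out_of_orderness):
    """Count events per key in tumbling event-time windows.

    events: iterable of (event_time, key) tuples in ARRIVAL order,
            which may differ from event-time order.
    Returns (results, late):
      results: list of (window_start, key, count) in firing order
      late:    list of (event_time, key) that arrived after their
               window had already fired
    """
    open_windows = defaultdict(lambda: defaultdict(int))  # start -> key -> count
    results, late = [], []
    watermark = float("-inf")  # "no event older than this is expected"

    for event_time, key in events:
        window_start = event_time - (event_time % window_size)

        if window_start + window_size <= watermark:
            # The window this event belongs to has already been emitted.
            late.append((event_time, key))
        else:
            open_windows[window_start][key] += 1

        # Assumed heuristic: the watermark trails the largest event time seen.
        watermark = max(watermark, event_time - max_out_of_orderness)

        # Fire every open window whose end is at or before the watermark.
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for start in ready:
            for k, count in sorted(open_windows.pop(start).items()):
                results.append((start, k, count))

    # End of a bounded input: flush whatever is still open.
    for start in sorted(open_windows):
        for k, count in sorted(open_windows[start].items()):
            results.append((start, k, count))

    return results, late
```

For example, with 10-unit windows and an allowance of 5, the arrival sequence (1, "a"), (4, "a"), (12, "b"), (3, "a"), (15, "a"), (27, "b"), (8, "a") still counts the out-of-order event at time 3 in window 0, because the watermark has only reached 7 when it arrives. The event at time 8 arrives after the watermark has passed 10 and is returned in the late list. A real engine would also have to decide what to do with such late events (drop, side output, or re-fire), which this sketch leaves out.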