SlideShare uma empresa Scribd logo
1 de 39
Kostas Tzoumas
@kostas_tzoumas
Hadoop Summit San Jose
June 6, 2016
Streaming in the Wild with
Apache FlinkTM
2
Streaming technology is enabling the
obvious: continuous processing on data that
is continuously produced
Hint: you are already doing streaming
Why embrace streaming?
 Monitor your business and react in real time
 Implement robust continuous applications
 Adopt a decentralized architecture
 Consolidate analytics infrastructure
3
React in real time
4
Streaming versus real-time
 Streaming != Real-time
 E.g., streaming that is not real time:
continuous applications with large
windows
 E.g., real-time that is not streaming: very
fast data warehousing queries
 However: streaming applications can be
fast
5
Streaming
Real time
How real-time is Flink?
6
Yahoo! benchmark* data Artisans benchmarks**
* https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
** http://data-artisans.com/extending-the-yahoo-streaming-benchmark/ and http://data-artisans.com/high-throughput-
low-latency-and-exactly-once-stream-processing-with-apache-flink/
When and why does this matter?
 Immediate reaction to life
• E.g., generate alerts on
anomaly/pattern/special event
 Avoid unnecessary tradeoffs
• Even if application is not latency-critical
• With Flink you do not pay a price for latency!
7
Bouygues Telecom – LUX
8
One of the largest telcos in
France. System (among
others) used for real time
diagnostics and alarming.
Read more: http://data-
artisans.com/flink-at-
bouygues-html/
Robust continuous
applications
9
Continuous application
 A production data application that needs to
be live 24/7 feeding other systems (perhaps
customer-facing)
 Need to be efficient, consistent, correct, and
manageable
 Stream processing is a great way to
implement continuous applications robustly
10
Continuous apps with “batch”
11
file 1
file 2
Job 1
Job 2
time
file 3 Job 3
Scheduler
Serve&store
Continuous apps with “lambda”
12
file 1
file 2
Job 1
Job 2
Scheduler
Streaming job
Serve&
store
Problems with batch and λ
 Way too many moving parts (and code dup)
 Implicit treatment of time
 Out of order event handling
 Implicit batch boundaries
13
Continuous apps with streaming
14
Streaming job
Serve&
store
Extending the Yahoo! benchmark
 Work of Jamie Grier, inspired by a real continuous
application at Twitter
15
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
What is the use case?
 Counting!
• Tweet impressions or ad views
 Most analytics is continuous counting and
aggregations grouped by dimensions
• E.g., anomaly detection
16
Requirements
 Performance: millions of events/sec, millions of
keys
 Correctness: counts correlated with timestamps
 Consistency: counts should be correct under
failures
 Manageability: ability to pause & restart,
reprocess, change code, etc
17
Before Flink
 Performance: 1000s of cores needed to sustain
workload
 Correctness: time handled in application code (or
not)
 Consistency: approximate results during the day,
exact results once a day (lambda)
 Manageability: acceptable
18
After Flink
 Performance: 10s of cores needed to sustain
workload
 Correctness: time handled by framework
 Consistency: correct results on demand
 Manageability: acceptable
19
Results (yet to be beaten!)
 Same program as Yahoo! benchmark
 30x over Storm, plus consistent results
20
Manageability
 Flink savepoints (Flink 1.0): consistent
snapshots of stateful applications
• Planned downtime for code upgrades,
maintenance, migration, debugging, etc
 Monitoring (Flink 1.1)
 Dynamic scaling (Flink 1.2+)
21
Decentralized architecture
22
Streaming and microservices
23
App App
App
local statelocal state
Archive
A decentralized architecture favors
a streaming-based data
infrastructure with local application
state
Zalando
24
Slides at http://www.slideshare.net/ZalandoTech/flink-in-zalandos-world-of-microservices-62376341
Zalando
25
Transitioning from monolithic
architecture to microservices
New BI stack
26
Flink @ Zalando (present & future)
 Business process monitoring
• Check if Zalando platform works
• Order & delivery velocities
• SLAs of related events
 Continuous ETL
• Transformation, combination, pre-aggregation
• Data cleansing and validation
 Complex Event Processing
 Sales monitoring
27
Consolidate analytics
28
Stream Processing as a Service
 How do we make stream processing more
accessible to the data analyst?
 More familiar interfaces
• Flink 1.1 includes the first version of SQL for
static data sets and data streams
 Easier deployment
29
King.com
30
King.com - RBEA
 RBEA – a platform
designed to make
stream processing
available inside
King.com
 Data scientists submit
scripts in Groovy
 Flink backend executes
these scripts
31
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
Netflix
 Netflix plans to offer
Stream Processing as a
Service internally in the
company
 Currently testing Flink
and Apache Beam
32
http://www.slideshare.net/mdaxini/netflix-keystone-streaming-data-pipeline-scale-in-the-clouddbtb2016-62076009
Closing
33
Disclaimer
 A lot of this presentation is based on the work of very
talented engineers building data products with Flink
 Bouygues Telecom: Amine Abdessemed, ...
 Zalando: Mihail Vieru, Javier Lopez
 King.com: Gyula Fora, Mattias Andersson, ...
 Netflix: Monal Daxini, ...
34
More Flink tales at Hadoop Summit
35
Xiaowei Jiang
Blink−Improved Runtime for Flink and its
Application in Alibaba Search
Wednesday, June 29, 2016, 2:10PM - 2:50PM
210C
Stephan Ewen
Turning the Stream Processor into a Database:
Building Online Applications on Streams
Thursday, June 30, 2016, 12:20PM - 1:00PM
212
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016 (watch website)
Early bird deadline: July 15, 2016
www.flink-forward.org
We are hiring!
data-artisans.com/careers
Appendix
Batch < Streaming
 In principle, batch is a special
case of streaming (global
window)
 In practice, batch processors
can be more efficient than
stream processors in batch
 Flink is a very efficient batch
processor (DataSet code path)
39

Mais conteúdo relacionado

Mais procurados

Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightDataWorks Summit
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackDataWorks Summit/Hadoop Summit
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieDataWorks Summit/Hadoop Summit
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesDataWorks Summit
 
High Speed Continuous & Reliable Data Ingest into Hadoop
High Speed Continuous & Reliable Data Ingest into HadoopHigh Speed Continuous & Reliable Data Ingest into Hadoop
High Speed Continuous & Reliable Data Ingest into HadoopDataWorks Summit
 

Mais procurados (20)

From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Running Spark in Production
Running Spark in ProductionRunning Spark in Production
Running Spark in Production
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stack
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Building and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache OozieBuilding and managing complex dependencies pipeline using Apache Oozie
Building and managing complex dependencies pipeline using Apache Oozie
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
High Speed Continuous & Reliable Data Ingest into Hadoop
High Speed Continuous & Reliable Data Ingest into HadoopHigh Speed Continuous & Reliable Data Ingest into Hadoop
High Speed Continuous & Reliable Data Ingest into Hadoop
 
Cooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython NotebookCooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython Notebook
 

Destaque

Apache Storm
Apache StormApache Storm
Apache StormEdureka!
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkdatamantra
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 

Destaque (6)

Apache Storm
Apache StormApache Storm
Apache Storm
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 

Semelhante a Streaming in the Wild with Apache Flink

Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkKostas Tzoumas
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextVerverica
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Impetus Technologies
 
Parallel Processing in TM1 - QueBIT Consulting
Parallel Processing in TM1 - QueBIT ConsultingParallel Processing in TM1 - QueBIT Consulting
Parallel Processing in TM1 - QueBIT ConsultingQueBIT Consulting
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineTrieu Nguyen
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in StreamsJamie Grier
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Partner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - AprilPartner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - Aprilconfluent
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaFlink Forward
 
Spring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonSpring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonVMware Tanzu
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Flink Forward
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Ververica
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLSingleStore
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraAttunity
 

Semelhante a Streaming in the Wild with Apache Flink (20)

Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
 
Parallel Processing in TM1 - QueBIT Consulting
Parallel Processing in TM1 - QueBIT ConsultingParallel Processing in TM1 - QueBIT Consulting
Parallel Processing in TM1 - QueBIT Consulting
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data Pipeline
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Partner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - AprilPartner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - April
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
 
Spring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonSpring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - Boston
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming Era
 

Mais de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Mais de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Último

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Último (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Streaming in the Wild with Apache Flink

  • 1. Kostas Tzoumas @kostas_tzoumas Hadoop Summit San Jose June 6, 2016 Streaming in the Wild with Apache FlinkTM
  • 2. 2 Streaming technology is enabling the obvious: continuous processing on data that is continuously produced Hint: you are already doing streaming
  • 3. Why embrace streaming?  Monitor your business and react in real time  Implement robust continuous applications  Adopt a decentralized architecture  Consolidate analytics infrastructure 3
  • 4. React in real time 4
  • 5. Streaming versus real-time  Streaming != Real-time  E.g., streaming that is not real time: continuous applications with large windows  E.g., real-time that is not streaming: very fast data warehousing queries  However: streaming applications can be fast 5 Streaming Real time
  • 6. How real-time is Flink? 6 Yahoo! benchmark* data Artisans benchmarks** * https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at ** http://data-artisans.com/extending-the-yahoo-streaming-benchmark/ and http://data-artisans.com/high-throughput- low-latency-and-exactly-once-stream-processing-with-apache-flink/
  • 7. When and why does this matter?  Immediate reaction to life • E.g., generate alerts on anomaly/pattern/special event  Avoid unnecessary tradeoffs • Even if application is not latency-critical • With Flink you do not pay a price for latency! 7
  • 8. Bouygues Telecom – LUX 8 One of the largest telcos in France. System (among others) used for real time diagnostics and alarming. Read more: http://data- artisans.com/flink-at- bouygues-html/
  • 10. Continuous application  A production data application that needs to be live 24/7 feeding other systems (perhaps customer-facing)  Need to be efficient, consistent, correct, and manageable  Stream processing is a great way to implement continuous applications robustly 10
  • 11. Continuous apps with “batch” 11 file 1 file 2 Job 1 Job 2 time file 3 Job 3 Scheduler Serve&store
  • 12. Continuous apps with “lambda” 12 file 1 file 2 Job 1 Job 2 Scheduler Streaming job Serve& store
  • 13. Problems with batch and λ  Way too many moving parts (and code dup)  Implicit treatment of time  Out of order event handling  Implicit batch boundaries 13
  • 14. Continuous apps with streaming 14 Streaming job Serve& store
  • 15. Extending the Yahoo! benchmark  Work of Jamie Grier, inspired by a real continuous application at Twitter 15 http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
  • 16. What is the use case?  Counting! • Tweet impressions or ad views  Most analytics is continuous counting and aggregations grouped by dimensions • E.g., anomaly detection 16
  • 17. Requirements  Performance: millions of events/sec, millions of keys  Correctness: counts correlated with timestamps  Consistency: counts should be correct under failures  Manageability: ability to pause & restart, reprocess, change code, etc 17
  • 18. Before Flink  Performance: 1000s of cores needed to sustain workload  Correctness: time handled in application code (or not)  Consistency: approximate results during the day, exact results once a day (lambda)  Manageability: acceptable 18
  • 19. After Flink  Performance: 10s of cores needed to sustain workload  Correctness: time handled by framework  Consistency: correct results on demand  Manageability: acceptable 19
  • 20. Results (yet to be beaten!)  Same program as Yahoo! benchmark  30x over Storm, plus consistent results 20
  • 21. Manageability  Flink savepoints (Flink 1.0): consistent snapshots of stateful applications • Planned downtime for code upgrades, maintenance, migration, debugging, etc  Monitoring (Flink 1.1)  Dynamic scaling (Flink 1.2+) 21
  • 23. Streaming and microservices 23 App App App local statelocal state Archive A decentralized architecture favors a streaming-based data infrastructure with local application state
  • 27. Flink @ Zalando (present & future)  Business process monitoring • Check if Zalando platform works • Order & delivery velocities • SLAs of related events  Continuous ETL • Transformation, combination, pre-aggregation • Data cleansing and validation  Complex Event Processing  Sales monitoring 27
  • 29. Stream Processing as a Service  How do we make stream processing more accessible to the data analyst?  More familiar interfaces • Flink 1.1 includes the first version of SQL for static data sets and data streams  Easier deployment 29
  • 31. King.com - RBEA  RBEA – a platform designed to make stream processing available inside King.com  Data scientists submit scripts in Groovy  Flink backend executes these scripts 31 https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  • 32. Netflix  Netflix plans to offer Stream Processing as a Service internally in the company  Currently testing Flink and Apache Beam 32 http://www.slideshare.net/mdaxini/netflix-keystone-streaming-data-pipeline-scale-in-the-clouddbtb2016-62076009
  • 34. Disclaimer  A lot of this presentation is based on the work of very talented engineers building data products with Flink  Bouygues Telecom: Amine Abdessemed, ...  Zalando: Mihail Vieru, Javier Lopez  King.com: Gyula Fora, Mattias Andersson, ...  Netflix: Monal Daxini, ... 34
  • 35. More Flink tales at Hadoop Summit 35 Xiaowei Jiang Blink−Improved Runtime for Flink and its Application in Alibaba Search Wednesday, June 29, 2016, 2:10PM - 2:50PM 210C Stephan Ewen Turning the Stream Processor into a Database: Building Online Applications on Streams Thursday, June 30, 2016, 12:20PM - 1:00PM 212
  • 36. Flink Forward 2016, Berlin Submission deadline: June 30, 2016 (watch website) Early bird deadline: July 15, 2016 www.flink-forward.org
  • 39. Batch < Streaming  In principle, batch is a special case of streaming (global window)  In practice, batch processors can be more efficient than stream processors in batch  Flink is a very efficient batch processor (DataSet code path) 39

Notas do Editor

  1. 3 systems (batch), or 5 systems (streaming), Need to add a new system for millisecond alerts What If I want to count every 5 minutes, not 1 hour? Just ignores out of order What if I wanna do sessions?