Blink
Improvements to Flink &
Its Applications in Alibaba Search
Xiaowei Jiang, Feng Wang
{xiaowei.jxw, jason.wang}@alibaba-inc.com
Who Are We?
• Xiaowei Jiang
  - 2014 to present: Alibaba
  - 2010 to 2014: Facebook
  - 2002 to 2010: Microsoft
  - 2000 to 2002: Stratify
• Feng Wang
  - 2006 to present: Alibaba
About Alibaba
• Alibaba Group
  - Operates the world's largest online marketplace
  - Annual GMV of $394 billion in 2015
• Alibaba Search
  - Personalized search and recommendation platform
  - Major driver of online traffic
Agenda
• Background
• What is Blink?
• Improvements in Blink
• Challenges & Future
Scenario – Realtime A/B Test
[Diagram: three log sources (Transaction, Click, Impression), three Parser and Filter stage pairs, and Join, Agg, and UDF operators, with Druid as the output.]
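The A/B test scenario can be approximated with a minimal sketch. The slides give only the stage names (Parser, Filter, Join, Agg); the log format, the field names, and the click-through-rate metric below are assumptions made for illustration, and the sketch runs over a finite list rather than an unbounded stream.

```python
from collections import Counter


def parse(line):
    """Parser stage: split a hypothetical 'event,user,bucket' log line."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None  # malformed line
    event, user, bucket = parts
    return {"event": event, "user": user, "bucket": bucket}


def keep(record):
    """Filter stage: drop malformed lines and unknown event types."""
    return record is not None and record["event"] in ("impression", "click")


def aggregate(records):
    """Agg stage: count events per (bucket, event type)."""
    counts = Counter()
    for r in records:
        counts[(r["bucket"], r["event"])] += 1
    return counts


def click_through_rate(lines):
    """Join stage: combine click and impression counts per A/B bucket."""
    counts = aggregate(r for r in map(parse, lines) if keep(r))
    buckets = {bucket for bucket, _ in counts}
    return {
        b: counts[(b, "click")] / counts[(b, "impression")]
        for b in sorted(buckets)
        if counts[(b, "impression")] > 0
    }
```

Calling `click_through_rate` on a list of log lines returns one ratio per bucket; a streaming engine would instead keep these counts as state and update them per event.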
Scenario – Search Index Build & Update
[Diagram labels: DataSource, Filter, Sync, HBase, IC, UIC, Join, Search Engine, Export, Result, IC1, IC2, UIC1, UIC2.]
Flink: Unified Compute Engine
• Streaming Topologies → low latency
• Long Batch Pipelines → resource utilization
• Machine Learning at Scale → iterative algorithms
• Graph Analysis → mutable state
Flink Stack
What is Blink?
• Blink – Improvements to Flink from Alibaba
  - Comprehensive improvements to the Flink Table API
  - Improved runtime, compatible with the Flink API and ecosystem
• Status
  - Runs on thousands of nodes in Alibaba production
  - Supports mission-critical products
Table API Improvements
• Principle – Unified SQL layer for batch and streaming
• Functionality
  - UDF/UDTF/UDAGG
  - Stream-Stream Join
  - Aggregation (min, max, avg, sum, count, distinct_count)
  - Windowing (time_window, count_window)
  - Retraction
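Retraction is listed above without detail. As a general illustration of the concept (not Blink's implementation), the sketch below shows a running count per key that withdraws its previously emitted result before emitting the updated one, so a downstream consumer never double-counts. The function names and the `("+", key, value)` / `("-", key, value)` message shape are hypothetical.

```python
from collections import defaultdict


def count_with_retraction(events):
    """Yield ('+' or '-', key, count) change messages for a running count."""
    counts = defaultdict(int)
    for key in events:
        old = counts[key]
        if old > 0:
            yield ("-", key, old)  # retract the previously emitted result
        counts[key] = old + 1
        yield ("+", key, old + 1)  # emit the updated result


def apply_changes(changes):
    """Downstream consumer: materialize the change stream into a table."""
    table = {}
    for sign, key, value in changes:
        if sign == "-":
            table.pop(key, None)
        else:
            table[key] = value
    return table
```

For the input `["a", "b", "a"]`, the second `"a"` first retracts the count of 1 and then emits a count of 2, so the materialized table holds one row per key.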
Runtime Improvements
• New runtime architecture on YARN
• Optimized state, checkpoint & failover
• Reliable & production quality
• Much more
Flink on YARN
[Diagram: a Client Node runs the Flink YARN Client. YARN Nodes run the YARN ResourceManager and YARN NodeManagers. One container hosts the Flink JobManager with the YARN AppMaster; two further containers each host a Flink TaskManager. HDFS is also shown.]
1. Store user jar and configuration
2. Register resource and request app master
3. Allocate app master
4. Allocate worker
Note: containers are always bootstrapped with the user jar and config.
Blink on YARN
[Diagram: a Client Node runs the Blink Client. YARN Nodes run the YARN ResourceManager and YARN NodeManagers. Containers host JobMaster and TaskExecutor instances; two JobMasters and four TaskExecutors are shown. HDFS is also shown.]
1. Store user jar and configuration
2. Register resource and request app master
3. Allocate app master
4. Allocate worker (shown once per JobMaster)
Note: containers are always bootstrapped with the user jar and config.
Blink Job Architecture
[Diagram: three YARN nodes, each with a NodeManager; two of them also run a Shuffle Service. One container hosts the Job Master (task scheduler, checkpoint coordinator). Four containers each host a Task Executor with a task, in and out queues, and a RocksDB spilled file. HDFS and ZooKeeper are also shown. Connection labels: control channel, local data channel, network data channel, state backup/recover, completed checkpoint, schedule events.]
Blink Checkpoint & State
[Diagram: a TaskExecutor holds a Task with operator state, an in queue (i1, i2, i3, Bn), an out queue (o1, o2, Bn-1, o3), and local checkpoints Local CPn and Local CPn-1. HDFS holds CPn and CPn-1, which reference State Files (1.sst, 2.sst, ..., n.sst). Other labels: Incremental Backup, OnComplete, async, diff, clean up (twice).]
1. Trigger (Job Master)
2. Hard link snapshot
3. Ack
4. Complete
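The "hard link snapshot" and "diff" labels suggest that a local checkpoint is taken without copying data and that only files new since the previous checkpoint are backed up. The sketch below illustrates that idea under stated assumptions: the directory layout, the function names, and the rule "upload every `.sst` file absent from the previous checkpoint" are hypothetical and are not taken from Blink's code.

```python
import os


def snapshot_with_hard_links(state_dir, snapshot_dir):
    """Snapshot every .sst file by hard-linking it, so no data is copied.

    Returns the set of file names captured in the snapshot.
    """
    os.makedirs(snapshot_dir)
    captured = set()
    for name in sorted(os.listdir(state_dir)):
        if name.endswith(".sst"):
            os.link(os.path.join(state_dir, name),
                    os.path.join(snapshot_dir, name))
            captured.add(name)
    return captured


def files_to_upload(current, previous):
    """Incremental backup: only files absent from the previous checkpoint."""
    return sorted(current - previous)
```

This works because sorted-string-table files are immutable once written, so a file name already present in checkpoint n-1 can be referenced instead of uploaded again.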
Blink Rescale
Blink Failover
[Diagram, At Least Once: four Source tasks; labels: fail, restart, restart, failover.]
[Diagram, Exactly Once: four Source tasks and four Sink tasks; labels: fail, restart, failover.]
Blink Metrics
[Screenshot labels: Job Vertex Number: [CPU, Memory] * Parallelism; In Queue; TPS; Out Queue; Latency; Delay; CPU; Memory; Task Metrics; Running Tasks.]
Challenges & Future
• Continued optimization in streaming
• Batch in production
• Machine learning in production
• Larger cluster scale
• Contribute back to the Flink community
Q & A
Thank You!
Xiaowei Jiang: xiaowei.jxw@alibaba-inc.com
Twitter: @xiaoweij
Feng Wang: jason.wang@alibaba-inc.com
Twitter: @ifengwang
