Flink SQL & Table API in Large Scale
Production at Alibaba
Xiaowei Jiang
Shaoxuan Wang
June, 2017
About Us
Xiaowei Jiang
• 2014-now Alibaba
• 2010-2014 Facebook
• 2002-2010 Microsoft
• 2000-2002 Stratify
Shaoxuan Wang
• 2015-now Alibaba
• 2014-2015 Facebook
• 2010-2014 Broadcom
Outline
1 Background
2 Why SQL & Table API
3 Blink SQL & Table API
4 Blink SQL & Table API in Large Scale Production
Background
Section 1
About Alibaba
Alibaba Group
• Operates the world’s largest e-commerce platform
• Recorded GMV of $485 billion in 2016, and $17.8 billion of GMV in a single day on Nov 11, 2016
Realtime Data Infrastructure
• Supports internal products such as search, recommendation, BI
• Also supports external customers through its cloud service
Blink – Alibaba’s version of Flink
Looked into Flink two years ago
• best choice of unified computing engine
• a few issues in Flink that can be problems for large scale applications
Started “Blink” project
• aimed to make Flink work reliably and efficiently at the very large scale at Alibaba
Made various improvements in Flink runtime
Enhanced Flink SQL & Table API to be production ready
Working with Flink community to contribute back since last August
• several key improvements
• hundreds of patches
Blink Ecosystem in Alibaba
[Figure: layered architecture diagram. Recovered labels:]
• Products: Search, Recommendation, BI, Security, Ads
• Platform: Machine Learning Platform, StreamCompute Platform
• Blink: SQL & Table API, DataStream API, DataSet API, Runtime Engine
• Storage (HDFS/Pangu)
• Cluster Resource Management (Yarn/Fuxi)
Why SQL & Table API
Section 2
Why SQL & Table API
Unified batch and streaming
• Flink currently offers DataSet API for batch and DataStream API for streaming
• We want a single API that can run in both batch and streaming mode
Improved development efficiency
• Users only describe the semantics of data processing
• Leave hard optimization problems to the system
• SQL is proven to be good at describing data processing
• Table API offers seamless integration with Scala and Java
• Table API makes it easy to extend standard SQL when necessary
Stream-Table Duality
[Figure: a changelog stream is applied to a dynamic table]
Stream (changelog):
word   count
Hello  1
World  1
Hello  2
Bark   1
Hello  3
Dynamic Table (after applying the changelog):
word   count
Hello  3
World  1
Bark   1
Dynamic Tables
Apply Changelog Stream to Dynamic Table
• Append Mode: each stream record is an insert to the dynamic table. Hence, all records of a stream are
appended to the dynamic table
• Update Mode: a stream record can represent an insert, update, or delete modification on the dynamic
table (append mode is in fact a special case of update mode)
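The two modes can be illustrated with a minimal Python sketch. This is not Flink or Blink code: the function names `apply_append` and `apply_update` and the operation labels are hypothetical, and the records are the word-count changelog from the Stream-Table Duality slide.

```python
# Changelog records from the word-count example: (word, count).
stream = [("Hello", 1), ("World", 1), ("Hello", 2), ("Bark", 1), ("Hello", 3)]

def apply_append(table, record):
    """Append mode: every stream record becomes a new row."""
    table.append(record)

def apply_update(table, op, key, value=None):
    """Update mode: a record inserts, updates, or deletes the row with this key."""
    if op == "delete":
        table.pop(key, None)
    else:  # "insert" and "update" both write the latest value for the key
        table[key] = value

append_table = []
for record in stream:
    apply_append(append_table, record)

update_table = {}
for word, count in stream:
    op = "update" if word in update_table else "insert"
    apply_update(update_table, op, word, count)

print(append_table)  # five rows, one per stream record
print(update_table)  # one row per word, holding the latest count
```

In append mode the table keeps growing with the stream; in update mode it stays at one row per key, which is the dynamic table shown on the duality slide.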
Derive Changelog Stream from Dynamic Table
• REDO Mode: where the stream records the new value of a modified element to redo lost changes of
completed transactions
• REDO+UNDO Mode: where the stream records the old and the new value of a changed element to undo
incomplete transactions and redo lost changes of completed transactions
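As a rough illustration of the two changelog modes, the sketch below counts words and records the changes made to the resulting table. It is plain Python, not the Blink implementation; the function name `count_words` and the record layout `(kind, key, value)` are assumptions made for this example.

```python
def count_words(words, mode):
    """Maintain a word-count table and derive its changelog.

    mode == "redo":      emit only the new value of each changed row.
    mode == "redo+undo": emit the old value (undo) before the new value (redo).
    """
    table = {}
    changelog = []
    for word in words:
        old = table.get(word)
        new = (old or 0) + 1
        if mode == "redo+undo" and old is not None:
            changelog.append(("undo", word, old))
        changelog.append(("redo", word, new))
        table[word] = new
    return table, changelog

words = ["Hello", "World", "Hello"]
_, redo_log = count_words(words, "redo")
_, redo_undo_log = count_words(words, "redo+undo")
print(redo_log)
print(redo_undo_log)
```

A consumer of the REDO log must know the key of each row to overwrite the old value itself, while a consumer of the REDO+UNDO log is told explicitly which old value to remove.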
Dynamic Tables
There is no such thing as Stream SQL
Stream SQL?
Dynamic Tables generalize the concept of Static Tables
SQL serves as the unified way to describe data processing in both batch and streaming
Blink SQL & Table API
Section 3
Blink SQL & Table API Overview
Simple Query: Select and Where
Stream-Stream Inner Join
User Defined Function (UDF)
User Defined Table Function (UDTF)
User Defined Aggregate Function (UDAGG)
Retraction (stream only)
Aggregate
A Simple Query: Select and Where
Input:
id  name   price  sales  stock
1   Latte  6      1      1000
8   Mocha  8      1      800
4   Breve  5      1      200
7   Tea    4      1      2000
1   Latte  6      2      998
Output:
id  name   price  sales  stock
1   Latte  6      1      1000
1   Latte  6      2      998
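The query text itself did not survive in this transcript. The sketch below shows, in plain Python, a record-at-a-time filter that reproduces the output above; the predicate `name == "Latte"` is an assumption (filtering on `id == 1` would give the same rows), and `select_where` is a hypothetical name.

```python
# Input rows from the slide: (id, name, price, sales, stock).
rows = [
    (1, "Latte", 6, 1, 1000),
    (8, "Mocha", 8, 1, 800),
    (4, "Breve", 5, 1, 200),
    (7, "Tea", 4, 1, 2000),
    (1, "Latte", 6, 2, 998),
]

def select_where(stream, predicate):
    """Stateless filter: each record is kept or dropped on its own."""
    for row in stream:
        if predicate(row):
            yield row

# Assumed predicate; the original query is not recoverable from the slide text.
result = list(select_where(rows, lambda row: row[1] == "Latte"))
print(result)
```

Because the filter keeps no state, the same logic applies unchanged to a bounded table and to an unbounded stream.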
Stream-Stream Inner Join
Left input:
id1  name   stock
1    Latte  1000
8    Mocha  800
4    Breve  200
3    Water  5000
7    Tea    2000
Right input:
id2  price  sales
1    6      1
8    8      1
9    3      1
4    5      1
7    4      1
Output:
id  name   price  sales  stock
1   Latte  6      1      1000
8   Mocha  8      1      800
4   Breve  5      1      200
7   Tea    4      1      2000
This is proposed and discussed in FLINK-5878
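One common way to join two unbounded inputs is a symmetric hash join: each side stores its rows by key and probes the other side's state on arrival. The sketch below is an illustration of that general technique in plain Python, not the design from FLINK-5878; the function name `stream_join` and the interleaved arrival order are assumptions.

```python
from collections import defaultdict

def stream_join(events):
    """Symmetric hash join on the first column of each row.

    events: iterable of (side, row), side is "L" or "R".
    Left rows are (id1, name, stock); right rows are (id2, price, sales).
    """
    left_state = defaultdict(list)
    right_state = defaultdict(list)
    for side, row in events:
        key = row[0]
        if side == "L":
            left_state[key].append(row)
            for r in right_state[key]:
                yield (key, row[1], r[1], r[2], row[2])
        else:
            right_state[key].append(row)
            for l in left_state[key]:
                yield (key, l[1], row[1], row[2], l[2])

# Rows from the slide, interleaved in an assumed arrival order.
events = [
    ("L", (1, "Latte", 1000)), ("R", (1, 6, 1)),
    ("L", (8, "Mocha", 800)),  ("R", (8, 8, 1)),
    ("L", (4, "Breve", 200)),  ("R", (9, 3, 1)),
    ("L", (3, "Water", 5000)), ("R", (4, 5, 1)),
    ("L", (7, "Tea", 2000)),   ("R", (7, 4, 1)),
]
joined = list(stream_join(events))
print(joined)
```

Note that this sketch keeps every row forever; a production join needs some policy for bounding or expiring state.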
User Defined Function (UDF)
UDF converts a scalar input to a scalar output (Scalar → Scalar). Creating and using a UDF is
simple.
We have enhanced UDF/UDTF to support variable types and variable arguments
Input:
long1  long2  int1  int2  int3
10L    25L    6     100   1000
Output:
lSum  iSum
35L   1106
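The UDF code from the slide is not part of this transcript. As a stand-in, the following Python sketch shows a scalar function with a variable number of arguments that reproduces the `lSum` and `iSum` values above; `var_sum` is a hypothetical name, and Python does not distinguish long from int the way Java does.

```python
def var_sum(*values):
    """Scalar function with variable arguments: returns the sum of its inputs."""
    return sum(values)

l_sum = var_sum(10, 25)        # long1 + long2
i_sum = var_sum(6, 100, 1000)  # int1 + int2 + int3
print(l_sum, i_sum)
```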
User Defined Table Function (UDTF)
UDTF converts a scalar input to a table output (Scalar → Table, with multiple rows and columns):
Input:
line
Tom#23 Jark#17 David#50
Output:
name   age
Tom    23
Jark   17
David  50
We have shipped UDTF in Flink release 1.2 (FLINK-4469).
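A table function of this kind can be sketched in Python as a generator that emits one row per token. This is only an illustration of the scalar-to-table idea, not the Flink UDTF interface; `split_line` is a hypothetical name and the `#` and space separators are read off the sample input.

```python
def split_line(line):
    """Table function: one scalar input, zero or more (name, age) output rows."""
    for token in line.split():
        name, age = token.split("#")
        yield (name, int(age))

people = list(split_line("Tom#23 Jark#17 David#50"))
print(people)
```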
User Defined Aggregate Function (UDAGG) - Motivation
UDAGG converts a table input to a scalar output (Table → Scalar).
Flink has built-in aggregates (count, sum, avg, min, max) for SQL and Table API. For example,
“SELECT SUM(stock) as total” over the input
id  name   price  sales  stock
1   Latte  6      1      1000
8   Mocha  8      1      800
4   Breve  5      1      200
returns
total
2000
What if a user wants an aggregate that is not covered by the built-in aggregates, say a
weighted average? We need an aggregate interface to support user defined aggregate
functions.
UDAGG – Accumulator (ACC)
UDAGG represents its state using an accumulator
id  name   price  sales  stock
1   Latte  6      1      1000
8   Mocha  8      1      800
4   Breve  5      1      200
7   Tea    4      1      2000
1   Latte  6      2      998
UDAGG – Interface
UDAGG example: a weighted average
SQL Query
UDAGG Interface
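The SQL query and interface code shown on the slide are not in this transcript. The Python sketch below illustrates the accumulator pattern for a weighted average; the method names `create_accumulator`, `accumulate`, and `get_value` are modeled loosely on the idea described here and should be read as assumptions, as should the choice of averaging price weighted by sales.

```python
class WeightedAvgAccumulator:
    """State of the aggregate: running weighted sum and total weight."""
    def __init__(self):
        self.weighted_sum = 0
        self.weight = 0

class WeightedAvg:
    def create_accumulator(self):
        return WeightedAvgAccumulator()

    def accumulate(self, acc, value, weight):
        # Fold one input row into the accumulator.
        acc.weighted_sum += value * weight
        acc.weight += weight

    def get_value(self, acc):
        # Result is undefined (None) until at least one weighted row arrives.
        if acc.weight == 0:
            return None
        return acc.weighted_sum / acc.weight

# (price, sales) pairs from the five-row table on the accumulator slide.
rows = [(6, 1), (8, 1), (5, 1), (4, 1), (6, 2)]
agg = WeightedAvg()
acc = agg.create_accumulator()
for price, sales in rows:
    agg.accumulate(acc, price, sales)
weighted_avg_price = agg.get_value(acc)
print(weighted_avg_price)
```

Keeping the state in a separate accumulator object lets the system store, checkpoint, and combine partial results without knowing anything about the aggregate itself.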
UDAGG – Merge
Motivated by local & global aggregation (and session window merging, etc.), we need a merge
method that merges partially aggregated accumulators into one single accumulator
How to count the total visits on TaoBao web pages in real time?
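A hedged sketch of the local/global pattern for the visit-count question: each local task counts its own share of the visits, and a global step merges the partial accumulators. The names `CountAccumulator`, `accumulate`, and `merge`, the three-way split, and the sample page names are all assumptions for illustration.

```python
class CountAccumulator:
    def __init__(self):
        self.count = 0

def accumulate(acc, _visit):
    """Local step: count one visit."""
    acc.count += 1

def merge(acc, partials):
    """Global step: fold partially aggregated accumulators into one."""
    for partial in partials:
        acc.count += partial.count
    return acc

# Hypothetical page visits, spread round-robin over three local tasks.
visits = ["home", "search", "item", "home", "cart", "item", "home"]
local_accs = [CountAccumulator() for _ in range(3)]
for i, visit in enumerate(visits):
    accumulate(local_accs[i % 3], visit)

total = merge(CountAccumulator(), local_accs).count
print(total)
```

Only the small accumulators cross the network to the global step, rather than every individual visit record.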
UDAGG – Retraction – Motivations
Incorrect! The freq of
cnt=1 should be 2
UDAGG – Retraction – Motivations
We need a retract method in UDAGG, which can retract the UNDO messages from the
accumulator
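The figure behind this example is missing from the transcript, so the sketch below uses hypothetical input. It chains two aggregates: the first counts words, the second counts how many words have each count. Without retraction the second aggregate never forgets an outdated count; with retraction the old count is withdrawn before the new one is added.

```python
from collections import Counter

def count_frequencies(words, with_retraction):
    """Return {cnt: number of words whose current count is cnt}."""
    word_count = {}   # first aggregate: word -> count
    freq = Counter()  # second aggregate: count -> frequency
    for word in words:
        old = word_count.get(word)
        new = (old or 0) + 1
        word_count[word] = new
        if with_retraction and old is not None:
            freq[old] -= 1  # retract the outdated result (UNDO message)
            if freq[old] == 0:
                del freq[old]
        freq[new] += 1      # accumulate the new result (REDO message)
    return dict(freq)

words = ["Hello", "World", "Bark", "Hello"]  # hypothetical input
wrong = count_frequencies(words, with_retraction=False)
right = count_frequencies(words, with_retraction=True)
print(wrong)  # cnt=1 is over-counted
print(right)  # cnt=1 has frequency 2
```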
Retraction – Solution
Retraction is introduced to handle updates
We use the query optimizer to decide where retraction is needed.
The design doc and the progress of the retraction implementation are tracked in FLINK-6047. We have
delivered this in Flink release 1.3
[Figure: query plan annotated by the optimizer. Recovered labels: TableScan without PK (Redo); Append Table; DataStreamAgg (Redo+Undo); Update Table (consume Undo log); DataStreamAgg (Redo); Update Table; NeedRetraction (twice); UpsertSink; Sink Table (does not need retraction)]
UDAGG – Summary
Master JIRA for UDAGG is FLINK-5564. We have shipped this in Flink release 1.3.
Aggregate – Over Aggregate
Calculate moving average (in the past 5 seconds), and emit the result for each record
Input:
time   itemID  price
1000   101     1
3000   201     2
4000   301     3
5000   101     1
5000   401     4
7000   301     3
8000   501     5
10000  101     1
Output:
time   itemID  avgPrice
1000   101     1
3000   201     1.5
4000   301     2
5000   101     2.2
5000   401     2.2
7000   301     2.6
8000   401     3
10000  101     2.8
Time based Group Aggregate is not able to differentiate two records with the same row time,
but Over Aggregate can.
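The moving average can be checked with a small Python sketch. It is a brute-force illustration, not how a streaming engine evaluates an OVER window; `moving_avg` is a hypothetical name, time is assumed to be in milliseconds, and both window bounds are treated as inclusive because that reproduces the avgPrice column on the slide.

```python
# Input rows from the slide: (time, itemID, price).
rows = [
    (1000, 101, 1), (3000, 201, 2), (4000, 301, 3), (5000, 101, 1),
    (5000, 401, 4), (7000, 301, 3), (8000, 501, 5), (10000, 101, 1),
]

def moving_avg(rows, range_ms=5000):
    """For each row, average the prices of all rows with time in [t - range_ms, t]."""
    result = []
    for t, item_id, _ in rows:
        window = [price for (t2, _, price) in rows if t - range_ms <= t2 <= t]
        result.append((t, item_id, sum(window) / len(window)))
    return result

averages = moving_avg(rows)
for row in averages:
    print(row)
```

One output row is produced per input row, so the two records at time 5000 each get their own result, which a time-based group aggregate would collapse into one.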
Aggregate – Summary
The design of aggregates is mainly tracked in FLIP-11 (FLINK-4557). We have
delivered the above aggregates in Flink release 1.3
Grouping methods: GroupBy / Over
Time types: Event time; Processing time (only for stream)
Unbounded Aggregate: early-firing under a certain emit configuration (by
default it emits the result on every input record)
Windows:
• Time/Count + TUMBLE/SESSION/SLIDE window
• OVER Rows/Time Range window
Contributions to Flink SQL & Table API
Flink blog: “Continuous Queries on Dynamic Tables” (posted at
https://flink.apache.org/news/2017/04/04/dynamic-tables.html)
UDF (several improvements are released in 1.3)
UDTF (FLINK-4469, released in 1.2)
UDAGG (FLINK-5564, released in 1.3)
Retraction (FLINK-6047, released in 1.3)
Group/Over Window Aggregate (FLINK-4557, released in 1.3)
Unbounded Stream Group Aggregate (FLINK-6216, released in 1.3)
Stream-Stream Inner Join (FLINK-5878, targeted for release 1.4)
More coming…
SQL & Table API in Large Scale Production
Section 4
SQL & Table API in Alibaba Production - example
SQL & Table API has proven to be a successful and sufficient declarative language for data processing.
It significantly reduces the development effort needed to rewrite existing jobs or implement new jobs
SQL & Table API in Alibaba Production - Summary
Blink@Alibaba
In production at Alibaba for more than a year
• Hundreds of jobs
• The biggest cluster is more than 1500 nodes
• The biggest job has thousands of tasks and tens of TB of state
Blink SQL@Alibaba
In production before 2016 China Singles’ Day (biggest shopping festival, similar to Black Friday in the US)
• Blink jobs written in SQL & Table API are used to do real-time analysis for the recommendation system, which
helps improve targeting efficiency, thereby increasing the traffic-to-sales conversion.
• The biggest SQL job has thousands of tasks and over a TB of state
The latest release of Blink SQL will be used to support the entire Alibaba internal business and serve
external customers via the Alibaba Cloud StreamCompute service
Thanks
xiaowei.jxw@alibaba-inc.com
shaoxuan.wsx@alibaba-inc.com

More Related Content

What's hot

Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...Databricks
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...HostedbyConfluent
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on KubernetesDatabricks
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsGuido Schmutz
 

What's hot (20)

Apache flink
Apache flinkApache flink
Apache flink
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 

Similar to Flink SQL & TableAPI in Large Scale Production at Alibaba

Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLDataWorks Summit
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFlink Forward
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Fabian Hueske
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Flink Forward
 
Flink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsFlink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsBowen Li
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part isqlserver.co.il
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Oracle applications r12.2, ebr, online patching means lot of work for devel...
Oracle applications r12.2, ebr, online patching   means lot of work for devel...Oracle applications r12.2, ebr, online patching   means lot of work for devel...
Oracle applications r12.2, ebr, online patching means lot of work for devel...Ajith Narayanan
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuFlink Forward
 
Db2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfallsDb2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfallssam2sung2
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker seriesMonal Daxini
 
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingFlink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingHostedbyConfluent
 
Oracle Apex Technical Introduction
Oracle Apex   Technical IntroductionOracle Apex   Technical Introduction
Oracle Apex Technical Introductioncrokitta
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
2015 owb2 odi converter - white paper_owb_to_odi_migration_service_d&t
2015 owb2 odi converter - white paper_owb_to_odi_migration_service_d&t2015 owb2 odi converter - white paper_owb_to_odi_migration_service_d&t
2015 owb2 odi converter - white paper_owb_to_odi_migration_service_d&tDatabase & Technology s.r.l.
 
Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Bobby Curtis
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Zalando Technology
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 

Similar to Flink SQL & TableAPI in Large Scale Production at Alibaba (20)

Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
 
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIsFabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
Fabian Hueske - Taking a look under the hood of Apache Flink’s relational APIs
 
Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.Taking a look under the hood of Apache Flink's relational APIs.
Taking a look under the hood of Apache Flink's relational APIs.
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
 
Flink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsFlink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systems
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part i
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Oracle applications r12.2, ebr, online patching means lot of work for devel...
Oracle applications r12.2, ebr, online patching   means lot of work for devel...Oracle applications r12.2, ebr, online patching   means lot of work for devel...
Oracle applications r12.2, ebr, online patching means lot of work for devel...
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
Db2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfallsDb2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfalls
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker series
 
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch ProcessingFlink 2.0: Navigating the Future of Unified Stream and Batch Processing
Flink 2.0: Navigating the Future of Unified Stream and Batch Processing
 
Oracle Apex Technical Introduction
Oracle Apex   Technical IntroductionOracle Apex   Technical Introduction
Oracle Apex Technical Introduction
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
2015 owb2 odi converter - white paper_owb_to_odi_migration_service_d&t
2015 owb2 odi converter - white paper_owb_to_odi_migration_service_d&t2015 owb2 odi converter - white paper_owb_to_odi_migration_service_d&t
2015 owb2 odi converter - white paper_owb_to_odi_migration_service_d&t
 
Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
Flink SQL & Table API in Large Scale Production at Alibaba

  • 1. Flink SQL & Table API in Large Scale Production at Alibaba Xiaowei Jiang Shaoxuan Wang June, 2017
  • 2. About Us Xiaowei Jiang • 2014-now Alibaba • 2010-2014 Facebook • 2002-2010 Microsoft • 2000-2002 Stratify Shaoxuan Wang • 2015-now Alibaba • 2014-2015 Facebook • 2010-2014 Broadcom
  • 3. Outline 1 Background 2 Why SQL & Table API 3 Blink SQL & Table API 4 Blink SQL & Table API in Large Scale Production
  • 5. About Alibaba Alibaba Group • Operates the world’s largest e-commerce platform • Recorded GMV of $485 Billion in year 2016, $17.8 billion worth of GMV in a single day on Nov 11, 2016 Realtime Data Infrastructure • Supports internal products such as search, recommendation, BI • Also supports external customers through its cloud service
  • 6. Blink – Alibaba’s version of Flink Looked into Flink two years ago • best choice of unified computing engine • a few issues in Flink that can be problems for large scale applications Started “Blink” project • aimed to make Flink work reliably and efficiently at the very large scale at Alibaba Made various improvements in Flink runtime Enhanced Flink SQL & Table API to production ready Working with Flink community to contribute back since last August • several key improvements • hundreds of patches
  • 7. Blink Ecosystem in Alibaba (diagram). Products: Search, Recommendation, BI, Security, Ads. Platforms: Machine Learning Platform, StreamCompute Platform. Blink: SQL & Table API, DataStream API, DataSet API, Runtime Engine. Cluster Resource Management (Yarn/Fuxi). Storage (HDFS/Pangu).
  • 8. Why SQL & Table API Section 2
  • 9. Why SQL & Table API Unified batch and streaming • Flink currently offers DataSet API for batch and DataStream API for streaming • We want a single API that can run in both batch and streaming mode Improved development efficiency • Users only describe the semantics of data processing • Leave hard optimization problems to the system • SQL is proven to be good at describing data processing • Table API offers seamless integration with Scala and Java • Table API makes it easy to extend standard SQL when necessary
  • 10. Stream-Table Duality (diagram labels: Stream, Dynamic Table, Apply, Changelog). Stream (word, count): (Hello, 1), (World, 1), (Hello, 2), (Bark, 1), (Hello, 3). Dynamic Table (word, count): (Hello, 3), (World, 1), (Bark, 1).
  • 11. Dynamic Tables (diagram labels: Apply, Changelog). Stream to Dynamic Table: • Append Mode: each stream record is an insert to the dynamic table. Hence, all records of a stream are appended to the dynamic table • Update Mode: a stream record can represent an insert, update, or delete modification on the dynamic table (append mode is in fact a special case of update mode)
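The update mode described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Blink or Flink code: the `apply_changelog` function and the `(op, key, value)` record layout are assumptions made for the example, and the data is the word-count stream from the Stream-Table Duality slide.

```python
def apply_changelog(stream):
    """Fold a changelog stream of (op, key, value) records into a table.

    The table is modeled as a dict keyed by the word (hypothetical layout).
    """
    table = {}
    for op, key, value in stream:
        if op in ("insert", "update"):
            table[key] = value        # keep only the latest value per key
        elif op == "delete":
            table.pop(key, None)      # drop the row if it exists
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


# Word-count stream from the Stream-Table Duality slide.
word_counts = [
    ("insert", "Hello", 1),
    ("insert", "World", 1),
    ("update", "Hello", 2),
    ("insert", "Bark", 1),
    ("update", "Hello", 3),
]
table = apply_changelog(word_counts)
```

In append mode every record would be an insert with no key to update, which is why the slide calls it a special case of update mode.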
  • 12. Dynamic Tables. Derive Changelog Stream from Dynamic Table: • REDO Mode: the stream records the new value of a modified element to redo lost changes of completed transactions • REDO+UNDO Mode: the stream records the old and the new value of a changed element to undo incomplete transactions and redo lost changes of completed transactions
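The difference between the two modes can be made concrete with a small plain-Python sketch. It is an illustration under assumptions, not the Blink implementation: `derive_changelog`, the `("redo" | "undo", key, value)` record shape, and the dict-based table are all hypothetical, and deletions are left out for brevity.

```python
def derive_changelog(old_table, new_table, mode="redo"):
    """Emit changelog records for rows that were added or changed.

    mode="redo":      emit only the new value of each changed row.
    mode="redo+undo": emit the old value first (undo), then the new one (redo).
    """
    records = []
    for key, new_value in new_table.items():
        changed = key not in old_table or old_table[key] != new_value
        if not changed:
            continue
        if mode == "redo+undo" and key in old_table:
            records.append(("undo", key, old_table[key]))
        records.append(("redo", key, new_value))
    return records


before = {"Hello": 2, "World": 1}
after = {"Hello": 3, "World": 1, "Bark": 1}
redo_only = derive_changelog(before, after, mode="redo")
redo_undo = derive_changelog(before, after, mode="redo+undo")
```

A downstream operator that only sees `redo_only` must look up the previous value itself, while `redo_undo` carries it explicitly.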
  • 13. Stream SQL? There is no such thing as Stream SQL. Dynamic Tables generalize the concept of Static Tables. SQL serves as the unified way to describe data processing in both batch and streaming.
  • 14. Blink SQL & Table API Section 3
  • 15. Blink SQL & Table API Overview: Simple Query (Select and Where); Stream-Stream Inner Join; User Defined Function (UDF); User Defined Table Function (UDTF); User Defined Aggregate Function (UDAGG); Retraction (stream only); Aggregate
  • 16. A Simple Query: Select and Where. Input (id, name, price, sales, stock): (1, Latte, 6, 1, 1000), (8, Mocha, 8, 1, 800), (4, Breve, 5, 1, 200), (7, Tea, 4, 1, 2000), (1, Latte, 6, 2, 998). Output (id, name, price, sales, stock): (1, Latte, 6, 1, 1000), (1, Latte, 6, 2, 998).
  • 17. Stream-Stream Inner Join. Left stream (id1, name, stock): (1, Latte, 1000), (8, Mocha, 800), (4, Breve, 200), (3, Water, 5000), (7, Tea, 2000). Right stream (id2, price, sales): (1, 6, 1), (8, 8, 1), (9, 3, 1), (4, 5, 1), (7, 4, 1). Join result (id, name, price, sales, stock): (1, Latte, 6, 1, 1000), (8, Mocha, 8, 1, 800), (4, Breve, 5, 1, 200), (7, Tea, 4, 1, 2000). This is proposed and discussed in FLINK-5878.
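One common way to join two unbounded streams is a symmetric hash join: each side stores its rows by key and probes the other side's state on arrival. The plain-Python sketch below illustrates that general technique with the rows from the slide. It is not the design discussed in FLINK-5878, and the class and method names are hypothetical.

```python
from collections import defaultdict


class StreamStreamInnerJoin:
    """Symmetric hash join sketch: state is kept for both inputs."""

    def __init__(self):
        self.left_rows = defaultdict(list)    # id -> [(name, stock)]
        self.right_rows = defaultdict(list)   # id -> [(price, sales)]

    def on_left(self, key, name, stock):
        self.left_rows[key].append((name, stock))
        # Probe rows that already arrived on the right side.
        return [(key, name, price, sales, stock)
                for price, sales in self.right_rows.get(key, [])]

    def on_right(self, key, price, sales):
        self.right_rows[key].append((price, sales))
        # Probe rows that already arrived on the left side.
        return [(key, name, price, sales, stock)
                for name, stock in self.left_rows.get(key, [])]


left = [(1, "Latte", 1000), (8, "Mocha", 800), (4, "Breve", 200),
        (3, "Water", 5000), (7, "Tea", 2000)]
right = [(1, 6, 1), (8, 8, 1), (9, 3, 1), (4, 5, 1), (7, 4, 1)]

join = StreamStreamInnerJoin()
results = []
# Interleave the two inputs to mimic records arriving on both streams.
for (lid, name, stock), (rid, price, sales) in zip(left, right):
    results += join.on_left(lid, name, stock)
    results += join.on_right(rid, price, sales)
```

Rows with ids 3 and 9 never find a partner and stay in state. A production join would need some form of state expiry, which this sketch omits.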
  • 18. User Defined Function (UDF). A UDF converts a scalar input to a scalar output (Scalar → Scalar). Creating and using a UDF is simple. We have enhanced UDF/UDTF to support variable types and variable arguments. Example input (long1, long2, int1, int2, int3): (10L, 25L, 6, 100, 1000). Output (lSum, iSum): (35L, 1106).
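The variable-argument behaviour can be mimicked in plain Python. This is only an analogy for the slide's example: `var_sum` is a hypothetical name, and a real Flink UDF is a Java or Scala class rather than a Python function.

```python
def var_sum(*values):
    """Scalar function sketch: any number of scalar arguments in, one scalar out."""
    return sum(values)


# Values from the slide: two longs for lSum, three ints for iSum.
l_sum = var_sum(10, 25)
i_sum = var_sum(6, 100, 1000)
```

The same function body serves both calls, which is the point of supporting variable arguments: one UDF instead of one per arity.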
  • 19. User Defined Table Function (UDTF). A UDTF converts a scalar input to a table output (Scalar → Table, multiple rows and columns). Example input (line): Tom#23, Jark#17, David#50. Output (name, age): (Tom, 23), (Jack, 17), (David, 50). We have shipped UDTF in Flink release 1.2 (FLINK-4469).
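A table function maps one scalar to zero or more rows, which a Python generator models naturally. The sketch below is a plain-Python illustration, not the Flink UDTF API: `split_line` and the `#` separator parameter are assumptions, and the inputs use the names from the slide's output table.

```python
def split_line(line, separator="#"):
    """Table function sketch: one scalar in, zero or more (name, age) rows out."""
    parts = line.split(separator)
    if len(parts) == 2:
        name, age = parts
        yield name, int(age)
    # Malformed lines produce no rows instead of failing the whole query.


lines = ["Tom#23", "Jack#17", "David#50"]
# Each input row is expanded into the rows the table function emits for it.
people = [row for line in lines for row in split_line(line)]
```

Yielding rows instead of returning a single value is what lets one input line expand into several output rows, or none.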
  • 20. User Defined Aggregate Function (UDAGG) - Motivation. A UDAGG converts a table input to a scalar output (Table → Scalar). Flink has built-in aggregates (count, sum, avg, min, max) for SQL and Table API. For example, “SELECT SUM(stock) as total” over the input (id, name, price, sales, stock): (1, Latte, 6, 1, 1000), (8, Mocha, 8, 1, 800), (4, Breve, 5, 1, 200) returns total = 2000. What if a user wants an aggregate that is not covered by the built-in aggregates, say a weighted average? We need an aggregate interface to support user defined aggregate functions.
  • 21. UDAGG – Accumulator (ACC). A UDAGG represents its state using an accumulator. Example input (id, name, price, sales, stock): (1, Latte, 6, 1, 1000), (8, Mocha, 8, 1, 800), (4, Breve, 5, 1, 200), (7, Tea, 4, 1, 2000), (1, Latte, 6, 2, 998).
  • 22. UDAGG – Interface. UDAGG example: a weighted average. Slide panels: SQL Query; UDAGG Interface.
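Since the slide's code did not survive in the transcript, here is a minimal plain-Python sketch of what a weighted-average aggregate with an accumulator could look like. The method names mirror Flink's `createAccumulator` / `accumulate` / `getValue` convention, but the classes are hypothetical, and weighting price by sales is an assumption chosen to match the columns on the accumulator slide.

```python
class WeightedAvgAccumulator:
    """State of the aggregate: running weighted sum and total weight."""

    def __init__(self):
        self.weighted_sum = 0
        self.total_weight = 0


class WeightedAvg:
    def create_accumulator(self):
        return WeightedAvgAccumulator()

    def accumulate(self, acc, value, weight):
        # Fold one input row into the accumulator.
        acc.weighted_sum += value * weight
        acc.total_weight += weight

    def get_value(self, acc):
        if acc.total_weight == 0:
            return None                      # no input rows yet
        return acc.weighted_sum / acc.total_weight


# (price, sales) pairs taken from the rows on the accumulator slide.
rows = [(6, 1), (8, 1), (5, 1), (4, 1), (6, 2)]
agg = WeightedAvg()
acc = agg.create_accumulator()
for price, sales in rows:
    agg.accumulate(acc, price, sales)
weighted_avg_price = agg.get_value(acc)
```

Keeping all state in the accumulator means the function object itself stays stateless, so the runtime can checkpoint and restore the accumulator independently.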
  • 23. UDAGG – Merge. How to count the total visits on TaoBao web pages in real time? Motivated by local & global aggregate (and session window merge etc.), we need a merge method which can merge the partially aggregated accumulators into one single accumulator.
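The local/global pattern can be sketched as follows in plain Python: each partition aggregates its own visits into a partial accumulator, and a global step merges the partials. The class names, the page names, and the two-partition split are hypothetical. Only the `merge(acc, others)` shape follows the method the slide asks for.

```python
class CountAccumulator:
    def __init__(self):
        self.count = 0


class VisitCount:
    def create_accumulator(self):
        return CountAccumulator()

    def accumulate(self, acc, _page):
        acc.count += 1

    def merge(self, acc, others):
        # Fold partially aggregated accumulators into a single one.
        for other in others:
            acc.count += other.count

    def get_value(self, acc):
        return acc.count


agg = VisitCount()
# Hypothetical page visits, already split across two parallel tasks.
partitions = [["home", "cart", "home"], ["item", "home"]]

partials = []
for visits in partitions:
    local = agg.create_accumulator()         # local (per-task) aggregate
    for page in visits:
        agg.accumulate(local, page)
    partials.append(local)

total = agg.create_accumulator()             # global aggregate
agg.merge(total, partials)
total_visits = agg.get_value(total)
```

Only one small accumulator per task crosses the network instead of every visit record, which is what makes the approach attractive for a hot key such as a site-wide counter.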
  • 24. UDAGG – Retraction – Motivations Incorrect! The freq of cnt=1 should be 2
  • 25. UDAGG – Retraction – Motivations We need a retract method in UDAGG, which can retract the UNDO messages from the accumulator
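A two-level aggregation shows why retraction matters: first count per word, then count how many words have each count. The plain-Python sketch below uses hypothetical input words and a hypothetical `count_frequencies` helper. When a word's count changes from 1 to 2, the downstream aggregate has to retract the old contribution before accumulating the new one.

```python
from collections import Counter


def count_frequencies(words, with_retraction):
    """Return {count: number of words with that count} for a word stream."""
    word_count = {}
    freq = Counter()
    for word in words:
        old = word_count.get(word)
        new = (old or 0) + 1
        word_count[word] = new
        if with_retraction and old is not None:
            freq[old] -= 1                   # retract the outdated count
            if freq[old] == 0:
                del freq[old]
        freq[new] += 1                       # accumulate the updated count
    return dict(freq)


words = ["Hello", "World", "Hello", "Bark"]  # hypothetical input
wrong = count_frequencies(words, with_retraction=False)
right = count_frequencies(words, with_retraction=True)
```

Without retraction the first `Hello` is still counted under cnt=1 after its count has moved to 2, so cnt=1 is overstated.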
  • 26. Retraction – Solution. Retraction is introduced to handle updates. We use the query optimizer to decide where retraction is needed. The design doc and the progress of the retraction implementation are tracked in FLINK-6047. We have delivered this in Flink release 1.3. Diagram labels: DataStreamAgg (Redo+Undo); Update Table (consume Undo log); DataStreamAgg (Redo); Update Table; Append Table; TableScan without PK (Redo); NeedRetraction; NeedRetraction; Sink Table (does not need retraction); UpsertSink.
  • 27. UDAGG – Summary Master JIRA for UDAGG is FLINK-5564. We have shipped this in Flink release 1.3.
  • 28. Aggregate – Over Aggregate. Calculate the moving average (in the past 5 seconds), and emit the result for each record. Time based Group Aggregate is not able to differentiate two records with the same row time, but Over Aggregate can. Input (time, itemID, price): (1000, 101, 1), (3000, 201, 2), (4000, 301, 3), (5000, 101, 1), (5000, 401, 4), (7000, 301, 3), (8000, 501, 5), (10000, 101, 1). Output (time, itemID, avgPrice): (1000, 101, 1), (3000, 201, 1.5), (4000, 301, 2), (5000, 101, 2.2), (5000, 401, 2.2), (7000, 301, 2.6), (8000, 401, 3), (10000, 101, 2.8).
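The moving average on this slide can be reproduced with a short plain-Python sketch. It assumes the window covers every row whose time lies in [t - 5000, t], both bounds inclusive, including other rows with the same timestamp. That assumption is consistent with the avgPrice values on the slide. The `moving_average` helper is hypothetical and works on a finished list instead of a live stream.

```python
def moving_average(rows, window_ms=5000):
    """For each (time, item_id, price) row, average the prices of all rows
    whose time falls within [time - window_ms, time]."""
    result = []
    for time, item_id, _ in rows:
        in_window = [price for t, _, price in rows
                     if time - window_ms <= t <= time]
        result.append((time, item_id, round(sum(in_window) / len(in_window), 1)))
    return result


# Input rows (time, itemID, price) from the slide.
rows = [
    (1000, 101, 1), (3000, 201, 2), (4000, 301, 3), (5000, 101, 1),
    (5000, 401, 4), (7000, 301, 3), (8000, 501, 5), (10000, 101, 1),
]
averages = [avg for _, _, avg in moving_average(rows)]
```

One output row is emitted per input row, so the two records at time 5000 each get their own result. A group window would collapse them into a single row.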
  • 29. Aggregate – Summary. The design of aggregate is mainly tracked in FLIP-11 (FLINK-4557). We have delivered the above aggregates in Flink release 1.3. Grouping methods: Group By / Over. Time types: event time; processing time (only for stream). Unbounded Aggregate: early-firing under a certain emit configuration (by default it emits the result on every input record). Windows: • Time/Count + TUMBLE/SESSION/SLIDE window • OVER Rows/Time Range window
  • 30. Contributions to Flink SQL & Table API: Flink blog “Continuous Queries on Dynamic Tables” (posted at https://flink.apache.org/news/2017/04/04/dynamic-tables.html); UDF (several improvements are released in 1.3); UDTF (FLINK-4469, released in 1.2); UDAGG (FLINK-5564, released in 1.3); Retraction (FLINK-6047, released in 1.3); Group/Over Window Aggregate (FLINK-4557, released in 1.3); Unbounded Stream Group Aggregate (FLINK-6216, released in 1.3); Stream-Stream Inner Join (FLINK-5878, targeted for release 1.4); more coming…
  • 31. SQL & Table API in Large Scale Production Section 4
  • 32. SQL & Table API in Alibaba Production - example. SQL & Table API has proven to be a successful and sufficient declarative language for data processing. It significantly reduces the development effort needed to rewrite existing jobs or implement new jobs.
  • 33. SQL & Table API in Alibaba Production - Summary. Blink@Alibaba: In production at Alibaba for more than a year • Hundreds of jobs • The biggest cluster has more than 1500 nodes • The biggest job has thousands of tasks and tens of TB of state. Blink SQL@Alibaba: In production since before the 2016 China Singles’ Day (the biggest shopping festival, similar to Black Friday in the US) • Blink jobs written in SQL & Table API are used for real-time analysis in the recommendation system, which helps improve targeting efficiency and thereby increases the traffic-to-sales conversion • The biggest SQL job has thousands of tasks and more than a TB of state. The latest release of Blink SQL will be used to support the entire Alibaba internal business and to serve external customers via the Alibaba Cloud StreamCompute service.