Flink SQL & Table API in Large Scale Production at Alibaba

Xiaowei Jiang
Shaoxuan Wang
June, 2017

About Us
Xiaowei Jiang
• 2014-now Alibaba
• 2010-2014 Facebook
• 2002-2010 Microsoft
• 2000-2002 Stratify
Shaoxuan Wang
• 2015-now Alibaba
• 2014-2015 Facebook
• 2010-2014 Broadcom

Outline
1 Background
2 Why SQL & Table API
3 Blink SQL & Table API
4 Blink SQL & Table API in Large Scale Production

About Alibaba
Alibaba Group
• Operates the world’s largest e-commerce platform
• Recorded GMV of $485 billion in 2016, and $17.8 billion of GMV in a single day on Nov 11, 2016
Realtime Data Infrastructure
• Supports internal products such as search, recommendation, BI
• Also supports external customers through its cloud service

Blink – Alibaba’s version of Flink
Looked into Flink two years ago
• best choice of unified computing engine
• a few issues in Flink that can be problems for large scale applications
Started “Blink” project
• aimed to make Flink work reliably and efficiently at the very large scale at Alibaba
Made various improvements in Flink runtime
Enhanced Flink SQL & Table API to be production ready
Working with Flink community to contribute back since last August
• several key improvements
• hundreds of patches

Blink Ecosystem in Alibaba
Products: Search, Recommendation, BI, Security, Ads
Platforms: Machine Learning Platform, StreamCompute Platform
Blink: SQL & Table API, DataStream API, DataSet API, Runtime Engine
Cluster Resource Management (Yarn/Fuxi)
Storage (HDFS/Pangu)

Why SQL & Table API
Unified batch and streaming
• Flink currently offers DataSet API for batch and DataStream API for streaming
• We want a single API that can run in both batch and streaming mode
Improved development efficiency
• Users only describe the semantics of data processing
• Leave hard optimization problems to the system
• SQL is proven to be good at describing data processing
• Table API offers seamless integration with Scala and Java
• Table API makes it easy to extend standard SQL when necessary

Stream-Table Duality
Stream (changelog of the table):
word count
Hello 1
World 1
Hello 2
Bark 1
Hello 3
Dynamic Table (result of applying the stream):
word count
Hello 3
World 1
Bark 1
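The duality above can be sketched in a few lines of Python. This is only an illustration of the idea, not Blink code: the function name `apply_changelog` is hypothetical, and the stream is assumed to be keyed on `word`, with each record carrying the latest count for its key.

```python
from typing import Dict, Iterable, Tuple


def apply_changelog(changelog: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Materialize a dynamic table by applying records keyed on `word`."""
    table: Dict[str, int] = {}
    for word, count in changelog:
        # Insert the row if the key is new, otherwise overwrite the old value.
        table[word] = count
    return table


# Changelog stream from the slide, in arrival order.
stream = [("Hello", 1), ("World", 1), ("Hello", 2), ("Bark", 1), ("Hello", 3)]
dynamic_table = apply_changelog(stream)
print(dynamic_table)
```

Replaying the five stream records leaves three rows in the table, one per word, each holding the most recent count.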

Dynamic Tables
Apply Changelog Stream to Dynamic Table
• Append Mode: each stream record is an insert to the dynamic table. Hence, all records of a stream are appended to the dynamic table
• Update Mode: a stream record can represent an insert, update, or delete modification on the dynamic table (append mode is in fact a special case of update mode)

Derive Changelog Stream from Dynamic Table
• REDO Mode: the stream records the new value of a modified element, so that lost changes of completed transactions can be redone
• REDO+UNDO Mode: the stream records both the old and the new value of a changed element, so that incomplete transactions can be undone and lost changes of completed transactions can be redone
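A minimal Python sketch of the two derivation modes follows. It is an illustration under assumptions, not the Blink implementation: the function `derive_changelog`, the message tags `"REDO"`/`"UNDO"`, and the sample updates are all hypothetical, and deletes are left out for brevity.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, int]  # (tag, key, value)


def derive_changelog(updates: List[Tuple[str, int]], mode: str = "REDO") -> List[Message]:
    """Turn a sequence of keyed table updates into a changelog stream."""
    table: Dict[str, int] = {}
    out: List[Message] = []
    for key, new_value in updates:
        if mode == "REDO+UNDO" and key in table:
            # Emit the old value first so a consumer can undo it.
            out.append(("UNDO", key, table[key]))
        # Both modes always emit the new value.
        out.append(("REDO", key, new_value))
        table[key] = new_value
    return out


updates = [("Hello", 1), ("World", 1), ("Hello", 2)]
redo_only = derive_changelog(updates, mode="REDO")
redo_undo = derive_changelog(updates, mode="REDO+UNDO")
print(redo_only)
print(redo_undo)
```

The only difference between the two outputs is the extra `("UNDO", "Hello", 1)` message emitted before the second `Hello` update.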

There is no such thing as Stream SQL
Stream SQL?
Dynamic Tables generalize the concept of Static Tables
SQL serves as the unified way to describe data processing in both batch and streaming

Blink SQL & Table API
Section 3

Blink SQL & Table API Overview
Simple Query: Select and Where
Stream-Stream Inner Join
User Defined Function (UDF)
User Defined Table Function (UDTF)
User Defined Aggregate Function (UDAGG)
Retraction (stream only)
Aggregate

A Simple Query: Select and Where
Input:
id name price sales stock
1 Latte 6 1 1000
8 Mocha 8 1 800
4 Breve 5 1 200
7 Tea 4 1 2000
1 Latte 6 2 998
Output (only the Latte rows are selected):
id name price sales stock
1 Latte 6 1 1000
1 Latte 6 2 998

Stream-Stream Inner Join
Left stream:
id1 name stock
1 Latte 1000
8 Mocha 800
4 Breve 200
3 Water 5000
7 Tea 2000
Right stream:
id2 price sales
1 6 1
8 8 1
9 3 1
4 5 1
7 4 1
Join result (rows where id1 = id2):
id name price sales stock
1 Latte 6 1 1000
8 Mocha 8 1 800
4 Breve 5 1 200
7 Tea 4 1 2000
This is proposed and discussed in FLINK-5878
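One common way to join two unbounded streams is a symmetric hash join: each side keeps the rows it has seen, and every new row probes the other side's state. The Python sketch below illustrates that technique on the data above. It is an assumption about how such a join can work, not the design in FLINK-5878; the function name and the interleaved arrival order are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def stream_stream_inner_join(events: Iterable[Tuple[str, int, tuple]]) -> Iterator[tuple]:
    """Join events of the form (side, key, payload), where side is "L" or "R".

    Left payloads are (name, stock); right payloads are (price, sales).
    """
    state = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in events:
        other = "R" if side == "L" else "L"
        # Remember this row so that later rows from the other side can match it.
        state[side][key].append(payload)
        # Probe the rows already seen on the other side.
        for match in state[other].get(key, ()):
            left, right = (payload, match) if side == "L" else (match, payload)
            name, stock = left
            price, sales = right
            yield (key, name, price, sales, stock)


# Hypothetical arrival order: the two streams are interleaved row by row.
events = [
    ("L", 1, ("Latte", 1000)), ("R", 1, (6, 1)),
    ("L", 8, ("Mocha", 800)), ("R", 8, (8, 1)),
    ("L", 4, ("Breve", 200)), ("R", 9, (3, 1)),
    ("L", 3, ("Water", 5000)), ("R", 4, (5, 1)),
    ("L", 7, ("Tea", 2000)), ("R", 7, (4, 1)),
]
joined = list(stream_stream_inner_join(events))
print(joined)
```

Rows with id 3 (left only) and id 9 (right only) never find a partner, so they produce no output. Note that this sketch keeps all rows forever; a production join needs some policy for bounding or expiring state.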

User Defined Function (UDF)
UDF converts a scalar input to a scalar output (Scalar → Scalar). Creating and using a UDF is simple:
Input:
long1 long2 int1 int2 int3
10L 25L 6 100 1000
Output:
lSum iSum
35L 1106
We have enhanced UDF/UDTF to support variable types and variable arguments
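The example values can be reproduced with a single scalar function that accepts a variable number of arguments. The Python sketch below only illustrates the idea; `var_sum` is a hypothetical name, and Python does not distinguish long from int, so the "variable types" aspect is not visible here.

```python
def var_sum(*args: int) -> int:
    """Scalar function: sum any number of integer arguments."""
    return sum(args)


# One input row, using the column names and values from the slide.
row = {"long1": 10, "long2": 25, "int1": 6, "int2": 100, "int3": 1000}

# The same function is called once with two arguments and once with three.
l_sum = var_sum(row["long1"], row["long2"])
i_sum = var_sum(row["int1"], row["int2"], row["int3"])
print(l_sum, i_sum)
```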

User Defined Table Function (UDTF)
UDTF converts a scalar input to a table output:
Scalar → Table (multiple rows and columns)
Input:
line
Tom#23 Jack#17 David#50
Output:
name age
Tom 23
Jack 17
David 50
We have shipped UDTF in Flink release 1.2 (FLINK-4469).
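As a rough illustration of the scalar-to-table conversion, the Python generator below turns one `line` value into several (name, age) rows. The function name `split_line` and the `#` separator parameter are assumptions made for this sketch; it does not use the Flink `TableFunction` API.

```python
from typing import Iterator, Tuple


def split_line(line: str, separator: str = "#") -> Iterator[Tuple[str, int]]:
    """Table function: emit one (name, age) row per whitespace-separated token."""
    for token in line.split():
        name, age = token.split(separator)
        yield name, int(age)


rows = list(split_line("Tom#23 Jack#17 David#50"))
print(rows)
```

A single input value yields three output rows with two columns each, which is what distinguishes a UDTF from a scalar UDF.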

User Defined Aggregate Function (UDAGG) - Motivation
UDAGG converts a table input to a scalar output:
Table → Scalar
Input:
id name price sales stock
1 Latte 6 1 1000
8 Mocha 8 1 800
4 Breve 5 1 200
Query: “SELECT SUM(stock) as total”
Output:
total
2000
Flink has built-in aggregates (count, sum, avg, min, max) for SQL and Table API.
What if a user wants an aggregate that is not covered by the built-in aggregates, say a weighted average? We need an aggregate interface to support user defined aggregate functions.

UDAGG – Accumulator (ACC)
UDAGG represents its state using an accumulator
Example input rows (id name price sales stock):
1 Latte 6 1 1000
8 Mocha 8 1 800
4 Breve 5 1 200
7 Tea 4 1 2000
1 Latte 6 2 998

UDAGG – Interface
UDAGG example: a weighted average
SQL Query
UDAGG Interface
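The weighted average example can be sketched as an accumulator plus three methods: create the accumulator, fold a row into it, and read the result. The Python below mirrors that shape for illustration only. The class and field names are hypothetical, and weighting `price` by `sales` is an assumption chosen to reuse the sample rows from the earlier slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeightedAvgAccum:
    """State of the aggregate: running weighted sum and total weight."""
    weighted_sum: int = 0
    weight: int = 0


class WeightedAvg:
    def create_accumulator(self) -> WeightedAvgAccum:
        return WeightedAvgAccum()

    def accumulate(self, acc: WeightedAvgAccum, value: int, weight: int) -> None:
        # Fold one input row into the accumulator.
        acc.weighted_sum += value * weight
        acc.weight += weight

    def get_value(self, acc: WeightedAvgAccum) -> Optional[float]:
        # No rows seen yet means there is no average to report.
        return acc.weighted_sum / acc.weight if acc.weight else None


# (price, sales) pairs taken from the sample rows.
rows = [(6, 1), (8, 1), (5, 1), (4, 1), (6, 2)]
agg = WeightedAvg()
acc = agg.create_accumulator()
for price, sales in rows:
    agg.accumulate(acc, price, sales)
result = agg.get_value(acc)
print(result)
```

Keeping the state in a separate accumulator object, rather than in the function itself, is what lets the runtime store, checkpoint, and combine partial results.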

UDAGG – Merge
Motivated by local & global aggregation (and session window merging, etc.), we need a merge method which can merge partially aggregated accumulators into a single accumulator
How to count the total visits on TaoBao web pages in real time?
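A local/global count can be sketched as follows: each partition pre-aggregates its own visits into a local accumulator, and a global step merges the partial accumulators. This Python is a minimal illustration under assumptions; the class `CountAgg`, the list-based accumulator, and the three sample partitions are hypothetical.

```python
from typing import Iterable, List


class CountAgg:
    def create_accumulator(self) -> List[int]:
        # A one-element list is used so the accumulator is mutable.
        return [0]

    def accumulate(self, acc: List[int], _visit: str) -> None:
        acc[0] += 1

    def merge(self, acc: List[int], others: Iterable[List[int]]) -> None:
        # Fold the partial accumulators into `acc`.
        for other in others:
            acc[0] += other[0]

    def get_value(self, acc: List[int]) -> int:
        return acc[0]


agg = CountAgg()

# Hypothetical page visits, already split across three partitions.
partitions = [["p1", "p2", "p1"], ["p3"], ["p1", "p2"]]

# Local aggregation: one accumulator per partition.
local_accs = []
for visits in partitions:
    local = agg.create_accumulator()
    for visit in visits:
        agg.accumulate(local, visit)
    local_accs.append(local)

# Global aggregation: merge the partial results.
global_acc = agg.create_accumulator()
agg.merge(global_acc, local_accs)
total = agg.get_value(global_acc)
print(total)
```

Only one small accumulator per partition crosses the network instead of every visit record, which is the point of local pre-aggregation for a hot key such as a total count.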

UDAGG – Retraction – Motivations
Incorrect! The freq of cnt=1 should be 2
We need a retract method in UDAGG, which can retract the UNDO messages from the accumulator
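The problem can be reproduced with a count-of-counts query: first count occurrences per word, then count how many words share each count. The Python sketch below is an assumed reconstruction of that scenario; the input words, the class `FreqAgg`, and the function `count_of_counts` are hypothetical and are not taken from the slides.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class FreqAgg:
    """Second-level aggregate: how many words currently have a given count."""

    def create_accumulator(self) -> List[int]:
        return [0]

    def accumulate(self, acc: List[int]) -> None:
        acc[0] += 1  # a REDO message arrived for this count

    def retract(self, acc: List[int]) -> None:
        acc[0] -= 1  # an UNDO message withdraws an earlier contribution

    def get_value(self, acc: List[int]) -> int:
        return acc[0]


def count_of_counts(words: List[str], with_retraction: bool) -> Dict[int, int]:
    agg = FreqAgg()
    word_count: Counter = Counter()
    freq = defaultdict(agg.create_accumulator)  # cnt -> accumulator
    for word in words:
        old = word_count[word]
        word_count[word] = old + 1
        if with_retraction and old > 0:
            # The word no longer has count `old`, so undo that contribution.
            agg.retract(freq[old])
        agg.accumulate(freq[old + 1])
    return {cnt: agg.get_value(acc) for cnt, acc in freq.items() if agg.get_value(acc) > 0}


words = ["Hello", "World", "Bark", "Hello"]
wrong = count_of_counts(words, with_retraction=False)
right = count_of_counts(words, with_retraction=True)
print(wrong)
print(right)
```

Without retraction, the second `Hello` is still counted among the words seen once, so cnt=1 reports a frequency of 3. With retraction, the stale contribution is withdrawn and cnt=1 reports 2.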

Retraction – Solution
Retraction is introduced to handle updates
We use the query optimizer to decide where retraction is needed.
The design doc and the progress of the retraction implementation are tracked in FLINK-6047. We have delivered this in Flink release 1.3
[Figure: query plan annotated with retraction. Labels: TableScan without PK (Redo); Append Table; DataStreamAgg (Redo+Undo); Update Table (consume Undo log); DataStreamAgg (Redo); Update Table; UpsertSink; Sink Table (does not need retraction). Two nodes are marked NeedRetraction.]

UDAGG – Summary
Master JIRA for UDAGG is FLINK-5564. We have shipped this in Flink release 1.3.

Aggregate – Over Aggregate
Calculate moving average (in the past 5 seconds), and emit the result for each record
Input:
time itemID price
1000 101 1
3000 201 2
4000 301 3
5000 101 1
5000 401 4
7000 301 3
8000 501 5
10000 101 1
Output:
time itemID avgPrice
1000 101 1
3000 201 1.5
4000 301 2
5000 101 2.2
5000 401 2.2
7000 301 2.6
8000 401 3
10000 101 2.8
Time based Group Aggregate is not able to differentiate two records with the same row time, but Over Aggregate can.
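The avgPrice column can be reproduced with a small Python sketch of a time-range over window. It assumes the window for a record at time t covers all records with time in [t - 5000, t], including other records with the same time, and that the input is not partitioned by itemID. The function name `over_range_avg` is hypothetical, and the batch-style loop is for illustration only, not how a streaming runtime would evaluate it.

```python
from typing import List, Tuple

Record = Tuple[int, int, int]  # (time in ms, itemID, price)


def over_range_avg(records: List[Record], range_ms: int = 5000) -> List[Tuple[int, int, float]]:
    """Emit one (time, itemID, avgPrice) row per input record."""
    results = []
    for t, item_id, _price in records:
        # All records whose time falls inside the range window ending at t.
        window = [p for (ts, _item, p) in records if t - range_ms <= ts <= t]
        results.append((t, item_id, sum(window) / len(window)))
    return results


records = [
    (1000, 101, 1), (3000, 201, 2), (4000, 301, 3), (5000, 101, 1),
    (5000, 401, 4), (7000, 301, 3), (8000, 501, 5), (10000, 101, 1),
]
averages = [avg for (_t, _item, avg) in over_range_avg(records)]
print(averages)
```

Each input record produces its own output row, so the two records at time 5000 both appear in the result, each with the average over the same window.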

Aggregate – Summary
The design of aggregates is mainly tracked in FLIP-11 (FLINK-4557). We have delivered the above aggregates in Flink release 1.3
Grouping methods: Group By / Over
Time types: Event time; Processing time (only for stream)
Unbounded Aggregate: early-firing under a certain emit configuration (by default it emits the result on every input record)
Windows:
• Time/Count + TUMBLE/SESSION/SLIDE window
• OVER Rows/Time Range window

Contributions to Flink SQL & Table API
Flink blog: “Continuous Queries on Dynamic Tables” (posted at https://flink.apache.org/news/2017/04/04/dynamic-tables.html)
UDF (several improvements are released in 1.3)
UDTF (FLINK-4469, released in 1.2)
UDAGG (FLINK-5564, released in 1.3)
Retraction (FLINK-6047, released in 1.3)
Group/Over Window Aggregate (FLINK-4557, released in 1.3)
Unbounded Stream Group Aggregate (FLINK-6216, released in 1.3)
Stream-Stream Inner Join (FLINK-5878, targeted for release 1.4)
More coming…

SQL & Table API in Large Scale Production
Section 4

SQL & Table API in Alibaba Production - example
SQL & Table API has proven to be a successful and sufficient declarative language for data processing.
It significantly reduces the development effort needed to rewrite existing jobs or implement new jobs

SQL & Table API in Alibaba Production - Summary
Blink@Alibaba
In production at Alibaba for more than a year
• Hundreds of jobs
• The biggest cluster has more than 1500 nodes
• The biggest job has thousands of tasks and state over tens of TB
Blink SQL@Alibaba
In production since before China Singles’ Day 2016 (the biggest shopping festival, similar to Black Friday in the US)
• Blink jobs written in SQL & Table API are used to do real time analysis for the recommendation system, which helps improve targeting efficiency, thereby increasing the traffic-to-sales conversion.
• The biggest SQL job has thousands of tasks and state exceeding a TB
The latest release of Blink SQL will be used to support the entire Alibaba internal business and serve external customers via the Alibaba Cloud StreamCompute service

Thanks
xiaowei.jxw@alibaba-inc.com
shaoxuan.wsx@alibaba-inc.com
