© 2020 Ververica
Timo Walther
@twalthr
Flink SQL in 2020
About me
● Apache Flink Committer and PMC Member
● Working on Flink before it became part of the Apache Software
Foundation
● Software Engineer at Ververica
(formerly data Artisans; acquired by Alibaba in 2019)
● Part of the SDK Team, focused on Table / SQL API and Ecosystem
Apache Flink is a Distributed Data Processing System
Stateful computations over streams
real-time and historic
fast, scalable, fault tolerant,
event time, large state, exactly-once.
Scalable and Consistent Data Processing
● Flexible and expressive APIs
● Guaranteed correctness
○ Exactly-once state consistency
○ Event-time semantics
● Processing at massive scale
○ Runs on 10000s of cores
○ Manages 10s of TBs of state, either in memory or on disk
Powered By Apache Flink
Details about their use cases and more users are listed on Flink’s website at https://flink.apache.org/poweredby.html
Also check out the Flink Forward YouTube channel, with more than 350 recorded talks, at https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA
Flink SQL in a Nutshell
A standard-compliant SQL service to query static and streaming data alike
that leverages the performance, scalability, and consistency of Apache Flink.
Refreshing Streaming SQL Semantics
● Basically all tables that are processed with SQL queries change over time
○ Transactions from applications
○ Bulk inserts from ETL processes
○ …
● Traditional processors run SQL queries on static snapshots of the tables
○ The query input is finite → the result is also finite and definitive
● Stream SQL processors run continuous queries on changing (dynamic) tables
○ The query input is unbounded → the result is potentially unbounded and continuously updated
● The semantics of a query are the same on a snapshot and on a continuously changing table!
Running a One-time Query on a Static Table Snapshot
Input table clicks (take a snapshot when the query starts):
user  cTime     url
Mary  12:00:00  https://…
Bob   12:00:00  https://…
Mary  12:00:02  https://…
Liz   12:00:03  https://…   (a row that was added after the query was started is not considered)

Query:
SELECT
  user,
  COUNT(url) as cnt
FROM clicks
GROUP BY user

Result (a final result is produced; the query terminates):
user  cnt
Mary  2
Bob   1
Running a Continuous Query on a Changing Table
Input table clicks (ingest all changes as they happen):
user  cTime     url
Mary  12:00:00  https://…
Bob   12:00:00  https://…
Mary  12:00:02  https://…
Liz   12:00:03  https://…

Query:
SELECT
  user,
  COUNT(url) as cnt
FROM clicks
GROUP BY user

Result (continuously updated):
user  cnt
Mary  2   (updated from 1)
Bob   1
Liz   1

The result is identical to the one-time query (at this point)
Why is Stream-Batch Unification Important?
● Usability
○ ANSI SQL syntax: No custom “StreamSQL” syntax.
○ ANSI SQL semantics: No stream-specific result semantics.
● Portability
○ Run the same query on bounded & unbounded data
○ Run the same query on recorded & real-time data
○ Bootstrapping query state or backfilling results from historic data
[Figure: timeline from past to future with markers for the start of the stream and now; bounded and unbounded queries are drawn as ranges over this timeline.]
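A minimal sketch of the portability point, assuming the SQL Client of Flink 1.11, where `SET execution.type` switches between batch and streaming execution (later releases use a different configuration key). It also assumes a table `clicks` like the one on the earlier slides is already registered and that its connector can be read in both modes. `user` is a reserved keyword in Flink SQL, so it is quoted with backticks here.

```sql
-- Assumption: a table clicks(user, cTime, url) is already registered
-- and its connector supports both bounded and unbounded reads.

-- Bounded execution: process the data that exists now, emit a final result, terminate.
SET execution.type=batch;

SELECT `user`, COUNT(url) AS cnt
FROM clicks
GROUP BY `user`;

-- Unbounded execution: the identical statement becomes a continuous query
-- whose result is updated as new rows arrive.
SET execution.type=streaming;

SELECT `user`, COUNT(url) AS cnt
FROM clicks
GROUP BY `user`;
```

Only the execution mode changes between the two runs; the query text stays the same.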
What about Time? Aren't we in the Streaming Space?
● Proper time handling is very important in many continuous queries
○ Group or join rows that are temporally related
○ Semantics are the same if a query runs on a snapshot
● Tracking progress in time enables efficient execution of continuous queries
○ Determine when input of a computation is complete
○ Determine when rows are no longer needed and clean up state
○ Periodically trigger computations and result updates
● Flink SQL supports sophisticated event-time handling with watermarks
● These are streaming optimizations; they don't affect standard SQL queries!
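As a hedged illustration of event-time handling with watermarks, the sketch below declares a watermark in the table DDL and then groups rows by a tumbling event-time window. The topic name, consumer group, JSON format, 5-second out-of-orderness bound, and 1-hour window size are assumptions made for this example, not values from the talk.

```sql
-- Assumption: click events arrive as JSON on a Kafka topic named 'clicks'.
CREATE TABLE clicks (
  `user` STRING,
  cTime  TIMESTAMP(3),
  url    STRING,
  -- Declare cTime as the event-time attribute and tolerate
  -- rows that arrive up to 5 seconds out of order.
  WATERMARK FOR cTime AS cTime - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'clicks',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'clicks-demo',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- Count clicks per user and hour. Once the watermark passes the end of a
-- window, its result is final and the state kept for it can be cleaned up.
SELECT
  `user`,
  TUMBLE_END(cTime, INTERVAL '1' HOUR) AS window_end,
  COUNT(url) AS cnt
FROM clicks
GROUP BY
  `user`,
  TUMBLE(cTime, INTERVAL '1' HOUR);
```

Run on a static snapshot, the same windowed query returns the same rows; the watermark only tells the streaming runtime when a window's input is complete.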
What Will You See in This Demo?
● Read and write data from and to different storage systems
○ Apache Kafka
○ MySQL (via a generic JDBC connector)
○ S3-compatible storage
● Manage catalog metadata
○ Create (alter and drop) tables and views with DDL statements
○ Persistently store catalog metadata in Apache Hive Metastore
● Show how Flink unifies batch and stream processing with SQL
○ Demonstrate different ways to join dynamic tables
● Maintain the results of continuous queries in Kafka and MySQL
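A sketch of what maintaining the result of a continuous query in MySQL can look like with the generic JDBC connector. The database URL, credentials, and the `click_counts` table are hypothetical, the source table `clicks` is assumed to be registered already, and the target table is assumed to exist in MySQL with `user` as its primary key.

```sql
-- Hypothetical sink table backed by MySQL via the JDBC connector.
-- Declaring a primary key lets the sink apply updates as upserts.
CREATE TABLE click_counts (
  `user` STRING,
  cnt    BIGINT,
  PRIMARY KEY (`user`) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/demo',
  'table-name' = 'click_counts',
  'username' = 'demo_user',
  'password' = 'demo_password'
);

-- Continuous query: every change of a per-user count is written
-- to MySQL, replacing the previous row for that user.
INSERT INTO click_counts
SELECT `user`, COUNT(url) AS cnt
FROM clicks
GROUP BY `user`;
```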
Our Demo Environment
[Architecture diagram.
Components: SQL Client, JobManager, TaskManager, Data Provider, MetaStore, S3-compatible Storage.
Edge labels: Submit query; Assign & monitor query tasks; Execute query tasks; Coordinate; Push events; Manage & lookup catalog metadata; Read & write data; Query data.]
Our Demo Scenario - An Order System (derived from TPC-H)
[Schema diagram. It distinguishes frequently updated tables from seldom updated tables and marks the relationships between tables as 1:n.]

Table          Columns
Orders         o_orderkey, o_ordertime, o_custkey, o_orderpriority, ...
Lineitem       l_orderkey, l_linenumber, l_ordertime, l_proctime, l_currency, l_extendedprice, ...
Rates          rs_symbol, rs_timestamp, rs_rate
Rates History  rs_symbol, rs_timestamp, rs_rate
Customer       c_custkey, c_name, c_nationkey, ...
Nation         n_nationkey, n_name, n_regionkey
Region         r_regionkey, r_name
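To make the schema more concrete, here is a hedged sketch of a time-windowed join between the two order tables. The registered table names `orders` and `lineitem`, the assumption that `o_ordertime` and `l_ordertime` are event-time attributes, and the 5-minute bound are illustrative choices, not taken from the demo.

```sql
-- Join each order with the line items recorded within
-- 5 minutes after the order time (the bound is an assumption).
SELECT
  o.o_orderkey,
  o.o_orderpriority,
  l.l_linenumber,
  l.l_extendedprice
FROM orders AS o
JOIN lineitem AS l
  ON  o.o_orderkey = l.l_orderkey
  AND l.l_ordertime BETWEEN o.o_ordertime
                        AND o.o_ordertime + INTERVAL '5' MINUTE;
```

Because the join condition bounds the time difference between both sides, a streaming execution can drop rows from its state once they can no longer find a match.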
DEMO
https://github.com/fhueske/flink-sql-demo
Outlook
SQL Feature Set in Flink 1.11
STREAMING ONLY
● OVER / WINDOW
○ UNBOUNDED + BOUNDED PRECEDING
● INNER JOIN with
○ Time-versioned table
○ External lookup table
● MATCH_RECOGNIZE
○ Pattern Matching/CEP (SQL:2016)
BATCH ONLY
● Full TPC-DS support
STREAMING & BATCH
● SELECT FROM WHERE
● GROUP BY [HAVING]
○ Non-windowed
○ TUMBLE, HOP, SESSION windows
● JOIN
○ Time-Windowed INNER + OUTER JOIN
○ Non-windowed INNER + OUTER JOIN
● User-Defined Functions
○ Scalar
○ Aggregation
○ Table-valued
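The two streaming-only join variants listed above can be sketched as follows, using the column names of the demo schema. Everything else is an assumption: `lineitem` is taken to declare `l_proctime` as a processing-time attribute and `l_ordertime` as an event-time attribute, `rates_lookup` is a hypothetical table backed by an external system such as MySQL, and `rates` is a hypothetical temporal table function that would have to be registered beforehand.

```sql
-- (a) Join with an external lookup table: each line item is enriched with
--     the rate found in the external system at processing time.
SELECT
  l.l_orderkey,
  l.l_extendedprice * r.rs_rate AS converted_price
FROM lineitem AS l
JOIN rates_lookup FOR SYSTEM_TIME AS OF l.l_proctime AS r
  ON r.rs_symbol = l.l_currency;

-- (b) Join with a time-versioned table: each line item is joined with the
--     rate that was valid at its order time.
SELECT
  l.l_orderkey,
  l.l_extendedprice * r.rs_rate AS converted_price
FROM lineitem AS l,
     LATERAL TABLE (rates(l.l_ordertime)) AS r
WHERE r.rs_symbol = l.l_currency;
```

Variant (a) depends on when a row happens to be processed, while variant (b) yields the same result when the data is replayed.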
SQL Feature Set in Flink 1.11
CREATE TABLE people (
  id    BIGINT,
  name  STRING,
  email STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'people',
  'properties.bootstrap.servers' = 'localhost:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'debezium-json'
);
● Changelog processing support (FLIP-95, FLIP-105)
○ New table source and sink interfaces
○ Deeper integration with connectors (interpret a Kafka topic as a changelog)
○ Change Data Capture (CDC) processing using the Debezium format
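A small sketch of what changelog processing means for a query, assuming the `people` table from the DDL on this slide is registered. The query itself is only an illustration and was not shown in the talk.

```sql
-- The Debezium records in the topic are interpreted as INSERT, UPDATE,
-- and DELETE changes rather than as append-only rows. The count therefore
-- goes up on inserts, goes down on deletes, and is unaffected by updates.
SELECT COUNT(*) AS num_people
FROM people;
```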
Summary
● Flink SQL is evolving super fast!
● Flink SQL runs continuous queries at scale on static and dynamic data.
● Flink SQL connects to many systems in the data ecosystem.
● Flink can do a lot more
○ Python Table API & support for notebooks like Apache Zeppelin
○ Java/Scala DataStream API
○ Stateful Functions API
Go check it out!
=> https://github.com/fhueske/flink-sql-demo
Questions?
www.ververica.com   @VervericaData   timo@ververica.com

Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 

ApacheCon 2020 - Flink SQL in 2020: Time to show off!

  • 1. © 2020 Ververica Timo Walther @twalthr Flink SQL in 2020
  • 2. About me ● Apache Flink Committer and PMC Member ● Working on Flink since before it became part of the Apache Software Foundation ● Software Engineer at Ververica (first dataArtisans, then acquired by Alibaba in 2019) ● Part of the SDK Team, focused on Table / SQL API and Ecosystem
  • 3. Apache Flink is a Distributed Data Processing System: stateful computations over streams, real-time and historic; fast, scalable, fault tolerant, event time, large state, exactly-once.
  • 4. Scalable and Consistent Data Processing ● Flexible and expressive APIs ● Guaranteed correctness ○ Exactly-once state consistency ○ Event-time semantics ● Processing at massive scale ○ Runs on 10,000s of cores ○ Manages 10s of TBs of state either in-memory or on disk
  • 5. Powered By Apache Flink. Details about their use cases and more users are listed on Flink’s website at https://flink.apache.org/poweredby.html. Also check out the Flink Forward YouTube channel with more than 350 recorded talks at https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA
  • 6. Flink SQL in a Nutshell: A standard-compliant SQL service to query static and streaming data alike that leverages the performance, scalability, and consistency of Apache Flink.
  • 7. Refreshing Streaming SQL Semantics ● Basically all tables that are processed with SQL queries change over time ○ Transactions from applications ○ Bulk inserts from ETL processes ○ … ● Traditional processors run SQL queries on static snapshots of the tables ○ The query input is finite → result is also finite and definitive ● Stream SQL processors run continuous queries on changing (dynamic) tables ○ The query input is unbounded → result is potentially unbounded, and continuously updated ● Semantics of a query are the same for both snapshot and continuously changing table!
  • 8. Running a One-time Query on a Static Table Snapshot. Query: SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user. Input table clicks (user, cTime, url): Mary 12:00:00 https://… | Bob 12:00:00 https://… | Mary 12:00:02 https://… | Liz 12:00:03 https://… Result (user, cnt): Mary 2 | Bob 1. Annotations: Take a snapshot when the query starts. A row that was added after the query was started is not considered. A final result is produced. The query terminates.
  • 9. Running a Continuous Query on a Changing Table. Query: SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user. Input table clicks (user, cTime, url): Mary 12:00:00 https://… | Bob 12:00:00 https://… | Mary 12:00:02 https://… | Liz 12:00:03 https://… Result (user, cnt): Mary 1, updated to Mary 2 | Bob 1 | Liz 1. Annotations: Ingest all changes as they happen. Continuously update the result. The result is identical to the one-time query (at this point).
  • 10. Why is Stream-Batch Unification Important? ● Usability ○ ANSI SQL syntax: No custom “StreamSQL” syntax. ○ ANSI SQL semantics: No stream-specific result semantics. ● Portability ○ Run the same query on bounded & unbounded data ○ Run the same query on recorded & real-time data ○ Bootstrapping query state or backfilling results from historic data [Figure: timeline from past to future marking the start of the stream and now, with the ranges covered by bounded and unbounded queries]
  • 11. What about Time? Aren't we in the Streaming Space? ● Proper time handling is very important in many continuous queries ○ Group or join rows that are temporally related ○ Semantics are the same if a query runs on a snapshot ● Tracking progress in time enables efficient execution of continuous queries ○ Determine when input of a computation is complete ○ Determine when rows are no longer needed and clean up state ○ Periodically trigger computations and result updates ● Flink SQL supports sophisticated event-time handling with watermarks ● Those are streaming optimizations; they don't affect standard SQL queries!
  • 12. What Will You See in This Demo?
  • 13. What Will You See in This Demo? ● Read and write data from and to different storage systems ○ Apache Kafka ○ MySQL (via a generic JDBC connector) ○ S3-compatible storage ● Manage catalog metadata ○ Create (alter and drop) tables and views with DDL statements ○ Persistently store catalog metadata in Apache Hive Metastore ● Show how Flink unifies batch and stream processing with SQL ○ Demonstrate different ways to join dynamic tables ● Maintain the results of continuous queries in Kafka and MySQL
  • 14. Our Demo Environment [Diagram. Components: SQL Client, JobManager, TaskManager, Data Provider, MetaStore, S3-compatible Storage. Arrow labels: Submit query; Assign & monitor query tasks; Execute query tasks; Push events; Coordinate; Manage & lookup Catalog Metadata; Read & write data; Query data]
  • 15. Our Demo Scenario - An Order System (derived from TPC-H) [Schema diagram with 1:n relationships between tables, grouped into frequently updated and seldom updated tables] Orders (o_orderkey, o_ordertime, o_custkey, o_orderpriority, ...) | Lineitem (l_orderkey, l_linenumber, l_ordertime, l_proctime, l_currency, l_extendedprice, ...) | Rates (rs_symbol, rs_timestamp, rs_rate) | Region (r_regionkey, r_name) | Nation (n_nationkey, n_name, n_regionkey) | Customer (c_custkey, c_name, c_nationkey, ...) | Rates History (rs_symbol, rs_timestamp, rs_rate)
  • 18. SQL Feature Set in Flink 1.11. STREAMING ONLY: ● OVER / WINDOW ○ UNBOUNDED + BOUNDED PRECEDING ● INNER JOIN with ○ Time-versioned table ○ External lookup table ● MATCH_RECOGNIZE ○ Pattern Matching/CEP (SQL:2016). BATCH ONLY: ● Full TPC-DS support. STREAMING & BATCH: ● SELECT FROM WHERE ● GROUP BY [HAVING] ○ Non-windowed ○ TUMBLE, HOP, SESSION windows ● JOIN ○ Time-Windowed INNER + OUTER JOIN ○ Non-windowed INNER + OUTER JOIN ● User-Defined Functions ○ Scalar ○ Aggregation ○ Table-valued
  • 19. SQL Feature Set in Flink 1.11 ● Changelog processing support (FLIP-95, FLIP-105) ○ New table source and sink interfaces ○ Deeper integration with connectors (interpret a Kafka topic as a changelog) ○ Change Data Capture (CDC) processing using the Debezium format. Example: CREATE TABLE people ( id BIGINT, name STRING, email STRING ) WITH ( 'connector'='kafka', 'topic'='people', 'properties.bootstrap.servers'='localhost:9092', 'scan.startup.mode'='earliest-offset', 'format'='debezium-json' );
  • 20. Summary ● Flink SQL is evolving super fast! ● Flink SQL runs continuous queries at scale on static and dynamic data. ● Flink SQL connects to many systems in the data ecosystem. ● Flink can do a lot more ○ Python Table API & support for notebooks like Apache Zeppelin ○ Java/Scala DataStream API ○ Stateful Functions API. Go, check it out! => https://github.com/fhueske/flink-sql-demo
  • 23. www.ververica.com | @VervericaData | timo@ververica.com
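
The difference between the one-time query on a snapshot (slide 8) and the continuous query on a changing table (slide 9) can be sketched in a few lines. This is plain Python, not Flink, and only mimics the semantics of `SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user`. The function names `one_time_query` and `continuous_query` are hypothetical, and the rows are the ones shown on the slides, with the URLs elided as in the source.

```python
from collections import Counter

# Click rows as (user, cTime, url), taken from the slide example.
CLICKS = [
    ("Mary", "12:00:00", "https://…"),
    ("Bob", "12:00:00", "https://…"),
    ("Mary", "12:00:02", "https://…"),
    ("Liz", "12:00:03", "https://…"),
]


def one_time_query(snapshot):
    """Evaluate the GROUP BY count once over a fixed snapshot and terminate."""
    # COUNT(url) ignores NULLs, so rows without a url are skipped.
    return dict(Counter(user for user, _ctime, url in snapshot if url is not None))


def continuous_query(changes, result):
    """Ingest rows one at a time, update `result` in place, and yield each update."""
    for user, _ctime, url in changes:
        if url is not None:
            result[user] = result.get(user, 0) + 1
            yield (user, result[user])


# One-time query: the snapshot holds the first three rows; Liz arrives later.
final = one_time_query(CLICKS[:3])

# Continuous query: same rows, but the result is updated as each row arrives.
result = {}
updates = list(continuous_query(CLICKS[:3], result))
assert result == final  # identical to the one-time query at this point

# The continuous query keeps running and also picks up the late row.
updates += list(continuous_query(CLICKS[3:], result))
print(final)
print(updates)
print(result)
```

The sketch keeps the full result in a dictionary, which corresponds to the state a continuous query has to maintain. The emitted `(user, cnt)` pairs play the role of the result updates, for example Mary's count going from 1 to 2.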