SQL is undoubtedly the most widely used language for data analytics. It is declarative and can be optimized and efficiently executed by most query processors. Therefore, the community has made an effort to add relational APIs to Apache Flink: a standard SQL API and a language-integrated Table API.
Both APIs are semantically compatible and share the same optimization and execution path based on Apache Calcite. Since Flink supports both stream and batch processing, and many use cases require both kinds of processing, we aim for a unified relational layer.
In this talk we will look at the current API capabilities, find out what is under the hood of Flink's relational APIs, and give an outlook on future features such as dynamic tables, Flink's way of converting streams into tables and vice versa by leveraging the stream-table duality.
Timo Walther - Table & SQL API - unified APIs for batch and stream processing
1.
Timo Walther
Apache Flink PMC
@twalthr
With slides from Fabian Hueske
Flink Meetup @ Amsterdam, March 2nd, 2017
Table & SQL API
unified APIs for batch and stream processing
4. DataStream API is not for Everyone
§ Writing DataStream programs is not easy
• Stream processing technology spreads rapidly
§ Requires Knowledge & Skill
• Stream processing concepts (time, state, windows, ...)
• Programming experience (Java / Scala)
§ Program logic goes into UDFs
• great for expressiveness
• bad for optimization: requires manual tuning
5. Why not a Relational API?
§ Relational APIs are declarative
• User says what is needed
• System decides how to compute it
§ Users do not specify implementation
§ Queries are efficiently executed
§ “Everybody” knows SQL!
6. Goals
§ Flink is a platform for distributed stream and batch data processing
§ Relational APIs as a unifying layer
• Queries on batch tables terminate and produce a finite result
• Queries on streaming tables run continuously and produce a result stream
§ Same syntax & semantics for both queries
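As a plain-Scala illustration of this unification (independent of Flink, with illustrative data): the same aggregation yields identical final results whether it runs once over a bounded table or incrementally over a stream of rows.

```scala
// Sketch: a batch query runs one pass over the finite input, while a
// streaming query emits a running result after every row; once the
// stream is fully consumed, both agree. The numbers are illustrative.
val input = Seq(3, 1, 4, 1, 5)

// batch: one pass over the bounded table
val batchSum = input.sum

// stream: running result after each arriving element
val streamSums = input.scanLeft(0)(_ + _).tail

println(s"batch = $batchSum, stream over time = $streamSums")
```

The final streaming result (`streamSums.last`) equals the batch result, which is exactly the "same syntax & semantics" goal stated above.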
8. Table API & SQL
§ Flink features two relational APIs
• Table API: LINQ-style API for Java & Scala (since Flink 0.9.0)
• SQL: Standard SQL (since Flink 1.1.0)
§ Equivalent feature set (at the moment)
• Table API and SQL can be mixed
§ Both are tightly integrated with Flink’s core APIs
• DataStream
• DataSet
9. Table API Example
val sensorData: DataStream[(String, Long, Double)] = ???
// convert DataStream into Table
val sensorTable: Table = sensorData
.toTable(tableEnv, 'location, 'time, 'tempF)
// define query on Table
val avgTempCTable: Table = sensorTable
.window(Tumble over 1.day on 'rowtime as 'w)
.groupBy('location, 'w)
.select('w.start as 'day, 'location,
(('tempF.avg - 32) * 0.556) as 'avgTempC)
.where('location like "room%")
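Independently of Flink, the relational operations in this query can be sketched over ordinary Scala collections; the sample readings, the day-sized window bucketing, and the field names here are purely illustrative:

```scala
// Plain-Scala sketch of what the query above computes:
// keep locations matching "room%", bucket readings into day-sized
// tumbling windows, and average the Fahrenheit-to-Celsius conversion.
case class Reading(location: String, time: Long, tempF: Double)

val msPerDay = 24 * 60 * 60 * 1000L

val readings = Seq(
  Reading("room1", 0L, 32.0),
  Reading("room1", 1000L, 50.0),
  Reading("hall", 2000L, 70.0) // dropped by the "room%" predicate
)

// where 'location like "room%"
val rooms = readings.filter(_.location.startsWith("room"))

// group by (day window, location); avg((tempF - 32) * 0.556)
val avgTempC: Map[(Long, String), Double] =
  rooms
    .groupBy(r => (r.time / msPerDay, r.location))
    .map { case (key, rs) =>
      key -> rs.map(r => (r.tempF - 32) * 0.556).sum / rs.size
    }
```

For the sample data, day 0 of "room1" averages (0.0 + 10.008) / 2 ≈ 5.004 °C, and "hall" never appears in the result.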
10. SQL Example
val sensorData: DataStream[(String, Long, Double)] = ???
// register DataStream
tableEnv.registerDataStream(
"sensorData", sensorData, 'location, 'time, 'tempF)
// query registered Table
val avgTempCTable: Table = tableEnv
.sql("""
SELECT FLOOR(rowtime() TO DAY) AS day, location,
AVG((tempF - 32) * 0.556) AS avgTempC
FROM sensorData
WHERE location LIKE 'room%'
GROUP BY location, FLOOR(rowtime() TO DAY) """)
13. Architecture
§ Table API and SQL queries are translated into a common logical plan representation.
§ Logical plans are translated and optimized depending on the execution backend.
§ Plans are transformed into DataSet or DataStream programs.
14. Translation to Logical Plan
sensorTable
.window(Tumble over 1.day on 'rowtime as 'w)
.groupBy('location, 'w)
.select('w.start as 'day, 'location,
(('tempF.avg - 32) * 0.556) as 'avgTempC)
.where('location like "room%")
17. Current State (in master)
§ Batch SQL & Table API support
• Selection, Projection, Sort, Inner & Outer Joins, Set operations
• Windows for Slide, Tumble, Session
§ Streaming Table API support
• Selection, Projection, Union
• Windows for Slide, Tumble, Session
§ Streaming SQL
• Selection, Projection, Union, Tumble, but …
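The tumbling windows listed above can be sketched in plain Scala (window size and timestamps are illustrative): each timestamp belongs to exactly one fixed-size, non-overlapping window.

```scala
// Tumbling-window assignment: align each event timestamp to the
// start of its fixed-size window, then group events per window.
val windowSizeMs = 1000L

def windowStart(ts: Long): Long = ts - (ts % windowSizeMs)

val eventTimes = Seq(100L, 950L, 1000L, 2500L)

val byWindow: Map[Long, Seq[Long]] = eventTimes.groupBy(windowStart)
// window [0, 1000)    -> 100, 950
// window [1000, 2000) -> 1000
// window [2000, 3000) -> 2500
```

Sliding windows differ only in that one timestamp may fall into several overlapping windows, and session windows are bounded by gaps of inactivity rather than a fixed size.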
18. Use Cases for Streaming SQL
§ Continuous ETL & Data Import
§ Live Dashboards & Reports
§ Ad-hoc Analytics & Exploration
20. Dynamic Tables
§ Dynamic tables change over time
§ Dynamic tables are treated like static batch tables
• Dynamic tables are queried with standard SQL
• A query returns another dynamic table
§ Stream ←→ Dynamic Table conversions without
information loss
• “Stream / Table Duality”
22. Querying Dynamic Tables
§ Dynamic tables change over time
• A[t]: Table A at time t
§ Dynamic tables are queried with regular SQL
• Result of a query changes as input table changes
• q(A[t]): Evaluate query q on table A at time t
§ Query result is continuously updated as t progresses
• Similar to maintaining a materialized view
• t is current event time
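The notion of q(A[t]) can be sketched in plain Scala by taking a snapshot of the dynamic table after each change and re-evaluating the query on it; the rows and the query here are illustrative, and a real engine like Flink maintains the result incrementally rather than recomputing it per snapshot.

```scala
// A dynamic table modeled as the growing list of rows that have
// arrived so far; q is re-evaluated on every snapshot A[t].
case class Click(user: String, clicks: Int)

// the stream of rows appended to dynamic table A
val changes = Seq(Click("a", 1), Click("b", 2), Click("a", 3))

// q: SELECT user, SUM(clicks) FROM A GROUP BY user
def q(table: Seq[Click]): Map[String, Int] =
  table.groupBy(_.user).map { case (u, rs) => u -> rs.map(_.clicks).sum }

// q(A[t]) for t = 0, 1, 2: the result itself is a dynamic table
val resultsOverTime: Seq[Map[String, Int]] =
  changes.indices.map(t => q(changes.take(t + 1)))
// t=0: Map(a -> 1); t=1: Map(a -> 1, b -> 2); t=2: Map(a -> 4, b -> 2)
```

The sequence of results is what "continuously updated, similar to maintaining a materialized view" means: each new input row refines the previous result.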
24. Querying Dynamic Tables
§ Can we run any query on Dynamic Tables? No!
§ State must not grow infinitely as more data arrives
• Set a clean-up timeout or key constraints.
§ Input may only trigger partial re-computation
§ Queries with possibly unbounded state or computation are rejected
28. Contributions welcome!
§ Huge interest and many contributors
• Adding more window operators
• Introducing dynamic tables
§ And there is a lot more to do
• New operators and features for streaming and batch
• Performance improvements
• Tooling and integration
§ Try it out, give feedback, and start contributing!
29.
One day of hands-on Flink training
One day of conference
Tickets are on sale
Please visit our website:
http://sf.flink-forward.org
Follow us on Twitter:
@FlinkForward