SlideShare uma empresa Scribd logo
1 de 34
Baixar para ler offline
© Hortonworks Inc. 2015
Why you care about

relational algebra
(even though you didn’t know it)
Julian Hyde
@julianhyde
Enterprise Data World
Washington, DC
April 2nd, 2015
© Hortonworks Inc. 2015
About me
Apache

Calcite
Apache
Calcite
© Hortonworks Inc. 2015
Why you should care about relational algebra
Why should you care?
• It is old
• It is as useful as ever
• Exposed in new products such as Hadoop
• New challenges
Agenda
• Is Hadoop a revolution for the database world?
• What is relational algebra?
• Examples of algebra in action
• Introducing Apache Calcite
• Adding data independence to Hadoop via materialized views
© Hortonworks Inc. 2015
Hadoop
Old world, new world
RDBMS
• Security
• Metadata
• SQL
• Query planning
• Data independence
• Scale
• Late schema
• Choice of front-end
• Choice of engines
• Workload: batch, interactive,
streaming, ML, graph, …
© Hortonworks Inc. 2015
Many front ends, many engines
SQL
Planning
Execution

engine
Planning
User code
Map

Reduce
Tez User code
in Yarn
Spark MongoDB
Hadoop
External

SQL
SQL Spark Storm Cascading HBase Graph
© Hortonworks Inc. 2015
Analogy: LLVM
Lessons from the compiler

community:
• Writing a front end is hard
• Writing a back end is hard
• Writing an optimizer is really hard
• Most of the logic in the optimizer is independent of
front end and back end
• E.g. register assignment
• The optimizer is a collection of separate algorithms
• Common language between algorithms
© Hortonworks Inc. 2015
Relational algebra
SELECT d.name, COUNT(*) AS c
FROM Emps AS e
JOIN Depts AS d ON e.deptno = d.deptno
WHERE e.age < 30
GROUP BY d.deptno
HAVING COUNT(*) > 5
ORDER BY c DESC
Scan [Emps] Scan [Depts]
Join [e.deptno

= d.deptno]
Filter [e.age < 30]
Aggregate [deptno, COUNT(*) AS c]
Filter [c > 5]
Project [name, c]
Sort [c DESC]
(Column names are simplified. They would usually

be ordinals, e.g. $0 is the first column of the left input.)
© Hortonworks Inc. 2015
Relational algebra - Union and sub-query
SELECT * FROM (

SELECT zipcode, state

FROM Emps

UNION ALL

SELECT zipcode, state

FROM Customers)

WHERE state IN (‘CA’, ‘TX’)
Scan [Emps] Scan [Customers]
Union [all]
Project [zipcode, state] Project [zipcode, state]
Filter [state IN (‘CA’, ‘TX’)]
© Hortonworks Inc. 2015
Relational algebra - Insert and Values
INSERT INTO Facts

VALUES (‘Meaning of life’, 42),

(‘Clever as clever’, 6)
Insert [Facts]
Values [[‘Meaning of life’, 42],
[‘Clever as clever’, 6]]
© Hortonworks Inc. 2015
Relational algebra - Strict versus Pragmatic
“Strict” relational algebra
Introduced by E.F. Codd in “A relational
model for large shared data banks” [1970]
Goal is mathematical elegance (ability to
prove theorems)
Greek symbols: σ, π, ρ, U,
Relations cannot contain duplicates
Relations are not sorted
Column values are scalars
Only logical operators
Pragmatic relational algebra
Goal is to optimize queries, allow real-
world data models, extensibility
Elegance still important
Verbs: Project, Filter, Union, Join
Relations may contain duplicates
Relations may be sorted
• But Sort is the only logical operator
that guarantees order
Null values have 3-value semantics, as in
SQL
Physical operators (e.g. HashJoin,
MergeJoin)
Physical properties (sort, distribution)
© Hortonworks Inc. 2015
Algebraic transformations
(R filter c1) filter c2 → R filter (c1 and c2)
(R1 union R2) join R3 on c → (R1 join R3 on C) union (R2 join R3 on c)
• Compare distributive law of arithmetic: (x + y) * z → (x * z) + (y * z)
(R1 join R2 on c) filter c2 → (R1 filter c2) join R2 on c
(R1 join R2 on c) → (R2 join R2 on c) project [R1.*, R2.*]
(R1 join R2 on c) join R3 on c2 → R1 join (R2 join R3 on c2) on c
Many, many others…
(provided C2 only depends on
columns in E, and join is inner)
(provided c, c2 have the
necessary columns)
© Hortonworks Inc. 2015
Query using a view
SELECT deptno, min(salary)

FROM Managers

WHERE age >= 50

GROUP BY deptno
CREATE VIEW Managers AS

SELECT *

FROM Emps 

WHERE EXISTS (

SELECT *

FROM Emps AS underling

WHERE underling.manager = emp.id) Scan [Emps]
Join [$0, $5]
Project [$0, $1, $2, $3]
Filter [age >= 50]
Aggregate [deptno, min(salary)]
Scan [Managers]
Aggregate [manager]
Scan [Emps]
© Hortonworks Inc. 2015
After view expansion
SELECT deptno, min(salary)

FROM Managers

WHERE age >= 50

GROUP BY deptno
CREATE VIEW Managers AS

SELECT *

FROM Emps 

WHERE EXISTS (

SELECT *

FROM Emps AS underling

WHERE underling.manager = emp.id)
Scan [Emps] Aggregate [manager]
Join [$0, $5]
Project [$0, $1, $2, $3]
Filter [age >= 50]
Aggregate [deptno, min(salary)]
Scan [Emps]
© Hortonworks Inc. 2015
After pushing down filter
SELECT deptno, min(salary)

FROM Managers

WHERE age >= 50

GROUP BY deptno
CREATE VIEW Managers AS

SELECT *

FROM Emps 

WHERE EXISTS (

SELECT *

FROM Emps AS underling

WHERE underling.manager = emp.id)
Scan [Emps]
Scan [Emps]
Join [$0, $5]
Project [$0, $1, $2, $3]
Filter [age >= 50]
Aggregate [deptno, min(salary)]
© Hortonworks Inc. 2015
Materialized view
CREATE MATERIALIZED VIEW EmpSummary AS

SELECT deptno,

gender,

COUNT(*) AS c,

SUM(sal) AS s

FROM Emps

GROUP BY deptno, gender
SELECT COUNT(*)

FROM Emps

WHERE deptno = 10

AND gender = ‘M’
Scan [Emps]
Aggregate [deptno, gender,

COUNT(*), SUM(sal)]
Scan [EmpSummary] =
Scan [Emps]
Filter [deptno = 10 AND gender = ‘M’]
Aggregate [COUNT(*)]
© Hortonworks Inc. 2015
Materialized view, step 2: Rewrite query to match
CREATE MATERIALIZED VIEW EmpSummary AS

SELECT deptno,

gender,

COUNT(*) AS c,

SUM(sal) AS s

FROM Emps

GROUP BY deptno, gender
SELECT COUNT(*)

FROM Emps

WHERE deptno = 10

AND gender = ‘M’
Scan [Emps]
Aggregate [deptno, gender,

COUNT(*), SUM(sal)]
Scan [EmpSummary] =
Scan [Emps]
Filter [deptno = 10 AND gender = ‘M’]
Aggregate [deptno, gender,

COUNT(*) AS c, SUM(sal) AS s]
Project [c]
© Hortonworks Inc. 2015
Materialized view, step 3: Substitute table
CREATE MATERIALIZED VIEW EmpSummary AS

SELECT deptno,

gender,

COUNT(*) AS c,

SUM(sal) AS s

FROM Emps

GROUP BY deptno, gender
SELECT COUNT(*)

FROM Emps

WHERE deptno = 10

AND gender = ‘M’
Scan [Emps]
Aggregate [deptno, gender,

COUNT(*), SUM(sal)]
Scan [EmpSummary] =
Filter [deptno = 10 AND gender = ‘M’]
Project [c]
Scan [EmpSummary]
© Hortonworks Inc. 2015
Streaming
SELECT STREAM DISTINCT productName,

floor(rowtime TO HOUR) AS h

FROM Orders
Delta
Converts a table to a stream
Each time a row is inserted into the table, a
record appears in the stream
Chi
Converts a stream into a table
Often we can safely narrow the table down to a
small time window
Chi
Aggregate [productName, h]
Scan [Orders]
Project [productName,

floor(rowtime TO HOUR) AS h]
Delta
© Hortonworks Inc. 2015
Streaming - efficient implementation
SELECT STREAM DISTINCT productName,

floor(rowtime TO HOUR) AS h

FROM Orders
Can create efficient implementation:
• Input is sorted by timestamp
• Only need to aggregate an hour at a time
• Output timestamp tracks input timestamp
• Therefore it is safe to cancel out the Chi
and Delta operators

StreamingAggregate [productName, h]
Scan [Orders]
Project [productName,

floor(rowtime TO HOUR) AS h]
© Hortonworks Inc. 2015
Algebraic transformations - streaming
delta(filter(c, R)) → filter(delta(c, R))
delta(project(e1, …, en, R) → project(delta(e1, …, en, R))
delta(union(R1, R2)) → union(delta(R1), delta(R2))
delta(join(R1, R2, c)) → union(join(R1, delta(R2), c),

join(delta(R1), R2), c)
Delta behaves like “differentiate” in differential calculus,
Chi like “integrate”.
(f + g)’ = f’ + g’
(f . g)’ = f.g’ + f’.g
© Hortonworks Inc. 2015
Apache Calcite
Apache

Calcite
Apache
Calcite
© Hortonworks Inc. 2015
Apache Calcite
Apache incubator project since May, 2014
• Originally named Optiq
Query planning framework
• Relational algebra, rewrite rules, cost model
• Extensible
Packaging
• Library (JDBC server optional)
• Open source
• Community-authored rules, adapters
Adoption
• Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Kylin OLAP,
Apache Phoenix, Apache Samza
• Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory data
© Hortonworks Inc. 2015
Conventional DB architecture
© Hortonworks Inc. 2015
Calcite architecture
© Hortonworks Inc. 2015
Calcite – APIs and SPIs
Cost, statistics
RelOptCost
RelOptCostFactory
RelMetadataProvider
• RelMdColumnUniquensss
• RelMdDistinctRowCount
• RelMdSelectivity
SQL parser
SqlNode

SqlParser

SqlValidator
Transformation rules
RelOptRule
• MergeFilterRule
• PushAggregateThroughUnionRule
• 100+ more
Global transformations
• Unification (materialized view)
• Column trimming
• De-correlation
• Join ordering
Relational algebra
RelNode (operator)
• Scan
• Filter
• Project
• Union
• Aggregate
• …
RelDataType (type)
RexNode (expression)
RelTrait (physical property)
• RelConvention (calling-convention)
• RelCollation (sort-order)
• RelDistribution (partitions)
JDBC driver
Metadata
Schema
Table
Function
• TableFunction
• TableMacro
Lattice
© Hortonworks Inc. 2015
Data independence
A core principle of data management
Data independence is a contract:
• Applications do not make assumptions about the location or organization of data
• The DBMS chooses the most efficient access path
Requires:
• Declarative query language
• Query planner
Allows:
• The DBMS (or administrator) can re-organize the data without breaking the
application
• Redundant copies of the data (indexes, materialized views, replicas)
• Novel algorithms
• Novel data formats and organizations (e.g. b-tree, r-tree, column store)
© Hortonworks Inc. 2015
Disk
Hadoop
B2B1
B3 B4
Memory
CPU
Name
node
(HDFS)
Application
master
(YARN)
Zookeeper
© Hortonworks Inc. 2015
Commodity hardware
Storage, memory and CPU all scale as you add nodes
N replicas of each block (typically 3) give redundancy & scheduling flexibility
Disk
Hadoop scales
B2B1
B3 B4
Memory
CPU
Disk
B3B1
B5
Memory
CPU
Disk
B4B1
B5
Memory
CPU
Disk
B3B2
B6
Memory
CPU
Disk
B5B2
B6
Memory
CPU
B3
© Hortonworks Inc. 2015
Data flow among operators running on nodes
Nodes are assigned to work on blocks that have a replica locally
Memory is used for file blocks and for scratch space (e.g. hash tables)
Disk
Hadoop query execution
B2B1
B3 B4
Memory
CPU
Disk
B3B1
B5
Memory
CPU
Disk
B4B1
B5
Memory
CPU
Disk
B3B2
B6
Memory
CPU
Disk
B5B2
B6
Memory
CPU
B3
B1 B3 B4 B21 1 11
© Hortonworks Inc. 2015
Data independence and Hadoop
Hadoop is very flexible when data is loaded
That flexibility has made it hard for the system to optimize access
Materialized views are an opportunity to “crack” the data, and create copies in
other formats
Page‹#› © Hortonworks Inc. 2014
Calcite: Lattices and tiles
Materialized view
A table whose contents are guaranteed to be the same as
executing a given query.
Lattice
Recommends, builds, and recognizes summary
materialized views (tiles) based on a star schema.
A query defines the tables and many:1 relationships in the
star schema.
Tile
A summary materialized view that belongs to a lattice.
A tile may or may not be materialized.
Materialization methods:
• Declare in lattice
• Generate via recommender algorithm
• Created in response to query
CREATE MATERIALIZED VIEW t AS
SELECT * FROM Emps
WHERE deptno = 10;
CREATE LATTICE star AS
SELECT *
FROM Sales AS s
JOIN Products AS p ON …
JOIN ProductClasses AS pc ON …
JOIN Customers AS c ON …
JOIN Time AS t ON …;
CREATE MATERIALIZED VIEW zg IN star

SELECT gender, zipcode,
COUNT(*), SUM(unit_sales)

FROM star

GROUP BY gender, zipcode;
(FAKE SYNTAX)
© Hortonworks Inc. 2015
Query: SELECT x, SUM(y) FROM t GROUP BY x
In-memory

materialized
queries
Tables

on disk
Tiled, in-memory materializations
Where we’re going… algebraic cache: http://hortonworks.com/blog/dmmq/
© Hortonworks Inc. 2015
Summary
1. Relational algebra allows us to reason about queries, and
is the foundation of query planning
2. Hadoop is deconstructing the DBMS, and enabling new
languages, engines and data formats
3. Data independence is more important than ever
4. Apache Calcite - an implementation of relational algebra
© Hortonworks Inc. 2015
Thank you!
@julianhyde
http://calcite.incubator.apache.org
Apache

Calcite
Apache
Calcite

Mais conteúdo relacionado

Mais procurados

Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / OptiqJulian Hyde
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartEvans Ye
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using OptiqJulian Hyde
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Christian Tzolov
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overviewJulian Hyde
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveJulian Hyde
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
Query optimization techniques in Apache Hive
Query optimization techniques in Apache Hive Query optimization techniques in Apache Hive
Query optimization techniques in Apache Hive Zara Tariq
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Josh Elser
 

Mais procurados (20)

Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
SQL on Big Data using Optiq
SQL on Big Data using OptiqSQL on Big Data using Optiq
SQL on Big Data using Optiq
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Query optimization techniques in Apache Hive
Query optimization techniques in Apache Hive Query optimization techniques in Apache Hive
Query optimization techniques in Apache Hive
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 

Destaque

Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits allJulian Hyde
 
Discardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopDiscardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopJulian Hyde
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too muchJulian Hyde
 
What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?Julian Hyde
 
Optiq: A dynamic data management framework
Optiq: A dynamic data management frameworkOptiq: A dynamic data management framework
Optiq: A dynamic data management frameworkJulian Hyde
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentJulian Hyde
 
Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Chris Fregly
 
Improve Mondrian MDX usability with user defined functions
Improve Mondrian MDX usability with user defined functionsImprove Mondrian MDX usability with user defined functions
Improve Mondrian MDX usability with user defined functionsRaimonds Simanovskis
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache CalciteJordan Halterman
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesHBaseCon
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Chris Fregly
 

Destaque (13)

Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
 
Discardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With HadoopDiscardable In-Memory Materialized Queries With Hadoop
Discardable In-Memory Materialized Queries With Hadoop
 
The twins that everyone loved too much
The twins that everyone loved too muchThe twins that everyone loved too much
The twins that everyone loved too much
 
What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?
 
Optiq: A dynamic data management framework
Optiq: A dynamic data management frameworkOptiq: A dynamic data management framework
Optiq: A dynamic data management framework
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 
Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016
 
Improve Mondrian MDX usability with user defined functions
Improve Mondrian MDX usability with user defined functionsImprove Mondrian MDX usability with user defined functions
Improve Mondrian MDX usability with user defined functions
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
 

Semelhante a Why you care about
 relational algebra (even though you didn’t know it)

Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in HiveDataWorks Summit
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016Maryann Xue
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageNeo4j
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeDataWorks Summit
 
Keynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczKeynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczLDBC council
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczIoan Toma
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernates4al_com
 
Mondrian - Geo Mondrian
Mondrian - Geo MondrianMondrian - Geo Mondrian
Mondrian - Geo MondrianSimone Campora
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Hortonworks
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence PortfolioChris Seebacher
 
Bi Ppt Portfolio Elmer Donavan
Bi Ppt Portfolio  Elmer DonavanBi Ppt Portfolio  Elmer Donavan
Bi Ppt Portfolio Elmer DonavanEJDonavan
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j
 
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...Amazon Web Services
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus TechnologiesImpetus Technologies
 
Powerpivot web wordpress present
Powerpivot web wordpress presentPowerpivot web wordpress present
Powerpivot web wordpress presentMariAnne Woehrle
 
Nitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence PortfolioNitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence Portfolionpatel2362
 

Semelhante a Why you care about
 relational algebra (even though you didn’t know it) (20)

Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
Cost-based Query Optimization in Hive
Cost-based Query Optimization in HiveCost-based Query Optimization in Hive
Cost-based Query Optimization in Hive
 
Cost-Based query optimization
Cost-Based query optimizationCost-Based query optimization
Cost-Based query optimization
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application code
 
Keynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczKeynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter Boncz
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernate
 
Mondrian - Geo Mondrian
Mondrian - Geo MondrianMondrian - Geo Mondrian
Mondrian - Geo Mondrian
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
 
Bi Ppt Portfolio Elmer Donavan
Bi Ppt Portfolio  Elmer DonavanBi Ppt Portfolio  Elmer Donavan
Bi Ppt Portfolio Elmer Donavan
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
(DVO308) Docker & ECS in Production: How We Migrated Our Infrastructure from ...
 
Spark sql meetup
Spark sql meetupSpark sql meetup
Spark sql meetup
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus Technologies
 
Powerpivot web wordpress present
Powerpivot web wordpress presentPowerpivot web wordpress present
Powerpivot web wordpress present
 
Nitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence PortfolioNitin\'s Business Intelligence Portfolio
Nitin\'s Business Intelligence Portfolio
 

Mais de Julian Hyde

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteJulian Hyde
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Julian Hyde
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming languageJulian Hyde
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Julian Hyde
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityJulian Hyde
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're IncubatingJulian Hyde
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteJulian Hyde
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesJulian Hyde
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databasesJulian Hyde
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache CalciteJulian Hyde
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 

Mais de Julian Hyde (16)

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 

Último

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 

Último (20)

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 

Why you care about
 relational algebra (even though you didn’t know it)

  • 1. © Hortonworks Inc. 2015 Why you care about
 relational algebra (even though you didn’t know it) Julian Hyde @julianhyde Enterprise Data World Washington, DC April 2nd, 2015
  • 2. © Hortonworks Inc. 2015 About me Apache
 Calcite Apache Calcite
  • 3. © Hortonworks Inc. 2015 Why you should care about relational algebra Why should you care? • It is old • It is as useful as ever • Exposed in new products such as Hadoop • New challenges Agenda • Is Hadoop a revolution for the database world? • What is relational algebra? • Examples of algebra in action • Introducing Apache Calcite • Adding data independence to Hadoop via materialized views
  • 4. © Hortonworks Inc. 2015 Hadoop Old world, new world RDBMS • Security • Metadata • SQL • Query planning • Data independence • Scale • Late schema • Choice of front-end • Choice of engines • Workload: batch, interactive, streaming, ML, graph, …
  • 5. © Hortonworks Inc. 2015 Many front ends, many engines SQL Planning Execution
 engine Planning User code Map
 Reduce Tez User code in Yarn Spark MongoDB Hadoop External
 SQL SQL Spark Storm Cascading HBase Graph
  • 6. © Hortonworks Inc. 2015 Analogy: LLVM Lessons from the compiler
 community: • Writing a front end is hard • Writing a back end is hard • Writing an optimizer is really hard • Most of the logic in the optimizer is independent of front end and back end • E.g. register assignment • The optimizer is a collection of separate algorithms • Common language between algorithms
  • 7. © Hortonworks Inc. 2015 Relational algebra SELECT d.name, COUNT(*) AS c FROM Emps AS e JOIN Depts AS d ON e.deptno = d.deptno WHERE e.age < 30 GROUP BY d.deptno HAVING COUNT(*) > 5 ORDER BY c DESC Scan [Emps] Scan [Depts] Join [e.deptno
 = d.deptno] Filter [e.age < 30] Aggregate [deptno, COUNT(*) AS c] Filter [c > 5] Project [name, c] Sort [c DESC] (Column names are simplified. They would usually
 be ordinals, e.g. $0 is the first column of the left input.)
  • 8. © Hortonworks Inc. 2015 Relational algebra - Union and sub-query SELECT * FROM (
 SELECT zipcode, state
 FROM Emps
 UNION ALL
 SELECT zipcode, state
 FROM Customers)
 WHERE state IN (‘CA’, ‘TX’) Scan [Emps] Scan [Customers] Union [all] Project [zipcode, state] Project [zipcode, state] Filter [state IN (‘CA’, ‘TX’)]
  • 9. © Hortonworks Inc. 2015 Relational algebra - Insert and Values INSERT INTO Facts
 VALUES (‘Meaning of life’, 42),
 (‘Clever as clever’, 6) Insert [Facts] Values [[‘Meaning of life’, 42], [‘Clever as clever’, 6]]
  • 10. © Hortonworks Inc. 2015 Relational algebra - Strict versus Pragmatic “Strict” relational algebra Introduced by E.F. Codd in “A relational model for large shared data banks” [1970] Goal is mathematical elegance (ability to prove theorems) Greek symbols: σ, π, ρ, U, Relations cannot contain duplicates Relations are not sorted Column values are scalars Only logical operators Pragmatic relational algebra Goal is to optimize queries, allow real- world data models, extensibility Elegance still important Verbs: Project, Filter, Union, Join Relations may contain duplicates Relations may be sorted • But Sort is the only logical operator that guarantees order Null values have 3-value semantics, as in SQL Physical operators (e.g. HashJoin, MergeJoin) Physical properties (sort, distribution)
  • 11. © Hortonworks Inc. 2015 Algebraic transformations (R filter c1) filter c2 → R filter (c1 and c2) (R1 union R2) join R3 on c → (R1 join R3 on C) union (R2 join R3 on c) • Compare distributive law of arithmetic: (x + y) * z → (x * z) + (y * z) (R1 join R2 on c) filter c2 → (R1 filter c2) join R2 on c (R1 join R2 on c) → (R2 join R2 on c) project [R1.*, R2.*] (R1 join R2 on c) join R3 on c2 → R1 join (R2 join R3 on c2) on c Many, many others… (provided C2 only depends on columns in E, and join is inner) (provided c, c2 have the necessary columns)
  • 12. © Hortonworks Inc. 2015 Query using a view SELECT deptno, min(salary)
 FROM Managers
 WHERE age >= 50
 GROUP BY deptno CREATE VIEW Managers AS
 SELECT *
 FROM Emps 
 WHERE EXISTS (
 SELECT *
 FROM Emps AS underling
 WHERE underling.manager = emp.id) Scan [Emps] Join [$0, $5] Project [$0, $1, $2, $3] Filter [age >= 50] Aggregate [deptno, min(salary)] Scan [Managers] Aggregate [manager] Scan [Emps]
  • 13. © Hortonworks Inc. 2015 After view expansion SELECT deptno, min(salary)
 FROM Managers
 WHERE age >= 50
 GROUP BY deptno CREATE VIEW Managers AS
 SELECT *
 FROM Emps 
 WHERE EXISTS (
 SELECT *
 FROM Emps AS underling
 WHERE underling.manager = emp.id) Scan [Emps] Aggregate [manager] Join [$0, $5] Project [$0, $1, $2, $3] Filter [age >= 50] Aggregate [deptno, min(salary)] Scan [Emps]
  • 14. © Hortonworks Inc. 2015 After pushing down filter SELECT deptno, min(salary)
 FROM Managers
 WHERE age >= 50
 GROUP BY deptno CREATE VIEW Managers AS
 SELECT *
 FROM Emps 
 WHERE EXISTS (
 SELECT *
 FROM Emps AS underling
 WHERE underling.manager = emp.id) Scan [Emps] Scan [Emps] Join [$0, $5] Project [$0, $1, $2, $3] Filter [age >= 50] Aggregate [deptno, min(salary)]
  • 15. © Hortonworks Inc. 2015 Materialized view CREATE MATERIALIZED VIEW EmpSummary AS
 SELECT deptno,
 gender,
 COUNT(*) AS c,
 SUM(sal) AS s
 FROM Emps
 GROUP BY deptno, gender SELECT COUNT(*)
 FROM Emps
 WHERE deptno = 10
 AND gender = ‘M’ Scan [Emps] Aggregate [deptno, gender,
 COUNT(*), SUM(sal)] Scan [EmpSummary] = Scan [Emps] Filter [deptno = 10 AND gender = ‘M’] Aggregate [COUNT(*)]
  • 16. © Hortonworks Inc. 2015 Materialized view, step 2: Rewrite query to match CREATE MATERIALIZED VIEW EmpSummary AS
 SELECT deptno,
 gender,
 COUNT(*) AS c,
 SUM(sal) AS s
 FROM Emps
 GROUP BY deptno, gender SELECT COUNT(*)
 FROM Emps
 WHERE deptno = 10
 AND gender = ‘M’ Scan [Emps] Aggregate [deptno, gender,
 COUNT(*), SUM(sal)] Scan [EmpSummary] = Scan [Emps] Filter [deptno = 10 AND gender = ‘M’] Aggregate [deptno, gender,
 COUNT(*) AS c, SUM(sal) AS s] Project [c]
  • 17. © Hortonworks Inc. 2015 Materialized view, step 3: Substitute table CREATE MATERIALIZED VIEW EmpSummary AS
 SELECT deptno,
 gender,
 COUNT(*) AS c,
 SUM(sal) AS s
 FROM Emps
 GROUP BY deptno, gender SELECT COUNT(*)
 FROM Emps
 WHERE deptno = 10
 AND gender = ‘M’ Scan [Emps] Aggregate [deptno, gender,
 COUNT(*), SUM(sal)] Scan [EmpSummary] = Filter [deptno = 10 AND gender = ‘M’] Project [c] Scan [EmpSummary]
  • 18. © Hortonworks Inc. 2015 Streaming SELECT STREAM DISTINCT productName,
 floor(rowtime TO HOUR) AS h
 FROM Orders Delta Converts a table to a stream Each time a row is inserted into the table, a record appears in the stream Chi Converts a stream into a table Often we can safely narrow the table down to a small time window Chi Aggregate [productName, h] Scan [Orders] Project [productName,
 floor(rowtime TO HOUR) AS h] Delta
  • 19. © Hortonworks Inc. 2015 Streaming - efficient implementation SELECT STREAM DISTINCT productName,
 floor(rowtime TO HOUR) AS h
 FROM Orders Can create efficient implementation: • Input is sorted by timestamp • Only need to aggregate an hour at a time • Output timestamp tracks input timestamp • Therefore it is safe to cancel out the Chi and Delta operators
 StreamingAggregate [productName, h] Scan [Orders] Project [productName,
 floor(rowtime TO HOUR) AS h]
  • 20. © Hortonworks Inc. 2015 Algebraic transformations - streaming delta(filter(c, R)) → filter(delta(c, R)) delta(project(e1, …, en, R) → project(delta(e1, …, en, R)) delta(union(R1, R2)) → union(delta(R1), delta(R2)) delta(join(R1, R2, c)) → union(join(R1, delta(R2), c),
 join(delta(R1), R2), c) Delta behaves like “differentiate” in differential calculus, Chi like “integrate”. (f + g)’ = f’ + g’ (f . g)’ = f.g’ + f’.g
  • 21. © Hortonworks Inc. 2015 Apache Calcite Apache
 Calcite Apache Calcite
  • 22. © Hortonworks Inc. 2015 Apache Calcite Apache incubator project since May, 2014 • Originally named Optiq Query planning framework • Relational algebra, rewrite rules, cost model • Extensible Packaging • Library (JDBC server optional) • Open source • Community-authored rules, adapters Adoption • Embedded: Lingual (SQL interface to Cascading), Apache Drill, Apache Hive, Kylin OLAP, Apache Phoenix, Apache Samza • Adapters: Splunk, Spark, MongoDB, JDBC, CSV, JSON, Web tables, In-memory data
  • 23. © Hortonworks Inc. 2015 Conventional DB architecture
  • 24. © Hortonworks Inc. 2015 Calcite architecture
  • 25. © Hortonworks Inc. 2015 Calcite – APIs and SPIs Cost, statistics RelOptCost RelOptCostFactory RelMetadataProvider • RelMdColumnUniquensss • RelMdDistinctRowCount • RelMdSelectivity SQL parser SqlNode
 SqlParser
 SqlValidator Transformation rules RelOptRule • MergeFilterRule • PushAggregateThroughUnionRule • 100+ more Global transformations • Unification (materialized view) • Column trimming • De-correlation • Join ordering Relational algebra RelNode (operator) • Scan • Filter • Project • Union • Aggregate • … RelDataType (type) RexNode (expression) RelTrait (physical property) • RelConvention (calling-convention) • RelCollation (sort-order) • RelDistribution (partitions) JDBC driver Metadata Schema Table Function • TableFunction • TableMacro Lattice
  • 26. © Hortonworks Inc. 2015 Data independence A core principle of data management Data independence is a contract: • Applications do not make assumptions about the location or organization of data • The DBMS chooses the most efficient access path Requires: • Declarative query language • Query planner Allows: • The DBMS (or administrator) can re-organize the data without breaking the application • Redundant copies of the data (indexes, materialized views, replicas) • Novel algorithms • Novel data formats and organizations (e.g. b-tree, r-tree, column store)
  • 27. © Hortonworks Inc. 2015 Disk Hadoop B2B1 B3 B4 Memory CPU Name node (HDFS) Application master (YARN) Zookeeper
  • 28. © Hortonworks Inc. 2015 Commodity hardware Storage, memory and CPU all scale as you add nodes N replicas of each block (typically 3) give redundancy & scheduling flexibility Disk Hadoop scales B2B1 B3 B4 Memory CPU Disk B3B1 B5 Memory CPU Disk B4B1 B5 Memory CPU Disk B3B2 B6 Memory CPU Disk B5B2 B6 Memory CPU B3
  • 29. © Hortonworks Inc. 2015 Data flow among operators running on nodes Nodes are assigned to work on blocks that have a replica locally Memory is used for file blocks and for scratch space (e.g. hash tables) Disk Hadoop query execution B2B1 B3 B4 Memory CPU Disk B3B1 B5 Memory CPU Disk B4B1 B5 Memory CPU Disk B3B2 B6 Memory CPU Disk B5B2 B6 Memory CPU B3 B1 B3 B4 B21 1 11
  • 30. © Hortonworks Inc. 2015 Data independence and Hadoop Hadoop is very flexible when data is loaded That flexibility has made it hard for the system to optimize access Materialized views are an opportunity to “crack” the data, and create copies in other formats
  • 31. Page‹#› © Hortonworks Inc. 2014 Calcite: Lattices and tiles Materialized view A table whose contents are guaranteed to be the same as executing a given query. Lattice Recommends, builds, and recognizes summary materialized views (tiles) based on a star schema. A query defines the tables and many:1 relationships in the star schema. Tile A summary materialized view that belongs to a lattice. A tile may or may not be materialized. Materialization methods: • Declare in lattice • Generate via recommender algorithm • Created in response to query CREATE MATERIALIZED VIEW t AS SELECT * FROM Emps WHERE deptno = 10; CREATE LATTICE star AS SELECT * FROM Sales AS s JOIN Products AS p ON … JOIN ProductClasses AS pc ON … JOIN Customers AS c ON … JOIN Time AS t ON …; CREATE MATERIALIZED VIEW zg IN star
 SELECT gender, zipcode, COUNT(*), SUM(unit_sales)
 FROM star
 GROUP BY gender, zipcode; (FAKE SYNTAX)
  • 32. © Hortonworks Inc. 2015 Query: SELECT x, SUM(y) FROM t GROUP BY x In-memory
 materialized queries Tables
 on disk Tiled, in-memory materializations Where we’re going… algebraic cache: http://hortonworks.com/blog/dmmq/
  • 33. © Hortonworks Inc. 2015 Summary 1. Relational algebra allows us to reason about queries, and is the foundation of query planning 2. Hadoop is deconstructing the DBMS, and enabling new languages, engines and data formats 3. Data independence is more important than ever 4. Apache Calcite - an implementation of relational algebra
  • 34. © Hortonworks Inc. 2015 Thank you! @julianhyde http://calcite.incubator.apache.org Apache
 Calcite Apache Calcite