SlideShare uma empresa Scribd logo
1 de 25
Drill / SQL / Optiq
     Julian Hyde
 Apache Drill User Group
     2013-03-13
SQL
SQL: Pros & cons
Fact:

  SQL is older than Macaulay Culkin

Less interesting but more relevant:

  Can be written by (lots of) humans

  Can be written by machines

  Requires query optimization

  Allows query optimization

  Based on “flat” relations and basic relational
  operations
Quick intro to Optiq
Introducing Optiq
Framework
Derived from LucidDB
Minimal query mediator:

    No storage

    No runtime

    No metadata

    Query planning engine

    Core operators & rewrite rules

    Optional SQL parser/validator
SELECT p.“product_name”, COUNT(*) AS c
Expression tree                        FROM “splunk”.”splunk” AS s
                                         JOIN “mysql”.”products” AS p
                                         ON s.”product_id” = p.”product_id”
                                       WHERE s.“action” = 'purchase'
                                       GROUP BY p.”product_name”
                                       ORDER BY c DESC
 Splunk
 Table: splunk
                                                    Key: product_name
                     Key: product_id                Agg: count
                                       Condition:                       Key: c DESC
                                         action =
                                       'purchase'
 scan
                          join
MySQL                                  filter          group            sort
    scan
                 Table: products
SELECT p.“product_name”, COUNT(*) AS c
Expression tree                      FROM “splunk”.”splunk” AS s
                                       JOIN “mysql”.”products” AS p
(optimized)                            ON s.”product_id” = p.”product_id”
                                     WHERE s.“action” = 'purchase'
                                     GROUP BY p.”product_name”
                                     ORDER BY c DESC
             Splunk
                        Condition:
 Table: splunk
                          action =
                        'purchase'                     Key: product_name
                                                       Agg: count
                                                                           Key: c DESC

                                     Key: product_id
 scan                   filter

MySQL
                                     join                 group            sort
   scan
                 Table: products
Metadata SPI

    interface Table
    −   RelDataType getRowType()

    interface TableFunction
    −   List<Parameter> getParameters()
    −   Table apply(List arguments)
    −   e.g. ViewTableFunction

    interface Schema
     − Map<String, List<TableFunction>>
       getTableFunctions()
Operators and rules

    Rule: interface RelOptRule

    Operator: interface RelNode

    Core operators: TableAccess, Project,
    Filter, Join, Aggregate, Order, Union,
    Intersect, Minus, Values

    Some rules: MergeFilterRule,
    PushAggregateThroughUnionRule,
    RemoveCorrelationForScalarProjectRule
    + 100 more
Planning algorithm

    Start with a logical plan and a set of
    rewrite rules

    Form a list of rewrite rules that are
    applicable (based on pattern-matching)

    Fire the rule that is likely to do the most
    good

    Rule generates an expression that is
    equivalent (and hopefully cheaper)

    Queue up new rule matches

    Repeat until cheap enough
Concepts

    Cost

    Equivalence sets

    Calling convention

    Logical vs Physical

    Traits

    Implementation
Outside the kernel
                                      JDBC client

    SQL
    parser/validator
                                    JDBC server

    JDBC driver            Optional SQL parser /          Metadata
                                       validator            SPI

    SQL function             Core
                                        Query            Pluggable
    library (validation               3
                                       planner
                                        rd
                                              3
                                              rd
                                                           rules

    + code-               Pluggable  party party
    generation)       3 party
                            rd        ops ops
                                                    3rd party
                        data                          data

    Lingual
    (Cascading
    adapter)

    Splunk adapter

    Drill adapter
Optiq roadmap

    Building blocks for analytic DB:
    −   In-memory tables in a distributed cache
    −   Materialized views
    −   Partitioned tables

    Faster planning

    Easier rule development

    ODBC driver

    Adapters for XXX, YYY
Applying Optiq to Drill
  1. Enhance SQL
 2. Query translation
Drill vs Traditional SQL

    SQL:
    −   Flat data
    −   Schema up front

    Drill:
    −   Nested data (list & map)
    −   No schema

    We'd like to write:
    −   SELECT name, toppings[2] FROM donuts
        WHERE ppu > 0.6

    Solution: ARRAY, MAP, ANY types
ARRAY & MAP SQL types

  ARRAY is like java.util.List

  MAP is like java.util.LinkedHashMap
Examples:

  VALUES ARRAY ['a', 'b', 'c']

  VALUES MAP ['Washington', 1, 'Obama', 44]

  SELECT name, address[1], address[2], state
  FROM Employee

  SELECT * FROM Donuts WHERE
  CAST(donuts['ppu'] AS DOUBLE) > 0.6
ANY SQL type

    ANY means “type to be determined at runtime”

    Validator narrows down possible type based
    on operators used

    Similar to converting Java's type system into
    JavaScript's. (Not easy.)
Sugaring the donut
Query:

  SELECT c['ppu'], c['toppings'][1] FROM
  Donuts
Additional syntactic sugar:

  c.x means c['x']
So:

  CREATE TABLE Donuts(c ANY)

  SELECT c.ppu, c.toppings[1] FROM
  Donuts
Better:
UNNEST
Employees nested inside departments:

  CREATE TYPE employee (empno INT, name
  VARCHAR(30));

  CREATE TABLE dept (deptno INT, name
  VARCHAR(30),
   employees EMPLOYEE ARRAY);
Unnest:

  SELECT d.deptno, d.name, e.empno, e.name
  FROM department AS d
   CROSS JOIN UNNEST(d.employees) AS e
SQL standard provides other operations on
  collections:
Applying Optiq to Drill
   1. Enhance SQL
2. Query translation
Query translation
SQL:

  select d['name'] as name, d['xx'] as xx
  from (
    select _MAP['donuts'] as d from donuts)
  where cast(d['ppu'] as double) > 0.6

Drill:

  { head: { … },
      storage: { … },
      query: [ {
         op: “sequence”, do: [
           { op: “scan”, … selection: { path:
Planner log
Original rel:
AbstractConverter(subset=[rel#14:Subset#3.AR
 RAY], convention=[ARRAY])
 ProjectRel(subset=[rel#10:Subset#3.NONE],
 NAME=[ITEM($0, 'name')], XX=[ITEM($0,
 'xx')])
  FilterRel(subset=[rel#8:Subset#2.NONE],
 condition=[>(CAST(ITEM($0, 'ppu')):DOUBLE
 NOT NULL, 0.6)])
    ProjectRel(subset=[rel#6:Subset#1.NONE],
 D=[ITEM($0, 'donuts')])
     DrillScan(subset=[rel#4:Subset#0.DRILL],
Next

    Translate join, aggregate, sort, set ops

    Operator overloading with ANY

    Mondrian on Drill
Thank you!


https://github.com/julianhyde/share/tree/master/slides
https://github.com/julianhyde/incubator-drill
https://github.com/julianhyde/optiq
http://incubator.apache.org/drill

@julianhyde

Mais conteúdo relacionado

Mais procurados

Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Julian Hyde
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits allJulian Hyde
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overviewJulian Hyde
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache CalciteJordan Halterman
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Josh Elser
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveJulian Hyde
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can helpChristian Tzolov
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Julian Hyde
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?Julian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Optiq: a SQL front-end for everything
Optiq: a SQL front-end for everythingOptiq: a SQL front-end for everything
Optiq: a SQL front-end for everythingJulian Hyde
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartEvans Ye
 

Mais procurados (20)

Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)Why you care about
 relational algebra (even though you didn’t know it)
Why you care about
 relational algebra (even though you didn’t know it)
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Apache Calcite: One planner fits all
Apache Calcite: One planner fits allApache Calcite: One planner fits all
Apache Calcite: One planner fits all
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14Cost-based query optimization in Apache Hive 0.14
Cost-based query optimization in Apache Hive 0.14
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
What's new in Mondrian 4?
What's new in Mondrian 4?What's new in Mondrian 4?
What's new in Mondrian 4?
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
 
Optiq: a SQL front-end for everything
Optiq: a SQL front-end for everythingOptiq: a SQL front-end for everything
Optiq: a SQL front-end for everything
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 

Semelhante a Drill / SQL / Optiq

Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQLjeykottalam
 
Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0 Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0 Bryan Yang
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for TrainingBryan Yang
 
Odtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for youOdtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for youLuc Bors
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)Jerome Eteve
 
Meetup cassandra sfo_jdbc
Meetup cassandra sfo_jdbcMeetup cassandra sfo_jdbc
Meetup cassandra sfo_jdbczznate
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...Amazon Web Services
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowChetan Khatri
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQLSatoshi Nagayasu
 
Introduction to NoSQL Database
Introduction to NoSQL DatabaseIntroduction to NoSQL Database
Introduction to NoSQL DatabaseMohammad Alghanem
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016DataStax
 
Powering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraphPowering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraphScyllaDB
 
Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael NitschingerSpark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael NitschingerSpark Summit
 
Nko workshop - node js & nosql
Nko workshop - node js & nosqlNko workshop - node js & nosql
Nko workshop - node js & nosqlSimon Su
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesWalaa Hamdy Assy
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaSpark Summit
 
Living the Nomadic life - Nic Jackson
Living the Nomadic life - Nic JacksonLiving the Nomadic life - Nic Jackson
Living the Nomadic life - Nic JacksonParis Container Day
 

Semelhante a Drill / SQL / Optiq (20)

Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0 Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 
Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
Odtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for youOdtug2011 adf developers make the database work for you
Odtug2011 adf developers make the database work for you
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 
Meetup cassandra sfo_jdbc
Meetup cassandra sfo_jdbcMeetup cassandra sfo_jdbc
Meetup cassandra sfo_jdbc
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
Road to Analytics
Road to AnalyticsRoad to Analytics
Road to Analytics
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
Introduction to NoSQL Database
Introduction to NoSQL DatabaseIntroduction to NoSQL Database
Introduction to NoSQL Database
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
 
Powering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraphPowering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraph
 
Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael NitschingerSpark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael Nitschinger
 
Nko workshop - node js & nosql
Nko workshop - node js & nosqlNko workshop - node js & nosql
Nko workshop - node js & nosql
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
 
Living the Nomadic life - Nic Jackson
Living the Nomadic life - Nic JacksonLiving the Nomadic life - Nic Jackson
Living the Nomadic life - Nic Jackson
 

Mais de Julian Hyde

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteJulian Hyde
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Julian Hyde
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming languageJulian Hyde
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Julian Hyde
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityJulian Hyde
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're IncubatingJulian Hyde
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteJulian Hyde
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesJulian Hyde
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databasesJulian Hyde
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache CalciteJulian Hyde
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 

Mais de Julian Hyde (20)

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 

Último

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Último (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Drill / SQL / Optiq

  • 1. Drill / SQL / Optiq Julian Hyde Apache Drill User Group 2013-03-13
  • 2.
  • 3. SQL
  • 4. SQL: Pros & cons Fact:  SQL is older than Macaulay Culkin Less interesting but more relevant:  Can be written by (lots of) humans  Can be written by machines  Requires query optimization  Allows query optimization  Based on “flat” relations and basic relational operations
  • 6. Introducing Optiq Framework Derived from LucidDB Minimal query mediator:  No storage  No runtime  No metadata  Query planning engine  Core operators & rewrite rules  Optional SQL parser/validator
  • 7. SELECT p.“product_name”, COUNT(*) AS c Expression tree FROM “splunk”.”splunk” AS s JOIN “mysql”.”products” AS p ON s.”product_id” = p.”product_id” WHERE s.“action” = 'purchase' GROUP BY p.”product_name” ORDER BY c DESC Splunk Table: splunk Key: product_name Key: product_id Agg: count Condition: Key: c DESC action = 'purchase' scan join MySQL filter group sort scan Table: products
  • 8. SELECT p.“product_name”, COUNT(*) AS c Expression tree FROM “splunk”.”splunk” AS s JOIN “mysql”.”products” AS p (optimized) ON s.”product_id” = p.”product_id” WHERE s.“action” = 'purchase' GROUP BY p.”product_name” ORDER BY c DESC Splunk Condition: Table: splunk action = 'purchase' Key: product_name Agg: count Key: c DESC Key: product_id scan filter MySQL join group sort scan Table: products
  • 9. Metadata SPI  interface Table − RelDataType getRowType()  interface TableFunction − List<Parameter> getParameters() − Table apply(List arguments) − e.g. ViewTableFunction  interface Schema − Map<String, List<TableFunction>> getTableFunctions()
  • 10. Operators and rules  Rule: interface RelOptRule  Operator: interface RelNode  Core operators: TableAccess, Project, Filter, Join, Aggregate, Order, Union, Intersect, Minus, Values  Some rules: MergeFilterRule, PushAggregateThroughUnionRule, RemoveCorrelationForScalarProjectRule + 100 more
  • 11. Planning algorithm  Start with a logical plan and a set of rewrite rules  Form a list of rewrite rules that are applicable (based on pattern-matching)  Fire the rule that is likely to do the most good  Rule generates an expression that is equivalent (and hopefully cheaper)  Queue up new rule matches  Repeat until cheap enough
  • 12. Concepts  Cost  Equivalence sets  Calling convention  Logical vs Physical  Traits  Implementation
  • 13. Outside the kernel JDBC client  SQL parser/validator JDBC server  JDBC driver Optional SQL parser / Metadata validator SPI  SQL function Core Query Pluggable library (validation 3 planner rd 3 rd rules + code- Pluggable party party generation) 3 party rd ops ops 3rd party data data  Lingual (Cascading adapter)  Splunk adapter  Drill adapter
  • 14. Optiq roadmap  Building blocks for analytic DB: − In-memory tables in a distributed cache − Materialized views − Partitioned tables  Faster planning  Easier rule development  ODBC driver  Adapters for XXX, YYY
  • 15. Applying Optiq to Drill 1. Enhance SQL 2. Query translation
  • 16. Drill vs Traditional SQL  SQL: − Flat data − Schema up front  Drill: − Nested data (list & map) − No schema  We'd like to write: − SELECT name, toppings[2] FROM donuts WHERE ppu > 0.6  Solution: ARRAY, MAP, ANY types
  • 17. ARRAY & MAP SQL types  ARRAY is like java.util.List  MAP is like java.util.LinkedHashMap Examples:  VALUES ARRAY ['a', 'b', 'c']  VALUES MAP ['Washington', 1, 'Obama', 44]  SELECT name, address[1], address[2], state FROM Employee  SELECT * FROM Donuts WHERE CAST(donuts['ppu'] AS DOUBLE) > 0.6
  • 18. ANY SQL type  ANY means “type to be determined at runtime”  Validator narrows down possible type based on operators used  Similar to converting Java's type system into JavaScript's. (Not easy.)
  • 19. Sugaring the donut Query:  SELECT c['ppu'], c['toppings'][1] FROM Donuts Additional syntactic sugar:  c.x means c['x'] So:  CREATE TABLE Donuts(c ANY)  SELECT c.ppu, c.toppings[1] FROM Donuts Better:
  • 20. UNNEST Employees nested inside departments:  CREATE TYPE employee (empno INT, name VARCHAR(30));  CREATE TABLE dept (deptno INT, name VARCHAR(30), employees EMPLOYEE ARRAY); Unnest:  SELECT d.deptno, d.name, e.empno, e.name FROM department AS d CROSS JOIN UNNEST(d.employees) AS e SQL standard provides other operations on collections:
  • 21. Applying Optiq to Drill 1. Enhance SQL 2. Query translation
  • 22. Query translation SQL:  select d['name'] as name, d['xx'] as xx from ( select _MAP['donuts'] as d from donuts) where cast(d['ppu'] as double) > 0.6 Drill:  { head: { … }, storage: { … }, query: [ { op: “sequence”, do: [ { op: “scan”, … selection: { path:
  • 23. Planner log Original rel: AbstractConverter(subset=[rel#14:Subset#3.AR RAY], convention=[ARRAY]) ProjectRel(subset=[rel#10:Subset#3.NONE], NAME=[ITEM($0, 'name')], XX=[ITEM($0, 'xx')]) FilterRel(subset=[rel#8:Subset#2.NONE], condition=[>(CAST(ITEM($0, 'ppu')):DOUBLE NOT NULL, 0.6)]) ProjectRel(subset=[rel#6:Subset#1.NONE], D=[ITEM($0, 'donuts')]) DrillScan(subset=[rel#4:Subset#0.DRILL],
  • 24. Next  Translate join, aggregate, sort, set ops  Operator overloading with ANY  Mondrian on Drill

Notas do Editor

  1. It&apos;s much more efficient if we psuh filters and aggregations to Splunk. But the user writing SQL shouldn&apos;t have to worry about that. This is not about processing data. This is about processing expressions. Reformulating the question. The question is the parse tree of a query. The parse tree is a data flow. In Splunk, a data flow looks like a pipeline of Linux commands. SQL systems have pipelines too (sometimes they are dataflow trees) built up of the basic relational operators. Think of the SQL SELECT, WHERE, JOIN, GROUP BY, ORDER BY clauses.
  2. It&apos;s much more efficient if we psuh filters and aggregations to Splunk. But the user writing SQL shouldn&apos;t have to worry about that. This is not about processing data. This is about processing expressions. Reformulating the question. The question is the parse tree of a query. The parse tree is a data flow. In Splunk, a data flow looks like a pipeline of Linux commands. SQL systems have pipelines too (sometimes they are dataflow trees) built up of the basic relational operators. Think of the SQL SELECT, WHERE, JOIN, GROUP BY, ORDER BY clauses.