SlideShare uma empresa Scribd logo
1 de 37
Baixar para ler offline
SQL Now!
NoSQL Now!
San Jose, California
August, 2013
Julian Hyde @julianhyde
About me
• Database: Oracle, Broadbase
• Streaming query: SQLstream
• Open source BI: Mondrian / Pentaho
• Open source SQL: LucidDB, Optiq
• Contributor to Apache Drill, Cascading
Lingual
http://manning.com/back/
http://manning.com/back/
45% off code
mlnosql13
3 things
1. Modern data challenges need both NoSQL
and SQL.
2. Optiq is not a database (and this is a Good
Thing).
3. How to use Optiq with your data.
A disturbance in the Force
SQL has been around a
very, very long time
NoSQL: trade-offs vs
SQL
Gain Lose
Scale out Optimization
Flexibility Data independence
Productivity Central control
Purchase cost Tool integration
SQL: key features
• Inspired by Codd’s 1970 “relational
database” paper
• Semantics based on relational operators
(scan, filter, project, join, agg, union, sort)
• All implementations transform queries
• Optimizer rewrites queries onto available
data structures and operators
Optiq
Optiq design principles
1. Do the stuff that SQL does well
2. Let other systems do what they do well
Conventional DB architecture
Optiq architecture
Examples
Example #1: CSV
• Uses CSV adapter (optiq-csv)
• Demo using sqlline
• Easy to run this for yourself:
$ git clone https://github.com/julianhyde/optiq-csv
$ cd optiq-csv
$ mvn install
$ ./sqlline
!connect jdbc:optiq:model=target/test-classes/model.json admin admin
!tables
!describe emps
SELECT * FROM emps;
EXPLAIN PLAN FOR SELECT * FROM emps;
SELECT depts.name, count(*)
FROM emps JOIN depts USING (deptno)
GROUP BY depts.name;
model.json:
{
version: '1.0',
defaultSchema: 'SALES',
schemas: [
{
name: 'SALES',
type: 'custom',
factory: 'net.hydromatic.optiq.impl.csv.CsvSchemaFactory',
operand: {
directory: 'target/test-classes/sales'
}
}
]
}
More adapters
Adapters Embedded Planned
CSV Cascading (Lingual) HBase (Phoenix)
JDBC Apache Drill Spark
MongoDB Cassandra
Splunk Mondrian
linq4j
SQL on NoSQL
• No schema or schema-on-read
• Optiq MAP columns, dynamic schema
• Nested data (Drill, MongoDB)
• Optiq ARRAY columns
• Source system optimized for transactions &
focused reads, not analytic queries
• Optiq cache and materializations
• Mongo’s standard “zips” data set with all US
zip codes
• Raw ZIPS table with _MAP column
• ZIPS view extracts named fields, including
nested latitude and longitude, and converts
them to SQL types
Example #2: MongoDB
Splunk
• NoSQL database
• Every log file in the enterprise
• A single “table”
• A record for every line in every log file
• A column for every field that exists in
any log file
• No schema
Example #3:
Splunk + MySQL
SELECT p.product_name,
COUNT(*) AS c
FROM splunk.splunk AS s
JOIN mysql.products AS p
ON s.product_id = p.product_id
WHERE s.action = 'purchase'
GROUP BY p.product_name
ORDER BY c DESC
Expression tree
Optimized tree
Analytics
Interactive analytics on
NoSQL?
• NoSQL operational DB (e.g. HBase,
MongoDB, Cassandra)
• Analytic queries aggregate over full scan
• Speed-of-thought response (< 5 seconds)
• Data freshness (< 10 minutes)
Simple analytics
problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
Simple analytics
problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
Simple analytics
problem
• 100M U.S. census records
• 1KB each record, 100GB total
• 4 SATA3 disks, total 1.2GB/s
• How to count all records in under 5s?
• Not possible?! It takes 80s just to read the
data.
Solution: Cheat!
Solution: Cheat!
• Compress data
• Column-oriented storage
• Store data in sorted order
• Put data in memory
• Cache previous query results
• Pre-compute (materialize) aggregates
Mondrian on Optiq on
NoSQL (under construction)
Hybrid analytic
architecture
1. NoSQL source system
2. Access via Optiq SQL
3. Optiq rewrites queries to use materialized data
(e.g. aggregate tables)
4. Cached results are treated as “in-memory
tables”
5. Materializations offline dynamically as underlying
data changes, and go online as they are refreshed
Summary
3 things (reprise)
1. SQL allows you to reorganize your data and
optimize your queries—while still using your
NoSQL database for what it does best.
2. Optiq is not a database. It lets you create
very powerful federated data architectures.
3. Access your data using Optiq adapters.
Write schemas in JSON or use schema SPI.
Connect via JDBC.
Questions?
Thanks!
@julianhyde
optiq https://github.com/julianhyde/optiq
optiq-csv https://github.com/julianhyde/optiq-csv
drill http://incubator.apache.org/drill/
lingual http://www.cascading.org/lingual/
mondrian http://mondrian.pentaho.com
blog http://julianhyde.blogspot.com/
book http://manning.com/back/ 45% off code mlnosql13

Mais conteúdo relacionado

Mais procurados

Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartEvans Ye
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
 
How to integrate Splunk with any data solution
How to integrate Splunk with any data solutionHow to integrate Splunk with any data solution
How to integrate Splunk with any data solutionJulian Hyde
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can helpChristian Tzolov
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memoryJulian Hyde
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Christian Tzolov
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / OptiqJulian Hyde
 
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Julian Hyde
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache CalciteJordan Halterman
 
Optiq: a SQL front-end for everything
Optiq: a SQL front-end for everythingOptiq: a SQL front-end for everything
Optiq: a SQL front-end for everythingJulian Hyde
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 

Mais procurados (20)

Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
ONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smartONE FOR ALL! Using Apache Calcite to make SQL smart
ONE FOR ALL! Using Apache Calcite to make SQL smart
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
How to integrate Splunk with any data solution
How to integrate Splunk with any data solutionHow to integrate Splunk with any data solution
How to integrate Splunk with any data solution
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
 
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)Mondrian update (Pentaho community meetup 2012, Amsterdam)
Mondrian update (Pentaho community meetup 2012, Amsterdam)
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Optiq: a SQL front-end for everything
Optiq: a SQL front-end for everythingOptiq: a SQL front-end for everything
Optiq: a SQL front-end for everything
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 

Semelhante a SQL Now! How Optiq brings the best of SQL to NoSQL data.

Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at nightMichael Yarichuk
 
Dev nexus 2017
Dev nexus 2017Dev nexus 2017
Dev nexus 2017Roy Russo
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMinsk MongoDB User Group
 
Searching Billions of Product Logs in Real Time (Use Case)
Searching Billions of Product Logs in Real Time (Use Case)Searching Billions of Product Logs in Real Time (Use Case)
Searching Billions of Product Logs in Real Time (Use Case)Ryan Tabora
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
2008 2086 Gangler
2008 2086 Gangler2008 2086 Gangler
2008 2086 GanglerSecure-24
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL OverviewTu Hoang
 
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...Lviv Startup Club
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillMapR Technologies
 
Devnexus 2018
Devnexus 2018Devnexus 2018
Devnexus 2018Roy Russo
 
Optimizing MySQL for Cascade Server
Optimizing MySQL for Cascade ServerOptimizing MySQL for Cascade Server
Optimizing MySQL for Cascade Serverhannonhill
 
Object Relational Database Management System
Object Relational Database Management SystemObject Relational Database Management System
Object Relational Database Management SystemAmar Myana
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherObjectRocket
 

Semelhante a SQL Now! How Optiq brings the best of SQL to NoSQL data. (20)

Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
 
Dev nexus 2017
Dev nexus 2017Dev nexus 2017
Dev nexus 2017
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
 
Searching Billions of Product Logs in Real Time (Use Case)
Searching Billions of Product Logs in Real Time (Use Case)Searching Billions of Product Logs in Real Time (Use Case)
Searching Billions of Product Logs in Real Time (Use Case)
 
Wmware NoSQL
Wmware NoSQLWmware NoSQL
Wmware NoSQL
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
2008 2086 Gangler
2008 2086 Gangler2008 2086 Gangler
2008 2086 Gangler
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL Overview
 
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
Vitalii Bondarenko - Масштабована бізнес-аналітика у Cloud Big Data Cluster. ...
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
Swiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache DrillSwiss Big Data User Group - Introduction to Apache Drill
Swiss Big Data User Group - Introduction to Apache Drill
 
Elasticsearch 5.0
Elasticsearch 5.0Elasticsearch 5.0
Elasticsearch 5.0
 
Devnexus 2018
Devnexus 2018Devnexus 2018
Devnexus 2018
 
Optimizing MySQL for Cascade Server
Optimizing MySQL for Cascade ServerOptimizing MySQL for Cascade Server
Optimizing MySQL for Cascade Server
 
Object Relational Database Management System
Object Relational Database Management SystemObject Relational Database Management System
Object Relational Database Management System
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 

Mais de Julian Hyde

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteJulian Hyde
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Julian Hyde
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming languageJulian Hyde
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Julian Hyde
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityJulian Hyde
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're IncubatingJulian Hyde
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteJulian Hyde
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesJulian Hyde
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databasesJulian Hyde
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache CalciteJulian Hyde
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 

Mais de Julian Hyde (20)

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...Is there a perfect data-parallel programming language? (Experiments with More...
Is there a perfect data-parallel programming language? (Experiments with More...
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
Data profiling with Apache Calcite
Data profiling with Apache CalciteData profiling with Apache Calcite
Data profiling with Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 

Último

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Último (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

SQL Now! How Optiq brings the best of SQL to NoSQL data.

  • 1. SQL Now! NoSQL Now! San Jose, California August, 2013 Julian Hyde @julianhyde
  • 2. About me • Database: Oracle, Broadbase • Streaming query: SQLstream • Open source BI: Mondrian / Pentaho • Open source SQL: LucidDB, Optiq • Contributor to Apache Drill, Cascading Lingual
  • 5. 3 things 1. Modern data challenges need both NoSQL and SQL. 2. Optiq is not a database (and this is a Good Thing). 3. How to use Optiq with your data.
  • 6. A disturbance in the Force
  • 7. SQL has been around a very, very long time
  • 8. NoSQL: trade-offs vs SQL Gain Lose Scale out Optimization Flexibility Data independence Productivity Central control Purchase cost Tool integration
  • 9. SQL: key features • Inspired by Codd’s 1970 “relational database” paper • Semantics based on relational operators (scan, filter, project, join, agg, union, sort) • All implementations transform queries • Optimizer rewrites queries onto available data structures and operators
  • 10. Optiq
  • 11. Optiq design principles 1. Do the stuff that SQL does well 2. Let other systems do what they do well
  • 15. Example #1: CSV • Uses CSV adapter (optiq-csv) • Demo using sqlline • Easy to run this for yourself: $ git clone https://github.com/julianhyde/optiq-csv $ cd optiq-csv $ mvn install $ ./sqlline
  • 16. !connect jdbc:optiq:model=target/test-classes/model.json admin admin !tables !describe emps SELECT * FROM emps; EXPLAIN PLAN FOR SELECT * FROM emps; SELECT depts.name, count(*) FROM emps JOIN depts USING (deptno) GROUP BY depts.name; model.json: { version: '1.0', defaultSchema: 'SALES', schemas: [ { name: 'SALES', type: 'custom', factory: 'net.hydromatic.optiq.impl.csv.CsvSchemaFactory', operand: { directory: 'target/test-classes/sales' } } ] }
  • 17. More adapters Adapters Embedded Planned CSV Cascading (Lingual) HBase (Phoenix) JDBC Apache Drill Spark MongoDB Cassandra Splunk Mondrian linq4j
  • 18. SQL on NoSQL • No schema or schema-on-read • Optiq MAP columns, dynamic schema • Nested data (Drill, MongoDB) • Optiq ARRAY columns • Source system optimized for transactions & focused reads, not analytic queries • Optiq cache and materializations
  • 19. • Mongo’s standard “zips” data set with all US zip codes • Raw ZIPS table with _MAP column • ZIPS view extracts named fields, including nested latitude and longitude, and converts them to SQL types Example #2: MongoDB
  • 20. Splunk • NoSQL database • Every log file in the enterprise • A single “table” • A record for every line in every log file • A column for every field that exists in any log file • No schema
  • 21. Example #3: Splunk + MySQL SELECT p.product_name, COUNT(*) AS c FROM splunk.splunk AS s JOIN mysql.products AS p ON s.product_id = p.product_id WHERE s.action = 'purchase' GROUP BY p.product_name ORDER BY c DESC
  • 25. Interactive analytics on NoSQL? • NoSQL operational DB (e.g. HBase, MongoDB, Cassandra) • Analytic queries aggregate over full scan • Speed-of-thought response (< 5 seconds) • Data freshness (< 10 minutes)
  • 26. Simple analytics problem • 100M U.S. census records • 1KB each record, 100GB total • 4 SATA3 disks, total 1.2GB/s • How to count all records in under 5s?
  • 27. Simple analytics problem • 100M U.S. census records • 1KB each record, 100GB total • 4 SATA3 disks, total 1.2GB/s • How to count all records in under 5s?
  • 28. Simple analytics problem • 100M U.S. census records • 1KB each record, 100GB total • 4 SATA3 disks, total 1.2GB/s • How to count all records in under 5s? • Not possible?! It takes 80s just to read the data.
  • 30. Solution: Cheat! • Compress data • Column-oriented storage • Store data in sorted order • Put data in memory • Cache previous query results • Pre-compute (materialize) aggregates
  • 31.
  • 32. Mondrian on Optiq on NoSQL (under construction)
  • 33. Hybrid analytic architecture 1. NoSQL source system 2. Access via Optiq SQL 3. Optiq rewrites queries to use materialized data (e.g. aggregate tables) 4. Cached results are treated as “in-memory tables” 5. Materializations offline dynamically as underlying data changes, and go online as they are refreshed
  • 35. 3 things (reprise) 1. SQL allows you to reorganize your data and optimize your queries—while still using your NoSQL database for what it does best. 2. Optiq is not a database. It lets you create very powerful federated data architectures. 3. Access your data using Optiq adapters. Write schemas in JSON or use schema SPI. Connect via JDBC.
  • 37. Thanks! @julianhyde optiq https://github.com/julianhyde/optiq optiq-csv https://github.com/julianhyde/optiq-csv drill http://incubator.apache.org/drill/ lingual http://www.cascading.org/lingual/ mondrian http://mondrian.pentaho.com blog http://julianhyde.blogspot.com/ book http://manning.com/back/ 45% off code mlnosql13