The Volcano/Cascades Optimizer
Eric Fu
2018-11-14
Outline
● Background
● Dynamic Programming
● Components
● Search Engine
● Summary
Life of SQL
[Figure: SQL → Parser → Optimizer → Executor → data. Intermediate forms: syntax tree, logical plan, physical plan. The optimizer also takes statistics as input.]
● Parser
● Optimizer
● Executor
Query Optimization Strategies
● Choice #1: Heuristics
○ INGRES, Oracle (until mid 1990s)
● Choice #2: Heuristics + Cost-based Join Search
○ System R, early IBM DB2, most open-source DBMSs
● Choice #3: Randomized Search
○ Academics in the 1980s, current Postgres
● Choice #4: Stratified Search
○ IBM’s STARBURST (late 1980s), now IBM DB2 + Oracle
● Choice #5: Unified Search
○ Volcano/Cascades in 1990s, now MSSQL + Greenplum
Problem
● Why is query optimization such a hard problem?
● It’s not difficult to find a feasible solution
● It’s almost impossible to find an optimal solution
Why So Many Choices?
● Equivalence Rules
● Various Implementations
[Figure: three equivalent join trees over tables A, B, C, D, including the left-deep tree ((A ⨝ B) ⨝ C) ⨝ D]
Possible join orders:
ABCD, ABDC, ACBD, ACDB, ADBC, ADCB,
BACD, BADC, BCAD, BCDA, BDAC, BDCA,
CABD, CADB, CBAD, CBDA, CDAB, CDBA,
DABC, DACB, DBAC, DBCA, DCAB, DCBA
Why So Many Choices?
● Equivalence Rules
● Various Implementations
○ Join: HashJoin, NestedLoopJoin, SortMergeJoin
○ Scan: IndexScan, TableScan
In total: 24 join orders × 3 tree shapes × 2^4 scan choices × 3^3 join choices = 31,104 plans!
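The total of 31,104 can be reproduced with a few lines of Python. The meaning given to each factor below (join orders, tree shapes, scan choices, join choices) is an assumed reading of the slide, which only shows the bare product.

```python
from math import factorial

tables = 4                          # A, B, C, D
join_orders = factorial(tables)     # 24 permutations of the four tables
tree_shapes = 3                     # assumed: the three join-tree shapes drawn on the slide
scan_choices = 2 ** tables          # IndexScan or TableScan for each table
join_choices = 3 ** (tables - 1)    # one of three join algorithms for each of the 3 joins

total_plans = join_orders * tree_shapes * scan_choices * join_choices
print(total_plans)
```

Even for four tables the count is already in the tens of thousands, which is why exhaustive enumeration does not scale.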
Which one is better?
● Given a physical plan, we can estimate its total cost
● The cost of an operator depends on its number of input rows
● Selectivity factors estimate the fraction of rows that pass each predicate
SELECT *
FROM Reviews
WHERE 7/1 < date < 7/31 AND
rating > 9
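As a rough illustration of selectivity factors, the sketch below estimates how many rows the query above might return. Every number in it is an assumption made up for the example: one million rows in Reviews, dates spread uniformly over a 365-day year, ratings spread uniformly over 0 to 10, and independent predicates. The helper name `range_selectivity` is hypothetical as well.

```python
def range_selectivity(lo, hi, col_min, col_max):
    """Fraction of a uniformly distributed column that falls between lo and hi."""
    width = col_max - col_min
    overlap = max(0.0, min(hi, col_max) - max(lo, col_min))
    return overlap / width if width > 0 else 0.0

# Hypothetical statistics, not taken from the slides.
total_rows = 1_000_000
date_sel = range_selectivity(182, 212, 1, 365)   # 7/1 .. 7/31 as day-of-year
rating_sel = range_selectivity(9, 10, 0, 10)     # rating > 9 on a 0..10 scale

# Independence assumption: multiply the selectivities of AND-ed predicates.
combined_sel = date_sel * rating_sel
estimated_rows = total_rows * combined_sel
print(round(estimated_rows))
```

Real optimizers typically refine such estimates with histograms and other statistics; the uniform and independence assumptions are only the simplest starting point.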
Summary of Background
Good News
● We know how to construct the search space
Bad News
● It’s almost impossible to exhaust the search space
● We need an elegant & smart way to do the search
Dynamic Programming
in Algorithms
Dynamic Programming
● You are climbing a staircase. It takes n steps to reach the top.
● Each time you can climb either 1 or 2 steps
● In how many distinct ways can you climb to the top?
Steps: 0 1 2 3 4 5 6 7 8 9 10
Ways: 1 1 2 3 5 8 13 21 34 55 89
It’s also fine to go in reverse: start from the top step (10), mark the sub-problems it depends on as unknown (?), and fill them in as the recursion returns. The finished table is the same.
Define Dynamic Programming (DP)
● DP solves a problem by solving its sub-problems
● DP is only applicable to problems with optimal substructure
○ The optimal solution of the current problem can be computed from the optimal solutions of its sub-problems
● DP can be done in both directions
○ Filling a table
○ DFS with memo
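The two directions can be sketched on the staircase problem. This is a minimal illustration; the function names `climb_bottom_up` and `climb_top_down` are made up for this example.

```python
from functools import lru_cache

def climb_bottom_up(n: int) -> int:
    """Filling a table: compute ways[0..n] from the smallest step upward."""
    ways = [1, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        ways[i] = ways[i - 1] + ways[i - 2]
    return ways[n]

@lru_cache(maxsize=None)
def climb_top_down(n: int) -> int:
    """DFS with memo: start from n and recurse into smaller sub-problems."""
    if n <= 1:
        return 1
    return climb_top_down(n - 1) + climb_top_down(n - 2)

print(climb_bottom_up(10), climb_top_down(10))  # both print 89
```

Both versions solve each sub-problem once; they differ only in the order in which the table is filled.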
DP in Searching
● Find the minimum path sum from top to bottom
● Each step you may move to adjacent numbers on the row below
Triangle:
2
3 4
6 5 7
4 1 8 3
Minimum path sum from each cell down to the bottom row, filled in from the bottom up:
11
9 10
7 6 10
4 1 8 3
The same table can also be filled top-down, starting from the unknown (?) top cell.
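A minimal sketch of the bottom-up search on the triangle from this slide. The function name `minimum_path_sum` is hypothetical.

```python
def minimum_path_sum(triangle):
    """Return the minimum top-to-bottom path sum, moving to adjacent cells below."""
    # Start with the bottom row: the best sum from a bottom cell is the cell itself.
    best = list(triangle[-1])
    # Walk upward; each cell adds the cheaper of its two children.
    for row in reversed(triangle[:-1]):
        best = [value + min(best[i], best[i + 1]) for i, value in enumerate(row)]
    return best[0]

triangle = [
    [2],
    [3, 4],
    [6, 5, 7],
    [4, 1, 8, 3],
]
print(minimum_path_sum(triangle))  # prints 11 (2 + 3 + 5 + 1)
```

Only one row of partial results is kept at a time, because each row depends only on the row directly below it.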
Apply DP in Optimization?
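SELECT * FROM A, B WHERE A.bid = B.bid ORDER BY A.bid
[Figure: the logical plan Sort(Join(A, B)) and two physical plans. One is a Sort on top of HashJoin(Scan A, Scan B); the other is a SortMergeJoin over Scan A, Scan B, and a Sort. Labels: "Order by aid", "Order by bid", "Order by bid", "Optimal Plan!", and on the following slide "Optimal Plan of [AB]".]
You cannot just apply DP straightforwardly: the cheapest plan for the sub-problem [AB] is not necessarily part of the cheapest overall plan, because a plan that already delivers the required order can skip a later Sort.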
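To make the pitfall concrete, here is a minimal sketch with made-up costs. The numbers and the `sorted_by_bid` flag are assumptions chosen only so that the cheapest join for [AB] differs from the join used in the cheapest complete plan.

```python
SORT_COST = 50  # hypothetical cost of sorting the join result by bid

# Hypothetical candidate plans for the sub-problem [AB].
plans_for_ab = {
    "HashJoin": {"cost": 100, "sorted_by_bid": False},
    "SortMergeJoin": {"cost": 120, "sorted_by_bid": True},
}

def total_cost(plan):
    """Cost of the whole query: the join, plus a Sort if the order is missing."""
    return plan["cost"] + (0 if plan["sorted_by_bid"] else SORT_COST)

# Naive DP keeps only the cheapest plan of the sub-problem.
cheapest_subplan = min(plans_for_ab, key=lambda name: plans_for_ab[name]["cost"])
# The query as a whole also has to pay for the ORDER BY.
cheapest_overall = min(plans_for_ab, key=lambda name: total_cost(plans_for_ab[name]))

print(cheapest_subplan, cheapest_overall)
```

With these numbers the sub-problem prefers HashJoin, while the complete query prefers SortMergeJoin, so a table keyed only by the relation set would throw away the plan that wins in the end.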
System-R Optimizer
● Dynamic Programming
● Interesting Orders
The main contribution: optimal substructure is defined so that DP is feasible.
[Figure: RelSet[ABCD] groups all 24 join orders of A, B, C, D (ABCD, ABDC, …, DCBA) and is subdivided by sort order: SortBy[A]ASC, SortBy[A]DESC, SortBy[B]ASC, ···]
Reference: Access Path Selection in a Relational Database Management System (SIGMOD 1979)
Optimal Substructures
● Based on the assumption that the cost function is polynomial
● Stores the best plan for each pair of (Relation Set, Physical Properties)
● Instead of O(n!) plans, only O(n·2^(n-1)) plans need to be enumerated
[Figure: each relation set (RelSet[ABCD], RelSet[ABC], RelSet[BCD]) keeps one best plan per order (Order1, Order2, Order3); one entry is marked as the Goal.]
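The enumeration bound can be seen in a small bottom-up sketch over relation sets. This is a simplified illustration, not System R's actual algorithm: it only builds left-deep plans, ignores interesting orders, and uses a toy cost (the sum of intermediate result sizes). The function name, the table sizes, and the selectivities are all assumptions.

```python
from itertools import combinations

def best_left_deep_plan(card, sel):
    """card: {table: rows}; sel: {(a, b): join selectivity} with a < b.
    Pairs missing from sel are treated as cross products (selectivity 1.0)."""
    names = sorted(card)
    # best[relation set] = (cost, output rows, plan string)
    best = {frozenset([n]): (0.0, float(card[n]), n) for n in names}
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            group = frozenset(subset)
            for last in subset:  # the table joined last
                rest = group - {last}
                rest_cost, rest_rows, rest_plan = best[rest]
                factor = 1.0
                for other in rest:
                    factor *= sel.get(tuple(sorted((other, last))), 1.0)
                out_rows = rest_rows * card[last] * factor
                cost = rest_cost + out_rows  # toy cost model
                if group not in best or cost < best[group][0]:
                    best[group] = (cost, out_rows, f"({rest_plan} JOIN {last})")
    return best[frozenset(names)]

# Hypothetical statistics.
card = {"A": 1000, "B": 100, "C": 10}
sel = {("A", "B"): 0.01, ("B", "C"): 0.1}
print(best_left_deep_plan(card, sel))
```

Each relation set of size k is extended in k ways, so the inner loop runs on the order of n·2^(n-1) times rather than once per permutation. Supporting interesting orders would mean keying `best` by (relation set, order) instead of by relation set alone.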
Volcano/Cascades Optimizer (1993)
● Implemented as a code generator (operators, rules, etc.) and a dynamic-link library (the search engine)
● Top-down Search (Directed Search)
○ Start with the final outcome that you want
○ Search path can be guided by heuristics
● By contrast, System-R’s approach is bottom-up
Goetz Graefe
● Volcano - An Extensible and Parallel Query Evaluation System (1990)
● The Volcano Optimizer Generator: Extensibility and Efficient Search (1991)
● The Cascades Framework for Query Optimization (1995)
Components
Operators
● logical operators
● algorithms
● enforcers
Rules
● transformation rules
● implementation rules
Properties
● logical properties
● physical properties
Interfaces of Operators
● property function
● applicability function (physical-only)
● cost function (physical-only)
Operators
● logical operators
○ e.g. Join, Scan
● algorithms
○ e.g. HashJoin, SortMergeJoin, FileScan, IndexScan
● enforcers
○ e.g. Sort, Shuffle
Rules
● transformation rules
○ The algebraic rules of expression equivalence
○ e.g. associativity rule, commutativity rule
● implementation rules
○ Rules mapping logical operators to algorithms
○ Possible to map multiple logical operators to a single physical operator
● Specify how to match rules to the plan tree
○ Simple pattern matching
○ Additional condition code is also allowed
Properties
● logical properties
○ Can be derived from the logical algebra expression
○ Attached to a logical equivalence set: [LogExpr]
○ e.g. schema, expected size
● physical properties
○ Depend on algorithms
○ Attached to a physical equivalence set: [LogExpr, PhyProp]
○ e.g. sort order, partitioning
○ PhyProp is the physical properties vector
Interfaces of Operators
● applicability function (only applicable to algorithms & enforcers)
○ Physical property vectors that it can deliver
○ Physical property vectors that its inputs must satisfy
● cost function (only applicable to algorithms & enforcers)
○ Estimates its cost
○ Cost is an abstract data type in Volcano, e.g. (CPU cost, IO cost)
● property function
○ Determines logical properties, e.g. schema, row count
■ selectivity estimate
○ Determines physical properties, e.g. sort order
Search Engine
Define goal as [LogExpr, PhysProp]
Logically we may divide the searching procedure into 2 stages:
1. Explore: Apply transformation rules to explore expression space
2. Build: Apply implementation rules to build physical plans and find best one
Explore
● Apply transformation rules to explore expression space
● e.g. [ABC] = { (A⨝B)⨝C, (B⨝A)⨝C, (A⨝C)⨝B …}
[Figure: Goal.LogExpr expands into the generated logical plans, e.g. (A⨝B)⨝C, (B⨝A)⨝C, ····]
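The exploration step can be sketched as repeatedly applying transformation rules until no new expression appears. In this minimal illustration a join is a Python 2-tuple and a table is a string; the rule functions `commute` and `associate` and the driver `explore` are hypothetical names, and join predicates are ignored, so cross products such as (A⨝C)⨝B are generated too.

```python
def commute(expr):
    """Join(a, b) -> Join(b, a)"""
    if isinstance(expr, tuple):
        a, b = expr
        yield (b, a)

def associate(expr):
    """Join(Join(a, b), c) -> Join(a, Join(b, c))"""
    if isinstance(expr, tuple) and isinstance(expr[0], tuple):
        (a, b), c = expr
        yield (a, (b, c))

def rewrites(expr):
    """Expressions reachable by one rule application at the root or in a sub-tree."""
    for rule in (commute, associate):
        yield from rule(expr)
    if isinstance(expr, tuple):
        left, right = expr
        for new_left in rewrites(left):
            yield (new_left, right)
        for new_right in rewrites(right):
            yield (left, new_right)

def explore(expr):
    """Apply the rules until the set of equivalent expressions stops growing."""
    seen = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for candidate in rewrites(current):
            if candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return seen

plans = explore((("A", "B"), "C"))
print(len(plans))  # prints 12
```

The `seen` set plays the role of duplicate detection: without it, commutativity alone would bounce between A⨝B and B⨝A forever.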
Build
● Apply implementation rules to build physical plans
● For every [LogExpr, PhyProp] record the physical plan to Memo table
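● e.g. [AB]⨝C ➡ SortMergeJoin vs. HashJoin
Memo Table
LogExpr   PhyProp   BestPlan
[ABC]     -
[ABC]     x⬆
[ABC]     x⬇
[AB]      -
…         …
[Figure: three candidate plans for [AB]⨝C, each labeled "Total Cost = ?": a HashJoin over [AB] and Scan(C), and two SortMergeJoin (SMJ) plans over [AB] and Scan(C), one involving a Sort and one involving [AB] with property x⬆.]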
Some Facts
● Volcano does Explore, then Build
● Cascades has only one stage
In practice, exploring almost always happens before building, even in Cascades. Why?
Example
Logical Expression Space:
[ABC]
[AB], [AC], [BC]
[A], [B], [C]
Our Mission:
FindBestPlan((A⨝B)⨝C, A.x, 500)
Arguments: Logical Expression, Order, Limit
FindBestPlan(LogExpr, PhysProp)
If Memo[LogExpr, PhysProp] is not empty:
● return BestPlan or Failure
Possible moves =
● applicable transformations
● algorithms that give the required PhysProp
● enforcers for the required PhysProp
ForEach (Move = pop the most promising move)
● is transformation: Cost = FindBestPlan(LogExpr, PhysProp)
● is algorithm: Cost = Cost_self + Sum(Cost_input)
● is enforcer: Cost = Cost_self + Cost_input
Memo[LogExpr, PhysProp] = Best Plan
return Best Plan
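The pseudocode above can be turned into a small runnable sketch. It is a heavily simplified illustration, not the Volcano or Cascades implementation: a logical expression is just a set of table names, transformation rules are replaced by enumerating every split of that set, the only physical property is "any" or "sorted" (the sort key is ignored), and the cost limit, failure handling, and in-progress marking are left out. Table sizes, selectivity, and cost formulas are made up.

```python
import math
from itertools import combinations

CARD = {"A": 1000, "B": 100, "C": 10}  # hypothetical table sizes
JOIN_SEL = 0.01                        # hypothetical selectivity of every join

def rows(group):
    """Toy cardinality estimate for the join of all tables in `group`."""
    n = 1.0
    for table in group:
        n *= CARD[table]
    return n * JOIN_SEL ** (len(group) - 1)

def splits(group):
    """Every (left, right) split of a group; stands in for transformation rules."""
    items = sorted(group)
    for k in range(1, len(items)):
        for left in combinations(items, k):
            yield frozenset(left), group - frozenset(left)

memo = {}

def find_best_plan(group, prop):
    """Return (cost, plan) for `group` with physical property 'any' or 'sorted'."""
    key = (group, prop)
    if key in memo:
        return memo[key]
    candidates = []
    if len(group) == 1:
        (table,) = group
        if prop == "any":
            candidates.append((rows(group), f"Scan({table})"))
    else:
        for left, right in splits(group):
            self_cost = rows(left) + rows(right)
            if prop == "any":
                # Algorithm: HashJoin accepts any input order, output is unordered.
                lc, lp = find_best_plan(left, "any")
                rc, rp = find_best_plan(right, "any")
                candidates.append((self_cost + lc + rc, f"HashJoin({lp}, {rp})"))
            # Algorithm: SortMergeJoin needs sorted inputs and delivers sorted output.
            lc, lp = find_best_plan(left, "sorted")
            rc, rp = find_best_plan(right, "sorted")
            candidates.append((self_cost + lc + rc, f"SortMergeJoin({lp}, {rp})"))
    if prop == "sorted":
        # Enforcer: put a Sort on top of the best plan with no ordering requirement.
        c, p = find_best_plan(group, "any")
        n = rows(group)
        candidates.append((c + n * math.log2(max(n, 2.0)), f"Sort({p})"))
    memo[key] = min(candidates, key=lambda cp: cp[0])
    return memo[key]

print(find_best_plan(frozenset("ABC"), "sorted"))
```

The memo is keyed by the pair (logical expression, physical property), so the best sorted plan and the best unordered plan of the same sub-expression are stored side by side. That is what lets a slightly more expensive sorted sub-plan win when the parent needs the order.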
Some Details
● Use cost limit to do branch-and-bound pruning
○ By default set to unlimited
● Mark (LogExpr, PhysProp) as in-progress to prevent infinite loops
○ e.g. A JOIN B <=> B JOIN A
● Use a priority queue to do heuristic ordering of moves
○ Calcite prioritizes RelSet with less depth and higher cost
Summary
The Volcano/Cascades Optimizer …
● uses Rules to build all logical or physical plans
● uses Cost to evaluate a physical plan
● uses Dynamic Programming to search for the optimal physical plan
Compared with RBO
Here are my personal opinions …
● Cost-based: Could find better physical plans
● Rule-independent: Provide an elegant interface for DB implementors
● Still Heuristic: May perform badly in some corner cases
Thanks!
Q&A

Mais conteúdo relacionado

Mais procurados

Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
Yingjun Wu
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
DataWorks Summit
 

Mais procurados (20)

Performance Profiling in Rust
Performance Profiling in RustPerformance Profiling in Rust
Performance Profiling in Rust
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systems
 
MariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and OptimizationMariaDB Performance Tuning and Optimization
MariaDB Performance Tuning and Optimization
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQL
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Dynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisationDynamic filtering for presto join optimisation
Dynamic filtering for presto join optimisation
 
The PostgreSQL Query Planner
The PostgreSQL Query PlannerThe PostgreSQL Query Planner
The PostgreSQL Query Planner
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep Internal
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 

Semelhante a The Volcano/Cascades Optimizer

Parallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemMLParallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemML
Janani C
 

Semelhante a The Volcano/Cascades Optimizer (20)

How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDB
 
Embedded C
Embedded CEmbedded C
Embedded C
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Spark
 
Google
GoogleGoogle
Google
 
BlaBlaCar Elastic Search Feedback
BlaBlaCar Elastic Search FeedbackBlaBlaCar Elastic Search Feedback
BlaBlaCar Elastic Search Feedback
 
Lecture 3 - Driving.pdf
Lecture 3 - Driving.pdfLecture 3 - Driving.pdf
Lecture 3 - Driving.pdf
 
Performance in Geode: How Fast Is It, How Is It Measured, and How Can It Be I...
Performance in Geode: How Fast Is It, How Is It Measured, and How Can It Be I...Performance in Geode: How Fast Is It, How Is It Measured, and How Can It Be I...
Performance in Geode: How Fast Is It, How Is It Measured, and How Can It Be I...
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
 
Active record, standalone migrations, and working with Arel
Active record, standalone migrations, and working with ArelActive record, standalone migrations, and working with Arel
Active record, standalone migrations, and working with Arel
 
Parallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemMLParallel Machine Learning- DSGD and SystemML
Parallel Machine Learning- DSGD and SystemML
 
Rails data migrations
Rails data migrationsRails data migrations
Rails data migrations
 
Lecture01 algorithm analysis
Lecture01 algorithm analysisLecture01 algorithm analysis
Lecture01 algorithm analysis
 
Scalable, good, cheap
Scalable, good, cheapScalable, good, cheap
Scalable, good, cheap
 
Monitoring with ElasticSearch
Monitoring with ElasticSearch Monitoring with ElasticSearch
Monitoring with ElasticSearch
 
Online Machine Learning: introduction and examples
Online Machine Learning:  introduction and examplesOnline Machine Learning:  introduction and examples
Online Machine Learning: introduction and examples
 
SOLID refactoring - racing car katas
SOLID refactoring - racing car katasSOLID refactoring - racing car katas
SOLID refactoring - racing car katas
 
Alexandr Vronskiy "Evolution of Ecommerce Application"
Alexandr Vronskiy "Evolution of Ecommerce Application"Alexandr Vronskiy "Evolution of Ecommerce Application"
Alexandr Vronskiy "Evolution of Ecommerce Application"
 
Cassandra in production
Cassandra in productionCassandra in production
Cassandra in production
 
How MySQL can boost (or kill) your application v2
How MySQL can boost (or kill) your application v2How MySQL can boost (or kill) your application v2
How MySQL can boost (or kill) your application v2
 
PFN Spring Internship Final Report: Autonomous Drive by Deep RL
PFN Spring Internship Final Report: Autonomous Drive by Deep RLPFN Spring Internship Final Report: Autonomous Drive by Deep RL
PFN Spring Internship Final Report: Autonomous Drive by Deep RL
 

Mais de 宇 傅

Mais de 宇 傅 (12)

Parallel Query Execution
Parallel Query ExecutionParallel Query Execution
Parallel Query Execution
 
The Evolution of Data Systems
The Evolution of Data SystemsThe Evolution of Data Systems
The Evolution of Data Systems
 
PelotonDB - A self-driving database for hybrid workloads
PelotonDB - A self-driving database for hybrid workloadsPelotonDB - A self-driving database for hybrid workloads
PelotonDB - A self-driving database for hybrid workloads
 
Immutable Data Structures
Immutable Data StructuresImmutable Data Structures
Immutable Data Structures
 
The Case for Learned Index Structures
The Case for Learned Index StructuresThe Case for Learned Index Structures
The Case for Learned Index Structures
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming
 
Functional Programming in Java 8
Functional Programming in Java 8Functional Programming in Java 8
Functional Programming in Java 8
 
第三届阿里中间件性能挑战赛冠军队伍答辩
第三届阿里中间件性能挑战赛冠军队伍答辩第三届阿里中间件性能挑战赛冠军队伍答辩
第三届阿里中间件性能挑战赛冠军队伍答辩
 
Data Streaming Algorithms
Data Streaming AlgorithmsData Streaming Algorithms
Data Streaming Algorithms
 
Golang 101
Golang 101Golang 101
Golang 101
 
Docker Container: isolation and security
Docker Container: isolation and securityDocker Container: isolation and security
Docker Container: isolation and security
 
Paxos and Raft Distributed Consensus Algorithm
Paxos and Raft Distributed Consensus AlgorithmPaxos and Raft Distributed Consensus Algorithm
Paxos and Raft Distributed Consensus Algorithm
 

Último

%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Último (20)

%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 

The Volcano/Cascades Optimizer

  • 2. Outline ● Background ● Dynamic Programming ● Components ● Search Engine ● Summary 2
  • 3. Life of SQL SQL Parser Optimizer Executor Syntax Tree Logical Plan Physical Plan data ● Parser ● Optimizer ● Executor statistics 3
  • 4. Query Optimization Strategies ● Choice #1: Heuristics ○ INGRES, Oracle (until mid 1990s) ● Choice #2: Heuristics + Cost-based Join Search ○ System R, early IBM DB2, most open-source DBMSs ● Choice #3: Randomized Search ○ Academics in the 1980s, current Postgres ● Choice #4: Stratified Search ○ IBM’s STARBURST (late 1980s), now IBM DB2 + Oracle ● Choice #5: Unified Search ○ Volcano/Cascades in 1990s, now MSSQL + Greenplum 4
  • 5. Problem ● Why query optimizing is such a hard problem? ● It’s not difficult to find a feasible solution ● It’s almost impossible to find a optimal solution 5
  • 6. Why So Many Choices? ● Equivalence Rules ● Various Implements Join Join D Join C A B Join JoinA JoinB DC Join Join A Join B DC ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, BDAC, BDCA, CABD, CADB, CBAD, CBDA, CDAB, CDBA, DABC, DACB, DBAC, DBCA, DCAB, DCBA 6
  • 7. Why So Many Choices? ● Equivalence Rules ● Various Implements HashJoin NestedLoopJoin SortMergeJoin IndexScan TableScan Join JoinA JoinB DC In Total: 24 * 3 * 2^4 * 3^3 = 31104 !!! 7
  • 8. Which one is better? ● Given a physical plan, we can estimate its total cost ● Cost of an operator is related to input rows ● Selectivity Factors SELECT * FROM Reviews WHERE 7/1< date < 7/31 AND rating > 9 8
  • 9. Summary of Background Good News ● We known how to construct the search space Bad News ● It’s almost impossible to exhaust the search space ● We need an elegant & smart way to do the search 9
  • 11. Dynamic Programing ● You are climbing a staircase. It takes n steps to reach to the top. ● Each time you can either climb 1 or 2 steps ● In how many distinct ways can you climb to the top? 11
  • 12. Dynamic Programing ● You are climbing a staircase. It takes n steps to reach to the top. ● Each time you can either climb 1 or 2 steps ● In how many distinct ways can you climb to the top? 0 1 2 3 4 5 6 7 8 9 10 1 1 12
  • 13. Dynamic Programing ● You are climbing a staircase. It takes n steps to reach to the top. ● Each time you can either climb 1 or 2 steps ● In how many distinct ways can you climb to the top? 0 1 2 3 4 5 6 7 8 9 10 1 1 2 13
  • 14. Dynamic Programing ● You are climbing a staircase. It takes n steps to reach to the top. ● Each time you can either climb 1 or 2 steps ● In how many distinct ways can you climb to the top? 0 1 2 3 4 5 6 7 8 9 10 1 1 2 3 14
  • 15. Dynamic Programing ● You are climbing a staircase. It takes n steps to reach to the top. ● Each time you can either climb 1 or 2 steps ● In how many distinct ways can you climb to the top? 0 1 2 3 4 5 6 7 8 9 10 1 1 2 3 5 8 13 21 34 55 89 15
  • 16. Dynamic Programming ● You are climbing a staircase. It takes n steps to reach the top. ● Each time you can climb either 1 or 2 steps ● In how many distinct ways can you climb to the top? ● It’s fine to go in reverse: start from the unknown answer for step 10 (?), recurse down to the known values for steps 0 and 1, and fill in each ? on the way back up ● Steps: 0 1 2 3 4 5 6 7 8 9 10 ● Ways: 1 1 ? ? ? ? ? ? ? ? ?
  • 23. Dynamic Programming ● You are climbing a staircase. It takes n steps to reach the top. ● Each time you can climb either 1 or 2 steps ● In how many distinct ways can you climb to the top? ● Steps: 0 1 2 3 4 5 6 7 8 9 10 ● Ways: 1 1 2 3 5 8 13 21 34 55 89
  • 24. Define Dynamic Programming (DP) ● DP solves a problem by solving its sub-problems ● DP is only applicable to problems with Optimal Substructure ○ The optimal solution of the current problem can be calculated from the optimal solutions of its sub-problems ● DP can be done in both directions ○ Filling a table (bottom-up) ○ DFS with memo (top-down)
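A minimal Python sketch of the two directions, using the staircase problem from the preceding slides. The function names are illustrative and do not come from the talk.

```python
from functools import lru_cache


def climb_table(n: int) -> int:
    """Bottom-up: fill a table from step 0 up to step n."""
    ways = [1, 1] + [0] * max(0, n - 1)  # ways[0] = ways[1] = 1
    for step in range(2, n + 1):
        # The last move was either a 1-step or a 2-step climb.
        ways[step] = ways[step - 1] + ways[step - 2]
    return ways[n]


@lru_cache(maxsize=None)  # the cache is the "memo"
def climb_memo(n: int) -> int:
    """Top-down: DFS from step n, memoizing every sub-problem."""
    if n <= 1:
        return 1
    return climb_memo(n - 1) + climb_memo(n - 2)


print(climb_table(10), climb_memo(10))  # 89 89
```

Both functions compute the same table; they differ only in the order in which the entries are visited.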
  • 25. DP in Searching ● Find the minimum path sum from top to bottom ● Each step you may move to adjacent numbers on the row below ● Triangle, row by row: 2 / 3 4 / 6 5 7 / 4 1 8 3
  • 28. DP in Searching ● Find the minimum path sum from top to bottom ● Each step you may move to adjacent numbers on the row below ● Triangle, row by row: 2 / 3 4 / 6 5 7 / 4 1 8 3 ● Minimum path sums, filled in from the bottom row up: 4 1 8 3 / 7 6 10 / 9 10 / 11
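  • 29. DP in Searching ● Find the minimum path sum from top to bottom ● Each step you may move to adjacent numbers on the row below ● Triangle, row by row: 2 / 3 4 / 6 5 7 / 4 1 8 3 ● Going in the other direction: the answer at the top is unknown (?), and only the bottom row 4 1 8 3 is known at the start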
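The table-filling version of the triangle search can be sketched in a few lines of Python. The triangle is the one from the slides; the function name is illustrative.

```python
from typing import List

TRIANGLE = [[2], [3, 4], [6, 5, 7], [4, 1, 8, 3]]


def min_path_sum(triangle: List[List[int]]) -> int:
    """Fill the table from the bottom row up to the top."""
    best = list(triangle[-1])  # bottom row: each cell's best sum is itself
    for row in reversed(triangle[:-1]):
        # Each cell adds the cheaper of its two neighbours on the row below.
        best = [value + min(best[i], best[i + 1]) for i, value in enumerate(row)]
    return best[0]


print(min_path_sum(TRIANGLE))  # 11
```

The intermediate values of `best` are 4 1 8 3, then 7 6 10, then 9 10, then 11.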
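  • 31. Apply DP in Optimization? ● Query: SELECT * FROM A, B WHERE A.bid = B.bid ORDER BY A.bid ● [Figure: the logical plan Sort over Join(A, B), and candidate physical plans built from Sort, HashJoin, SortMergeJoin, Scan A and Scan B; one plan is marked “Optimal Plan!”; edge labels: Order by aid, Order by bid, Order by bid]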
  • 32. Apply DP in Optimization? ● [Figure: the same plans as on the previous slide, with one sub-plan marked “Optimal Plan of [AB]”] ● You cannot just apply DP straightforwardly
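A small Python illustration of why the cheapest plan for [AB] alone need not be part of the cheapest overall plan. All costs below are hypothetical numbers chosen only to make the point; they are not from the talk.

```python
# Hypothetical costs for two physical plans of the join [AB].
PLANS_FOR_AB = {
    "HashJoin": {"cost": 100, "sorted_on": None},
    "SortMergeJoin": {"cost": 120, "sorted_on": "bid"},
}
SORT_COST = 50  # hypothetical cost of an extra Sort on top of the join


def total_cost(name: str, required_order: str) -> int:
    """Cost of the join plus a Sort if its output is not already ordered."""
    plan = PLANS_FOR_AB[name]
    extra = 0 if plan["sorted_on"] == required_order else SORT_COST
    return plan["cost"] + extra


# Naive DP: keep only the cheapest plan for [AB], ignoring the ORDER BY.
naive_choice = min(PLANS_FOR_AB, key=lambda name: PLANS_FOR_AB[name]["cost"])
# Order-aware choice: compare plans by what the whole query will cost.
best_overall = min(PLANS_FOR_AB, key=lambda name: total_cost(name, "bid"))

print(naive_choice, total_cost(naive_choice, "bid"))  # HashJoin 150
print(best_overall, total_cost(best_overall, "bid"))  # SortMergeJoin 120
```

Under these assumed numbers the locally cheapest join loses once the required order is taken into account, which is why the sub-problem has to carry the order with it.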
  • 33. System-R Optimizer ● Dynamic Programming ● Interesting Orders ● The main contribution: Optimal Substructure is defined so that DP is feasible. ● RelSet[ABCD] covers every join order: ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, BDAC, BDCA, CABD, CADB, CBAD, CBDA, CDAB, CDBA, DABC, DACB, DBAC, DBCA, DCAB, DCBA ● Reference: Access Path Selection in a Relational Database Management System (SIGMOD 1979)
  • 34. System-R Optimizer ● Dynamic Programming ● Interesting Orders ● The main contribution: Optimal Substructure is defined so that DP is feasible. ● [Figure: RelSet[ABCD] with one entry per interesting order: SortBy[A]ASC, SortBy[A]DESC, SortBy[B]ASC, ...]
  • 35. Optimal Substructures ● Based on the assumption that the cost function is polynomial ● Stores the best plan for each pair of (Relation Set, Physical Properties) ● Instead of O(n!) plans, only O(n·2^(n-1)) plans need to be enumerated. ● [Figure: RelSet[ABCD], RelSet[ABC] and RelSet[BCD], each with entries Order1, Order2, Order3; one entry is marked as the Goal]
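To get a feel for the two bounds on this slide, the short Python snippet below evaluates n! and n·2^(n-1) for a few values of n. The formulas are taken from the slide as stated; the helper name is illustrative.

```python
from math import factorial
from typing import Tuple


def plan_counts(n: int) -> Tuple[int, int]:
    """Return (n!, n * 2^(n-1)) for n joined relations."""
    return factorial(n), n * 2 ** (n - 1)


for n in (4, 8, 12):
    exhaustive, with_dp = plan_counts(n)
    print(f"n={n}: n! = {exhaustive}, n*2^(n-1) = {with_dp}")
```

For n = 4 the two numbers are close (24 versus 32), so the benefit only shows as n grows: for n = 12 it is 479001600 versus 24576.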
  • 36. Volcano/Cascades Optimizer (1993) ● Implemented as a code generator (operators, rules, etc.) and a dynamic-link library (the search engine) ● Top-down Search (Directed Search) ○ Start with the final outcome that you want ○ The search path can be guided by heuristics ● By contrast, System-R’s approach is bottom-up
  • 37. Goetz Graefe ● Volcano - An Extensible and Parallel Query Evaluation System (1990) ● The Volcano Optimizer Generator: Extensibility and Efficient Search (1991) ● The Cascades Framework for Query Optimization (1995)
  • 38. Components Operators ● logical operators ● algorithms ● enforcers Rules ● transformation rules ● implementation rules Properties ● logical properties ● physical properties Interfaces of Operators ● property function ● applicability function (physical-only) ● cost function (physical-only)
  • 39. Operators ● logical operators ○ e.g. Join, Scan ● algorithms ○ e.g. HashJoin, SortMergeJoin, FileScan, IndexScan ● enforcers ○ e.g. Sort, Shuffle
  • 40. Rules ● transformation rules ○ The algebraic rules of expression equivalence ○ e.g. associativity rule, commutativity rule ● implementation rules ○ Rules mapping logical operators to algorithms ○ It is possible to map multiple logical operators to a single physical operator ● Specify how to match rules to the plan tree ○ Simple pattern matching ○ Additional condition code is also allowed
  • 41. Properties ● logical properties ○ Can be derived from the logical algebra expression ○ Attached to a logical equivalence set: [LogExpr] ○ e.g. schema, expected size ● physical properties ○ Depend on algorithms ○ Attached to a physical equivalence set: [LogExpr, PhyProp] ○ e.g. sort order, partitioning (together, the physical properties vector)
  • 42. Interfaces of Operators ● applicability function (only applicable to algorithms & enforcers) ○ Physical property vectors that the operator can deliver ○ Physical property vectors that its inputs must satisfy ● cost function (only applicable to algorithms & enforcers) ○ Estimates the operator’s cost ○ Cost is an abstract data type in Volcano, e.g. (CPU cost, IO cost) ● property function ○ Determines logical properties, e.g. schema, row count ■ selectivity estimate ○ Determines physical properties, e.g. sort order
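One way to picture these interfaces is the Python sketch below. Every class and method name here is hypothetical, and so are the cost formulas and the single-column "sort order" property; it is a sketch of the idea, not Volcano's actual API.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Cost:
    """Cost as an abstract data type with CPU and IO components."""
    cpu: float
    io: float

    def __add__(self, other: "Cost") -> "Cost":
        return Cost(self.cpu + other.cpu, self.io + other.io)

    def total(self) -> float:
        return self.cpu + self.io  # assumption: both components weigh the same


class SortMergeJoin:
    """An algorithm: needs sorted inputs and delivers sorted output."""

    def __init__(self, key: str) -> None:
        self.key = key

    def delivered_order(self) -> Optional[str]:  # applicability: what it delivers
        return self.key

    def required_input_orders(self) -> Tuple[Optional[str], ...]:  # what inputs must satisfy
        return (self.key, self.key)

    def cost(self, left_rows: int, right_rows: int) -> Cost:  # made-up formula
        return Cost(cpu=float(left_rows + right_rows), io=0.0)


class Sort:
    """An enforcer: accepts any input order and delivers the requested one."""

    def __init__(self, key: str) -> None:
        self.key = key

    def delivered_order(self) -> Optional[str]:
        return self.key

    def required_input_orders(self) -> Tuple[Optional[str], ...]:
        return (None,)  # None stands for "no requirement"

    def cost(self, rows: int) -> Cost:  # made-up n log n formula
        return Cost(cpu=rows * math.log2(max(rows, 2)), io=0.0)


plan_cost = SortMergeJoin("bid").cost(100, 50) + Sort("bid").cost(4)
print(plan_cost.total())  # 158.0
```

Making `Cost` a small value type with its own `+` keeps the search code independent of how many cost components an implementation tracks.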
  • 44. Search Engine ● Define the goal as [LogExpr, PhysProp] ● Logically, the search procedure can be divided into 2 stages: 1. Explore: apply transformation rules to explore the expression space 2. Build: apply implementation rules to build physical plans and find the best one
  • 45. Explore ● Apply transformation rules to explore the expression space ● e.g. [ABC] = { (A⨝B)⨝C, (B⨝A)⨝C, (A⨝C)⨝B, … } ● [Figure: Goal.LogExpr and the generated logical plans, drawn as join trees over A, B and C]
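The exploration step can be sketched in Python by modelling a join tree as nested tuples and applying two transformation rules until nothing new appears. The tuple representation and the function names are assumptions made for this sketch.

```python
def rewrites(expr):
    """Yield every tree reachable from expr by one rule application."""
    if isinstance(expr, str):  # a base relation: nothing to rewrite
        return
    left, right = expr
    yield (right, left)  # commutativity: X JOIN Y -> Y JOIN X
    if not isinstance(left, str):  # associativity: (X JOIN Y) JOIN Z -> X JOIN (Y JOIN Z)
        x, y = left
        yield (x, (y, right))
    for new_left in rewrites(left):  # apply the rules inside the left input
        yield (new_left, right)
    for new_right in rewrites(right):  # apply the rules inside the right input
        yield (left, new_right)


def explore(expr):
    """Apply the rules repeatedly until no new expression is generated."""
    seen = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for candidate in rewrites(current):
            if candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return seen


plans = explore((("A", "B"), "C"))
print(len(plans))  # 12
```

Starting from (A⨝B)⨝C, the two rules reach all 12 ordered join trees over three relations, including (B⨝A)⨝C and (A⨝C)⨝B from the slide.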
  • 46. Build ● Apply implementation rules to build physical plans ● For every [LogExpr, PhyProp], record the physical plan in the Memo table ● e.g. [AB]⨝C ➡ SortMergeJoin vs. HashJoin ● [Figure: Memo Table with columns LogExpr, PhyProp, BestPlan and rows for [ABC] (PhyProp: -, x⬆, x⬇) and [AB] (PhyProp: -, …); three candidate plans built from HashJoin, SMJ, Sort, Scan(C) and [AB], each annotated “Total Cost = ?”]
  • 47. Some Facts ● Volcano does Explore first, then Build ● Cascades has only one stage ● In practice, exploring almost always happens before building, even in Cascades. Why?
  • 48. Example ● Logical Expression Space: [ABC]; [AB], [AC], [BC]; [A], [B], [C] ● Our Mission: FindBestPlan((A⨝B)⨝C, A.x, 500), where (A⨝B)⨝C is the Logical Expression, A.x is the Order, and 500 is the Limit
  • 57. FindBestPlan(LogExpr, PhysProp) ● If Memo[LogExpr, PhysProp] is not empty: return the stored BestPlan or Failure ● Possible moves = applicable transformations + algorithms that give the required PhysProp + enforcers for the required PhysProp ● ForEach (Move = pop the most promising move): ○ is transformation: Cost = FindBestPlan(LogExpr, PhysProp) on the transformed LogExpr ○ is algorithm: Cost = Cost_self + Sum(Cost_input) ○ is enforcer: Cost = Cost_self + Cost_input ● Memo[LogExpr, PhysProp] = Best Plan ● return Best Plan
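A toy Python version of this procedure is sketched below, under several loudly simplifying assumptions: a logical expression is identified only by its set of relations (so transformation rules are replaced by enumerating every split of that set), the only physical property is a sort column, every join is on the same column, and all row counts and cost formulas are made up. There is no cost limit and no move ordering.

```python
import math
from functools import lru_cache
from itertools import combinations

ROWS = {"A": 1000, "B": 200, "C": 50}  # hypothetical table sizes
JOIN_KEY = "x"      # assumption: every join is an equi-join on column x
SELECTIVITY = 0.01  # assumption: fixed join selectivity


def card(rels):
    """Estimated row count of joining the given relations."""
    rows = 1.0
    for rel in rels:
        rows *= ROWS[rel]
    return rows * SELECTIVITY ** (len(rels) - 1)


def splits(rels):
    """Every way to split a relation set into (left, right) join inputs."""
    names = sorted(rels)
    for size in range(1, len(names)):
        for left in combinations(names, size):
            yield frozenset(left), rels - frozenset(left)


@lru_cache(maxsize=None)  # plays the role of Memo[LogExpr, PhysProp]
def find_best_plan(rels, order=None):
    """Return (cost, plan) for the goal; order=None means any order is fine."""
    candidates = []
    # Algorithms that give the required PhysProp.
    if order is None and len(rels) == 1:
        (rel,) = rels
        candidates.append((float(ROWS[rel]), f"Scan({rel})"))
    for left, right in splits(rels):
        join_cost = card(left) + card(right)  # made-up cost of the join itself
        if order is None:  # HashJoin delivers no particular order
            lc, lp = find_best_plan(left, None)
            rc, rp = find_best_plan(right, None)
            candidates.append((lc + rc + join_cost, f"HashJoin({lp}, {rp})"))
        if order in (None, JOIN_KEY):  # SortMergeJoin delivers JOIN_KEY order
            lc, lp = find_best_plan(left, JOIN_KEY)
            rc, rp = find_best_plan(right, JOIN_KEY)
            candidates.append((lc + rc + join_cost, f"SortMergeJoin({lp}, {rp})"))
    # Enforcer for the required PhysProp.
    if order is not None:
        ic, ip = find_best_plan(rels, None)
        rows = card(rels)
        sort_cost = rows * math.log2(max(rows, 2.0))
        candidates.append((ic + sort_cost, f"Sort[{order}]({ip})"))
    return min(candidates)  # cheapest (cost, plan) pair


print(find_best_plan(frozenset("ABC"), "x"))
```

The goal with a required order never loops back to itself: the enforcer asks for the same relation set with no order, and that goal only recurses into strictly smaller relation sets.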
  • 58. Some Details ● Use a cost limit to do branch-and-bound pruning ○ By default it is set to unlimited ● Mark (LogExpr, PhysProp) as in-progress to prevent infinite loops ○ e.g. A JOIN B <=> B JOIN A ● Use a priority queue to do heuristic ordering of moves ○ Calcite prioritizes RelSets with less depth and higher cost
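The first and last points can be combined in a short Python sketch: moves are popped from a priority queue, and the best cost found so far acts as the limit. It assumes each move comes with an optimistic estimate that never exceeds its true cost. The ordering used here (lowest estimate first) is a generic choice for illustration and is not Calcite's heuristic.

```python
import heapq
from typing import Callable, List, Tuple

# A move is (optimistic cost estimate, function that computes the real cost).
Move = Tuple[float, Callable[[], float]]


def search_moves(moves: List[Move], limit: float = float("inf")) -> Tuple[float, int]:
    """Return (best cost found, number of moves actually evaluated)."""
    heap = [(estimate, index) for index, (estimate, _) in enumerate(moves)]
    heapq.heapify(heap)
    best, evaluated = limit, 0
    while heap:
        estimate, index = heapq.heappop(heap)  # most promising move first
        if estimate >= best:
            break  # no remaining move can beat the current limit: prune them all
        best = min(best, moves[index][1]())
        evaluated += 1
    return best, evaluated


moves = [(10, lambda: 12), (11, lambda: 15), (20, lambda: 25)]
print(search_moves(moves))           # (12, 2)
print(search_moves(moves, limit=5))  # (5, 0)
```

In the first call the third move is never evaluated because its estimate of 20 already exceeds the best cost of 12. In the second call the result equals the limit itself, which a caller would treat as a failure to find a plan under that limit.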
  • 59. Summary The Volcano/Cascades Optimizer … ● uses Rules to build all logical or physical plans ● uses Cost to evaluate a physical plan ● uses Dynamic Programming to search for the optimal physical plan
  • 60. Compared with RBO (rule-based optimization) Here are my personal opinions … ● Cost-based: can find better physical plans ● Rule-independent: provides an elegant interface for DB implementors ● Still heuristic: may perform badly in some corner cases