SlideShare uma empresa Scribd logo
1 de 23
Query Optimization
Succinctly: Making the execution of queries optimally fast
Brandon Latronica - 2017
The pathway of a database command:
The pathway of a database command:
Query?
Just a request information
from a database.
We can use a language to
do it...name the most
famous for a DBMS...SQL!
The pathway of a database command:
Parsing and translation:
translate the query into its
internal form. This is then
translated into relational
algebra. Parser checks
syntax, verifies relations
The pathway of a database command:
Relational Algebra:
The conversion of query
syntax (SQL, etc) into some
type of internal, DBMS
relational algebra. Why?
CAS systems are easier for
computers, while words are
better for humans!
The pathway of a database command:
Optimization:
Last stop. The best plan is
determined and then is
pushed for execution via
database calls which results
in a query output.
Basic Overview of Query Optimization
● Cost difference between evaluation plans for a query can be huge!
● Sometimes seconds vs. days!
● Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules (Car travel paths)
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
● Estimation of plan cost based on:
● Statistical information about relations. Eg: number of tuples, number of distinct values for an
attribute,etc.
● Statistics estimation for intermediate results to compute cost of complex expressions
● Cost formulae for algorithms, computed using statistics
Seconds vs days?
Query Optimization - Yannis E. Ioannidis
Seconds vs days?
Algebra on real numbers? We know that.
And so on...
(x2
+ 2x - 8) / (x - 2) = ?
= (x - 2)(x + 4) / (x - 2)
= 1 * (x + 4)
= (x + 4)
(8,6) : (2,1)
Many terms and operations, to few.
Boolean Algebra? We know that.
AB V ( BC(B V C) ) = ?
= AB V BBC V BCC [Distrib]
= AB V BC V BC [Idem]
= AB V BC [- Distrib]
= B(A V C)
(6,5) : (3,2)
Many terms and operations, to
few.
σ : Select operator that selects specific filters requirements for information --
Ex: σname = "Dan" (customer) will select all in customer whose name that match "Dan".
Π : Project operator that displays all information from a specified area areas --
Ex: Πname, balance(customer) will show all names and balances from customer
In a relation table, a PROJECT eliminates columns while SELECT eliminates rows!
Natural join (⋈ ) is a binary operator that is written as (R S) where R and S are⋈
relations. The result of the natural join is the set of all combinations of tuples(that is,
ordered lists) in R and S that are equal on their common attribute names.
Theta join ( ⋈ θ ) is an operation that consists of all combinations of tuples in R and S
that satisfy θ. The result of the θ-join is defined only if the headers of S and R are
disjoint, that is, do not contain a common attribute.
Relational Algebra in a DB; some operators.
Algebra Transformations
Two relational algebra expressions are said to be equivalent if the two expressions generate
the same set of tuples on every legal database instance
● Note: order of tuples is irrelevant
● we don’t care if they generate different results on databases that violate integrity
constraints
An equivalence rule says that expressions of two forms are equivalent
● Can replace expression of first form by second, or vice versa
Equivalence rules - Relational Algebra
1. Conjunctive selection operations can be deconstructed into a sequence of
individual selections.
2. Selection operations are commutative.
3. Only the last in a sequence of projection operations is needed, the others can be
omitted.
4. Selections can be combined with Cartesian products and theta joins.
a. σθ(E1 X E2) = E1 θ E2
b. σθ1(E1 θ2 E2) = E1 θ1 θ2∧ E2
Equivalence rules - more.
5. Theta-join operations (and natural joins) are commutative.
E1 θ E2 = E2 θ E1
6. (a) Natural join operations are associative:
(E1 E2) E3 = E1 (E2 E3)
(b) Theta joins are associative in the following manner:
(E1 θ1 E2) θ2 θ∧ 3 E3 = E1 θ1 θ∧ 3 (E2 θ2 E3)
where θ2 involves attributes from only E2 and E3.
And many more...
Pictorial Reduction Example
Πname, title(σdept_name= “Music”∧year = 2009 (instructor (teaches Πcourse_id, title (course))))
Good ways to order the joins?
● For all relations r1, r2, and r3,
(r1 r2) r3 = r1 (r2 r3 )
(Join Associativity)
● If r2 r3 is quite large and r1 r2 is small, we choose (r1 r2) r3
so that we compute and store a smaller temporary relation.
Good rule to always avoid Cartesian products when searching for an optimal plan.
Counting alternative plans: How many?
● Query optimizers use equivalence rules to systematically generate expressions equivalent to
the given expression
● Can generate all equivalent expressions as follows:
● Repeat until no new equivalent expressions are generated:
*Apply all applicable equivalence rules on every subexpression of every equivalent
expression which is found.
*Add newly generated expressions to the set of equivalent expressions
● The above approach is expensive in terms of memory and compute time
● Two approaches:
-Optimized plan generation based on transformation rules.
-Special case approach for queries with only selections, projections and joins.
● Consider finding the best join-order for r1 r2 . . . rn.
● There are (bushy tree) (2(n – 1))!/(n – 1)! different join orders for above expression. With
n = 7, the number is 665280, with n = 10, the number is greater than 176 billion!
● No need to generate all the join orders. Using dynamic programming, the least-cost join
order for any subset of {r1, r2, . . . rn} is computed only once and stored for future use.
Dynamic programming? A method for solving a complex problem by breaking it down into
a collection of simpler subproblems, solving each of those subproblems just once, and
storing their solutions – ideally, using a memory-based data structure. The next time the
same subproblem occurs, instead of recomputing its solution, one simply looks up the
previously computed solution.
Practical query optimizers incorporate elements of the following two broad approaches:
1. Search all the plans and choose the best plan in a cost-based fashion. Use dynamic
programing to store and recall past found optimal plans and subplans!
2. Uses heuristics to choose a plan.
Cost and choice.
Heuristics: Being Pragmatic
● Cost-based optimization is expensive, even with dynamic programming.
● Systems may use heuristics to reduce the number of choices that must be made in a cost-
based fashion.
● Heuristic optimization transforms the query-tree by using a set of rules that typically (but not
in all cases) improve execution performance:
● Perform selection early (reduces the number of tuples)
● Perform projection early (reduces the number of attributes)
● Perform most restrictive selection and join operations (i.e. with smallest result size)
before other similar operations.
● Some systems use only heuristics, others combine heuristics with partial cost-based
optimization.
Statistical estimation of data size?
What if, on a particular database relation, we stored information regarding the content of
the table?
Database indices?
A database index is a data structure that improves the speed of
data retrieval operations on a database table at the cost of
additional writes and storage space to maintain the index data
structure. Indexes are used to quickly locate data without
having to search every row in a database table every time a
database table is accessed.
A B+ tree is an n-ary tree with a variable but often large
number of children per node. A B+ tree consists of a root,
internal nodes and leaves.[1] The root may be either a leaf or a
node with two or more children.
Hash table is a data structure used to implement an
associative array, a structure that can map keys to values. A
hash table uses a hash function to compute an index into an
array of buckets or slots, from which the desired value can be
found.
O log(n)
O (1)
Other Indices and reductions
Offset B+- Tree is a type of index used to park new and update data on table set outside the body of the
main table. Offset table << Main Table! Used with columnar TBAT files
Why not columnar DBs? TBAT are.

Mais conteúdo relacionado

Mais procurados

Functional dependencies and normalization
Functional dependencies and normalizationFunctional dependencies and normalization
Functional dependencies and normalization
daxesh chauhan
 
SQL select statement and functions
SQL select statement and functionsSQL select statement and functions
SQL select statement and functions
Vikas Gupta
 

Mais procurados (20)

Normalization
NormalizationNormalization
Normalization
 
Functional dependencies and normalization
Functional dependencies and normalizationFunctional dependencies and normalization
Functional dependencies and normalization
 
Normalization in DBMS
Normalization in DBMSNormalization in DBMS
Normalization in DBMS
 
Introduction to triggers
Introduction to triggersIntroduction to triggers
Introduction to triggers
 
Denormalization
DenormalizationDenormalization
Denormalization
 
Object oriented database concepts
Object oriented database conceptsObject oriented database concepts
Object oriented database concepts
 
SQL select statement and functions
SQL select statement and functionsSQL select statement and functions
SQL select statement and functions
 
SQL JOIN
SQL JOINSQL JOIN
SQL JOIN
 
basic structure of SQL FINAL.pptx
basic structure of SQL FINAL.pptxbasic structure of SQL FINAL.pptx
basic structure of SQL FINAL.pptx
 
SQL Functions and Operators
SQL Functions and OperatorsSQL Functions and Operators
SQL Functions and Operators
 
Query optimization
Query optimizationQuery optimization
Query optimization
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraints
 
Oraclesql
OraclesqlOraclesql
Oraclesql
 
Query processing and Query Optimization
Query processing and Query OptimizationQuery processing and Query Optimization
Query processing and Query Optimization
 
Query optimization
Query optimizationQuery optimization
Query optimization
 
Chapter15
Chapter15Chapter15
Chapter15
 
SQL
SQLSQL
SQL
 
Relational algebra in DBMS
Relational algebra in DBMSRelational algebra in DBMS
Relational algebra in DBMS
 
Object Relational Database Management System(ORDBMS)
Object Relational Database Management System(ORDBMS)Object Relational Database Management System(ORDBMS)
Object Relational Database Management System(ORDBMS)
 
Query processing and optimization (updated)
Query processing and optimization (updated)Query processing and optimization (updated)
Query processing and optimization (updated)
 

Destaque

company profile for SH
company profile for SHcompany profile for SH
company profile for SH
Madiha Asif
 
VoiceThread Basics
VoiceThread BasicsVoiceThread Basics
VoiceThread Basics
tevansb
 
Mk dasar2 komputer
Mk dasar2 komputerMk dasar2 komputer
Mk dasar2 komputer
ricazuma
 

Destaque (16)

Timestamped Binary Association Table - IEEE Big Data Congress 2015
Timestamped Binary Association Table - IEEE Big Data Congress 2015Timestamped Binary Association Table - IEEE Big Data Congress 2015
Timestamped Binary Association Table - IEEE Big Data Congress 2015
 
Meeting the challenges of globalization3
Meeting the challenges of globalization3Meeting the challenges of globalization3
Meeting the challenges of globalization3
 
company profile for SH
company profile for SHcompany profile for SH
company profile for SH
 
Kiran DBA
Kiran DBAKiran DBA
Kiran DBA
 
Fintech Infographic- Spain November 2015
Fintech Infographic- Spain November 2015Fintech Infographic- Spain November 2015
Fintech Infographic- Spain November 2015
 
Open Learning Pitch
Open Learning PitchOpen Learning Pitch
Open Learning Pitch
 
Alba 111 modelo rendición cuentas
Alba 111 modelo rendición cuentasAlba 111 modelo rendición cuentas
Alba 111 modelo rendición cuentas
 
behavioural_term_paper
behavioural_term_paperbehavioural_term_paper
behavioural_term_paper
 
Señales preventivas
Señales preventivasSeñales preventivas
Señales preventivas
 
VoiceThread Basics
VoiceThread BasicsVoiceThread Basics
VoiceThread Basics
 
Mk dasar2 komputer
Mk dasar2 komputerMk dasar2 komputer
Mk dasar2 komputer
 
05 e
05 e05 e
05 e
 
Etica apresentação
Etica  apresentaçãoEtica  apresentação
Etica apresentação
 
Resume
ResumeResume
Resume
 
Nueva arquitectura al servicio del hombre
Nueva arquitectura al servicio del hombreNueva arquitectura al servicio del hombre
Nueva arquitectura al servicio del hombre
 
Evolución de las telecomunicaciones
Evolución de las telecomunicacionesEvolución de las telecomunicaciones
Evolución de las telecomunicaciones
 

Semelhante a Query Optimization - Brandon Latronica

CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
J Singh
 
1b_query_optimization_sil_7ed_ch16.ppt
1b_query_optimization_sil_7ed_ch16.ppt1b_query_optimization_sil_7ed_ch16.ppt
1b_query_optimization_sil_7ed_ch16.ppt
ArunachalamSelva
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
James Wong
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
Fraboni Ec
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
Tony Nguyen
 
Designing A Syntax Based Retrieval System03
Designing A Syntax Based Retrieval System03Designing A Syntax Based Retrieval System03
Designing A Syntax Based Retrieval System03
Avelin Huo
 

Semelhante a Query Optimization - Brandon Latronica (20)

Transaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptxTransaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptx
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
Query Decomposition and data localization
Query Decomposition and data localization Query Decomposition and data localization
Query Decomposition and data localization
 
Query processing System
Query processing SystemQuery processing System
Query processing System
 
R programming slides
R  programming slidesR  programming slides
R programming slides
 
Query trees
Query treesQuery trees
Query trees
 
Java 8
Java 8Java 8
Java 8
 
1b_query_optimization_sil_7ed_ch16.ppt
1b_query_optimization_sil_7ed_ch16.ppt1b_query_optimization_sil_7ed_ch16.ppt
1b_query_optimization_sil_7ed_ch16.ppt
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacksqueueslists
StacksqueueslistsStacksqueueslists
Stacksqueueslists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
Stacks queues lists
Stacks queues listsStacks queues lists
Stacks queues lists
 
User biglm
User biglmUser biglm
User biglm
 
ch14.ppt
ch14.pptch14.ppt
ch14.ppt
 
Designing A Syntax Based Retrieval System03
Designing A Syntax Based Retrieval System03Designing A Syntax Based Retrieval System03
Designing A Syntax Based Retrieval System03
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
 
Adbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimizationAdbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimization
 
Bsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureBsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structure
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Query Optimization - Brandon Latronica

  • 1. Query Optimization Succinctly: Making the execution of queries optimally fast Brandon Latronica - 2017
  • 2. The pathway of a database command:
  • 3. The pathway of a database command: Query? Just a request information from a database. We can use a language to do it...name the most famous for a DBMS...SQL!
  • 4. The pathway of a database command: Parsing and translation: translate the query into its internal form. This is then translated into relational algebra. Parser checks syntax, verifies relations
  • 5. The pathway of a database command: Relational Algebra: The conversion of query syntax (SQL, etc) into some type of internal, DBMS relational algebra. Why? CAS systems are easier for computers, while words are better for humans!
  • 6. The pathway of a database command: Optimization: Last stop. The best plan is determined and then is pushed for execution via database calls which results in a query output.
  • 7. Basic Overview of Query Optimization ● Cost difference between evaluation plans for a query can be huge! ● Sometimes seconds vs. days! ● Steps in cost-based query optimization 1. Generate logically equivalent expressions using equivalence rules (Car travel paths) 2. Annotate resultant expressions to get alternative query plans 3. Choose the cheapest plan based on estimated cost ● Estimation of plan cost based on: ● Statistical information about relations. Eg: number of tuples, number of distinct values for an attribute,etc. ● Statistics estimation for intermediate results to compute cost of complex expressions ● Cost formulae for algorithms, computed using statistics
  • 8. Seconds vs days? Query Optimization - Yannis E. Ioannidis
  • 10. Algebra on real numbers? We know that. And so on... (x2 + 2x - 8) / (x - 2) = ? = (x - 2)(x + 4) / (x - 2) = 1 * (x + 4) = (x + 4) (8,6) : (2,1) Many terms and operations, to few.
  • 11. Boolean Algebra? We know that. AB V ( BC(B V C) ) = ? = AB V BBC V BCC [Distrib] = AB V BC V BC [Idem] = AB V BC [- Distrib] = B(A V C) (6,5) : (3,2) Many terms and operations, to few.
  • 12. σ : Select operator that selects specific filters requirements for information -- Ex: σname = "Dan" (customer) will select all in customer whose name that match "Dan". Π : Project operator that displays all information from a specified area areas -- Ex: Πname, balance(customer) will show all names and balances from customer In a relation table, a PROJECT eliminates columns while SELECT eliminates rows! Natural join (⋈ ) is a binary operator that is written as (R S) where R and S are⋈ relations. The result of the natural join is the set of all combinations of tuples(that is, ordered lists) in R and S that are equal on their common attribute names. Theta join ( ⋈ θ ) is an operation that consists of all combinations of tuples in R and S that satisfy θ. The result of the θ-join is defined only if the headers of S and R are disjoint, that is, do not contain a common attribute. Relational Algebra in a DB; some operators.
  • 13. Algebra Transformations Two relational algebra expressions are said to be equivalent if the two expressions generate the same set of tuples on every legal database instance ● Note: order of tuples is irrelevant ● we don’t care if they generate different results on databases that violate integrity constraints An equivalence rule says that expressions of two forms are equivalent ● Can replace expression of first form by second, or vice versa
  • 14. Equivalence rules - Relational Algebra 1. Conjunctive selection operations can be deconstructed into a sequence of individual selections. 2. Selection operations are commutative. 3. Only the last in a sequence of projection operations is needed, the others can be omitted. 4. Selections can be combined with Cartesian products and theta joins. a. σθ(E1 X E2) = E1 θ E2 b. σθ1(E1 θ2 E2) = E1 θ1 θ2∧ E2
  • 15. Equivalence rules - more. 5. Theta-join operations (and natural joins) are commutative. E1 θ E2 = E2 θ E1 6. (a) Natural join operations are associative: (E1 E2) E3 = E1 (E2 E3) (b) Theta joins are associative in the following manner: (E1 θ1 E2) θ2 θ∧ 3 E3 = E1 θ1 θ∧ 3 (E2 θ2 E3) where θ2 involves attributes from only E2 and E3. And many more...
  • 16. Pictorial Reduction Example Πname, title(σdept_name= “Music”∧year = 2009 (instructor (teaches Πcourse_id, title (course))))
  • 17. Good ways to order the joins? ● For all relations r1, r2, and r3, (r1 r2) r3 = r1 (r2 r3 ) (Join Associativity) ● If r2 r3 is quite large and r1 r2 is small, we choose (r1 r2) r3 so that we compute and store a smaller temporary relation. Good rule to always avoid Cartesian products when searching for an optimal plan.
  • 18. Counting alternative plans: How many? ● Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression ● Can generate all equivalent expressions as follows: ● Repeat until no new equivalent expressions are generated: *Apply all applicable equivalence rules on every subexpression of every equivalent expression which is found. *Add newly generated expressions to the set of equivalent expressions ● The above approach is expensive in terms of memory and compute time ● Two approaches: -Optimized plan generation based on transformation rules. -Special case approach for queries with only selections, projections and joins.
  • 19. ● Consider finding the best join-order for r1 r2 . . . rn. ● There are (bushy tree) (2(n – 1))!/(n – 1)! different join orders for above expression. With n = 7, the number is 665280, with n = 10, the number is greater than 176 billion! ● No need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of {r1, r2, . . . rn} is computed only once and stored for future use. Dynamic programming? A method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions – ideally, using a memory-based data structure. The next time the same subproblem occurs, instead of recomputing its solution, one simply looks up the previously computed solution. Practical query optimizers incorporate elements of the following two broad approaches: 1. Search all the plans and choose the best plan in a cost-based fashion. Use dynamic programing to store and recall past found optimal plans and subplans! 2. Uses heuristics to choose a plan. Cost and choice.
  • 20. Heuristics: Being Pragmatic ● Cost-based optimization is expensive, even with dynamic programming. ● Systems may use heuristics to reduce the number of choices that must be made in a cost- based fashion. ● Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance: ● Perform selection early (reduces the number of tuples) ● Perform projection early (reduces the number of attributes) ● Perform most restrictive selection and join operations (i.e. with smallest result size) before other similar operations. ● Some systems use only heuristics, others combine heuristics with partial cost-based optimization.
  • 21. Statistical estimation of data size? What if, on a particular database relation, we stored information regarding the content of the table?
  • 22. Database indices? A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. A B+ tree is an n-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves.[1] The root may be either a leaf or a node with two or more children. Hash table is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. O log(n) O (1)
  • 23. Other Indices and reductions Offset B+- Tree is a type of index used to park new and update data on table set outside the body of the main table. Offset table << Main Table! Used with columnar TBAT files Why not columnar DBs? TBAT are.