Index management
in shallow depth

Andrea Giuliano
@bit_shark
Architecture of a DBMS

Disk manager
The disk manager provides the following commands for a page:

• allocate a page
• deallocate a page
• read a page
• write a page

The size of a page is chosen to be the size of a disk block, and pages are stored as disk
blocks, so that reading or writing a page takes a single disk I/O.
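
The four page commands can be sketched as a toy class. This is only an illustration: the name `DiskManager`, the in-memory dictionary standing in for the disk, and the 4096-byte default page size are all assumptions, not something taken from the slides.

```python
class DiskManager:
    """Toy stand-in for a disk: one page corresponds to one block."""

    def __init__(self, page_size=4096):
        self.page_size = page_size   # assumed block size in bytes
        self.pages = {}              # page id -> page content (bytes)
        self.free_ids = []           # ids released by deallocate()
        self.next_id = 0             # next never-used page id
        self.io_count = 0            # one I/O per page read or write

    def allocate(self):
        # Reuse a released id if there is one, otherwise take a fresh one.
        if self.free_ids:
            page_id = self.free_ids.pop()
        else:
            page_id = self.next_id
            self.next_id += 1
        self.pages[page_id] = bytes(self.page_size)
        return page_id

    def deallocate(self, page_id):
        del self.pages[page_id]
        self.free_ids.append(page_id)

    def read(self, page_id):
        self.io_count += 1
        return self.pages[page_id]

    def write(self, page_id, data):
        # A page is always written whole, so it must match the block size.
        if len(data) != self.page_size:
            raise ValueError("a page must be exactly one block long")
        self.io_count += 1
        self.pages[page_id] = bytes(data)
```

Counting one I/O per `read` or `write` mirrors the point above: because a page is exactly one block, each page operation costs one disk access.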
Buffer manager
The buffer manager is the software layer responsible for bringing pages from disk to main
memory as needed. It manages the available main memory by partitioning it into a
collection of pages: the buffer pool.
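
A minimal sketch of a buffer pool follows. The slides do not say which replacement policy is used, so least-recently-used (LRU) eviction is an assumption here, as are the names `BufferPool`, `read_page` and `capacity`. Writing modified pages back to disk is left out.

```python
from collections import OrderedDict


class BufferPool:
    """Keeps at most `capacity` pages in memory (LRU eviction is assumed)."""

    def __init__(self, read_page, capacity):
        self.read_page = read_page    # callable that fetches a page from disk
        self.capacity = capacity      # number of page frames in main memory
        self.frames = OrderedDict()   # page id -> page, least recently used first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.frames:
            # Hit: no disk access, just mark the page as recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            # Pool is full: drop the least recently used page.
            self.frames.popitem(last=False)
        self.disk_reads += 1
        page = self.read_page(page_id)
        self.frames[page_id] = page
        return page
```

For example, with `pool = BufferPool(lambda pid: f"page-{pid}", capacity=2)`, asking for pages 1, 2 and 1 again costs two disk reads, because the third request is served from memory.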

MySQL indexes are stored
in B-trees
B-tree structure

data entry: record stored in an index file
data record: record stored in a database file

Clustered index
data entries and data records have the same (or close) order

Unclustered index
data entries and data records are ordered by different criteria

Dense index
An index is dense if every value of the search key that appears in the data
file appears also in at least one data entry of the index

Sparse index
An index is sparse if each data entry of the index points to a page of data
records rather than to a single data record
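
The difference between the two definitions can be sketched on a tiny data file. Everything here is illustrative: the sample keys, the names `dense`, `sparse` and `sparse_lookup`, and the assumption that the data file is sorted by the search key (which the sparse lookup relies on).

```python
import bisect

# Data file: three pages of records, sorted by search key (assumed).
pages = [[10, 20], [30, 40], [50, 60]]

# Dense index: one data entry per record -> (key, page number, slot).
dense = [(key, p, s) for p, page in enumerate(pages) for s, key in enumerate(page)]

# Sparse index: one data entry per page -> (first key on the page, page number).
sparse = [(page[0], p) for p, page in enumerate(pages)]


def sparse_lookup(key):
    """Return the page number holding `key`, or None if the key is absent."""
    first_keys = [k for k, _ in sparse]
    # Last data entry whose first key is <= the key we are looking for.
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None
    page_no = sparse[i][1]
    # The index only narrows the search to one page; the page itself is scanned.
    return page_no if key in pages[page_no] else None
```

The dense index has one entry per record (6 here), the sparse one has one entry per page (3 here), which is why a sparse index is so much smaller.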

Search in a B-tree
F: fan-out
n: number of leaves

Cost: log_F(n) page accesses
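
As a small illustration of the formula, the logarithm in base F can be computed directly. The function name `search_cost` is made up for this sketch.

```python
import math


def search_cost(n_leaves, fan_out):
    """Page accesses needed to go from the root to a leaf: log_F(n)."""
    return math.log(n_leaves, fan_out)


# With fan-out 20, a tree with 20**3 = 8000 leaves is about 3 levels deep.
print(search_cost(8000, 20))
```

The larger the fan-out, the flatter the tree, so even millions of entries are reached in a handful of page accesses.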
Insert and delete
Insert and delete operations must keep the tree balanced, using
split, redistribution and coalesce techniques.

..too deep
How can I compute I/O accesses?

Book case
BOOK(code, author, cost, publisher)

• 2,000,000 records (tuples)
• 200,000 pages
• 10 data records per page
• 200 records with the same value of the attribute cost (on average)
• dense non-clustering B+-tree index with search key cost

Query:
ask for code, author, publisher of all books with a given cost

How many page accesses do we need to answer the query?
Let's build the index structure:

• each tuple has 4 fields, so each page holds 40 fields
• assuming a data entry takes 2 fields (search key value and pointer), 20 data entries fit in one leaf page of the index
• so we have a fan-out of 20

With an occupancy factor of 67%, each leaf holds 13 data entries on average
(out of the 20 it can contain).

How many leaves are there in the tree?

2,000,000/13 ≈ 153,846 leaves
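
The arithmetic so far can be checked with a few lines of Python. The constant names are made up for this sketch, and `FIELDS_PER_ENTRY = 2` (a key value plus a pointer) is an assumption used to get from 40 fields per page to 20 data entries per leaf.

```python
RECORDS = 2_000_000
FIELDS_PER_TUPLE = 4
RECORDS_PER_PAGE = 10
FIELDS_PER_ENTRY = 2      # assumption: search key value + pointer
OCCUPANCY = 0.67

fields_per_page = FIELDS_PER_TUPLE * RECORDS_PER_PAGE   # 40 fields in a page
fan_out = fields_per_page // FIELDS_PER_ENTRY            # 20 data entries per leaf
entries_per_leaf = int(fan_out * OCCUPANCY)              # 13 at 67% occupancy
leaves = RECORDS // entries_per_leaf                     # one data entry per record

print(fan_out, entries_per_leaf, leaves)  # prints: 20 13 153846
```

The index is dense, so the number of data entries equals the number of records, and the leaf count is simply records divided by entries per leaf.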

There are 153,846 leaves.
To reach the leaves we need

log_20(153,846) ≈ 4 I/O page accesses

Remember, we have on average 200 records with the same value of the
attribute cost, so their data entries span 200/13 ≈ 15 leaf pages (on average).

We need to visit these leaves because the index is dense. Since the index is
non-clustering, the 200 matching data records may each be on a different page,
so we need up to 200 data page accesses to obtain the other attributes.


The total cost is:

4 + 15 + 200 = 219 I/O accesses
(4 to descend the tree, 15 leaf pages, 200 data record pages)
~ 3 sec
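
The whole estimate for the dense non-clustering case fits in one function. This is a sketch with made-up names. It rounds the way the slides do (logarithm to the nearest integer, divisions rounded down); a stricter estimate might round up instead.

```python
import math


def dense_unclustered_cost(records, entries_per_leaf, fan_out, matches):
    """Estimated page accesses for an equality query on a dense non-clustering index."""
    leaves = records // entries_per_leaf          # one data entry per record
    descent = round(math.log(leaves, fan_out))    # root-to-leaf page accesses
    leaf_pages = matches // entries_per_leaf      # leaves holding the matching entries
    data_pages = matches                          # worst case: one page per record
    return descent + leaf_pages + data_pages


print(dense_unclustered_cost(2_000_000, 13, 20, 200))  # prints: 219
```

The last term dominates: with a non-clustering index, almost all of the cost is fetching the data records themselves.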
Book case
BOOK(code, author, cost, publisher)

• 2,000,000 records (tuples)
• 200,000 pages
• 10 data records per page
• 200 records with the same value of the attribute cost (on average)
• sparse clustering B+-tree index with search key cost

Query:
ask for code, author, publisher of all books with a given cost

How many page accesses do we need to answer the query?
Let's build the index structure, as in the previous case:

• each tuple has 4 fields, so each page holds 40 fields
• 20 data entries fit in one leaf page of the index
• so we have a fan-out of 20

With an occupancy factor of 67%, each leaf holds 13 data entries on average
(out of the 20 it can contain).

BUT each data entry points to a data record page (and not to a tuple).

How many data record pages do we have?

2,000,000/10 = 200,000 data record pages


How many leaves are there in the tree?

200,000/13 ≈ 15,384 leaves

There are 15,384 leaves.
To reach the leaves we need

log_20(15,384) ≈ 3 I/O page accesses

Remember, we have on average 200 data records with the same value of
the attribute cost. Because the index is clustering, they are stored
contiguously, so there are 200/10 = 20 data record pages to visit.


The total cost is:

3 + 20 = 23 I/O accesses
(3 to descend the tree, 20 data record pages)
~ 0.3 sec
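
The sparse clustering estimate can be sketched the same way. Names are made up for this example, and the rounding again follows the slides (logarithm to the nearest integer); rounding the tree height up would give one more access.

```python
import math


def sparse_clustered_cost(records, records_per_page, entries_per_leaf, fan_out, matches):
    """Estimated page accesses for an equality query on a sparse clustering index."""
    data_pages = records // records_per_page          # one data entry per data page
    leaves = data_pages // entries_per_leaf
    descent = round(math.log(leaves, fan_out))        # root-to-leaf page accesses
    matching_pages = matches // records_per_page      # matches are stored contiguously
    return descent + matching_pages


print(sparse_clustered_cost(2_000_000, 10, 13, 20, 200))  # prints: 23
```

Compared with the dense non-clustering case, the data page term drops from 200 to 20 because matching records share pages.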
Book case

And what if the attributes we want were part of the search key?

Book case

without index

In the worst case we have to scan all 2,000,000 tuples, that is, all 200,000 pages

~ 50 min
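
The three time figures can be roughly reproduced from the page access counts. The slides do not state a time per access; the 14 ms used below is an assumption chosen because it lands close to all three figures. The names `MS_PER_IO` and `seconds` are made up for this sketch.

```python
MS_PER_IO = 14  # assumption: average time of one page access, in milliseconds


def seconds(page_accesses):
    """Estimated elapsed time for the given number of page accesses."""
    return page_accesses * MS_PER_IO / 1000


for label, accesses in [
    ("dense non-clustering index", 219),
    ("sparse clustering index", 23),
    ("no index (full scan of 200,000 pages)", 200_000),
]:
    print(f"{label}: {seconds(accesses):.1f} s")
```

Under this assumption the full scan takes about 2800 seconds, a little under 50 minutes, against about 3 seconds and 0.3 seconds for the two indexed plans.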

The story shows that:
Think before doing
?
Thanks!
Andrea Giuliano
@bit_shark
