2. 2
Outline
• Data Storage
– storing data: disks
– buffer manager
– representing data
– external sorting
• Indexing
– types of indexes
– B+ trees
– hash tables (maybe)
3. 3
Physical Addresses
• Each block and each record have a physical
address that consists of:
– The host
– The disk
– The cylinder number
– The track number
– The block within the track
– For records: an offset in the block
• sometimes this is in the block’s header
4. 4
Logical Addresses
• Logical address: a string of bytes (10-16)
• More flexible: can blocks/records around
• But need translation table:
Logical address Physical address
L1 P1
L2 P2
L3 P3
5. 5
Main Memory Address
• When the block is read in main memory, it
receives a main memory address
• Buffer manager has another translation table
Memory address Logical address
M1 L1
M2 L2
M3 L3
6. 6
The I/O Model of Computation
• In main memory algorithms
– we care about CPU time
• In databases time is dominated by I/O cost
• Assumption: cost is given only by I/O
• Consequence: need to redesign certain
algorithms
• Will illustrate here with sorting
7. 7
Sorting
• Illustrates the difference in algorithm design
when your data is not in main memory:
– Problem: sort 1Gb of data with 1Mb of RAM.
• Arises in many places in database systems:
– Data requested in sorted order (ORDER BY)
– Needed for grouping operations
– First step in sort-merge join algorithm
– Duplicate removal
– Bulk loading of B+-tree indexes.
8. 8
2-Way Merge-sort:
Requires 3 Buffers
• Pass 1: Read a page, sort it, write it.
– only one buffer page is used
• Pass 2, 3, …, etc.:
– three buffer pages used.
Main memory buffers
INPUT 1
INPUT 2
OUTPUT
DiskDisk
9. 9
Two-Way External Merge Sort
• Each pass we read + write
each page in file.
• N pages in the file => the
number of passes
• So total cost is:
• Improvement: start with
larger runs
• Sort 1GB with 1MB memory
in 10 passes
= +log2 1N
( )2 12N Nlog +
Input file
1-page runs
2-page runs
4-page runs
8-page runs
PASS 0
PASS 1
PASS 2
PASS 3
9
3,4 6,2 9,4 8,7 5,6 3,1 2
3,4 5,62,6 4,9 7,8 1,3 2
2,3
4,6
4,7
8,9
1,3
5,6 2
2,3
4,4
6,7
8,9
1,2
3,5
6
1,2
2,3
3,4
4,5
6,6
7,8
10. 10
Can We Do Better ?
• We have more main memory
• Should use it to improve performance
11. 11
Cost Model for Our Analysis
• B: Block size
• M: Size of main memory
• N: Number of records in the file
• R: Size of one record
14. 14
Indexes
• An index on a file speeds up selections on the
search key field(s)
• Search key = any subset of the fields of a relation
– Search key is not the same as key (minimal set of fields
that uniquely identify a record in a relation).
• Entries in an index: (k, r), where:
– k = the key
– r = the record OR record id OR record ids
15. 15
Index Classification
• Clustered/unclustered
– Clustered = records sorted in the key order
– Unclustered = no
• Dense/sparse
– Dense = each record has an entry in the index
– Sparse = only some records have
• Primary/secondary
– Primary = on the primary key
– Secondary = on any key
– Some books interpret these differently
• B+ tree / Hash table / …
16. 16
Clustered Index
• File is sorted on the index attribute
• Dense index: sequence of (key,pointer) pairs
10
20
30
40
50
60
70
80
10
20
30
40
50
60
70
80
17. 17
Clustered Index
• Sparse index: one key per data block
10
30
50
70
90
110
130
150
10
20
30
40
50
60
70
80
18. 18
Clustered Index with Duplicate
Keys
• Dense index: point to the first record with
that key
10
20
30
40
50
60
70
80
10
10
10
20
20
20
30
40
19. 19
Clustered Index with Duplicate
Keys
• Sparse index: pointer to lowest search key in
each block:
• Search for 20
10
10
20
30
10
10
10
20
20
20
30
40
20. 20
Clustered Index with Duplicate
Keys
• Better: pointer to lowest new search key in
each block:
• Search for 20
10
20
30
40
50
60
70
80
10
10
10
20
30
30
40
50
21. 21
Unclustered Indexes
• To index other attributes than primary key
• Always dense (why ?)
10
10
20
20
20
30
30
30
20
30
30
20
10
20
10
30
23. 23
Composite Search Keys
• Composite Search Keys: Search
on a combination of fields.
– Equality query: Every field
value is equal to a constant
value. E.g. wrt <sal,age>
index:
• age=20 and sal =75
– Range query: Some field
value is not a constant. E.g.:
• age =20; or age=20 and
sal > 10
sue 13 75
bob
cal
joe 12
10
20
8011
12
name age sal
<sal, age>
<age, sal> <age>
<sal>
12,20
12,10
11,80
13,75
20,12
10,12
75,13
80,11
11
12
12
13
10
20
75
80
Data records
sorted by name
Data entries in index
sorted by <sal,age>
Data entries
sorted by <sal>
Examples of composite key
indexes using lexicographic order.
24. 24
B+ Trees
• Search trees
• Idea in B Trees:
– make 1 node = 1 block
• Idea in B+ Trees:
– Make leaves into a linked list (range queries are
easier)
25. 25
• Parameter d = the degree
• Each node has >= d and <= 2d keys (except root)
• Each leaf has >=d and <= 2d keys:
B+ Trees Basics
30 120 240
Keys k < 30
Keys 30<=k<120 Keys 120<=k<240 Keys 240<=k
40 50 60
40 50 60
Next leaf
27. 27
B+ Tree Design
• How large d ?
• Example:
– Key size = 4 bytes
– Pointer size = 8 bytes
– Block size = 4096 byes
• 2d x 4 + (2d+1) x 8 <= 4096
• d = 170
28. 28
Searching a B+ Tree
• Exact key values:
– Start at the root
– Proceed down, to the leaf
• Range queries:
– As above
– Then sequential traversal
Select name
From people
Where age = 25
Select name
From people
Where 20 <= age
and age <= 30
29. 29
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133
• Typical capacities:
– Height 4: 1334
= 312,900,700 records
– Height 3: 1333
= 2,352,637 records
• Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages = 1 Mbyte
– Level 3 = 17,689 pages = 133 MBytes
30. 30
Insertion in a B+ Tree
Insert (K, P)
• Find leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split node, insert in parent:
• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only
K1 K2 K3 K4 K5
P0 P1 P2 P3 P4 p5
K1 K2
P0 P1 P2
K4 K5
P3 P4 p5
(K3, ) to parent
42. 42
Deletion from a B+ Tree
80
19 30 60 100 120 140
10 15 18 19 2
0
60 65 80 85 90
10 15 18 20 60 65 80 85 9019
After deleting 40
Rotation not possible
Need to merge nodes
50
50
43. 43
Deletion from a B+ Tree
80
19 60 100 120 140
10 15 18 19 2
0
5
0
60 65 80 85 90
10 15 18 20 60 65 80 85 9019
Final tree
50
44. 44
Hash Tables
• Secondary storage hash tables are much like
main memory ones
• Recall basics:
– There are n buckets
– A hash function f(k) maps a key k to {0, 1, …, n-1}
– Store in bucket f(k) a pointer to record with key k
• Secondary storage: bucket = block, use
overflow blocks when needed
45. 45
• Assume 1 bucket (block) stores 2 keys +
pointers
• h(e)=0
• h(b)=h(f)=1
• h(g)=2
• h(a)=h(c)=3
Hash Table Example
e
b
f
g
a
c
0
1
2
3
46. 46
• Search for a:
• Compute h(a)=3
• Read bucket 3
• 1 disk access
Searching in a Hash Table
e
b
f
g
a
c
0
1
2
3
47. 47
• Place in right bucket, if space
• E.g. h(d)=2
Insertion in Hash Table
e
b
f
g
d
a
c
0
1
2
3
48. 48
• Create overflow block, if no space
• E.g. h(k)=1
• More over-
flow blocks
may be needed
Insertion in Hash Table
e
b
f
g
d
a
c
0
1
2
3
k
49. 49
Hash Table Performance
• Excellent, if no overflow blocks
• Degrades considerably when number of
keys exceeds the number of buckets (I.e.
many overflow blocks).
50. 50
Extensible Hash Table
• Allows has table to grow, to avoid
performance degradation
• Assume a hash function h that returns
numbers in {0, …, 2k
– 1}
• Start with n = 2i
<< 2k
, only look at first i
most significant bits
51. 51
Extensible Hash Table
• E.g. i=1, n=2, k=4
• Note: we only look at the first bit (0 or 1)
0(010)
1(011)
i=1 1
1
0
1
55. 55
Insertion in Extensible Hash
Table
• Now insert 0000, then 0101
• Need to split block
0(010)
0(000), 0(101)
10(11)
10(10)
i=2 1
2
00
01
10
11
11(10) 2
56. 56
Insertion in Extensible Hash
Table
• After splitting the block
00(10)
00(00)
10(11)
10(10)
i=2
2
2
00
01
10
11
11(10) 2
01(01) 2
57. 57
Performance Extensible Hash
Table
• No overflow blocks: access always one read
• BUT:
– Extensions can be costly and disruptive
– After an extension table may no longer fit in
memory
58. 58
Linear Hash Table
• Idea: extend only one entry at a time
• Problem: n= no longer a power of 2
• Let i be such that 2i
<= n < 2i+1
• After computing h(k), use last i bits:
– If last i bits represent a number > n, change msb
from 1 to 0 (get a number <= n)
59. 59
Linear Hash Table Example
• N=3
(01)00
(11)00
(10)10
i=2
00
01
10
(01)11 BIT FLIP
60. 60
Linear Hash Table Example
• Insert 1000: overflow blocks…
(01)00
(11)00
(10)10
i=2
00
01
10
(01)11
(10)00
61. 61
Linear Hash Tables
• Extension: independent on overflow blocks
• Extend n:=n+1 when average number of
records per block exceeds (say) 80%
62. 62
Linear Hash Table Extension
• From n=3 to n=4
• Only need to touch
one block (which one ?)
(01)00
(11)00
(10)10
i=2
00
01
10
(01)11
(01)11
(01)11
i=2
00
01
10
(10)10
(01)00
(11)00
11
63. 63
Linear Hash Table Extension
• From n=3 to n=4 finished
• Extension from n=4
to n=5 (new bit)
• Need to touch every
single block (why ?)
(01)11
i=2
00
01
10
(10)10
(01)00
(11)00
11