Postgres indexes: how to make them work for your application

INTRODUCTION
Bartosz Sypytkowski
▪ @Horusiath
▪ b.sypytkowski@gmail.com
▪ bartoszsypytkowski.com

 B+Tree: how databases manage data?
 How indexes work
 Query plans & execution
 Types of indexes
 Vector search
AGENDA

HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
4 32
4 25
4
DATA
32 49
8
DATA
16
DATA
20
DATA
25
DATA
27
DATA
30
DATA
32
DATA
42
DATA
43
DATA
44
DATA
49
DATA
62
DATA
64
DATA
8KB
B+TREE
link
link link link

HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
4 32
4 25
4
DATA
32 49
8
DATA
16
DATA
20
DATA
25
DATA
27
DATA
30
DATA
32
DATA
42
DATA
43
DATA
49
DATA
49
DATA
62
DATA
64
DATA
8KB
SEARCH BY EQUALITY
link
link link link

HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
HEADER
4 32
4 25
4
DATA
32 49
8
DATA
16
DATA
20
DATA
25
DATA
27
DATA
30
DATA
32
DATA
42
DATA
43
DATA
44
DATA
49
DATA
62
DATA
64
DATA
8KB
RANGE SCAN
link
link link link

POSTGRESQL
TOAST TABLES
book_id title content
1 ‘Haikus’
‘In the twilight rain
these brilliant-hued
hibiscus - A lovely
sunset.’
2 ‘Moby Dick’
public.books
chunk_id chunk_seq chunk_data
16403 0 0xFF2F000000436
16403 1 0x6167F27521F25B
16403 2 0x23FB21030E6F6
16403 3 0x7974686108676
pg_toast.pg_toast_{$OID}
select relname
from pg_class
where oid = (
select reltoastrelid
from pg_class
where relname='books')
Find name of a
corresponding
TOAST table
if(compress(content).len > 2000)

HEADER
HEADER
HEADER
HEADER
HEADER
4 25
4
DATA
8
DATA
16
DATA
20
DATA
25
DATA
27
DATA
30
DATA
TUPLE IDs
link
Table Heap Index Storage
4
TID
8
TID
16
TID
20
TID
25
TID
27
TID
30
TID
32
TID
32
DATA
<block id, tuple offset>
select ctid from my_table

select *
from books
where author_id = 10
SEQ SCAN

SEQ SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. (Hopefully) sequential I/O
2. Scans all table’s related pages
3. Doesn’t use index pages

create index on books(publication_date);
select publication_date
from books
where publication_date > ‘2020/01/01’
INDEX
ONLY
SCAN

INDEX ONLY
SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Sequential I/O over index pages
2. Doesn’t use table’s related pages

create index on books(publication_date);
select title, publication_date
from books
where publication_date > ‘2020/01/01’
INDEX
SCAN

INDEX SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Uses index to find a first page of the related table…

INDEX SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
2. Position read cursor on the first page…

INDEX SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
2. Position read cursor on the first page…
3. Sequential I/O over all table’s pages until condition is done

create index on books
using gist(description_lex);
select title, publication_date
from books
where description_lex @@ ‘epic’
BITMAP
SCAN

BITMAP SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Using index create bitmap of matching pages
Bitmap

BITMAP SCAN
M1
T1 I1
T2 T3 I2 I3
Index Storage
T4 T5 T6 T7
Table Heap
1. Using index create bitmap of matching pages
2. Random I/O over pages covered by bitmap
Bitmap

INCLUDE & PARTIAL
INDEXES
create index ix_books_by_author
on books(author_id)
include (created_at)
where author_id is not null;
HEADER
HEADER
4
25
HEADER
HEADER
HEADER
4
DATA
7
DATA
13
DATA
16
DATA
19
DATA
25
DATA
32
DATA
47
DATA
61
DATA
4
TID
INC
16
TID
INC
25
TID
INC
32
TID
INC
duplicated
columns
Index
Storage

BTREE INDEX
create index ix_users_birthdate on users(birthdate desc);

COMPAR
ING
VECTOR
CLOCKS
B-TREE INDEX
1. Default
2. Access time: always O(logN)
3. Supports most of the index features
create index ix_users_birthdate
on users(birthdate desc);

HASH INDEX
create index ix_orders_no on orders using hash(order_no);

HEADER
HEADER
0x
01
0x
02
HEADER
HEADER
HEADER
Meta page
HASH INDEX
HEADER
Bucket 0
Overflow page Bitmap page
Bucket 1
4
TID
71
TID
13
TID
73
TID
42
TID
67
TID
86
TID
99
TID
5
TID
38
TID
7
TID
51
TID
14
TID
34
TID
66
TID
70
TID
72
TID
90
TID
79
TID
91
TID
82
TID

COMPAR
ING
VECTOR
CLOCKS
HASH INDEX
1. Access time: O(1) – O(N)
2. Doesn’t shrink in size*
create index ix_orders_no
on orders using hash(order_no);

BRIN INDEX
create index ih_events_created_at on events
using brin(created_at) with (pages_per_range = 128);

HEADER
HEADER
HEADER
2020/01/01
2020/03/14
2020/02/09
2020/11/10
2020/01/01
2020/01/21
0..127
BRIN INDEX
link
2020/01/21
2020/03/14
128..255
2020/02/09
2020/07/28
256..383
2020/03/17
2020/11/10
384..511

select tablename, attname, correlation
from pg_stats
where tablename = 'film'
tablename attname correlation
film film_id 0.9979791
film title 0.9979791
film description 0.04854498
film release_year 1
film rating 0.1953281
film last_update 1
film fulltext <null>
COLUMN-TUPLE
CORRELATION

COMPAR
ING
VECTOR
CLOCKS
BRIN INDEX
1. Imprecise
2. Very small in size
3. Good for columns aligned with tuple
insert order and immutable records
create index ih_events_created_at on events
using brin(created_at) with (pages_per_range = 128);

BLOOM INDEX
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);

BLOOM FILTER
INSERT(C) =
h1(C)/len | h2(C)/len .. | hn(C)/len

BLOOM FILTER
INSERT(C) =
INSERT(D) =
h1(D)/len | h2(D)/len .. | hn(D)/len

BLOOM FILTER
INSERT(C) =
INSERT(D) =
CONTAINS(A) =
h1(A)/len | h2(A)/len .. | hn(A)/len
FALSE

BLOOM FILTER
INSERT(C) =
INSERT(D) =
CONTAINS(A) =
h1(A)/len | h2(A)/len .. | hn(A)/len
CONTAINS(B) =
h1(B)/len | h2(B)/len .. | hn(B)/len
FALSE
MAYBE?

COMPAR
ING
VECTOR
CLOCKS
BLOOM INDEX
1. Small in size
2. Good for exclusion/narrowing
3. False positive ratio: hur.st/bloomfilter/
create extension bloom;
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
number of bits per record
number of hashes for each
column

GiST INDEX
create index ix_books_content on books using gist(content_lex);

GiST INDEX
GEO POINTS
image: https://postgrespro.com/blog/pgsql/4175817

GiST INDEX
TSVECTOR
-- gist cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gist(description_lex);
select * from film where description_lex @@ 'epic';
Bitmap Heap Scan on film (cost=4.18..20.32 rows=5 width=416)
Recheck Cond: (description_lex @@ '''epic'''::tsquery)
-> Bitmap Index Scan on idx_film_description_lex (cost=0.00..4.18 rows=5 width=0)
Index Cond: (description_lex @@ '''epic'''::tsquery)
Query Plan

GiST INDEX
TSVECTOR
HEADER
011011 110111
HEADER
010011 011010
HEADER
100110 110001
HEADER
aab
TID
aac
TID
aba
TID HEADER
adf
TID
azf
TID
bac
TID
HEADER
brc
TID
caa
TID
cdl
TID
HEADER
cff
TID
fre
TID
klm
TID

COMPAR
ING
VECTOR
CLOCKS
GiST INDEX
1. Supports specialized operators
2. Index is not updated during deletes
create index ix_books_content
on books using gist(content_lex);

SP-GiST INDEX
create index ix_files_path on files using spgist(path);

SP-GiST INDEX
GEO POINTS
image: https://postgrespro.com/blog/pgsql/4220639

SP-GiST INDEX
TSVECTOR
-- spgist can be created on text column but not on nvarchar
create index idx_film_title on film using spgist(title);
select * from film
where title like ‘A Fast-Paced% in New Orleans';
Filter: (description ~~ 'A Fast-Paced%'::text)
-> Bitmap Index Scan on idx_film_title (cost=0.00..8.64 rows=50 width=0)
Index Cond: ((description ~>=~ 'A Fast-Paced'::text) AND (description ~<~ 'A Fast-Pacee'::text))
Query Plan

COMPAR
ING
VECTOR
CLOCKS
GiST INDEX
1. Just like GiST, but faster for some ops…
2. … but unable to perform some other
3. Indexed space is partitioned into non-
overlapping regions
create index ix_files_path
on files using spgist(path);

GIN INDEX
create index ix_books_content on books using gin(content_lex);

GIN INDEX -- gist cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gin(description_lex);
select * from film where description_lex @@ 'epic';
Recheck Cond: (description_lex @@ '''epic'''::tsquery)
-> Bitmap Index Scan on idx_film_description_lex (cost=0.00..8.04 rows=5 width=0)
Index Cond: (description_lex @@ '''epic'''::tsquery)
Query Plan

GIN INDEX
HEADER
ever gone
HEADER
HEADER
HEADER
HEADER
omit
call
(1,1)
(3,1)
dome
(1,2)
ever
(1,1)
(2,1)
faucet
(4,0)
gather
(2,1)
(3,2)
gone leather
(2,1)
omit
(2,1)
(3,2)
(2,2) (3,1)
HEADER
HEADER
(1,1)
(1,6)
(1,9)
(1,2) (1,3)
(1,7) (1,8)
(2,1) (2,2)
(2,3)
(2,6)
(2,8)
(2,5)
(2,7)
(3,1)
Posting list
Posting tree

COMPAR
ING
VECTOR
CLOCKS
GIN INDEX
on books using gin(content_lex);
1. Reads usually faster than GiST
2. Writes are usually slower than GiST
3. Index size greater than GiST

RUM INDEX
create index ix_books_content on books using rum(content_lex);

RUM INDEX
HEADER
ever gone
HEADER
HEADER
HEADER
HEADER
omit
call
(1,1)
(3,1)
dome
(1,2)
ever
(1,1)
(2,1)
faucet
(4,0)
gather
(2,1)
(3,2)
gone leather
(2,1)
omit
(2,1)
(3,2)
(2,2) (3,1)
HEADER
HEADER
(1,1)
(1,6)
(1,9)
(1,2) (1,3)
(1,7) (1,8)
(2,1) (2,2)
1, 14
5
2, 5 25
1, 9
45 1, 21
9, 11
2, 10 45
1, 22
13
1
211
2, 4
2, 21
13
25
5
4, 7
(2,3)
(2,6)
(2,8)
(2,5)
(2,7)
(3,1)
2
3
1, 7
8, 13
17
1

RUM INDEX
-- similarity ranking
select description_lex <=> to_tsquery('epic’) as similarity
from books;
-- find description with 2 words located one after another
select * from books
where description_lex @@ to_tsquery(‘hello <-> world’);

COMPAR
ING
VECTOR
CLOCKS
RUM INDEX
1. GIN on steroids (bigger but more
capable)
2. Allows to query for terms and their
relative positions in text
3. Supports Index Scan and EXCLUDE
create extension rum;
on books using rum(content_lex);

1. Uses only lexical similarity
and is language-sensitive
2. Misses the context
(meaning)
3. Works only on text
LIMITS OF FULL-TEXT
SEARCH
select * from books
where description_lex @@
to_tsquery(‘white <-> castle’);
select * from books
where description_lex @@
to_tsquery(‘white <-> fortress’);

VECTOR
SIMILARITY
101
R G B
250 185 0
115 0 152
Embedding

EUCLIDEAN
DISTANCE
COSINE
DISTANCE
x
y
A
B
x
y
A
B

IVFFLAT VECTOR
INDEX
create index ix_books_content on books
using ivfflat(embedding content_lex) with (lists = 1000);
select * from items where embedding <-> ‘[3,1,2]’ < 5;

INVERTED FILE WITH FLAT COMPRESSION
x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
create index on items
using ivfflat(embedding vector_l2_ops)
with (lists=3);

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
lists
with (lists=3);

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
centroids
with (lists=3);

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
set ivfflat.probes = 2;
select * from items
where embedding <-> ‘[1,2]’ < 5;
[1,2]

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
select * from items
order by embedding <-> ‘[1,2]’
take 4;
[1,2]

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
L1
L2
L3
C1
C2
C3
[1,2]
select * from items
take 4;

HNSW VECTOR INDEX
create index on items using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
using hnsw(embedding vector_l2_ops)
with (m=2, ef_construction=3);

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
Layer 1

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
Layer 1
Layer 2

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
Layer 1
Layer 2
Layer 3

x
y A
C
B
D
G
E
L
O
M N
F
I
K
J
H
[1,2]
set hnsw.ef_search = 2;
select * from items
take 4;

1. Fast build time
2. Smaller size
3. Slower query
performance
4. Bad for frequent index
updates
1. Slow initial build time
2. Bigger index size
3. Faster performance
4. Better recall after updates
IVFFLAT
INDEX
HNSW
INDEX

Postgres indexes: how to make them work for your application

Recomendados

Recomendados

Mais conteúdo relacionado

Semelhante a Postgres indexes: how to make them work for your application

Semelhante a Postgres indexes: how to make them work for your application (20)

Mais de Bartosz Sypytkowski

Mais de Bartosz Sypytkowski (13)

Último

Último (20)

Postgres indexes: how to make them work for your application