Wish list from PostgreSQL - Linux Kernel Summit 2009
1. Released at “Linux Kernel Summit 2009”
http://events.linuxfoundation.org/archive/2009/linux-kernel-summit
Linux Kernel Summit 2009
Wish list from PostgreSQL
Itagaki Takahiro
NTT Open Source Software Center
October 18 - 20, 2009 - Tokyo, Japan
2. Agenda
Background
Postgres won’t use Direct I/O!
Storage and buffer usage in Postgres
Discussions
Low priority I/O for background tasks
Avoid duplicated caching in DB and kernel buffers
3. Background: Postgres won’t use Direct I/O!
Our policy is to delegate as much as possible to the kernel
and avoid re-implementing the whole block layer in the user space
of PostgreSQL.
This may be the opposite of what commercial DBMS folks require.
We’d like to keep the I/O layer small.
The code for the block layer is <30K lines (5%) of the Postgres code (600K lines).
We won’t use raw devices, either.
Layout of files should be managed by the file system.
Not ideal, but it is a good approach for supporting many platforms
with a small number of developers.
<100 active main developers
<10 committers
support >10 platforms
4. Background: Storage and buffer usage in Postgres
Consists of multiple processes.
Uses the file system and multiple files. (one file per 1GB of table / per 16MB of xlog)
Mainly uses traditional system calls. (lseek, read, write, fsync)
Starting to use posix_fadvise() in the latest version.
We depend on the kernel buffer cache and I/O management.
Does not use synchronous I/O to access data files.
Does not read ahead by itself; expects read() to do it.
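[Figure: process and storage layout. The postmaster (listener process) fork()s the writer (sync process) and the backends (SQL executor processes). All of them attach to Postgres's own shared buffer pool, allocated with shmget(), which has its own I/O exclusion control. The writer issues lseek(), write(), fsync(); a backend issues lseek(), read(), write(). Writes either overwrite existing blocks or expand a file. Below the processes: data files in 1GB segments and xlog files in 16MB segments, on storage + file system.]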
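
As a rough illustration of the access pattern on this slide, the sketch below maps a block number to its 1GB segment file and reads the block with lseek() and read(), after giving the kernel a posix_fadvise() hint. The 8KB block size is PostgreSQL's default; the helper names (block_segment, block_offset, read_block) are made up for this example and are not PostgreSQL's own functions.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600          /* for posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed sizes: 8KB blocks (PostgreSQL's default) and the 1GB
 * segment size mentioned on the slide. */
#define BLOCK_SIZE     8192
#define SEGMENT_SIZE   (1024L * 1024L * 1024L)
#define BLOCKS_PER_SEG (SEGMENT_SIZE / BLOCK_SIZE)   /* 131072 */

/* Hypothetical helper: index of the 1GB segment file holding a block. */
static uint32_t block_segment(uint32_t blockno)
{
    return (uint32_t)(blockno / BLOCKS_PER_SEG);
}

/* Hypothetical helper: byte offset of a block inside its segment file. */
static off_t block_offset(uint32_t blockno)
{
    return (off_t)(blockno % BLOCKS_PER_SEG) * BLOCK_SIZE;
}

/* Read one block through the kernel buffer cache (buffered I/O).
 * seg_fd must be the open segment file that holds blockno.
 * Returns 0 on success, -1 on error or short read. */
static int read_block(int seg_fd, uint32_t blockno, char *buf)
{
    off_t off = block_offset(blockno);
    ssize_t n;

    assert(buf != NULL);

    /* Only a hint: the kernel may start reading early. Errors are ignored. */
    (void) posix_fadvise(seg_fd, off, BLOCK_SIZE, POSIX_FADV_WILLNEED);

    if (lseek(seg_fd, off, SEEK_SET) == (off_t) -1)
        return -1;
    n = read(seg_fd, buf, BLOCK_SIZE);
    return n == BLOCK_SIZE ? 0 : -1;
}
```

With these assumed sizes, block 131072 is the first block of the second segment file. Read-ahead and caching are left entirely to the kernel; the process only issues plain system calls.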
5. Low priority I/O for background tasks
PostgreSQL uses some background tasks
VACUUM – cleans up DELETE’d rows and reclaims the space.
CHECKPOINT – flushes all modified pages to disk.
Current behavior in Postgres
Sleeps after every fixed amount of I/O.
Consumes constant I/O bandwidth regardless of workload.
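
The current behavior can be pictured with a small throttle like the one below: the background task counts the I/O it has done and takes a nap each time a fixed budget is used up. This is only a sketch under assumed names (io_throttle, throttle_init, throttle_account); it is not PostgreSQL's actual code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L    /* for nanosleep() */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical throttle state; names and fields are illustrative only. */
typedef struct {
    size_t        bytes_per_nap;    /* fixed I/O budget between naps */
    long          nap_ms;           /* length of each nap, in milliseconds */
    size_t        bytes_since_nap;  /* I/O done since the last nap */
    unsigned long naps;             /* number of naps taken so far */
} io_throttle;

static void throttle_init(io_throttle *t, size_t bytes_per_nap, long nap_ms)
{
    assert(t != NULL && bytes_per_nap > 0 && nap_ms >= 0);
    t->bytes_per_nap = bytes_per_nap;
    t->nap_ms = nap_ms;
    t->bytes_since_nap = 0;
    t->naps = 0;
}

/* Account for nbytes of background I/O and sleep once the budget is
 * used up. Returns 1 if the call slept, 0 otherwise. */
static int throttle_account(io_throttle *t, size_t nbytes)
{
    struct timespec ts;

    t->bytes_since_nap += nbytes;
    if (t->bytes_since_nap < t->bytes_per_nap)
        return 0;

    ts.tv_sec = t->nap_ms / 1000;
    ts.tv_nsec = (t->nap_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);           /* the nap ignores how busy the disk is */

    t->bytes_since_nap = 0;
    t->naps++;
    return 1;
}
```

The budget and the nap length are fixed settings, so the task sleeps just as long on an idle system as on a busy one. That is the limitation this slide describes.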
Ideal behavior
Background tasks can use all of the surplus I/O bandwidth as long as
it does not affect service.
Does fsync() block the operation?
            on-cache page    off-cache page
read()      not blocked      blocked
write()     blocked          blocked
lseek()     blocked
pread()     not blocked      sometimes
pwrite()    sometimes        sometimes
Requirements
Low priority should also apply to buffered writes and fsync.
Normal I/O should not wait for low priority I/O, so fsync should
not block lseek, read, or write (both overwrites and extends).
6. Avoid duplicated caching in DB and kernel buffers
Both Postgres and the kernel might cache file data because Postgres uses
buffered I/O.
The same blocks might be cached in both DB and kernel buffers.
[Figure: DB buffers, kernel buffers, storage. The same blocks are duplicated in DB buffers and kernel buffers.]
Approaches to eliminate duplicated caching
Direct I/O
Pros: Can eliminate kernel cache
Cons: Need to add I/O manager to Postgres
mmap
Pros: Can eliminate DB cache
Cons: Hard to implement “Write-Ahead Logging” because mapped
blocks could be flushed out at arbitrary times.
mmap is preferable because it avoids reinventing an I/O manager in Postgres.
Requirements
Have a control flag to prevent modified blocks from being flushed out.
The flag is released when WAL buffers are written to storage.
– mlock() is not enough because it cannot prevent flushing.
madvise( MADV_{ DOFLUSH | DONTFLUSH } ) ?
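
To make the ordering problem concrete, here is a sketch that uses only calls that exist today (mmap, msync, fsync). The MADV_DOFLUSH / MADV_DONTFLUSH flags above are a proposal and are not used here. Without such a flag, the one ordering that is always safe is to make the log record durable before touching the mapped page. The helper names (map_data_file, logged_update) and the log record format (just the raw bytes) are assumptions made for this illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the data file's size to len bytes and map
 * it read/write and shared. Returns NULL on failure. */
static char *map_data_file(const char *path, size_t len)
{
    void *p;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t) len) != 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* Hypothetical helper: change len bytes at offset off inside a mapped,
 * page-aligned data page while keeping write-ahead ordering.
 * Returns 0 on success, -1 on error. */
static int logged_update(int wal_fd, char *page, size_t page_size,
                         size_t off, const char *data, size_t len)
{
    assert(off + len <= page_size);

    /* 1. Write the log record and force it to storage first. */
    if (write(wal_fd, data, len) != (ssize_t) len)
        return -1;
    if (fsync(wal_fd) != 0)
        return -1;

    /* 2. Only now modify the mapped page. From here on the kernel may
     *    write the dirty page back at any time; that is safe only
     *    because the log record is already on storage. */
    memcpy(page + off, data, len);

    /* 3. Explicit flush of the data page, e.g. at checkpoint time. */
    return msync(page, page_size, MS_SYNC);
}
```

The cost is an fsync() on the log before every page change. A flag that holds back writeback of modified mapped blocks until the WAL is written, as requested above, would let log writes be batched instead.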