This talk is prepared as a deck of slides, where each slide describes a really bad way people can screw up their PostgreSQL database and carries a weight: how frequently I have seen that kind of problem. Right before the talk I will reshuffle the deck, draw twenty random slides, and explain why such practices are bad and how to avoid running into them.
2. Best practices are just boring
• Never follow them, try worst practices
• Only worst practices can really help you screw things up in the
most effective way
• PostgreSQL consultants are nice people - keep them happy!
dataegret.com
Why this talk
• Linux is the most common OS for databases
• DBAs often run into IO problems
• Most of the information on the topic is written by kernel developers
(for kernel developers) or is checklist-style
• Checklists are useful, but only up to a certain workload
3. How it works?
• I have a list of a little more than 100 worst practices
• I do not make this stuff up; all of them are real-life examples
• I reshuffle my list every time before presenting and pick a few
examples
• Well, there are some things I like more and some I like less, so it is
not a very honest shuffle
4. 0. Do not use indexes (a test one!)
• Basically, there is no difference between a full table scan and
an index scan
• You can check that. Just insert 10 rows into a test table on
your test server and compare.
• Nobody deals with tables of more than 10 rows in production!
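Taking the slide's dare seriously, here is a rough sketch of the same comparison on a table large enough to matter. The table name `test_table`, its columns and the row count are made up for illustration; the actual plans and timings depend on your server and PostgreSQL version.

```sql
-- Hypothetical test table with one million rows instead of ten.
CREATE TABLE test_table AS
SELECT g AS id, md5(g::text) AS payload
FROM generate_series(1, 1000000) AS g;

-- Refresh planner statistics for the new table.
ANALYZE test_table;

-- Without an index the whole table has to be scanned to find one row.
EXPLAIN ANALYZE SELECT * FROM test_table WHERE id = 424242;

CREATE INDEX test_table_id_idx ON test_table (id);

-- With the index in place the planner is free to pick an index scan.
EXPLAIN ANALYZE SELECT * FROM test_table WHERE id = 424242;
```

Comparing the two `EXPLAIN ANALYZE` outputs side by side is the point; the numbers themselves are not portable between machines.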
5. 1. Use as many count(*) as you can
• Figure 301083021830123921 is very informative for the end
user
• If it changes in a second to 30108302894839434020, it is still
informative
• select count(*) from sometable is quite a lightweight query
• Tuple estimation from pg_catalog can never be precise
enough for you
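For the record, the estimate that the slide dismisses can be read straight from the catalog. A minimal sketch, assuming `sometable` lives in the `public` schema and has been vacuumed or analyzed recently (otherwise the estimate may be stale or missing):

```sql
-- Planner's row estimate for the table, maintained by VACUUM and ANALYZE.
-- It is an approximation, not an exact count.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'public.sometable'::regclass;
```

This reads a single catalog row instead of scanning the table, which is usually good enough for a figure shown to an end user.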
6. 2. Use ORM
• All databases share the same syntax
• You must write database-independent code
• Are there any benefits based on database-specific
features?
• It is always good to learn a new complicated technology
8. 3. Move joins to your application
• Just select * from a couple of tables into the application written in
your favorite programming language
• Then join them at the application level
• Now you only need to implement a nested loop join, a hash join
and a merge join, as well as a query optimizer and a page cache
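The alternative is to let the database do the join. A small sketch with two hypothetical tables, `customers` and `orders` (names and columns are invented for illustration):

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE customers (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- The planner chooses the join strategy (nested loop, hash or merge)
-- and only the matching rows travel to the application.
SELECT o.id, o.created_at, c.name
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE o.created_at >= now() - interval '1 day';
```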
9. 4. Be in trend, be schema-less
• You do not need to design the schema
• You only need one table, two columns: id bigserial and extra
jsonb
• JSONB datatype is pretty effective in PostgreSQL, you can
query it just like a well-structured table
• Even if you put 100M of JSON in it
• Even if you have 1000+ tps
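One possible middle ground, sketched under assumptions: keep the attributes you always query as typed columns and leave only the genuinely variable part in `jsonb`. The table `events`, its columns and the `source` key are hypothetical.

```sql
-- Hypothetical table: stable attributes are real columns,
-- only the loosely structured remainder goes into jsonb.
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    user_id    bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    extra      jsonb NOT NULL DEFAULT '{}'
);

-- A GIN index supports containment queries on the jsonb column.
CREATE INDEX events_extra_idx ON events USING gin (extra jsonb_path_ops);

-- Containment query that can use the GIN index.
SELECT id
FROM events
WHERE extra @> '{"source": "mobile"}';
```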
13. 5. Be agile, use EAV
• You only need 3 tables: entity, attribute, value
• At some point add the 4th: attribute_type
• When it starts slowing down, just call those four tables The
Core and add 1000+ tables with denormalized data
• If it is not enough, you can always add value_version
14. 6. Try to create as many indexes as you can
• Indexes consume no disk space
• Indexes consume no shared_buffers
• There is no overhead on DML if each and every column in a
table is covered with a bunch of indexes
• Optimizer will definitely choose your index once you created it
• Keep calm and create more indexes
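Whether the optimizer actually uses an index can be checked rather than assumed. A sketch using the standard statistics view `pg_stat_user_indexes`; note that the counters only cover the period since statistics were last reset, and that indexes backing primary key or unique constraints may show zero scans and still be needed.

```sql
-- Indexes that have never been scanned since the last statistics reset,
-- largest first.
SELECT schemaname,
       relname,
       indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```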
18. 7. Always keep all your time series data
• Time series data like tables with logs or session history should
never be deleted, aggregated or archived; you always need to
keep it all
• You will always know where to check, if you run out of disk
space
• You can always call that Big Data
• Solve the problem using partitioning... one partition per
hour or per minute
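For contrast, a sketch of coarser partitioning with a retention step. It assumes declarative partitioning (PostgreSQL 10 or later); the table `session_log`, its columns and the monthly granularity are illustrative choices, not a recommendation for every workload.

```sql
-- Hypothetical log table partitioned by month.
CREATE TABLE session_log (
    logged_at timestamptz NOT NULL,
    message   text
) PARTITION BY RANGE (logged_at);

CREATE TABLE session_log_2024_01 PARTITION OF session_log
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE session_log_2024_02 PARTITION OF session_log
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Retention: once a month is no longer needed, detach it
-- (to archive it elsewhere) and drop it, instead of deleting row by row.
ALTER TABLE session_log DETACH PARTITION session_log_2024_01;
DROP TABLE session_log_2024_01;
```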
19. 8. Turn autovacuum off
• It is quite an auxiliary process, you can easily stop it
• There is no problem at all in having 100GB of data in a database
which is 1TB in size
• 2-3TB RAM servers are cheap, IO is the fastest thing in modern
computing
• Besides, everyone likes Big Data
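A quick way to see what autovacuum has (or has not) been doing is the standard view `pg_stat_user_tables`. A minimal sketch; the tuple counts are estimates and the limit of ten rows is arbitrary.

```sql
-- Tables with the most dead tuples and the time autovacuum last touched them.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A table where `n_dead_tup` dwarfs `n_live_tup` and `last_autovacuum` is empty is a likely candidate for the 100GB-of-data-in-1TB situation from the slide.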
21. 9. Reinvent Slony
• If you need some data replication to another database, try to
implement it from scratch
• That allows you to run into all sorts of problems PostgreSQL
has had since Slony was introduced
23. 10. Keep master and slave on different hardware
• That will maximize the possibility of unsuccessful failover
• To make things even worse, you can change only slave-related
parameters on the slave, leaving defaults for shared_buffers etc.
25. 11. Put a synchronous replica in a remote DC
• Indeed! That will maximize availability!
• Especially if you put the replica on another continent
26. 12. Never use Foreign Keys
• Consistency control at application level always works as
expected
• You will never get data inconsistency without constraints
• Even if you already have a bulletproof framework to maintain
consistency, could it be a good enough reason to use it?
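What the constraint buys you, as a minimal sketch. The `genres` table name is borrowed from a later slide; `tracks` and all columns are hypothetical.

```sql
-- Hypothetical parent and child tables.
CREATE TABLE genres (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE tracks (
    id       bigserial PRIMARY KEY,
    genre_id integer NOT NULL REFERENCES genres (id)
);

INSERT INTO genres (id, name) VALUES (155, 'jazz');

-- Accepted: genre 155 exists.
INSERT INTO tracks (genre_id) VALUES (155);

-- Rejected with a foreign key violation: genre 999 does not exist,
-- no matter which application or script issued the statement.
INSERT INTO tracks (genre_id) VALUES (999);
```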
27. 13. Always use text type for all columns
• It is always fun to reimplement date or IP validation in your
code
• You will never mistakenly convert "12-31-2015 03:01AM" to
"15:01 12 of undef 2015" using text fields
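A sketch of the opposite approach, where the column types do the validation. The table `visits` and its columns are hypothetical.

```sql
-- Hypothetical table using native types instead of text.
CREATE TABLE visits (
    id         bigserial PRIMARY KEY,
    visited_at timestamptz NOT NULL,
    client_ip  inet NOT NULL
);

-- Accepted: both values parse as their declared types.
INSERT INTO visits (visited_at, client_ip)
VALUES ('2015-12-31 03:01', '192.0.2.10');

-- Rejected by the inet input function: the string is not an IP address.
INSERT INTO visits (visited_at, client_ip)
VALUES ('2015-12-31 03:01', 'not an ip');
```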
28. 14. Always use improved "PostgreSQL"
• Postgres is not a perfect database and you are smart
• All that annoying MVCC stuff, 32-bit xid and autovacuum
nightmares look the way they do because hackers are old-school
and lazy
• Hack it the hard way, do not bother submitting your patch to
the community, just put it into production
• It is easy to maintain such production and keep it compatible
with upcoming versions of "not perfect" PostgreSQL
31. 15. Postgres likes long transactions
• Always call external services from stored procedures (like
sending emails)
• Oh, it is arguable... It could be, if 100% of developers were
familiar with the word timeout
• Anyway, you can just start a transaction and go away for a
weekend
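Forgotten transactions are easy to spot in `pg_stat_activity`. A sketch; the timeout values are arbitrary examples, and `idle_in_transaction_session_timeout` assumes PostgreSQL 9.6 or later.

```sql
-- Oldest open transactions first.
SELECT pid,
       usename,
       state,
       now() - xact_start AS xact_age,
       left(query, 60)    AS query_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- Session-level guards: cancel statements that run too long and
-- terminate sessions that sit idle inside an open transaction.
SET statement_timeout = '30s';
SET idle_in_transaction_session_timeout = '5min';
```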
32. 16. Never read your code, write it!
genre_id IN
( SELECT id FROM genres WHERE genres.id IN
(SELECT * FROM unnest(array[155]))
)
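Read back, the fragment above appears to reduce to a single comparison. A sketch under assumptions: `tracks` is a hypothetical table holding an integer `genre_id` column, and every `genre_id` in use exists in `genres` (for example through a foreign key). Without that guarantee the original additionally checks that genre 155 exists.

```sql
-- Assumed equivalent of the nested IN / unnest construction,
-- valid when genre_id always references an existing genres.id.
SELECT *
FROM tracks
WHERE genre_id = 155;
```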
35. 17. Have problems with your PostgreSQL installation?
• Move those problems to the container!
• It is always good to have something very stable inside
something very unstable!
• Now your problems are both inside and outside!
38. 17. Not only Slony should be reinvented!
• Need to convert a timestamp? A stored procedure in C will help!
• Need a message queue? Write it!
• Won’t use an ORM? Write your own in PL/pgSQL
41. 18. And never, never use exceptions
• Documentation says they are slow
• Raise notice on errors - everyone reads logs constantly!
• Who cares about errors?
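For comparison, a minimal PL/pgSQL sketch of an error that is caught and raised again as an exception instead of being downgraded to a notice. The message text is made up; the division by zero only stands in for a real failure.

```sql
DO $$
BEGIN
    BEGIN
        -- Stand-in for real work that fails.
        PERFORM 1 / 0;
    EXCEPTION WHEN division_by_zero THEN
        -- Re-raise with context: the caller sees an error and the
        -- surrounding transaction is aborted, rather than a log line
        -- that nobody reads.
        RAISE EXCEPTION 'calculation failed: %', SQLERRM;
    END;
END;
$$;
```

Running the block ends with an error by design; that is the behaviour the slide's advice would hide.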
45. 19. Application runs out of connections?
• Set max_connections to 1000
• Come on, servers with 1000 CPUs are cheap now
• Who said PostgreSQL workers have some overhead?
• And never ever use pgbouncer!
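Before raising the limit it may be worth checking what the existing connections are doing. A small sketch against `pg_stat_activity`:

```sql
-- How many connections are in each state (active, idle,
-- idle in transaction, ...).
SELECT state, count(*) AS connections
FROM pg_stat_activity
GROUP BY state
ORDER BY connections DESC;

-- Current connection limit, for comparison.
SHOW max_connections;
```

If most connections turn out to be idle, a pooler in front of the database is usually a better answer than a bigger `max_connections`.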
49. 20. Use pgpool-II instead
• Pooling connections with pgpool is easy...
• Just like writing code in that Emacs OS...
• Simple config, useful features
• Consultants are happy!
52. 21. Always start tuning PostgreSQL...
• From optimizer options in postgresql.conf
• Forget about those shared_buffers and checkpoints!
• geqo options are a good candidate to be a silver bullet!
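A sketch of where one might look first instead: the current values of the memory and checkpoint settings the slide tells you to forget, read from the standard `pg_settings` view. The list of parameter names is an illustrative selection, not a complete tuning checklist.

```sql
-- Current values and where they were set (default, configuration file, ...).
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers',
               'checkpoint_timeout',
               'checkpoint_completion_target',
               'geqo')
ORDER BY name;
```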
56. 22. Have heard about a cool new feature?
• Use it in production immediately!
• Attend MVCC Unmasked by Bruce Momjian (Today, 15:50,
Liberty I)
• Learn about xmin and xmax
• Use them in your application logic!
57. Do not forget
That was a WORST practices talk
58. Send me your favourite!
ik@dataegret.com