The PostgreSQL Query Planner

1. The PostgreSQL Query Planner. Robert Haas, PostgreSQL East 2010.

3. In other words, a SQL query is not a program.

4. No control flow statements (e.g. for, while) and no way to control order of operations.

5. SQL describes results, not process.

7. Maybe you gave the planner bad information, or

8. Maybe the query planner really did goof.

11. Prefer sequential I/O to random I/O.

13. Deliver correct results.

15. Join strategy: nested loop, merge join, hash join.

18. Index scan reads index and table in alternation.

19. Bitmap index scan reads index first, populating bitmap, and then reads table in sequential order.

21. Doesn't require reading the index, which has both I/O and CPU cost.

22. Best way to access very small tables.

23. Usually the best way to access all or nearly all the rows in a table.

25. Only table access method that can return rows in sorted order – very useful in combination with LIMIT.

26. Random I/O against base table!

28. Table I/O is sequential, with skips; results in physical order.

29. Can efficiently combine data from multiple indices – TID bitmap can handle boolean AND and OR operations.

30. Handles LIMIT poorly.
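The bitmap behaviour described above can be sketched in a few lines of Python. This is only a toy model: the dictionaries standing in for indexes, the `(page, slot)` tuple IDs, and both function names are assumptions made for illustration, and a real TID bitmap is far more compact than a Python set.

```python
from typing import Dict, List, Set, Tuple

TID = Tuple[int, int]  # hypothetical tuple ID: (heap page number, slot in page)


def bitmap_index_scan(index: Dict[int, Set[TID]], key: int) -> Set[TID]:
    """Read the index only and return the matching TIDs as a 'bitmap'."""
    return set(index.get(key, set()))


def bitmap_heap_scan(bitmap: Set[TID]) -> List[TID]:
    """Visit the table in physical order: sorted by page, then by slot."""
    return sorted(bitmap)


# Two toy indexes on columns a and b (made-up data).
index_on_a = {1: {(7, 2), (3, 1), (9, 4)}, 2: {(3, 5)}}
index_on_b = {10: {(3, 1), (9, 4), (5, 0)}}

bits_a = bitmap_index_scan(index_on_a, 1)
bits_b = bitmap_index_scan(index_on_b, 10)

and_order = bitmap_heap_scan(bits_a & bits_b)  # WHERE a = 1 AND b = 10
or_order = bitmap_heap_scan(bits_a | bits_b)   # WHERE a = 1 OR b = 10

print(and_order)
print(or_order)
```

Because the whole bitmap is built before any table page is read, the first row cannot be returned early, which is one way to see why this scan type handles LIMIT poorly.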

32. Number of possible join plans grows exponentially with the number of tables.

33. When search space is small, planner does a nearly exhaustive search.

34. When search space is too large, planner uses heuristics or GEQO to limit planning time and memory usage.
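To get a feel for how quickly the search space grows, the sketch below counts join orders only. It ignores the choice of join strategy and scan method, and it ignores any pruning the planner does, so treat the numbers as a rough illustration rather than a description of what PostgreSQL actually enumerates. The function names are made up for this example.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Orderings of n tables when each join adds one table to the result so far."""
    return factorial(n)


def bushy_orders(n: int) -> int:
    """Join trees of any shape over n tables, with outer and inner sides
    distinguished: n! * Catalan(n - 1) = (2n - 2)! / (n - 1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (2, 4, 8, 12):
    print(n, left_deep_orders(n), bushy_orders(n))
```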

36. Nested loop with inner index-scan.

37. Merge join.

38. Hash join.

39. Each join strategy takes an “outer” relation and an “inner” relation and produces a result relation.

41. Cost is roughly proportional to product of table sizes – bad if BOTH are large.

42. Nested Loop Example #1

    SELECT * FROM foo, bar WHERE foo.x = bar.x

    Nested Loop
      Join Filter: (foo.x = bar.x)
      -> Seq Scan on bar
      -> Materialize
           -> Seq Scan on foo

    This might be very slow!

43. Nested Loop Example #2

    SELECT * FROM foo, bar WHERE foo.x = bar.x

    Nested Loop
      -> Seq Scan on foo
      -> Index Scan using bar_pkey on bar
           Index Cond: (bar.x = foo.x)

    Nested loop with inner index-scan! Much better... though probably still not the best plan.
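The difference between the two nested loop examples can be sketched in Python. The row contents, table sizes, and the dictionary used as a stand-in for `bar_pkey` are all assumptions for illustration. The point is only that the plain version compares every pair of rows, while the indexed version does one lookup per outer row.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def nested_loop_join(outer: List[Row], inner: List[Row]) -> Tuple[List[Tuple[Row, Row]], int]:
    """Plain nested loop: rescan the whole inner relation for every outer row."""
    comparisons = 0
    result = []
    for o in outer:
        for i in inner:
            comparisons += 1
            if o["x"] == i["x"]:
                result.append((o, i))
    return result, comparisons


def nested_loop_index_join(
    outer: List[Row], inner_index: Dict[int, List[Row]]
) -> Tuple[List[Tuple[Row, Row]], int]:
    """Nested loop with an index lookup on the inner side for each outer row."""
    probes = 0
    result = []
    for o in outer:
        probes += 1
        for i in inner_index.get(o["x"], []):
            result.append((o, i))
    return result, probes


# Made-up data: 200 rows in foo, 100 rows in bar with unique x.
foo = [{"x": n % 50} for n in range(200)]
bar = [{"x": n} for n in range(100)]

# A dict keyed on x plays the role of the index on bar.
bar_index: Dict[int, List[Row]] = {}
for row in bar:
    bar_index.setdefault(row["x"], []).append(row)

plain, comparisons = nested_loop_join(foo, bar)
indexed, probes = nested_loop_index_join(foo, bar_index)
print(len(plain), comparisons, probes)
```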

45. Put both input relations into sorted order (using sort or index scan) and scan through the two in parallel, matching up equal values.

47. Merge Join Example

    SELECT * FROM foo, bar WHERE foo.x = bar.x

    Merge Join
      Merge Cond: (foo.x = bar.x)
      -> Sort
           Sort Key: foo.x
           -> Seq Scan on foo
      -> Materialize
           -> Sort
                Sort Key: bar.x
                -> Seq Scan on bar
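A minimal merge join sketch in Python, assuming the join keys are plain integers and both inputs fit in memory. It sorts both sides, walks them in parallel, and pairs up each run of equal keys. The function name and sample data are made up for this example.

```python
from typing import List, Tuple


def merge_join(left: List[int], right: List[int]) -> List[Tuple[int, int]]:
    """Sort both inputs, then advance two cursors, matching equal keys."""
    left = sorted(left)
    right = sorted(right)
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            key = left[i]
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end] == key:
                j_end += 1
            # Every left row in the run pairs with every right row in the run.
            for a in left[i:i_end]:
                for b in right[j:j_end]:
                    result.append((a, b))
            i, j = i_end, j_end
    return result


pairs = merge_join([3, 1, 2, 2], [2, 2, 4, 3])
print(pairs)
```

Note that the output comes back in key order, which is a side effect of the sorted inputs.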

49. Hash each row from the inner relation to create a hash table. Then, hash each row from the outer relation and probe the hash table for matches.

50. Very fast – but requires enough memory to store inner tuples. Can get around this using multiple “batches”.

51. Not guaranteed to retain input ordering.

52. Hash Join Example

    SELECT * FROM foo, bar WHERE foo.x = bar.x

    Hash Join
      Hash Cond: (foo.x = bar.x)
      -> Seq Scan on foo
      -> Hash
           -> Seq Scan on bar
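The build and probe phases can be sketched as follows, with bar as the inner (hashed) relation and foo as the outer relation, matching the plan above. This is a single-batch, in-memory illustration only. The extra columns `f` and `b` and the sample rows are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def hash_join(outer: List[Row], inner: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Build a hash table on the inner relation, then probe it with outer rows."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in inner:                      # build phase
        table[row[key]].append(row)
    result = []
    for row in outer:                      # probe phase
        for match in table.get(row[key], []):
            result.append((row, match))
    return result


# Made-up rows for illustration.
foo = [{"x": 1, "f": "a"}, {"x": 2, "f": "b"}, {"x": 2, "f": "c"}, {"x": 9, "f": "d"}]
bar = [{"x": 2, "b": "p"}, {"x": 1, "b": "q"}]

joined = hash_join(foo, bar, "x")
labels = [(o["f"], i["b"]) for o, i in joined]
print(labels)
```

In this sketch the result follows the outer scan order, but nothing sorts it by the join key.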

54. Consider this query:

55. SELECT p.id, p.name FROM projects p LEFT JOIN person pm ON p.project_manager_id = pm.id;

56. If there is a unique index on person (id), then the join need not be performed at all.

57. Common scenario when using views.
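One way to convince yourself that the join contributes nothing to the result is to run both forms of the query and compare. The sketch below uses Python's built-in `sqlite3` purely as a convenient SQL engine for checking results. It says nothing about how PostgreSQL plans the query. The column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        name TEXT,
        project_manager_id INTEGER
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    -- Includes a NULL manager and a manager id with no matching person.
    INSERT INTO projects VALUES
        (10, 'alpha', 1), (11, 'beta', 2), (12, 'gamma', NULL), (13, 'delta', 99);
    """
)

with_join = conn.execute(
    "SELECT p.id, p.name FROM projects p "
    "LEFT JOIN person pm ON p.project_manager_id = pm.id ORDER BY p.id"
).fetchall()

without_join = conn.execute(
    "SELECT p.id, p.name FROM projects p ORDER BY p.id"
).fetchall()

print(with_join == without_join)
```

The LEFT JOIN keeps every projects row, the uniqueness of person (id) means no row can be duplicated, and no column of pm is selected, so the two result sets match.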

59. SELECT * FROM foo JOIN baz ON foo.y = baz.y JOIN bar ON foo.x = bar.x

60. SELECT * FROM foo JOIN (bar JOIN baz ON true) ON foo.x = bar.x AND foo.y = baz.y

61. EXPLAIN Estimates

    Hash Join (cost=8.28..404.52 rows=9000 width=118)
      Hash Cond: (foo.x = bar.x)
      -> Hash Join (cost=3.02..275.52 rows=9000 width=12)
           Hash Cond: (foo.y = baz.y)
           -> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=8)
           -> Hash (cost=1.90..1.90 rows=90 width=4)
                -> Seq Scan on baz (cost=0.00..1.90 rows=90 width=4)
      -> Hash (cost=4.00..4.00 rows=100 width=106)
           -> Seq Scan on bar (cost=0.00..4.00 rows=100 width=106)

62. EXPLAIN ANALYZE

    Hash Join (cost=8.28..404.52 rows=9000 width=118) (actual time=0.743..51.582 rows=9000 loops=1)
      Hash Cond: (foo.x = bar.x)
      -> Hash Join (cost=3.02..275.52 rows=9000 width=12) (actual time=0.368..30.964 rows=9000 loops=1)
           Hash Cond: (foo.y = baz.y)
           -> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=8) (actual time=0.021..9.908 rows=10000 loops=1)
           -> Hash (cost=1.90..1.90 rows=90 width=4) (actual time=0.280..0.280 rows=90 loops=1)
                Buckets: 1024 Batches: 1 Memory Usage: 4kB
                -> Seq Scan on baz (cost=0.00..1.90 rows=90 width=4) (actual time=0.010..0.138 rows=90 loops=1)
      -> Hash (cost=4.00..4.00 rows=100 width=106) (actual time=0.354..0.354 rows=100 loops=1)
           Buckets: 1024 Batches: 1 Memory Usage: 14kB
           -> Seq Scan on bar (cost=0.00..4.00 rows=100 width=106) (actual time=0.007..0.167 rows=100 loops=1)
    Total runtime: 59.376 ms
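A common way to read EXPLAIN ANALYZE output is to compare the estimated row count of each node with the actual one. The sketch below does that for the plan text above with a regular expression. The regex and the function name are assumptions written for this particular text layout, not a general EXPLAIN parser, and every node here has loops=1.

```python
import re
from typing import List, Tuple

PLAN = """\
Hash Join (cost=8.28..404.52 rows=9000 width=118) (actual time=0.743..51.582 rows=9000 loops=1)
  Hash Cond: (foo.x = bar.x)
  -> Hash Join (cost=3.02..275.52 rows=9000 width=12) (actual time=0.368..30.964 rows=9000 loops=1)
       Hash Cond: (foo.y = baz.y)
       -> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=8) (actual time=0.021..9.908 rows=10000 loops=1)
       -> Hash (cost=1.90..1.90 rows=90 width=4) (actual time=0.280..0.280 rows=90 loops=1)
            Buckets: 1024 Batches: 1 Memory Usage: 4kB
            -> Seq Scan on baz (cost=0.00..1.90 rows=90 width=4) (actual time=0.010..0.138 rows=90 loops=1)
  -> Hash (cost=4.00..4.00 rows=100 width=106) (actual time=0.354..0.354 rows=100 loops=1)
       Buckets: 1024 Batches: 1 Memory Usage: 14kB
       -> Seq Scan on bar (cost=0.00..4.00 rows=100 width=106) (actual time=0.007..0.167 rows=100 loops=1)
Total runtime: 59.376 ms
"""

# Matches plan-node lines only; "Hash Cond", "Buckets" and "Total runtime" lines are skipped.
NODE = re.compile(
    r"^[\s>-]*(?P<node>[A-Za-z ]+?) "
    r"\(cost=\S+ rows=(?P<est>\d+) width=\d+\) "
    r"\(actual time=\S+ rows=(?P<act>\d+) loops=\d+\)"
)


def compare_estimates(plan: str) -> List[Tuple[str, int, int, float]]:
    """Return (node, estimated rows, actual rows, actual/estimated) per plan node."""
    rows = []
    for line in plan.splitlines():
        m = NODE.match(line)
        if m:
            est, act = int(m.group("est")), int(m.group("act"))
            rows.append((m.group("node"), est, act, act / est))
    return rows


report = compare_estimates(PLAN)
for node, est, act, ratio in report:
    print(f"{node:<16} est={est:>6} actual={act:>6} ratio={ratio:.2f}")
```

In this plan every ratio is 1.00, meaning the estimates were exactly right. A ratio far from 1 is the usual sign that the statistics for that node deserve a closer look.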

64. SELECT * FROM (foo LEFT JOIN baz ON foo.y = baz.y) JOIN bar ON foo.x = bar.x

67. Nested loop with inner index-scan

68. Merge join

69. Hash join

74. You must have good statistics or you will get bad plans!

76. SELECT * FROM foo WHERE (a + 0) = a

77. The planner has no way to estimate this condition, so it assumes 0.5% of rows will match.
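The size of the resulting error is easy to work out. The sketch below assumes a hypothetical 10,000-row table with no NULLs in column a, so that `(a + 0) = a` is true for every row. The constant name, the function name, and the clamp to at least one row are simplifications for illustration.

```python
DEFAULT_EQ_SELECTIVITY = 0.005  # the 0.5% guess mentioned on the slide


def estimated_rows(table_rows: int, selectivity: float = DEFAULT_EQ_SELECTIVITY) -> int:
    """Estimated matching rows: table size times selectivity, at least 1."""
    return max(1, round(table_rows * selectivity))


table_rows = 10_000                 # hypothetical table size
estimate = estimated_rows(table_rows)
actual = table_rows                 # assumption: a is never NULL, so every row matches
error_factor = actual / estimate

print(estimate, actual, error_factor)
```

An estimate that is off by a factor of 200 is more than enough to push the planner toward a plan that only makes sense for a handful of rows.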

79. If the planner overestimates the row count, it may choose a sequential scan instead of an index scan, or a merge or hash join instead of a nested loop.

80. Small values for LIMIT tilt the planner toward fast-start plans and magnify the effect of bad estimates.
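The interaction between LIMIT and bad estimates can be illustrated with a deliberately simplified cost model: a plan pays its startup cost, then a share of the remaining cost proportional to the fraction of rows it has to produce. The cost numbers, plan names, and the assumption that startup and total cost stay fixed when the row count changes are all made up for this sketch.

```python
from typing import Dict, Tuple


def cost_with_limit(startup: float, total: float, rows: float, limit: int) -> float:
    """Startup cost plus a proportional share of the run cost for `limit` rows."""
    if limit >= rows:
        return total
    return startup + (total - startup) * limit / rows


def cheapest(plans: Dict[str, Tuple[float, float]], rows: float, limit: int) -> str:
    """Name of the plan with the lowest cost under the given row count."""
    return min(plans, key=lambda name: cost_with_limit(*plans[name], rows, limit))


# Hypothetical (startup cost, total cost) pairs.
plans = {
    "sort then limit": (900.0, 1000.0),        # slow start, cheap overall
    "index scan then limit": (0.5, 5000.0),    # fast start, expensive overall
}

# The planner believes 10,000 rows match, so 10 of them should turn up quickly.
chosen_with_estimate = cheapest(plans, rows=10_000, limit=10)

# If only 10 rows really match, the scan must run to completion to find them.
best_in_reality = cheapest(plans, rows=10, limit=10)

print(chosen_with_estimate, "|", best_in_reality)
```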

82. default_statistics_target (default 10 before PostgreSQL 8.4, 100 from 8.4 on) – Level of detail for statistics gathering. Can also be overridden on a per-column basis.

83. enable_hashjoin, enable_sort, etc. - Just for testing.

84. work_mem – Amount of memory per sort or hash.

85. from_collapse_limit, join_collapse_limit, geqo_threshold – Sometimes need to be raised, but be careful!

87. PL/pgSQL loops.

    FOR x IN SELECT ... LOOP
        SELECT ...
    END LOOP

88. Repeated calls to SQL or PL/pgSQL functions.

    SELECT id, some_function(id) FROM table;

90. Machine-readable EXPLAIN output.

91. Hash statistics.

92. Better model for Materialize costs.

93. Improved use of indices to handle MIN(x), MAX(x), and x IS NOT NULL.
