An introductory presentation about table partitioning in PostgreSQL and how to integrate it into your Rails application. Given at the Cambridge Ruby User Group meetup on March 27th, 2014.
2. What is table partitioning in PostgreSQL?
1 logical table = n smaller physical tables
3. What are the benefits?
improved query performance for big tables *
* works best when results come from a single partition
4. When to consider partitioning?
A big table that cannot be queried efficiently because its index is enormous as well.
Rule of thumb: the table does not fit into memory
5. How is data split?
By ranges:
year > 2010 AND year <= 2012
By lists of key values:
company_id = 5
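The two strategies can be sketched in plain Ruby as a mapping from a row's key to a partition name. This is only an illustration: the partition names and helper methods below are hypothetical, and in PostgreSQL the mapping is expressed with CHECK constraints rather than application code.

```ruby
# Hypothetical partition layouts, for illustration only.
# Range partitioning: each partition covers lower < year <= upper.
RANGE_PARTITIONS = {
  "events_2009_2010" => [2008, 2010],
  "events_2011_2012" => [2010, 2012]
}.freeze

# List partitioning: each partition holds an explicit list of key values.
LIST_PARTITIONS = {
  "employees_company_5"   => [5],
  "employees_company_6_7" => [6, 7]
}.freeze

# Returns the name of the range partition for a year, or nil if none matches.
def partition_for_year(year)
  name, _bounds = RANGE_PARTITIONS.find { |_, (lower, upper)| year > lower && year <= upper }
  name
end

# Returns the name of the list partition for a company id, or nil if none matches.
def partition_for_company(company_id)
  name, _ids = LIST_PARTITIONS.find { |_, ids| ids.include?(company_id) }
  name
end
```

A key that matches no partition returns nil here; a real setup would need to decide whether such rows are rejected or stored somewhere else.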
6. How does it work?
At the heart of it lie 2 mechanisms:
➔ table inheritance
➔ table CHECK constraints
7. Table inheritance: schema
CREATE TABLE A (value INT);
CREATE TABLE B () INHERITS (A);
      Table "public.a"
 Column |  Type   | Modifiers
--------+---------+-----------
 value  | integer |

      Table "public.b"
 Column |  Type   | Modifiers
--------+---------+-----------
 value  | integer |
Inherits: a
8. Table inheritance: schema
CREATE TABLE C (extra_field TEXT) INHERITS (A);
        Table "public.c"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 value       | integer |
 extra_field | text    |
Inherits: a
NB: a table may inherit from multiple tables
9. Table inheritance: querying
INSERT INTO A VALUES (0);
INSERT INTO B VALUES (10);
INSERT INTO C VALUES (20, 'zonk');
SELECT * FROM A WHERE value < 20;
value
-------
0
10
SELECT * FROM ONLY A WHERE value < 20;
value
-------
0
10. What happened?
EXPLAIN ANALYZE SELECT * FROM A WHERE value < 20;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Append  (cost=0.00..69.38 rows=1290 width=4) (actual time=0.020..0.033 rows=2 loops=1)
   ->  Seq Scan on a  (cost=0.00..4.00 rows=80 width=4) (actual time=0.019..0.021 rows=1 loops=1)
         Filter: (value < 20)
   ->  Seq Scan on b  (cost=0.00..40.00 rows=800 width=4) (actual time=0.005..0.006 rows=1 loops=1)
         Filter: (value < 20)
   ->  Seq Scan on c  (cost=0.00..25.38 rows=410 width=4) (actual time=0.005..0.005 rows=0 loops=1)
         Filter: (value < 20)
         Rows Removed by Filter: 1
11. Table CHECK constraint
CREATE TABLE A (value INT);
CREATE TABLE B (CHECK (value < 20)) INHERITS (A);
CREATE TABLE C (CHECK (value >= 20)) INHERITS (A);
INSERT INTO B VALUES (10);
INSERT INTO C VALUES (20);
INSERT INTO C VALUES(5);
ERROR: new row for relation "c" violates check constraint "c_value_check"
DETAIL: Failing row contains (5).
12. Ta da!
EXPLAIN ANALYZE SELECT * FROM A WHERE value < 20;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Append  (cost=0.00..40.00 rows=801 width=4) (actual time=0.015..0.017 rows=1 loops=1)
   ->  Seq Scan on a  (cost=0.00..0.00 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)
         Filter: (value < 20)
   ->  Seq Scan on b  (cost=0.00..40.00 rows=800 width=4) (actual time=0.012..0.013 rows=1 loops=1)
         Filter: (value < 20)
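Table c has disappeared from the plan because its CHECK constraint (value >= 20) can never be satisfied together with the WHERE clause (value < 20). The Ruby below is a toy model of that idea for simple integer bounds only. It is not PostgreSQL's actual algorithm, and the data structure and method names are made up for illustration.

```ruby
# Toy model: each table's CHECK constraint as inclusive integer bounds.
# A missing :min or :max means "unbounded" on that side.
CHECKS = {
  "a" => {},            # parent table: no CHECK constraint
  "b" => { max: 19 },   # CHECK (value < 20)
  "c" => { min: 20 }    # CHECK (value >= 20)
}.freeze

# True if some integer can satisfy both the CHECK bounds and the query bounds.
def overlaps?(check, query)
  lo = [check[:min], query[:min]].compact.max
  hi = [check[:max], query[:max]].compact.min
  lo.nil? || hi.nil? || lo <= hi
end

# Returns the tables that still have to be scanned for the given query bounds.
def tables_to_scan(query)
  CHECKS.select { |_table, check| overlaps?(check, query) }.keys
end

p tables_to_scan(max: 19)  # models WHERE value < 20
p tables_to_scan(min: 20)  # models WHERE value >= 20
```

The parent table a is always scanned in this model because it carries no constraint, which matches the plan above.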
14. Gotcha #2
All check constraints and not-null constraints on a parent table are automatically inherited by its children.
Other types of constraints (unique, primary key, and foreign key constraints) are not inherited.
15. Solution #2: create constraints or indexes in all partitions
CREATE TABLE A (id serial NOT NULL, value INT NOT NULL);
CREATE TABLE B (
  CONSTRAINT B_pkey PRIMARY KEY (id),
  CHECK (value < 20)
) INHERITS (A);
CREATE TABLE C (
  CONSTRAINT C_pkey PRIMARY KEY (id),
  CHECK (value >= 20)
) INHERITS (A);
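Because every partition needs its own primary key (and any other indexes), this DDL becomes repetitive as the number of partitions grows. One option is to generate it. The following is a minimal Ruby sketch, assuming the parent table a already exists; the helper name partition_ddl is hypothetical, and the generated strings would still have to be executed, for example with execute in a Rails migration.

```ruby
# Builds the CREATE TABLE statement for one child partition of `parent`.
# `name` and `check` are interpolated as-is, so pass trusted values only.
def partition_ddl(parent, name, check)
  <<~SQL
    CREATE TABLE #{name} (
      CONSTRAINT #{name}_pkey PRIMARY KEY (id),
      CHECK (#{check})
    ) INHERITS (#{parent});
  SQL
end

puts partition_ddl("a", "b", "value < 20")
puts partition_ddl("a", "c", "value >= 20")
```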
16. Gotcha #3
There is no practical way to enforce uniqueness of a SERIAL id across partitions.
INSERT INTO B (value) VALUES (10);
INSERT INTO C (value) VALUES (20);
INSERT INTO C VALUES (1, 30);
SELECT * FROM A;
id | value
----+-------
1 | 10
2 | 20
1 | 30
17. Solution #3
➔ carry on and always filter by partitioning key
➔ use UUID (uuid-ossp)
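The UUID option does not enforce uniqueness across partitions either, but randomly generated identifiers need no shared counter, so a collision is extremely unlikely. The slide refers to the uuid-ossp extension, which generates UUIDs inside the database. As a stand-in that runs without a database, this sketch uses Ruby's standard-library SecureRandom.uuid to generate them on the application side instead.

```ruby
require "securerandom"

# Generate ids independently, as if for rows landing in different partitions.
ids = Array.new(1_000) { SecureRandom.uuid }

puts ids.first      # one random (version 4) UUID
puts ids.uniq.size  # number of distinct ids among those generated
```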
18. Gotcha #4
Specifying that another table's column REFERENCES a(value) would allow the
other table to contain “a values”, but not “b or c values”. There is no good
workaround for this case.
19. How to insert / update rows?
➔ PostgreSQL docs recommend using triggers
➔ rules are also possible: more overhead than a trigger, but paid once per query rather than once per row, so they can suit bulk inserts
20. Trigger example
CREATE OR REPLACE FUNCTION a_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
  IF NEW.value < 20 THEN
    INSERT INTO b VALUES (NEW.*);
  ELSE
    INSERT INTO c VALUES (NEW.*);
  END IF;
  RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_a_trigger
  BEFORE INSERT ON a
  FOR EACH ROW EXECUTE PROCEDURE a_insert_trigger();
21. Partitioned gem
https://github.com/fiksu/partitioned
class Company < ActiveRecord::Base; end

class ByCompanyId < Partitioned::ByForeignKey
  self.abstract_class = true

  belongs_to :company

  def self.partition_foreign_key
    :company_id
  end

  partitioned do |partition|
    partition.index :id, :unique => true
  end
end

class Employee < ByCompanyId; end

employee = Employee.from_partition(1).find(1)
22. UUID in Rails
Rails 4:
enable_extension 'uuid-ossp'
create_table :documents, id: :uuid do |t|
  t.string :title
  t.string :author
  t.timestamps
end
Rails 3:
use the postgres_ext gem: https://github.com/dockyard/postgres_ext
23. Actual performance improvements?
➔ table with ~13 million rows
➔ partitioned by date into 8 partitions
➔ complex, long-running query
➔ baseline: 0.07 tps
➔ when results come from a single partition: 0.15 tps (roughly 2x baseline)
➔ when results span 2 partitions: about the same as baseline