An overview of Postgres-XC is provided. Postgres-XC is a free, open source, PostgreSQL-based, write-scalable cluster. It runs on multiple servers and is fully ACID-compliant and consistent. Postgres-XC is a traditional relational alternative for cases where NoSQL solutions are being considered.
This presentation was given in San Francisco on August 7th, 2012 by Mason Sharp, one of the original architects of Postgres-XC and co-founder of StormDB (http://www.stormdb.com).
3. Who am I
● Mason Sharp
● Co-organizer of NYC PUG
● Co-founder of StormDB
● Previously worked at EnterpriseDB
● Original architect of Stado (GridSQL)
● One of the original architects of Postgres-XC
Aug 7, 2012 Postgres-XC 3
4. PostgreSQL User Groups
● San Francisco: 616 Members
● New York: 502 Members
● New: Philadelphia, Los Angeles
● Tokyo: 2000? Members
10. Data Tier Scaling
● Up versus Out
● More memory, more cores
● Read-only Replicated Slaves
● Caching
● Memcached
● Sharding
● NoSQL
● NewSQL
11. XC Origins
● Koichi Suzuki, NTT Data
● Mason Sharp
12. PostgreSQL-Related Clustering Projects
● pgpool-II
● Read replicated slaves
● PL/Proxy
● Used by Skype, meetme (myYearbook)
● All access is over a stored function
● Postgres-R, PostgresForest
● Stado (GridSQL)
● Parallel Query
● Not write-scalable
Can we make it write scalable?
14. Overview
● PostgreSQL-based database cluster
● Same API to Apps as PostgreSQL
– Same drivers
● Currently based upon PG 9.1. Soon: 9.2.
● Symmetric Multi-headed Cluster
● No master, no slave
– Not just PostgreSQL replication.
– Application can read/write to any coordinator server
● Consistent database view for all transactions
– Full ACID properties for all transactions in the cluster
● Scales both for Write and Read
15. Postgres-XC Cluster
● Application can connect to any server to have the same database view and service.
● Add PG-XC servers as needed
[Diagram: several PG-XC servers, each containing a Coordinator and a Data Node, communicating with one another and with the Global Transaction Manager (GTM).]
18. Is XC right for you?
● I need write scalability
● I like ACID
● I like SQL
● I don't want to rewrite my existing SQL applications
● I want to leverage the PostgreSQL community for all of their contrib modules
19. Why XC may not be right for you
● I need MPP parallel query capability
● Parallel query in XC is limited
● Try Stado: www.stado.us
● I need a solution with built-in HA
● I need massive scale and have loose consistency requirements
● I would rather use a NoSQL solution so I can put it on my resume
22. Coordinator Overview
● Based on PostgreSQL 9.1 (9.2 soon)
● Accepts connections from clients
● Parses and plans requests
● Interacts with Global Transaction Manager
● Uses pooler for Data Node connections
● Sends down XIDs and snapshots to Data Nodes
● Collects results and returns to client
● Uses two phase commit if necessary
23. Data Node Overview
● Based on PostgreSQL 9.1 (9.2 soon)
● Where user-created data is actually stored
● Coordinators (not clients) connect to Data Nodes
● Accepts XIDs and snapshots from the Coordinator
● The rest is fairly similar to vanilla PostgreSQL
24. Global Transaction Manager
[Diagram: the GTM supplies the cluster nodes with XIDs, snapshots, timestamps, and sequence values.]
25. Summary
● Coordinator
● Visible to apps
● SQL analysis, planning, execution
● Connection pooling
● Datanode (or simply “NODE”)
● Actual database store
● Local SQL execution
Coordinator and Datanode: Postgres-XC core, based upon vanilla PostgreSQL. They share the same binary. May want to colocate.
● GTM (Global Transaction Manager)
● Provides consistent database view to transactions
– GXID (Global Transaction ID)
– Snapshot (List of active transactions)
– Other global values such as SEQUENCE
● GTM Proxy: consolidates transaction requests from the local server for performance
GTM and GTM Proxy: different binaries.
26. Data Distribution
Distribution Strategies
27. Distributing the data
● Replicated table
● Each row in the table is replicated to the datanodes
● Statement based replication
● Distributed table
● Each row of the table is stored on one datanode, decided by one of the following strategies
– Hash
– Round Robin
– Modulo
– Range and user defined function (future)
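The placement strategies above can be sketched roughly as follows. This is an illustrative Python model, not Postgres-XC code: the function names, the four hypothetical datanode names, and the use of CRC32 as the hash function are all assumptions made for the example.

```python
import itertools
import zlib

NODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical datanode names


def place_by_hash(key, nodes=NODES):
    """Hash strategy: hash the distribution column value, then pick a node."""
    # CRC32 is only a stand-in for whatever hash the real system uses.
    digest = zlib.crc32(str(key).encode("utf-8"))
    return nodes[digest % len(nodes)]


def place_by_modulo(key, nodes=NODES):
    """Modulo strategy: integer column value modulo the number of nodes."""
    return nodes[key % len(nodes)]


def make_round_robin(nodes=NODES):
    """Round robin strategy: ignore the row contents and cycle through nodes."""
    cycle = itertools.cycle(nodes)
    return lambda: next(cycle)


def place_replicated(nodes=NODES):
    """Replicated table: every datanode receives a copy of the row."""
    return list(nodes)
```

With hash and modulo, the same key always lands on the same node, which is what lets a lookup by distribution column go to a single datanode. Round robin spreads rows evenly but gives no such guarantee.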
28. Table Distribution and Replication
● Each table can be distributed or replicated
● Strategy based on usage
– Transaction tables → Distributed
– Static lookup tables → Replicated
– Distribute parent-children together
● Join pushdown when possible
● Where clause pushdown
● Simple parallel aggregates
29. Defining Tables
● Table Distribution/Replication
● CREATE TABLE tab (…) DISTRIBUTE BY HASH(col) | MODULO(col) | ROUND ROBIN | REPLICATION
30. Replicated Tables
[Diagram: a read is served by a single datanode; a write is applied to every datanode. Each datanode holds the same rows (val, val2): (1, 2), (2, 10), (3, 4).]
31. Distributed Tables
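[Diagram: a write goes to one datanode; a read goes to all datanodes and the results are merged by a combiner. Each datanode holds a different subset of the rows (val, val2): (1, 2), (2, 10), (3, 4) on one node; (11, 21), (21, 101), (31, 41) on another; (10, 20), (20, 100), (30, 40) on a third.]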
32. Join Pushdown
Pushdown rules by the distribution of the two joined tables:
● Hash/Modulo distributed with Hash/Modulo distributed
– Inner join with equality condition on the distribution column with same data type and same distribution strategy
● Hash/Modulo distributed with Round Robin
– No
● Hash/Modulo distributed with Replicated
– Inner join if replicated table's distribution list is superset of distributed table's distribution list
● Round Robin with Round Robin
– No
● Round Robin with Replicated
– Inner join if replicated table's distribution list is superset of distributed table's distribution list
● Replicated with Replicated
– All kinds of joins
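The pushdown rules on this slide can be read as a small decision function. The sketch below is only an encoding of the slide's table in Python, not the Postgres-XC planner: the `Table` type, its field names, and the reading of "distribution list" as the set of datanodes holding the table are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    strategy: str         # "hash", "modulo", "roundrobin" or "replicated"
    nodes: frozenset      # assumed meaning of "distribution list"
    dist_type: str = ""   # data type of the distribution column, if any


def can_push_down(left, right, join_type="inner", equi_on_dist_cols=False):
    """Return True if the slide's table allows pushing the join to datanodes."""
    left_rep = left.strategy == "replicated"
    right_rep = right.strategy == "replicated"
    if left_rep and right_rep:
        return True                      # replicated x replicated: all joins
    if join_type != "inner":
        return False                     # every other cell is inner-join only
    if left_rep or right_rep:
        rep, dist = (left, right) if left_rep else (right, left)
        return rep.nodes >= dist.nodes   # replicated list must be a superset
    if "roundrobin" in (left.strategy, right.strategy):
        return False                     # round robin x distributed: no
    # hash/modulo x hash/modulo: equality on the distribution columns,
    # same data type, same distribution strategy
    return (equi_on_dist_cols
            and left.strategy == right.strategy
            and left.dist_type == right.dist_type)
```

For example, a hash-distributed fact table joined to a dimension table replicated on the same nodes qualifies, which is the star-schema case mentioned later in the deck.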
33. Constraints
● XC does not support global constraints, i.e. constraints across datanodes
● Constraints within a datanode are supported
Support by distribution strategy:
● Replicated
– Unique, primary key constraints: Supported
– Foreign key constraints: Supported if the referenced table is also replicated on the same nodes
● Hash/Modulo distributed
– Unique, primary key constraints: Supported if primary OR unique key is distribution key
– Foreign key constraints: Supported if the referenced table is replicated on same nodes OR it's distributed by primary key in the same manner and same nodes
● Round Robin
– Unique, primary key constraints: Not supported
– Foreign key constraints: Supported if the referenced table is replicated on same nodes
35. Transaction Management
Why MVCC is Important for Consistency
Global Transaction Manager
36. Multi-version Concurrency Control (MVCC) (quick overview)
● Readers do not block writers
● Writers do not block readers
● Transaction IDs (XIDs)
● Every transaction gets an ID
● Snapshots contain a list of running XIDs
37. Multi-version Concurrency Control (MVCC) (quickly discussed)
Example:
T1 Begin...
T2 Begin; INSERT...; Commit
T3 Begin...
T4 Begin; SELECT
● T4's snapshot contains T1 and T3 (still running)
● T2 has already committed
● T4 can see T2's committed changes, but not T1's or T3's
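The visibility rule in this example can be sketched as a small check. This is a deliberately simplified Python model and not PostgreSQL's actual implementation: the `Snapshot` fields and the `is_visible` helper are assumptions, and the sketch ignores a transaction's own writes, deletes, and subtransactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmax: int            # first XID not yet handed out when the snapshot was taken
    running: frozenset   # XIDs that were still in progress at that moment


def is_visible(row_xid, snapshot, committed):
    """Return True if a row written by row_xid is visible under the snapshot."""
    if row_xid >= snapshot.xmax:
        return False                 # writer started after the snapshot
    if row_xid in snapshot.running:
        return False                 # writer was still running
    return row_xid in committed      # writer must have committed, not aborted


# The slide's example: T1 and T3 are open, T2 has committed, T4 takes a snapshot.
committed = {2}
t4_snapshot = Snapshot(xmax=5, running=frozenset({1, 3}))
```

Under this model `is_visible(2, t4_snapshot, committed)` is true, while rows written by T1 or T3 stay invisible to T4 even if those transactions commit later.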
38. Multi-version Concurrency Control (MVCC) on 2 Independent Nodes
Example:
T1 Begin...
T2 Begin; INSERT..; Commit;
T3 Begin...
T4 Begin; SELECT
● Node 1: T2 commits, then T4 runs its SELECT
● Node 2: T4 runs its SELECT, then T2 commits
● T4's SELECT statement returns inconsistent data
● Includes T2's data from Node 1, but not from Node 2
● The C in ACID fails
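The anomaly above can be reproduced with a toy model. The Python sketch below is only an illustration under stated assumptions: rows are stored as (writer XID, value) pairs, and `read_node` is a hypothetical helper rather than anything from Postgres-XC.

```python
def read_node(rows, visible_xids):
    """Return the values whose writing XID the reader is allowed to see."""
    return [value for xid, value in rows if xid in visible_xids]


# T2 (XID 2) inserted one row on each node.
node1_rows = [(2, "a")]
node2_rows = [(2, "b")]

# Independent nodes: each node decides visibility from its own local state.
# Node 1 has already processed T2's commit when T4 reads; node 2 has not.
node1_local_committed = {2}
node2_local_committed = set()
independent = (read_node(node1_rows, node1_local_committed)
               + read_node(node2_rows, node2_local_committed))

# One global snapshot: T4 gets a single visibility decision and uses it on
# both nodes. Here it was taken before T2 committed, so T2 is hidden everywhere.
global_visible = set()
with_global_snapshot = (read_node(node1_rows, global_visible)
                        + read_node(node2_rows, global_visible))
```

With independent nodes T4 sees half of T2's insert (`["a"]`); with one shared snapshot it sees either all of T2's rows or none of them.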
39. Global Transaction Manager (GTM)
● Provides Global Transaction Consistency
[Diagram: the GTM supplies the cluster nodes with XIDs, snapshots, timestamps, and sequence values.]
40. Transaction Management
● 2PC is used to guarantee transactional consistency across nodes
● When more than one node is involved, OR
● When there are explicit 2PC transactions
● Only the nodes where write activity has happened participate in 2PC
● In PostgreSQL, 2PC cannot be applied if temporary tables are involved. The same restriction applies in Postgres-XC
● When a single coordinator command needs multiple datanode commands, they are wrapped in a transaction block
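The commit path described above can be sketched as follows. This is a minimal Python model under stated assumptions, not Postgres-XC code: `Participant` and `commit_transaction` are hypothetical names, and treating "more than one node involved" as "more than one node with writes" is an interpretation of the slide.

```python
class Participant:
    """Hypothetical stand-in for a datanode taking part in a transaction."""

    def __init__(self, name, wrote, can_prepare=True):
        self.name = name
        self.wrote = wrote              # did this node see write activity?
        self.can_prepare = can_prepare  # simulates a failure in phase 1
        self.state = "active"

    def prepare(self):
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def commit_transaction(nodes):
    """Commit on all nodes, using two-phase commit only where it is needed."""
    writers = [n for n in nodes if n.wrote]
    if len(writers) <= 1:
        for n in nodes:                 # at most one writer: plain commit
            n.commit()
        return "commit"
    # Phase 1: only nodes with write activity are asked to prepare.
    if all(n.prepare() for n in writers):
        for n in nodes:                 # Phase 2: everyone commits
            n.commit()
        return "2pc-commit"
    for n in nodes:                     # a prepare failed: undo everywhere
        n.rollback()
    return "abort"
```

Skipping the prepare phase for read-only nodes keeps the extra round trip off the common path where a transaction writes to a single datanode.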
42. Can GTM be a Performance Bottleneck?
• Depending on implementation
– Current implementation: applicable up to five PG-XC servers (DBT-1)
[Diagram: current implementation. Labels: Coordinators, Coordinator Backend, Client Library, Internet Domain Socket, GTM Threads, Snapshot Data (Lock), GTM Main Thread (Create, Terminate).]
– Large snapshot size and number
– Too many interactions between GTM and Coordinators
July 12th, 2012 42
43. Can GTM be a Performance Bottleneck?
• Proxy implementation
[Diagram: proxy implementation. Labels: Coordinators, Coordinator Backend, Client Library, Unix Domain Socket, GTM Proxy Thread (Server Scanner, Backend Handler, Command Handler, Response Handler), Proxy Main Thread (Create, Terminate, Connection Assignment), Internet Domain Socket, GTM Worker Threads (Snapshot Handler, Protocol Handler), Snapshot Data (Lock), GTM Main Thread (Create, Terminate).]
• Request/response grouping
• Single representative snapshot applied to multiple transactions
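The two proxy optimizations can be illustrated with a toy model. The Python sketch below is an assumption-laden illustration, not the GTM protocol: `FakeGTM`, `FakeProxy`, and their methods are hypothetical names.

```python
class FakeGTM:
    """Hypothetical GTM that hands out XIDs and snapshots."""

    def __init__(self):
        self.next_xid = 1
        self.active = set()
        self.round_trips = 0

    def begin_batch(self, count):
        """Assign `count` XIDs and build one snapshot in a single round trip."""
        self.round_trips += 1
        xids = list(range(self.next_xid, self.next_xid + count))
        self.next_xid += count
        self.active.update(xids)
        return xids, frozenset(self.active)


class FakeProxy:
    """Hypothetical proxy that groups requests from local backends."""

    def __init__(self, gtm):
        self.gtm = gtm
        self.pending = []

    def request_begin(self, backend_id):
        self.pending.append(backend_id)   # queue instead of calling the GTM

    def flush(self):
        if not self.pending:
            return {}
        xids, snapshot = self.gtm.begin_batch(len(self.pending))
        # One representative snapshot is shared by every grouped transaction.
        replies = {b: (xid, snapshot) for b, xid in zip(self.pending, xids)}
        self.pending = []
        return replies
```

Three backends starting transactions at about the same time cost one GTM round trip instead of three, and the GTM builds one snapshot instead of three.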
44. Can GTM be a SPOF?
• Implement GTM Standby
[Diagram: the GTM Master checkpoints its next starting point (GXID and Sequence) to the GTM Standby.]
• The standby can take over from the master without referring to GTM master information.
45. Parallel Query
● OK for simple queries
● Also when all joins can be pushed down
– Star schema with replicated dimensions
● Even aggregates
● SELECT SUM(col1) FROM tab1
● Performs poorly if a cross-node join is needed
● Data on one node needs to join with data on another
● Ships all data to the coordinator for joining
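A simple aggregate parallelizes well because each datanode can reduce its own rows and send back a single value. The Python sketch below illustrates the idea for `SELECT SUM(col1) FROM tab1`; the function names and the sample values are assumptions, not Postgres-XC internals.

```python
def node_partial_sum(rows):
    """Runs on a datanode: reduce local rows to one number."""
    return sum(rows)


def coordinator_sum(nodes):
    """Runs on the coordinator: combine one partial result per datanode."""
    return sum(node_partial_sum(rows) for rows in nodes)


def coordinator_avg(nodes):
    """AVG needs two partials per node (sum and count), combined at the end."""
    partials = [(sum(rows), len(rows)) for rows in nodes]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical col1 values spread across three datanodes.
tab1 = [[1, 2, 3], [11, 21, 31], [10, 20, 30]]
```

Only three numbers cross the network for the SUM, regardless of table size. A cross-node join has no such shortcut, which is why it has to ship rows to the coordinator.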
46. High Availability
● GTM-standby provides basic HA
● No native HA for nodes
● Use HA middleware such as Pacemaker
● Each data node should be configured with synchronous replication
47. Status
Settings and options
48. Present Status
● Project/Developer site
● http://postgres-xc.sourceforge.net/
● http://sourceforge.net/projects/postgres-xc/
● Version 1.0 available
● Base PostgreSQL version: 9.1
● Soon, PostgreSQL 9.2!
– Group commit: even more write scalability
– “Index-only Scans”
● Get Involved
● Even as just a tester
49. Easy way of trying it out?
● www.stormdb.com
● Not Postgres-XC, but similar
● Nothing to install, cloud hosted
● Free beta
50. Thank You
mason@stormdb.com
Twitter: mason_db