1. Deploying Maximum HA Architecture with PostgreSQL
Denish Patel, Database Architect
2. Who am I?
• Denish Patel
• Database Architect with OmniTI for more than 5 years
• Expertise in PostgreSQL, Oracle, MySQL, MS SQL Server
• Contact: denish@omniti.com
• Blog: http://denishjpatel.blogspot.com/
• Providing solutions to business problems that deliver
  • Scalability
  • Reliability
  • High Availability
  • Consistency
  • Security
We are hiring!! Apply @ l42.org/lg
3. Agenda
• Why do you need HA architecture?
• Why PostgreSQL?
• Traditional HA architecture
• Goals for maximum HA
• Maximum HA solution
4. Assumptions
• Consistency and availability matter (CAP theorem)
• It is good to reduce MTTF, but you have “real” control only over MTTR
5. Why do you need HA architecture?
Application downtime and unavailability of data lead to:
• Loss of productivity
• Loss of revenue
• Dissatisfied customers
6. Why do you need HA architecture?
• Unplanned outages: system failures, data failures
• Planned outages: system changes, data changes
• Strategy: prevent, tolerate, recover fast
7. Why PostgreSQL?
• Best protection at the lowest cost
• No additional software costs for providing maximum availability, compared to closed-source databases
• Provides free feature sets to prevent outages, tolerate them, and recover fast
9. Traditional HA Architecture
Diagram: Master Database → Hot Standby Database via Streaming Replication (PostgreSQL 9); WAL files are also copied from the master to the standby.
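The traditional pair above can be wired up with PostgreSQL 9 built-ins. A minimal configuration sketch, assuming a replication user `replicator` and an NAS archive path `/mnt/nas/walarchive` (both illustrative):

```
# postgresql.conf on the master (PostgreSQL 9.0-era settings)
wal_level = hot_standby
max_wal_senders = 3
archive_mode = on
archive_command = 'cp %p /mnt/nas/walarchive/%f'    # copy completed WAL files to shared storage

# recovery.conf on the hot standby
standby_mode = 'on'
primary_conninfo = 'host=master port=5432 user=replicator'
restore_command = 'cp /mnt/nas/walarchive/%f %p'    # fall back to archived WAL if streaming lags
```

Streaming replication keeps the standby close to real time, while the archived WAL copy covers gaps if the streaming connection drops.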
10. Goals for Maximum HA Architecture
• 99.99% uptime of the application
• Reduce MTTR
  • Planned outages
  • Unplanned outages
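The 99.99% target leaves a small downtime budget; the arithmetic:

```
allowed downtime = (1 - 0.9999) × 1 year
                 = 0.0001 × 365 × 24 × 60 minutes
                 ≈ 52.6 minutes per year (about 1 minute per week)
```

Every planned change and every failover has to fit inside that budget, which is why the rest of the deck focuses on reducing MTTR.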
11. Plan to reduce MTTR
• How do you manage failover?
  • Is it transparent to your application?
• Hot backups / dumps
  • Are you running them on the production server?
• Schema backups
  • How often? Are they under revision control?
• WAL file copy scripts
  • Do all of your prod servers use the same copy of the script?
• Where are your reporting queries pointing?
  • The production DB?
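The schema-backup question above can be answered with a scheduled schema-only dump checked into revision control; a sketch, with paths, database name, and repository location all illustrative:

```shell
# crontab entry: nightly schema-only dump, committed to git
0 3 * * * pg_dump -s -U postgres proddb > /srv/schema/proddb.sql \
          && cd /srv/schema \
          && git add proddb.sql \
          && git commit -m "nightly schema snapshot" --quiet
```

With the schema in git, "what changed and when?" during an incident is a `git log` away instead of guesswork.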
12. System Failures (Unplanned Outages)
• Server node fails
• Storage fails
• Site fails
17. Handle Data Failures
• PITR slave lag using OMNIpitr
  • 1-hour lag on WAL apply
• Periodic pg_dump of tables from the slave
• Run pg_extractor
  • https://github.com/omniti-labs/pg_extractor
• Track schema changes in Subversion/Git
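The deliberately lagged slave is driven from the standby's `restore_command`. A sketch of what the deck's "1-hour lag on WAL apply" might look like with OMNIpitr; the exact flag names and paths here are assumptions from memory, so verify them against the OMNIpitr documentation before use:

```
# recovery.conf on the lagged PITR slave (flags and paths illustrative; check OMNIpitr docs)
restore_command = '/opt/omnipitr/bin/omnipitr-restore -l /var/log/omnipitr/restore.log -s /mnt/nas/walarchive -w 3600 %f %p'
# the delay option keeps the slave one hour behind on WAL apply,
# giving a window to stop replay before a bad DELETE or DROP reaches the slave
```

A lagged slave turns many "restore from last night's backup" incidents into "pause replay and copy the table back".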
19. Handle Data Corruption
• File-system-level backups
  • Backups on the slave database using OMNIpitr
• Regular recovery testing
• Snapshot backups for faster recovery
  • Solaris ZFS is recommended!
• Monthly pg_dump backups
  • Backups on the slave
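A snapshot backup pairs PostgreSQL's backup mode with a near-instant filesystem snapshot. A minimal sketch run on the master, with the ZFS dataset name illustrative (OMNIpitr's backup tools automate the slave-side variant the deck recommends):

```shell
# Hot snapshot backup using ZFS (dataset name illustrative)
psql -U postgres -c "SELECT pg_start_backup('nightly');"   # put the cluster in backup mode
zfs snapshot tank/pgdata@nightly-$(date +%Y%m%d)           # near-instant, consistent snapshot
psql -U postgres -c "SELECT pg_stop_backup();"             # end backup mode, archive final WAL
```

Recovery from a snapshot is a `zfs rollback` or `zfs clone` plus WAL replay, which is why the deck calls snapshots out for faster recovery.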
20. System Changes (Planned Outages)
• OS upgrade
• Database upgrade
• Network changes
21. Handle OS Upgrades
Diagram: Floating IP → Master; SRHS streaming replication to Read Slave 1; WAL copied to NAS; SRHS failover path standing by.
22. Handle OS Upgrades
Diagram: the OS is upgraded on the Master while Read Slave 1 is promoted via SRHS failover to become the New Master.
23. Handle OS Upgrades
Diagram: the Floating IP now points at the New Master; the upgraded old master rejoins as a new SRHS Read Slave 1; WAL copy to the NAS continues.
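The failover step in the slides above boils down to promoting the standby and moving the floating IP. A sketch, assuming `trigger_file = '/tmp/pgsql.trigger'` is set in the standby's recovery.conf and with interface and IP values illustrative:

```shell
# On the standby: create the trigger file so it exits recovery and becomes master
touch /tmp/pgsql.trigger

# On the new master: bring up the floating IP the application connects to
ip addr add 10.0.0.100/24 dev eth0
arping -c 3 -U -I eth0 10.0.0.100   # gratuitous ARP so clients and switches learn the new MAC
```

Because the application only ever sees the floating IP, the failover is transparent to it, which is one of the MTTR questions from slide 11.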
24. System Changes (Planned Outages)
• OS upgrade
• Database upgrade
• Network changes
25. Handle Database Upgrade
• Running PG 8.3+?
  • No → Outage acceptable?
    • No → third-party replication, e.g. Slony
    • Yes → pg_dump / pg_restore
  • Yes → Outage acceptable?
    • Does pg_upgrade --check pass?
      • Yes → pg_upgrade
      • No → drop incompatible tables before the upgrade and restore them after
* Only showing recommended options
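The `--check` branch of the decision tree is a dry run that reports incompatibilities without touching either cluster. A sketch, with version numbers and paths illustrative:

```shell
# Dry-run compatibility check before an in-place upgrade (paths illustrative)
pg_upgrade --check \
  -b /usr/pgsql-8.4/bin  -B /usr/pgsql-9.0/bin \
  -d /var/lib/pgsql/8.4/data  -D /var/lib/pgsql/9.0/data

# If the check flags incompatible tables: pg_dump and drop them,
# run pg_upgrade for real (same command without --check),
# then pg_restore the dropped tables into the new cluster.
```

Running the check well before the maintenance window keeps the actual planned outage short and predictable.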
26. Handle Data Changes (Planned Outages)
• Alter schemas
• Data growth
27. Handle Alter Schemas
• Transactional DDL
• CREATE OR REPLACE VIEW
• NOT VALID
  • CHECK constraints
  • FKs
• Add a column without scanning the entire table
  • NULLABLE
  • No DEFAULT
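These bullets combine into low-downtime schema changes. A sketch against a hypothetical `orders` table (note the version caveat: `NOT VALID` arrived for foreign keys in 9.1 and for CHECK constraints in 9.2):

```shell
psql -U postgres proddb <<'SQL'
-- DDL is transactional in PostgreSQL: either every change lands or none do
BEGIN;
ALTER TABLE orders ADD COLUMN note text;            -- nullable, no DEFAULT: no full-table rewrite
ALTER TABLE orders ADD CONSTRAINT orders_total_ck
  CHECK (total >= 0) NOT VALID;                     -- skip the full-table scan for now
COMMIT;

-- validate later, outside the critical window, scanning the table without
-- blocking normal writes for the whole duration
ALTER TABLE orders VALIDATE CONSTRAINT orders_total_ck;
SQL
```

The pattern is the same for foreign keys: add `NOT VALID`, then `VALIDATE CONSTRAINT` when convenient.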
28. Handle Data Changes (Planned Outages)
• Alter schemas
• Data growth
29. Handle Data Growth
PostgreSQL bloat removal:
• Offline
  • VACUUM FULL
  • CLUSTER
• Online
  • Rebuild indexes CONCURRENTLY
  • Rebuild tables online using pg_reorg
http://denishjpatel.blogspot.com/2011/03/extreme-training-session-at-pgeast-p90x.html
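The online index rebuild is a swap trick, since plain REINDEX takes a blocking lock. A sketch with table and index names illustrative:

```shell
psql -U postgres proddb <<'SQL'
-- Rebuild a bloated index without blocking writes (names illustrative)
CREATE INDEX CONCURRENTLY orders_created_idx_new ON orders (created_at);
DROP INDEX orders_created_idx;        -- brief exclusive lock only
ALTER INDEX orders_created_idx_new RENAME TO orders_created_idx;
SQL
```

pg_reorg applies the same idea to whole tables, rebuilding them in the background and swapping at the end.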
30. Now we have ….
• PostgreSQL 9
• PITR
• Floating IP
• pg_extractor
• pg_reorg
31. Maximum HA Architecture
Diagram: App servers connect through a load balancer (LB) to a Floating IP on the Master; SRHS failover to a standby Master; Read Slave 1 serves reads and takes backups (Bkp); Read Slave 2 serves reads and applies WAL from the NAS; backups (Bkp) are also taken on the master side.