3. • High Availability (auto-failover)
• Read Scaling (extra copies to read from)
• Backups
– Online, delayed copy (protects against fat-finger mistakes)
– Point in Time (PiT) backups
• Use (hidden) replica for secondary workload
– Analytics
– Data-processing
– Integration with external systems
4. Planned
– Hardware upgrade
– O/S or file-system tuning
– Relocation of data to new file-system / storage
– Software upgrade
Unplanned
– Hardware failure
– Data center failure
– Region outage
– Human error
– Application corruption
5. • A cluster of N servers
• All writes to primary
• Reads can go to the primary (default) or a secondary
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
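The routing rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not driver code: the `route` function and the member names are hypothetical, and only the mode names `primary` and `secondary` are borrowed from MongoDB's read preference terminology.

```python
import random


def route(operation, members, read_preference="primary"):
    """Return the name of the member that should serve an operation.

    members maps a (hypothetical) member name to "PRIMARY" or "SECONDARY".
    """
    primaries = [name for name, role in members.items() if role == "PRIMARY"]
    secondaries = [name for name, role in members.items() if role == "SECONDARY"]

    # All writes go to the primary; reads do too unless told otherwise.
    if operation == "write" or read_preference == "primary":
        if not primaries:
            raise RuntimeError("no primary available")
        return primaries[0]

    # Reads may be sent to a secondary when the caller asks for it.
    if read_preference == "secondary":
        if not secondaries:
            raise RuntimeError("no secondary available")
        return random.choice(secondaries)

    raise ValueError(f"unsupported read preference: {read_preference}")


members = {"member1": "SECONDARY", "member2": "PRIMARY", "member3": "SECONDARY"}
print(route("write", members))              # always the primary
print(route("read", members))               # primary by default
print(route("read", members, "secondary"))  # one of the secondaries
```

Note that a write is sent to the primary even if a secondary read preference is supplied, which mirrors the "all writes to primary" rule.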
6. [Diagram: Member 1, Member 2, Member 3]
• Replica Set is made up of 2 or more nodes
7. [Diagram: Member 1, Member 3, Member 2 (Primary)]
• Election establishes the PRIMARY
• Data replication from PRIMARY to SECONDARY
8. [Diagram: Member 2 is DOWN; Member 1 and Member 3 negotiate a new primary]
• PRIMARY may fail
• Automatic election of a new PRIMARY if a majority of members is still reachable
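The "if majority exists" condition is simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes every member has one vote; it only checks whether the surviving members can still form a strict majority.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the set."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable_members: int) -> bool:
    """True if the reachable members form a majority of the whole set."""
    return reachable_members >= majority(voting_members)


# Three members, one down: two of three is still a majority.
print(can_elect_primary(3, 2))  # True
# Two members, one down: one of two is not a majority, so no election.
print(can_elect_primary(2, 1))  # False
```

This is why an odd number of voting members is the usual choice: a two-member set cannot elect a new primary after losing either node.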
9. [Diagram: Member 1 (Primary), Member 3, Member 2 (DOWN)]
• New PRIMARY elected
• Replica Set re-established
10. [Diagram: Member 1 (Primary), Member 3, Member 2 (Recovering)]
• Automatic recovery
11. [Diagram: Member 1 (Primary), Member 3, Member 2]
• Replica Set re-established
27. San Francisco
– Primary (priority 1)
– Secondary (priority 1)
Dallas
– Secondary (priority 0)
Disaster recovery data center. Will never become primary automatically.
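The layout on this slide could be written down roughly as follows. This is a sketch under stated assumptions: the replica set name and host names are made up, and the configuration is shown as a plain Python dictionary rather than passed to a real server. The field names `_id`, `members`, `host`, and `priority` follow the replica set configuration document; a member with priority 0 cannot be elected primary.

```python
# Hypothetical replica set layout: two members in San Francisco,
# one disaster recovery member in Dallas that can never be elected.
replica_set_config = {
    "_id": "rs0",  # hypothetical replica set name
    "members": [
        {"_id": 0, "host": "sf-a.example.net:27017", "priority": 1},
        {"_id": 1, "host": "sf-b.example.net:27017", "priority": 1},
        {"_id": 2, "host": "dallas-a.example.net:27017", "priority": 0},
    ],
}


def electable_hosts(config):
    """Hosts that may become primary (priority above 0; default is 1)."""
    return [m["host"] for m in config["members"] if m.get("priority", 1) > 0]


print(electable_hosts(replica_set_config))
# ['sf-a.example.net:27017', 'sf-b.example.net:27017']
```

The Dallas member still replicates data and votes; it is only excluded from becoming primary on its own.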
28. [Diagram: three data centers (San Francisco, New York, Dallas) hosting one Primary and two Secondaries]
33. [Diagram: three stages (1, 2, 3) of a replica set with a Primary, a Secondary, and an Arbiter; a new Secondary runs a Full Sync]
Uh oh. Full Sync is going to use a lot of resources on the primary, so I may have downtime or degraded performance.
36. [Diagram: three stages (1, 2, 3) of a replica set with a Primary and two Secondaries; a new Secondary runs a Full Sync]
Sync can happen from a secondary, which will not impact traffic on the Primary.
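The idea of preferring a secondary as the sync source can be sketched as below. This is an illustrative selection rule only, not how the server actually chooses a source: the `pick_sync_source` function and the member records (`name`, `state`, `healthy`) are hypothetical.

```python
def pick_sync_source(members):
    """Prefer a healthy secondary; fall back to the primary if none exists.

    members is a list of dicts with hypothetical keys:
    'name', 'state' ("PRIMARY", "SECONDARY", "ARBITER") and 'healthy'.
    """
    healthy = [m for m in members if m["healthy"]]

    # A secondary holds the data, and copying from it spares the primary.
    for member in healthy:
        if member["state"] == "SECONDARY":
            return member["name"]

    # No secondary available (for example primary + arbiter only).
    for member in healthy:
        if member["state"] == "PRIMARY":
            return member["name"]

    raise RuntimeError("no member available to sync from")


with_arbiter = [
    {"name": "member1", "state": "PRIMARY", "healthy": True},
    {"name": "member2", "state": "ARBITER", "healthy": True},
]
with_secondary = [
    {"name": "member1", "state": "PRIMARY", "healthy": True},
    {"name": "member2", "state": "SECONDARY", "healthy": True},
]
print(pick_sync_source(with_arbiter))    # member1: the primary carries the load
print(pick_sync_source(with_secondary))  # member2: the primary is spared
```

An arbiter holds no data, so in a primary, secondary, arbiter set that has lost its secondary, the primary is the only possible source, which is the situation the previous slide warns about.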
37. • Avoid single points of failure
– Separate racks
– Separate data centers
• Avoid long recovery downtime
– Use journaling
– Use 3+ replicas
• Keep your actives close
– Use priority to control where failovers happen
• Change operations are written to the oplog
• The oplog is a capped collection (fixed size)
– Must have enough space to allow new secondaries to catch up after copying from a primary
– Must have enough space to cope with any applicable slaveDelay
• Secondaries query the primary's oplog and apply what they find
• All replicas contain an oplog
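The two sizing constraints on the oplog can be turned into a back-of-the-envelope check. This is a rough sketch with hypothetical function names and made-up numbers; it assumes a steady write rate, which real workloads rarely have, so treat the result as an estimate only.

```python
def oplog_window_hours(oplog_size_mb: float, oplog_mb_per_hour: float) -> float:
    """Hours of history a fixed-size oplog holds at a steady write rate."""
    if oplog_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / oplog_mb_per_hour


def oplog_is_large_enough(oplog_size_mb, oplog_mb_per_hour,
                          initial_copy_hours, slave_delay_hours=0.0):
    """Check both constraints: catch-up after a copy, and any slaveDelay."""
    window = oplog_window_hours(oplog_size_mb, oplog_mb_per_hour)
    covers_initial_copy = window > initial_copy_hours
    covers_delay = window > slave_delay_hours
    return covers_initial_copy and covers_delay


# 10 GB oplog, 512 MB of oplog entries per hour: a 20 hour window.
print(oplog_window_hours(10240, 512))             # 20.0
# A 6 hour copy and a 12 hour delayed member both fit in the window.
print(oplog_is_large_enough(10240, 512, 6, 12))   # True
# A 24 hour delayed member would fall off the end of the oplog.
print(oplog_is_large_enough(10240, 512, 6, 24))   # False
```

Because the oplog is capped, the oldest entries are overwritten once it is full; a secondary that needs an entry older than the window can no longer catch up and has to resync.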