Automated conflict resolution - enabling masterless data distribution (Rune Skou Larsen, Trifork)
1. Automated conflict resolution - enabling masterless data distribution
GOTO Aarhus 2012
Premier software development conference created by developers for
developers.
Conference: Oct. 1-3 // Training: Sept. 30, Oct. 4-5
See more at www.gotocon.com/aarhus-2012
2. About the speaker
■ Working with databases since '97
■ NoSQL since 2008
■ Danish Shared Medication Record
■ Migrating data from MySQL to Riak
■ Developing Riak clients
■ NoSQL architect on various international projects
■ Seoul, Korea
■ Toulouse, France
■ Leeds, UK
■ Copenhagen, Denmark
3. Agenda
■ Polyglot persistence landscape
■ Distribute those data!
■ Types of Consistency
■ Moving towards consistency
■ Introduction to CRDTs
■ Consistency models of OLTP databases
5. Why distributed databases?
■ Redundancy
■ Availability
■ Scaling
■ Getting closer to your users
6. Types of Consistency
■ Consistency: All nodes see the same data at the same time
■ Eventual consistency → Autonomous consistency
■ Sequential consistency → Bureaucratic consistency
7. When to be Consistent with what
■ Eventual consistency
Support disconnected operations
– Better to read a stale value than nothing
– Better to save writes somewhere than nothing
Potentially anomalous application behavior
– Stale reads and conflicting writes…
■ Sequential consistency
Requires highly available connections
Not suitable for certain scenarios:
– Disconnected clients (e.g. your phone)
– Apps might prefer potential inconsistency to loss of availability
9. Last Write Wins (LWW)
[Figure: User A writes A at t=t0 and User B writes B at t=t1; the two replicas synchronize asynchronously]
■ Assign timestamp to all objects
■ Simple but fragile: depends on precisely synchronized clocks
■ Data is lost
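
The last-write-wins rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming each replica stores a (timestamp, value) pair; the names `LWWRegister` and `merge` are hypothetical and not taken from any particular database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """A value tagged with the wall-clock time of its write (hypothetical type)."""
    timestamp: float
    value: str


def merge(a: LWWRegister, b: LWWRegister) -> LWWRegister:
    """Keep the write with the higher timestamp; the other write is discarded."""
    return a if a.timestamp >= b.timestamp else b


# User A writes at t0 on one replica, user B writes at t1 on another.
replica_a = LWWRegister(timestamp=100.0, value="A")
replica_b = LWWRegister(timestamp=101.0, value="B")

# After synchronization both replicas keep B; A's write is lost.
winner = merge(replica_a, replica_b)

# If B's clock runs 5 seconds slow, the later write loses instead.
skewed_b = LWWRegister(timestamp=101.0 - 5.0, value="B")
skewed_winner = merge(replica_a, skewed_b)
```

The second case shows the fragility: with a skewed clock the write that really happened last is the one thrown away.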
10. Detecting conflicts using Vector Clocks (1)
[Figure: User A writes A with vclock=a:1; User B writes B with vclock=a:1,b:1; the two replicas synchronize asynchronously]
■ Assign vector clock to objects
■ Spawn siblings when causality chain is broken
■ Data is never lost
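
A minimal Python sketch of how two vector clocks can be compared, assuming a clock is a plain dict from actor id to counter. The helper names `descends`, `concurrent`, and `increment` are illustrative assumptions, not the API of any specific store.

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> counter (hypothetical representation)


def descends(later: VClock, earlier: VClock) -> bool:
    """True if `later` has seen every event recorded in `earlier`."""
    return all(later.get(actor, 0) >= count for actor, count in earlier.items())


def concurrent(x: VClock, y: VClock) -> bool:
    """True if neither clock descends from the other, i.e. the writes conflict."""
    return not descends(x, y) and not descends(y, x)


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of `clock` with `actor`'s counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


# User A writes first; user B reads that version and then writes.
clock_a = increment({}, "a")        # a:1
clock_b = increment(clock_a, "b")   # a:1,b:1
```

Here `clock_b` descends from `clock_a`, so the causality chain is intact and B's value can simply replace A's. Two clocks such as a:1 and b:1 are concurrent, which is the case where siblings are spawned.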
11. Detecting conflicts using Vector Clocks (2)
[Figure: User A writes A with vclock=a:1; User B writes B with vclock=b:1; the two replicas synchronize asynchronously]
■ Assign vector clock to objects
■ Spawn siblings when causality chain is broken
■ Data is never lost
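
Sibling spawning can be sketched as follows. This is a simplified illustration, assuming each stored version is a (vector clock, value) pair; `reconcile` and the type aliases are hypothetical names, and real stores handle more cases than this.

```python
from typing import Dict, List, Tuple

VClock = Dict[str, int]
Version = Tuple[VClock, str]  # (vector clock, value); hypothetical representation


def descends(later: VClock, earlier: VClock) -> bool:
    """True if `later` has seen every event recorded in `earlier`."""
    return all(later.get(actor, 0) >= n for actor, n in earlier.items())


def reconcile(stored: List[Version], incoming: Version) -> List[Version]:
    """Merge an incoming version into the sibling list kept for one key."""
    in_clock, _ = incoming
    # The incoming write is already covered by a stored version: ignore it.
    if any(descends(clock, in_clock) for clock, _ in stored):
        return stored
    # Drop stored versions that the incoming write supersedes; the rest are
    # concurrent with it and are kept as siblings.
    survivors = [v for v in stored if not descends(in_clock, v[0])]
    return survivors + [incoming]


# Both users write without having seen each other's version.
siblings = reconcile([({"a": 1}, "A")], ({"b": 1}, "B"))

# A later write that has seen both siblings replaces them.
resolved = reconcile(siblings, ({"a": 1, "b": 2}, "C"))
```

After the first call both values are kept side by side, so nothing is lost. The second call collapses them because its clock descends from both.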
12. Conflict-free Replicated DataTypes
[Figure: User A writes A and User B writes B; after asynchronous synchronization both replicas hold AB]
■ Datastructure intrinsically merges objects
■ No data loss
■ Limited applicability
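
One simple data type with an intrinsic merge is a grow-only set, where merging is set union. The sketch below assumes replicas hold Python frozensets; the function name `merge` is illustrative.

```python
from typing import FrozenSet


def merge(x: FrozenSet[str], y: FrozenSet[str]) -> FrozenSet[str]:
    """Grow-only set merge: the union of both replicas' elements."""
    return x | y


# Each user adds one element on their own replica.
replica_a = frozenset({"A"})
replica_b = frozenset({"B"})

# Synchronizing in either direction gives the same result.
merged_on_a = merge(replica_a, replica_b)
merged_on_b = merge(replica_b, replica_a)
```

The limitation is visible here too: a plain union can never express the removal of an element.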
13. Semantic resolution
[Figure: User A writes A and User B writes B; after asynchronous synchronization the two values are merged into C]
■ Keep both values as siblings
■ User does the merging
■ Only solution for complex, important data
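
Semantic resolution can be pictured as a read path that hands all siblings to an application-supplied function. The sketch below is an assumption-laden illustration: `read_resolved`, `Resolver`, and the tag-merging rule are hypothetical, and in practice the resolver may need to ask the user.

```python
from typing import Callable, List

Resolver = Callable[[List[str]], str]  # hypothetical application hook


def read_resolved(siblings: List[str], resolve: Resolver) -> str:
    """Return the value directly if there is no conflict; otherwise ask the application."""
    if len(siblings) == 1:
        return siblings[0]
    return resolve(siblings)


def merge_tag_lists(siblings: List[str]) -> str:
    """Example application rule: union of comma-separated tags, sorted."""
    tags = set()
    for sibling in siblings:
        tags.update(t.strip() for t in sibling.split(",") if t.strip())
    return ",".join(sorted(tags))


# Two conflicting edits of the same tag list are merged by the application.
value = read_resolved(["red,blue", "blue,green"], merge_tag_lists)
```

The store only keeps the siblings; the knowledge of what a correct merge looks like stays in the application.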
14. Methods for resolving conflicts
■ Last Write Wins
■ Easy
■ Data is lost
■ Depends on timestamps
■ Conflict-free Data Types
■ Data structure has built-in convergence
■ Limited ability to model real-world problems
■ Semantic resolution
■ Requires application/user involvement
■ Generic solution
15. Conflict-free Replicated Data Types
■ Convergent (CvRDT)
■ State is replicated
■ Moves towards one value
■ Commutative (CmRDT)
■ Operations to the state are replicated
■ The order of operations is insignificant
a*b = b*a
■ CvRDT and CmRDT can emulate each other
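
The two flavours can be contrasted with a counter. This is a minimal sketch, assuming a grow-only counter for the state-based case and a list of integer deltas for the operation-based case; all names are illustrative.

```python
from typing import Dict, List

# State-based (CvRDT): each replica keeps a per-replica count and ships
# its whole state; merge takes the element-wise maximum.
GCounter = Dict[str, int]


def increment(state: GCounter, replica: str) -> GCounter:
    """Return a copy of `state` with `replica`'s own count advanced by one."""
    updated = dict(state)
    updated[replica] = updated.get(replica, 0) + 1
    return updated


def merge(x: GCounter, y: GCounter) -> GCounter:
    """Element-wise maximum: commutative, associative, and idempotent."""
    return {r: max(x.get(r, 0), y.get(r, 0)) for r in x.keys() | y.keys()}


def value(state: GCounter) -> int:
    """The counter's value is the sum of all per-replica counts."""
    return sum(state.values())


# Operation-based (CmRDT): replicas ship operations instead of state.
# Additions commute, so any delivery order gives the same result,
# provided each operation is delivered exactly once.
def apply_ops(start: int, ops: List[int]) -> int:
    total = start
    for delta in ops:
        total += delta
    return total


a = increment(increment({}, "a"), "a")  # replica a counted 2
b = increment({}, "b")                  # replica b counted 1
```

Merging `a` and `b` in either order yields a counter with value 3, and applying the same deltas in a different order yields the same total.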
17. CRDT References
■ CRDTs: Consistency without concurrency control, INRIA, 2009
■ A comprehensive study of Convergent and Commutative Replicated Data Types, INRIA, 2011
■ Sean Cribbs - Eventually Consistent Data Structures, http://vimeo.com/43903960
18. Consistency models of OLTP databases
■ Hinted handoff with sloppy quorums
■ Last write wins (highest write-availability)
Riak Riak
CouchDB/CouchBase Cassandra
Cassandra ■ Strong consistency (read your own
writes + strict quorums)
■ User resolvable conflicts
Riak
Riak
Voldemort
Voldemort
Cassandra
CouchDB/CouchBase (but
unreliable) CouchBase
■ Active anti-entropy MongoDB
Riak (Soon) Traditional SQL databases
(Oracle, MySQL, etc.)
19. Thank you
Rune Skou Larsen
rsl@trifork.com
Twitter: RuneSkouLarsen