Put Your Thinking CAP On

A talk given at JDay Lviv 2015 in Ukraine; originally developed by Yoav Abrahami, and based on the works of Kyle "Aphyr" Kingsbury:

Consistency, availability and partition tolerance: these seemingly innocuous concepts have been giving engineers and researchers of distributed systems headaches for over 15 years. But despite how important they are to the design and architecture of modern software, they are still poorly understood by many engineers.

This session covers the definition and practical ramifications of the CAP theorem; you may think that this has nothing to do with you because you "don't work on distributed systems", or possibly that it doesn't matter because you "run over a local network." Yet even traditional enterprise CRUD applications must obey the laws of physics, which are exactly what the CAP theorem describes. Know the rules of the game and they'll serve you well, or ignore them at your own peril...

Put Your Thinking CAP On

  1. Put Your Thinking CAP On Tomer Gabel, Wix JDay Lviv, 2015
  2. Credits Originally a talk by Yoav Abrahami (Wix) Based on “Call Me Maybe” by Kyle “Aphyr” Kingsbury
  3. Brewer’s CAP Theorem Partition Tolerance Consistency Availability
  4. Brewer’s CAP Theorem Partition Tolerance Consistency Availability
  5. By Example • I want this book! – I add it to the cart – Then continue browsing • There’s only one copy in stock!
  6. By Example • I want this book! – I add it to the cart – Then continue browsing • There’s only one copy in stock! • … and someone else just bought it.
  7. Consistency
  8. Consistency: Defined • In a consistent system: All participants see the same value at the same time • “Do you have this book in stock?”
  9. Consistency: Defined • If our book store is an inconsistent system: – Two customers may buy the book – But there’s only one item in inventory! • We’ve just violated a business constraint.
  10. Availability
  11. Availability: Defined • An available system: – Is reachable – Responds to requests (within SLA) • Availability does not guarantee success! – The operation may fail – “This book is no longer available”
  12. Availability: Defined • What if the system is unavailable? – I complete the checkout – And click on “Pay” – And wait – And wait some more – And… • Did I purchase the book or not?!
  13. Partition Tolerance
  14. Partition Tolerance: Defined • Partition: one or more nodes are unreachable • No practical system runs on a single node • So all systems are susceptible!
  15. “The Network is Reliable” • All four failure modes (drop, delay, duplicate, reorder) happen in an IP network • To a client, delays and drops are the same • Perfect failure detection is provably impossible! (“Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch and Paterson)
  16. Partition Tolerance: Reified • External causes: – Bad network config – Faulty equipment – Scheduled maintenance • Even software causes partitions: – Bad network config. – GC pauses – Overloaded servers • Plenty of war stories! – Netflix – Twilio – GitHub – Wix :-) • Some hard numbers (“Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications”, Gill et al): – 5.2 failed devices/day – 59K lost packets/day – Adding redundancy only improves by 40%
  17. “Proving” CAP
  18. In Pictures • Let’s consider a simple system: – Service A writes values – Service B reads values – Values are replicated between nodes • These are “ideal” systems – Bug-free, predictable
  19. In Pictures • “Sunny day scenario”: – A writes a new value V1 – The value is replicated to node 2 – B reads the new value
  20. In Pictures • What happens if the network drops? – A writes a new value V1 – Replication fails – B still sees the old value – The system is inconsistent
  21. In Pictures • A possible mitigation is synchronous replication: – A writes a new value V1 – Cannot replicate, so the write is rejected – Both A and B still see V0 – The system is logically unavailable
  22. What does it all mean?
  23. The network is not reliable • Distributed systems must handle partitions • Any modern system runs on more than one node… • … and is therefore distributed • Ergo, you have to choose: – Consistency over availability – Availability over consistency
  24. Granularity • Real systems comprise many operations – “Add book to cart” – “Pay for the book” • Each has different properties • It’s a spectrum, not a binary choice! (diagram: a consistency-availability spectrum, with “Checkout” toward the consistency end and “Shopping Cart” toward the availability end)
  25. CAP IN THE REAL WORLD Kyle “Aphyr” Kingsbury Breaking consistency guarantees since 2013
  26. PostgreSQL • Traditional RDBMS – Transactional – ACID compliant • Primarily a CP system – Writes against a master node • “Not a distributed system” – Except with a client at play!
  27. PostgreSQL • Writes are a simplified 2PC: – Client votes to commit – Server validates transaction – Server stores changes – Server acknowledges commit – Client receives acknowledgement
  28. PostgreSQL • But what if the ack is never received? • The commit is already stored… • … but the client has no indication! • The system is in an inconsistent state
  29. PostgreSQL • Let’s experiment! • 5 clients write to a PostgreSQL instance • We then drop the server from the network • Results: – 1000 writes – 950 acknowledged – 952 survivors
  30. So what can we do? 1. Accept false negatives – May not be acceptable for your use case! 2. Use idempotent operations 3. Apply unique transaction IDs – Query state after the partition is resolved • These strategies apply to any RDBMS
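
A minimal sketch (not the deck's own code) of strategies 2 and 3 in Java/JDBC, assuming a hypothetical orders table keyed by a client-generated tx_id column: after an ambiguous commit the client re-queries by that ID, ideally on a fresh connection, to learn whether the write actually survived.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.UUID;

    public class IdempotentWrite {
        // Tag each write with a client-generated unique ID so the outcome of an
        // ambiguous commit can be checked once the partition is resolved.
        // Assumes autocommit has been disabled on the connection.
        public static boolean writeOrder(Connection conn, String book) throws SQLException {
            String txId = UUID.randomUUID().toString();
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO orders (tx_id, book) VALUES (?, ?)")) {
                insert.setString(1, txId);
                insert.setString(2, book);
                insert.executeUpdate();
                conn.commit();
                return true;                      // acknowledged by the server
            } catch (SQLException ambiguous) {
                // The ack may have been lost: the commit might still have been applied.
                // Once the server is reachable again, query by our unique ID.
                try (PreparedStatement check = conn.prepareStatement(
                        "SELECT 1 FROM orders WHERE tx_id = ?")) {
                    check.setString(1, txId);
                    try (ResultSet rs = check.executeQuery()) {
                        return rs.next();         // true iff the write survived
                    }
                }
            }
        }
    }
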
  31. MongoDB • A document-oriented database • Availability/scale via replica sets – Client writes to a master node – Master replicates writes to n replicas • User-selectable consistency guarantees
  32. MongoDB • When a partition occurs: – If the master is in the minority, it is demoted – The majority promotes a new master… – … selected by the highest optime
  33. MongoDB • The cluster “heals” after partition resolution: – The “old” master rejoins the cluster – Acknowledged minority writes are reverted!
  34. MongoDB • Let’s experiment! • Set up a 5-node MongoDB cluster • 5 clients write to the cluster • We then partition the cluster • … and restore it to see what happens
  35. MongoDB • With write concern unacknowledged: – Server does not ack writes (except TCP) – The default prior to November 2012 • Results: – 6000 writes – 5700 acknowledged – 3319 survivors – 42% data loss!
  36. MongoDB • With write concern acknowledged: – Server acknowledges writes (after store) – The default guarantee • Results: – 6000 writes – 5900 acknowledged – 3692 survivors – 37% data loss!
  37. MongoDB • With write concern replica acknowledged: – Client specifies minimum replicas – Server acks after writes to replicas • Results: – 6000 writes – 5695 acknowledged – 3768 survivors – 33% data loss!
  38. MongoDB • With write concern majority: – For an n-node cluster, requires acks from a majority (more than n/2) of nodes – Also called “quorum” • Results: – 6000 writes – 5700 acknowledged – 5701 survivors – No data loss
  39. So what can we do? 1. Keep calm and carry on – As Aphyr puts it, “not all applications need consistency” – Have a reliable backup strategy – … and make sure you drill restores! 2. Use write concern majority – And take the performance hit
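
A minimal sketch of option 2 using a recent MongoDB Java driver; the connection string, database and collection names are illustrative. The collection is configured with WriteConcern.MAJORITY so the server acknowledges a write only after a majority of the replica set has it.

    import com.mongodb.WriteConcern;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class MajorityWrites {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create(
                    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")) {
                // Acknowledge only after a majority of the replica set has the write:
                // slower, but acknowledged writes survive partitions and failovers.
                MongoCollection<Document> books = client
                        .getDatabase("store")
                        .getCollection("books")
                        .withWriteConcern(WriteConcern.MAJORITY);

                books.insertOne(new Document("title", "The Cat in the Hat")
                        .append("inStock", 1));
            }
        }
    }
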
  40. The prime suspects • Aphyr’s Jepsen tests include: – Redis – Riak – Zookeeper – Kafka – Cassandra – RabbitMQ – etcd (and consul) – ElasticSearch • If you’re considering them, go read his posts • In fact, go read his posts regardless http://aphyr.com/tags/jepsen
  41. STRATEGIES FOR DISTRIBUTED SYSTEMS
  42. Immutable Data • Immutable (adj.): “Unchanging over time or unable to be changed.” • Meaning: – No deletes – No updates – No merge conflicts – Replication is trivial
  43. Idempotence • An idempotent operation: – Can be applied one or more times with the same effect • Enables retries • Not always possible – Side-effects are key – Consider: payments
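
A minimal sketch of an idempotent payment-style operation, built around a hypothetical caller-supplied idempotency key; a real service would persist the key-to-receipt mapping durably rather than keep it in memory.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class PaymentService {
        // Remembers the result for each key so retries return the original receipt
        // instead of charging again. A real system would store this durably.
        private final Map<String, String> processed = new ConcurrentHashMap<>();

        /** Charges at most once per idempotency key; retries are safe. */
        public String charge(String idempotencyKey, long amountCents) {
            return processed.computeIfAbsent(idempotencyKey,
                    key -> doCharge(amountCents));
        }

        private String doCharge(long amountCents) {
            // ... call the payment provider and return its receipt ID ...
            return "receipt-for-" + amountCents;
        }
    }
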
  44. Eventual Consistency • A design which prefers availability • … but guarantees that clients will eventually see consistent reads • Consider git: – Always available locally – Converges via push/pull – Human conflict resolution
  45. Eventual Consistency • The system expects data to diverge • … and includes mechanisms to regain convergence – Partial ordering to minimize conflicts – A merge function to resolve conflicts
  46. Vector Clocks • A technique for partial ordering • Each node has a logical clock – The clock increases on every write – Track the last observed clocks for each item – Include this vector on replication • When neither the observed nor the inbound vector descends from the other, we have a conflict • This lets us know when history diverged
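
A minimal sketch of the partial ordering the slide describes: each node bumps its own component on a local write, replicas merge by component-wise maximum, and a conflict is detected when neither vector dominates the other.

    import java.util.HashMap;
    import java.util.Map;

    public class VectorClock {
        private final Map<String, Long> entries = new HashMap<>();

        /** Increment this node's component on every local write. */
        public void tick(String nodeId) {
            entries.merge(nodeId, 1L, Long::sum);
        }

        /** True iff every component of this clock is >= the other's. */
        public boolean dominates(VectorClock other) {
            for (Map.Entry<String, Long> e : other.entries.entrySet()) {
                if (entries.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                    return false;
                }
            }
            return true;
        }

        /** Neither clock dominates: the histories diverged and we have a conflict. */
        public static boolean inConflict(VectorClock a, VectorClock b) {
            return !a.dominates(b) && !b.dominates(a);
        }

        /** Component-wise maximum, applied when replicas exchange their vectors. */
        public void merge(VectorClock other) {
            other.entries.forEach((node, count) -> entries.merge(node, count, Math::max));
        }
    }
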
  47. CRDTs • Commutative Replicated Data Types (“A comprehensive study of Convergent and Commutative Replicated Data Types”, Shapiro et al) • A CRDT is a data structure that: – Eventually converges to a consistent state – Guarantees no conflicts on replication
  48. CRDTs • CRDTs provide specialized semantics: – G-Counter: Monotonically increasing counter – PN-Counter: Also supports decrements – G-Set: A set that only supports adds – 2P-Set: Supports removals, but only once • OR-Sets are particularly useful – Keeps track of both additions and removals – Can be used for shopping carts
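
A minimal sketch of the first CRDT on the list, a G-Counter: each replica increments only its own slot, merge takes the per-slot maximum, and the value is the sum, so replicas converge no matter how often or in which order they sync.

    import java.util.HashMap;
    import java.util.Map;

    public class GCounter {
        private final String nodeId;
        private final Map<String, Long> slots = new HashMap<>();

        public GCounter(String nodeId) {
            this.nodeId = nodeId;
        }

        /** Each replica only ever increments its own slot. */
        public void increment() {
            slots.merge(nodeId, 1L, Long::sum);
        }

        /** The counter's value is the sum of all replicas' slots. */
        public long value() {
            return slots.values().stream().mapToLong(Long::longValue).sum();
        }

        /** Merge by per-slot maximum: commutative, associative and idempotent. */
        public void merge(GCounter other) {
            other.slots.forEach((node, count) -> slots.merge(node, count, Math::max));
        }
    }
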
  49. Questions? Complaints?
  50. WE’RE DONE HERE! Thank you for listening tomer@tomergabel.com @tomerg http://il.linkedin.com/in/tomergabel Aphyr’s “Call Me Maybe” blog posts: http://aphyr.com/tags/jepsen

Editor's Notes

  • Image source: http://en.wikipedia.org/wiki/File:Seuss-cat-hat.gif
  • Image source: http://en.wikipedia.org/wiki/File:Seuss-cat-hat.gif
  • Photo source: http://pixabay.com/en/meerkat-zoo-animal-sand-desert-363051/
  • Photo source: Unknown
  • Image source: https://www.flickr.com/photos/framesofmind/8541529818/
  • Image source: http://duelingcouches.blogspot.com/2008/12/patiently-waiting.html
  • Image source: http://anapt.deviantart.com/art/together-157107893
  • Image source: https://www.flickr.com/photos/infocux/8450190120/in/set-72157632701634780
  • Image source: http://en.wikipedia.org/wiki/Great_Pyramid_of_Giza#mediaviewer/File:Kheops-Pyramid.jpg
  • Image source: http://2.bp.blogspot.com/--VVPUQ06BaQ/TzmEacERFoI/AAAAAAAAEzE/e2QPIrRWQAg/s1600/washrinse.jpg
  • Photo source: https://www.flickr.com/photos/luschei/1569384007
