Distributed Systems and Consistency
Because everything else is easy.
What we're talking about
●   What are distributed systems?
●   Why are they good, why are they bad?
●   CAP theorem
●   Possible CAP configurations
●   Strategies for consistency, including:
    ●   Point-in-time consistency with log-structured storage (LSS)
    ●   Vector clocks for distributed consistency
    ●   CRDTs for consistency from the data structure
    ●   Bloom, a natively consistent distributed language
What's a distributed system?
●   Short answer: big data systems
    ●   Lots of machines, geographically distributed
●   Technical answer:
    ●   Any system without a single, global order of events
    ●   Where events on different machines can happen simultaneously
Why are they good?
●   Centralized systems scale poorly & expensively
    ●   More locks, more contention
    ●   Expensive hardware
    ●   Vertical scaling
●   Distributed systems scale well & cheaply
    ●   No global locks, little contention
    ●   (Lots of) cheap hardware
    ●   Near-linear scaling
So what's the catch?
●   Consistency
    ●   “Easy” in centralized systems
    ●   Hard in distributed systems
CAP Theorem
●   Consistency
    ●   All nodes see the same data at the same time
●   Availability
    ●   Every request gets a response: success or failure
●   Partition tolerance
    ●   System keeps operating despite message loss or partial failure
●   Pick two! (In practice it's more of a sliding scale)
No P
●   No partition tolerance = centralized
    ●   Writes can't reach the store? Broken.
    ●   Reads can't find the data? Broken.
●   The most common database type
    ●   MySQL
    ●   Postgres
    ●   Oracle
No A
●   An unavailable database = a crappy database
    ●   Read or write didn't work? Try again.
    ●   Everything sacrifices A to some degree
●   Has some use cases
    ●   High-volume logs & statistics
    ●   Google BigTable
    ●   Mars orbiters!
No C
●   Lower consistency = distributed systems
    ●   “Eventual consistency”
    ●   Writes will work, or definitely fail
    ●   Reads will work, but might not be entirely true
●   The new hotness
    ●   Amazon S3, Riak
Why is this suddenly cool?
●   The economics of computing have changed
●   Networking was rare and expensive
    ●   Now cheap and ubiquitous – so partitions matter far more
●   Storage was expensive
    ●   Now ridiculously cheap – allows new approaches
●   Partitions happen
    ●   Deliberately sacrifice Consistency
    ●   Instead of accidentally sacrificing Availability
Ways to get to eventual consistency
●   App level:
    ●   Write locking
    ●   Last write wins
●   Infrastructure level:
    ●   Log-structured storage
    ●   Multiversion concurrency control
    ●   Vector clocks and siblings
●   New: language level!
    ●   Bloom
Write-time consistency 1
●   Write-time locking
    ●   Distributed reads
    ●   (Semi)-centralized writes
    ●   Cheap, fast reads (but can be stale)
    ●   Slower writes, potential points of failure
●   In the wild:
    ●   Clipboard.com
    ●   Awe.sm!
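A minimal sketch of write-time locking, assuming one primary that serializes every write behind a single lock and read replicas that only catch up when a hypothetical `replicate()` step runs. The class and method names are made up for illustration; this is not how Clipboard.com or Awe.sm actually implemented it.

```python
import threading
from typing import Dict, List, Optional


class LockedWriteStore:
    """Hypothetical store: (semi-)centralized writes, distributed reads."""

    def __init__(self, num_replicas: int = 2) -> None:
        self._write_lock = threading.Lock()      # the single write path
        self._primary: Dict[str, str] = {}
        self._replicas: List[Dict[str, str]] = [{} for _ in range(num_replicas)]

    def write(self, key: str, value: str) -> None:
        # Every write takes the same lock, so writes are serialized (and slower).
        with self._write_lock:
            self._primary[key] = value

    def replicate(self) -> None:
        # Copy the primary's state to every replica; until this runs, replicas lag behind.
        with self._write_lock:
            snapshot = dict(self._primary)
        self._replicas = [dict(snapshot) for _ in self._replicas]

    def read(self, key: str, replica: int = 0) -> Optional[str]:
        # Reads never touch the lock: cheap and fast, but possibly stale.
        return self._replicas[replica].get(key)


store = LockedWriteStore()
store.write("user:1", "alice")
print(store.read("user:1"))   # None: the replica hasn't caught up yet
store.replicate()
print(store.read("user:1"))   # alice
```

The lock is also the potential point of failure: if the primary holding it goes away, nobody can write.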
Write-time consistency 2
●   Last write wins
    ●   Cheap reads
    ●   Cheap writes
    ●   Can silently lose data!
        –   A sacrifice of Availability
●   In the wild:
    ●   Amazon S3
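A minimal last-write-wins register, assuming every write carries a wall-clock timestamp (the class and method names are hypothetical). Both writes below are accepted, yet one of them disappears once the replicas merge. That is why last write wins relies on accurate clocks, and why it is okay for image files but bad for payment processing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LWWRegister:
    """Last-write-wins register: keeps only the value with the latest timestamp."""
    value: Optional[str] = None
    timestamp: float = float("-inf")

    def write(self, value: str, timestamp: float) -> None:
        # Accept the write only if it is newer than what we hold. Ties are broken
        # by comparing values, so every replica picks the same winner.
        if (timestamp, value) > (self.timestamp, self.value or ""):
            self.value, self.timestamp = value, timestamp

    def merge(self, other: "LWWRegister") -> None:
        # Merging is just replaying the other replica's latest write.
        if other.value is not None:
            self.write(other.value, other.timestamp)


a, b = LWWRegister(), LWWRegister()
a.write("cart: book", timestamp=100.0)   # client 1 via replica a: reported as success
b.write("cart: lamp", timestamp=100.5)   # client 2 via replica b: also reported as success
a.merge(b)
b.merge(a)
print(a.value, b.value)   # both are "cart: lamp"; the book was silently dropped
```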
Side note: Twitter
●   Twitter is eventually consistent!
●   Your timeline isn't guaranteed correct
●   Older tweets can appear or disappear
●   Twitter sacrifices C for A and P
    ●   But doesn't get a lot of A
Infrastructure level consistency 1
    ●   Log-structured storage (LSS)
    ●   Also called append-only databases
    ●   A new angle on consistency: external consistency
    ●   a.k.a. Point-in-time consistency
●   In the wild:
    ●   BigTable
    ●   Spanner
How LSS Works
●   Every write is appended
●   Indexes are built and appended
●   Reads work backwards through the log
●   Challenges
    ●   Index building can get expensive
        –   Keep indexes in memory; they can be rebuilt from the log
    ●   Garbage collection
        –   But storage is cheap now!
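A toy log-structured key-value store, as a sketch under simplifying assumptions: the log is an in-memory Python list rather than a file on disk, and the index lives only in memory. `AppendOnlyStore` and its methods are hypothetical names, not any real database's API.

```python
from typing import Dict, List, Optional, Tuple


class AppendOnlyStore:
    """Toy log-structured store: the log is the only source of truth."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []   # every write is appended, never overwritten
        self.index: Dict[str, int] = {}        # in-memory index: key -> offset of latest write

    def put(self, key: str, value: str) -> int:
        self.log.append((key, value))
        offset = len(self.log) - 1
        self.index[key] = offset
        return offset

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

    def get_by_scan(self, key: str) -> Optional[str]:
        # Without an index, read backwards through the log: the first match is the latest.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return None

    def rebuild_index(self) -> None:
        # The index is derived data, so it can always be rebuilt from the log.
        self.index = {}
        for offset, (key, _) in enumerate(self.log):
            self.index[key] = offset


store = AppendOnlyStore()
store.put("color", "red")
store.put("color", "blue")
print(store.get("color"), store.get_by_scan("color"))   # blue blue
store.index.clear()        # simulate losing the in-memory index
store.rebuild_index()
print(store.get("color"))  # blue
```

The stale `red` entry is still sitting in the log; reclaiming that space is the garbage-collection challenge.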
Why is LSS so cool?
●   Easier to manage big data
    ●   Size, schema, allocation of storage simplified
●   Indexes are hard to corrupt: nothing is ever overwritten in place
●   Reads and writes are cheap
●   Point-in-time consistency is free!
    ●   This is multiversion concurrency control (MVCC)
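Why point-in-time reads come for free, sketched under the assumption that each appended write gets an increasing sequence number: a reader that remembers a sequence number sees the store exactly as it was at that point, even while new writes keep landing. This is the core idea of MVCC, heavily simplified (no transactions, no garbage collection); `VersionedLog` is a hypothetical name.

```python
from typing import List, Optional, Tuple


class VersionedLog:
    """Append-only log where each write's position doubles as its version."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, str, str]] = []   # (seq, key, value), seq increasing

    def put(self, key: str, value: str) -> int:
        seq = len(self.entries)
        self.entries.append((seq, key, value))
        return seq

    def snapshot(self) -> int:
        # A snapshot is just "the log as of now": remember the next sequence number.
        return len(self.entries)

    def get(self, key: str, as_of: Optional[int] = None) -> Optional[str]:
        # Read backwards, ignoring anything written at or after the snapshot point.
        limit = len(self.entries) if as_of is None else as_of
        for _, k, v in reversed(self.entries[:limit]):
            if k == key:
                return v
        return None


db = VersionedLog()
db.put("balance", "100")
snap = db.snapshot()        # a long-running reader starts here
db.put("balance", "40")     # a later write doesn't disturb that reader
print(db.get("balance", as_of=snap), db.get("balance"))   # 100 40
```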
Infrastructure level consistency 2
●   Vector clocks
    ●   Vectors as in math: basically an array
    ●   One counter per node, incremented on each write that node handles
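A minimal vector clock sketch. It uses a dict keyed by a hypothetical node name instead of an array index, which is an assumption for readability. Comparing two clocks tells you whether one write happened before the other; when neither clock dominates, the writes were concurrent, and a store like Riak keeps both values as siblings.

```python
from typing import Dict

VectorClock = Dict[str, int]   # node id -> number of writes seen from that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    # A node bumps its own entry each time it accepts a write.
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    # Element-wise maximum: the merged clock has seen everything either side saw.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a happened before b: b can replace a
    if b_le_a:
        return "after"        # b happened before a: a can replace b
    return "concurrent"       # neither saw the other: keep both as siblings


base = increment({}, "node_a")     # {"node_a": 1}
x = increment(base, "node_a")      # written via node_a
y = increment(base, "node_b")      # written via node_b, without seeing x
print(compare(base, x))            # before
print(compare(x, y))               # concurrent
print(compare(merge(x, y), x))     # after
```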
Not enough for consistency
●   Different nodes know different things!
●   Quorum reads
    ●   R or more of the N replicas must respond
●   Quorum writes
    ●   W or more of the N replicas must receive the new value
●   Can tune R and W for your application
    ●   If R + W > N, every read quorum overlaps every write quorum
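The usual rule of thumb for tuning quorums, sketched with Dynamo-style names that are an assumption here: N replicas, a read quorum of R, a write quorum of W. With strict quorums, R + W > N guarantees that every possible read quorum shares at least one replica with every possible write quorum. The brute-force check below confirms it for small N.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible read quorum share a replica with every write quorum?"""
    replicas = range(n)
    return all(
        bool(set(read_set) & set(write_set))
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: overlap={quorums_always_overlap(n, r, w)}, R+W>N={r + w > n}")
```

Lowering R or W makes reads or writes faster, at the cost of sometimes reading a replica that missed the latest write.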
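But siblings suck!
●   Concurrent writes that vector clocks can't order are all kept, as “siblings”
●   Something has to reconcile them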
Dealing with siblings
●   1: Consistency at read time
    ●   Slower reads
    ●   Pay every time
●   2: Consistency at write time
    ●   Slower writes
    ●   Pay once
●   3: Consistency at infrastructure level
    ●   CRDTs: Commutative Replicated Data Types (today usually called Conflict-free Replicated Data Types)
    ●   Monotonic lattices of commutative operations
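As an example of option 1 (consistency at read time), here is a sketch that assumes the siblings are versions of a shopping cart and that the application decides merging means taking the union. The function name is hypothetical, and union is only one possible policy: it can bring back an item that one replica had removed.

```python
from typing import List, Set


def resolve_cart_siblings(siblings: List[Set[str]]) -> Set[str]:
    """Read-time resolution: merge conflicting cart versions by union."""
    merged: Set[str] = set()
    for version in siblings:
        merged |= version
    return merged


# Two concurrent writes left two siblings in the store.
siblings = [{"book", "pen"}, {"book", "lamp"}]
print(sorted(resolve_cart_siblings(siblings)))   # ['book', 'lamp', 'pen']
```

Run on every read, this is the "pay every time" option; writing the merged cart back once is closer to paying once.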
Don't Panic
●   We're going to go slowly
●   There's no math
Monotonicity
●   Operations only affect the data in one way
    ●   e.g. increment (only ever goes up) vs. set (can jump anywhere)
●   Instead of storing values, store operations
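A small sketch of "store operations, not values", assuming a counter that only accepts non-negative increments (`IncrementLog` is a hypothetical name). The stored data is the list of operations; the value is derived from it, and each new operation can only move that value up.

```python
from typing import List


class IncrementLog:
    """Stores the operations ("add n"), not the resulting value."""

    def __init__(self) -> None:
        self.ops: List[int] = []

    def increment(self, amount: int) -> None:
        if amount < 0:
            # A negative amount would make the value go down: no longer monotonic.
            raise ValueError("increments must be non-negative")
        self.ops.append(amount)

    @property
    def value(self) -> int:
        # The value is recomputed from the operations whenever it is needed.
        return sum(self.ops)


counter = IncrementLog()
history = []
for amount in [1, 5, 10]:
    counter.increment(amount)
    history.append(counter.value)
print(history)   # [1, 6, 16]: each operation only moves the value up
```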
Commutativity
●   Means the order of operations isn't important
    ●   1 + 5 + 10 == 10 + 5 + 1
    ●   Grouping doesn't matter either: (1+5) + 10 == (10+5) + 1
●   You don't need to know when stuff happened
●   Just what happened
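A quick check of the order-independence claim, using hypothetical `("add", n)` and `("set", n)` operations: every ordering of the adds lands on the same total, while the result of the sets depends entirely on which one happened to run last.

```python
from itertools import permutations
from typing import List, Tuple


def apply_all(start: int, ops: List[Tuple[str, int]]) -> int:
    value = start
    for kind, n in ops:
        if kind == "add":
            value += n        # commutative: order doesn't matter
        elif kind == "set":
            value = n         # overwrites: order matters
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return value


adds = [("add", 1), ("add", 5), ("add", 10)]
sets = [("set", 1), ("set", 5), ("set", 10)]

print(sorted({apply_all(0, list(p)) for p in permutations(adds)}))   # [16]
print(sorted({apply_all(0, list(p)) for p in permutations(sets)}))   # [1, 5, 10]
```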
Lattices
●   A data structure of operations
    ●   Like a vector clock: sets of operations
●   “Partially” ordered
    ●   Any two states can be merged into one that contains both
    ●   So anything a newer state already contains can be thrown away
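Keeping the math light, here is a sketch that assumes the standard join-semilattice view: states are partially ordered, and any two states have a "merge" (their least upper bound). For integers ordered by size the merge is `max`; for sets ordered by inclusion it is union. Both merges are commutative, associative and idempotent, which is what lets replicas merge in any order, any number of times.

```python
from functools import reduce
from itertools import permutations
from typing import FrozenSet


def join_max(a: int, b: int) -> int:
    # Integers ordered by <=: the merge (least upper bound) is max.
    return max(a, b)


def join_sets(a: FrozenSet[str], b: FrozenSet[str]) -> FrozenSet[str]:
    # Sets ordered by inclusion: the merge is union.
    return a | b


states = [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "z"})]
results = {reduce(join_sets, order) for order in permutations(states)}
print(results == {frozenset({"x", "y", "z"})})       # True: merge order doesn't matter
print(join_sets(states[0], states[0]) == states[0])  # True: merging a duplicate is harmless
print(reduce(join_max, [3, 7, 5]))                   # 7: smaller, older states are absorbed
```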
Put it all together: CRDTs
●   Commutative Replicated Data Types
    ●   Each node stores every entry as a lattice
    ●   Lattices are distributed and merged
    ●   Operations are commutative
        –   So collisions don't break stuff
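A grow-only counter (G-Counter) is probably the smallest useful CRDT, so here is a sketch of one. Each replica only increments its own slot, and merging takes the element-wise maximum. Note the assumption: this is the state-based (convergent) kind of CRDT, where replicas exchange whole states, rather than the operation-based kind the slide's name refers to; `GCounter` and its methods are illustrative names.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per node."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        # Each node only touches its own slot, so concurrent increments never collide.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: commutative, associative and idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)        # concurrent updates on two replicas
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)            # merging again changes nothing
print(a.value, b.value)   # 5 5
```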
CRDTs are monotonic
●   Each new operation adds information
●   Data is never deleted or destroyed
●   Applications don't need to know
●   Everything is in the store
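What "data is never deleted" looks like in practice, sketched as a two-phase set (2P-Set), one of the simplest CRDT sets: a removal is recorded as a tombstone instead of erasing anything. The class name is illustrative, and one known limitation of this design is that a removed item can never be re-added.

```python
from typing import Set


class TwoPhaseSet:
    """Two-phase set CRDT: removals are tombstones, never erasures."""

    def __init__(self) -> None:
        self.added: Set[str] = set()
        self.removed: Set[str] = set()   # tombstones: grow forever without garbage collection

    def add(self, item: str) -> None:
        self.added.add(item)

    def remove(self, item: str) -> None:
        if item in self.added:
            self.removed.add(item)

    def merge(self, other: "TwoPhaseSet") -> None:
        # Both halves only grow, so merging is a union of each half.
        self.added |= other.added
        self.removed |= other.removed

    def __contains__(self, item: str) -> bool:
        return item in self.added and item not in self.removed


r1, r2 = TwoPhaseSet(), TwoPhaseSet()
r1.add("milk")
r2.merge(r1)
r2.remove("milk")      # "deleting" only adds a tombstone
r1.merge(r2)
print("milk" in r1, r1.added, r1.removed)   # False {'milk'} {'milk'}
```

Those ever-growing tombstones are a big part of why CRDTs use more space and why their garbage collection is tricky.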
CRDTs are pretty awesome
●   But:
    ●   They use a lot more space
    ●   Garbage collection is non-trivial
●   In the wild:
    ●   The data processor!
Language level consistency
●   Bloom
    ●   A natively distributed-safe language
    ●   Built around monotonic, order-independent operations
    ●   Allows compiler-level analysis
    ●   Flags where unsafe (non-monotonic) things are happening
        –   And suggests fixes and coordination
    ●   Crazy future stuff
In Summary
●   Big data is easy
    ●   Just use distributed systems!
●   Consistency is hard
    ●   The solution may be in data structures
    ●   Making use of radically cheaper storage
●   Store operations, not values
    ●   And make operations commutative
●   Data is so cool!
More reading
●   Log Structured Storage:
    ●   http://blog.notdot.net/2009/12/Damn-Cool-Algorithms-Log-structured-storage
●   Lattice data structures and CALM theorem:
    ●   http://db.cs.berkeley.edu/papers/UCB-lattice-tr.pdf
●   Bloom:
    ●   http://www.bloom-lang.net/
●   Ops: Riak in the Cloud
    ●   https://speakerdeck.com/u/randommood/p/getting-starte
Even more reading
●   http://en.wikipedia.org/wiki/Multiversion_concurrency_control
●   http://en.wikipedia.org/wiki/Monotonic_function
●   http://en.wikipedia.org/wiki/Commutative_property
●   http://en.wikipedia.org/wiki/CAP_theorem
●   http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
●   http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf
●   http://en.wikipedia.org/wiki/Vector_clock

Speaker notes

What's a distributed system?
- Short answer: "big data"
    - Lots of machines, geographically distributed
- Actual answer: any system where events are not global
    - Can a read and a write happen at the same time? Then it's distributed
    - Mostly things are queued
    - Or in database systems, it's fudged -- no lock, so no problem

Why are they good?
- Centralized systems scale poorly & expensively
    - More locks, more contention
    - Really fast hardware
    - Vertical scaling
    - Diminishing returns -- will always eventually fail
- Distributed systems scale well & cheaply
    - Lots of cheap hardware
    - No locks, no contention
    - Linear scaling -- can theoretically scale indefinitely

So what's the catch?
- Consistency
    - In a centralized system consistency is simple: single source of truth
    - The problem is writing to it performantly
    - In a distributed system writes are really fast
    - But the definition of "truth" is much, much harder

CAP theorem
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
- Pick 2
    - But actually it's usually a sliding scale

  5. - P: No partition tolerance = centralized database - Can't connect to read or write? You're broken. - Replication log got corrupted? You're broken. <img: welcome to our ool>
  6. - A: No availability guarantee = guessing - Read or write didn't work: try again - Cost/benefit calculation -- everything is unavailable *sometimes* - High-volume logs, statistics - Google BigTable locks data on write and will throw errors if you try to read it - Mars orbiters! Not all the data makes it back, and that's okay.
  7. - C: Lower consistency = Amazon S3, Riak, other distributed systems - "Eventual" consistency - Write will work, or definitely fail - Reads will work, but might not be "true" - Keep retrying for the truth
  8. - Why is this a big deal now? - The last 10 years have been about systems getting so big that P has become a bigger and bigger problem - Network was expensive, now it's cheap - And everything is networked - Storage was expensive, now it's cheap - Sacrificing A has been the accidental solution - Instead we can deliberately dial down C to get bigger
  9. - Ways to get to eventual consistency - There are a ton! - App level: - Write locking - Last write wins - Infrastructure level: - Log structured storage, multiversion concurrency control - Vector clocks and siblings - New: language level! - Bloom
  10. - Eventual consistency at write time: 1 - Write-time locking - Like a centralized database, except reads are okay with stale data - Slower writes, potential points of failure - Cheap, fast reads
  11. - Eventual consistency at write time: 2 - Last write wins - This is Amazon S3. - Relies on accurate clocks - Cheap reads and writes - Can lose data! - Okay for image files, bad for payment processing
  12. - Side note: Twitter is eventually consistent - Your timeline doesn't always turn up exactly in order - Older tweets can slot themselves in - Tweets can disappear - Two new tweets can never collide - This is a form of eventual consistency, last write wins, but no conflicts
  13. - A consistency approach: log-structured storage - Also called append-only databases - Eventual consistency where *consistency* is important, but *currency* is not <diagram>
  14. - How LSS works - Each write is appended - Indexes are also appended - To get a value, consult the index - As the data grows, throw away older values - The index doesn't need to be updated on every write - If you find operations newer than the last index, rebuild an index from them - Relies on lots of really cheap storage - But it turns out we have that!
  15. - Why is this good? - Don't have to care about the size or schema of the object - Deleting old objects is automatic - Can't corrupt the index - Reads and writes are cheap - Point-in-time consistency is automatic: just ignore any entries appended after your read started - BUT: you could still be behind reality
  16. - Another consistency approach: vector clocks - Eventual consistency where consistency and currency both matter - Vector, as in math - It means an array, but mathematicians are annoying <diagram> - Simultaneous writes produce siblings - never any data lost
  17. - Not good enough! - Read consistency: quorum reads - R or more replicas must return the same value - Write consistency: quorum writes - W or more replicas must receive the new value - With N replicas in total, R + W > N means every read quorum overlaps the latest write quorum
  18. - Pretty good - But man do siblings suck! http://3.bp.blogspot.com/-h60iS4_uwfg/T2B4rntiV4I/AAAAAAAAK9M/Wc_jaXLRowg/s400/istock_brothers-fighting-300x198.jpg
  19. - Dealing with siblings - 1: Consistency at read time through clever resolution - Cheap, fast writes - Potentially slower reads, duplicated dispute resolution logic - Pay on every read - 2: Avoid creating them in the first place - Put a sharded lock in front of your writes - Potentially slower writes - Pay once on write - 3: CRDTs: Commutative Replicated Data Types - monotonic lattices of commutative operations - Don't panic
  20. - Monotonicity - Means operations only move the data in one direction - Simplest example: setter vs. incrementer - Bad: http://en.wikipedia.org/wiki/File:Monotonicity_example3.png - Good: http://en.wikipedia.org/wiki/File:Monotonicity_example1.png - The setter can get it wrong and destroy information - The incrementer doesn't need to know the exact value, just that it goes up by one (Also good: http://en.wikipedia.org/wiki/File:Monotonicity_example2.png) - Instead of storing values, store operations
  21. - Commutativity - Means the order of operations isn't important - 1 + 5 + 10 == 10 + 5 + 1 - A related property, associativity, means grouping isn't important either: (1+5) + 10 == 1 + (5+10) - Together, these mean you don't need to know what order the operations happened in - Just that they happened
  22. - Lattices - A data structure consisting of a set of operations - Like vector clocks, a (partial) order of operations - Doesn't have to be exact - Just enough to be able to avoid re-running every operation every time - Any two states have a well-defined merge (their least upper bound, or "join")
  23. - Put it all together: CRDTs - Commutative Replicated Data Types - Each node stores operations in a lattice - As data is distributed, lattices are merged - Because operations are commutative, collisions are okay - Because the exact order is irrelevant
  24. - CRDTs are a monotonic data structure - Each new operation only adds information - It's never taken away or destroyed - This is really exciting! - It means we don't have to build application logic to handle conflicts - Just get your data types right, and the database will sort it out - Enables radically distributed systems
  25. - Crazy future shit: Bloom - A language where all the operations available are monotonic, commutative - Calls to non-monotonic operations are special - Allows for compiler-level analysis of distributed code - Flag in advance whether or not you are safe, where you need coordination, and what type - Crazy shit
  26. - In summary: - Big data is easy - Distributed systems are the answer - Distribution makes consistency harder in exchange for better partition tolerance - The solution may be changing the way data is stored - Don't store a value, store a sequence of operations - Make the operations commutative, the structure monotonic - Pretty cool stuff
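The last-write-wins strategy from note 11 can be sketched in a few lines of Python. This is a minimal illustration, not how S3 works internally. The `Write` record, the `lww_merge` helper and the timestamps are all hypothetical, and breaking exact ties on the node id is an added assumption. The example shows why the approach relies on accurate clocks and why it can silently lose data.

```python
from dataclasses import dataclass


# Illustrative sketch only: Write and lww_merge are hypothetical names.
@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float  # wall-clock time reported by the writing node (assumed)
    node: str         # node id, used only to break exact timestamp ties (assumption)


def lww_merge(a: Write, b: Write) -> Write:
    """Keep whichever write carries the later timestamp; the other is discarded."""
    return max(a, b, key=lambda w: (w.timestamp, w.node))


# Two clients update the same key on different nodes at nearly the same moment.
# If node b's clock runs slightly fast, its write wins even if it actually happened first.
a = Write("card charged", timestamp=100.002, node="a")
b = Write("card declined", timestamp=100.005, node="b")

print(lww_merge(a, b).value)  # card declined: node a's write is silently lost
```

Merging in either order picks the same winner, so replicas converge. Converging on a value that quietly dropped a payment update is the "bad for payment processing" case.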
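The log-structured storage steps in note 14 can be sketched as a toy in-memory store. Everything here is an assumption made for illustration. `AppendOnlyStore`, its method names and the entry format are hypothetical, and a Python list stands in for an append-only file on disk. Writes and index snapshots are both appended. Reads scan back from the tail until they reach the latest index. Compaction throws away superseded values.

```python
from typing import Any, Optional


class AppendOnlyStore:
    """Toy log-structured store (hypothetical): writes and index snapshots are only appended."""

    def __init__(self) -> None:
        # Entries are ("put", key, value) or ("index", {key: log offset}).
        self.log: list[tuple] = []

    def put(self, key: str, value: Any) -> None:
        self.log.append(("put", key, value))

    def get(self, key: str) -> Optional[Any]:
        # Scan back from the tail: puts newer than the last index are checked directly,
        # then the most recent index answers for everything older.
        for entry in reversed(self.log):
            if entry[0] == "put" and entry[1] == key:
                return entry[2]
            if entry[0] == "index":
                offset = entry[1].get(key)
                return None if offset is None else self.log[offset][2]
        return None

    def write_index(self) -> None:
        """Rebuild an index from the latest index plus any puts appended after it."""
        index: dict[str, int] = {}
        start = 0
        for pos in range(len(self.log) - 1, -1, -1):
            if self.log[pos][0] == "index":
                index = dict(self.log[pos][1])
                start = pos + 1
                break
        for pos in range(start, len(self.log)):
            if self.log[pos][0] == "put":
                index[self.log[pos][1]] = pos
        self.log.append(("index", index))

    def compact(self) -> None:
        """Throw away superseded values by copying only the newest value of each key."""
        latest: dict[str, Any] = {}
        for entry in self.log:
            if entry[0] == "put":
                latest[entry[1]] = entry[2]
        self.log = [("put", key, value) for key, value in latest.items()]
        self.write_index()


store = AppendOnlyStore()
store.put("a", 1)
store.put("b", 2)
store.write_index()
store.put("a", 3)                      # newer than the index: found by the tail scan
print(store.get("a"), store.get("b"))  # 3 2
store.compact()
print(len(store.log))                  # 3: two live values plus one index
```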
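Note 15's point-in-time consistency follows from never overwriting anything. A reader only has to remember how long the log was when it started. This is a minimal sketch, assuming a single in-memory list and hypothetical helper names (`append`, `snapshot`, `read_at`). It also shows the catch: a snapshot reader can be behind reality.

```python
from typing import Any, Optional

# Illustrative sketch: an append-only list of (key, value) entries; nothing is overwritten.
log: list[tuple[str, Any]] = []


def append(key: str, value: Any) -> None:
    log.append((key, value))


def snapshot() -> int:
    """A snapshot is just the log length at the moment a reader starts (hypothetical helper)."""
    return len(log)


def read_at(key: str, snap: int) -> Optional[Any]:
    """Newest value for key among entries written before the snapshot was taken."""
    for k, v in reversed(log[:snap]):
        if k == key:
            return v
    return None


append("balance", 100)
snap = snapshot()                       # a reader starts here
append("balance", 40)                   # a concurrent writer keeps appending
print(read_at("balance", snap))         # 100: a consistent view, but behind reality
print(read_at("balance", snapshot()))   # 40: a fresh reader sees the newer value
```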
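The vector clocks and siblings from note 16 can be sketched as follows. This is an illustration under stated assumptions, not Riak's implementation. Clocks are plain dicts from a node id to a counter, and `increment`, `descends`, `compare` and `reconcile` are hypothetical helper names. A version is discarded only when another version has seen everything it has seen. Truly concurrent writes both survive as siblings, so no data is lost.

```python
from typing import Dict, List, Tuple

# Illustrative sketch: node id -> count of writes coordinated by that node (assumed layout).
VClock = Dict[str, int]


def increment(clock: VClock, node: str) -> VClock:
    """Return a copy of clock with node's entry bumped (the node just coordinated a write)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen every event b has seen (a >= b in every component)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"       # a supersedes b
    if descends(b, a):
        return "before"      # b supersedes a
    return "concurrent"      # neither has seen the other: a true conflict


def reconcile(versions: List[Tuple[str, VClock]]) -> List[Tuple[str, VClock]]:
    """Drop every version that another version supersedes; whatever is left are siblings."""
    survivors: List[Tuple[str, VClock]] = []
    for value, clock in versions:
        superseded = any(compare(other, clock) == "after" for _, other in versions)
        if not superseded and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors


base = increment({}, "a")   # {"a": 1}
x = increment(base, "a")    # written through node a: {"a": 2}
y = increment(base, "b")    # written through node b at the same time: {"a": 1, "b": 1}
print(compare(x, base), compare(x, y))  # after concurrent
siblings = reconcile([("v0", base), ("milk", x), ("eggs", y)])
print([value for value, _ in siblings])  # ['milk', 'eggs']
```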
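For the quorums in note 17, the usual Dynamo-style framing uses three numbers: N replicas per key, a read quorum of R replicas, and a write quorum of W replicas. The sketch below uses a hypothetical helper, `quorums_overlap`. It checks by brute force that every possible read quorum touches at least one replica from every possible write quorum exactly when R + W > N.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Brute force (hypothetical helper): does every read quorum share a replica with every write quorum?"""
    replicas = range(n)
    return all(
        set(read) & set(write)  # an empty intersection is falsy
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


for r, w in [(1, 1), (2, 1), (2, 2), (1, 3)]:
    print(f"N=3 R={r} W={w}: overlap={quorums_overlap(3, r, w)}")
```

With N=3 only the last two combinations overlap. R=2, W=2 is a common middle ground, while R=1, W=3 makes reads cheap at the cost of every write needing all replicas.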
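Option 1 in note 19, resolving siblings at read time, might look like this for a shopping cart. The merge rule (set union) and the `resolve_cart` helper are assumptions chosen for illustration. The right rule depends entirely on what the data means.

```python
from typing import Iterable, Set


def resolve_cart(siblings: Iterable[Set[str]]) -> Set[str]:
    """Read-time merge of cart siblings (hypothetical rule): keep any item that any sibling contains."""
    merged: Set[str] = set()
    for cart in siblings:
        merged |= cart
    return merged


siblings = [{"milk", "bread"}, {"milk", "eggs"}]  # two concurrent versions of one cart
print(sorted(resolve_cart(siblings)))  # ['bread', 'eggs', 'milk']
```

Union never drops an added item, but it can bring back an item that one sibling had removed. Every reader also has to run the same logic, which is the "pay on every read" cost.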
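Note 20's setter-versus-incrementer comparison can be simulated for replicas that each record one new page view concurrently. The scenario and both helper functions are hypothetical. The point is that storing values destroys information, while storing operations keeps all of it.

```python
# Illustrative sketch: both helpers and the page-view scenario are hypothetical.


def merge_setters(start: int, replicas: int) -> int:
    """Each replica reads start, adds one, and stores the *value*; on merge one value overwrites the rest."""
    written_values = [start + 1 for _ in range(replicas)]
    return written_values[-1]  # every write says "start + 1"; the other updates are lost


def merge_incrementers(start: int, replicas: int) -> int:
    """Each replica stores the *operation* "+1"; merging keeps every operation."""
    operations = [+1 for _ in range(replicas)]
    return start + sum(operations)


print(merge_setters(10, 2))       # 11: one of the two new views was destroyed
print(merge_incrementers(10, 2))  # 12: both views survive
```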
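Note 21's claim about order can be checked directly: apply the same operations in every possible order and compare the results. The helpers `add`, `set_to` and `run` are hypothetical. The check covers order only; regrouping (associativity) is a separate property. Additions always land on the same total. Once a single setter is mixed in, the result depends on where it happened to land.

```python
from functools import reduce
from itertools import permutations
from typing import Callable, List

Op = Callable[[int], int]


def add(n: int) -> Op:
    return lambda x: x + n


def set_to(n: int) -> Op:
    return lambda x: n  # a "setter": ignores whatever came before it


def run(start: int, ops: List[Op]) -> int:
    """Apply ops left to right, starting from start (hypothetical helper)."""
    return reduce(lambda acc, op: op(acc), ops, start)


adds = [add(1), add(5), add(10)]
print({run(0, list(order)) for order in permutations(adds)})  # {16}
mixed = adds + [set_to(0)]
print(sorted({run(0, list(order)) for order in permutations(mixed)}))
# [0, 1, 5, 6, 10, 11, 15, 16]: the result depends on where the setter landed
```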
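One way to make note 22 concrete: a lattice is a partial order in which any two states have a least upper bound (a join), and CRDT designs usually merge replicas with that join. The sketch below assumes the simplest case, sets of strings merged by union, with a hypothetical `join` helper. It shows that merging replica states in any order, even with a duplicated delivery, converges to the same result.

```python
from functools import reduce
from itertools import permutations
from typing import FrozenSet

State = FrozenSet[str]


def join(a: State, b: State) -> State:
    """Least upper bound for sets ordered by inclusion: their union (assumed example lattice)."""
    return a | b


replicas = [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "z"})]
deliveries = replicas + [replicas[0]]  # the same state arrives twice, as can happen on a real network

outcomes = {reduce(join, order) for order in permutations(deliveries)}
print(len(outcomes), sorted(next(iter(outcomes))))  # 1 ['x', 'y', 'z']
```

Union is commutative, associative and idempotent, which is exactly why order, grouping and duplicates stop mattering.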
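A well-known CRDT that fits note 23 is the grow-only counter (G-Counter). This sketch is the state-based variant, which merges per-node totals rather than replaying individual operations. Treat it as one example of the idea, not necessarily the design the talk had in mind. The class and method names are hypothetical.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT (hypothetical names): each node only increments its own slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Join: element-wise max, which is commutative, associative and idempotent."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()          # two views recorded on node a
b.increment(3)         # three views recorded on node b
a.merge(b)
b.merge(a)             # exchange state in either order...
a.merge(b)             # ...as often as you like
print(a.value(), b.value())  # 5 5
```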