2. Outline
Motivation & Objective
Key Ideas in Dynamo
Simulation Method & Results
Conclusion
3. Motivation
It is all about $!
− Massive-scale data across hundreds of nodes
− Commodity hardware infrastructure
− Failure is the norm, not the exception
4. Motivation - Availability
'always-on' experience for end users
− How to handle failures transparently?
− Parity checking or replication?
− Strongly consistent or eventually consistent?
− Conflict resolution: who and when?
5. Motivation - Scalability
$ matters!
Poor performance means losing customers and money
− Increase capacity easily and incrementally
Over-provisioning means unnecessary cost
− Decrease capacity easily and incrementally
6. Objective
Keep the service always available to customers with a guaranteed response time, no matter what, and achieve this with as little $ as possible
7. Key Ideas
A fully decentralized DHT (Distributed Hash Table)
Consistent hashing
− Natural partitioning and load balancing (division of labor)
− Minimal data migration when nodes join/leave
Replication for fault tolerance
− Quorum technique: R + W > N
Eventual (weak) consistency model
Conflict resolution
− By the application, not Dynamo
− At read time, not write time
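The first three ideas above can be sketched in a few lines of Python (a toy ring with hypothetical names, not Dynamo's actual implementation): nodes and keys hash onto a circular space, each key is stored on its N clockwise successor nodes, and the quorum condition R + W > N guarantees that read and write replica sets overlap.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Map a string onto the hash ring (a toy stand-in for Dynamo's MD5 ring)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, n_replicas=3):
        self.n_replicas = n_replicas
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def preference_list(self, key: str):
        """The N distinct nodes clockwise from the key's ring position."""
        positions = [h for h, _ in self.ring]
        start = bisect.bisect_right(positions, ring_hash(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(min(self.n_replicas, len(self.ring)))]

def quorum_ok(r: int, w: int, n: int) -> bool:
    """R + W > N guarantees every read quorum overlaps every write quorum."""
    return r + w > n

ring = ConsistentHashRing([f"node-{i}" for i in range(5)], n_replicas=3)
print(ring.preference_list("cart:alice"))  # the 3 replica nodes for this key
print(quorum_ok(2, 2, 3))  # True
```

Because only the successors of a ring position own its keys, adding or removing one node disturbs only the keys in that node's arc, which is what makes migration minimal.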
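The last two ideas (eventual consistency, with conflicts resolved by the application at read time) can be illustrated with a toy read-time merge, loosely modeled on the shopping-cart use case; the single version counter here is a deliberate simplification of Dynamo's vector clocks, and all names are hypothetical.

```python
def read_repair(replica_values):
    """Reconcile divergent replica values at read time.

    Each replica returns (version, set_of_items). Replicas with a strictly
    lower version are stale and dropped; replicas at the same (concurrent)
    version are merged by application logic (set union, as a shopping cart
    would). A simplification of Dynamo's vector-clock reconciliation.
    """
    max_version = max(v for v, _ in replica_values)
    latest = [items for v, items in replica_values if v == max_version]
    merged = set().union(*latest)
    return max_version + 1, merged

# Two replicas diverged at version 2 (concurrent adds); a third is stale:
replicas = [(2, {"book", "pen"}), (2, {"book", "mug"}), (1, {"book"})]
print(read_repair(replicas))  # version 3, with the concurrent carts merged
```

Doing this on the read path keeps writes always accepted (availability), at the cost of occasionally returning and merging multiple versions.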
8. Simulation - Overview
Performance test (PT) tool for concurrent requests
− Drives Dynamo applications
− Gathers and records results
A ring of services acts as the Dynamo nodes
− Provides replication and fault tolerance
A proxy sits between the PT tool and the ring
− Exposes a simple service interface
− Randomizes requests
− Handles membership discovery
9. Simulation - Availability
When a node leaves, the coordinating node uses the next available node on the ring
With node replacement, right after a node leaves the ring (fails), a new node joins the ring, keeping the number of nodes unchanged
System load increases gradually (from 100 to 200 requests / second)
4 simulation cases
− W=2, N=3 (R=2)
With node replacement (15 nodes)
Without node replacement (15 → 10 nodes)
− W=3, N=3 (R=1)
With node replacement (15 nodes)
Without node replacement (15 → 10 nodes)
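A back-of-the-envelope calculation (not from the slides) shows why the W setting trades performance against fault tolerance: a write succeeds only if at least W of its N replicas are reachable, so under an assumed independent per-node failure probability the success probability is a binomial tail.

```python
from math import comb

def write_success_prob(n: int, w: int, p_fail: float) -> float:
    """P(at least W of N replicas are up), assuming independent node failures."""
    p_up = 1.0 - p_fail
    return sum(comb(n, k) * p_up**k * p_fail**(n - k) for k in range(w, n + 1))

# With a hypothetical 10% independent node-failure rate and N=3:
print(round(write_success_prob(3, 2, 0.1), 4))  # W=2 → 0.972
print(round(write_success_prob(3, 3, 0.1), 4))  # W=3 → 0.729
```

W=3 must reach every replica, so any single failure blocks the write; W=2 tolerates one failed replica per write, which matches the simulation's finding that the lower W setting rides out node departures more gracefully.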
10. Simulation - Availability
No failed requests recorded in any case; the service remains available when a node leaves (and joins)
With replacement nodes, the service level (throughput) is maintained
A W=2 setting gives better performance, while a W=3 setting provides better fault tolerance
11. Simulation - Scalability
Scalability: more nodes → larger capacity
Incremental & dynamic scalability: no service interruption
System load increases gradually (from 100 to 200 requests / second)
6 simulation cases
− W=2, N=3 (R=2)
10 nodes
From 10 to 15 nodes
15 nodes
− W=3, N=3 (R=1)
10 nodes
From 10 to 15 nodes
15 nodes
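The claim that growing the ring is incremental can be checked with a toy consistent-hashing experiment (hypothetical helpers, independent of the simulation tool): scaling from 10 to 15 nodes relocates only the keys whose ring successor changed, roughly 1/3 of them in expectation, instead of rehashing almost everything as naive modulo placement would. Virtual nodes per physical node, as Dynamo uses, smooth out the distribution.

```python
import bisect
import hashlib

VNODES = 8  # virtual nodes (tokens) per physical node, to smooth load

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(n_nodes: int):
    """Sorted ring positions for n_nodes physical nodes, VNODES tokens each."""
    return sorted((ring_hash(f"node-{i}/vn{v}"), f"node-{i}")
                  for i in range(n_nodes) for v in range(VNODES))

def owner(key: str, ring) -> str:
    """Physical node owning the first ring position clockwise from the key."""
    positions = [h for h, _ in ring]
    idx = bisect.bisect_right(positions, ring_hash(key)) % len(ring)
    return ring[idx][1]

keys = [f"key-{i}" for i in range(5000)]
before, after = build_ring(10), build_ring(15)
moved = sum(owner(k, before) != owner(k, after) for k in keys)
print(f"{moved / len(keys):.0%} of keys relocated")  # roughly 1/3 in expectation
```

The 5 new nodes own about 5/15 of the ring, so only that share of keys migrates, which is why nodes can join a live ring without a stop-the-world reshuffle.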
12. Simulation - Scalability
A ring with more nodes provides greater capacity (throughput) than a ring with fewer nodes
Moreover, capacity (throughput) increases incrementally (dynamically) as more nodes join the ring, without incurring service interruption
The higher the W setting, the better the fault tolerance, but the worse the write performance
13. Conclusion
With consistent hashing, the Dynamo model is able to provide great scalability and availability
Massive-scale data storage on a large cluster of commodity infrastructure is possible
A real application: the shopping cart on www.amazon.com