A short talk on how Cassandra deals with various failure modes. It discusses replication and consistency levels, and how they can be used to survive many kinds of failure, ending with an explanation of the recovery mechanisms: repair, hinted handoff and read repair.
4. Failures are the norm
• With more than a few nodes, something goes wrong all the time
• Don’t want to be down all the time
5. Failure causes
• Hardware failure
• Bug
• Power
• Natural disaster
6. Failure modes
• Data centre failure
• Node failure
• Disk failure
7. Failure modes
• Data centre failure
• Node failure
• Disk failure
• Temporary
• Permanent
8. Failure modes
• Network failure
• One node
• Network partition
• Whole data centre
10. Failure modes
• Want a system that can tolerate all the above failures
• Make assumptions about probabilities of multiple events
• Be careful when assuming independence
11. Solutions
• Do nothing
• Make boxes bullet proof
• Replication
13. How to maintain availability in the presence of failure?
14. Replication
• Buy cheap nodes and cheap disks
• Store multiple copies of the data
• Don’t care if some disappear
15. Replication
• What about consistency?
• What if I can’t tolerate out-of-date reads?
• How to restore a replica?
16. RF and CL
• Replication factor
• How many copies
• How much failure can be tolerated
• Consistency Level
• How many nodes must be contactable for an operation to succeed
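A minimal sketch of where these two knobs live, assuming the DataStax Python driver and a hypothetical demo keyspace and users table (names are illustrative, not from the talk): replication factor is a property of the keyspace, while consistency level is chosen per operation.

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect()

# Replication factor: how many copies of each row the keyspace keeps.
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 3}"
)

# Consistency level: how many of those replicas must answer this request.
read = SimpleStatement(
    "SELECT name FROM demo.users WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(read, (42,))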
17. Simple example
• Replication factor 3
• Uniform network topology
• Read and write at CL.QUORUM
• Strong consistency
• Available if any one node is down
• Can recover if any two nodes fail
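Worked numbers for this example: QUORUM means floor(3/2) + 1 = 2 replicas. A write touches at least 2 of the 3 copies and a read consults at least 2, so the two sets always overlap in at least one replica (2 + 2 > 3), which is where the strong consistency comes from. With one node down, 2 replicas are still reachable and QUORUM operations keep succeeding; with two nodes down, QUORUM requests fail, but one copy of the data survives and the other replicas can be rebuilt from it.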
18. In general
• RF N, reads and writes at CL.QUORUM
• Available if ceil(N/2)-1 nodes fail
• Can recover if N-1 nodes fail
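The same arithmetic in a few lines of Python (function names are mine, purely illustrative):

def quorum(rf):
    # Replicas that must respond for a QUORUM read or write.
    return rf // 2 + 1

def failures_while_available(rf):
    # Nodes that can be down with QUORUM operations still succeeding;
    # equals ceil(rf/2) - 1, as on the slide.
    return rf - quorum(rf)

def failures_while_recoverable(rf):
    # Nodes that can be lost while at least one copy of the data survives.
    return rf - 1

for rf in (3, 5):
    print(rf, quorum(rf), failures_while_available(rf), failures_while_recoverable(rf))
# RF 3 -> quorum 2, available through 1 failure, recoverable through 2
# RF 5 -> quorum 3, available through 2 failures, recoverable through 4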
19. Multi data centre
• Cassandra knows location of hosts
• Through the snitch
• Can ensure replicas in each DC
• NetworkTopologyStrategy
• => can cope with whole DC failure
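A sketch of the multi data centre case, again assuming the Python driver and made-up data centre names DC1 and DC2: NetworkTopologyStrategy takes a replica count per data centre, placing copies according to the locations the snitch reports.

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()

# Three replicas in each data centre; losing a whole DC still leaves a full set.
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo_multi_dc WITH replication = "
    "{'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3}"
)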
23. Automatic processes
• Eventually move replicas towards consistency
• The ‘eventual’ in ‘eventual consistency’
24. Hinted Handoff
• Hints
• Stored on any node
• When a node is temporarily unavailable
• Delivered when the node comes back
• Can use CL.ANY
• Writes not immediately readable
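A sketch of a CL.ANY write with the Python driver (the table is hypothetical): the write succeeds as soon as a hint is stored somewhere, even if no replica for the row is currently up, which is why it may not be readable until the hint has been delivered.

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect()

# ANY: a stored hint counts as success, so this can succeed with every replica
# for the row down; reads may not see the value until handoff completes.
write = SimpleStatement(
    "INSERT INTO demo.users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ANY,
)
session.execute(write, (42, "alice"))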
25. Read Repair
• Since we have just done a read, we might as well repair any out-of-date copies
• Compare the values and update any that are out of sync
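A toy illustration of the idea (not Cassandra's implementation): compare the copies returned by the replicas, keep the one with the newest timestamp, and write it back to any replica that returned something older.

from dataclasses import dataclass

@dataclass
class Copy:
    value: str
    timestamp: int  # Cassandra resolves conflicts by write timestamp

def read_with_repair(replicas):
    """Return the newest value and push it back to any stale replica."""
    newest = max(replicas.values(), key=lambda c: c.timestamp)
    for node, copy in replicas.items():
        if copy.timestamp < newest.timestamp:
            replicas[node] = newest  # in Cassandra, an internal write to the stale node
    return newest.value

copies = {
    "node1": Copy("v2", 200),
    "node2": Copy("v1", 100),  # out-of-date copy
    "node3": Copy("v2", 200),
}
print(read_with_repair(copies))  # "v2"; node2 now holds the newest copy too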
27. Repair: method
• Ensures a node is up to date
• Run ‘nodetool -h <node> repair’
• Reads through entire data on the node
• Builds a Merkle tree
• Compares with replicas
• Streams differences
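A toy sketch of the Merkle tree comparison (illustrative only, nothing like the real code): each replica hashes its data range up into a tree; if the roots match, the ranges are consistent, and if not, the mismatching subtrees pinpoint which ranges have to be streamed.

import hashlib

def merkle_root(rows):
    # Hash each row, then hash pairs of hashes until one root remains.
    level = [hashlib.sha256(r).digest() for r in rows]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [hashlib.sha256(a + b).digest()
                 for a, b in zip(level[0::2], level[1::2])]
    return level[0]

local   = [b"row1", b"row2", b"row3", b"row4"]
replica = [b"row1", b"row2", b"ROW3", b"row4"]  # one divergent row

print(merkle_root(local) == merkle_root(replica))  # False: something must be streamed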
28. Repair: when
• After node has been down a long time
• After increasing replication factor
• Every 10 days to ensure tombstones are propagated
• Can be used to restore a failed node
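A sketch of running that on a schedule, assuming nodetool is on the PATH and with a made-up host address; the 10-day figure on the slide comes from the default gc_grace_seconds (864000 seconds), the window after which tombstones become eligible for collection.

import subprocess

NODE_HOST = "10.0.0.1"  # hypothetical node address

# Same command as on the earlier slide; schedule it (e.g. from cron) so every
# node is repaired at least once per gc_grace_seconds.
subprocess.run(["nodetool", "-h", NODE_HOST, "repair"], check=True)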
29. Replace a node: method
• Bootstrap new node with <old_token>-1
• Tell existing nodes old node is dead
• nodetool remove
30. Replace a node: when
• Complete node failure
• Cannot replace failed disk
• Corruption
31. Restore from backup: method
• Stop Cassandra on the node
• Copy SSTables from backup
• Restart Cassandra
• May take a while reading indexes
32. Restore from backup: when
• Disk failure
• with no RAID rebuild available
• Operator error
• Corruption
• Hacker