The document provides an overview of Apache Cassandra's architecture and design. It was created to address the needs of building reliable, high-performing, and always-available distributed databases. Cassandra draws on Dynamo and BigTable and uses consistent hashing to partition and replicate data across nodes. It supports configurable replication across multiple data centers for high availability. Writes are replicated according to the replication factor, the client is acknowledged once the configured consistency level is met, and reads can be served from any replica.
4. Dynamo Paper (2007)
• How do we build a data store that is:
  • Reliable
  • Performant
  • "Always On"
• Nothing new and shiny
• 24 papers cited
• Also the basis for Riak and Voldemort
11. Summary
• The evolution of the internet and online data created new problems
• Apache Cassandra was based on a variety of technologies to solve these problems
• The goals of Apache Cassandra are all about staying online and performant
• Apache Cassandra is a database best used for applications, close to your users
19. Token
• Each partition key is hashed to a 64-bit token value
• Consistent hash between -2^63 and 2^63 - 1
• Each node owns a range of those values
• The token is the beginning of that range, up to the next node's token value
• Virtual Nodes break these ranges down further
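The hashing step above can be sketched in Python. Cassandra's default partitioner uses Murmur3; truncated MD5 is used here as a stand-in assumption so the sketch stays dependency-free, but the signed 64-bit token range is the same:

```python
import hashlib

def token(partition_key: str) -> int:
    """Hash a partition key to a signed 64-bit token.

    Cassandra's Murmur3Partitioner is swapped for truncated MD5
    here (an illustrative assumption); the range is identical.
    """
    digest = hashlib.md5(partition_key.encode()).digest()
    # Interpret the first 8 bytes as a signed 64-bit integer,
    # i.e. a value in [-2**63, 2**63 - 1].
    return int.from_bytes(digest[:8], "big", signed=True)

t = token("user:42")
assert -2**63 <= t < 2**63
```

The same key always hashes to the same token, which is what lets every node independently agree on where a partition lives.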
22. The cluster
Each server's token and the range it owns:

Token   Range
0       0-25
26      26-50
51      51-75
76      76-100

(Diagram: four servers arranged in a ring, each owning one of the four ranges.)
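The ring lookup implied by the table can be sketched as follows; the server names are hypothetical, and the simplified 0-100 token space from the slide is used in place of the full 64-bit range:

```python
from bisect import bisect_right

# Simplified 0-100 ring from the table: each server's token is the
# start of the range it owns. Server names are illustrative.
ring = [(0, "server-a"), (26, "server-b"), (51, "server-c"), (76, "server-d")]
starts = [start for start, _ in ring]

def owner(token: int) -> str:
    """Return the server whose range contains `token`:
    the last server whose token is <= the given value."""
    idx = bisect_right(starts, token) - 1
    return ring[idx][1]

print(owner(30))  # server-b owns 26-50
```

Because every node knows the full ring, any node can route a request for any token without a central coordinator.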
23. Summary
•Tables store rows of data by column
•Partitions are similar data grouped by a partition key
•Keyspaces contain tables and are grouped by data center
•Tokens show node placement in the range of cluster data
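The "partitions are similar data grouped by a partition key" bullet can be illustrated with a small sketch; the `city` column and the sample rows are hypothetical:

```python
from collections import defaultdict

# Hypothetical rows with "city" as the partition key: rows that
# share a partition key are stored together on the same replicas.
rows = [
    {"city": "Paris", "name": "amy"},
    {"city": "Oslo",  "name": "bob"},
    {"city": "Paris", "name": "cal"},
]

partitions = defaultdict(list)
for row in rows:
    partitions[row["city"]].append(row)

# The two Paris rows end up in one partition, the Oslo row in another.
assert len(partitions["Paris"]) == 2
```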
24. 4.1.3 Cassandra - Replication, High Availability and Multi-datacenter
29. Consistency level

Consistency Level   Nodes That Must Acknowledge
One                 One (read repair may be triggered)
Local One           One in the local DC (read repair in the local DC only)
Quorum              51% of replicas
Local Quorum        51% of replicas in the local DC
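The "51%" in the table is the smallest strict majority of replicas, which can be computed directly from the replication factor:

```python
def quorum(replication_factor: int) -> int:
    """Smallest strict majority of replicas: floor(RF / 2) + 1.

    With RF=3 a quorum is 2 nodes; with RF=5 it is 3. Quorum reads
    plus quorum writes overlap in at least one replica, which is
    what makes the combination consistent.
    """
    return replication_factor // 2 + 1

assert quorum(3) == 2
assert quorum(5) == 3
```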
36. Summary
• Replication Factor indicates how many times your data is copied
• Consistency Level specifies how many replicas must be consistent at read or write
• Replication Factor along with Consistency Level are critical for uptime
42. Summary
• By default, writes are durable
• Client receives an ack when the consistency level is achieved
• Reads may have to go to disk
• Compaction is data housekeeping