3. Definition of Cassandra
Apache Cassandra™ is a free
Distributed…
High performance…
Extremely scalable…
Fault tolerant (i.e. no single point of failure)…
post-relational database solution. Cassandra can serve both as
a real-time datastore (the “system of record”) for
online/transactional applications and as a read-intensive
database for business intelligence systems.
5. Architecture Overview
Cassandra was designed with the understanding that
system/hardware failures can and do occur
Peer-to-peer, distributed system
All nodes the same
Data partitioned among all nodes in the cluster
Custom data replication to ensure fault tolerance
Read/Write-anywhere design
6. Architecture Overview
Nodes communicate with one another through the Gossip
protocol, which exchanges information across the cluster every
second
A commit log is used on each node to capture write activity;
data durability is assured
Data is also written to an in-memory structure (a memtable) and
then flushed to disk as an SSTable once the memory structure is full
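A minimal Python sketch of this write path (the class and threshold names are illustrative, not Cassandra's actual internals):

class Node:
    """Toy model of Cassandra's write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []     # append-only log capturing every write (durability)
        self.memtable = {}       # in-memory structure holding recent writes
        self.sstables = []       # immutable "on-disk" structures, modeled as lists
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. commit log first, for durability
        self.memtable[key] = value            # 2. then the in-memory memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. spill to disk once "full"

    def flush(self):
        # The memtable is written out as a sorted, immutable SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

node = Node()
for i in range(4):
    node.write(f"row{i}", i)
print(len(node.sstables), "SSTable(s) flushed")  # 1 flush; one row still in memtable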
7. Architecture Overview
The schema used in Cassandra is modeled after Google
Bigtable: a row-oriented structure of columns
A keyspace is akin to a database in the RDBMS world
A column family is similar to an RDBMS table but is more
flexible/dynamic
A row in a column family is indexed by its key; other columns
may be indexed as well
[Figure: Customer column family (columns ID, Name, SSN, DOB) within the Portfolio keyspace]
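Assuming a locally running node and the DataStax Python driver (neither is part of the original slides), the Portfolio/Customer structure above could be created like so:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # assumes a local Cassandra node
session = cluster.connect()

# A keyspace is akin to a database in the RDBMS world.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS portfolio
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# A column family (a "table" in CQL) is similar to an RDBMS table,
# with rows indexed by their key.
session.execute("""
    CREATE TABLE IF NOT EXISTS portfolio.customer (
        id   int PRIMARY KEY,
        name text,
        ssn  text,
        dob  text
    )
""")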
9. System Architecture
Partitioning
How data is partitioned across nodes
Replication
How data is duplicated across nodes
Cluster Membership
How nodes are added to and removed from the cluster
10. Partitioning
• Nodes are logically structured in a ring topology.
• The hashed value of the key associated with a data item
is used to assign it to a node in the ring.
• Hash values wrap around after a maximum value, which gives
the ring its structure.
• Lightly loaded nodes move position on the ring to alleviate
highly loaded nodes.
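A sketch of this hash-ring assignment in Python; the node names, hash function, and ring size are all illustrative choices:

import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 32      # hash values wrap here, closing the ring

def token(key: str) -> int:
    # Hash the key, then wrap the value into the ring's range.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

# Four nodes placed on the ring at the token of their own name.
nodes = sorted((token(f"node-{i}"), f"node-{i}") for i in range(4))

def owner(key: str) -> str:
    # A data item belongs to the first node at or after its token (wrapping).
    idx = bisect_right([tok for tok, _ in nodes], token(key)) % len(nodes)
    return nodes[idx][1]

print(owner("customer:42"))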
11. Replication
Each data item is replicated at N (replication factor)
nodes.
Different Replication Policies
◦ Rack Unaware – replicate data at the N-1 nodes that succeed the
coordinator on the ring
◦ Rack Aware – uses ZooKeeper to elect a leader, which tells
nodes the ranges they are replicas for
◦ Datacenter Aware – similar to Rack Aware, but the leader is
chosen at the datacenter level instead of the rack level
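A sketch of rack-unaware placement under the same kind of toy ring as above (node names are again made up):

import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 32
ring = sorted(
    (int(hashlib.md5(f"node-{i}".encode()).hexdigest(), 16) % RING_SIZE, f"node-{i}")
    for i in range(4))

def replicas(key: str, n: int = 3) -> list:
    """Rack-unaware placement: the key's coordinator plus its n-1 ring successors."""
    t = int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE
    idx = bisect_right([tok for tok, _ in ring], t) % len(ring)
    return [ring[(idx + i) % len(ring)][1] for i in range(n)]

print(replicas("customer:42"))   # coordinator followed by two successors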
13. Gossip Protocols
• Network communication protocols inspired by real-life
rumour spreading.
• Periodic, pairwise, inter-node communication.
• Low-frequency communication ensures low cost.
• Random selection of peers.
• Example – Node A wishes to search for a pattern in data
(a sketch follows this list)
– Round 1 – Node A searches locally and then gossips with node B.
– Round 2 – Nodes A and B gossip with C and D.
– Round 3 – Nodes A, B, C, and D gossip with 4 other nodes…
• This round-by-round doubling makes the protocol very robust.
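A toy simulation of the doubling behaviour (node count and peer selection are illustrative):

import random

nodes = [f"node-{i}" for i in range(16)]
informed = {"node-0"}            # Node A starts out knowing the rumour

rounds = 0
while len(informed) < len(nodes):
    rounds += 1
    # Each informed node gossips with one randomly chosen peer per round.
    for peer in [random.choice(nodes) for _ in informed]:
        informed.add(peer)
    print(f"round {rounds}: {len(informed)} of {len(nodes)} nodes informed")
# The informed set roughly doubles each round, so N nodes are
# typically reached in O(log N) rounds.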
14. Gossip Protocols
• A variety of gossip protocols exist
– Dissemination protocols
• Event dissemination: multicasts events via gossip; high latency might
cause network strain.
• Background data dissemination: continuous gossip about information
regarding participating nodes
– Anti-entropy protocols
• Used to repair replicated data by comparing and reconciling
differences. Cassandra uses this type of protocol to repair data
among replicas, as in the sketch below.
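A toy anti-entropy pass; Cassandra actually compares Merkle trees over data ranges, whereas this sketch hashes values directly and lets the newer timestamp win:

import hashlib

def digest(value) -> str:
    return hashlib.md5(repr(value).encode()).hexdigest()

# Two replicas of the same data; entries are (value, timestamp) pairs.
replica_a = {"k1": ("v1", 2), "k2": ("old", 1)}
replica_b = {"k1": ("v1", 2), "k2": ("new", 3)}

# Compare digests per key and reconcile mismatches: newest timestamp wins.
for key in replica_a:
    if digest(replica_a[key]) != digest(replica_b[key]):
        newest = max(replica_a[key], replica_b[key], key=lambda v: v[1])
        replica_a[key] = replica_b[key] = newest

print(replica_a["k2"], replica_b["k2"])   # both ('new', 3) after repair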
15. Cluster Management
Uses Scuttlebutt (a gossip protocol) to manage
nodes.
Uses gossip for node membership and to transmit
system control state.
A node's failure state is given by a variable ‘phi’, which
expresses how likely the node is to have failed (a
suspicion level) instead of a simple binary value (up/down).
This type of system is known as an Accrual Failure
Detector.
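A simplified phi calculation, assuming exponentially distributed heartbeat intervals (Cassandra's actual detector is more involved; this sketch keeps only the core idea):

import math
import time

class AccrualFailureDetector:
    """Reports a suspicion level phi rather than a binary up/down verdict."""

    def __init__(self):
        self.intervals = []          # recent heartbeat inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self):
        now = time.time()
        if self.last_heartbeat is not None:
            self.intervals = (self.intervals + [now - self.last_heartbeat])[-100:]
        self.last_heartbeat = now

    def phi(self) -> float:
        # phi = -log10(probability that a heartbeat this overdue would
        # still arrive), assuming exponentially distributed intervals.
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = time.time() - self.last_heartbeat
        return (elapsed / mean) * math.log10(math.e)

d = AccrualFailureDetector()
d.heartbeat(); time.sleep(0.1); d.heartbeat()
time.sleep(0.3)
print(d.phi())   # grows the longer the node stays silent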
16. Why Cassandra?
Gigabyte to Petabyte scalability
Linear performance gains through adding nodes
No single point of failure
Easy replication / data distribution
Multi-data center and Cloud capable
No need for separate caching layer
Tunable data consistency
Flexible schema design
Data Compression
CQL language (like SQL)
Support for key languages and platforms
No need for special hardware or software
17. Big Data Scalability
Capable of comfortably scaling to petabytes
New nodes = Linear performance increases
Add new nodes online
[Figure: going from 1 node to 2 doubles throughput; capabilities scale linearly as nodes 1 through 4 are added]
18. No Single Point of Failure
All nodes the same
Customized replication affords tunable data
redundancy
Read/write from any node
Can replicate data among different physical data
center racks
19. Easy Replication / Data Distribution
Transparently handled by Cassandra
Multi-data center capable
Exploits all the benefits of Cloud computing
Able to do hybrid Cloud/On-premise setup
20. No Need for Caching Software
Peer-to-peer architecture removes need for special
caching layer and the programming that goes with it
The database cluster uses the memory from all
participating nodes to cache the data assigned to each
node
No inconsistencies between a memory cache and the
database are encountered
[Figure: traditional architecture, with application servers sending writes to a database server and reads to separate memcached servers]
21. Tunable Data Consistency
Choose between strong and eventual consistency (from all
replicas to any single node responding) depending on the need
Can be done on a per-operation basis, and for both
reads and writes
Handles Multi-data center operations
Write consistency levels: Any, One, Quorum, Local_Quorum, Each_Quorum, All
Read consistency levels: One, Quorum, Local_Quorum, Each_Quorum, All
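Per-operation consistency with the DataStax Python driver (the driver, addresses, and table are assumptions for illustration, not part of the original slides):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('portfolio')

# Strong consistency for this write: a quorum of replicas must acknowledge.
write = SimpleStatement(
    "INSERT INTO customer (id, name) VALUES (1, 'Alice')",
    consistency_level=ConsistencyLevel.QUORUM)
session.execute(write)

# Eventual consistency for this read: any single replica may answer.
read = SimpleStatement(
    "SELECT * FROM customer WHERE id = 1",
    consistency_level=ConsistencyLevel.ONE)
print(session.execute(read).one())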
22. Flexible Schema
Dynamic schema design allows for much more flexible
data storage than a rigid RDBMS
Handles structured, semi-structured, and unstructured
data. Counters also supported
No offline/downtime for schema changes
Supports primary and secondary indexes
[Figure: Customer column family (columns ID, Name, SSN, DOB) within the Portfolio keyspace]
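For example, an online schema change plus a secondary index, issued through the same hypothetical driver session as earlier (the column and index are made up for illustration):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('portfolio')

# Schema changes are applied online; no downtime for the cluster.
session.execute("ALTER TABLE customer ADD email text")

# A secondary index makes a non-key column queryable.
session.execute("CREATE INDEX ON customer (name)")
rows = session.execute("SELECT * FROM customer WHERE name = 'Alice'")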
23. Data Compression
Uses Google’s Snappy data compression algorithm
Compresses data on a per column family level
Internal tests at DataStax show up to 80%+
compression of raw data
No performance penalty (and some increases in
overall performance due to less physical I/O)!
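A hedged example of enabling Snappy for a single column family; the sstable_compression option name applies to older Cassandra versions and differs in newer ones:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('portfolio')

# Enable Snappy compression for one column family; data is compressed
# on disk, reducing physical I/O on reads.
session.execute("""
    ALTER TABLE customer
    WITH compression = {'sstable_compression': 'SnappyCompressor'}
""")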
24. CQL Language
Very similar to RDBMS SQL syntax
Create objects via DDL (e.g. CREATE…)
Core DML commands supported: INSERT, UPDATE,
DELETE
Query data with SELECT
SELECT *
FROM USERS
WHERE STATE = 'TX';
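The same commands issued through the DataStax Python driver, assuming the USERS table from the slide exists with an indexed STATE column (both assumptions for illustration):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('portfolio')

# Core DML commands, matching the slide's USERS example.
session.execute("INSERT INTO users (id, name, state) VALUES (1, 'Bob', 'TX')")
session.execute("UPDATE users SET name = 'Robert' WHERE id = 1")
for row in session.execute("SELECT * FROM users WHERE state = 'TX'"):
    print(row)
session.execute("DELETE FROM users WHERE id = 1")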
25. Read Operation
The client sends a read to the Cassandra cluster. The closest replica
(Replica A) returns the actual result, while digest queries go to the
other replicas (B and C), which return digest responses. If the digests
differ, a read repair reconciles the replicas before the result is
returned to the client, as in the sketch below.
* Figure taken from the slides of Avinash Lakshman and Prashant Malik
(authors of the Cassandra paper).
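A sketch of the digest comparison and read repair (replica values and timestamps are invented):

import hashlib

def digest(value) -> str:
    return hashlib.md5(repr(value).encode()).hexdigest()

# Three replicas; entries are (data, timestamp) pairs. C holds stale data.
replicas = {"A": ("alice", 5), "B": ("alice", 5), "C": ("alic", 4)}

result = replicas["A"]   # closest replica returns the full result
digests = {n: digest(v) for n, v in replicas.items() if n != "A"}

# Read repair: if any digest differs from the result's, reconcile the
# replicas with the newest value before answering the client.
if any(d != digest(result) for d in digests.values()):
    newest = max(replicas.values(), key=lambda v: v[1])
    for name in replicas:
        replicas[name] = newest

print(replicas["C"])     # ('alice', 5) after repair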
26. Facebook Inbox Search
• Cassandra was developed to address this problem.
• Cassandra was tested on a 150-node cluster storing
50+ TB of user message data.
• The per-user index of all messages can be searched in two ways:
– Term search: search by a keyword
– Interactions search: search by a user id
Latency Stat | Search Interactions | Term Search
Min          | 7.69 ms             | 7.78 ms
Median       | 15.69 ms            | 18.27 ms
Max          | 26.13 ms            | 44.41 ms
27. Comparison with MySQL
• MySQL, > 50 GB of data
Writes average: ~300 ms
Reads average: ~350 ms
• Cassandra, > 50 GB of data
Writes average: 0.12 ms
Reads average: 15 ms
• Stats provided by the authors using Facebook data.