2. Agenda
• Session I recap
- Why NoSQL / Drawbacks of Relational DBs
- Common Characteristics
- Storage Mechanism
- CAP Theorem & Advantages
• DataStax Apache Cassandra Installation
• Cassandra Concepts
3. Features of Cassandra
• Column-based (wide-column) storage mechanism
• High availability
• High scalability / horizontal scaling
• Predictable performance
• No SPOF (single point of failure)
• Multi-DC (multiple data centers) / multi-region availability
• Runs on commodity hardware
• Easy to manage operationally
5. Cluster Topology
• Node – One Cassandra instance
• Rack – A logical set of nodes
• Data center – A logical set of racks
• Cluster – The full set of nodes that map to a single complete token ring
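A small Python sketch of how a token ring assigns keys to nodes. It assumes one token per node (no virtual nodes), uses MD5 truncated to 64 bits as a stand-in for the Murmur3 partitioner Cassandra uses by default, and places tokens in [0, 2^64) rather than Cassandra's signed range. RING, token_for and owner are hypothetical names.

```python
import bisect
import hashlib

# Hypothetical 4-node ring, one token per node (no vnodes).
# The token values are illustrative, not real Cassandra tokens.
RING = {0: "node-a", 2**62: "node-b", 2**63: "node-c", 3 * 2**62: "node-d"}


def token_for(key: str) -> int:
    """Map a partition key to a token in [0, 2**64).

    Assumption: MD5 stands in for Cassandra's Murmur3 hash.
    """
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(token: int, ring: dict) -> str:
    """Return the node owning `token`: the first node whose token is >= it,
    wrapping around to the lowest token past the end of the ring."""
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token)
    return ring[tokens[i % len(tokens)]]
```

For example, `owner(token_for("user:42"), RING)` names the node holding the primary replica of partition key `user:42`; every node in the cluster can compute this independently because all of them share the same ring.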
6. CQL
• CREATE KEYSPACE keyspace_name WITH
replication = {'class': 'strategy_name',
'replication_factor': number_of_replicas};
• CREATE TABLE table_name (column1_name data_type,
column2_name data_type, column3_name data_type,
PRIMARY KEY (column1_name));
(Declare the primary key only once: either inline after a column's type or as a separate PRIMARY KEY clause.)
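To make the two templates concrete, the Python sketch below assembles the statements as strings. create_keyspace_cql and create_table_cql are hypothetical helpers. They do not quote or validate identifiers, and they only build text without talking to a cluster. The NetworkTopologyStrategy branch assumes the per-data-center form of the replication map.

```python
def create_keyspace_cql(name: str, strategy: str, replicas) -> str:
    """Build a CREATE KEYSPACE statement (hypothetical helper).

    - SimpleStrategy: `replicas` is an int, one factor for the whole cluster.
    - NetworkTopologyStrategy: `replicas` maps data-center name -> factor.
    """
    if strategy == "SimpleStrategy":
        options = f"'class': 'SimpleStrategy', 'replication_factor': {int(replicas)}"
    elif strategy == "NetworkTopologyStrategy":
        per_dc = ", ".join(f"'{dc}': {int(rf)}" for dc, rf in replicas.items())
        options = f"'class': 'NetworkTopologyStrategy', {per_dc}"
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    # Doubled braces emit literal { } around the replication map.
    return f"CREATE KEYSPACE {name} WITH replication = {{{options}}};"


def create_table_cql(table: str, columns: dict, primary_key: list) -> str:
    """Build a CREATE TABLE statement that declares the primary key exactly
    once, as a separate PRIMARY KEY clause (first entry = partition key)."""
    cols = ", ".join(f"{col} {ctype}" for col, ctype in columns.items())
    pk = ", ".join(primary_key)
    return f"CREATE TABLE {table} ({cols}, PRIMARY KEY ({pk}));"
```

For instance, `create_keyspace_cql("demo", "SimpleStrategy", 3)` yields `CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};`.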
8. Replica Placement Strategies
The replication option specifies the replica placement strategy and the number of replicas wanted. The following table lists the replica placement strategies:

Strategy name             Description
SimpleStrategy            Specifies a single replication factor for the whole cluster.
NetworkTopologyStrategy   Sets the replication factor for each data center independently.
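A simplified Python sketch of the two placement ideas follows. It assumes the ring is a list of node names sorted by token, with `primary` as the index of the node that owns the key's token. It also models only one data center, whereas NetworkTopologyStrategy repeats this per data center with its own factor. simple_strategy and rack_aware are hypothetical names.

```python
from typing import Dict, List


def simple_strategy(ring: List[str], primary: int, rf: int) -> List[str]:
    """SimpleStrategy-style placement: the primary replica plus the next
    rf - 1 nodes clockwise on the ring, ignoring racks and data centers."""
    n = len(ring)
    return [ring[(primary + i) % n] for i in range(min(rf, n))]


def rack_aware(ring: List[str], racks: Dict[str, str], primary: int, rf: int) -> List[str]:
    """Simplified rack-aware placement within one data center: walk the ring
    clockwise, preferring nodes on racks not used yet; if there are fewer
    racks than replicas, fill the rest with skipped nodes in ring order."""
    n = len(ring)
    order = [ring[(primary + i) % n] for i in range(n)]
    chosen: List[str] = []
    used_racks = set()
    skipped: List[str] = []
    for node in order:
        if len(chosen) == rf:
            break
        if racks[node] not in used_racks:
            chosen.append(node)
            used_racks.add(racks[node])
        else:
            skipped.append(node)  # same rack as an existing replica
    for node in skipped:
        if len(chosen) == rf:
            break
        chosen.append(node)
    return chosen
```

With nodes n1..n4 on racks r1, r1, r2, r2, SimpleStrategy with RF 2 picks n1 and n2, which share a rack. The rack-aware walk picks n1 and n3, so losing one rack leaves a replica alive.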
9. CONSISTENCY
• Consistency levels are set per operation, for both reads and writes.
• ANY (writes only), ONE, QUORUM (floor(RF/2) + 1 replicas), LOCAL_QUORUM, EACH_QUORUM, ALL, etc.
• Higher consistency level – lower availability
• Lower consistency level – higher availability
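The trade-off can be checked with a little arithmetic. The Python sketch below covers a few single-data-center levels (ANY and the LOCAL_/EACH_ variants are left out). It uses the usual rule that a read sees the latest acknowledged write when R + W > RF, because the read and write replica sets must then overlap. The function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    """Replica acknowledgements needed for a few single-DC levels."""
    table = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}
    return table[level]


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set overlaps every write replica set."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that can be down while requests at `level` still succeed."""
    return rf - replicas_required(level, rf)
```

With RF = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) are consistent and tolerate one down replica, while ONE/ONE (1 + 1 <= 3) tolerates two down replicas but may return stale data. This is the high-consistency/low-availability trade-off from the slide.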
10. SEED & COORDINATOR NODES
• Seeds and coordinators serve different purposes.
• Seed nodes: New nodes contact the seeds at startup to join gossip and learn the cluster topology. In general, it is recommended to have 2 seeds for the whole cluster; in a multi-data-center cluster, distribute the seeds across the data centers.
• Coordinator nodes: Every node can act as a coordinator. The coordinator for a request is the node the client sends it to, so it is chosen per request by the client driver's load-balancing policy, for example round-robin, DC-aware round-robin, or latency-aware. This policy is configured in the client driver, not in cassandra.yaml.
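To illustrate how a client-side policy picks a coordinator, here is a Python sketch. RoundRobin and DCAwareRoundRobin are hypothetical classes modelled on the idea behind driver policies of similar names. They are not the API of any real Cassandra driver.

```python
import itertools
from typing import Dict, List


class RoundRobin:
    """Cycle through all known nodes, one coordinator per request."""

    def __init__(self, nodes: List[str]):
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> str:
        return next(self._cycle)


class DCAwareRoundRobin:
    """Round-robin over nodes in the local data center; fall back to
    remote nodes only when no local node is up."""

    def __init__(self, node_dc: Dict[str, str], local_dc: str):
        self._local = [n for n, dc in node_dc.items() if dc == local_dc]
        self._remote = [n for n, dc in node_dc.items() if dc != local_dc]
        self._counter = itertools.count()

    def pick(self, down=frozenset()) -> str:
        i = next(self._counter)
        for pool in (self._local, self._remote):
            live = [n for n in pool if n not in down]
            if live:
                return live[i % len(live)]
        raise RuntimeError("no live nodes")
```

Keeping requests in the local data center avoids cross-DC latency on the client-to-coordinator hop. The coordinator still forwards the request to whichever replicas the consistency level requires.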
12. Data Limits
• The maximum number of columns (cells) per row (partition) is 2 billion, but in practice keep it to about 10 to 20 thousand.
• The maximum data size per cell (column value) is 2 GB, but in practice keep values to about 10 MB.
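A rough Python check of a partition design against these numbers might look like the sketch below. It assumes each row in a partition stores roughly one cell per regular (non-primary-key) column. The thresholds come from the slide, and the function names are hypothetical.

```python
from typing import List

# Figures from the slide; the "practical" ones are rules of thumb.
HARD_MAX_CELLS = 2_000_000_000               # about 2 billion cells per partition
PRACTICAL_MAX_CELLS = 20_000                 # upper end of the 10-20 thousand guideline
PRACTICAL_MAX_CELL_BYTES = 10 * 1024 * 1024  # about 10 MB per cell value


def cells_per_partition(rows: int, regular_columns: int) -> int:
    """Approximate cells in one partition: one cell per regular column per row."""
    return rows * regular_columns


def check_partition(rows: int, regular_columns: int, largest_cell_bytes: int) -> List[str]:
    """Return warnings for a hypothetical partition design (empty list = fine)."""
    warnings = []
    cells = cells_per_partition(rows, regular_columns)
    if cells > HARD_MAX_CELLS:
        warnings.append("exceeds the 2-billion-cell hard limit")
    elif cells > PRACTICAL_MAX_CELLS:
        warnings.append("above the practical 10-20 thousand guideline")
    if largest_cell_bytes > PRACTICAL_MAX_CELL_BYTES:
        warnings.append("cell value larger than about 10 MB")
    return warnings
```

For example, 10,000 rows with 5 regular columns already gives 50,000 cells. That is far below the hard limit but above the practical guideline, which usually suggests splitting the partition key (for instance by adding a time bucket).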
14. SNITCHES & GOSSIP
• Snitch – Determines the rack and data center of each node (from its IP address or its configuration). The replication strategy uses this to avoid placing more than one replica on the same rack where possible, so that losing one rack does not take out several replicas at once.
• Gossip – Once per second, each node exchanges state with up to three other nodes, so every node keeps an up-to-date view of the whole cluster.
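The spreading behaviour of gossip can be sketched in Python. This is a simplified model under stated assumptions: each node's state is a single heartbeat counter, one call to gossip_round stands for one second, and both sides of an exchange keep the highest heartbeat seen per node. Real Cassandra gossip also tracks generations and application state and uses a three-message SYN/ACK/ACK2 exchange. gossip_round is a hypothetical name.

```python
import random
from typing import Dict

# Each node's view: node name -> highest heartbeat version it has seen.
State = Dict[str, Dict[str, int]]


def gossip_round(state: State, rng: random.Random, fanout: int = 3) -> None:
    """One gossip round: every node bumps its own heartbeat, then exchanges
    views with up to `fanout` random peers; both sides keep the maximum."""
    nodes = list(state)
    for node in nodes:
        state[node][node] += 1  # own heartbeat advances each round
    for node in nodes:
        others = [n for n in nodes if n != node]
        for peer in rng.sample(others, min(fanout, len(others))):
            merged = {k: max(state[node].get(k, 0), state[peer].get(k, 0))
                      for k in set(state[node]) | set(state[peer])}
            state[node] = dict(merged)
            state[peer] = dict(merged)
```

Starting from ten nodes that each know only themselves, a few rounds are enough for every node to learn about all the others. Information spreads exponentially, which is why a small fixed fan-out scales to large clusters.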
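15. Hinted Handoff
• Hinted handoff – A recovery mechanism for writes targeting offline nodes: the coordinator stores a hint for the down replica and replays the write when the node comes back online.
• The hint window (how long a node can be down and still have hints stored for it) is set in cassandra.yaml:
• Property – max_hint_window_in_ms: 10800000 (default, 3 hours)
• hinted_handoff_enabled: true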
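A toy Python model of the mechanism follows. The Coordinator class and its fields are hypothetical. The two settings mirror the cassandra.yaml properties above, but real hints are persisted on the coordinator's disk and replayed when gossip reports the node as up again.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Stores = Dict[str, Dict[str, str]]  # node name -> that replica's key/value data


@dataclass
class Coordinator:
    """Toy coordinator that stores hints for down replicas (hypothetical model)."""
    hinted_handoff_enabled: bool = True
    max_hint_window_ms: int = 10_800_000  # mirrors the default of 3 hours
    down_since: Dict[str, int] = field(default_factory=dict)  # node -> down time (ms)
    hints: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def write(self, replicas: List[str], key: str, value: str,
              now_ms: int, stores: Stores) -> None:
        for node in replicas:
            if node not in self.down_since:
                stores[node][key] = value  # live replica: apply directly
            elif (self.hinted_handoff_enabled
                  and now_ms - self.down_since[node] <= self.max_hint_window_ms):
                self.hints.setdefault(node, []).append((key, value))  # keep a hint
            # Otherwise no hint is stored; repair must fix this replica later.

    def node_up(self, node: str, stores: Stores) -> None:
        self.down_since.pop(node, None)
        for key, value in self.hints.pop(node, []):
            stores[node][key] = value  # replay hints in write order
```

A write that arrives while a replica has been down for less than the window is replayed when the replica returns. A write that arrives after the window has passed is not hinted, so that replica stays stale until a repair runs.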
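17. SSTable – Sorted String Table
• Immutable data file for row storage
• A partition can be spread across multiple SSTables written at different times; reads merge them using each cell's timestamp
• Easy backup, because SSTables are never modified in place
• Deletes are written as markers called "tombstones" instead of removing data immediately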
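The merge-on-read and tombstone behaviour can be sketched in Python by treating each SSTable as an immutable dict of key to (timestamp, value). This is a simplification, since real SSTables are sorted on-disk files organised by partition. read and compact are hypothetical names. The compaction shown drops tombstones immediately, whereas Cassandra keeps them until gc_grace_seconds has passed.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = object()  # marker written by a delete instead of removing data

# One SSTable: key -> (write timestamp, value or TOMBSTONE); never mutated.
SSTable = Dict[str, Tuple[int, object]]


def read(key: str, sstables: List[SSTable]) -> Optional[object]:
    """Merge one key across SSTables: the cell with the highest timestamp
    wins (last write wins); a winning tombstone means the key is deleted."""
    newest = None
    for table in sstables:
        if key in table and (newest is None or table[key][0] > newest[0]):
            newest = table[key]
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge SSTables into one new, key-sorted table keeping only the newest
    cell per key. Simplification: tombstones are purged immediately."""
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    live = [(k, c) for k, c in merged.items() if c[1] is not TOMBSTONE]
    return dict(sorted(live, key=lambda kc: kc[0]))
```

Because reads choose by timestamp, the order in which SSTables are consulted does not affect the result. This is what lets writes always go to new files instead of updating old ones.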
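18. Read Path
• Read repair – When a read finds that a replica returned stale data, the coordinator updates that replica with the newest version
• Property – read_repair_chance (table option; removed in Cassandra 4.0)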
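A coordinator-side sketch of read repair in Python is shown below. It assumes the coordinator has a response from every replica and compares them by write timestamp. Real Cassandra contacts only as many replicas as the consistency level needs (using digest requests for most of them) and repairs on mismatch. read_with_repair is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


def read_with_repair(key: str, replicas: Dict[str, Dict[str, Cell]]) -> Optional[str]:
    """Collect `key` from every replica, return the newest value by
    timestamp, and write that version back to stale or missing replicas."""
    responses = {node: data.get(key) for node, data in replicas.items()}
    found = [cell for cell in responses.values() if cell is not None]
    if not found:
        return None  # no replica has the key
    newest = max(found, key=lambda cell: cell[0])
    for node, cell in responses.items():
        if cell != newest:
            replicas[node][key] = newest  # repair the stale or missing copy
    return newest[1]
```

After one read, replicas that held an older version, or no version at all, now match the newest copy. Later reads at any consistency level therefore see the same value.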