Cassandra 101
Introduction to Apache Cassandra
What is Cassandra?
● A distributed, columnar database
● Data model inspired by Google BigTable (2006)
● Distribution model inspired by Amazon Dynamo (2007)
● Open Sourced by Facebook in 2008
● Monolithic Kernel written in Java
● Used by Digg, Facebook, Twitter, Reddit, Rackspace,
CloudKick and others
Etymology
● In Greek mythology Cassandra (Also known as Alexandra) was
the daughter of King Priam and Queen Hecuba of Troy
● Her beauty caused Apollo to grant her the gift of prophecy
● When she did not return his love, Apollo placed a curse on her
so that no one would ever believe her predictions
Why Cassandra?
● Minimal Administration
● No Single Point of Failure
● Scale Horizontally
● Writes are durable
● Optimized for writes
● Consistency is flexible, can be tuned per request
● Schema is flexible, can be updated online
● Handles failure gracefully
● Replication is easy, Rack and DC aware
Commercial Support
Data Model
A Column is the basic unit, consisting of a Key, a Value and a Timestamp
RDBMS vs Cassandra
Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
Cassandra is good at
Reading data from a row in the order it is stored, i.e. by Column Name!
Understand the queries your application requires before building the data model
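The map-of-sorted-maps model above can be sketched in a few lines. This is a toy illustration only, not how Cassandra stores data; the `Row` class, its `put` and `slice` methods, and the extra `grain` column are assumptions made up for the example.

```python
from bisect import bisect_left, bisect_right


class Row:
    """One row: columns kept sorted by column name (a toy SortedMap)."""

    def __init__(self):
        self._names = []    # column names, always kept sorted
        self._values = {}   # column name -> column value

    def put(self, name, value):
        if name not in self._values:
            # Insert the new name at its sorted position.
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in stored order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
table = {}
row = table.setdefault("foo", Row())
row.put("vegetable", "cucumber")
row.put("fruit", "apple")
row.put("grain", "rice")        # hypothetical extra column

in_range = row.slice("fruit", "grain")
```

Because the columns are already ordered by name, a range of column names is a contiguous slice, which is why reads by column name are cheap in this model.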
Consistent Hashing
Load Balancing in a changing world ...
● Evenly map keys to nodes
● Minimize key movement when nodes join or leave
The Partitioner:
● RandomPartitioner transforms Keys to Tokens using MD5
● In C* 1.2 the default partitioner is Murmur3Partitioner, which hashes Keys with MurmurHash3
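Keys and Tokens?
(Diagram: tokens 0 to 99, with ‘fop’ and ‘foo’ marked at 10 and 90.)
MD5 hashing for ‘fop’ is 89de73aaae8c956fb7c9379be7978e5b
MD5 hashing for ‘foo’ is d3b07384d113edec49eaa6238ad5ff00
(The ‘foo’ digest shown is that of the key followed by a newline, as `echo foo | md5sum` produces; the bare string ‘foo’ hashes to acbd18db4cc2f85cedef654fccc4a4d8.)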
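A small sketch of turning a key into a token with MD5. The 0 to 99 ring and the modulo reduction are assumptions made to match the toy ring on the slides; RandomPartitioner works on a far larger token space, and the tokens 10 and 90 on the slides are illustrative, so this code is not expected to reproduce them. It also shows that the digest quoted for ‘foo’ corresponds to the key followed by a newline.

```python
import hashlib

RING_SIZE = 100  # toy ring with tokens 0..99 (assumption, mirrors the slides)


def md5_hex(key: bytes) -> str:
    """Hex MD5 digest of the raw key bytes."""
    return hashlib.md5(key).hexdigest()


def toy_token(key: bytes) -> int:
    """Reduce the MD5 digest to a position on the toy ring."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


digest_with_newline = md5_hex(b"foo\n")  # what `echo foo | md5sum` hashes
digest_bare = md5_hex(b"foo")            # the key without a trailing newline
token = toy_token(b"foo")
```

The same key always yields the same token, so any node can work out where a key lives without asking a central directory.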
Token Ring
Ring of tokens 0 to 99:
‘fop’ token: 10
‘foo’ token: 90
Token Ranges (Pre 1.2)
Node 1 token:0 range 76-0
Node 2 token:25 range 1-25
Node 3 token:50 range 26-50
Node 4 token:75 range 51-75
‘foo’ token 90
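The lookup from token to owning node can be sketched with a binary search over the node tokens. This assumes the usual convention that a node owns the range ending at its own token; the `owner` function is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_left

# Node tokens from the slide; each node owns the range that ends at its token.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]
TOKENS = [t for t, _ in RING]


def owner(token: int) -> str:
    """Return the first node whose token is >= the given token,
    wrapping around to the first node past the end of the ring."""
    i = bisect_left(TOKENS, token)
    return RING[i % len(RING)][1]


foo_owner = owner(90)   # token 90 falls in the wrap-around range 76-0
fop_owner = owner(10)   # token 10 falls in the range 1-25
```

With one token per node, adding a node means choosing a new token by hand and moving the data for the range it splits.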
Token Ranges With Virtual Nodes in 1.2
Node 1
Node 2
Node 3
● Easier to enlarge or shrink the cluster
● The cluster can grow in steps of 1 node
● Node recovery is much faster
Replication Strategy
Selects as many nodes as the Replication Factor for each row.
(Diagram: the four-node token ring, Node 1 to Node 4 at tokens 0, 25, 50 and 75, with ‘foo’ at token 90.)
Replication Strategy
SimpleStrategy with RF 3
(Diagram: the four-node token ring, Node 1 to Node 4 at tokens 0, 25, 50 and 75, with ‘foo’ at token 90.)
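SimpleStrategy can be sketched as "the owner of the token plus the next nodes clockwise until the Replication Factor is reached". The function name and the ring layout below are assumptions for illustration, reusing the tokens from the slides.

```python
from bisect import bisect_left

RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]


def simple_strategy_replicas(token: int, rf: int) -> list:
    """Owner of the token, then the next rf - 1 nodes clockwise on the ring."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    count = min(rf, len(RING))  # cannot place more replicas than there are nodes
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


replicas = simple_strategy_replicas(90, rf=3)
```

Under these assumptions ‘foo’ (token 90) is stored on Node 1, Node 2 and Node 3, and Node 4 holds no replica of it.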
Replication Strategy
NetworkTopologyStrategy uses a Replication Factor per Data Center
(Diagram: two copies of the four-node token ring, labelled EAST and WEST, each with ‘foo’ at token 90.)
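A rough sketch of per-data-center placement: walk the ring clockwise and keep taking nodes until every data center has its own Replication Factor. The node names, the offset WEST tokens and the `nts_replicas` helper are all assumptions, and rack awareness is left out entirely.

```python
from bisect import bisect_left

# Hypothetical combined ring: (token, node, data center). WEST tokens are
# offset by one so that every token is unique; this layout is an assumption.
RING = sorted([
    (0, "east-1", "EAST"), (25, "east-2", "EAST"),
    (50, "east-3", "EAST"), (75, "east-4", "EAST"),
    (1, "west-1", "WEST"), (26, "west-2", "WEST"),
    (51, "west-3", "WEST"), (76, "west-4", "WEST"),
])


def nts_replicas(token, rf_per_dc):
    """Walk clockwise from the token, taking nodes until each data
    center has reached its own replication factor. Racks are ignored."""
    tokens = [t for t, _, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    chosen = {dc: [] for dc in rf_per_dc}
    for i in range(len(RING)):
        _, node, dc = RING[(start + i) % len(RING)]
        if dc in chosen and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(node)
    return chosen


placement = nts_replicas(90, {"EAST": 3, "WEST": 2})
```

Each data center ends up with a full set of replicas of its own, which is what lets a request be served entirely within one data center.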
SimpleSnitch
Places all nodes in the same DC and rack (default)
Ec2Snitch/Ec2MultiRegionSnitch
DC is set to the AWS Region and Rack to the Availability Zone
PropertyFileSnitch
Each node's DC and rack are maintained in a property file
GossipingPropertyFileSnitch
Uses gossip as the first source for node info and, if that is not available, uses the property file
The Client and the Coordinator
(Diagram: a Client connected to the ring of Node 1 to Node 4; ‘foo’, token 90.)
Multi DC Client and Coordinator
(Diagram: the same ring and Client, plus Node 10 and Node 20; ‘foo’, token 90.)
Gossip
Nodes share information with a small number of neighbours, who share information with a small number of other neighbours …
● Used for intra-cluster communication
● Routes client requests
● Detects node failure
● Initial peers are the seeds listed in the config file.
Cassandra Objects
● CommitLog
● MemTable
● SSTable
● Index
● Bloom Filter
Consistency
● CAP theorem
○ Trade consistency for availability
○ Consistency is a choice
* It doesn't matter if you are good at something, as long as you are consistent.
Under a Partition: Consistency OR Availability
WRITE
Level            Description
ZERO             Cross fingers
ANY              1st to respond (HH = Hinted Handoff)
ONE, TWO, THREE  nth to respond
QUORUM           N/2+1 replicas
ALL              All replicas
READ
Level            Description
ZERO             N/A
ANY              N/A
ONE, TWO, THREE  nth to respond
QUORUM*          N/2+1 replicas
ALL              All replicas
Consistency Level
● Specified for each request
● Number of nodes to wait for
* QUORUM, LOCAL_QUORUM, EACH_QUORUM
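The N/2+1 rule in the tables uses integer division, with N being the Replication Factor. A minimal sketch (the `quorum` helper is a made-up name):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


quorum_sizes = {n: quorum(n) for n in (1, 2, 3, 4, 5)}
```

With a Replication Factor of 3 a quorum is 2, so one replica can be down and QUORUM requests still succeed; going from 3 to 4 replicas raises the quorum to 3 without tolerating any more failures.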
Write ‘foo’ at Quorum with Hinted Handoff
(Diagram: the Client writes ‘foo’, token 90, to the ring of Node 1 to Node 4. Node 3 is down; Node 4 holds ‘foo’ for Node 3.)
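The write shown here can be sketched as follows. It assumes that the replicas for ‘foo’ are Node 1, Node 2 and Node 3 (SimpleStrategy, RF 3), that Node 4 acts as the coordinator and keeps the hint, and that the written value is `"bar"`; none of these details are stated on the slide beyond Node 3 being down and Node 4 holding the data for it.

```python
REPLICAS = ["Node 1", "Node 2", "Node 3"]   # assumed replicas for token 90
DOWN = {"Node 3"}
COORDINATOR = "Node 4"                       # assumption: the client talks to Node 4


def write_at_quorum(key, value):
    """Send the write to every replica; store a hint for each one that is down.
    The write succeeds if a quorum of replicas acknowledged it."""
    required = len(REPLICAS) // 2 + 1
    acks, hints = [], []
    for node in REPLICAS:
        if node in DOWN:
            # Kept by the coordinator and replayed when the node returns.
            hints.append((COORDINATOR, node, key, value))
        else:
            acks.append(node)
    return len(acks) >= required, acks, hints


ok, acks, hints = write_at_quorum("foo", "bar")
```

Two of three replicas acknowledge, so the QUORUM write succeeds even though Node 3 is down; the hint does not count towards the quorum.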
Read ‘foo’ at Quorum
(Diagram: the Client reads ‘foo’, token 90, from the ring of Node 1 to Node 4. Node 3 is down; Node 4 holds ‘foo’ for Node 3.)
Column Timestamps
Are used to resolve differences
● Stored for each Column Value
● 64-bit integers
Column     Node 1                     Node 2                     Node 3
Vegetable  ‘cucumber’ (timestamp 10)  ‘cucumber’ (timestamp 10)  <missing>
Fruit      ‘Apple’ (timestamp 10)     ‘banana’ (timestamp 15)    ‘Apple’ (timestamp 10)
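Resolving the replica answers in the table comes down to keeping the value with the highest timestamp. A minimal sketch, with `resolve` and `MISSING` as made-up names; it does not model how equal timestamps with different values are handled.

```python
MISSING = None


def resolve(replica_values):
    """Pick the (value, timestamp) pair with the highest timestamp,
    ignoring replicas that have no value for the column."""
    present = [v for v in replica_values if v is not MISSING]
    return max(present, key=lambda pair: pair[1]) if present else MISSING


vegetable = resolve([("cucumber", 10), ("cucumber", 10), MISSING])
fruit = resolve([("Apple", 10), ("banana", 15), ("Apple", 10)])
```

For the Fruit column the write with timestamp 15 wins, so the client sees ‘banana’ even though two of the three replicas still hold ‘Apple’.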
Strong Consistency
W + R > N
#Write Nodes + #Read Nodes > Replication Factor
● QUORUM Read + QUORUM Write
● ALL Read + ONE Write
● ONE Read + ALL Write
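The W + R > N rule can be checked mechanically for the combinations listed. The sketch below assumes a Replication Factor of 3 and adds a ONE/ONE case for contrast; the helper names are made up.

```python
def quorum(n: int) -> int:
    """floor(N / 2) + 1"""
    return n // 2 + 1


def is_strongly_consistent(w: int, r: int, n: int) -> bool:
    """True when every read set must overlap every write set."""
    return w + r > n


N = 3  # assumed replication factor
combos = {  # name -> (write nodes, read nodes)
    "QUORUM read + QUORUM write": (quorum(N), quorum(N)),
    "ALL read + ONE write": (1, N),
    "ONE read + ALL write": (N, 1),
    "ONE read + ONE write": (1, 1),
}
results = {name: is_strongly_consistent(w, r, N) for name, (w, r) in combos.items()}
```

When W + R > N, at least one node answering the read also took part in the latest write, so the read can return the newest value.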
Achieving Consistency
● Consistency Level
● Hinted Handoff
● Read Repair
● Anti Entropy (User triggered Repairs)
Write Path
● Append to Commit Log File
● Merge Columns into Memtable
● Asynchronously flush Memtable to a new file (never update existing files)
● Data is stored in immutable files called SSTables (Sorted String Tables)
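The steps above can be sketched with in-memory stand-ins. `ToyStore` is a hypothetical class; the flush here is synchronous and triggered by a made-up size threshold, whereas the slide describes an asynchronous flush.

```python
class ToyStore:
    """In-memory stand-in for the write path: commit log, memtable, SSTables."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # (row key, column) -> (value, timestamp)
        self.sstables = []     # immutable, sorted tuples; never modified
        self.flush_threshold = flush_threshold

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log.
        self.commit_log.append((row_key, column, value, timestamp))
        # 2. Merge the column into the memtable (newest timestamp wins).
        current = self.memtable.get((row_key, column))
        if current is None or timestamp >= current[1]:
            self.memtable[(row_key, column)] = (value, timestamp)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. Write a new sorted table; existing tables are left untouched.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


store = ToyStore(flush_threshold=2)
store.write("foo", "fruit", "apple", 10)
store.write("foo", "vegetable", "cucumber", 15)  # reaches the threshold, flushes
store.write("foo", "vegetable", "pepper", 20)    # lands in a fresh memtable
```

Nothing on the write path reads or rewrites an existing file, which is a large part of why writes are cheap.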
SSTables Files
*-Data.db
*-Index.db
*-Filter.db
(And others)
Read Path
Memory (one set per SSTable):
Bloom Filter (cache)
Index/Key Cache
Disk:
SSTable-1-Data.db
foo: fruit (ts:10) apple, vegetable (ts:15) cucumber
….
SSTable-1-Index.db
SSTable-2-Data.db
foo: fruit (ts:10) apple, vegetable (ts:10) Pepper
….
SSTable-2-Index.db
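A sketch of the read path over the two SSTables shown: consult each table's filter, skip tables that cannot contain the row, and merge the rest column by column using timestamps. Plain Python sets stand in for the Bloom filters (a real Bloom filter can report false positives, a set cannot), and the memtable and caches are left out.

```python
# Row data as shown for the two SSTables: column -> (value, timestamp).
sstable_1 = {"foo": {"fruit": ("apple", 10), "vegetable": ("cucumber", 15)}}
sstable_2 = {"foo": {"fruit": ("apple", 10), "vegetable": ("Pepper", 10)}}
SSTABLES = [sstable_1, sstable_2]

# Exact key sets used as stand-ins for the per-SSTable Bloom filters.
FILTERS = [set(table) for table in SSTABLES]


def read_row(row_key):
    """Merge the row from every SSTable that may contain it."""
    merged = {}
    for key_filter, table in zip(FILTERS, SSTABLES):
        if row_key not in key_filter:
            continue  # filter says "definitely absent": skip this file
        for column, (value, ts) in table[row_key].items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


row = read_row("foo")
```

A read may have to touch several SSTables for one row, which is why reads cost more than writes and why compaction matters.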
Compactions
Compaction merges the truth from multiple SSTables into one SSTable with the same truth
(Manual and continuous background process)
Column     SSTable 1                  SSTable 2                   New
Vegetable  ‘cucumber’ (timestamp 10)  ‘cucumber’ (timestamp 10)   ‘cucumber’ (timestamp 10)
Fruit      ‘Apple’ (timestamp 10)     <tombstone> (timestamp 15)  <tombstone> (timestamp 15)
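The merge in the table can be sketched as a timestamp-based union in which a tombstone is just another value. `compact` and `TOMBSTONE` are made-up names, and the eventual removal of old tombstones is not modelled.

```python
TOMBSTONE = object()  # marker for a deleted column


def compact(*sstables):
    """Merge several column -> (value, timestamp) tables into one,
    keeping the entry with the highest timestamp for each column."""
    merged = {}
    for table in sstables:
        for column, (value, ts) in table.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


sstable_1 = {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)}
sstable_2 = {"Vegetable": ("cucumber", 10), "Fruit": (TOMBSTONE, 15)}
new_sstable = compact(sstable_1, sstable_2)
```

The newer tombstone replaces ‘Apple’ and is carried into the new SSTable, matching the "New" column of the table.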
Writes and Reads
Managing Cassandra
● Single configuration file: /etc/cassandra/cassandra.yaml
● Single control command: /usr/bin/nodetool
● Monitoring done by DataStax OpsCenter
Troubleshooting Cassandra
Always inspect these files:
● /var/log/cassandra/cassandra.log (Startup)
● /var/log/cassandra/system.log (Normal work)
Backup
Use Cassandra snapshots...
And God said to Noah, Noah make me a backup ... 'cause I shall format
Client (API) Choices
● Thrift, original and still fully supported API:
○ Java: Thrift, Hector, Astyanax, DataStax Driver, Kundera…
○ Python: Pycassa, Telephus, …
○ Ruby: Fauna
○ PHP: PHP Client Library
○ C#
○ Node.js
○ Go
○ Simba ODBC
○ C++: LibQtCassandra
○ ORM
○ ….
● CQL3: a table-oriented, schema-driven data model, similar to SQL
CQL3 Create KeySpace
● Using CQL3 via the cqlsh command-line tool ($CASSANDRA_HOME/bin/cqlsh):
● Create a new Keyspace with NetworkTopologyStrategy and a Replication Factor of 3
CREATE KEYSPACE kenshoo_cass_fans
WITH replication = {'class': 'NetworkTopologyStrategy', 'us_east_dc': 3};
CQL3 Working with Tables
● CQL3 Example
● Table is a sparse collection of well known ordered columns
CREATE TABLE User
(
user_name text,
password text,
real_name text,
PRIMARY KEY (user_name)
);
---------------------------------------------------------
INSERT INTO User
(user_name, password, real_name)
VALUES
('nader', 'sekr8t', 'MR NADER');
---------------------------------------------------------
SELECT * FROM User WHERE user_name = 'nader';
 user_name | password | real_name
-----------+----------+-----------
     nader |   sekr8t |  MR NADER

More Related Content

What's hot

An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsDave Gardner
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisHostedbyConfluent
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuningelliando dias
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraGokhan Atil
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueDatabricks
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...DataStax Academy
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Databricks
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우PgDay.Seoul
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 

What's hot (20)

An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuning
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 

Similar to Cassandra 101

Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraRobbie Strickland
 
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2aaronmorton
 
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2DataStax
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overviewSean Murphy
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandraaaronmorton
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestDuyhai Doan
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsJulien Anguenot
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Boris Yen
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013Randall Hunt
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010aaronmorton
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jugDuyhai Doan
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Modelebenhewitt
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsJulien Anguenot
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...DataStax
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Deconstructing Apache Cassandra
Deconstructing Apache CassandraDeconstructing Apache Cassandra
Deconstructing Apache CassandraAlex Thompson
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandrashimi_k
 

Similar to Cassandra 101 (20)

Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
 
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2Cassandra Community Webinar  - Introduction To Apache Cassandra 1.2
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2
 
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentials
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jug
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Cassandra
CassandraCassandra
Cassandra
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Deconstructing Apache Cassandra
Deconstructing Apache CassandraDeconstructing Apache Cassandra
Deconstructing Apache Cassandra
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Cassandra 101

  • 1. Cassandra 101 Introduction to Apache Cassandra
  • 2. What is Cassandra? ● A distributed, columnar database ● Data model inspired by Google BigTable (2006) ● Distribution model inspired by Amazon Dynamo (2007) ● Open Sourced by Facebook in 2008 ● Monolithic Kernel written in Java ● Used by Digg, Facebook, Twitter, Reddit, Rackspace, CloudKick and others
  • 3. Etymology ● In Greek mythology Cassandra (Also known as Alexandra) was the daughter of King Priam and Queen Hecuba of Troy ● Her beauty caused Apollo to grant her the gift of prophecy ● When she did not return his love, Apollo placed a curse on her so that no one would ever believe her predictions
  • 4. Why Cassandra ? ● Minimal Administration ● No Single Point of Failure ● Scale Horizontally ● Writes are durable ● Optimized for writes ● Consistency is flexible, can be updated online ● Schema is flexible, can be updated online ● Handles failure gracefully ● Replication is easy, Rack and DC aware
  • 6. Data Model A Column is the basic unit consisting Key, Value and Timestamp
  • 7. Data Model A Column is the basic unit consisting Key, Value and Timestamp
  • 8. RDBMS vs Cassandra Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
  • 9. Cassandra is good at Reading data from a row in the order it is stored, i.e. by Column Name! Understand the queries you application requires before building the data model
  • 10. Consistent Hashing Load Balancing in a changing world ... ● Evenly map keys to nodes ● Minimize key movement when nodes join or leave
  • 11. The Partitioner: ● RandomPartitioner transforms Keys to Tokens using MD5 ● Since C* 1.2 the default partitioner hashes with the Murmur3 algorithm
  • 12. Keys and Tokens? Token line from 0 to 99: ‘fop’ at 10, ‘foo’ at 90 MD5 hashing for ‘fop’ is 89de73aaae8c956fb7c9379be7978e5b MD5 hashing for ‘foo’ is d3b07384d113edec49eaa6238ad5ff00
  • 13. Token Ring. 99 0 ‘fop’ token: 10 ‘foo’ token: 90
  • 14. Token Ranges (Pre 1.2) Node 1 token:0 range 76-0 Node 2 token:25 range 1-25 Node 3 token:50 range 26-50 Node 4 token:75 range 51-75 ‘foo’ token 90
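A minimal Python sketch of the token lookup on slides 11 to 14, assuming the toy 0-99 ring and the four node tokens shown above. The names `toy_token` and `owner` are hypothetical, and reducing an MD5 digest modulo 100 is only an illustration, so the tokens it produces will not match the hand-picked 10 and 90 on the slides.

```python
import bisect
import hashlib

# Toy ring from the slides: four nodes on a 0-99 token space.
# Each node owns the range that ends at its own token.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]
TOKENS = [token for token, _ in RING]


def toy_token(key: str, ring_size: int = 100) -> int:
    """Map a key to a token by reducing its MD5 digest (illustration only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % ring_size


def owner(token: int) -> str:
    """Return the node whose token is the first one >= the given token.

    Tokens past the last node (76-99 here) wrap around to the first node.
    """
    idx = bisect.bisect_left(TOKENS, token)
    return RING[idx % len(RING)][1]


if __name__ == "__main__":
    print(owner(90))                 # token used for 'foo' on the slides
    print(owner(toy_token("foo")))   # owner for the toy MD5-based token
```

Adding or removing one node only changes the owner of the range next to it, which is the "minimize key movement" property from slide 10.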
  • 15. Token Ranges With Virtual Nodes in 1.2 Node 1 Node 2 Node 3 ● Easier to enlarge or shrink the cluster ● The cluster can grow in steps of 1 node ● Node recovery is much faster
  • 16. Replication Strategy Node 1 token:0 range 76-0 Node 2 token:25 range 1-25 Node 3 token:50 range 26-50 Node 4 token:75 range 51-75 ‘foo’ token 90 Selects Replication Factor number of nodes for a row.
  • 17. Replication Strategy Node 1 token:0 range 76-0 Node 2 token:25 range 1-25 Node 3 token:50 range 26-50 Node 4 token:75 range 51-75 ‘foo’ token 90 SimpleStrategy with RF 3
  • 18. Replication Strategy NetworkTopologyStrategy Uses Replication Factor per Data Center EAST: Node 1 token:0 range 76-0 Node 2 token:25 range 1-25 Node 3 token:50 range 26-50 Node 4 token:75 range 51-75 ‘foo’ token 90 WEST: Node 1 token:0 range 76-0 Node 2 token:25 range 1-25 Node 3 token:50 range 26-50 Node 4 token:75 range 51-75 ‘foo’ token 90
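A rough Python sketch of SimpleStrategy replica selection for the ring above, assuming replicas are the owning node plus the next nodes clockwise on the ring. The function name `replicas` is hypothetical and the ring is the toy 0-99 example, not a real cluster.

```python
import bisect

# Toy ring from the slides: (token, node name), sorted by token.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]


def replicas(token: int, rf: int) -> list:
    """Pick `rf` nodes: the owner of `token`, then the next nodes clockwise."""
    tokens = [t for t, _ in RING]
    start = bisect.bisect_left(tokens, token) % len(RING)
    count = min(rf, len(RING))  # cannot place more replicas than nodes
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


if __name__ == "__main__":
    # 'foo' has token 90 on the slides, so with RF 3 it lands on three nodes.
    print(replicas(90, 3))
```

NetworkTopologyStrategy applies the same idea per data center and also tries to spread replicas across racks, which this sketch does not model.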
  • 19. SimpleSnitch Places all nodes in the same DC & RACK (Default)
  • 20. Ec2Snitch/Ec2MultiRegionSnitch DC is set to the AWS Region and Rack to the Availability Zone
  • 21. PropertyFileSnitch Each node's DC and Rack are maintained in a property file
  • 22. GossipPropertyFileSnitch Uses GOSSIP as the first source for node info and falls back to the property file if that is not available
  • 23. The Client and the Coordinator Node 1 Node 3 Node 4 Node 2 ‘foo’ token 90 Client
  • 24. Multi DC Client and Coordinator Node 1 Node 3 Node 4 Node 2 ‘foo’ token 90 Client Node 10 Node 20
  • 25. Gossip Nodes share information with a small number of neighbours, who share information with another small number of neighbours … ● Used for intra-cluster communication ● Routes client requests ● Detects node failure ● Initial peers (seeds) are listed in the config file.
  • 26. Cassandra Objects ● CommitLog ● MemTable ● SSTable ● Index ● Bloom Filter
  • 27. Consistency ● CAP theorem ○ Trade consistency for availability ○ Consistency is a choice ● Under Partition: Consistency OR Availability * It doesn't matter if you are good at something, as long as you are consistent.
  • 28. Consistency Level ● Specified for each request ● Number of nodes to wait for
      WRITE
      Level           | Description
      ----------------+---------------------
      ZERO            | Cross fingers
      ANY             | 1st to Respond (HH)
      ONE, TWO, THREE | 1st to Respond
      QUORUM          | N/2+1 replicas
      ALL             | All replicas
      READ
      Level           | Description
      ----------------+---------------------
      ZERO            | N/A
      ANY             | N/A
      ONE, TWO, THREE | nth to Respond
      QUORUM*         | N/2+1
      ALL             | All replicas
      * QUORUM, LOCAL_QUORUM, EACH_QUORUM
  • 29. Write ‘foo’ at Quorum with Hinted Handoff Node 1 Node 3 is Down Node 4 holds ‘foo’ for node 3 Node 2 ‘foo’ token 90 Client
  • 30. Read ‘foo’ at Quorum Node 1 Node 3 is Down Node 4 holds ‘foo’ for node 3 Node 2 ‘foo’ token 90 Client
  • 31. Column TimeStamps Are used to resolve differences ● Stored for each Column Value ● 64bit Integers
      Column    | Node 1                    | Node 2                    | Node 3
      ----------+---------------------------+---------------------------+---------------------------
      Vegetable | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) | <missing>
      Fruit     | ‘Apple’ (timestamp 10)    | ‘banana’ (timestamp 15)   | ‘Apple’ (timestamp 10)
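A small Python sketch of timestamp-based resolution using the values from slide 31. It assumes the newest timestamp simply wins and that a missing replica value is ignored. The name `resolve` and the `(value, timestamp)` tuple layout are hypothetical, and tie-breaking between equal timestamps is not modelled.

```python
from typing import Optional, Sequence, Tuple

Cell = Tuple[str, int]  # (value, timestamp); hypothetical layout for this sketch


def resolve(replica_cells: Sequence[Optional[Cell]]) -> Optional[Cell]:
    """Return the cell with the highest timestamp, skipping missing replicas."""
    present = [cell for cell in replica_cells if cell is not None]
    if not present:
        return None
    return max(present, key=lambda cell: cell[1])


if __name__ == "__main__":
    # Values as held by Node 1, Node 2 and Node 3 on the slide.
    print(resolve([("cucumber", 10), ("cucumber", 10), None]))
    print(resolve([("Apple", 10), ("banana", 15), ("Apple", 10)]))
```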
  • 32. Strong Consistency W + R > N #Write Nodes + #Read Nodes > Replication Factor ● QUORUM Read + QUORUM Write ● ALL Read + ONE Write ● ONE Read + ALL Write
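The W + R > N rule can be checked with a few lines of Python. This is only arithmetic on the slide's formula, with hypothetical helper names `quorum` and `is_strongly_consistent`.

```python
def quorum(replication_factor: int) -> int:
    """QUORUM as defined on slide 28: N/2 + 1 replicas (integer division)."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_nodes: int, read_nodes: int,
                           replication_factor: int) -> bool:
    """W + R > N: every read set overlaps every write set in at least one replica."""
    return write_nodes + read_nodes > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print("QUORUM/QUORUM:", is_strongly_consistent(q, q, rf))
    print("ONE write / ALL read:", is_strongly_consistent(1, rf, rf))
    print("ONE/ONE:", is_strongly_consistent(1, 1, rf))
```

With RF 3, QUORUM is 2, so a QUORUM write and a QUORUM read always share at least one replica, while ONE + ONE does not satisfy the rule.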
  • 33. Achieving Consistency ● Consistency Level ● Hinted Handoff ● Read Repair ● Anti Entropy (User triggered Repairs)
  • 34. Write Path ● Append to Commit Log File ● Merge Columns into Memtable ● Asynchronously flush Memtable to a new file (Never update existing files) ● Data is stored in immutable files called SSTables (Sorted String Tables)
  • 36. Read Path
      Memory (per SSTable): Bloom Filter (cache), Index/Key Cache
      Disk:
        SSTable-1.Data.db foo: fruit (ts:10) apple, vegetable (ts:15) cucumber, …
        SSTable-1-Index.db, Bloom Filter
        SSTable-2.Data.db foo: fruit (ts:10) apple, vegetable (ts:10) Pepper, …
        SSTable-2-Index.db, Bloom Filter
  • 37. Compactions Compaction merges the truth from multiple SSTables into one SSTable with the same truth (run manually and as a continuous background process)
      Column    | SSTable 1                 | SSTable 2                 | New
      ----------+---------------------------+---------------------------+---------------------------
      Vegetable | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10)
      Fruit     | ‘Apple’ (timestamp 10)    | <tombstone> (timestamp 15) | <tombstone> (timestamp 15)
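A minimal Python sketch of the merge on slide 37, assuming each SSTable is modelled as a dict of column name to `(value, timestamp)` and that a delete is a tombstone marker with its own timestamp. `TOMBSTONE` and `compact` are hypothetical names, and the later purging of old tombstones is not modelled.

```python
# Sentinel standing in for a delete marker (hypothetical, for this sketch only).
TOMBSTONE = object()


def compact(*sstables: dict) -> dict:
    """Merge SSTables column by column, keeping the cell with the newest timestamp."""
    merged = {}
    for table in sstables:
        for column, (value, timestamp) in table.items():
            # Keep the incoming cell only if it is newer than what we hold.
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return merged


if __name__ == "__main__":
    sstable_1 = {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)}
    sstable_2 = {"Vegetable": ("cucumber", 10), "Fruit": (TOMBSTONE, 15)}
    new_sstable = compact(sstable_1, sstable_2)
    print(new_sstable["Vegetable"])
    print(new_sstable["Fruit"][0] is TOMBSTONE)
```

The inputs are never modified, which mirrors the write path rule that existing SSTable files are never updated in place.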
  • 39. Managing Cassandra ● Single configuration file /etc/cassandra/cassandra.yaml ● Single control command /usr/bin/nodetool ● Monitoring done by DataStax OpsCenter
  • 40. Troubleshooting Cassandra Always inspect these files: ● /var/log/cassandra/cassandra.log (Startup) ● /var/log/cassandra/system.log (Normal work)
  • 41. Backup Use Cassandra snapshots... And God said to Noah, Noah make me a backup ... 'cause I shall format
  • 42. Client (API) Choices ● Thrift, original and still fully supported API: ○ Java: Thrift, Hector, Astyanax, DataStax Driver, Kundera… ○ Python: Pycassa, Telephus, … ○ Ruby: Fauna ○ PHP: PHP Client Library ○ C# ○ Node.js ○ Go ○ Simba ODBC ○ C++: LibQtCassandra ○ ORM ○ …. ● CQL3: a table-oriented, schema-driven data model, similar to SQL
  • 43. CQL3 Create KeySpace ● Using CQL3 via the cqlsh command tool ($CASSANDRA_HOME/bin/cqlsh) ● Create a new Keyspace with a Replication factor of 3 and NetworkTopologyStrategy
      CREATE KEYSPACE kenshoo_cass_fans
        WITH replication = {'class':'NetworkTopologyStrategy', 'us_east_dc':3};
  • 44. CQL3 Working with Tables ● CQL3 Example ● Table is a sparse collection of well known ordered columns
      CREATE TABLE User (
        user_name text,
        password text,
        real_name text,
        PRIMARY KEY (user_name)
      );
      ---------------------------------------------------------
      INSERT INTO User (user_name, password, real_name)
        VALUES ('nader', 'sekr8t', 'MR NADER');
      ---------------------------------------------------------
      SELECT * FROM User WHERE user_name = 'nader';
      user_name| password | real_name
      ---------+----------+-----------
          nader|   sekr8t | MR NADER