Introduction to Apache Cassandra 
Luke Tillman (@LukeTillman) 
Language Evangelist at DataStax
Who are you?! 
•Evangelist with a focus on the .NET Community 
•Long-time Developer 
•Recently presented at Cassandra Summit 2014 with Microsoft 
•Very Recent Denver Transplant 
2
DataStax and Cassandra 
•DataStax Enterprise 
–Apache Cassandra, now with more QA! 
–Easy integrations with Solr, Apache Spark, Hadoop 
•Dev and Ops Tooling 
–DevCenter IDE, OpsCenter 
•Open source drivers 
–Java, C#, Python, C++, Ruby, NodeJS 
3
•Unlimited, free use of DataStax Enterprise 
•No limit on number of nodes or other hidden restrictions 
•If you’re a startup, it’s free. 
•Requirements: 
–< $2M annual revenue, < $20M capital raised 
4 
www.datastax.com/startups
1. What is Cassandra?
2. How does it work?
3. Cassandra Query Language (CQL)
4. Who’s using it?
5. Questions
5
What is Cassandra? 
6
What is Cassandra? 
•A Linearly Scaling and Fault Tolerant Distributed Database 
•Fully Distributed 
–Data spread over many nodes 
–All nodes participate in a cluster 
–All nodes are equal 
–No SPOF (shared nothing) 
7
What is Cassandra? 
•Linearly Scaling 
–Have More Data? Add more nodes. 
–Need More Throughput? Add more nodes. 
8 
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
What is Cassandra? 
•Fault Tolerant 
–Nodes Down != Database Down 
–Datacenter Down != Database Down 
9
What is Cassandra? 
•Fully Replicated 
•Clients write local 
•Data syncs across WAN 
•Replication Factor per DC 
10 
[Diagram: a Client and a cluster spanning US and Europe datacenters]
Cassandra and the CAP Theorem 
•The CAP Theorem limits what distributed systems can do 
•Consistency 
•Availability 
•Partition Tolerance 
•Limits? “Pick 2 out of 3” 
11
Cassandra and the CAP Theorem 
Consistency 
•When I ask the same question to any part of the system, I should get the same answer 
12 
Is he guilty yet? 
No. 
No. 
No. 
Consistent
Cassandra and the CAP Theorem 
Consistency 
•When I ask the same question to any part of the system, I should get the same answer 
13 
Is he guilty yet? 
No. 
Yes. 
Yes. 
Not Consistent
Cassandra and the CAP Theorem 
Availability 
•When I ask a question, I will get an answer 
14 
Is he guilty yet? 
Yes. 
Available
Cassandra and the CAP Theorem 
Availability 
•When I ask a question, I will get an answer 
15 
Is he guilty yet? 
I don’t know, we have to wait for Dreamy to wake up. 
Not Available
Cassandra and the CAP Theorem 
Partition Tolerance 
•I can ask questions even when the system is having intra-system communication problems. 
16 
Is he guilty yet? 
Tolerant 
No. 
Team Tyrion 
Team Cersei
Cassandra and the CAP Theorem 
Partition Tolerance 
•I can ask questions even when the system is having intra-system communication problems. 
17 
Is he guilty yet? 
Not Tolerant 
I’m not sure without asking them and we’re not speaking (I’m pretty sure that one helped kill my sister). 
Team Tyrion 
Team Cersei
Cassandra and the CAP Theorem 
•Cassandra is an AP system that is Eventually Consistent 
18 
Is he guilty yet? 
No. 
Wait, he’s going to take the black. Yes. 
No. 
Eventually Consistent
Cassandra and the CAP Theorem 
•Cassandra is an AP system that is Eventually Consistent 
19 
Is he guilty yet? 
Yes. 
Yes. 
Eventually Consistent 
Yes.
How does it work? 
20
Two knobs control Cassandra fault tolerance 
•Replication Factor (server side) 
–How many copies of the data should exist? 
21 
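[Diagram: a Client sends "Write A" to a four-node ring (A, B, C, D) with RF=3; each node is also labeled with two other ranges (B: AD, C: AB, A: CD, D: BC)]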
Two knobs control Cassandra fault tolerance 
•Consistency Level (client side) 
–How many replicas do we need to hear from before we acknowledge? 
22 
[Diagram: two copies of the four-node ring (A, B, C, D), each receiving "Write A" from a Client: one with CL=QUORUM, one with CL=ONE]
Consistency Levels 
•Applies to both Reads and Writes (i.e. is set on each query) 
•ONE – one replica from any DC 
•LOCAL_ONE – one replica from local DC 
•QUORUM – a majority of replicas (floor(RF/2) + 1) across all DCs 
•LOCAL_QUORUM – a majority of replicas in the local DC 
•ALL – all replicas 
•TWO – two replicas from any DC 
23
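Each level above comes down to a number of replicas the coordinator waits for. A minimal sketch of that arithmetic, assuming a single datacenter; `quorum` and `required_acks` are illustrative names, not part of any driver API:

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1


def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replicas the coordinator must hear from (single-datacenter view)."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[consistency_level]


for rf in (3, 5):
    for cl in ("ONE", "TWO", "QUORUM", "ALL"):
        print(f"RF={rf} CL={cl}: wait for {required_acks(cl, rf)} replica(s)")
```

With RF=3 a quorum is 2 replicas, and with RF=5 it is 3, which is why "a majority" is a more precise description than a fixed percentage.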
Consistency Level and Speed 
•How many replicas we need to hear from can affect how quickly we can read and write data in Cassandra 
24 
[Diagram: a Client sends "Read A" (CL=QUORUM) to the four-node ring (A, B, C, D); acks are labeled 5 μs, 300 μs, 12 μs and 12 μs]
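One way to see the effect on speed is to note that the coordinator only waits for as many replies as the consistency level requires. The sketch below is a simplification that assumes every replica is contacted in parallel; the function name and the reply times are hypothetical:

```python
def ack_latency_us(replica_latencies_us, required_acks):
    """Time until the coordinator has heard from `required_acks` replicas.

    With parallel requests, the wait equals the k-th fastest reply.
    """
    if not 1 <= required_acks <= len(replica_latencies_us):
        raise ValueError("required_acks must be between 1 and the replica count")
    return sorted(replica_latencies_us)[required_acks - 1]


# Hypothetical reply times (microseconds) for three replicas, one of them slow.
latencies = [12, 300, 12]
print(ack_latency_us(latencies, 1))  # CL=ONE    -> 12
print(ack_latency_us(latencies, 2))  # CL=QUORUM -> 12
print(ack_latency_us(latencies, 3))  # CL=ALL    -> 300
```

In this example only CL=ALL has to wait for the slow replica.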
Consistency Level and Availability 
•Consistency Level choice affects availability 
•For example, with RF=3, QUORUM can tolerate one replica being down and still be available 
25 
[Diagram: a Client sends "Read A" (CL=QUORUM) to the four-node ring (A, B, C, D); three A=2 labels appear on the ring]
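The availability trade-off can be written as simple arithmetic: a request succeeds while enough replicas are alive to meet the consistency level. A small sketch with illustrative function names:

```python
def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """How many replicas can be down while the request can still succeed."""
    return replication_factor - required_acks


def is_available(required_acks: int, live_replicas: int) -> bool:
    """True when enough replicas are alive to satisfy the consistency level."""
    return live_replicas >= required_acks


rf = 3
quorum = rf // 2 + 1  # 2 replicas when RF=3
print(tolerated_failures(rf, quorum))  # QUORUM -> 1
print(tolerated_failures(rf, 1))       # ONE    -> 2
print(tolerated_failures(rf, rf))      # ALL    -> 0
print(is_available(quorum, live_replicas=2))  # True: one replica down is fine
print(is_available(quorum, live_replicas=1))  # False: two replicas down
```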
Consistency Level and Eventual Consistency 
•Cassandra is an AP system that is Eventually Consistent so replicas may disagree 
•Column values are timestamped 
•In Cassandra, Last Write Wins (LWW) 
26 
[Diagram: a Client sends "Read A" (CL=QUORUM) to the four-node ring (A, B, C, D); two replicas hold A=2 (Newer) and one holds A=1 (Older)]
Christos from Netflix: “Eventual Consistency != Hopeful Consistency” https://www.youtube.com/watch?v=lwIA8tsDXXE
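Last Write Wins can be sketched as picking the response with the newest timestamp. This is a toy model, not Cassandra's internal code; the `Cell` type is hypothetical and ties between equal timestamps are not treated specially here:

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: int
    write_timestamp: int  # e.g. microseconds since the epoch


def reconcile(replica_responses):
    """Return the cell with the newest write timestamp (Last Write Wins)."""
    return max(replica_responses, key=lambda cell: cell.write_timestamp)


# Two replicas answer a QUORUM read for A; one still holds the older value.
older = Cell(value=1, write_timestamp=1_000)
newer = Cell(value=2, write_timestamp=2_000)
print(reconcile([older, newer]).value)  # 2
```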
Writes in the cluster 
•Fully distributed, no SPOF 
•The node that receives a request is the Coordinator for that request 
•Any node can act as Coordinator 
27 
[Diagram: a Client sends "Write A" (CL=ONE) to the four-node ring (A, B, C, D); the node that receives the request is marked Coordinator Node]
Writes in the cluster – Data Distribution 
•Partition Key determines node placement 
28 
Partition Key 
id='pmcfadin'  lastname='McFadin' 
id='jhaddad'  firstname='Jon'  lastname='Haddad' 
id='ltillman'  firstname='Luke'  lastname='Tillman' 
CREATE TABLE users ( 
id text, 
firstname text, 
lastname text, 
PRIMARY KEY (id) 
);
Writes in the cluster – Data Distribution 
•The Partition Key is hashed using a consistent hashing function (Murmur 3) and the output is used to place the data on a node 
•The data is also replicated to RF-1 other nodes 
29 
[Diagram: the row with Partition Key id='ltillman' (firstname='Luke', lastname='Tillman') is hashed with Murmur3 to "A" and placed on the four-node ring (A, B, C, D) with RF=3]
Hashing – Back to Reality 
•Back in reality, Partition Keys hash to large numbers: Murmur3 produces a 128-bit hash, and Cassandra's Murmur3Partitioner uses 64 bits of it as the token 
•Nodes in Cassandra own token ranges (i.e. hash ranges) 
30 
[Diagram: four-node ring (A, B, C, D); each node owns a token range]
Range  Start          End
A      0xC000000..1   0x0000000..0
B      0x0000000..1   0x4000000..0
C      0x4000000..1   0x8000000..0
D      0x8000000..1   0xC000000..0
Partition Key id='ltillman' hashes (Murmur3) to 0xadb95e99da887a8a4cb474db86eb5769
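Token ownership and replica placement can be sketched with a sorted list of tokens and a binary search. The sketch makes several assumptions that are not from the slides: tokens are 64-bit unsigned numbers, SHA-256 stands in for Murmur3 (which is not in the Python standard library), and replicas are the next nodes clockwise on the ring. `Ring`, `token_for` and `NODE_TOKENS` are hypothetical names.

```python
import bisect
import hashlib

# Last token owned by each node, mirroring four equal ranges.
# 64-bit unsigned tokens are an assumption made to keep the numbers readable.
NODE_TOKENS = {
    "A": 0x0000000000000000,
    "B": 0x4000000000000000,
    "C": 0x8000000000000000,
    "D": 0xC000000000000000,
}


def token_for(partition_key: str) -> int:
    """Stand-in hash (SHA-256 truncated to 64 bits); Cassandra itself uses Murmur3."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, node_tokens):
        entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in entries]
        self._nodes = [node for _, node in entries]

    def replicas_for_token(self, token: int, replication_factor: int):
        """Owner of the token's range, then the next nodes clockwise."""
        # First node whose end token is >= token; past the last one, wrap to index 0.
        start = bisect.bisect_left(self._tokens, token) % len(self._nodes)
        count = min(replication_factor, len(self._nodes))
        return [self._nodes[(start + i) % len(self._nodes)] for i in range(count)]

    def replicas(self, partition_key: str, replication_factor: int):
        return self.replicas_for_token(token_for(partition_key), replication_factor)


ring = Ring(NODE_TOKENS)
print(ring.replicas_for_token(0x4000000000000001, 3))  # ['C', 'D', 'A']
print(ring.replicas("ltillman", 3))  # three distinct nodes, chosen by the stand-in hash
```

A token just past B's end token falls in C's range, so with RF=3 the data lands on C and the next two nodes clockwise, D and A.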
Writes on a single node 
•Client makes a write request 
Client 
UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' 
Disk 
Memory
Writes on a single node 
•Data is appended to the Commit Log 
•Cassandra writes are FAST because storage is append-only (log-structured) 
Client 
UPDATE users 
SET firstname = 'Luke' 
WHERE id = 'ltillman' 
Commit Log 
id='ltillman', firstname='Luke' 
… 
… 
Disk 
Memory
Writes on a single node 
•Data is written to Memtable 
Client 
UPDATE users 
SET firstname = 'Luke' 
WHERE id = 'ltillman' 
Commit Log 
id='ltillman', firstname='Luke' 
… 
… 
Disk 
Memory 
Memtable for Users 
Some Other Memtable 
id='ltillman' 
firstname='Luke' 
lastname='Tillman'
Writes on a single node 
•Server acknowledges to client 
Client 
UPDATE users 
SET firstname = 'Luke' 
WHERE id = 'ltillman' 
Commit Log 
id='ltillman', firstname='Luke' 
… 
… 
Disk 
Memory 
Memtable for Users 
Some Other Memtable 
id='ltillman' 
firstname='Luke' 
lastname='Tillman'
Writes on a single node 
•Once Memtable is full, data is flushed to disk as SSTable (Sorted String Table) 
Client 
UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' 
Data Directory 
Disk 
Memory 
Memtable for Users 
Some Other Memtable 
id='ltillman' 
firstname='Luke' 
lastname='Tillman' 
Some Other SSTable 
SSTable #1 for Users 
SSTable #2 for Users
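The write steps above (commit log, memtable, acknowledge, flush) can be sketched as a toy in-memory model. Everything here is illustrative: the `Node` class, the row-count threshold and the inline flush are simplifying assumptions, and a real node flushes in the background rather than inside the write call.

```python
class Node:
    """Toy single-node write path: commit log, memtable, flush to SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # in-memory rows keyed by partition key
        self.sstables = []     # immutable, key-sorted snapshots of flushed memtables
        self.memtable_limit = memtable_limit  # hypothetical "full" threshold, in rows

    def write(self, key, columns):
        self.commit_log.append((key, dict(columns)))        # 1. append to the commit log
        self.memtable.setdefault(key, {}).update(columns)   # 2. apply to the memtable
        if len(self.memtable) >= self.memtable_limit:       # flush inline for simplicity
            self.flush()
        return "ack"                                        # 3. acknowledge to the client

    def flush(self):
        # Freeze the memtable into a tuple sorted by partition key.
        sstable = tuple(
            (key, tuple(sorted(cols.items()))) for key, cols in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying from the log


node = Node(memtable_limit=2)
node.write("ltillman", {"firstname": "Luke", "lastname": "Tillman"})
node.write("jhaddad", {"firstname": "Jon"})  # second row fills the memtable and triggers a flush
print(len(node.sstables), node.memtable)  # 1 {}
```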
Compaction 
•Compactions merge and unify data in our SSTables 
•SSTables are immutable, so this is when we consolidate rows 
36 
SSTable #1 for Users 
SSTable #2 for Users 
SSTable #3 for Users 
id='ltillman' 
firstname='Lucas' (timestamp=Older) 
lastname='Tillman' 
id='ltillman' 
firstname='Luke' 
lastname='Tillman' 
id='ltillman' 
firstname='Luke' (timestamp=Newer)
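Compaction can be pictured as a merge that keeps, for each column, the value with the newest timestamp. A minimal sketch under assumed data shapes (each SSTable is a dict of partition key to `{column: (value, timestamp)}`); deletes (tombstones) are not modeled:

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest value of every column."""
    merged = {}
    for sstable in sstables:
        for key, columns in sstable.items():
            row = merged.setdefault(key, {})
            for column, (value, timestamp) in columns.items():
                if column not in row or timestamp > row[column][1]:
                    row[column] = (value, timestamp)
    return merged


sstable_1 = {"ltillman": {"firstname": ("Lucas", 1), "lastname": ("Tillman", 1)}}
sstable_2 = {"ltillman": {"firstname": ("Luke", 2)}}

result = compact([sstable_1, sstable_2])
print(result["ltillman"]["firstname"][0])  # Luke
print(result["ltillman"]["lastname"][0])   # Tillman
```

The inputs are left untouched and a new merged table is produced, which matches the idea that SSTables are immutable.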
Reads in the cluster 
•As with writes in the cluster, reads are coordinated 
•Any node can be the Coordinator Node 
37 
[Diagram: a Client sends "Read A" (CL=QUORUM) to the four-node ring (A, B, C, D); the node that receives the request is marked Coordinator Node]
Reads on a single node 
•Client makes a read request 
38 
Client 
SELECT firstname, lastname FROM users WHERE id = 'ltillman' 
Disk 
Memory
Reads on a single node 
•Data is read from (possibly multiple) SSTables and merged 
•Reads in Cassandra are also FAST but are limited by Disk IO 
39 
Client 
SELECT firstname, lastname FROM users WHERE id = 'ltillman' 
Disk 
Memory 
SSTable #1 for Users 
id='ltillman' 
firstname='Lucas' (timestamp=Older) 
lastname='Tillman' 
SSTable #2 for Users 
id='ltillman' 
firstname='Luke' 
(timestamp=Newer) 
firstname='Luke' 
lastname='Tillman'
Reads on a single node 
•Any unflushed Memtable data is also merged 
40 
Client 
SELECT firstname, lastname 
FROM users 
WHERE id = 'ltillman' 
Disk 
Memory 
firstname='Luke' 
lastname='Tillman' 
Memtable for Users
Reads on a single node 
•Client gets acknowledgement with the data 
41 
Client 
SELECT firstname, lastname 
FROM users 
WHERE id = 'ltillman' 
Disk 
Memory 
firstname='Luke' 
lastname='Tillman'
Compaction - Revisited 
•Compactions merge and unify data in our SSTables, making them important to reads (fewer SSTables = less to read/merge) 
42 
SSTable #1 for Users 
SSTable #2 for Users 
SSTable #3 for Users 
id='ltillman' 
firstname='Lucas' (timestamp=Older) 
lastname='Tillman' 
id='ltillman' 
firstname='Luke' 
lastname='Tillman' 
id='ltillman' 
firstname='Luke' (timestamp=Newer)
Cassandra Query Language (CQL) 
43
Data Structures 
•Keyspace is like RDBMS Database or Schema 
•Like RDBMS, Cassandra uses Tables to store data 
•Partitions can have one row (narrow) or multiple rows (wide) 
44 
Keyspace 
Tables 
Partitions 
Rows
Schema Definition (DDL) 
•Easy to define tables for storing data 
•First part of Primary Key is the Partition Key 
CREATE TABLE videos ( 
videoid uuid, 
userid uuid, 
name text, 
description text, 
tags set<text>, 
added_date timestamp, 
PRIMARY KEY (videoid) 
);
Schema Definition (DDL) 
•One row per partition (familiar) 
CREATE TABLE videos ( 
videoid uuid, 
userid uuid, 
name text, 
description text, 
tags set<text>, 
added_date timestamp, 
PRIMARY KEY (videoid) 
); 
videoid        name                  ...
689d56e5- …    Keyboard Cat          ...
93357d73- …    Nyan Cat              ...
d978b136- …    Original Grumpy Cat   ...
Clustering Columns 
•Second part of Primary Key is Clustering Columns 
•Clustering columns affect ordering of data (on disk) 
•Multiple rows per partition 
47 
CREATE TABLE comments_by_video ( 
videoid uuid, 
commentid timeuuid, 
userid uuid, 
comment text, 
PRIMARY KEY (videoid, commentid) 
) WITH CLUSTERING ORDER BY (commentid DESC);
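The effect of a clustering column with descending order can be sketched as a partition that keeps its rows sorted as they are written. This is a toy model: the `Partition` class is hypothetical, and integer dates stand in for `timeuuid` values.

```python
import bisect


class Partition:
    """Rows of one partition, kept sorted by clustering value, newest first."""

    def __init__(self):
        self._keys = []  # negated clustering values; ascending here means DESC order
        self._rows = []

    def upsert(self, clustering_value, row):
        key = -clustering_value
        index = bisect.bisect_left(self._keys, key)
        if index < len(self._keys) and self._keys[index] == key:
            self._rows[index] = row          # same clustering value: overwrite
        else:
            self._keys.insert(index, key)    # new row: insert at its sorted position
            self._rows.insert(index, row)

    def rows(self, limit=None):
        return self._rows[:limit]


comments = Partition()  # stands for all comments of one videoid
comments.upsert(20140917, {"comment": "Garbage!"})
comments.upsert(20141001, {"comment": "Awesome!"})
print(comments.rows(limit=1))  # [{'comment': 'Awesome!'}]
```

Because rows are already stored newest first, reading the latest comments is a prefix read of the partition rather than a sort at query time.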
Clustering Columns – Wide Rows (Partitions) 
•Use of Clustering Columns is where the term “Wide Rows” comes from 
48 
videoid='0fe6a...' 
commentid='82be1...' (10/1/2014 9:36AM)  userid='ac346...'  comment='Awesome!' 
commentid='765ac...' (9/17/2014 7:55AM)  userid='f89d3...'  comment='Garbage!' 
CREATE TABLE comments_by_video ( 
videoid uuid, 
commentid timeuuid, 
userid uuid, 
comment text, 
PRIMARY KEY (videoid, commentid) 
) WITH CLUSTERING ORDER BY (commentid DESC);
Inserts and Updates 
•Use INSERT or UPDATE to add and modify data 
•Both will overwrite data (no constraints like RDBMS) 
•INSERT and UPDATE functionally equivalent 
49 
INSERT INTO comments_by_video (videoid, commentid, userid, comment) 
VALUES ('0fe6a...', '82be1...', 'ac346...', 'Awesome!'); 
UPDATE comments_by_video 
SET userid = 'ac346...', comment = 'Awesome!' 
WHERE videoid = '0fe6a...' AND commentid = '82be1...';
TTL and Deletes 
•Can specify a Time to Live (TTL) in seconds when doing an INSERT or UPDATE 
•Use DELETE statement to remove data 
•Can optionally specify columns to remove part of a row 
50 
INSERT INTO comments_by_video ( ... ) 
VALUES ( ... ) 
USING TTL 86400; 
DELETE FROM comments_by_video WHERE videoid = '0fe6a...' AND commentid = '82be1...';
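TTL behavior can be sketched as a value that carries its own expiry time. The `TtlCell` class below is hypothetical, and it only models the visible behavior (data stops being returned after the TTL), not how Cassandra stores or cleans up expired data:

```python
import time


class TtlCell:
    """A value that stops being readable once its TTL has elapsed."""

    def __init__(self, value, ttl_seconds=None, now=None):
        written_at = time.time() if now is None else now
        self.value = value
        # No TTL means the value never expires on its own.
        self.expires_at = None if ttl_seconds is None else written_at + ttl_seconds

    def is_live(self, now=None):
        current = time.time() if now is None else now
        return self.expires_at is None or current < self.expires_at


# Written at t=0 with a TTL of 86400 seconds (one day).
comment = TtlCell("Awesome!", ttl_seconds=86400, now=0)
print(comment.is_live(now=3600))   # True: one hour later
print(comment.is_live(now=86400))  # False: the day has elapsed
```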
Querying 
•Use SELECT to get data from your tables 
•Always include Partition Key and optionally Clustering Columns 
•Can use ORDER BY and LIMIT 
•Use range queries (for example, by date) to slice partitions 
51 
SELECT * FROM comments_by_video 
WHERE videoid = 'a67cd...' 
LIMIT 10;
Cassandra Data Modeling 
•Requires a different mindset than RDBMS modeling 
•Know your data and your queries up front 
•Queries drive a lot of the modeling decisions (i.e. “table per query” pattern) 
•Denormalize/Duplicate data at write time to do as few queries as possible come read time 
•Remember, disk is cheap and writes in Cassandra are FAST 
52
Cassandra Data Modeling – A Quick Example 
•Users are looked up by a unique id, but at login we need to look them up by email address 
•Some data is duplicated (email, userid) but that’s OK 
53 
CREATE TABLE users ( 
userid uuid, 
firstname text, 
lastname text, 
email text, 
PRIMARY KEY (userid) 
); 
CREATE TABLE users_by_email ( 
email text, 
password text, 
userid uuid, 
PRIMARY KEY (email) 
);
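The "table per query" idea can be sketched with two dictionaries standing in for the two tables: both are written when a user is created, so each lookup is a single key read. The function names are illustrative, and a real application would store a password hash rather than the raw password.

```python
import uuid

# In-memory stand-ins for the two tables; each dict key plays the partition key.
users = {}           # userid -> profile columns
users_by_email = {}  # email  -> login columns


def create_user(firstname, lastname, email, password):
    """Write to both tables at write time so each read needs a single lookup."""
    userid = uuid.uuid4()
    users[userid] = {
        "userid": userid,
        "firstname": firstname,
        "lastname": lastname,
        "email": email,
    }
    # Duplicated data (email, userid); a real app would store a password hash.
    users_by_email[email] = {"password": password, "userid": userid}
    return userid


def find_user_by_email(email):
    """Login path: one lookup by email, then one lookup by userid."""
    entry = users_by_email.get(email)
    if entry is None:
        return None
    return users[entry["userid"]]


new_id = create_user("Luke", "Tillman", "luke@example.com", "not-a-real-password")
print(find_user_by_email("luke@example.com")["firstname"])  # Luke
```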
Who’s using it? 
54
Cassandra Adoption
Some Common Use Case Categories 
•Product Catalogs and Playlists 
•Internet of Things (IoT) and Sensor Data 
•Messaging (emails, IMs, alerts, comments) 
•Recommendation and Personalization 
•Fraud Detection 
•Time series and temporal ordered data 
http://planetcassandra.org/apache-cassandra-use-cases/
The “Slide Heard Round the World” 
•From Cassandra Summit 2014, got a lot of attention 
•75,000+ nodes 
•10s of PBs of data 
•Millions of ops/s 
•One of the largest known Cassandra deployments 
57
Spotify 
•Streaming music web service 
•> 24,000,000 music tracks 
•> 50TB of data in Cassandra 
Why Cassandra? 
•Was PostgreSQL, but hit scaling problems 
•Multi Datacenter Availability 
•Integration with Spark for data processing and analytics 
Usage 
•Catalog 
•User playlists 
•Artists following 
•Radio Stations 
•Event notifications 
58 
http://planetcassandra.org/blog/interview/spotify-scales-to-the-top-of-the-charts-with-apache-cassandra-at-40k-requestssecond/
eBay 
•Online auction site 
•> 250TB of data, dozens of nodes, multiple data centers 
•> 6 billion writes, > 5 billion reads per day 
Why Cassandra? 
•Low latency, high scale, multiple data centers 
•Suited for graph structures using wide rows 
Usage 
•Building next generation of recommendation engine 
•Storing user activity data 
•Updating models of user interests in real time 
59 
http://planetcassandra.org/blog/5-minute-c-interview-ebay/
FullContact 
•Contact management: from multiple sources, sync, de-dupe, APIs available 
•2 clusters, dozens of nodes, running in AWS 
•Based here in Denver 
Why Cassandra? 
•Migrated from MongoDB after running into scaling issues 
•Operational simplicity 
•Resilience and Availability 
Usage 
•Person API (search by email, Twitter handle, Facebook, or phone) 
•Searched data from multiple sources (ingested by Hadoop M/R jobs) 
•Resolved profiles 
60 
http://planetcassandra.org/blog/fullcontact-readies-their-search-platform-to-scale-moves-from-mongodb-to-apache-cassandra/
Instagram 
•Photo-sharing, video-sharing and social networking service 
•Originally AWS (Now Facebook data centers?) 
•> 20k writes/second, >15k reads/second 
Why Cassandra? 
•Migrated from Redis (problems keeping everything in memory) 
•No painful “sharding” process 
•75% reduction in costs 
Usage 
•Auditing information – security, integrity, spam detection 
•News feed (“inboxes” or activity feed) 
–Likes, Follows, etc. 
61 
http://planetcassandra.org/blog/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings/ 
Summit 2014 Presentation: https://www.youtube.com/watch?v=_gc94ITUitY
Netflix 
•TV and Movie streaming service 
•> 2700 nodes on over 90 clusters 
•4 Datacenters 
•> 1 Trillion operations per day 
Why Cassandra? 
•Migrated from Oracle 
•Massive amounts of data 
•Multi datacenter, No SPOF 
•No downtime for schema changes 
Usage 
•Everything! (Almost – 95% of DB use) 
•Example: Personalization 
–What titles do you play? 
–What do you play before/after? 
–Where did you pause? 
–What did you abandon watching after 5 minutes? 
62 
http://planetcassandra.org/blog/case-study-netflix/ 
Summit 2014 Presentation: https://www.youtube.com/watch?v=RMSNLP_ORg8&index=43&list=UUvP-AXuCr-naAeEccCfKwUA
Questions? 
Follow me for updates or to ask questions later: @LukeTillman 
63

Mais conteúdo relacionado

Mais procurados

Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Cassandra 2012
Cassandra 2012Cassandra 2012
Cassandra 2012beobal
 
Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!Michaël Figuière
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Boris Yen
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyBenjamin Black
 
Cassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsCassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsgrro
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast DataPatrick McFadin
 
Big data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplBig data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplDuyhai Doan
 
Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3DataStax
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...DataStax
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectMorningstar Tech Talks
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathJoshua McKenzie
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsJulien Anguenot
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXzznate
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecturenickmbailey
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraGokhan Atil
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper Omid Vahdaty
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayReal-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayAltinity Ltd
 

Mais procurados (20)

Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Cassandra 2012
Cassandra 2012Cassandra 2012
Cassandra 2012
 
Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
 
ION Bangladesh - DANE, DNSSEC, and TLS Testing in the Go6lab
ION Bangladesh - DANE, DNSSEC, and TLS Testing in the Go6labION Bangladesh - DANE, DNSSEC, and TLS Testing in the Go6lab
ION Bangladesh - DANE, DNSSEC, and TLS Testing in the Go6lab
 
Cassandra NoSQL
Cassandra NoSQLCassandra NoSQL
Cassandra NoSQL
 
Cassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requestsCassandra by example - the path of read and write requests
Cassandra by example - the path of read and write requests
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Big data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplBig data 101 for beginners devoxxpl
Big data 101 for beginners devoxxpl
 
Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write path
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayReal-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
 

Destaque

From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...Luke Tillman
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraLuke Tillman
 
Building your First Application with Cassandra
Building your First Application with CassandraBuilding your First Application with Cassandra
Building your First Application with CassandraLuke Tillman
 
Getting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for CassandraGetting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for CassandraLuke Tillman
 
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Luke Tillman
 
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and CassandraAvoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and CassandraLuke Tillman
 
A Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersA Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersLuke Tillman
 
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)Luke Tillman
 

Destaque (9)

From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand...
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Building your First Application with Cassandra
Building your First Application with CassandraBuilding your First Application with Cassandra
Building your First Application with Cassandra
 
Getting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for CassandraGetting started with DataStax .NET Driver for Cassandra
Getting started with DataStax .NET Driver for Cassandra
 
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)
 
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and CassandraAvoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra
 
A Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersA Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET Developers
 
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015)
 
Cassandra - lesson learned
Cassandra  - lesson learnedCassandra  - lesson learned
Cassandra - lesson learned
 

Semelhante a Introduction to Apache Cassandra

Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Russell Spitzer
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Sparknickmbailey
 
Cassandra - decentralized structured database
Cassandra - decentralized structured databaseCassandra - decentralized structured database
Cassandra - decentralized structured databaseHuynh Thai Bao
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestDuyhai Doan
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterDataStax Academy
 
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J..."Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...Dataconomy Media
 
Always On: Building Highly Available Applications on Cassandra

Introduction to Apache Cassandra

  • 1. Introduction to Apache Cassandra Luke Tillman (@LukeTillman) Language Evangelist at DataStax
  • 2. Who are you?! •Evangelist with a focus on the .NET Community •Long-time Developer •Recently presented at Cassandra Summit 2014 with Microsoft •Very Recent Denver Transplant 2
  • 3. DataStax and Cassandra •DataStax Enterprise –Apache Cassandra, now with more QA! –Easy integrations with Solr, Apache Spark, Hadoop •Dev and Ops Tooling –DevCenter IDE, OpsCenter •Open source drivers –Java, C#, Python, C++, Ruby, NodeJS 3
  • 4. •Unlimited, free use of DataStax Enterprise •No limit on number of nodes or other hidden restrictions •If you’re a startup, it’s free. •Requirements: –< $2M annual revenue, < $20M capital raised 4 www.datastax.com/startups
  • 5. 1 What is Cassandra? 2 How does it work? 3 Cassandra Query Language (CQL) 4 Who’s using it? 5 Questions 5
  • 7. What is Cassandra? •A Linearly Scaling and Fault Tolerant Distributed Database •Fully Distributed –Data spread over many nodes –All nodes participate in a cluster –All nodes are equal –No SPOF (shared nothing) 7
  • 8. What is Cassandra? •Linearly Scaling –Have More Data? Add more nodes. –Need More Throughput? Add more nodes. 8 http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  • 9. What is Cassandra? •Fault Tolerant –Nodes Down != Database Down –Datacenter Down != Database Down 9
  • 10. What is Cassandra? •Fully Replicated •Clients write local •Data syncs across WAN •Replication Factor per DC 10 US Europe Client
  • 11. Cassandra and the CAP Theorem •The CAP Theorem limits what distributed systems can do •Consistency •Availability •Partition Tolerance •Limits? “Pick 2 out of 3” 11
  • 12. Cassandra and the CAP Theorem Consistency •When I ask the same question to any part of the system, I should get the same answer 12 Is he guilty yet? No. No. No. Consistent
  • 13. Cassandra and the CAP Theorem Consistency •When I ask the same question to any part of the system, I should get the same answer 13 Is he guilty yet? No. Yes. Yes. Not Consistent
  • 14. Cassandra and the CAP Theorem Availability •When I ask a question, I will get an answer 14 Is he guilty yet? Yes. Available
  • 15. Cassandra and the CAP Theorem Availability •When I ask a question, I will get an answer 15 Is he guilty yet? I don’t know, we have to wait for Dreamy to wake up. Not Available
  • 16. Cassandra and the CAP Theorem Partition Tolerance •I can ask questions even when the system is having intra-system communication problems. 16 Is he guilty yet? Tolerant No. Team Tyrion Team Cersei
  • 17. Cassandra and the CAP Theorem Partition Tolerance •I can ask questions even when the system is having intra-system communication problems. 17 Is he guilty yet? Not Tolerant I’m not sure without asking them and we’re not speaking (I’m pretty sure that one helped kill my sister). Team Tyrion Team Cersei
  • 18. Cassandra and the CAP Theorem •Cassandra is an AP system that is Eventually Consistent 18 Is he guilty yet? No. Wait, he’s going to take the black. Yes. No. Eventually Consistent
  • 19. Cassandra and the CAP Theorem •Cassandra is an AP system that is Eventually Consistent 19 Is he guilty yet? Yes. Yes. Eventually Consistent Yes.
  • 20. How does it work? 20
  • 21. Two knobs control Cassandra fault tolerance •Replication Factor (server side) –How many copies of the data should exist? 21 Client B AD C AB A CD D BC Write A RF=3
  • 22. Two knobs control Cassandra fault tolerance •Consistency Level (client side) –How many replicas do we need to hear from before we acknowledge? 22 Client B AD C AB A CD D BC Write A CL=QUORUM Client B AD C AB A CD D BC Write A CL=ONE
  • 23. Consistency Levels •Applies to both Reads and Writes (i.e. is set on each query) •ONE – one replica from any DC •LOCAL_ONE – one replica from local DC •QUORUM – a majority of replicas (floor(RF/2) + 1) from any DC •LOCAL_QUORUM – a majority of replicas from local DC •ALL – all replicas •TWO – two replicas from any DC 23
  • 24. Consistency Level and Speed •How many replicas we need to hear from can affect how quickly we can read and write data in Cassandra 24 Client B AD C AB A CD D BC 5 μs ack 300 μs ack 12 μs ack 12 μs ack Read A (CL=QUORUM)
  • 25. Consistency Level and Availability •Consistency Level choice affects availability •For example, QUORUM can tolerate one replica being down and still be available (in RF=3) 25 Client B AD C AB A CD D BC A=2 A=2 A=2 Read A (CL=QUORUM)
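The relationship between Replication Factor, Consistency Level and availability can be sketched in a few lines of Python. This is a minimal illustration, not driver code: the function names are hypothetical, only a single datacenter is modelled (so the LOCAL_ variants are left out), and a quorum is taken to be a simple majority, floor(RF / 2) + 1.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replica responses needed before the coordinator can answer the client."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def is_available(consistency_level: str, replication_factor: int, replicas_down: int) -> bool:
    """True when enough replicas are still up to satisfy the consistency level."""
    replicas_up = replication_factor - replicas_down
    return replicas_up >= required_acks(consistency_level, replication_factor)


if __name__ == "__main__":
    # With RF=3, show which levels survive 0, 1, 2 or 3 replicas being down.
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, [is_available(level, 3, down) for down in range(4)])
```

With RF=3, QUORUM needs two responses, so it stays available with one replica down but not with two, which matches the example on the slide.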
  • 26. Consistency Level and Eventual Consistency •Cassandra is an AP system that is Eventually Consistent so replicas may disagree •Column values are timestamped •In Cassandra, Last Write Wins (LWW) 26 Client B AD C AB A CD D BC A=2 Newer A=1 Older A=2 Read A (CL=QUORUM) Christos from Netflix: “Eventual Consistency != Hopeful Consistency” https://www.youtube.com/watch?v=lwIA8tsDXXE
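Last Write Wins can be illustrated with a small Python sketch. The `ReplicaResponse` type, its field names and the integer timestamps are assumptions made for this example; they are not part of any Cassandra API. The idea is simply that when replicas disagree, the value carrying the newest write timestamp is the one returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaResponse:
    replica: str
    value: int
    write_timestamp: int  # hypothetical timestamp attached to the write


def resolve_last_write_wins(responses):
    """Return the response with the newest write timestamp."""
    return max(responses, key=lambda r: r.write_timestamp)


def stale_replicas(responses):
    """Names of replicas whose value is older than the winning write."""
    winner = resolve_last_write_wins(responses)
    return [r.replica for r in responses if r.write_timestamp < winner.write_timestamp]


if __name__ == "__main__":
    # Mirrors the slide: two replicas hold A=2 (newer), one holds A=1 (older).
    responses = [
        ReplicaResponse("A", value=2, write_timestamp=200),
        ReplicaResponse("B", value=1, write_timestamp=100),
        ReplicaResponse("C", value=2, write_timestamp=200),
    ]
    print(resolve_last_write_wins(responses).value)
    print(stale_replicas(responses))
```

The list returned by `stale_replicas` shows which replicas are behind and would need to receive the newer value to converge.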
  • 27. Writes in the cluster •Fully distributed, no SPOF •Node that receives a request is the Coordinator for request •Any node can act as Coordinator 27 Client B AD C AB A CD D BC Write A (CL=ONE) Coordinator Node
  • 28. Writes in the cluster – Data Distribution •Partition Key determines node placement 28 Partition Key id='pmcfadin' lastname='McFadin' id='jhaddad' firstname='Jon' lastname='Haddad' id='ltillman' firstname='Luke' lastname='Tillman' CREATE TABLE users ( id text, firstname text, lastname text, PRIMARY KEY (id) );
  • 29. Writes in the cluster – Data Distribution •The Partition Key is hashed using a consistent hashing function (Murmur 3) and the output is used to place the data on a node •The data is also replicated to RF-1 other nodes 29 Partition Key id='ltillman' firstname='Luke' lastname='Tillman' Murmur3 id: ltillman Murmur3: A B AD C AB A CD D BC RF=3
  • 30. Hashing – Back to Reality •Back in reality, Partition Keys actually hash to 128 bit numbers •Nodes in Cassandra own token ranges (i.e. hash ranges) 30 B AD C AB A CD D BC Range Start End A 0xC000000..1 0x0000000..0 B 0x0000000..1 0x4000000..0 C 0x4000000..1 0x8000000..0 D 0x8000000..1 0xC000000..0 Partition Key id='ltillman' Murmur3 0xadb95e99da887a8a4cb474db86eb5769
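The token-range lookup can be sketched as follows, using the four-node ring and the example hash value shown on the slide. Two things here are assumptions for illustration: the elided range boundaries are read as 0x00, 0x40, 0x80 and 0xC0 followed by zeros in a 128-bit space, and replicas are placed on the next RF-1 nodes clockwise around the ring, which is only one possible placement strategy.

```python
from bisect import bisect_left

# End token of each node's range, read from the slide's four-node example.
# Node A owns the range that wraps around past the largest token.
RING = [
    (0x00 << 120, "A"),
    (0x40 << 120, "B"),
    (0x80 << 120, "C"),
    (0xC0 << 120, "D"),
]
END_TOKENS = [token for token, _ in RING]

# Hash shown on the slide for the partition key id='ltillman'.
SLIDE_TOKEN = 0xADB95E99DA887A8A4CB474DB86EB5769


def replicas_for(token: int, replication_factor: int = 3):
    """Owner of the token plus the next RF-1 nodes clockwise around the ring."""
    # First range whose end token is >= the token; wrap to node A past the last one.
    start = bisect_left(END_TOKENS, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


if __name__ == "__main__":
    print(replicas_for(SLIDE_TOKEN))  # ['D', 'A', 'B']
```

Because the node list is sorted by end token, finding the owner is a binary search rather than a scan, and adding a node only changes ownership of the range it splits.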
  • 31. Writes on a single node •Client makes a write request Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Disk Memory
  • 32. Writes on a single node •Data is appended to the Commit Log •Cassandra writes are FAST due to log appended storage Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Commit Log id='ltillman', firstname='Luke' … … Disk Memory
  • 33. Writes on a single node •Data is written to Memtable Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Commit Log id='ltillman', firstname='Luke' … … Disk Memory Memtable for Users Some Other Memtable id='ltillman' firstname='Luke' lastname='Tillman'
  • 34. Writes on a single node •Server acknowledges to client Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Commit Log id='ltillman', firstname='Luke' … … Disk Memory Memtable for Users Some Other Memtable id='ltillman' firstname='Luke' lastname='Tillman'
  • 35. Writes on a single node •Once Memtable is full, data is flushed to disk as SSTable (Sorted String Table) Client UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman' Data Directory Disk Memory Memtable for Users Some Other Memtable id='ltillman' firstname='Luke' lastname='Tillman' Some Other SSTable SSTable #1 for Users SSTable #2 for Users
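The single-node write steps above (append to the commit log, update the Memtable, acknowledge, flush to an SSTable when full) can be sketched with a toy Python class. `ToyNode` and `memtable_limit` are hypothetical names; lists and dicts stand in for files on disk, and the flush runs inline here for simplicity rather than in the background.

```python
class ToyNode:
    """Toy single-node write path: commit log append, memtable update, flush."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # in-memory rows keyed by partition key
        self.sstables = []     # flushed snapshots standing in for on-disk files
        self.memtable_limit = memtable_limit

    def write(self, key: str, columns: dict) -> str:
        self.commit_log.append((key, dict(columns)))        # 1. sequential append
        self.memtable.setdefault(key, {}).update(columns)   # 2. update memory
        if len(self.memtable) >= self.memtable_limit:        # memtable is "full"
            self.flush()
        return "ack"                                          # 3. acknowledge

    def flush(self) -> None:
        # Rows are written out sorted by key; the snapshot is never modified again.
        snapshot = tuple(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.sstables.append(snapshot)
        self.memtable = {}


if __name__ == "__main__":
    node = ToyNode(memtable_limit=2)
    node.write("ltillman", {"firstname": "Luke", "lastname": "Tillman"})
    node.write("jhaddad", {"firstname": "Jon", "lastname": "Haddad"})
    print(len(node.commit_log), len(node.sstables), node.memtable)
```

No step in `write` reads or rewrites existing data, which is the property the slides credit for fast writes.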
  • 36. Compaction •Compactions merge and unify data in our SSTables •SSTables are immutable, so this is when we consolidate rows 36 SSTable #1 for Users SSTable #2 for Users SSTable #3 for Users id='ltillman' firstname='Lucas' (timestamp=Older) lastname='Tillman' id='ltillman' firstname='Luke' lastname='Tillman' id='ltillman' firstname='Luke' (timestamp=Newer)
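A compaction merge can be sketched in Python using the rows from the slide. The representation is an assumption made for this example: each SSTable is a dict mapping a row key to its columns, and each column holds a `(value, timestamp)` pair. Deletes are not modelled.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest cell for every (row, column)."""
    merged = {}
    for sstable in sstables:
        for key, cells in sstable.items():
            row = merged.setdefault(key, {})
            for column, (value, timestamp) in cells.items():
                # Keep the cell only if it is new or strictly newer than what we hold.
                if column not in row or timestamp > row[column][1]:
                    row[column] = (value, timestamp)
    return merged


if __name__ == "__main__":
    sstable_1 = {"ltillman": {"firstname": ("Lucas", 100), "lastname": ("Tillman", 100)}}
    sstable_2 = {"ltillman": {"firstname": ("Luke", 200)}}
    print(compact([sstable_1, sstable_2]))
```

The inputs are left untouched and a new merged table is produced, which mirrors the point that SSTables are immutable and rows are only consolidated at compaction time.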
  • 37. Reads in the cluster •Same as writes in the cluster, reads are coordinated •Any node can be the Coordinator Node 37 Client B AD C AB A CD D BC Read A (CL=QUORUM) Coordinator Node
  • 38. Reads on a single node •Client makes a read request 38 Client SELECT firstname, lastname FROM users WHERE id = 'ltillman' Disk Memory
  • 39. Reads on a single node •Data is read from (possibly multiple) SSTables and merged •Reads in Cassandra are also FAST but are limited by Disk IO 39 Client SELECT firstname, lastname FROM users WHERE id = 'ltillman' Disk Memory SSTable #1 for Users id='ltillman' firstname='Lucas' (timestamp=Older) lastname='Tillman' SSTable #2 for Users id='ltillman' firstname='Luke' (timestamp=Newer) firstname='Luke' lastname='Tillman'
  • 40. Reads on a single node •Any unflushed Memtable data is also merged 40 Client SELECT firstname, lastname FROM users WHERE id = 'ltillman' Disk Memory firstname='Luke' lastname='Tillman' Memtable for Users
  • 41. Reads on a single node •Client gets acknowledgement with the data 41 Client SELECT firstname, lastname FROM users WHERE id = 'ltillman' Disk Memory firstname='Luke' lastname='Tillman'
  • 42. Compaction - Revisited •Compactions merge and unify data in our SSTables, making them important to reads (fewer SSTables = less to read/merge) 42 SSTable #1 for Users SSTable #2 for Users SSTable #3 for Users id='ltillman' firstname='Lucas' (timestamp=Older) lastname='Tillman' id='ltillman' firstname='Luke' lastname='Tillman' id='ltillman' firstname='Luke' (timestamp=Newer)
  • 44. Data Structures •Keyspace is like RDBMS Database or Schema •Like RDBMS, Cassandra uses Tables to store data •Partitions can have one row (narrow) or multiple rows (wide) 44 Keyspace Tables Partitions Rows
  • 45. Schema Definition (DDL) •Easy to define tables for storing data •First part of Primary Key is the Partition Key CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 46. Schema Definition (DDL) •One row per partition (familiar) CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); name ... Keyboard Cat ... Nyan Cat ... Original Grumpy Cat ... videoid 689d56e5- … 93357d73- … d978b136- …
  • 47. Clustering Columns •Second part of Primary Key is Clustering Columns •Clustering columns affect ordering of data (on disk) •Multiple rows per partition 47 CREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);
  • 48. Clustering Columns – Wide Rows (Partitions) •Use of Clustering Columns is where the term “Wide Rows” comes from 48 videoid='0fe6a...' userid= 'ac346...' comment= 'Awesome!' commentid='82be1...' (10/1/2014 9:36AM) userid= 'f89d3...' comment= 'Garbage!' commentid='765ac...' (9/17/2014 7:55AM) CREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);
  • 49. Inserts and Updates •Use INSERT or UPDATE to add and modify data •Both will overwrite data (no constraints like RDBMS) •INSERT and UPDATE functionally equivalent 49 INSERT INTO comments_by_video ( videoid, commentid, userid, comment) VALUES ( '0fe6a...', '82be1...', 'ac346...', 'Awesome!'); UPDATE comments_by_video SET userid = 'ac346...', comment = 'Awesome!' WHERE videoid = '0fe6a...' AND commentid = '82be1...';
  • 50. TTL and Deletes •Can specify a Time to Live (TTL) in seconds when doing an INSERT or UPDATE •Use DELETE statement to remove data •Can optionally specify columns to remove part of a row 50 INSERT INTO comments_by_video ( ... ) VALUES ( ... ) USING TTL 86400; DELETE FROM comments_by_video WHERE videoid = '0fe6a...' AND commentid = '82be1...';
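The effect of a TTL can be illustrated with a small in-memory sketch. `TtlStore` is a hypothetical class, not how Cassandra stores expiring data internally; it only shows the visible behaviour, namely that a value written with a TTL stops being returned once that many seconds have passed. The clock is injectable so the example does not have to wait a day.

```python
import time


class TtlStore:
    """Toy key-value store where a write may carry a time to live in seconds."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._rows = {}  # key -> (value, expires_at or None)

    def insert(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._rows[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self._rows.get(key, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            return None  # expired data is treated as if it had been deleted
        return value

    def delete(self, key):
        self._rows.pop(key, None)


if __name__ == "__main__":
    now = [0]
    store = TtlStore(clock=lambda: now[0])
    store.insert("82be1...", "Awesome!", ttl_seconds=86400)
    print(store.get("82be1..."))
    now[0] = 86400  # one day later
    print(store.get("82be1..."))
```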
  • 51. Querying •Use SELECT to get data from your tables •Always include Partition Key and optionally Clustering Columns •Can use ORDER BY and LIMIT •Use range queries (for example, by date) to slice partitions 51 SELECT * FROM comments_by_video WHERE videoid = 'a67cd...' LIMIT 10;
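The behaviour of `comments_by_video` can be modelled in memory to show how the partition key, clustering column and LIMIT interact. This is a sketch under assumptions: `CommentsByVideo` is a hypothetical class, plain integers stand in for timeuuid values, and rows are sorted at read time rather than stored in order on disk.

```python
from collections import defaultdict


class CommentsByVideo:
    """Toy model of a table with PRIMARY KEY (videoid, commentid), newest first."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # videoid -> {commentid: row}

    def upsert(self, videoid, commentid, userid, comment):
        # Writing an existing primary key overwrites it, for INSERT and UPDATE alike.
        self._partitions[videoid][commentid] = {"userid": userid, "comment": comment}

    def select(self, videoid, limit=None):
        # Only one partition is touched; rows come back in clustering order (DESC).
        rows = self._partitions.get(videoid, {})
        ordered = sorted(rows.items(), key=lambda item: item[0], reverse=True)
        return ordered[:limit]


if __name__ == "__main__":
    table = CommentsByVideo()
    table.upsert("0fe6a...", 1, "f89d3...", "Garbage!")
    table.upsert("0fe6a...", 2, "ac346...", "Awesome!")
    print(table.select("0fe6a...", limit=1))
```

Every read names a single partition, so the query never has to look beyond the nodes that own that `videoid`.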
  • 52. Cassandra Data Modeling •Requires a different mindset than RDBMS modeling •Know your data and your queries up front •Queries drive a lot of the modeling decisions (i.e. “table per query” pattern) •Denormalize/Duplicate data at write time to do as few queries as possible come read time •Remember, disk is cheap and writes in Cassandra are FAST 52
  • 53. Cassandra Data Modeling – A Quick Example •Users need to be looked up by a unique Id, but when logging in, need to look them up by email address •Some data is duplicated (email, userid) but that’s OK 53 CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, PRIMARY KEY (userid) ); CREATE TABLE users_by_email ( email text, password text, userid uuid, PRIMARY KEY (email) );
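The "table per query" pattern from this example can be sketched with two Python dicts standing in for the `users` and `users_by_email` tables. The `register` and `login` functions are hypothetical, and the password check is a plain comparison purely for illustration; a real application would use a proper password-hashing scheme.

```python
import uuid

users = {}           # userid -> profile row (lookup by unique id)
users_by_email = {}  # email -> login row (lookup at login time)


def register(firstname, lastname, email, password_hash):
    """Write the same user to both tables so each query hits one partition."""
    userid = uuid.uuid4()
    users[userid] = {"firstname": firstname, "lastname": lastname, "email": email}
    users_by_email[email] = {"password": password_hash, "userid": userid}
    return userid


def login(email, password_hash):
    """Look up by email first, then fetch the profile by userid."""
    row = users_by_email.get(email)
    if row is None or row["password"] != password_hash:
        return None
    return users[row["userid"]]


if __name__ == "__main__":
    register("Luke", "Tillman", "luke@example.com", "hypothetical-hash")
    print(login("luke@example.com", "hypothetical-hash"))
```

The duplication of `email` and `userid` is paid once at write time; in exchange, neither lookup needs a scan or a join.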
  • 56. Some Common Use Case Categories •Product Catalogs and Playlists •Internet of Things (IoT) and Sensor Data •Messaging (emails, IMs, alerts, comments) •Recommendation and Personalization •Fraud Detection •Time series and temporal ordered data http://planetcassandra.org/apache-cassandra-use-cases/
  • 57. The “Slide Heard Round the World” •From Cassandra Summit 2014, got a lot of attention •75,000+ nodes •10s of PBs of data •Millions ops/s •One of the largest known Cassandra deployments 57
  • 58. Spotify •Streaming music web service •> 24,000,000 music tracks •> 50TB of data in Cassandra Why Cassandra? •Was PostgreSQL, but hit scaling problems •Multi Datacenter Availability •Integration with Spark for data processing and analytics Usage •Catalog •User playlists •Artists following •Radio Stations •Event notifications 58 http://planetcassandra.org/blog/interview/spotify-scales-to-the-top-of-the-charts-with-apache-cassandra-at-40k-requestssecond/
  • 59. eBay •Online auction site •> 250TB of data, dozens of nodes, multiple data centers •> 6 billion writes, > 5 billion reads per day Why Cassandra? •Low latency, high scale, multiple data centers •Suited for graph structures using wide rows Usage •Building next generation of recommendation engine •Storing user activity data •Updating models of user interests in real time 59 http://planetcassandra.org/blog/5-minute-c-interview-ebay/
  • 60. FullContact •Contact management: from multiple sources, sync, de-dupe, APIs available •2 clusters, dozens of nodes, running in AWS •Based here in Denver Why Cassandra? •Migrated from MongoDB after running into scaling issues •Operational simplicity •Resilience and Availability Usage •Person API (search by email, Twitter handle, Facebook, or phone) •Searched data from multiple sources (ingested by Hadoop M/R jobs) •Resolved profiles 60 http://planetcassandra.org/blog/fullcontact-readies-their-search-platform-to-scale-moves-from-mongodb-to-apache-cassandra/
  • 61. Instagram •Photo-sharing, video-sharing and social networking service •Originally AWS (Now Facebook data centers?) •> 20k writes/second, >15k reads/second Why Cassandra? •Migrated from Redis (problems keeping everything in memory) •No painful “sharding” process •75% reduction in costs Usage •Auditing information – security, integrity, spam detection •News feed (“inboxes” or activity feed) –Likes, Follows, etc. 61 http://planetcassandra.org/blog/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings/ Summit 2014 Presentation: https://www.youtube.com/watch?v=_gc94ITUitY
  • 62. Netflix •TV and Movie streaming service •> 2700+ nodes on over 90 clusters •4 Datacenters •> 1 Trillion operations per day Why Cassandra? •Migrated from Oracle •Massive amounts of data •Multi datacenter, No SPOF •No downtime for schema changes Usage •Everything! (Almost – 95% of DB use) •Example: Personalization –What titles do you play? –What do you play before/after? –Where did you pause? –What did you abandon watching after 5 minutes? 62 http://planetcassandra.org/blog/case-study-netflix/ Summit 2014 Presentation: https://www.youtube.com/watch?v=RMSNLP_ORg8&index=43&list=UUvP-AXuCr-naAeEccCfKwUA
  • 63. Questions? Follow me for updates or to ask questions later: @LukeTillman 63