SlideShare uma empresa Scribd logo
1 de 62
Baixar para ler offline
Introduction to Cassandra
Nick Bailey
@nickmbailey

Monday, October 28, 13
Who am I?
©2012 DataStax
Monday, October 28, 13

2
What’s DataStax?
©2012 DataStax
Monday, October 28, 13

3
On to the good stuff!
©2012 DataStax
Monday, October 28, 13

4
Why Cassandra?
Cluster Architecture
Node Architecture
5

Data Modeling
Wrap up
©2012 DataStax
Monday, October 28, 13
Why Cassandra?
©2012 DataStax
Monday, October 28, 13

6
Time for buzz
words!

©2012 DataStax
Monday, October 28, 13

Big Data!
NoSQL!

7
Big Data
• Gartner: “...high-volume, high-velocity and
high-variety...”

• 2 sides of ‘big data’
•
•

©2012 DataStax
Monday, October 28, 13

Analytics
Real-time

8
NoSQL
• A terrible label
• Covers a wide range of DBs
•
•
•
•
•

©2012 DataStax
Monday, October 28, 13

Cassandra
Redis
MongoDB
HBase
...

9
Started by Facebook

©2012 DataStax
Monday, October 28, 13

10
Dynamo (Amazon)
+
Big Table (Google)

©2012 DataStax
Monday, October 28, 13

11
©2012 DataStax
Monday, October 28, 13

12
Cassandra is great for...
• Massive, linear scaling

(e.g. CERN hadron collider, Barracuda Networks)

• Extremely heavy writes

(e.g. BlueMountain Capital – financial tick data)

• High availability

(e.g. eBay, Eventbrite, Netflix, SoundCloud,
HeathCare Anytime, Comcast, GoDaddy, Sony
Entertainment Network)

©2012 DataStax
Monday, October 28, 13

13
©2012 DataStax
Monday, October 28, 13

14
©2012 DataStax
Monday, October 28, 13

15
http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html
©2012 DataStax
Monday, October 28, 13

16
9
One size does not fit all
Polyglot persistence

©2012 DataStax
Monday, October 28, 13

17
More Resources
• PlanetCassandra.org
• Blog
• 5 minute interviews

©2012 DataStax
Monday, October 28, 13

18
Cluster Architecture
©2012 DataStax
Monday, October 28, 13

19
Data Distribution
0

75

25

50
Hash_Function(Partition Key) >> Token
©2012 DataStax
Monday, October 28, 13
Replication

©2012 DataStax
Monday, October 28, 13
Failure Modes

©2012 DataStax
Monday, October 28, 13
Consistency Level
• Multiple options
•
•
•
•
•

ONE
QUORUM
ALL
LOCAL_QUORUM
...

• Can be specified per request

©2012 DataStax
Monday, October 28, 13

23
Quorum

©2012 DataStax
Monday, October 28, 13
Quorum

©2012 DataStax
Monday, October 28, 13
Consistency
Write
CL: ONE

©2012 DataStax
Monday, October 28, 13
Consistency
Read
CL: One

©2012 DataStax
Monday, October 28, 13
Failure Types
• UnavailableException
•

Didn’t even try

•

Possible success or failure

• TimedOutException

©2012 DataStax
Monday, October 28, 13

28
Multi DC

©2012 DataStax
Monday, October 28, 13
Gossip
• Manages cluster state
•
•

Nodes up/down
Nodes joining/leaving

• Decentralized

©2012 DataStax
Monday, October 28, 13

30
Snitch
• Responsible for determining cluster topology
• Tracks node responsiveness
• Simple, PropertyFile, Ec2Snitch, etc...

©2012 DataStax
Monday, October 28, 13

31
Node Architecture
©2012 DataStax
Monday, October 28, 13

32
Write Path
Write

Memtable

Memory
Disk

commit log

©2012 DataStax
Monday, October 28, 13

SSTable

33
Read Path
Read

Memtable

Memory
Disk

SSTable

©2012 DataStax
Monday, October 28, 13

SSTable

34
Data Modeling
©2012 DataStax
Monday, October 28, 13

35
CQL
Cassandra Query Language

©2012 DataStax
Monday, October 28, 13

36
Terminology
• Keyspace
• Table (Column Family)
• Row
• Column
• Partition Key
• Clustering Key (Optional)

©2012 DataStax
Monday, October 28, 13

37
For Example:
CREATE KEYSPACE packagetracker WITH REPLICATION = { 'class' :
'SimpleStrategy', 'replication_factor' : 1 };
CREATE KEYSPACE packagetracker WITH REPLICATION = { 'class' :
'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2};
CREATE TABLE events (
package_id text,
status_timestamp timestamp,
location text,
notes text,
PRIMARY KEY (package_id, status_timestamp)
);

©2012 DataStax
Monday, October 28, 13

38
Constructs

©2012 DataStax
Monday, October 28, 13

39
Basic Data Types
• blob
• int
• text
• long
• uuid
• etc

©2012 DataStax
Monday, October 28, 13

40
More Data Modeling Constructs
• Collections
•

map, set, list

• Time to live (TTL)
• Counters
• Secondary Indexes

©2012 DataStax
Monday, October 28, 13

41
Approaching Data Modeling
• Model your queries, not your data
•

Optimize your data model for reads

• Don’t be afraid to denormalize
• You will get it wrong, iterate

©2012 DataStax
Monday, October 28, 13

42
An Example:
User Logins

©2012 DataStax
Monday, October 28, 13

43
The Query
What are the last 10 locations nickmbailey logged in from?
SELECT time, location FROM logins WHERE user = ‘nickmbailey’
ORDER BY time DESC LIMIT 10;

©2012 DataStax
Monday, October 28, 13

44
The Query
What are the last 10 locations nickmbailey logged in from?
SELECT time, location FROM logins WHERE user = ‘nickmbailey’
ORDER BY time DESC LIMIT 10;

Partition Key

©2012 DataStax
Monday, October 28, 13

45
The Query
What are the last 10 locations nickmbailey logged in from?
SELECT time, location FROM logins WHERE user = ‘nickmbailey’
ORDER BY time DESC LIMIT 10;

Clustering Key

©2012 DataStax
Monday, October 28, 13

Partition Key

46
The Query
What are the last 10 locations nickmbailey logged in from?
SELECT time, location FROM logins WHERE user = ‘nickmbailey’
ORDER BY time DESC LIMIT 10;

Clustering Key

©2012 DataStax
Monday, October 28, 13

Partition Key
Additional Columns

47
The Query
What are the last 10 locations nickmbailey logged in from?
SELECT time, location FROM logins WHERE user = ‘nickmbailey’
ORDER BY time DESC LIMIT 10;

Clustering Key

Partition Key
Additional Columns

CREATE COLUMN FAMILY logins (
	 user text,
time timestamp,
location text,
PRIMARY KEY (user, time));

©2012 DataStax
Monday, October 28, 13

48
The Query
What are the last 10 locations nickmbailey logged in from?
SELECT time, location FROM logins WHERE user = ‘nickmbailey’
ORDER BY time DESC LIMIT 10;
CREATE COLUMN FAMILY logins (
	 user text,
time timestamp,
location text,
PRIMARY KEY (user, time));
Partition key

Primary key

User

Time

Location

nickmbailey

2013-07-19 09:22:18

Austin, Texas

nickmbailey

2013-07-19 14:49:27

Blacksburg, Virginia

jsmith

2013-07-20 07:59:34

Atlanta, Georgia

©2012 DataStax
Monday, October 28, 13

49
Time-series data
• By far, the most common data model
• Event logs
• Metrics
• Sensor Data
• Etc

©2012 DataStax
Monday, October 28, 13

50
Another Query
When was the last time nickmbailey logged in from San
Francisco, California?
SELECT time FROM logins WHERE user = ‘nickmbailey’ and
location=‘San Francisco, California’;
User

Time

Location

nickmbailey

2013-07-19 09:22:18

Austin, Texas

nickmbailey

2013-07-19 14:49:27

Blacksburg, Virginia

nickmbailey

2013-07-19 14:49:27

Austin, Texas

nickmbailey

2013-05-19 14:49:27

Austin, Texas

nickmbailey

2013-04-19 14:49:27

San Francisco, California

...

...

...

jsmith

2013-07-20 07:59:34

Atlanta, Georgia

©2012 DataStax
Monday, October 28, 13

51
Another Query
When was the last time nickmbailey logged in from Austin,
Texas?
SELECT time FROM logins_by_location WHERE user = ‘nickmbailey’
and location=‘San Francisco, California’;
CREATE COLUMN FAMILY logins_by_location (
user text,
time timestamp,
location text,
PRIMARY KEY (user, location));

©2012 DataStax
Monday, October 28, 13

52
Another Query
When was the last time nickmbailey logged in from Austin,
Texas?
SELECT time FROM logins_by_location WHERE user = ‘nickmbailey’
and location=‘San Francisco, California’;
CREATE COLUMN FAMILY logins_by_location (
user text,
time timestamp,
location text,
PRIMARY KEY (user, location));
User

Location

Time

nickmbailey

Austin, Texas

2013-07-19 09:22:18

nickmbailey

Blacksburg, Virginia

2013-07-19 14:49:27

nickmbailey

San Francisco, California

2013-07-19 14:49:27

©2012 DataStax
Monday, October 28, 13

53
Denormalize
• Create materialized views of the same data to
support different queries

• Storage space is cheap, Cassandra is fast

©2012 DataStax
Monday, October 28, 13

54
Debugging your data model
cqlsh> tracing on;
Now tracing requests.
cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
activity
| timestamp
| source
| source_elapsed
-------------------------------------+--------------+-----------+---------------execute_cql3_query | 00:02:37,015 | 127.0.0.1 |
0
Parsing statement | 00:02:37,015 | 127.0.0.1 |
81
Preparing statement | 00:02:37,015 | 127.0.0.1 |
273
Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 |
540
Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 |
779
Messsage received from /127.0.0.1
Applying mutation
Acquiring switchLock
Appending to commitlog
Adding to memtable
Enqueuing response to /127.0.0.1
Sending message to /127.0.0.1

©2012 DataStax
Monday, October 28, 13

|
|
|
|
|
|
|

00:02:37,016
00:02:37,016
00:02:37,016
00:02:37,016
00:02:37,016
00:02:37,016
00:02:37,016

|
|
|
|
|
|
|

127.0.0.2
127.0.0.2
127.0.0.2
127.0.0.2
127.0.0.2
127.0.0.2
127.0.0.2

|
|
|
|
|
|
|

63
220
250
277
378
710
888
55
A note on Transactions
• In general, you want to construct your data
model around them

• The latest version of Cassandra has ‘Compare
and swap’

•
•
•

©2012 DataStax
Monday, October 28, 13

An implementation of Paxos
...IF NOT EXISTS;
...IF column1 = ‘value’;

56
Try it out
©2012 DataStax
Monday, October 28, 13

57
CCM
• CCM - Cassandra Cluster Manager
•

https://github.com/pcmanus/ccm

•
•
•

ccm create test -v 2.0.1
ccm populate -n 3
ccm start

• Warning: not lightweight
• Example:

©2012 DataStax
Monday, October 28, 13

58
Clients
• Cqlsh
•

Bundled with Cassandra

•
•
•
•

java: https://github.com/datastax/java-driver
python: https://github.com/datastax/python-driver
.net: https://github.com/datastax/csharp-driver
and more: http://www.datastax.com/download/
clientdrivers

• Drivers

©2012 DataStax
Monday, October 28, 13

59
Get Help
• IRC: #cassandra on freenode
• Mailing Lists
• Stack Overflow
• DataStax Docs
•

©2012 DataStax
Monday, October 28, 13

http://www.datastax.com/docs

60
Questions?
©2012 DataStax
Monday, October 28, 13

61
Monday, October 28, 13

Mais conteúdo relacionado

Mais procurados

Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsAlexander Korotkov
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage systemArunit Gupta
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1Federico Campoli
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayReal-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayAltinity Ltd
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsMydbops
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
ProxySQL Tutorial - PLAM 2016
ProxySQL Tutorial - PLAM 2016ProxySQL Tutorial - PLAM 2016
ProxySQL Tutorial - PLAM 2016Derek Downey
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterKenny Gryp
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용I Goo Lee
 
PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and BenchmarksJignesh Shah
 
HandsOn ProxySQL Tutorial - PLSC18
HandsOn ProxySQL Tutorial - PLSC18HandsOn ProxySQL Tutorial - PLSC18
HandsOn ProxySQL Tutorial - PLSC18Derek Downey
 
Auditing and Monitoring PostgreSQL/EPAS
Auditing and Monitoring PostgreSQL/EPASAuditing and Monitoring PostgreSQL/EPAS
Auditing and Monitoring PostgreSQL/EPASEDB
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Altinity Ltd
 

Mais procurados (20)

Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
Query logging with proxysql
Query logging with proxysqlQuery logging with proxysql
Query logging with proxysql
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 
Cassandra
CassandraCassandra
Cassandra
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayReal-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
ProxySQL Tutorial - PLAM 2016
ProxySQL Tutorial - PLAM 2016ProxySQL Tutorial - PLAM 2016
ProxySQL Tutorial - PLAM 2016
 
Oracle Database 12c : Multitenant
Oracle Database 12c : MultitenantOracle Database 12c : Multitenant
Oracle Database 12c : Multitenant
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and Benchmarks
 
HandsOn ProxySQL Tutorial - PLSC18
HandsOn ProxySQL Tutorial - PLSC18HandsOn ProxySQL Tutorial - PLSC18
HandsOn ProxySQL Tutorial - PLSC18
 
Auditing and Monitoring PostgreSQL/EPAS
Auditing and Monitoring PostgreSQL/EPASAuditing and Monitoring PostgreSQL/EPAS
Auditing and Monitoring PostgreSQL/EPAS
 
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
Analytics at Speed: Introduction to ClickHouse and Common Use Cases. By Mikha...
 

Destaque

An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecturenickmbailey
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflixnkorla1share
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraAran Deltac
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in CassandraEd Anuff
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsDave Gardner
 
Cassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraCassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraDataStax Academy
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckDataStax Academy
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!Patrick McFadin
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architectureMarkus Klems
 
Introduction to Cassandra & Data model
Introduction to Cassandra & Data modelIntroduction to Cassandra & Data model
Introduction to Cassandra & Data modelDuyhai Doan
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012jbellis
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Modelebenhewitt
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 

Destaque (20)

An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in Cassandra
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
Cassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache CassandraCassandra Day Denver 2014: Introduction to Apache Cassandra
Cassandra Day Denver 2014: Introduction to Apache Cassandra
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
 
Introduction to Cassandra & Data model
Introduction to Cassandra & Data modelIntroduction to Cassandra & Data model
Introduction to Cassandra & Data model
 
Cassandra ppt 1
Cassandra ppt 1Cassandra ppt 1
Cassandra ppt 1
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 

Semelhante a Introduction to Cassandra Basics

Introduction to Cassandra and Data Modeling
Introduction to Cassandra and Data ModelingIntroduction to Cassandra and Data Modeling
Introduction to Cassandra and Data Modelingnickmbailey
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 
An Introduction to Cassandra on Linux
An Introduction to Cassandra on LinuxAn Introduction to Cassandra on Linux
An Introduction to Cassandra on Linuxnickmbailey
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesPatrick McFadin
 
MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!Dave Stokes
 
Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101DataStax Academy
 
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101DataStax Academy
 
Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101DataStax Academy
 
springdatajpatwjug-120527215242-phpapp02.pdf
springdatajpatwjug-120527215242-phpapp02.pdfspringdatajpatwjug-120527215242-phpapp02.pdf
springdatajpatwjug-120527215242-phpapp02.pdfssuser0562f1
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
MySQL Without the SQL -- Oh My! Longhorn PHP Conference
MySQL Without the SQL -- Oh My!  Longhorn PHP ConferenceMySQL Without the SQL -- Oh My!  Longhorn PHP Conference
MySQL Without the SQL -- Oh My! Longhorn PHP ConferenceDave Stokes
 
Use Your MySQL Knowledge to Become a MongoDB Guru
Use Your MySQL Knowledge to Become a MongoDB GuruUse Your MySQL Knowledge to Become a MongoDB Guru
Use Your MySQL Knowledge to Become a MongoDB GuruTim Callaghan
 
CFS: Cassandra Backed Storage for Hadoop
CFS: Cassandra Backed Storage for HadoopCFS: Cassandra Backed Storage for Hadoop
CFS: Cassandra Backed Storage for HadoopDataStax Academy
 
CFS: Cassandra backed storage for Hadoop
CFS: Cassandra backed storage for HadoopCFS: Cassandra backed storage for Hadoop
CFS: Cassandra backed storage for Hadoopnickmbailey
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 
Making MySQL Agile-ish
Making MySQL Agile-ishMaking MySQL Agile-ish
Making MySQL Agile-ishDave Stokes
 
Cassandra Community Webinar | Cassandra 2.0 - Better, Faster, Stronger
Cassandra Community Webinar | Cassandra 2.0 - Better, Faster, StrongerCassandra Community Webinar | Cassandra 2.0 - Better, Faster, Stronger
Cassandra Community Webinar | Cassandra 2.0 - Better, Faster, StrongerDataStax
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax
 

Semelhante a Introduction to Cassandra Basics (20)

Introduction to Cassandra and Data Modeling
Introduction to Cassandra and Data ModelingIntroduction to Cassandra and Data Modeling
Introduction to Cassandra and Data Modeling
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 
An Introduction to Cassandra on Linux
An Introduction to Cassandra on LinuxAn Introduction to Cassandra on Linux
An Introduction to Cassandra on Linux
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
 
MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!
 
Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101
 
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
 
Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101
 
springdatajpatwjug-120527215242-phpapp02.pdf
springdatajpatwjug-120527215242-phpapp02.pdfspringdatajpatwjug-120527215242-phpapp02.pdf
springdatajpatwjug-120527215242-phpapp02.pdf
 
1 Dundee - Cassandra 101
1 Dundee - Cassandra 1011 Dundee - Cassandra 101
1 Dundee - Cassandra 101
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandra
 
MySQL Without the SQL -- Oh My! Longhorn PHP Conference
MySQL Without the SQL -- Oh My!  Longhorn PHP ConferenceMySQL Without the SQL -- Oh My!  Longhorn PHP Conference
MySQL Without the SQL -- Oh My! Longhorn PHP Conference
 
Use Your MySQL Knowledge to Become a MongoDB Guru
Use Your MySQL Knowledge to Become a MongoDB GuruUse Your MySQL Knowledge to Become a MongoDB Guru
Use Your MySQL Knowledge to Become a MongoDB Guru
 
CFS: Cassandra Backed Storage for Hadoop
CFS: Cassandra Backed Storage for HadoopCFS: Cassandra Backed Storage for Hadoop
CFS: Cassandra Backed Storage for Hadoop
 
CFS: Cassandra backed storage for Hadoop
CFS: Cassandra backed storage for HadoopCFS: Cassandra backed storage for Hadoop
CFS: Cassandra backed storage for Hadoop
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 
Bonjour, iCloud
Bonjour, iCloudBonjour, iCloud
Bonjour, iCloud
 
Making MySQL Agile-ish
Making MySQL Agile-ishMaking MySQL Agile-ish
Making MySQL Agile-ish
 
Cassandra Community Webinar | Cassandra 2.0 - Better, Faster, Stronger
Cassandra Community Webinar | Cassandra 2.0 - Better, Faster, StrongerCassandra Community Webinar | Cassandra 2.0 - Better, Faster, Stronger
Cassandra Community Webinar | Cassandra 2.0 - Better, Faster, Stronger
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 

Último

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 

Último (20)

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 

Introduction to Cassandra Basics

  • 1. Introduction to Cassandra Nick Bailey @nickmbailey Monday, October 28, 13
  • 2. Who am I? ©2012 DataStax Monday, October 28, 13 2
  • 4. On to the good stuff! ©2012 DataStax Monday, October 28, 13 4
  • 5. Why Cassandra? Cluster Architecture Node Architecture 5 Data Modeling Wrap up ©2012 DataStax Monday, October 28, 13
  • 7. Time for buzz words! ©2012 DataStax Monday, October 28, 13 Big Data! NoSQL! 7
  • 8. Big Data • Gartner: “...high-volume, high-velocity and high-variety...” • 2 sides of ‘big data’ • • ©2012 DataStax Monday, October 28, 13 Analytics Real-time 8
  • 9. NoSQL • A terrible label • Covers a wide range of DBs • • • • • ©2012 DataStax Monday, October 28, 13 Cassandra Redis MongoDB HBase ... 9
  • 10. Started by Facebook ©2012 DataStax Monday, October 28, 13 10
  • 11. Dynamo (Amazon) + Big Table (Google) ©2012 DataStax Monday, October 28, 13 11
  • 13. Cassandra is great for... • Massive, linear scaling (e.g. CERN hadron collider, Barracuda Networks) • Extremely heavy writes (e.g. BlueMountain Capital – financial tick data) • High availability (e.g. eBay, Eventbrite, Netflix, SoundCloud, HeathCare Anytime, Comcast, GoDaddy, Sony Entertainment Network) ©2012 DataStax Monday, October 28, 13 13
  • 17. One size does not fit all Polyglot persistence ©2012 DataStax Monday, October 28, 13 17
  • 18. More Resources • PlanetCassandra.org • Blog • 5 minute interviews ©2012 DataStax Monday, October 28, 13 18
  • 20. Data Distribution 0 75 25 50 Hash_Function(Partition Key) >> Token ©2012 DataStax Monday, October 28, 13
  • 23. Consistency Level • Multiple options • • • • • ONE QUORUM ALL LOCAL_QUORUM ... • Can be specified per request ©2012 DataStax Monday, October 28, 13 23
  • 28. Failure Types • UnavailableException • Didn’t even try • Possible success or failure • TimedOutException ©2012 DataStax Monday, October 28, 13 28
  • 30. Gossip • Manages cluster state • • Nodes up/down Nodes joining/leaving • Decentralized ©2012 DataStax Monday, October 28, 13 30
  • 31. Snitch • Responsible for determining cluster topology • Tracks node responsiveness • Simple, PropertyFile, Ec2Snitch, etc... ©2012 DataStax Monday, October 28, 13 31
  • 33. Write Path Write Memtable Memory Disk commit log ©2012 DataStax Monday, October 28, 13 SSTable 33
  • 36. CQL Cassandra Query Language ©2012 DataStax Monday, October 28, 13 36
  • 37. Terminology • Keyspace • Table (Column Family) • Row • Column • Partition Key • Clustering Key (Optional) ©2012 DataStax Monday, October 28, 13 37
  • 38. For Example: CREATE KEYSPACE packagetracker WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }; CREATE KEYSPACE packagetracker WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2}; CREATE TABLE events ( package_id text, status_timestamp timestamp, location text, notes text, PRIMARY KEY (package_id, status_timestamp) ); ©2012 DataStax Monday, October 28, 13 38
  • 40. Basic Data Types • blob • int • text • long • uuid • etc ©2012 DataStax Monday, October 28, 13 40
  • 41. More Data Modeling Constructs • Collections • map, set, list • Time to live (TTL) • Counters • Secondary Indexes ©2012 DataStax Monday, October 28, 13 41
  • 42. Approaching Data Modeling • Model your queries, not your data • Optimize your data model for reads • Don’t be afraid to denormalize • You will get it wrong, iterate ©2012 DataStax Monday, October 28, 13 42
  • 43. An Example: User Logins ©2012 DataStax Monday, October 28, 13 43
  • 44. The Query What are the last 10 locations nickmbailey logged in from? SELECT time, location FROM logins WHERE user = ‘nickmbailey’ ORDER BY time DESC LIMIT 10; ©2012 DataStax Monday, October 28, 13 44
  • 45. The Query What are the last 10 locations nickmbailey logged in from? SELECT time, location FROM logins WHERE user = ‘nickmbailey’ ORDER BY time DESC LIMIT 10; Partition Key ©2012 DataStax Monday, October 28, 13 45
  • 46. The Query What are the last 10 locations nickmbailey logged in from? SELECT time, location FROM logins WHERE user = ‘nickmbailey’ ORDER BY time DESC LIMIT 10; Clustering Key ©2012 DataStax Monday, October 28, 13 Partition Key 46
  • 47. The Query What are the last 10 locations nickmbailey logged in from? SELECT time, location FROM logins WHERE user = ‘nickmbailey’ ORDER BY time DESC LIMIT 10; Clustering Key ©2012 DataStax Monday, October 28, 13 Partition Key Additional Columns 47
  • 48. The Query What are the last 10 locations nickmbailey logged in from? SELECT time, location FROM logins WHERE user = ‘nickmbailey’ ORDER BY time DESC LIMIT 10; Clustering Key Partition Key Additional Columns CREATE COLUMN FAMILY logins ( user text, time timestamp, location text, PRIMARY KEY (user, time)); ©2012 DataStax Monday, October 28, 13 48
  • 49. The Query What are the last 10 locations nickmbailey logged in from? SELECT time, location FROM logins WHERE user = ‘nickmbailey’ ORDER BY time DESC LIMIT 10; CREATE COLUMN FAMILY logins ( user text, time timestamp, location text, PRIMARY KEY (user, time)); Partition key Primary key User Time Location nickmbailey 2013-07-19 09:22:18 Austin, Texas nickmbailey 2013-07-19 14:49:27 Blacksburg, Virginia jsmith 2013-07-20 07:59:34 Atlanta, Georgia ©2012 DataStax Monday, October 28, 13 49
  • 50. Time-series data • By far, the most common data model • Event logs • Metrics • Sensor Data • Etc ©2012 DataStax Monday, October 28, 13 50
  • 51. Another Query When was the last time nickmbailey logged in from San Francisco, California? SELECT time FROM logins WHERE user = ‘nickmbailey’ and location=‘San Francisco, California’; User Time Location nickmbailey 2013-07-19 09:22:18 Austin, Texas nickmbailey 2013-07-19 14:49:27 Blacksburg, Virginia nickmbailey 2013-07-19 14:49:27 Austin, Texas nickmbailey 2013-05-19 14:49:27 Austin, Texas nickmbailey 2013-04-19 14:49:27 San Francisco, California ... ... ... jsmith 2013-07-20 07:59:34 Atlanta, Georgia ©2012 DataStax Monday, October 28, 13 51
  • 52. Another Query When was the last time nickmbailey logged in from Austin, Texas? SELECT time FROM logins_by_location WHERE user = ‘nickmbailey’ and location=‘San Francisco, California’; CREATE COLUMN FAMILY logins_by_location ( user text, time timestamp, location text, PRIMARY KEY (user, location)); ©2012 DataStax Monday, October 28, 13 52
  • 53. Another Query When was the last time nickmbailey logged in from Austin, Texas? SELECT time FROM logins_by_location WHERE user = ‘nickmbailey’ and location=‘San Francisco, California’; CREATE COLUMN FAMILY logins_by_location ( user text, time timestamp, location text, PRIMARY KEY (user, location)); User Location Time nickmbailey Austin, Texas 2013-07-19 09:22:18 nickmbailey Blacksburg, Virginia 2013-07-19 14:49:27 nickmbailey San Francisco, California 2013-07-19 14:49:27 ©2012 DataStax Monday, October 28, 13 53
  • 54. Denormalize • Create materialized views of the same data to support different queries • Storage space is cheap, Cassandra is fast ©2012 DataStax Monday, October 28, 13 54
  • 55. Debugging your data model cqlsh> tracing on; Now tracing requests. cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example'); Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9 activity | timestamp | source | source_elapsed -------------------------------------+--------------+-----------+---------------execute_cql3_query | 00:02:37,015 | 127.0.0.1 | 0 Parsing statement | 00:02:37,015 | 127.0.0.1 | 81 Preparing statement | 00:02:37,015 | 127.0.0.1 | 273 Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540 Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 | 779 Messsage received from /127.0.0.1 Applying mutation Acquiring switchLock Appending to commitlog Adding to memtable Enqueuing response to /127.0.0.1 Sending message to /127.0.0.1 ©2012 DataStax Monday, October 28, 13 | | | | | | | 00:02:37,016 00:02:37,016 00:02:37,016 00:02:37,016 00:02:37,016 00:02:37,016 00:02:37,016 | | | | | | | 127.0.0.2 127.0.0.2 127.0.0.2 127.0.0.2 127.0.0.2 127.0.0.2 127.0.0.2 | | | | | | | 63 220 250 277 378 710 888 55
  • 56. A note on Transactions • In general, you want to construct your data model around them • The latest version of Cassandra has ‘Compare and swap’ • • • ©2012 DataStax Monday, October 28, 13 An implementation of Paxos ...IF NOT EXISTS; ...IF column1 = ‘value’; 56
  • 57. Try it out ©2012 DataStax Monday, October 28, 13 57
  • 58. CCM • CCM - Cassandra Cluster Manager • https://github.com/pcmanus/ccm • • • ccm create test -v 2.0.1 ccm populate -n 3 ccm start • Warning: not lightweight • Example: ©2012 DataStax Monday, October 28, 13 58
  • 59. Clients • Cqlsh • Bundled with Cassandra • • • • java: https://github.com/datastax/java-driver python: https://github.com/datastax/python-driver .net: https://github.com/datastax/csharp-driver and more: http://www.datastax.com/download/ clientdrivers • Drivers ©2012 DataStax Monday, October 28, 13 59
  • 60. Get Help • IRC: #cassandra on freenode • Mailing Lists • Stack Overflow • DataStax Docs • ©2012 DataStax Monday, October 28, 13 http://www.datastax.com/docs 60