SlideShare uma empresa Scribd logo
1 de 131
Baixar para ler offline
Training Day | December 3rd 
Beginner Track 
• Introduction to Cassandra 
• Introduction to Spark, Shark, Scala and 
Cassandra 
Advanced Track 
• Data Modeling 
• Performance Tuning 
Conference Day | December 4th 
Cassandra Summit Europe 2014 will be the single 
largest gathering of Cassandra users in Europe. 
Learn how the world's most successful companies are 
transforming their businesses and growing faster than 
ever using Apache Cassandra. 
http://bit.ly/cassandrasummit2014 
© 2014 DataStax, All Rights Reserved. Company Confidential 1
Introduction to Cassandra for Java Developers 
Why, What and How. 
Johnny Miller, Solutions Architect 
@CyanMiller 
www.linkedin.com/in/johnnymiller
Training Day | December 3rd 
Beginner Track 
• Introduction to Cassandra 
• Introduction to Spark, Shark, Scala and 
Cassandra 
Advanced Track 
• Data Modeling 
• Performance Tuning 
Conference Day | December 4th 
Cassandra Summit Europe 2014 will be the single 
largest gathering of Cassandra users in Europe. 
Learn how the world's most successful companies are 
transforming their businesses and growing faster than 
ever using Apache Cassandra. 
http://bit.ly/cassandrasummit2014 
© 2014 DataStax, All Rights Reserved. Company Confidential 3
DataStax 
• Founded in April 2010 
• We drive Apache Cassandra™ 
• 300+ employees 
• Home to Apache Cassandra™ Chair & most committers 
– Contribute ~ 80% of code into Apache Cassandra™ code base 
• Headquartered in San Francisco Bay area 
• European headquarters in London 
© 2014 DataStax, All Rights Reserved. 4
DataStax Enterprise 
www.datastax.com 
© 2014 DataStax, All Rights Reserved. 5
DataStax Enterprise is free for startups!!! 
• Unlimited, free use of the software in DataStax Enterprise. 
• No limit on number of nodes or other hidden restrictions. 
• If you’re a startup, it’s free! 
www.datastax.com/startups 
© 2014 DataStax, All Rights Reserved. 6
Training 
• Checkout the DataStax Academy for free online training 
– http://academy.datastax.com 
• Public courses 
• On-site training 
www.datastax.com/training
We are hiring! 
www.datastax.com/careers 
@DataStaxCareers 
8
Why Cassandra? 
9
Apache Cassandra 
• Cassandra is a massively scaleable open source NoSQL database 
• 100% uptime 
• Predictable scaling 
• High Performance 
Dynamo 
BigTable 
Node 1 
Node 4 Node 3 
Node 5 
BigTable: http://research.google.com/archive/bigtable-osdi06.pdf 
Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf 
Node 2 
© 2014 DataStax, All Rights Reserved. 10
Cassandra Core Values 
• Ease of use 
• Massive scalability 
• High performance 
• Always Available 
© 2014 DataStax, All Rights Reserved. 11
Availability and Speed Matters for online apps! 
• UK retailers lost 8.5 billion last year to slow web sites, which is 1 million for 
every 10 million in online sales 
• Over half of all web users expect a response time of 2 seconds or less 
• A 1 second delay causes a nearly 10% reduction in customer interactions 
• A 1 second decrease in Amazon page load time costs the company $1.6 billion 
in sales 
© 2014 DataStax, All Rights Reserved. Company Confidential 12
Adoption 
http://db-engines.com/en/ranking 
September 2014 
© 2014 DataStax, All Rights Reserved. 13
Performance & Scale 
Cassandra works for small to huge deployments. 
• Cassandra footprint @ Netflix 
• 80+ Clusters 
• 2500+ nodes 
• 4 Data Centres (Amazon Regions) 
• > 1 Trillion transactions per day 
See: http://www.datastax.com/resources/casestudies/netflix
Performance & Scale 
“In terms of scalability, there is a clear winner throughout our experiments. 
Cassandra achieves the highest throughput for the maximum number of nodes 
in all experiments with a linear increasing throughput.” 
Solving Big Data Challenges for Enterprise Application Performance Management, Tilman Rable, et al., August 2012. 
Benchmark paper presented at the Very Large Database Conference, 2012. 
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf 
End Point Independent NoSQL Benchmark 
Lowest in latency… 
Netflix Cloud Benchmark… 
Highest in throughput… 
© 2014 DataStax, All Rights Reserved. 16 
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability- 
on.html http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking- 
Top-NoSQL-Databases.pdf
Availability 
© 2014 DataStax, All Rights Reserved. 17
Ideal Cassandra Use Cases 
Product catalogs and playlists 
• Collection of items 
Internet of things 
• Sensor data 
Messaging 
• Emails, instant messages, alerts, comments 
Recommendation and personalization 
• Trend information 
Fraud detection 
• Data patterns 
Time-series and time-ordered data 
• Data with time dimension 
http://planetcassandra.org 
© 2014 DataStax, All Rights Reserved. Company Confidential 18
What is Apache Cassandra? 
19
Overview 
• Cassandra was designed with the understanding that system/hardware failures 
can and do occur 
• Peer-to-peer, distributed system 
• All nodes the same 
Data partitioned among all nodes in the cluster 
Node 1 
• • Custom data replication to ensure fault tolerance 
• Read/Write-anywhere and across data centres 
Node 2 
Node 4 Node 3 
Node 5
Cassandra > 1 Server 
• All nodes participate in a cluster 
• Add or remove as needed 
• All nodes the same – masterless with no single point of failure 
• Each node communicates with each other through the Gossip protocol, 
which exchanges information across the cluster every second 
• Data partitioned among all nodes in the cluster 
• More capacity? Add a server! 
• More throughput? Add a server! 
Node 1 
Node 2 
Node 4 Node 3 
Node 5 
© 2014 DataStax, All Rights Reserved. Company Confidential 21
Write Path on individual server 
• A commit log is used on each node to capture write activity. Data durability is assured 
• Data also written to an in-memory structure (memtable) and then to disk once the 
memory structure is full (an SStable) 
© 2014 DataStax, All Rights Reserved. Company Confidential 22
Distributed 
• Replication factor (RF): How many copies of your 
data? 
• RF = 3 in this example 
• Each node is storing 60% of the clusters total data 
i.e. 3/5 
Handy Calculator: http://www.ecyrd.com/cassandracalculator/ 
Node 1 
1st copy 
Node 4 
Node 5 
Node 2 
2nd copy 
© 2014 DataStax, All Rights Reserved. Company Confidential 23 
Node 3 
3rd copy 
• Client reads or writes to any node 
• Node coordinates with others 
• Data read or replicated in parallel 
Node 2 
2nd copy
Rack Aware 
© 2014 DataStax, All Rights Reserved. Company Confidential 
24 
• Cassandra is aware of which rack (or availability zone) 
each node resides in. 
• It will attempt to place each data copy in a 
different rack. 
• RF = 3 in this example 
Node 1 
1st copy 
Node 4 
Node 2 
Node 3 
2nd copy 
Rack 1 
Rack 2 Rack 2 
Rack 3 
Rack 1 
Node 5 
3rd copy
Data Distribution and Replication 
• Cassandra is designed as a peer-to-peer system that makes copies of the data and 
distributes the copies among a group of nodes 
• Data is organized by table and identified by a primary key. The primary key 
determines which node the data is stored on. 
• Each node is assigned data based on the hash value of the primary key and the 
range of hash values associated with that node. 
© 2014 DataStax, All Rights Reserved. Company Confidential 25
Partitioners 
• A partitioner determines how data is distributed across the 
nodes in the cluster. 
• Cassandra has three: 
– Murmur3Partitioner (default) 
– RandomPartitioner 
– ByteOrderedPartitioner 
(don’t use this unless you know what your doing and 
have a very good reason to do this) 
© 2014 DataStax, All Rights Reserved. Company Confidential 26
Murmer3 Example 
• Data: 
Primary Key 
jim age: 36 car: ford gender: M 
carol age: 37 car: bmw gender: F 
johnny age: 12 gender: M 
suzy: age: 10 gender: F 
• Murmer3 Hash Values: 
Primary Key Murmur3 hash value 
jim -2245462676723223822 
carol 7723358927203680754 
johnny -6723372854036780875 
suzy 1168604627387940318 
© 2014 DataStax, All Rights Reserved. Company Confidential 27
Murmer3 Example 
4 node cluster: 
Node Murmur3 start range Murmur3 end range 
A -9223372036854775808 
-4611686018427387903 
B -4611686018427387904 
-1 
C 0 4611686018427387903 
D 4611686018427387904 9223372036854775807 
© 2014 DataStax, All Rights Reserved. Company Confidential 28
Murmer3 Example 
Data is distributed as: 
Node Start range End range Primary 
key 
Hash value 
A -92233720368547758 
08 
-4611686018427387903 
johnny -672337285403678 
0875 
B -461168601842738790 
4 
-1 jim -224546267672322 
3822 
C 0 4611686018427387903 suzy 1168604627387940 
318 
D 461168601842738790 
4 
9223372036854775807 carol 7723358927203680 
754 
© 2014 DataStax, All Rights Reserved. Company Confidential 29
Replicas 
• A replication strategy determines the nodes where replicas are placed. 
• All replicas are equally important; there is no primary or master replica. 
• Two replication strategies available: 
– SimpleStrategy (prototype only, don’t use in production) 
– NetworkTopologyStrategy 
• Replication strategy is set at the keyspace/schema level. 
© 2014 DataStax, All Rights Reserved. Company Confidential 30
NetworkTopologyStrategy 
• Specified how many replicas you want in each data center 
(physical or virtual). 
• NetworkTopologyStrategy places replicas in the same data 
center by walking the ring clockwise until reaching the first 
node in another rack. 
© 2014 DataStax, All Rights Reserved. Company Confidential 31
Data Center Aware 
© 2014 DataStax, All Rights Reserved. 
DC: USA DC: EUROPE 
Node 5 Node 2 
32 
Node 1 
1st copy 
Node 4 
2nd copy 
Node 3 
3rd copy 
Node 1 
1st copy 
Node 5 Node 2 
Node 4 
2nd copy 
Node 3 
3rd copy 
• Active Everywhere!! 
• Client writes local 
• Data syncs across WAN
Tunable Consistency 
Node 1 
1st copy 
5 μs ack 
Parallel 
Write 
12 μs ack 
Node 5 Node 2 
Node 4 
2nd copy 
12 μs ack 
Node 3 
3rd copy 
Write 
CL=QUORUM 
500 μs ack 
• Consistency Level (CL) 
• Client specifies per operation 
• Handles multi-data center 
operations 
ALL = All replicas ack 
QUORUM = > 51% of replicas ack 
LOCAL_QUORUM = > 51% in local DC ack 
ONE = Only one replica acks 
Plus more…. (see docs) 
© 2014 DataStax, All Rights Reserved. Company Confidential 33
Node Failure 
• A single node failure shouldn’t bring failure. 
• Replication Factor + Consistency Level = Success 
• This example: 
– RF = 3 
– CL = QUORUM 
Node 1 
1st copy 
5 μs ack 
Parallel 
Write 
12 μs ack 
Node 5 Node 2 
Node 4 
2nd copy 
12 μs ack 
Node 3 
3rd copy 
Write 
CL=QUORUM 
>51% ack – so request is a success 
© 2014 DataStax, All Rights Reserved. Company Confidential 34
Node Recovery 
• When a write is performed and a replica node for the row is unavailable the coordinator 
will store a hint locally. 
• When the node recovers, the coordinator replays the missed writes. 
• Note: a hinted write does not count towards the consistency level 
Node 1 
1st copy 
• Note: you should still run repairs across your cluster 
Node 5 Node 2 
Node 4 
2nd copy 
Node 3 
3rd copy 
Stores Hints while Node 3 is offline 
© 2014 DataStax, All Rights Reserved. Company Confidential 35
Rack Failure 
• Cassandra will place the data in as many different racks or 
availability zones as it can. 
• This example: 
– RF = 3 
– CL = QUORUM 
– Rack 2 fails 
• Data copies still available in Node 1 and Node 5 
• Quorum can be honored i.e. > 51% ack 
request is a success! 
Rack 1 
Node 1 
1st copy 
Rack 2 Rack 2 
Node 4 
Rack 1 
Node 2 
Node 3 
2nd copy 
Rack 3 
Node 5 
3rd copy 
© 2014 DataStax, All Rights Reserved. Company Confidential 36
Don’t be afraid of weak consistency 
• More tolerant to failure 
• Consistency Level of 1 is the most popular (I think) 
• If you want stronger consistency go for LOCAL_QUORUM 
i.e. quorum is honored in the local data centre. 
• If you go stronger than LOCAL_QUORUM – really understand what this means and 
why you are doing it. 
• Remember – you can have different consistency levels for reads and writes e.g. write 
with CL:1, read with CL:LOCAL_QUORUM 
© 2014 DataStax, All Rights Reserved. Company Confidential 37
Ring fenced resources 
• If you need to isolate resources for different uses, Cassandra is a great fit. 
• You can create separate virtual data centres optimised as required – different 
workloads, hardware, availability etc.. 
• Cassandra will replicate the data for you automatically – no ETL is necessary 
Customer 
Facing Analytics 
Cassandra 
Replication 
© 2014 DataStax, All Rights Reserved. Company Confidential 38
Hybrid Cloud 
• Cassandra is fmulti-data centre and cloud capable 
• Data written to any node is automatically and transparently written to all other nodes in 
multiple data centres i.e. NO ETL 
Public Cloud 
Data Centre 1 Data Centre2 
© 2014 DataStax, All Rights Reserved. Company Confidential 39
Security 
BENEFITS FEATURES 
Internal Authentication 
Manages login IDs and 
passwords inside the 
database 
+ Ensures only authorized 
users can access a 
database system using 
internal validation 
+ Simple to implement and 
easy to understand 
+ No learning curve from the 
relational world 
Object Permission 
Management 
controls who has access to 
what and who can do what in 
the database 
+ Provides granular based 
control over who can add/ 
change/delete/read data 
+ Uses familiar GRANT/ 
REVOKE from relational 
systems 
+ No learning curve 
Client to Node Encryption 
protects data in flight to and 
from a database cluster 
+ Ensures data cannot be 
captured/stolen in route to 
a server 
+ Data is safe both in flight 
from/to a database and on 
the database; complete 
coverage is ensured 
© 2014 DataStax, All Rights Reserved. Company Confidential 40
Uptime 
Web Tier Web Tier Web Tier 
Web Tier 
App Tier App Tier App Tier 
DC: USA DC: Europe DC: Asia 
Cassandra 
Replication 
Cassandra 
Replication 
© 2014 DataStax, All Rights Reserved. Company Confidential 41
DC Failure 
Web Tier Web Tier Web Tier 
Web Tier 
App Tier App Tier App Tier 
DC: USA DC: Europe DC: Asia 
Cassandra 
Replication 
Cassandra 
Replication 
© 2014 DataStax, All Rights Reserved. Company Confidential 42
Tier Failure 
Web Tier Web Tier Web Tier 
Web Tier 
App Tier App Tier App Tier 
DC: USA DC: Europe DC: Asia 
Cassandra 
Replication 
© 2014 DataStax, All Rights Reserved. Company Confidential 43
WAN Failure 
Web Tier Web Tier Web Tier 
Web Tier 
App Tier App Tier App Tier 
Consistency level? 
DC: USA DC: Europe DC: Asia 
Cassandra 
Replication 
Cassandra 
Replication 
© 2014 DataStax, All Rights Reserved. Company Confidential 44
How? 
45
Data Modeling
Why Good Data Modeling is Important? 
Cassandra is a highly available, highly scalable, & highly 
distributed database - with no single point of failure 
To achieve this, Cassandra is optimized for non-relational data 
models. 
• Joins do not function well on distributed databases. 
• Locking and transactions jam up distributed nodes
So what do I do? 
• Lots of different approaches to this… 
• Basically, create multiple tables optimized for your queries. 
Query Based Modeling 
• The main point is: 
DON’T BE AFRAID OF WRITES 
DENORMALISE AND MODEL YOUR DATA AROUND YOUR 
QUERIES 
© 2014 DataStax, All Rights Reserved. Company Confidential 48
For more on data modeling… 
• Data modeling video series by Patrick McFadin 
• Part 1: The Data Model is Dead, Long Live the Data Model 
http://www.youtube.com/watch?v=px6U2n74q3g 
• Part 2: Become a Super Modeler 
http://www.youtube.com/watch?v=qphhxujn5Es 
• Part 3: The World's Next Top Data Model 
http://www.youtube.com/watch?v=HdJlsOZVGwM 
© 2014 DataStax, All Rights Reserved. Company Confidential 49
CQL 
Cassandra Query Language
CQL 
• Cassandra Query Language 
• SQL–like language to query Cassandra 
• Limited predicates. Attempts to prevent bad queries 
– but, you can still get into trouble! 
• Keyspace – analogous to a schema. 
– Has various storage attributes. 
– The keyspace determines the RF (replication factor). 
• Table – looks like a SQL Table. 
– A table must have a Primary Key. 
– We can fully qualify a table as <keyspace>.<table>
DevCenter 
© 2014 DataStax, All Rights Reserved. Company Confidential 52
CQLSH 
Command line interface comes with Cassandra 
Launching on Linux 
• $ cqlsh [options] [host [port]] 
" 
Launching on Windows 
• python cqlsh [options] [host [port]]" 
Example 
• $ cqlsh" 
• $ cqlsh -u student -p cassandra 127.0.0.1 9160" 
© 2014 DataStax, All Rights Reserved. Company Confidential 53
CQLSH 
© 2014 DataStax, All Rights Reserved. Company Confidential 54
CQL Basics 
• Usual statements 
• CREATE / DROP / ALTER TABLE 
• SELECT / INSERT / UPDATE 
• Plus many, many more! 
www.datastax.com/docs 
© 2014 DataStax, All Rights Reserved. Company Confidential 55
What is a keyspace? 
• Keyspace or schema is a top-level namespace 
• All data objects (e.g., tables) must belong to some keyspace 
• Defines how data is replicated on nodes 
• Keyspace per application is a good idea 
• Replica placement strategy 
• SimpleStrategy (prototyping) 
• NetworkTopologyStrategy (production) 
© 2014 DataStax, All Rights Reserved. Company Confidential 56
Creating a keyspace - Multiple Data Centre 
© 2014 DataStax, All Rights Reserved. Company Confidential 57
Use and Drop a keyspace 
• To work with data objects (e.g., tables) in a keyspace: 
• USE pchstats;" 
• To delete a keyspace and all internal data objects 
• DROP KEYSPACE pchstats;" 
If you drop a keyspace or table, Cassandra will back it up! 
You can disable this via auto_snapshot in the cassandra.yaml 
© 2014 DataStax, All Rights Reserved. Company Confidential 58
Creating a table 
CREATE TABLE cities (" 
city_name varchar," 
elevation int," 
population int," 
latitude float," 
longitude float," 
PRIMARY KEY (city_name)! 
);" 
• We can visualize it this way: 
• Here, city_name is the partition key 
• In this example, the partition key = primary key 
© 2014 DataStax, All Rights Reserved. Company Confidential 59
Compound Primary Key 
The Primary Key 
• The key uniquely identifies a row. 
• A composite primary key consists of: 
• A partition key 
• One or more clustering columns 
• e.g. PRIMARY KEY (partition key, cluster columns, ...)" 
• The partition key determines on which node the partition 
resides 
• Data is ordered in cluster column order within the partition 
© 2014 DataStax, All Rights Reserved. Company Confidential 60
Compound Primary Key 
CREATE TABLE sporty_league (" 
team_name varchar," 
player_name varchar," 
jersey int," 
PRIMARY KEY (team_name, player_name)! 
);" 
Partition Key 
Clustering Column 
© 2014 DataStax, All Rights Reserved. Company Confidential 61
Simple Select 
This is a bad query 
SELECT * FROM sporty_league;" 
Partition Key Clustering Column 
ordered 
• More that a few rows can be slow. (Limited to 10,000 rows by 
default) 
• Use LIMIT keyword to choose fewer or more rows 
© 2014 DataStax, All Rights Reserved. Company Confidential 62
Select on Partition Key and Cluster Columns 
SELECT * FROM sporty_league WHERE team_name = ‘Mighty Mutts’;" 
SELECT * FROM sporty_league WHERE team_name = ‘Mighty Mutts’ 
and player_name = ‘Lucky’;" 
© 2014 DataStax, All Rights Reserved. Company Confidential 63
Composite Partition Key 
CREATE TABLE cities (" 
Partition Key 
city_name varchar," 
state varchar" 
PRIMARY KEY ((city_name, state))" 
);" 
© 2014 DataStax, All Rights Reserved. Company Confidential 64
CQL Predicates 
• On the partition key: = and IN 
• On the cluster columns: <, <=, =, >=, >, IN 
© 2014 DataStax, All Rights Reserved. Company Confidential 65
Performance Considerations 
• The best queries are in a single partition. 
i.e. WHERE partition key = <something> 
• Each new partition requires a new disk seek. 
• Queries that span multiple partitions are s-l-o-w 
• Queries that span multiple cluster columns are fast 
Company Confidential 66
Ordering 
• Partition keys are not ordered, but the cluster columns are. 
• However, you can only order by a column if it’s a cluster column. 
• Data will returned by default in the order of the clustering column. 
• You can also use the ORDER BY keyword – but only on the clustering 
column! 
SELECT * FROM sporty_league 
WHERE team_name = ‘Mighty Mutts’ 
ORDER BY player_name DESC;" 
© 2014 DataStax, All Rights Reserved. Company Confidential 67
Secondary Indexes 
If we want to do a query on a column that is not part of your PK, you can create 
an index: 
CREATE INDEX ON <table>(<column>); 
" 
Than you can do a select: 
SELECT * FROM product WHERE type= ’PC';" 
Suitable for low cardinality data 
Much more efficient to model your data around the query 
68 
Watch out for global indexes in Cassandra 3.0 (CASSANDRA-6477)
Insert/Update 
INSERT INTO sporty_league (team_name, player_name, jersey) 
VALUES ('Mighty Mutts',’Felix’,90); 
UPDATE sporty_league SET jersey = 77" 
WHERE team_name = 'Mighty Mutts’ AND player_name = ‘Felix’;" 
Inserts a row into a table 
• Must specify columns to insert values into 
• Primary key columns are mandatory (identify the row) 
• Other columns do not have to have values and Non-existent ‘values’ do not take up space 
Inserts are atomic 
• All values of a row are inserted or none 
Inserts are isolated 
• Two inserts with the same values in primary key columns will not interfere – executed one after 
another 
© 2014 DataStax, All Rights Reserved. Company Confidential 69
What is an upsert? 
• UPdate + inSERT 
• Both UPDATE and INSERT are write operations 
• No reading before writing 
• Term “upsert” denotes the following behavior 
• INSERT updates or overwrites an existing row 
– When inserting a row in a table that already has another row with the same values in 
primary key columns 
• UPDATE inserts a new row 
– When a to-be-updated row, identified by values in primary key columns, does not exist 
• Upserts are legal and do not result in error or warning messages 
© 2014 DataStax, All Rights Reserved. Company Confidential 70
How to avoid UPSERTS 
Guarantee that your primary keys are unique from one another 
Use an appropriate natural key based on your data 
Use a surrogate key for partition key 
Risks with natural keys 
Depending on the type of natural key that is used, there may still be an increased risk of 
UPSERTs 
Changing the datum used for a Natural Key requires a lot of overhead. 
So why not use a sequence to generate a surrogate key? 
You cant – Cassandra doesn’t provide sequences! 
© 2014 DataStax, All Rights Reserved. Company Confidential 71
What? No sequences! 
• Sequences are a handy feature in RDMBS for auto-creation of IDs for you 
data. 
– Guaranteed unique 
– E.g. INSERT INTO user (id, firstName, LastName) VALUES (seq.nextVal(), ‘Ted’, ‘Codd’)" 
• Cassandra has no sequences! 
– Extremely difficult in a masterless distributed system 
– Requires a lock (perf killer) 
• What to do? 
– Use part of the data to create a unique key 
– Use a UUID or TIMEUUID 
© 2014 DataStax, All Rights Reserved. Company Confidential 72
UUID 
• Universal Unique ID 
• 128 bit number represented in character form e.g. 
99051fe9-6a9c-46c2-b949-38ef78858dd0 
• Easily generated on the client 
• Version 1 has a timestamp component (TIMEUUID) 
• Version 4 has no timestamp component 
– Faster to generate 
© 2014 DataStax, All Rights Reserved. Company Confidential 73
TIMEUUID 
TIMEUUID data type supports Version 1 UUIDs 
Generated using time (60 bits), a clock sequence number (14 bits), and MAC address (48 
bits) 
– CQL function ‘now()’ generates a new TIMEUUID 
Time can be extracted from TIMEUUID 
– CQL function dateOf() extracts the timestamp as a date 
TIMEUUID values in clustering columns or in column names are ordered based on time 
– DESC order on TIMEUUID lists most recent data first 
© 2014 DataStax, All Rights Reserved. Company Confidential 74
Counters 
Stores a number that incrementally counts the occurrences of a particular event or process. 
Note: If a table has a counter column, all non-counter columns must be part of a 
primary key 
CREATE TABLE UserActions (" 
user VARCHAR," 
action VARCHAR," 
total COUNTER,! 
PRIMARY KEY (user, action)" 
);" 
UPDATE UserActions SET total = total + 2 
WHERE user = 123 AND action = ’xyz';" 
© 2014 DataStax, All Rights Reserved. Company Confidential 75
Counters 
Pre-Cassandra 2.1 
• Edge-cases around accuracy 
• Counter update was not an idempotent operation 
Cassandra 2.1 
• Simpler implementation, no more edge cases 
• Possible to properly repair now 
• Significantly less garbage and internode traffic 
generated 
• Better performance for 99% of uses 
© 2014 DataStax, All Rights Reserved. Company Confidential 76
Time-to-Live (TTL) 
TTL a row:! 
INSERT INTO users (id, first, last) VALUES (‘abc123’, ‘catherine’, ‘cachart’) " 
USING TTL 3600; // Expires data in one hour 
" 
TTL a column: 
UPDATE users USING TTL 30 SET last = ‘miller’ WHERE id = ‘abc123’" 
– TTL in seconds 
– Can also set default TTL at a table level 
– Expired columns/values automatically deleted 
– With no TTL specified, columns/values never expire 
– TTL is useful for automatic deletion 
– Re-inserting the same row before it expires will overwrite TTL 
© 2014 DataStax, All Rights Reserved. Company Confidential 77
Collection Data Type 
• CQL supports having columns that contain collections 
of data. 
• The collection types include: 
– Set, List and Map. 
CREATE TABLE collections_example (" 
"id int PRIMARY KEY," 
!set_example set<text>,! 
!list_example list<text>,! 
!map_example map<int, text>! 
); 
• These data types are intended to support the type of one-to-many relationships that can be modeled 
in a relational DB e.g. a user has many email addresses. 
• Some performance considerations around collections. 
– Often more efficient to denormalise further rather than use collections if intending to store lots 
of data. 
– Favor sets over list 
You now have collection indexing in Cassandra 2.1 
© 2014 DataStax, All Rights Reserved. Company Confidential 78
Secondary indexes on collections 
CREATE TABLE songs (" 
id uuid PRIMARY KEY," 
artist text," 
album text," 
title text," 
data blob," 
tags set<text>! 
);" 
" 
CREATE INDEX song_tags_idx ON songs(tags);! 
" 
SELECT * FROM songs WHERE tags CONTAINS 'blues';! 
" 
id | album | artist | tags | title" 
----------+---------------+-------------------+-----------------------+------------------" 
5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind" 
" 
© 2014 DataStax, All Rights Reserved. Company Confidential 79
Secondary indexes on map keys 
If you prefer indexing the map keys, you can do so by creating a KEYS index and by using CONTAINS 
KEY 
CREATE TABLE products (" 
id int PRIMARY KEY," 
description text," 
price int," 
categories set<text>," 
features map<text, text>" 
);" 
" 
CREATE INDEX feat_key_index ON products(KEYS(features));! 
" 
SELECT id, description" 
FROM products" 
WHERE features CONTAINS KEY 'refresh-rate';! 
" 
id | description" 
-------+-----------------------------" 
34134 | 120-inch 1080p 3D plasma TV" 
© 2014 DataStax, All Rights Reserved. Company Confidential 80
Lightweight Transactions (LWT) 
Why? 
• Solve a class of race conditions in Cassandra that you would otherwise need to install an 
external locking manager to solve. 
Syntax: 
"INSERT INTO customer_account (customerID, customer_email)" 
"VALUES (‘Johnny’, ‘jmiller@datastax.com’) 
"IF NOT EXISTS;! 
" 
"UPDATE customer_account " 
"SET customer_email=’jmiller@datastax.com’" 
!IF customer_email=’jmiller@datastax.com’;! 
! 
Example Use Case: 
• Registering a user 
© 2014 DataStax, All Rights Reserved. Company Confidential 81
Race Condition 
82 
SELECT name" 
FROM users" 
WHERE username = 'johnny';" 
(0 rows)" 
SELECT name" 
FROM users" 
WHERE username = 'johnny';" 
(0 rows)" 
INSERT INTO users " 
(username, name, email," 
password, created_date)" 
VALUES ('johnny'," 
'Johnny Miller'," 
['jmiller@datastax.com']," 
'ba27e03fd9...'," 
'2011-06-20 13:50:00');" 
INSERT INTO users " 
(username, name, email," 
password, created_date)" 
VALUES ('johnny'," 
'Johnny Miller'," 
['jmiller@datastax.com']," 
'ea24e13ad9...'," 
'2011-06-20 13:50:01');" 
This one wins!
Lightweight Transactions (LWT) 
INSERT INTO users " 
(username, name, email," 
password, created_date)" 
VALUES ('johnny'," 
'Johnny Miller'," 
['jmiller@datastax.com']," 
'ba27e03fd9...'," 
'2011-06-20 13:50:00')" 
IF NOT EXISTS;! 
INSERT INTO users " 
(username, name, email," 
password, created_date)" 
VALUES ('johnny'," 
'Johnny Miller'," 
['jmiller@datastax.com']," 
'ea24e13ad9...'," 
'2011-06-20 13:50:01’)" 
IF NOT EXISTS;! 
" 
[applied]" 
-----------" 
True! 
[applied] | username | created_date | name " 
-----------+----------+----------------+----------------" 
False | johnny | 2011-06-20 ... | Johnny Miller! 
© 2014 DataStax, All Rights Reserved. 83
Lightweight Transactions (LWT) 
Consequences of Lightweight Transactions 
4 round trips vs. 1 for normal updates (uses Paxos algorithm) 
Operations are done on a per-partition basis 
Will be going across data centres to obtain consensus (unless you use 
LOCAL_SERIAL consistency) 
Cassandra user will need read and write access i.e. you get back the row! 
Great for 1% your app, but eventual consistency is still your friend! 
© 2014 DataStax, All Rights Reserved. Company Confidential 84
Batch Statements 
BEGIN BATCH" 
INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user')" 
UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2'" 
INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c')" 
DELETE name FROM users WHERE userID = 'user2’" 
APPLY BATCH; 
BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single 
logical operation 
Atomic operation 
If any statement in the batch succeeds, all will 
No batch isolation 
Other “transactions” can read and write data being affected by a partially executed batch 
© 2014 DataStax, All Rights Reserved. Company Confidential 85
Batch Statements with LWT 
BEGIN BATCH " 
"UPDATE foo SET z = 1 WHERE x = 'a' AND y = 1; " 
!UPDATE foo SET z = 2 WHERE x = 'a' AND y = 2 IF t = 4; ! 
APPLY BATCH;" 
Allows you to group multiple conditional updates in a batch as long as 
all those updates apply to the same partition 
© 2014 DataStax, All Rights Reserved. Company Confidential 86
Static Columns 
A static column is a special column that is shared by all the rows of the same 
partition 
" 
CREATE TABLE foo ( " 
x text, " 
y bigint, " 
t bigint static, ! 
z bigint, " 
PRIMARY KEY (x, y) );" 
" 
INSERT INTO foo (x,y,t, z) VALUES ('a', 1, 1, 10);" 
INSERT INTO foo (x,y,t, z) VALUES ('a', 2, 2, 20);" 
" 
SELECT * from foo;" 
" 
x | y | t | z" 
---+---+---+----" 
a | 1 | 2 | 10" 
a | 2 | 2 | 20" 
© 2014 DataStax, All Rights Reserved. Company Confidential 87
Static Columns 
Considerations 
• Use them when you want to store some per-partition “static” 
information alongside clustered rows and still want to be able 
to query both of those with a single SELECT. 
• only columns not part of the PRIMARY key can be static. 
• only tables with at least one clustering column can have 
static columns 
• tables with the COMPACT STORAGE option cannot have static 
columns. 
© 2014 DataStax, All Rights Reserved. Company Confidential 88
User Defined Types (UDT’s) 
CREATE TYPE address (! 
street text,! 
city text,! 
zip_code int,! 
phones set<text>! 
)! 
" 
CREATE TABLE users (" 
id uuid PRIMARY KEY," 
name text," 
addresses map<text, frozen <address>>" 
)" 
" 
SELECT id, name, addresses.city, addresses.phones FROM users;" 
" 
id | name | addresses.city | addresses.phones! 
--------------------+----------------+--------------------------" 
63bf691f | johnny | London | {’0201234567', ’0796622222'}" 
© 2014 DataStax, All Rights Reserved. Company Confidential 89
Lets store some JSON 
{ 
"productId": 2, 
"name": "Kitchen Table", 
"price": 249.99, 
"description" : "Rectangular table with oak finish", 
"dimensions": { 
"units": "inches", 
"length": 50.0, 
"width": 66.0, 
"height": 32 
}, 
"categories": { 
{ 
"category" : "Home Furnishings" { 
"catalogPage": 45, 
"url": "/home/furnishings" 
}, 
{ 
"category" : "Kitchen Furnishings" { 
"catalogPage": 108, 
"url": "/kitchen/furnishings" 
} 
} 
} 
© 2014 DataStax, All Rights Reserved. Company Confidential 90
Lets store some JSON 
{ 
"productId": 2, 
"name": "Kitchen Table", 
"price": 249.99, 
"description" : "Rectangular table with oak finish", 
"dimensions": { 
"units": "inches", 
"length": 50.0, 
"width": 66.0, 
"height": 32 
}, 
"categories": { 
{ 
"category" : "Home Furnishings" { 
"catalogPage": 45, 
"url": "/home/furnishings" 
}, 
{ 
"category" : "Kitchen Furnishings" { 
"catalogPage": 108, 
"url": "/kitchen/furnishings" 
} 
} 
} 
CREATE TYPE dimensions (! 
units text,! 
length float,! 
width float,! 
height float! 
)! 
© 2014 DataStax, All Rights Reserved. Company Confidential 91
Lets store some JSON 
{ 
"productId": 2, 
"name": "Kitchen Table", 
"price": 249.99, 
"description" : "Rectangular table with oak finish", 
"dimensions": { 
"units": "inches", 
"length": 50.0, 
"width": 66.0, 
"height": 32 
}, 
"categories": { 
{ 
"category" : "Home Furnishings" { 
"catalogPage": 45, 
"url": "/home/furnishings" 
}, 
{ 
"category" : "Kitchen Furnishings" { 
"catalogPage": 108, 
"url": "/kitchen/furnishings" 
} 
} 
} 
CREATE TYPE dimensions (! 
units text,! 
length float,! 
width float,! 
height float! 
)! 
! 
! 
CREATE TYPE category (! 
catalogPage int,! 
url text! 
);! 
© 2014 DataStax, All Rights Reserved. Company Confidential 92
Lets store some JSON 
{ 
"productId": 2, 
"name": "Kitchen Table", 
"price": 249.99, 
"description" : "Rectangular table with oak finish", 
"dimensions": { 
"units": "inches", 
"length": 50.0, 
"width": 66.0, 
"height": 32 
}, 
"categories": { 
{ 
"category" : "Home Furnishings" { 
"catalogPage": 45, 
"url": "/home/furnishings" 
}, 
{ 
"category" : "Kitchen Furnishings" { 
"catalogPage": 108, 
"url": "/kitchen/furnishings" 
} 
} 
} 
CREATE TYPE dimensions (! 
units text,! 
length float,! 
width float,! 
height float! 
)! 
! 
! 
CREATE TYPE category (! 
catalogPage int,! 
url text! 
);! 
! 
! 
! 
CREATE TABLE product (! 
What’s frozen mean? 
productId int,! 
name text,! 
price float,! 
description text,! 
dimensions frozen <dimensions>,! 
categories map <text, frozen <category>>,! 
PRIMARY KEY (productId)! 
);! 
© 2014 DataStax, All Rights Reserved. Company Confidential 93
Frozen 
• This applies to User Defined Types, Tuples and nested collections. 
• Frozen types are serialized as a single value in Cassandra's storage 
engine, whereas non-frozen types are stored in a form that allows 
updates to individual subfields. 
• When using the frozen keyword, you cannot update parts of a user-defined 
type value. The entire value must be overwritten. 
• Cassandra treats the value of a frozen type like a blob. 
• Cassandra 3.0 will support non-frozen. As of 2.1, we only support 
frozen UDTs. 
• Helps us avoid tech debt! 
© 2014 DataStax, All Rights Reserved. Company Confidential 94
Lets store some JSON 
INSERT INTO product (productId, name, price, 
description, dimensions, categories)" 
VALUES (2, 'Kitchen Table', 249.99, 'Rectangular 
table with oak finish'," 
{! 
units: 'inches',! 
length: 50.0,! 
width: 66.0,! 
height: 32! 
},! 
{" 
'Home Furnishings': {" 
catalogPage: 45," 
url: '/home/furnishings'" 
}," 
'Kitchen Furnishings': {" 
catalogPage: 108," 
url: '/kitchen/furnishings'" 
}" 
" 
}" 
);" 
" 
dimensions frozen <dimensions>! 
categories map <text, frozen <category>>! 
© 2014 DataStax, All Rights Reserved. Company Confidential 95
Tuples 
CREATE TABLE tuple_table (" 
id int PRIMARY KEY," 
three_tuple frozen <tuple<int, text, float>>," 
four_tuple frozen <tuple<int, text, float, inet>>," 
five_tuple frozen <tuple<int, text, float, inet, ascii>>" 
);" 
"" 
• Track a drone’s position 
• x, y, z in a 3D Cartesian 
" 
© 2014 DataStax, All Rights Reserved. Company Confidential 96
Query Tracing 
• You can turn on tracing on or off for queries with the TRACING ON | OFF 
command. 
• Helps you understand what Cassandra is doing and identify any performance 
problems. 
© 2014 DataStax, All Rights Reserved. Company Confidential 97
Authentication and Authorisation 
CQL supports creating users and granting them access to tables 
etc.. 
You need to enable authentication in the cassandra.yaml 
config file. 
You can create, alter, drop and list users 
You can then GRANT permissions to users accordingly – ALTER, 
AUTHORIZE, DROP, MODIFY, SELECT. 
© 2014 DataStax, All Rights Reserved. Company Confidential 98
Plus lots, lots more you can do with CQL …. 
• Have a look at our documentation: 
http://www.datastax.com/documentation 
• Technical Blog 
http://www.datastax.com/dev/blog 
© 2014 DataStax, All Rights Reserved. Company Confidential 99
DataStax Java Driver 
© 2014 DataStax, All RighCtso Rmepsaenryv eCdo. n fidential 100
How to get it? 
• https://github.com/datastax/java-driver 
<dependency> 
<groupId>com.datastax.cassandra</groupId> 
<artifactId>cassandra-­‐driver-­‐core</artifactId> 
<version>2.1.0</version> 
</dependency> 
• The Java client driver 2.1 (branch 2.1) is compatible with Apache Cassandra 1.2, 
2.0 and 2.1. 
• If you try to use a feature that’s not in the version of Cassandra you are connecting 
to, you will get an UnsupportedFeatureException
Not Thrift 
• Traditionally, Cassandra clients (Hector, Astynax1 etc..) were developed using Thrift 
• With Cassandra 1.2 (Jan 2013) and the introduction of CQL3 and the CQL native 
protocol a new easier way of using Cassandra was introduced. 
• Why? 
– Easier to develop and model 
– Best practices for building modern distributed applications 
– Integrated tools and experience 
– Enable Cassandra to evolve easier and support new features 
1Astynax is being updated to include the native driver: https://github.com/Netflix/astyanax/wiki/Astyanax-over-Java-Driver 
© 2014 DataStax, All Rights Reserved. Company Confidential 102
Thrift post-Cassandra 2.1 
There is no more work on Thrift post 2.1.0 
– http://bit.ly/freezethrift 
Will retain it for backwards compatibility, but no new features or 
changes to the Thrift API after 2.1.0 
• “CQL3 is almost two years old now and has proved to be the better 
API that Cassandra needed. CQL drivers have caught up with and 
passed the Thrift ones in terms of features, performance, and 
usability. CQL is easier to learn and more productive than Thrift.” 
• - Jonathan Ellis, Apache Chair, Cassandra 
© 2014 DataStax, All Rights Reserved. Company Confidential 103
Native Protocol 
Thrift: 
Native: 
© 2014 DataStax, All Rights Reserved. Company Confidential 104
Notifications 
Without Notifications 
Polling 
Notifications are for technical events only: 
• Topology changes 
• Node status changes 
• Schema changes 
Pushing 
© 2014 DataStax, All Rights Reserved. Company Confidential 105
Asynchronous Architecture 
Client 
Thread 
Client 
Thread 
Client 
Thread 
Driver 
1 
2 3 
© 2014 DataStax, All Rights Reserved. Company Confidential 
4 
5 
6
Connect and Write 
Cluster cluster = Cluster.builder() 
.addContactPoints("10.158.02.40", "10.158.02.44") 
.build(); 
Session session = cluster.connect("akeyspace"); 
session.execute( 
"INSERT INTO user (username, password) ” 
+ "VALUES(‘johnny’, ‘password1234’)" 
); 
Note: Clusters and Sessions should be long-lived and re-used. 
© 2014 DataStax, All Rights Reserved. Company Confidential 107
Read from a table 
ResultSet rs = session.execute("SELECT * FROM user"); 
List<Row> rows = rs.all(); 
for (Row row : rows) { 
String userName = row.getString("username"); 
String password = row.getString("password"); 
} 
© 2014 DataStax, All Rights Reserved. Company Confidential 108
Paging 
This is great! 
Historically difficult to get huge result sets out of Cassandra. It has generally been 
necessary to explicitly enumerate your row keys in reasonably small batches (1000 rows 
or so per batch would be common). 
This feature now allows you to get huge result sets (including “select * from table), and 
have the server automatically page the results, while the client is just able to trivially 
iterate over the entire result set. 
Makes data exploration much easier. 
© 2014 DataStax, All Rights Reserved. Company Confidential 109
Asynchronous Requests 
ResultSetFuture future = session.executeAsync("SELECT * FROM user"); 
for (Row row : future.get()) { 
String userName = row.getString("username"); 
String password = row.getString("password"); 
} 
Note: The future returned implements Guava's ListenableFuture interface. This means you can use all 
Guava's Futures1 methods! 
1http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/ 
Futures.html 
© 2014 DataStax, All Rights Reserved. Company Confidential 110
Read with Callbacks 
final ResultSetFuture future = session.executeAsync("SELECT * FROM user"); 
Futures.addCallback(future, new FutureCallback<ResultSet>() { 
@Override 
public void onFailure(Throwable t) { 
log.error("Problem encountered“, t); 
//Do something... 
} 
@Override 
public void onSuccess(ResultSet r) { 
log.debug("Success!!); 
//Do something... 
} 
} 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 111
Parallelize Calls 
int queryCount = 99; 
List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>(); 
for (int i=0; i<queryCount; i++) { 
futures.add( 
session.executeAsync("SELECT * FROM user " 
+"WHERE username = '"+i+"'")); 
} 
for(ResultSetFuture future : futures) { 
for (Row row : future.getUninterruptibly()) { 
//do something 
} 
} 
© 2014 DataStax, All Rights Reserved. Company Confidential 112
Tip 
• If you need to do a lot of work, it’s often better to make many 
small queries concurrently than to make one big query. 
– executeAsync and Futures – makes this really easy! 
– Big queries can put a high load on one coordinator 
– Big queries can skew your 99th percentile latencies for other queries 
– If one small query fails you can easily retry, if a big query than you have to 
retry the whole thing 
© 2014 DataStax, All Rights Reserved. Company Confidential 113
Prepared Statements 
PreparedStatement statement = session.prepare( 
"INSERT INTO user (username, password) " 
+ "VALUES (?, ?)"); 
BoundStatement bs = statement.bind(); 
bs.setString("username", "johnny"); 
bs.setString("password", "password1234"); 
session.execute(bs); 
© 2014 DataStax, All Rights Reserved. Company Confidential 114
Query Builder 
Query query = QueryBuilder 
.select() 
.all() 
.from("akeyspace", "user") 
.where(eq("username", "johnny")); 
query.setConsistencyLevel(ConsistencyLevel.ONE); 
ResultSet rs = session.execute(query); 
© 2014 DataStax, All Rights Reserved. Company Confidential 115
Multi Data Center Load Balancing 
Local nodes are queried first, if none are available the request will be sent to a 
remote data center 
Cluster cluster = Cluster.builder() 
.addContactPoints("10.158.02.40", "10.158.02.44") 
.withLoadBalancingPolicy( 
new DCAwareRoundRobinPolicy("DC1")) 
.build(); 
Name of the local DC 
© 2014 DataStax, All Rights Reserved. Company Confidential 116
Token Aware Load Balancing 
Nodes that own a replica of the data being read or 
written by the query will be contacted first 
Cluster cluster = Cluster.builder() 
.addContactPoints("10.158.02.40", "10.158.02.44") 
.withLoadBalancingPolicy( 
new TokenAwarePolicy( 
new DCAwareRoundRobinPolicy("DC1"))) 
.build(); 
© 2014 DataStax, All Rights Reserved. Company Confidential 117
Retry Policies 
The behavior to adopt when a request returns a timeout or is unavailable. 
Cluster cluster = Cluster.builder() 
.addContactPoints("10.158.02.40", "10.158.02.44") 
.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE) 
.withLoadBalancingPolicy(new TokenAwarePolicy(new 
DCAwareRoundRobinPolicy("DC1"))) 
.build(); 
• DefaultRetryPolicy 
• DowngradingConsistencyRetryPolicy 
• FallthroughRetryPolicy 
• LoggingRetryPolicy 
© 2014 DataStax, All Rights Reserved. Company Confidential 118
Reconnection Policies 
Policy that decides how often the reconnection to a dead node is attempted. 
Cluster cluster = Cluster.builder() 
.addContactPoints("10.158.02.40", "10.158.02.44”) 
.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE) 
.withReconnectionPolicy(new ConstantReconnectionPolicy(1000)) 
.withLoadBalancingPolicy(new TokenAwarePolicy(new 
DCAwareRoundRobinPolicy("DC1"))) 
.build(); 
• ConstantReconnectionPolicy 
• ExponentialReconnectionPolicy (Default) 
© 2014 DataStax, All Rights Reserved. Company Confidential 119
Query Tracing 
Tracing is enabled on a per-query basis. 
Query query = QueryBuilder 
.select() 
.all() 
.from("akeyspace", "user") 
.where(eq("username", "johnny")) 
.enableTracing(); 
ResultSet rs = session.execute(query); 
ExecutionInfo executionInfo = rs.getExecutionInfo(); 
QueryTrace queryTrace = executionInfo.getQueryTrace(); 
© 2014 DataStax, All Rights Reserved. Company Confidential 120
Query Tracing 
Connected 
to 
cluster: 
xerxes 
Simplex 
keyspace 
and 
schema 
created. 
Host 
(queried): 
/127.0.0.1 
Host 
(tried): 
/127.0.0.1 
Trace 
id: 
96ac9400-­‐a3a5-­‐11e2-­‐96a9-­‐4db56cdc5fe7 
activity 
| 
timestamp 
| 
source 
| 
source_elapsed 
-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐ 
Parsing 
statement 
| 
12:17:16.736 
| 
/127.0.0.1 
| 
28 
Peparing 
statement 
| 
12:17:16.736 
| 
/127.0.0.1 
| 
199 
Determining 
replicas 
for 
mutation 
| 
12:17:16.736 
| 
/127.0.0.1 
| 
348 
Sending 
message 
to 
/127.0.0.3 
| 
12:17:16.736 
| 
/127.0.0.1 
| 
788 
Sending 
message 
to 
/127.0.0.2 
| 
12:17:16.736 
| 
/127.0.0.1 
| 
805 
Acquiring 
switchLock 
read 
lock 
| 
12:17:16.736 
| 
/127.0.0.1 
| 
828 
Appending 
to 
commitlog 
| 
12:17:16.736 
| 
/127.0.0.1 
| 
848 
Adding 
to 
songs 
memtable 
| 
12:17:16.736 
| 
/127.0.0.1 
| 
900 
Message 
received 
from 
/127.0.0.1 
| 
12:17:16.737 
| 
/127.0.0.2 
| 
34 
Message 
received 
from 
/127.0.0.1 
| 
12:17:16.737 
| 
/127.0.0.3 
| 
25 
Acquiring 
switchLock 
read 
lock 
| 
12:17:16.737 
| 
/127.0.0.2 
| 
672 
Acquiring 
switchLock 
read 
lock 
| 
12:17:16.737 
| 
/127.0.0.3 
| 
525 
Appending 
to 
commitlog 
| 
12:17:16.737 
| 
/127.0.0.2 
| 
692 
Appending 
to 
commitlog 
| 
12:17:16.737 
| 
/127.0.0.3 
| 
541 
Adding 
to 
songs 
memtable 
| 
12:17:16.737 
| 
/127.0.0.2 
| 
741 
Adding 
to 
songs 
memtable 
| 
12:17:16.737 
| 
/127.0.0.3 
| 
583 
Enqueuing 
response 
to 
/127.0.0.1 
| 
12:17:16.737 
| 
/127.0.0.3 
| 
751 
Enqueuing 
response 
to 
/127.0.0.1 
| 
12:17:16.738 
| 
/127.0.0.2 
| 
950 
Message 
received 
from 
/127.0.0.3 
| 
12:17:16.738 
| 
/127.0.0.1 
| 
178 
Sending 
message 
to 
/127.0.0.1 
| 
12:17:16.738 
| 
/127.0.0.2 
| 
1189 
Message 
received 
from 
/127.0.0.2 
| 
12:17:16.738 
| 
/127.0.0.1 
| 
249 
Processing 
response 
from 
/127.0.0.3 
| 
12:17:16.738 
| 
/127.0.0.1 
| 
345 
Processing 
response 
from 
/127.0.0.2 
| 
12:17:16.738 
| 
/127.0.0.1 
| 
377
User Defined Types - Retrieval 
Row 
row 
= 
session.execute( 
"SELECT 
* 
FROM 
user_profiles").one(); 
UDTValue 
address 
= 
row.getUDTValue("address"); 
// 
You 
get 
a 
map-­‐like 
object 
with 
the 
usual 
// 
getters 
and 
setters 
(by 
name, 
by 
index): 
String 
street 
= 
address.getString("street"); 
int 
zip 
= 
address.getInt(2); 
USE 
ks; 
CREATE 
TYPE 
address 
( 
street 
text, 
city 
text, 
zip 
int 
); 
CREATE 
TABLE 
user_profiles 
( 
email 
text 
PRIMARY 
KEY, 
address 
address 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 122
User Defined Types - Creation 
// 
Must 
first 
get 
hold 
of 
the 
datatype: 
// 
From 
an 
existing 
value: 
UserType 
addressType 
= 
address.getType(); 
// 
OR... 
// 
From 
the 
cluster 
metadata: 
UserType 
addressType 
= 
cluster.getMetadata() 
.getKeyspace("ks") 
.getUserType("address"); 
USE 
ks; 
CREATE 
TYPE 
address 
( 
street 
text, 
city 
text, 
zip 
int 
); 
CREATE 
TABLE 
user_profiles 
( 
email 
text 
PRIMARY 
KEY, 
address 
address 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 123
User Defined Types - Creation 
UDTValue 
address 
= 
addressType.newValue() 
.setString("street", 
"...") 
.setString("city", 
"Washington") 
.setInt("zip", 
20500); 
// 
Now 
use 
it 
in 
a 
query 
like 
any 
other 
type: 
session.execute( 
"INSERT 
INTO 
user_profiles 
(email, 
address) 
VALUES 
(?, 
?)", 
"xyz@example.com", 
address); 
USE 
ks; 
CREATE 
TYPE 
address 
( 
street 
text, 
city 
text, 
zip 
int 
); 
CREATE 
TABLE 
user_profiles 
( 
email 
text 
PRIMARY 
KEY, 
address 
address 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 124
Tuples - retrieval 
Row 
row 
= 
session.execute( 
"SELECT 
* 
FROM 
points_of_interest").one(); 
TupleValue 
coordinates 
= 
row.getTupleValue("coordinates"); 
float 
latitude 
= 
coordinates.getFloat(0); 
float 
longitude 
= 
coordinates.getFloat(1); 
USE 
ks; 
CREATE 
TABLE 
points_of_interest 
( 
id 
int 
PRIMARY 
KEY, 
name 
text, 
coordinates 
tuple<float, 
float> 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 125
Tuples - creation 
TupleType 
coordinatesType 
= 
TupleType.of(DataType.cfloat(), 
DataType.cfloat()); 
TupleValue 
newCoordinates 
= 
coordinatesType.newValue(48.858222F, 
2.2945F); 
USE 
ks; 
CREATE 
TABLE 
points_of_interest 
( 
id 
int 
PRIMARY 
KEY, 
name 
text, 
coordinates 
tuple<float, 
float> 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 126
Simple Object Mapper 
Query results ↔ Java objects 
– Avoid boilerplate for common use cases (CRUD, etc.) 
– Keep it simple and close to the bare metal 
• Do not hide Cassandra from the developer 
• Avoid "clever tricks" à la Hibernate 
It comes as an additional Maven artifact: 
<dependency> 
<groupId>com.datastax.cassandra</groupId> 
<artifactId>cassandra-­‐driver-­‐mapping</artifactId> 
<version>2.1.0</version> 
</dependency> 
© 2014 DataStax, All Rights Reserved. Company Confidential 127
Object Mapper Basic CRUD 
@UDT(keyspace 
= 
"ks", 
name 
= 
"address") 
public 
class 
Address 
{ 
private 
String 
street; 
private 
String 
city; 
private 
int 
zip; 
// 
getters 
and 
setters 
omitted... 
} 
@Table(keyspace 
= 
"ks", 
name 
= 
"user_profiles") 
public 
class 
UserProfile 
{ 
@PartitionKey 
private 
String 
email; 
private 
Address 
address; 
// 
getters 
and 
setters 
omitted... 
} 
USE 
ks; 
CREATE 
TYPE 
address 
( 
street 
text, 
city 
text, 
zip 
int 
); 
CREATE 
TABLE 
user_profiles 
( 
email 
text 
PRIMARY 
KEY, 
address 
address 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 128
Object Mapper Basic CRUD 
MappingManager 
manager 
= 
new 
MappingManager(session); 
Mapper 
mapper 
= 
manager.mapper(UserProfile.class); 
UserProfile 
myProfile 
= 
mapper.get("xyz@example.com"); 
ListenableFuture 
saveFuture 
= 
mapper.saveAsync(anotherProfile); 
mapper.delete("xyz@example.com"); 
USE 
ks; 
CREATE 
TYPE 
address 
( 
street 
text, 
city 
text, 
zip 
int 
); 
CREATE 
TABLE 
user_profiles 
( 
email 
text 
PRIMARY 
KEY, 
address 
address 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 129
Accessor custom queries 
@Accessor 
interface 
ProfileAccessor 
{ 
@Query("SELECT 
* 
FROM 
user_profiles 
LIMIT 
:max") 
Result 
firstN(@Param("max") 
int 
limit); 
} 
ProfileAccessor 
accessor 
= 
manager.createAccessor(ProfileAccessor.class); 
Result 
profiles 
= 
accessor.firstN(10); 
// 
Result 
is 
like 
ResultSet, 
but 
specialized 
for 
a 
// 
mapped 
class: 
for 
(UserProfile 
profile 
: 
profiles) 
{ 
System.out.println( 
profile.getAddress().getZip() 
); 
} 
USE 
ks; 
CREATE 
TYPE 
address 
( 
street 
text, 
city 
text, 
zip 
int 
); 
CREATE 
TABLE 
user_profiles 
( 
email 
text 
PRIMARY 
KEY, 
address 
address 
); 
© 2014 DataStax, All Rights Reserved. Company Confidential 130
Training Day | December 3rd 
Beginner Track 
• Introduction to Cassandra 
• Introduction to Spark, Shark, Scala and 
Cassandra 
Advanced Track 
• Data Modeling 
• Performance Tuning 
Conference Day | December 4th 
Cassandra Summit Europe 2014 will be the single 
largest gathering of Cassandra users in Europe. 
Learn how the world's most successful companies are 
transforming their businesses and growing faster than 
ever using Apache Cassandra. 
http://bit.ly/cassandrasummit2014 
© 2014 DataStax, All Rights Reserved. Company Confidential 131

Mais conteúdo relacionado

Mais procurados

BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsDave Gardner
 
What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...
What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...
What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...Edureka!
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...DataStax
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...DataStax
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandraAaron Ploetz
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...DataStax
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersJulien Anguenot
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraGokhan Atil
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
 
Introduction to NoSQL & Apache Cassandra
Introduction to NoSQL & Apache CassandraIntroduction to NoSQL & Apache Cassandra
Introduction to NoSQL & Apache CassandraChetan Baheti
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016DataStax
 
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...DataStax
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax
 

Mais procurados (20)

Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...
What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...
What is Apache Cassandra? | Apache Cassandra Tutorial | Apache Cassandra Intr...
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Cassandra ppt 2
Cassandra ppt 2Cassandra ppt 2
Cassandra ppt 2
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developers
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...
 
Introduction to NoSQL & Apache Cassandra
Introduction to NoSQL & Apache CassandraIntroduction to NoSQL & Apache Cassandra
Introduction to NoSQL & Apache Cassandra
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
 
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
Micro-batching: High-performance Writes (Adam Zegelin, Instaclustr) | Cassand...
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 

Destaque

Cassandra Day London 2015: Getting Started with Apache Cassandra and Java
Cassandra Day London 2015: Getting Started with Apache Cassandra and JavaCassandra Day London 2015: Getting Started with Apache Cassandra and Java
Cassandra Day London 2015: Getting Started with Apache Cassandra and JavaDataStax Academy
 
Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...Chris Richardson
 
Cassandra Drivers and Tools
Cassandra Drivers and ToolsCassandra Drivers and Tools
Cassandra Drivers and ToolsDuyhai Doan
 
Spring Data Cassandra
Spring Data CassandraSpring Data Cassandra
Spring Data Cassandraniallmilton
 
Successful Software Development with Apache Cassandra
Successful Software Development with Apache CassandraSuccessful Software Development with Apache Cassandra
Successful Software Development with Apache CassandraDataStax Academy
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 

Destaque (6)

Cassandra Day London 2015: Getting Started with Apache Cassandra and Java
Cassandra Day London 2015: Getting Started with Apache Cassandra and JavaCassandra Day London 2015: Getting Started with Apache Cassandra and Java
Cassandra Day London 2015: Getting Started with Apache Cassandra and Java
 
Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...
 
Cassandra Drivers and Tools
Cassandra Drivers and ToolsCassandra Drivers and Tools
Cassandra Drivers and Tools
 
Spring Data Cassandra
Spring Data CassandraSpring Data Cassandra
Spring Data Cassandra
 
Successful Software Development with Apache Cassandra
Successful Software Development with Apache CassandraSuccessful Software Development with Apache Cassandra
Successful Software Development with Apache Cassandra
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 

Semelhante a Cassandra Summit Europe 2014 Training Agenda

Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with SparkDataStax Academy
 
Cassandra 2.0 to 2.1
Cassandra 2.0 to 2.1Cassandra 2.0 to 2.1
Cassandra 2.0 to 2.1Johnny Miller
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranData Con LA
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data gridBogdan Dina
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandraScyllaDB
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...Cloudera, Inc.
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysMongoDB APAC
 
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)Cédrick Lunven
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-finalHaluk Ulubay
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014NoSQLmatters
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureRiccardo Romani
 
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...Daniel Cohen
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High PerformanceMysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High PerformanceBernd Ocklin
 

Semelhante a Cassandra Summit Europe 2014 Training Agenda (20)

Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with Spark
 
Cassandra 2.0 to 2.1
Cassandra 2.0 to 2.1Cassandra 2.0 to 2.1
Cassandra 2.0 to 2.1
 
Apache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda MoranApache Cassandra and The Multi-Cloud by Amanda Moran
Apache Cassandra and The Multi-Cloud by Amanda Moran
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data grid
 
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from Cassandra
 
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdays
 
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
Top 10 present and future innovations in the NoSQL Cassandra ecosystem (2022)
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-final
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
DataStax Enterprise & Apache Cassandra – Essentials for Financial Services – ...
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High PerformanceMysql NDB Cluster's Asynchronous Parallel Design for High Performance
Mysql NDB Cluster's Asynchronous Parallel Design for High Performance
 

Último

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Último (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Cassandra Summit Europe 2014 Training Agenda

  • 1. Training Day | December 3rd Beginner Track • Introduction to Cassandra • Introduction to Spark, Shark, Scala and Cassandra Advanced Track • Data Modeling • Performance Tuning Conference Day | December 4th Cassandra Summit Europe 2014 will be the single largest gathering of Cassandra users in Europe. Learn how the world's most successful companies are transforming their businesses and growing faster than ever using Apache Cassandra. http://bit.ly/cassandrasummit2014 © 2014 DataStax, All Rights Reserved. Company Confidential 1
  • 2. Introduction to Cassandra for Java Developers Why, What and How. Johnny Miller, Solutions Architect @CyanMiller www.linkedin.com/in/johnnymiller
  • 3. Training Day | December 3rd Beginner Track • Introduction to Cassandra • Introduction to Spark, Shark, Scala and Cassandra Advanced Track • Data Modeling • Performance Tuning Conference Day | December 4th Cassandra Summit Europe 2014 will be the single largest gathering of Cassandra users in Europe. Learn how the world's most successful companies are transforming their businesses and growing faster than ever using Apache Cassandra. http://bit.ly/cassandrasummit2014 © 2014 DataStax, All Rights Reserved. Company Confidential 3
  • 4. DataStax • Founded in April 2010 • We drive Apache Cassandra™ • 300+ employees • Home to Apache Cassandra™ Chair & most committers – Contribute ~ 80% of code into Apache Cassandra™ code base • Headquartered in San Francisco Bay area • European headquarters in London © 2014 DataStax, All Rights Reserved. 4
  • 5. DataStax Enterprise www.datastax.com © 2014 DataStax, All Rights Reserved. 5
  • 6. DataStax Enterprise is free for startups!!! • Unlimited, free use of the software in DataStax Enterprise. • No limit on number of nodes or other hidden restrictions. • If you’re a startup, it’s free! www.datastax.com/startups © 2014 DataStax, All Rights Reserved. 6
  • 7. Training • Checkout the DataStax Academy for free online training – http://academy.datastax.com • Public courses • On-site training www.datastax.com/training
  • 8. We are hiring! www.datastax.com/careers @DataStaxCareers 8
  • 10. Apache Cassandra • Cassandra is a massively scaleable open source NoSQL database • 100% uptime • Predictable scaling • High Performance Dynamo BigTable Node 1 Node 4 Node 3 Node 5 BigTable: http://research.google.com/archive/bigtable-osdi06.pdf Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf Node 2 © 2014 DataStax, All Rights Reserved. 10
  • 11. Cassandra Core Values • Ease of use • Massive scalability • High performance • Always Available © 2014 DataStax, All Rights Reserved. 11
  • 12. Availability and Speed Matters for online apps! • UK retailers lost 8.5 billion last year to slow web sites, which is 1 million for every 10 million in online sales • Over half of all web users expect a response time of 2 seconds or less • A 1 second delay causes a nearly 10% reduction in customer interactions • A 1 second decrease in Amazon page load time costs the company $1.6 billion in sales © 2014 DataStax, All Rights Reserved. Company Confidential 12
  • 13. Adoption http://db-engines.com/en/ranking September 2014 © 2014 DataStax, All Rights Reserved. 13
  • 14.
  • 15. Performance & Scale Cassandra works for small to huge deployments. • Cassandra footprint @ Netflix • 80+ Clusters • 2500+ nodes • 4 Data Centres (Amazon Regions) • > 1 Trillion transactions per day See: http://www.datastax.com/resources/casestudies/netflix
  • 16. Performance & Scale “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.” Solving Big Data Challenges for Enterprise Application Performance Management, Tilman Rable, et al., August 2012. Benchmark paper presented at the Very Large Database Conference, 2012. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf End Point Independent NoSQL Benchmark Lowest in latency… Netflix Cloud Benchmark… Highest in throughput… © 2014 DataStax, All Rights Reserved. 16 http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability- on.html http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking- Top-NoSQL-Databases.pdf
  • 17. Availability © 2014 DataStax, All Rights Reserved. 17
  • 18. Ideal Cassandra Use Cases Product catalogs and playlists • Collection of items Internet of things • Sensor data Messaging • Emails, instant messages, alerts, comments Recommendation and personalization • Trend information Fraud detection • Data patterns Time-series and time-ordered data • Data with time dimension http://planetcassandra.org © 2014 DataStax, All Rights Reserved. Company Confidential 18
  • 19. What is Apache Cassandra? 19
  • 20. Overview • Cassandra was designed with the understanding that system/hardware failures can and do occur • Peer-to-peer, distributed system • All nodes the same Data partitioned among all nodes in the cluster Node 1 • • Custom data replication to ensure fault tolerance • Read/Write-anywhere and across data centres Node 2 Node 4 Node 3 Node 5
  • 21. Cassandra > 1 Server • All nodes participate in a cluster • Add or remove as needed • All nodes the same – masterless with no single point of failure • Each node communicates with each other through the Gossip protocol, which exchanges information across the cluster every second • Data partitioned among all nodes in the cluster • More capacity? Add a server! • More throughput? Add a server! Node 1 Node 2 Node 4 Node 3 Node 5 © 2014 DataStax, All Rights Reserved. Company Confidential 21
  • 22. Write Path on individual server • A commit log is used on each node to capture write activity. Data durability is assured • Data also written to an in-memory structure (memtable) and then to disk once the memory structure is full (an SStable) © 2014 DataStax, All Rights Reserved. Company Confidential 22
  • 23. Distributed • Replication factor (RF): How many copies of your data? • RF = 3 in this example • Each node is storing 60% of the clusters total data i.e. 3/5 Handy Calculator: http://www.ecyrd.com/cassandracalculator/ Node 1 1st copy Node 4 Node 5 Node 2 2nd copy © 2014 DataStax, All Rights Reserved. Company Confidential 23 Node 3 3rd copy • Client reads or writes to any node • Node coordinates with others • Data read or replicated in parallel Node 2 2nd copy
  • 24. Rack Aware © 2014 DataStax, All Rights Reserved. Company Confidential 24 • Cassandra is aware of which rack (or availability zone) each node resides in. • It will attempt to place each data copy in a different rack. • RF = 3 in this example Node 1 1st copy Node 4 Node 2 Node 3 2nd copy Rack 1 Rack 2 Rack 2 Rack 3 Rack 1 Node 5 3rd copy
  • 25. Data Distribution and Replication • Cassandra is designed as a peer-to-peer system that makes copies of the data and distributes the copies among a group of nodes • Data is organized by table and identified by a primary key. The primary key determines which node the data is stored on. • Each node is assigned data based on the hash value of the primary key and the range of hash values associated with that node. © 2014 DataStax, All Rights Reserved. Company Confidential 25
  • 26. Partitioners • A partitioner determines how data is distributed across the nodes in the cluster. • Cassandra has three: – Murmur3Partitioner (default) – RandomPartitioner – ByteOrderedPartitioner (don’t use this unless you know what your doing and have a very good reason to do this) © 2014 DataStax, All Rights Reserved. Company Confidential 26
  • 27. Murmer3 Example • Data: Primary Key jim age: 36 car: ford gender: M carol age: 37 car: bmw gender: F johnny age: 12 gender: M suzy: age: 10 gender: F • Murmer3 Hash Values: Primary Key Murmur3 hash value jim -2245462676723223822 carol 7723358927203680754 johnny -6723372854036780875 suzy 1168604627387940318 © 2014 DataStax, All Rights Reserved. Company Confidential 27
  • 28. Murmer3 Example 4 node cluster: Node Murmur3 start range Murmur3 end range A -9223372036854775808 -4611686018427387903 B -4611686018427387904 -1 C 0 4611686018427387903 D 4611686018427387904 9223372036854775807 © 2014 DataStax, All Rights Reserved. Company Confidential 28
  • 29. Murmer3 Example Data is distributed as: Node Start range End range Primary key Hash value A -92233720368547758 08 -4611686018427387903 johnny -672337285403678 0875 B -461168601842738790 4 -1 jim -224546267672322 3822 C 0 4611686018427387903 suzy 1168604627387940 318 D 461168601842738790 4 9223372036854775807 carol 7723358927203680 754 © 2014 DataStax, All Rights Reserved. Company Confidential 29
  • 30. Replicas • A replication strategy determines the nodes where replicas are placed. • All replicas are equally important; there is no primary or master replica. • Two replication strategies available: – SimpleStrategy (prototype only, don’t use in production) – NetworkTopologyStrategy • Replication strategy is set at the keyspace/schema level. © 2014 DataStax, All Rights Reserved. Company Confidential 30
  • 31. NetworkTopologyStrategy • Specified how many replicas you want in each data center (physical or virtual). • NetworkTopologyStrategy places replicas in the same data center by walking the ring clockwise until reaching the first node in another rack. © 2014 DataStax, All Rights Reserved. Company Confidential 31
  • 32. Data Center Aware © 2014 DataStax, All Rights Reserved. DC: USA DC: EUROPE Node 5 Node 2 32 Node 1 1st copy Node 4 2nd copy Node 3 3rd copy Node 1 1st copy Node 5 Node 2 Node 4 2nd copy Node 3 3rd copy • Active Everywhere!! • Client writes local • Data syncs across WAN
  • 33. Tunable Consistency Node 1 1st copy 5 μs ack Parallel Write 12 μs ack Node 5 Node 2 Node 4 2nd copy 12 μs ack Node 3 3rd copy Write CL=QUORUM 500 μs ack • Consistency Level (CL) • Client specifies per operation • Handles multi-data center operations ALL = All replicas ack QUORUM = > 51% of replicas ack LOCAL_QUORUM = > 51% in local DC ack ONE = Only one replica acks Plus more…. (see docs) © 2014 DataStax, All Rights Reserved. Company Confidential 33
  • 34. Node Failure • A single node failure shouldn’t bring failure. • Replication Factor + Consistency Level = Success • This example: – RF = 3 – CL = QUORUM Node 1 1st copy 5 μs ack Parallel Write 12 μs ack Node 5 Node 2 Node 4 2nd copy 12 μs ack Node 3 3rd copy Write CL=QUORUM >51% ack – so request is a success © 2014 DataStax, All Rights Reserved. Company Confidential 34
  • 35. Node Recovery • When a write is performed and a replica node for the row is unavailable the coordinator will store a hint locally. • When the node recovers, the coordinator replays the missed writes. • Note: a hinted write does not count towards the consistency level Node 1 1st copy • Note: you should still run repairs across your cluster Node 5 Node 2 Node 4 2nd copy Node 3 3rd copy Stores Hints while Node 3 is offline © 2014 DataStax, All Rights Reserved. Company Confidential 35
  • 36. Rack Failure • Cassandra will place the data in as many different racks or availability zones as it can. • This example: – RF = 3 – CL = QUORUM – Rack 2 fails • Data copies still available in Node 1 and Node 5 • Quorum can be honored i.e. > 51% ack request is a success! Rack 1 Node 1 1st copy Rack 2 Rack 2 Node 4 Rack 1 Node 2 Node 3 2nd copy Rack 3 Node 5 3rd copy © 2014 DataStax, All Rights Reserved. Company Confidential 36
  • 37. Don’t be afraid of weak consistency • More tolerant to failure • Consistency Level of 1 is the most popular (I think) • If you want stronger consistency go for LOCAL_QUORUM i.e. quorum is honored in the local data centre. • If you go stronger than LOCAL_QUORUM – really understand what this means and why you are doing it. • Remember – you can have different consistency levels for reads and writes e.g. write with CL:1, read with CL:LOCAL_QUORUM © 2014 DataStax, All Rights Reserved. Company Confidential 37
  • 38. Ring fenced resources • If you need to isolate resources for different uses, Cassandra is a great fit. • You can create separate virtual data centres optimised as required – different workloads, hardware, availability etc.. • Cassandra will replicate the data for you automatically – no ETL is necessary Customer Facing Analytics Cassandra Replication © 2014 DataStax, All Rights Reserved. Company Confidential 38
  • 39. Hybrid Cloud • Cassandra is fmulti-data centre and cloud capable • Data written to any node is automatically and transparently written to all other nodes in multiple data centres i.e. NO ETL Public Cloud Data Centre 1 Data Centre2 © 2014 DataStax, All Rights Reserved. Company Confidential 39
  • 40. Security BENEFITS FEATURES Internal Authentication Manages login IDs and passwords inside the database + Ensures only authorized users can access a database system using internal validation + Simple to implement and easy to understand + No learning curve from the relational world Object Permission Management controls who has access to what and who can do what in the database + Provides granular based control over who can add/ change/delete/read data + Uses familiar GRANT/ REVOKE from relational systems + No learning curve Client to Node Encryption protects data in flight to and from a database cluster + Ensures data cannot be captured/stolen in route to a server + Data is safe both in flight from/to a database and on the database; complete coverage is ensured © 2014 DataStax, All Rights Reserved. Company Confidential 40
  • 41. Uptime Web Tier Web Tier Web Tier Web Tier App Tier App Tier App Tier DC: USA DC: Europe DC: Asia Cassandra Replication Cassandra Replication © 2014 DataStax, All Rights Reserved. Company Confidential 41
  • 42. DC Failure Web Tier Web Tier Web Tier Web Tier App Tier App Tier App Tier DC: USA DC: Europe DC: Asia Cassandra Replication Cassandra Replication © 2014 DataStax, All Rights Reserved. Company Confidential 42
  • 43. Tier Failure Web Tier Web Tier Web Tier Web Tier App Tier App Tier App Tier DC: USA DC: Europe DC: Asia Cassandra Replication © 2014 DataStax, All Rights Reserved. Company Confidential 43
  • 44. WAN Failure Web Tier Web Tier Web Tier Web Tier App Tier App Tier App Tier Consistency level? DC: USA DC: Europe DC: Asia Cassandra Replication Cassandra Replication © 2014 DataStax, All Rights Reserved. Company Confidential 44
  • 47. Why Good Data Modeling is Important? Cassandra is a highly available, highly scalable, & highly distributed database - with no single point of failure To achieve this, Cassandra is optimized for non-relational data models. • Joins do not function well on distributed databases. • Locking and transactions jam up distributed nodes
  • 48. So what do I do? • Lots of different approaches to this… • Basically, create multiple tables optimized for your queries. Query Based Modeling • The main point is: DON’T BE AFRAID OF WRITES DENORMALISE AND MODEL YOUR DATA AROUND YOUR QUERIES © 2014 DataStax, All Rights Reserved. Company Confidential 48
  • 49. For more on data modeling… • Data modeling video series by Patrick McFadin • Part 1: The Data Model is Dead, Long Live the Data Model http://www.youtube.com/watch?v=px6U2n74q3g • Part 2: Become a Super Modeler http://www.youtube.com/watch?v=qphhxujn5Es • Part 3: The World's Next Top Data Model http://www.youtube.com/watch?v=HdJlsOZVGwM © 2014 DataStax, All Rights Reserved. Company Confidential 49
  • 51. CQL • Cassandra Query Language • SQL–like language to query Cassandra • Limited predicates. Attempts to prevent bad queries – but, you can still get into trouble! • Keyspace – analogous to a schema. – Has various storage attributes. – The keyspace determines the RF (replication factor). • Table – looks like a SQL Table. – A table must have a Primary Key. – We can fully qualify a table as <keyspace>.<table>
  • 52. DevCenter © 2014 DataStax, All Rights Reserved. Company Confidential 52
  • 53. CQLSH Command line interface comes with Cassandra Launching on Linux • $ cqlsh [options] [host [port]] " Launching on Windows • python cqlsh [options] [host [port]]" Example • $ cqlsh" • $ cqlsh -u student -p cassandra 127.0.0.1 9160" © 2014 DataStax, All Rights Reserved. Company Confidential 53
  • 54. CQLSH © 2014 DataStax, All Rights Reserved. Company Confidential 54
  • 55. CQL Basics • Usual statements • CREATE / DROP / ALTER TABLE • SELECT / INSERT / UPDATE • Plus many, many more! www.datastax.com/docs © 2014 DataStax, All Rights Reserved. Company Confidential 55
  • 56. What is a keyspace? • Keyspace or schema is a top-level namespace • All data objects (e.g., tables) must belong to some keyspace • Defines how data is replicated on nodes • Keyspace per application is a good idea • Replica placement strategy • SimpleStrategy (prototyping) • NetworkTopologyStrategy (production) © 2014 DataStax, All Rights Reserved. Company Confidential 56
  • 57. Creating a keyspace - Multiple Data Centre © 2014 DataStax, All Rights Reserved. Company Confidential 57
  • 58. Use and Drop a keyspace • To work with data objects (e.g., tables) in a keyspace: • USE pchstats;" • To delete a keyspace and all internal data objects • DROP KEYSPACE pchstats;" If you drop a keyspace or table, Cassandra will back it up! You can disable this via auto_snapshot in the cassandra.yaml © 2014 DataStax, All Rights Reserved. Company Confidential 58
  • 59. Creating a table CREATE TABLE cities (" city_name varchar," elevation int," population int," latitude float," longitude float," PRIMARY KEY (city_name)! );" • We can visualize it this way: • Here, city_name is the partition key • In this example, the partition key = primary key © 2014 DataStax, All Rights Reserved. Company Confidential 59
  • 60. Compound Primary Key The Primary Key • The key uniquely identifies a row. • A composite primary key consists of: • A partition key • One or more clustering columns • e.g. PRIMARY KEY (partition key, cluster columns, ...)" • The partition key determines on which node the partition resides • Data is ordered in cluster column order within the partition © 2014 DataStax, All Rights Reserved. Company Confidential 60
  • 61. Compound Primary Key CREATE TABLE sporty_league (" team_name varchar," player_name varchar," jersey int," PRIMARY KEY (team_name, player_name)! );" Partition Key Clustering Column © 2014 DataStax, All Rights Reserved. Company Confidential 61
  • 62. Simple Select This is a bad query SELECT * FROM sporty_league;" Partition Key Clustering Column ordered • More that a few rows can be slow. (Limited to 10,000 rows by default) • Use LIMIT keyword to choose fewer or more rows © 2014 DataStax, All Rights Reserved. Company Confidential 62
  • 63. Select on Partition Key and Cluster Columns SELECT * FROM sporty_league WHERE team_name = ‘Mighty Mutts’;" SELECT * FROM sporty_league WHERE team_name = ‘Mighty Mutts’ and player_name = ‘Lucky’;" © 2014 DataStax, All Rights Reserved. Company Confidential 63
  • 64. Composite Partition Key CREATE TABLE cities (" Partition Key city_name varchar," state varchar" PRIMARY KEY ((city_name, state))" );" © 2014 DataStax, All Rights Reserved. Company Confidential 64
  • 65. CQL Predicates • On the partition key: = and IN • On the cluster columns: <, <=, =, >=, >, IN © 2014 DataStax, All Rights Reserved. Company Confidential 65
  • 66. Performance Considerations • The best queries are in a single partition. i.e. WHERE partition key = <something> • Each new partition requires a new disk seek. • Queries that span multiple partitions are s-l-o-w • Queries that span multiple cluster columns are fast Company Confidential 66
  • 67. Ordering • Partition keys are not ordered, but the cluster columns are. • However, you can only order by a column if it’s a cluster column. • Data will returned by default in the order of the clustering column. • You can also use the ORDER BY keyword – but only on the clustering column! SELECT * FROM sporty_league WHERE team_name = ‘Mighty Mutts’ ORDER BY player_name DESC;" © 2014 DataStax, All Rights Reserved. Company Confidential 67
  • 68. Secondary Indexes If we want to do a query on a column that is not part of your PK, you can create an index: CREATE INDEX ON <table>(<column>); " Than you can do a select: SELECT * FROM product WHERE type= ’PC';" Suitable for low cardinality data Much more efficient to model your data around the query 68 Watch out for global indexes in Cassandra 3.0 (CASSANDRA-6477)
  • 69. Insert/Update INSERT INTO sporty_league (team_name, player_name, jersey) VALUES ('Mighty Mutts',’Felix’,90); UPDATE sporty_league SET jersey = 77" WHERE team_name = 'Mighty Mutts’ AND player_name = ‘Felix’;" Inserts a row into a table • Must specify columns to insert values into • Primary key columns are mandatory (identify the row) • Other columns do not have to have values and Non-existent ‘values’ do not take up space Inserts are atomic • All values of a row are inserted or none Inserts are isolated • Two inserts with the same values in primary key columns will not interfere – executed one after another © 2014 DataStax, All Rights Reserved. Company Confidential 69
  • 70. What is an upsert? • UPdate + inSERT • Both UPDATE and INSERT are write operations • No reading before writing • Term “upsert” denotes the following behavior • INSERT updates or overwrites an existing row – When inserting a row in a table that already has another row with the same values in primary key columns • UPDATE inserts a new row – When a to-be-updated row, identified by values in primary key columns, does not exist • Upserts are legal and do not result in error or warning messages © 2014 DataStax, All Rights Reserved. Company Confidential 70
  • 71. How to avoid UPSERTS Guarantee that your primary keys are unique from one another Use an appropriate natural key based on your data Use a surrogate key for partition key Risks with natural keys Depending on the type of natural key that is used, there may still be an increased risk of UPSERTs Changing the datum used for a Natural Key requires a lot of overhead. So why not use a sequence to generate a surrogate key? You cant – Cassandra doesn’t provide sequences! © 2014 DataStax, All Rights Reserved. Company Confidential 71
  • 72. What? No sequences! • Sequences are a handy feature in RDMBS for auto-creation of IDs for you data. – Guaranteed unique – E.g. INSERT INTO user (id, firstName, LastName) VALUES (seq.nextVal(), ‘Ted’, ‘Codd’)" • Cassandra has no sequences! – Extremely difficult in a masterless distributed system – Requires a lock (perf killer) • What to do? – Use part of the data to create a unique key – Use a UUID or TIMEUUID © 2014 DataStax, All Rights Reserved. Company Confidential 72
  • 73. UUID • Universal Unique ID • 128 bit number represented in character form e.g. 99051fe9-6a9c-46c2-b949-38ef78858dd0 • Easily generated on the client • Version 1 has a timestamp component (TIMEUUID) • Version 4 has no timestamp component – Faster to generate © 2014 DataStax, All Rights Reserved. Company Confidential 73
  • 74. TIMEUUID TIMEUUID data type supports Version 1 UUIDs Generated using time (60 bits), a clock sequence number (14 bits), and MAC address (48 bits) – CQL function ‘now()’ generates a new TIMEUUID Time can be extracted from TIMEUUID – CQL function dateOf() extracts the timestamp as a date TIMEUUID values in clustering columns or in column names are ordered based on time – DESC order on TIMEUUID lists most recent data first © 2014 DataStax, All Rights Reserved. Company Confidential 74
  • 75. Counters Stores a number that incrementally counts the occurrences of a particular event or process. Note: If a table has a counter column, all non-counter columns must be part of a primary key CREATE TABLE UserActions (" user VARCHAR," action VARCHAR," total COUNTER,! PRIMARY KEY (user, action)" );" UPDATE UserActions SET total = total + 2 WHERE user = 123 AND action = ’xyz';" © 2014 DataStax, All Rights Reserved. Company Confidential 75
  • 76. Counters Pre-Cassandra 2.1 • Edge-cases around accuracy • Counter update was not an idempotent operation Cassandra 2.1 • Simpler implementation, no more edge cases • Possible to properly repair now • Significantly less garbage and internode traffic generated • Better performance for 99% of uses © 2014 DataStax, All Rights Reserved. Company Confidential 76
  • 77. Time-to-Live (TTL) TTL a row:! INSERT INTO users (id, first, last) VALUES (‘abc123’, ‘catherine’, ‘cachart’) " USING TTL 3600; // Expires data in one hour " TTL a column: UPDATE users USING TTL 30 SET last = ‘miller’ WHERE id = ‘abc123’" – TTL in seconds – Can also set default TTL at a table level – Expired columns/values automatically deleted – With no TTL specified, columns/values never expire – TTL is useful for automatic deletion – Re-inserting the same row before it expires will overwrite TTL © 2014 DataStax, All Rights Reserved. Company Confidential 77
  • 78. Collection Data Type • CQL supports having columns that contain collections of data. • The collection types include: – Set, List and Map. CREATE TABLE collections_example (" "id int PRIMARY KEY," !set_example set<text>,! !list_example list<text>,! !map_example map<int, text>! ); • These data types are intended to support the type of one-to-many relationships that can be modeled in a relational DB e.g. a user has many email addresses. • Some performance considerations around collections. – Often more efficient to denormalise further rather than use collections if intending to store lots of data. – Favor sets over list You now have collection indexing in Cassandra 2.1 © 2014 DataStax, All Rights Reserved. Company Confidential 78
  • 79. Secondary indexes on collections CREATE TABLE songs (" id uuid PRIMARY KEY," artist text," album text," title text," data blob," tags set<text>! );" " CREATE INDEX song_tags_idx ON songs(tags);! " SELECT * FROM songs WHERE tags CONTAINS 'blues';! " id | album | artist | tags | title" ----------+---------------+-------------------+-----------------------+------------------" 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind" " © 2014 DataStax, All Rights Reserved. Company Confidential 79
  • 80. Secondary indexes on map keys If you prefer indexing the map keys, you can do so by creating a KEYS index and by using CONTAINS KEY CREATE TABLE products (" id int PRIMARY KEY," description text," price int," categories set<text>," features map<text, text>" );" " CREATE INDEX feat_key_index ON products(KEYS(features));! " SELECT id, description" FROM products" WHERE features CONTAINS KEY 'refresh-rate';! " id | description" -------+-----------------------------" 34134 | 120-inch 1080p 3D plasma TV" © 2014 DataStax, All Rights Reserved. Company Confidential 80
  • 81. Lightweight Transactions (LWT) Why? • Solve a class of race conditions in Cassandra that you would otherwise need to install an external locking manager to solve. Syntax: "INSERT INTO customer_account (customerID, customer_email)" "VALUES (‘Johnny’, ‘jmiller@datastax.com’) "IF NOT EXISTS;! " "UPDATE customer_account " "SET customer_email=’jmiller@datastax.com’" !IF customer_email=’jmiller@datastax.com’;! ! Example Use Case: • Registering a user © 2014 DataStax, All Rights Reserved. Company Confidential 81
  • 82. Race Condition 82 SELECT name" FROM users" WHERE username = 'johnny';" (0 rows)" SELECT name" FROM users" WHERE username = 'johnny';" (0 rows)" INSERT INTO users " (username, name, email," password, created_date)" VALUES ('johnny'," 'Johnny Miller'," ['jmiller@datastax.com']," 'ba27e03fd9...'," '2011-06-20 13:50:00');" INSERT INTO users " (username, name, email," password, created_date)" VALUES ('johnny'," 'Johnny Miller'," ['jmiller@datastax.com']," 'ea24e13ad9...'," '2011-06-20 13:50:01');" This one wins!
  • 83. Lightweight Transactions (LWT) INSERT INTO users " (username, name, email," password, created_date)" VALUES ('johnny'," 'Johnny Miller'," ['jmiller@datastax.com']," 'ba27e03fd9...'," '2011-06-20 13:50:00')" IF NOT EXISTS;! INSERT INTO users " (username, name, email," password, created_date)" VALUES ('johnny'," 'Johnny Miller'," ['jmiller@datastax.com']," 'ea24e13ad9...'," '2011-06-20 13:50:01’)" IF NOT EXISTS;! " [applied]" -----------" True! [applied] | username | created_date | name " -----------+----------+----------------+----------------" False | johnny | 2011-06-20 ... | Johnny Miller! © 2014 DataStax, All Rights Reserved. 83
  • 84. Lightweight Transactions (LWT) Consequences of Lightweight Transactions 4 round trips vs. 1 for normal updates (uses Paxos algorithm) Operations are done on a per-partition basis Will be going across data centres to obtain consensus (unless you use LOCAL_SERIAL consistency) Cassandra user will need read and write access i.e. you get back the row! Great for 1% your app, but eventual consistency is still your friend! © 2014 DataStax, All Rights Reserved. Company Confidential 84
  • 85. Batch Statements BEGIN BATCH" INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user')" UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2'" INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c')" DELETE name FROM users WHERE userID = 'user2’" APPLY BATCH; BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single logical operation Atomic operation If any statement in the batch succeeds, all will No batch isolation Other “transactions” can read and write data being affected by a partially executed batch © 2014 DataStax, All Rights Reserved. Company Confidential 85
  • 86. Batch Statements with LWT BEGIN BATCH " "UPDATE foo SET z = 1 WHERE x = 'a' AND y = 1; " !UPDATE foo SET z = 2 WHERE x = 'a' AND y = 2 IF t = 4; ! APPLY BATCH;" Allows you to group multiple conditional updates in a batch as long as all those updates apply to the same partition © 2014 DataStax, All Rights Reserved. Company Confidential 86
  • 87. Static Columns A static column is a special column that is shared by all the rows of the same partition " CREATE TABLE foo ( " x text, " y bigint, " t bigint static, ! z bigint, " PRIMARY KEY (x, y) );" " INSERT INTO foo (x,y,t, z) VALUES ('a', 1, 1, 10);" INSERT INTO foo (x,y,t, z) VALUES ('a', 2, 2, 20);" " SELECT * from foo;" " x | y | t | z" ---+---+---+----" a | 1 | 2 | 10" a | 2 | 2 | 20" © 2014 DataStax, All Rights Reserved. Company Confidential 87
  • 88. Static Columns Considerations • Use them when you want to store some per-partition “static” information alongside clustered rows and still want to be able to query both of those with a single SELECT. • only columns not part of the PRIMARY key can be static. • only tables with at least one clustering column can have static columns • tables with the COMPACT STORAGE option cannot have static columns. © 2014 DataStax, All Rights Reserved. Company Confidential 88
  • 89. User Defined Types (UDT’s) CREATE TYPE address (! street text,! city text,! zip_code int,! phones set<text>! )! " CREATE TABLE users (" id uuid PRIMARY KEY," name text," addresses map<text, frozen <address>>" )" " SELECT id, name, addresses.city, addresses.phones FROM users;" " id | name | addresses.city | addresses.phones! --------------------+----------------+--------------------------" 63bf691f | johnny | London | {’0201234567', ’0796622222'}" © 2014 DataStax, All Rights Reserved. Company Confidential 89
  • 90. Lets store some JSON { "productId": 2, "name": "Kitchen Table", "price": 249.99, "description" : "Rectangular table with oak finish", "dimensions": { "units": "inches", "length": 50.0, "width": 66.0, "height": 32 }, "categories": { { "category" : "Home Furnishings" { "catalogPage": 45, "url": "/home/furnishings" }, { "category" : "Kitchen Furnishings" { "catalogPage": 108, "url": "/kitchen/furnishings" } } } © 2014 DataStax, All Rights Reserved. Company Confidential 90
  • 91. Lets store some JSON { "productId": 2, "name": "Kitchen Table", "price": 249.99, "description" : "Rectangular table with oak finish", "dimensions": { "units": "inches", "length": 50.0, "width": 66.0, "height": 32 }, "categories": { { "category" : "Home Furnishings" { "catalogPage": 45, "url": "/home/furnishings" }, { "category" : "Kitchen Furnishings" { "catalogPage": 108, "url": "/kitchen/furnishings" } } } CREATE TYPE dimensions (! units text,! length float,! width float,! height float! )! © 2014 DataStax, All Rights Reserved. Company Confidential 91
  • 92. Lets store some JSON { "productId": 2, "name": "Kitchen Table", "price": 249.99, "description" : "Rectangular table with oak finish", "dimensions": { "units": "inches", "length": 50.0, "width": 66.0, "height": 32 }, "categories": { { "category" : "Home Furnishings" { "catalogPage": 45, "url": "/home/furnishings" }, { "category" : "Kitchen Furnishings" { "catalogPage": 108, "url": "/kitchen/furnishings" } } } CREATE TYPE dimensions (! units text,! length float,! width float,! height float! )! ! ! CREATE TYPE category (! catalogPage int,! url text! );! © 2014 DataStax, All Rights Reserved. Company Confidential 92
  • 93. Lets store some JSON { "productId": 2, "name": "Kitchen Table", "price": 249.99, "description" : "Rectangular table with oak finish", "dimensions": { "units": "inches", "length": 50.0, "width": 66.0, "height": 32 }, "categories": { { "category" : "Home Furnishings" { "catalogPage": 45, "url": "/home/furnishings" }, { "category" : "Kitchen Furnishings" { "catalogPage": 108, "url": "/kitchen/furnishings" } } } CREATE TYPE dimensions (! units text,! length float,! width float,! height float! )! ! ! CREATE TYPE category (! catalogPage int,! url text! );! ! ! ! CREATE TABLE product (! What’s frozen mean? productId int,! name text,! price float,! description text,! dimensions frozen <dimensions>,! categories map <text, frozen <category>>,! PRIMARY KEY (productId)! );! © 2014 DataStax, All Rights Reserved. Company Confidential 93
  • 94. Frozen • This applies to User Defined Types, Tuples and nested collections. • Frozen types are serialized as a single value in Cassandra's storage engine, whereas non-frozen types are stored in a form that allows updates to individual subfields. • When using the frozen keyword, you cannot update parts of a user-defined type value. The entire value must be overwritten. • Cassandra treats the value of a frozen type like a blob. • Cassandra 3.0 will support non-frozen. As of 2.1, we only support frozen UDTs. • Helps us avoid tech debt! © 2014 DataStax, All Rights Reserved. Company Confidential 94
  • 95. Lets store some JSON INSERT INTO product (productId, name, price, description, dimensions, categories)" VALUES (2, 'Kitchen Table', 249.99, 'Rectangular table with oak finish'," {! units: 'inches',! length: 50.0,! width: 66.0,! height: 32! },! {" 'Home Furnishings': {" catalogPage: 45," url: '/home/furnishings'" }," 'Kitchen Furnishings': {" catalogPage: 108," url: '/kitchen/furnishings'" }" " }" );" " dimensions frozen <dimensions>! categories map <text, frozen <category>>! © 2014 DataStax, All Rights Reserved. Company Confidential 95
  • 96. Tuples CREATE TABLE tuple_table (" id int PRIMARY KEY," three_tuple frozen <tuple<int, text, float>>," four_tuple frozen <tuple<int, text, float, inet>>," five_tuple frozen <tuple<int, text, float, inet, ascii>>" );" "" • Track a drone’s position • x, y, z in a 3D Cartesian " © 2014 DataStax, All Rights Reserved. Company Confidential 96
  • 97. Query Tracing • You can turn on tracing on or off for queries with the TRACING ON | OFF command. • Helps you understand what Cassandra is doing and identify any performance problems. © 2014 DataStax, All Rights Reserved. Company Confidential 97
  • 98. Authentication and Authorisation CQL supports creating users and granting them access to tables etc.. You need to enable authentication in the cassandra.yaml config file. You can create, alter, drop and list users You can then GRANT permissions to users accordingly – ALTER, AUTHORIZE, DROP, MODIFY, SELECT. © 2014 DataStax, All Rights Reserved. Company Confidential 98
  • 99. Plus lots, lots more you can do with CQL …. • Have a look at our documentation: http://www.datastax.com/documentation • Technical Blog http://www.datastax.com/dev/blog © 2014 DataStax, All Rights Reserved. Company Confidential 99
  • 100. DataStax Java Driver © 2014 DataStax, All RighCtso Rmepsaenryv eCdo. n fidential 100
  • 101. How to get it? • https://github.com/datastax/java-driver <dependency> <groupId>com.datastax.cassandra</groupId> <artifactId>cassandra-­‐driver-­‐core</artifactId> <version>2.1.0</version> </dependency> • The Java client driver 2.1 (branch 2.1) is compatible with Apache Cassandra 1.2, 2.0 and 2.1. • If you try to use a feature that’s not in the version of Cassandra you are connecting to, you will get an UnsupportedFeatureException
  • 102. Not Thrift • Traditionally, Cassandra clients (Hector, Astynax1 etc..) were developed using Thrift • With Cassandra 1.2 (Jan 2013) and the introduction of CQL3 and the CQL native protocol a new easier way of using Cassandra was introduced. • Why? – Easier to develop and model – Best practices for building modern distributed applications – Integrated tools and experience – Enable Cassandra to evolve easier and support new features 1Astynax is being updated to include the native driver: https://github.com/Netflix/astyanax/wiki/Astyanax-over-Java-Driver © 2014 DataStax, All Rights Reserved. Company Confidential 102
  • 103. Thrift post-Cassandra 2.1 There is no more work on Thrift post 2.1.0 – http://bit.ly/freezethrift Will retain it for backwards compatibility, but no new features or changes to the Thrift API after 2.1.0 • “CQL3 is almost two years old now and has proved to be the better API that Cassandra needed. CQL drivers have caught up with and passed the Thrift ones in terms of features, performance, and usability. CQL is easier to learn and more productive than Thrift.” • - Jonathan Ellis, Apache Chair, Cassandra © 2014 DataStax, All Rights Reserved. Company Confidential 103
  • 104. Native Protocol Thrift: Native: © 2014 DataStax, All Rights Reserved. Company Confidential 104
  • 105. Notifications Without Notifications Polling Notifications are for technical events only: • Topology changes • Node status changes • Schema changes Pushing © 2014 DataStax, All Rights Reserved. Company Confidential 105
  • 106. Asynchronous Architecture Client Thread Client Thread Client Thread Driver 1 2 3 © 2014 DataStax, All Rights Reserved. Company Confidential 4 5 6
  • 107. Connect and Write Cluster cluster = Cluster.builder() .addContactPoints("10.158.02.40", "10.158.02.44") .build(); Session session = cluster.connect("akeyspace"); session.execute( "INSERT INTO user (username, password) ” + "VALUES(‘johnny’, ‘password1234’)" ); Note: Clusters and Sessions should be long-lived and re-used. © 2014 DataStax, All Rights Reserved. Company Confidential 107
  • 108. Read from a table ResultSet rs = session.execute("SELECT * FROM user"); List<Row> rows = rs.all(); for (Row row : rows) { String userName = row.getString("username"); String password = row.getString("password"); } © 2014 DataStax, All Rights Reserved. Company Confidential 108
  • 109. Paging This is great! Historically difficult to get huge result sets out of Cassandra. It has generally been necessary to explicitly enumerate your row keys in reasonably small batches (1000 rows or so per batch would be common). This feature now allows you to get huge result sets (including “select * from table), and have the server automatically page the results, while the client is just able to trivially iterate over the entire result set. Makes data exploration much easier. © 2014 DataStax, All Rights Reserved. Company Confidential 109
  • 110. Asynchronous Requests ResultSetFuture future = session.executeAsync("SELECT * FROM user"); for (Row row : future.get()) { String userName = row.getString("username"); String password = row.getString("password"); } Note: The future returned implements Guava's ListenableFuture interface. This means you can use all Guava's Futures1 methods! 1http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/ Futures.html © 2014 DataStax, All Rights Reserved. Company Confidential 110
  • 111. Read with Callbacks final ResultSetFuture future = session.executeAsync("SELECT * FROM user"); Futures.addCallback(future, new FutureCallback<ResultSet>() { @Override public void onFailure(Throwable t) { log.error("Problem encountered“, t); //Do something... } @Override public void onSuccess(ResultSet r) { log.debug("Success!!); //Do something... } } ); © 2014 DataStax, All Rights Reserved. Company Confidential 111
  • 112. Parallelize Calls int queryCount = 99; List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>(); for (int i=0; i<queryCount; i++) { futures.add( session.executeAsync("SELECT * FROM user " +"WHERE username = '"+i+"'")); } for(ResultSetFuture future : futures) { for (Row row : future.getUninterruptibly()) { //do something } } © 2014 DataStax, All Rights Reserved. Company Confidential 112
  • 113. Tip • If you need to do a lot of work, it’s often better to make many small queries concurrently than to make one big query. – executeAsync and Futures – makes this really easy! – Big queries can put a high load on one coordinator – Big queries can skew your 99th percentile latencies for other queries – If one small query fails you can easily retry, if a big query than you have to retry the whole thing © 2014 DataStax, All Rights Reserved. Company Confidential 113
  • 114. Prepared Statements PreparedStatement statement = session.prepare( "INSERT INTO user (username, password) " + "VALUES (?, ?)"); BoundStatement bs = statement.bind(); bs.setString("username", "johnny"); bs.setString("password", "password1234"); session.execute(bs); © 2014 DataStax, All Rights Reserved. Company Confidential 114
  • 115. Query Builder Query query = QueryBuilder .select() .all() .from("akeyspace", "user") .where(eq("username", "johnny")); query.setConsistencyLevel(ConsistencyLevel.ONE); ResultSet rs = session.execute(query); © 2014 DataStax, All Rights Reserved. Company Confidential 115
  • 116. Multi Data Center Load Balancing Local nodes are queried first, if none are available the request will be sent to a remote data center Cluster cluster = Cluster.builder() .addContactPoints("10.158.02.40", "10.158.02.44") .withLoadBalancingPolicy( new DCAwareRoundRobinPolicy("DC1")) .build(); Name of the local DC © 2014 DataStax, All Rights Reserved. Company Confidential 116
  • 117. Token Aware Load Balancing Nodes that own a replica of the data being read or written by the query will be contacted first Cluster cluster = Cluster.builder() .addContactPoints("10.158.02.40", "10.158.02.44") .withLoadBalancingPolicy( new TokenAwarePolicy( new DCAwareRoundRobinPolicy("DC1"))) .build(); © 2014 DataStax, All Rights Reserved. Company Confidential 117
  • 118. Retry Policies The behavior to adopt when a request returns a timeout or is unavailable. Cluster cluster = Cluster.builder() .addContactPoints("10.158.02.40", "10.158.02.44") .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE) .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("DC1"))) .build(); • DefaultRetryPolicy • DowngradingConsistencyRetryPolicy • FallthroughRetryPolicy • LoggingRetryPolicy © 2014 DataStax, All Rights Reserved. Company Confidential 118
  • 119. Reconnection Policies Policy that decides how often the reconnection to a dead node is attempted. Cluster cluster = Cluster.builder() .addContactPoints("10.158.02.40", "10.158.02.44”) .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE) .withReconnectionPolicy(new ConstantReconnectionPolicy(1000)) .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("DC1"))) .build(); • ConstantReconnectionPolicy • ExponentialReconnectionPolicy (Default) © 2014 DataStax, All Rights Reserved. Company Confidential 119
  • 120. Query Tracing Tracing is enabled on a per-query basis. Query query = QueryBuilder .select() .all() .from("akeyspace", "user") .where(eq("username", "johnny")) .enableTracing(); ResultSet rs = session.execute(query); ExecutionInfo executionInfo = rs.getExecutionInfo(); QueryTrace queryTrace = executionInfo.getQueryTrace(); © 2014 DataStax, All Rights Reserved. Company Confidential 120
  • 121. Query Tracing Connected to cluster: xerxes Simplex keyspace and schema created. Host (queried): /127.0.0.1 Host (tried): /127.0.0.1 Trace id: 96ac9400-­‐a3a5-­‐11e2-­‐96a9-­‐4db56cdc5fe7 activity | timestamp | source | source_elapsed -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐ Parsing statement | 12:17:16.736 | /127.0.0.1 | 28 Peparing statement | 12:17:16.736 | /127.0.0.1 | 199 Determining replicas for mutation | 12:17:16.736 | /127.0.0.1 | 348 Sending message to /127.0.0.3 | 12:17:16.736 | /127.0.0.1 | 788 Sending message to /127.0.0.2 | 12:17:16.736 | /127.0.0.1 | 805 Acquiring switchLock read lock | 12:17:16.736 | /127.0.0.1 | 828 Appending to commitlog | 12:17:16.736 | /127.0.0.1 | 848 Adding to songs memtable | 12:17:16.736 | /127.0.0.1 | 900 Message received from /127.0.0.1 | 12:17:16.737 | /127.0.0.2 | 34 Message received from /127.0.0.1 | 12:17:16.737 | /127.0.0.3 | 25 Acquiring switchLock read lock | 12:17:16.737 | /127.0.0.2 | 672 Acquiring switchLock read lock | 12:17:16.737 | /127.0.0.3 | 525 Appending to commitlog | 12:17:16.737 | /127.0.0.2 | 692 Appending to commitlog | 12:17:16.737 | /127.0.0.3 | 541 Adding to songs memtable | 12:17:16.737 | /127.0.0.2 | 741 Adding to songs memtable | 12:17:16.737 | /127.0.0.3 | 583 Enqueuing response to /127.0.0.1 | 12:17:16.737 | /127.0.0.3 | 751 Enqueuing response to /127.0.0.1 | 12:17:16.738 | /127.0.0.2 | 950 Message received from /127.0.0.3 | 12:17:16.738 | /127.0.0.1 | 178 Sending message to /127.0.0.1 | 12:17:16.738 | /127.0.0.2 | 1189 Message received from /127.0.0.2 | 12:17:16.738 | /127.0.0.1 | 249 Processing response from /127.0.0.3 | 12:17:16.738 | /127.0.0.1 | 345 Processing response from /127.0.0.2 | 12:17:16.738 | /127.0.0.1 | 377
  • 122. User Defined Types - Retrieval Row row = session.execute( "SELECT * FROM user_profiles").one(); UDTValue address = row.getUDTValue("address"); // You get a map-­‐like object with the usual // getters and setters (by name, by index): String street = address.getString("street"); int zip = address.getInt(2); USE ks; CREATE TYPE address ( street text, city text, zip int ); CREATE TABLE user_profiles ( email text PRIMARY KEY, address address ); © 2014 DataStax, All Rights Reserved. Company Confidential 122
  • 123. User Defined Types - Creation // Must first get hold of the datatype: // From an existing value: UserType addressType = address.getType(); // OR... // From the cluster metadata: UserType addressType = cluster.getMetadata() .getKeyspace("ks") .getUserType("address"); USE ks; CREATE TYPE address ( street text, city text, zip int ); CREATE TABLE user_profiles ( email text PRIMARY KEY, address address ); © 2014 DataStax, All Rights Reserved. Company Confidential 123
  • 124. User Defined Types - Creation UDTValue address = addressType.newValue() .setString("street", "...") .setString("city", "Washington") .setInt("zip", 20500); // Now use it in a query like any other type: session.execute( "INSERT INTO user_profiles (email, address) VALUES (?, ?)", "xyz@example.com", address); USE ks; CREATE TYPE address ( street text, city text, zip int ); CREATE TABLE user_profiles ( email text PRIMARY KEY, address address ); © 2014 DataStax, All Rights Reserved. Company Confidential 124
  • 125. Tuples - retrieval Row row = session.execute( "SELECT * FROM points_of_interest").one(); TupleValue coordinates = row.getTupleValue("coordinates"); float latitude = coordinates.getFloat(0); float longitude = coordinates.getFloat(1); USE ks; CREATE TABLE points_of_interest ( id int PRIMARY KEY, name text, coordinates tuple<float, float> ); © 2014 DataStax, All Rights Reserved. Company Confidential 125
  • 126. Tuples - creation TupleType coordinatesType = TupleType.of(DataType.cfloat(), DataType.cfloat()); TupleValue newCoordinates = coordinatesType.newValue(48.858222F, 2.2945F); USE ks; CREATE TABLE points_of_interest ( id int PRIMARY KEY, name text, coordinates tuple<float, float> ); © 2014 DataStax, All Rights Reserved. Company Confidential 126
  • 127. Simple Object Mapper Query results ↔ Java objects – Avoid boilerplate for common use cases (CRUD, etc.) – Keep it simple and close to the bare metal • Do not hide Cassandra from the developer • Avoid "clever tricks" à la Hibernate It comes as an additional Maven artifact: <dependency> <groupId>com.datastax.cassandra</groupId> <artifactId>cassandra-­‐driver-­‐mapping</artifactId> <version>2.1.0</version> </dependency> © 2014 DataStax, All Rights Reserved. Company Confidential 127
  • 128. Object Mapper Basic CRUD @UDT(keyspace = "ks", name = "address") public class Address { private String street; private String city; private int zip; // getters and setters omitted... } @Table(keyspace = "ks", name = "user_profiles") public class UserProfile { @PartitionKey private String email; private Address address; // getters and setters omitted... } USE ks; CREATE TYPE address ( street text, city text, zip int ); CREATE TABLE user_profiles ( email text PRIMARY KEY, address address ); © 2014 DataStax, All Rights Reserved. Company Confidential 128
  • 129. Object Mapper Basic CRUD MappingManager manager = new MappingManager(session); Mapper mapper = manager.mapper(UserProfile.class); UserProfile myProfile = mapper.get("xyz@example.com"); ListenableFuture saveFuture = mapper.saveAsync(anotherProfile); mapper.delete("xyz@example.com"); USE ks; CREATE TYPE address ( street text, city text, zip int ); CREATE TABLE user_profiles ( email text PRIMARY KEY, address address ); © 2014 DataStax, All Rights Reserved. Company Confidential 129
  • 130. Accessor custom queries @Accessor interface ProfileAccessor { @Query("SELECT * FROM user_profiles LIMIT :max") Result firstN(@Param("max") int limit); } ProfileAccessor accessor = manager.createAccessor(ProfileAccessor.class); Result profiles = accessor.firstN(10); // Result is like ResultSet, but specialized for a // mapped class: for (UserProfile profile : profiles) { System.out.println( profile.getAddress().getZip() ); } USE ks; CREATE TYPE address ( street text, city text, zip int ); CREATE TABLE user_profiles ( email text PRIMARY KEY, address address ); © 2014 DataStax, All Rights Reserved. Company Confidential 130
  • 131. Training Day | December 3rd Beginner Track • Introduction to Cassandra • Introduction to Spark, Shark, Scala and Cassandra Advanced Track • Data Modeling • Performance Tuning Conference Day | December 4th Cassandra Summit Europe 2014 will be the single largest gathering of Cassandra users in Europe. Learn how the world's most successful companies are transforming their businesses and growing faster than ever using Apache Cassandra. http://bit.ly/cassandrasummit2014 © 2014 DataStax, All Rights Reserved. Company Confidential 131