Event Sourcing with Cassandra
(Cassandra Japan Meetup, Tokyo, March 2016)
Luke Tillman
Technical Evangelist
@LukeTillman
Who are you?
• Evangelist with a focus on Developers
  – Long-time developer on RDBMS (lots of .NET)
• I still write a lot of code, but now I also do a lot of teaching and speaking
A Quick Recap of Event Sourcing
Persistence with Event Sourcing
• Instead of keeping the current state, keep a journal of all the deltas (events)
• Append only (no UPDATE or DELETE)
• We can replay our journal of events to get the current state (see the sketch below)

Shopping Cart (id = 1345):
  Cart Created   user_id=4762, created_on=7/10/2015…
  Item Added     item_id=7621, quantity=1, price=19.99
  Item Added     item_id=9134, quantity=2, price=16.99
  Item Removed   item_id=7621
  Qty Changed    item_id=9134, quantity=1
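To make the replay idea concrete, here is a minimal Scala sketch that folds the shopping cart events above over an empty cart. The event and cart types are illustrative, not from the talk:

// Illustrative event and state types for the shopping cart example.
sealed trait CartEvent
case class CartCreated(userId: Long) extends CartEvent
case class ItemAdded(itemId: Long, quantity: Int, price: BigDecimal) extends CartEvent
case class ItemRemoved(itemId: Long) extends CartEvent
case class QtyChanged(itemId: Long, quantity: Int) extends CartEvent

case class Cart(userId: Long = 0, items: Map[Long, Int] = Map.empty)

// Replaying the journal = folding the deltas, in order, over an empty cart.
def replay(events: Seq[CartEvent]): Cart =
  events.foldLeft(Cart()) {
    case (cart, CartCreated(user))     => cart.copy(userId = user)
    case (cart, ItemAdded(id, qty, _)) => cart.copy(items = cart.items + (id -> qty))
    case (cart, ItemRemoved(id))       => cart.copy(items = cart.items - id)
    case (cart, QtyChanged(id, qty))   => cart.copy(items = cart.items + (id -> qty))
  }

Replaying the five events above yields a cart for user 4762 containing only item 9134 with quantity 1.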
Event Sourcing in Practice
• Typically two kinds of storage:
  – Event Journal Store
  – Snapshot Store
• A history of how we got to the current state can be useful
• We've also got a lot more data to store than we did before
Why use Cassandra for Event Sourcing?
• Transactional (OLTP) Workload
• Sequentially written, immutable data
– Looks a lot like time series data
• Easy to scale out to capture more events
Event Sourcing Example: Akka Persistence
Akka Persistence Journal API Summary
• Write Method
  – For a given actor, write a group of messages
• Delete Method
  – For a given actor, permanently or logically delete all messages up to a given sequence number
• Read Methods
  – For a given actor, read back all the messages between two sequence numbers
  – For a given actor, read the highest sequence number that's been written
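In Scala, the journal operations above boil down to roughly the following shape. This is a simplified paraphrase, not the exact Akka SPI (the real trait is akka.persistence.journal.AsyncWriteJournal; consult the Akka docs for its precise signatures):

import scala.concurrent.Future

// Simplified paraphrase of the four journal operations described above.
trait JournalSketch {
  // Write a group of messages for one actor.
  def writeMessages(persistenceId: String, messages: Seq[Array[Byte]]): Future[Unit]

  // Permanently or logically delete all messages up to toSequenceNr.
  def deleteMessagesTo(persistenceId: String, toSequenceNr: Long, permanent: Boolean): Future[Unit]

  // Replay all messages between two sequence numbers, one callback per message.
  def replayMessages(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long)
                    (callback: Array[Byte] => Unit): Future[Unit]

  // Read the highest sequence number that's been written for this actor.
  def readHighestSequenceNr(persistenceId: String): Future[Long]
}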
An Event Journal in Cassandra
Data Modeling for Reads and Writes
A Simple First Attempt
• Use persistence_id as partition key
  – All messages for a given persistence id stay together
• Use sequence_number as clustering column
  – Orders messages by sequence number inside a partition

CREATE TABLE messages (
  persistence_id text,
  sequence_number bigint,
  message blob,
  PRIMARY KEY (persistence_id, sequence_number)
);

• Read all messages between two sequence numbers:

SELECT * FROM messages
WHERE persistence_id = ?
  AND sequence_number >= ?
  AND sequence_number <= ?;

• Read the highest sequence number:

SELECT sequence_number FROM messages
WHERE persistence_id = ?
ORDER BY sequence_number DESC LIMIT 1;
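As a sketch, here's how the two reads might look from Scala with the DataStax Java driver (3.x-era API). The contact point, keyspace, and persistence id are assumptions for illustration:

import com.datastax.driver.core.Cluster

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
val session = cluster.connect("akka_journal") // assumed keyspace

// Read all messages between two sequence numbers.
val slice = session.prepare(
  "SELECT * FROM messages WHERE persistence_id = ? " +
  "AND sequence_number >= ? AND sequence_number <= ?")
val rows = session.execute(slice.bind("cart-1345", Long.box(1L), Long.box(100L))).all()

// Read the highest sequence number (0 if nothing has been written yet).
val highest = session.prepare(
  "SELECT sequence_number FROM messages WHERE persistence_id = ? " +
  "ORDER BY sequence_number DESC LIMIT 1")
val maxSeqNr = Option(session.execute(highest.bind("cart-1345")).one())
  .map(_.getLong("sequence_number"))
  .getOrElse(0L)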
A Simple First Attempt
• Write a group of messages
• Use a Cassandra batch statement to ensure all messages (success) or no messages (failure) get written
• What's the problem with this data model (ignoring deletes for now)?

BEGIN BATCH
  INSERT INTO messages ... ;
  INSERT INTO messages ... ;
  INSERT INTO messages ... ;
APPLY BATCH;
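A sketch of that batched write from Scala with the DataStax Java driver (3.x-era API), reusing a session like the one from the earlier snippet; the helper name and payload type are illustrative:

import com.datastax.driver.core.{BatchStatement, PreparedStatement, Session}
import java.nio.ByteBuffer

// Write a group of messages as a single logged batch (all-or-nothing).
def writeGroup(session: Session, insert: PreparedStatement,
               persistenceId: String, group: Seq[(Long, ByteBuffer)]): Unit = {
  val batch = new BatchStatement(BatchStatement.Type.LOGGED)
  group.foreach { case (seqNr, payload) =>
    batch.add(insert.bind(persistenceId, Long.box(seqNr), payload))
  }
  session.execute(batch)
}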
Unbounded Partition Growth
• Cassandra has a hard limit of 2 billion cells in a partition
• But there's also a practical limit
  – Depends on row/cell data size, but likely not more than millions of rows

[Figure: every INSERT INTO messages for persistence_id '57ab...' lands in the same journal partition, which keeps growing with seq_nr = 1, 2, ... ∞?]
Fixing the Unbounded Partition Growth Problem
• General strategy: add a column to the partition key
  – Compound partition key
• Can be data that's already part of the model, or a "synthetic" column
• Allow users to configure a partition size in the plugin
  – Partition Size = number of rows per partition
  – This should not be changeable once messages have been written
• Partition number for a given sequence number is then easy to calculate (see the sketch after the schema):
  – (seqNr – 1) / partitionSize
  – (100 – 1) / 100 = partition 0
  – (101 – 1) / 100 = partition 1

CREATE TABLE messages (
  persistence_id text,
  partition_number bigint,
  sequence_number bigint,
  message blob,
  PRIMARY KEY (
    (persistence_id, partition_number),
    sequence_number)
);
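The partition-number calculation is just integer division (sequence numbers start at 1, partitions at 0), shown here in Scala:

// Zero-based partition number for a one-based sequence number.
def partitionNumber(seqNr: Long, partitionSize: Long): Long =
  (seqNr - 1) / partitionSize

assert(partitionNumber(100, 100) == 0) // last row of the first partition
assert(partitionNumber(101, 100) == 1) // first row of the second partition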
Fixing the Unbounded Partition Growth Problem
• Read all messages between two sequence numbers:

SELECT * FROM messages
WHERE persistence_id = ?
  AND partition_number = ?
  AND sequence_number >= ?
  AND sequence_number <= ?;
(repeat until we reach the target sequence number or run out of partitions; see the loop sketch below)

• Read the highest sequence number:

SELECT sequence_number FROM messages
WHERE persistence_id = ?
  AND partition_number = ?
ORDER BY sequence_number DESC LIMIT 1;
(repeat until we run out of partitions)
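A sketch of the replay loop over partitions, with fetchSlice standing in as a hypothetical helper that runs the first SELECT above for a single partition:

case class JournalRow(sequenceNr: Long, message: Array[Byte])

// Read all messages in [fromSeqNr, toSeqNr], one partition at a time, until
// we reach the target sequence number or hit an empty partition.
def readSlice(persistenceId: String, fromSeqNr: Long, toSeqNr: Long, partitionSize: Long)
             (fetchSlice: (String, Long, Long, Long) => Seq[JournalRow]): Seq[JournalRow] = {
  val result = Seq.newBuilder[JournalRow]
  var partitionNr = (fromSeqNr - 1) / partitionSize
  var rows = fetchSlice(persistenceId, partitionNr, fromSeqNr, toSeqNr)
  while (rows.nonEmpty && rows.last.sequenceNr < toSeqNr) {
    result ++= rows
    partitionNr += 1
    rows = fetchSlice(persistenceId, partitionNr, fromSeqNr, toSeqNr)
  }
  result ++= rows // empty, or the slice that reached toSeqNr
  result.result()
}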
Fixing the Unbounded Partition Growth Problem
• Write a group of messages
• A Cassandra batch statement might now write to multiple partitions (if the sequence numbers cross a partition boundary)
• Is that a problem?

BEGIN BATCH
  INSERT INTO messages ... ;
  INSERT INTO messages ... ;
  INSERT INTO messages ... ;
APPLY BATCH;
RTFM: Cassandra Batches Edition

"Batches are atomic by default. In the context of a Cassandra batch operation, atomic means that if any of the batch succeeds, all of it will."
- DataStax CQL Docs, http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html

"Although an atomic batch guarantees that if any part of the batch succeeds, all of it will, no other transactional enforcement is done at the batch level. For example, there is no batch isolation. Clients are able to read the first updated rows from the batch, while other rows are still being updated on the server."
- DataStax CQL Docs, http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html

Atomic? That's kind of a loaded word.
Multiple Partition Batch Failure Scenario

[Figure sequence: a client sends BEGIN BATCH ... APPLY BATCH to a Journal cluster with RF = 3 at CL = QUORUM; the coordinator first replicates the batch to the Batch Log on multiple nodes, then applies the individual writes; partway through applying them, the client receives a WriteTimeout with writeType = BATCH.]

• Once written to the Batch Log successfully, we know all the writes in the batch will succeed eventually (atomic?)
• At the point of the WriteTimeout, the batch has been partially applied
• Possible to read a partially applied batch since there is no batch isolation
RTFM: Cassandra Batches Edition Part 2

"For example, there is no batch isolation. Clients are able to read the first updated rows from the batch, while other rows are still being updated on the server. However, transactional row updates within a partition key are isolated: clients cannot read a partial update."
- DataStax CQL Docs, http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html

What we really need is isolation: when writing a group of messages, ensure that we write the whole group to a single partition.
Logic Changes to Ensure Batch Isolation
• Still use a configurable Partition Size
  – Not a "hard limit" but a "best attempt"
• On write, see if the messages will all fit in the current partition
• If not, roll over to the next partition early (see the sketch below)
• Reading is slightly more complicated
  – A row with a given sequence number might now be in partition n or (n + 1)

[Figure: with PartitionSize = 100, partition_nr = 1 ends early at seq_nr = 98 because the group 99–101 would cross the boundary, so seq_nr = 99, 100, 101 all land in partition_nr = 2.]
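A sketch of the write-side check under those rules; names are illustrative, and for simplicity partitions are assumed to still be numbered by the (seqNr – 1) / partitionSize formula before any rollover has happened:

// Pick the single partition a group of messages should be written to.
// If the group would cross a partition boundary, roll over early so the
// whole group stays in one partition (preserving batch isolation).
def targetPartition(firstSeqNr: Long, groupSize: Long, partitionSize: Long): Long = {
  val first     = (firstSeqNr - 1) / partitionSize
  val lastSeqNr = firstSeqNr + groupSize - 1
  val last      = (lastSeqNr - 1) / partitionSize
  if (last != first) first + 1 else first
}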
Accounting for Deletes
Option 1: Mark Individual Messages as Deleted
• Add an is_deleted column to our messages table

CREATE TABLE messages (
  persistence_id text,
  partition_number bigint,
  sequence_number bigint,
  message blob,
  is_deleted boolean,
  PRIMARY KEY (
    (persistence_id, partition_number),
    sequence_number)
);

• Read all messages between two sequence numbers:

SELECT * FROM messages
WHERE persistence_id = ?
  AND partition_number = ?
  AND sequence_number >= ?
  AND sequence_number <= ?;
(repeat until we reach the target sequence number or run out of partitions)

Example result:
 sequence_number | message | is_deleted
-----------------+---------+------------
               1 |    0x00 |       true
               2 |    0x00 |       true
               3 |    0x00 |      false
               4 |    0x00 |      false
Option 1: Mark Individual Messages as Deleted
• Pros:
  – On replay, easy to check if a message has been deleted (comes included in the message query's data)
• Cons:
  – Messages are not immutable any more
  – Issue lots of UPDATEs to mark each message as deleted
  – Have to scan through a lot of rows to find the max deleted sequence number if we want to avoid issuing unnecessary UPDATEs
Option 2: Write a Marker Row for Each Deleted Row
• Add a marker column and make it a clustering column
  – Messages get written with marker 'A'
  – Deletes get written with marker 'D'

CREATE TABLE messages (
  persistence_id text,
  partition_number bigint,
  sequence_number bigint,
  marker text,
  message blob,
  PRIMARY KEY (
    (persistence_id, partition_number),
    sequence_number, marker)
);

• Read all messages between two sequence numbers:

SELECT * FROM messages
WHERE persistence_id = ?
  AND partition_number = ?
  AND sequence_number >= ?
  AND sequence_number <= ?;
(repeat until we reach the target sequence number or run out of partitions)

Example result:
 sequence_number | marker | message
-----------------+--------+---------
               1 |      A |    0x00
               1 |      D |    null
               2 |      A |    0x00
               3 |      A |    0x00
Option 2: Write a Marker Row for Each Deleted Row
• Pros:
  – On replay, easy to peek at the next row to check if deleted (comes included in the message query's data)
  – Message data stays immutable
• Cons:
  – Issue lots of INSERTs to mark each message as deleted
  – Have to scan through a lot of rows to find the max deleted sequence number if we want to avoid issuing unnecessary INSERTs
  – Potentially twice as many rows to store
Looking at Physical Deletes
• Physically delete messages up to a given sequence number
• Still probably want to scan through rows first to see what's already been deleted
• Can't range delete, so we have to do lots of individual DELETEs (see the sketch below)

BEGIN BATCH
  DELETE FROM messages
  WHERE persistence_id = ?
    AND partition_number = ?
    AND marker = 'A'
    AND sequence_number = ?;
  ...
APPLY BATCH;
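A sketch of issuing those row-by-row DELETEs from Scala with the DataStax Java driver (3.x-era API); deleteStmt is assumed to be a prepared version of the DELETE above:

import com.datastax.driver.core.{BatchStatement, PreparedStatement, Session}

// No range delete available here, so delete each sequence number
// individually, batched within one partition.
def deleteTo(session: Session, deleteStmt: PreparedStatement,
             persistenceId: String, partitionNr: Long,
             fromSeqNr: Long, toSeqNr: Long): Unit = {
  val batch = new BatchStatement(BatchStatement.Type.LOGGED)
  (fromSeqNr to toSeqNr).foreach { seqNr =>
    batch.add(deleteStmt.bind(persistenceId, Long.box(partitionNr), Long.box(seqNr)))
  }
  session.execute(batch)
}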
Looking at Physical Deletes
• Read all messages between two sequence numbers
• With how DELETEs work in Cassandra, is there a potential problem with this query?

SELECT * FROM messages
WHERE persistence_id = ?
  AND partition_number = ?
  AND sequence_number >= ?
  AND sequence_number <= ?;
(repeat until we reach the target sequence number or run out of partitions)
Tombstone Hell: Queue-like Data Sets

[Figure sequence: the journal partition for persistence_id '57ab...', partition_nr 1 starts out holding message rows for seq_nr = 1, 2, ...; after a "delete messages to a sequence number" batch, each deleted row is shadowed by a tombstone (NO DATA HERE).]

Delete messages to a sequence number:

BEGIN BATCH
  DELETE FROM messages
  WHERE persistence_id = '57ab...'
    AND partition_number = 1
    AND marker = 'A'
    AND sequence_number = 1;
  ...
APPLY BATCH;

• At some point compaction runs and we don't have two versions of each row any more, but tombstones don't go away immediately
  – Tombstones remain for gc_grace_seconds
  – Default is 10 days

Read all messages between two sequence numbers:

SELECT * FROM messages
WHERE persistence_id = '57ab...'
  AND partition_number = 1
  AND sequence_number >= 1
  AND sequence_number <= [max value];

• Because the range starts at seq_nr = 1, the read has to scan over every tombstone (seq_nr = 1, 2, 3, 4, ...) before it finds any live data
Avoid Tombstone Hell

We need a way to avoid reading tombstones when replaying messages.

SELECT * FROM messages
WHERE persistence_id = ?
  AND partition_number = ?
  AND sequence_number >= ?
  AND sequence_number <= ?;

If we know what sequence number we've already deleted to before we query, we can make that lower bound (the sequence_number >= ? term) smarter.
A Third Option for Deletes
• Keep marker as a clustering column, but move it ahead of sequence_number in the clustering key
  – Messages still written with 'A', deletes with 'D'

CREATE TABLE messages (
  persistence_id text,
  partition_number bigint,
  marker text,
  sequence_number bigint,
  message blob,
  PRIMARY KEY (
    (persistence_id, partition_number),
    marker, sequence_number)
);

• Read all messages between two sequence numbers:

SELECT * FROM messages
WHERE persistence_id = ?
  AND partition_number = ?
  AND marker = 'A'
  AND sequence_number >= ?
  AND sequence_number <= ?;
(repeat until we reach the target sequence number or run out of partitions)

Example result:
 sequence_number | marker | message
-----------------+--------+---------
               1 |      A |    0x00
               2 |      A |    0x00
               3 |      A |    0x00
A Third Option for Deletes
• Message data no longer carries deleted information, so how do we know what's already been deleted?
• Get the max deleted sequence number from the 'D' rows
• Can avoid tombstones if this is done before getting message data (see the sketch below)

SELECT sequence_number FROM messages
WHERE persistence_id = ?
  AND partition_number = ?
  AND marker = 'D'
ORDER BY marker DESC, sequence_number DESC
LIMIT 1;
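Putting the two queries together, a sketch of the tombstone-avoiding replay for one partition; highestDeleted and fetchMessages are hypothetical helpers wrapping the 'D'-marker SELECT above and the 'A'-marker range SELECT from the previous slide:

// Replay [fromSeqNr, toSeqNr] for one partition without reading tombstones:
// first find how far we've deleted, then start the range above that point.
def replayPartition(persistenceId: String, partitionNr: Long,
                    fromSeqNr: Long, toSeqNr: Long,
                    highestDeleted: (String, Long) => Long,
                    fetchMessages: (String, Long, Long, Long) => Seq[Array[Byte]]): Seq[Array[Byte]] = {
  val deletedTo  = highestDeleted(persistenceId, partitionNr) // 0 if nothing deleted yet
  val lowerBound = math.max(fromSeqNr, deletedTo + 1)         // the "smarter lower bound"
  fetchMessages(persistenceId, partitionNr, lowerBound, toSeqNr)
}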
A Third Option for Deletes
• Pros:
  – Message data stays immutable
  – Issue a single INSERT when deleting to a sequence number
  – Read a single row to find out what's been deleted (no more scanning)
  – Can avoid reading tombstones created by physical deletes
• Cons:
  – Requires a separate query to find out what's been deleted before getting message data
Lessons Learned
Summary
• Seemingly simple data models can get a lot more complicated
• Avoid unbounded partition growth
  – Add data to your partition key
• Be aware of how Cassandra Logged Batches work
  – If you need isolation, only write to a single partition
• Avoid queue-like data sets and be aware of how tombstones might impact your queries
  – Try to query with ranges that avoid tombstones
Questions?
@LukeTillman
https://www.linkedin.com/in/luketillman/
https://github.com/LukeTillman/