C* Path:
Denormalize your data
Eric Zoerner | Software Developer, eBuddy BV

#CASSANDRAEU

Cassandra Summit Europe 2013
London
CASSANDRASUMMITEU
About eBuddy
XMS

Cassandra in eBuddy Messaging Platform
• User Data Service
• User Discovery Service
• Persistent Session Store
• Message History
• Location-based Discovery

Some Statistics
• Current size of data
– 1.4 TB total (replication of 3x); 467 GB actual data
• 12 million sessions (11 million users plus groups)
• Almost a billion rows in one column family (inverse social graph)

C* Path
The Problem (a “classic”)
[Diagram: a Complex Object on one side, a Key-Value Store (RDB table, NoSQL, etc.) on the other, with question marks in between.]

Complex Object:
Person (name: String, birthdate: Date, nickname: String)
  1 to * Address (street: String, city: String, province: String, postalCode: String, countryCode: String)
  1 to * Phone (name: String, number: String)

Some Strategies

Serialization!

Normalization!

Person
id   name  birthdate   nickname
110  John  1985-04-06  Jack
111  Mary  1979-11-30  Mary

Address
person_id  address_id  street        city
110        001         123 Main St   New York
110        002         456 Singel    Amsterdam
111        003         78 Hoofd Str  London

Phone
person_id  name    phone
110        mobile  +15551234
111        home    +44884800
111        mobile  +44030393

Decomposition!

path                 value
name/                John
addresses/@0/street  123 Main St.
phones/@0/number     +31123456789
...                  ...
Strategies Comparison

                    Serialization  Normalization  Decomposition
Single Write        ✔              ✘              ✔
Single Read         ✔              ✘              ✔
Consistent Updates  ✔              ✔              not enforced
Structural Access   ✘              ✔              ✔
Cycles              ✔              ✔              ✘
C* Path
Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra
https://github.com/ebuddy/c-star-path

Artifacts available at Maven Central.

C* Path: Decomposition
• Easy to Use
• Simple API
• Good for Cassandra because:
– Structural Access: Write parts of objects without reading first
– Good for denormalizing data: can read or write large complex objects with one read or write operation

How does it work?
API Example - Write to a Path
StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;

Path path = dao.createPath("some", "path", "to", "my", "pojo");

dao.writeToPath(rowKey, path, pojo);

API Example - Read from a Path
Path path = dao.createPath("some", "path", "to", "my", "pojo");

Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });

API Example - Delete
dao.deletePath(rowKey, path);

API Example - Batch Operations
BatchContext batch = dao.beginBatch();

dao.writeToPath(rowKey1, path, pojo1, batch);
dao.writeToPath(rowKey2, path, pojo2, batch);
dao.deletePath(rowKey3, path, batch);

dao.applyBatch(batch);

Read or write at any level of a path
Person person = …;

Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);

Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);

Write Implementation: Decomposition
• Step 1:
– Convert domain object into basic structure of Maps, Lists, and simple values. Uses the Jackson (FasterXML) library for this and honors the Jackson annotations
• Step 2:
– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean), done by Decomposer
• Step 3:
– Write this map as key-value pairs in the database

Example Decomposition - step 1
Simplify structure into regular Maps, Lists, and simple values

Domain model:
Person (name: String, birthdate: Date, nickname: String)
  1 to * Address (street: String, city: String, province: String, postalCode: String, countryCode: String)
  1 to * Phone (name: String, number: String)

Simplified structure:
Map
  name = "John"
  birthdate = "-39080932298"
  nickname = "Jack"
  addresses = <List>
    [0] = <Map>
      street = "123 Main"
      place = "New York"
    [1] = <Map>
      street = "Singel 45"
      place = "Amsterdam"
  phones = <List>
    [0] = <Map>
      number = "+31651234567"
      name = "mobile"

Example Decomposition - step 2

path                 value
name/                "John"
birthdate/           "-39080932298"
nickname/            "Jack"
addresses/@0/street  "123 Main St."
addresses/@0/place   "New York"
addresses/@1/street  "Singel 45"
addresses/@1/place   "Amsterdam"
phones/@0/name       "mobile"
phones/@0/number     "+31651234567"
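The decomposition from nested Maps and Lists to path-value pairs can be sketched in plain Java. This is a minimal illustration, not the library's Decomposer: the class and method names (`DecomposeSketch`, `decompose`) are hypothetical, every path gets a trailing slash, and the URL encoding of path elements is left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the C* Path Decomposer.
final class DecomposeSketch {

    // Flattens nested Maps and Lists into a sorted map of path -> simple value.
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new TreeMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Map keys become path elements.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // List positions become "@index" path elements.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean): emit one path-value pair.
            out.put(prefix, node);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "Singel 45");
        address.put("place", "Amsterdam");

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("addresses", Arrays.asList(address));

        decompose(person).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

A sorted map is used here only to mimic how the paths would be ordered as column names or clustering keys.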
Read implementation: Composition
• Step 1:
– Read path-value pairs from database
• Step 2:
– “Merge” path-value maps back into basic structure (Maps, Lists, simple values), done by Composer
• Step 3:
– Use Jackson to convert basic structure back into domain object using a TypeReference
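Step 2 of the composition can be sketched in plain Java as follows. This is an assumption-laden illustration, not the library's Composer: `ComposeSketch` and its methods are hypothetical names, paths are assumed to be consistent (no value stored at a prefix of another path), URL decoding is omitted, and step 3 (the Jackson conversion to a domain object) is left out so the example needs no dependencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the C* Path Composer.
final class ComposeSketch {

    // Merges path -> value pairs back into nested Maps and Lists.
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : pathValues.entrySet()) {
            String[] elements = e.getKey().split("/"); // a trailing slash yields no empty element
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                node = child(node, elements[i]);
            }
            node.put(elements[elements.length - 1], e.getValue());
        }
        return listify(root);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> child(Map<String, Object> parent, String name) {
        // Throws ClassCastException if a simple value already sits at this element.
        return (Map<String, Object>) parent.computeIfAbsent(name, k -> new LinkedHashMap<String, Object>());
    }

    // Turns every map whose keys are all "@index" elements into a List ordered by index.
    @SuppressWarnings("unchecked")
    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        for (Map.Entry<String, Object> e : map.entrySet()) {
            e.setValue(listify(e.getValue()));
        }
        boolean isList = !map.isEmpty() && map.keySet().stream().allMatch(k -> k.startsWith("@"));
        if (!isList) {
            return map;
        }
        TreeMap<Integer, Object> byIndex = new TreeMap<>();
        map.forEach((k, v) -> byIndex.put(Integer.parseInt(k.substring(1)), v));
        return new ArrayList<>(byIndex.values());
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new TreeMap<>();
        flat.put("name/", "John");
        flat.put("addresses/@0/street/", "123 Main St.");
        flat.put("addresses/@1/street/", "Singel 45");
        System.out.println(compose(flat));
    }
}
```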

Design & Challenges
Path Encoding
• Paths stored as strings
• Forward slashes in paths (but hidden by Path API)
• Path elements are internally URL encoded allowing use of special characters in the implementation
• Special characters: @ for list indices (@0, @1, @2, ...)
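One way such an encoding could work is sketched below. The names (`PathEncodingSketch`, `encodePath`) are hypothetical and the exact scheme used by the library may differ; the point is only that URL-encoding user-supplied elements keeps `/` and `@` free for use as the separator and the list-index marker.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch of a path encoding; the library's actual scheme may differ.
final class PathEncodingSketch {

    // Integer elements are treated as list indices and written as a raw "@index";
    // all other elements are URL encoded, so a literal '/' or '@' inside an
    // element name cannot be confused with the separator or the index marker.
    static String encodePath(Object... elements) {
        StringBuilder sb = new StringBuilder();
        for (Object element : elements) {
            if (element instanceof Integer) {
                sb.append('@').append(element);
            } else {
                sb.append(encode(String.valueOf(element)));
            }
            sb.append('/');
        }
        return sb.toString();
    }

    private static String encode(String element) {
        try {
            return URLEncoder.encode(element, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodePath("addresses", 0, "street"));
        System.out.println(encodePath("x", "a/b", "@home"));
    }
}
```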
Challenge: “Shrinking Lists”
Path path = dao.createPath("x");

➀ Write a list.
dao.writeToPath(key, path, Arrays.asList("1", "2"));

x/@0/  "1"
x/@1/  "2"

➁ Write a shorter list.
dao.writeToPath(key, path, Arrays.asList("3"));

x/@0/  "3"
x/@1/  "2"

➂ Read the list.
dao.readFromPath(key, path, new TypeReference<List<String>>() {});
{"3", "2"}  ✘
Challenge: “Shrinking Lists”
Solution: Implementation writes a list terminator value.  ✔

Path path = dao.createPath("x");
dao.writeToPath(key, path, Arrays.asList("1", "2"));

x/@0/  "1"
x/@1/  "2"
x/@2/  0xFFFFFFFF

dao.writeToPath(key, path, Arrays.asList("3"));

x/@0/  "3"
x/@1/  0xFFFFFFFF
x/@2/  0xFFFFFFFF

dao.readFromPath(key, path, new TypeReference<List<String>>() {});
{"3"}  ✔
Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.

This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.

Conclusion: The user must know what they are doing and understand the implementation.
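The terminator idea and its limitation can be simulated with an in-memory sorted map standing in for one row. Everything here is a hypothetical sketch (`ShrinkingListSketch`, `writeList`, `readList`, `deletePath` and the sentinel object are illustrative, not library code); it only shows why a full list read is safe while a positional read can still see a stale element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory simulation of one row; not the library implementation.
final class ShrinkingListSketch {

    // Stand-in for the list terminator value (0xFFFFFFFF on the slide).
    static final Object TERMINATOR = new Object();

    // Writes the elements and then a terminator, without deleting anything first.
    static void writeList(SortedMap<String, Object> row, String prefix, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(prefix + "/@" + i + "/", values.get(i));
        }
        row.put(prefix + "/@" + values.size() + "/", TERMINATOR);
    }

    // Reads elements in index order and stops at the first terminator or gap.
    static List<String> readList(SortedMap<String, Object> row, String prefix) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(prefix + "/@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result;
            }
            result.add((String) value);
        }
    }

    // Removes everything stored under the prefix.
    static void deletePath(SortedMap<String, Object> row, String prefix) {
        row.keySet().removeIf(path -> path.startsWith(prefix + "/"));
    }

    public static void main(String[] args) {
        SortedMap<String, Object> row = new TreeMap<>();
        writeList(row, "x", Arrays.asList("1", "2", "3"));
        writeList(row, "x", Arrays.asList("9"));

        System.out.println("list read:       " + readList(row, "x"));
        System.out.println("positional read: " + row.get("x/@2/")); // stale element survives

        deletePath(row, "x"); // delete before write removes the stale element
        writeList(row, "x", Arrays.asList("9"));
        System.out.println("after delete:    " + row.containsKey("x/@2/"));
    }
}
```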

Challenge: Inconsistent Updates
Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.

Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);

x/address/street/  "Singel 45"
x/name/            "John"

path = dao.createPath("x", "name");
dao.writeToPath(key, path, person1);  ✘

x/address/street/       "Singel 45"
x/name/                 "John"
x/name/address/street/  "Singel 45"
x/name/name/            "John"

Solution: Don’t do that!  ✔

If it does happen: the implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert to a now incompatible POJO will fail.

Conclusion: The user must know what they are doing and understand the implementation.
Issue: Sorting
Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?

Instead of storing paths as strings, the implementation could have used DynamicComposite.

We tried it. It can work. CQL supports it as a user-defined type. Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.

It is still in consideration to use DynamicComposite for paths in a future version.
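Why string ordering matters can be seen with list indices: compared as strings, "@10" sorts before "@2". The sketch below is only an illustration of that concern using a Java `TreeSet` (class and method names are hypothetical); it makes no claim about how the library orders elements when it reads a list back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of string ordering versus numeric ordering of path elements.
final class PathSortingSketch {

    // Builds the paths x/@0/ .. x/@(count-1)/ and returns them in string order.
    static List<String> sortedAsStrings(int count) {
        TreeSet<String> paths = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            paths.add("x/@" + i + "/");
        }
        return new ArrayList<>(paths);
    }

    public static void main(String[] args) {
        // With 12 elements, x/@10/ and x/@11/ land between x/@1/ and x/@2/.
        for (String path : sortedAsStrings(12)) {
            System.out.println(path);
        }
    }
}
```

A typed comparator such as a composite of typed elements would order the numeric parts numerically, which is the motivation given above for considering DynamicComposite.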
Cassandra Data Model
Thrift

Column family:
row key  column name        column value
<UUID>   x/address/street/  “Singel 45”
         x/name             “John”
…        …                  …

- OR -

Super column family (coming soon):
row key  super column name  column name      column value
<UUID>   x                  address/street/  “Singel 45”
                            name             “John”
…        …                  …                …
Thrift
Thrift implementation relies on the Hector client.

ColumnFamilyOperations<K,String,Object> operations =
    new ColumnFamilyTemplate<K,String,Object>(
        keyspace, KeySerializer, StringSerializer, StructureSerializer);

StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);

CQL
CREATE TABLE person (
    key text,
    path text,
    value text,
    PRIMARY KEY (key, path)
);

• Cannot use the path itself as a column name because it is “dynamic”
• Dynamic column family
CQL: Data Model Constraints
CREATE TABLE person (
    key text,
    path text,
    value text,
    PRIMARY KEY (key, path)
);

• Need to do a range (“slice”) query on the path, so path must be a clustering key.
• Also, the path must be the first clustering key, since otherwise we would have to provide an equals condition on the preceding clustering keys in a query.
• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work since Cassandra indexes only work with equals conditions:

Bad Request: No indexed columns present in by-columns clause with Equal operator
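A slice on the path clustering key can be expressed as a half-open range over the path prefix. The sketch below builds such a query string and its bounds in plain Java; the names (`PathSliceSketch`, `upperBound`, `sliceQuery`) are hypothetical, the prefix is assumed to be non-empty ASCII, and this is not necessarily how the library issues its reads.

```java
// Hypothetical sketch of a prefix range query on the path clustering key.
final class PathSliceSketch {

    // Smallest string that sorts after every string starting with the prefix.
    // Assumes a non-empty prefix whose last character is not Character.MAX_VALUE.
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // CQL with bind markers: partition key, inclusive lower bound, exclusive upper bound.
    static String sliceQuery(String table) {
        return "SELECT path, value FROM " + table
                + " WHERE key = ? AND path >= ? AND path < ?";
    }

    public static void main(String[] args) {
        String prefix = "x/address/";
        System.out.println(sliceQuery("person"));
        System.out.println("lower bound: " + prefix);
        System.out.println("upper bound: " + upperBound(prefix));
    }
}
```

Because `path` is the first clustering key, both range conditions are allowed once the partition key is fixed with an equals condition.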

CQL
CQL implementation relies on the DataStax Java driver.

StructuredDataSupport<K> dao =
    new CqlStructuredDataSupport<K>(String tableName,
                                    String partitionKeyColumnName,
                                    String pathColumnName,
                                    String valueColumnName,
                                    Session session);

And the rest…
Planned Features
• Sets with simple values: element values stored in path
• DynamicComposites?
• Multiple row reads and writes
• Slice queries on path ranges
Credits and Acknowledgements
• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
• Jackson JSON Processor, which is core to the C* Path implementation
  http://wiki.fasterxml.com/JacksonHome
• Image credits:
  Slide: Some Strategies; image name: binary; author: noegranado; link: http://www.flickr.com/photos/43360884@N04/6949896929/


Mais conteúdo relacionado

Mais procurados

Perform Like a frAg Star
Perform Like a frAg StarPerform Like a frAg Star
Perform Like a frAg Starrenaebair
 
Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? DataWorks Summit
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDBDavid Coallier
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBBradley Holt
 
WordPress Café: Using WordPress as a Framework
WordPress Café: Using WordPress as a FrameworkWordPress Café: Using WordPress as a Framework
WordPress Café: Using WordPress as a FrameworkExove
 
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"South Tyrol Free Software Conference
 

Mais procurados (7)

Callimachus
CallimachusCallimachus
Callimachus
 
Perform Like a frAg Star
Perform Like a frAg StarPerform Like a frAg Star
Perform Like a frAg Star
 
Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch?
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDB
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDB
 
WordPress Café: Using WordPress as a Framework
WordPress Café: Using WordPress as a FrameworkWordPress Café: Using WordPress as a Framework
WordPress Café: Using WordPress as a Framework
 
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"
 

Semelhante a C* Summit EU 2013: Denormalizing Your Data: A Java Library to Support Structured Data in Cassandra

Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGDuyhai Doan
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandranickmbailey
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceDuyhai Doan
 
Overiew of Cassandra and Doradus
Overiew of Cassandra and DoradusOveriew of Cassandra and Doradus
Overiew of Cassandra and Doradusrandyguck
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Wes McKinney
 
Spring Data Cassandra
Spring Data CassandraSpring Data Cassandra
Spring Data Cassandraniallmilton
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksDatabricks
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBJanos Geronimo
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
 
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...DataStax
 
Suicide Risk Prediction Using Social Media and Cassandra
Suicide Risk Prediction Using Social Media and CassandraSuicide Risk Prediction Using Social Media and Cassandra
Suicide Risk Prediction Using Social Media and CassandraKen Krugler
 
Spark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross LawleySpark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross LawleySpark Summit
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceMongoDB
 
Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Vincent Royer
 

Semelhante a C* Summit EU 2013: Denormalizing Your Data: A Java Library to Support Structured Data in Cassandra (20)

Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ ING
 
C* path
C* pathC* path
C* path
 
Presentation
PresentationPresentation
Presentation
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practice
 
Overiew of Cassandra and Doradus
Overiew of Cassandra and DoradusOveriew of Cassandra and Doradus
Overiew of Cassandra and Doradus
 
Cassandra Overview
Cassandra OverviewCassandra Overview
Cassandra Overview
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
 
Spring Data Cassandra
Spring Data CassandraSpring Data Cassandra
Spring Data Cassandra
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Introduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDBIntroduction to NoSQL CassandraDB
Introduction to NoSQL CassandraDB
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
 
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
 
Suicide Risk Prediction Using Social Media and Cassandra
Suicide Risk Prediction Using Social Media and CassandraSuicide Risk Prediction Using Social Media and Cassandra
Suicide Risk Prediction Using Social Media and Cassandra
 
Spark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross LawleySpark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross Lawley
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
 
Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019Elassandra schema management - Apache Con 2019
Elassandra schema management - Apache Con 2019
 

Mais de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Mais de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 



C* Summit EU 2013: Denormalizing Your Data: A Java Library to Support Structured Data in Cassandra

  • 1. C* Path: Denormalize your data Eric Zoerner | Software Developer, eBuddy BV #CASSANDRAEU Cassandra Summit Europe 2013 London CASSANDRASUMMITEU
  • 4-8. Cassandra in the eBuddy Messaging Platform
      • User Data Service
      • User Discovery Service
      • Persistent Session Store
      • Message History
      • Location-based Discovery
  • 9. Some Statistics
      • Current size of data: 1.4 TB total (3x replication); 467 GB actual data
      • 12 million sessions (11 million users plus groups)
      • Almost a billion rows in one column family (inverse social graph)
  • 11. The Problem (a “classic”): how to store a complex object in a key-value store (RDB table, NoSQL, etc.)
      Complex object (class diagram):
      • Person: name: String, birthdate: Date, nickname: String
      • Address (many per Person): street: String, city: String, province: String, postalCode: String, countryCode: String
      • Phone (many per Person): name: String, number: String
  • 13-14. Some Strategies
      Serialization! (illustrated on the slide by an image of binary data)
      Normalization! One table per class, linked by ids:
        Person
          id  | name | birthdate  | nickname
          110 | John | 1985-04-06 | Jack
          111 | Mary | 1979-11-30 | Mary
        Address
          person_id | address_id | street       | city
          110       | 001        | 123 Main St  | New York
          110       | 002        | 456 Singel   | Amsterdam
          111       | 003        | 78 Hoofd Str | London
        Phone
          person_id | name   | phone
          110       | mobile | +15551234
          111       | home   | +44884800
          111       | mobile | +44030393
      Decomposition! One path-value pair per simple value:
          path                | value
          name/               | John
          addresses/@0/street | 123 Main St.
          phones/@0/number    | +31123456789
          ...                 | ...
  • 15. Strategies Comparison
                         | Serialization | Normalization | Decomposition
      Single Write       | ✔             | ✘             | ✔
      Single Read        | ✔             | ✘             | ✔
      Consistent Updates | ✔             | ✔             | not enforced
      Structural Access  | ✘             | ✔             | ✔
      Cycles             | ✔             | ✔             | ✘
  • 16. C* Path: an open source Java library for decomposing complex objects into path-value pairs and storing them in Cassandra
      https://github.com/ebuddy/c-star-path
      Artifacts are available at Maven Central.
  • 17-19. C* Path: Decomposition
      • Easy to use
      • Simple API
      • Good for Cassandra because:
          – Structural access: write parts of objects without reading them first
          – Good for denormalizing data: large complex objects can be read or written with one read or write operation
  • 20. How does it work?
  • 21-23. API Example - Write to a Path
      StructuredDataSupport<UUID> dao = … ;
      UUID rowKey = … ;
      Pojo pojo = … ;

      Path path = dao.createPath("some", "path", "to", "my", "pojo");

      dao.writeToPath(rowKey, path, pojo);
  • 24-25. API Example - Read from a Path
      Path path = dao.createPath("some", "path", "to", "my", "pojo");

      Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });
  • 26. API Example - Delete
      dao.deletePath(rowKey, path);
  • 27. API Example - Batch Operations
      BatchContext batch = dao.beginBatch();

      dao.writeToPath(rowKey1, path, pojo1, batch);
      dao.writeToPath(rowKey2, path, pojo2, batch);
      dao.deletePath(rowKey3, path, batch);  // a delete needs only the row key and path

      dao.applyBatch(batch);
  • 28-29. Read or write at any level of a path
      Person person = …;

      Path path = dao.createPath("x");
      dao.writeToPath(rowKey, path, person);

      Path pathToName = path.withElements("name");
      String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);
  • 30-32. Write Implementation: Decomposition
      • Step 1: Convert the domain object into a basic structure of Maps, Lists, and simple values. This uses the Jackson (FasterXML) library and honors the Jackson annotations.
      • Step 2: Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean). This is done by the Decomposer.
      • Step 3: Write this map as key-value pairs in the database.
  • 33-34. Example Decomposition - step 1
      Simplify the structure (Person with its Address and Phone lists) into regular Maps, Lists, and simple values:
      Map
        name = "John"
        birthdate = "-39080932298"
        nickname = "Jack"
        addresses = <List>
          [0] = <Map>
            street = "123 Main"
            place = "New York"
          [1] = <Map>
            street = "Singel 45"
            place = "Amsterdam"
        phones = <List>
          [0] = <Map>
            number = "+31651234567"
            name = "mobile"
  • 35. Example Decomposition - step 2
          path                | value
          name/               | “John”
          birthdate/          | “-39080932298”
          nickname/           | “Jack”
          addresses/@0/street | “123 Main St.”
          addresses/@0/place  | “New York”
          addresses/@1/street | “Singel 45”
          addresses/@1/place  | “Amsterdam”
          phones/@0/name      | “mobile”
          phones/@0/number    | “+31651234567”
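The flattening in step 2 can be sketched in a few lines of plain Java. This is an illustration only, not the library's Decomposer: the class name DecomposeSketch and its methods are hypothetical, every path is written with a trailing slash (an assumption, since the slides are not consistent on this), and URL encoding and list terminators are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the C* Path Decomposer class.
final class DecomposeSketch {

    // Flattens nested Maps and Lists into "path/" -> simple value entries.
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Each map key becomes one path element.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Each list position becomes an "@index" path element.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Leaf: a simple value such as String, Number, or Boolean.
            out.put(prefix, node);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "Singel 45");
        address.put("place", "Amsterdam");

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("addresses", Arrays.asList(address));

        decompose(person).forEach((path, value) -> System.out.println(path + " -> " + value));
    }
}
```

Because each simple value ends up under its own path, a single field can later be overwritten without reading the rest of the object, which is the "structural access" property claimed for decomposition.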
  • 36-38. Read Implementation: Composition
      • Step 1: Read path-value pairs from the database.
      • Step 2: “Merge” the path-value maps back into the basic structure (Maps, Lists, simple values). This is done by the Composer.
      • Step 3: Use Jackson to convert the basic structure back into a domain object using a TypeReference.
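Step 2 of the read side, merging path-value pairs back into Maps and Lists, might look roughly like the following. It is a sketch under assumptions, not the library's Composer: ComposeSketch is a hypothetical name, paths are assumed to be non-empty and slash-terminated, and a map whose keys all start with "@" is treated as a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the C* Path Composer class.
final class ComposeSketch {

    // Rebuilds nested Maps and Lists from "a/b/@0/c/" -> value entries.
    @SuppressWarnings("unchecked")
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : pathValues.entrySet()) {
            String[] elements = e.getKey().split("/");
            Map<String, Object> node = root;
            // Walk or create the intermediate maps. The cast fails if a simple
            // value already sits where a map is expected (a "corrupted" structure).
            for (int i = 0; i < elements.length - 1; i++) {
                node = (Map<String, Object>) node.computeIfAbsent(
                        elements[i], k -> new LinkedHashMap<String, Object>());
            }
            node.put(elements[elements.length - 1], e.getValue());
        }
        return listify(root);
    }

    // Converts maps keyed only by "@index" elements into index-ordered lists.
    @SuppressWarnings("unchecked")
    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        boolean isList = !map.isEmpty();
        for (String key : map.keySet()) {
            isList &= key.startsWith("@");
        }
        if (isList) {
            TreeMap<Integer, Object> byIndex = new TreeMap<>();
            for (Map.Entry<String, Object> e : map.entrySet()) {
                byIndex.put(Integer.parseInt(e.getKey().substring(1)), listify(e.getValue()));
            }
            return new ArrayList<>(byIndex.values());
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : map.entrySet()) {
            result.put(e.getKey(), listify(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("name/", "John");
        flat.put("addresses/@0/street/", "123 Main");
        flat.put("addresses/@1/street/", "Singel 45");
        System.out.println(compose(flat));
    }
}
```

In the library, step 3 then hands the resulting structure to Jackson for conversion into the requested type; that step is omitted here to keep the sketch free of dependencies.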
  • 40. Path Encoding
      • Paths are stored as strings
      • Path elements are separated by forward slashes (hidden by the Path API)
      • Path elements are URL encoded internally, which frees special characters for use by the implementation
      • Special character: @ for list indices (@0, @1, @2, ...)
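To see why URL encoding keeps user data from colliding with the separator and the list marker, here is a small sketch. The slides do not say which encoder the library uses; this example assumes java.net.URLEncoder with UTF-8, and PathEncodingSketch and its methods are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of slash-separated, URL-encoded path strings.
final class PathEncodingSketch {

    // Encodes every element and joins them with a trailing "/" each.
    static String encodePath(String... elements) {
        StringBuilder sb = new StringBuilder();
        for (String element : elements) {
            sb.append(encode(element)).append('/');
        }
        return sb.toString();
    }

    // Appends a list index element; "@" stays raw because the implementation adds it.
    static String appendIndex(String encodedPath, int index) {
        return encodedPath + "@" + index + "/";
    }

    // Splits an encoded path and decodes the user-supplied elements.
    static List<String> decodePath(String encodedPath) {
        List<String> elements = new ArrayList<>();
        for (String part : encodedPath.split("/")) {
            elements.add(part.startsWith("@") ? part : decode(part));
        }
        return elements;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String path = encodePath("a/b", "user@host");
        System.out.println(path);                              // a%2Fb/user%40host/
        System.out.println(decodePath(appendIndex(path, 0)));  // [a/b, user@host, @0]
    }
}
```

A user-supplied "@" or "/" is percent-encoded, so a raw "@" in a stored path can only be a list index and a raw "/" can only be a separator.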
  • 41-43. Challenge: “Shrinking Lists”
      ➀ Write a list.
          dao.writeToPath(key, dao.createPath("x"), Arrays.asList("1", "2"));
          x/@0/ | “1”
          x/@1/ | “2”
      ➁ Write a shorter list.
          dao.writeToPath(key, dao.createPath("x"), Arrays.asList("3"));
          x/@0/ | “3”
          x/@1/ | “2”
      ➂ Read the list.
          dao.readFromPath(key, dao.createPath("x"), new TypeReference<List<String>>() {});
          Result: [“3”, “2”] ✘ (the stale second element is still there)
  • 44. Challenge: “Shrinking Lists”
      ✔ Solution: The implementation writes a list terminator value.
      After dao.writeToPath(key, “x”, {“1”,”2”}):
          x/@0/ | “1”
          x/@1/ | “2”
          x/@2/ | 0xFFFFFFFF
      After dao.writeToPath(key, “x”, {“3”}):
          x/@0/ | “3”
          x/@1/ | 0xFFFFFFFF
          x/@2/ | 0xFFFFFFFF
      dao.readFromPath(key, “x”, new TypeReference<List<String>>() {}) now returns {“3”} ✔
  • 45. Challenge: “Shrinking Lists”
      ✔ Solution: The implementation writes a list terminator value.
      Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.
      This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.
      Conclusion: The user must know what they are doing and understand the implementation.
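The terminator idea, and the reason it is only a partial fix, can be simulated with an in-memory sorted map standing in for a Cassandra row. Everything here is hypothetical illustration: ListTerminatorSketch is not library code, and a sentinel object stands in for the 0xFFFFFFFF value shown on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of list writes with a terminator marker.
final class ListTerminatorSketch {

    // Stand-in for the 0xFFFFFFFF terminator value from the slide.
    static final Object TERMINATOR = new Object() {
        @Override
        public String toString() {
            return "0xFFFFFFFF";
        }
    };

    // Writes each element at path/@i/ and a terminator right after the last one.
    // Older columns beyond the terminator are not deleted.
    static void writeList(Map<String, Object> row, String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "/@" + i + "/", values.get(i));
        }
        row.put(path + "/@" + values.size() + "/", TERMINATOR);
    }

    // Reads elements in index order and stops at the first terminator or gap.
    static List<String> readList(Map<String, Object> row, String path) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(path + "/@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result;
            }
            result.add((String) value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new TreeMap<>();
        writeList(row, "x", Arrays.asList("1", "2", "3"));
        writeList(row, "x", Arrays.asList("9"));
        System.out.println(readList(row, "x")); // [9]
        System.out.println(row.get("x/@2/"));   // 3 (stale, still addressable by index)
    }
}
```

Reading the list as a whole stops at the terminator, but the old element at x/@2/ survives and is returned by a direct positional read, which is the stale-element caveat described above.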
  • 46-47. Challenge: Inconsistent Updates
      Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.
      Path path = dao.createPath("x");
      dao.writeToPath(key, path, person1);
          x/address/street/ | “Singel 45”
          x/name/           | “John”
      path = dao.createPath("x", "name");
      dao.writeToPath(key, path, person1);  ✘
          x/address/street/      | “Singel 45”
          x/name/                | “John”
          x/name/address/street/ | “Singel 45”
          x/name/name/           | “John”
  • 48. Challenge: Inconsistent Updates
      ✔ Solution: Don’t do that!
      If it does happen, the implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert it to a now-incompatible POJO will fail.
      Conclusion: The user must know what they are doing and understand the implementation.
  • 49-53. Issue: Sorting
      Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
      • Instead of storing paths as strings, the implementation could have used DynamicComposite. We tried it.
      • It can work. CQL supports it as a user-defined type.
      • Unfortunately, it causes cqlsh to crash, making it difficult to “browse” the data.
      • Using DynamicComposite for paths is still under consideration for a future version.
  • 55. Thrift
      Column family (row key: <UUID>):
          column name       | column value
          x/address/street/ | “Singel 45”
          x/name            | “John”
          …                 | …
      - OR -
      Super column family (coming soon; row key: <UUID>, super column name: x):
          column name     | column value
          address/street/ | “Singel 45”
          name            | “John”
          …               | …
  • 56. Thrift
      The Thrift implementation relies on the Hector client.
      ColumnFamilyOperations<K,String,Object> operations =
          new ColumnFamilyTemplate<K,String,Object>(keyspace, KeySerializer, StringSerializer, StructureSerializer);

      StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);
  • 57. CQL
      CREATE TABLE person (
          key text,
          path text,
          value text,
          PRIMARY KEY (key, path)
      )
      • Cannot use the path itself as a column name because it is “dynamic”
      • Dynamic column family
  • 58. CQL: Data Model Constraints (for the person table above)
      • A range (“slice”) query on the path is needed, so the path must be a clustering key.
      • The path must also be the first clustering key; otherwise a query would have to provide an equals condition on the preceding clustering keys.
      • One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work, since Cassandra indexes only work with equals conditions:
          Bad Request: No indexed columns present in by-columns clause with Equal operator
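The need for a range query can be made concrete with a sorted in-memory map standing in for one partition of the person table. The sketch below is an assumption-laden illustration, not the library's query code: PathSliceSketch is a hypothetical name, the query string is only one plausible shape for such a slice, and the upper bound relies on prefixes being non-empty and ending in "/".

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of reading a subtree as a range over sorted paths.
final class PathSliceSketch {

    // One plausible query shape for a slice on the clustering column (assumption).
    static final String SLICE_QUERY =
            "SELECT path, value FROM person WHERE key = ? AND path >= ? AND path < ?";

    // Smallest string greater than every string that starts with the prefix:
    // the last character ('/') is replaced by its successor ('0').
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Returns all entries whose path starts with the given slash-terminated prefix.
    static NavigableMap<String, String> slice(NavigableMap<String, String> partition, String prefix) {
        return partition.subMap(prefix, true, upperBound(prefix), false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> partition = new TreeMap<>();
        partition.put("x/address/street/", "Singel 45");
        partition.put("x/address/place/", "Amsterdam");
        partition.put("x/addressbook/", "unrelated");
        partition.put("x/name/", "John");

        // Only the two x/address/... entries fall inside the range.
        System.out.println(slice(partition, "x/address/"));
    }
}
```

The two bounds returned here are the values that would be bound to the `path >= ?` and `path < ?` conditions, which is why the path has to be a clustering column that supports range restrictions.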
  • 59. CQL
      The CQL implementation relies on the DataStax Java driver.
      StructuredDataSupport<K> dao =
          new CqlStructuredDataSupport<K>(String tableName, String partitionKeyColumnName, String pathColumnName, String valueColumnName, Session session);
  • 61. Planned Features
      • Sets with simple values: element values stored in the path
      • DynamicComposites?
      • Multiple row reads and writes
      • Slice queries on path ranges
  • 62. Credits and Acknowledgements
      • Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
      • Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome
      • Image credits: slide “Some Strategies”, image “binary”, author noegranado, link http://www.flickr.com/photos/43360884@N04/6949896929/