In this talk, Pascal Desmarets, CEO and Founder of Hackolade, discusses the foundations of NoSQL data modeling. He highlights:
- Why is data modeling a key success factor?
- “Sweet spot” use cases where NoSQL shines the most
- Basic principles of Data Modeling for ScyllaDB
10. Schema changes happen frequently, and often without warning, resulting in code that is both ugly and unmaintainable.
■ Architecture complexity
■ Volume and pace of changes
■ Data gets lost and out-of-sync
■ Diverging requirements and objectives
12. Data Modeling is a Key Success Factor
Data models and schemas are perhaps the most important part of developing software, because they have such a profound effect:
■ not only on how the software is written,
■ but also on how we think about the problem that we are solving.
Martin Kleppmann, Designing Data-Intensive Applications
14. Different databases require different types of data models
[Figure: database families grouped under SQL and NoSQL. Labels: Relational, Analytics, Key-Value, Document, Property Graph, Knowledge Graph, Columnar Storage Formats]
15. Objectives of a good NoSQL data model
■ Achieve good performance by leveraging features of NoSQL
■ Maximize developer productivity by allowing agile schema evolution
■ Minimize total cost of ownership of the whole solution
16. Mindshift from application-agnostic to application-specific modeling
■ Relational: Data → Data Model → Application
■ NoSQL: Application Design → Access patterns & Queries → Data Model → Data
17. Multiple data modeling stages
■ Relational data modeling: Conceptual, Logical, Physical
■ NoSQL/SQL/API/… data modeling: Domain-Driven Data Modeling, Polyglot Data Model, Target-specific Data Model
18. Modern process with Domain-Driven Data Modeling
■ DDDM allows you to
● focus on your core ("domain and sub-domains")
● break down complex problems into smaller ones ("bounded context")
● use data modeling as a communication tool ("ubiquitous language")
● keep together what belongs together ("aggregates")
● reach a shared understanding between business and tech ("collaboration of domain experts and developers")
● iterate and evolve ("continuous refinement")
20. Benefits of data modeling
■ While traditional data modeling may be perceived to get in the way of development and take too much time…
■ Next-gen data modeling tools such as Hackolade Studio are recognized to:
● facilitate Agile development
● reduce development time
● increase application quality
● implement consistent definitions of data
● improve data quality
● enable better data governance and compliance
● facilitate documentation and communication
To leverage the dynamic schema of ScyllaDB, data modeling turns out to be even more important than with relational databases.
23. It may be misleading that…
■ ScyllaDB tables look like RDBMS tables
■ CQL looks like SQL
24. The ideal ScyllaDB application has the following characteristics
■ Writes exceed reads by a large margin
■ Data is rarely updated, and when updates are made, they are idempotent (the result of a successfully performed operation is independent of the number of times it is executed)
■ Read access is by a known primary key
■ Data can be partitioned via a key that allows the database to be
spread evenly across multiple nodes
■ There is no need for joins or aggregates
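The idempotency characteristic above can be illustrated with a minimal sketch. It uses a plain Python dict as a stand-in for a table, not a ScyllaDB client, and the names `set_status` and `increment_views` are hypothetical. Setting a column to a given value leaves the same state however many times the write is retried, whereas a read-modify-write increment does not.

```python
# Stand-in for a table keyed by primary key; NOT a real ScyllaDB client.
table = {}

def set_status(order_id, status):
    """Idempotent: replaying the same write leaves the same state."""
    table.setdefault(order_id, {})["status"] = status

def increment_views(order_id):
    """Not idempotent: every replay changes the stored value again."""
    row = table.setdefault(order_id, {})
    row["views"] = row.get("views", 0) + 1

# Simulate a client that sends each write three times (e.g. after timeouts).
for _ in range(3):
    set_status("order-1", "shipped")
    increment_views("order-1")

print(table["order-1"])  # {'status': 'shipped', 'views': 3}
```

The status is correct after any number of retries, while the counter has drifted to 3 for what was meant to be a single logical update.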
25. Excellent ScyllaDB Use Cases
■ Transaction logging: purchases, test scores, movies watched, and latest movie location
■ Recommendation and personalization engines
■ Fraud detection
■ Tracking pretty much anything, including order status, packages, etc.
■ Storing time series data (as long as you do your own aggregates)
● Health tracker data
● Weather service history
● Internet of things status and event history
● Sensor data in general
■ Messaging systems: chats, collaboration, instant messaging apps, etc.
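For the time series case, "doing your own aggregates" can be sketched as follows. This is an illustrative assumption about what application-side aggregation might look like, not code from the talk: the `rows` list stands in for raw readings returned by a partition read, and `daily_average` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings as they might come back from a partition read:
# (day, temperature)
rows = [("2024-01-01", 20.0), ("2024-01-01", 22.0), ("2024-01-02", 18.0)]

def daily_average(rows):
    """Group readings by day and average them in the application."""
    by_day = defaultdict(list)
    for day, value in rows:
        by_day[day].append(value)
    return {day: mean(values) for day, values in by_day.items()}

print(daily_average(rows))  # {'2024-01-01': 21.0, '2024-01-02': 18.0}
```

The computed averages could then be written back to a separate summary table so that later reads stay a single lookup.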
26. Differences between ScyllaDB and relational databases
■ Denormalization is expected
■ Writes are (almost) free
■ No DB-level joins
■ No referential integrity
■ Indexing useful in specific circumstances
27. ScyllaDB Data Model Principles (1 of 3)
■ Keyspace: container for tables in a ScyllaDB (Cassandra-compatible) data model
■ Table: container for an ordered collection of rows
■ Rows: made of a primary key plus an ordered set of columns, themselves made of name/value pairs.
■ No need to store a value for every column each time a new row is stored.
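The keyspace, table, and row hierarchy above, including the point that a row need not carry every column, can be pictured with a toy in-memory model. This is only a sketch built on nested dicts; the table name `sensor_readings` and the helper `insert_row` are hypothetical.

```python
# Toy in-memory model of keyspace -> table -> rows; illustrative only.
keyspace = {"sensor_readings": {}}  # hypothetical table name

def insert_row(table_name, primary_key, **columns):
    """Store only the name/value pairs that were actually supplied."""
    keyspace[table_name][primary_key] = dict(columns)

insert_row("sensor_readings", ("s1", 1), temperature=21.5, humidity=40)
insert_row("sensor_readings", ("s1", 2), temperature=21.7)  # no humidity

for key, row in keyspace["sensor_readings"].items():
    print(key, sorted(row))
```

The second row simply has no `humidity` entry; nothing is stored for the missing column.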
28. ScyllaDB Data Model Principles (2 of 3)
■ Primary key: a composite made of a partition key plus an optional set of clustering columns.
● Partition key: responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns.
● Clustering columns: responsible for sorting the rows within the partition. There can be zero or more of them.
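The two roles of the primary key can be sketched with a small simulation. It is a simplified stand-in, not how ScyllaDB is implemented: a SHA-256 hash modulo a hypothetical node count replaces the database's real partitioner, and `write`, `read_partition`, and `node_for` are illustrative names.

```python
import hashlib
from bisect import insort

NODES = 3  # hypothetical cluster size

def node_for(partition_key):
    """Map a partition key to a node with a stable hash (toy partitioner)."""
    digest = hashlib.sha256(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES

# node index -> partition key -> rows kept sorted by clustering value
cluster = [dict() for _ in range(NODES)]

def write(partition_key, clustering_value, payload):
    """Route by partition key, then keep the partition sorted."""
    partition = cluster[node_for(partition_key)].setdefault(partition_key, [])
    insort(partition, (clustering_value, payload))

def read_partition(partition_key):
    """One node, one partition, rows already in clustering order."""
    return cluster[node_for(partition_key)].get(partition_key, [])

write("sensor-1", 30, "c")
write("sensor-1", 10, "a")
write("sensor-1", 20, "b")
print(read_partition("sensor-1"))  # [(10, 'a'), (20, 'b'), (30, 'c')]
```

Rows written out of order come back sorted by the clustering value, and every row with the same partition key lands on the same node.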
29. ScyllaDB Data Model Principles (3 of 3)
■ Data type: defined to constrain the values stored in a column. Data types include character and numeric types, collections, and user-defined types. A column also has other attributes: timestamps and time-to-live.
■ Secondary index: an index on any column that is not part of the primary key. Secondary indexes are not recommended on columns with high cardinality or very low cardinality, or on columns that are frequently updated or deleted.
■ Joins: cannot be performed at the database level. If a join is needed, either it must be performed at the application level or, preferably, the data model should be adapted to create a denormalized table that represents the join results.
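The preferred alternative to a join, a denormalized table that already holds the join result, can be sketched as below. The dicts stand in for tables, and the names `users`, `orders_by_user`, `place_order`, and `orders_for` are assumptions made for illustration.

```python
# Toy stand-ins for two tables; names are hypothetical.
users = {"u1": {"name": "Ada"}}
orders_by_user = {}  # denormalized table: partition key = user_id

def place_order(user_id, order_id, total):
    """Copy the user's name into the order row at write time."""
    row = {"order_id": order_id, "total": total,
           "user_name": users[user_id]["name"]}
    orders_by_user.setdefault(user_id, []).append(row)

def orders_for(user_id):
    """Single lookup by partition key; no join at read time."""
    return orders_by_user.get(user_id, [])

place_order("u1", "o1", 25.0)
place_order("u1", "o2", 12.5)
print(orders_for("u1"))
```

The trade-off is extra work on the write path: if the user's name changes, every copied `user_name` has to be rewritten by the application.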
30. Data modeling for ScyllaDB is a balancing act
■ Two primary rules of data modeling in ScyllaDB:
● each partition should hold roughly the same amount of data
● read operations should access as few partitions as possible, ideally only one
■ These two rules often conflict, so you have to find a balance between them based on domain understanding and business needs
■ Anticipate growth: a data model that makes sense at a particular transaction volume may no longer make sense when that volume is multiplied 100x or 1000x
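The tension between the two rules can be made concrete with a small calculation. This sketch compares two hypothetical partition key choices for one sensor's readings, per sensor versus per sensor and day; the data and the function names are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def key_per_sensor(sensor_id, ts):
    return (sensor_id,)

def key_per_sensor_day(sensor_id, ts):
    return (sensor_id, ts.date().isoformat())

# One reading per minute for 3 days from a single hypothetical sensor.
start = datetime(2024, 1, 1)
readings = [("s1", start + timedelta(minutes=m)) for m in range(3 * 24 * 60)]

def partition_sizes(key_fn):
    """Number of rows that end up in each partition."""
    return Counter(key_fn(s, ts) for s, ts in readings)

def partitions_read(key_fn, since):
    """Partitions a query for readings at or after `since` must touch."""
    return len({key_fn(s, ts) for s, ts in readings if ts >= since})

for fn in (key_per_sensor, key_per_sensor_day):
    sizes = partition_sizes(fn)
    touched = partitions_read(fn, start + timedelta(days=1))
    print(fn.__name__, "largest partition:", max(sizes.values()),
          "partitions read:", touched)
```

Keying by sensor alone gives a single read partition, but it grows without bound (4320 rows here). Adding the day caps each partition at 1440 rows, at the cost of reading two partitions for the same query.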
The benefits of data modeling are felt at all levels:
■ For end users, because the delivered application will more closely match their expectations
■ For management, because it reduces risk and makes delivery more productive and efficient
■ For developers, because collaboration brings clearer requirements, less frustrating rework, and better performance