Session 1: Foundations
Introducing Concepts & Principles
Pascal Desmarets
Why is Data Modeling a key success factor?
Schemas are everywhere! (even in schemaless DBs…)
[Figure panels: Before, Now]
Polyglot Persistence and Data Pipelines
■ Relational
■ Document
■ Columnar
■ Property Graph
■ Semantic Web Triple Stores
■ Key-Value
■ APIs & Storage formats
Schemas aren’t limited to relational databases
■ for data storage (data-at-rest)
  ● Databases: RDBMS and some NoSQL
  ● File formats: Avro, JSON, Parquet, YAML, …
■ for data exchanges (data-in-motion)
  ● APIs (Application Programming Interfaces) in web services: Representational State Transfer (REST), including Swagger, OpenAPI, AsyncAPI; GraphQL
  ● RPCs (remote procedure calls): ProtoBuf, Thrift, …
  ● MQs (message-passing systems): Kafka, Azure Event Hubs, Pulsar, …
  ● ETLs (Extract-Transform-Load)
Copyright © 2016-2023 Hackolade
Schema changes happen frequently, and often without
warning, resulting in both ugly and unmaintainable code.
■ Architecture complexity
■ Volume and pace of changes
■ Data gets lost and out-of-sync
■ Diverging requirements and objectives
Data Modeling is a Key Success Factor
Data models and schemas are perhaps the most important part of developing software, because they have such a profound effect:
■ not only on how the software is written,
■ but also on how we think about the problem that we are solving.
Martin Kleppmann, Designing Data-Intensive Applications
Data Modeling for NoSQL
Different databases require different types of data models
[Diagram labels: SQL and NoSQL; Relational, Analytics, Key-Value, Document, Property Graph, Knowledge Graph, Columnar, Storage Formats]
Objectives of a good NoSQL data model
■ Achieve good performance by leveraging features of NoSQL
■ Maximize developer productivity by allowing agile schema evolution
■ Minimize total cost of ownership of the whole solution
Mindshift from application-agnostic to application-specific modeling
■ Relational: Data → Data Model → Application
■ NoSQL: Application Design → Access patterns & Queries → Data Model → Data
Multiple data modeling stages
■ Relational data modeling: Conceptual, Logical, Physical
■ NoSQL/SQL/API/… data modeling: Domain-Driven Data Modeling, Polyglot Data Model, Target-specific Data Model
Modern process with Domain-Driven Data Modeling
■ DDDM allows you to
● focus on your core ("domain and sub-domains")
● break down complex problems into smaller ones ("bounded context")
● use data modeling as a communication tool ("ubiquitous language")
● keep together what belongs together ("aggregates")
● reach a shared understanding between business and tech ("collaboration of domain experts and developers")
● iterate and evolve ("continuous refinement")
Migrating relational database structures to ScyllaDB
[Figure panels: RDBMS, ScyllaDB]
Benefits of data modeling
■ While traditional data modeling may be perceived to get in the way of development and take too much time…
■ Next-gen data modeling tools such as Hackolade Studio are recognized to:
● facilitate Agile development
● reduce development time
● increase application quality
● implement consistent definitions of data
● improve data quality
● enable better data governance and compliance
● facilitate documentation and communication
To leverage the dynamic schema of ScyllaDB, data modeling turns out to be even more important than with relational databases.
Presenter
Pascal Desmarets
Founder & CEO
Hackolade
pascal@hackolade.com
@hackolade
Download free trial at
https://hackolade.com
It may be misleading that…
■ ScyllaDB tables look like RDBMS tables
■ CQL looks like SQL
The ideal ScyllaDB application has the following characteristics
■ Writes exceed reads by a large margin
■ Data is rarely updated, and when updates are made, they are idempotent (the result of a successfully performed operation is independent of the number of times it is executed)
■ Read access is by a known primary key
■ Data can be partitioned via a key that allows the database to be spread evenly across multiple nodes
■ There is no need for joins or aggregates
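The idempotency requirement above can be illustrated with a small Python sketch. The dictionaries and function names are hypothetical stand-ins, not a ScyllaDB API. The only point is that a write which sets an absolute value can be retried safely, while a read-modify-write increment cannot.

```python
# Toy illustration of idempotent vs. non-idempotent writes.
# The dicts below stand in for tables keyed by primary key (an assumption
# made for illustration; no database is involved).

def set_status(table, order_id, status):
    """Idempotent: replaying the same write leaves the same final state."""
    table[order_id] = {"status": status}

def add_points(table, user_id, points):
    """Not idempotent: every replay changes the stored value again."""
    row = table.setdefault(user_id, {"points": 0})
    row["points"] += points

orders, users = {}, {}

# Simulate a client that sends each write three times (e.g. after timeouts).
for _ in range(3):
    set_status(orders, "order-1", "shipped")
    add_points(users, "user-1", 10)

print(orders["order-1"])  # identical to the result of a single execution
print(users["user-1"])    # the retries were counted three times
```

With retries being a normal part of a distributed system, writes shaped like `set_status` are the ones that fit the characteristics listed above.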
Excellent ScyllaDB Use Cases
■ Transaction logging: purchases, test scores, movies watched, and the latest position in each movie
■ Recommendation and personalization engines
■ Fraud detection
■ Tracking pretty much anything, including order status, packages, etc.
■ Storing time series data (as long as you do your own aggregates)
  • Health tracker data
  • Weather service history
  • Internet of things status and event history
  • Sensor data in general
■ Messaging systems: chats, collaboration, and instant messaging apps, etc.
Differences between ScyllaDB and relational databases
■ Denormalization is expected
■ Writes are (almost) free
■ No DB-level joins
■ No referential integrity
■ Indexing useful in specific circumstances
ScyllaDB Data Model Principles (1 of 3)
■ Keyspace: container for tables in a ScyllaDB (Cassandra-compatible) data model
■ Table: container for an ordered collection of rows
■ Row: made of a primary key plus an ordered set of columns, themselves made of name/value pairs
■ There is no need to store a value for every column each time a new row is stored
ScyllaDB Data Model Principles (2 of 3)
■ Primary key: a composite made of a partition key plus an optional set of clustering columns.
  ● Partition key: responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns.
  ● Clustering columns: responsible for sorting the rows within the partition. There can be zero or more of them.
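The roles of the partition key and the clustering columns can be sketched with a toy in-memory model. This is a simplification under stated assumptions: the node names, the `node_for`, `insert` and `read_partition` helpers, and the hash-modulo placement are all hypothetical. Real placement uses a token ring with replication, which this sketch ignores.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster

def node_for(partition_key):
    """Pick a node from a hash of the partition key (simplified placement)."""
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# partition key -> {clustering key: columns}
partitions = defaultdict(dict)

def insert(partition_key, clustering_key, **columns):
    # The same primary key addresses the same row, so a repeat insert overwrites.
    # Rows may carry different column sets; absent columns are simply not stored.
    partitions[partition_key][clustering_key] = columns

def read_partition(partition_key):
    """Return the rows of one partition, sorted by clustering key."""
    rows = partitions[partition_key]
    return [(ck, rows[ck]) for ck in sorted(rows)]

insert("sensor-1", "2024-01-02", temp=21.5)
insert("sensor-1", "2024-01-01", temp=20.0, humidity=40)
insert("sensor-2", "2024-01-01", humidity=55)

for pk in list(partitions):
    print(pk, "->", node_for(pk), read_partition(pk))
```

All rows for `sensor-1` live together and come back in clustering order, which is why a read that names the partition key is cheap.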
ScyllaDB Data Model Principles (3 of 3)
■ Data type: defined to constrain the values stored in a column. Data types include character and numeric types, collections, and user-defined types. A column also has other attributes: timestamps and time-to-live.
■ Secondary index: an index on a column that is not part of the primary key. Secondary indexes are not recommended on columns with high cardinality or very low cardinality, or on columns that are frequently updated or deleted.
■ Joins: cannot be performed at the database level. If there is a need for a join, either it must be performed at the application level or, preferably, the data model should be adapted to create a denormalized table that represents the join results.
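The denormalization advice above can be sketched in plain Python. The table names (`users`, `orders`, `orders_by_user`) and the sample data are invented for illustration. The idea shown is that the "join" is paid once at write time, so a read touches a single partition.

```python
# Normalized source data, as it might exist in a relational schema
# (sample values are made up for this sketch).
users = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
orders = [
    {"order_id": 100, "user_id": 1, "total": 25.0},
    {"order_id": 101, "user_id": 2, "total": 40.0},
    {"order_id": 102, "user_id": 1, "total": 12.5},
]

# Denormalized "orders_by_user" table, partitioned by user_id.
# The user name is copied into every order row when the order is written.
orders_by_user = {}

def write_order(order):
    row = {**order, "user_name": users[order["user_id"]]["name"]}
    orders_by_user.setdefault(order["user_id"], []).append(row)

for order in orders:
    write_order(order)

def orders_for(user_id):
    """Single-partition read that already contains the joined columns."""
    return orders_by_user.get(user_id, [])

print(orders_for(1))
```

The trade-off is duplicated data: if a user name changes, every copied row has to be rewritten by the application.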
Data modeling for ScyllaDB is a balancing act
■ Two primary rules of data modeling in ScyllaDB:
  ● each partition should have roughly the same amount of data
  ● read operations should access as few partitions as possible, ideally only one
■ The two data modeling principles often conflict; therefore, you have to find a balance between the two based on domain understanding and business needs
■ Anticipate growth: a data model that makes sense with a particular transaction volume may no longer make sense when that volume is multiplied 100x or 1000x
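The tension between the two rules can be made concrete with a small simulation. The sensor workload, the key functions and the numbers are assumptions chosen for illustration: one sensor writes 1,000 readings per day for 90 days, and the partition key is either the sensor alone or the sensor plus a month bucket.

```python
from collections import Counter
from datetime import date, timedelta

def key_by_sensor(sensor_id, day):
    """One partition per sensor: fewest partitions per read, unbounded growth."""
    return (sensor_id,)

def key_by_sensor_and_month(sensor_id, day):
    """One partition per sensor per month: bounded size, more partitions per read."""
    return (sensor_id, day.strftime("%Y-%m"))

def simulate(key_fn, days=90, readings_per_day=1000):
    """Count how many readings land in each partition for a single sensor."""
    sizes = Counter()
    start = date(2024, 1, 1)
    for offset in range(days):
        sizes[key_fn("sensor-1", start + timedelta(days=offset))] += readings_per_day
    return sizes

unbucketed = simulate(key_by_sensor)
bucketed = simulate(key_by_sensor_and_month)

print("unbucketed:", len(unbucketed), "partition(s), largest", max(unbucketed.values()))
print("bucketed:  ", len(bucketed), "partition(s), largest", max(bucketed.values()))
```

A query over the full 90 days reads one partition in the first design and three in the second, while the largest partition shrinks from 90,000 rows to 31,000. Which side to favor depends on the access patterns and on expected growth.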

NoSQL Data Modeling Foundations — Introducing Concepts & Principles

  • 1. Session 1: Foundations Introducing Concepts & Principles Pascal Desmarets
  • 2.
  • 3. 3
  • 4. 4
  • 5. 5
  • 6. Why is Data Modeling a key success factor? 6
  • 7. Schemas are everywhere! (even in schemaless DBs…) 7 Before Now
  • 8. Polyglot Persistence and Data Pipelines 8 Relational Document Columnar Property Graph Semantic Web Triple Stores Key-Value APIs & Storage formats
  • 9. Schemas aren’t limited to relational databases ■ for data storage (data-at-rest) ● Databases: RDBMS and some NoSQL ● File formats: Avro, JSON, Parquet, YAML, … ■ for data exchanges (data-in-motion) ● APIs (Application Programming Interfaces): ● in web services ● Representational State Transfer (REST) – including Swagger, OpenAPI, AsyncAPI ● GraphQL ● RPCs (remote procedure calls) ● ProtoBuf, Thrift, … ● MQs (message-passing systems) ● Kafka, Azure EventHub, Pulsar, … ● ETLs (Extract-Transform-Load) 9 Copyright © 2016-2023 Hackolade
  • 10. Schema changes happen frequently, and often without warning, resulting in both ugly and unmaintainable code. 10 Architecture complexity Volume and pace of changes Data gets lost and out-of-sync Diverging requirements and objectives
  • 11. 11
  • 12. Data Modeling is a Key Success Factor Data models and schemas are perhaps the most important part of developing software, because they have such a profound effect: ■ not only on how the software is written, ■ but also on how we think about the problem that we are solving. Martin Kleppmann, Designing Data-Intensive Applications
  • 13. Data Modeling for NoSQL 13
  • 14. Different databases require different types of data models 14 Relational Analytics Key-Value Document Property Graph Knowledge Graph Columnar Storage Formats SQL NoSQL
  • 15. Objectives of a good NoSQL data model 15 Achieve good performance by leveraging features of NoSQL Maximize developer productivity by allowing agile schema evolution Minimize total cost of ownership of the whole solution
  • 16. Mindshift from application-agnostic to application-specific modeling 16 Data Data Model Application Application Design Access patterns & Queries Data Model Data Relational NoSQL
  • 17. Conceptual Logical Physical Polyglot Data Model Target-specific Data Model Relational data modeling NoSQL/SQL/API/… data modeling Domain Driven Data Modeling Multiple data modeling stages
  • 18. Modern process with Domain-Driven Data Modeling ■ DDDM allows you to ● focus on your core ("domain and sub-domains") ● break down complex problems into smaller ones ("bounded context") ● use data modeling as a communication tool ("ubiquitous language") ● keep together what belongs together ("aggregates") ● reach a shared understanding between business and tech ("collaboration of domain experts and developers") ● iterate and evolve ("continuous refinement") 18
  • 19. Migrating relational database structures to ScyllaDB 19 RDBMS ScyllaDB
  • 20. Benefits of data modeling ■ While traditional data modeling may be perceived to get in the way of development and take too much time… ■ Next-gen data modeling tools such as Hackolade Studio are recognized to: ● facilitate Agile development ● reduce development time ● increase application quality ● implement consistent definitions of data ● improve data quality ● enable better data governance and compliance ● facilitate documentation and communication To leverage the dynamic schema of ScyllaDB, data modeling turns out to be even more important than with relational databases 20
  • 21. Presenter Pascal Desmarets Founder & CEO Hackolade pascal@hackolade.com @hackolade Download free trial at https://hackolade.com
  • 23. It may be misleading that… ■ ScyllaDB tables look like RDBMS tables ■ CQL looks like SQL 23
  • 24. The ideal ScyllaDB application has the following characteristics ■ Writes exceed reads by a large margin ■ Data is rarely updated and when updates are made, they are idempotent (the result of a successful performed operation is independent of the number of times it is executed) ■ Read Access is by a known primary key ■ Data can be partitioned via a key that allows the database to be spread evenly across multiple nodes ■ There is no need for joins or aggregates 24
  • 25. Excellent ScyllaDB Use Cases ■ Transaction logging: purchases, test scores, movies watched and movie latest location ■ Recommendation and personalization engines ■ Fraud detection ■ Tracking pretty much anything including order status, packages, etc ■ Storing time series data (as long as you do your own aggregates) • Health tracker data • Weather service history • Internet of things status and event history • Sensor data in general ■ Messaging systems: chats, collaboration, and instant messaging apps, etc. 25
  • 26. 26 Denormalization is expected Writes are (almost) free No DB-level joins No referential integrity Indexing useful in specific circumstances Differences between ScyllaDB and relational databases
  • 27. ScyllaDB Data Model Principles (1 of 3) ■ Keyspace: container for tables in a Cassandra data model ■ Table: container for an ordered collection of rows ■ Rows: made of a primary key plus an ordered set of columns, themselves made of name/value pairs. ■ No need to store a value for every column each time a new row is stored. 27
  • 28. ScyllaDB Data Model Principles (2 of 3) ■ Primary key: a composite made of a partition key plus an optional set of clustering columns. ● Partition key: responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns. ● Clustering columns: responsible for sorting the rows within the partition. There can be zero or more of them. 28
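The two roles of the primary key can be sketched in plain Python. This is a simplified model, not how ScyllaDB is implemented: the node list, the `node_for` helper, the `Table` class, and the use of an MD5 hash with a modulo are all assumptions chosen to keep the example short (a real cluster maps partition keys to nodes through a token ring).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical 3-node cluster


def node_for(partition_key: str) -> str:
    """Map a partition key to a node via a hash (simplified stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class Table:
    """Toy table: partition key -> {clustering key: columns}."""

    def __init__(self):
        self.partitions = {}

    def insert(self, partition_key, clustering_key, **columns):
        self.partitions.setdefault(partition_key, {})[clustering_key] = columns

    def read_partition(self, partition_key):
        """Return the rows of one partition, sorted by clustering key."""
        rows = self.partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows)]


table = Table()
# Rows arrive out of order; the clustering key decides the read order.
table.insert("sensor-1", "2024-01-01T10:00", temperature=21.5)
table.insert("sensor-1", "2024-01-01T09:00", temperature=20.0)
table.insert("sensor-2", "2024-01-01T09:30", temperature=18.2)

rows = table.read_partition("sensor-1")
print(node_for("sensor-1"), rows)
```

All rows sharing the partition key `sensor-1` land on the same node, and a read of that partition returns them ordered by the clustering key regardless of insertion order.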
  • 29. ScyllaDB Data Model Principles (3 of 3) ■ Data type: defined to constrain the values stored in a column. Data types include character and numeric types, collections, and user-defined types. A column also has other attributes: timestamps and time-to-live. ■ Secondary index: an index on any column that is not part of the primary key. Secondary indexes are not recommended on columns with high cardinality or very low cardinality, or on columns that are frequently updated or deleted. ■ Joins: cannot be performed at the database level. If there is a need for a join, either it must be performed at the application level or, preferably, the data model should be adapted to create a denormalized table that represents the join results. 29
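The preferred alternative to a join, a denormalized table that already holds the join result, can be sketched as follows. The `users` and `orders` data and the `build_orders_by_user` helper are hypothetical and only illustrate the idea with plain Python structures.

```python
# Hypothetical normalized source data, as it might look in an RDBMS.
users = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
orders = [
    {"order_id": 100, "user_id": 1, "total": 25.0},
    {"order_id": 101, "user_id": 2, "total": 40.0},
    {"order_id": 102, "user_id": 1, "total": 12.5},
]


def build_orders_by_user(users, orders):
    """Precompute the join: one partition per user, user name copied into each row."""
    table = {}
    for order in orders:
        user = users[order["user_id"]]
        table.setdefault(order["user_id"], []).append(
            {
                "order_id": order["order_id"],
                "user_name": user["name"],  # duplicated on purpose
                "total": order["total"],
            }
        )
    return table


orders_by_user = build_orders_by_user(users, orders)

# The query "all orders of user 1, with the user's name" is now a single
# lookup by partition key, with no join at read time.
print(orders_by_user[1])
```

The cost is duplicated data (the user name is stored once per order), which fits the earlier point that denormalization is expected and writes are cheap.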
  • 30. Data modeling for ScyllaDB is a balancing act ■ Two primary rules of data modeling in ScyllaDB: ● each partition should hold roughly the same amount of data ● read operations should access as few partitions as possible, ideally only one ■ These two rules often conflict, so you have to find a balance between them based on domain understanding and business needs ■ Anticipate growth: a data model that makes sense at a particular transaction volume may no longer make sense when that volume is multiplied 100x or 1000x 30
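The tension between the two rules can be made concrete with a small Python sketch. The scenario (one sensor reporting hourly for 30 days) and the `partition_key` and `partition_sizes` helpers are assumptions for illustration; adding a day bucket to the partition key is one common way to bound partition size, not the only one.

```python
from collections import Counter
from datetime import datetime, timedelta


def partition_key(sensor_id, ts, bucket):
    """Build a partition key, optionally adding a day bucket to bound its size."""
    if bucket == "none":
        return (sensor_id,)
    if bucket == "day":
        return (sensor_id, ts.strftime("%Y-%m-%d"))
    raise ValueError(f"unknown bucket: {bucket}")


start = datetime(2024, 1, 1)
# Hypothetical workload: one reading per hour for 30 days.
readings = [start + timedelta(hours=h) for h in range(30 * 24)]


def partition_sizes(bucket):
    """Count how many rows end up in each partition."""
    return Counter(partition_key("sensor-1", ts, bucket) for ts in readings)


unbucketed = partition_sizes("none")
daily = partition_sizes("day")

# Partitions a query for the first 7 days has to touch with daily buckets.
week_partitions = {
    partition_key("sensor-1", start + timedelta(hours=h), "day")
    for h in range(7 * 24)
}

print(len(unbucketed), max(unbucketed.values()))  # one partition that keeps growing
print(len(daily), max(daily.values()))            # many small, even partitions
print(len(week_partitions))                       # but a weekly read spans several
```

Without a bucket, every read hits a single partition but that partition grows without bound; with daily buckets, partitions stay small and even, at the price of a weekly query touching seven of them.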

Editor's Notes

  1. The benefits of data modeling are felt at all levels: by end users, because the delivered application will more closely match their expectations; by management, because it reduces risk and makes teams more productive and efficient; and by developers, because collaboration brings clearer requirements, less frustrating rework, and better performance.