Presented by Austin Zellner, Solutions Architect, MongoDB
Schema design is as much art as it is science, but it is central to understanding how to get the most out of MongoDB. Attendees will walk away with an understanding of how to approach schema design, what influences it, and the science behind the art. After this session, attendees will be ready to design new schemas, as well as re-evaluate existing schemas with a new mental model.
MongoDB Schema Design: Practical Applications and Implications
1. MongoDB Schema Design: Practical Applications and Implications
Austin Zellner
Austin.zellner@mongodb.com
MongoDB Solutions Architect
Ohio Valley Region
2. Agenda
• What the heck is a Mongo?
• MongoDB Data Structure 101: Documents
• Object Relationships
• Enterprise Design Patterns
• Enterprise Smells and Solutions
• If Nothing Else...
10. Deployment from local to global
Single Data Center Deployment
Deploy Global, Access Local
11. Ops Manager – Production Admin Tool
• Automate rollouts and deployments
• Index suggestions to improve your query performance
• Backups, utility functions, security
13. Built in Functions
Expressive Queries
• Find anyone with phone # “1-212…”
• Check if the person with number “555…” is on the “do not call” list
Geospatial
• Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave
Text Search
• Find all tweets that mention the firm within the last 2 days
Aggregation
• Count and sort number of customers grouped by city
Native Binary JSON support
• Add an additional phone number to Mark Smith’s record without rewriting the document
• Select just the mobile phone number in the list
• Sort on the modified date
Left outer join ($lookup)
• Query for all San Francisco residences, look up their transactions, and sum the amount by person
Graph queries ($graphLookup)
• Query for all people within 3 degrees of separation from Mark
CRUD Operations
• Find
• Update
• Insert
• Delete
Data Modification
• Aggregation Pipeline
Utility Functions
• Geospatial
• Graph
• Text
• Etc.
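As a rough illustration of the Aggregation example above ("count and sort number of customers grouped by city"), the sketch below shows a pipeline in the shape that PyMongo's `collection.aggregate()` accepts, plus a local stand-in that produces the same kind of result without a server. The sample customers, the `city` field name, and the `count_by_city` helper are assumptions made for this example, not part of the talk.

```python
from collections import Counter

# Hypothetical sample data; field names are illustrative only.
customers = [
    {"name": "Ann", "city": "New York"},
    {"name": "Bob", "city": "Columbus"},
    {"name": "Cy", "city": "New York"},
]

# Pipeline in the shape PyMongo's collection.aggregate() accepts:
# group by city, count one per document, then sort by count descending.
pipeline = [
    {"$group": {"_id": "$city", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def count_by_city(docs):
    """Local stand-in for the pipeline above, so the sketch runs without a server."""
    counts = Counter(doc["city"] for doc in docs)
    return [{"_id": city, "count": n} for city, n in counts.most_common()]


result = count_by_city(customers)
```

Against a real deployment, the same `pipeline` list would be passed to `aggregate()` on the customers collection and the local helper would not be needed.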
14. From Facts to Global Deployments
MongoDB contains Facts
• Database > Collection > Document (BSON file): _id, Facts
• Index: _id, FileLoc
BSON Read / Write Machine
• Server: Listener, Index (_id, FileLoc)
• Executes: JavaScript, C++
Nodes come together to Replica Set
• Node (Primary), Node (Secondary), Node (Secondary)
• OpLog
Replica Sets form Sharded Clusters
• Sharded Cluster: three replica sets, each with one Primary node and two Secondary nodes
Sharding allows expansion
• Key 1-99, Key 100-199, Key 200-299, Key 300-399, Key 400-499, Key 400-499
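The key ranges on the slide suggest range-based partitioning. The sketch below is a toy router over the five distinct ranges listed (1-99 through 400-499); the shard names and the `route` function are hypothetical. In a real cluster the query router (mongos) does this using the cluster's own metadata, so treat this only as a mental model.

```python
from bisect import bisect_right

# Lower bound of each key range from the slide; shard names are hypothetical.
LOWER_BOUNDS = [1, 100, 200, 300, 400]
SHARDS = ["shard0", "shard1", "shard2", "shard3", "shard4"]
UPPER_LIMIT = 499  # highest key covered by the listed ranges


def route(key):
    """Return the shard whose key range contains `key`."""
    if not LOWER_BOUNDS[0] <= key <= UPPER_LIMIT:
        raise ValueError(f"key {key} is outside the configured ranges")
    # bisect_right finds how many lower bounds are <= key; subtract 1 for the index.
    return SHARDS[bisect_right(LOWER_BOUNDS, key) - 1]
```

Adding capacity in this model means appending a new lower bound and shard name, which is the "expansion" the slide refers to.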
17. How would we model this in MongoDB?
Size: 12"
Infield/Outfield/Pitcher model
2-Piece Web pattern
Most popular MLB® pattern among pitchers
Pro Stock® American steerhide leather offers rugged durability and a superior feel
Dual-Welting™ on "exposed edges" of the fingers helps maintain pocket shape and durability
Pro Stock™ hand-designed pattern for unbeatable craftsmanship
Dri-Lex® ultra-breathable wrist lining repels moisture from your hand
Black leather with rich brown embellishments
Pattern: B212
Model: WTA2000BBB212
Wilson
18. We use a document
{
  category: "glove",                  // string
  model: "PRO112PT",                  // string
  name: "Air Elite",                  // string
  brand: "Rawlings",                  // string
  price: 229.99,                      // number
  available: new Date("2013-03-31")   // date
}
Fields
Values
Field values are typed
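To make "field values are typed" concrete, here is a minimal sketch of the same glove document as a Python dictionary, the form a driver such as PyMongo works with. The `field_types` mapping is only an illustration added here; with PyMongo, Python `str`, `float`, and `datetime` values are stored as BSON string, double, and date respectively.

```python
from datetime import datetime

# The glove document from the slide, expressed as a Python dict.
glove = {
    "category": "glove",
    "model": "PRO112PT",
    "name": "Air Elite",
    "brand": "Rawlings",
    "price": 229.99,
    "available": datetime(2013, 3, 31),
}

# Each field keeps its own type; nothing is coerced to a string.
field_types = {field: type(value).__name__ for field, value in glove.items()}
```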
24. MongoDB Compass
The GUI for MongoDB
Visualize & Explore
• Visualize & explore your schema with an intuitive GUI
• Gain quick insights about your data with easy-to-read histograms
• Build queries with a few clicks
• Drill down to view individual documents in your collection
Debug & Optimize
• Understand and resolve performance issues with visual explain plans
• Check index utilization
Insert, Modify, & Delete
• Insert new documents or clone existing documents
• Modify documents in place using the powerful visual editor
• Delete documents in just a few clicks
Visual explain plans and full CRUD functionality are currently in beta.
25. MongoDB Connector for BI
Visualize and explore multi-dimensional documents using SQL-based BI tools. The connector does the following:
• Provides the BI tool with the schema of the MongoDB collection to be visualized
• Translates SQL statements issued by the BI tool into equivalent MongoDB queries that are sent to MongoDB for processing
• Converts the results into the tabular format expected by the BI tool, which can then visualize the data based on user requirements
26. Location & Flow of Data
MongoDB | BI Connector
Mapping meta-data | Application data
{name: "Andrew", address: {street: …}}
Document | Table | Analytics & visualization
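The document-to-table step can be pictured as flattening nested fields into columns. The sketch below is an illustration of that idea only, not the BI Connector's actual mapping logic; the `flatten` helper, the dotted column names, and the sample address values (elided on the slide) are all assumptions made for this example.

```python
def flatten(doc, prefix=""):
    """Turn a nested document into a single flat row keyed by dotted column names."""
    row = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the column prefix.
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


# Hypothetical placeholder values; the slide does not show the address contents.
sample = {"name": "Andrew", "address": {"street": "123 Example St", "city": "Columbus"}}
row = flatten(sample)
```

Arrays are deliberately not handled here; a tabular view usually needs a separate decision about how repeated values map to rows.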
33. 1-M : General Recommendations
• Embed, when possible
• Many are weak entities
• Access all information in a single query
• Take advantage of update atomicity
• No additional data duplication
• Can query or index on any field
• e.g., { "phones.type": "mobile" }
• Exceptional cases:
• 16 MB document size limit
• Large number of infrequently accessed fields
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    {
      id: 12345,
      date: ISODate("2015-02-15"),
      type: "Cat scan",
      … },
    {
      id: 12346,
      date: ISODate("2015-02-15"),
      type: "blood test",
      … }]
}
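The "query on any field" point relies on dotted paths reaching into embedded arrays, as in `{ "phones.type": "mobile" }`. The sketch below imitates a simplified version of that equality match in plain Python against the patient document; `values_at` and `matches` are hypothetical helpers written for this example, and they cover only a small part of MongoDB's real matching rules.

```python
def values_at(doc, dotted_path):
    """Collect values reachable via a dotted path, descending into lists of sub-documents."""
    current = [doc]
    for part in dotted_path.split("."):
        next_level = []
        for item in current:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    next_level.append(candidate[part])
        current = next_level
    return current


def matches(doc, dotted_path, expected):
    """True if any value at the path equals `expected` (or is a list containing it)."""
    for value in values_at(doc, dotted_path):
        if value == expected or (isinstance(value, list) and expected in value):
            return True
    return False


# The embedded patient document from the slide, with elided fields left out.
patient = {
    "_id": 2,
    "first": "Joe",
    "last": "Patient",
    "procedures": [
        {"id": 12345, "type": "Cat scan"},
        {"id": 12346, "type": "blood test"},
    ],
}
```

Because the procedures live inside the patient document, one read returns everything needed to answer `matches(patient, "procedures.type", "blood test")`.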
36. M-M : General Recommendation
• Use case determines whether to reference or embed:
1. Data Duplication
• Embedding may result in data duplication
• Duplication may be okay if reads dominate updates
• Of the two, which one changes the least?
2. Referencing may be required if there are many related items
3. Hybrid approach
• Potentially do both. It’s OK!
Hospitals
{
  _id: 2,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [12345, 12346]}
Physicians
{
  _id: 12345,
  name: "Joe Doctor",
  address: {…},
  …}
{
  _id: 12346,
  name: "Mary Well",
  address: {…},
  …}
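With references, the application (or the database) has to resolve the physician ids in a second step. The sketch below does that resolution in plain Python over in-memory documents; `resolve_physicians` is a hypothetical helper, and the elided fields from the slide are simply omitted. Against MongoDB, the second step would typically be a query on `_id` with `$in`, or a `$lookup` stage in an aggregation.

```python
# Documents from the slide, with elided fields left out.
hospital = {
    "_id": 2,
    "name": "Oak Valley Hospital",
    "city": "New York",
    "beds": 131,
    "physicians": [12345, 12346],
}
physicians = [
    {"_id": 12345, "name": "Joe Doctor"},
    {"_id": 12346, "name": "Mary Well"},
]


def resolve_physicians(hospital_doc, physician_docs):
    """Follow the hospital's physician references, preserving their order."""
    by_id = {doc["_id"]: doc for doc in physician_docs}
    return [by_id[pid] for pid in hospital_doc["physicians"] if pid in by_id]


names = [doc["name"] for doc in resolve_physicians(hospital, physicians)]
```

The trade-off is visible here: the physician data is stored once, but reading a hospital with its physicians takes two steps instead of one.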
39. Object Store
Use Case A
Use Case B
Use Case C
Collection: Use Case A
Collection: Use Case B
Collection: Use Case C
Collection: Utility
40. Microservice
Service A
Service B
Service C
Collection: Use Case A
Collection: Use Case B
Collection: Use Case C
Collection: Utility
Apps
41. Domain and Utility
Registration Use Case
Edit Customer Profile Use Case
Order Use Case
Collection: Customer (Domain)
Collection: Registration (Utility)
Collection: Order (Domain)
Access labels: Read / Write, Write, Read / Write, Read, Read / Write
44. Smell: Aggregation of disparate data is difficult
Issues
• Yesterday’s data
• Details lost
• Inflexible schema
• Slow performance
Impact
• What happened today?
• Worse customer satisfaction
• Missed opportunities
• Lost revenue
Diagram: Cards (Silo 1), Loans (Silo 2), Deposits (Silo 3), …; Batch; Data Warehouse; Datamart (×3); Cross-Silo applications; Reporting
45. Solution: Using dynamic schema and easy scaling
Benefits
• Real-time
• Complete details
• Agile
• Higher customer retention
• Increase wallet share
• Proactive exception handling
Diagram: Cards (Silo 1), Loans (Silo 2), Deposits (Silo N), …; Real-time or Batch; Operational Single View (Customer Accounts, Cards, Loans, Deposits, …); Data Warehouse; Customer-facing Applications; Regulatory applications; Strategic Reporting; Operational Reporting
46. Aligns with Microservices Design Patterns
Consumers: BI User, App, Data Scientist
API Layer (Microservices, SQL reads, Spark)
• Customer Info Service 1, Customer Info Service 2, … Customer Info Service N
• MongoDB BI Connector (×2)
• Spark Worker 1, Spark Worker 2, …
Mongos (Query Router) instances
CustInfo Shard 1, CustInfo Shard 2, … CustInfo Shard N
Data centers: DC1, DC2, DC3
MongoDB Ops Manager
• Monitors
• Backups/restores
• Automates management
• REST API for container orchestration integration
47. Smell: Response From Data Warehouse or Other System is Slow
Issues
• Data stored normalized
• Reports slow to generate
• Data updated daily but user response must be fast
Impact
• Lost productivity
• Dissatisfied users and business
Diagram: Cards (Silo 1), Loans (Silo 2), Deposits (Silo 3), …; Data Warehouse; Reporting
48. Solution: Optimize Data Structure as a Datamart In-memory or On-disk
Solution
• Data stored in optimal structure for reports
• Optionally in memory
Impact
• Response time is as fast as possible
• Users and business satisfied
Diagram: Cards (Silo 1), Loans (Silo 2), Deposits (Silo 3), …; Data Warehouse; Datamart/Cache; Fast Reporting
49. Smell: Siloed operational applications
Impact
• Views are siloed
• Duplicate management and data access layer
• Need another layer to aggregate
Diagram: Silo 1 Systems, Silo 2 Systems, … Silo N Systems; Silo 1 Data, Silo 2 Data, … Silo N Data; Reporting (×3)
50. Solution: Unified data service
Benefit
• Each application can still save its own data
• Data is already aggregated for cross-silo reporting
• One cluster and data access layer to manage
Diagram: Silo 1 Systems, Silo 2 Systems, … Silo N Systems; Reporting
51. Smell: Master data can be hard to change and distribute
Common issues
• Hard to change schema of master data
• Data copied everywhere and gets out of sync
Impact
• Process breaks from out-of-sync data
• Business doesn’t have data it needs
• Many copies create more management
Diagram: Golden Copy; Batch (×8)
52. Solution: Persistent dynamic cache replicated globally
Solution
• Load into primary with any schema
• Replicate to and read from secondaries
Benefits
• Easy & fast change at speed of business
• Easy scale out for one stop shop for data
• Low TCO
Diagram: Real-time (×8)
54. Schema Rules of Thumb
• Let your use case guide your schema design
• Going to read the same index all the time? Embed
• Going to read from different indexes? Reference
• Apply the 80/20 rule to your Embed vs Reference decisions
• If arrays are going to grow, Reference
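The last rule can be checked with back-of-the-envelope arithmetic against the 16 MB document size limit mentioned earlier in the deck. The sketch below is a rough estimate only: `max_embedded_items` is a hypothetical helper, the byte sizes are inputs you would have to measure yourself, and BSON array overhead is ignored.

```python
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit


def max_embedded_items(base_doc_bytes, bytes_per_item):
    """Estimate how many array items fit before a document hits the size limit."""
    if bytes_per_item <= 0:
        raise ValueError("bytes_per_item must be positive")
    remaining = MAX_BSON_DOCUMENT_BYTES - base_doc_bytes
    # Floor division: only whole items count; never report a negative capacity.
    return max(remaining // bytes_per_item, 0)


# Example: a 1 KiB parent document with 256-byte embedded items.
capacity = max_embedded_items(1024, 256)
```

If the expected growth of the array comes anywhere near the estimated capacity, that is a signal to reference rather than embed.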
55. Easiest way to learn: Do!
• Get a Free Atlas Tier and download Compass
• Experiment and feel free to ask questions!