#MongoDBDays

Schema Design
Real World Use Case
Matias Cascallares
Consulting Engineer, MongoDB
matias.cascallares@mongodb.com
Agenda
• Why is schema design important
• A real world use case
– Social Inbox
– History
• Conclusions
Why is Schema Design important?
• Largest factor for a performant system
• Schema design with MongoDB is different
  – RDBMS: "What answers do I have?"
  – MongoDB: "What questions will I have?"
#1 – Message Inbox
Let's get social
Sending Messages ?
Reading my Inbox ?
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
3 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
Fan out on read
// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox reads
db.inbox.createIndex( { to: 1, sent: 1 } )
msg = {
  from: "Matias",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}
// Send a message: a single document, whatever the number of recipients
db.inbox.insertOne( msg )
// Read my inbox: every message whose "to" array contains me, newest first
db.inbox.find( { to: "Matias" } ).sort( { sent: -1 } )

Schema Design, Matias Cascallares
Fan out on read – IO
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Fan out on read – IO
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
Considerations
• Write: one document per message sent
• Reading my inbox means finding all messages with my own name in the recipient ("to") field
• Read: requires scatter-gather on a sharded cluster, because the shard key is "from", not "to"
• Then a lot of random IO on each shard to find everything
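To make the IO argument concrete, here is a small in-memory model in plain JavaScript (runnable with Node.js, no MongoDB needed). It is a sketch under stated assumptions: three shards, documents routed by a simple string hash of the shard key value (a stand-in for MongoDB's real chunk routing), and the hypothetical helpers `shardFor`, `sendFanOutOnRead` and `readInbox`. It counts how many shards a send and an inbox read touch when the shard key is `from`.

```javascript
// In-memory model of a 3-shard cluster (illustrative only)
const SHARD_COUNT = 3;

// Hypothetical routing: a simple string hash stands in for MongoDB's chunk ranges
function shardFor(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % SHARD_COUNT;
}

const shards = Array.from({ length: SHARD_COUNT }, () => []);

// Fan out on read: one document per message, routed by the "from" field
function sendFanOutOnRead(msg) {
  const shard = shardFor(msg.from);
  shards[shard].push(msg);
  return [shard]; // shards touched by the write
}

// Reading an inbox filters on "to", which is not the shard key,
// so every shard has to be asked (scatter-gather)
function readInbox(user) {
  const touched = [];
  const results = [];
  shards.forEach((docs, i) => {
    touched.push(i);
    for (const d of docs) if (d.to.includes(user)) results.push(d);
  });
  results.sort((a, b) => b.sent - a.sent); // newest first
  return { touched, results };
}

sendFanOutOnRead({ from: "Matias", to: ["Bob", "Jane"], sent: new Date(1), message: "Hi!" });
sendFanOutOnRead({ from: "Jane", to: ["Bob"], sent: new Date(2), message: "Hello" });

const inbox = readInbox("Bob");
console.log(inbox.touched.length); // 3
```

The shard numbers in a real cluster depend on chunk ranges, but the shape is the same: each write lands on one shard, while each inbox read has to visit all of them.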
Fan out on write
// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )
msg = {
  from: "Matias",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message: insert a separate copy for each recipient
// (for...of iterates the values; for...in would iterate the array indexes)
for ( const recipient of msg.to ) {
  db.inbox.insertOne( Object.assign( {}, msg, { recipient: recipient } ) )
}
// Read my inbox
db.inbox.find( { recipient: "Matias" } ).sort( { sent: -1 } )

Fan out on write – IO
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Fan out on write – IO
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
Considerations
• Write: one document per recipient
• Reading my inbox is just finding all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
• But still lots of random IO on the shard, since each message is a separate document
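The same kind of in-memory model shows the trade-off flipping. This is again a Node.js sketch, not MongoDB itself: the string hash is a hypothetical stand-in for routing on the `recipient` shard key prefix, and `sendFanOutOnWrite` and `readInbox` are illustrative names.

```javascript
// In-memory model of a 3-shard cluster (illustrative only)
const SHARD_COUNT = 3;

// Hypothetical routing by hashing the first shard key field ("recipient")
function shardFor(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % SHARD_COUNT;
}

const shards = Array.from({ length: SHARD_COUNT }, () => []);

// Fan out on write: one copy of the message per recipient
function sendFanOutOnWrite(msg) {
  const touched = new Set();
  for (const recipient of msg.to) {
    const shard = shardFor(recipient);
    shards[shard].push({ ...msg, recipient });
    touched.add(shard);
  }
  return touched; // set of shards touched by the write
}

// The query includes the shard key prefix, so only one shard is asked
function readInbox(user) {
  const shard = shardFor(user);
  const results = shards[shard]
    .filter(d => d.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { touched: [shard], results };
}

sendFanOutOnWrite({ from: "Matias", to: ["Bob", "Jane"], sent: new Date(1), message: "Hi!" });
console.log(readInbox("Bob").touched.length); // 1
```

A send now costs one insert per recipient and may touch several shards, which buys single-shard reads. Inside that shard each copy is still its own document, which is where the remaining random IO comes from.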
Fan out on write with buckets
// Shard on "owner" / "sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = {
  from: "Matias",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

Fan out on write with buckets
// Send a message
for ( const recipient of msg.to ) {
  // Atomically increment the recipient's message counter (creating the user if needed)
  const count = db.users.findAndModify({
    query: { user_name: recipient },
    update: { $inc: { msg_count: 1 } },
    upsert: true,
    new: true
  }).msg_count;
  // Bucket number: at most 50 messages per inbox document
  const sequence = Math.floor( count / 50 );
  db.inbox.updateOne(
    { owner: recipient, sequence: sequence },
    { $push: { messages: msg } },
    { upsert: true }
  );
}
// Read my inbox: the two most recent buckets
db.inbox.find( { owner: "Matias" } ).sort( { sequence: -1 } ).limit( 2 )

Fan out on write with buckets
• Each "inbox" document is an array of messages
• Append a message onto the "inbox" of the recipient
• Bucket inboxes so there are not too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• 1 or 2 documents to read the latest messages in the inbox
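The bucketing arithmetic is easiest to check by replaying it. The Node.js sketch below mirrors the send loop with in-memory stand-ins: `incrementCount` plays the role of `findAndModify` with `$inc` and `upsert`, `pushToBucket` plays the upserted `$push`, and `readInbox` plays `find().sort({ sequence: -1 }).limit(2)`. All of these names are hypothetical.

```javascript
// Hypothetical in-memory stand-ins for the "users" and "inbox" collections
const BUCKET_SIZE = 50;
const users = new Map(); // user_name -> msg_count
const inbox = new Map(); // "owner|sequence" -> bucket document

// Mirrors findAndModify({ $inc: { msg_count: 1 } }, upsert: true, new: true)
function incrementCount(user) {
  const count = (users.get(user) || 0) + 1;
  users.set(user, count);
  return count;
}

// Mirrors updateOne({ owner, sequence }, { $push: { messages: msg } }, { upsert: true })
function pushToBucket(owner, sequence, msg) {
  const key = `${owner}|${sequence}`;
  if (!inbox.has(key)) inbox.set(key, { owner, sequence, messages: [] });
  inbox.get(key).messages.push(msg);
}

function send(msg) {
  for (const recipient of msg.to) {
    const count = incrementCount(recipient);
    pushToBucket(recipient, Math.floor(count / BUCKET_SIZE), msg);
  }
}

// Mirrors find({ owner }).sort({ sequence: -1 }).limit(2)
function readInbox(owner) {
  return [...inbox.values()]
    .filter(b => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, 2);
}

for (let i = 0; i < 120; i++) {
  send({ from: "Matias", to: ["Bob"], sent: new Date(i), message: `msg ${i}` });
}
console.log(readInbox("Bob").map(b => [b.sequence, b.messages.length])); // [ [ 2, 21 ], [ 1, 50 ] ]
```

Because the counter starts at 1, bucket 0 ends up with 49 messages and later buckets with 50; using `Math.floor((count - 1) / 50)` instead would make every full bucket hold exactly 50.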
Fan out on write with buckets - IO
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Fan out on write with buckets - IO
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
#2 – History
Design Goals
Need to retain a limited amount of history, e.g.
– Number of items
– Hours, days, weeks
– May be a legislative requirement (e.g. HIPAA, SOX, DPA)

Need to query efficiently by
– match
– ranges
3 Approaches (there are more)
• Bucket by number of messages
• Fixed size array
• Bucket by date + TTL Collections
Bucket by number of messages
db.inbox.find()
{ owner: "Matias", sequence: 25,
  messages: [
    { from: "Matias",
      to: [ "Bob", "Jane" ],
      sent: ISODate("2013-03-01T09:59:42.689Z"),
      message: "Hi!"
    },
    …
  ]}
// Query with a date range (returns whole buckets containing at least one match)
db.inbox.find( { owner: "Matias",
  messages: { $elemMatch: { sent: { $gt: ISODate("…") } } } } )
// Remove elements based on a date, from all of the owner's buckets
db.inbox.updateMany( { owner: "Matias" },
  { $pull: { messages: { sent: { $lt: ISODate("…") } } } } )
Considerations
• Shrinking documents: space can be reclaimed with
  – db.runCommand( { compact: '<collection>' } )
• Remove the document after the last element in the array has been removed
  – { "_id" : …, "messages" : [ ], "owner" : "Bob", "sequence" : 0 }
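The range query and the `$pull` behave differently in a way that is easy to miss: the `$elemMatch` query returns whole bucket documents, including messages outside the range, while `$pull` edits the arrays in place. The Node.js sketch below models both, plus the empty-bucket cleanup, on plain objects. The sample buckets and the helpers `findBucketsWithMessageAfter`, `pullMessagesBefore` and `removeEmptyBuckets` are hypothetical; the `$pull` is applied to every bucket of the owner, which is what a multi-document update (`updateMany`, or `update` with `{ multi: true }` in the legacy shell) does.

```javascript
// Hypothetical sample buckets for one owner
let buckets = [
  { owner: "Matias", sequence: 0, messages: [
    { sent: new Date("2013-01-10T00:00:00Z"), message: "old" },
    { sent: new Date("2013-02-20T00:00:00Z"), message: "newer" } ] },
  { owner: "Matias", sequence: 1, messages: [
    { sent: new Date("2013-03-01T00:00:00Z"), message: "newest" } ] },
];

// Like find({ owner, messages: { $elemMatch: { sent: { $gt: date } } } }):
// a bucket matches if ANY of its messages matches, and the WHOLE bucket comes back
function findBucketsWithMessageAfter(owner, date) {
  return buckets.filter(b => b.owner === owner && b.messages.some(m => m.sent > date));
}

// Like { $pull: { messages: { sent: { $lt: date } } } } on every bucket of the owner
function pullMessagesBefore(owner, date) {
  for (const b of buckets) {
    if (b.owner === owner) b.messages = b.messages.filter(m => !(m.sent < date));
  }
}

// Delete buckets whose array is now empty,
// e.g. deleteMany({ owner, messages: { $size: 0 } }) in the shell
function removeEmptyBuckets(owner) {
  buckets = buckets.filter(b => !(b.owner === owner && b.messages.length === 0));
}

const found = findBucketsWithMessageAfter("Matias", new Date("2013-02-01T00:00:00Z"));
console.log(found.map(b => b.messages.length)); // [ 2, 1 ] (bucket 0 still includes "old")

pullMessagesBefore("Matias", new Date("2013-03-01T00:00:00Z"));
removeEmptyBuckets("Matias");
console.log(buckets.map(b => b.sequence)); // [ 1 ]
```

If only the matching messages are wanted, they have to be filtered afterwards, on the client or with an aggregation `$filter` stage.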
Maintain the latest – Fixed size array
msg = {
  from: "Your Boss",
  to: [ "Bob" ],
  sent: new Date(),
  message: "CALL ME NOW!"
}

// 2.4 introduces the $each, $sort and $slice modifiers for $push
// Keep only the 50 most recent messages, ordered by "sent"
db.messages.updateOne(
  { _id: 1 },
  { $push: { messages: { $each: [ msg ],
                         $sort: { sent: 1 },
                         $slice: -50 } } }
)

Considerations
• Need to compute the size of the array based on the retention period
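Sizing the array is simple arithmetic once you assume an average message rate, and the resulting size becomes the negative `$slice` value. The Node.js sketch below is illustrative only: `arraySizeFor` and `pushSortSlice` are hypothetical helpers, the rate and retention inputs are made up, and `pushSortSlice` models (not reimplements) `$push` with `$each`, `$sort` and `$slice`, following MongoDB's documented order of append, then sort, then slice.

```javascript
// Hypothetical sizing helper: slots needed to cover a retention period
// at an assumed average message rate (safetyFactor > 1 overestimates)
function arraySizeFor(messagesPerDay, retentionDays, safetyFactor = 1) {
  return Math.ceil(messagesPerDay * retentionDays * safetyFactor);
}

// Plain-JS model of $push with { $each, $sort, $slice }: append,
// then sort, then slice, regardless of the order the modifiers are written in
function pushSortSlice(array, each, sortField, sliceN) {
  const next = array.concat(each);
  next.sort((a, b) => a[sortField] - b[sortField]); // ascending; assumes Date or number values
  return sliceN < 0 ? next.slice(sliceN) : next.slice(0, sliceN);
}

// Example inputs (assumed): 2 messages a day, kept for 25 days
const size = arraySizeFor(2, 25); // 50 slots, i.e. $slice: -50
let messages = [];
for (let i = 1; i <= 60; i++) {
  const msg = { from: "Your Boss", to: ["Bob"], sent: new Date(i * 1000), message: `#${i}` };
  messages = pushSortSlice(messages, [msg], "sent", -size);
}
console.log(size, messages.length, messages[0].message); // 50 50 #11
```

Overestimating (or making the size adaptive per user, as the notes suggest) is cheap up to a point, but the whole array still has to fit in one document, and BSON documents are capped at 16 MB.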
TTL Collections
// inbox: one doc per user per day
db.inbox.findOne()
{
  _id: 1,
  to: "Joe",
  sequence: ISODate("2013-02-04T00:00:00.392Z"),
  messages: [ ]
}
// Auto-expire data after 31536000 seconds (365 days)
// The TTL index must be on the collection that holds these documents
db.inbox.createIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
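The TTL rule itself is a single comparison, sketched below in Node.js. `isExpired` is a hypothetical helper and the sample document is simplified from the slide; the real server deletes expired documents from a background task that runs periodically, so actual removal can lag behind the moment this function flips to `true`.

```javascript
// Sketch of the TTL rule: a document becomes eligible for deletion once its
// indexed date plus expireAfterSeconds is in the past
function isExpired(indexedDate, expireAfterSeconds, now) {
  return indexedDate.getTime() + expireAfterSeconds * 1000 < now.getTime();
}

const ONE_YEAR = 31536000; // seconds (365 days), as on the slide
const doc = { _id: 1, to: "Joe", sequence: new Date("2013-02-04T00:00:00Z"), messages: [] };

console.log(isExpired(doc.sequence, ONE_YEAR, new Date("2014-02-03T00:00:00Z"))); // false
console.log(isExpired(doc.sequence, ONE_YEAR, new Date("2014-02-05T00:00:00Z"))); // true

// Alternative from the notes: store the expiry moment itself and use expireAfterSeconds: 0
const expiresAt = new Date("2014-02-04T00:00:00Z");
console.log(isExpired(expiresAt, 0, new Date("2014-02-05T00:00:00Z"))); // true
```

The `expireAfterSeconds: 0` variant lets each document carry its own retention period, at the cost of computing the expiry date at write time.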

Conclusion
Summary
• Multiple ways to model a domain problem
• Understand the key use cases of your app
• Balance ease of query against ease of write
• Random IO should be avoided
• Scatter/gather should be avoided
Questions?
#MongoDBDays

Thank You
Matias Cascallares
Consulting Engineer, MongoDB
matias.cascallares@mongodb.com


Speaker Notes

  1. Define your schema when saving documents and creating indexes. Functional goals. Performance goals. In an RDBMS: implement your domain model in the canonical way, following normalization practices; afterwards, use relational mechanisms like joins and GROUP BY to answer your queries. In MongoDB: first identify your queries and your typical access patterns, then use them to design your schema.
  2. Let's go to our first example.
  3. Social media applications. Chronological feeds. All those platforms provide some level of messaging among their users.
  4. The message that I write here needs to be sent to hundreds or thousands of users. How do we structure this in MongoDB?
  5. This feed is unique per user; it's 100% personalized.
  6. The simplest approach: the first idea that comes to mind. We'll use the mongo shell for our code samples. The 'to' field is an array; when you filter on an array field, MongoDB matches any element, similar to SQL's IN operator. It's a really easy solution to implement.
  7. No need to touch more than one shard, great for horizontal scalability!
  8. Reading is close to the worst-case scenario; thank god we have an index.
  9. Write is fast. Read is close to the worst case. For a very read-heavy application this is not a good approach: retrieving all these documents when reading the inbox takes lots of IO.
  10. It's the opposite of the situation we faced in the first scenario.
  11. Efficient when reading messages but less efficient when writing. Reading still causes a lot of random IO, since we don't control where MongoDB stores each document; this is where the third solution helps us.
  12. This is not an obvious solution. It's probably not going to be your first idea, unless you have a lot of experience with MongoDB.
  13. Let's look at this findAndModify in detail. The sequence takes the user's total message count, divides it by 50 and rounds it down: this is a pagination, or bucketing, algorithm where sequence is the page number. We push the message to the end of the array, so each document contains at most 50 messages. It seems like a lot of work for writing, or sending, a message.
  14. Writing takes the same amount of work as the previous solution, actually a bit more.
  15. Reading is much better in this case, because I only retrieve one or two documents, using an index, to build my inbox. For applications with really high read traffic, this optimization is really important.
  16. Tweets are an example of a history application: read a time window of messages.
  17. Give me everything from between 6 and 4 months ago.
  18. Similar to our previous case, using sequence for pagination. The update operation is atomic at the document level.
  19. Using $pull we shrink documents and produce fragmentation. You can fix that by running compact periodically, maybe with a cron job. Compaction is slow and it locks, so a good alternative is to run it on secondaries. Remember to delete the document once you have gotten rid of all its messages.
  20. With this approach, instead of deleting messages, we keep only the latest messages each time we insert new ones.
  21. We need to know the size of the array: make it adaptive, based on the user, or overestimate it.
  22. Another approach would be to set sequence to a datetime in the future and expireAfterSeconds to 0. TTL collections are quite popular for this kind of expiration.
  23. Schema design in non-relational databases is not trivial. There is no single canonical solution as in an RDBMS, and nothing as mathematically grounded as the normal forms. Which solution is best depends on your users and how they use your application.