In this session, we'll examine schema design insights and trade-offs using real-world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large-scale web application, and using MongoDB to store user profiles. You should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design and familiar with the concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.
2. Agenda
• Why is schema design important
• 4 Real World Schemas
– Inbox
– History
– IndexedAttributes
– Multiple Identities
• Conclusions
3. Why is Schema Design
important?
• Largest factor for a performant system
• Schema design with MongoDB is different
• RDBMS – "What answers do I have?"
• MongoDB – "What question will I have?"
9. 3 Approaches (there are
more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
10. // Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message
db.inbox.save( msg )
// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
Fan out on read
11. Fan out on read – Send
Message
Shard 1 Shard 2 Shard 3
Send
Message
12. Fan out on read – Inbox Read
Shard 1 Shard 2 Shard 3
Read
Inbox
13. Considerations
• 1 document per message sent
• Multiple recipients in an array key
• Reading an inbox is finding all messages with
my own name in the recipient field
• Requires scatter-gather on sharded cluster
• Then a lot of random IO on a shard to find
everything
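14. // Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
// Send a message: save one copy per recipient
// (for...in would yield array indexes, and re-saving msg would reuse its _id)
msg.to.forEach( function( recipient ) {
var copy = { from: msg.from, to: msg.to, sent: msg.sent, message: msg.message }
copy.recipient = recipient
db.inbox.save( copy )
} )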
// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )
Fan out on write
15. Fan out on write – Send
Message
Shard 1 Shard 2 Shard 3
Send
Message
16. Fan out on write – Read Inbox
Shard 1 Shard 2 Shard 3
Read
Inbox
17. Considerations
• 1 document per recipient
• Reading my inbox is just finding all of the
messages with me as the recipient
• Can shard on recipient, so inbox reads hit one
shard
• But still lots of random IO on the shard
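The difference between the two inbox reads comes down to whether the query includes the shard key. A minimal sketch in plain JavaScript, assuming a simplified routing rule: a query can be sent to a single shard only when it constrains the leading field of the shard key. `isTargeted` is a hypothetical helper for illustration, not a MongoDB API, and real routing also depends on how chunk ranges are distributed.

```javascript
// Hypothetical helper: simplified model of query routing on a sharded cluster.
// Assumption: a query is targeted when it constrains the leading shard key field.
function isTargeted( shardKey, query ) {
  var leadingField = Object.keys( shardKey )[ 0 ]
  return Object.prototype.hasOwnProperty.call( query, leadingField )
}

// Fan out on read: sharded on "from", but the inbox read filters on "to"
var fanOutOnRead = isTargeted( { from: 1 }, { to: "Joe" } )

// Fan out on write: sharded on "recipient" and "sent", inbox read filters on "recipient"
var fanOutOnWrite = isTargeted( { recipient: 1, sent: 1 }, { recipient: "Joe" } )
```

Under this model the first read is a scatter-gather and the second is routed by recipient, which matches the considerations listed for each approach.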
18. Fan out on write with buckets
• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient
• Bucket inbox documents so there are not too many
messages per document
• Can shard on recipient, so inbox reads hit one
shard
• 1 or 2 documents to read the whole inbox
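One way to sketch the bucketed append is shown here. It assumes a bucket size of 50 and that a per-recipient message count is maintained elsewhere (for example in a counter that is incremented atomically). The helper names `bucketSequence` and `buildBucketAppend` are hypothetical, and the `owner` and `sequence` field names follow the bucketed inbox document shown later in this deck.

```javascript
// Assumption: at most BUCKET_SIZE messages per inbox document (illustrative value).
var BUCKET_SIZE = 50

// Hypothetical helper: bucket number for a recipient's Nth message (1-based count).
function bucketSequence( messageCount, bucketSize ) {
  return Math.floor( ( messageCount - 1 ) / bucketSize )
}

// Hypothetical helper: build the arguments for an upsert that appends msg
// to the recipient's current bucket. In the shell this could be applied as:
//   db.inbox.update( op.query, op.update, op.options )
function buildBucketAppend( recipient, msg, messageCount ) {
  return {
    query: { owner: recipient, sequence: bucketSequence( messageCount, BUCKET_SIZE ) },
    update: { $push: { messages: msg } },
    options: { upsert: true }
  }
}

var msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!" }

// Bob's 51st message opens his second bucket (sequence 1)
var op = buildBucketAppend( "Bob", msg, 51 )
```

Because the upsert creates a new bucket whenever the sequence number advances, reading the newest one or two buckets for an owner is enough to show a recent inbox.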
24. Design Goals
• Need to retain a limited amount of history e.g.
– Hours, Days, Weeks
– May be a legislative requirement (e.g. HIPAA, SOX, DPA)
• Need to query efficiently by
– match
– ranges
25. 3 Approaches (there are
more)
• Bucket by Number of messages
• Fixed size Array
• Bucket by Date + TTL Collections
26. db.inbox.find()
{ owner: "Joe", sequence: 25,
messages: [
{ from: "Joe",
to: [ "Bob", "Jane" ],
sent: ISODate("2013-03-01T09:59:42.689Z"),
message: "Hi!"
},
…
] }
// Query with a date range
db.inbox.find ( { owner: "friend1",
messages: {
$elemMatch: { sent: { $gte: ISODate("2013-04-04…") }}}})
// Remove elements based on a date
db.inbox.update( { owner: "friend1" },
{ $pull:
{ messages: { sent: { $gte: ISODate("2013-04-04…") } } } } )
Inbox – Bucket by #
messages
27. Considerations
• Shrinking documents, space can be reclaimed
with
– db.runCommand ( { compact: '<collection>' } )
• Remove the document once the last element in
the array has been removed
– { "_id" : …, "messages" : [ ], "owner" :
"friend1", "sequence" : 0 }
28. msg = {
from: "Your Boss",
to: [ "Bob" ],
sent: new Date(),
message: "CALL ME NOW!"
}
// 2.4 Introduces $each, $sort and $slice for $push
db.messages.update(
{ _id: 1 },
{ $push: { messages: { $each: [ msg ],
$sort: { sent: 1 },
$slice: -50 }
}
}
)
Maintain the latest – Fixed
Size Array
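To make the effect of the three modifiers concrete, here is an in-memory model in plain JavaScript. It is only a sketch of the semantics (append, then sort, then keep the last N); `pushSortSlice` is a hypothetical helper, and on a real deployment the server performs these steps itself inside the single update.

```javascript
// In-memory model of:
//   { $push: { messages: { $each: newMessages, $sort: { sent: 1 }, $slice: -keep } } }
// Hypothetical helper for illustration; assumes keep > 0.
function pushSortSlice( messages, newMessages, keep ) {
  var result = messages.concat( newMessages )                  // $each: append
  result.sort( function( a, b ) { return a.sent - b.sent } )   // $sort: { sent: 1 }
  return result.slice( -keep )                                 // $slice: -keep
}

// Build an array that is already at the 50-message limit, one message per minute
var existing = []
for ( var i = 0; i < 50; i++ ) {
  existing.push( { message: "old " + i, sent: new Date( 2013, 2, 1, 9, i ) } )
}

var urgent = {
  from: "Your Boss",
  to: [ "Bob" ],
  sent: new Date( 2013, 2, 1, 10, 0 ),
  message: "CALL ME NOW!"
}

// The oldest message drops off and the new one ends up last
var latest = pushSortSlice( existing, [ urgent ], 50 )
```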
30. // messages: one doc per user per day
db.inbox.findOne()
{
_id: 1,
to: "Joe",
sequence: ISODate("2013-02-04T00:00:00.392Z"),
messages: [ ]
}
// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 },
{ expireAfterSeconds: 31536000 } )
TTL Collections
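A short sketch of how a per-day bucket key and its expiry could be computed, assuming buckets are keyed by midnight UTC. `dayBucket` and `expiresAt` are hypothetical helpers, and the TTL monitor runs periodically in the background, so actual deletion can lag the computed time.

```javascript
// 365 days in seconds: the expireAfterSeconds value used in the TTL index
var ONE_YEAR_SECONDS = 365 * 24 * 60 * 60

// Hypothetical helper: truncate a timestamp to midnight UTC to get the day bucket.
function dayBucket( date ) {
  return new Date( Date.UTC( date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate() ) )
}

// Hypothetical helper: earliest time at which a document with this
// "sequence" value becomes eligible for TTL deletion.
function expiresAt( sequence, expireAfterSeconds ) {
  return new Date( sequence.getTime() + expireAfterSeconds * 1000 )
}

var bucket = dayBucket( new Date( "2013-02-04T15:30:00Z" ) )
var expiry = expiresAt( bucket, ONE_YEAR_SECONDS )

// Assumed shell usage: append to the user's bucket for that day
//   db.inbox.update( { to: "Joe", sequence: bucket },
//                    { $push: { messages: msg } }, { upsert: true } )
```

Keying each document on the day means a whole day of messages expires together, so no per-message cleanup is needed.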
32. Design Goal
• Application needs to store a variable number of
attributes e.g.
– User defined Form
– Meta Data tags
• Queries needed
– Equality
– Range based
• Need to be efficient, regardless of the number of
attributes
33. 2 Approaches (there are
more)
• Attributes
• Attributes as Objects in an Array
34. db.files.insert( { _id: "local.0",
attr: { type: "text", size: 64,
created: ISODate("2013-03-01T09:59:42.689Z") } } )
db.files.insert( { _id:"local.1",
attr: { type: "text", size: 128} } )
db.files.insert( { _id:"mongod",
attr: { type: "binary", size: 256,
created: ISODate("2013-04-01T18:13:42.689Z") } } )
// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text"} )
// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
Attributes as a Sub-
Document
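For comparison, the second approach listed earlier (attributes as objects in an array) can be sketched as below. This is one possible reading of that approach, not necessarily the exact document shape used in the talk: each attribute becomes its own single-key object, so one multikey index on the array can cover every attribute. `toAttrArray` is a hypothetical helper.

```javascript
// Hypothetical helper: turn { type: "text", size: 64 }
// into [ { type: "text" }, { size: 64 } ]
function toAttrArray( attr ) {
  return Object.keys( attr ).map( function( key ) {
    var pair = {}
    pair[ key ] = attr[ key ]
    return pair
  } )
}

var doc = { _id: "local.0", attr: toAttrArray( { type: "text", size: 64 } ) }

// Assumed shell usage, with a single index instead of one per attribute:
//   db.files.insert( doc )
//   db.files.ensureIndex( { attr: 1 } )
//   db.files.find( { attr: { type: "text" } } )
```

The trade-off is that adding a new attribute needs no new index, at the cost of a less natural document shape for the application.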
44. Considerations
• Lookup by shard key is routed to 1 shard
• Lookup by other identifier is scatter gathered
across all shards
• Secondary keys cannot have a unique index
45. // Create unique index
db.identities.ensureIndex( { identifier : 1} , { unique: true} )
// Create a document for each of the user's identifiers
db.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier : { email: "joe@example.com" }, user: "1200-42" } )
db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } )
// Shard collection by identifier
sh.shardCollection( "mongodbdays.identities", { identifier : 1 } )
// _id already has a unique index by default, so no extra index is needed
// Create a document that holds all the other user attributes
db.users.save( { _id: "1200-42", ... } )
// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )
Document per Identity
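With one document per identity, resolving a user takes two reads, each on a shard key: first the identity, then the user it points to. A minimal sketch, assuming the `identities` and `users` collections shown above; `lookupUser` is a hypothetical helper, and `db` is the shell's database handle or any object exposing the same `findOne` calls.

```javascript
// Hypothetical helper: resolve any identifier to the user document in two reads.
function lookupUser( db, identifier ) {
  // Read 1: exact match on the identifier sub-document (shard key of identities)
  var identity = db.identities.findOne( { identifier: identifier } )
  if ( !identity ) {
    return null
  }
  // Read 2: fetch the user by _id (shard key of users)
  return db.users.findOne( { _id: identity.user } )
}

// Assumed shell usage:
//   lookupUser( db, { email: "joe@example.com" } )
//   lookupUser( db, { hndl: "joe" } )
```

The extra read is the price of keeping every lookup routed by a shard key while still enforcing uniqueness on each identifier.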
49. Summary
• Multiple ways to model a domain problem
• Understand the key use cases of your app
• Balance between ease of query vs. ease of write
• Random IO should be avoided