6. This talk
‣Intro
-Terms / Definitions
‣Getting a flavor
-Creating a Schema
-Indexes
-Evolving the Schema
7. This talk
‣Intro
-Terms / Definitions
‣Getting a flavor
-Creating a Schema
-Indexes
-Evolving the Schema
‣Data modeling
-DBRef
-Single Table Inheritance
-Many - Many
-Trees
-Lists / Queues / Stacks
8. Document Oriented
Basic unit of data: JSON Documents
Not Relational, Key Value
Not OODB
- Associations implied by Document Structure
- but your database schema != your program schema
9. Terms
Table -> Collection
Row(s) -> JSON Document
Index -> Index
Join -> Embedding and Linking
across documents
Partition -> Shard
Partition Key -> Shard Key
10. Considerations
What are the requirements ?
- Functionality to be supported
- Access Patterns ?
- Data Life Cycle (insert, update, deletes)
- Expected Performance / Workload ?
Capabilities of the database ?
11. DB Considerations
How can we manipulate this data ?
Dynamic Queries
Secondary Indexes
Atomic Updates
Map Reduce
Access Patterns ?
Read / Write Ratio
Types of updates
Types of queries
Considerations
No Joins
Single Document Transactions only
12. Design Session
Use Rich Design Documents
post = {author: “kyle”,
date: new Date(),
text: “my blog post...”,
tags: [“mongodb”, “intro”]}
>db.post.save(post)
13. >db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "kyle",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "My first blog",
tags : [ "mongodb", "intro" ] }
Notes:
- ID is unique, but can be anything you’d like
14. Secondary index for “author”
// 1 means ascending, -1 means descending
>db.posts.ensureIndex({author: 1})
>db.posts.find({author: 'kyle'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "kyle",
... }
15. Verifying indexes exist
>db.system.indexes.find()
// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }
16. Verifying indexes exist
>db.system.indexes.find()
// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }
// Index on author
{ _id : ObjectId("4c4ba6c5672c685e5e8aabf4"),
ns : "test.posts",
key : { "author" : 1 },
name : "author_1" }
22. // create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({comments.author:”Fred”})
23. // create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({comments.author:”kyle”})
// find last 5 posts:
>db.posts.find().sort({date:-1}).limit(5)
24. // create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({comments.author:”kyle”})
// find last 5 posts:
>db.posts.find().sort({date:-1}).limit(5)
// most commented post:
>db.posts.find().sort({comments_count:-1}).limit(1)
When sorting, check if you need an index
25. Map Reduce
Aggregation and batch manipulation
Collection in, Collection out
Parallel in sharded environments
26. Map reduce
mapFunc = function () {
this.tags.forEach(function (z) {emit(z, {count:1});});
}
reduceFunc = function (k, v) {
var total = 0;
for (var i = 0; i < v.length; i++) { total += v[i].count; }
return {count:total}; }
res = db.posts.mapReduce(mapFunc, reduceFunc)
>db[res.result].find()
{ _id : "intro", value : { count : 1 } }
{ _id : "mongodb", value : { count : 1 } }
27. Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
29. Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
Observations:
- Using Rich Documents works well
- Simplify relations by embedding them
- Iterative development is easy with MongoDB
32. One to Many
- Embedded Array / Array Keys
- slice operator to return subset of array
- hard to find latest comments across all documents
33. One to Many
- Embedded Array / Array Keys
- slice operator to return subset of array
- hard to find latest comments across all documents
- Embedded tree
- Single document
- Natural
- Hard to query
34. One to Many
- Embedded Array / Array Keys
- slice operator to return subset of array
- hard to find latest comments across all documents
- Embedded tree
- Single document
- Natural
- Hard to query
- Normalized (2 collections)
- most flexible
- more queries
35. Many - Many
Example:
- Product can be in many categories
- Category can have many products
Products id | product_id | category_id Category
38. products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af”]}
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia",
product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
ObjectId("4c4ca30433fb5941681b9130"),
ObjectId("4c4ca30433fb5941681b913a"]}
//All categories for a given product
>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
39. products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af”]}
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia",
product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
ObjectId("4c4ca30433fb5941681b9130"),
ObjectId("4c4ca30433fb5941681b913a"]}
//All categories for a given product
>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
//All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
40. Alternative
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af”]}
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia"}
41. Alternative
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af”]}
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia"}
// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
42. Alternative
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af”]}
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia"}
// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})
// All categories for a given product
product = db.products.find(_id : some_id)
>db.categories.find({_id : {$in : product.category_ids}})
43. Trees
Full Tree in Document
{ comments: [
{ author: “rpb”, text: “...”,
replies: [
{author: “Fred”, text: “...”,
replies: []}
]}
]}
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 4MB limit
44. Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the id’s of the children
- Can support graphs (multiple parents / child)
45. Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
46. Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
//find all descendants of b:
>db.tree2.find({ancestors: ‘b’})
47. Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
//find all descendants of b:
>db.tree2.find({ancestors: ‘b’})
//find all ancestors of f:
>ancestors = db.tree2.findOne({_id:’f’}).ancestors
>db.tree2.find({_id: { $in : ancestors})
48. findAndModify
Queue example
//Example: grab highest priority job and mark
job = db.jobs.findAndModify({
query: {inprogress: false},
sort: {priority: -1),
update: {$set: {inprogress: true,
started: new Date()}},
new: true})