The document discusses using MongoDB to store and analyze diverse datasets. It covers several lessons including abstracting data access, considering schema design, avoiding overuse of MongoDB's schemaless features, and monitoring database performance metrics.
2. Big and Fat
Using MongoDB with deep and diverse datasets:
A case study
Tuesday, February 8, 2011
3. About me
• My name is Jeremy McAnally
• “Software architect” at Intridea
• Write a lot of books, OSS, etc.
• http://github.com/jm
• http://twitter.com/jm
• http://authoringebooks.com
• http://wickhamhousebrand.com
14. Lesson 1
Abstraction is a double-edged sword.
15. Abstract away!
Talking to all data (no matter
the source) the same way will
keep you sane.
16.
users = MySQL::Query.execute("SELECT * FROM users;")
users.each do |u|
  posts = db.collection('posts').find(:user_id => u['id'])
  # [...]
  comments = db.collection('comments').find("$where" => "this.admin_count + this.moderator_count == 5")
end
17.
users = User.all
users.each do |u|
  posts = Post.find(:user_id => u.id)
  # [...]
  comments = Comment.where("this.admin_count + this.moderator_count == 5")
end
18.
users = User.all
users.each do |u|
  posts = Post.find(:user_id => u.id)
  # [...]
  comments = Comment.with_five_things
end
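
The progression above ends with the query condition hidden behind a named method. A minimal sketch of that idea follows, assuming a hypothetical in-memory `Comment` class in place of a real MongoDB-backed model; the `create` and block-based `where` helpers are illustrative only and are not the API of any particular mapper.

```ruby
# Hypothetical in-memory stand-in for a MongoDB-backed model; nothing here
# talks to a real database.
class Comment
  @records = []

  class << self
    attr_reader :records

    # Store a plain hash as a "document".
    def create(attrs)
      @records << attrs
      attrs
    end

    # Generic query entry point: callers pass a block instead of a raw
    # "$where" string.
    def where(&condition)
      @records.select(&condition)
    end

    # Named query: the storage-specific condition lives in exactly one place.
    def with_five_things
      where { |c| c[:admin_count] + c[:moderator_count] == 5 }
    end
  end
end

Comment.create(:body => "first",  :admin_count => 2, :moderator_count => 3)
Comment.create(:body => "second", :admin_count => 1, :moderator_count => 1)

matches = Comment.with_five_things
puts matches.map { |c| c[:body] }.inspect
```

Callers only ever see `Comment.with_five_things`, so the condition can later move to a different query style without touching them.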
19. ...but wait!
MongoDB has a lot of features
that will perform better and take
less (and often better) code.
20.
pharmacists = {}
Patient.all.each do |patient|
  patient.prescriptions.each do |prescription|
    pharmacists[prescription.name] ||= 0
    pharmacists[prescription.name] += 1
  end
end
21. SLOW AS CRAP
pharmacists = {}
Patient.all.each do |patient|
  patient.prescriptions.each do |prescription|
    pharmacists[prescription.name] ||= 0
    pharmacists[prescription.name] += 1
  end
end
22.
map = "function() {
  this.prescriptions.forEach(function(p) {
    emit(p.name, { count : 1 });
  });
}"
reduce = "function(k, v) {
  var number = 0;
  v.forEach(function(item) {
    number += item.count;
  });
  return { count : number };
}"
pharms = @patients.map_reduce(map, reduce)
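
To make the data flow of the map/reduce job above concrete, here is a rough pure-Ruby simulation of the same three steps (emit, group by key, reduce). It assumes invented sample documents and runs entirely in memory; it is only a sketch of the shape of the result, not of how the server executes the job. One detail it mirrors on purpose: the reduce output has the same shape as the emitted values, because MongoDB may feed reduced results back into the reduce function.

```ruby
# Sample documents; the field names mirror the slide, the values are invented.
patients = [
  { "prescriptions" => [{ "name" => "A" }, { "name" => "B" }] },
  { "prescriptions" => [{ "name" => "A" }] }
]

# Map step: emit one [key, value] pair per embedded prescription.
emitted = patients.flat_map do |patient|
  patient["prescriptions"].map { |p| [p["name"], { "count" => 1 }] }
end

# Shuffle step: group the emitted values by key.
grouped = Hash.new { |hash, key| hash[key] = [] }
emitted.each { |key, value| grouped[key] << value }

# Reduce step: fold each key's values into one value of the same shape.
counts = grouped.each_with_object({}) do |(key, values), result|
  total = values.inject(0) { |sum, v| sum + v["count"] }
  result[key] = { "count" => total }
end

puts counts.inspect
```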
24. Lesson 2
Schema design matters.
25. Lesson 2
Schema design matters.
DATA MODEL
26. Embedding
works.
Embedding documents is a
smart decision in a lot of cases.
27.
SELECT * FROM patients WHERE id=212;
SELECT * FROM prescriptions WHERE patient_id=212;
SELECT * FROM appointments WHERE patient_id=212;
SELECT * FROM contacts WHERE patient_id=212;
SELECT * FROM claims WHERE patient_id=212;
...
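
For contrast with the per-table queries above, an embedded design keeps the related records inside the patient document itself. The sketch below assumes a hypothetical document layout (every field name and value is invented) and uses a plain Ruby hash as a stand-in for a collection, so it shows only the shape of the data, not a real driver call.

```ruby
# Hypothetical patient document; every field name and value is invented.
patient = {
  "_id"           => 212,
  "name"          => "Example Patient",
  "prescriptions" => [{ "name" => "Drug A" }, { "name" => "Drug B" }],
  "appointments"  => [{ "on" => "2011-02-08" }],
  "contacts"      => [{ "kind" => "phone", "value" => "555-0100" }],
  "claims"        => []
}

# In-memory stand-in for a collection keyed by _id.
patients = { patient["_id"] => patient }

# One lookup returns the patient plus everything embedded in it.
record = patients.fetch(212)
prescription_names = record["prescriptions"].map { |p| p["name"] }
puts prescription_names.inspect
```

The trade-off is that the whole document travels together, which suits data that is almost always read as a unit.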
43. Schemaless Joy
• Transforming data models is a delight
• Formless data isn’t awkward
• Arbitrary embedding is awesome
• Building to work with schemaless data can lead
to some really powerful app concepts
44. ...but be wary.
Going nuts will create
headaches for you.
49. Schemaless Pain
• Weird app behavior
• Huge, long-running data transformations
• Annoying data transforms for development environments
• Difficult to version data models
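
One common way to soften the versioning and long-running-transformation pains is to stamp each document with a version number and upgrade it lazily when it is read. This approach is not from the talk; the sketch below is a minimal illustration, and the `schema_version` field, the `MIGRATIONS` table, and the name-splitting change are all hypothetical.

```ruby
CURRENT_VERSION = 2

# Each entry upgrades a document from the keyed version to the next one.
MIGRATIONS = {
  1 => lambda { |doc|
    # Hypothetical change: version 2 splits "name" into first and last name.
    first, last = doc.delete("name").to_s.split(" ", 2)
    doc.merge("first_name" => first, "last_name" => last, "schema_version" => 2)
  }
}

# Upgrade a copy of the document one version at a time until it is current.
def upgrade(doc)
  doc = doc.dup
  doc["schema_version"] ||= 1 # treat unversioned documents as version 1
  while doc["schema_version"] < CURRENT_VERSION
    doc = MIGRATIONS.fetch(doc["schema_version"]).call(doc)
  end
  doc
end

old_doc = { "name" => "Ada Lovelace" }
new_doc = upgrade(old_doc)
puts new_doc.inspect
```

Documents are migrated only when touched, so no single huge transformation has to run up front, at the cost of keeping old migration steps around.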