8. Why do we need replication?
•Failover
•Backups
•Secondary batch jobs
•High availability
Thursday, 8 November 12
9. Replica Sets
Data Availability across nodes
• Data Protection
  • Multiple copies of the data
  • Spread across Data Centers, AZs
• High Availability
  • Automated Failover
  • Automated Recovery
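The automated failover bullet can be illustrated with a small, simplified model. This is only a sketch and not MongoDB's real election protocol: the `electPrimary` function, the `priority` tie-break, and the node names are assumptions made for illustration. The one rule it does borrow is that a primary is chosen only when a strict majority of members is reachable.

```javascript
// Simplified model of replica-set failover (NOT MongoDB's actual election protocol).
// A new primary is chosen only if a strict majority of members is reachable.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  // Without a majority no primary is elected, so writes are not accepted.
  if (reachable.length * 2 <= members.length) {
    return null;
  }
  // Assumed tie-break: highest priority wins, then alphabetical name.
  reachable.sort(
    (a, b) => b.priority - a.priority || a.name.localeCompare(b.name)
  );
  return reachable[0].name;
}

// Hypothetical three-member set whose former primary has just failed.
const members = [
  { name: "node-a", priority: 2, up: false },
  { name: "node-b", priority: 1, up: true },
  { name: "node-c", priority: 1, up: true },
];

console.log(electPrimary(members)); // node-b
```

With two of three members still up, the set keeps a majority and promotes a survivor. If only one member of a two-member set is up, the function returns `null`, which is why an odd number of voting members is the usual advice.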
17. Sharding
Data Distribution across nodes
• Data location transparent to your code
• Data distribution is automatic
• Data re-distribution is automatic
• Aggregate system resources horizontally
• No code changes
18. Sharding - Range distribution
sh.shardCollection("test.tweets", {_id: 1}, false)
shard01 shard02 shard03
19. Sharding - Range distribution
shard01: a-i
shard02: j-r
shard03: s-z
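The range split above can be sketched as a lookup from key to shard. This is a toy model under stated assumptions: the `chunks` table and the `shardFor` function are hypothetical, and routing on the first letter only is a simplification. In a real cluster the router compares the full shard key value against chunk boundaries.

```javascript
// Range boundaries copied from the slide: a-i, j-r, s-z.
const chunks = [
  { min: "a", max: "i", shard: "shard01" },
  { min: "j", max: "r", shard: "shard02" },
  { min: "s", max: "z", shard: "shard03" },
];

// Route a key to a shard by its first letter (illustrative rule only).
function shardFor(key) {
  if (typeof key !== "string" || key.length === 0) {
    throw new Error("key must be a non-empty string");
  }
  const first = key[0].toLowerCase();
  const chunk = chunks.find((c) => first >= c.min && first <= c.max);
  if (!chunk) {
    throw new Error(`no chunk covers key: ${key}`);
  }
  return chunk.shard;
}

console.log(shardFor("mongodb")); // shard02
```

Because the application only calls `shardFor` (or, in practice, talks to the router), the ranges can be re-split without code changes.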
31. Two choices for consistency
•Eventual consistency
  •Allow updates when a system has been partitioned
  •Resolve conflicts later
  •Example: CouchDB, Cassandra
•Immediate consistency
  •Limit the application of updates to a single master node for a given slice of data
  •Another node can take over after a failure is detected
  •Avoids the possibility of conflicts
  •Example: MongoDB
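The difference between the two choices can be sketched in a few lines. Both functions are hypothetical illustrations, not the behaviour of any particular database: `mergeReplicas` models two partitioned copies that both accepted writes, and `writeToNode` models a single-master rule that rejects writes anywhere else.

```javascript
// Eventual consistency: both sides of a partition accept writes,
// and conflicting values only surface when the copies are merged.
function mergeReplicas(a, b) {
  const merged = {};
  const conflicts = [];
  for (const key of new Set([...Object.keys(a), ...Object.keys(b)])) {
    if (key in a && key in b && a[key] !== b[key]) {
      conflicts.push(key); // must be resolved later
    } else {
      merged[key] = key in a ? a[key] : b[key];
    }
  }
  return { merged, conflicts };
}

// Immediate consistency: only the current master for a slice accepts writes.
function writeToNode(node, key, value) {
  if (!node.isMaster) {
    return false; // rejected, so no conflicting copy can arise
  }
  node.data[key] = value;
  return true;
}

const result = mergeReplicas({ cart: "book" }, { cart: "pen", user: "ann" });
console.log(result.conflicts.includes("cart")); // true
console.log(writeToNode({ isMaster: false, data: {} }, "cart", "pen")); // false
```

The trade-off is visible in the return values: the first approach always accepts the write and pays later with a conflict list, while the second refuses the write up front.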
32. Durability
•For how long is my data available?
•When do I know that my data is safe?
•Where?
•MongoDB style
  •Fire and Forget
  •Get Last Error
  •Journal Sync
  •Replica Safe
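One way to read the four levels is as write-concern settings. The `w` and `j` fields are real MongoDB write-concern options, but the mapping below is an assumption about what the slide means, and the `writeConcernFor` helper is hypothetical. "Replica Safe" could equally be expressed as `w: "majority"`.

```javascript
// Assumed mapping from the slide's four levels to write-concern options.
const writeConcerns = {
  "Fire and Forget": { w: 0 },       // do not wait for any acknowledgement
  "Get Last Error": { w: 1 },        // wait for the primary to acknowledge
  "Journal Sync": { w: 1, j: true }, // wait for the primary's journal write
  "Replica Safe": { w: 2 },          // wait for the primary plus one secondary
};

// Look up the options for a named level, failing loudly on typos.
function writeConcernFor(level) {
  const wc = writeConcerns[level];
  if (!wc) {
    throw new Error(`unknown durability level: ${level}`);
  }
  return wc;
}

console.log(JSON.stringify(writeConcernFor("Journal Sync"))); // {"w":1,"j":true}
```

Each step down the list trades write latency for a stronger answer to "when do I know that my data is safe, and where".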
35. Data Model
• Why JSON?
  • Provides a simple, well-understood encapsulation of data
  • Maps simply to the object in your OO language
  • Linking & Embedding to describe relationships
36. JSON
place1 = {
  name : "10gen HQ",
  address : "578 Broadway 7th Floor",
  city : "New York",
  zip : "10011",
  tags : [ "business", "tech" ]
}
37. Schema Design
Relational Database
38. Schema Design
MongoDB: embedding, linking
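The two relationship styles can be sketched with plain objects. Everything here is illustrative: the `comments` array, the `author_id` field, the sample comment, and the `resolveAuthor` helper are hypothetical names chosen for the example, not a prescribed schema.

```javascript
// Embedding: related data lives inside the parent document,
// so a single read returns the post together with its comments.
const embeddedPost = {
  _id: 1,
  text: "Destination Moon",
  comments: [{ by: "ann", text: "Great read" }],
};

// Linking: the post stores only a reference to a separate author document.
const authors = [{ _id: 10, name: "Hergé" }];
const linkedPost = { _id: 2, text: "Destination Moon", author_id: 10 };

// Following a link takes a second lookup, done by the application.
function resolveAuthor(post, authorCollection) {
  return authorCollection.find((a) => a._id === post.author_id) || null;
}

console.log(embeddedPost.comments.length); // 1
console.log(resolveAuthor(linkedPost, authors).name); // Hergé
```

Embedding suits data that is always read with its parent, while linking suits data shared by many documents, at the cost of the extra lookup.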
39. Schemas in MongoDB
Design documents that simply map to your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}
> db.posts.save(post)
41. JSON & Scaleout
• Embedding removes the need for
  • Distributed joins
  • Two-phase commit
• Enables data to be distributed across many nodes without penalty