Technical Account Manager Lead, MongoDB Inc
@antoinegirbal
Antoine Girbal
JavaOne 2013
Building a scalable inbox system with MongoDB and Java
Agenda
• Problem Overview
• Schema and queries
• Java Development
• Design Options
– Fan out on Read
– Fan out on Write
– Bucketed Fan out on Write
– Cached Inbox
• Discussion
Problem Overview
Let's get Social
Sending Messages
?
Reading my Inbox
?
Schema and Queries
Basic CRUD
• Save your first document:
> db.test.insert({firstName: "Antoine", lastName: "Girbal" } )
• Find the document:
> db.test.find({firstName: "Antoine" } )
{ _id: ObjectId("524495105889411fab0cdfa3"),firstName: "Antoine", lastName: "Girbal"
}
• Update the document:
> db.test.update({_id: ObjectId("524495105889411fab0cdfa3")}, { x: 1, y: 2 } )
• Remove the document:
> db.test.remove({_id: ObjectId("524495105889411fab0cdfa3")})
• No schema definition or other declaration, it's easy!
The User Document
{ "_id": ObjectId("519c12d53004030e5a6316d2"),
"address": {
"streetAddress": "2600 Rafe Lane",
"city": "Jackson",
"state": "MS",
"zip": 39201,
"country": "US" },
"birthday": ISODate("1980-12-26T00:00:00.000Z"),
"company": "Parade of Shoes",
"domain": "SanFranciscoAgency.com",
"email": "AnthonyJDacosta@pookmail.com",
"firstName": "Anthony",
"gender": "male",
"lastName": "Dacosta",
"location": [ -90.183518, 32.368619 ],
…
}
The User Collection
The collection statistics:
> db.users.stats()
{
"ns": "edges.users",
"count": 1000000, // number of documents
"size": 637864480, // size of all documents
"avgObjSize": 637.86448,
"storageSize": 845197312,
"numExtents": 16,
"nindexes": 2,
"lastExtentSize": 227786752,
"paddingFactor": 1.0000000000260925, // padding after documents
"systemFlags": 1,
"userFlags": 0,
"totalIndexSize": 66070256,
"indexSizes": { "_id_": 29212848, "uid_1": 36857408 },
"ok": 1
}
Queries on Users
Finding a user by email address…
> db.users.find({ "email": "AnthonyJDacosta@pookmail.com" }).pretty()
{ "_id": ObjectId("519c12d53004030e5a6316d2"),
…
By default, the query uses a slow collection scan…
> db.users.find({ "email": "AnthonyJDacosta@pookmail.com" } ).explain()
{ "cursor": "BasicCursor",
"nscannedObjects": 1000000, // 1m objects scanned
"nscanned": 1000000,
…
Use an index for fast performance…
> db.users.ensureIndex({ "email": 1 } ) // does not do anything if index is there
> db.users.find({ "email": "AnthonyJDacosta@pookmail.com" }).explain()
{ "cursor": "BtreeCursor email_1", // Btree, sweet!
"nscannedObjects": 1, // document is found almost right away
"nscanned": 1,
…
Users Relationships
• The follower / followee relationships are of the "many-to-many" type. They can be stored as:
1. a list of followers in user
2. a list of followees in user
3. a relationship collection: "followees"
4. two relationship collections: "followees" and "followers".
• Ideal solutions:
– a few million users and a 1000 followee limit: Solution #2
– no boundaries and relative scaling: Solution #3
– no boundaries and max scaling: Solution #4
Relationship Data
Let's look at a sample document:
> use edges
switched to db edges
> db.followees.findOne()
{ "_id": ObjectId(),
"user": "17052001",
"followee": "31554261"
}
And the statistics:
> db.followees.stats()
{
"ns": "edges.followees",
"count": 1000000,
"size": 64000048,
"avgObjSize": 64.000048,
"storageSize": 86310912,
"numExtents": 10,
"nindexes": 2,
"lastExtentSize": 27869184,
"paddingFactor": 1,
"systemFlags": 1,
"userFlags": 0,
"totalIndexSize": 85561840,
"indexSizes": {
"_id_": 32458720,
"user_1_followee_1": 53103120 },
"ok": 1
}
Relationship Queries
To find all the users that a user follows:
> db.followees.ensureIndex({ user: 1, followee: 1 }) // why not just index on user? We shall see
> db.followees.find({user: "11622712"})
{ "_id" : ObjectId("51641c02e4b0ef6827a34569"), "user" : "11622712", "followee" : "30432718" }
…
> db.followees.find({user: "11622712"}).explain()
{
"cursor" : "BtreeCursor user_1_followee_1",
"n" : 66,
"indexOnly" : false,
"millis" : 0, // this is fast
Even faster if using a “covered” index:
> db.followees.find({user: "11622712"}, {followee: 1, _id: 0}).explain()
{
"cursor" : "BtreeCursor user_1_followee_1",
"n" : 66,
"nscannedObjects" : 0,
"nscanned" : 66,
"indexOnly" : true, // this means covered
To find all the followers of a user, we just need the opposite index:
> db.followees.ensureIndex({followee: 1, user: 1})
> db.followees.find({followee: "30313973"}, {user: 1, _id: 0})
Message Document
The message document:
> db.messages.findOne()
{
"_id": ObjectId("519d4858e4b079162fe7eb12"),
"uid": "48268973", // the author id
"username": "Abiall", // why store the username?
"text": "Lorem ipsum dolor sit amet, consectetur ...",
"created": ISODate("2013-05-22T22:36:08.663Z"),
"location": [ -95.470188, 37.366044 ],
"tags": [ "gadgets" ]
}
Collection statistics:
> db.messages.stats()
{
"ns": "msg.messages",
"count": 21440518,
"size": 14184598000,
"avgObjSize": 661.5790719235422,
"storageSize": 15749418944,
"numExtents": 27,
"nindexes": 2,
"lastExtentSize": 2146426864,
"paddingFactor": 1,
"systemFlags": 1,
"userFlags": 0,
"totalIndexSize": 1454289648,
"indexSizes": {
"_id_": 695646784,
"uid_1_created_1": 758642864 },
"ok": 1
}
Implementing the Outbox
The query is on "uid" and needs to be sorted by descending "created" time:
> db.messages.ensureIndex({ "uid": 1, "created": 1 } ) // use a compound index
> db.messages.find({ "uid": "31837072" } ).sort({ "created": -1 } ).limit(100)
{ "_id": ObjectId("519d626ae4b07916312e15b1"), "uid": "31837072", "username": "Roya
"text": "Lorem ipsum dolor sit amet, consectetur adipisicing elit , sed do eiusmod tempor …",
"created": ISODate("2013-05-23T00:27:22.369Z"),
"location": [ "-118.296138", "33.772832" ],
"tags": [ "Art" ] }
…
> db.messages.find({ "uid": "31837072" }).sort({ "created": -1 }).limit(100).explain()
{
"cursor": "BtreeCursor uid_1_created_1 reverse",
"n": 18,
"nscannedObjects": 18,
"nscanned": 18,
"scanAndOrder": false,
"millis": 0
…
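The explain output above reports "scanAndOrder": false because the compound index already holds each author's messages in "created" order. The following is a minimal in-memory sketch of that idea, not MongoDB's actual B-tree code: the array `indexEntries`, the helper `lowerBound`, and the function `outbox` are hypothetical names, and the data is made up.

```javascript
// In-memory stand-in for the { uid: 1, created: 1 } index: entries are kept
// sorted by uid, then by created. All values are made up for illustration.
const indexEntries = [
  { uid: "a", created: 1 }, { uid: "a", created: 5 },
  { uid: "b", created: 2 }, { uid: "b", created: 3 }, { uid: "b", created: 9 },
  { uid: "c", created: 4 },
];

// First position whose uid is >= the target (lower-bound binary search).
function lowerBound(entries, uid) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].uid < uid) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Outbox read: find the uid range, then walk it backwards so the newest
// entries come out first, stopping at the limit. No separate sort is needed.
function outbox(entries, uid, limit) {
  const start = lowerBound(entries, uid);
  let end = start;
  while (end < entries.length && entries[end].uid === uid) end++;
  const result = [];
  for (let i = end - 1; i >= start && result.length < limit; i--) {
    result.push(entries[i]);
  }
  return result;
}

console.log(outbox(indexEntries, "b", 2).map(e => e.created)); // [ 9, 3 ]
```

Walking the range in reverse corresponds to the "reverse" suffix on the cursor name in the explain output.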
Java Development
Java support
• The Java driver is open source, available on GitHub and Maven.
• mongo.jar is the driver; bson.jar is a subset containing only the BSON library.
• The Java driver is probably the most used MongoDB driver.
• It is actively developed by MongoDB Inc and the community.
Driver Features
• CRUD
• Support for replica sets
• Connection pooling
• Distributed reads to slave servers
• BSON serializer/deserializer (lazy option)
• JSON serializer/deserializer
• GridFS
Message Store
public class MessageStoreDAO implements MessageStore {
private Morphia morphia;
private Datastore ds;
public MessageStoreDAO( MongoClient mongo ) {
this.morphia = new Morphia();
this.morphia.map(DBMessage.class);
this.ds = morphia.createDatastore(mongo, "messages");
this.ds.getCollection(DBMessage.class).
ensureIndex(new BasicDBObject("sender",1).append("sentAt",1) );
}
// get a message
public Message get(String user_id, String msg_id) {
return (Message) this.ds.find(DBMessage.class)
.filter("sender", user_id)
.filter("_id", new ObjectId(msg_id))
.get();
}
Message Store
// save a message
public Message save(String user_id, String message, Date date) {
Message msg = new DBMessage( user_id, message, date );
ds.save( msg );
return msg;
}
// find message by author sorted by descending time
public List<Message> sentBy(String user_id) {
return (List) this.ds.find(DBMessage.class)
.filter("sender",user_id).order("-sentAt").limit(50).asList();
}
// find message by several authors sorted by descending time
public List<Message> sentBy(List<String> user_ids) {
return (List) this.ds.find(DBMessage.class)
.field("sender").in(user_ids).order("-sentAt").limit(50).asList();
}
Graph Store
The code below uses Solution #4: two relationship collections, one for followers and one for followees (named "friends" here)
public class GraphStoreDAO implements GraphStore {
private DBCollection friends;
private DBCollection followers;
public GraphStoreDAO(MongoClient mongo) {
this.followers = mongo.getDB("edges").getCollection("followers");
this.friends = mongo.getDB("edges").getCollection("friends");
followers.ensureIndex( new BasicDBObject("u",1).append("o",1), new BasicDBObject("unique", true));
friends.ensureIndex( new BasicDBObject("u",1).append("o",1), new BasicDBObject("unique",true));
}
// find users that are followed
public List<String> friendsOf(String user_id) {
List<String> theFriends = new ArrayList<String>();
DBCursor cursor = friends.find( new BasicDBObject("u",user_id),
new BasicDBObject("_id",0).append("o",1));
while(cursor.hasNext())
theFriends.add( (String) cursor.next().get("o"));
return theFriends;
}
Graph Store
// find followers of a user
public List<String> followersOf(String user_id) {
List<String> theFollowers = new ArrayList<String>();
DBCursor cursor = followers.find( new BasicDBObject("u",user_id),
new BasicDBObject("_id",0).append("o",1));
while(cursor.hasNext())
theFollowers.add( (String) cursor.next().get("o"));
return theFollowers;
}
public void follow(String user_id, String toFollow) {
friends.save( new BasicDBObject("u",user_id).append("o",toFollow));
followers.save( new BasicDBObject("u",toFollow).append("o",user_id));
}
public void unfollow(String user_id, String toUnFollow) {
friends.remove(new BasicDBObject("u", user_id).append("o", toUnFollow));
followers.remove(new BasicDBObject("u", toUnFollow).append("o", user_id));
}
Design Options
4 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Bucketed Fan out on Write
• Inbox Caches
Fan out on read
• Generally not the right approach
• 1 document per message sent
• Reading an inbox means finding all messages sent by the people the user follows
• Requires scatter-gather on a sharded cluster
• Then a lot of random IO on each shard to find everything
Fan out on Read
Put the followees ids in a list:
> var fees = []
> db.followees.find({user: "11622712"})
.forEach( function(doc) { fees.push( doc.followee ) } )
Use $in and sort() and limit() to gather the inbox:
> db.messages.find({ uid: { $in: fees } }).sort({ created: -1 }).limit(100)
{ "_id": ObjectId("519d627ce4b07916312f0a09"), "uid": "34660390", "username": "Dingdowas"
{ "_id": ObjectId("519d627ce4b07916312f0a10"), "uid": "34661390", "username": "John" } …
{ "_id": ObjectId("519d627ce4b07916312f0a11"), "uid": "34662390", "username": "Brenda" } …
…
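The two shell steps above can be mimicked without a database. This is only an in-memory sketch of the fan out on read logic: the arrays `messages` and `followees` stand in for the collections of the same name, the function `readInbox` is a hypothetical helper, and all ids and timestamps are made up.

```javascript
// One document per message sent, as in the messages collection.
const messages = [
  { uid: "u1", text: "first", created: 1 },
  { uid: "u2", text: "second", created: 2 },
  { uid: "u3", text: "third", created: 3 },
  { uid: "u1", text: "fourth", created: 4 },
];

// One document per relationship: "user" follows "followee".
const followees = [
  { user: "me", followee: "u1" },
  { user: "me", followee: "u3" },
];

// Reading the inbox does all the work: collect the followee ids, then
// gather, sort and limit their messages.
function readInbox(user, limit) {
  const fees = new Set(
    followees.filter(f => f.user === user).map(f => f.followee));
  return messages
    .filter(m => fees.has(m.uid))            // like { uid: { $in: fees } }
    .sort((a, b) => b.created - a.created)   // like sort({ created: -1 })
    .slice(0, limit);                        // like limit(n)
}

console.log(readInbox("me", 2).map(m => m.text)); // [ 'fourth', 'third' ]
```

Sending stays cheap (one stored document per message), while every read touches the messages of every followee.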
Fan out on read – Send Message
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Fan out on read – Inbox Read
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
Fan out on read
> db.messages.find({ uid: { $in: fees } } ).sort({ created: -1 } ).limit(100).explain()
{
"cursor": "BtreeCursor uid_1_created_1 multi",
"isMultiKey": false,
"n": 100,
"nscannedObjects": 1319,
"nscanned": 1384,
"nscannedObjectsAllPlans": 1425,
"nscannedAllPlans": 1490,
"scanAndOrder": true, // it is sorting in RAM??
"indexOnly": false,
"nYields": 0,
"nChunkSkips": 0,
"millis": 31 // takes about 30ms
}
Fan out on read - sort
Fan out on write
• Tends to scale better than fan out on read
• 1 document per recipient
• Reading my inbox is just finding all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
• But still lots of random IO on the shard
Fan out on Write
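// Shard on "recipient" and "sent"
sh.shardCollection("myapp.inbox", { "recipient": 1, "sent": 1 } )
msg = { from: "Joe", sent: new Date(), message: "Hi!" }
// Send a message, write one message per follower
// followersOf() is an application helper that returns an array of follower ids
followersOf( msg.from ).forEach( function(follower) {
    // insert a fresh document per recipient so each copy gets its own _id
    db.inbox.insert({ from: msg.from, sent: msg.sent, message: msg.message, recipient: follower });
} )
// Read my inbox, super easy
db.inbox.find({ recipient: "Joe" }).sort({ sent: -1 })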
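To make the cost of this approach visible, here is a small in-memory sketch of fan out on write. It assumes a made-up follower map and uses a plain array in place of the inbox collection; `followers`, `inbox`, `send` and `readInbox` are illustrative names, not part of the original code.

```javascript
// author -> follower ids (made-up data)
const followers = { joe: ["ann", "bob"], ann: ["joe"] };

// Stand-in for the inbox collection: one document per recipient.
const inbox = [];

// Sending does the work: one copy of the message per follower.
function send(from, message, sent) {
  for (const recipient of followers[from] || []) {
    inbox.push({ from, message, sent, recipient });
  }
}

// Reading is a simple filter on recipient, newest first.
function readInbox(recipient) {
  return inbox
    .filter(m => m.recipient === recipient)
    .sort((a, b) => b.sent - a.sent);
}

send("joe", "Hi!", 1);
send("ann", "Hello Joe", 2);
send("joe", "Bye", 3);

console.log(inbox.length);                          // 5
console.log(readInbox("ann").map(m => m.message));  // [ 'Bye', 'Hi!' ]
```

Three messages produce five stored documents here, which is the "copy per recipient" data size cost.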
Fan out on write – Send Message
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Fan out on write – Read Inbox
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
Bucketed Fan out on write
• Each “inbox” document is an array of messages
• Append a message onto the “inbox” of the recipient
• Bucket inbox documents so there are not too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• 1 or 2 documents to read the whole inbox
Bucketed Fan out on Write
// Shard on "owner / sequence"
sh.shardCollection("myapp.buckets", { "owner": 1, "sequence": 1 } )
sh.shardCollection("myapp.users", { "user_name": 1 } )
msg = { from: "Joe", sent: new Date(), message: "Hi!" }
// Send a message, have to find the right sequence document
followersOf( msg.from ).forEach( function(recipient) {
    var count = db.users.findAndModify({
        query: { user_name: recipient },
        update: { $inc: { msg_count: 1 } },
        upsert: true,
        new: true }).msg_count;
    // integer bucket number, 50 messages per bucket
    var sequence = Math.floor( count / 50 );
    db.buckets.update({ owner: recipient, sequence: sequence },
        { $push: { messages: msg } },
        { upsert: true });
} )
// Read my inbox
db.buckets.find({ owner: "Joe" }).sort({ sequence: -1 }).limit(2)
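The bucket arithmetic is easy to get wrong, so here is a minimal in-memory sketch of it. It assumes the same bucket size of 50 and uses two Maps in place of the users and buckets collections; `counters`, `buckets`, `deliver` and `readInbox` are hypothetical names.

```javascript
const BUCKET_SIZE = 50;
const counters = new Map(); // owner -> msg_count (stands in for the users collection)
const buckets = new Map();  // "owner/sequence" -> { owner, sequence, messages }

// Deliver one message to one owner: bump the counter, derive the bucket
// number, then append to that bucket (creating it if needed, like an upsert).
function deliver(owner, msg) {
  const count = (counters.get(owner) || 0) + 1;      // like $inc with new: true
  counters.set(owner, count);
  const sequence = Math.floor(count / BUCKET_SIZE);  // integer bucket number
  const key = owner + "/" + sequence;
  if (!buckets.has(key)) {
    buckets.set(key, { owner, sequence, messages: [] });
  }
  buckets.get(key).messages.push(msg);               // like $push
  return sequence;
}

// Read the newest buckets for an owner.
function readInbox(owner, limit) {
  return [...buckets.values()]
    .filter(b => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}

for (let i = 1; i <= 120; i++) deliver("joe", { n: i });

console.log(readInbox("joe", 2).map(b => [b.sequence, b.messages.length]));
// [ [ 2, 21 ], [ 1, 50 ] ]
```

Because the counter is incremented before the division, as on the slide, bucket 0 ends up with 49 messages and later buckets with 50. Without the rounding step, the division would produce fractional sequence values and nearly every message would land in its own bucket.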
Bucketed fan out on write – Send
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Bucketed fan out on write – Read
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
Cached inbox
• Recent messages are fast, but older messages are slower
• Store a cache of the last N messages per user
• Use a capped array to age out older messages
• Create the cache lazily when the user accesses the inbox
• Only write the message if the cache exists
• Use a TTL collection to time out caches for inactive users
Cached Inbox
// Shard on "owner"
sh.shardCollection("myapp.caches", { "owner": 1 } )
// Send a message, add it to the existing caches of followers
followersOf( msg.from ).forEach( function(recipient) {
    // no upsert: the message is only written if the cache document exists
    db.caches.update({ owner: recipient }, { $push: { messages: {
        $each: [ msg ],
        $sort: { sent: 1 },
        $slice: -50 } } } );
} )
// Read my inbox
function readInbox(owner) {
    var cache = db.caches.findOne({ owner: owner });
    if ( cache ) {
        // cache document exists
        return cache.messages;
    }
    // fall back to "fan out on read" and cache it
    // followeesOf() is an assumed application helper, like followersOf()
    var msgs = db.outbox.find({ sender: { $in: followeesOf( owner ) } })
        .sort({ sent: -1 }).limit(50).toArray();
    msgs.reverse(); // oldest first, matching the $sort order used on send
    db.caches.save({ owner: owner, messages: msgs });
    return msgs;
}
readInbox("Joe")
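The capped array behaviour ($each, $sort, $slice) and the "only write if the cache exists" rule can be sketched in memory as follows. This is an illustration under assumptions, not the talk's implementation: the cap is 3 instead of 50 to keep the output short, and `caches`, `followers` and `send` are made-up names and data.

```javascript
const CACHE_SIZE = 3;                       // the slides use 50
const caches = new Map();                   // owner -> cached messages, oldest first
const followers = { joe: ["ann", "bob"] };  // author -> follower ids (made up)

// Add a message to the caches of followers that already have one.
function send(msg) {
  for (const recipient of followers[msg.from] || []) {
    const cached = caches.get(recipient);
    if (!cached) continue;                   // no cache yet: skip the write
    cached.push(msg);                        // like $each: [ msg ]
    cached.sort((a, b) => a.sent - b.sent);  // like $sort: { sent: 1 }
    caches.set(recipient, cached.slice(-CACHE_SIZE)); // like $slice: -N
  }
}

caches.set("ann", []); // ann has opened her inbox before, bob has not

for (let t = 1; t <= 5; t++) {
  send({ from: "joe", sent: t, message: "m" + t });
}

console.log(caches.get("ann").map(m => m.message)); // [ 'm3', 'm4', 'm5' ]
console.log(caches.has("bob"));                     // false
```

Only the newest three messages survive in ann's cache, and nothing is written for bob until he reads his inbox and the cache is built lazily.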
Cached Inbox – Send
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Cached Inbox – Read
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3; steps 1 and 2; Cache Hit, Cache Miss]
Discussion
Tradeoffs
• Send Message Performance
– Fan out on Read: Best (single shard; single write)
– Fan out on Write: Good (shard per recipient; multiple writes)
– Bucketed Fan out on Write: Worst (shard per recipient; appends, documents grow)
– Inbox Cache: Mixed (depends on how many users are in cache)
• Read Inbox Performance
– Fan out on Read: Worst (broadcast to all shards; random reads)
– Fan out on Write: Good (single shard; random reads)
– Bucketed Fan out on Write: Best (single shard; single read)
– Inbox Cache: Mixed (recent messages fast; older messages are slow)
• Data Size
– Fan out on Read: Best (message stored once)
– Fan out on Write: Worst (copy per recipient)
– Bucketed Fan out on Write: Worst (copy per recipient)
– Inbox Cache: Good (same as Fan out on Read + size of cache)
Things to consider
• Lots of recipients
– Fan out on write might become prohibitive
– Consider introducing a “Group”
– Make fan out asynchronous
• Very large message size
– Multiple copies of messages can be a burden
– Consider single copy of message with a “pointer” per inbox
• More writes than reads
– Fan out on read might be okay
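The "single copy with a pointer per inbox" idea for very large messages could look roughly like the following in-memory sketch. The structures `messages` and `inbox` and the functions `send` and `readInbox` are assumptions made for illustration; the talk does not show an implementation.

```javascript
const messages = new Map(); // messageId -> message body, stored once
const inbox = [];           // small pointer documents: { recipient, messageId, sent }

// Store the (possibly large) body once, then fan out only small pointers.
function send(id, from, body, sent, recipients) {
  messages.set(id, { from, body });
  for (const recipient of recipients) {
    inbox.push({ recipient, messageId: id, sent });
  }
}

// Reading resolves the pointers back to the stored bodies, newest first.
function readInbox(recipient) {
  return inbox
    .filter(p => p.recipient === recipient)
    .sort((a, b) => b.sent - a.sent)
    .map(p => messages.get(p.messageId));
}

send("m1", "joe", "Hi!", 1, ["ann", "bob"]);
send("m2", "bob", "Yo", 2, ["ann"]);

console.log(messages.size);                      // 2
console.log(inbox.length);                       // 3
console.log(readInbox("ann").map(m => m.body));  // [ 'Yo', 'Hi!' ]
```

The trade is an extra lookup per message at read time in exchange for storing each body only once.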
Summary
• Multiple ways to model status updates
• Think about characteristics of your network
– Number of users
– Number of edges
– Publish frequency
– Access patterns
• Try to minimize random IO
Thank You

Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Building a Scalable Inbox System with MongoDB and Java

  • 1. Technical Account Manager Lead, MongoDB Inc @antoinegirbal Antoine Girbal JavaOne 2013 Building a scalable inbox system with MongoDB and Java
  • 2. Agenda
      • Problem Overview
      • Schema and queries
      • Java Development
      • Design Options
        – Fan out on Read
        – Fan out on Write
        – Bucketed Fan out on Write
        – Cached Inbox
      • Discussion
  • 8. Basic CRUD
      • Save your first document:
          > db.test.insert({ firstName: "Antoine", lastName: "Girbal" })
      • Find the document:
          > db.test.find({ firstName: "Antoine" })
          { _id: ObjectId("524495105889411fab0cdfa3"), firstName: "Antoine", lastName: "Girbal" }
      • Update the document (a plain document as the second argument replaces the whole document, except _id):
          > db.test.update({ _id: ObjectId("524495105889411fab0cdfa3") }, { x: 1, y: 2 })
      • Remove the document:
          > db.test.remove({ _id: ObjectId("524495105889411fab0cdfa3") })
      • No schema definition or other declaration is needed; it's easy!
  • 9. The User Document
        {
          "_id": ObjectId("519c12d53004030e5a6316d2"),
          "address": {
            "streetAddress": "2600 Rafe Lane",
            "city": "Jackson",
            "state": "MS",
            "zip": 39201,
            "country": "US"
          },
          "birthday": ISODate("1980-12-26T00:00:00.000Z"),
          "company": "Parade of Shoes",
          "domain": "SanFranciscoAgency.com",
          "email": "AnthonyJDacosta@pookmail.com",
          "firstName": "Anthony",
          "gender": "male",
          "lastName": "Dacosta",
          "location": [ -90.183518, 32.368619 ],
          …
        }
  • 10. The User Collection
      The collection statistics:
        > db.users.stats()
        {
          "ns": "edges.users",
          "count": 1000000,                       // number of documents
          "size": 637864480,                      // size of all documents
          "avgObjSize": 637.86448,
          "storageSize": 845197312,
          "numExtents": 16,
          "nindexes": 2,
          "lastExtentSize": 227786752,
          "paddingFactor": 1.0000000000260925,    // padding after documents
          "systemFlags": 1,
          "userFlags": 0,
          "totalIndexSize": 66070256,
          "indexSizes": { "_id_": 29212848, "uid_1": 36857408 },
          "ok": 1
        }
  • 11. Queries on Users
      Finding a user by email address…
        > db.users.find({ "email": "AnthonyJDacosta@pookmail.com" }).pretty()
        { "_id": ObjectId("519c12d53004030e5a6316d2"), …
      Without an index, the query falls back to a slow table scan (a scan of the full collection)…
        > db.users.find({ "email": "AnthonyJDacosta@pookmail.com" }).explain()
        {
          "cursor": "BasicCursor",
          "nscannedObjects": 1000000,   // 1m objects scanned
          "nscanned": 1000000,
          …
      Use an index for fast performance…
        > db.users.ensureIndex({ "email": 1 })   // does nothing if the index already exists
        > db.users.find({ "email": "AnthonyJDacosta@pookmail.com" }).explain()
        {
          "cursor": "BtreeCursor email_1",   // Btree, sweet!
          "nscannedObjects": 1,              // document is found almost right away
          "nscanned": 1,
          …
  • 12. Users Relationships
      • The follower / followee relationship is "many-to-many". It can be stored as:
          1. a list of followers in the user document
          2. a list of followees in the user document
          3. a relationship collection: "followees"
          4. two relationship collections: "followees" and "followers"
      • Which solution fits which case:
        – a few million users and a limit of 1000 followees per user: Solution #2
        – no boundaries and relative scaling: Solution #3
        – no boundaries and max scaling: Solution #4
  • 13. Relationship Data
      Let's look at a sample document:
        > use edges
        switched to db edges
        > db.followees.findOne()
        { "_id": ObjectId(), "user": "17052001", "followee": "31554261" }
      And the statistics:
        > db.followees.stats()
        {
          "ns": "edges.followees",
          "count": 1000000,
          "size": 64000048,
          "avgObjSize": 64.000048,
          "storageSize": 86310912,
          "numExtents": 10,
          "nindexes": 2,
          "lastExtentSize": 27869184,
          "paddingFactor": 1,
          "systemFlags": 1,
          "userFlags": 0,
          "totalIndexSize": 85561840,
          "indexSizes": { "_id_": 32458720, "user_1_followee_1": 53103120 },
          "ok": 1
        }
  • 14. Relationship Queries
      To find all the users that a user follows:
        > db.followees.ensureIndex({ user: 1, followee: 1 })   // why not just index on user? We shall see
        > db.followees.find({ user: "11622712" })
        { "_id": ObjectId("51641c02e4b0ef6827a34569"), "user": "11622712", "followee": "30432718" }
        …
        > db.followees.find({ user: "11622712" }).explain()
        {
          "cursor": "BtreeCursor user_1_followee_1",
          "n": 66,
          "indexOnly": false,
          "millis": 0,   // this is fast
      Even faster if using a "covered" index:
        > db.followees.find({ user: "11622712" }, { followee: 1, _id: 0 }).explain()
        {
          "cursor": "BtreeCursor user_1_followee_1",
          "n": 66,
          "nscannedObjects": 0,
          "nscanned": 66,
          "indexOnly": true,   // this means covered
      To find all the followers of a user, we just need the opposite index:
        > db.followees.ensureIndex({ followee: 1, user: 1 })
        > db.followees.find({ followee: "30313973" }, { user: 1, _id: 0 })
  • 15. Message Document
      The message document:
        > db.messages.findOne()
        {
          "_id": ObjectId("519d4858e4b079162fe7eb12"),
          "uid": "48268973",        // the author id
          "username": "Abiall",     // why store the username?
          "text": "Lorem ipsum dolor sit amet, consectetur ...",
          "created": ISODate("2013-05-22T22:36:08.663Z"),
          "location": [ -95.470188, 37.366044 ],
          "tags": [ "gadgets" ]
        }
      Collection statistics:
        > db.messages.stats()
        {
          "ns": "msg.messages",
          "count": 21440518,
          "size": 14184598000,
          "avgObjSize": 661.5790719235422,
          "storageSize": 15749418944,
          "numExtents": 27,
          "nindexes": 2,
          "lastExtentSize": 2146426864,
          "paddingFactor": 1,
          "systemFlags": 1,
          "userFlags": 0,
          "totalIndexSize": 1454289648,
          "indexSizes": { "_id_": 695646784, "uid_1_created_1": 758642864 },
          "ok": 1
        }
  • 16. Implementing the Outbox
      The query is on "uid" and needs to be sorted by descending "created" time:
        > db.messages.ensureIndex({ "uid": 1, "created": 1 })   // use a compound index
        > db.messages.find({ "uid": "31837072" }).sort({ "created": -1 }).limit(100)
        {
          "_id": ObjectId("519d626ae4b07916312e15b1"),
          "uid": "31837072",
          "username": "Roya…",
          "text": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor …",
          "created": ISODate("2013-05-23T00:27:22.369Z"),
          "location": [ "-118.296138", "33.772832" ],
          "tags": [ "Art" ]
        }
        …
        > db.messages.find({ "uid": "31837072" }).sort({ "created": -1 }).limit(100).explain()
        {
          "cursor": "BtreeCursor uid_1_created_1 reverse",   // the ascending index is walked backwards
          "n": 18,
          "nscannedObjects": 18,
          "nscanned": 18,
          "scanAndOrder": false,
          "millis": 0
          …
  • 18. Java support
      • The Java driver is open source and available on GitHub and Maven.
      • mongo.jar is the driver; bson.jar is a subset containing only the BSON library.
      • The Java driver is probably the most used MongoDB driver.
      • It is actively developed by MongoDB Inc and the community.
  • 19. Driver Features
      • CRUD
      • Support for replica sets
      • Connection pooling
      • Distributed reads to slave servers
      • BSON serializer/deserializer (lazy option)
      • JSON serializer/deserializer
      • GridFS
  • 20. Message Store
        public class MessageStoreDAO implements MessageStore {

            private Morphia morphia;
            private Datastore ds;

            public MessageStoreDAO(MongoClient mongo) {
                this.morphia = new Morphia();
                this.morphia.map(DBMessage.class);
                this.ds = morphia.createDatastore(mongo, "messages");
                this.ds.getCollection(DBMessage.class).ensureIndex(
                    new BasicDBObject("sender", 1).append("sentAt", 1));
            }

            // get a message
            public Message get(String user_id, String msg_id) {
                return (Message) this.ds.find(DBMessage.class)
                    .filter("sender", user_id)
                    .filter("_id", new ObjectId(msg_id))
                    .get();
            }
  • 21. Message Store
            // save a message
            public Message save(String user_id, String message, Date date) {
                Message msg = new DBMessage(user_id, message, date);
                ds.save(msg);
                return msg;
            }

            // find messages by one author, sorted by descending time
            public List<Message> sentBy(String user_id) {
                return (List) this.ds.find(DBMessage.class)
                    .filter("sender", user_id).order("-sentAt").limit(50).asList();
            }

            // find messages by several authors, sorted by descending time
            public List<Message> sentBy(List<String> user_ids) {
                return (List) this.ds.find(DBMessage.class)
                    .field("sender").in(user_ids).order("-sentAt").limit(50).asList();
            }
        }
  • 22. Graph Store
      The code below uses Solution #4: both a follower and a followee list.
        public class GraphStoreDAO implements GraphStore {

            private DBCollection friends;
            private DBCollection followers;

            public GraphStoreDAO(MongoClient mongo) {
                this.followers = mongo.getDB("edges").getCollection("followers");
                this.friends = mongo.getDB("edges").getCollection("friends");
                followers.ensureIndex(
                    new BasicDBObject("u", 1).append("o", 1),
                    new BasicDBObject("unique", true));
                friends.ensureIndex(
                    new BasicDBObject("u", 1).append("o", 1),
                    new BasicDBObject("unique", true));
            }

            // find users that are followed
            public List<String> friendsOf(String user_id) {
                List<String> theFriends = new ArrayList<String>();
                DBCursor cursor = friends.find(
                    new BasicDBObject("u", user_id),
                    new BasicDBObject("_id", 0).append("o", 1));
                while (cursor.hasNext())
                    theFriends.add((String) cursor.next().get("o"));
                return theFriends;
            }
  • 23. Graph Store
            // find followers of a user
            public List<String> followersOf(String user_id) {
                List<String> theFollowers = new ArrayList<String>();
                DBCursor cursor = followers.find(
                    new BasicDBObject("u", user_id),
                    new BasicDBObject("_id", 0).append("o", 1));
                while (cursor.hasNext())
                    theFollowers.add((String) cursor.next().get("o"));
                return theFollowers;
            }

            // write both directions of the edge so each lookup stays a single-collection query
            public void follow(String user_id, String toFollow) {
                friends.save(new BasicDBObject("u", user_id).append("o", toFollow));
                followers.save(new BasicDBObject("u", toFollow).append("o", user_id));
            }

            public void unfollow(String user_id, String toUnFollow) {
                friends.remove(new BasicDBObject("u", user_id).append("o", toUnFollow));
                followers.remove(new BasicDBObject("u", toUnFollow).append("o", user_id));
            }
        }
  • 25. 4 Approaches (there are more)
      • Fan out on Read
      • Fan out on Write
      • Bucketed Fan out on Write
      • Inbox Caches
  • 26. Fan out on read
      • Generally, not the right approach
      • 1 document per message sent
      • Reading an inbox means finding all messages sent by the people the user follows
      • Requires scatter-gather on a sharded cluster
      • Then a lot of random IO on each shard to find everything
  • 27. Fan out on Read
      Put the followee ids in a list:
        > var fees = []
        > db.followees.find({ user: "11622712" })
            .forEach(function (doc) { fees.push(doc.followee) })
      Use $in, sort() and limit() to gather the inbox:
        > db.messages.find({ uid: { $in: fees } }).sort({ created: -1 }).limit(100)
        { "_id": ObjectId("519d627ce4b07916312f0a09"), "uid": "34660390", "username": "Dingdowas" … }
        { "_id": ObjectId("519d627ce4b07916312f0a10"), "uid": "34661390", "username": "John" … }
        { "_id": ObjectId("519d627ce4b07916312f0a11"), "uid": "34662390", "username": "Brenda" … }
        …
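
The read path of slide 27 can be sketched without a database. This is an in-memory illustration only: the `followees` and `messages` arrays, their contents, and the `readInbox` function are hypothetical stand-ins for the collections in the talk, not code from it.

```javascript
// In-memory stand-ins for the followees and messages collections (made-up data).
const followees = [
  { user: "u1", followee: "u2" },
  { user: "u1", followee: "u3" },
  { user: "u2", followee: "u3" },
];
const messages = [
  { uid: "u2", text: "hello", created: 1 },
  { uid: "u3", text: "world", created: 3 },
  { uid: "u4", text: "not followed", created: 4 },
  { uid: "u2", text: "again", created: 5 },
];

// Fan out on read: resolve the followee list first, then gather
// their messages, newest first, capped at `limit`.
function readInbox(user, limit) {
  const fees = new Set(
    followees.filter((e) => e.user === user).map((e) => e.followee)
  );
  return messages
    .filter((m) => fees.has(m.uid))        // like { uid: { $in: fees } }
    .sort((a, b) => b.created - a.created) // like sort({ created: -1 })
    .slice(0, limit);                      // like limit(n)
}

const inbox = readInbox("u1", 2);
console.log(inbox.map((m) => m.text)); // [ 'again', 'world' ]
```

Sending costs a single write here, and all the work happens at read time, which is why this option reads the slowest of the four.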
  • 28. Fan out on read – Send Message [diagram: Send Message against Shard 1, Shard 2, Shard 3]
  • 29. Fan out on read – Inbox Read [diagram: Read Inbox against Shard 1, Shard 2, Shard 3]
  • 30. Fan out on read
        > db.messages.find({ uid: { $in: fees } }).sort({ created: -1 }).limit(100).explain()
        {
          "cursor": "BtreeCursor uid_1_created_1 multi",
          "isMultiKey": false,
          "n": 100,
          "nscannedObjects": 1319,
          "nscanned": 1384,
          "nscannedObjectsAllPlans": 1425,
          "nscannedAllPlans": 1490,
          "scanAndOrder": true,   // it is sorting in RAM??
          "indexOnly": false,
          "nYields": 0,
          "nChunkSkips": 0,
          "millis": 31            // takes about 30ms
        }
  • 31. Fan out on read - sort
  • 32. Fan out on write
      • Tends to scale better than fan out on read
      • 1 document per recipient
      • Reading my inbox is just finding all of the messages with me as the recipient
      • Can shard on recipient, so inbox reads hit one shard
      • But still lots of random IO on the shard
  • 33. Fan out on Write
        // Shard on "recipient" and "sent"
        sh.shardCollection("myapp.inbox", { "recipient": 1, "sent": 1 })

        msg = { from: "Joe", sent: new Date(), message: "Hi!" }

        // Send a message: write one copy per follower.
        // followersOf() is a helper assumed to return an array of follower ids.
        followersOf(msg.from).forEach(function (follower) {
          // insert a fresh document each time so the copies do not share an _id
          db.inbox.insert({ from: msg.from, sent: msg.sent, message: msg.message, recipient: follower });
        });

        // Read my inbox, super easy
        db.inbox.find({ recipient: "Joe" }).sort({ sent: -1 })
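
A minimal in-memory sketch of the same idea, assuming a made-up follower graph. The `followers` map, the `inboxCollection` array and `readInboxOf` are hypothetical names used only for illustration; `followersOf` mirrors the helper named on the slide.

```javascript
// Hypothetical follower graph: author -> list of followers.
const followers = { Joe: ["Ann", "Bob"], Ann: ["Joe"] };
const inboxCollection = []; // stands in for db.inbox

function followersOf(user) {
  return followers[user] || [];
}

// Fan out on write: store one copy of the message per follower.
function sendMessage(from, text, sent) {
  for (const recipient of followersOf(from)) {
    // A fresh object per recipient, so copies never share state.
    inboxCollection.push({ from, recipient, text, sent });
  }
}

// Reading is a lookup on a single key, newest first.
function readInboxOf(recipient) {
  return inboxCollection
    .filter((m) => m.recipient === recipient)
    .sort((a, b) => b.sent - a.sent);
}

sendMessage("Joe", "Hi!", 1);
sendMessage("Ann", "Hello Joe", 2);
sendMessage("Joe", "Anyone there?", 3);
console.log(readInboxOf("Ann").map((m) => m.text)); // [ 'Anyone there?', 'Hi!' ]
console.log(inboxCollection.length); // 5 copies stored for 3 messages sent
```

The copy count grows with the number of followers, which is the data-size cost this approach pays for cheap reads.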
  • 34. Fan out on write – Send Message [diagram: Send Message against Shard 1, Shard 2, Shard 3]
  • 35. Fan out on write – Read Inbox [diagram: Read Inbox against Shard 1, Shard 2, Shard 3]
  • 36. Bucketed Fan out on write
      • Each "inbox" document is an array of messages
      • Append a message onto the "inbox" of each recipient
      • Bucket inbox documents so there aren't too many messages per document
      • Can shard on recipient, so inbox reads hit one shard
      • 1 or 2 documents to read the whole inbox
  • 37. Bucketed Fan out on Write
        // Shard on "owner / sequence"
        sh.shardCollection("myapp.buckets", { "owner": 1, "sequence": 1 })
        sh.shardCollection("myapp.users", { "user_name": 1 })

        msg = { from: "Joe", sent: new Date(), message: "Hi!" }

        // Send a message: find the right sequence (bucket) document for each follower.
        // followersOf() is a helper assumed to return an array of follower ids.
        followersOf(msg.from).forEach(function (recipient) {
          var sequence = Math.floor(db.users.findAndModify({
            query: { user_name: recipient },
            update: { $inc: { msg_count: 1 } },
            upsert: true,
            new: true
          }).msg_count / 50);   // whole bucket number, about 50 messages per bucket
          db.buckets.update({ owner: recipient, sequence: sequence },
                            { $push: { messages: msg } },
                            { upsert: true });
        });

        // Read my inbox
        db.buckets.find({ owner: "Joe" }).sort({ sequence: -1 }).limit(2)
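
The bucket arithmetic is easy to get wrong, so here is a small in-memory sketch of it. The `counters` and `buckets` maps and both function names are hypothetical; the sequence number is computed the way the slide does, by dividing the incremented message count by 50, which means bucket 0 ends up holding 49 messages and every later bucket 50.

```javascript
const BUCKET_SIZE = 50;     // messages per bucket, as on the slide
const counters = new Map(); // owner -> msg_count (stands in for db.users)
const buckets = new Map();  // "owner/sequence" -> { owner, sequence, messages }

// Append one message to the owner's current bucket, creating it if needed.
function appendToInbox(owner, msg) {
  const count = (counters.get(owner) || 0) + 1; // the $inc step
  counters.set(owner, count);
  const sequence = Math.floor(count / BUCKET_SIZE); // whole bucket number
  const key = owner + "/" + sequence;
  if (!buckets.has(key)) {
    buckets.set(key, { owner, sequence, messages: [] }); // the upsert
  }
  buckets.get(key).messages.push(msg); // the $push
  return sequence;
}

// Read the newest `limit` buckets for an owner.
function readInboxBuckets(owner, limit) {
  return [...buckets.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}

for (let i = 1; i <= 120; i++) appendToInbox("Joe", { from: "Ann", n: i });
const latest = readInboxBuckets("Joe", 2);
console.log(latest.map((b) => [b.sequence, b.messages.length])); // [ [ 2, 21 ], [ 1, 50 ] ]
```

Two bucket reads return the 71 most recent messages here, against 71 separate documents under plain fan out on write.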
  • 38. Bucketed fan out on write – Send [diagram: Send Message against Shard 1, Shard 2, Shard 3]
  • 39. Bucketed fan out on write – Read [diagram: Read Inbox against Shard 1, Shard 2, Shard 3]
  • 40. Cached inbox
      • Recent messages are fast, but older messages are slower
      • Store a cache of the last N messages per user
      • Use a capped array to age out older messages
      • Create the cache lazily when the user accesses the inbox
      • Only write the message if the cache exists
      • Use a TTL collection to time out caches for inactive users
  • 41. Cached Inbox
        // Shard on "owner"
        sh.shardCollection("myapp.caches", { "owner": 1 })

        // Send a message: add it to the existing caches of followers
        // (no upsert, so followers without a cache are skipped)
        followersOf(msg.from).forEach(function (recipient) {
          db.caches.update({ owner: recipient },
            { $push: { messages: { $each: [ msg ], $sort: { sent: 1 }, $slice: -50 } } });
        });

        // Read my inbox
        var cache = db.caches.findOne({ owner: "Joe" });
        if (cache) {
          // cache document exists
          return cache.messages;
        } else {
          // fall back to "fan out on read" and cache the result;
          // followeesOf() is a helper assumed to return the ids of the users Joe follows
          var msgs = db.outbox.find({ sender: { $in: followeesOf("Joe") } })
                              .sort({ sent: -1 }).limit(50).toArray();
          // store oldest first, matching the $sort used when sending
          db.caches.save({ owner: "Joe", messages: msgs.slice().reverse() });
          return msgs;
        }
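
The capped-array behaviour ($push with $each, $sort and $slice) and the lazy cache creation can be sketched in memory as follows. All names here (`caches`, `pushCapped`, `deliver`, `readCachedInbox`, `loadRecent`) are hypothetical, the cache size is shrunk from 50 to 3 to keep the example short, and the TTL expiry of inactive caches is not modelled.

```javascript
const CACHE_SIZE = 3;     // the slide keeps the last 50; 3 keeps the demo short
const caches = new Map(); // owner -> array of messages, oldest first

// Like $push with $each, $sort: { sent: 1 } and $slice: -N.
function pushCapped(messages, msg) {
  messages.push(msg);
  messages.sort((a, b) => a.sent - b.sent);
  // Drop the oldest entries so only the newest CACHE_SIZE remain.
  if (messages.length > CACHE_SIZE) {
    messages.splice(0, messages.length - CACHE_SIZE);
  }
}

// Send: update a follower's cache only if that cache already exists.
function deliver(recipient, msg) {
  const cache = caches.get(recipient);
  if (cache) pushCapped(cache, msg);
  return Boolean(cache);
}

// Read: return the cache on a hit; on a miss, build it from `loadRecent`,
// a caller-supplied fallback (for example a fan-out-on-read query).
function readCachedInbox(owner, loadRecent) {
  if (!caches.has(owner)) {
    const recent = loadRecent(owner).slice().sort((a, b) => a.sent - b.sent);
    caches.set(owner, recent.slice(-CACHE_SIZE));
  }
  return caches.get(owner);
}

console.log(deliver("Joe", { sent: 1, text: "a" }));    // false: no cache yet
readCachedInbox("Joe", () => [{ sent: 1, text: "a" }]); // miss, cache created
for (let t = 2; t <= 5; t++) deliver("Joe", { sent: t, text: "m" + t });
console.log(readCachedInbox("Joe", () => []).map((m) => m.sent)); // [ 3, 4, 5 ]
```

Skipping writes for users without a cache is what keeps sends cheap when most followers are inactive.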
  • 42. Cached Inbox – Send [diagram: Send Message against Shard 1, Shard 2, Shard 3]
  • 43. Cached Inbox – Read [diagram: Read Inbox against Shard 1, Shard 2, Shard 3; labels: 1, 2, Cache Hit, Cache Miss]
  • 45. Tradeoffs
      • Send Message Performance
        – Fan out on Read: Best (single shard, single write)
        – Fan out on Write: Good (shard per recipient, multiple writes)
        – Bucketed Fan out on Write: Worst (shard per recipient, appends that grow documents)
        – Inbox Cache: Mixed (depends on how many users are in cache)
      • Read Inbox Performance
        – Fan out on Read: Worst (broadcast to all shards, random reads)
        – Fan out on Write: Good (single shard, random reads)
        – Bucketed Fan out on Write: Best (single shard, single read)
        – Inbox Cache: Mixed (recent messages fast, older messages slow)
      • Data Size
        – Fan out on Read: Best (message stored once)
        – Fan out on Write: Worst (copy per recipient)
        – Bucketed Fan out on Write: Worst (copy per recipient)
        – Inbox Cache: Good (same as fan out on read, plus the size of the cache)
  • 46. Things to consider
      • Lots of recipients
        – Fan out on write might become prohibitive
        – Consider introducing a "Group"
        – Make fan out asynchronous
      • Very large message size
        – Multiple copies of messages can be a burden
        – Consider a single copy of the message with a "pointer" per inbox
      • More writes than reads
        – Fan out on read might be okay
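
One way to read the "pointer per inbox" suggestion is sketched below, in memory. This is an interpretation, not code from the talk: `messageStore`, `inboxPointers`, `sendWithPointers` and `readPointerInbox` are hypothetical names, and the pointer is assumed to carry only the message id and the sent time.

```javascript
const messageStore = new Map();  // msgId -> full message, stored once
const inboxPointers = new Map(); // recipient -> array of { msgId, sent }
let nextId = 1;

// Store the (possibly large) body once, then fan out only small pointers.
function sendWithPointers(from, body, sent, recipients) {
  const msgId = nextId++;
  messageStore.set(msgId, { from, body, sent });
  for (const r of recipients) {
    if (!inboxPointers.has(r)) inboxPointers.set(r, []);
    inboxPointers.get(r).push({ msgId, sent });
  }
  return msgId;
}

// Read: sort the pointers newest first, then resolve each to its message.
function readPointerInbox(recipient) {
  return (inboxPointers.get(recipient) || [])
    .slice()
    .sort((a, b) => b.sent - a.sent)
    .map((p) => messageStore.get(p.msgId));
}

const id = sendWithPointers("Joe", "x".repeat(1000), 1, ["Ann", "Bob", "Cid"]);
console.log(messageStore.size); // 1
console.log(readPointerInbox("Bob")[0] === messageStore.get(id)); // true
```

The trade is an extra lookup per message at read time in exchange for storing the large body only once.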
  • 47. Summary
      • Multiple ways to model status updates
      • Think about characteristics of your network
        – Number of users
        – Number of edges
        – Publish frequency
        – Access patterns
      • Try to minimize random IO
  • 48. Technical Account Manager Lead, MongoDB Inc Antoine Girbal JavaOne 2013 Thank You