SlideShare uma empresa Scribd logo
1 de 26
mongodb @ foursquare
MongoSF - 5/24/2011
Jorge Ortiz (@jorgeortiz85)
what is foursquare?
location-based social network - “check-in” to bars,
restaurants, museums, parks, etc
  friend-finder (where are my friends right now?)
  virtual game (badges, points, mayorships)
  city guide (local, personalized recommendations)
  location diary + stats engine (where was I a year ago?)
  specials (get rewards at your favorite restaurant)
foursquare: the numbers

>9M users
~3M checkins/day
>15M venues
>300k merchants
>60 employees
foursquare: the tech

 Nginx, HAProxy
 Scala, Lift
 MongoDB, PostgreSQL (legacy)
 (Kestrel, Munin, Ganglia, Python, Memcache, ...)
 All on EC2
foursquare <3’s mongodb

fast
indexes & rich queries
sharding, auto-balancing
replication (see: http://engineering.foursquare.com/)
geo-indexes
amazing support
mongodb: our numbers

8 clusters
  some sharded, some not
  some master/slave, some replica set
~40 machines (68.4GB, m2.4xl on EC2)
2.3 billion records
~15k QPS
mongodb: lessons learned

keep working set in memory
avoid long-running queries (reads or writes)
monitor everything (especially per-collection stats)
shard from day 1
beware EBS
use small field names for large collections
keep working set in memory
avoid long-running queries
monitor everything
(per collection stats)
shard from day 1
beware EBS
use small field names for
large collections
mongodb: pain points


mongos -- failover and thundering herds
index creation -- production impact unclear
auto-balancing -- getting there
replication chains -- use replica sets
rogue: a scala dsl for mongo

 type-safe
 all mongo query features
 logging & validation hooks
 pagination
 index-aware

         http://github.com/foursquare/rogue
mongo-java-driver query
val query =
  (BasicDBOBjectBuilder
    .start
    .push(“mayorid”)
      .add(“$lte”, 100)
    .pop
    .push(“veneuname”)
      .add(“$eq”, “Starbucks”)
    .pop
    .get)
rogue: code example

Venue where (_.mayorid <= 100)
        and (_.venuename eqs “Starbucks”)
        and (_.tags contains “wifi”)
        and (_.latlng near
              (39.0, -74.0, Degrees(0.2))
  orderDesc (_._id)
      fetch (5)
rogue: schema example

class Venue extends MongoRecord[Venue] {
  object _id extends ObjectIdField(this)
  object venuename extends StringField(this)
  object mayorid extends LongField(this)
  object tags extends ListField[String](this)
  object latlng extends LatLngField(this)
}
rogue: logging & validation

 logging:
   slf4j
   Tracer
 validation:
   radius, $in size
   index checks
rogue: pagination

val query: Query[Venue] = ...

val vs: List[Venue] =
  (query
    .countPerPage(20)
    .setPage(5)
    .fetch())
rogue: cursors


val query: Query[Checkin] = ...

for (checkin <- query) {
  ... f(checkin) ...
}
rogue: index-aware


val vs: List[Checkin] =
  (Checkin
    where (_.userid eqs 646)
      and (_.venueid eqs vid)
    fetch ())
rogue: index-aware


val vs: List[Checkin] =
  (Checkin
    where (_.userid eqs 646)
      and (_.venueid eqs vid) // hidden scan!
    fetch ())
rogue: index-aware


val vs: List[Checkin] =
  (Checkin
    where (_.userid eqs 646) // known index
     scan (_.venueid eqs vid) // known scan
    fetch ())
rogue: future directions


 iteratees for cursors
 compile-time index checking
 select partial objects
 generate full javascript for mapreduce
we’re hiring
    (nyc & sf)
http://foursquare.com/jobs
  jorge@foursquare.com
        @jorgeortiz85

Mais conteúdo relacionado

Mais procurados

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2Fabio Fumarola
 
Spark Programming
Spark ProgrammingSpark Programming
Spark ProgrammingTaewook Eom
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Matthias Niehoff
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesRussell Spitzer
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax EnablementVincent Poncet
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...thelabdude
 
Spark Streaming with Cassandra
Spark Streaming with CassandraSpark Streaming with Cassandra
Spark Streaming with CassandraJacek Lewandowski
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_sparkYiguang Hu
 
Bulk Loading Data into Cassandra
Bulk Loading Data into CassandraBulk Loading Data into Cassandra
Bulk Loading Data into CassandraDataStax
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrlucenerevolution
 
Zero to Streaming: Spark and Cassandra
Zero to Streaming: Spark and CassandraZero to Streaming: Spark and Cassandra
Zero to Streaming: Spark and CassandraRussell Spitzer
 
Spark real world use cases and optimizations
Spark real world use cases and optimizationsSpark real world use cases and optimizations
Spark real world use cases and optimizationsGal Marder
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solrthelabdude
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Lucidworks
 
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials DayAnalytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials DayMatthias Niehoff
 
How to build your query engine in spark
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in sparkPeng Cheng
 

Mais procurados (20)

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
 
Spark Programming
Spark ProgrammingSpark Programming
Spark Programming
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector Dataframes
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax Enablement
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
 
Spark Streaming with Cassandra
Spark Streaming with CassandraSpark Streaming with Cassandra
Spark Streaming with Cassandra
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
 
Bulk Loading Data into Cassandra
Bulk Loading Data into CassandraBulk Loading Data into Cassandra
Bulk Loading Data into Cassandra
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solr
 
Zero to Streaming: Spark and Cassandra
Zero to Streaming: Spark and CassandraZero to Streaming: Spark and Cassandra
Zero to Streaming: Spark and Cassandra
 
Spark real world use cases and optimizations
Spark real world use cases and optimizationsSpark real world use cases and optimizations
Spark real world use cases and optimizations
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
 
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials DayAnalytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
 
Cassandra 3.0
Cassandra 3.0Cassandra 3.0
Cassandra 3.0
 
How to build your query engine in spark
How to build your query engine in sparkHow to build your query engine in spark
How to build your query engine in spark
 

Destaque

MongoDB @ Foursquare
MongoDB @ FoursquareMongoDB @ Foursquare
MongoDB @ FoursquareBruno Furtado
 
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphSocialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphMongoDB
 
Building LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBBuilding LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBMongoDB
 
Building a Social Network with MongoDB
  Building a Social Network with MongoDB  Building a Social Network with MongoDB
Building a Social Network with MongoDBFred Chu
 
MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesJared Rosoff
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMike Friedman
 

Destaque (6)

MongoDB @ Foursquare
MongoDB @ FoursquareMongoDB @ Foursquare
MongoDB @ Foursquare
 
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphSocialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
 
Building LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBBuilding LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDB
 
Building a Social Network with MongoDB
  Building a Social Network with MongoDB  Building a Social Network with MongoDB
Building a Social Network with MongoDB
 
MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - Inboxes
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World Examples
 

Semelhante a MongoSF - mongodb @ foursquare

CS442 - Rogue: A Scala DSL for MongoDB
CS442 - Rogue: A Scala DSL for MongoDBCS442 - Rogue: A Scala DSL for MongoDB
CS442 - Rogue: A Scala DSL for MongoDBjorgeortiz85
 
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDBScala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDBjorgeortiz85
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 
mobl presentation @ IHomer
mobl presentation @ IHomermobl presentation @ IHomer
mobl presentation @ IHomerzefhemel
 
OrientDB - The 2nd generation of (multi-model) NoSQL
OrientDB - The 2nd generation of  (multi-model) NoSQLOrientDB - The 2nd generation of  (multi-model) NoSQL
OrientDB - The 2nd generation of (multi-model) NoSQLRoberto Franchini
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with ClojureDmitry Buzdin
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiMonitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiInfluxData
 
mobl - model-driven engineering lecture
mobl - model-driven engineering lecturemobl - model-driven engineering lecture
mobl - model-driven engineering lecturezefhemel
 
Apache Spark for Library Developers with William Benton and Erik Erlandson
 Apache Spark for Library Developers with William Benton and Erik Erlandson Apache Spark for Library Developers with William Benton and Erik Erlandson
Apache Spark for Library Developers with William Benton and Erik ErlandsonDatabricks
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchPedro Franceschi
 
Riak at The NYC Cloud Computing Meetup Group
Riak at The NYC Cloud Computing Meetup GroupRiak at The NYC Cloud Computing Meetup Group
Riak at The NYC Cloud Computing Meetup Groupsiculars
 
Using Neo4j from Java
Using Neo4j from JavaUsing Neo4j from Java
Using Neo4j from JavaNeo4j
 
Lift 2 0
Lift 2 0Lift 2 0
Lift 2 0SO
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 

Semelhante a MongoSF - mongodb @ foursquare (20)

CS442 - Rogue: A Scala DSL for MongoDB
CS442 - Rogue: A Scala DSL for MongoDBCS442 - Rogue: A Scala DSL for MongoDB
CS442 - Rogue: A Scala DSL for MongoDB
 
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDBScala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
mobl presentation @ IHomer
mobl presentation @ IHomermobl presentation @ IHomer
mobl presentation @ IHomer
 
OrientDB - The 2nd generation of (multi-model) NoSQL
OrientDB - The 2nd generation of  (multi-model) NoSQLOrientDB - The 2nd generation of  (multi-model) NoSQL
OrientDB - The 2nd generation of (multi-model) NoSQL
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiMonitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
 
huhu
huhuhuhu
huhu
 
mobl - model-driven engineering lecture
mobl - model-driven engineering lecturemobl - model-driven engineering lecture
mobl - model-driven engineering lecture
 
Ricky Bobby's World
Ricky Bobby's WorldRicky Bobby's World
Ricky Bobby's World
 
Apache Spark for Library Developers with William Benton and Erik Erlandson
 Apache Spark for Library Developers with William Benton and Erik Erlandson Apache Spark for Library Developers with William Benton and Erik Erlandson
Apache Spark for Library Developers with William Benton and Erik Erlandson
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearch
 
Solr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene Eurocon
 
Riak at The NYC Cloud Computing Meetup Group
Riak at The NYC Cloud Computing Meetup GroupRiak at The NYC Cloud Computing Meetup Group
Riak at The NYC Cloud Computing Meetup Group
 
Using Neo4j from Java
Using Neo4j from JavaUsing Neo4j from Java
Using Neo4j from Java
 
Lift 2 0
Lift 2 0Lift 2 0
Lift 2 0
 
Mongo db dla administratora
Mongo db dla administratoraMongo db dla administratora
Mongo db dla administratora
 
mobl
moblmobl
mobl
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 

MongoSF - mongodb @ foursquare

  • 1. mongodb @ foursquare MongoSF - 5/24/2011 Jorge Ortiz (@jorgeortiz85)
  • 2. what is foursquare? location-based social network - “check-in” to bars, restaurants, museums, parks, etc friend-finder (where are my friends right now?) virtual game (badges, points, mayorships) city guide (local, personalized recommendations) location diary + stats engine (where was I a year ago?) specials (get rewards at your favorite restaurant)
  • 3. foursquare: the numbers >9M users ~3M checkins/day >15M venues >300k merchants >60 employees
  • 4. foursquare: the tech Nginx, HAProxy Scala, Lift MongoDB, PostgreSQL (legacy) (Kestrel, Munin, Ganglia, Python, Memcache, ...) All on EC2
  • 5. foursquare <3’s mongodb fast indexes & rich queries sharding, auto-balancing replication (see: http://engineering.foursquare.com/) geo-indexes amazing support
  • 6. mongodb: our numbers 8 clusters some sharded, some not some master/slave, some replica set ~40 machines (68.4GB, m2.4xl on EC2) 2.3 billion records ~15k QPS
  • 7. mongodb: lessons learned keep working set in memory avoid long-running queries (reads or writes) monitor everything (especially per-collection stats) shard from day 1 beware EBS use small field names for large collections
  • 8. keep working set in memory
  • 13. use small field names for large collections
  • 14. mongodb: pain points mongos -- failover and thundering herds index creation -- production impact unclear auto-balancing -- getting there replication chains -- use replica sets
  • 15. rogue: a scala dsl for mongo type-safe all mongo query features logging & validation hooks pagination index-aware http://github.com/foursquare/rogue
  • 16. mongo-java-driver query val query = (BasicDBOBjectBuilder .start .push(“mayorid”) .add(“$lte”, 100) .pop .push(“veneuname”) .add(“$eq”, “Starbucks”) .pop .get)
  • 17. rogue: code example Venue where (_.mayorid <= 100) and (_.venuename eqs “Starbucks”) and (_.tags contains “wifi”) and (_.latlng near (39.0, -74.0, Degrees(0.2)) orderDesc (_._id) fetch (5)
  • 18. rogue: schema example class Venue extends MongoRecord[Venue] { object _id extends ObjectIdField(this) object venuename extends StringField(this) object mayorid extends LongField(this) object tags extends ListField[String](this) object latlng extends LatLngField(this) }
  • 19. rogue: logging & validation logging: slf4j Tracer validation: radius, $in size index checks
  • 20. rogue: pagination val query: Query[Venue] = ... val vs: List[Venue] = (query .countPerPage(20) .setPage(5) .fetch())
  • 21. rogue: cursors val query: Query[Checkin] = ... for (checkin <- query) { ... f(checkin) ... }
  • 22. rogue: index-aware val vs: List[Checkin] = (Checkin where (_.userid eqs 646) and (_.venueid eqs vid) fetch ())
  • 23. rogue: index-aware val vs: List[Checkin] = (Checkin where (_.userid eqs 646) and (_.venueid eqs vid) // hidden scan! fetch ())
  • 24. rogue: index-aware val vs: List[Checkin] = (Checkin where (_.userid eqs 646) // known index scan (_.venueid eqs vid) // known scan fetch ())
  • 25. rogue: future directions iteratees for cursors compile-time index checking select partial objects generate full javascript for mapreduce
  • 26. we’re hiring (nyc & sf) http://foursquare.com/jobs jorge@foursquare.com @jorgeortiz85

Notas do Editor

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n