MongoDB can be used simply as a log collector, for example with a capped collection. Fotopedia runs such a system for quick introspection and realtime analysis.
Talk given on 23 March 2011 at MongoFR Days in Paris, at La Cantine, by Pierre Baillet and Mathieu Poumeyrol
2. MONGODB AS A LOG COLLECTOR
photo by Jean-Michel BAUD
Pierre Baillet & Mathieu Poumeyrol
oct & kali @ fotopedia.com
3. DB.SLIDES.FIND({'TYPE':'TITLE'})
Fotopedia: who we are, what we do, how we do it
MongoDB at Fotopedia, current state of our art
Logging, the answer to life, the universe and everything
How we fulfilled this need
Log usage on a daily basis
Future work
5. FOTOPEDIA
WHO ARE WE?
Company created in 2006
Located in Paris, near the Opéra
17 people, including 8 regular MongoDB users (aka developers)
we’re hiring
6. FOTOPEDIA
WHAT DO WE DO?
Images for Humanity
Open to anyone, amateur or professional
Creative Commons aware
Beautiful Wikipedia (http://www.fotopedia.com)
iPad tablebooks (iPhone too): Heritage, National Parks and Memory of Color
7. INFRASTRUCTURE
Based on Amazon Web Services
Around 20 servers in US data centers
Centralized deployment procedure (Chef)
Deploy at least once a week with no downtime
8. KEY TECHNOLOGIES
Ruby on Rails (with REE)
Unicorn
Varnish
HAProxy
Nginx
Lackr (in-house Java proxy)
Sinatra
Redis and Resque
MySQL
MongoDB
10. CURRENT STATE OF OUR ART
Last year's talk covered our MongoDB-powered metacache
Store complete Wikipedia data in > 10 languages
Since spring 2010, all new database-centric features have been developed with MongoDB
Our goal: slowly migrate all DB features to MongoDB whenever possible
12. OUR SETUP
4 clusters (business data, log and reporting, wikipedia, and one more)
3 EC2 XL virtual machines hosting 5 replica sets
at the current time, one machine is primary for all replica sets
each of the 5 replica sets is allocated to one of the clusters
every instance runs the 4 mongos
13. SOME FIGURES
in production since September 2009
wikipedia data: wikipedia/en: 5 GB, 8M documents (plus about 10 other languages); batch load: 17k inserts/s
webcache: 2 GB, 11M records, avg 60 op/s, peak 300 op/s
overall, average 250 op/s
15. ORIGINAL PHILOSOPHY
Log everything, don’t delete
Collected by Scribe
Comprehensive daily log stored in AWS S3
Hadoop jobs to generate statistics
grep and its merry friends for issue investigation
Quite efficient, but cumbersome and slow
16. WHY IMPROVE
Issue analysis in realtime (debugging)
Realtime activity analysis
Traffic spikes
Misbehaving crawlers and other suspicious activity
20. LOG COLLECTING
File-logging daemons (Nginx, HAProxy): a Ruby tailer script
Memory-logging daemons (Varnish): a dedicated binary that streams the Varnish SHM log into MongoDB
Other daemons (Lackr, Picor): logging system extended to store data in MongoDB
We also log Ruby exceptions into MongoDB
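The tailer's job is conceptually simple: follow the log file, turn each line into a document, and insert it into MongoDB. A minimal sketch of the parsing step in Python (the regex, field names, and log format are illustrative, not Fotopedia's actual schema):

```python
import re
from datetime import datetime

# Simplified combined-log-style pattern; real Nginx/HAProxy formats differ.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3})'
)

def line_to_doc(line):
    """Turn one access-log line into a MongoDB-ready document (a dict)."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # unparsable line: skip it rather than crash the tailer
    return {
        "ip": m.group("ip"),
        "ts": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "request": m.group("req"),
        "status": int(m.group("status")),
    }

doc = line_to_doc('1.2.3.4 - - [23/Mar/2011:10:00:00 +0100] "GET / HTTP/1.1" 200')
print(doc["status"])  # 200
```

Structured fields like `status` and `ts` are what make the realtime queries on the collection cheap later.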
21. MONGO SHARDING
All servers host the "logs" mongos on port 27002.
All daemons push their logs to "localhost:27002".
The actual storage is a capped collection in a non-sharded database.
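A capped collection behaves like a fixed-size ring buffer: inserts are fast, documents keep insertion order, and the oldest entries are silently evicted once the limit is reached, which is exactly what a rolling log wants. A toy model of that behaviour in plain Python (standing in for MongoDB, purely illustrative):

```python
from collections import deque

class CappedLog:
    """Toy model of a capped collection: fixed capacity, insertion
    order preserved, oldest documents evicted first when full."""

    def __init__(self, max_docs):
        # deque(maxlen=...) drops items from the left once full,
        # mimicking capped-collection eviction (by count, not bytes).
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        # natural order == insertion order, as in a capped collection scan
        return list(self._docs)

log = CappedLog(max_docs=3)
for i in range(5):
    log.insert({"seq": i})
print(log.find())  # the two oldest documents have been evicted
```

In real MongoDB the cap is expressed in bytes (plus an optional document count), and pymongo's `db.create_collection(name, capped=True, size=...)` sets it up.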
34. ISSUES WITH MONGODB
Scalability of using a capped collection
The official docs recommend using capped collections without indexes
Size limit vs. index efficiency (400,000 lines for < 2 hours of logs): our plan is to keep 2 days' worth of logs.
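The sizing plan above is simple arithmetic: 400,000 lines in under two hours extrapolates to roughly 9.6M lines over two days. With an assumed average document size (the 500 bytes below is an illustrative guess, not a measured figure), that gives a ballpark for the capped collection's size parameter:

```python
# Back-of-the-envelope sizing for a two-day capped collection.
lines_per_hour = 400_000 / 2          # observed: 400k lines in < 2 hours
lines_two_days = lines_per_hour * 24 * 2
avg_doc_bytes = 500                   # assumption, for illustration only

size_bytes = lines_two_days * avg_doc_bytes
print(int(lines_two_days))               # 9600000
print(round(size_bytes / 1024 ** 3, 1))  # 4.5  (GB, rough)
```

The point is that two days of logs is millions of documents, which is where index efficiency on a capped collection starts to matter.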
35. FUTURE WORK
"to infinity and beyond"
(photo: The Library of Congress)
36. FUTURE WORK
Leaner interface
Currently ugly and jQuery-UI based; should switch to the Sencha framework
Keep more logs
Abandon capped collections
Keep logs longer, one collection per day(?)
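One collection per day is easy to drive from the date. A minimal sketch of such a naming scheme (the `logs_YYYY_MM_DD` pattern is an illustrative choice, not one Fotopedia settled on):

```python
from datetime import date, timedelta

def log_collection_name(day):
    """Name of the per-day log collection, e.g. 'logs_2011_03_23'."""
    return day.strftime("logs_%Y_%m_%d")

talk_day = date(2011, 3, 23)
print(log_collection_name(talk_day))                      # logs_2011_03_23
print(log_collection_name(talk_day - timedelta(days=1)))  # logs_2011_03_22
```

Retention then becomes a matter of dropping whole collections older than N days, which is far cheaper than deleting documents.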