MONGODB AS A LOG COLLECTOR



                                     photo by Jean-Michel BAUD




   Pierre Baillet & Mathieu Poumeyrol
        oct & kali @ fotopedia.com
DB.SLIDES.FIND({‘TYPE’:‘TITLE’})


Fotopedia, who we are, what we do, how we do

MongoDB at Fotopedia, current state of our art

Logging, the answer to life, the universe and everything

How we fulfilled this need

Log usage on a daily basis

Future work
FOTOPEDIA
«Photos de famille»
FOTOPEDIA
                WHO ARE WE ?

Company created in 2006

Located in Paris, near the Opéra

17 people, including 8 MongoDB regular users (aka
developers)

we’re hiring
FOTOPEDIA
             WHAT DO WE DO ?
Images for Humanity

Open to anyone, amateur or professional

Creative Commons aware

Beautiful Wikipedia (http://www.fotopedia.com)

iPad tablebooks (iPhone too): Heritage, National Parks and
Memory of Color
INFRASTRUCTURE


Based on Amazon Web Services

Around 20 servers located in the US datacenters

Use centralized deployment procedure (Chef)

Deploy at least once a week with no downtime
KEY TECHNOLOGIES

Ruby on Rails (with REE)   Lackr (in house java proxy)


Unicorn                    Sinatra


Varnish                    Redis and Resque


HAProxy                    MySQL


NGinx                      MongoDB
MONGODB AT FOTOPEDIA
«C:\Utilisateurs\fotopedia\Mes Documents»
CURRENT STATE OF OUR ART



Last year's talk about our MongoDB-powered metacache

Store complete Wikipedia data in > 10 languages

Since spring 2010, all new database-centric features have
been developed with MongoDB

Our goal: slowly migrate all DB features to MongoDB
whenever possible
MYSQL MIGRATIONS
[Chart: number of "Alter table" migrations per quarter, from 08/Q3 to 2011; y-axis from 0 to 30]
OUR SETUP

4 clusters (business data, log and reporting, wikipedia, and
one more)

3 EC2 XL virtual machines hosting 5 replica sets

at the current time, one machine is master on all replica sets

each of the 5 replica sets is allocated to one of the clusters

every instance holds the 4 mongos
SOME FIGURES


in production since September 2009

wikipedia data: wikipedia/en: 5GB, 8M documents (and
about 10 other languages), batch load: 17k insert/s

webcache: 2GB, 11M records, avg 60 op/s, peak 300 op/s

overall, average 250 op/s
jm3




LOGGING
 «l’oeil du lynx»
ORIGINAL PHILOSOPHY

 Log everything, don’t delete

 Collected by Scribe

 Comprehensive daily log stored in AWS S3

 Hadoop jobs to generate statistics

 grep and its merry friends for investigating issues

Quite efficient, but cumbersome and slow
WHY IMPROVE


Issue analysis in realtime (debugging)

Realtime activity analysis

  Traffic spikes

  Misbehaving crawlers and other suspicious activity
ORIGINAL STACK LAYOUT
Stefano Constanzo




HOW WE SOLVED THIS ISSUE
      «démons et merveilles»
NORMALIZED LOG FORMAT

{ "_id" : ObjectId("4d7e11cc7ea68d34fb01f2ac2"),
  "facility" : "varnish",
  "instance" : "a01",
  "date" : NumberLong("1300107724534"),
  "http_host" : "www.fotopedia.com",
  "method" : "GET",
  "http_version" : "HTTP/1.1",
  "path" : "/albums/fotopedia-fr-Cath%C3%A9drale_m%C3%A9tropolitaine_de_Buenos_Aires",
  "status" : "404",
  "size" : 13,
  "elapsed" : 0.00007748600182821974 }
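
As a rough illustration of how a raw access-log line could be turned into a document of this shape, here is a minimal Python sketch. The input line format, the regular expression and the function name `normalize` are assumptions made for this example; the slides do not show the collectors' actual parsing code.

```python
import re

# Hypothetical input format (not taken from the slides):
#   <epoch_seconds> <host> "<METHOD> <path> <HTTP-version>" <status> <bytes> <elapsed_seconds>
LINE_RE = re.compile(
    r'^(?P<ts>\d+(?:\.\d+)?) (?P<host>\S+) '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<version>HTTP/\d\.\d)" '
    r'(?P<status>\d{3}) (?P<size>\d+) (?P<elapsed>\d+(?:\.\d+)?)$'
)


def normalize(facility, instance, line):
    """Turn one raw log line into a document shaped like the example above."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # unparseable line; the caller decides what to do with it
    return {
        "facility": facility,
        "instance": instance,
        "date": int(round(float(m["ts"]) * 1000)),  # milliseconds since the epoch
        "http_host": m["host"],
        "method": m["method"],
        "http_version": m["version"],
        "path": m["path"],
        "status": m["status"],  # kept as a string, as in the example document
        "size": int(m["size"]),
        "elapsed": float(m["elapsed"]),
    }
```

Every facility producing the same field names is what lets a single query cover Varnish, NGinx and HAProxy entries alike.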
LOG COLLECTING

File logging daemons (NGinx, HAProxy)

  Ruby tailer script

Memory logging daemons (Varnish)

  Dedicated binary that streams varnish SHM into MongoDB

Other Daemons (Lackr, Picor)

  Extended logging system to store data in MongoDB

  also log ruby exceptions into MongoDB
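
The slides mention a Ruby tailer script for file-based daemons but do not show it. The Python sketch below only illustrates the general idea of a tailer, under assumptions: the function name `read_new_lines` is hypothetical, the file is opened in binary mode by the caller, and log rotation is not handled.

```python
def read_new_lines(fh):
    """Return the complete lines appended to fh since the previous call.

    fh must be a file opened in binary mode ("rb"). A trailing partial
    line is left unread until the writer finishes it.
    """
    lines = []
    while True:
        pos = fh.tell()
        raw = fh.readline()
        if not raw:
            break  # nothing more to read for now
        if not raw.endswith(b"\n"):
            fh.seek(pos)  # partial line: rewind and retry on the next call
            break
        lines.append(raw[:-1].decode("utf-8", errors="replace"))
    return lines
```

A collector would call this in a loop with a short sleep, normalize each returned line, and push the resulting documents to the local mongos.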
MONGO SHARDING


All servers host the «logs» mongos on port 27002.

All daemons push their logs to «localhost:27002»

The actual storage is a capped collection in a non-sharded
database.
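
A capped collection keeps documents in insertion order and overwrites the oldest ones once it is full. The Python sketch below is a toy in-memory model of that behaviour, not MongoDB code: the class name `CappedLog` is hypothetical, and it is bounded by document count for simplicity, whereas a real capped collection is bounded by size in bytes (with an optional document limit).

```python
from collections import deque


class CappedLog:
    """Toy in-memory model of a capped collection, bounded by document count."""

    def __init__(self, max_docs):
        # A deque with maxlen silently drops the oldest item when full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def tail(self, n):
        """Return up to n of the most recent documents, oldest first."""
        return list(self._docs)[-n:] if n > 0 else []

    def __len__(self):
        return len(self._docs)
```

The appeal for logging is that writers never need to delete anything; the cost is that retention is fixed by the cap rather than by age.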
CURRENT STACK LAYOUT
Jesús García Ferrer




LOG USAGE ON A DAILY BASIS
    «l’aiguille dans la meule de sapin»
SAPIN: EXCEPTION LOGGING

        View Latest Errors
SAPIN: EXCEPTION LOGGING

                     Useful information:


                 •Source url and parameters

                 •Date and time

                 •Browser identifiers (IP, cookie
                 values, User-Agent)


                 •Full stack dump

                 •Full headers dump

                 •Full user model dump
SAPIN: EXCEPTION LOGGING

       Searching in Exceptions
RAMPLR: SAMPLING ANALYSIS




Sample analysis
SAPIN: REALTIME LOGGING


jQuery-ui based interface

Sinatra-backed

Filter by Facility

Searchable criteria: IP Address, Follow Operation-ID

Display HTTP execution Timeline
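
The search criteria listed here map naturally onto a MongoDB query document. The Python sketch below shows one way such a filter could be assembled; it is an illustration only. The `facility` field comes from the normalized log format, while the `ip` and `operation_id` field names, and both function names, are assumptions that do not appear in the slides.

```python
def build_filter(facility=None, ip=None, operation_id=None):
    """Build a MongoDB-style query document from the optional criteria."""
    query = {}
    if facility is not None:
        query["facility"] = facility
    if ip is not None:
        query["ip"] = ip  # hypothetical field name
    if operation_id is not None:
        query["operation_id"] = operation_id  # hypothetical field name
    return query


def matches(doc, query):
    """Equality-only matching, enough to try a filter without a database."""
    return all(doc.get(key) == value for key, value in query.items())
```

With a driver, the dictionary returned by `build_filter` would be passed as the query argument of a find call on the log collection.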
SAPIN: REALTIME LOGGING

        Facility Filtering
SAPIN: REALTIME LOGGING

         Url Filtering
SAPIN: REALTIME LOGGING

       IP Address Filtering
SAPIN: REALTIME LOGGING

       Operation ID Filtering
SAPIN: REALTIME LOGGING

        Timeline display
ISSUE WITH MONGODB


Scalability of using a capped collection

  Official doc says no indices

Size limit vs index efficiency (400 000 lines hold < 2 hours of
logs): our plan is to keep 2 days' worth of logs.
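
To give a sense of scale for the 2-day target, the short Python calculation below extrapolates from the figures on this slide. It assumes a steady logging rate, which the slides do not claim; since 400 000 lines cover less than 2 hours, the result is a lower bound.

```python
LINES_OBSERVED = 400_000  # from the slide
WINDOW_HOURS = 2          # the slide says "< 2 hours", so the real rate is higher
TARGET_HOURS = 48         # 2 days


def min_lines_needed(lines, window_hours, target_hours):
    """Lower bound on lines to retain, assuming a steady logging rate."""
    return lines * target_hours // window_hours


needed = min_lines_needed(LINES_OBSERVED, WINDOW_HOURS, TARGET_HOURS)
print(needed)  # prints 9600000
```

That is at least 24 times the current capacity, which is why index efficiency on the collection becomes the limiting concern.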
The Library of Congress




FUTURE WORK
 «vers l’infini et au-delà»
FUTURE WORK

Leaner interface

  Ugly and jquery-ui based. Should switch to Sencha
  framework

Keep more logs

  Abandon capped collections

  Keep logs longer, one collection per day (?)
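
The one-collection-per-day idea is only floated in the slides, with a question mark. As a sketch of what it could look like, the Python code below derives a daily collection name from the `date` field (milliseconds since the epoch) and lists the collections that have aged out. The `logs_` prefix, the UTC day boundary and both function names are assumptions made for this example.

```python
from datetime import date, datetime, timedelta, timezone


def collection_for(date_ms, prefix="logs_"):
    """Name of the daily collection a document with this timestamp belongs to."""
    day = datetime.fromtimestamp(date_ms / 1000, tz=timezone.utc)
    return prefix + day.strftime("%Y%m%d")


def expired_collections(names, today, keep_days, prefix="logs_"):
    """Daily collections dated before (today - keep_days), sorted by name."""
    cutoff = (today - timedelta(days=keep_days)).strftime("%Y%m%d")
    return sorted(
        name for name in names
        if name.startswith(prefix) and name[len(prefix):] < cutoff
    )


print(collection_for(1300107724534))  # prints logs_20110314
```

Dropping a whole expired collection replaces the implicit eviction of a capped collection, and each daily collection can carry ordinary indexes.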
Great Beyond




QUESTIONS ?
 «je vous dis : au revoir.»

Mais conteúdo relacionado

Mais procurados

«Scrapy internals» Александр Сибиряков, Scrapinghub
«Scrapy internals» Александр Сибиряков, Scrapinghub«Scrapy internals» Александр Сибиряков, Scrapinghub
«Scrapy internals» Александр Сибиряков, Scrapinghubit-people
 
Embulk and Machine Learning infrastructure
Embulk and Machine Learning infrastructureEmbulk and Machine Learning infrastructure
Embulk and Machine Learning infrastructureHiroshi Toyama
 
To Hire, or to train, that is the question (Percona Live 2014)
To Hire, or to train, that is the question (Percona Live 2014)To Hire, or to train, that is the question (Percona Live 2014)
To Hire, or to train, that is the question (Percona Live 2014)Geoffrey Anderson
 
PySpark with Juypter
PySpark with JuypterPySpark with Juypter
PySpark with JuypterLi Ming Tsai
 
Using Sphinx for Search in PHP
Using Sphinx for Search in PHPUsing Sphinx for Search in PHP
Using Sphinx for Search in PHPMike Lively
 
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...DataStax Academy
 
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Jeremy Zawodny
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com琛琳 饶
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableShu Ting Tseng
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Ontico
 
Real time indexes in Sphinx, Yaroslav Vorozhko
Real time indexes in Sphinx, Yaroslav VorozhkoReal time indexes in Sphinx, Yaroslav Vorozhko
Real time indexes in Sphinx, Yaroslav VorozhkoFuenteovejuna
 
Logging logs with Logstash - Devops MK 10-02-2016
Logging logs with Logstash - Devops MK 10-02-2016Logging logs with Logstash - Devops MK 10-02-2016
Logging logs with Logstash - Devops MK 10-02-2016Steve Howe
 
Back to Basics Spanish 4 Introduction to sharding
Back to Basics Spanish 4 Introduction to shardingBack to Basics Spanish 4 Introduction to sharding
Back to Basics Spanish 4 Introduction to shardingMongoDB
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기NAVER D2
 
Async and Non-blocking IO w/ JRuby
Async and Non-blocking IO w/ JRubyAsync and Non-blocking IO w/ JRuby
Async and Non-blocking IO w/ JRubyJoe Kutner
 
Node.js and Cassandra
Node.js and CassandraNode.js and Cassandra
Node.js and CassandraStratio
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Jeremy Zawodny
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLNguyen Van Vuong
 

Mais procurados (20)

«Scrapy internals» Александр Сибиряков, Scrapinghub
«Scrapy internals» Александр Сибиряков, Scrapinghub«Scrapy internals» Александр Сибиряков, Scrapinghub
«Scrapy internals» Александр Сибиряков, Scrapinghub
 
Embulk and Machine Learning infrastructure
Embulk and Machine Learning infrastructureEmbulk and Machine Learning infrastructure
Embulk and Machine Learning infrastructure
 
To Hire, or to train, that is the question (Percona Live 2014)
To Hire, or to train, that is the question (Percona Live 2014)To Hire, or to train, that is the question (Percona Live 2014)
To Hire, or to train, that is the question (Percona Live 2014)
 
PySpark with Juypter
PySpark with JuypterPySpark with Juypter
PySpark with Juypter
 
Using Sphinx for Search in PHP
Using Sphinx for Search in PHPUsing Sphinx for Search in PHP
Using Sphinx for Search in PHP
 
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
 
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loading
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
 
Real time indexes in Sphinx, Yaroslav Vorozhko
Real time indexes in Sphinx, Yaroslav VorozhkoReal time indexes in Sphinx, Yaroslav Vorozhko
Real time indexes in Sphinx, Yaroslav Vorozhko
 
Logging logs with Logstash - Devops MK 10-02-2016
Logging logs with Logstash - Devops MK 10-02-2016Logging logs with Logstash - Devops MK 10-02-2016
Logging logs with Logstash - Devops MK 10-02-2016
 
Back to Basics Spanish 4 Introduction to sharding
Back to Basics Spanish 4 Introduction to shardingBack to Basics Spanish 4 Introduction to sharding
Back to Basics Spanish 4 Introduction to sharding
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Async and Non-blocking IO w/ JRuby
Async and Non-blocking IO w/ JRubyAsync and Non-blocking IO w/ JRuby
Async and Non-blocking IO w/ JRuby
 
Node.js and Cassandra
Node.js and CassandraNode.js and Cassandra
Node.js and Cassandra
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQL
 

Destaque

Mongoose v3 :: The Future is Bright
Mongoose v3 :: The Future is BrightMongoose v3 :: The Future is Bright
Mongoose v3 :: The Future is Brightaaronheckmann
 
Becoming Node.js ninja on Cloud Foundry
Becoming Node.js ninja on Cloud FoundryBecoming Node.js ninja on Cloud Foundry
Becoming Node.js ninja on Cloud FoundryRaja Rao DV
 
Testing nodejs apps
Testing nodejs appsTesting nodejs apps
Testing nodejs appsfelipefsilva
 
[C5]deview 2012 nodejs
[C5]deview 2012 nodejs[C5]deview 2012 nodejs
[C5]deview 2012 nodejsNAVER D2
 
소셜게임 서버 개발 관점에서 본 Node.js의 장단점과 대안
소셜게임 서버 개발 관점에서 본 Node.js의 장단점과 대안소셜게임 서버 개발 관점에서 본 Node.js의 장단점과 대안
소셜게임 서버 개발 관점에서 본 Node.js의 장단점과 대안Jeongsang Baek
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
Module, AMD, RequireJS
Module, AMD, RequireJSModule, AMD, RequireJS
Module, AMD, RequireJS偉格 高
 
Asynchronous Module Definition (AMD)
Asynchronous Module Definition (AMD)Asynchronous Module Definition (AMD)
Asynchronous Module Definition (AMD)xMartin12
 
HTML5 Real-Time and Connectivity
HTML5 Real-Time and ConnectivityHTML5 Real-Time and Connectivity
HTML5 Real-Time and ConnectivityPeter Lubbers
 
How We Use MongoDB in Our Advertising System
How We Use MongoDB in Our Advertising SystemHow We Use MongoDB in Our Advertising System
How We Use MongoDB in Our Advertising SystemMongoDB
 
Build Your Own Custom Mobile Analytics with Node and MongoDB
Build Your Own Custom Mobile Analytics with Node and MongoDBBuild Your Own Custom Mobile Analytics with Node and MongoDB
Build Your Own Custom Mobile Analytics with Node and MongoDBMongoDB
 
[110730/아꿈사발표자료] mongo db 완벽 가이드 : 7장 '고급기능'
[110730/아꿈사발표자료] mongo db 완벽 가이드 : 7장 '고급기능'[110730/아꿈사발표자료] mongo db 완벽 가이드 : 7장 '고급기능'
[110730/아꿈사발표자료] mongo db 완벽 가이드 : 7장 '고급기능'sung ki choi
 
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...MongoSF
 
MongoDB World 2016 Giant Ideas Stage eBook
MongoDB World 2016 Giant Ideas Stage eBookMongoDB World 2016 Giant Ideas Stage eBook
MongoDB World 2016 Giant Ideas Stage eBookMongoDB
 
Social Analytics with MongoDB
Social Analytics with MongoDBSocial Analytics with MongoDB
Social Analytics with MongoDBPatrick Stokes
 
영속성 컨텍스트로 보는 JPA
영속성 컨텍스트로 보는 JPA영속성 컨텍스트로 보는 JPA
영속성 컨텍스트로 보는 JPA경원 이
 
Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기JangHyuk You
 

Destaque (20)

Mongoose v3 :: The Future is Bright
Mongoose v3 :: The Future is BrightMongoose v3 :: The Future is Bright
Mongoose v3 :: The Future is Bright
 
Grid FS
Grid FSGrid FS
Grid FS
 
The SPDY Protocol
The SPDY ProtocolThe SPDY Protocol
The SPDY Protocol
 
Becoming Node.js ninja on Cloud Foundry
Becoming Node.js ninja on Cloud FoundryBecoming Node.js ninja on Cloud Foundry
Becoming Node.js ninja on Cloud Foundry
 
Testing nodejs apps
Testing nodejs appsTesting nodejs apps
Testing nodejs apps
 
[C5]deview 2012 nodejs
[C5]deview 2012 nodejs[C5]deview 2012 nodejs
[C5]deview 2012 nodejs
 
소셜게임 서버 개발 관점에서 본 Node.js의 장단점과 대안
소셜게임 서버 개발 관점에서 본 Node.js의 장단점과 대안소셜게임 서버 개발 관점에서 본 Node.js의 장단점과 대안
소셜게임 서버 개발 관점에서 본 Node.js의 장단점과 대안
 
RESTful API Design, Second Edition
RESTful API Design, Second EditionRESTful API Design, Second Edition
RESTful API Design, Second Edition
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
Module, AMD, RequireJS
Module, AMD, RequireJSModule, AMD, RequireJS
Module, AMD, RequireJS
 
Asynchronous Module Definition (AMD)
Asynchronous Module Definition (AMD)Asynchronous Module Definition (AMD)
Asynchronous Module Definition (AMD)
 
HTML5 Real-Time and Connectivity
HTML5 Real-Time and ConnectivityHTML5 Real-Time and Connectivity
HTML5 Real-Time and Connectivity
 
How We Use MongoDB in Our Advertising System
How We Use MongoDB in Our Advertising SystemHow We Use MongoDB in Our Advertising System
How We Use MongoDB in Our Advertising System
 
Build Your Own Custom Mobile Analytics with Node and MongoDB
Build Your Own Custom Mobile Analytics with Node and MongoDBBuild Your Own Custom Mobile Analytics with Node and MongoDB
Build Your Own Custom Mobile Analytics with Node and MongoDB
 
[110730/아꿈사발표자료] mongo db 완벽 가이드 : 7장 '고급기능'
[110730/아꿈사발표자료] mongo db 완벽 가이드 : 7장 '고급기능'[110730/아꿈사발표자료] mongo db 완벽 가이드 : 7장 '고급기능'
[110730/아꿈사발표자료] mongo db 완벽 가이드 : 7장 '고급기능'
 
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
Real time ecommerce analytics with MongoDB at Gilt Groupe (Michael Bryzek & M...
 
MongoDB World 2016 Giant Ideas Stage eBook
MongoDB World 2016 Giant Ideas Stage eBookMongoDB World 2016 Giant Ideas Stage eBook
MongoDB World 2016 Giant Ideas Stage eBook
 
Social Analytics with MongoDB
Social Analytics with MongoDBSocial Analytics with MongoDB
Social Analytics with MongoDB
 
영속성 컨텍스트로 보는 JPA
영속성 컨텍스트로 보는 JPA영속성 컨텍스트로 보는 JPA
영속성 컨텍스트로 보는 JPA
 
Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기
 

Semelhante a MongoFr : MongoDB as a log Collector

AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraDataStax Academy
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLArnab Biswas
 
J-Day Kraków: Listen to the sounds of your application
J-Day Kraków: Listen to the sounds of your applicationJ-Day Kraków: Listen to the sounds of your application
J-Day Kraków: Listen to the sounds of your applicationMaciej Bilas
 
Webdevcon Keynote hh-2012-09-18
Webdevcon Keynote hh-2012-09-18Webdevcon Keynote hh-2012-09-18
Webdevcon Keynote hh-2012-09-18Pierre Joye
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData
 
Running High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioRunning High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioiguazio
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...DataStax Academy
 
Using MongoDB to Build a Fast and Scalable Content Repository
Using MongoDB to Build a Fast and Scalable Content RepositoryUsing MongoDB to Build a Fast and Scalable Content Repository
Using MongoDB to Build a Fast and Scalable Content RepositoryMongoDB
 
Build your own discovery index of scholary e-resources
Build your own discovery index of scholary e-resourcesBuild your own discovery index of scholary e-resources
Build your own discovery index of scholary e-resourcesMartin Czygan
 
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-BayesOSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-BayesNETWAYS
 
Analyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkAnalyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkNicola Ferraro
 
Log everything!
Log everything!Log everything!
Log everything!ICANS GmbH
 
TenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingChen-en Lu
 
OSOM Operations in the Cloud
OSOM Operations in the CloudOSOM Operations in the Cloud
OSOM Operations in the Cloudmstuparu
 
OSOM - Operations in the Cloud
OSOM - Operations in the CloudOSOM - Operations in the Cloud
OSOM - Operations in the CloudMarcela Oniga
 
Scaling with Symfony - PHP UK
Scaling with Symfony - PHP UKScaling with Symfony - PHP UK
Scaling with Symfony - PHP UKRicard Clau
 
Rapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRick Copeland
 
PHP is the king, nodejs is the prince and Python is the fool - Alessandro Cin...
PHP is the king, nodejs is the prince and Python is the fool - Alessandro Cin...PHP is the king, nodejs is the prince and Python is the fool - Alessandro Cin...
PHP is the king, nodejs is the prince and Python is the fool - Alessandro Cin...Codemotion
 
PHP is the King, nodejs the prince and python the fool
PHP is the King, nodejs the prince and python the foolPHP is the King, nodejs the prince and python the fool
PHP is the King, nodejs the prince and python the foolAlessandro Cinelli (cirpo)
 

Semelhante a MongoFr : MongoDB as a log Collector (20)

AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
J-Day Kraków: Listen to the sounds of your application
J-Day Kraków: Listen to the sounds of your applicationJ-Day Kraków: Listen to the sounds of your application
J-Day Kraków: Listen to the sounds of your application
 
Webdevcon Keynote hh-2012-09-18
Webdevcon Keynote hh-2012-09-18Webdevcon Keynote hh-2012-09-18
Webdevcon Keynote hh-2012-09-18
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017
 
Running High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioRunning High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclio
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 
Using MongoDB to Build a Fast and Scalable Content Repository
Using MongoDB to Build a Fast and Scalable Content RepositoryUsing MongoDB to Build a Fast and Scalable Content Repository
Using MongoDB to Build a Fast and Scalable Content Repository
 
Build your own discovery index of scholary e-resources
Build your own discovery index of scholary e-resourcesBuild your own discovery index of scholary e-resources
Build your own discovery index of scholary e-resources
 
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-BayesOSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
 
Analyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkAnalyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache Spark
 
Log everything!
Log everything!Log everything!
Log everything!
 
TenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience Sharing
 
OSOM Operations in the Cloud
OSOM Operations in the CloudOSOM Operations in the Cloud
OSOM Operations in the Cloud
 
OSOM - Operations in the Cloud
OSOM - Operations in the CloudOSOM - Operations in the Cloud
OSOM - Operations in the Cloud
 
Scaling with Symfony - PHP UK
Scaling with Symfony - PHP UKScaling with Symfony - PHP UK
Scaling with Symfony - PHP UK
 
Rapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and PythonRapid, Scalable Web Development with MongoDB, Ming, and Python
Rapid, Scalable Web Development with MongoDB, Ming, and Python
 
PHP is the king, nodejs is the prince and Python is the fool - Alessandro Cin...
PHP is the king, nodejs is the prince and Python is the fool - Alessandro Cin...PHP is the king, nodejs is the prince and Python is the fool - Alessandro Cin...
PHP is the king, nodejs is the prince and Python is the fool - Alessandro Cin...
 
PHP is the King, nodejs the prince and python the fool
PHP is the King, nodejs the prince and python the foolPHP is the King, nodejs the prince and python the fool
PHP is the King, nodejs the prince and python the fool
 

Último

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

MongoFr: MongoDB as a Log Collector

  • 2. MONGODB AS A LOG COLLECTOR
      • Pierre Baillet & Mathieu Poumeyrol (oct & kali @ fotopedia.com)
      • Photo by Jean-Michel BAUD
  • 3. DB.SLIDES.FIND({'TYPE':'TITLE'})
      • Fotopedia: who we are, what we do, how we do it
      • MongoDB at Fotopedia: the current state of our art
      • Logging: the answer to life, the universe and everything
      • How we fulfilled this need
      • Log usage on a daily basis
      • Future work
  • 5. FOTOPEDIA: WHO ARE WE?
      • Company created in 2006
      • Located in Paris, near the Opéra
      • 17 people, including 8 regular MongoDB users (aka developers)
      • We're hiring
  • 6. FOTOPEDIA: WHAT DO WE DO?
      • Images for Humanity
      • Open to anyone, amateur or professional
      • Creative Commons aware
      • Beautiful Wikipedia (http://www.fotopedia.com)
      • iPad tablebooks (iPhone too): Heritage, National Parks and Memory of Color
  • 7. INFRASTRUCTURE
      • Based on Amazon Web Services
      • Around 20 servers located in the US datacenters
      • Centralized deployment procedure (Chef)
      • Deploy at least once a week with no downtime
  • 8. KEY TECHNOLOGIES
      • Ruby on Rails (with REE)
      • Unicorn
      • Varnish
      • HAProxy
      • Nginx
      • Lackr (in-house Java proxy)
      • Sinatra
      • Redis and Resque
      • MySQL
      • MongoDB
  • 10. CURRENT STATE OF OUR ART
      • Last year's talk covered our MongoDB-powered metacache
      • Complete Wikipedia data stored in more than 10 languages
      • Since spring 2010, all new database-centric features have been developed with MongoDB
      • Our goal: slowly migrate all DB features to MongoDB whenever possible
  • 11. MYSQL MIGRATIONS
      • [Chart: "Alter table" series by quarter, 08/Q3 through 2011; y-axis from 0 to 30]
  • 12. OUR SETUP
      • 4 clusters (business data, log and reporting, wikipedia, and one more)
      • 3 EC2 XL virtual machines hosting 5 replica sets
      • At the current time, one machine is master on all replica sets
      • Each of the 5 replica sets is allocated to one of the clusters
      • Every instance hosts the 4 mongos
  • 13. SOME FIGURES
      • In production since September 2009
      • Wikipedia data: wikipedia/en is 5 GB and 8M documents (plus about 10 other languages); batch load runs at 17k inserts/s
      • Webcache: 2 GB, 11M records, average 60 op/s, peak 300 op/s
      • Overall: average 250 op/s
  • 15. ORIGINAL PHILOSOPHY
      • Log everything, don't delete
      • Collected by Scribe
      • Comprehensive daily log stored in AWS S3
      • Hadoop jobs to generate statistics
      • grep and his merry friends for investigating issues
      • Quite efficient, but cumbersome and slow
  • 16. WHY IMPROVE
      • Issue analysis in realtime (debugging)
      • Realtime activity analysis
      • Traffic spikes
      • Misbehaving crawlers and other suspicious activity
  • 18. HOW WE SOLVED THIS ISSUE: «demons and wonders» (Stefano Constanzo)
  • 19. NORMALIZED LOG FORMAT
      {
        "_id" : ObjectId("4d7e11cc7ea68d34fb01f2ac2"),
        "facility" : "varnish",
        "instance" : "a01",
        "date" : NumberLong("1300107724534"),
        "http_host" : "www.fotopedia.com",
        "method" : "GET",
        "http_version" : "HTTP/1.1",
        "path" : "/albums/fotopedia-fr-Cath%C3%A9drale_m%C3%A9tropolitaine_de_Buenos_Aires",
        "status" : "404",
        "size" : 13,
        "elapsed" : 0.00007748600182821974
      }
  • 20. LOG COLLECTING
      • File logging daemons (Nginx, HAProxy): Ruby tailer script
      • Memory logging daemons (Varnish): dedicated binary that streams the Varnish SHM into MongoDB
      • Other daemons (Lackr, Picor): extended logging system to store data in MongoDB
      • Ruby exceptions are also logged into MongoDB
  • 21. MONGO SHARDING
      • All servers host the "logs" mongos on port 27002
      • All daemons push their logs to "localhost:27002"
      • The actual storage is a capped collection in a non-sharded database
  • 23. LOG USAGE ON A DAILY BASIS: «the needle in the fir stack» (Jesús García Ferrer)
  • 24. SAPIN: EXCEPTION LOGGING
      • View latest errors
  • 25. SAPIN: EXCEPTION LOGGING
      • Useful information:
      • Source URL and parameters
      • Date and time
      • Browser identifiers (IP, cookie values, User-Agent)
      • Full stack dump
      • Full headers dump
      • Full user model dump
  • 26. SAPIN: EXCEPTION LOGGING
      • Searching in exceptions
  • 28. SAPIN: REALTIME LOGGING
      • jQuery UI based interface
      • Sinatra backed
      • Filter by facility
      • Searchable criteria: IP address, follow an operation ID
      • Display the HTTP execution timeline
  • 29. SAPIN: REALTIME LOGGING
      • Facility filtering
  • 30. SAPIN: REALTIME LOGGING
      • URL filtering
  • 31. SAPIN: REALTIME LOGGING
      • IP address filtering
  • 32. SAPIN: REALTIME LOGGING
      • Operation ID filtering
  • 33. SAPIN: REALTIME LOGGING
      • Timeline display
  • 34. ISSUE WITH MONGODB
      • Scalability of using a capped collection
      • The official doc says no indices
      • Size limit versus index efficiency: 400,000 lines hold less than 2 hours of logs, and our plan is to keep 2 days' worth
  • 35. FUTURE WORK: «to infinity and beyond» (The Library of Congress)
  • 36. FUTURE WORK
      • Leaner interface: the current one is ugly and jQuery UI based; we should switch to the Sencha framework
      • Keep more logs: abandon capped collections and keep logs longer, one collection per day(?)
  • 37. QUESTIONS? «I bid you: farewell.» (Great Beyond)
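
As an illustration of slides 19, 20 and 34, here is a minimal sketch of what a tailer might do with each line before pushing it to the "logs" mongos. It is not the authors' Ruby script: the raw line layout, the names `LINE_RE`, `normalize` and `lines_needed`, and the reading of "date" as milliseconds since the epoch are all assumptions made for the example. The MongoDB insert itself is left out so the block runs with the standard library only.

```python
import re
import time

# Hypothetical raw line layout (an assumption, not Fotopedia's real format):
#   <host> "<METHOD> <path> <HTTP version>" <status> <size> <elapsed seconds>
LINE_RE = re.compile(
    r'^(?P<http_host>\S+) '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<http_version>HTTP/\d\.\d)" '
    r'(?P<status>\d{3}) (?P<size>\d+) (?P<elapsed>\d+(?:\.\d+)?)$'
)


def normalize(line, facility, instance, now_ms=None):
    """Turn one raw log line into a dict shaped like the slide 19 document.

    Returns None when the line does not match, so a tailer can skip it.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    if now_ms is None:
        # Assumed unit: milliseconds since the epoch, like the "date" field.
        now_ms = int(time.time() * 1000)
    return {
        "facility": facility,
        "instance": instance,
        "date": now_ms,
        "http_host": match["http_host"],
        "method": match["method"],
        "http_version": match["http_version"],
        "path": match["path"],
        "status": match["status"],  # kept as a string, as on the slide
        "size": int(match["size"]),
        "elapsed": float(match["elapsed"]),
    }


def lines_needed(observed_lines, observed_hours, target_hours):
    """Scale an observed line count linearly to a longer retention window."""
    return int(observed_lines * target_hours / observed_hours)


doc = normalize(
    'www.fotopedia.com "GET /albums/example HTTP/1.1" 404 13 0.000077',
    facility="varnish",
    instance="a01",
    now_ms=1300107724534,
)
print(doc)

# Slide 34: 400,000 lines cover less than 2 hours; the goal is 2 days (48 hours).
print(lines_needed(400_000, 2, 48))
```

Because the 400,000 lines cover less than 2 hours, the figure printed by `lines_needed` is a lower bound on what a capped collection would have to hold for 2 days of logs, assuming traffic stays roughly constant.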

Editor's Notes

  2. Pierre Baillet, server architect; Mathieu Poumeyrol, director of cloud engineering
  5. Next slide is what we do
  6. Next slide is about how we do it
  7. Next slide is about key technologies
  8. Last slide of the section
  11. Last slide of the section
  15. Next slide is why we should improve
  16. Next slide shows the original logging layout
  17. Last slide of the section
  22. Last slide of the section
  24. Details on next slide
  25. Searching in exceptions is on the next slide
  26. Next slide is about sampling and ramplr
  27. Next slide is about technologies used in Sapin
  28. Next slide is about facility filtering
  29. Describe Sapin facility: column selection, reloading, list of facilities. Next slide is about URL filtering
  30. Next slide is about URL filtering
  31. Next slide details an op-id session
  32. Next slide shows a timeline
  33. Next slide is about current issues
  34. Last slide of the section
  36. Last slide of the presentation and of the section before the questions