MongoDB @ SFR
sfr.fr
Welcome




Antoine Raith, technical team leader @ SFR
Apache, Tomcat, JEE

1 shared platform

30 physical application servers

150 Tomcat instances deployed




Web development at Internet Direction
22M pageviews per day

4.5M on the homepage alone

8M customer authentications per day




We NEED to scale!

What do we face ?
Increase our scalability

 Avoid Schema/Table/Column dependency

 Closer to the developer team than to the sysadmin or DBA
 team




NoSQL?
Scalable
Complex queries
Schema-less
Easy deployment and monitoring
Open-Source


Why MongoDB ?
[Live project] customers data

[Live project] sfr.fr targeted ads

[Development project] Products catalog




Our projects based on MongoDB
MongoDB @ SFR
Customer Data
Hello!




Jérôme Leleu, web architect @ SFR
In charge of SSO and user profile service
User profile service (UPS)

Web services (SOAP or JSON)

Get the profile of SFR clients

Data is aggregated from many backends of the information
system




Context
Java 1.6, mongo driver 2.6.5, replica set + sharding

Technical data: « local storage » collection
■   only 1 collection in the database
■   « last connection date » of each web account
■   14 million documents
■   reads/writes by web account identifier (the shard key)



Some functional data is coming: « internautes » (web users)
collection (6 million documents)…




Data in UPS
My choice: read from the slave and write (without
acknowledgement) to the master

The « local storage » collection needs to be readable
immediately after a write

-> not really compatible with asynchronous replication and
reads from the slave

-> use memcached (as for most data in UPS) as a cache for
reads (giving replication time to happen)




Implementation in MongoDB
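The read path on this slide can be sketched with a small in-memory simulation. This is not the UPS code: the class `LocalStorageService`, its three dictionaries (stand-ins for the master, a lagging slave and memcached) and the manual `replicate()` step are all assumptions made for illustration.

```python
class LocalStorageService:
    """Toy model: writes go to the master, reads try the cache, then a slave."""

    def __init__(self):
        self.master = {}  # stands in for the primary (write target)
        self.slave = {}   # stands in for a secondary that lags behind
        self.cache = {}   # stands in for memcached

    def write(self, account_id, last_connection):
        # Unacknowledged write to the master, plus a cache update so the
        # value is readable immediately.
        self.master[account_id] = last_connection
        self.cache[account_id] = last_connection

    def replicate(self):
        # Asynchronous replication, triggered by hand in this sketch.
        self.slave.update(self.master)

    def read(self, account_id):
        # Cache first; fall back to the (possibly stale) slave.
        if account_id in self.cache:
            return self.cache[account_id]
        return self.slave.get(account_id)


service = LocalStorageService()
service.write("account-42", "2012-03-01T10:00:00")
fresh = service.read("account-42")         # served by the cache
stale = service.slave.get("account-42")    # the slave has not caught up yet
service.replicate()
service.cache.clear()                      # e.g. the cache entry expired
after_expiry = service.read("account-42")  # now served by the slave
```

The cache only hides replication lag for as long as its entries outlive that lag, which is why the replication time matters.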
2 GB of data and 2 GB of indexes for 14 million documents
(from « db.stats(); »)

Inserts / updates: 600 k per day / communication exceptions:
6 k per day
Average insert/update time: 56 ms




Some figures
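A quick back-of-the-envelope reading of these figures, as a sketch. It assumes decimal gigabytes and a load spread evenly over 24 hours, neither of which the slide states.

```python
# Figures quoted on the slide
data_bytes = 2 * 10**9        # 2 GB of data (decimal gigabytes assumed)
index_bytes = 2 * 10**9       # 2 GB of indexes
documents = 14_000_000
writes_per_day = 600_000
exceptions_per_day = 6_000

bytes_per_document = data_bytes / documents            # about 143 bytes
index_bytes_per_document = index_bytes / documents     # same order of magnitude
writes_per_second = writes_per_day / 86_400            # about 6.9 on average
exception_rate = exceptions_per_day / writes_per_day   # 1 write in 100

print(round(bytes_per_document), round(writes_per_second, 1), exception_rate)
```

So the documents are small, the indexes weigh as much as the data, and roughly 1% of writes end in a communication exception.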
Default values of the Java mongo driver are inappropriate:
unlimited connect timeout, unlimited read timeout, 120-second
wait to get a connection from the pool!

Can't make an « AND » query on the same field
before mongo 2.0

Is it a good choice to read from the slave / write to the master?
Replication time? Is it a real use case?
Should it be replaced by:
forcing acknowledgement of writes and reading from the slave?
OR
not acknowledging writes and reading from the master?



Problems & pending questions
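One way to avoid relying on driver defaults is to state the three timeouts explicitly. The sketch below only builds a MongoDB connection string with the standard URI options `connectTimeoutMS`, `socketTimeoutMS` and `waitQueueTimeoutMS`. The host names, the database name and the timeout values are hypothetical, and the helper `build_mongo_uri` is not part of any driver. In the 2.x Java driver the equivalent settings are, to my understanding, the `connectTimeout`, `socketTimeout` and `maxWaitTime` fields of `MongoOptions`.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, database, connect_timeout_ms,
                    socket_timeout_ms, wait_queue_timeout_ms):
    """Return a mongodb:// URI with explicit timeouts instead of defaults."""
    options = urlencode({
        "connectTimeoutMS": connect_timeout_ms,       # time allowed to open a socket
        "socketTimeoutMS": socket_timeout_ms,         # time allowed for a read/write
        "waitQueueTimeoutMS": wait_queue_timeout_ms,  # time allowed to wait for a pooled connection
    })
    return "mongodb://{}/{}?{}".format(",".join(hosts), database, options)


# Hypothetical hosts and illustrative values, not the ones used at SFR.
uri = build_mongo_uri(
    ["db1.example.com:27017", "db2.example.com:27017"],
    "ups",
    connect_timeout_ms=2000,
    socket_timeout_ms=5000,
    wait_queue_timeout_ms=1000,
)
print(uri)
```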
Mongo @ SFR
Targeted ads application
Hi!




Matthieu Blanc
Web architect @ Degetel, contractor for SFR
Context
Present targeted ads to www.sfr.fr web visitors
Based on :

●   Their profile
●   Their web browsing history
●   Date/Time of the day
●   etc.
Ex: A web visitor views a smartphone on www.sfr.fr
A smartphone ad is shown when they go back to the
homepage
Ex: A web visitor arrives at www.sfr.fr from a search
engine
An ad related to their search is shown
Problem
Need to keep each visitor's web browsing history

Need to track every:
● Ad view
● Click
● Conversion

MongoDB to the rescue!
image from http://www.flickr.com/photos/cayusa/




The D.U.N.C.E. principle : everything by default
Java 1.6
Spring Data for MongoDB 1.0.0
(uses mongo driver 2.7.1)
Read/Write on master
No sharding
WriteConcern.NORMAL




The D.U.N.C.E. principle : everything by default
Case Study




Event Logging with MongoDB
Capped collections:

db.createCollection("mycoll", {capped: true, size: 100000})

Old log data automatically ages out (oldest documents first)
No risk of filling up a disk

No need to write log archival / deletion scripts

Good performance for a high number of writes compared to
reads




Event Logging
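The age-out behaviour of a capped collection can be illustrated with a bounded queue. This is only an analogy in plain Python: a real capped collection is bounded by its size in bytes (the `size` option above), whereas the hypothetical `CappedLog` class below is bounded by a document count to keep the sketch short.

```python
from collections import deque


class CappedLog:
    """In-memory analogy of a capped collection (bounded by document count)."""

    def __init__(self, max_documents):
        # A deque with maxlen silently drops the oldest entry when full.
        self._documents = deque(maxlen=max_documents)

    def insert(self, document):
        self._documents.append(document)

    def find(self):
        # Documents come back in insertion order.
        return list(self._documents)


log = CappedLog(max_documents=3)
for n in range(1, 6):
    log.insert({"event": "view", "seq": n})

remaining = [doc["seq"] for doc in log.find()]
print(remaining)  # [3, 4, 5]: the two oldest events were overwritten
```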
Map Reduce <- we are bad at this

  Cron job 1 -> server-side aggregation of logs by minute
  and by ad

  Aggregated logs are persisted in a dedicated collection

  Cron job 2 consolidates the aggregated logs by hour, every
  day

  Cron job 3 consolidates the aggregated logs by day, every
  week




Log Analysis
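The two-stage roll-up can be sketched in plain Python. The event fields (`at`, `ad_id`, `type`) and both function names are assumptions made for illustration. The slides do not show the actual log schema or job code.

```python
from collections import Counter
from datetime import datetime


def aggregate_by_minute(events):
    """First job: count events per (minute, ad, event type)."""
    counts = Counter()
    for event in events:
        minute = event["at"].replace(second=0, microsecond=0)
        counts[(minute, event["ad_id"], event["type"])] += 1
    return counts


def consolidate_by_hour(minute_counts):
    """Second job: fold the per-minute counters into per-hour counters."""
    counts = Counter()
    for (minute, ad_id, event_type), n in minute_counts.items():
        counts[(minute.replace(minute=0), ad_id, event_type)] += n
    return counts


events = [
    {"at": datetime(2012, 3, 1, 10, 0, 5), "ad_id": "ad-1", "type": "view"},
    {"at": datetime(2012, 3, 1, 10, 0, 40), "ad_id": "ad-1", "type": "view"},
    {"at": datetime(2012, 3, 1, 10, 1, 2), "ad_id": "ad-1", "type": "click"},
    {"at": datetime(2012, 3, 1, 10, 30, 0), "ad_id": "ad-1", "type": "view"},
]
per_minute = aggregate_by_minute(events)
per_hour = consolidate_by_hour(per_minute)
```

Each stage reads only the output of the previous one, so the raw events can age out of the capped collection once the per-minute job has run.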
Event Logging
The Result
Main collection (visitors' web browsing history):
36 million documents and growing

Average document size: 430 bytes

80 million events processed in less than 3 months

Per second: 60 reads and 50 writes (60 finds, 30 updates, 20
inserts)

Some Data
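As a rough sizing sketch from these numbers: the 90-day period is an assumption standing in for "less than 3 months", so the event rate is a lower bound on the average. The storage figure ignores indexes and storage overhead.

```python
documents = 36_000_000
avg_document_bytes = 430
events = 80_000_000
period_days = 90  # assumption: "less than 3 months" taken as at most 90 days

collection_bytes = documents * avg_document_bytes             # raw document data only
collection_gb = collection_bytes / 10**9                      # about 15.5 GB
min_avg_events_per_second = events / (period_days * 86_400)   # about 10.3

print(round(collection_gb, 2), round(min_avg_events_per_second, 1))
```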




It works! :)

Default properties are good enough even for a high-traffic
website (for now...)

Conclusion
Mongo @ SFR
Products catalog
Good morning!




David Rault, web architect @ SFR
In charge of MarketPlace project
@squat80       http://fr.linkedin.com/pub/david-rault/37/722/963
●   Products classified by categories
●   Categories determine product features
●   Multiple sellers
    ○   can create new products (based on EAN/MPN)
        ■   can modify the products they created
        ■   can only refer to products created by other
            sellers
    ○   publish offers (product id + price)
●   Order management is out of scope
    ○   delegated to the existing order-management system
●   Still in development

Context
●   Schema-less: products are structured
    documents
    ○   Different properties depending on product category
        (TVs, phone protections, wires, ...)
    ○   No JOIN required - documents load in a single call
    ○   New categories will come : no migration required
●   Searching capabilities
    ○   Helps visitors navigate the store
    ○   Complex queries on product features
●   Performance
    ○   Our Ops forbid intensive writes into Oracle DB (!)


Why Mongo ?
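To make the schema-less point concrete, here is a sketch of two product documents from different categories, written as Python dictionaries. Every field name and value is hypothetical. The slides do not show the real catalog schema.

```python
# Two products whose "features" differ because their categories differ.
tv = {
    "_id": "ean-0000000000001",
    "category": "tv",
    "name": "Example TV",
    "features": {"screen_inches": 40, "hd_ready": True},
}
cable = {
    "_id": "ean-0000000000002",
    "category": "cables",
    "name": "Example cable",
    "features": {"length_m": 1.5, "connector": "HDMI"},
}
catalog = [tv, cable]


def feature_names(products, category):
    """Collect the feature keys actually used by products of one category."""
    names = set()
    for product in products:
        if product["category"] == category:
            names.update(product["features"])
    return names


tv_features = feature_names(catalog, "tv")
cable_features = feature_names(catalog, "cables")
```

A new category only means documents with a new set of feature keys, with no schema migration on the documents that already exist.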
Java 7 - Tomcat 7

Direct use of Java driver (2.7.2)

Replica set (2 replicas + 1 arbiter)

Sharding enabled

Writes are replicas-safe



Technical choices
●   Web services for creation/update of products and
    offers
●   Triggers (scheduled) to consolidate data
    ○   for each product: valid offers in a 2-day window
        are aggregated into the product
    ○   for each category: product counts and pseudo-enumerated
        field values (e.g. list of brands) are
        aggregated into the product
●   "Live streaming" into Google Search
    Appliance
    ○   feed for both internal keyword searches & portal-wide
        searches (within *.sfr.fr sites)
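The per-product consolidation trigger can be sketched as below. The slides do not define what makes an offer "valid". This sketch assumes an offer is valid if its hypothetical `updated_at` timestamp falls within the last 2 days. The function name and all field names except `min_price` are also assumptions.

```python
from datetime import datetime, timedelta


def consolidate_offers(product, offers, now, window=timedelta(days=2)):
    """Embed the product's recent offers and its lowest price into a copy."""
    valid = [
        offer for offer in offers
        if offer["product_id"] == product["_id"]
        and now - window <= offer["updated_at"] <= now
    ]
    consolidated = dict(product)
    consolidated["offers"] = [
        {"seller": offer["seller"], "price": offer["price"]} for offer in valid
    ]
    # None when no offer is currently valid.
    consolidated["min_price"] = min((o["price"] for o in valid), default=None)
    return consolidated


now = datetime(2012, 3, 10, 12, 0)
offers = [
    {"product_id": "p1", "seller": "seller-a", "price": 199.0,
     "updated_at": now - timedelta(days=1)},
    {"product_id": "p1", "seller": "seller-b", "price": 179.0,
     "updated_at": now - timedelta(days=3)},   # outside the window
    {"product_id": "p1", "seller": "seller-c", "price": 189.0,
     "updated_at": now - timedelta(hours=12)},
    {"product_id": "p2", "seller": "seller-a", "price": 99.0,
     "updated_at": now - timedelta(hours=1)},  # another product
]
result = consolidate_offers({"_id": "p1", "name": "Example TV"}, offers, now)
```

Storing the consolidated offers inside the product document lets a read fetch a product and its prices in a single call.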

"Back-office" Design
●   Straightforward queries
    ○   mostly READs
    ○   by product id, by category
    ○   filtering (min/max price, by brand, by color, ...)
        ■   filters are category-specific
●   Customer-activity tracking
    ○   build a knowledge base for future features:
        ■   recommendation engine
    ○   products viewed, previous orders, wish-list, etc.
    ○   for both identified and anonymous visitors




"Front-office" design
●   Need to unlearn 10+ years of experience in
    relational design/development
    ○   Think "document", not relation
    ○   No magical (a.k.a. ORM) framework
        ●   bye bye Hibernate ;)
    ○   Some surprises/confusion with the query syntax
        ■   No "$and" in versions <2.0; we didn't manage to write
            some queries (though they worked in the mongo shell)
            ●   "min_price > a and min_price > b" with the Java driver
        ■   Operators appear at varying positions
            ●   { "$or": [ { "field_a": value_a }, { "field_b": value_b } ] }
            ●   { "some_field": { "$in" : some_values }}




How is it going ?
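The query-syntax surprises can be reproduced with plain dictionaries. The sketch below writes a range condition on one field in two ways: merged under the field, and with `$and` (MongoDB 2.0+). It then shows what happens when a map-based query builder receives the same key twice. That last case is one plausible cause of the Java driver trouble, not something the slides confirm. The range bounds, the `brand` field and the tiny `matches` evaluator (covering only `$and`, `$gt`, `$lt`, `$in` and equality) are assumptions for illustration.

```python
def _compare(value, op, operand):
    if op == "$in":
        return value in operand
    if op not in ("$gt", "$lt"):
        raise ValueError("unsupported operator: " + op)
    if value is None:
        return False
    return value > operand if op == "$gt" else value < operand


def matches(document, query):
    """Evaluate a very small subset of the MongoDB query language."""
    for key, condition in query.items():
        if key == "$and":                      # operator first, then fields
            ok = all(matches(document, sub) for sub in condition)
        elif isinstance(condition, dict):      # field first, then operators
            ok = all(_compare(document.get(key), op, operand)
                     for op, operand in condition.items())
        else:                                  # plain equality
            ok = document.get(key) == condition
        if not ok:
            return False
    return True


# Two conditions on the same field, merged under that field.
range_same_field = {"min_price": {"$gt": 100, "$lt": 200}}
# The same range written with "$and".
range_with_and = {"$and": [{"min_price": {"$gt": 100}},
                           {"min_price": {"$lt": 200}}]}
by_brand = {"brand": {"$in": ["BrandA", "BrandB"]}}

# Pitfall: setting the same key twice keeps only the last condition.
overwritten = {}
overwritten["min_price"] = {"$gt": 100}
overwritten["min_price"] = {"$lt": 200}

product = {"min_price": 150, "brand": "BrandA"}
cheap = {"min_price": 50, "brand": "BrandC"}
```

With these definitions, `cheap` is rejected by both range queries but accepted by `overwritten`, because the lower bound was silently lost.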
●   Good performance
    ○   Although with a relatively low number of documents
        (~5,000-10,000 documents)
●   Fast development cycle
    ○   Only a few hours to get the first prototype
        running
    ○   With Google's help and a couple of hours, built a
        micro full-text indexing search feature
●   Mongo Shell is my friend
    ○   as well as Google & MongoDB.org
    ○   at last, a developer-friendly (command-line) tool
        ●   bye bye sqlplus ;)



How is it still going ?
"borrowed" from Geek and Poke http://geekandpoke.typepad.com/




       Thank You!

MongoDB@sfr.fr

  • 3. Apache, Tomcat, JEE 1 mutualised platform 30 physical application servers 150 Tomcat deployed Web development at Internet Direction
  • 4. 22M pageviews per day 4.5M only on homepage 8M customers authentication per day We NEED to scale! What do we face ?
  • 5. Increase our scalability Avoid Schema/Table/Column dependency Closer to developper team than sysadmin or DBA team NoSQL?
  • 6. Scalable Complex queries Schema-less Easy deployment and monitoring Open-Source Why MongoDB ?
  • 7. [Live project] customers data [Live project] sfr.fr targeted ads [Development project] Products catalog Our projects based on MongoDB
  • 9. Hello! Jérôme Leleu, web architect @ SFR In charge of SSO and user profile service
  • 10. User profile service (UPS) Web services (SOAP or JSON) Get the profile of SFR clients Data are agregated from many backends of the information system Context
  • 11. Java 1.6, mongo driver 2.6.5, replicat set + sharding Technical data : « local storage » collection ■ only 1 collection in a database ■ « last connection date » of web account ■ 14 millions ■ read/writes by identifier of the web account (shard key) Some functional data are coming : « internautes » collection (6 millions)… Data in UPS
  • 12. My choice : read on slave and write (without acknowledge) on master « local storage » collection needs to be readable immediatly after write -> not really compatible with asynchronous replication and reads on slave -> use of memcached (like for most data in UPS) as a cache for reads (let replication happens) Implementation in MongoDB
  • 13. Some figures: 2 GB of data and 2 GB of index for 14 million documents (from « db.stats(); »). Inserts/updates: 600 k each day; communication exceptions: 6 k each day. Average insert/update time: 56 ms.
  • 14. Problems & pending questions: The default values of the Java mongo driver are inappropriate: unlimited connect timeout, unlimited read timeout, and a 120-second wait to get a connection from the pool! Can't make an « AND » query on the same field before mongo 2.0. Is it a good choice to read on slave / write on master? What is the replication time? Is it a real use case? Should it be replaced by: force acknowledgement on writes and read on slave? OR don't acknowledge writes and read on master?
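One way to avoid the defaults criticised on slide 14 is to set every timeout explicitly. The sketch below only builds a connection string; `connectTimeoutMS`, `socketTimeoutMS` and `waitQueueTimeoutMS` are standard MongoDB connection-string options, but the host names, the database name, the values and the `buildMongoUri` helper are assumptions for illustration, and whether a given driver version honours each option should be checked against its documentation.

```javascript
// Hypothetical helper: assemble a MongoDB connection URI with explicit timeouts.
function buildMongoUri(hosts, database, options) {
  const query = Object.keys(options)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(options[key])}`)
    .join("&");
  return `mongodb://${hosts.join(",")}/${database}?${query}`;
}

// Placeholder hosts and illustrative values, not SFR's settings.
const uri = buildMongoUri(["db1.example.com:27017", "db2.example.com:27017"], "ups", {
  connectTimeoutMS: 2000,   // give up connecting after 2 s instead of waiting forever
  socketTimeoutMS: 5000,    // give up on a read after 5 s
  waitQueueTimeoutMS: 1000, // wait at most 1 s for a pooled connection
});
console.log(uri);
```

Bounded timeouts turn a stalled backend into a fast, visible error rather than a pile-up of blocked request threads.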
  • 15. Mongo @ SFR Targeted ads application
  • 16. Hi! Matthieu Blanc, web architect @ Degetel, contractor for SFR.
  • 17. Context: Present targeted ads to www.sfr.fr web visitors, based on: ● their profile ● their web browsing history ● date/time of day ● etc.
  • 18. Ex: A web visitor looks at a smartphone @ www.sfr.fr
  • 19. A smartphone ad is shown when he goes back to the homepage
  • 20. Ex : A web visitor goes to www.sfr.fr from a search engine
  • 21. An ad related to his search is shown
  • 22. Problem: Need to keep each web visitor's browsing history. Need to track every: ● ad view ● click ● conversion. MongoDB to the rescue!
  • 23. The D.U.N.C.E. principle: everything by default (image from http://www.flickr.com/photos/cayusa/)
  • 24. The D.U.N.C.E. principle: everything by default. Java 1.6; Spring Data for MongoDB 1.0.0 (uses mongo driver 2.7.1); read/write on master; no sharding; WriteConcern.NORMAL.
  • 25. Case study: event logging with MongoDB
  • 26. Event Logging with capped collections: db.createCollection("mycoll", {capped: true, size: 100000}), where size is the maximum collection size in bytes. Old log data is automatically aged out, oldest first. No risk of filling up a disk; no need to write log archival/deletion scripts. Good performance when writes far outnumber reads.
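The age-out behaviour described on slide 26 can be imitated in a few lines. This is an in-memory sketch, not the MongoDB API: a real capped collection is bounded by its size in bytes (optionally also by a `max` document count), whereas the hypothetical `CappedLog` class below is bounded by document count only.

```javascript
// In-memory imitation of a capped collection, bounded by document count (assumption).
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  // Append a document; once full, the oldest document is dropped to make room.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Return the documents in insertion order.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedLog(3);
for (const seq of [1, 2, 3, 4, 5]) {
  log.insert({ event: "view", seq: seq });
}
console.log(log.find().map((doc) => doc.seq)); // only seq 3, 4 and 5 remain
```

Because eviction follows insertion order, no cleanup job is needed, at the cost of losing whatever has not been aggregated before it is overwritten.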
  • 27. Log Analysis: Map Reduce <- we are bad at this. Cron job 1 -> server-side aggregation of the logs by minute and by ad; the aggregated logs are persisted in a dedicated collection. Cron job 2 consolidates the aggregated logs by hour every day. Cron job 3 consolidates the aggregated logs by day every week.
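The cascade of cron jobs on slide 27 amounts to repeated roll-ups into coarser time buckets. The following is a minimal sketch under assumed names: the event shape (`adId`, `type`, `ts` in epoch milliseconds) and the `rollUp` function are hypothetical, and plain arrays stand in for the raw and aggregated collections.

```javascript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// Group records by ad, event type and time bucket, summing their counts.
// Raw events carry no count and are counted as 1; already aggregated
// records carry a count, so the output can be rolled up again.
function rollUp(records, bucketMs) {
  const totals = new Map();
  for (const record of records) {
    const bucket = Math.floor(record.ts / bucketMs) * bucketMs;
    const key = `${record.adId}|${record.type}|${bucket}`;
    const entry = totals.get(key) || { adId: record.adId, type: record.type, ts: bucket, count: 0 };
    entry.count += record.count === undefined ? 1 : record.count;
    totals.set(key, entry);
  }
  return Array.from(totals.values());
}

const rawEvents = [
  { adId: "ad-1", type: "view", ts: 0 },
  { adId: "ad-1", type: "view", ts: 30 * 1000 },
  { adId: "ad-1", type: "click", ts: 45 * 1000 },
  { adId: "ad-1", type: "view", ts: 61 * 1000 },
  { adId: "ad-2", type: "view", ts: 10 * 1000 },
];

const byMinute = rollUp(rawEvents, MINUTE_MS); // what the first cron job would persist
const byHour = rollUp(byMinute, HOUR_MS);      // what the daily job would consolidate
console.log(byMinute);
console.log(byHour);
```

Each level reads only the output of the previous one, so the raw events can live in a capped collection and be overwritten once the per-minute pass has run.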
  • 32. Conclusion / Some Data: Main collection (visitors' web browsing history): 36 million documents and growing. Avg. document size: 430 bytes. 80 million events processed in less than 3 months. Per second: 60 reads and 50 writes (60 finds, 30 updates, 20 inserts).
  • 33. Conclusion / Some Data: It works! :) Default properties are good enough even for a high-traffic website (for now...).
  • 35. Good morning! David Rault, web architect @ SFR, in charge of the MarketPlace project. @squat80 http://fr.linkedin.com/pub/david-rault/37/722/963
  • 36. Context: Products classified by categories ● Categories determine product features ● Multiple sellers ○ can create new products (based on EAN/MPN) ■ can modify the products they created ■ can only refer to products created by other sellers ○ publish offers (product id + price) ● Order management is out of scope ○ delegated to the existing order-management system ● Still in development
  • 37. Why Mongo? Schema-less: products are structured documents ○ different properties depending on product category (TVs, phone cases, cables, ...) ○ no JOIN required: documents load in a single call ○ new categories will come: no migration required ● Searching capabilities ○ power navigation through the store ○ complex queries on product features ● Performance ○ our Ops forbid intensive writes into the Oracle DB (!)
  • 38. Technical choices: Java 7, Tomcat 7; direct use of the Java driver (2.7.2); replica set (2 replicas + 1 arbiter); sharding enabled; writes are replica-safe.
  • 39. "Back-office" design: WS for creation/update of products and offers ● Scheduled triggers consolidate data ○ for each product: valid offers in a 2-day window are aggregated into the product ○ for each category: product counts and pseudo-enumerated field values (e.g. list of brands) are aggregated into the product ● "Live streaming" into Google Search Appliance ○ feeds both internal keyword searches and portal-wide searches (within *.sfr.fr sites)
  • 40. "Front-office" design: Straightforward queries ○ mostly READs ○ by product id, by category ○ filtering (min/max price, by brand, by color, ...) ■ filters are category-specific ● Customer-activity tracking ○ builds a knowledge base for future features: ■ recommendation engine ○ products viewed, previous orders, wish list, etc. ○ for both identified and anonymous visitors
  • 41. How is it going? Need to unlearn 10+ years of experience in relational design/development ○ think "document", not relation ○ no magical (a.k.a. ORM) framework ● bye bye Hibernate ;) ○ some surprises/confusion with the query syntax ■ no "$and" in versions <2.0; some queries could not be made to work with the Java driver (though they worked in the mongo shell) ● e.g. "min_price > a and min_price > b" ■ operators appear at varying positions ● logical operators wrap whole clauses: { "$or": [ { "some_field": some_value }, { "other_field": other_value } ] } ● comparison operators nest under the field: { "some_field": { "$in": some_values } }
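Two conditions on the same field, the difficulty mentioned on slides 14 and 41, can be expressed in two ways. The sketch below builds plain query documents in JavaScript, as the mongo shell would accept them; `rangeQuery` and `andQuery` are hypothetical helper names. The overwrite pitfall shown is only one plausible way to run into the problem with a map-like document builder, not necessarily what the speakers hit.

```javascript
// Range on one field: both operators share a single sub-document.
// This form does not need "$and".
function rangeQuery(field, min, max) {
  return { [field]: { $gt: min, $lt: max } };
}

// Explicit conjunction, for servers that support "$and" (2.0+ according to the slides).
function andQuery(clauses) {
  return { $and: clauses };
}

// Pitfall: a document is a map, so setting the same key twice keeps only the last condition.
const naive = {};
naive.min_price = { $gt: 10 };
naive.min_price = { $lt: 50 };

console.log(JSON.stringify(naive));
// {"min_price":{"$lt":50}}
console.log(JSON.stringify(rangeQuery("min_price", 10, 50)));
// {"min_price":{"$gt":10,"$lt":50}}
console.log(JSON.stringify(andQuery([{ min_price: { $gt: 10 } }, { min_price: { $lt: 50 } }])));
// {"$and":[{"min_price":{"$gt":10}},{"min_price":{"$lt":50}}]}
```

The first form puts the field on the outside and the operators inside, while "$and" puts the operator on the outside, which is the position difference noted on the slide.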
  • 42. How is it still going? Good performance ○ although with a relatively low number of documents (~5,000-10,000) ● Fast development cycle ○ only a few hours to get the first prototype running ○ with Google's help and a couple of hours, built a micro full-text indexing search feature ● Mongo shell is my friend ○ as are Google & MongoDB.org ○ at last, a developer-friendly (command-line) tool ● bye bye sqlplus ;)
  • 43. "borrowed" from Geek and Poke http://geekandpoke.typepad.com/ Thank You!