New York Web Tech Scaling
Scaling Business Insider by Pax Dickinson

1/29/2013
Our Sponsors

✤   Business Insider
    http://businessinsider.com

✤   Varnish Software
    http://varnish-software.com

✤   Thanks to our host, 10Gen
Scaling Business Insider
Constraints on Scaling BI

✤   Coping with the inherently
    unpredictable nature of news
    traffic

✤   Some events bring predictably
    high traffic (Apple product
    announcements), but traffic
    spikes can happen anytime
Constraints on Scaling BI

✤   Being on top of breaking news is
    of huge importance to us

✤   We can’t afford to have latency
    between CMS and production
Back-end vs. Front-end Scaling

✤   Two different scaling strategies that apply to different use cases

✤   Backend scaling is useful for any site since it helps no matter how
    dynamic your content is, but it’s difficult because it involves the
    entire stack.

✤   Front-end scaling is useful for sites that need to deliver the same
    content to a huge audience.

✤   Which do we use at Business Insider? Both, of course.
Scaling Both Ways
MongoDB for Backend Scaling

✤   MongoDB is a NoSQL database: it stores documents rather than
    relational data, and it lacks transactions.

✤   MongoDB doesn’t ensure writes by default.

✤   These choices make it fast but they need to be understood by
    developers using MongoDB.
Business Insider Data Constraints

✤   Our data storage for the site content itself is less than 10 GB, growing
    a few GB a year.

✤   Images that are blog content, stored in the database using GridFS,
    come to another 100 or so GB, growing a few GB a month.

✤   We need to constantly record internal analytics of page views and
    unique visitors for business use.

✤   We need updates to be reflected immediately.

✤   Our architecture needs to allow for exponential transaction volume
    growth.
Business Insider & MongoDB
✤   The blog (including images) is stored on DB1.

✤   Analytics are written to DB3 so the analytics write locks don’t affect
    blog performance.

✤   A shared slave is ready to step in
    during a failure of either server.

✤   All transactions are performed
    against the primaries.
Business Insider & MongoDB
✤   As we grow, we plan to move GridFS off the blog server and onto
    its own server & slave.

✤   The blog and analytics can be
    sharded for performance when that’s
    eventually needed.

✤   Our DB servers are dual quad-core
    Xeons with 64GB of RAM and SSD
    data storage

✤   We’re handling 800-1000 ops/sec
    with negligible CPU load.
Data Modeling for Scaling

✤   MongoDB allows the storage of embedded documents within a
    document. We use this to store an array of comments within the blog
    post document they belong to.

✤   This eliminates the costly joins traditional
    SQL would require. In order to provide a
    moderation interface displaying all recent
    comments, we de-normalize that data and
    double store it.
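
The double-storage idea above can be sketched in a few lines. This is a minimal illustration that uses plain Python dictionaries and lists as stand-ins for two MongoDB collections; the names `posts`, `recent_comments`, `add_comment`, and `latest_comments` are hypothetical and are not taken from the Business Insider codebase.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for two MongoDB collections.
posts = {}            # post_id -> post document
recent_comments = []  # denormalized copies for the moderation view


def add_comment(post_id, author, text):
    """Store a comment twice: embedded in its post and in the flat list."""
    comment = {
        "author": author,
        "text": text,
        "created": datetime.now(timezone.utc),
    }
    # Embedded copy: rendering a post needs only this one document.
    posts[post_id]["comments"].append(comment)
    # Denormalized copy: carries post_id so a moderator can jump back.
    recent_comments.append({**comment, "post_id": post_id})


def latest_comments(limit=10):
    """Newest comments across all posts, without scanning any post."""
    return recent_comments[-limit:][::-1]


posts["p1"] = {"title": "Apple earnings", "comments": []}
posts["p2"] = {"title": "Markets open", "comments": []}
add_comment("p1", "alice", "First!")
add_comment("p2", "bob", "Interesting.")
add_comment("p1", "carol", "Thanks for the update.")
```

The trade-off is that every write touches two places, in exchange for both read paths (one post, or all recent comments) being a single lookup.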
Varnish for Front-end Caching

✤   Varnish is a front-end caching reverse proxy.

✤   Varnish retrieves web pages on behalf of clients and caches the result.
    Clients that can be served from cache don’t add any load to your
    backend.
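
As a rough model of that behavior (not Varnish itself), a caching reverse proxy can be reduced to a dictionary in front of a backend function. The class and function names here are illustrative assumptions.

```python
class CachingProxy:
    """Toy reverse proxy: fetch from the backend once, then serve from cache."""

    def __init__(self, backend):
        self.backend = backend      # callable: url -> response body
        self.cache = {}             # url -> cached response body
        self.backend_requests = 0   # how often the backend was actually hit

    def get(self, url):
        if url not in self.cache:
            # Cache miss: this is the only path that loads the backend.
            self.backend_requests += 1
            self.cache[url] = self.backend(url)
        return self.cache[url]


def render_page(url):
    # Stand-in for an expensive Apache/PHP page render.
    return f"<html>rendered {url}</html>"


proxy = CachingProxy(render_page)
for _ in range(99):
    proxy.get("/apple-earnings")
print(proxy.backend_requests)  # prints 1
```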
How Varnish Works
Varnish & Business Insider

✤   We use two Varnish servers with single quad-core processors and
    32GB of RAM each. Traffic is load balanced randomly between them,
    and each stores a full 24GB RAM cache of the site.

✤   Average weekdays peak around 700 reqs/sec on each Varnish server,
    spiking to over 1500 reqs/sec during breaking news such as Apple
    quarterly earnings.

✤   Our four backend Apache/PHP
    servers tend to see 50-60 reqs/s,
    and during breaking news this
    only spikes to 70-80 reqs/s.
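
A quick back-of-the-envelope calculation shows how much traffic the cache absorbs. It assumes the 50-60 and 70-80 reqs/s figures are totals across the four backend servers, which the slide does not state; if they are per-server figures, multiply the backend numbers by four.

```python
def backend_share(front_rps, backend_rps):
    """Fraction of front-end requests that reach the backend."""
    return backend_rps / front_rps


VARNISH_SERVERS = 2

# Midpoints of the ranges quoted above (assumed to be backend totals).
typical = backend_share(VARNISH_SERVERS * 700, 55)
breaking = backend_share(VARNISH_SERVERS * 1500, 75)

print(f"typical: {typical:.1%}, breaking: {breaking:.1%}")
# prints "typical: 3.9%, breaking: 2.5%"
```

Under that assumption the share of requests reaching the backend actually falls during a spike, because the extra traffic concentrates on a few hot pages that are already cached.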
Varnish Active Bans

✤   When an editor publishes or edits a post, a ban request is sent to the
    Varnish servers.

✤   They add post pages, vertical pages, and author pages associated
    with that post to the Varnish ban list.

✤   The next time a request matches that list, the content is retrieved fresh
    from the backend and the cache is refreshed.

✤   This lets us cache but still keep our content totally up-to-date.
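
The ban flow can be modeled as follows. This is a simplified sketch of the idea, not Varnish's implementation: a ban is a URL pattern with a timestamp, and a cached object counts as stale if a matching ban was added after the object was stored. The URL patterns are hypothetical.

```python
import re
from itertools import count


class BanningCache:
    """Toy cache with a ban list, loosely modeled on the slides above."""

    def __init__(self, backend):
        self.backend = backend
        self.objects = {}       # url -> (stored_at, body)
        self.bans = []          # (added_at, compiled URL pattern)
        self.clock = count(1)   # logical clock shared by stores and bans
        self.backend_requests = 0

    def ban(self, url_pattern):
        self.bans.append((next(self.clock), re.compile(url_pattern)))

    def _is_banned(self, url, stored_at):
        # Only bans newer than the cached object can invalidate it.
        return any(added_at > stored_at and pattern.search(url)
                   for added_at, pattern in self.bans)

    def get(self, url):
        entry = self.objects.get(url)
        if entry is None or self._is_banned(url, entry[0]):
            self.backend_requests += 1
            entry = (next(self.clock), self.backend(url))
            self.objects[url] = entry
        return entry[1]


version = {"n": 1}
cache = BanningCache(lambda url: f"{url} v{version['n']}")
for url in ("/post/42", "/author/jane", "/post/7"):
    cache.get(url)

# An editor updates post 42: ban the post page and its author page.
version["n"] = 2
for pattern in (r"^/post/42$", r"^/author/jane$"):
    cache.ban(pattern)
```

After the bans, the next request for `/post/42` is fetched fresh, while `/post/7` keeps being served from cache.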
Varnish Active Bans
Edge Side Includes
✤   Varnish allows for the use of Edge Side Includes, parts of a page that
    are retrieved separately and have different cache lifetimes.

✤   One example is the “Most Read” widget in
    the right rail on Business Insider.

✤   The page hosting the module may be
    cached for an hour but the Edge Side
    Include hosting the widget has a 5 minute
    TTL, keeping it up to date.
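
To make the two cache lifetimes concrete, here is a small simulation with an explicit clock. The fragment URL `/esi/most-read`, the `TtlCache` class, and the TTL rule are assumptions for illustration; only the one-hour and five-minute lifetimes come from the slide.

```python
import re

ESI_TAG = re.compile(r'<esi:include src="([^"]+)"\s*/>')


class TtlCache:
    """Toy cache that stitches ESI fragments in, each with its own TTL."""

    def __init__(self, backend, ttl_for):
        self.backend = backend   # callable: url -> body
        self.ttl_for = ttl_for   # callable: url -> lifetime in seconds
        self.entries = {}        # url -> (expires_at, body)
        self.fetches = []        # log of URLs fetched from the backend

    def fetch(self, url, now):
        entry = self.entries.get(url)
        if entry is None or entry[0] <= now:
            self.fetches.append(url)
            entry = (now + self.ttl_for(url), self.backend(url))
            self.entries[url] = entry
        return entry[1]

    def render(self, url, now):
        page = self.fetch(url, now)
        # Replace each include tag with its separately cached fragment.
        return ESI_TAG.sub(lambda m: self.fetch(m.group(1), now), page)


def backend(url):
    if url == "/esi/most-read":
        return "<ul>most read</ul>"
    return '<body>article <esi:include src="/esi/most-read"/></body>'


def ttl_for(url):
    # 5 minutes for fragments, 1 hour for the hosting page.
    return 300 if url.startswith("/esi/") else 3600


cache = TtlCache(backend, ttl_for)
first = cache.render("/post/42", now=0)   # fetches the page and the widget
cache.render("/post/42", now=200)         # both served from cache
cache.render("/post/42", now=400)         # widget expired, page still cached
```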
Edge Side Includes




Another example is the top user menu. This is an Edge Side Include that
includes the logged-in user’s ID in the hash, so each user gets this block
customized for them while the rest of the page can be cached more
generally.

Varnish gives you full programmatic control of what is included in the
hash for each request, so complex tricks like this are possible.
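
In Varnish this logic lives in the `vcl_hash` subroutine; the Python function below only sketches the same decision. The fragment path `/esi/user-menu`, the host name, and the `cache_key` function are hypothetical.

```python
import hashlib


def cache_key(url, host, user_id=None):
    """Build a cache key; only the user-menu fragment varies per user."""
    parts = [host, url]
    if url.startswith("/esi/user-menu") and user_id is not None:
        # Per-user fragment: the user ID becomes part of the hash input.
        parts.append(f"user={user_id}")
    return hashlib.sha256("|".join(parts).encode()).hexdigest()


HOST = "www.example.com"

# The article page is shared by everyone...
shared = cache_key("/post/42", HOST, user_id=1) == cache_key("/post/42", HOST, user_id=2)
# ...while the menu fragment gets one cache entry per user.
per_user = cache_key("/esi/user-menu", HOST, 1) != cache_key("/esi/user-menu", HOST, 2)
print(shared, per_user)  # prints "True True"
```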
Edge Side Includes



Edge Side Includes are a common caching standard; this is the same
format used by Akamai and other CDN providers.

Generally you should add a response header to any page that includes an
ESI tag, and tell Varnish to process ESI tags only when that header is
present. If you run that processing on every payload you’ll waste
resources and risk corrupting any binary or image that happens to
contain the sequence “<esi:”.
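
The header check can be sketched like this. The header name `X-Parse-Esi` is a made-up example, and the function is a stand-in for the equivalent rule in a Varnish configuration, not a real Varnish API.

```python
import re

ESI_TAG = re.compile(rb'<esi:include src="([^"]+)"\s*/>')
ESI_HEADER = "X-Parse-Esi"  # hypothetical opt-in header set by the backend


def process_response(headers, body, fetch_fragment):
    """Expand ESI tags only when the backend opted in via the header."""
    if headers.get(ESI_HEADER) != "1":
        # No opt-in: pass the payload through untouched (images, binaries).
        return body
    return ESI_TAG.sub(lambda m: fetch_fragment(m.group(1).decode()), body)


def frag(src):
    # Stand-in for fetching the fragment from cache or backend.
    return b"<ul>" + src.encode() + b"</ul>"


html = b'<div><esi:include src="/esi/most-read"/></div>'
image = b'\x89PNG<esi:include src="/oops"/>\x00'

expanded = process_response({ESI_HEADER: "1"}, html, frag)
untouched = process_response({"Content-Type": "image/png"}, image, frag)
```

Here `expanded` has the tag replaced by the fragment, while `untouched` is byte-for-byte identical to the original image payload.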
Scaling Varnish Servers
✤   The load balancer
    randomly sends
    traffic to the Varnish
    servers.

✤   Each Varnish server
    caches every page.

✤   Every cache ban
    results in two
    backend requests, one
    from each Varnish
    server.
Scaling Varnish Servers
✤   Adding a third
    Varnish server means
    a third backend
    request for every ban.

✤   That defeats our
    purpose entirely!
Scaling Varnish Servers
✤   We can solve this in a
    few ways, but we’ll
    use Layer 7 load
    balancing.

✤   We can send a subset
    of the URL space to
    each Varnish server

✤   Only one copy of each
    cached page will exist
    on the cluster,
    reducing load on the
    backend.
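
One way to picture the difference is to hash the URL to choose a server and then count how many cached copies a ban has to refresh. This is a sketch under assumptions: the server names and helper functions are hypothetical, and simple modulo hashing stands in for whatever rule the load balancer really uses.

```python
import hashlib


def pick_server(url, servers):
    """Layer 7 routing: the URL alone decides which Varnish server is used."""
    digest = hashlib.sha256(url.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def copies_to_refresh(urls, servers, partitioned):
    """Count backend fetches needed to re-warm `urls` after a ban."""
    if partitioned:
        # Each URL lives on exactly one server.
        holders = {url: {pick_server(url, servers)} for url in urls}
    else:
        # Random balancing eventually leaves a copy on every server.
        holders = {url: set(servers) for url in urls}
    return sum(len(h) for h in holders.values())


servers = ["varnish1", "varnish2", "varnish3"]
banned = ["/post/42", "/author/jane", "/vertical/tech"]

print(copies_to_refresh(banned, servers, partitioned=False))  # prints 9
print(copies_to_refresh(banned, servers, partitioned=True))   # prints 3
```

Plain modulo hashing remaps most URLs whenever a server is added or removed; consistent hashing is a common refinement if the pool changes often.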
A Closing Testimonial From Jay-Z

✤   “If you’re having
    scaling problems,
    I feel bad for you son...

    Clients sent 99 requests
    but my backend got
    one.”
                                Photo by flickr user matthew_harrison
Til Next Time...

✤   Feb. 28th, 2013
    (tentative)

✤   Topic TBD
Sources & Links

✤   MongoDB
    http://www.mongodb.org

✤   Varnish Cache
    http://www.varnish-cache.org
