Scalable File System
In 14 Days
Jeff Hoffer, Software Architect
Alex Zherdev, Sr. Software Engineer
Our Background
In the beginning...
“YouTube” for Documents
Today
“Make every small business better”
Professional Documents
Custom Documents
Business Licenses
Jason Nazar
Alon Shwartz
The Team
Our Product
www.docstoc.com
Initial Approach
Pros:
• Existing libraries used
• Reliable storage
• Replication
Cons:
• Hard to scale out
• Replication can’t keep up
• Taxed all data
SELECT `text_data` FROM `documents` WHERE `doc_id` = 8675309;
IIS HTTP Based Solution
Pros:
• HTTP GET
• IIS Static Content Cache
• 5TB = Years of Growth
• Easy Setup & Deploy
Cons:
• Not scalable
• NTFS & 30M small files
• Replication In-House
HTTP GET http://docs.api/text/160717/8675309.txt
Importance of Performance
• IIS source failed in early 2013
• Page speed heavily influenced our traffic and SEO
• MongoDB solution implemented within 2 weeks and results immediately felt
[Chart: Speed and Views]
Requirements
• Sharded – horizontal scale out of reads and writes
• Replication – no single point of failure for core business data
• Doc Page Peak Read Load of 200 / second < 4s
• REST Interface – switch only requires changing URL
• Easy to Maintain – maintenance cost of no more than 1 FTE / day / month
• 99.9% uptime
• Can handle our current set of text files (43M)
• Production Rollout within 3 weeks
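Two of the requirements above translate directly into numbers. The sketch below works out the downtime that 99.9% uptime allows and a rough raw size for 43M text files. The 50KB average object size is taken from the collection-structure slide; the 30-day month and decimal units are assumptions, and replication and index overhead are ignored.

```python
# Back-of-the-envelope figures for the requirements list.
# Assumptions (not from the slides): a 30-day month, decimal units (1 KB = 1000 bytes).

UPTIME_TARGET = 0.999          # "99.9% uptime"
FILE_COUNT = 43_000_000        # "43 M" text files
AVG_OBJECT_KB = 50             # "Object Size 50KB" from the collection slide


def allowed_downtime_minutes(uptime: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given uptime fraction."""
    return (1.0 - uptime) * days * 24 * 60


def raw_data_terabytes(count: int, avg_kb: int) -> float:
    """Raw payload size in TB, before replication or index overhead."""
    return count * avg_kb / 1_000_000_000  # KB -> TB (decimal)


if __name__ == "__main__":
    print(f"Downtime budget: {allowed_downtime_minutes(UPTIME_TARGET):.1f} min / 30 days")
    print(f"Raw text data:   {raw_data_terabytes(FILE_COUNT, AVG_OBJECT_KB):.2f} TB")
```

Under these assumptions the budget comes to roughly 43 minutes of downtime per 30 days and a little over 2 TB of raw text.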
MongoDB FTW
Test Setup
Mongo Collection Structure
{
  id: {document_id},
  body: {text_content},
  created: {date_time}
}
• Simple Structure
• Object Size 50KB
• Shard on hashed id
• Rarely modified
• Heavy Reads
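Sharding on a hashed id spreads sequential document ids across shards instead of routing every new insert to the same one. The sketch below illustrates that idea only: it uses `hashlib.md5` and a simple modulo as stand-ins, whereas MongoDB's hashed shard keys rely on the server's own hash function and chunk ranges. The shard count of 4 and the helper names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical; the slides do not state the shard count


def shard_for(doc_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a document id to a shard via a hash (illustrative stand-in)."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % num_shards


def make_document(doc_id: int, text: str, created: str) -> dict:
    """Build a document in the shape shown on the slide: id, body, created."""
    return {"id": doc_id, "body": text, "created": created}


if __name__ == "__main__":
    # Sequential ids, assumed here purely for illustration.
    ids = range(8_675_309, 8_675_309 + 10_000)
    counts = Counter(shard_for(i) for i in ids)
    for shard in sorted(counts):
        print(f"shard {shard}: {counts[shard]} documents")
```

Because neighbouring ids hash to unrelated values, the per-shard counts come out close to even, which is the property a hashed shard key is chosen for when the workload is dominated by lookups on a single id.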
Tests
Client | Server | MongoDB | Duration | Reads (100/sec) | Writes (100/sec) | Read+Writes (200/sec)**
JMeter | Ruby REST Server | Empty Collection | 20 min (3x) | | |
JMeter | ASP.NET REST Server* | Empty Collection | 20 min (3x) | | |
JMeter | ASP.NET REST Server* | Seeded Collection, 2M | 30 min (3x) | | |
.NET Console Loader | ASP.NET REST Server* | Seeded Collection, 2M | 1 hour (3x) | | |
.NET Console Loader | ASP.NET REST Server* | Seeded Collection, 6M | Overnight (10 hrs) | | |
*ASP.NET MVC 4 Web API
**10x peak load
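The durations and rates in the test table imply a total request count for each run. The sketch below multiplies them out for the combined Read+Writes column at 200 requests per second. It assumes the rate is held constant for the full duration of a single run, which the slides do not state explicitly.

```python
# Requests issued per run at a sustained rate (assumed constant for the full duration).

RATE_PER_SEC = 200  # combined Read+Writes column

# (label, duration in minutes) taken from the Duration column
RUNS = [
    ("20 min", 20),
    ("30 min", 30),
    ("1 hour", 60),
    ("Overnight (10 hrs)", 600),
]


def total_requests(rate_per_sec: int, minutes: int) -> int:
    """Total requests sent when `rate_per_sec` is held for `minutes`."""
    return rate_per_sec * minutes * 60


if __name__ == "__main__":
    for label, minutes in RUNS:
        print(f"{label:>20}: {total_requests(RATE_PER_SEC, minutes):,} requests")
```

On that assumption a 20-minute run issues 240,000 requests and the overnight run 7.2 million.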
Production
In Conclusion…
It’s Good Enough, It’s Fast Enough, and Doggone It, Developers Like It!
• Fast Prototype
• Low Maintenance
• Quick Deployment
• Scale Out
• Stable
• Linux, Windows, Mac
• Excellent Support