NoSQL Now 2011
Why Wordnik went Non-Relational
Tony Tam @fehguy
What this Talk is About
  • 5 key reasons why Wordnik migrated to a non-relational database
  • Process for selection, migration
  • Optimizations and tips from living survivors of the battlefield
Why Should You Care?
  • MongoDB user for almost 2 years
  • Lessons learned, analysis, benefits from process
  • We migrated from MySQL to MongoDB with no downtime
  • We have interesting/challenging data needs, likely relevant to you
More on Wordnik
  • World's fastest updating English dictionary
  • Based on input of text up to 8k words/second
  • Word Graph as basis to our analysis
  • Synchronous & asynchronous processing
  • 10's of billions of documents in NR storage
  • 20M daily REST API calls, billions served
  • Powered by Swagger OSS API framework (swagger.wordnik.com)
Architectural History
  • 2008: Wordnik was born as a LAMP stack on AWS EC2
  • 2009: Introduced public REST API, powered wordnik.com, partner APIs
  • 2009: Drank the NoSQL Kool-Aid
  • 2010: Scala
  • 2011: Micro SOA
Non-relational by Necessity
  • Moved to NR because of the "4S": Speed, Stability, Scaling, Simplicity
  • But… MySQL can go a LONG way
  • Takes the right team, right reasons (+ patience)
  • NR offerings simply too compelling to focus on scaling MySQL
Wordnik’s 5 Whys for NoSQL
Why #1: Speed bumps with MySQL
  • Inserting data fast (50k recs/second) caused MySQL mayhem
  • Maintaining indexes largely to blame
  • Operations for consistency unnecessary but "cannot be turned off"
  • Devised twisted schemes to avoid client blocking
  • Aka the "master/slave tango"
Why #2: Retrieval Complexity
  • Objects typically mapped to tables
  • Object hierarchy always => inner + outer joins
  • Lots of static data, so why join?
  • "Noun" is not getting renamed in my code's lifetime!
  • Logic like this is probably in application logic
  • Since storage is cheap, I'll choose speed
Why #2: Retrieval Complexity
  • One definition = 10+ joins
  • 50 requests per second!
Why #2: Retrieval Complexity
  • Embed objects in rows "sort of works"
  • Filtering gets really nasty
  • Native XML in MySQL? If a full table-scan is OK…
  • OK, then cache it!
  • Layers of caching introduced layers of complexity
      • Stale data/corruption
      • Object versionitis
      • Cache stampedes
Why #3: Object Modeling
  • Object models being compromised for sake of persistence
  • This is backwards!
  • Extra abstraction for the wrong reason
  • OK, then performance suffers
  • In-application joins across objects
  • "Who ran the fetch all query against production?!" – any sysadmin
  • "My zillionth ORM layer that only I understand" (and can maintain)
Why #4: Scaling
  • Needed "cloud friendly storage"
  • Easy up, easy down!
  • Startup: Sync your data, and announce to clients when ready for business
  • Shutdown: Announce your departure and leave
  • Adding MySQL instances was a dance
  • Snapshot + bin files
      mysql> change master to MASTER_HOST='db1', MASTER_USER='xxx', MASTER_PASSWORD='xxx', MASTER_LOG_FILE='master-relay.000431', MASTER_LOG_POS=1035435402;
Why #4: Scaling
  • What about those VMs?
  • So convenient! But… they kind of suck
  • Can the database succeed on a VM?
  • VM performance: memory, CPU or I/O (pick only one)
  • Can your database really reduce CPU or disk I/O with lots of RAM?
Why #5: Big Picture
  • BI tools use relational constraints for discovery
  • Is this the right reason for them?
  • Can we work around this?
  • Let's have a BI tool revolution, too!
  • True service architecture makes relational constraints impractical/impossible
  • Distributed sharding makes relational constraints impractical/impossible
Why #5: Big Picture
  • Is your app smarter than your database?
  • The logic line is probably blurry!
  • What does count(*) really mean when you add 5k records/sec?
  • Maybe eventual consistency is not so bad…
  • 2PC? Do some reading and decide!
  • http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
Ok, I'm in!
  • I thought deciding was easy!?
  • Many quickly maturing products
  • Divergent features tackle different needs
  • Wordnik spent 8 weeks researching and testing NoSQL solutions
  • This is a long time! (for a startup)
  • Wrote ODM classes and migrated our data
  • Surprise! There were surprises
  • Be prepared to compromise
Choice Made, Now What?
  • We went with MongoDB ***
  • Fastest to implement
  • Most reliable
  • Best community
  • Why?
      • Why #1: Fast loading/retrieval
      • Why #2: Fast ODM (50 tps => 1000 tps!)
      • Why #3: Document models === object models
      • Why #4: MMF => kernel-managed memory + RS
      • Why #5: It's 2011, is there no progress?
More on Why MongoDB
  • Testing, testing, testing
  • Used our migration tools to load test
  • Read from MySQL, write to MongoDB
  • We loaded 5+ billion documents, many times over
  • In the end, one server could…
      • Insert 100k records/sec sustained
      • Read 250k records/sec sustained
      • Support concurrent loading/reading
Migration & Testing
  • Iterated ODM mapping multiple times
  • Some issues
      • Type safety
          cur.next.get("iWasAnIntOnce").asInstanceOf[Long]
      • Dates as strings
          obj.put("a_date", "2011-12-31") != obj.put("a_date", new Date("2011-12-31"))
      • Storage size
          obj.put("very_long_field_name", true) >> obj.put("vsfn", true)
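
The type-safety and date issues above can be illustrated with a small sketch. This is not Wordnik's mapper: `Doc` is a plain Scala `Map` standing in for a driver document, and `FieldReader`, `readLong`, and `toDate` are hypothetical names. The idea is that a blind `asInstanceOf[Long]` throws a `ClassCastException` when the stored value is actually an `Int`, and that a date written as a string stays a string unless it is converted before being stored.

```scala
import java.time.{LocalDate, ZoneOffset}
import java.util.Date

object FieldReader {
  // Assumption: a plain Map stands in for a document returned by the driver.
  type Doc = Map[String, Any]

  /** Reads a numeric field as Long whether it was stored as an Int or a Long. */
  def readLong(doc: Doc, field: String): Option[Long] =
    doc.get(field).flatMap {
      case i: Int  => Some(i.toLong)
      case l: Long => Some(l)
      case _       => None // missing, or stored with an unexpected type
    }

  /** Converts an ISO date string (yyyy-MM-dd) to a java.util.Date at UTC midnight. */
  def toDate(iso: String): Date =
    Date.from(LocalDate.parse(iso).atStartOfDay(ZoneOffset.UTC).toInstant)
}
```

With a helper like this, a field that "was an int once" reads back as a `Long` instead of failing at the cast, and `toDate("2011-12-31")` gives a real date value to store rather than a string.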
Migration & Testing
  • Expect data model iterations
  • Wordnik migrated table to Mongo collection "as-is"
  • Easier to migrate, test
  • _id field used same MySQL PK
  • Auto increment?
      • Used MySQL to "check-out" sequences
      • One row per mongo collection
      • Run out of sequences => get more
      • Need exclusive locks here!
Migration & Testing
  • Sequence generator in-process
      SequenceGenerator.checkout("doc_metadata,100")
  • Sequence generator as web service
  • Centralized UID management
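
The "check out a block of sequences" idea can be sketched as follows. This is an illustration under assumptions, not the `SequenceGenerator` from the slides: `CentralSequences` and `BlockSequence` are hypothetical names, and an in-memory `AtomicLong` stands in for the central store (one MySQL row per collection in the talk, read under an exclusive lock).

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for the central store of sequence counters.
object CentralSequences {
  private val counters = TrieMap.empty[String, AtomicLong]

  /** Atomically reserves `size` ids and returns the first id of the block. */
  def checkout(name: String, size: Int): Long =
    counters.getOrElseUpdate(name, new AtomicLong(1L)).getAndAdd(size.toLong)
}

/** Hands out ids from a locally held block and fetches a new block when it runs out. */
final class BlockSequence(name: String, blockSize: Int) {
  private var next = 0L
  private var limit = 0L // exclusive upper bound of the current block

  def nextId(): Long = synchronized {
    if (next >= limit) {
      // Local block exhausted: reserve another one from the central store.
      next = CentralSequences.checkout(name, blockSize)
      limit = next + blockSize
    }
    val id = next
    next += 1
    id
  }
}
```

Each process only contacts the central store once per block, so the exclusive lock is taken rarely; the trade-off is that ids are unique but not strictly ordered across processes, and unused ids in a block are lost on restart.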
Migration & Testing
  • Expect data access pattern iterations
  • So much more flexibility!
  • Reach into objects
      > db.dictionary_entry.find({"hdr.sr":"cmu"})
  • Access to a whole object tree at query time
  • Overwrite a whole object at once… when desired
  • Not always! This clobbers the whole record
      > db.foo.save({_id:18727353,foo:"bar"})
  • Update a single field:
      > db.foo.update({_id:18727353},{$set:{foo:"bar"}})
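
The difference between replacing a whole document and setting a single field can be modeled with plain immutable maps. This is only a model of the semantics, assuming nothing about any driver API: `UpdateSemantics`, `save`, and `set` are hypothetical names that mirror the two shell commands above.

```scala
object UpdateSemantics {
  // Assumption: a plain Map stands in for a stored document.
  type Doc = Map[String, Any]

  /** Whole-document replacement, like save: fields absent from `replacement` are lost. */
  def save(store: Map[Long, Doc], id: Long, replacement: Doc): Map[Long, Doc] =
    store.updated(id, replacement)

  /** Field-level update, like $set: only the named fields change. */
  def set(store: Map[Long, Doc], id: Long, fields: Doc): Map[Long, Doc] =
    store.get(id) match {
      case Some(existing) => store.updated(id, existing ++ fields)
      case None           => store // no matching document, nothing is written
    }
}
```

Starting from a document with fields `foo` and `keep`, `save` with only `foo` leaves a one-field document, while `set` with only `foo` leaves `keep` untouched.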
Flip the Switch
  • Migrate production with zero downtime
  • We temporarily halted loading data
  • Added a switch to flip between MySQL/MongoDB
  • Instrument, monitor, flip it, analyze, flip back
  • Profiling your code is key
  • What is slow?
  • Build this in your app from day 1
Flip the Switch
  • Storage selected at runtime
      val h = shouldUseMongoDb match {
        case true => new MongoDbSentenceDAO
        case _ => new MySQLDbSentenceDAO
      }
      h.find(...)
  • Hot-swappable storage via configuration
  • It worked!
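
A self-contained version of the runtime switch might look like the sketch below. The class names follow the slide, but everything else is assumed for illustration: the `SentenceDAO` trait, its `backend` field, the in-memory data, and the `timed` helper (added to reflect the "instrument, monitor" advice) are hypothetical.

```scala
// Hypothetical DAO interface; the slide only shows find(...) being called.
trait SentenceDAO {
  def backend: String
  def find(id: Long): Option[String]
}

// In-memory stand-ins: real implementations would talk to MongoDB and MySQL.
final class MongoDbSentenceDAO(data: Map[Long, String]) extends SentenceDAO {
  val backend = "mongodb"
  def find(id: Long): Option[String] = data.get(id)
}

final class MySQLDbSentenceDAO(data: Map[Long, String]) extends SentenceDAO {
  val backend = "mysql"
  def find(id: Long): Option[String] = data.get(id)
}

object Storage {
  /** Picks the backend from a flag, e.g. one read from configuration. */
  def select(useMongo: Boolean, data: Map[Long, String]): SentenceDAO =
    if (useMongo) new MongoDbSentenceDAO(data) else new MySQLDbSentenceDAO(data)

  /** Runs a call and returns its result with the elapsed time in nanoseconds. */
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }
}
```

Because callers depend only on the trait, flipping the flag changes the backend without touching call sites, and wrapping calls in `timed` gives a per-backend latency figure to compare before and after the flip.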
Then What?
  • Watch our deployment, many iterations to mapping layer
  • Settled on in-house, type-safe mapper
  • https://github.com/fehguy/mongodb-benchmark-tools
  • Some gotchas (of course)
  • Locking issues on long-running updates (more in a minute)
  • We want more of this!
  • Migrated shared files to Mongo GridFS
  • Easy-IT
Performance + Optimization
  • Loading data is fast!
  • Fixed collection padding, similarly-sized records
  • Tail of collection is always in memory
  • Append faster than MySQL in every case tested
  • But... random access started getting slow
  • Indexes in RAM? Yes
  • Data in RAM? No, > 2TB per server
  • Limited by disk I/O / seek performance
  • EC2 + EBS for storage?
Performance + Optimization
  • Moved to physical data center
  • DAS & 72GB RAM => great uncached performance
  • Good move? Depends on use case
  • If "access anything anytime", not many options
  • You want to support this?
Performance + Optimization
  • Inserts are fast, how about updates?
  • Well… update => find object, update it, save
  • Lock acquired at "find", released after "save"
  • If hitting disk, lock time could be large
  • Easy answer, pre-fetch on update
  • Oh, and NEVER do "update all records" against a large collection
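
The pre-fetch advice can be illustrated with a toy model. Nothing here is a MongoDB API: `ToyCollection` is a hypothetical class that tracks which documents are "in RAM" and counts how many updates had to touch "disk" while the (simulated) write lock was held.

```scala
import scala.collection.mutable

/** Toy collection that tracks which documents are resident in memory. */
final class ToyCollection(onDisk: Map[Long, Int]) {
  private val docs: mutable.Map[Long, Int] = mutable.Map.empty[Long, Int] ++ onDisk
  private val resident = mutable.Set.empty[Long]

  // Updates that hit "disk" while the simulated write lock was held.
  var slowLockedUpdates = 0

  /** A read pages the document into memory; no write lock is involved. */
  def find(id: Long): Option[Int] = {
    if (docs.contains(id)) resident += id
    docs.get(id)
  }

  /** An update is treated as holding the write lock for its whole duration. */
  def increment(id: Long): Unit = {
    if (!resident.contains(id)) slowLockedUpdates += 1 // page fault under the lock
    resident += id
    docs.get(id).foreach(v => docs(id) = v + 1)
  }

  /** Pre-fetch: read first so the locked section only touches memory. */
  def prefetchThenIncrement(id: Long): Unit = {
    find(id)
    increment(id)
  }
}
```

In this model a cold `increment` pays the disk cost inside the lock, while `prefetchThenIncrement` moves that cost to an unlocked read, which is the effect the slide describes.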
Performance + Optimization
  • Indexes
  • Can't always keep index in RAM. MMF "does its thing"
  • Right-balanced b-tree keeps necessary index hot
  • Indexes hit disk => mute your pager
More Mongo, Please!
  • We modeled our word graph in mongo
  • 80M Edges
  • 80mS edge fetch
What's next
  • Liberate our models
  • Stop worrying about how to store them (for the most part)
  • New features almost always NR
  • Some MySQL left
  • Less on each release

Questions?
  • See more about Wordnik APIs: http://developer.wordnik.com
  • Migrating from MySQL to MongoDB: http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik
  • Maintaining your MongoDB Installation: http://www.slideshare.net/fehguy/mongo-sv-tony-tam
  • Swagger API Framework: http://swagger.wordnik.com
  • Mapping Benchmark: https://github.com/fehguy/mongodb-benchmark-tools
  • Wordnik OSS Tools: https://github.com/wordnik/wordnik-oss

Editor's Notes

  1. Moving to a json-based mapper, 10k/second. Moving to direct mapping, 35k/second